No spin-down on spare disk

Tony_Cristiano · August 25, 2020, 7:16am

My non-active disk drive does not go to sleep at all with version 4.0.1. This drive is not attached to anything. It used to sleep all the time in version 3.9.1 with the same configuration.

It’s a brand new disk, model WDC WD40EFAX-68JH4N0.

The Rockstor dashboard shows that it’s reading from it every 30 seconds.

Why is this happening?
Tony

phillxnet · August 25, 2020, 11:10am

@Tony_Cristiano Thanks for opening this focused thread.

Could you specify the spin-down configuration you’ve used with this drive?

Although, from your graph, it does now look very much like something is accessing the drive, which would of course stop it from entering sleep anyway.

That dashboard widget, by way of context, surfaces the metrics found in /proc/diskstats:

github.com/rockstor/rockstor-core/blob/master/src/rockstor/smart_manager/data_collector.py#L571-L583

            self.emit("logsize", {"key": "logManager:logsize", "data": file_size})

        self.spawn(file_size, sid, logfile)


class DisksWidgetNamespace(RockstorIO):

    switch = False
    byid_disk_map = {}

    def on_connect(self, sid, environ):

        self.byid_disk_map = get_byid_name_map()

To check whether we are simply seeing misreported activity here, could you post the output of:

cat /proc/diskstats

or for a live output of changes:

watch -d 'cat /proc/diskstats'

and the output of the following during the same boot session will be required to work backwards to the by-id names used in the Rockstor Web-UI:

ls -la /dev/disk/by-id/
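For anyone who would rather automate that comparison than eyeball the watch -d output, here is a minimal Python sketch. It is not part of Rockstor and all function names are hypothetical. It parses two /proc/diskstats snapshots, reports which kernel device names changed, and maps those names back to their /dev/disk/by-id links. It assumes the field layout from the kernel's iostats documentation (name in field 3, reads completed in field 4, sectors read in field 6, and so on).

```python
import os
from typing import Dict, List


def parse_diskstats(text: str) -> Dict[str, Dict[str, int]]:
    """Parse /proc/diskstats into per-device I/O counters.

    Field order (after major, minor, name) follows the kernel's iostats
    documentation: reads completed, reads merged, sectors read, time reading,
    writes completed, writes merged, sectors written, and so on.
    """
    stats: Dict[str, Dict[str, int]] = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 14:
            continue  # skip blank or short lines
        stats[fields[2]] = {
            "reads": int(fields[3]),
            "sectors_read": int(fields[5]),
            "writes": int(fields[7]),
            "sectors_written": int(fields[9]),
        }
    return stats


def changed_devices(before: Dict[str, Dict[str, int]],
                    after: Dict[str, Dict[str, int]]) -> List[str]:
    """Kernel names (e.g. 'sdb') whose counters differ between snapshots."""
    return sorted(name for name, counters in after.items()
                  if before.get(name) != counters)


def byid_map(byid_dir: str = "/dev/disk/by-id") -> Dict[str, List[str]]:
    """Map kernel names (e.g. 'sdb') to the by-id links that resolve to them."""
    mapping: Dict[str, List[str]] = {}
    for link in sorted(os.listdir(byid_dir)):
        target = os.path.realpath(os.path.join(byid_dir, link))
        mapping.setdefault(os.path.basename(target), []).append(link)
    return mapping
```

Take one snapshot, wait roughly 30 seconds (the cadence seen on the Dashboard), take another, and then look up each name returned by changed_devices() in byid_map().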

The iotop program may be a useful tool for the job here:

zypper in iotop

“man iotop” should help with the options, and you will likely want the “-o” one (only show processes actually doing I/O). You should then be able to see which process, if any, is accessing this drive.
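iotop is the right tool here. Purely as an illustration of the per-process I/O accounting the kernel exposes, the sketch below compares the read_bytes field of /proc/<pid>/io between two snapshots. These counters are per process, not per device, so any process that shows up is only a candidate and should be cross-checked against the drive's /proc/diskstats activity. The function names are hypothetical, and the sketch assumes it runs as root so that every process's io file is readable.

```python
import os
from typing import Dict


def parse_proc_io(text: str) -> Dict[str, int]:
    """Parse the 'key: value' lines of a /proc/<pid>/io file."""
    counters: Dict[str, int] = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep:
            counters[key.strip()] = int(value)
    return counters


def snapshot_read_bytes(proc: str = "/proc") -> Dict[int, int]:
    """Return {pid: read_bytes} for every process we are allowed to read."""
    result: Dict[int, int] = {}
    for entry in os.listdir(proc):
        if not entry.isdigit():
            continue  # skip non-process entries such as 'self' or 'mounts'
        try:
            with open(os.path.join(proc, entry, "io")) as f:
                result[int(entry)] = parse_proc_io(f.read()).get("read_bytes", 0)
        except (OSError, ValueError):
            continue  # process exited, or permission denied when not root
    return result


def readers(before: Dict[int, int], after: Dict[int, int]) -> Dict[int, int]:
    """pids whose storage-layer reads grew between snapshots, with the delta."""
    return {pid: after[pid] - before.get(pid, 0)
            for pid in after if after[pid] > before.get(pid, 0)}
```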

Also, to double check if this drive is associated with a pool could you paste the output of the following:

btrfs fi show
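If you’d rather check pool membership in a script, the sketch below pulls the member device paths out of btrfs fi show output. It assumes the usual “devid N size X used Y path /dev/sdX” line layout. The function names are hypothetical, and the sample values used to exercise it are illustrative only.

```python
import re
from typing import List

# Matches member lines such as "\tdevid    1 size 3.64TiB used 1.20TiB path /dev/sdb"
DEVID_PATH = re.compile(r"^\s*devid\s+\d+\s+.*\bpath\s+(\S+)\s*$")


def btrfs_member_paths(show_output: str) -> List[str]:
    """Extract member device paths from 'btrfs fi show' output."""
    paths = []
    for line in show_output.splitlines():
        match = DEVID_PATH.match(line)
        if match:
            paths.append(match.group(1))
    return paths


def is_pool_member(device: str, show_output: str) -> bool:
    """True if `device` (e.g. '/dev/sdb') is listed as a btrfs member."""
    return device in btrfs_member_paths(show_output)
```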

Other forum members may have more informed ideas on how to track down the suspected background process that is keeping this drive awake, assuming it’s not just a misreport on the Dashboard. Also, does the drive still stay awake when the Rockstor Web-UI is not running, i.e. closed completely and not open on any system? The Dashboard processes themselves could be causing this polling, even though the drive is not a pool member. Finally, does the drive have any other filesystem on it, and if so, is it mounted? For instance:

cat /proc/mounts

There may well be differences in how other filesystems are treated here.
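/proc/mounts lists one mount per line: source, mount point, filesystem type, options, and two numeric fields. Here is a minimal sketch that reports whether a given drive, or any of its partitions, is mounted. It assumes sdX-style partition naming (sdb1, sdb2); NVMe “p1” suffixes are not handled, and the function name is hypothetical.

```python
from typing import List, Tuple


def mounts_for(device: str, mounts_text: str) -> List[Tuple[str, str]]:
    """Return (mount point, fstype) pairs for `device` or its partitions.

    Each /proc/mounts line reads: source mountpoint fstype options dump pass.
    Assumes sdX-style partitions (sdb1, sdb2); NVMe "p1" names are not matched.
    """
    found = []
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 3:
            continue
        source, mountpoint, fstype = fields[0], fields[1], fields[2]
        suffix = source[len(device):] if source.startswith(device) else None
        if suffix is not None and (suffix == "" or suffix.isdigit()):
            # The kernel escapes spaces in mount points as \040.
            found.append((mountpoint.replace("\\040", " "), fstype))
    return found
```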

Hope that helps, if only to get a little more context for others to chip in if they are able to.

phillxnet · August 25, 2020, 11:34am

@Tony_Cristiano Re:

Alternatively, you may prefer the slightly shinier Netdata Rock-on, which may also help show what’s going on here. You can always turn it off or uninstall it once the offending program or system has been tracked down, although Netdata is surprisingly light-footed given the massive number of things it tracks.

And in case you’re not familiar with the Rock-on initial setup, see our following doc entry:

http://rockstor.com/docs/docker-based-rock-ons/overview.html

Hope that helps.

Tony_Cristiano · August 25, 2020, 1:57pm

I used the shell in web-ui to investigate with your suggestions.

I found the disk is not part of a Rockstor pool. /proc/diskstats showed the device is being accessed periodically.

I ran iotop for 5 minutes and here’s the result…

Can you make any sense of that?

phillxnet · August 25, 2020, 2:31pm

Iotop, by default, is not cumulative; it only shows transient access (the “-a” option shows accumulated I/O instead). The access shown there is normal and is presumably Rockstor refreshing the Web-UI header. Log in via ssh and try again with the Rockstor Web-UI closed completely, or, as suggested, install Netdata and use its Web-UI, as we just need more info on this one. I’ll see if I can reproduce this locally, but it won’t be for a little while, I’m afraid.

Yes, this is what it looked like from your prior post. But the command outputs I requested were aimed at checking whether this was a red herring or a misreported drive, hence the “watch -d 'cat /proc/diskstats'” and ls output requests. That way we can rule those things out: is the drive really being kept awake by the reported disk access, or is the spin-down setting (which you have yet to specify) simply not taking effect? To help clarify whether we are talking about the same spin-down setting, take a look at the following doc entry:

http://rockstor.com/docs/disk-power-down/disk_power_down.html#disk-power-down

Also, to help with the diagnosis, we will need the outputs of the previously suggested commands and answers to the earlier questions, i.e. the /proc/mounts and

and from earlier:

and:

phillxnet:

or for a live output of changes:
watch -d 'cat /proc/diskstats'
and the output of the following during the same boot session will be required to work backwards to the by-id names used in the Rockstor Web-UI:
ls -la /dev/disk/by-id/

We may just have a red herring here. All the info requested will help others help you diagnose this, which in turn may well turn up a bug or an otherwise overlooked or changed behaviour. The more info the better really, especially given it looks to be transient, repeated access to a drive that is not in any pool. Hence the question on the exact config of this drive, i.e. its filesystem and the spin-down config you have presumably already applied. Potentially a little tedious, I know, but it really helps to know whether the drive is even mounted, or whether it’s just a smartmontools/S.M.A.R.T default update thing.

Maybe others with this drive model can chip in here once we know the spin-down setting applied by @Tony_Cristiano. Also note that spin-down settings are often not persisted from one boot to the next, hence Rockstor uses an hdparm systemd service to re-apply any spin-down set via the Web-UI each time the system boots.
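For reference, a spin-down timeout like the one such a service re-applies is expressed as an hdparm “-S” value, and that encoding is not linear. Following the hdparm man page, 1 to 240 are 5 second steps, 241 to 251 are 30 minute steps, and there are a few special values. The sketch below decodes and encodes it. The helper names are hypothetical, and this is not a claim about the exact command line Rockstor’s service runs.

```python
from typing import List, Optional


def s_value_to_seconds(value: int) -> Optional[int]:
    """Decode an hdparm -S value into a spin-down timeout in seconds.

    Returns 0 for "disabled", and None where the man page gives no fixed
    duration (253 = vendor-defined 8 to 12 h, 254 = reserved).
    """
    if not 0 <= value <= 255:
        raise ValueError("hdparm -S takes a value from 0 to 255")
    if value <= 240:
        return value * 5              # 1..240: multiples of 5 s (5 s to 20 min)
    if value <= 251:
        return (value - 240) * 1800   # 241..251: multiples of 30 min (to 5.5 h)
    return {252: 21 * 60, 255: 21 * 60 + 15}.get(value)


def seconds_to_s_value(seconds: int) -> int:
    """Round a timeout up to the next 5 s or 30 min step (hypothetical helper)."""
    if seconds <= 0:
        return 0
    if seconds <= 1200:
        return -(-seconds // 5)            # ceiling division into 5 s units
    if seconds <= 19800:
        return 240 + -(-seconds // 1800)   # ceiling division into 30 min units
    raise ValueError("longer than a fixed hdparm -S timeout can express")


def spindown_command(device: str, value: int) -> List[str]:
    """Build the hdparm invocation that sets the standby timeout."""
    return ["hdparm", "-q", "-S", str(value), device]
```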

So in short:

From just that, no. Unless that is the only activity you see and you have captured just that bit, in which case we need more info and a different tool. Those familiar with monitoring transient disk access are welcome to chip in, as I’m afraid I can’t spend much time on this currently.

Hope that helps.

phillxnet · August 30, 2020, 8:50pm

@Tony_Cristiano Re:

I’ve not reproduced just yet but I have seen the following:

[Screenshot: Drive-asleep-but-read-activity]

Yet this new drive, which I’ve just attached to a Rockstor 4 instance, is not being woken up by these accesses, at least not for a while, after I pressed the pause button for this drive (tooltip: “Force drive into Standby mode”) in the Disks overview table.

This approach of ‘pausing’ the drive and then watching for what wakes it, or how long it takes to wake, may be helpful in diagnosing this issue. I’ve not tried any power-down config on this drive just yet, but thought I’d post about the pause button and the indication of non-waking ‘reads’, which does look odd to me.

I have to press this drive into ‘action’ soon but just noting what I’ve seen here of late.

I can confirm, however, that the read spikes in the Dashboard do correlate with this ‘spare’ drive’s canonical name within /proc/diskstats, i.e. watching /proc/diskstats shows activity against the correct /dev/sdX name for this drive’s by-id name. So it looks like our activity monitor is OK on that front.

I’ll update if I manage to look into this further before I have to add this drive to its intended pool. I should get the chance to duplicate this arrangement again soon anyway, as I have another system that needs a drive added.

Hope that helps.

Tony_Cristiano · August 31, 2020, 8:23am

Hi
Yes, I did what you did with the “Pause” button and my drive woke up again directly afterwards. So I couldn’t get my drive to stay in standby mode at all.

When I killed the dashboard and used only a terminal, /proc/diskstats showed no activity on the drive. So, in my case, it was the dashboard causing the read activity and keeping the drive awake.

I’ve attached that drive to a pool now and since then it sleeps like a baby until required.

So I’m happy with that.

By the way, I installed Netdata rockon and I love it.

phillxnet · August 31, 2020, 11:26am

@Tony_Cristiano Thanks for the update and extra info. Glad you’re now sorted.

We definitely have a little more work to do here, time permitting, and it’s interesting that the Dashboard looked to be blocking disk sleep. I seem to remember that the Disks overview page could also do this, but it didn’t re-awaken drives that had already gone to sleep, at least with my test setups at the time.

The “Pause” button simply executes:

hdparm -q -y dev-by-id-name

here:

github.com/rockstor/rockstor-core/blob/master/src/rockstor/system/osi.py#L2241-L2251


def enter_standby(dev_byid):
    """Simple wrapper to execute hdparm -y /dev/disk/by-id/device_name which
    requests that the named device enter 'standby' mode which usually means it
    will spin down.  Should only be available if the power status of the device
    can be successfully read without errors (UI enforced).
    :param dev_byid: device name as stored in db, i.e. /dev/disk/by-id type
    :return: None or out, err, rc of command
    """
    # TODO: candidate for move to system/hdparm
    hdparm_command = [HDPARM, "-q", "-y", get_device_path(dev_byid)]
    return run_command(hdparm_command)
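To repeat the ‘pause then watch’ experiment from an ssh session with the Web-UI closed, something along these lines could be used. It is a sketch and assumes it runs as root. It issues the same hdparm -q -y as the Pause button and then polls hdparm -C, which queries the drive’s power mode and is generally expected not to spin it up. The function names and polling defaults are hypothetical.

```python
import re
import subprocess
import time
from typing import Callable, Optional

# "hdparm -C" prints e.g. "/dev/sdb:\n drive state is:  standby"
STATE_RE = re.compile(r"drive state is:\s+(\S+)")
ASLEEP = ("standby", "sleeping")


def parse_power_state(hdparm_c_output: str) -> Optional[str]:
    """Extract 'standby', 'active/idle', etc. from hdparm -C output."""
    match = STATE_RE.search(hdparm_c_output)
    return match.group(1) if match else None


def force_standby(device: str) -> None:
    """Same command as the Web-UI "Pause" button: hdparm -q -y <device>."""
    subprocess.run(["hdparm", "-q", "-y", device], check=True)


def read_state(device: str) -> Optional[str]:
    """Query the current power mode with hdparm -C."""
    result = subprocess.run(["hdparm", "-C", device],
                            capture_output=True, text=True)
    return parse_power_state(result.stdout)


def seconds_until_awake(device: str,
                        state_of: Callable[[str], Optional[str]] = read_state,
                        interval: float = 10.0,
                        max_checks: int = 60,
                        sleep: Callable[[float], None] = time.sleep) -> Optional[float]:
    """Poll the drive; return roughly how long it stayed asleep, or None."""
    for check in range(max_checks):
        state = state_of(device)
        if state is not None and state not in ASLEEP:
            return check * interval   # first poll that found it awake
        sleep(interval)
    return None                       # still asleep after max_checks polls
```

Call force_standby() with the drive’s by-id path, then seconds_until_awake() with the same path. A result of None means it stayed asleep for the whole polling window (10 minutes with these defaults).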

And it’s curious that, once part of a pool, it manages to sleep just dandy. There must be some kind of exploratory probe going on when no filesystem is found, or the like, that in turn is having this effect.

Anyway good to know and we can keep an eye on this behaviour and gather info in this thread as we go. I’m quite keen on having functional drive power down capabilities as it really helps with power consumption/noise etc.

Thanks for your input and at least now your drives can get some rest.

Thought you might like Netdata; that was an @Flox addition from a couple of years ago. They also have a free cloud service for a fairly generous number of client machines, so you can get an overview of all your Netdata instances. I believe you sign up ‘in the cloud’ and then register each of your client machines with the cloud account you created.

Thanks again for the report and extra info.

Tony_Cristiano · August 31, 2020, 1:48pm

While we’re talking about drives going into sleep mode for all the good reasons, I couldn’t find any doco on how to put the whole appliance to sleep to save further power. Is this possible? I tried to schedule a suspend task but I don’t think it worked.

phillxnet · August 31, 2020, 2:25pm

@Tony_Cristiano Re scheduled power down: yes, that was added a while back, but no accompanying documentation was added at the time. The associated issue:

and pull request by @Flyer:

github.com/rockstor/rockstor-core

Custom Scheduled tasks : adding reboot/shutdown/suspend support

rockstor:master ← MFlyer:issue#735_#1306_shutdown_reboot_tasks

opened 05:05PM - 27 Feb 17 UTC

MFlyer

+677 -318

Refs to #735 - #1306 - #1036 This is a "staging area" for #1036 too : generic… "custom scheduled tasks" will be on a different PR To @schakrava : not ready to be merged, relinting required plus some user alerts and WebUI shutdown alerts Opening PR to let contributors check/test it :) Mirko

may have some info in them, though. There is also this suggested improvement with a forum link:

indicating a current limit of 24 hrs for wake up.

It was tested as working, but it very much depends on accompanying BIOS settings and may well vary, success-wise, between systems. It does use standards-compliant methods, i.e. ACPI/RTC, but as we know not all standards are all that standard!

If you can’t get the scheduled suspend to work but can get the scheduled shutdown working, you may have an option within the BIOS to wake up at a set time, thereby having Rockstor do the normal shutdown and the BIOS do a timed power-on.
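As an illustration of the RTC side of a scheduled suspend, the sketch below works out the next wake-up time, checks it against the 24 hour limit mentioned above, and builds a util-linux rtcwake command. It is a hypothetical helper and not how Rockstor’s scheduled task is implemented. Whether the wake-up actually fires still depends on the BIOS.

```python
from datetime import datetime, timedelta
from typing import List

# Limit noted in the linked suggestion; treat it as an assumption for your setup.
MAX_WAKE_AHEAD = timedelta(hours=24)


def next_wake(now: datetime, hour: int, minute: int) -> datetime:
    """Next occurrence of hour:minute that is strictly after `now`."""
    wake = now.replace(hour=hour, minute=minute, second=0, microsecond=0)
    if wake <= now:
        wake += timedelta(days=1)
    return wake


def within_wake_limit(now: datetime, wake: datetime) -> bool:
    """True if `wake` is in the future and no more than 24 h away."""
    return timedelta(0) < wake - now <= MAX_WAKE_AHEAD


def rtcwake_command(wake: datetime, mode: str = "mem") -> List[str]:
    """rtcwake: suspend in `mode` and wake at the epoch time passed via -t."""
    return ["rtcwake", "-m", mode, "-t", str(int(wake.timestamp()))]
```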

Let us know how you get on, and if you fancy contributing to the docs to tend to this shortfall, it would be most welcome. The docs repo is here:

And we have a dedicated doc section on how to contribute:

http://rockstor.com/docs/contribute_documentation.html

Another power-saving option was also added: it pings other machines and, if none respond, suspends or shuts down the system. Again, another undocumented goodie awaiting doc contributions. This feature was added by @betula-pendula.
Issue:

github.com/rockstor/rockstor-core

[Feature] Suspend/shutdown when certain devices are offline using scheduled tasks

opened 09:56PM - 24 Apr 19 UTC

closed 07:57PM - 10 Nov 19 UTC

p-betula-pendula

I would like to be able to suspend/shutdown Rockstor using a scheduled task when… my devices are offline – I am only using the NAS for a few hours daily and it seems unnecessary to me to let it run 24/7. I guess the demand for such a feature isn't very high, so instead of waiting for it to possibly be added someday I would like to try to implement it myself. My experience with Python, Django and Javascript is quite limited, unfortunately. Before I start, I'd like to know whether you agree with me on how to implement such a feature or whether you want it to be implemented at all. My idea was to enhance the existing shutdown/suspend tasks by adding an option to scan for specific IP addresses and to abort if at least one device responds. A simple ping using ICMP and the `ping` command would be used and if no other device responds, another scan would be scheduled a few minutes later suspending or shutting down when there's still no response. Intuitively I would put this feature side by side with the rtc wakeup feature and adapt the existing `reboot_shutdown.py` script in the backend. I would use the return code of the `ping` command to determine whether other devices are online, which according to the man pages is possible. This would only cover a small subset of what other users might need, but to me it seems to be fairly easy to implement and useful nonetheless. It would work in local networks with fixed IP addresses and for normal devices only that are offline when not in use. The user would be able to configure: - the IP addresses to be scanned - the interval between scans if no hosts are found - how often to scan until the shutdown takes place Any thoughts?

and pull request, again by @betula-pendula:

github.com/rockstor/rockstor-core

Scheduled tasks: add feature to scan for network devices before shutting down

rockstor:master ← p-betula-pendula:issue#2038_shutdown_task

opened 12:45PM - 29 Apr 19 UTC

p-betula-pendula

+356 -173

[Forum entry](https://forum.rockstor.com/t/suspend-shutdown-when-certain-devices…-are-offline-using-scheduled-tasks-2038) and issue #2038. This needs to be looked through because of my lack of experience. especially: - check grammar and phrasing in the UI (and code) - linting - best practices I'm not aware of - possible conflicts with other components It's working flawlessly when tested in my virtual environment. OpenSUSE is yet to be tested.
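The approach described in that issue can be sketched as follows: ping each configured address, abort if any device answers, and otherwise rescan after an interval, powering down once a set number of scans have all failed. This is a hypothetical reconstruction from the issue text, not the code in the pull request. It relies on ping’s exit status being 0 only when a reply was received, and the “-c 1 -W 2” flags assume the common iputils ping.

```python
import subprocess
import time
from typing import Callable, Iterable


def host_up(ip: str) -> bool:
    """Send one ICMP echo; ping exits with 0 only if a reply came back."""
    return subprocess.run(["ping", "-c", "1", "-W", "2", ip],
                          stdout=subprocess.DEVNULL,
                          stderr=subprocess.DEVNULL).returncode == 0


def should_power_down(ips: Iterable[str], scans: int, interval: float,
                      is_up: Callable[[str], bool] = host_up,
                      sleep: Callable[[float], None] = time.sleep) -> bool:
    """True if none of `ips` answered in `scans` consecutive scans."""
    targets = list(ips)
    for scan in range(scans):
        if any(is_up(ip) for ip in targets):
            return False          # a device is online: abort suspend/shutdown
        if scan < scans - 1:
            sleep(interval)       # wait before the next scan
    return True
```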

Hope that helps, and let us know how you get on. If you become sufficiently successful with, or familiar with, any of these, do consider contributing the associated doc entries, as it’s a shame we don’t have references to these (at least once confirmed working) features in our official docs.