HDD Spindown Problem

Hello. I am new to Rockstor, having just installed it a few hours ago. I am so far impressed with its interface and capabilities. However, I am encountering an issue where one drive will not spin down. I have five 8TB WD Red drives (4 through the motherboard SATA ports and one through an Asmedia SATA card). My system drive is a 120GB SSD, also on the Asmedia SATA card. I have not done anything beyond updating the system and selecting the spindown time (10 mins) for all the WD drives. The default APM of 164 (as well as the rest of the settings in that window) for the WD drives was left untouched. The drive that is not spinning down is on one of the motherboard’s SATA ports. If I manually click “Pause” in the GUI, all drives will go on standby. I can make them all come out of standby by accessing SMART information on the interface. After this, they will spin down again, except for this drive. Any ideas on how to proceed to get all drives to spin down properly? Thank you!

Update: None of the drives are now spinning down after waking them up by accessing SMART information. If I reboot the system, they also do not spin down after the specified 10 minutes. I will do a fresh install and report back.

Edit: Fresh install completed. I will be awaiting assistance on how to proceed. Thank you!

@ravennevar Welcome to the Rockstor community, sorry for the delayed response.

From initially reading your first post, the failure-to-spin-down problem seemed to be isolated to a single drive, which in turn shared a controller with another drive that did spin down as expected. Assuming this is correct, my first guess would be a different firmware version on that drive, or a slightly different model.

Upon reading your:

My initial thought here was that you may have been watching the drives via the Disks page. During the development of this feature it took me quite a while to realise that, with some drives, just having the Disks page open would cause them to never spin down. Very frustrating, that one. Anyway, it's worth a try, as if that is the cause then we have a red herring in this second post's report. Just a thought.

Another point of interest is that some drives just won’t spin down when set to 10 mins. I found that 20 mins was a little more reliable. Rockstor’s process for setting this ‘feature’ is essentially just the following command:

hdparm -S <spindown-value> /dev/disk/by-id/<dev-name>
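For example, a 20 minute timeout uses a value of 240, since hdparm -S values from 1 to 240 are multiples of 5 seconds (240 × 5 s = 20 mins), while 241 to 251 select multiples of 30 minutes. With a placeholder device name that looks like:

hdparm -S 240 /dev/disk/by-id/ata-EXAMPLE-DRIVE-SERIAL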

See the following code:

And as this setting is not persistent over a reboot, and in some cases can be undone by the BIOS, a systemd service is established. See:

Also note that there are no database entries or other 'state' remembered or managed by Rockstor other than that file. So if you delete it and refresh your Disks page then no setting is known and all spindown and APM values are returned to default. However, each device may or may not have a memory of its last setting, and it's actually quite tricky to read that setting back, hence the comment 'record' within that file informing Rockstor's display. So in short, to help diagnose an issue here you can always delete the:

/etc/systemd/system/rockstor-hdparm.service

file and power cycle your machine, and you should be back to scratch / default.
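As root that boils down to something like the following (the daemon-reload is optional given the reboot that follows):

rm /etc/systemd/system/rockstor-hdparm.service
systemctl daemon-reload
reboot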

Just some background so as to help reduce the problem space as it were.

So in short, Rockstor simply executes, initially immediately and later via systemd, the above detailed hdparm command. Also note that a drive's behaviour is not guaranteed: some drives simply ignore this setting 'request', and there also exists a 'special' program to deal with WD's unique take on how this is done, which Rockstor does not currently use. But from your report it seems that you did have some success initially. And yes, the SMART info re-check does seem to reliably wake drives up; at least I found no drive that stayed asleep at that point during development.
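As an aside, if you end up poking at S.M.A.R.T from the command line yourself while testing, smartctl can be told not to wake a sleeping drive; something like the following (placeholder device name) should just exit if the drive is in standby:

smartctl -a -n standby /dev/disk/by-id/ata-EXAMPLE-DRIVE-SERIAL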

See if not having the Disks page open, and ideally any Rockstor Web-UI page to be thorough, helps to narrow down the failure to spin down thereafter.

Also you could take a look through the Rockstor log (System - Logs Manager), or the system journal:

journalctl

and look for any issues with that systemd service.
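To narrow things down to just that unit (using the unit name given above), something like this helps:

journalctl -u rockstor-hdparm.service
systemctl status rockstor-hdparm.service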

You could also check, once it exists (i.e. once a spindown / APM setting has been applied), that the service contents are as expected by looking at that file:

cat /etc/systemd/system/rockstor-hdparm.service

I wouldn't edit the file, as it is read back in as a form of database to remember/display the applied settings per device to date, given the already-broached difficulty of retrieving this info from the drives themselves.
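For reference only, a systemd oneshot unit of this general kind looks something like the sketch below; this is illustrative, not the exact file Rockstor generates (which, as mentioned, also carries the comment lines Rockstor reads back for its display):

[Unit]
Description=Rockstor hdparm spindown / APM settings (illustrative sketch only)

[Service]
Type=oneshot
# placeholder device name and a 20 min (-S 240) timeout
ExecStart=/usr/sbin/hdparm -S 240 /dev/disk/by-id/ata-EXAMPLE-DRIVE-SERIAL

[Install]
WantedBy=multi-user.target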

Unfortunately I have not yet gotten around to writing a technical wiki entry for exactly how this feature is implemented in Rockstor, but the pull request write-up is near enough to what I intend to add as such:

ie quoting from there: “Note: This pr submission text is also intended as the basis of a wiki entry upon successful review and merge. The wiki entry can then serve as a technical manual on this subsystem.”

Hope that helps and let us know how it goes. My own test system for this feature is currently working as intended, but it does not have any WD Reds in it.

Thank you for such a comprehensive response @phillxnet!

I believe I was not looking at the disk page when this happened, but will certainly make sure when I test it again. I will also leave the spindown timer at 20 minutes to see what kind of a response I get from the drives. If that doesn’t change anything, then I could try lowering the APM for the drives. Thank you for all the pointers. I will follow them and post on results.

Things appear to be working now, with a minor hiccup. I left the spindown timers at 20 mins and that seemed to take care of the drive that was not spinning down. The rest of the settings were default. The minor hiccup happened when I stayed logged in to the web GUI. Even though I was on the Dashboard page and not the Disk page, it still would not allow the drives to spin down. I logged out and after 20 mins all drives went on standby. I will continue to test and then decide what type of pool to build.

On a separate note, does the installer normally start installation automatically? It happened to me the few times I installed Rockstor. If all drives were empty, the installer would just proceed to install to a disk of its choice. I did not even get a chance to change install location, time zone, etc. However, if any drive has any partitions on it, then it would not do auto installation and allow me to change all options. Is this the way it is supposed to work?

@ravennevar Thanks for the update.

Super. Yes the 10 mins setting was just flat out ignored by some of my test drives. But I left it in as it’s quite a nice setting for some installs if it works.

Excellent and thanks for the report. I would also suspect the dashboard as it makes many calls to the system but wasn’t sure. However I definitely noticed the Disks page ‘blocking’ spin down. Nice find and thanks for sharing your findings.

I would suggest a btrfs raid1, ie 2 copies on 2 different devices, irrespective of the number of drives. Also no compression or anything else.

This has happened to me also. I think it's a bug. I'm in the process of moving Rockstor's Linux base anyway, but it's a long, drawn-out process that isn't yet ready for a full-on announcement. Our installer, as far as I know, is just a re-branded upstream CentOS Anaconda; I haven't had much to do with that side of things myself.

I don’t know, not from Rockstor’s perspective, but it may be a bug that is triggered by our particular kickstart configuration.

Thanks again for your detailed report / feedback and I’m chuffed your drives are now spinning down properly.

Do keep in mind that it is well worth subscribing to one of the Rockstor update channels, as both offer many updates / fixes over what is available from our now rather outdated ISO. Also note that the stable channel is now far ahead of the testing channel, as we are in the (slow) process of changing our development model/priorities so that those subscribing to the stable channel (and consequently helping to sustain Rockstor's development) now get far more frequent updates. However, many users manage just fine with the last released testing channel version of 3.9.1-16.

Let us know how your config goes and do keep a note of suggestions / difficulties as you go as all suggestions / reports are welcome.

See also the official docs: http://rockstor.com/docs/

Thank you for the response. I was just able to come back to the forum. I have continued to test when I have some time and things seem to work OK, but I ran into another minor issue, which I would like to report. I tried waking up the drives from standby using 'blkid' at the command line and it worked. However, the drives never went back to standby mode. I was not logged into the GUI during this test (in fact, I had been logged out for hours). The way I got the drives to spin down again without rebooting the system was by logging into the GUI and looking at a couple of areas, including the Disks page. After this, I logged out and the disks spun down after 20 minutes.

Regarding the pools, my initial idea was to do a RAID6, using 3 of the drives for storage and 2 for parity. I do understand that there are still some issues with RAID56 and BTRFS. The only problem with using RAID1 is that my storage will be even less than with a RAID6 setup. I also have 5 drives (not pairs) but you say this is irrespective of the number of drives. How would this work under Rockstor and BTRFS? Is there an alternative to doing something similar to pooling 3 drives with, say, unionfs, and then using SnapRAID on the other 2 drives as parity? Maybe using BTRFS and snapshots? I am trying to get the most space availability along with disk failure protection. Thanks again!

@ravennevar

That's a rather curious finding; my current guess is that we have insufficient info to pin that one down. It may just be that something scanned the drives and stopped them from sleeping, or that there is a caveat in the firmware concerning the first time they wake. Currently too little info to surmise for sure. Interesting though, but do keep in mind that this is not a hard and fast setting. Each disk can choose to do whatever it fancies, depending on its settings (APM etc) and firmware. It may have been that there was, for example, a coincidental S.M.A.R.T access from smartmontools by way of a periodic check. Difficult to tell with the info so far, but keep an eye on this as all info can be useful. Although I'm not sure what else we could do, as our inter-operation with these settings is actually fairly light. Good to know what we can though, especially with commonly used disks such as yours (WD Reds).
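As another aside, when testing this sort of thing it can help to query a drive's current power state directly; hdparm's -C flag reports active/idle vs standby and, to my knowledge, does not itself spin the drive up (placeholder device name again):

hdparm -C /dev/disk/by-id/ata-EXAMPLE-DRIVE-SERIAL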

In btrfs the parity raid levels of 5 and 6 have known outstanding issues, so raid1 is really the safest bet. As to the pairs concept, btrfs raid1 works at the block level, not the drive level; this is the case for all the btrfs raid levels:
From https://btrfs.wiki.kernel.org/index.php/Glossary we have:

“Traditional RAID methods operate across multiple devices of equal size, whereas btrfs’s RAID implementation works inside block groups.”

Also worth looking at the RAID-0 to RAID-10 sections in that same reference as they all differ slightly (given the block basis) from what is normally associated with pure disk based raid concepts.

So there is no requirement to match drive sizes at all. If space is left on at least 2 drives in a pool (Rockstor speak for a btrfs volume, which can span multiple disks) then btrfs raid1 can continue to use that space. Also note that you can 'transition' live from one raid level to another; although it can take ages, it is possible, given enough free space of course. Best to just go with raid1 really until the parity raid levels improve. Although, given btrfs can have one raid level for data and another for metadata (within the same pool/volume), one current weakness of the parity raid levels can be circumvented by not using those raid levels for the metadata: i.e. raid 5 or 6 for data and raid1 for metadata. This is not something Rockstor can deal with currently, but it might be a nice addition while the btrfs parity raid levels mature. See the following forum thread and its linked linux-btrfs mailing list posts for a discussion of the current btrfs parity raid issues:

and in turn:

The btrfs parity raid code is a lot younger than its equivalent raid1 / raid10 code, which has led to some reputation issues all around. Especially given raid5/6 is a common user favourite, hence entertaining the idea of extending Rockstor's capability to deal with different raid levels for data and metadata. I personally would like this, but it has to be done in a way that doesn't complicate things, as I see one of Rockstor's strengths as its usability. Due in no small part to btrfs's extreme flexibility, which in turn presents few barriers, if some considerable challenges UI-wise.

All of that is unnecessary, and also unsupported (unrecognised) by Rockstor, given the block-level raid nature. Btrfs can already pool drives of varying sizes and present them as a single Pool (btrfs volume).
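Just to illustrate the underlying mechanics (Rockstor's Web-UI drives all of this for you; the device names and mount point here are placeholders), creating a raid1 pool from mixed-size drives and live-converting an existing pool's raid level come down to commands of roughly this shape:

# make a raid1 pool (data and metadata) from several drives of any size
mkfs.btrfs -d raid1 -m raid1 /dev/sdb /dev/sdc /dev/sdd /dev/sde /dev/sdf
# later, convert a mounted pool to another raid level in place via a balance
btrfs balance start -dconvert=raid1 -mconvert=raid1 /mnt/pool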

Hope that helps.

Thank you again @phillxnet for your excellent and quick responses! I will look into RAID1 for my pool.

What is then the main purpose/use of the snapshots in BTRFS/Rockstor?

@ravennevar In order to try and keep things as simple as possible, Rockstor only recognises a subset of btrfs's capabilities, although this is growing.

And in the case of snapshots, Rockstor only recognises snapshots created at a certain level within the filesystem (pool / volume). It also has its own concept of a 'clone', which is essentially a writable snapshot 'promoted' to a higher (mount point) level within the file-system (pool / volume). We also use snapshots internally for the replication function (currently only functional in the stable channel), where a share can be replicated from one Rockstor instance to another while only transferring the changes made since the last sender task (it's one-way).

From https://btrfs.wiki.kernel.org/index.php/SysadminGuide
we have the following excerpt from the “Snapshots” subsection:

"A snapshot is simply a subvolume that shares its data (and metadata) with some other subvolume, using Btrfs’s COW capabilities.

Once a [writable] snapshot is made, there is no difference in status between the original subvolume, and the new snapshot subvolume."

You might find that whole page interesting / useful, as Rockstor is fundamentally based around the principles of btrfs, where each Share (read 'share of a pool's space') is actually a subvolume (sub file-system) within the overall Volume (file-system).
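At the plain btrfs level (Rockstor wraps all of this in its Web-UI and scheduled tasks; the paths here are placeholders), a Share and its snapshots are just subvolumes:

# writable snapshot of a share (itself a subvolume)
btrfs subvolume snapshot /mnt/pool/myshare /mnt/pool/.snapshots/myshare-1
# read-only snapshot, as used for replication-style (send/receive) tasks
btrfs subvolume snapshot -r /mnt/pool/myshare /mnt/pool/.snapshots/myshare-2
# list all subvolumes (shares, clones and snapshots alike)
btrfs subvolume list /mnt/pool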

Best if you first play around with the various snapshot related capabilities prior to relying on them for ‘real’ data tasks.

The docs subsection Snapshots has some words on the matter.

You might also be interested in the Multi Period Snapshots Howto entry within the docs.

Hope that helps.