HDD Spindown Problem

Hello. I am new to Rockstor, having just installed it a few hours ago. I am so far impressed with its interface and capabilities. However, I am encountering an issue where one drive will not spin down. I have five 8TB WD Red drives (4 through the motherboard SATA ports and one through an Asmedia SATA card). My system drive is a 120GB SSD, also on the Asmedia SATA card. I have not done anything beyond updating the system and selecting the spindown time (10 mins) for all the WD drives. The default APM of 164 (as well as the rest of the settings in that window) for the WD drives was left untouched. The drive that is not spinning down is on one of the motherboard’s SATA ports. If I manually click “Pause” in the GUI, all drives will go on standby. I can make them all come out of standby by accessing SMART information on the interface. After this, they will spin down again, except for this drive. Any ideas on how to proceed to get all drives to spin down properly? Thank you!

Update: None of the drives are now spinning down after waking them up by accessing SMART information. If I reboot the system, they also do not spin down after the specified 10 minutes. I will do a fresh install and report back.

Edit: Fresh install completed. I will be awaiting assistance on how to proceed. Thank you!

@ravennevar Welcome to the Rockstor community, sorry for the delayed response.

From initially reading your first post, the failure-to-spin-down problem seemed to be isolated to a single drive, which in turn shared a controller with another drive that did spin down as expected. Assuming this is correct, my first guess would be a different firmware version on that drive, or a slightly different model.

Upon reading your:

My initial thought here was that you may have been watching the drives via the Disks page. During the development of this feature it took me quite a while to realise that, with some drives, just having the Disks page open would cause them to never spin down. Very frustrating, that one. Anyway, it's worth a try, as if that is the cause then we have a red herring in this second post's report. Just a thought.

Another point of interest is that some drives just won’t spin down when set to 10 mins. I found that 20 mins was a little more reliable. Rockstor’s process for setting this ‘feature’ is essentially just the following command:

hdparm -S <spindown-value> /dev/disk/by-id/<dev-name>
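For example, a 20 minute timeout uses a value of 240, since hdparm -S values from 1 to 240 are multiples of 5 seconds (240 × 5 s = 20 mins), while 241 to 251 select multiples of 30 minutes. With a placeholder device name that looks like:

hdparm -S 240 /dev/disk/by-id/ata-EXAMPLE-DRIVE-SERIAL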

See the following code:

And as this setting is not persistent over a reboot, and in some cases can be undone by the BIOS, a systemd service is established. See:

Also note that there are no database entries or other 'state' remembered or managed by Rockstor other than that file. So if you delete it and refresh your Disks page then no setting is known and all spindown and APM values are returned to default. However, each device may or may not have a memory of its last setting, and it's actually quite tricky to read that setting back, hence the comment 'record' within that file informing Rockstor's display. So in short, to help diagnose an issue here you can always delete the:

/etc/systemd/system/rockstor-hdparm.service

file and power cycle your machine, and you should be back to scratch / default.
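As root that boils down to something like the following (the daemon-reload is optional given the reboot that follows):

rm /etc/systemd/system/rockstor-hdparm.service
systemctl daemon-reload
reboot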

Just some background so as to help reduce the problem space as it were.

So in short, Rockstor simply executes, initially immediately and later via systemd, the above detailed hdparm command. Also note that a drive's behaviour is not guaranteed: some drives simply ignore this setting 'request', and there also exists a 'special' program to deal with WD's unique take on how this is done, which Rockstor does not currently use. But from your report it seems that you did have some success initially. And yes, the SMART info re-check does seem to reliably wake drives up; at least I found no drive that stayed asleep at that point during development.
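As an aside, if you end up poking at S.M.A.R.T from the command line yourself while testing, smartctl can be told not to wake a sleeping drive; something like the following (placeholder device name) should just exit if the drive is in standby:

smartctl -a -n standby /dev/disk/by-id/ata-EXAMPLE-DRIVE-SERIAL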

See if not having the Disks page open, and ideally any Rockstor Web-UI page to be thorough, helps to narrow down the failure to spin down thereafter.

Also you could take a look through the Rockstor log (System - Logs Manager), or the system journal:

journalctl

and look for any issues with that systemd service.
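To narrow things down to just that unit (using the unit name given above), something like this helps:

journalctl -u rockstor-hdparm.service
systemctl status rockstor-hdparm.service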

You could also check, once it exists (i.e. once a spindown / APM setting has been applied), that the service contents are as expected by looking at that file:

cat /etc/systemd/system/rockstor-hdparm.service

I wouldn't edit the file, as it is read back in as a form of database to remember/display the applied settings per device to date, given the already-broached difficulty of retrieving this info from the drives themselves.
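For reference only, a systemd oneshot unit of this general kind looks something like the sketch below; this is illustrative, not the exact file Rockstor generates (which, as mentioned, also carries the comment lines Rockstor reads back for its display):

[Unit]
Description=Rockstor hdparm spindown / APM settings (illustrative sketch only)

[Service]
Type=oneshot
# placeholder device name and a 20 min (-S 240) timeout
ExecStart=/usr/sbin/hdparm -S 240 /dev/disk/by-id/ata-EXAMPLE-DRIVE-SERIAL

[Install]
WantedBy=multi-user.target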

Unfortunately I have not yet gotten around to writing a technical wiki entry for exactly how this feature is implemented in Rockstor, but the pull request write-up is near enough to what I intend to add as such:

ie quoting from there: “Note: This pr submission text is also intended as the basis of a wiki entry upon successful review and merge. The wiki entry can then serve as a technical manual on this subsystem.”

Hope that helps and let us know how it goes. My own test system for this feature is currently working as intended, but it does not have any WD Reds in it.

Thank you for such a comprehensive response @phillxnet!

I believe I was not looking at the disk page when this happened, but will certainly make sure when I test it again. I will also leave the spindown timer at 20 minutes to see what kind of a response I get from the drives. If that doesn’t change anything, then I could try lowering the APM for the drives. Thank you for all the pointers. I will follow them and post on results.

Things appear to be working now, with a minor hiccup. I left the spindown timers at 20 mins and that seemed to take care of the drive that was not spinning down. The rest of the settings were default. The minor hiccup happened when I stayed logged in to the web GUI. Even though I was on the Dashboard page and not the Disk page, it still would not allow the drives to spin down. I logged out and after 20 mins all drives went on standby. I will continue to test and then decide what type of pool to build.

On a separate note, does the installer normally start installation automatically? It happened to me the few times I installed Rockstor. If all drives were empty, the installer would just proceed to install to a disk of its choice. I did not even get a chance to change install location, time zone, etc. However, if any drive has any partitions on it, then it would not do auto installation and allow me to change all options. Is this the way it is supposed to work?

@ravennevar Thanks for the update.

Super. Yes the 10 mins setting was just flat out ignored by some of my test drives. But I left it in as it’s quite a nice setting for some installs if it works.

Excellent and thanks for the report. I would also suspect the dashboard as it makes many calls to the system but wasn’t sure. However I definitely noticed the Disks page ‘blocking’ spin down. Nice find and thanks for sharing your findings.

I would suggest a btrfs raid1, ie 2 copies on 2 different devices, irrespective of the number of drives. Also no compression or anything else.

This has happened to me also. I think it's a bug. I'm in the process of moving Rockstor's Linux base anyway, but it's a long, drawn-out process that isn't yet ready for a full-on announcement. Our installer, as far as I know, is just a re-branded upstream CentOS Anaconda; I haven't had much to do with that side of things myself.

I don’t know, not from Rockstor’s perspective, but it may be a bug that is triggered by our particular kickstart configuration.

Thanks again for your detailed report / feedback and I’m chuffed your drives are now spinning down properly.

Do keep in mind that it is well worth subscribing to one of the Rockstor update channels, as both offer many updates / fixes over what is available from our now rather outdated ISO. Also note that the stable channel is now far ahead of the testing channel, as we are in the (slow) process of changing our development model/priorities so that those subscribing to the stable channel (and consequently helping to sustain Rockstor's development) now get far more frequent updates. However, many users manage just fine with the last released testing channel version of 3.9.1-16.

Let us know how your config goes and do keep a note of suggestions / difficulties as you go as all suggestions / reports are welcome.

See also the official docs: http://rockstor.com/docs/

Thank you for the response. I was just able to come back to the forum. I have continued to test when I have some time and things seem to work OK, but I ran into another minor issue, which I would like to report. I tried waking up the drives from standby using 'blkid' at the command line and it worked. However, the drives never went back to standby mode. I was not logged into the GUI during this test (in fact, I had been logged out for hours). The way I got the drives to spin down again without rebooting the system was by logging into the GUI and looking at a couple of areas, including the Disks page. After this, I logged out and the disks spun down after 20 minutes.

Regarding the pools, my initial idea was to do a RAID6, using 3 of the drives for storage and 2 for parity. I do understand that there are still some issues with RAID56 and BTRFS. The only problem with using RAID1 is that my storage will be even less than with a RAID6 setup. I also have 5 drives (not pairs) but you say this is irrespective of the number of drives. How would this work under Rockstor and BTRFS? Is there an alternative to doing something similar to pooling 3 drives with, say, unionfs, and then using SnapRAID on the other 2 drives as parity? Maybe using BTRFS and snapshots? I am trying to get the most space availability along with disk failure protection. Thanks again!

@ravennevar

That's a rather curious finding; my current guess is that we have insufficient info to pin that one down. It may just be that something scanned the drives and stopped them from sleeping, or that there is a caveat in the firmware concerning the first time they wake. Currently too little info to surmise for sure. Interesting though, but do keep in mind that this is not a hard and fast setting. Each disk can choose to do whatever it fancies, depending on its settings (APM etc) and firmware. It may have been that there was, for example, a coincidental S.M.A.R.T access from smartmontools by way of a periodic check. Difficult to tell with the info so far, but keep an eye on this as all info can be useful. Although I'm not sure what else we could do, as our inter-operation with these settings is actually fairly light. Good to know what we can though, especially with commonly used disks such as yours (WD Reds).
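As another aside, when testing this sort of thing it can help to query a drive's current power state directly; hdparm's -C flag reports active/idle vs standby and, to my knowledge, does not itself spin the drive up (placeholder device name again):

hdparm -C /dev/disk/by-id/ata-EXAMPLE-DRIVE-SERIAL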

In btrfs the parity raid levels of 5 and 6 have known outstanding issues, so raid1 is really the safest bet. As to the pairs concept, btrfs raid1 works at the block level, not the drive level; this is the case for all the btrfs raid levels:
From https://btrfs.wiki.kernel.org/index.php/Glossary we have:

“Traditional RAID methods operate across multiple devices of equal size, whereas btrfs’s RAID implementation works inside block groups.”

Also worth looking at the RAID-0 to RAID-10 sections in that same reference as they all differ slightly (given the block basis) from what is normally associated with pure disk based raid concepts.

So there is no requirement to match drive sizes at all. If space is left on at least 2 drives in a pool (Rockstor speak for a btrfs volume, which can span multiple disks) then btrfs raid1 can continue to use that space. Also note that you can 'transition' live from one raid level to another; although it can take ages, it is possible, given enough free space of course. Best to just go with raid1 really until the parity raid levels improve. Although, given btrfs can have one raid level for data and another for metadata (within the same pool/volume), one current weakness of the parity raid levels can be circumvented by not using those raid levels for the metadata: i.e. raid 5 or 6 for data and raid1 for metadata. This is not something Rockstor can deal with currently, but it might be a nice addition while the btrfs parity raid levels mature. See the following forum thread and its linked linux-btrfs mailing list posts for a discussion of the current btrfs parity raid issues:

and in turn:

The btrfs parity raid code is a lot younger than its equivalent raid1 / raid10 code, which has led to some reputation issues all around. Especially given raid5/6 is a common user favourite, hence entertaining the idea of extending Rockstor's capability to deal with different raid levels for data and metadata. I personally would like this, but it has to be done in a way that doesn't complicate things, as I see one of Rockstor's strengths as its usability. Due in no small part to btrfs's extreme flexibility, which in turn presents few barriers, if some considerable challenges UI-wise.

All of that is unnecessary, and also unsupported (unrecognised) by Rockstor, given the block-level raid nature. Btrfs can already pool drives of varying sizes and present them as a single Pool (btrfs volume).
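Just to illustrate the underlying mechanics (Rockstor's Web-UI drives all of this for you; the device names and mount point here are placeholders), creating a raid1 pool from mixed-size drives and live-converting an existing pool's raid level come down to commands of roughly this shape:

# make a raid1 pool (data and metadata) from several drives of any size
mkfs.btrfs -d raid1 -m raid1 /dev/sdb /dev/sdc /dev/sdd /dev/sde /dev/sdf
# later, convert a mounted pool to another raid level in place via a balance
btrfs balance start -dconvert=raid1 -mconvert=raid1 /mnt/pool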

Hope that helps.

Thank you again @phillxnet for your excellent and quick responses! I will look into RAID1 for my pool.

What is then the main purpose/use of the snapshots in BTRFS/Rockstor?

@ravennevar In order to try and keep things as simple as possible, Rockstor only recognises a subset of btrfs's capabilities, although this is growing.

And in the case of snapshots, Rockstor only recognises snapshots created at a certain level within the filesystem (pool / volume). It also has its own concept of a 'clone', which is essentially a writable snapshot 'promoted' to a higher (mount point) level within the file-system (pool / volume). We also use snapshots internally for the replication function (currently only functional in the stable channel), where a share can be replicated from one Rockstor instance to another while only transferring the changes made since the last sender task (it's one-way).

From https://btrfs.wiki.kernel.org/index.php/SysadminGuide
we have the following excerpt from the “Snapshots” subsection:

"A snapshot is simply a subvolume that shares its data (and metadata) with some other subvolume, using Btrfs’s COW capabilities.

Once a [writable] snapshot is made, there is no difference in status between the original subvolume, and the new snapshot subvolume."

You might find that whole page interesting / useful, as Rockstor is fundamentally based around the principles of btrfs, where each Share (read 'share of a pool's space') is actually a subvolume (sub file-system) within the overall Volume (file-system).
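At the plain btrfs level (Rockstor wraps all of this in its Web-UI and scheduled tasks; the paths here are placeholders), a Share and its snapshots are just subvolumes:

# writable snapshot of a share (itself a subvolume)
btrfs subvolume snapshot /mnt/pool/myshare /mnt/pool/.snapshots/myshare-1
# read-only snapshot, as used for replication-style (send/receive) tasks
btrfs subvolume snapshot -r /mnt/pool/myshare /mnt/pool/.snapshots/myshare-2
# list all subvolumes (shares, clones and snapshots alike)
btrfs subvolume list /mnt/pool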

Best if you first play around with the various snapshot related capabilities prior to relying on them for ‘real’ data tasks.

The docs subsection Snapshots has some words on the matter.

You might also be interested in the Multi Period Snapshots Howto entry within the docs.

Hope that helps.