Hello! So far I love this software after trying many others; if I can get this one issue resolved it will be perfect!
My setup: VMware ESXi 6 running on a Dell R720. My disks are SATA drives in a NetApp disk shelf, connected to an LSI controller that is just passing the disks through to Rockstor version 3.9.1-0.
My issue: I know these drives support standby, as they worked in FreeNAS, but the hourglass icon is greyed out and I am unable to enter the APM settings.
I found this post:
It is 4 years old, and I tried a few of the suggestions, but they did not work, or I did not fully understand them, or maybe the age is the problem. I'm not very familiar with Linux at all, so if anyone has a suggestion or a place to start, please explain it for a Linux noob. Thanks in advance!
Glad you're liking things so far. Not sure I can resolve your issue, but I can hopefully chip in with more info.
FreeNAS is not Linux based but FreeBSD (NanoBSD) based, so the comparison is unfortunately not relevant.
I'm assuming you have the following:
Now to how Rockstor decides to grey out the APM (hourglass) icon.
This drive power down / APM setting feature was added in the following pull request:
which in turn details how this feature was implemented on the Linux side of things.
Essentially we use the hdparm utility; quoting from the PR:
"The hdparm switches used are -C to set idle spin down time and -B to read and set APM level.
N.B. It is not possible to read a drives current setting for -C (idle spin down) which complicates matters, the meaning for these values can also vary between drives. The settings used are drawn from man hdparm and are apparently more reliable for newer drives.
Hdparm settings are maintained over a reboot but not over a power cycle. To address this a new stand alone rockstor systemd service is introduced (rockstor-hdparm.service). It does not depend on any other rockstor systemd service and no other rockstor service depends on it. The service is used simply to execute the tested hdparm commands that are otherwise executed on demand via the WebUI. That is if no error message and no non zero return code is received from a proposed (via user config entry) hdparm command then that same command is placed in this systemd service unit to be applied on the next boot to address the power cycle loss of these settings otherwise."
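To make that a little more concrete, here is a sketch of the kind of hdparm commands involved; the device name and values below are placeholders only, and the Web-UI constructs and tests the real ones for you:

hdparm -B 127 /dev/disk/by-id/ata-EXAMPLE-DRIVE   # APM level 127 is the highest level that still permits spin-down
hdparm -S 240 /dev/disk/by-id/ata-EXAMPLE-DRIVE   # idle spin-down after 240 * 5 seconds = 20 minutes

If such a command returns no error and a zero exit code then, as the quote above explains, the same command is recorded in the rockstor-hdparm.service unit so it can be re-applied on the next boot, covering the power cycle case.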
Now I'm going to work through the code hierarchy so that, if we find a bug, anyone with the time/capability to address it can have both this sketch and the PR as a reference on how this all works.
The decision as to whether the related hdparm options are offered (greyed-out hourglass or not) depends on the following code.
First, the front-end (Web-UI) HTML template code:
So we see that if poweStateNullorUnknown is True we show the icons but don't give them a link; this is the greyed-out state. Otherwise they are orange, which is the link colour, and link to the settings page.
And poweStateNullorUnknown is a Handlebars helper convenience function that is defined here:
The "untested" comment was due to me, at the time, accommodating major concurrent changes in these areas of the code that were to be merged just prior to my pull request. They ended up working as intended; we just need to update those comments.
So the this.power_state fed into the Handlebars helper from the HTML template comes from each disk object in turn. It is defined as a property within the Django Disk model itself, here:
Which in turn references get_disk_power_status(), which comes from the import:
located in our system.osi module:
Where we are approaching the "Linux" level:
So it looks like, if the entire thing is greyed out, it is because get_disk_power_status() returned "unknown" for your drive:
This procedure, as per the PR description, uses the hdparm command, so we then check whether hdparm really does report what we think it reports; i.e. is the output of that specific hdparm command, for your drive, actually "unknown"?
From the comments and the rest of the code:
we see that the specific command run, skipping the intricacies of the device name retrieval, is:
hdparm -C -q /dev/disk/by-id/whatever-it-is
By way of example, if I execute that on a real device here that is "supported" we have:
hdparm -C -q /dev/disk/by-id/ata-ST3000VN000-1HJ166_W6A0J98V
drive state is: active/idle
echo $?
0
which has no error output and gives us the 4 fields the last bit of code expects. We then strip out the last column ([3]) and return it; hence the display for that drive is "active/idle".
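As a rough shell analogue of that last-column extraction (the real code does this on the Python side; this is just for illustration):

hdparm -C -q /dev/disk/by-id/ata-ST3000VN000-1HJ166_W6A0J98V | awk '{print $4}'   # prints: active/idle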
If we now execute that same command on, say, a virtio device that doesn't support power down (which is where I got the first "unknown" image in this post from), we have:
hdparm -C -q /dev/disk/by-id/virtio-13579
drive state is: unknown
echo $?
25
So here we have no error line but a non-zero return code (which is currently ignored anyway); hdparm nonetheless tells us the state is "unknown", so we treat it as such and grey out the table cell contents via the above mechanism.
My strong suspicion is that, as you are "passing" drives through via ESXi to the Linux that is Rockstor, ESXi is failing to pass this info in a usable way. Rockstor expects to be in direct control of the hardware, and some "passthrough" variants are in part "fake" and so don't appear as "real", or are in some way masked. The RAID controller reference you cited is an example of this in hardware rather than in a hypervisor, and so "special" attention needs to be taken in that case, i.e. sometimes specific to the particular controller/driver.
So, to see what your hdparm outputs for the specific drive, first get your drive names from:
ls -la /dev/disk/by-id/
and select one that is showing this greyed-out "Power Status" column; Rockstor's Web-UI uses these names. It would also be helpful to execute the echo command directly after the hdparm command, as we can then see what the return code was.
It may be that to get this function you will have to pass the controller itself through to Rockstor, or run Rockstor on real hardware. Let's see what the output is from your version of this command:
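That is, something along these lines, substituting one of the device names you got from the listing above:

hdparm -C -q /dev/disk/by-id/<your-drive-id>
echo $?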
So, in short, we now have the code path that "greys out" this "Power Status" column, but you have 2 potential confounding factors: you may be using an LSI RAID controller (you state only that it is LSI), and you are running Rockstor within a hypervisor that may mask some elements of what is available to the Linux that Rockstor uses (CentOS in your case). Both are big blockers, but let's just see what that command outputs.
If the issue is "just" the LSI controller, assuming it's a RAID controller, then there is another way to set drive spin-down that is not yet supported by Rockstor; but we do have the following issue open that details this other way:
Quoting from that issue for convenience:
"Currently -d and -T options are supported; however in some instances it may be useful to support the '--set' parameter, such as to provide a workaround for those drives where the hdparm command is unable to set spin-down times due to certain LSI controllers not implementing this function of hdparm."
From there you will see that the alternative way to configure a drive's standby time is via smartctl. That issue details how this might be done via an extension of our existing Disk Custom S.M.A.R.T Options, but this is only a sketch so far and not yet proven as a workable proposition.
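As an unproven sketch of what that alternative looks like at the command line (the device name is a placeholder, and it is worth checking that your smartctl version supports the --set option):

smartctl --set=standby,240 /dev/disk/by-id/<your-drive-id>   # ask the drive to spin down after 240 * 5 seconds = 20 minutes idle
echo $?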
But again, you state you are passing just the drives through. I think this means that, if it is the LSI controller not allowing hdparm to report or set spindown, then you will have to do this in the modified "linux" that is the ESXi level, as that is the only thing with direct access to this controller.
Hope that helps, at least to clarify the levels involved here: hypervisor / Rockstor as guest / drive / controller visibility at each of these levels, and ultimately how the greyed-out Web-UI element is decided upon.
My suspicion is that you will have to visit your hypervisor to do these spin-down settings, as it has direct access to the controller, which it in turn masks off from Rockstor, passing through just the disks themselves, on which hdparm may then not be able to set up spindown. Let's see your command outputs to add a little info. And don't worry if any of the above references don't mean much to you; I have simply sketched this stuff out so that others can more easily chip in with potential patches and suggestions, as the info on what happens on the Rockstor side is then there for them.
Wow, huge response, thanks! I'll try some of this when I get some free time today. Also, just to clarify: the LSI controller is handed straight to the Rockstor OS exclusively; VMware is not in the way at all as far as that goes. The LSI controller is an LSI SAS9207-8E.
Yes, so that simplifies things a little then. I'd look to that last issue reference, as given this info you may just be able, for now at least, to set your disk timeouts via the command line using the smartctl command.
Keep us posted on this, as sufficient interest helps to prioritise feature extensions such as the one detailed in that last issue reference.
The output of the suggested commands will still be useful, however, as then we can see what to expect in setups such as yours and hopefully advise users accordingly to use the alternative method; once it exists within the Web-UI, of course.
I have not gone further with this, as I want to make sure it is worth investing my time. I discovered that RAID 5/6 is not recommended, and that is the main thing I'm looking for. I'm getting a general consensus on another thread before I go further with the software.
Yes, it is a shame that the parity raid levels of 5 and 6 in btrfs are less mature. But did you know that btrfs raid1 stores only 2 copies (currently) and so will only halve your available space? It's done on a block basis, so if you have 3 or more drives it will still only store 2 copies, but on 2 different physical devices. It also generally performs better than the parity raid levels.
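For what it's worth, and purely as an illustration of the raid1 profile rather than anything you would need to do by hand (Rockstor's Web-UI manages pool raid levels for you; device names below are placeholders):

mkfs.btrfs -d raid1 -m raid1 /dev/sdb /dev/sdc /dev/sdd   # new 3-drive pool, 2 copies of every block on 2 different devices
btrfs balance start -dconvert=raid1 -mconvert=raid1 /mnt/pool   # or convert an existing mounted pool's data and metadata profiles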