CHECK POWER STATUS spins up disk

shocker · August 8, 2018, 7:45pm

Hello,
I started receiving these messages today:
Aug 8 19:18:10 storage smartd[6280]: Device: /dev/sdf [SAT], CHECK POWER STATUS spins up disk (0x81 -> 0xff)
Aug 8 19:48:15 storage smartd[6280]: Device: /dev/sdg [SAT], CHECK POWER STATUS spins up disk (0x81 -> 0xff)
Aug 8 20:18:10 storage smartd[6280]: Device: /dev/sdo [SAT], CHECK POWER STATUS spins up disk (0x81 -> 0xff)
Aug 8 22:18:10 storage smartd[6280]: Device: /dev/sdf [SAT], CHECK POWER STATUS spins up disk (0x81 -> 0xff)

/dev/sdf:
APT: No idVendor found -> not USB bridge device
outgoing cdb: 85 06 20 00 00 00 00 00 00 00 00 00 00 40 e5 00
SG_IO: ATA_16 status=0x2, host_status=0x0, driver_status=0x8
SG_IO: sb[]: 72 01 00 1d 00 00 00 0e 09 0c 00 00 00 ff 00 00 00 00 00 00 00 50 00 00 00 00 00 00 00 00 00 00
SG_IO: desc[]: 09 0c 00 00 00 ff 00 00 00 00 00 00 00 50
ATA_16 stat=50 err=00 nsect=ff lbal=00 lbam=00 lbah=00 dev=00
drive state is: active/idle

any idea what can cause this?

phillxnet · August 15, 2018, 6:31pm

@shocker Hello again.

The Rockstor-specific code for drive power status all uses hdparm, and given that your messages are from smartd I don’t think it’s a Rockstor-specific issue. I.e. I’d look to smartmontools and its settings for the cause.

Strange that it’s only recently been happening. First thought on this is that the upstream smartmontools package has been updated and now includes this warning. I chose the hdparm settings / drive power probe as it was stated to ‘mostly’ not wake drives. Check your:

/etc/smartmontools/smartd.conf

Also, Rockstor can edit this file when the user asks it to. The relevant code is:

https://github.com/rockstor/rockstor-core/blob/master/src/rockstor/system/smart.py#L342-L360

              a = False
              e = False
              for i in o:
                  # N.B. .* in pattern match to allow for multiple spaces
                  if re.match("SMART support is:.* Available", i) is not None:
                      a = True
                  if re.match("SMART support is:.* Enabled", i) is not None:
                      e = True
              return a, e
          
          
          def toggle_smart(device, custom_options="", enable=False):
              switch = "on" if (enable) else "off"
              # enable SMART support of the device
              return run_command(
                  [SMART, "--smart=%s" % switch] + get_dev_options(device, custom_options)
              )

So if you have used this facility you should also see the Rockstor header within that file.

Hope that helps and let us know how you get on.

shocker · August 16, 2018, 4:57am

Thank you @phillxnet for your feedback.
I have checked my smartd.conf file and this is the only line that is uncommented: DEVICESCAN -H -m root -M exec /usr/libexec/smartmontools/smartdnotify -n standby,10,q
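
For anyone reading along, the `-n standby,10,q` part of that line can be unpacked as follows. This is only a minimal sketch based on my reading of the smartd.conf man page (`-n POWERMODE[,N][,q]`); the helper name `parse_n_directive` and the `PowerModeDirective` class are hypothetical and not part of smartmontools or Rockstor.

```python
from dataclasses import dataclass

# Power modes accepted by the -n directive, as I read smartd.conf(5).
VALID_MODES = {"never", "sleep", "standby", "idle"}


@dataclass
class PowerModeDirective:
    mode: str       # checks are skipped when the disk is in this mode (or lower)
    max_skips: int  # max consecutive skipped checks; 0 here means "no limit given"
    quiet: bool     # True when ',q' asks smartd not to log skipped checks


def parse_n_directive(value: str) -> PowerModeDirective:
    """Split the argument of '-n', e.g. 'standby,10,q' (hypothetical helper)."""
    parts = value.split(",")
    mode = parts[0]
    if mode not in VALID_MODES:
        raise ValueError("unknown power mode: %r" % mode)
    max_skips, quiet = 0, False
    for part in parts[1:]:
        if part == "q":
            quiet = True
        elif part.isdigit():
            max_skips = int(part)
        else:
            raise ValueError("unexpected option: %r" % part)
    return PowerModeDirective(mode, max_skips, quiet)


print(parse_n_directive("standby,10,q"))
```

So with the default config, smartd should leave a disk alone while it is in standby, for at most 10 checks in a row, without logging each skipped check.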

To be honest, I have had this server for two years and /var/log/messages was error-free. I only noticed this now because my NFS server was hanging due to this behaviour.

On another server, where the Rockstor server is mounted via NFS, I get this:
[4174881.921475] nfs: server 10.10.2.2 not responding, still trying
[4174881.921508] nfs: server 10.10.2.2 not responding, timed out
[4174886.433518] nfs: server 10.10.2.2 not responding, still trying
[4174966.123268] nfs: server 10.10.2.2 not responding, timed out
[4174987.179735] nfs: server 10.10.2.2 not responding, timed out
[4175002.220039] nfs: server 10.10.2.2 not responding, timed out
[4175002.220052] nfs: server 10.10.2.2 not responding, still trying
[4175282.562638] nfs: server 10.10.2.2 OK
[4175282.562666] nfs: server 10.10.2.2 OK
[4175282.562730] nfs: server 10.10.2.2 OK

On the Rockstor management page all my HDDs show: active/idle, no spin down option.

phillxnet · August 16, 2018, 10:16am

@shocker

So that’s the default contents of that file. I.e. you haven’t applied any custom config.

At a guess, if these are related, it could be that the drives are powering down and only powering up again upon NFS access, and the resulting wait shows up as the NFS server not responding. This might tie in with the smartd message indicating that its check on power status will spin up the drive. It may be that an update in smartmontools has changed the behaviour so that this message is now produced rather than actually spinning up the drive. Just a guess really.

Your best bet is to research this smartd / smartmontools message. From a quick look, it seems related to the ‘-n’ element of that default config.

Let us know how this goes.

The relevant code in the upstream smartd project has some comments on what this message means:

https://github.com/jcsp/smartmontools/blob/master/smartd.cpp#L3015-L3029

          // user may have requested (with the -n Directive) to leave the disk
          // alone if it is in idle or sleeping mode.  In this case check the
          // power mode and exit without check if needed
          if (cfg.powermode && !state.powermodefail) {
            int dontcheck=0, powermode=ataCheckPowerMode(atadev);
            const char * mode = 0;
            if (0 <= powermode && powermode < 0xff) {
              // wait for possible spin up and check again
              int powermode2;
              sleep(5);
              powermode2 = ataCheckPowerMode(atadev);
              if (powermode2 > powermode)
                PrintOut(LOG_INFO, "Device: %s, CHECK POWER STATUS spins up disk (0x%02x -> 0x%02x)\n", name, powermode, powermode2);
              powermode = powermode2;
            }
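
To make the (0x81 -> 0xff) part of the log message easier to read, here is a small sketch of the same comparison. The mode names are my understanding of the ATA CHECK POWER MODE result codes, and `spin_up_message` is a hypothetical helper that only mirrors the quoted check; it is not smartd code.

```python
from typing import Optional

# CHECK POWER MODE result codes, as I understand the ATA command set.
POWER_MODES = {
    0x00: "STANDBY",
    0x01: "STANDBY_Y",
    0x80: "IDLE",
    0x81: "IDLE_A",
    0x82: "IDLE_B",
    0x83: "IDLE_C",
    0xFF: "ACTIVE or IDLE",
}


def spin_up_message(name: str, first: int, second: int) -> Optional[str]:
    """Return the log text when the second reading is higher than the first."""
    # Same conditions as the quoted snippet: the first reading is a valid,
    # non-active mode and the reading taken 5 seconds later is higher.
    if 0 <= first < 0xFF and second > first:
        return (
            "Device: %s, CHECK POWER STATUS spins up disk (0x%02x -> 0x%02x)"
            % (name, first, second)
        )
    return None


print(spin_up_message("/dev/sdf [SAT]", 0x81, 0xFF))
print(POWER_MODES[0x81], "->", POWER_MODES[0xFF])
```

Read this way, the reported messages would mean the drive was in a low-power idle state (IDLE_A) at the first probe and fully active at the second, i.e. the probe itself appears to have woken it.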

There have been no recent changes in Rockstor code (assuming you have been updating regularly) that account for your observed changes so it does rather point to an upstream update.

It might help others on the forum if you give a description and details of the hardware you are seeing this issue with.

Could you indicate where you got the /dev/sdf output in your first post from?

It looks like the output from, for example:

hdparm --verbose -C /dev/sdf

eg: https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=833147

We do use the ‘-C’ hdparm switch in the following code:

https://github.com/rockstor/rockstor-core/blob/master/src/rockstor/system/osi.py#L1365-L1382

          def system_reboot(delta="now"):
              # New delta param default to now used to pass a 2 min delay
              # for scheduled tasks reboot/shutdown
              try:
                  cmd = [SHUTDOWN, "-r", delta]
                  o, e, rc = run_command(cmd)
              except CommandException as e:
                  # Catch / log harmless -15 return code - command executes as expected.
                  if e.rc == -15:
                      logger.info("Ignoring rc=-15 from command ({}).".format(cmd))
                      return e.out, e.err, e.rc
                  # otherwise we raise an exception as normal.
                  raise e
              return o, e, rc
          
          
          def system_suspend():
              # This function perform system suspend to RAM via systemctl

Hope that helps

shocker · August 16, 2018, 12:20pm

Yes, the output was from hdparm:
~]# hdparm --verbose -C /dev/sdf

/dev/sdf:
APT: No idVendor found -> not USB bridge device
outgoing cdb:  85 06 20 00 00 00 00 00 00 00 00 00 00 40 e5 00
SG_IO: ATA_16 status=0x2, host_status=0x0, driver_status=0x8
SG_IO: sb[]:  72 01 00 1d 00 00 00 0e 09 0c 00 00 00 ff 00 00 00 00 00 00 00 50 00 00 00 00 00 00 00 00 00 00
SG_IO: desc[]:  09 0c 00 00 00 ff 00 00 00 00 00 00 00 50
      ATA_16 stat=50 err=00 nsect=ff lbal=00 lbam=00 lbah=00 dev=00
 drive state is:  active/idle
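
In case it helps with scripting a check across all the drives, the two interesting fields in that output can be pulled out like this. This is a rough sketch only: the function names are hypothetical, and it assumes the output format shown above (the `nsect` value being the raw power mode byte that hdparm then translates into the "drive state" text).

```python
import re
from typing import Optional


def parse_drive_state(hdparm_output: str) -> Optional[str]:
    """Return the text after 'drive state is:', e.g. 'active/idle'."""
    match = re.search(r"drive state is:\s*(\S+)", hdparm_output)
    return match.group(1) if match else None


def parse_nsect(hdparm_output: str) -> Optional[int]:
    """Return the nsect register from the verbose ATA_16 line as an int."""
    match = re.search(r"\bnsect=([0-9a-fA-F]{2})\b", hdparm_output)
    return int(match.group(1), 16) if match else None


# Sample taken from the last two lines of the output above.
sample = (
    "      ATA_16 stat=50 err=00 nsect=ff lbal=00 lbam=00 lbah=00 dev=00\n"
    " drive state is:  active/idle\n"
)
print(parse_drive_state(sample), hex(parse_nsect(sample)))
```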

I have now changed the APM level on all drives from 255 (off) to 254 to see if there is any improvement, as I don’t need my disks to idle.
Some virtual machines use mounted folders as virtual disks, which should keep my NAS busy all the time, so I have no idea why the drives are going idle.

For the last 4 hours I haven’t seen any further messages with APM 254, but I’ll keep an eye on /var/log/messages.
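
For reference, the APM change can be expressed with hdparm’s -B switch. The sketch below only builds the command rather than running it; `apm_command` and `permits_spin_down` are hypothetical helpers, and the value ranges are as described in the hdparm man page (1 to 127 permit spin-down, 128 to 254 do not, 255 disables APM).

```python
from typing import List


def apm_command(device: str, level: int) -> List[str]:
    """Build an 'hdparm -B LEVEL DEVICE' command line (not executed here)."""
    if not 1 <= level <= 255:
        raise ValueError("APM level must be between 1 and 255")
    return ["hdparm", "-B", str(level), device]


def permits_spin_down(level: int) -> bool:
    """Per the hdparm man page, only levels 1..127 allow the drive to spin down."""
    return 1 <= level <= 127


for dev in ("/dev/sdf", "/dev/sdg", "/dev/sdo"):
    print(" ".join(apm_command(dev, 254)))
print(permits_spin_down(254))
```

The list returned by `apm_command` could be passed to `subprocess.run` as root if you want to apply it from a script.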

Thanks!