[SOLVED] SMART not working

Hi.

SMART service is started but all tabs on any drive are empty.
Does it require additional configuration? Is there any documentation about it?

Rockstor version: 3.8-14
Here is the config

 /dev/sda -l error \
                  -l selftest \
                  -t \      # Attributes not tracked:
                  -I 194 \  # temperature
                  -I 231 \  # also temperature
                  -I 9      # power-on hours
 /dev/sdc -a
 /dev/sdd -a

sda (the SSD) shows some parameters.
sdc and sdd show the following error when I click the Update button:

   Traceback (most recent call last):
  File "/opt/rockstor/src/rockstor/rest_framework_custom/generic_view.py", line 40, in _handle_exception
    yield
  File "/opt/rockstor/src/rockstor/storageadmin/views/disk_smart.py", line 123, in post
    return self._info(disk)
  File "/opt/rockstor/eggs/Django-1.6.11-py2.7.egg/django/db/transaction.py", line 371, in inner
    return func(*args, **kwargs)
  File "/opt/rockstor/src/rockstor/storageadmin/views/disk_smart.py", line 90, in _info
    state=l[1], etype=l[2], details=l[3]).save()
  File "/opt/rockstor/eggs/Django-1.6.11-py2.7.egg/django/db/models/base.py", line 545, in save
    force_update=force_update, update_fields=update_fields)
  File "/opt/rockstor/eggs/Django-1.6.11-py2.7.egg/django/db/models/base.py", line 573, in save_base
    updated = self._save_table(raw, cls, force_insert, force_update, using, update_fields)
  File "/opt/rockstor/eggs/Django-1.6.11-py2.7.egg/django/db/models/base.py", line 654, in _save_table
    result = self._do_insert(cls._base_manager, using, fields, update_pk, raw)
  File "/opt/rockstor/eggs/Django-1.6.11-py2.7.egg/django/db/models/base.py", line 687, in _do_insert
    using=using, raw=raw)
  File "/opt/rockstor/eggs/Django-1.6.11-py2.7.egg/django/db/models/manager.py", line 232, in _insert
    return insert_query(self.model, objs, fields, **kwargs)
  File "/opt/rockstor/eggs/Django-1.6.11-py2.7.egg/django/db/models/query.py", line 1514, in insert_query
    return query.get_compiler(using=using).execute_sql(return_id)
  File "/opt/rockstor/eggs/Django-1.6.11-py2.7.egg/django/db/models/sql/compiler.py", line 903, in execute_sql
    cursor.execute(sql, params)
  File "/opt/rockstor/eggs/Django-1.6.11-py2.7.egg/django/db/backends/util.py", line 53, in execute
    return self.cursor.execute(sql, params)
  File "/opt/rockstor/eggs/Django-1.6.11-py2.7.egg/django/db/utils.py", line 99, in __exit__
    six.reraise(dj_exc_type, dj_exc_value, traceback)
  File "/opt/rockstor/eggs/Django-1.6.11-py2.7.egg/django/db/backends/util.py", line 53, in execute
    return self.cursor.execute(sql, params)
IntegrityError: null value in column "details" violates not-null constraint
DETAIL:  Failing row contains (6, 7, 8, 14789, active or idle., Command ABoRTed, null).

@alexey As you discovered (judging by your post edit), you need to press the Refresh button to initially populate the SMART info that comes up when clicking on a drive name.

So from the looks of it your SSD (sda) is showing full SMART info and working as expected, i.e. mostly populated Identity, Attributes, etc. tabs, but all your other drives produce the error you posted. Is that correct?

At a guess I would say your other drives are behind a raid controller of some sort, which can complicate the retrieval of SMART data. If that is the case we may have to look into making this parsing more robust: it looks like the output of the commands run to retrieve the SMART data is not being parsed correctly, leaving a field empty that the database then rejects.
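To illustrate the failure mode in the traceback above (an empty parsed value hitting a not-null column), here is a minimal sketch. It is not Rockstor's code: Rockstor goes through Django to its own database, whereas this uses Python's built-in sqlite3 as a stand-in, and the table name `smart_error_log` and the function names are made up for illustration. Only the column names state, etype and details come from the traceback.

```python
import sqlite3


def insert_error_log_row(conn, state, etype, details):
    """Insert one parsed error-log entry; details is None when parsing found nothing."""
    conn.execute(
        "INSERT INTO smart_error_log (state, etype, details) VALUES (?, ?, ?)",
        (state, etype, details),
    )


def demo():
    """Return the database error message produced by a row with no details."""
    conn = sqlite3.connect(":memory:")
    # Hypothetical table: every column is declared NOT NULL, like "details" in the traceback.
    conn.execute(
        "CREATE TABLE smart_error_log ("
        "id INTEGER PRIMARY KEY, "
        "state TEXT NOT NULL, "
        "etype TEXT NOT NULL, "
        "details TEXT NOT NULL)"
    )
    try:
        # Values taken from the failing row in the traceback; details is missing.
        insert_error_log_row(conn, "active or idle.", "Command ABoRTed", None)
    except sqlite3.IntegrityError as exc:
        return str(exc)
    return None


if __name__ == "__main__":
    print(demo())
```

The point is only that the database layer refuses the row, so the parser has to supply some value for details, or the column has to allow nulls.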

Could you give more details of the hardware and how it’s all hooked up? Then we can start to narrow down what’s going wrong and whether a workaround currently exists, e.g. using custom SMART parameters on those problematic drives.

Thanks for reporting this, such reports have in the past led to improvements, although it can take quite a bit of back and forth, but we haven’t had any for a while now.

Thanks for the quick reply.

I have in my setup:

  1. SSD - its SMART is working
  2. two WD Red 2 TB drives in btrfs raid 1

I do not have any hardware raid controller.

Does btrfs raid1 require special parameters?

@alexey OK, so that’s a bit strange!

No, the raid level within a pool doesn’t affect the SMART subsystem at all, as the smartctl program pretty much talks directly to the drives themselves.

I’m surprised this is happening with the WD Red drives as they are a pretty popular drive and we definitely have other Rockstor users with them.

Can anyone else with WD Reds (preferably 2 TB) confirm these findings?

I did have something similar with Seagate drives when they were less than 10 hours old (ie single digit hours) and fixed Rockstor’s response in that case. Again a parsing error where hours of life was not correctly parsed when very low. Maybe we have something similar going on here.

My drives’ model is WDC WD20EFRX-68A

Here is the smartctl output:

[root@server owncloud-data]# smartctl /dev/sdc -a
smartctl 6.2 2013-07-26 r3841 [x86_64-linux-4.6.0-1.el7.elrepo.x86_64] (local build)
Copyright (C) 2002-13, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Model Family:     Western Digital Red (AF)
Device Model:     WDC WD20EFRX-68AX9N0
Serial Number:    WD-WMC300611808
LU WWN Device Id: 5 0014ee 6ad8ed6d7
Firmware Version: 80.00A80
User Capacity:    2,000,398,934,016 bytes [2.00 TB]
Sector Sizes:     512 bytes logical, 4096 bytes physical
Device is:        In smartctl database [for details use: -P show]
ATA Version is:   ACS-2 (minor revision not indicated)
SATA Version is:  SATA 3.0, 6.0 Gb/s (current: 3.0 Gb/s)
Local Time is:    Mon Oct  3 22:50:14 2016 KRAT
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status:  (0x00) Offline data collection activity
                                        was never started.
                                        Auto Offline Data Collection: Disabled.
Self-test execution status:      (   0) The previous self-test routine completed
                                        without error or no self-test has ever
                                        been run.
Total time to complete Offline
data collection:                (26460) seconds.
Offline data collection
capabilities:                    (0x7b) SMART execute Offline immediate.
                                        Auto Offline data collection on/off support.
                                        Suspend Offline collection upon new
                                        command.
                                        Offline surface scan supported.
                                        Self-test supported.
                                        Conveyance Self-test supported.
                                        Selective Self-test supported.
SMART capabilities:            (0x0003) Saves SMART data before entering
                                        power-saving mode.
                                        Supports SMART auto save timer.
Error logging capability:        (0x01) Error logging supported.
                                        General Purpose Logging supported.
Short self-test routine
recommended polling time:        (   2) minutes.
Extended self-test routine
recommended polling time:        ( 267) minutes.
Conveyance self-test routine
recommended polling time:        (   5) minutes.
SCT capabilities:              (0x70bd) SCT Status supported.
                                        SCT Error Recovery Control supported.
                                        SCT Feature Control supported.
                                        SCT Data Table supported.

SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x002f   200   200   051    Pre-fail  Always       -       0
  3 Spin_Up_Time            0x0027   174   169   021    Pre-fail  Always       -       4300
  4 Start_Stop_Count        0x0032   099   099   000    Old_age   Always       -       1620
  5 Reallocated_Sector_Ct   0x0033   200   200   140    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x002e   200   200   000    Old_age   Always       -       0
  9 Power_On_Hours          0x0032   057   057   000    Old_age   Always       -       31506
 10 Spin_Retry_Count        0x0032   100   100   000    Old_age   Always       -       0
 11 Calibration_Retry_Count 0x0032   100   253   000    Old_age   Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       70
192 Power-Off_Retract_Count 0x0032   200   200   000    Old_age   Always       -       12
193 Load_Cycle_Count        0x0032   200   200   000    Old_age   Always       -       1607
194 Temperature_Celsius     0x0022   108   092   000    Old_age   Always       -       39
196 Reallocated_Event_Count 0x0032   200   200   000    Old_age   Always       -       0
197 Current_Pending_Sector  0x0032   200   200   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0030   100   253   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x0032   200   200   000    Old_age   Always       -       0
200 Multi_Zone_Error_Rate   0x0008   100   253   000    Old_age   Offline      -       0

SMART Error Log Version: 1
ATA Error Count: 8 (device log contains only the most recent five errors)
        CR = Command Register [HEX]
        FR = Features Register [HEX]
        SC = Sector Count Register [HEX]
        SN = Sector Number Register [HEX]
        CL = Cylinder Low Register [HEX]
        CH = Cylinder High Register [HEX]
        DH = Device/Head Register [HEX]
        DC = Device Command Register [HEX]
        ER = Error register [HEX]
        ST = Status register [HEX]
Powered_Up_Time is measured from power on, and printed as
DDd+hh:mm:SS.sss where DD=days, hh=hours, mm=minutes,
SS=sec, and sss=millisec. It "wraps" after 49.710 days.

Error 8 occurred at disk power-on lifetime: 20282 hours (845 days + 2 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  04 51 01 00 00 00 a0  Error: ABRT

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  b0 d6 01 e0 4f c2 a0 00   8d+04:38:36.342  SMART WRITE LOG
  b0 d6 01 e0 4f c2 a0 00   8d+04:38:36.342  SMART WRITE LOG
  b0 d6 01 e0 4f c2 a0 00   8d+04:38:33.109  SMART WRITE LOG
  b0 d6 01 e0 4f c2 a0 00   8d+04:38:33.109  SMART WRITE LOG
  b0 d6 01 e0 4f c2 a0 00   8d+04:38:02.556  SMART WRITE LOG

Error 7 occurred at disk power-on lifetime: 20282 hours (845 days + 2 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  04 51 01 00 00 00 a0  Error: ABRT

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  b0 d6 01 e0 4f c2 a0 00   8d+04:38:36.342  SMART WRITE LOG
  b0 d6 01 e0 4f c2 a0 00   8d+04:38:33.109  SMART WRITE LOG
  b0 d6 01 e0 4f c2 a0 00   8d+04:38:33.109  SMART WRITE LOG
  b0 d6 01 e0 4f c2 a0 00   8d+04:38:02.556  SMART WRITE LOG
  b0 d6 01 e0 4f c2 a0 00   8d+04:38:02.556  SMART WRITE LOG

Error 6 occurred at disk power-on lifetime: 20282 hours (845 days + 2 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  04 51 01 00 00 00 a0  Error: ABRT

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  b0 d6 01 e0 4f c2 a0 00   8d+04:38:33.109  SMART WRITE LOG
  b0 d6 01 e0 4f c2 a0 00   8d+04:38:33.109  SMART WRITE LOG
  b0 d6 01 e0 4f c2 a0 00   8d+04:38:02.556  SMART WRITE LOG
  b0 d6 01 e0 4f c2 a0 00   8d+04:38:02.556  SMART WRITE LOG

Error 5 occurred at disk power-on lifetime: 20282 hours (845 days + 2 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  04 51 01 00 00 00 a0  Error: ABRT

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  b0 d6 01 e0 4f c2 a0 00   8d+04:38:33.109  SMART WRITE LOG
  b0 d6 01 e0 4f c2 a0 00   8d+04:38:02.556  SMART WRITE LOG
  b0 d6 01 e0 4f c2 a0 00   8d+04:38:02.556  SMART WRITE LOG

Error 4 occurred at disk power-on lifetime: 20282 hours (845 days + 2 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  04 51 01 00 00 00 a0  Error: ABRT

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  b0 d6 01 e0 4f c2 a0 00   8d+04:38:02.556  SMART WRITE LOG
  b0 d6 01 e0 4f c2 a0 00   8d+04:38:02.556  SMART WRITE LOG

SMART Self-test log structure revision number 1
No self-tests have been logged.  [To run self-tests, use: smartctl -t]


SMART Selective self-test log data structure revision number 1
 SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
    1        0        0  Not_testing
    2        0        0  Not_testing
    3        0        0  Not_testing
    4        0        0  Not_testing
    5        0        0  Not_testing
Selective self-test flags (0x0):
  After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.

@alexey Thanks for the additional info; I think I see what might be causing your issue. The drive you provided smartctl output for has error log entries, and I’m guessing they are formatted in a way that is throwing the Rockstor parsers. To test this, could you ssh as the root user into your Rockstor box and execute the following commands:

cd ~
mkdir /root/smartdumps
/usr/sbin/smartctl -a /dev/sdc > /root/smartdumps/smart-a.out
/usr/sbin/smartctl -c /dev/sdc > /root/smartdumps/smart-c.out
/usr/sbin/smartctl -l error /dev/sdc > /root/smartdumps/smart-l-error.out
/usr/sbin/smartctl -l selftest -l selective /dev/sdc > /root/smartdumps/smart-l-selftest-l-selective.out
/usr/sbin/smartctl --info /dev/sdc > /root/smartdumps/smart--info.out
/usr/sbin/smartctl -H --info /dev/sdc > /root/smartdumps/smart-H--info.out
/usr/bin/lsblk > /root/smartdumps/lsblk.out

tar czf /root/smart-issue-report.tar.gz /root/smartdumps/*.out

The above assumes that sdc is one of the drives with the problematic SMART reporting.

Then if you could attach the resulting /root/smart-issue-report.tar.gz file to an email addressed to support@rockstor.com, with a note explaining that you are responding to this forum post and my request (add a link to it in the email), I can hopefully have a look at it and get it sorted. Note though that this may take some time.
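For anyone following along, the guess above (error log entries that the parser does not expect) can be sketched as follows. This is an illustrative parser, not the one in Rockstor: the function name, the regular expressions and the dictionary keys are assumptions. The sample text is copied from the smartctl output earlier in this thread. On the "Error: ABRT" line nothing follows the error code, so a parser that expects trailing text (for example sector or LBA information on other error types) ends up with no details at all.

```python
import re

# One entry copied from the smartctl -a output posted earlier in this thread.
SAMPLE = """\
Error 8 occurred at disk power-on lifetime: 20282 hours (845 days + 2 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  04 51 01 00 00 00 a0  Error: ABRT
"""

HEADER_RE = re.compile(r"^Error (\d+) occurred at disk power-on lifetime: (\d+) hours")
STATE_RE = re.compile(r"the device was (.+)$")
# The error code, optionally followed by free-form details.
ERROR_RE = re.compile(r"Error: (\S+)(?:\s+(.+))?$")


def parse_error_log(text):
    """Return a list of dicts, one per 'Error N occurred ...' entry."""
    entries = []
    current = None
    for raw_line in text.splitlines():
        line = raw_line.strip()
        header = HEADER_RE.match(line)
        if header:
            current = {
                "number": int(header.group(1)),
                "hours": int(header.group(2)),
                "state": None,
                "etype": None,
                "details": None,
            }
            entries.append(current)
            continue
        if current is None:
            continue  # text before the first entry
        state = STATE_RE.search(line)
        if state:
            current["state"] = state.group(1)
            continue
        error = ERROR_RE.search(line)
        if error:
            current["etype"] = error.group(1)
            # group(2) is None when nothing follows the error code, as with ABRT here.
            current["details"] = error.group(2)
    return entries


if __name__ == "__main__":
    for entry in parse_error_log(SAMPLE):
        print(entry)
```

With the sample above, the details field comes back as None, which matches the null in the failing row of the traceback.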

I just forwarded the e-mail from @alexey to you @phillxnet. It’s very nice of you, thanks for helping out. But I want to ask you and other members here, about coming up with a better approach going forward. I’ve increased the attachment upload size on the forum to 50MB, which I think is plenty to upload debug info. Do you have any recommendations?

@suman Cheers; I’m having a quick look at @alexey’s submission now.

Just confirmed that when I try to use the forum for tar.gz type files I get the following error:

Sorry, the file you are trying to upload is not authorized (authorized extension: jpg, jpeg, png, gif).

which is one reason I have been using the published support email for Rockstor to convey the required command outputs to address these SMART parsing issues by way of assisting with support.

I agree that it would be better to transfer these files directly via the forum. However, there are some, potentially insignificant, privacy concerns with disclosing the serial numbers of Rockstor users’ devices by posting such SMART output on the open forum. Although smartctl does have a switch to obscure or remove these serial numbers from its output, for this very purpose, my concern is that this would interfere with the reproduction of such parsing errors, which I have dealt with a number of times in the past via this same mechanism.
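As a rough illustration of that trade-off, identifiers could also be masked after the fact while keeping the layout of each line unchanged. The sketch below is an assumption-laden example, not an existing Rockstor or smartctl feature: the function name and the choice of fields ("Serial Number" and "LU WWN Device Id", as they appear in the output posted above) are mine, and whether masked output still reproduces a given parsing error would have to be checked case by case.

```python
import re

# Fields treated as sensitive in this sketch; labels as seen in the smartctl output above.
SENSITIVE_RE = re.compile(r"^(Serial Number|LU WWN Device Id):(\s*)(.*)$")


def redact_identifiers(text):
    """Replace every non-space character of sensitive values with 'X'.

    Line count, field labels, spacing and value lengths are preserved so that
    the overall shape of the output stays the same.
    """
    redacted_lines = []
    for line in text.splitlines():
        match = SENSITIVE_RE.match(line)
        if match:
            masked = re.sub(r"\S", "X", match.group(3))
            line = f"{match.group(1)}:{match.group(2)}{masked}"
        redacted_lines.append(line)
    return "\n".join(redacted_lines)


if __name__ == "__main__":
    sample = (
        "Device Model:     WDC WD20EFRX-68AX9N0\n"
        "Serial Number:    WD-WMC300611808\n"
        "LU WWN Device Id: 5 0014ee 6ad8ed6d7"
    )
    print(redact_identifiers(sample))
```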

I propose that we allow the tar.gz file extension / type within the forum software, and to address the serial number issues re smartctl’s output interested Rockstor developers can request that such files are only transferred via the private message function within the forum; thus minimising the exposure of this potentially sensitive information while maintaining parsing error reproduction validity.

Does this sound like a viable way to proceed?

@alexey Thanks for sending the output as requested. I’m pretty sure I have reproduced the same issue you have reported and so can hopefully soon take a look at sorting it. I have opened an issue under which any related commits should appear:

@alexey

I’ve just submitted some code changes that should sort your reported issue here re SMART data with your particular drives. Hopefully this should pass review and consequently end up in a future Rockstor update.

If you fancy ‘test driving’ the fix yourself and ‘living dangerously’ you could manually apply the following change to the following file:

/opt/rockstor/src/rockstor/system/smart.py

at around line 223 for testing channel updates you need to replace the word:

None

at the end of that line with the following (including the single inverted commas):

'No Sector Details Available'

as indicated in the following commit link:

No worries if you don’t fancy making these changes manually. If you do, remember that manual changes can easily lead to system breakage, so please evaluate in this light. After making the change you will have to execute the following as root:

systemctl restart rockstor

for the changes to take effect.
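The idea behind the one-word change described above, a placeholder string where the parser previously produced None, can be sketched like this. Only the placeholder text is taken from this post; the constant and function names are hypothetical and this is not the actual content of smart.py.

```python
# Placeholder text taken from the change described in this post.
NO_DETAILS_PLACEHOLDER = "No Sector Details Available"


def details_or_placeholder(details):
    """Return the parsed details, or a placeholder when none were found.

    A non-empty string satisfies a not-null database column, whereas None does not.
    """
    if details is None or not details.strip():
        return NO_DETAILS_PLACEHOLDER
    return details.strip()


if __name__ == "__main__":
    print(details_or_placeholder(None))
    print(details_or_placeholder("  some parsed details  "))
```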

So just a heads up that a fix is now in the queue. Please don’t worry about applying the above changes manually if you are not at least a little familiar with the Linux command line, or are not happy to risk breaking your system, although taking a backup copy of the file in question should help reduce the risk here.

Hint: the easy to use nano editor is installed by default

Hope that helps.

@alexey Quick notification that the issue opened as a result of your report has now been fixed and merged as of Rockstor testing channel updates version 3.8.15-3.
Thanks for your assistance with this issue.