Question about Rockstor install, with issues I'm seeing

I have been experimenting with Rockstor and plan to use it as my primary NAS sometime in the near future.

Right now I have it running on a slightly different machine than the one I’m planning for the build.

An AMD Phenom X4 940 in a Gigabyte GA-MA78GM with 4 GB of memory. This is a temporary setup, as the board has a defective SATA port that prevents it from booting when anything is connected to it. Otherwise it’s fully functional.

On this I have made a USB install, which seems to run reasonably well, although the ports are only USB2. It’s not fast by any means, but it is functional.

Hooked up I have three 500 GB discs and one 200 GB disc, running RAID5. It all works: I can transfer files to and from the NAS, and speeds are good, much better than on my current NASes.

Now for some of my concerns.

I can’t read SMART attributes for 3 out of the four discs. I get an error window like this:

One of the drives shows SMART values OK.

SMART works on all 4 discs when mounted in my main PC.

SMART readouts are IMHO vital for checking whether any discs are going bad, so their not working is not promising.

Another issue I’m seeing is the dashboards not working. They work right after boot, then get slow to update, and in the end only the one showing memory status continues to update. This cannot be normal behaviour?

Initially I only had the three 500 GB discs in the system. So I tried adding the 200 GB disc. I was expecting to have to add it to the pool, but Rockstor had added it by itself, without me doing anything. This cannot be wanted behaviour?
I recall having a pool on this drive with the same name as the new pool; perhaps that’s why it was added to the RAID5 array?
I didn’t see any ill effects, but wondered what happened to the data in the old pool on that disc. It seems to have vanished. I did a balance and a scrub, and everything seems to work fine.

Well this became a long post, I hope you have some answers, especially regarding SMART not working.

Anybody?

I’m a little worried about the SMART problems, and so is Rockstor: when I log in via SSH, it claims the 3 drives will fail.

I suspect this is a SW issue rather than a HW issue, as all the discs show SMART values properly when I add them to my primary (Win7) machine.

None of them reports critical errors, although the 200 GB one has a few bad blocks (as it has had since it was 6 months old).

Now, this is only a test setup I’m running, so nothing is crucial yet, but it does seem weird that only one drive would show the SMART values, when all 4 of them appear to be OK.

Are there any settings I could change to try to make it work?

@KarstenV Thanks for reporting this and welcome to the Rockstor community.
We have an outstanding issue where the reporter had 1 of their drives causing Rockstor to throw SMART info retrieval errors; however, their traceback looked more like regular smart output than what you have posted, so this may not be related. I have linked to this forum thread anyway so we might pool the information. I suspect it’s to do with the parsing of the information, where certain drives’ output is not being parsed correctly by Rockstor’s smart parser and so errors are thrown.

Could you tell us the models of the drives that are causing your errors? The original reporter’s problem drive was a Samsung SpinPoint F2 EG.

Thanks for helping out with this, it may take a while to get round to this as it only seems to affect a very few devices but in your case it’s obviously pretty significant.

The drives giving the errors are:

Western Digital Blue WD5000AAKS 500GB
Western Digital Caviar GP WD5000AACS 500GB
Maxtor DiamondMax 10 6B200M0 200GB

The drive that works is actually also a “Western Digital Blue WD5000AAKS 500GB”, although it’s an older disc with a lower serial number.

I wonder why the one WD Blue works, while the other doesn’t.

These drives are all old, and will not be used in my final build, but I have been using them recently in a Netgear Readynas RND104, and in that setup SMART monitoring worked.
This is not a “serious” issue, just something that should work so that any error reporting is trustworthy.

When I look at the error messages given, I can actually read the SMART values in the error report, so the drives are reporting the data, but it seems Rockstor doesn’t interpret them properly.
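To illustrate that the attribute table in that output is readable by a simple parser, here is a minimal sketch in shell. It is not Rockstor's parser; `smart_attributes` is a hypothetical helper name, and it assumes the table starts at the `ID# ATTRIBUTE_NAME` header line and ends at the first blank line, as in the `smartctl -a` output posted later in this thread.

```shell
#!/bin/sh
# Hypothetical helper (not part of Rockstor or smartmontools):
# print "ID NAME RAW_VALUE" for every row of the SMART attribute table.
# Reads saved smartctl -a output from the given files, or from stdin.
smart_attributes() {
    awk '
        # The table starts right after its header line.
        /^ID#[ \t]+ATTRIBUTE_NAME/ { in_table = 1; next }
        # A blank line ends the table.
        in_table && NF == 0        { in_table = 0 }
        # Attribute rows have at least ten whitespace-separated fields;
        # only the first token of RAW_VALUE (field 10) is printed.
        in_table && NF >= 10       { print $1, $2, $10 }
    ' "$@"
}

# Usage sketch (needs root and a real device, so it is left commented out):
# /usr/sbin/smartctl -a /dev/sda | smart_attributes
```

Everything outside the attribute table, including the error log section, is skipped by this sketch, so it says nothing about how the error log should be handled.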

@KarstenV That’s great, thanks, and yes, very good points; it’s just a matter of narrowing down the problem.
So that’s good news in a way as we now have 4 examples.
Is the firmware version of the two WD5000AAKS 500GB drives different?

So pretty much the same as the previously linked report then.

What we need is the difference between the smart output from the two WD5000AAKS 500GB drives, or better, a copy of each output so we have presumably near-identical pass and fail examples.
Does the error message give you a chance to download a zipped copy of your logs?
Rockstor runs the following SMART commands for information purposes, where sdx is actually sda, sdb, sdc etc.:

/usr/sbin/smartctl -H --info /dev/sdx
/usr/sbin/smartctl -a /dev/sdx
/usr/sbin/smartctl -c /dev/sdx (@suman, this one and all following have no throw=False is that right?)
/usr/sbin/smartctl -l error /dev/sdx
/usr/sbin/smartctl -l selftest -l selective /dev/sdx

Can you tell from your logs which one is causing the errors? Rockstor logs are in /opt/rockstor/var/log
and I think rockstor.log is the one. Each exception should also log the command that threw it.
Thanks.
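One thing that may be worth ruling out, as an assumption rather than anything confirmed in this thread: smartctl reports its result as a bitmask exit status, and according to its man page bit 6 is set when the device error log contains records of errors. A caller that treats any non-zero status as a failure could therefore raise an exception even though the output itself is complete. The sketch below decodes such a status; `decode_smartctl_status` is a hypothetical helper name, and the bit descriptions are paraphrased from the EXIT STATUS section of the man page.

```shell
#!/bin/sh
# Hypothetical helper: decode a smartctl exit status (a bitmask)
# into one line per set bit.
decode_smartctl_status() {
    status=$1
    if [ "$status" -eq 0 ]; then
        echo "0: no bits set"
        return 0
    fi
    bit=0
    # The list order matches bit 0 through bit 7.
    for meaning in \
        "command line did not parse" \
        "device open failed" \
        "a SMART or ATA command failed, or checksum error" \
        "SMART status: DISK FAILING" \
        "prefail attributes at or below threshold" \
        "attributes at or below threshold at some time in the past" \
        "device error log contains records of errors" \
        "self-test log contains records of errors"
    do
        if [ $(( (status >> bit) & 1 )) -eq 1 ]; then
            echo "bit $bit: $meaning"
        fi
        bit=$((bit + 1))
    done
}

# Usage sketch (needs root and a real device, so it is left commented out):
# /usr/sbin/smartctl -l error /dev/sdx > /dev/null
# decode_smartctl_status $?
```

Running the usage lines for each of the five commands above, on a failing and a passing drive, would show whether the exit status differs between them.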

I started typing my response, and phil’s last reply is pretty much what I wanted to say. So, I’ll keep this brief. Thanks @phillxnet

Can you provide the requested information? If you can send the gzipped /opt/rockstor/var/log, that could also be helpful. I am working on improving our lab here so we can add better support going forward. Thanks for your patience.

@phillxnet Just the first two do. Besides accounting for more variety of drives, that code can definitely use another iteration. @mchakravartula started adding test cases and helping to improve, so we are definitely making progress.

@suman I have dug up 3 x WD5000AAKS 500GB here (firmware B2 and A0) but they have all failed very badly so pretty sure they don’t even power up / get recognized any more (not a drive model I will use again) but I’m hoping once I’ve put some more time into hardware at this end I can try them for smart response at least.

My knowledge of Unix commands is not that great.

But I have run the commands you mentioned for the 2 WD Blue drives, the one that fails (sda) and the one that passes (sdc), and put them into a text file so that you can compare the outputs.

To me the outputs are more or less identical, except for the outputs from “smartctl -a /dev/sdx” and “smartctl -l error /dev/sdx” which are quite different.
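A comparison like this can also be done mechanically. The following is a minimal sketch, assuming each command's output has been saved to a file per drive; `strip_volatile` and `smart_diff` are hypothetical helper names, and the file names in the usage line are placeholders. It drops the header lines that always differ between two drives (serial number, WWN, local time) and prints a unified diff of the rest.

```shell
#!/bin/sh
# Hypothetical helpers for comparing two saved smartctl outputs.

# Remove header lines that always differ between two drives.
strip_volatile() {
    sed -e '/^Serial Number:/d' \
        -e '/^LU WWN Device Id:/d' \
        -e '/^Local Time is:/d' "$1"
}

# Print a unified diff of two saved outputs (failing drive first).
# Returns the exit status of diff: 0 if identical, 1 if different.
smart_diff() {
    tmp_fail=$(mktemp) || return 2
    tmp_pass=$(mktemp) || return 2
    strip_volatile "$1" > "$tmp_fail"
    strip_volatile "$2" > "$tmp_pass"
    rc=0
    diff -u "$tmp_fail" "$tmp_pass" || rc=$?
    rm -f "$tmp_fail" "$tmp_pass"
    return "$rc"
}

# Usage sketch (placeholder file names):
# smart_diff sda-l-error.txt sdc-l-error.txt
```

Lines prefixed with `-` come from the first (failing) file and lines prefixed with `+` from the second (passing) file.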

Although there are some errors logged, the drive itself seems to think everything is fine.

I hope you can see through all the information and see what the error is.

The outputs are here (long post):

[root@RockStor Test NAS log]# /usr/sbin/smartctl -H --info /dev/sda
smartctl 6.2 2013-07-26 r3841 [x86_64-linux-4.1.0-1.el7.elrepo.x86_64] (local build)
Copyright © 2002-13, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Model Family: Western Digital Caviar Blue (SATA)
Device Model: WDC WD5000AAKS-00TMA0
Serial Number: WD-WCAPW3955769
LU WWN Device Id: 5 0014ee 2ab0f3bc6
Firmware Version: 12.01C01
User Capacity: 500,107,862,016 bytes [500 GB]
Sector Size: 512 bytes logical/physical
Device is: In smartctl database [for details use: -P show]
ATA Version is: ATA/ATAPI-7 (minor revision not indicated)
Local Time is: Sun Jul 12 21:30:40 2015 CEST
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

[root@RockStor Test NAS log]# /usr/sbin/smartctl -H --info /dev/sdc
smartctl 6.2 2013-07-26 r3841 [x86_64-linux-4.1.0-1.el7.elrepo.x86_64] (local build)
Copyright © 2002-13, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Model Family: Western Digital Caviar Blue (SATA)
Device Model: WDC WD5000AAKS-00TMA0
Serial Number: WD-WCAPW3345293
LU WWN Device Id: 5 0014ee 1003fa9c9
Firmware Version: 12.01C01
User Capacity: 500,107,862,016 bytes [500 GB]
Sector Size: 512 bytes logical/physical
Device is: In smartctl database [for details use: -P show]
ATA Version is: ATA/ATAPI-7 (minor revision not indicated)
Local Time is: Sun Jul 12 21:31:40 2015 CEST
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

[root@RockStor Test NAS log]# /usr/sbin/smartctl -a /dev/sda
smartctl 6.2 2013-07-26 r3841 [x86_64-linux-4.1.0-1.el7.elrepo.x86_64] (local build)
Copyright © 2002-13, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Model Family: Western Digital Caviar Blue (SATA)
Device Model: WDC WD5000AAKS-00TMA0
Serial Number: WD-WCAPW3955769
LU WWN Device Id: 5 0014ee 2ab0f3bc6
Firmware Version: 12.01C01
User Capacity: 500,107,862,016 bytes [500 GB]
Sector Size: 512 bytes logical/physical
Device is: In smartctl database [for details use: -P show]
ATA Version is: ATA/ATAPI-7 (minor revision not indicated)
Local Time is: Sun Jul 12 21:33:18 2015 CEST
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status: (0x84) Offline data collection activity
was suspended by an interrupting command from host.
Auto Offline Data Collection: Enabled.
Self-test execution status: ( 0) The previous self-test routine completed
without error or no self-test has ever
been run.
Total time to complete Offline
data collection: (12600) seconds.
Offline data collection
capabilities: (0x7b) SMART execute Offline immediate.
Auto Offline data collection on/off support.
Suspend Offline collection upon new
command.
Offline surface scan supported.
Self-test supported.
Conveyance Self-test supported.
Selective Self-test supported.
SMART capabilities: (0x0003) Saves SMART data before entering
power-saving mode.
Supports SMART auto save timer.
Error logging capability: (0x01) Error logging supported.
General Purpose Logging supported.
Short self-test routine
recommended polling time: ( 2) minutes.
Extended self-test routine
recommended polling time: ( 157) minutes.
Conveyance self-test routine
recommended polling time: ( 6) minutes.
SCT capabilities: (0x303f) SCT Status supported.
SCT Error Recovery Control supported.
SCT Feature Control supported.
SCT Data Table supported.

SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
1 Raw_Read_Error_Rate 0x000f 200 200 051 Pre-fail Always - 222
3 Spin_Up_Time 0x0003 175 170 021 Pre-fail Always - 6250
4 Start_Stop_Count 0x0032 096 096 000 Old_age Always - 4599
5 Reallocated_Sector_Ct 0x0033 200 200 140 Pre-fail Always - 0
7 Seek_Error_Rate 0x000e 200 200 051 Old_age Always - 0
9 Power_On_Hours 0x0032 086 086 000 Old_age Always - 10658
10 Spin_Retry_Count 0x0012 100 100 051 Old_age Always - 0
11 Calibration_Retry_Count 0x0012 100 100 051 Old_age Always - 0
12 Power_Cycle_Count 0x0032 096 096 000 Old_age Always - 4347
192 Power-Off_Retract_Count 0x0032 200 200 000 Old_age Always - 372
193 Load_Cycle_Count 0x0032 199 199 000 Old_age Always - 4611
194 Temperature_Celsius 0x0022 104 077 000 Old_age Always - 46
196 Reallocated_Event_Count 0x0032 200 200 000 Old_age Always - 0
197 Current_Pending_Sector 0x0012 200 200 000 Old_age Always - 0
198 Offline_Uncorrectable 0x0010 200 200 000 Old_age Offline - 0
199 UDMA_CRC_Error_Count 0x003e 200 200 000 Old_age Always - 1
200 Multi_Zone_Error_Rate 0x0008 200 200 051 Old_age Offline - 0

SMART Error Log Version: 1
ATA Error Count: 337 (device log contains only the most recent five errors)
CR = Command Register [HEX]
FR = Features Register [HEX]
SC = Sector Count Register [HEX]
SN = Sector Number Register [HEX]
CL = Cylinder Low Register [HEX]
CH = Cylinder High Register [HEX]
DH = Device/Head Register [HEX]
DC = Device Command Register [HEX]
ER = Error register [HEX]
ST = Status register [HEX]
Powered_Up_Time is measured from power on, and printed as
DDd+hh:mm:SS.sss where DD=days, hh=hours, mm=minutes,
SS=sec, and sss=millisec. It “wraps” after 49.710 days.

Error 337 occurred at disk power-on lifetime: 8819 hours (367 days + 11 hours)
When the command that caused the error occurred, the device was active or idle.

After command completion occurred, registers were:
ER ST SC SN CL CH DH


40 51 00 47 68 18 40 Error: UNC at LBA = 0x00186847 = 1599559

Commands leading to the command that caused the error were:
CR FR SC SN CL CH DH DC Powered_Up_Time Command/Feature_Name


60 08 f0 40 70 18 0f 08 00:46:40.923 READ FPDMA QUEUED
60 08 e8 48 70 18 0f 08 00:46:40.923 READ FPDMA QUEUED
60 08 e0 c0 6f 18 0f 08 00:46:40.923 READ FPDMA QUEUED
60 08 d8 c8 6f 18 0f 08 00:46:40.923 READ FPDMA QUEUED
60 08 d0 d0 6f 18 0f 08 00:46:40.923 READ FPDMA QUEUED

Error 336 occurred at disk power-on lifetime: 8819 hours (367 days + 11 hours)
When the command that caused the error occurred, the device was active or idle.

After command completion occurred, registers were:
ER ST SC SN CL CH DH


40 51 00 47 68 18 40 Error: UNC at LBA = 0x00186847 = 1599559

Commands leading to the command that caused the error were:
CR FR SC SN CL CH DH DC Powered_Up_Time Command/Feature_Name


60 08 f0 d8 70 18 0f 08 00:46:38.653 READ FPDMA QUEUED
60 08 e8 d0 70 18 0f 08 00:46:38.652 READ FPDMA QUEUED
60 08 e0 c8 70 18 0f 08 00:46:38.652 READ FPDMA QUEUED
60 08 d8 c0 70 18 0f 08 00:46:38.652 READ FPDMA QUEUED
60 08 d0 b8 70 18 0f 08 00:46:38.652 READ FPDMA QUEUED

Error 335 occurred at disk power-on lifetime: 8819 hours (367 days + 11 hours)
When the command that caused the error occurred, the device was active or idle.

After command completion occurred, registers were:
ER ST SC SN CL CH DH


40 51 00 47 68 18 40 Error: UNC at LBA = 0x00186847 = 1599559

Commands leading to the command that caused the error were:
CR FR SC SN CL CH DH DC Powered_Up_Time Command/Feature_Name


60 08 f0 40 70 18 0f 08 00:46:36.536 READ FPDMA QUEUED
60 08 e8 48 70 18 0f 08 00:46:36.536 READ FPDMA QUEUED
60 08 e0 c0 6f 18 0f 08 00:46:36.536 READ FPDMA QUEUED
60 08 d8 c8 6f 18 0f 08 00:46:36.536 READ FPDMA QUEUED
60 08 d0 d0 6f 18 0f 08 00:46:36.519 READ FPDMA QUEUED

Error 334 occurred at disk power-on lifetime: 8819 hours (367 days + 11 hours)
When the command that caused the error occurred, the device was active or idle.

After command completion occurred, registers were:
ER ST SC SN CL CH DH


40 51 00 47 68 18 40 Error: UNC at LBA = 0x00186847 = 1599559

Commands leading to the command that caused the error were:
CR FR SC SN CL CH DH DC Powered_Up_Time Command/Feature_Name


60 08 f0 d8 70 18 0f 08 00:46:34.249 READ FPDMA QUEUED
60 08 e8 d0 70 18 0f 08 00:46:34.249 READ FPDMA QUEUED
60 08 e0 c8 70 18 0f 08 00:46:34.249 READ FPDMA QUEUED
60 08 d8 c0 70 18 0f 08 00:46:34.248 READ FPDMA QUEUED
60 08 d0 b8 70 18 0f 08 00:46:34.248 READ FPDMA QUEUED

Error 333 occurred at disk power-on lifetime: 8819 hours (367 days + 11 hours)
When the command that caused the error occurred, the device was active or idle.

After command completion occurred, registers were:
ER ST SC SN CL CH DH


40 51 00 47 68 18 40 Error: UNC at LBA = 0x00186847 = 1599559

Commands leading to the command that caused the error were:
CR FR SC SN CL CH DH DC Powered_Up_Time Command/Feature_Name


60 08 f0 40 70 18 0f 08 00:46:31.982 READ FPDMA QUEUED
60 08 e8 48 70 18 0f 08 00:46:31.982 READ FPDMA QUEUED
60 08 e0 c0 6f 18 0f 08 00:46:31.982 READ FPDMA QUEUED
60 08 d8 c8 6f 18 0f 08 00:46:31.982 READ FPDMA QUEUED
60 08 d0 d0 6f 18 0f 08 00:46:31.982 READ FPDMA QUEUED

SMART Self-test log structure revision number 1
Num Test_Description Status Remaining LifeTime(hours) LBA_of_first_error

1 Conveyance offline Interrupted (host reset) 90% 8816 -

SMART Selective self-test log data structure revision number 1
SPAN MIN_LBA MAX_LBA CURRENT_TEST_STATUS
1 0 0 Not_testing
2 0 0 Not_testing
3 0 0 Not_testing
4 0 0 Not_testing
5 0 0 Not_testing
Selective self-test flags (0x0):
After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.

[root@RockStor Test NAS log]# /usr/sbin/smartctl -a /dev/sdc
smartctl 6.2 2013-07-26 r3841 [x86_64-linux-4.1.0-1.el7.elrepo.x86_64] (local build)
Copyright © 2002-13, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Model Family: Western Digital Caviar Blue (SATA)
Device Model: WDC WD5000AAKS-00TMA0
Serial Number: WD-WCAPW3345293
LU WWN Device Id: 5 0014ee 1003fa9c9
Firmware Version: 12.01C01
User Capacity: 500,107,862,016 bytes [500 GB]
Sector Size: 512 bytes logical/physical
Device is: In smartctl database [for details use: -P show]
ATA Version is: ATA/ATAPI-7 (minor revision not indicated)
Local Time is: Sun Jul 12 21:33:52 2015 CEST
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status: (0x84) Offline data collection activity
was suspended by an interrupting command from host.
Auto Offline Data Collection: Enabled.
Self-test execution status: ( 0) The previous self-test routine completed
without error or no self-test has ever
been run.
Total time to complete Offline
data collection: (12600) seconds.
Offline data collection
capabilities: (0x7b) SMART execute Offline immediate.
Auto Offline data collection on/off support.
Suspend Offline collection upon new
command.
Offline surface scan supported.
Self-test supported.
Conveyance Self-test supported.
Selective Self-test supported.
SMART capabilities: (0x0003) Saves SMART data before entering
power-saving mode.
Supports SMART auto save timer.
Error logging capability: (0x01) Error logging supported.
General Purpose Logging supported.
Short self-test routine
recommended polling time: ( 2) minutes.
Extended self-test routine
recommended polling time: ( 157) minutes.
Conveyance self-test routine
recommended polling time: ( 6) minutes.
SCT capabilities: (0x303f) SCT Status supported.
SCT Error Recovery Control supported.
SCT Feature Control supported.
SCT Data Table supported.

SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
1 Raw_Read_Error_Rate 0x000f 200 200 051 Pre-fail Always - 0
3 Spin_Up_Time 0x0003 170 168 021 Pre-fail Always - 6466
4 Start_Stop_Count 0x0032 083 083 000 Old_age Always - 17659
5 Reallocated_Sector_Ct 0x0033 200 200 140 Pre-fail Always - 0
7 Seek_Error_Rate 0x000e 200 200 051 Old_age Always - 0
9 Power_On_Hours 0x0032 085 085 000 Old_age Always - 11482
10 Spin_Retry_Count 0x0012 100 100 051 Old_age Always - 0
11 Calibration_Retry_Count 0x0012 100 100 051 Old_age Always - 0
12 Power_Cycle_Count 0x0032 096 096 000 Old_age Always - 4356
192 Power-Off_Retract_Count 0x0032 200 200 000 Old_age Always - 306
193 Load_Cycle_Count 0x0032 195 195 000 Old_age Always - 17659
194 Temperature_Celsius 0x0022 104 082 000 Old_age Always - 46
196 Reallocated_Event_Count 0x0032 200 200 000 Old_age Always - 0
197 Current_Pending_Sector 0x0012 200 200 000 Old_age Always - 0
198 Offline_Uncorrectable 0x0010 200 200 000 Old_age Offline - 0
199 UDMA_CRC_Error_Count 0x003e 200 200 000 Old_age Always - 0
200 Multi_Zone_Error_Rate 0x0008 200 200 051 Old_age Offline - 0

SMART Error Log Version: 1
No Errors Logged

SMART Self-test log structure revision number 1
No self-tests have been logged. [To run self-tests, use: smartctl -t]

SMART Selective self-test log data structure revision number 1
SPAN MIN_LBA MAX_LBA CURRENT_TEST_STATUS
1 0 0 Not_testing
2 0 0 Not_testing
3 0 0 Not_testing
4 0 0 Not_testing
5 0 0 Not_testing
Selective self-test flags (0x0):
After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.

[root@RockStor Test NAS log]# /usr/sbin/smartctl -c /dev/sda
smartctl 6.2 2013-07-26 r3841 [x86_64-linux-4.1.0-1.el7.elrepo.x86_64] (local build)
Copyright © 2002-13, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF READ SMART DATA SECTION ===
General SMART Values:
Offline data collection status: (0x84) Offline data collection activity
was suspended by an interrupting command from host.
Auto Offline Data Collection: Enabled.
Self-test execution status: ( 0) The previous self-test routine completed
without error or no self-test has ever
been run.
Total time to complete Offline
data collection: (12600) seconds.
Offline data collection
capabilities: (0x7b) SMART execute Offline immediate.
Auto Offline data collection on/off support.
Suspend Offline collection upon new
command.
Offline surface scan supported.
Self-test supported.
Conveyance Self-test supported.
Selective Self-test supported.
SMART capabilities: (0x0003) Saves SMART data before entering
power-saving mode.
Supports SMART auto save timer.
Error logging capability: (0x01) Error logging supported.
General Purpose Logging supported.
Short self-test routine
recommended polling time: ( 2) minutes.
Extended self-test routine
recommended polling time: ( 157) minutes.
Conveyance self-test routine
recommended polling time: ( 6) minutes.
SCT capabilities: (0x303f) SCT Status supported.
SCT Error Recovery Control supported.
SCT Feature Control supported.
SCT Data Table supported.

[root@RockStor Test NAS log]# /usr/sbin/smartctl -c /dev/sdc
smartctl 6.2 2013-07-26 r3841 [x86_64-linux-4.1.0-1.el7.elrepo.x86_64] (local build)
Copyright © 2002-13, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF READ SMART DATA SECTION ===
General SMART Values:
Offline data collection status: (0x84) Offline data collection activity
was suspended by an interrupting command from host.
Auto Offline Data Collection: Enabled.
Self-test execution status: ( 0) The previous self-test routine completed
without error or no self-test has ever
been run.
Total time to complete Offline
data collection: (12600) seconds.
Offline data collection
capabilities: (0x7b) SMART execute Offline immediate.
Auto Offline data collection on/off support.
Suspend Offline collection upon new
command.
Offline surface scan supported.
Self-test supported.
Conveyance Self-test supported.
Selective Self-test supported.
SMART capabilities: (0x0003) Saves SMART data before entering
power-saving mode.
Supports SMART auto save timer.
Error logging capability: (0x01) Error logging supported.
General Purpose Logging supported.
Short self-test routine
recommended polling time: ( 2) minutes.
Extended self-test routine
recommended polling time: ( 157) minutes.
Conveyance self-test routine
recommended polling time: ( 6) minutes.
SCT capabilities: (0x303f) SCT Status supported.
SCT Error Recovery Control supported.
SCT Feature Control supported.
SCT Data Table supported.

[root@RockStor Test NAS log]# /usr/sbin/smartctl -l error /dev/sda
smartctl 6.2 2013-07-26 r3841 [x86_64-linux-4.1.0-1.el7.elrepo.x86_64] (local build)
Copyright © 2002-13, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF READ SMART DATA SECTION ===
SMART Error Log Version: 1
ATA Error Count: 337 (device log contains only the most recent five errors)
CR = Command Register [HEX]
FR = Features Register [HEX]
SC = Sector Count Register [HEX]
SN = Sector Number Register [HEX]
CL = Cylinder Low Register [HEX]
CH = Cylinder High Register [HEX]
DH = Device/Head Register [HEX]
DC = Device Command Register [HEX]
ER = Error register [HEX]
ST = Status register [HEX]
Powered_Up_Time is measured from power on, and printed as
DDd+hh:mm:SS.sss where DD=days, hh=hours, mm=minutes,
SS=sec, and sss=millisec. It “wraps” after 49.710 days.

Error 337 occurred at disk power-on lifetime: 8819 hours (367 days + 11 hours)
When the command that caused the error occurred, the device was active or idle.

After command completion occurred, registers were:
ER ST SC SN CL CH DH


40 51 00 47 68 18 40 Error: UNC at LBA = 0x00186847 = 1599559

Commands leading to the command that caused the error were:
CR FR SC SN CL CH DH DC Powered_Up_Time Command/Feature_Name


60 08 f0 40 70 18 0f 08 00:46:40.923 READ FPDMA QUEUED
60 08 e8 48 70 18 0f 08 00:46:40.923 READ FPDMA QUEUED
60 08 e0 c0 6f 18 0f 08 00:46:40.923 READ FPDMA QUEUED
60 08 d8 c8 6f 18 0f 08 00:46:40.923 READ FPDMA QUEUED
60 08 d0 d0 6f 18 0f 08 00:46:40.923 READ FPDMA QUEUED

Error 336 occurred at disk power-on lifetime: 8819 hours (367 days + 11 hours)
When the command that caused the error occurred, the device was active or idle.

After command completion occurred, registers were:
ER ST SC SN CL CH DH


40 51 00 47 68 18 40 Error: UNC at LBA = 0x00186847 = 1599559

Commands leading to the command that caused the error were:
CR FR SC SN CL CH DH DC Powered_Up_Time Command/Feature_Name


60 08 f0 d8 70 18 0f 08 00:46:38.653 READ FPDMA QUEUED
60 08 e8 d0 70 18 0f 08 00:46:38.652 READ FPDMA QUEUED
60 08 e0 c8 70 18 0f 08 00:46:38.652 READ FPDMA QUEUED
60 08 d8 c0 70 18 0f 08 00:46:38.652 READ FPDMA QUEUED
60 08 d0 b8 70 18 0f 08 00:46:38.652 READ FPDMA QUEUED

Error 335 occurred at disk power-on lifetime: 8819 hours (367 days + 11 hours)
When the command that caused the error occurred, the device was active or idle.

After command completion occurred, registers were:
ER ST SC SN CL CH DH


40 51 00 47 68 18 40 Error: UNC at LBA = 0x00186847 = 1599559

Commands leading to the command that caused the error were:
CR FR SC SN CL CH DH DC Powered_Up_Time Command/Feature_Name


60 08 f0 40 70 18 0f 08 00:46:36.536 READ FPDMA QUEUED
60 08 e8 48 70 18 0f 08 00:46:36.536 READ FPDMA QUEUED
60 08 e0 c0 6f 18 0f 08 00:46:36.536 READ FPDMA QUEUED
60 08 d8 c8 6f 18 0f 08 00:46:36.536 READ FPDMA QUEUED
60 08 d0 d0 6f 18 0f 08 00:46:36.519 READ FPDMA QUEUED

Error 334 occurred at disk power-on lifetime: 8819 hours (367 days + 11 hours)
When the command that caused the error occurred, the device was active or idle.

After command completion occurred, registers were:
ER ST SC SN CL CH DH


40 51 00 47 68 18 40 Error: UNC at LBA = 0x00186847 = 1599559

Commands leading to the command that caused the error were:
CR FR SC SN CL CH DH DC Powered_Up_Time Command/Feature_Name


60 08 f0 d8 70 18 0f 08 00:46:34.249 READ FPDMA QUEUED
60 08 e8 d0 70 18 0f 08 00:46:34.249 READ FPDMA QUEUED
60 08 e0 c8 70 18 0f 08 00:46:34.249 READ FPDMA QUEUED
60 08 d8 c0 70 18 0f 08 00:46:34.248 READ FPDMA QUEUED
60 08 d0 b8 70 18 0f 08 00:46:34.248 READ FPDMA QUEUED

Error 333 occurred at disk power-on lifetime: 8819 hours (367 days + 11 hours)
When the command that caused the error occurred, the device was active or idle.

After command completion occurred, registers were:
ER ST SC SN CL CH DH


40 51 00 47 68 18 40 Error: UNC at LBA = 0x00186847 = 1599559

Commands leading to the command that caused the error were:
CR FR SC SN CL CH DH DC Powered_Up_Time Command/Feature_Name


60 08 f0 40 70 18 0f 08 00:46:31.982 READ FPDMA QUEUED
60 08 e8 48 70 18 0f 08 00:46:31.982 READ FPDMA QUEUED
60 08 e0 c0 6f 18 0f 08 00:46:31.982 READ FPDMA QUEUED
60 08 d8 c8 6f 18 0f 08 00:46:31.982 READ FPDMA QUEUED
60 08 d0 d0 6f 18 0f 08 00:46:31.982 READ FPDMA QUEUED

[root@RockStor Test NAS log]# /usr/sbin/smartctl -l error /dev/sdc
smartctl 6.2 2013-07-26 r3841 [x86_64-linux-4.1.0-1.el7.elrepo.x86_64] (local build)
Copyright © 2002-13, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF READ SMART DATA SECTION ===
SMART Error Log Version: 1
No Errors Logged

[root@RockStor Test NAS log]# /usr/sbin/smartctl -l selftest -l selective /dev/sda
smartctl 6.2 2013-07-26 r3841 [x86_64-linux-4.1.0-1.el7.elrepo.x86_64] (local build)
Copyright © 2002-13, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF READ SMART DATA SECTION ===
SMART Self-test log structure revision number 1
Num Test_Description Status Remaining LifeTime(hours) LBA_of_first_error

1 Conveyance offline Interrupted (host reset) 90% 8816 -

SMART Selective self-test log data structure revision number 1
SPAN MIN_LBA MAX_LBA CURRENT_TEST_STATUS
1 0 0 Not_testing
2 0 0 Not_testing
3 0 0 Not_testing
4 0 0 Not_testing
5 0 0 Not_testing
Selective self-test flags (0x0):
After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.

[root@RockStor Test NAS log]# /usr/sbin/smartctl -l selftest -l selective /dev/sdc
smartctl 6.2 2013-07-26 r3841 [x86_64-linux-4.1.0-1.el7.elrepo.x86_64] (local build)
Copyright © 2002-13, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF READ SMART DATA SECTION ===
SMART Self-test log structure revision number 1
No self-tests have been logged. [To run self-tests, use: smartctl -t]

SMART Selective self-test log data structure revision number 1
SPAN MIN_LBA MAX_LBA CURRENT_TEST_STATUS
1 0 0 Not_testing
2 0 0 Not_testing
3 0 0 Not_testing
4 0 0 Not_testing
5 0 0 Not_testing
Selective self-test flags (0x0):
After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.

@KarstenV Thanks, that’s great. Looks like they even have the same firmware version as well; perfect.
So if you could execute the following commands and send us the resulting file that would be dandy as then we would have original formatting and any strange characters or what not that might be throwing things.

Looks like Rockstor is failing on the smart error logs though.
On your Rockstor box as root (ie ssh root@):-

cd ~
/usr/sbin/smartctl -a /dev/sda > smart-a-fail.out
/usr/sbin/smartctl -a /dev/sdc > smart-a-pass.out
/usr/sbin/smartctl -l error /dev/sda > smart-l-error-fail.out
/usr/sbin/smartctl -l error /dev/sdc > smart-l-error-pass.out
/usr/sbin/smartctl -l selftest -l selective /dev/sda > smart-l-selftest-l-selective-fail.out
/usr/sbin/smartctl -l selftest -l selective /dev/sdc > smart-l-selftest-l-selective-pass.out
tar czf smart-error-logs.tgz /opt/rockstor/var/log
tar czf smart-report-pass-fail.tar.gz *.out *.tgz

(don’t worry about the “Removing leading” message)
Then send the smart-report-pass-fail.tar.gz file to support@rockstor.com referencing this forum post.

I know it’s a load of messing about but I think your specific example will be really useful and may form a part in the ever growing test coverage so as to avoid regressing once it’s fixed. (@mchakravartula hope this helps)

Thanks for your help on this so far.
N.B. if you are on a Linux desktop you could use the following command there to retrieve the generated file:

scp root@rockstor-ip:/root/smart-report-pass-fail.tar.gz .

The “rockstor-ip” is the name or IP you used to access your Rockstor webUI, and the dot at the end signifies the current directory on your Linux desktop.

I have run all the commands, but don’t seem to be able to get the file onto my Windows computers.

Is there an easy way to copy the file to a share?

I have a share called Media, could I issue a command line that copied the file into that share?

@KarstenV Excellent, thanks. I am not a Windows user myself, but I suspect the easiest way may well be to use WinSCP rather than copying files into your share, possibly upsetting that share’s client apps with a file that is owned by root, and then needing to clean up afterwards. @suman has recently suggested the same program, so I think we are on safe ground with this one.

WinSCP has a Wikipedia page and an official site; it is also licensed under the GPLv3.

You will need the scp or SecureCopy protocol, user root.

Let me know how you get on with this as if all is well we can add WinSCP to our documentation as a tool for this job.

WinSCP did the job. Very easy to use, and it could even be set up Norton Commander style (two windows), as I like it.

I will send the file to the mentioned e-mail in a short time.

@KarstenV That’s great, cheers for chipping in. This is bound to help and an invaluable find on your part.

Thanks for the kind reply.

I don’t know how much of this is down to skill or just luck…

I just happened to have some old hardware lying around, that I decided to use for a test, and got some unexpected results.

I have actually added 2 more (PATA) discs to the system, and out of all 6 of them (I replaced the USB boot drive with an HDD; it boots and works faster this way), only 2 report correctly :smile:

So in my case 4 out of 6 discs report some smart error logs that Rockstor doesn’t handle very elegantly. It probably says more about the state of my old hardware than the state of Rockstor.

I am in the process of filling up my pool, to try and see how Rockstor handles a faulty (removed) drive. If RockStor copes with this properly, it could end up driving my future NAS :smile:

@KarstenV, we still have some serious work ahead of us in DR matters, but we are making good progress. I think that to speed up development with this category of features, we could certainly use help from users who try to implement these features manually (using scripts, for example) and share their results, recipes etc. This is the next best thing to actual code contribution. Also, forum threads like these are very helpful, so thank you for that.

@KarstenV As of 3.8-8.11 (testing) there is at least one fix for inelegant smart reporting. I am rather hoping that this has resolved your issues re your big wall of red text with “… 4 out of 6 discs …”. Could you let us know here if there is any improvement in this behaviour for you?

Thanks for your help on this one.

The disks on which I was seeing the problems are not in use on my Rockstor system anymore. They have been replaced by bigger and much newer disks.

But I will find the time to get them hooked up, and test it.