SMART monitoring workaround for NVMe SSDs with smartd, including e-mail notification

While checking my Rockstor server (5.0.14), I noticed that SMART is only enabled for my SATA disks (2x 18TB HDD + 1x 500GB SSD), but not for my PCIe NVMe SSDs (2x 4 TB).

I stumbled across this thread: SMART service won't turn on - #2 by phillxnet

So I did the same checks on my system:

admin@Kolibri:~> sudo systemctl status smartd.service
● smartd.service - Self Monitoring and Reporting Technology (SMART) Daemon
     Loaded: loaded (/usr/lib/systemd/system/smartd.service; enabled; preset: enabled)
     Active: active (running) since Tue 2024-11-12 10:02:47 CET; 2 months 19 days ago
       Docs: man:smartd(8)
             man:smartd.conf(5)
   Main PID: 878 (smartd)
     Status: "Next check of 0 devices will start at 15:32:47"
      Tasks: 1 (limit: 4915)
        CPU: 546ms
     CGroup: /system.slice/smartd.service
             └─878 /usr/sbin/smartd -n -q never

Nov 12 10:02:47 Kolibri systemd[1]: Starting Self Monitoring and Reporting Technology (SMART) Daemon...
Nov 12 10:02:47 Kolibri (smartd)[878]: smartd.service: Referenced but unset environment variable evaluates to an empty string: smartd_opts
Nov 12 10:02:47 Kolibri smartd[878]: smartd 7.4 2023-08-01 r5530 [x86_64-linux-6.4.0-150600.23.25-default] (SUSE RPM)
Nov 12 10:02:47 Kolibri smartd[878]: Opened configuration file /etc/smartd.conf
Nov 12 10:02:47 Kolibri smartd[878]: Configuration file /etc/smartd.conf parsed but has no entries
Nov 12 10:02:47 Kolibri smartd[878]: Monitoring 0 ATA/SATA, 0 SCSI/SAS and 0 NVMe devices
Nov 12 10:02:47 Kolibri systemd[1]: Started Self Monitoring and Reporting Technology (SMART) Daemon.
admin@Kolibri:~> 

… so I also have the missing-config issue (I haven’t touched the SMART settings since the Rockstor installation).

When I run smartctl manually, all 5 disks show up as expected:

admin@Kolibri:~> sudo smartctl --scan
/dev/sda -d scsi # /dev/sda, SCSI device
/dev/sdb -d scsi # /dev/sdb, SCSI device
/dev/sdc -d scsi # /dev/sdc, SCSI device
/dev/nvme0 -d nvme # /dev/nvme0, NVMe device
/dev/nvme1 -d nvme # /dev/nvme1, NVMe device
admin@Kolibri:~> sudo smartctl -a /dev/nvme1
smartctl 7.4 2023-08-01 r5530 [x86_64-linux-6.4.0-150600.23.25-default] (SUSE RPM)
Copyright (C) 2002-23, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Model Number:                       SSD_M.2_PCIe4_4TB_InnovationIT_Y
Serial Number:                      H031302309130264
Firmware Version:                   H230829a
PCI Vendor/Subsystem ID:            0x1e4b
IEEE OUI Identifier:                0x000000
Total NVM Capacity:                 4.096.805.658.624 [4,09 TB]
Unallocated NVM Capacity:           0
Controller ID:                      0
NVMe Version:                       2.0
Number of Namespaces:               1
Namespace 1 Size/Capacity:          4.096.805.658.624 [4,09 TB]
Namespace 1 Formatted LBA Size:     512
Namespace 1 IEEE EUI-64:            000000 2309130264
Local Time is:                      Fri Jan 31 15:19:03 2025 CET
Firmware Updates (0x14):            2 Slots, no Reset required
Optional Admin Commands (0x0017):   Security Format Frmw_DL Self_Test
Optional NVM Commands (0x005f):     Comp Wr_Unc DS_Mngmt Wr_Zero Sav/Sel_Feat Timestmp
Log Page Attributes (0x0e):         Cmd_Eff_Lg Ext_Get_Lg Telmtry_Lg
Maximum Data Transfer Size:         128 Pages
Warning  Comp. Temp. Threshold:     90 Celsius
Critical Comp. Temp. Threshold:     95 Celsius

Supported Power States
St Op     Max   Active     Idle   RL RT WL WT  Ent_Lat  Ex_Lat
 0 +     6.50W       -        -    0  0  0  0        0       0
 1 +     5.80W       -        -    1  1  1  1        0       0
 2 +     3.60W       -        -    2  2  2  2        0       0
 3 -   0.7460W       -        -    3  3  3  3     5000   10000
 4 -   0.7260W       -        -    4  4  4  4     8000   45000

Supported LBA Sizes (NSID 0x1)
Id Fmt  Data  Metadt  Rel_Perf
 0 +     512       0         0

=== START OF SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

SMART/Health Information (NVMe Log 0x02)
Critical Warning:                   0x00
Temperature:                        39 Celsius
Available Spare:                    100%
Available Spare Threshold:          1%
Percentage Used:                    0%
Data Units Read:                    39.301.209 [20,1 TB]
Data Units Written:                 20.074.168 [10,2 TB]
Host Read Commands:                 253.402.037
Host Write Commands:                247.394.123
Controller Busy Time:               455
Power Cycles:                       68
Power On Hours:                     6.457
Unsafe Shutdowns:                   27
Media and Data Integrity Errors:    0
Error Information Log Entries:      0
Warning  Comp. Temperature Time:    0
Critical Comp. Temperature Time:    0
Temperature Sensor 1:               39 Celsius
Temperature Sensor 2:               46 Celsius

Error Information (NVMe Log 0x01, 16 of 64 entries)
No Errors Logged

Self-test Log (NVMe Log 0x06)
Self-test status: No self-test in progress
No Self-tests Logged

admin@Kolibri:~> 
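
To repeat this check for every drive without typing each device name, the scan output can be fed back into smartctl. This is a minimal sketch, assuming a POSIX shell with awk and grep; the function names `scan_devices` and `check_all_health` are made up for illustration:

```shell
# Read `smartctl --scan` output on stdin and print only the device paths,
# e.g. "/dev/nvme0 -d nvme # /dev/nvme0, NVMe device" -> "/dev/nvme0".
scan_devices() {
    awk '$1 ~ /^\/dev\// { print $1 }'
}

# Print the overall health line for every device that smartctl can see.
# Run as root, because smartctl needs raw access to the devices.
check_all_health() {
    smartctl --scan | scan_devices | while read -r dev; do
        printf '%s: ' "$dev"
        # -H prints only the health summary; the device type is auto-detected.
        smartctl -H "$dev" | grep -E 'overall-health|Health Status' \
            || echo 'no health line found'
    done
}
```

Calling `check_all_health` from a root shell should then print one line per drive, SATA and NVMe alike.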

I then also edited the SMART configuration via the Rockstor UI (Configure S.M.A.R.T. Daemon) and added:

DEVICESCAN -a

and rebooted.

The smartd.service no longer reports an empty configuration and now monitors all 5 devices:

admin@Kolibri:~> sudo systemctl status smartd.service
[sudo] password for root: 
● smartd.service - Self Monitoring and Reporting Technology (SMART) Daemon
     Loaded: loaded (/usr/lib/systemd/system/smartd.service; enabled; preset: enabled)
     Active: active (running) since Fri 2025-01-31 15:35:33 CET; 9min ago
       Docs: man:smartd(8)
             man:smartd.conf(5)
   Main PID: 882 (smartd)
     Status: "Next check of 5 devices will start at 16:05:33"
      Tasks: 1 (limit: 4915)
        CPU: 59ms
     CGroup: /system.slice/smartd.service
             └─882 /usr/sbin/smartd -n -q never

Jän 31 15:35:33 Kolibri smartd[882]: Device: /dev/nvme1, is SMART capable. Adding to "monitor" list.
Jän 31 15:35:33 Kolibri smartd[882]: Device: /dev/nvme1, state read from /var/lib/smartmontools/smartd.SSD_M_2_PCIe4_4TB_InnovationIT_Y-H031302309130264.nvme.state
Jän 31 15:35:33 Kolibri smartd[882]: Monitoring 3 ATA/SATA, 0 SCSI/SAS and 2 NVMe devices
Jän 31 15:35:33 Kolibri smartd[882]: Device: /dev/sdb [SAT], 2 Offline uncorrectable sectors
Jän 31 15:35:33 Kolibri smartd[882]: Device: /dev/sda [SAT], state written to /var/lib/smartmontools/smartd.TOSHIBA_MG09ACA18TE-53S0A01BFJDH.ata.state
Jän 31 15:35:33 Kolibri smartd[882]: Device: /dev/sdb [SAT], state written to /var/lib/smartmontools/smartd.Crucial_CT525MX300SSD1-1712166362A2.ata.state
Jän 31 15:35:33 Kolibri smartd[882]: Device: /dev/sdc [SAT], state written to /var/lib/smartmontools/smartd.TOSHIBA_HDWG51J-4350A03SFQ3H.ata.state
Jän 31 15:35:33 Kolibri smartd[882]: Device: /dev/nvme0, state written to /var/lib/smartmontools/smartd.SSD_M_2_PCIe4_4TB_InnovationIT_Y-H031302309130400.nvme.state
Jän 31 15:35:33 Kolibri smartd[882]: Device: /dev/nvme1, state written to /var/lib/smartmontools/smartd.SSD_M_2_PCIe4_4TB_InnovationIT_Y-H031302309130264.nvme.state
Jän 31 15:35:33 Kolibri systemd[1]: Started Self Monitoring and Reporting Technology (SMART) Daemon.
admin@Kolibri:~> 

… the 500GB SSD (“only” the system drive) now has some errors that I have to investigate further :roll_eyes:

And the nvme SSDs are still not showing up in the Rockstor webUI, although smartd has detected them.


Why are the NVMe drives not shown with SMART data?
Do any of the developers have thoughts on this?
Any further logs I should check?

Cheers
Simon


Hi @simon-77,

Not my area of expertise, unfortunately, but I do recall mentions of shortcomings in SMART reporting for NVMe drives not too long ago. I found a thread that I think is what I was remembering.

Does the situation in that thread look to be similar to what you are seeing/experiencing?


Thanks @Flox for pointing out this thread, it is discussing the exact same issue as I have. To summarize the discussion in one sentence:

The smartctl output for NVMe drives differs significantly from the output for SATA drives, and parsing it is therefore not yet implemented in Rockstor.

At least now I know that there is no configuration issue in my setup.


As I am “lucky” enough to have a failing SSD in my Rockstor server :face_with_spiral_eyes:, I could investigate the SMART logging & notification features a bit further.

First things first:

  • Rockstor only supports monitoring SMART of SATA drives (SSDs & HDDs) for now
  • SMART was made mandatory for NVMe SSDs
  • the information available via SMART changed significantly from SATA to NVMe
  • smartmontools support for NVMe is still considered experimental (according to their wiki)
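
Until Rockstor parses the NVMe report, individual values can still be pulled out of the plain-text output by hand. This is a minimal sketch, assuming the `Label:   value` layout shown in the smartctl 7.4 output earlier in this thread; `nvme_field` is a hypothetical helper name, not part of smartmontools or Rockstor:

```shell
# Print the value of one field from `smartctl -a` output read on stdin.
# $1 is the field label exactly as smartctl prints it, without the colon.
nvme_field() {
    # Split each line at the first ": " run; time stamps such as 15:19:03
    # are not split because their colons are not followed by a space.
    awk -F': +' -v key="$1" '$1 == key { print $2; exit }'
}

# Example call (needs root):
#   smartctl -a /dev/nvme1 | nvme_field 'Critical Warning'
```

For the drive above, the example call should print `0x00`. Numbers are printed with locale-dependent separators (`6.457` power-on hours here), so a script should treat the values as text unless it normalises them first. smartctl 7.x also has a `--json` option, which is probably a more robust base for real tooling.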

As a “workaround” we can utilise the SMART Disk Monitoring Daemon smartd.

… but as I noticed before, smartd does not monitor anything by default in Rockstor, because the config file contains no directives.

To enable it, we have to set the DEVICESCAN directive in the config file. This can be done in the Rockstor webUI, e.g. via Storage > Disks > S.M.A.R.T button, by entering it there as custom configuration.

To get e-mail notifications (Rockstor forwards mail for root@localhost to the e-mail address configured in the webUI), simply add this to the custom configuration of smartd:

DEVICESCAN -a -m root@localhost
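
Since the daemon silently monitors nothing when the file has no entries, it can be worth confirming that the directive really ended up in /etc/smartd.conf (the file smartd reports opening in its log). This is a rough sketch; `has_mail_directive` is a hypothetical helper, and the pattern only covers the simple one-line DEVICESCAN form used here:

```shell
# Succeed if the given smartd config file contains an active (uncommented)
# DEVICESCAN line that carries a -m <address> directive.
has_mail_directive() {
    grep -Eq '^[[:space:]]*DEVICESCAN([[:space:]].*)?[[:space:]]-m[[:space:]]+[^[:space:]]+' "$1"
}

# Example call (reading the file may require root):
#   has_mail_directive /etc/smartd.conf && echo "mail notification configured"
```

According to the smartd.conf man page, appending `-M test` to the directive makes smartd send a single test mail at startup, which is a quick way to check the forwarding without waiting for a real error.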

After restarting the service, smartd is monitoring all my drives (3x SATA & 2x NVMe) and also detecting errors:

admin@Kolibri:~> journalctl -u smartd
Feb 01 11:38:57 Kolibri systemd[1]: Starting Self Monitoring and Reporting Technology (SMART) Daemon...
Feb 01 11:38:57 Kolibri (smartd)[881]: smartd.service: Referenced but unset environment variable evaluates to an empty string: smartd_opts
Feb 01 11:38:57 Kolibri smartd[881]: smartd 7.4 2023-08-01 r5530 [x86_64-linux-6.4.0-150600.23.33-default] (SUSE RPM)
Feb 01 11:38:57 Kolibri smartd[881]: Opened configuration file /etc/smartd.conf
Feb 01 11:38:57 Kolibri smartd[881]: Configuration file /etc/smartd.conf was parsed, found DEVICESCAN, scanning devices
Feb 01 11:38:57 Kolibri smartd[881]: Device: /dev/sda, type changed from 'scsi' to 'sat'
Feb 01 11:38:57 Kolibri smartd[881]: Device: /dev/sda [SAT], opened
Feb 01 11:38:57 Kolibri smartd[881]: Device: /dev/sda [SAT], TOSHIBA MG09ACA18TE, S/N:53S0A01BFJDH, WWN:5-000039-c88cbcefc, FW:0105, 18.0 TB
Feb 01 11:38:57 Kolibri smartd[881]: Device: /dev/sda [SAT], found in smartd database 7.3/5528: Toshiba MG09ACA... Enterprise Capacity HDD
Feb 01 11:38:57 Kolibri smartd[881]: Device: /dev/sda [SAT], is SMART capable. Adding to "monitor" list.
Feb 01 11:38:57 Kolibri smartd[881]: Device: /dev/sda [SAT], state read from /var/lib/smartmontools/smartd.TOSHIBA_MG09ACA18TE-53S0A01BFJDH.ata.state
Feb 01 11:38:57 Kolibri smartd[881]: Device: /dev/sdb, type changed from 'scsi' to 'sat'
Feb 01 11:38:57 Kolibri smartd[881]: Device: /dev/sdb [SAT], opened
Feb 01 11:38:57 Kolibri smartd[881]: Device: /dev/sdb [SAT], Crucial_CT525MX300SSD1, S/N:1712166362A2, WWN:5-00a075-1166362a2, FW:M0CR040, 525 GB
Feb 01 11:38:57 Kolibri smartd[881]: Device: /dev/sdb [SAT], found in smartd database 7.3/5528: Crucial/Micron Client SSDs
Feb 01 11:38:57 Kolibri smartd[881]: Device: /dev/sdb [SAT], is SMART capable. Adding to "monitor" list.
Feb 01 11:38:57 Kolibri smartd[881]: Device: /dev/sdc, type changed from 'scsi' to 'sat'
Feb 01 11:38:57 Kolibri smartd[881]: Device: /dev/sdc [SAT], opened
Feb 01 11:38:57 Kolibri smartd[881]: Device: /dev/sdc [SAT], TOSHIBA HDWG51J, S/N:4350A03SFQ3H, WWN:5-000039-c78c886b0, FW:0104, 18.0 TB
Feb 01 11:38:57 Kolibri smartd[881]: Device: /dev/sdc [SAT], not found in smartd database 7.3/5528.
Feb 01 11:38:57 Kolibri smartd[881]: Device: /dev/sdc [SAT], is SMART capable. Adding to "monitor" list.
Feb 01 11:38:57 Kolibri smartd[881]: Device: /dev/sdc [SAT], state read from /var/lib/smartmontools/smartd.TOSHIBA_HDWG51J-4350A03SFQ3H.ata.state
Feb 01 11:38:57 Kolibri smartd[881]: Device: /dev/nvme0, opened
Feb 01 11:38:57 Kolibri smartd[881]: Device: /dev/nvme0, SSD_M.2_PCIe4_4TB_InnovationIT_Y, S/N:H031302309130400, FW:H230829a, 4.09 TB
Feb 01 11:38:57 Kolibri smartd[881]: Device: /dev/nvme0, is SMART capable. Adding to "monitor" list.
Feb 01 11:38:57 Kolibri smartd[881]: Device: /dev/nvme0, state read from /var/lib/smartmontools/smartd.SSD_M_2_PCIe4_4TB_InnovationIT_Y-H031302309130400.nvme.state
Feb 01 11:38:57 Kolibri smartd[881]: Device: /dev/nvme1, opened
Feb 01 11:38:57 Kolibri smartd[881]: Device: /dev/nvme1, SSD_M.2_PCIe4_4TB_InnovationIT_Y, S/N:H031302309130264, FW:H230829a, 4.09 TB
Feb 01 11:38:57 Kolibri smartd[881]: Device: /dev/nvme1, is SMART capable. Adding to "monitor" list.
Feb 01 11:38:57 Kolibri smartd[881]: Device: /dev/nvme1, state read from /var/lib/smartmontools/smartd.SSD_M_2_PCIe4_4TB_InnovationIT_Y-H031302309130264.nvme.state
Feb 01 11:38:57 Kolibri smartd[881]: Monitoring 3 ATA/SATA, 0 SCSI/SAS and 2 NVMe devices
Feb 01 11:38:57 Kolibri smartd[881]: Device: /dev/sda [SAT], previous self-test was interrupted by the host with a reset
Feb 01 11:38:57 Kolibri smartd[881]: Device: /dev/sdb [SAT], 2 Offline uncorrectable sectors
Feb 01 11:38:57 Kolibri smartd[881]: Sending warning via <mail> to root@localhost ...
Feb 01 11:38:57 Kolibri smartd[881]: Warning via <mail> to root@localhost: successful
Feb 01 11:38:57 Kolibri smartd[881]: Device: /dev/sdc [SAT], previous self-test was interrupted by the host with a reset
Feb 01 11:38:57 Kolibri smartd[881]: Device: /dev/sda [SAT], state written to /var/lib/smartmontools/smartd.TOSHIBA_MG09ACA18TE-53S0A01BFJDH.ata.state
Feb 01 11:38:57 Kolibri smartd[881]: Device: /dev/sdb [SAT], state written to /var/lib/smartmontools/smartd.Crucial_CT525MX300SSD1-1712166362A2.ata.state
Feb 01 11:38:57 Kolibri smartd[881]: Device: /dev/sdc [SAT], state written to /var/lib/smartmontools/smartd.TOSHIBA_HDWG51J-4350A03SFQ3H.ata.state
Feb 01 11:38:57 Kolibri smartd[881]: Device: /dev/nvme0, state written to /var/lib/smartmontools/smartd.SSD_M_2_PCIe4_4TB_InnovationIT_Y-H031302309130400.nvme.state
Feb 01 11:38:57 Kolibri smartd[881]: Device: /dev/nvme1, state written to /var/lib/smartmontools/smartd.SSD_M_2_PCIe4_4TB_InnovationIT_Y-H031302309130264.nvme.state
Feb 01 11:38:57 Kolibri systemd[1]: Started Self Monitoring and Reporting Technology (SMART) Daemon.
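
In a long journal the interesting lines are easy to miss. Below is a small filter, sketched under the assumption that problem reports contain words like the one in the log above ("uncorrectable") or similar ones ("unreadable", "fail"); the keyword list and the name `smartd_problems` are illustrative, not an official list of smartd messages:

```shell
# Keep only journal lines that look like smartd problem reports.
# Reads log text on stdin; the keyword list is an assumption, extend as needed.
smartd_problems() {
    grep -Ei 'uncorrectable|unreadable|fail'
}

# Example call:
#   journalctl -u smartd --no-pager | smartd_problems
```

Applied to the journal above, this should leave only the `/dev/sdb` line about the 2 offline uncorrectable sectors.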

I even got this mail sent to my inbox:

This message was generated by the smartd daemon running on:

   host name:  Kolibri
   DNS domain: [Empty]

The following warning/error was logged by the smartd daemon:

Device: /dev/sdb [SAT], 2 Offline uncorrectable sectors

Device info:
Crucial_CT525MX300SSD1, S/N:1712166362A2, WWN:5-00a075-1166362a2, FW:M0CR040, 525 GB

For details see host's SYSLOG.

You can also use the smartctl utility for further investigation.
Another message will be sent in 24 hours if the problem persists.

Cheers
Simon
