SMART monitoring workaround for NVMe SSDs with smartd, including e-mail notification

As I am “lucky” enough to have a failing SSD in my Rockstor server :face_with_spiral_eyes:, I was able to investigate the SMART logging & notification features a bit further.

First things first:

  • Rockstor only supports monitoring SMART of SATA drives (SSDs & HDDs) for now
  • SMART was made mandatory for NVMe SSDs
  • the information available via SMART changed significantly from SATA to NVMe
  • smartmontools support for NVMe is still considered experimental (according to their wiki)

As a “workaround” we can utilise the SMART Disk Monitoring Daemon, smartd.

… but as I have noticed before, smartd is not monitoring anything by default in Rockstor, because the config file (/etc/smartd.conf) contains no directives.

To enable it, we have to set the DEVICESCAN directive in the config file. This can be done in the Rockstor webUI, e.g. via Storage > Disks > S.M.A.R.T button, where the custom smartd configuration can be entered.

To get e-mail notifications (mail to root@localhost is forwarded by Rockstor to the e-mail address configured in the webUI), simply add this to the custom configuration of smartd:

DEVICESCAN -a -m root@localhost
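
If you do not have a failing drive at hand to trigger a real warning, the mail path can be checked with smartd's `-M test` directive, which sends a single test mail every time the daemon starts. This is only a sketch of a variant of the line above, assuming the same root@localhost forwarding. I did not need it myself, because my SSD produced a real warning:

```
# Same as above, plus one test mail on every smartd start.
# Remove "-M test" again once the mail has arrived.
DEVICESCAN -a -m root@localhost -M test
```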

After restarting the service, smartd is monitoring all my drives (3x SATA & 2x NVMe) and also detecting errors:

admin@Kolibri:~> journalctl -u smartd
Feb 01 11:38:57 Kolibri systemd[1]: Starting Self Monitoring and Reporting Technology (SMART) Daemon...
Feb 01 11:38:57 Kolibri (smartd)[881]: smartd.service: Referenced but unset environment variable evaluates to an empty string: smartd_opts
Feb 01 11:38:57 Kolibri smartd[881]: smartd 7.4 2023-08-01 r5530 [x86_64-linux-6.4.0-150600.23.33-default] (SUSE RPM)
Feb 01 11:38:57 Kolibri smartd[881]: Opened configuration file /etc/smartd.conf
Feb 01 11:38:57 Kolibri smartd[881]: Configuration file /etc/smartd.conf was parsed, found DEVICESCAN, scanning devices
Feb 01 11:38:57 Kolibri smartd[881]: Device: /dev/sda, type changed from 'scsi' to 'sat'
Feb 01 11:38:57 Kolibri smartd[881]: Device: /dev/sda [SAT], opened
Feb 01 11:38:57 Kolibri smartd[881]: Device: /dev/sda [SAT], TOSHIBA MG09ACA18TE, S/N:53S0A01BFJDH, WWN:5-000039-c88cbcefc, FW:0105, 18.0 TB
Feb 01 11:38:57 Kolibri smartd[881]: Device: /dev/sda [SAT], found in smartd database 7.3/5528: Toshiba MG09ACA... Enterprise Capacity HDD
Feb 01 11:38:57 Kolibri smartd[881]: Device: /dev/sda [SAT], is SMART capable. Adding to "monitor" list.
Feb 01 11:38:57 Kolibri smartd[881]: Device: /dev/sda [SAT], state read from /var/lib/smartmontools/smartd.TOSHIBA_MG09ACA18TE-53S0A01BFJDH.ata.state
Feb 01 11:38:57 Kolibri smartd[881]: Device: /dev/sdb, type changed from 'scsi' to 'sat'
Feb 01 11:38:57 Kolibri smartd[881]: Device: /dev/sdb [SAT], opened
Feb 01 11:38:57 Kolibri smartd[881]: Device: /dev/sdb [SAT], Crucial_CT525MX300SSD1, S/N:1712166362A2, WWN:5-00a075-1166362a2, FW:M0CR040, 525 GB
Feb 01 11:38:57 Kolibri smartd[881]: Device: /dev/sdb [SAT], found in smartd database 7.3/5528: Crucial/Micron Client SSDs
Feb 01 11:38:57 Kolibri smartd[881]: Device: /dev/sdb [SAT], is SMART capable. Adding to "monitor" list.
Feb 01 11:38:57 Kolibri smartd[881]: Device: /dev/sdc, type changed from 'scsi' to 'sat'
Feb 01 11:38:57 Kolibri smartd[881]: Device: /dev/sdc [SAT], opened
Feb 01 11:38:57 Kolibri smartd[881]: Device: /dev/sdc [SAT], TOSHIBA HDWG51J, S/N:4350A03SFQ3H, WWN:5-000039-c78c886b0, FW:0104, 18.0 TB
Feb 01 11:38:57 Kolibri smartd[881]: Device: /dev/sdc [SAT], not found in smartd database 7.3/5528.
Feb 01 11:38:57 Kolibri smartd[881]: Device: /dev/sdc [SAT], is SMART capable. Adding to "monitor" list.
Feb 01 11:38:57 Kolibri smartd[881]: Device: /dev/sdc [SAT], state read from /var/lib/smartmontools/smartd.TOSHIBA_HDWG51J-4350A03SFQ3H.ata.state
Feb 01 11:38:57 Kolibri smartd[881]: Device: /dev/nvme0, opened
Feb 01 11:38:57 Kolibri smartd[881]: Device: /dev/nvme0, SSD_M.2_PCIe4_4TB_InnovationIT_Y, S/N:H031302309130400, FW:H230829a, 4.09 TB
Feb 01 11:38:57 Kolibri smartd[881]: Device: /dev/nvme0, is SMART capable. Adding to "monitor" list.
Feb 01 11:38:57 Kolibri smartd[881]: Device: /dev/nvme0, state read from /var/lib/smartmontools/smartd.SSD_M_2_PCIe4_4TB_InnovationIT_Y-H031302309130400.nvme.state
Feb 01 11:38:57 Kolibri smartd[881]: Device: /dev/nvme1, opened
Feb 01 11:38:57 Kolibri smartd[881]: Device: /dev/nvme1, SSD_M.2_PCIe4_4TB_InnovationIT_Y, S/N:H031302309130264, FW:H230829a, 4.09 TB
Feb 01 11:38:57 Kolibri smartd[881]: Device: /dev/nvme1, is SMART capable. Adding to "monitor" list.
Feb 01 11:38:57 Kolibri smartd[881]: Device: /dev/nvme1, state read from /var/lib/smartmontools/smartd.SSD_M_2_PCIe4_4TB_InnovationIT_Y-H031302309130264.nvme.state
Feb 01 11:38:57 Kolibri smartd[881]: Monitoring 3 ATA/SATA, 0 SCSI/SAS and 2 NVMe devices
Feb 01 11:38:57 Kolibri smartd[881]: Device: /dev/sda [SAT], previous self-test was interrupted by the host with a reset
Feb 01 11:38:57 Kolibri smartd[881]: Device: /dev/sdb [SAT], 2 Offline uncorrectable sectors
Feb 01 11:38:57 Kolibri smartd[881]: Sending warning via <mail> to root@localhost ...
Feb 01 11:38:57 Kolibri smartd[881]: Warning via <mail> to root@localhost: successful
Feb 01 11:38:57 Kolibri smartd[881]: Device: /dev/sdc [SAT], previous self-test was interrupted by the host with a reset
Feb 01 11:38:57 Kolibri smartd[881]: Device: /dev/sda [SAT], state written to /var/lib/smartmontools/smartd.TOSHIBA_MG09ACA18TE-53S0A01BFJDH.ata.state
Feb 01 11:38:57 Kolibri smartd[881]: Device: /dev/sdb [SAT], state written to /var/lib/smartmontools/smartd.Crucial_CT525MX300SSD1-1712166362A2.ata.state
Feb 01 11:38:57 Kolibri smartd[881]: Device: /dev/sdc [SAT], state written to /var/lib/smartmontools/smartd.TOSHIBA_HDWG51J-4350A03SFQ3H.ata.state
Feb 01 11:38:57 Kolibri smartd[881]: Device: /dev/nvme0, state written to /var/lib/smartmontools/smartd.SSD_M_2_PCIe4_4TB_InnovationIT_Y-H031302309130400.nvme.state
Feb 01 11:38:57 Kolibri smartd[881]: Device: /dev/nvme1, state written to /var/lib/smartmontools/smartd.SSD_M_2_PCIe4_4TB_InnovationIT_Y-H031302309130264.nvme.state
Feb 01 11:38:57 Kolibri systemd[1]: Started Self Monitoring and Reporting Technology (SMART) Daemon.
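
The full log is quite long, so here is a small sketch of how it could be reduced to the interesting lines (the device summary, detected bad sectors and the mail status). The function name `smartd_highlights` is made up for this example, and the patterns are simply taken from the log lines shown above, so other smartd versions or other kinds of errors may need different patterns:

```shell
# Hypothetical helper: keep only the summary line, uncorrectable-sector
# messages and mail status lines. Reads log lines on stdin.
smartd_highlights() {
    grep -E 'Monitoring [0-9]+ ATA/SATA|uncorrectable|[Ww]arning via'
}

# Usage on the server:
#   journalctl -u smartd --no-pager | smartd_highlights
```

With the log above, this should leave the "Monitoring 3 ATA/SATA, 0 SCSI/SAS and 2 NVMe devices" line, the /dev/sdb warning and the two mail lines.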

I even got this mail sent to my inbox:

This message was generated by the smartd daemon running on:

   host name:  Kolibri
   DNS domain: [Empty]

The following warning/error was logged by the smartd daemon:

Device: /dev/sdb [SAT], 2 Offline uncorrectable sectors

Device info:
Crucial_CT525MX300SSD1, S/N:1712166362A2, WWN:5-00a075-1166362a2, FW:M0CR040, 525 GB

For details see host's SYSLOG.

You can also use the smartctl utility for further investigation.
Another message will be sent in 24 hours if the problem persists.
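
Following the hint in the mail, the affected drive can be looked at with smartctl. A minimal sketch, assuming it is run as root; the function name `smart_report` is made up for this example, and the device names in the usage line are the ones from my smartd log, so adjust them to your system:

```shell
# Hypothetical helper: print health status and attributes for each device.
# Devices that do not exist are skipped. Needs root privileges.
smart_report() {
    for dev in "$@"; do
        if [ ! -e "$dev" ]; then
            echo "skipping $dev: no such device"
            continue
        fi
        # -H: overall health assessment
        # -A: vendor attributes (SATA) or SMART/Health Information log (NVMe)
        smartctl -H -A "$dev"
    done
}

# Usage:
#   smart_report /dev/sdb /dev/nvme0
```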

Cheers
Simon
