SMART not supported?

I want to share my findings regarding SMART support on NVMe as I also stumbled across this issue.

SMART on SATA vs NVMe

For SATA drives (HDDs & SSDs) there is a list of ATA-SMART-attributes with the idea of having normalized values, where higher values are always better for any attribute.

Unfortunately, this system of vendor-independent values never worked well: there are many vendor-specific differences and exceptions in how these values are treated.
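To illustrate the normalized-value idea, here is a minimal Python sketch that reads one attribute row as printed by `smartctl -A` and applies the usual rule that an attribute counts as failed once VALUE is less than or equal to THRESH. The function names are my own, and the two rows are copied from the SATA output further down.

```python
def parse_ata_attribute(row):
    """Split one 'smartctl -A' attribute row into its columns."""
    # Columns: ID# NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
    fields = row.strip().split(None, 9)
    return {
        "id": int(fields[0]),
        "name": fields[1],
        "value": int(fields[3]),
        "worst": int(fields[4]),
        "thresh": int(fields[5]),
        "when_failed": fields[8],
        "raw": fields[9],
    }


def naive_failed(attr):
    """Naive rule: failed when the normalized value is at or below the threshold."""
    return attr["value"] <= attr["thresh"]


lifetime = parse_ata_attribute(
    "202 Percent_Lifetime_Remain 0x0030   088   088   001    Old_age   Offline      -       12"
)
reserve = parse_ata_attribute(
    "180 Unused_Reserve_NAND_Blk 0x0033   000   000   000    Pre-fail  Always       -       1925"
)

print(lifetime["name"], naive_failed(lifetime))
print(reserve["name"], naive_failed(reserve))
```

The naive rule flags attribute 180 (value 000, threshold 000), although smartctl itself shows "-" in the WHEN_FAILED column for it. That is exactly the kind of per-attribute exception that makes the ATA values hard to use.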

For NVMe SSDs, SMART became a mandatory part of the standard. The consortium decided not to continue the ATA-SMART-attributes and to use log pages instead. The SMART/Health Information log page (shown as "NVMe Log 0x02" in the smartctl output below) contains unified NVMe-SMART-attributes, which are completely different from the ATA-SMART-attributes.

Here is also a great blog post about this:
https://utcc.utoronto.ca/~cks/space/blog/tech/NVMeAndSMART

smartmontools NVMe support

The NVMe support of smartmontools (smartctl) is still considered experimental (according to their wiki).

For example, here are the attributes (-A) reported by smartctl of my drives:

SATA SSD:

admin@Kolibri:~> sudo smartctl -i -A /dev/sdb
smartctl 7.4 2023-08-01 r5530 [x86_64-linux-6.4.0-150600.23.33-default] (SUSE RPM)
Copyright (C) 2002-23, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Model Family:     Crucial/Micron Client SSDs
Device Model:     Crucial_CT525MX300SSD1
Serial Number:    1712166362A2
LU WWN Device Id: 5 00a075 1166362a2
Firmware Version: M0CR040
User Capacity:    525.112.713.216 bytes [525 GB]
Sector Size:      512 bytes logical/physical
Rotation Rate:    Solid State Device
Form Factor:      2.5 inches
TRIM Command:     Available, deterministic, zeroed
Device is:        In smartctl database 7.3/5528
ATA Version is:   ACS-3 T13/2161-D revision 5
SATA Version is:  SATA 3.2, 6.0 Gb/s (current: 6.0 Gb/s)
Local Time is:    Sat Feb  1 14:53:47 2025 CET
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x002f   100   100   000    Pre-fail  Always       -       37
  5 Reallocate_NAND_Blk_Cnt 0x0032   099   099   010    Old_age   Always       -       19
  9 Power_On_Hours          0x0032   100   100   000    Old_age   Always       -       11168
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       3686
171 Program_Fail_Count      0x0032   100   100   000    Old_age   Always       -       0
172 Erase_Fail_Count        0x0032   100   100   000    Old_age   Always       -       0
173 Ave_Block-Erase_Count   0x0032   088   088   000    Old_age   Always       -       193
174 Unexpect_Power_Loss_Ct  0x0032   100   100   000    Old_age   Always       -       236
183 SATA_Interfac_Downshift 0x0032   100   100   000    Old_age   Always       -       0
184 Error_Correction_Count  0x0032   100   100   000    Old_age   Always       -       0
187 Reported_Uncorrect      0x0032   100   100   000    Old_age   Always       -       34
194 Temperature_Celsius     0x0022   055   039   000    Old_age   Always       -       45 (Min/Max 7/61)
196 Reallocated_Event_Count 0x0032   100   100   000    Old_age   Always       -       19
197 Current_Pending_ECC_Cnt 0x0032   100   100   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0030   100   100   000    Old_age   Offline      -       2
199 UDMA_CRC_Error_Count    0x0032   100   100   000    Old_age   Always       -       1
202 Percent_Lifetime_Remain 0x0030   088   088   001    Old_age   Offline      -       12
206 Write_Error_Rate        0x000e   100   100   000    Old_age   Always       -       0
246 Total_LBAs_Written      0x0032   100   100   000    Old_age   Always       -       76023495678
247 Host_Program_Page_Count 0x0032   100   100   000    Old_age   Always       -       2376709354
248 FTL_Program_Page_Count  0x0032   100   100   000    Old_age   Always       -       2438291973
180 Unused_Reserve_NAND_Blk 0x0033   000   000   000    Pre-fail  Always       -       1925
210 Success_RAIN_Recov_Cnt  0x0032   100   100   000    Old_age   Always       -       88

NVMe SSD:

admin@Kolibri:~> sudo smartctl -i -A /dev/nvme0
smartctl 7.4 2023-08-01 r5530 [x86_64-linux-6.4.0-150600.23.33-default] (SUSE RPM)
Copyright (C) 2002-23, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Model Number:                       SSD_M.2_PCIe4_4TB_InnovationIT_Y
Serial Number:                      H031302309130400
Firmware Version:                   H230829a
PCI Vendor/Subsystem ID:            0x1e4b
IEEE OUI Identifier:                0x000000
Total NVM Capacity:                 4.096.805.658.624 [4,09 TB]
Unallocated NVM Capacity:           0
Controller ID:                      0
NVMe Version:                       2.0
Number of Namespaces:               1
Namespace 1 Size/Capacity:          4.096.805.658.624 [4,09 TB]
Namespace 1 Formatted LBA Size:     512
Namespace 1 IEEE EUI-64:            000000 2309130400
Local Time is:                      Sat Feb  1 14:56:52 2025 CET

=== START OF SMART DATA SECTION ===
SMART/Health Information (NVMe Log 0x02)
Critical Warning:                   0x00
Temperature:                        40 Celsius
Available Spare:                    100%
Available Spare Threshold:          1%
Percentage Used:                    0%
Data Units Read:                    38.835.756 [19,8 TB]
Data Units Written:                 11.391.884 [5,83 TB]
Host Read Commands:                 241.217.843
Host Write Commands:                161.121.091
Controller Busy Time:               389
Power Cycles:                       66
Power On Hours:                     6.461
Unsafe Shutdowns:                   21
Media and Data Integrity Errors:    0
Error Information Log Entries:      0
Warning  Comp. Temperature Time:    0
Critical Comp. Temperature Time:    0
Temperature Sensor 1:               40 Celsius
Temperature Sensor 2:               50 Celsius

Discussion about NVMe SMART attributes

The best discussion I have found so far about interpreting the NVMe SMART attributes is on Reddit:
https://www.reddit.com/r/linuxadmin/comments/15g7dh1/identify_the_wear_level_of_the_ssd_using/

I find the information presented by NVMe SSDs much clearer and more useful than the ATA-SMART-attributes:

  • Available Spare: the percentage of spare blocks still available to the SSD controller
  • Available Spare Threshold: the limit for the Available Spare value; once the spare drops below it, the SSD considers itself failing
  • Percentage Used: the estimated percentage of the drive's lifespan that has been used
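As a rough illustration of how these three values can be read together, here is a minimal Python sketch. The function name and the status strings are my own invention, and the "below the threshold" comparison is my reading of the attribute description, so treat it as an assumption rather than a definitive rule.

```python
def nvme_wear_status(available_spare, spare_threshold, percentage_used):
    """Classify a drive from three SMART/Health values (all given in percent)."""
    # Assumption: the drive counts as failing once the spare falls below the threshold.
    if available_spare < spare_threshold:
        return "failing: available spare below threshold"
    # Percentage Used is an estimate and may be reported above 100.
    if percentage_used >= 100:
        return "worn: estimated lifespan used up"
    return "ok"


# Values taken from the NVMe output above (spare 100%, threshold 1%, used 0%).
print(nvme_wear_status(100, 1, 0))
```

A monitoring script could call this once per drive and only raise an alert when the result is not "ok".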

Here is another example: the NVMe SSD in my workstation. After about 2 years in service it shows a notable usage of 3 %.

[simon@Bussard: ~]$ sudo smartctl -i -A /dev/nvme0
[sudo] password for root: 
smartctl 7.4 2023-08-01 r5530 [x86_64-linux-6.4.0-150600.23.33-default] (SUSE RPM)
Copyright (C) 2002-23, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Model Number:                       Patriot M.2 P310 1920GB
Serial Number:                      P310BACA2112100173
Firmware Version:                   ECFM53.1
PCI Vendor/Subsystem ID:            0x1987
IEEE OUI Identifier:                0x6479a7
Total NVM Capacity:                 1.920.383.410.176 [1,92 TB]
Unallocated NVM Capacity:           0
Controller ID:                      1
NVMe Version:                       1.3
Number of Namespaces:               1
Namespace 1 Size/Capacity:          1.920.383.410.176 [1,92 TB]
Namespace 1 Formatted LBA Size:     512
Namespace 1 IEEE EUI-64:            6479a7 5010200e64
Local Time is:                      Sat Feb  1 15:02:38 2025 CET

=== START OF SMART DATA SECTION ===
SMART/Health Information (NVMe Log 0x02)
Critical Warning:                   0x00
Temperature:                        28 Celsius
Available Spare:                    100%
Available Spare Threshold:          5%
Percentage Used:                    3%
Data Units Read:                    94.352.586 [48,3 TB]
Data Units Written:                 44.777.078 [22,9 TB]
Host Read Commands:                 563.588.212
Host Write Commands:                333.236.343
Controller Busy Time:               1.767
Power Cycles:                       804
Power On Hours:                     3.635
Unsafe Shutdowns:                   24
Media and Data Integrity Errors:    0
Error Information Log Entries:      2.191
Warning  Comp. Temperature Time:    0
Critical Comp. Temperature Time:    0

My thoughts

As the (big) change in SMART attributes is not implemented in Rockstor, I configured automatic monitoring of all my drives via smartd, including e-mail notifications, as explained here.

Honestly, I am very happy with the new SMART attributes of NVMe drives. Some years ago I tried to use the ATA-SMART-attributes with a bunch of SATA drives from different vendors, and it was a nightmare to find out which vendor uses which attributes and what the values actually mean for each drive.

The new NVMe SMART attributes seem to be much more expressive and comparable between vendors.

I think Rockstor could parse these NVMe SMART attributes and display the values of all drives side by side in a clear, simple but instructive table (Temperature, Percentage Used, Data Written, …).
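To give an idea of how little parsing this needs, here is a minimal Python sketch that works on the plain-text SMART section as smartctl prints it above. This is not how Rockstor does anything today; `parse_nvme_health` and `table_row` are hypothetical names, and the sample lines are copied from my workstation output. The conversion of Data Units assumes one unit is 1000 blocks of 512 bytes, which matches the TB figures smartctl prints in brackets.

```python
import re

# A few lines copied from the SMART/Health section shown above.
SAMPLE = """\
Temperature:                        28 Celsius
Available Spare:                    100%
Available Spare Threshold:          5%
Percentage Used:                    3%
Data Units Written:                 44.777.078 [22,9 TB]
"""


def parse_nvme_health(text):
    """Map each 'Name: value' line to the integer its value starts with."""
    values = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        match = re.match(r"\s*(\d[\d.,]*)", rest)
        if sep and match:
            # Drop locale-dependent thousands separators ('.' or ',').
            # Assumes integer values, as in the output shown above.
            values[name.strip()] = int(re.sub(r"[.,]", "", match.group(1)))
    return values


def table_row(host, health):
    """Format one table line: host, temperature, wear, data written."""
    # Assumption: one data unit = 1000 * 512 bytes = 512,000 bytes.
    written_tb = health["Data Units Written"] * 512_000 / 1e12
    return (
        f"{host:<10}{health['Temperature']:>5} C"
        f"{health['Percentage Used']:>6} %{written_tb:>9.1f} TB"
    )


health = parse_nvme_health(SAMPLE)
print(f"{'Host':<10}{'Temp':>7}{'Used':>8}{'Written':>12}")
print(table_row("Bussard", health))
```

Running the same two functions over the output of every drive would give one row per drive, which is roughly the overview table I have in mind.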
