Unable to update disk state for nvme device

Hi all,

I have installed Rockstor onto my nvme device, and Rockstor logs the following lines every minute when checking the disk state:

[04/Jun/2016 13:32:26] ERROR [smart_manager.data_collector:342] Failed to update disk state.. exception: object of type 'NoneType' has no len()
[04/Jun/2016 13:33:27] ERROR [storageadmin.util:46] request path: /api/disks/scan method: POST data: <QueryDict: {}>
[04/Jun/2016 13:33:27] ERROR [storageadmin.util:47] exception: object of type 'NoneType' has no len()
Traceback (most recent call last):
  File "/opt/rockstor/src/rockstor/rest_framework_custom/generic_view.py", line 40, in _handle_exception
    yield
  File "/opt/rockstor/src/rockstor/storageadmin/views/disk.py", line 204, in post
    return self._update_disk_state()
  File "/opt/rockstor/eggs/Django-1.6.11-py2.7.egg/django/db/transaction.py", line 371, in inner
    return func(*args, **kwargs)
  File "/opt/rockstor/src/rockstor/storageadmin/views/disk.py", line 79, in _update_disk_state
    if (do.serial in serial_numbers_seen) or (len(do.serial) == 48):
TypeError: object of type 'NoneType' has no len()
[04/Jun/2016 13:33:27] DEBUG [storageadmin.util:48] Current Rockstor version: 3.8-13
[04/Jun/2016 13:33:27] ERROR [smart_manager.data_collector:342] Failed to update disk state.. exception: object of type 'NoneType' has no len()

Maybe it is logged because smartctl doesn't support nvme devices in the current CentOS version? https://www.smartmontools.org/ticket/657

Is it possible to not check the disk state until smartctl has support for nvme devices?

I'm also seeing fake serials for the nvme device, but I think that is not related to the above issue. I've tried specifying my own udev rules; unfortunately this had no effect on the Rockstor side.

[root@rockstor ~]# ls -al /dev/disk/by-id/
total 0
drwxr-xr-x 2 root root 280 Jun  4 13:51 .
drwxr-xr-x 6 root root 120 Jun  4 13:51 ..
lrwxrwxrwx 1 root root   9 Jun  4 13:51 ata-WDC_WD20EFRX-68EUZN0_WD-WCC4M3CLUK0U -> ../../sdc
lrwxrwxrwx 1 root root   9 Jun  4 13:51 ata-WDC_WD20EFRX-68EUZN0_WD-WCC4M3VR0FVS -> ../../sdd
lrwxrwxrwx 1 root root   9 Jun  4 13:51 ata-WDC_WD20EFRX-68EUZN0_WD-WCC4M6DT9PEV -> ../../sda
lrwxrwxrwx 1 root root   9 Jun  4 13:51 ata-WDC_WD20EFRX-68EUZN0_WD-WCC4M7ST1PS4 -> ../../sdb
lrwxrwxrwx 1 root root  15 Jun  4 13:51 nvme-S2GLNCAH102797F -> ../../nvme0n1p1
lrwxrwxrwx 1 root root  15 Jun  4 13:51 nvme-S2GLNCAH102797F-disk1 -> ../../nvme0n1p1
lrwxrwxrwx 1 root root  15 Jun  4 13:51 nvme-S2GLNCAH102797F-disk2 -> ../../nvme0n1p2
lrwxrwxrwx 1 root root  15 Jun  4 13:51 nvme-S2GLNCAH102797F-disk3 -> ../../nvme0n1p3
lrwxrwxrwx 1 root root   9 Jun  4 13:51 wwn-0x50014ee261f6918a -> ../../sdb
lrwxrwxrwx 1 root root   9 Jun  4 13:51 wwn-0x50014ee261f7d976 -> ../../sdd
lrwxrwxrwx 1 root root   9 Jun  4 13:51 wwn-0x50014ee2628ab5e0 -> ../../sda
lrwxrwxrwx 1 root root   9 Jun  4 13:51 wwn-0x50014ee2b7e073dc -> ../../sdc

Thanks!

Cheers
snafu

@snafu Welcome to the Rockstor community and thanks for the detailed post.

As is, I don't see any smartctl-related errors, although we do have an auto exclusion by device name from all smartctl calls in the src/rockstor/storageadmin/views/disk.py file, at around line 215 depending on release:

This file on an installed system would be in:-
/opt/rockstor/src/rockstor/storageadmin/views/disk.py
So you could exclude all smart calls made on the nvme device by adding its name, if indicative of its type, to that match set, with an additional '|' symbol of course, i.e. changing:

re.match('vd|md|mmcblk', do.name)

to

re.match('vd|md|mmcblk|nvme', do.name)
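As an aside on why that match set works: re.match() anchors at the start of the string, so the alternation behaves as a prefix test on the kernel device name. A quick sketch (illustrative only, not Rockstor code):

```python
import re

def smart_excluded(dev_name):
    # Matches any name starting with vd, md, mmcblk or nvme,
    # i.e. the device types excluded from smartctl calls.
    return re.match('vd|md|mmcblk|nvme', dev_name) is not None

for name in ('sda', 'nvme0n1', 'mmcblk0', 'md127', 'vda'):
    print(name, smart_excluded(name))
```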

In fact your post is rather timely, as I am currently working in this area for our change to using by-id type names such as you have listed. I have already excluded all fake-serial labelled drives from having any smart calls made on them, and have just added the nvme exclusion on smart calls as well in issue #1320 on GitHub.

Back to your reported logs: they indicate an issue with (my) code dealing with the serial number as returned by the scan_disks method, which in release code currently lives in the src/rockstor/fs/btrfs.py file.

It would be really helpful if you could post the output of the following commands:-

lsblk -P -o NAME,MODEL,SERIAL,SIZE,TRAN,VENDOR,HCTL,TYPE,FSTYPE,LABEL,UUID

which is what scan_disks uses to find which disks are attached and some info about them.
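The -P flag makes lsblk emit one KEY="value" pair list per device, which is straightforward to parse into a dict. A minimal sketch of that parsing (not the actual scan_disks code):

```python
import re

def parse_lsblk_line(line):
    """Turn one line of `lsblk -P` output into a dict of field -> value."""
    return dict(re.findall(r'(\w+)="([^"]*)"', line))

sample = 'NAME="nvme0n1" MODEL="" SERIAL="0025385161500c23" TYPE="disk"'
row = parse_lsblk_line(sample)
print(row['NAME'], row['SERIAL'])
```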
And the second command would be:

udevadm info --name nvme0n1p1

and as a double check on how the other nvme partition shows up:

udevadm info --name nvme0n1p2

Unfortunately I haven't yet come across nvme devices. From the by-id type naming it looks like your device is actually showing up as 4 devices, with the base device (no -disk or -part suffix in by-id) and the first '-disk1' device sharing a name; this is definitely going to confuse Rockstor's scan_disks() and update_disk_state() methods.

I think the issue you are seeing revolves around there appearing to be 4 devices sharing the same serial number, i.e. S2GLNCAH102797F. Since we use a device's serial number as its unique tracking identifier, this can't stand. The scan_disks procedure accounts for this by reporting repeat serial numbers as fake-serial, so we retain uniqueness, but it then flags the exception via the big red warning and the various (and increasing) exclusions in function these devices are then given; without a unique hardware-level (rather than file-system-level) identifier we can't track them.
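The uniqueness rule can be sketched as follows; this is a hypothetical helper illustrating the behaviour described above, not scan_disks itself:

```python
import uuid

def dedupe_serials(devices):
    """devices: list of (name, serial) tuples; returns the same list with
    missing or repeated serials replaced by unique fake-serial- markers."""
    seen = set()
    result = []
    for name, serial in devices:
        if not serial or serial in seen:
            serial = 'fake-serial-' + str(uuid.uuid4())
        seen.add(serial)
        result.append((name, serial))
    return result

devs = dedupe_serials([('nvme0n1', 'S2GLNCAH102797F'),
                       ('nvme0n1p1', 'S2GLNCAH102797F')])
print(devs[1][1])  # a fake-serial-<uuid>, since the serial repeated
```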

Thanks for your help on this one; hopefully with the output of those commands we can see what's going wrong with reading these devices' serial numbers.

A screen grab of the Disks page may well be helpful also.

Hi Philip,

thanks for your detailed analysis.

Please note that the udev rule is a custom one:

cat /etc/udev/rules.d/61-persistent-storage.rules
KERNEL=="nvme*", ENV{ID_SERIAL}!="?*", IMPORT{program}="scsi_id --export --whitelisted -d $tempnode"

KERNEL=="nvme*", ENV{ID_SCSI_SERIAL}=="?*", SYMLINK+="disk/by-id/nvme-$env{ID_SCSI_SERIAL}"

KERNEL=="nvme*", ENV{DEVTYPE}=="partition", SYMLINK+="disk/by-id/nvme-$env{ID_SCSI_SERIAL}-disk%n"

If you also need the output without the custom udev rule, please let me know.

Here is the information requested:

lsblk -P -o NAME,MODEL,SERIAL,SIZE,TRAN,VENDOR,HCTL,TYPE,FSTYPE,LABEL,UUID

NAME="sda" MODEL="WDC WD20EFRX-68E" SERIAL="WD-WCC4M6DT9PEV" SIZE="1.8T" TRAN="sata" VENDOR="ATA     " HCTL="0:0:0:0" TYPE="disk" FSTYPE="btrfs" LABEL="data_pool" UUID="962464f8-1a72-4cc7-93f2-c52b711ed2f1"
NAME="sdb" MODEL="WDC WD20EFRX-68E" SERIAL="WD-WCC4M7ST1PS4" SIZE="1.8T" TRAN="sata" VENDOR="ATA     " HCTL="1:0:0:0" TYPE="disk" FSTYPE="btrfs" LABEL="data_pool" UUID="962464f8-1a72-4cc7-93f2-c52b711ed2f1"
NAME="sdc" MODEL="WDC WD20EFRX-68E" SERIAL="WD-WCC4M3CLUK0U" SIZE="1.8T" TRAN="sata" VENDOR="ATA     " HCTL="2:0:0:0" TYPE="disk" FSTYPE="btrfs" LABEL="data_pool" UUID="962464f8-1a72-4cc7-93f2-c52b711ed2f1"
NAME="sdd" MODEL="WDC WD20EFRX-68E" SERIAL="WD-WCC4M3VR0FVS" SIZE="1.8T" TRAN="sata" VENDOR="ATA     " HCTL="3:0:0:0" TYPE="disk" FSTYPE="btrfs" LABEL="data_pool" UUID="962464f8-1a72-4cc7-93f2-c52b711ed2f1"
NAME="nvme0n1" MODEL="" SERIAL="0025385161500c23" SIZE="238.5G" TRAN="" VENDOR="" HCTL="" TYPE="disk" FSTYPE="" LABEL="" UUID=""
NAME="nvme0n1p1" MODEL="" SERIAL="" SIZE="500M" TRAN="" VENDOR="" HCTL="" TYPE="part" FSTYPE="ext4" LABEL="" UUID="1c34b6b6-18a5-4955-be55-42bad6be1f0d"
NAME="nvme0n1p2" MODEL="" SERIAL="" SIZE="15.7G" TRAN="" VENDOR="" HCTL="" TYPE="part" FSTYPE="swap" LABEL="" UUID="1463272b-00a0-4051-b9ee-af6ad6def7e2"
NAME="nvme0n1p3" MODEL="" SERIAL="" SIZE="222.3G" TRAN="" VENDOR="" HCTL="" TYPE="part" FSTYPE="btrfs" LABEL="rockstor_rockstor" UUID="e6ed005f-d010-410c-b665-0c9bfd583dcb"

udevadm info --name nvme0n1p1

P: /devices/pci0000:00/0000:00:1d.0/0000:01:00.0/nvme/nvme0/nvme0n1/nvme0n1p1
N: nvme0n1p1
S: disk/by-id/nvme-S2GLNCAH102797F
S: disk/by-id/nvme-S2GLNCAH102797F-disk1
S: disk/by-uuid/1c34b6b6-18a5-4955-be55-42bad6be1f0d
E: DEVLINKS=/dev/disk/by-id/nvme-S2GLNCAH102797F /dev/disk/by-id/nvme-S2GLNCAH102797F-disk1 /dev/disk/by-uuid/1c34b6b6-18a5-4955-be55-42bad6be1f0d
E: DEVNAME=/dev/nvme0n1p1
E: DEVPATH=/devices/pci0000:00/0000:00:1d.0/0000:01:00.0/nvme/nvme0/nvme0n1/nvme0n1p1
E: DEVTYPE=partition
E: ID_FS_TYPE=ext4
E: ID_FS_USAGE=filesystem
E: ID_FS_UUID=1c34b6b6-18a5-4955-be55-42bad6be1f0d
E: ID_FS_UUID_ENC=1c34b6b6-18a5-4955-be55-42bad6be1f0d
E: ID_FS_VERSION=1.0
E: ID_MODEL=Samsung_SSD_950
E: ID_MODEL_ENC=Samsung\x20SSD\x20950\x20
E: ID_PART_ENTRY_DISK=259:0
E: ID_PART_ENTRY_FLAGS=0x80
E: ID_PART_ENTRY_NUMBER=1
E: ID_PART_ENTRY_OFFSET=2048
E: ID_PART_ENTRY_SCHEME=dos
E: ID_PART_ENTRY_SIZE=1024000
E: ID_PART_ENTRY_TYPE=0x83
E: ID_PART_TABLE_TYPE=dos
E: ID_REVISION=BXX7
E: ID_SCSI=1
E: ID_SCSI_SERIAL=S2GLNCAH102797F
E: ID_SERIAL=20025385161500c23
E: ID_SERIAL_SHORT=0025385161500c23
E: ID_TYPE=disk
E: ID_VENDOR=NVMe
E: ID_VENDOR_ENC=NVMe\x20\x20\x20\x20
E: MAJOR=259
E: MINOR=1
E: SUBSYSTEM=block
E: TAGS=:systemd:
E: USEC_INITIALIZED=85049

udevadm info --name nvme0n1p2

P: /devices/pci0000:00/0000:00:1d.0/0000:01:00.0/nvme/nvme0/nvme0n1/nvme0n1p2
N: nvme0n1p2
S: disk/by-id/nvme-S2GLNCAH102797F
S: disk/by-id/nvme-S2GLNCAH102797F-disk2
S: disk/by-uuid/1463272b-00a0-4051-b9ee-af6ad6def7e2
E: DEVLINKS=/dev/disk/by-id/nvme-S2GLNCAH102797F /dev/disk/by-id/nvme-S2GLNCAH102797F-disk2 /dev/disk/by-uuid/1463272b-00a0-4051-b9ee-af6ad6def7e2
E: DEVNAME=/dev/nvme0n1p2
E: DEVPATH=/devices/pci0000:00/0000:00:1d.0/0000:01:00.0/nvme/nvme0/nvme0n1/nvme0n1p2
E: DEVTYPE=partition
E: ID_FS_TYPE=swap
E: ID_FS_USAGE=other
E: ID_FS_UUID=1463272b-00a0-4051-b9ee-af6ad6def7e2
E: ID_FS_UUID_ENC=1463272b-00a0-4051-b9ee-af6ad6def7e2
E: ID_FS_VERSION=2
E: ID_MODEL=Samsung_SSD_950
E: ID_MODEL_ENC=Samsung\x20SSD\x20950\x20
E: ID_PART_ENTRY_DISK=259:0
E: ID_PART_ENTRY_NUMBER=2
E: ID_PART_ENTRY_OFFSET=1026048
E: ID_PART_ENTRY_SCHEME=dos
E: ID_PART_ENTRY_SIZE=32899072
E: ID_PART_ENTRY_TYPE=0x82
E: ID_PART_TABLE_TYPE=dos
E: ID_REVISION=BXX7
E: ID_SCSI=1
E: ID_SCSI_SERIAL=S2GLNCAH102797F
E: ID_SERIAL=20025385161500c23
E: ID_SERIAL_SHORT=0025385161500c23
E: ID_TYPE=disk
E: ID_VENDOR=NVMe
E: ID_VENDOR_ENC=NVMe\x20\x20\x20\x20
E: MAJOR=259
E: MINOR=2
E: SUBSYSTEM=block
E: TAGS=:systemd:
E: USEC_INITIALIZED=85190

Picture:

@snafu Thanks for the info, this is great.
One point I would make is that your custom udev rule threw me for a bit, as you have it creating '-disk#' names for the partitions when I think these by-id type names should more properly be '-part#'. That shouldn't make a difference at the moment but will upset things later on in Rockstor as we move to by-id type names. Do please correct me if you know otherwise; it's just that this is what I've seen of by-id partition naming conventions in /dev/disk/by-id.

I see 2 main concerns here.

1: The first, which I noticed initially but didn't highlight as it needed a little more info, is that no disk db entry should contain a None serial, and currently I don't see how this could have happened. Is this a really old Rockstor instance that has been upgraded over time? Maybe early on this was possible and we are not now dealing with it properly.
This relates to the exception you are seeing in the logs re len(do.serial): every do.serial entry should have either a legitimate serial or a fake-serial-uuid stamped in, and never a None. So I was looking for how this situation arose.
We could just add a clause that also removes these entries, as per the fake-serial- entries, but I need to think a little more about that and test here first.
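That failing check can be illustrated, and guarded against a None, in a few lines; this is only a sketch of the logic, not the Rockstor code itself:

```python
def serial_is_unusable(serial, serial_numbers_seen):
    """Mimic the disk.py check, but tolerate a None serial.

    The original test, `(do.serial in serial_numbers_seen) or
    (len(do.serial) == 48)`, raises TypeError when do.serial is None.
    """
    if serial is None:
        # A missing serial is as unusable as a duplicate or a 48-char fake.
        return True
    return serial in serial_numbers_seen or len(serial) == 48

print(serial_is_unusable(None, set()))  # True, instead of a TypeError
```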
To confirm this entry in your db could you do the following:

psql -U rocky storageadmin
Password for user rocky:

Enter rocky as the password; you should then have a prompt as follows:

storageadmin=#

At that prompt could you enter the following carefully (prompt included here):

storageadmin=# select name, size, serial from storageadmin_disk;

and paste the output here.

To quit from this psql command line interface, execute the \q command, i.e.:

storageadmin=# \q

2: The second concern is that we don't normally concern ourselves with anything other than the btrfs partition on the system drive with regard to what we store in the db and show on the disk page; hence the presence of the 'base' device, i.e. "nvme0n1", in addition to the special case "nvme0n1p3" (the only partition we list in disks) is unexpected. And given they have the same serial number, which is correct as they are the same device (a point I was thrown on by the -disk naming), they are seen as having duplicate serials. Normally scan_disks would rule the base disk out. It may be the naming that is throwing things, but a quick look shows it simply matches by the base name contained within the partition's name. It needs a closer look than I can give it now, unfortunately.

I'm afraid I can't look further into this just now, but the above info will help with rooting this issue out.
Also, it would be good to see the output of the previous commands without the custom udev rule, but only if you are OK with having to re-assert the rules afterwards, as they are sort of working for you now.

N.B. Please be aware that running the wrong command within the psql command line interface is pretty risky, so do take care there.

Thanks for your continued patience and help with this. I would like to get it sorted, or at least understood, but can only spend a little time on it at a go, I'm afraid.

We are narrowing this one down though.

Could you also now include the readout for the base device ie:-

udevadm info --name nvme0n1

Currently I suspect at least an issue with the udev rules, as it can't be correct that both a partition name and the base device name end up pointing to the same partition: both nvme-S2GLNCAH102797F-disk1 (which should be -part1) and nvme-S2GLNCAH102797F (presumably the base device) point to the partition device /dev/nvme0n1p1, when I would expect the latter to point to /dev/nvme0n1. I.e. the normal arrangement is more akin to:-

ata-KINGSTON_SMS200S330G_50026B724C085CD1 -> ../../sdc
ata-KINGSTON_SMS200S330G_50026B724C085CD1-part1 -> ../../sdc1
ata-KINGSTON_SMS200S330G_50026B724C085CD1-part2 -> ../../sdc2

I also suspect an issue with device name parsing but I will need to look more carefully when the time comes.

Thanks again.

@phillxnet

Yes, this is quite confusing; I've changed it to the -part suffix, sorry for that.

No, it is a new installation; I'm on the stable branch (3.8-13).

Here is the data from the table:

storageadmin=# select name, size, serial from storageadmin_disk;
   name    |    size    |                      serial                      
-----------+------------+--------------------------------------------------
 nvme0n1p3 |  233098444 | 
 nvme0n1   |  250085376 | fake-serial-269b21c9-6680-4c90-b532-c4af638cf971
 sda       | 1932735283 | WD-WCC4M6DT9PEV
 sdb       | 1932735283 | WD-WCC4M7ST1PS4
 sdc       | 1932735283 | WD-WCC4M3CLUK0U
 sdd       | 1932735283 | WD-WCC4M3VR0FVS
(6 rows)

Output of udevadm:

[root@rockstor ~]# udevadm info --name nvme0n1
P: /devices/pci0000:00/0000:00:1d.0/0000:01:00.0/nvme/nvme0/nvme0n1
N: nvme0n1
E: DEVNAME=/dev/nvme0n1
E: DEVPATH=/devices/pci0000:00/0000:00:1d.0/0000:01:00.0/nvme/nvme0/nvme0n1
E: DEVTYPE=disk
E: ID_PART_TABLE_TYPE=dos
E: MAJOR=259
E: MINOR=0
E: SUBSYSTEM=block
E: TAGS=:systemd:
E: USEC_INITIALIZED=4483

Here comes the output without the udev rule:

[root@rockstor ~]# lsblk -P -o NAME,MODEL,SERIAL,SIZE,TRAN,VENDOR,HCTL,TYPE,FSTYPE,LABEL,UUID
NAME="sda" MODEL="WDC WD20EFRX-68E" SERIAL="WD-WCC4M6DT9PEV" SIZE="1.8T" TRAN="sata" VENDOR="ATA     " HCTL="0:0:0:0" TYPE="disk" FSTYPE="btrfs" LABEL="data_pool" UUID="962464f8-1a72-4cc7-93f2-c52b711ed2f1"
NAME="sdb" MODEL="WDC WD20EFRX-68E" SERIAL="WD-WCC4M7ST1PS4" SIZE="1.8T" TRAN="sata" VENDOR="ATA     " HCTL="1:0:0:0" TYPE="disk" FSTYPE="btrfs" LABEL="data_pool" UUID="962464f8-1a72-4cc7-93f2-c52b711ed2f1"
NAME="sdc" MODEL="WDC WD20EFRX-68E" SERIAL="WD-WCC4M3CLUK0U" SIZE="1.8T" TRAN="sata" VENDOR="ATA     " HCTL="2:0:0:0" TYPE="disk" FSTYPE="btrfs" LABEL="data_pool" UUID="962464f8-1a72-4cc7-93f2-c52b711ed2f1"
NAME="sdd" MODEL="WDC WD20EFRX-68E" SERIAL="WD-WCC4M3VR0FVS" SIZE="1.8T" TRAN="sata" VENDOR="ATA     " HCTL="3:0:0:0" TYPE="disk" FSTYPE="btrfs" LABEL="data_pool" UUID="962464f8-1a72-4cc7-93f2-c52b711ed2f1"
NAME="nvme0n1" MODEL="" SERIAL="" SIZE="238.5G" TRAN="" VENDOR="" HCTL="" TYPE="disk" FSTYPE="" LABEL="" UUID=""
NAME="nvme0n1p1" MODEL="" SERIAL="" SIZE="500M" TRAN="" VENDOR="" HCTL="" TYPE="part" FSTYPE="ext4" LABEL="" UUID="1c34b6b6-18a5-4955-be55-42bad6be1f0d"
NAME="nvme0n1p2" MODEL="" SERIAL="" SIZE="15.7G" TRAN="" VENDOR="" HCTL="" TYPE="part" FSTYPE="swap" LABEL="" UUID="1463272b-00a0-4051-b9ee-af6ad6def7e2"
NAME="nvme0n1p3" MODEL="" SERIAL="" SIZE="222.3G" TRAN="" VENDOR="" HCTL="" TYPE="part" FSTYPE="btrfs" LABEL="rockstor_rockstor" UUID="e6ed005f-d010-410c-b665-0c9bfd583dcb"
[root@rockstor ~]# udevadm info --name nvme0n1p1                                             
P: /devices/pci0000:00/0000:00:1d.0/0000:01:00.0/nvme/nvme0/nvme0n1/nvme0n1p1
N: nvme0n1p1
S: disk/by-uuid/1c34b6b6-18a5-4955-be55-42bad6be1f0d
E: DEVLINKS=/dev/disk/by-uuid/1c34b6b6-18a5-4955-be55-42bad6be1f0d
E: DEVNAME=/dev/nvme0n1p1
E: DEVPATH=/devices/pci0000:00/0000:00:1d.0/0000:01:00.0/nvme/nvme0/nvme0n1/nvme0n1p1
E: DEVTYPE=partition
E: ID_FS_TYPE=ext4
E: ID_FS_USAGE=filesystem
E: ID_FS_UUID=1c34b6b6-18a5-4955-be55-42bad6be1f0d
E: ID_FS_UUID_ENC=1c34b6b6-18a5-4955-be55-42bad6be1f0d
E: ID_FS_VERSION=1.0
E: ID_PART_ENTRY_DISK=259:0
E: ID_PART_ENTRY_FLAGS=0x80
E: ID_PART_ENTRY_NUMBER=1
E: ID_PART_ENTRY_OFFSET=2048
E: ID_PART_ENTRY_SCHEME=dos
E: ID_PART_ENTRY_SIZE=1024000
E: ID_PART_ENTRY_TYPE=0x83
E: ID_PART_TABLE_TYPE=dos
E: MAJOR=259
E: MINOR=1
E: SUBSYSTEM=block
E: TAGS=:systemd:
E: USEC_INITIALIZED=4711
[root@rockstor ~]# udevadm info --name nvme0n1p2 
P: /devices/pci0000:00/0000:00:1d.0/0000:01:00.0/nvme/nvme0/nvme0n1/nvme0n1p2
N: nvme0n1p2
S: disk/by-uuid/1463272b-00a0-4051-b9ee-af6ad6def7e2
E: DEVLINKS=/dev/disk/by-uuid/1463272b-00a0-4051-b9ee-af6ad6def7e2
E: DEVNAME=/dev/nvme0n1p2
E: DEVPATH=/devices/pci0000:00/0000:00:1d.0/0000:01:00.0/nvme/nvme0/nvme0n1/nvme0n1p2
E: DEVTYPE=partition
E: ID_FS_TYPE=swap
E: ID_FS_USAGE=other
E: ID_FS_UUID=1463272b-00a0-4051-b9ee-af6ad6def7e2
E: ID_FS_UUID_ENC=1463272b-00a0-4051-b9ee-af6ad6def7e2
E: ID_FS_VERSION=2
E: ID_PART_ENTRY_DISK=259:0
E: ID_PART_ENTRY_NUMBER=2
E: ID_PART_ENTRY_OFFSET=1026048
E: ID_PART_ENTRY_SCHEME=dos
E: ID_PART_ENTRY_SIZE=32899072
E: ID_PART_ENTRY_TYPE=0x82
E: ID_PART_TABLE_TYPE=dos
E: MAJOR=259
E: MINOR=2
E: SUBSYSTEM=block
E: TAGS=:systemd:
E: USEC_INITIALIZED=4944

OK. Thank you very much for your help. If I should test scary things on my machine, please let me know. It wouldn't be a big problem, because I haven't migrated any data yet.

@snafu Sorry not to have gotten back sooner; just wanted to give you an update that, as of Rockstor 3.8-14.02 testing channel updates, the code changes previously referenced are now included.

Just to let you know, as this thread led to that change. Once our smartctl version is able to support nvme devices, the exclusion for nvme can be removed. So at least for now we shouldn't have any more smartctl --info probes cluttering up the logs on these types of devices.

There is also another thread on the forum where @f_l_a has reported the same smartctl version problem regarding nvme support and made a good, but unfortunately unsuccessful, attempt to install a newer version; they did however make some headway on retrieving serial numbers from these devices using nvme-specific tools:

The rogue smartctl --info calls are, I appreciate, the least of our problems, but just to let you know that your input has already helped here.

Unfortunately I haven't managed to look further into your problem specifically, which seems to revolve around a null entry in the serial field of your nvme0n1p3 db entry. This should not be possible and definitely points at a bug in the code somewhere, probably caused by not accounting properly for the nvme drive as it appeared, either before or after the udev rules. If you are game (and this is risky) you could try just deleting that entry in the db directly. It should be re-created on the next disk Rescan, but could leave your system in an unstable state. Also, as of the referenced update, Rockstor now uses by-id names, so it is particularly important to get those right via the udev rules; this should also hopefully help with seeing what is going on.

Thanks for providing such a wealth of info, it is bound to help with nvme support in the future.

@phillxnet

Thank you for your information.

I just want to inform you that I have tried to remove the entry from the database, but unfortunately the serial is empty again. I'm running the latest stable version.

If necessary I could add some logging lines to the serial number parsing, but unfortunately I can't provide this until the beginning of next week.

@snafu I have not forgotten about this thread and thanks for the update.
Did you ensure that the db entry containing the unexpected 'null' serial was removed entirely? It may well have been re-asserted seconds later. And what was the sql command you used to achieve this? I suspect we have a bug re nvme and your chosen udev rules.

By way of helping to explain the part a device's serial plays within Rockstor's device management, we now have a new technical manual entry within our wiki on this forum: Device management in Rockstor. This may help with our progress on your current findings.

Thanks for offering to help, and I await your feedback confirming successful removal of the entire disk entry that had the null serial assigned to it. The referenced doc should help with where to add logging to narrow down what's going wrong here.

From a serial perspective the latest stable doesn't differ from the testing channel, but there are major enhancements to disk name management in testing channel updates that may well help with logging. It also incorporates your original finding re nvme smartmontools support (or the current lack thereof in the Rockstor-available versions), so thanks for your efforts on this one. The addition is as indicated earlier in this thread, nvme-wise. If all these changes hold up they should appear in the next stable channel release anyway.

Thanks and no rush or worries.

@phillxnet Thanks again for the wiki entry. It is very helpful for understanding how Rockstor assigns unique ids. I hope the kernel will report serials for these devices more consistently in the future.

Nevertheless, I just found out that the udev rule is no longer working with 4.6:

[root@rockstor ~]# udevadm info --name=nvme0n1p3
P: /devices/pci0000:00/0000:00:1d.0/0000:01:00.0/nvme/nvme0/nvme0n1/nvme0n1p3
N: nvme0n1p3
S: disk/by-id/nvme--part3
...

I have to use ID_SERIAL instead of ID_SCSI_SERIAL. Sorry, I'm not versed in what has changed in 4.6, or why.

So my rule now looks like this:

SUBSYSTEMS=="nvme", ATTRS{serial}=="?*", PROGRAM="/usr/bin/echo $attr{serial}", ENV{ID_SERIAL}="%c"
KERNEL=="nvme[0-9]*", ENV{DEVTYPE}=="disk", ENV{ID_SERIAL}=="?*", SYMLINK+="disk/by-id/nvme-$env{ID_SERIAL}"
KERNEL=="nvme[0-9]*", ENV{DEVTYPE}=="partition", ENV{ID_SERIAL}=="?*", SYMLINK+="disk/by-id/nvme-$env{ID_SERIAL}-part%n"

Here is the data within the table now.
Beforehand I removed the nvme0n1p3 and nvme0n1 entries with: DELETE FROM storageadmin_disk WHERE name='nvme0n1' OR name='nvme0n1p3';

storageadmin=# select name, size, serial from storageadmin_disk;
   name    |    size    |     serial      
-----------+------------+-----------------
 sdb       | 1932735283 | WD-WCC4M7ST1PS4
 sdd       | 1932735283 | WD-WCC4M3VR0FVS
 nvme0n1p3 |  233098444 | 
 sda       | 1932735283 | WD-WCC4M6DT9PEV
 nvme0n1   |  250085376 | S2GLNCAH102797F
 sdc       | 1932735283 | WD-WCC4M3CLUK0U
(6 rows)

Sorry, I just checked the udev rule and saw the empty serial. It seems that this was the reason for the empty serial of my previous attempt.

If I'm correct, the only remaining concern here is that the base disk is not ruled out by scan_disks.

@snafu This is good stuff, and well done for developing the udev rules further. Apologies, but I'm a little unclear on exactly what the current state of things is; I'm assuming the udev rules are now as your last post indicated though.

Thanks for the db command clarification and outcome ie:

DELETE FROM storageadmin_disk WHERE name='nvme0n1' OR name='nvme0n1p3'

and I'm also assuming that the db output then no longer contained those entries, at least for a few seconds, but that they then re-appeared.

We are definitely making progress here though, as your last db output now shows that the nvme0n1 base device does at least have its serial, which is great.

But I'm a little confused by your "... I just checked the udev rule and saw the empty serial" bit; a by-id disk name of "nvme--part3" does suggest a missing serial between the '-' characters in the part3 naming rule, but I don't see it missing in the rules.

Anyway:

Not quite, as I'd rather not have any means by which a null serial enters the db. But this observation is definitely helpful, as it pointed me at a scan_disks helper that simply doesn't know about nvme type names: root_disk(). I'm going to have a quick look at that, but it would help if you could clarify the current status and also, if you would, post a current long listing of the /dev/disk/by-id directory, i.e.:

ls -la /dev/disk/by-id/

just as you did in your very first post in this thread. That way I can see more where we are at on the by-id names / serials with your current udev rules.

Also could you post the output of the following:

cat /proc/mounts | grep /dev/

This is what root_disk() looks at, and scan_disks() uses its returned result to identify the base device; so yes, I think we are definitely wearing this one down, if only gradually.
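To make the suspected failure mode concrete: a helper that derives the base device by stripping trailing digits from the root mount's device name works for /dev/sda3 but not for /dev/nvme0n1p3, where the whole 'p3' suffix must go. A sketch of an nvme-aware version, assuming only the shape of the fix (this is not the actual patch):

```python
import re

def root_disk(mounts_lines):
    """Return the base device name of the '/' mount from /proc/mounts lines."""
    for line in mounts_lines:
        fields = line.split()
        if len(fields) > 1 and fields[1] == '/' and fields[0].startswith('/dev/'):
            dev = fields[0][len('/dev/'):]
            # nvme partition names carry a 'p<N>' suffix on the base name.
            m = re.match(r'(nvme\d+n\d+)p\d+$', dev)
            if m:
                return m.group(1)
            # classic names (sda3, vda1, ...): drop the trailing digits.
            return re.sub(r'\d+$', '', dev)
    return None

mounts = ['/dev/nvme0n1p3 / btrfs rw,relatime,ssd 0 0']
print(root_disk(mounts))  # nvme0n1
```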

So yes, as you say: exactly.

Thanks for your assistance. I'll have a quick look at the name parsing in root_disk() to try and sort at least this part, but the above updated info would help with this. Cheers.

@phillxnet

Correct. They re-appeared after a couple of seconds.

Sorry for the confusion. This was the output with the "old" udev rule containing the ID_SCSI_SERIAL environment variable; on 4.6 this one seems to no longer be reported.

Outputs:

[root@rockstor ~]# ls -la /dev/disk/by-id/
total 0
drwxr-xr-x 2 root root 280 Jul 16 11:48 .
drwxr-xr-x 6 root root 120 Jul 16 11:48 ..
lrwxrwxrwx 1 root root   9 Jul 16 11:48 ata-WDC_WD20EFRX-68EUZN0_WD-WCC4M3CLUK0U -> ../../sdc
lrwxrwxrwx 1 root root   9 Jul 16 11:48 ata-WDC_WD20EFRX-68EUZN0_WD-WCC4M3VR0FVS -> ../../sdd
lrwxrwxrwx 1 root root   9 Jul 16 11:48 ata-WDC_WD20EFRX-68EUZN0_WD-WCC4M6DT9PEV -> ../../sda
lrwxrwxrwx 1 root root   9 Jul 16 11:48 ata-WDC_WD20EFRX-68EUZN0_WD-WCC4M7ST1PS4 -> ../../sdb
lrwxrwxrwx 1 root root  13 Jul 16 11:48 nvme-S2GLNCAH102797F -> ../../nvme0n1
lrwxrwxrwx 1 root root  15 Jul 16 11:48 nvme-S2GLNCAH102797F-part1 -> ../../nvme0n1p1
lrwxrwxrwx 1 root root  15 Jul 16 11:48 nvme-S2GLNCAH102797F-part2 -> ../../nvme0n1p2
lrwxrwxrwx 1 root root  15 Jul 16 11:48 nvme-S2GLNCAH102797F-part3 -> ../../nvme0n1p3
lrwxrwxrwx 1 root root   9 Jul 16 11:48 wwn-0x50014ee261f6918a -> ../../sdb
lrwxrwxrwx 1 root root   9 Jul 16 11:48 wwn-0x50014ee261f7d976 -> ../../sdd
lrwxrwxrwx 1 root root   9 Jul 16 11:48 wwn-0x50014ee2628ab5e0 -> ../../sda
lrwxrwxrwx 1 root root   9 Jul 16 11:48 wwn-0x50014ee2b7e073dc -> ../../sdc

--

[root@rockstor ~]# cat /proc/mounts | grep /dev/
tmpfs /dev/shm tmpfs rw,nosuid,nodev 0 0
devpts /dev/pts devpts rw,nosuid,noexec,relatime,gid=5,mode=620,ptmxmode=000 0 0
/dev/nvme0n1p3 / btrfs rw,relatime,ssd,space_cache,subvolid=257,subvol=/root 0 0
mqueue /dev/mqueue mqueue rw,relatime 0 0
hugetlbfs /dev/hugepages hugetlbfs rw,relatime 0 0
/dev/nvme0n1p3 /home btrfs rw,relatime,ssd,space_cache,subvolid=258,subvol=/home00 0 0
/dev/nvme0n1p1 /boot ext4 rw,relatime,data=ordered 0 0
/dev/sda /mnt2/data_pool btrfs rw,relatime,space_cache,subvolid=5,subvol=/ 0 0
/dev/nvme0n1p3 /mnt2/rockstor_rockstor btrfs rw,relatime,ssd,space_cache,subvolid=5,subvol=/ 0 0
/dev/sda /mnt2/home btrfs rw,relatime,space_cache,subvolid=258,subvol=/home 0 0
/dev/sda /mnt2/media btrfs rw,relatime,space_cache,subvolid=260,subvol=/media 0 0
/dev/sda /mnt2/lhooge btrfs rw,relatime,space_cache,subvolid=264,subvol=/lhooge 0 0
/dev/sda /mnt2/vmshare btrfs rw,relatime,space_cache,subvolid=259,subvol=/vmshare 0 0
/dev/nvme0n1p3 /mnt2/rock-ons-root btrfs rw,relatime,ssd,space_cache,subvolid=314,subvol=/rock-ons-root 0 0
/dev/nvme0n1p3 /mnt2/home00 btrfs rw,relatime,ssd,space_cache,subvolid=258,subvol=/home00 0 0
/dev/nvme0n1p3 /mnt2/root btrfs rw,relatime,ssd,space_cache,subvolid=257,subvol=/root 0 0
/dev/sda /mnt2/pictures btrfs rw,relatime,space_cache,subvolid=608,subvol=/pictures 0 0

@snafu Excellent. Cheers for the clarification and quick response.

OK so by-id names now looking dandy.

Thanks for the /proc/mounts.

Once I have it ready, are you game to trial a little bit of code changing? Probably only a few lines will need adding/changing by hand. It's all Python, so indentation (4 spaces a go) will be critical.

No worries otherwise; obviously this is not without risk, but you indicated being OK with that earlier, so I'm assuming this is still the case. I'd trial it here, but I have no nvme devices as of yet.

Yes, sure. You can send me a patch and I can test it on the machine.

@snafu OK, I've made some changes to src/rockstor/system/osi.py root_disk() and have associated them with the issue I opened as a result of this forum thread:

But given these changes were made against master, which has seen a fair bit of refactoring (in osi.py in particular) in comparison with the 3.8-14 stable release version, I'm not sure a patch is going to cut it.

However the actual changes to root_disk() are relatively trivial and are available to view side by side in:

You could first make a copy of your existing osi.py ie:

cp /opt/rockstor/src/rockstor/system/osi.py /root/

so that if something fails edit-wise you can just copy it back and apply another restart (see below).

Just ignore the line numbers, as yours (in stable) will be a lot different. Obviously you don't need to bother with the comments (lines beginning with '#'). If this works to successfully identify the base device of your root disk then it can be added to a future update, which should in time overwrite your changes and restore the said comments.

Take care with the slightly altered structure / indentation:- only 2 lines changed and 2 added that are not comments anyway. You can always delete the existing comment lines if they are a hassle. Also, Rockstor does pre-install the nano editor, if that helps.

After applying the changes you will need to do a:

systemctl restart rockstor

And watch the /opt/rockstor/var/log/rockstor.log of course.

Pretty sure you will also need to re-delete the db entries, as I'm hoping the null serial will no longer be generated; and as you say, we shouldn't have nvme0n1 in there at all, given it's represented by its part3 partition. At least, if scan_disks() is better behaved with these changes anyway.

No worries if this is too much messing around; let me know and I'll try harder on the patch front. Pretty simple changes though.

Thanks.

@snafu Well I tried a little harder on the patch front and found out that one can get a patch direct from GitHub via:

https://github.com/phillxnet/rockstor-core/commit/01f924651eaceead998765229dedf84b1b141377.patch

which is quite fancy. Not sure if it will apply OK though due to the number of changes as mentioned in my last post.

@snafu By the sound of it you are already familiar with the patch program, but given I just tested applying the patch I've linked to in this thread, I thought I might as well do a little wiki entry on the same. It's a bit rough but should help with instances where we (somewhat rarely so far) have a need for this procedure. I always have to re-discover the -p level and the -d option, so at least it's there now for future reference.

Do feel free to improve that document if you fancy.

@phillxnet

Unfortunately the patch differs a lot from the current stable channel; I do not have a root_disk method there. Maybe it would be best to test this patch directly on the testing channel?

@snafu Yes, that was my concern, and sorry, my mistake: root_disk() was moved from fs/btrfs to system/osi in the following commit:

So if you are game to hand modify your root_disk() in /opt/rockstor/src/rockstor/fs/btrfs.py that would be an option.

Give me a moment and I will just try something here; i.e. we might be able to change the target files in the patch, as the root_disk() function was unaltered in the move.

Thanks for persevering and your patience with my slip up.

I have successfully patched the method in btrfs.py.

After restarting the Rockstor instance and deleting the data from storageadmin_disk, only the root partition of Rockstor is listed:

storageadmin=# DELETE FROM storageadmin_disk WHERE name LIKE 'nvme%';
DELETE 2

--

storageadmin=# select name, size, serial from storageadmin_disk;
 name |    size    |     serial      
------+------------+-----------------
 sdb  | 1932735283 | WD-WCC4M7ST1PS4
 sdd  | 1932735283 | WD-WCC4M3VR0FVS
 sda  | 1932735283 | WD-WCC4M6DT9PEV
 sdc  | 1932735283 | WD-WCC4M3CLUK0U
(4 rows)

--

storageadmin=# select name, size, serial from storageadmin_disk;
   name    |    size    |     serial      
-----------+------------+-----------------
 sdd       | 1932735283 | WD-WCC4M3VR0FVS
 nvme0n1p3 |  233098444 | S2GLNCAH102797F
 sda       | 1932735283 | WD-WCC4M6DT9PEV
 sdb       | 1932735283 | WD-WCC4M7ST1PS4
 sdc       | 1932735283 | WD-WCC4M3CLUK0U
(5 rows)

So this patch seems to work. Thanks for your help in fixing this issue!

@snafu Thanks; I was still struggling with sed-ing the patch as presented and somehow ended up down a rabbit hole of sorts.

Thanks for testing this out and generalising your sql command. I'll do a pull request to hopefully get this into the next stable release.

Could you post a fresh screen grab of the Disks page as a quick reference / skim read element for others?