So my current suspicion is that this is my fault, sorry. I recently found and fixed a rather long-standing and elusive bug affecting Rockstor systems with 27 or more drives where the system drive was also named sda (plus a few other caveats). During the development of that fix there were a number of inadvertent side effects on disk recognition. For each one I developed a test to ensure the ‘fixed’ code (at the drive-recognition level) functioned as expected, and as it had done previously. The aim, of course, was to introduce no regressions and only add the fix. In the case of an NVMe system drive I now suspect I missed the mark: there is currently no testing, regression or otherwise, for this arrangement, which did work at some point, as evidenced by your report and others’. The change I suspect was for issue:
and was fixed by the changes in pull request:
which was released in Rockstor stable channel version 3.9.2-31 (20 days ago as of writing).
If you would indulge me a little more, I would like to ask that you provide what I hope will be enough info for me to create a test that reproduces your issue: a missing/detached NVMe system disk, i.e. it was there and, after an update, is no longer recognised as attached. This should help to avoid the same regression going forward and should also help with fixing whatever went wrong in the first place.
The procedure required on your end is a little cumbersome, but given your supplied workaround it should be trivial for you. Essentially I require the same procedure and info as I requested from @kingwavy, referenced in the indicated issue, which in turn grew from the following forum thread:
More specifically, in my 14th May post in that thread:
Repeating the request in this thread for ease, slightly modified for this instance:
Could you also post the output of the following commands:
and when the above command’s output has an empty serial entry such as:
SERIAL=""
we fall back to udev via get_disk_serial(), located here:
which in turn parses the output of the following command. If you could execute this command against one of the drives showing the above empty serial (if any), it may help track this bug down:
udevadm info --name=devname-here
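For example, assuming the NVMe system drive shows up as /dev/nvme0n1 on your machine (that device name is only an assumption; substitute whichever device showed the empty serial above), something along these lines should surface the serial-related udev properties:

udevadm info --name=/dev/nvme0n1

udevadm info --name=/dev/nvme0n1 | grep -i serial

The full, unfiltered output is the most useful, but the grep should make it easy to spot whether udev is reporting any serial properties for the device at all.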
Also, if you could remove the “#” and the following space in front of the following line in your installed version, located here:
/opt/rockstor/src/rockstor/system/osi.py
and then enable debug logging via:
/opt/rockstor/bin/debug-mode on
Then either reboot or restart the rockstor service via:
systemctl restart rockstor
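If it helps, you can confirm the service came back up cleanly after the restart with a standard systemd status check (nothing Rockstor specific, just a generic sanity check):

systemctl status rockstor --no-pager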
We should then be able to see in your logs what scan_disks() is passing to _update_disk_state(), and confirm (or otherwise) my suspicion that the NVMe disk is simply not being parsed correctly by scan_disks(), or at least narrow down where the problem originates. Look in the main Rockstor log for the debug output, either via the UI component in System - Log Manager, or in:
/opt/rockstor/var/log/rockstor.log
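If that log is busy, a rough filter along the following lines may help pull out the relevant entries; the search terms here are just guesses at what the debug lines will mention, so the full section of the log covering the restart is still preferred:

grep -iE 'scan_disks|serial' /opt/rockstor/var/log/rockstor.log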
So, given this is suspected to be a recent regression, it would be good to get it sorted as soon as possible and to add tests so that the same thing doesn’t happen again.
Sorry to have to ask all of this, but this output would be invaluable in assuring us that your particular instance is catered for, as it is currently the only reproducer we have had reported.
Once we have narrowed down what is causing this bug I will open an issue with the relevant details, and a fix can then be logged against that issue.
Thanks again for your help with this one and for helping to support Rockstor development via a stable subscription.