Solved: Clone boot drive from SSD to NVME dracut issues

I originally installed Rockstor a few months ago on a 2.5” SSD I had from another project. ASRock H270M-ITX/ac motherboard.

To free up a SATA port and use that SSD in another project, I bought an Intel M.2 PCIe 3.0 x4 SSD - SSDPEKKW128G7X1. I then used Clonezilla, booted off a USB stick, to copy my old SSD across to it (basic or advanced mode made no difference).

When I rebooted to the new SSD though, it sat for ages during the initial boot, then gave a heap of dracut delay errors before dropping to a dracut prompt. The error said it could not find the root UUID. Looking in the /dev directory, the drive UUID was missing, and so were the /dev/nvme devices that should exist for a PCIe drive.

If I booted the old SSD (into Rockstor) it worked fine, and I got the /dev/nvme entries properly. So the fully booted system knew about the drive, but the dracut/grub stage did not.

After much hunting, I found the following solution.

I went back to the original SSD, booted it, went to a prompt and ran the following as root:

echo 'add_drivers+=" nvme "' > /etc/dracut.conf.d/nvme.conf

dracut -f

I got this info from:
https://bugzilla.redhat.com/show_bug.cgi?id=910734

I then cloned this drive again to the PCIe SSD, and everything works now.

The problem was that the dracut initramfs (the ramdrive image used in early boot) didn’t include the nvme driver. Even modprobing didn’t do anything, as there was no driver to load, so I had to add the nvme driver to the image so it could find the drive during boot. The first command tells dracut to include the driver; the second rebuilds the image, which will then include it. It then just worked.
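To check that the rebuilt image really contains the driver before recloning, something like the sketch below may help. It assumes lsinitrd (shipped with dracut) is available and that the module file is named nvme.ko, possibly with a compression suffix. The name has_nvme_module is a hypothetical helper, not part of dracut.

```shell
#!/bin/sh
# Hypothetical helper (not part of dracut): reads an initramfs file listing
# on stdin and succeeds if a line ends in the nvme module file, i.e.
# .../nvme.ko or a compressed variant such as .../nvme.ko.xz.
has_nvme_module() {
    grep -Eq '(^|/)nvme\.ko(\.[a-z0-9]+)?$'
}

# Intended use on the real system, as root (assumes lsinitrd with no
# arguments lists the image for the running kernel):
#   lsinitrd | has_nvme_module && echo "nvme present" || echo "nvme missing"
```

If this reports the module as missing after running dracut -f, the add_drivers line was probably not picked up, and the clone would likely fail in the same way.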

There is possibly a cleaner way to do it, but this worked for me. (I contemplated using a rescue disk to get it to boot and then fixing it from there to avoid recloning, but the clone only took 20 minutes.) The PCIe SSD was fitted when I ran the above commands; I’m not sure whether it would make a difference if it wasn’t.

I spent a while looking for a solution to this one, so thought I would share the solution. I saw some other posts on Rockstor working with this sort of drive, just confirming it does for me anyway.

I’m not sure whether a clean install direct to the PCIe SSD would have just worked, since the installer would ‘know’ that it needed the nvme driver during setup, or whether something should be changed upstream to include this driver.

(Originally, during install, I had dracut problems when booting the install USB; I needed to set the BIOS to boot as legacy rather than UEFI. That all seems unrelated, except that once installed it still only boots as legacy, not as UEFI. No BIOS setting helped with the problem above.)

Hope this helps someone.

@Ivan Nice find; and a belated welcome to the Rockstor community.

Thanks for sharing your solution here.

On a related note, we had a bit of a job getting nvme devices’ partitions recognised, as they have a rather unique naming scheme; see @f_l_a and my discussion on this in the following forum thread:

which in turn also links to:

where @snafu developed the needed (at the time) udev rules to surface the required unique serial number and helped with diagnosing the naming issues our code originally failed to understand:

https://github.com/rockstor/rockstor-core/issues/1397
and its PR:
https://github.com/rockstor/rockstor-core/pull/1399
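As an illustration of the naming difference only (this is not Rockstor’s code, and base_device is a hypothetical helper name): SATA/SCSI partitions append a number directly to the device name (sda1), while nvme partitions insert a "p" first (nvme0n1p1), and the whole device name itself ends in a digit. A minimal sketch of stripping the partition suffix under those assumptions:

```shell
#!/bin/sh
# Hypothetical helper, for illustration only: print the whole-disk name
# for a given device or partition name.
base_device() {
    case "$1" in
        # nvme partition, e.g. nvme0n1p2 -> nvme0n1 (drop the pN suffix)
        nvme*n*p*) printf '%s\n' "${1%p[0-9]*}" ;;
        # nvme whole device, e.g. nvme0n1: trailing digit is part of the name
        nvme*)     printf '%s\n' "$1" ;;
        # sdX style, e.g. sda1 -> sda (drop trailing digits)
        *)         printf '%s\n' "$1" | sed 's/[0-9]*$//' ;;
    esac
}
```

For example, base_device nvme0n1p1 gives nvme0n1 and base_device sda1 gives sda, whereas naively stripping trailing digits from nvme0n1 would wrongly give nvme0n.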

It may be that udev has since been updated, by upstream CentOS, so that custom udev rules are no longer necessary. What is your experience with Rockstor’s current handling of this, i.e. do you get any serial number warning on your Disks page?

Thanks for your contribution here, most valuable, especially as these (strangely dev named) devices become more popular.

I’d read a lot of those, anything nvme, but that gave me some hope (and only read them after buying the drive).

The disk list looks fine.

The new disk is at the bottom, displays fine - all the other bits must be coming together!

There don’t seem to be any errors in the main Rockstor log related to this.

If there is anything else you want me to check, let me know.

Ivan

@Ivan Thanks for the feedback.

Yes, good news then; we must have had a relevant udev upgrade from upstream. Cool.

It might be nice to just make sure that all is still OK once you remove the now detached device with serial starting S1SMNWA…, which is presumably your prior 2.5” SSD root device from before the Clonezilla transfer. This can be done via the bin icon against the “detached-(random-uuid)” named device.
Cheers.

If you are at all nervous about this then no worries; it’s just an outstanding loose end from the root device switch, and it would be nice to confirm it works as intended before you get too far along with your deployment.

Thanks for helping to proof these functions / features by the way.

I’ve removed the old drive now, no problems that I can see.

@Ivan Thanks for the confirmation.