Software RAID Rockstor OS Failing to Boot

Brief description of the problem

Software RAID Rockstor OS Failing to Boot
rescue boot option hangs at:
[…*.] A start job is running for dev-disk-by\x2duuid-d3bfedc8\x2d…46f\x2d906f\x2d4fb8d784a9af.device (3min 55s / no limit)

Detailed step by step instructions to reproduce the problem

Need help with a Linux software RAID1 mirrored Rockstor OS installation that is failing to boot after a disaster event.

Backstory: (I apologize in advance for the length of this post, but I suspect the extensive detail will be helpful)

In a search for a preferred small-business type 1 hypervisor (HV) deployment, we decided to pursue Citrix XenServer. The core principle is to develop a reasonably powerful single physical system that provides most, if not all, of the back office services our small business clients need. As dictated by client status, additional physical XenServers would be added for redundancy and/or expanded operations. Our first deployments have been quite successful. We develop most of our back office services on CentOS-based distros, a good fit for XenServer, which is itself built on CentOS.

As we continued our tests, we realized that storing data in virtual drives within the various virtual machines (VMs) was inefficient. Resizing virtual drives, transferring data in and out of virtual systems, and sharing data among virtual and physical machines on the network were just some of the issues we wanted to resolve, so we started to consider a NAS deployment. Ideally, the NAS would exist as a VM within the HV. We tested a few of the top-rated open source offerings, but really wanted to make Rockstor our solution of choice because it, too, is based on CentOS; CentOS 7 in fact, which is a perfect match for XenServer 7.

Our first disappointment with attempting to deploy Rockstor on XenServer was almost immediate: XenServer does not serialize its virtual disks, so Rockstor cannot use them. Ultimately, this drove us to investigate methods to pass the physical hard drives of the HV through to the Rockstor VM. This also seemed to address one of the inefficiencies we wanted to resolve, moving data between the physical and virtual worlds: if the data ultimately resided on a physical hard drive in its natural state, then said hard drive could be removed and connected to other systems for data transfers.

Our efforts to pass a single physical drive to the Rockstor VM proved to be a failure. This triggered the idea of passing an entire storage controller to the Rockstor VM instead. That turned out to be quite easy; however, booting the Rockstor VM initially threw us a curve ball. In short order, we resolved that by installing the /boot partition of the Rockstor VM on a XenServer virtual drive. This allowed the Rockstor VM to get the information necessary to subsequently connect to the physical controller passed through directly from XenServer and boot the Rockstor OS.

In the end, the installation was ideal. Rockstor performance was consistent with a bare metal installation; however, since it existed as a VM in XenServer, it bridged both the virtual and physical systems. Other VMs could connect over the XenServer internal network, which transferred data about 4x faster than the physical Gigabit Ethernet connections and was immune to failures of the physical network components. The data, in its natural form, actually resided on physical drives in the real world, making data transfers extremely efficient and flexible. Since the underlying filesystem is BTRFS, this also sets up the ability to deploy redundant backups onsite and/or offsite. One drawback does exist, however: capturing a snapshot of the VM in XenServer only protects the /boot partition of Rockstor, so a bare metal disaster recovery practice is required.

Current Configuration:

SuperMicro X8DTL-6F with dual Xeon X5650, 96GB RAM, Intel 82599ES 10 Gigabit SFP+ Adapter, AMD Radeon WX 3100 GPU
Storage: Hyper-V PCIe SSD Adapter (XenServer OS and local storage), (2) Intel SATA 120 GB SSD (Rockstor OS Linux RAID1 Mirror), (8) 3TB Enterprise SAS & (4) 6TB Enterprise SATA HDs

Rockstor VM: /boot (XenServer virtual disk /dev/xvda1); / (Linux software RAID1 mirror /dev/md127 on Intel SSDs as /dev/sda2 & /dev/sdb2 per Rockstor howto: http://rockstor.com/docs/mdraid-mirror/boot_drive_howto.html);
SWAP partition (Linux Software RAID0 /dev/md126 on Intel SSDs as /dev/sda1 & /dev/sdb1); Disk Pools: Primary (BTRFS 8-Drive RAID10 /dev/sdc through /dev/sdj); Secondary (Variable BTRFS usage of (4) 6TB SATA HDs)

This configuration had been operating extremely well for well over a year, until the following event:

After building a second Windows 10 VM on the XenServer with PCIe pass-through of the AMD Radeon WX 3100 GPU, the first Windows 10 VM (also configured for WX 3100 PCIe GPU pass-through) started crashing on boot. If we removed the PCIe GPU pass-through configuration, it booted normally. Only one of the VMs configured to use the WX 3100 via PCIe GPU pass-through is operating at any one time, and we had had success sharing the single WX 3100 among multiple VMs up to this issue. The WX 3100 drivers installed on the second Windows 10 VM were newer and offered the ability to switch between a “Professional” and a “Gaming” configuration. Since the first Windows 10 VM booted normally without the WX 3100 pass-through, we believed the newer driver’s “Gaming” install had changed the WX 3100’s hardware configuration in a manner that made the original Windows 10 VM crash when connected to the GPU.

We proceeded to remove the original AMD WX 3100 driver from the first Windows 10 VM and attempted to load the new one. The attempts to install the newer driver and duplicate the WX 3100 setup of the second Windows 10 VM were unsuccessful. After a few iterations of reverting to the pre-driver-install snapshot of the first Windows 10 VM, everything hung. The XenServer HV console was black and unresponsive to keyboard commands. We were unable to ping the XenServer management IP, so SSH was not available either. Not having any other means to attempt a safe reboot of the XenServer HV, we pressed the hardware reset button.

The XenServer rebooted successfully. Standard procedure is to start the Rockstor VM before any other VM so the NAS is available for those VMs that rely on its shares. The Rockstor VM boot was slow and not normal. It did eventually finish and the WebGUI was accessible; however, the LED of one of the physical HDs in the “Primary” pool was indicating a problem. All of the shares of the “Primary” disk pool were offline. Accessing the Rockstor VM console confirmed the “Primary” pool was throwing errors complaining about one of the drives.

Since the drives are reasonably new enterprise-class units, we wanted to power cycle the entire system to see if it would resume normal operation. We executed a proper shutdown/power-down of all VMs and the XenServer HV. When we brought the XenServer HV and Rockstor VM back up, the physical HD’s LED was still indicating a problem. We shut down/powered down the entire system again, removed the faulty drive, and replaced it with a 6TB SATA drive not being used in a physical Rockstor machine, with the goal of repairing the “Primary” disk pool with the newly inserted drive per the “Data Loss Prevention and Recovery” webpage (http://rockstor.com/docs/data_loss.html). We rebooted the XenServer HV; however, when we attempted to boot the Rockstor VM, it dropped to a dracut prompt. The OS Linux RAID was failing to mount, even on a “Recovery” boot.

Using the Rockstor 3.9.1 ISO and booting to “Rescue a Rockstor system”, consistent with the “Mirroring Rockstor OS using Linux Raid” webpage, we were able to gather the following information:

After executing:
sh-4.2# mdadm --assemble --scan
mdadm: /dev/md/NAS-1:127 has been started with 2 drives.
mdadm: /dev/md/NAS-1:126 has been started with 2 drives.

sh-4.2# mdadm --detail /dev/md12[67]

/dev/md126:
        Version : 1.2
  Creation Time : Mon Apr 17 18:27:28 2017
     Raid Level : raid0
     Array Size : 4198400 (4.00 GiB 4.30 GB)
   Raid Devices : 2
  Total Devices : 2
    Persistence : Superblock is persistent

    Update Time : Mon Apr 17 18:27:28 2017
          State : clean
 Active Devices : 2
Working Devices : 2
 Failed Devices : 0
  Spare Devices : 0

     Chunk Size : 512K

           Name : NAS-1:126
           UUID : 5e4e2932:43e7dedf:3d04c168:0df7203b
         Events : 0

    Number   Major   Minor   RaidDevice State
       0       8      129        0      active sync   /dev/sdi1
       1       8      145        1      active sync   /dev/sdj1

/dev/md127:
        Version : 1.2
  Creation Time : Mon Apr 17 18:27:51 2017
     Raid Level : raid1
     Array Size : 115052544 (109.72 GiB 117.81 GB)
  Used Dev Size : 115052544 (109.72 GiB 117.81 GB)
   Raid Devices : 2
  Total Devices : 2
    Persistence : Superblock is persistent

  Intent Bitmap : Internal

    Update Time : Tue Jan 30 09:01:06 2018
          State : clean
 Active Devices : 2
Working Devices : 2
 Failed Devices : 0
  Spare Devices : 0

           Name : NAS-1:127
           UUID : d5ee6105:00d07159:29fd91f4:a5de2e54
         Events : 3283

    Number   Major   Minor   RaidDevice State
       0       8      130        0      active sync   /dev/sdi2
       1       8      146        1      active sync   /dev/sdj2

sh-4.2# btrfs fi show

warning devid 4 not found already
warning devid 2 not found already
warning, device 4 is missing
Label: 'Primary'  uuid: 79cdb256-36ad-419a-935b-61d5f1c11f05
        Total devices 8 FS bytes used 6.89TiB
        devid    1 size 2.73TiB used 1.72TiB path /dev/sde
        devid    2 size 2.73TiB used 1.72TiB path /dev/sdf
        devid    3 size 2.73TiB used 1.72TiB path /dev/sdh
        devid    5 size 2.73TiB used 1.72TiB path /dev/sdd
        devid    6 size 2.73TiB used 1.72TiB path /dev/sda
        devid    7 size 2.73TiB used 1.72TiB path /dev/sdc
        devid    8 size 2.73TiB used 1.72TiB path /dev/sdb
        *** Some devices missing

Label: 'NAS-1'  uuid: d3bfedc8-8348-446f-906f-4fb8d784a9af
        Total devices 2 FS bytes used 7.53GiB
        devid    1 size 109.72GiB used 35.04GiB path /dev/md127
        *** Some devices missing

btrfs-progs v3.19.1

Attempting to mount the root filesystem (mounting by UUID gives the same result):

sh-4.2# mount /dev/md127 /mnt/md127

mount: wrong fs type, bad option, bad superblock on /dev/md127,
missing codepage or helper program, or other error

   In some cases useful info is found in syslog - try
   dmesg | tail or so.
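
For completeness, the usual follow-up checks from here would be something along these lines (a sketch only, run from the same rescue shell; we have not yet tried the read-only degraded mount):

sh-4.2# dmesg | tail                                  # check the kernel log for the btrfs error behind the failed mount
sh-4.2# btrfs check /dev/md127                        # read-only by default; do not use --repair at this stage
sh-4.2# mount -o ro,degraded /dev/md127 /mnt/md127    # 'NAS-1' reports a device missing above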

Here’s the GRUB information:

/boot/grub2/grub.cfg


menuentry 'Rockstor (4.10.6-1.el7.elrepo.x86_64) 3 (Core)' --class rockstor --class gnu-linux --class gnu --class os --unrestricted $menuentry_id_option 'gnulinux-4.8.7-1.el7.elrepo.x86_64-advanced-d3bfedc8-8348-446f-906f-4fb8d784a9af' {
load_video
set gfxpayload=keep
insmod gzio
insmod part_msdos
insmod ext2
set root='hd0,msdos1'
if [ x$feature_platform_search_hint = xy ]; then
search --no-floppy --fs-uuid --set=root --hint='hd0,msdos1' c089a0e5-faa1-42db-bb82-d5b907391687
else
search --no-floppy --fs-uuid --set=root c089a0e5-faa1-42db-bb82-d5b907391687
fi
linux16 /vmlinuz-4.10.6-1.el7.elrepo.x86_64 root=UUID=d3bfedc8-8348-446f-906f-4fb8d784a9af ro rootflags=subvol=root rd.md.uuid=d5ee6105:00d07159:29fd91f4:a5de2e54 rd.md.uuid=5e4e2932:43e7dedf:3d04c168:0df7203b rhgb quiet LANG=en$
initrd16 /initramfs-4.10.6-1.el7.elrepo.x86_64.img
}
menuentry 'Rockstor (4.8.7-1.el7.elrepo.x86_64) 3 (Core)' --class rockstor --class gnu-linux --class gnu --class os --unrestricted $menuentry_id_option 'gnulinux-4.8.7-1.el7.elrepo.x86_64-advanced-d3bfedc8-8348-446f-906f-4fb8d784a9af' {
load_video
set gfxpayload=keep
insmod gzio
insmod part_msdos
insmod ext2
set root='hd0,msdos1'
if [ x$feature_platform_search_hint = xy ]; then
search --no-floppy --fs-uuid --set=root --hint='hd0,msdos1' c089a0e5-faa1-42db-bb82-d5b907391687
else
search --no-floppy --fs-uuid --set=root c089a0e5-faa1-42db-bb82-d5b907391687
fi
linux16 /vmlinuz-4.8.7-1.el7.elrepo.x86_64 root=UUID=d3bfedc8-8348-446f-906f-4fb8d784a9af ro rootflags=subvol=root rd.md.uuid=d5ee6105:00d07159:29fd91f4:a5de2e54 rd.md.uuid=5e4e2932:43e7dedf:3d04c168:0df7203b rhgb quiet LANG=en_$
initrd16 /initramfs-4.8.7-1.el7.elrepo.x86_64.img
}
menuentry 'Rockstor (0-rescue-b34d6bf30b1a4111bd4ec91f7d242246) 3 (Core)' --class rockstor --class gnu-linux --class gnu --class os --unrestricted $menuentry_id_option 'gnulinux-0-rescue-b34d6bf30b1a4111bd4ec91f7d242246-advanced-d3bfedc$
load_video
insmod gzio
insmod part_msdos
insmod ext2
set root='hd0,msdos1'
if [ x$feature_platform_search_hint = xy ]; then
search --no-floppy --fs-uuid --set=root --hint='hd0,msdos1' c089a0e5-faa1-42db-bb82-d5b907391687
else
search --no-floppy --fs-uuid --set=root c089a0e5-faa1-42db-bb82-d5b907391687
fi
linux16 /vmlinuz-0-rescue-b34d6bf30b1a4111bd4ec91f7d242246 root=UUID=d3bfedc8-8348-446f-906f-4fb8d784a9af ro rootflags=subvol=root rd.md.uuid=d5ee6105:00d07159:29fd91f4:a5de2e54 rd.md.uuid=5e4e2932:43e7dedf:3d04c168:0df7203b rhg$
initrd16 /initramfs-0-rescue-b34d6bf30b1a4111bd4ec91f7d242246.img

Booting to the “rescue” menu entry hangs at the following:

[…*.] A start job is running for dev-disk-by\x2duuid-d3bfedc8\x2d…46f\x2d906f\x2d4fb8d784a9af.device (3min 55s / no limit)

and the elapsed time counts up indefinitely.

So the question is, how do I get the Rockstor OS (/dev/md127) back online?

After that, I expect to follow the “Data Loss Prevention and Recovery in RAID10 Pools” procedure by executing:

btrfs device delete missing /mnt2/Primary

followed by (the spare 6TB SATA drive is currently /dev/sdg):

btrfs device add /dev/sdg /mnt2/Primary

Any feedback would be greatly appreciated! Unfortunately, because we’ve enjoyed flawless functionality of our Rockstor configuration, we haven’t been keeping our backup images of the Rockstor OS RAID up to date. While we have a dated image that could potentially be used to restore the Rockstor OS, we would lose a fair amount of changes made since its creation. We did use btrfs restore to copy out the root partition, so we’re hoping that may be an option if we have to rebuild the Rockstor OS. We’ve also imaged the Rockstor OS drives (/dev/sda & /dev/sdb) as they currently exist, in the hope that we can do no more harm.
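For anyone wanting to take the same precautions, the two safety copies mentioned above can be made with commands along these lines (a sketch only; the destination paths are placeholders, and the device names are per our original layout):

btrfs restore -v /dev/md127 /path/to/backup/rockstor-root    # copy files out of the unmountable root filesystem
dd if=/dev/sda of=/path/to/backup/rockstor-os-sda.img bs=4M  # raw image of each OS SSD
dd if=/dev/sdb of=/path/to/backup/rockstor-os-sdb.img bs=4M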

UPDATE

Editing the rescue menu option of the original installation with rd.break=pre-mount, I was able to get a command prompt and begin reviewing the state at the point the boot fails. What seems significant at first glance is that mdadm --detail /dev/md127 reports a raid0 with the name NAS-1:126 and a UUID of 5e4e2932:43e7dedf:3d04c168:0df7203b using /dev/sdi1 and /dev/sdj1, which is consistent with /dev/md126, the SWAP partition. Attempting a temporary mount throws an unknown filesystem type ‘swap’ error, also consistent with /dev/md126. When we originally deployed the software RAID1 per http://rockstor.com/docs/mdraid-mirror/boot_drive_howto.html, through multiple identical installs and testing to ensure the result was as desired, the md numbering of 126 and 127 seemed to be inconsistent from one attempt to the next, even though the procedure was identical and concerted efforts were made to control which md device would be root and which would be SWAP. At this point, our operating hypothesis is that the md device numbering has been inverted as a result of the disaster event and the attempt to recover. We will investigate a means to correct this and, absent any advice to the contrary, will attempt to make said correction.
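For reference, the usual way to pin md device names so that dracut assembles them consistently is to record the arrays in /etc/mdadm.conf and rebuild the initramfs. A sketch only, not yet verified on this system, and it assumes we can chroot into the real root filesystem once it is identified:

mdadm --detail --scan >> /etc/mdadm.conf                 # capture ARRAY lines with names and UUIDs
# edit /etc/mdadm.conf so the root mirror is /dev/md127 and the swap stripe is /dev/md126
dracut -f /boot/initramfs-$(uname -r).img $(uname -r)    # rebuild the initramfs (adjust the kernel version if needed)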

Based on our continued research into running the root filesystem on Linux software RAID and recovering our system, and since we only used the Rockstor OS RAID for Rockstor and Rock-ons, we’ve simply rebuilt the Rockstor OS from scratch. This has caused us to re-evaluate the value of placing the OS on a software RAID, as the information we found indicates Linux will refuse to boot from a degraded software RAID device.

We originally pursued the software RAID (with BTRFS on top) because it supported SSD TRIM and other optimizations, versus the Intel RST motherboard RAID controller, which would have provided a way to boot in a degraded mode. We redeployed the software RAID because there are 4 additional SATA drives on the OS controller, which is configured in “initiator target” (IT) mode to provide the best connection for the BTRFS system.

We successfully used the btrfs restore command to extract all the data out of the degraded array while it was offline. Upon reactivating our Rockstor appliance and upgrading from the 3.9.1 ISO install, which involves a Linux kernel upgrade, we pursued bringing the array back online.

We started by mounting the pool in degraded mode and adding a temporary physical drive in the slot where the original had failed. Both of those commands were successful. Then we executed the btrfs device delete missing command. It has yet to exit, and it has been a few hours. We have noticed that btrfs fi show indicates the used space on the new drive is growing at about 3GiB/hour, so we’re hoping that’s an indication it is working on putting the array back in order.
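For anyone following along, that sequence boils down to roughly the following (a sketch using the pool label and mount point from above; device names may have changed since the earlier listing, and /dev/sdg is only the name the replacement drive had at the time):

mount -o degraded /dev/sde /mnt2/Primary      # any present member of the ‘Primary’ pool will do
btrfs device add /dev/sdg /mnt2/Primary       # add the replacement drive
btrfs device delete missing /mnt2/Primary     # rebuild onto it and drop the missing device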

We’ve seen a few others mention that the “delete missing” process seems to stall, but the issue was left unresolved. btrfs device usage shows the new drive’s RAID10 Data at 5GiB and slowly climbing over time, but it does not show the RAID10 Metadata or System line items as it does for the other drives. The Rockstor VM performance monitoring shows a single CPU of the 4 allocated pegged at 100%, but a second SSH console is still completely responsive.
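To keep an eye on the rebuild we are simply re-running those two commands; something like the following (a sketch, using our pool’s mount point) does the same thing automatically:

watch -n 60 'btrfs fi show /mnt2/Primary; btrfs device usage /mnt2/Primary'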

At this point, we’ll wait and see, but if anyone has some useful input to the rebuilding process, by all means pipe in! Thanks.

The speed of the array rebuild continued at a snail’s pace of a few GiB per hour. Given the array is 12TiB, this was a big problem. Hats off to “Grizzly” in the forum, as the recommendation to disable quotas did the trick. The quota change took about a minute to complete, and now the CPU usage of the 4-vCPU VM has returned to normal operation, sharing a total CPU load of about 25% equally between all processors, accompanied by disk read and write stats consistent with the onboard LSI SAS2008 controller. At the new rate, it should take about 5 hours to finish the “delete missing” command.
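For anyone else hitting the same wall, disabling quotas on the pool from the command line is simply the following (using our pool’s mount point; quotas can be re-enabled later with btrfs quota enable):

btrfs quota disable /mnt2/Primary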

Yes, I think disabling quotas becomes a necessity at scale. We need to ensure Rockstor functions just as well without them. I believe some good work has been achieved lately in removing dependencies on quotas.