Hi All, Hopefully I can get some help.
A few weeks ago my Rockstor appliance failed overnight. I’ve been using it as a backup device, as well as for sharing files around our studio.
As far as I can figure out, there was a single drive failure in the 6-drive RAID 6 pool. There was also (possibly unrelated) a memory failure in the box.
The drive failure took the whole system offline, which seems unusual for a RAID 6; it should operate fine in a degraded state. Replacing the failed drive did nothing.
I then discovered the memory failure. I think the failed memory corrupted the OS/boot drive (USB stick).
I fixed the memory problem, confirmed it with a 24-hour memtest, and created a new boot USB, but it failed to remount the pool.
I tried the recovery advice in the Rockstor manual, but the boot drive became corrupted again.
I then purchased a new motherboard, CPU and ECC memory and tried again. The new boot drive works, but I'm unable to remount the storage drives.
I reconnected the hard drives and got the following when I tried to remount the pool using the web interface (Studio-Storage is the main drive pool):
Traceback (most recent call last):
File "/opt/rockstor/src/rockstor/storageadmin/views/disk.py", line 353, in _btrfs_disk_import
mount_root(po)
File "/opt/rockstor/src/rockstor/fs/btrfs.py", line 142, in mount_root
run_command(mnt_cmd)
File "/opt/rockstor/src/rockstor/system/osi.py", line 98, in run_command
raise CommandException(cmd, out, err, rc)
CommandException: Error running a command. cmd = ['/bin/mount', '/dev/disk/by-label/Studio-Storage', '/mnt2/Studio-Storage']. rc = 32. stdout = ['']. stderr = ['mount: wrong fs type, bad option, bad superblock on /dev/sdf,', ' missing codepage or helper program, or other error', '', ' In some cases useful info is found in syslog - try', ' dmesg | tail or so.', '']
I also had the following show up on the appliance boot screen
[ 22.666886] usbhid 1-1.3.1:.0: can't add hid device: -110
[ 114.119442] BTRFS: failed to read the system array on sda
[ 114.132233] BTRFS: open_ctree failed
btrfs fi show brings up the following:
Label: 'rockstor_rockstor' uuid: 81e92640-151f-4855-a75d-11b5da754478
Total devices 1 FS bytes used 1.45GiB
devid 1 size 12.50GiB used 4.04GiB path /dev/sde3
warning, device 1 is missing
checksum verify failed on 12642209480704 found 22541675 wanted C6B7ABC3
checksum verify failed on 12642209480704 found 2BFFA285 wanted 6FDAC69D
checksum verify failed on 12642209480704 found 2BFFA285 wanted 6FDAC69D
bytenr mismatch, want 12642209480704, have 116272015881216
Couldn’t read chunk tree
Label: 'Studio-Storage' uuid: 07c20ccd-ace8-4c9a-802e-dd582c6e91df
Total devices 6 FS bytes used 3.99TiB
devid 2 size 2.73TiB used 1.01TiB path /dev/sdc
devid 3 size 2.73TiB used 1.01TiB path /dev/sdb
devid 4 size 2.73TiB used 1.01TiB path /dev/sda
devid 5 size 2.73TiB used 1.01TiB path /dev/sdg
devid 6 size 2.73TiB used 1.01TiB path /dev/sdf
*** Some devices missing
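Given the "Some devices missing" warning, I'm guessing the next step is a degraded, read-only mount from the command line rather than the web UI. Something like this, if I have the options right (please correct me before I run it):

mount -o degraded,ro /dev/disk/by-label/Studio-Storage /mnt2/Studio-Storage
dmesg | tail

My understanding is that mounting with ro,degraded shouldn't write anything to the pool, so it should be safe to attempt, but I'd rather check here first.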
The last email I received from customer support, on 31 August, said that the drive names seem to have changed on reboot and interfered with the system pool. This was apparently a bug that has since been fixed.
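If the problem really is the device names shuffling around, would re-scanning for btrfs member devices and then mounting by UUID (taken from the btrfs fi show output above) get around it? This is just a guess on my part:

btrfs device scan
mount -o degraded,ro UUID=07c20ccd-ace8-4c9a-802e-dd582c6e91df /mnt2/Studio-Storage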
Anyway, I'm pretty new to Linux and to open source in general, and would appreciate any help I can get.
There are only a couple of things there that aren’t backed up elsewhere, but I’d like to know that this kind of thing can be recovered. There’s not much point having the newest, fanciest file system on the block if something like this loses all your data.
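If the pool turns out not to be mountable even in degraded mode, is btrfs restore the right tool for copying those few un-backed-up files off the raw devices onto another disk? From what I've read it can do a dry run first. Here /dev/sdc is one of the Studio-Storage members listed above, and /mnt/recovery is just a placeholder for an empty directory on a separate drive:

btrfs restore -D /dev/sdc /mnt/recovery
btrfs restore /dev/sdc /mnt/recovery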
Edit: I should note that I’m running Rockstor 3.8-14.11 and have a stable updates subscription.