I broke it...Split RAID configuration

Hope someone might be able to help me fix this. I'm new to Rockstor and comfortable on the CLI, but I haven't played with Linux in 10 years. I'm probably the cause of this, and there is no critical data on my NAS yet, so I could just reinstall, but I wanted to share this so I learn more about btrfs, and so that if it's a bug the community becomes aware of it.
Built and installed Rockstor on 5x4TB drives set up as RAID 6 (I know, bleeding edge), created shares, added data, and everything looked good. At one point the first (hot-swappable) drive in the array, /dev/sdb, was inadvertently un-docked and re-docked. I tried installing a couple of Rock-ons both before and after this, all of which failed, but the ones afterwards failed with a file system read-only error. I thought maybe the RAID 6 choice was causing the problems, so I tried converting to RAID 1; the balance failed multiple times. After lots of forum reading and trying several ideas I have been unable to mount the pool rw. This looks like it's part of my problem:
[root@jonesville ~]# btrfs fi df /mnt2/Jones-Pool
Data, RAID6: total=21.00GiB, used=20.58GiB
System, RAID6: total=9.56MiB, used=16.00KiB
Metadata, RAID1: total=8.00GiB, used=1.44MiB
Metadata, RAID6: total=4.03GiB, used=1.13GiB
GlobalReserve, single: total=400.00MiB, used=0.00B
I only have a couple of megs of data loaded.
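The mixed Metadata RAID1/RAID6 lines above are presumably left over from the failed conversion; as I understand it, a RAID 1 conversion boils down to a balance with convert filters, something like:

btrfs balance start -dconvert=raid1 -mconvert=raid1 /mnt2/Jones-Pool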
All the shares have disappeared from the Rockstor web UI.
This might be helpful as well:
[root@jonesville ~]# btrfs fi show
Label: 'rockstor_rockstor' uuid: fc1f40be-7805-4d82-8c17-28907b119917
Total devices 1 FS bytes used 22.47GiB
devid 1 size 24.67GiB used 24.27GiB path /dev/sda3
Label: 'Jones-Pool' uuid: d6e89bda-6953-4eee-9217-79dae2958b40
Total devices 5 FS bytes used 21.71GiB
devid 1 size 3.64TiB used 11.35GiB path /dev/sdb
devid 2 size 3.64TiB used 11.35GiB path /dev/sdc
devid 3 size 3.64TiB used 12.35GiB path /dev/sdd
devid 4 size 3.64TiB used 11.35GiB path /dev/sde
devid 5 size 3.64TiB used 11.35GiB path /dev/sdf
Followed several rescue suggestions from btrfs forums with no luck.
Open to suggestions, thanks for the help.
Del

How exactly did you try to mount it: mount -o degraded,rw or mount -o degraded,rw,recovery?
Also, how healthy was the pool before you tried to change the RAID level?
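To be explicit, I mean the full invocations against one of the member devices, something like:

mount -o degraded,rw /dev/sdb /mnt2/Jones-Pool
mount -o degraded,rw,recovery /dev/sdb /mnt2/Jones-Pool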

Don't remember for sure, but I think it was the first. Just tried both again with the following results.
[root@jonesville ~]# mount -o degraded -rw recovery /mnt2/Jones-Pool
mount: special device recovery does not exist
[root@jonesville ~]# mount -o degraded -rw /mnt2/Jones-Pool
[168602.000678] BTRFS error (device sdb): Remounting read-write after error is not allowed
mount: wrong fs type, bad option, bad superblock on /dev/sdb,
missing codepage or helper program, or other error

This is not the first time I've seen the bad superblock error, but so far I've been unable to find a method to fix it.
Thanks for the help!

I wonder if the RAID didn’t get messed up with the unmount/mount and then the attempt to change RAID levels.

Hmm, reboot and then try these (note the mount options need to be comma-separated; in your attempt above, mount read "recovery" as a device name because of the space, which is where the "special device recovery does not exist" error came from):

btrfs device scan (force rescan of disks)
then
mount -o degraded,rw,recovery /dev/sdb /mnt2/Jones-Pool
or
mount -o degraded,rw /dev/sdb /mnt2/Jones-Pool

If that doesn't work, hopefully it'll spit out errors on the same physical disk every time, and you can try to mount it without that disk present.
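For example, if sdb turns out to be the culprit, then with it physically disconnected a degraded mount via one of the remaining members would look something like (device name assumed):

mount -o degraded,rw /dev/sdc /mnt2/Jones-Pool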

Based on what I've seen before, I think /dev/sdb is the problem drive. Here's the output from your suggestions.

[root@jonesville ~]# btrfs device scan
Scanning for Btrfs filesystems
(Didn't expect so little output; Jones-Pool still shows in the Rockstor UI.)

[root@jonesville ~]# mount -o degraded,rw,recovery /dev/sdb /mnt2/Jones-Pool
mount: wrong fs type, bad option, bad superblock on /dev/sdb,
missing codepage or helper program, or other error
In some cases useful info is found in syslog - try
dmesg | tail or so.

[root@jonesville ~]# mount -o degraded,rw /dev/sdb /mnt2/Jones-Pool
mount: wrong fs type, bad option, bad superblock on /dev/sdb,
missing codepage or helper program, or other error
In some cases useful info is found in syslog - try
dmesg | tail or so.

Here is the dmesg | tail output:
[ 578.070264] systemd[1]: systemd-journald.service: got WATCHDOG=1
[ 600.961805] systemd[1]: Got notification message for unit systemd-logind.service
[ 600.961824] systemd[1]: systemd-logind.service: Got notification message from PID 2478 (WATCHDOG=1)
[ 600.961835] systemd[1]: systemd-logind.service: got WATCHDOG=1
[ 600.962561] systemd[1]: Got notification message for unit systemd-journald.service
[ 600.962577] systemd[1]: systemd-journald.service: Got notification message from PID 1371 (WATCHDOG=1)
[ 600.962588] systemd[1]: systemd-journald.service: got WATCHDOG=1
[ 635.960744] systemd[1]: Got notification message for unit systemd-journald.service
[ 635.960756] systemd[1]: systemd-journald.service: Got notification message from PID 1371 (WATCHDOG=1)
[ 635.960764] systemd[1]: systemd-journald.service: got WATCHDOG=1
@Spectre694 Thanks again for the help.

Tried a few more btrfs check commands; not sure if these add any helpful information:
[root@jonesville ~]# btrfsck -b /mnt2/Jones-Pool
Superblock bytenr is larger than device size
Couldn’t open file system

[root@jonesville ~]# btrfsck --qgroup-report /mnt2/Jones-Pool
Superblock bytenr is larger than device size
Couldn’t open file system
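(It occurs to me I may have been pointing btrfsck at the mount point rather than at a member device; as I understand it, btrfs check normally runs against an unmounted device, e.g. something like btrfs check /dev/sdb.)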

I can't say for sure, but I'm guessing that when sdb was unmounted it became inconsistent (btrfs doesn't have a way to mark disks as such), and when you tried to change RAID levels without scrubbing first it started writing garbage data.
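For future reference, scrubbing first would be something along the lines of:

btrfs scrub start /mnt2/Jones-Pool
btrfs scrub status /mnt2/Jones-Pool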

Still, there are a couple of things left to try. Is the array mountable as read-only at least? [I'm guessing not]
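i.e. something like:

mount -o ro,degraded /dev/sdb /mnt2/Jones-Pool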

The array is mounted read-only. I can see the pool, but I can't change anything, because everything I try gives me an error since the pool is read-only.

@Spectre694 I finally decided it was time to wipefs and start over, since it was only test data. The tipping point was finding this thread (http://thread.gmane.org/gmane.comp.file-systems.btrfs/54516/focus=54530): it looks like if I didn't get it right the first time, I was done. The good news is that if anyone else runs into this, there is a patch out, just not in the kernel yet. Thanks again for the help.
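For anyone else who ends up here, wiping the pool members is just the usual wipefs run against each device, something like (device names as they appeared on my box earlier in the thread):

wipefs -a /dev/sdb /dev/sdc /dev/sdd /dev/sde /dev/sdf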


Sorry about the late response, but good find; I didn't know about that particular bug. That one is definitely worth keeping an eye on, since I'm using RAID 10 on my box. Hopefully that patch is merged soon. Either way, glad to help and good luck.