Help with RAID10 disk failure, BTRFS replace won't start

straighttalk86 · March 4, 2021, 1:57pm

Brief description of the problem

I have a drive that failed in a 4 disk RAID10 pool. I have been following the Data Loss (Raid 1) documentation without any luck. The drive is completely dead, so it no longer registers as a device. I mount the pool and run the replace:

btrfs replace start 1 /dev/sdf /mnt2/raider10

But when I look at the status it says ‘Never started’. It also won’t let me mount the pool as RW; an error is thrown.

Detailed step by step instructions to reproduce the problem

fdisk -l 

Disk /dev/sda: 1000.2 GB, 1000204886016 bytes, 1953525168 sectors
Units = sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 4096 bytes
I/O size (minimum/optimal): 4096 bytes / 4096 bytes
Disk label type: dos
Disk identifier: 0x00017bcd

   Device Boot      Start         End      Blocks   Id  System
/dev/sda1   *        2048     1026047      512000   83  Linux
/dev/sda2         1026048     5222399     2098176   82  Linux swap / Solaris
/dev/sda3         5222400  1953523711   974150656   83  Linux

Disk /dev/sdb: 2000.4 GB, 2000398934016 bytes, 3907029168 sectors
Units = sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes


Disk /dev/sdc: 2000.4 GB, 2000398934016 bytes, 3907029168 sectors
Units = sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 4096 bytes
I/O size (minimum/optimal): 4096 bytes / 4096 bytes


Disk /dev/sdd: 2000.4 GB, 2000398934016 bytes, 3907029168 sectors
Units = sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 4096 bytes
I/O size (minimum/optimal): 4096 bytes / 4096 bytes


Disk /dev/sde: 1000.2 GB, 1000204886016 bytes, 1953525168 sectors
Units = sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 4096 bytes
I/O size (minimum/optimal): 4096 bytes / 4096 bytes


Disk /dev/sdf: 3000.6 GB, 3000592982016 bytes, 5860533168 sectors
Units = sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes


btrfs fi show

Label: 'rockstor_rockstor'  uuid: 9a41126b-4153-42d4-b106-ff205d2f260e
        Total devices 1 FS bytes used 2.47GiB
        devid    1 size 929.02GiB used 5.04GiB path /dev/sda3

Label: 'raider10'  uuid: 182033a2-cad6-4a4f-aece-1a8ddf038736
        Total devices 4 FS bytes used 407.76GiB
        devid    2 size 1.82TiB used 206.04GiB path /dev/sdc
        devid    3 size 1.82TiB used 207.01GiB path /dev/sdd
        devid    4 size 1.82TiB used 207.01GiB path /dev/sdb
        *** Some devices missing
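
For readers following along, the listing above can be summarised with a small helper. This is only a sketch: the function name `summarise_pool` is made up for illustration, and it assumes the `btrfs fi show` layout shown in this post (a "Total devices" line, one "devid" line per attached member, and the "Some devices missing" marker).

```shell
# Summarise one pool from `btrfs fi show` text read on stdin.
# Prints the registered device count, how many members are attached,
# their devids, and whether btrfs flagged missing members.
# (summarise_pool is a hypothetical helper, not a btrfs or Rockstor command.)
summarise_pool() {
    awk '
        /Total devices/        { total = $3 }
        $1 == "devid"          { ids = ids (ids ? " " : "") $2; seen++ }
        /Some devices missing/ { missing = 1 }
        END {
            printf "registered=%s attached=%s devids=%s missing=%s\n",
                   total, seen, ids, (missing ? "yes" : "no")
        }
    '
}
```

Usage would be along the lines of `btrfs fi show raider10 | summarise_pool`. For the pool above, the attached devids are 2, 3 and 4, which is consistent with the dead drive having been devid 1.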

Web-UI screenshot

phillxnet · March 6, 2021, 6:09pm

@straighttalk86 Welcome to the Rockstor community.

For this, on the command line, you can use the device name ‘missing’. But Rockstor should be able to do this for you. In your picture where it shows the “Some Missing” and “Device errors detected” you can try to follow the “Maintenance required” section.

I.e. add degraded,rw to the mount options after first refreshing backups, if needed, with degraded,ro. You didn’t mention if you have already tried these mount options. The degraded option is required because the pool is missing a member: 3 are attached when there are 4 registered. Plus raid 10 has 4 as its minimum.

Another approach, if you have enough free space (and you are only at 11% allocated), and if you can achieve read/write access via the addition of the degraded option, is to move to raid 1, where your existing drive count is sufficient. You could then return the pool to a healthy state, able to mount without the degraded option (check via a reboot), and then add the disks and move back to raid10. Long-winded but doable.

But don’t do anything until you have refreshed your backups via an existing degraded,ro mount.
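
As a rough sketch of the raid 1 detour described above, the helper below only prints the commands for review and executes nothing. The function name `raid1_detour_plan` is hypothetical, the ordering is my reading of the steps in this post rather than an official Rockstor procedure, and it presumes a degraded,rw mount has already been achieved.

```shell
# Print, for review only, the raid10 -> raid1 -> raid10 sequence.
# $1 = mount point of the degraded,rw pool; remaining args = disks to add later.
# (raid1_detour_plan is a hypothetical name; nothing here touches the pool.)
raid1_detour_plan() {
    mnt=$1; shift
    # Convert data and metadata chunks to raid1 on the surviving members.
    echo "btrfs balance start -dconvert=raid1 -mconvert=raid1 $mnt"
    # Drop the record of the dead member so the pool is no longer degraded.
    echo "btrfs device delete missing $mnt"
    echo "# reboot here and confirm the pool mounts without the degraded option"
    # Add the replacement disk(s), then convert back to raid10.
    for dev in "$@"; do
        echo "btrfs device add $dev $mnt"
    done
    echo "btrfs balance start -dconvert=raid10 -mconvert=raid10 $mnt"
}
```

For example, `raid1_detour_plan /mnt2/raider10 /dev/sdf` prints the plan for this pool. Device names can change between boots, so they are worth re-checking with `btrfs fi show` before running anything.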

A wrinkle here is that Rockstor doesn’t know about the replace capability within btrfs yet. We have the following GitHub issue open for this:

github.com/rockstor/rockstor-core

Implement a disk replacement UI

opened 05:31PM - 10 Jan 17 UTC

phillxnet

Thanks to maxhq in the following forum thread for highlighting this issue.

At times it is desired to replace an existing disk with another one, i.e. the function of:

btrfs replace start devid /dev/sdX /mnt2/pool-name

A user level interface for this process would make for a nice improvement. We should ensure to allow for the -r switch where reads from the source device are only carried out if no other zero-defect mirror exists (recommended for poorly devices). See: https://btrfs.readthedocs.io/en/latest/btrfs-replace.html

The status of this or other current replace operations could be shown in a tab on the same page, akin to the balance tab in pools maybe. The current status can be had via:

btrfs replace status /mnt2/pool-name

N.B. it is generally considered to be a longer process to use replace rather than "btrfs dev add" and then "btrfs dev delete"; it might make sense to suggest this course of action in the same UI.

https://forum.rockstor.com/t/problems-with-disk-replacement/2660

So for now the option is to add a disk if you can get a degraded,rw mount.
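
A minimal sketch of that add-a-disk route, assuming the pool is already mounted degraded,rw. The helper name `add_then_delete_plan` is invented for illustration and it only prints the commands so they can be checked first.

```shell
# Print the add-then-delete sequence for review; nothing is executed.
# $1 = mount point of the degraded,rw pool, $2 = the new (blank) disk.
# (add_then_delete_plan is a hypothetical helper name.)
add_then_delete_plan() {
    mnt=$1 newdev=$2
    # Give the pool back its fourth member first ...
    echo "btrfs device add $newdev $mnt"
    # ... then remove the record of the dead one, which re-homes its chunks.
    echo "btrfs device delete missing $mnt"
    # Finally confirm that no members are reported missing.
    echo "btrfs fi show $mnt"
}
```

For the pool in this thread that would be `add_then_delete_plan /mnt2/raider10 /dev/sdf`, with `/dev/sdf` being the new disk from the fdisk listing in the first post.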

Another key point here is to know which version of Rockstor you are using. Is it the far newer 4 ‘Built on openSUSE’ variant, or our now older CentOS variant? It is tricky to change mid-problem, though, and the older btrfs in the CentOS variant may be able to get you out of this. But the 4 variant has a higher chance of success, although you’d have to build your own installer (see: rockstor-installer) as it’s still pre-stable release. Just a thought.

Sorry if I’ve missed your point here but you don’t mention trying the degraded,rw options simultaneously as in the “Maintenance required” section.

Let us know exactly what you have tried, and do not attempt a repair other than adding a disk, as raid 10 is entirely able to survive a single disk failure in many cases. It’s just unfortunate we don’t yet support Web-UI replace. But all in good time.

So if you could clarify whether you’ve tried degraded,rw, and whether that worked, then forum members familiar with this scenario can help with advice on your options.

Hope that helps.

straighttalk86 · March 6, 2021, 8:17pm

Philip, thanks for the reply. I am new to hosting my own NAS so a lot of this is new. I just recently paid for the stable update and I am running Rockstor version: 3.9.2-57 with Linux kernel: 4.12.4-1.el7.elrepo.x86_64

I briefly tried creating the Rockstor 4 installer from my Ubuntu server but ran into issues getting the kiwi-ng installer to work. I saw in some of the posts that it would work best creating the installer from an openSUSE base? So I might try that later.

I have tried:
mount -o degraded,rw /dev/sdb /mnt2/raider10

It errors saying that the dev is write-protected. I tried mounting other devices in the raid as well and they all say they are write-protected. I don’t remember setting anything up to make them write-protected… so I’m not sure how to overcome that.

I tried your suggestion of changing the RAID level to 1, but that failed because of it being mounted as RO. Everything seems to hinge at this point on being able to mount the pool as RW?

I wasn’t quite sure what this meant, if you could explain a bit further on that. I do have the files backed up on another drive. But not as any special backup or dump if that makes sense.

Thanks!

phillxnet · March 6, 2021, 8:42pm

@straighttalk86 Re:

That was all I meant actually. If this is not the only copy of the data then you are in a more workable position.

Re:

It would be better to add the degraded,rw mount options to the pool within Rockstor’s Web-UI and see how that works. Rockstor already has a mount, so if you are mounting again this may confuse things. Adding the custom mount options within Rockstor’s Web-UI should instruct it to unmount / remount with these options. The Web-UI should then reflect these new options, if it is possible in the first place of course. You may also find that once these options have been entered a full reboot may help. Sometimes the kernel (especially the older versions) has difficulties in re-mounting a btrfs vol (pool in Rockstor speak) and a reboot with these options in place gives the whole system a fresh go at it.

Yes, it has to be an openSUSE Leap 15.2 install that you use to build the installer on. It’s because currently our kiwi-ng instructions depend on the host system being the same as the target system. Kiwi-ng has made improvements in that area, since we developed the config, but I’ve yet to try them out.

See: GitHub - rockstor/rockstor-installer: The kiwi-ng configuration used to create Rockstor 4 'Built on openSUSE' installers.

Warning

It is recommended that a discrete OS install is used to build the Rockstor installer due to the root user requirement and the scope of operations performed. We hope later to transition to a newer mechanism within kiwi-ng where by KVM is employed to create the required build environment. Any assistance with this endeavour would be much appreciated. See: kiwi-ng’s new boxbuild capability.

But in your circumstance, if you do go the Rockstor 4 route to get the newer btrfs; which is recommended for all repair scenarios, then there is unfortunately one more hoop to jump through. You will have to first mount degraded,ro the problem pool before you can import it on a freshly installed system. This is because of a chicken and egg situation in that you need a pool to be Rockstor managed before you can add custom mount options. And you already know that your pool is poorly and needs these options but Rockstor, as yet, has no means of knowing these options are needed before the pool becomes Rockstor managed. So the current work around is to first mount the pool at the expected location and then do the normal import procedure:

N.B. any of the pool device members can be used with equal effect on an import. And as you already know, but for ease of others reading, the attached device members can be found via a:

btrfs fi show

And note that the canonical names of sda, sdc etc will very likely change from one boot to another. Especially if a power cycle was involved.

If your pool is named (labelled) “raider10” then this command should do it. So if you go the Rockstor 4 route to get the much newer btrfs you will also have to first mount the pool with at least these options before you import it via the Web-UI. See doc section:

Import BTRFS Pool http://rockstor.com/docs/disks.html#import-btrfs-pool

for importing an existing pool on a freshly installed Rockstor.
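
For reference, the pre-import mount can be sketched as below. It is modelled on the `mkdir` and `/bin/mount -o ro,degraded /dev/disk/by-label/...` steps quoted in the pull request text in this post, with the label swapped for "raider10". The `run` and `premount_for_import` helpers are hypothetical names, and the sketch defaults to printing the commands rather than running them.

```shell
# Print (default) or execute a command; set DRY_RUN=0 to really run it as root.
# (run and premount_for_import are hypothetical helper names.)
run() {
    if [ "${DRY_RUN:-1}" = 1 ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Mount a pool read-only and degraded at Rockstor's expected /mnt2/<label>
# location, ahead of a normal Web-UI import.
# $1 = pool label as shown by `btrfs fi show`.
premount_for_import() {
    label=$1
    run mkdir -p "/mnt2/$label"
    run mount -o ro,degraded "/dev/disk/by-label/$label" "/mnt2/$label"
}
```

Calling `premount_for_import raider10` shows the two commands; `DRY_RUN=0 premount_for_import raider10` would execute them. Mounting by label avoids depending on sdX names, which can change from one boot to another.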

This capability was added in the following pr:

github.com/rockstor/rockstor-core

allow import of ro,degraded cli mounted pool. Fixes #1890

rockstor:master ← phillxnet:1890_allow_import_of_ro,degraded_cli_mounted_pool

opened 06:00PM - 11 Feb 18 UTC

phillxnet

+83 -42

Facilitates the import of a prior cli mounted pool, this enables the ability to use initial mount options that are currently not available via the Web-UI prior to an import. The focus here is to enable an import of a pool that requires, for example, ro,degraded mount options.

1. Fix import failure (prior TODO:) via Read-only quota enable attempt on import.
2. Fix failure on qgroup assign on ro fs and avoid these and related calls from higher level functions via (4) below.
3. Improves code consistency re PQGROUP_DEFAULT use.
4. Make additional use of the existing quota disabled mechanism to improve ro behaviour during import and subsequent share/snap refresh.
5. Avoid assigning pqgroups within db that failed creation due to ro fs.
6. Improve quota disabled robustness re redundant e.rc checks from btrfs command execution as per a previous pr code review (see pr #1874)
7. Improve / update code comments.
8. Improve pqgroup/quota management.

Fixes #1890
Please see issue text for further context.

@schakrava Ready for review.

Tested by creating a 2 disk "rock-pool" (raid1)

Shares:
- rock-share
- rock-share-clone

Snaps:
- rock-share-snap
- rock-share-snap-rw
- rock-share-clone-snap

System was power cycled during which both disks were removed. Upon boot both "detached-..." disk entries were removed and pool / share refresh to ensure all entries were removed. One of the raid1 members was re-attached and the system booted up again.

All prior mount points were removed:
- chattr -i /mnt2/rock-{pool,share,share-clone}
- rm -rf /mnt2/rock-{pool,share,share-clone}

Cli mount to facilitate currently Web-UI impossible mount options prior to import:

N.B. as per Rockstor standard locations re vol label:

Make initial mount point either via UI attempted import or via:
- mkdir /mnt2/rock-pool

Mount ro,degraded via cli as per Rockstor expected location:
- /bin/mount -o ro,degraded /dev/disk/by-label/rock-pool /mnt2/rock-pool

An import via the Web-UI was enacted and pool, share, snap state verified to be as expected.

The above test method was also carried out with the addition of the rock-pool having quotas disabled prior to ro,degraded mount:
- btrfs quota disable /mnt2/rock-pool

to ensure that we have functional quota disabled ro,degraded import capability.

And we hope in the future to enable custom mount options within the Web-UI for imports to avoid this command line requirement in the first place.

Hope that helps.

straighttalk86 · March 6, 2021, 9:15pm

I see the Extra Mount Options section now. I added the ‘degraded,rw’ options and it doesn’t try to remount when clicking the check. So I rebooted with the options set and the Pool didn’t mount at load up. Then I tried mounting from the shell:
mount -o degraded,rw /dev/sdb /mnt2/raider10

got this:

So it won’t let me mount with rw because a drive is missing but then it won’t let me add a drive without it being mounted as rw?

I’m in the process of creating an openSUSE VM to create the Rockstor 4 installer. Or is there a place to download a prebuilt installer?

Thanks for your help. I appreciate it a lot. I have the data backed up but having these issues makes me edgy!

phillxnet · March 7, 2021, 10:30am

@straighttalk86 Thanks for the update.

Ok so this is progress of sorts as we are narrowing down what’s going on.

Not quite. The degraded mount option allows you to mount a pool, ro or rw, if a disk is missing: if the remainder of the pool is sufficiently undamaged and if the number of missing disks can be tolerated by the raid level. Btrfs raid 10 can handle a single disk missing which is the reported number of missing disks you have. But interestingly in the case of your pictured error message we have:

BTRFS warning (device sdb): missing devices (1) exceeds the limit (0), writeable mount is not allowed.
BTRFS error (device sdb): open_ctree failed

And Yes, you generally need a rw mount to make changes to a pool like delete a missing disk, add a disk, change raid level etc.

There was an issue with earlier btrfs/kernel versions where pools could inadvertently contain unconverted chunks that were not at the raid level intended for the entire pool. It may be that you have been affected by this, i.e. only 1 disk missing yet the warning/error indicates a limit of 0 for a writeable mount. Or it may be an instance, again from earlier kernels, where one was given only one shot at mounting rw on btrfs raid 1 pools; thereafter only ro mounts were permitted. So to investigate this further, with your current setup of Rockstor having no mount, try a ro mount in the same place and then paste the output of the following command.

btrfs fi usage /mnt2/raider10
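
When reading that output, the thing to look for is a block-group type listed under more than one profile, for example a stray "Data,single" line next to "Data,RAID10". The sketch below pulls that out of the text. The helper names `chunk_profiles` and `mixed_profiles` are made up, and it assumes the usual `btrfs fi usage` layout where each block-group section starts with a line such as "Data,RAID10: Size:...".

```shell
# List the distinct chunk profiles per block-group type from
# `btrfs fi usage` text on stdin, one line per type, e.g. "Data: RAID10 single".
# (chunk_profiles and mixed_profiles are hypothetical helper names.)
chunk_profiles() {
    awk -F'[,:]' '
        /^(Data|Metadata|System),/ {
            # Record each type/profile pair only once.
            if (!seen[$1 "," $2]++) profiles[$1] = profiles[$1] " " $2
        }
        END {
            for (t in profiles) print t ":" profiles[t]
        }
    ' | sort
}

# Print only the types that carry more than one profile, i.e. the
# candidates for unconverted chunks.
mixed_profiles() {
    chunk_profiles | awk 'NF > 2'
}
```

With a ro mount in place this could be run as `btrfs fi usage /mnt2/raider10 | mixed_profiles`. No output means every type reports a single profile. Any line that is printed would fit the unconverted-chunks explanation, since, as far as I understand it, profiles such as single tolerate zero missing devices.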

And remember that you can use any of the attached (read existing) members as your device in the mount command. So if one doesn’t work, just try another. It’s a known issue that can occur with multi-drive pools. See:

Only one disk of a multi-volume filesystem will mount

section here: Problem FAQ - btrfs Wiki

Alleviated by executing a:

btrfs device scan

To re-tally what’s what.
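
The scan-then-try-another-member advice above could be scripted roughly as follows. This is a sketch only: `try_members` is a hypothetical name, and `MOUNT_CMD` and `SCAN_CMD` are override hooks invented here so the loop can be exercised without real disks. By default it calls the real `btrfs device scan` and `mount`, so it needs root.

```shell
# Re-scan for btrfs members, then try a degraded,ro mount from each given
# member until one succeeds.
# $1 = mount point, remaining args = candidate member devices.
# MOUNT_CMD and SCAN_CMD are hypothetical override hooks, not btrfs options.
try_members() {
    mnt=$1; shift
    # Let the kernel re-tally which devices belong to which pool.
    ${SCAN_CMD:-btrfs device scan} >/dev/null 2>&1
    for dev in "$@"; do
        if ${MOUNT_CMD:-mount} -o degraded,ro "$dev" "$mnt" 2>/dev/null; then
            echo "mounted via $dev"
            return 0
        fi
    done
    echo "no member could be mounted" >&2
    return 1
}
```

For this pool the call would look like `try_members /mnt2/raider10 /dev/sdb /dev/sdc /dev/sdd`, using whichever names `btrfs fi show` currently reports.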

So let’s see what the output of the above ‘usage’ command gives on a ro mount.

That would be great as there have been years of improvements in the kernel and the btrfs-progs so you might as well be taking advantage of them if you can.

Yes, exactly. And experience in this case is good. But do keep in mind your experience thus far is with btrfs from some years ago: hence our move to an openSUSE base, as they are obviously more able to maintain their kernels in this regard than we were. So all in all you will be better off on the 4 variant, assuming we have fixed enough of the transition bugs from CentOS’isms to openSUSE’isms.

Just be super careful on selecting the correct drive for your install. Ideally you need to have a spare system drive for this. That way you can always boot your prior install if you end up fixing this pool. But do not have both the old and the intended new Rockstor system disks attached simultaneously, as this is known to not work and we have an issue open for this:

github.com/rockstor/rockstor-installer

failures when installing on a system with an existing Rockstor install

opened 02:55PM - 04 Oct 20 UTC

phillxnet

feature request

It has been observed that when installing on a system that already has a Rockstor install on another disk, the resulting installer will:

1: fail to run the JeOS first boot wizard (Locale, Keyboard Layout, License Agreement, Time Zone, 'root' user pass).
2: fail to start the Rockstor services as Postgresql is not auto started.

This looks to be caused by some systems reading their config from the already existing Rockstor install and thus failing to invoke themselves on the fresh install.

Fix: re-install with the offending prior Rockstor system drive removed, or wiped.

But this relates only to the system disk. I’d also detach all data disks prior to a fresh install anyway, as then you can be sure not to accidentally install over an existing and required data disk.

Then once you have this new system up and running with only its system disk you can power it off and re-attach your poorly 3 disk raid10 pool and we can take a look at that usage command after a command line ro mount. Plus once you have a ro mount via command line you can then import the pool. This may help with ensuring you have all the data copied off if you do end up having to start over with the newer kernel/btrfs.

Hope that helps and hang in there. There are many folks on the forum who have successfully retrieved data/pools after drive failure, and in this case I think you have been affected by one of the above mentioned early bugs.

straighttalk86 · March 8, 2021, 11:58am

I was able to update my system to Rockstor 4.0.5. Once that was installed I was able to mount the pool as degraded,rw and was then able to add my new disk. My system is back to working order. thanks for the help!

phillxnet · March 8, 2021, 1:01pm

@straighttalk86 Thanks for the update and I’m glad you’re now sorted.

Cheers and you’re welcome.