[Solved] Not sure how to recover from Raid1 drive failure

First off

Hey everyone! I hope everyone is doing well. Still loving Rockstor and still recommending it whenever I can (also hooked my dad up with Rockstor). Sorry that I kind of disappeared on you guys, but you know how that goes. I guess, no news is good news, right? :wink:

Description of the problem

I have a pool containing 2 drives in raid1 configuration. One of the drives failed and the pool is now in read-only mode. I removed the broken drive and added a new one, however, I am unsure how to rebuild the pool.

What I tried

I went through the documentation but I'm unsure what to do. I tried making the pool a "single" pool, but this fails when balancing. According to my research that could mean that there were stripes on the other drive that are now missing.

I tried going through the Re-raid process to add the new drive to the pool, but this also fails because the remaining drive is in read-only mode.

I did read something about having to restart to make the pool read-write again (I'm not clear on when this can be done. When the failing drive has been removed?)

I'm a bit surprised that getting the pool back in order seems to be a tedious process. In my mind it should be as simple as sticking in a new drive and re-replicating the data. Clearly I'm not completely understanding the process or how btrfs works.

Further questions

While doing research I read about raid1c3 and raid1c4. I'm not sure if they are available in Rockstor, but here is what I am wondering about these raid levels:

  • In raid1 at least 2 copies of the data are kept, allowing 1 drive to fail. In raid1c3, 3 copies of data are kept, so I assume 2 drives are allowed to fail. And by extension raid1c4 would allow for 3 drives to fail. Is this correct?

  • When you put your drives in raid1, but add more drives, what happens to those drives? For example, let's say you set up raid1 with 3 drives, then 2 copies are maintained. What happens with the 3rd drive? Is it just there as a hot-spare? Is it used for striping? Or …?

In the logs of my backup I noticed that some files could not be copied because of "Input/Output Errors". So I'm surely looking at some data loss.

Partly my fault of course, since I didn't have backups before. But that's why I had raid1, to have a copy of the data in case a drive goes bad. I guess that wasn't enough.

howdy stitch.

Sorry to hear about your hardware failure. This forum post talks about replacing drives: "What is the process for replacing a bad drive?". TL;DR:

I believe you should be able to replace the drive via the CLI:

btrfs replace start [drive id] [new drive] /mnt2/pool-name

For the data that is corrupted, you should be able to run a scrub and then either restore the bad files from a snapshot or remove them and restore from cold storage.
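Something like the following sketch, assuming your pool is mounted at /mnt2/pool-name and the failed member has devid 1 (both of those are assumptions on my part, so check yours with the show command first; /dev/sdX is a placeholder for the new disk):

# list pool members; the failed drive's devid shows here (or as missing if already detached)
btrfs filesystem show /mnt2/pool-name

# start the replace onto the new disk, then watch its progress
btrfs replace start 1 /dev/sdX /mnt2/pool-name
btrfs replace status /mnt2/pool-name

# once the replace has finished, scrub to surface any remaining corruption
btrfs scrub start /mnt2/pool-name
btrfs scrub status /mnt2/pool-name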

The wiki says that when you place a third drive in raid 1 (mirror) you don't increase the total striped volume of the pool; you just add yet another spare of the data that is maintained across the array, essentially giving you 2 failures before you lose everything.


@stitch10925 Hello again, nice to hear you are still around.
Re:

Yes, btrfs will mostly go read-only on significant issues to avoid cascade failure. In your case, if the drive died, then you have a btrfs-raid1 Pool (2 drive minimum) with only one drive. So the profile's requirement of 2 copies across 2 independent drives can't be met, and a read-only stance ensures no more data is written with only one copy. But all original data/metadata now exists only on one drive (Pool member). And as the pool is now read-only, you are unable to modify it: which includes adding a drive to it.

Likely what you need is the degraded mount option: this normally allows a Pool to be mounted read-write (rw) even when it is in a degraded state (missing a prior member). You should then, via the rw access, be able to use the Resize/ReRaid Pool option within the Pool's detail view. I.e. add a drive to the Pool so that it can again meet its goal of storing 2 copies of data or metadata on 2 independent drives.
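If doing that mount by hand, a minimal sketch (the Pool label "pool-name", its device, and the /mnt2 mount point are all assumptions here; substitute your own Pool's details):

# drop the current read-only mount, then remount with the degraded option
umount /mnt2/pool-name
mount -o degraded /dev/disk/by-label/pool-name /mnt2/pool-name

# confirm the Pool is now mounted rw, with degraded in its options
findmnt /mnt2/pool-name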

If you are running a more recent Rockstor version (last few years) the Web-UI should advise you on these steps. Take a look at the following doc section, specifically the reference to our Maintenance required Web-UI element:

https://rockstor.com/docs/data_loss.html#pool-degraded-indicators

@KilroyLichking Thanks for chipping in on this one. We should keep in mind that if a btrfs-raid1 Pool has more than 2 drives: it still only keeps 2 copies on two different drives. It can just spread the use of the drives out: but there are still only two copies. Which leads into:

These profiles are basic extensions of btrfs-raid1, but the c3 qualifier means 3 copies across 3 different drives: again irrespective of actual drive count. And the c4 qualifier similarly means 4 copies on 4 different drives (so a minimum of 4 Pool members), irrespective of total Pool drive count.

Exactly as you have indicated in your synopsis.

Yes.

They add to the available space, and spread the read-write load of the Pool.

Yes.

But on every write 2 different drives may be chosen, i.e. write1 to drives 1 & 2, write2 to drives 1 & 3. The key here is that btrfs raid is chunk based, not drive based. But it is drive aware, in order to maintain independent drive use. See our:
The nature of btrfs: Data loss - prevention and recovery in Rockstor — Rockstor documentation
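As an illustration of that chunk spread, the standard btrfs tooling reports it per device and per profile (the /mnt2/pool-name mount point is again an assumption):

# per-device allocation: with 3+ members the raid1 chunks are spread unevenly across them
btrfs filesystem usage /mnt2/pool-name

# per-profile summary: Data, Metadata, System and the raid level of each
btrfs filesystem df /mnt2/pool-name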

We have an explanation, along with redundancy capabilities in the following doc section:

Redundancy Profiles: Pools — Rockstor documentation

And to answer:

We used to require a Kernel backport, as these are newer profiles based on the older btrfs-raid1. But if your install is "Built on openSUSE Leap 15.6 …" then it has a new-enough kernel to accommodate them. Plus our docs need updating on the following how-to: Installing the Stable Kernel Backport — Rockstor documentation, as we have supported btrfs-raid1c3 and btrfs-raid1c4, and many of the sane combinations of mixed raid involving these newer more robust profiles, since 4.5.9-1.
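For reference, a rough sketch of such a conversion from the command line (the Pool mount point, and converting both data and metadata in one go, are assumptions here; the Web-UI ReRaid option remains the supported route):

# convert data and metadata chunks to raid1c3 (requires at least 3 attached Pool members)
btrfs balance start -dconvert=raid1c3 -mconvert=raid1c3 /mnt2/pool-name
btrfs balance status /mnt2/pool-name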

Regarding:

There is currently no hot (online) spare capability in btrfs, but I think there are development moves to introduce this in the future.

So I think the key to your question re the approach to enable rw (to add a drive or change raid level) is the degraded mount option: it is intended to enforce maintenance attention on what is a degraded situation. I.e. stop all writes, as they can no longer be redundant to the indicated profile: or, in many cases, to the prior drive count. I.e. as @KilroyLichking indicated, if a btrfs-raid1 pool has 3 drives, you can lose one of the pool members (in full function or entirely) and still mount degraded and remove (Pool knowledge wise) the faulty or entirely missing drive. As the remaining members (2 in this case) can still sustain that btrfs-raid profile. But if one runs at the bare minimum drive count: you would have to ReRaid to a lower redundancy. Or add a drive to the Pool to return to the minimum level.

Ergo it is generally advised that Pools in more important settings are not maintained at the minimum drive count for the desired btrfs-raid level: as you then face fewer recovery options when a drive does ultimately fail or become faulty. I.e. you lose the option to just mount degraded, remove the problem drive, and be on your way again (post removing the degraded option of course).
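A minimal sketch of those two situations, once the Pool is mounted degraded and rw (the mount point and device name are assumptions):

# above the minimum drive count: drop the missing member and be on your way
btrfs device remove missing /mnt2/pool-name

# at the bare minimum drive count: add a fresh drive first, then drop the missing member and re-spread
btrfs device add /dev/sdY /mnt2/pool-name
btrfs device remove missing /mnt2/pool-name
btrfs balance start --full-balance /mnt2/pool-name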

So to be clear here: no btrfs-raid1 pool (irrespective of drive count) can handle more than one drive failure. There are, after all, still only 2 copies. Hence the addition of the c3 and c4 variants.

Hope that helps, and do ask further, as I have here just fielded a few of the questions. The Web-UI however should guide you appropriately in such settings. And yes, we still do not have a drive replacement capability within the Web-UI, but we can online ReRaid and add/remove Pool members.


Hey @phillxnet,

Thanks for the extensive reply. The Web-UI did indeed advise me on the steps to take. The problem was that I didn't know HOW to take those steps. So I have been doing some research and testing over the last couple of days.

I managed to fix the problem… or at least, everything seems to be working, and files that I could no longer access correctly are now working again.

Here is what I did:

  1. I removed the physical drive and then removed it from Rockstor
  2. I added a new physical drive to the server
  3. I then rebooted the system after which Rockstor showed me the pool was "unmounted"
  4. In the UI I went to the newly added disk and told Rockstor to use the whole disk
  5. Via CLI I mounted the pool with the "degraded" option
  6. Using the Rockstor UI I added the new disk to the pool
  7. Using the CLI I then checked "btrfs filesystem show", which showed me 2 drives in the pool (the original first disk and the newly added 2nd disk), but it gave the error "Device missing" (see the command sketch after this list)
  8. In the UI, in the pool configuration, I found a link that allowed me to remove missing devices. I clicked that and the pool was happy again (I found the btrfs command to do the same thing, but I somehow could not get it to work? Luckily I found the UI link)
  9. Rebooted again to check if the Pool would mount by itself again, which it did
  10. Did a scrub and balance which took quite a while, but now everything seems to be up and running
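For reference, step 7's check, plus a health check I ran after step 10, looked something like this (I have used /mnt2/pool-name as a stand-in for my actual pool's mount point):

# step 7: list Pool members; a dropped member shows up as missing here
btrfs filesystem show /mnt2/pool-name

# after the scrub and balance: per-device error counters; non-zero corruption counts mean damage was found
btrfs device stats /mnt2/pool-name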

I did have to recreate the NFS and Samba shares for the pool, which was a bit of a bummer. Especially since I used a custom NFS command.

I hope this can be of use to other users in the future. I hope I listed everything correctly, since I tried out A LOT of things.


Glad this seemed to have worked for you! Thanks for the recipe, too.

One quick question, since you didn't explicitly call it out:
For steps 1 and 2, did you have the system off, or did you do the swap while the box was up? Mainly asking since you wrote "reboot" in step 3.


@Hooverdan

My drives are hot-swappable, so I just pulled it. Since Rockstor marked the drive as "detached" I didn't think too much about it, to be honest.

So, in short, the system was still running at that point.
