Unable to modify the mount options of degraded pool

So I just recently switched my large (120ish TB nas) over to Rockstor after testing it for a handful of months.

I really enjoy everything it has to offer but I am currently running into my first major issue.

I have a pool running raid5c1 and it is my biggest pool. It experienced a hard drive failure, as one does. The good thing is that the pool didn’t have any issue picking up the slack and I have been running degraded, relying on parity.

The bad news starts right after that. The failure occurred at the beginning of the month and I am still running degraded 11 days later, as I just cannot get the missing disk properly rebuilt onto a new drive. All this time running degraded has allowed lots of corruption to accumulate, and now the house of cards is crumbling.

Today I woke up to something like 20k read IO errors on 2 of the drives in my pool. I rebooted the system and tried to mount the pool in ro mode so I could work on getting something done about the missing drive, but now I can’t get into ro mode.

Below is the rather helpful screenshot I am getting

I can recreate the issue at will, but nothing related to it pops up in the system logs. I am really struggling right now. Luckily there isn’t anything “critical” on here; that is all on a raid1 and backed up offsite. But that doesn’t make the current experience any less painful or frustrating.

I therefore have 3 major questions:

  1. How do I get this pool mounted in readonly mode?
  2. What is the correct way to deal with replacing a drive that dies suddenly? There has to be a better way that doesn’t take weeks
  3. (Not mentioned yet) What is the correct way to perform scrubs?

I am kinda lumping things together here, but scrubs just don’t happen on this pool, which is likely a contributing factor in why I am where I am. They take absolutely forever. I am running all WD Red Pros (except for 2 Seagate IronWolf drives), so there are no SMR shenanigans going on.

Below is a screenshot showing the pain; you can actually see in the speed when content began ending up on this pool.

I acknowledge the scrub speed issue is a btrfs raid5 thing but hopefully I can get some guidance on it as well.

Edit:

I should add, I did what was advised in the screenshot (stopping Rockstor, unmounting, and restarting Rockstor) and the only thing that did was remount the pool in rw, which is not the desired outcome here.

I’ve come back to the server this morning to see that it has mostly stabilized. It still has a lot of corruption errors and one disk is complaining about read IO errors, but for whatever reason, it let me change it to ro mode now. I didn’t do anything between posting this thread and posting this reply, so I don’t know why it suddenly worked, but I will take it.

I do still have points 2 and 3 to address though

2 Likes

Sorry to hear you’ve run into trouble.

To see whether you get more/new errors, you could reset the device stats. You already know where the IO errors have shown up (based on your screenshot), so that might help you understand whether your system is behaving more nicely now.
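
For reference, something like this should do it (a sketch, assuming the pool is mounted at /mnt2/mediapool, as it appears later in the thread):

# show the current per-device error counters
btrfs device stats /mnt2/mediapool

# print and then reset the counters to zero so any new errors stand out
btrfs device stats -z /mnt2/mediapool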

For your second question (replacing a drive that dies suddenly):

Not sure whether you have used the “replace” function or the add-then-delete process. In general, the replace approach is considered faster than add/delete, but of course that is often restricted by the number of available connections, since you need a free port to add a disk without physically removing one.
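
For reference, the two approaches look roughly like this on the command line (a sketch; /dev/sdX stands in for the new disk and /mnt2/mediapool is the pool mount point used further down the thread):

# replace approach: rebuild directly onto the new disk
# (<old-devid-or-device> is a placeholder for the failing/missing device)
btrfs replace start <old-devid-or-device> /dev/sdX /mnt2/mediapool

# add-then-delete approach: add the new disk first, then remove the missing one
btrfs device add /dev/sdX /mnt2/mediapool
btrfs device remove missing /mnt2/mediapool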

My experience has been that, when close to capacity, it is better to add a disk first and then perform the removal. That was my approach a few years ago (albeit on the CentOS-based version) and it took two weeks to replace every disk (going from smaller to larger capacity), incidentally on a RAID5/6 setup. I have since moved to a raid10-1c3 setup (an in-place conversion, which took a while too but went fine) and have subsequently used the replace-disk option.

In your case, since the disk is already dead and you’ve added a replacement drive (if I understand correctly), you might need to perform the removal using the command line.

As for the scrub question:

When I was on RAID5/6 I ran weekly scrubs scheduled from the WebUI. Unfortunately, I don’t have any records of the speed (though it was not fast, that I do remember).
Currently, running newish IronWolf drives (which I swapped in earlier this year, after running those Reds for about six years), I get 725.70MiB/s, where I previously got around 455.50MiB/s, but that’s a combo of newer drives as well as using raid10-1c3.
However, my former RAID5/6 array was way smaller than yours at the time. When the scrub starts it usually runs in parallel, since the scrub is performed essentially per device (of that pool); there are cross-device reads if checksums don’t add up, etc. However, you want to do this during times when there isn’t heavy utilization by users. I think the btrfs developers recommend scrubbing monthly or less often. Some time ago I switched to monthly, but not because of some major wisdom I have discovered.

There are other forum members who are still using RAID5/6 setups; hopefully they can provide some insights, too.

The scrub is implemented in the WebUI by pool (i.e. all devices start scrubbing away at the same time). You could experiment with the command line and only do one device at a time and stagger that over time; that might improve the performance, but I have no evidence, so this would be an experiment (a rough sketch follows the link below).

https://btrfs.readthedocs.io/en/latest/btrfs-scrub.html#man-scrub-start
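
A rough sketch of that per-device experiment (/dev/sdb is a placeholder for one member device of the pool):

# scrub only a single member device, in the foreground (-B prints a summary when it finishes)
btrfs scrub start -B /dev/sdb

# check progress and rate of a running scrub on the whole pool
btrfs scrub status /mnt2/mediapool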

Another question: does your system use ECC RAM? If not, some of the other errors could also be related to bitrot in conjunction with memory issues (though, granted, that’s not an everyday occurrence).

To start, thank you :slight_smile:
My stats will be reset later tonight when I pull the system down to add my replacement drive(s), so I will have a clearer look at the pain after that.

I tried to replace the drive when it reported full failure and I received an IO error immediately. I then removed the drive (physically turning off the machine and unplugging it) and attempted a remove after adding a new drive to the pool. That is what I have been waiting on to complete for nearly 2 weeks. I suspect it’s doing a full rebalance in the process of removing the dead drive. I am going to try just doing a replace after I get the new drives in place, though by my math that is still going to take an unacceptable 1 month to process.
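
(As a sketch of how one might watch whether that removal is actually progressing: the space still allocated on the missing device should slowly shrink as its chunks get relocated.)

# per-device usage and allocation; the entry for the missing device should shrink over time
btrfs filesystem show /mnt2/mediapool
btrfs device usage /mnt2/mediapool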

Something seems wildly amiss here and I don’t really know what. Moving bits should not take nearly that long but it does for some reason.

On the points about scrubs, I have scrubs running on a raid1 and a raid0 pool (of about 15 TB and 4 TB respectively) and those complete quite fast: something around 4 hours for the 15 TB pool (which is apparently also experiencing errors now) and 20 minutes for the 4 TB one (though tbf, that is raid0 NVMe).

Meanwhile, I cannot get a scrub to finish on my fat raid5 due to the time it takes.

Lastly, on ECC: yes, though it is unbuffered IIRC (I am unsure how to check off the top of my head).
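
(If it helps, a rough way to check, assuming the dmidecode package is installed:)

# "Error Correction Type" will show e.g. "Single-bit ECC" or "None"
sudo dmidecode -t memory | grep -i 'error correction'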

I know that raid5/6 is different from the other RAID levels, but I just can’t understand why it would process everything so much slower. I might need some help checking my configuration and seeing if I need to change something.

If I remember correctly, if the device is missing (i.e. physically detached) you want to mount the pool in a degraded state (rw, I would think) and then execute the device removal on the command line.
I think if one is already running it would tell you that, too.
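
A quick way to see whether a replace is already in flight on the pool (a sketch, assuming the pool is mounted at /mnt2/mediapool):

# prints the state/progress of any replace operation, or that none was started (-1 prints once and exits)
btrfs replace status -1 /mnt2/mediapool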

And unbuffered ECC is the usual type anyway, I think, so at least the likelihood of memory errors during I/O operations is lower.

Well, I apparently did start a replace when I tried to swap the drives around. It was apparently at 2.1% complete. I know this because I shut the server down, added my new drives, and tried to do a replace of the missing drive onto one of the new ones, and it says one is still running. What is it replacing onto? I have no idea. Can I cancel it? Nope, cancel does nothing. Nor does a restart. So this is fun :upside_down_face:

So, for my understanding, you tried the following:

You mounted the pool degraded like this (rw, not ro):

mount -o degraded /mnt2/mediapool

and then tried to cancel using

btrfs replace cancel /mnt2/mediapool

and that did not work?

And for a remove instead of a replace (since the device is now “detached”) you would then use (still mounted as degraded):

btrfs device remove missing /mnt2/mediapool

but of course you cannot do that while the replace operation has not been canceled…

I think I am finally getting this all sorted out.

I am back where I was at the start of all this, with a pool with a missing disk, slowly corrupting.

I did manage to get the pool into ro mode, and then was surprised-pikachu when I couldn’t cancel the replace. I was then unable to get the pool back into rw mode; it was tossing some weird “cannot enable rw for Raid56, please load module” garbage, but after another reboot it mounted rw properly and my replace could be canceled.

My new drive is here, and I am going to attempt a proper replace with it. There is 9.5 TB on there. Here’s to hoping it finishes before I die.
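
(For anyone following along, a sketch of what a replace of a missing disk looks like on the command line; the devid and the new device name are placeholders:)

# find the devid of the missing disk
btrfs filesystem show /mnt2/mediapool

# rebuild onto the new drive, referencing the missing disk by devid, then watch progress
btrfs replace start <missing-devid> /dev/sdX /mnt2/mediapool
btrfs replace status -1 /mnt2/mediapool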

I appreciate your help so far, even if most of it has been just me stumbling around aimlessly

3 Likes

So, for anyone else who comes along looking at performance issues with btrfs raid5: it turns out that scrubbing large pools is taxing on the disks.

My performance issues were directly related to my SAS breakout board overheating. It took me quite a while to track this down, but this morning I noticed that all the pools in my NAS were slow as fuck. It didn’t matter which pool; everything was slow. There was no resource contention, so after digging into it, it turned out that the iowait for pretty much every drive was between 95% and 99%. Which is… awful, lol.
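
(For anyone wanting to check the same thing, a quick sketch; iostat comes from the sysstat package:)

# extended per-device stats every 5 seconds; sustained %util near 100 points at saturated or throttled disks
iostat -x 5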

The reason: my SAS breakout board was overheating. Upon checking, the air around my mobo was around 85F, the air around my hard drives was around 90F, and the air around my breakout boards was around 100F. The heatsinks on the breakout boards themselves were at 115F and 155F (yes, 155), so everything under the sun was being throttled to try and cool off. Stopping my replace helped the board cool down, but of course that isn’t a long-term solution.

I ordered some fans, opened the machine up, and have a big box fan running on it right now. My replace is still moving slowly, currently at 1.6%, but given it took a day to move from 1.2% to 1.3%, I would say it’s progress. My iowait is currently around 15% across all drives. My scrubs are actually making use of the RAID now (my raid1 scrub ran at around 450Mb/s, which is so much better than before), and generally everything seems more stable.

TLDR: Computers don’t like heat.

5 Likes