Advice on disk replacement combined with pool expansion

Flox · June 24, 2020, 7:15pm

As one of the drives in my Raid1 arrays has started showing some errors (most likely age-related), I need to replace it and would like to take the opportunity to expand my storage capacity as well.

Because I see multiple options in how to proceed, I wanted to ask for the community’s feedback based on people’s experience with such procedure. I’ve thus read through several relevant posts in the forum and the Github repositories, but given these span a rather wide period of time, their comments and recommendations may not be accurate anymore due to recent improvements in Rockstor and Btrfs itself.

I thus thought I would lay out the options I have in front of me and see what the consensus would be on each one of them. By providing as much information and resource as possible, I’m hoping this can benefit other users as well.

In this spirit, thanks to any experienced user who corrects any inaccuracy in what I write below.

Aims and requirements

My current data pool consists of:

Drive A: 3 TB HDD
Drive B: 3 TB HDD

Drives A and B are combined in a single Raid1 pool. Note also that the pool is rather full, so I have about 2.7 TB of data to deal with.

Unfortunately, Drive A needs to be replaced. In the end, I would thus like to have the following pool (still in Raid1):

Drive B: 3 TB HDD
Drive C: 8 TB HDD
Drive D: 8 TB HDD

I would also like to try doing everything from Rockstor webUI and avoid the command line, as an exercise and test of Rockstor. Overall time to complete the move is paramount, however, so if a cli-approach has a substantial advantage over a webUI-only approach, I’ll pick that.

Finally, this would be conducted once the openSUSE rebase has been completed, meaning it would be running kernel and btrfs versions of Leap 15.2.

Options

Quotas would be disabled prior to any operation.

Because I need to both add and replace disks, there are several strategies combining replacement and addition of a drive. Notably, as I currently only have one SATA port free on my motherboard, I see the following options available to me:

option A:
- Remove Drive A from the pool
- Add Drive C and Drive D to the pool in one go
option B:
- Add Drive C to the pool
- Remove Drive A from the pool
- Add Drive D to the pool
option C:
- Replace Drive A with Drive C in the pool
- Add Drive D to the pool
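To get a feel for how much headroom each pool layout leaves, here is a minimal sketch of the usual Raid1 capacity estimate. It assumes nominal decimal drive sizes in GB and ignores metadata overhead; the function name raid1_usable is made up for this example. Because btrfs Raid1 stores two copies of every chunk on two different devices, usable space is roughly the smaller of half the total and what the other drives can mirror of the largest one.

```shell
# Rough usable capacity of a btrfs raid1 pool, given device sizes in GB.
# Estimate only: min(total / 2, total - largest), ignoring metadata overhead.
raid1_usable() {
    total=0
    largest=0
    for size in "$@"; do
        total=$((total + size))
        if [ "$size" -gt "$largest" ]; then largest=$size; fi
    done
    half=$((total / 2))
    rest=$((total - largest))
    if [ "$half" -lt "$rest" ]; then echo "$half"; else echo "$rest"; fi
}

# Nominal sizes from the post: A = B = 3000 GB, C = D = 8000 GB.
echo "current (A+B):        $(raid1_usable 3000 3000) GB"
echo "option B, A+B+C:      $(raid1_usable 3000 3000 8000) GB"
echo "options B/C, B+C:     $(raid1_usable 3000 8000) GB"
echo "final (B+C+D):        $(raid1_usable 3000 8000 8000) GB"
```

With these numbers the two-drive B+C stage still offers only about 3000 GB, so roughly 2.7 TB of data leaves little slack until Drive D has joined the pool, whichever of options B or C is chosen.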

Comments

Option A

While this option seems straightforward, it would leave the pool with a single device at the end of the first step, which implies a conversion from Raid1 to single that is time-consuming and I/O-intensive (if I’m correct, at least). It would also require converting back from single to Raid1 at the end of step 2, adding even more time and wear on the drives. I thus consider option A the least favorable.
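For reference, the two conversions mentioned above would look roughly like this on the command line. This is only a sketch that prints the commands instead of running them: the pool path and the /dev/sdX, /dev/sdY, /dev/sdZ names are placeholders, the target profiles are an assumption, and Rockstor's webUI may well do things differently.

```shell
# Print (do not run) an approximate command-line equivalent of option A.
# All arguments are placeholders; the chosen profiles are an assumption.
plan_option_a() {
    pool=$1; old=$2; new1=$3; new2=$4
    # 1. Drop redundancy so the pool can live on one device
    #    (-f is needed because metadata redundancy is reduced).
    echo "btrfs balance start -f -dconvert=single -mconvert=single $pool"
    # 2. Remove the ageing drive; its remaining chunks are relocated.
    echo "btrfs device remove $old $pool"
    # 3. Add both new drives.
    echo "btrfs device add $new1 $new2 $pool"
    # 4. Convert data and metadata back to raid1.
    echo "btrfs balance start -dconvert=raid1 -mconvert=raid1 $pool"
}

plan_option_a /mnt2/pool_name /dev/sdX /dev/sdY /dev/sdZ
```

The two balance lines are the conversions that make this option the slowest: the whole 2.7 TB is rewritten once to leave Raid1 and once more to return to it.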

Option B

If my understanding is correct, this procedure would trigger a balance at the end of each step listed, resulting in a total of 3 balances. A big advantage is that no Raid level conversion would be required.

In detail, the procedure would be:

Turn off the machine
Plug in Drive C
Turn on Rockstor and use the “Resize/ReRaid” feature to add Drive C to the Raid1 pool
Monitor progress of the triggered Balance procedure using the “Balance” tab.
Once the balance has completed, use the “Resize/ReRaid” feature to remove Drive A from the pool.
Monitor progress of the triggered Balance procedure using the “Balance” tab.
Once the balance has completed, turn OFF the machine, unplug Drive A, plug in Drive D, turn ON Rockstor, and then use the “Resize/ReRaid” feature to add Drive D to the pool.
Monitor progress of the triggered Balance procedure using the “Balance” tab.
Once the balance has completed, use Rockstor as usual.
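Should the webUI route not work out, the same sequence could be sketched on the command line as follows. The function only prints the commands so they can be reviewed first; the pool path and device names are placeholders, and running a full balance after each add is an assumption that mirrors what the webUI is described as doing above.

```shell
# Print (do not run) an approximate command-line equivalent of option B.
# Pool path and device names are placeholders, not real devices.
plan_option_b() {
    pool=$1; old=$2; new1=$3; new2=$4
    echo "btrfs quota disable $pool"
    # Step 1: add Drive C, then spread existing data onto it.
    echo "btrfs device add $new1 $pool"
    echo "btrfs balance start --full-balance $pool"
    # Step 2: remove Drive A; btrfs moves its data off the drive itself.
    echo "btrfs device remove $old $pool"
    echo "# power off, swap $old for $new2, power on"
    # Step 3: add Drive D and balance once more.
    echo "btrfs device add $new2 $pool"
    echo "btrfs balance start --full-balance $pool"
    # Progress of a running balance can be checked at any time.
    echo "btrfs balance status $pool"
}

plan_option_b /mnt2/pool_name /dev/sdX /dev/sdY /dev/sdZ
```

Note that on the command line the removal relocates the data on its own, so the middle "balance" of the three is really the device removal.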

Option C

Similar to option B, option C would not require a conversion of Raid level. Furthermore, if my understanding is correct, option C would only involve two balances, an advantage over option B. Nevertheless, it could not be done entirely through the webUI, as disk replacement is not yet implemented there (see related Github tracking issue).

Although I still need to test it in a VM, the procedure would be similar to:

Turn off the machine
Plug in Drive C
Turn on Rockstor
Open a remote SSH session and run: btrfs replace start -r <Btrfs-device-id-of-DriveA> /dev/sd<letter-of-DriveC> /mnt2/pool_name.
Monitor status with: btrfs replace status /mnt2/pool_name.
Once completed, resize the filesystem to take advantage of the bigger drive: btrfs fi resize <Btrfs-device-id-of-DriveC>:max /mnt2/pool_name.

Note: here, I’m not sure what disks and pools would look like in the Rockstor webUI… still need to test that one.

Once completed, turn OFF the machine, unplug Drive A, plug in Drive D, turn Rockstor ON, and use the “Resize/ReRaid” feature to add Drive D to the pool.
Monitor progress of the triggered Balance procedure using the “Balance” tab.
Once the balance has completed, use Rockstor as usual.
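The replace and resize commands above need a btrfs device id. As a minimal sketch, the id can be pulled out of the output of btrfs filesystem show; the sample output, the helper names and the /dev/sdW to /dev/sdZ device names below are made up for illustration, and the plan is printed rather than executed. As far as I know the new drive keeps the device id of the drive it replaced, which is why the same id is reused for the resize.

```shell
# Read `btrfs filesystem show` output on stdin and print the devid
# belonging to the device path given as the first argument.
devid_of() {
    awk -v dev="$1" '$1 == "devid" && $NF == dev { print $2 }'
}

# Made-up sample output, standing in for:
#   btrfs filesystem show /mnt2/pool_name
sample_show() {
    cat <<'EOF'
Label: 'pool_name'  uuid: 00000000-0000-0000-0000-000000000000
    Total devices 2 FS bytes used 2.70TiB
    devid    1 size 2.73TiB used 2.71TiB path /dev/sdX
    devid    2 size 2.73TiB used 2.71TiB path /dev/sdW
EOF
}

# Print (do not run) the option C sequence for a given source devid.
plan_option_c() {
    pool=$1; old_devid=$2; new1=$3; new2=$4
    echo "btrfs quota disable $pool"
    echo "btrfs replace start -r $old_devid $new1 $pool"
    echo "btrfs replace status $pool"
    # Assumption: the replacement inherits the devid of the old drive.
    echo "btrfs filesystem resize $old_devid:max $pool"
    echo "# power off, unplug the old drive, plug in $new2, power on"
    echo "btrfs device add $new2 $pool"
    echo "btrfs balance start --full-balance $pool"
}

id=$(sample_show | devid_of /dev/sdX)
plan_option_c /mnt2/pool_name "$id" /dev/sdY /dev/sdZ
```

On a real system the lookup would be fed from the live command instead of the sample, for example: btrfs filesystem show /mnt2/pool_name | devid_of /dev/sdX.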

Overall

Between options B and C, it seems to me that the biggest difference lies in how efficient the add_C+remove_A procedure is compared to replace_A_with_C. I haven’t tested this yet, so it remains an open question for me.

On one hand, the Github issue linked above reads:

N.B. it is generally considered to be a longer process to use replace rather than:
“btrfs dev add” and then “btrfs dev delete”, it might make sense to suggest this course of action in the same UI.

On the other hand, the following forum post, written a year later, reads:

I think that the general opinion is that a ‘btrfs replace’ is the more preferred, read efficient, method to btrfs dev add, btrfs dev delete, or the other way around.

This is further supported by the Btrfs wiki, which reads:
https://btrfs.wiki.kernel.org/index.php/Using_Btrfs_with_Multiple_Devices

Replacing failed devices

Using btrfs replace

When you have a device that’s in the process of failing or has failed in a RAID array you should use the btrfs replace command rather than adding a new device and removing the failed one. This is a newer technique that worked for me when adding and deleting devices didn’t however it may be helpful to consult the mailing list of irc channel before attempting recovery.

Interpretations

Based on the information above, the btrfs replace route (option C) now seems to be the preferred method from an efficiency perspective, but I’m unsure how Rockstor would “deal” with it. Thanks to recent improvements in drive removal and pool attribution, however, I believe I should be able to remove from the webUI any disk that has been physically unplugged and is detected as detached without too much trouble.

If the gain in efficiency over option B is not that substantial, though, it may not be worth the additional “hassle”.

As mentioned, I still plan on testing options B and C in a VM and comparing the overall time needed to complete each, but I would appreciate hearing from anyone with recent experience of similar operations.

In advance, thanks!

StephenBrown2 · June 25, 2020, 8:02pm

As I am experiencing some write errors on one of my disks as well, I’ll be going through the same or a similar process soon. My plan is currently closest to option B, as I don’t have any SATA ports free and it’s the option available to me in the Web-UI. Since you have a free port, I’d recommend going with option C: Rockstor doesn’t really care what happens as long as btrfs can see the pool, so it should be good to go, especially since you’d be doing the final balance from the Web-UI.

I’m not a maintainer though, so I am interested in what others recommend, and why.

Hooverdan · June 27, 2020, 5:02pm

@Flox, first of all, good luck

based on the thread earlier in the year where I replaced all of my drives, @phillxnet recommended to add one disk first (if I had a port to spare), so that there is enough room for balancing activities. As I was on RAID5/6 that was more important than for other RAID configurations.
You mentioned early in this post that your pool is pretty full (2.7TB), so I am wondering whether there is a risk of running out of space during balancing as well. Here’s the caveat that I got from Phil back then

It sounds to me that you would be better off with Option B or C. I essentially did option B for my “all drive” replacement because I didn’t want to fiddle with the “replace” option, but I also had plenty of time (but also bigger drives to replace). At that time (and as you can see from that thread I quoted from) I was already on a “bleeding edge” kernel as well as btrfs tools … but still didn’t trust the replace option (not out of any experience or major testimonials, it just sounded iffy especially for RAID5/6, which you’re not using).

Hooverdan · February 28, 2025, 6:54pm

@Flox, I came across this when double-checking my approach to replacing a suddenly failing disk. I am wondering whether you ever tested all the options listed above, or just picked one and went with it?

My current scenario is like @StephenBrown2’s, device decided to be detached and after a reboot showed massive errors. On a RAID10c3, I am currently running the

 btrfs replace start -r /dev/sde /dev/sdf /mnt2/<Pool Name>

action (using the command line) of a 10TB drive with a 14TB. (Effective usage of the drive shows as 7.9TB out of 9.1TB usable). It looks like this will take about 28 hours give or take.

Next steps will be to replace all remaining drives ($$$s) with 14TBs …

Flox · February 28, 2025, 7:19pm

Sorry one of your drives failed… but I’m glad you caught it before it got too bad.
Unfortunately, I did not have the time to test that back then, and I still have not tested it since… I’m ashamed to admit it.

I believe you have more experience in replacing drives than I do, actually.

Sorry I wasn’t able to help,

Hooverdan · March 1, 2025, 11:54pm

well, it took just under 23 hours to run the replace. Now, it seems a drive from a similar batch (same date of expired warranty) just decided to throw IO errors, so I’m going to replace that one also.
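As a rough sanity check on these durations, the average copy rate can be worked out from the figures quoted earlier in the thread. This sketch assumes the reported 7.9TB of used space is really 7.9 TiB (a 10TB drive showing 9.1 usable suggests binary units) and that the whole used amount was copied; the function name is made up for the example.

```shell
# Average throughput in MiB/s, rounded down, for copying (tenths_tib / 10) TiB
# in the given number of hours. Integer shell arithmetic only.
avg_mib_per_s() {
    tenths_tib=$1; hours=$2
    mib=$((tenths_tib * 1048576 / 10))   # 1 TiB = 1048576 MiB
    echo $((mib / (hours * 3600)))
}

echo "initial estimate, 28 h: $(avg_mib_per_s 79 28) MiB/s"
echo "actual run, 23 h:       $(avg_mib_per_s 79 23) MiB/s"
```

Under those assumptions the replace averaged around 100 MiB/s, against roughly 82 MiB/s implied by the initial 28-hour estimate.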

In any case, that was substantially faster than when I did the add/remove last time around, and that was going from 4 to 10TB per drive. Each add/remove probably took 2 days+. I suspect the improvement over the experience from 5 years ago is probably a combo of a better btrfs version/kernel and faster drives, since the rest of the server hardware has remained unchanged since then.

Update:
the second disk replacement was run without the -r option, and it completed in under 19 hours (since the disk to be replaced had a very small number of I/O errors reported, I took the risk to use it also as a source disk).