I’ve had a drive in my raid1 pool looking like it might be on the way out, so I bought a matching replacement but hadn’t put it in yet.
Last night my Rockstor box crashed, and when I plugged a monitor in it was reporting read/write errors on /dev/sdb. I rebooted this morning and it seemed OK, but I felt it was time to swap that drive. I have been following the http://rockstor.com/docs/data_loss.html guide, including a reboot because my hardware doesn’t allow hot swapping.
It’s a fresh drive, so I didn’t wipe it. I’m up to:
In my case, the removed drive was /dev/sdb (the new one has the same letter) and the existing one from the pool is /dev/sda, the opposite of the docs. So I typed:
btrfs replace start /dev/sdb /dev/sda /mnt2/Red2x3
/dev/sda appears to contain an existing filesystem (btrfs).
ERROR: use the -f option to force overwrite of /dev/sda
Naturally I’m hesitant to force this, as I don’t want it to remove data from the good drive. Can I get a little guidance? The btrfs manual online says “start srcdev targetdev path”. I would have thought srcdev would be the good drive with the data, but that’s the opposite of the Rockstor docs.
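For reference, the fuller synopsis from the man page (as best I can tell; the exact option letters may vary by version) is:
btrfs replace start [-Bfr] <srcdev>|<devid> <targetdev> <path>
so it looks like the source can be given either as a device path or as a devid.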
I also didn’t do anything like a balance prior to removing the dying drive. Should I be worried that some data could be lost, or should the good drive have been kept safe on its own?
First, you need to get the ID of the failed disk (basically, its number within the pool). You could use the device name instead, but I believe that would be a bad idea when the old and new drives end up with the same letter.
# btrfs fi show /mnt2/Red2x3
Label: none uuid: 67b4821f-16e0-436d-b521-e4ab2c7d3ab7
Total devices 6 FS bytes used 5.47TiB
devid 1 size 1.81TiB used 1.71TiB path /dev/sda3
devid 2 size 1.81TiB used 1.71TiB path /dev/sdb3
devid 3 size 1.82TiB used 1.72TiB path /dev/sdc1
devid 4 size 1.82TiB used 1.72TiB path /dev/sdd1
devid 5 size 2.73TiB used 2.62TiB path /dev/sde1
*** Some devices missing
In the above example (shamelessly ganked from the BTRFS wiki) you can see that the filesystem has 6 devices, one of which is missing. We can see devids 1-5, so the missing one is 6.
Now we’re ready to start the replacement:
# btrfs replace start 6 /dev/sdb /mnt2/Red2x3
You can monitor the status with:
# btrfs replace status /mnt2/Red2x3
I suggest opening another shell to also monitor for damaged files during the replacement.
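Something along these lines should do the trick, assuming your dmesg supports follow mode (-w); if it doesn’t, re-running plain dmesg every so often works too:
# dmesg -w | grep -i btrfs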
I missed the part about what the devid actually is when I was reading up.
I did a btrfs fi show before removing the disk, since the docs said to take note of the device IDs.
# btrfs fi show
Label: 'Red2x3' uuid: 00c9d8d3-3a53-47b4-b73c-b216ebeec87d
Total devices 2 FS bytes used 843.30GiB
devid 1 size 2.73TiB used 859.06GiB path /dev/sda
devid 2 size 2.73TiB used 859.06GiB path /dev/sdb
Now the output is as expected from your post.
# btrfs fi show
Label: 'Red2x3' uuid: 00c9d8d3-3a53-47b4-b73c-b216ebeec87d
Total devices 2 FS bytes used 843.30GiB
devid 1 size 2.73TiB used 861.06GiB path /dev/sda
*** Some devices missing
Thanks very much. The process seems to be working, and I have your dmesg suggestion running in another screen instance. I’m away from home for the next day, so it can take its time (1.6% done so far).
I would like to revive this topic, as my current use case is similar to the one described. I have a pool defined as “single” for data (and RAID1 for metadata) and would like to replace a drive that is about to fail with a different one of larger size. Would it be enough to execute btrfs replace start ID /dev/new/drive /mount/point and let btrfs take care of the resize, or would I need to perform some additional steps?
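To make it concrete, the rough sequence I have in mind looks like this (the devid, device name and mount point are just placeholders for my setup):
# btrfs replace start 3 /dev/sdg /mnt2/mypool
and then, if the extra space is not picked up automatically, presumably something like:
# btrfs fi resize 3:max /mnt2/mypool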
Hi,
I likely won’t have the answer for you, but could you mention the current version of Rockstor you’re running and what your pools look like? (You mentioned a similar scenario, but I’m assuming it’s not exactly the same.)
Well, it is quite simple actually.
I have a pool of 4x4TB and 2x2TB drives.
The pool config is data=single, metadata=raid1, system=raid1
The reason behind it is that I do not care if I lose data, but I do care about knowing when data is corrupted.
One of the 4TB drives is functional but on its way out. I have an 8TB drive and want to use the btrfs replace command to remove the failing drive, insert the new one, and extend the capacity of the pool, preferably without the need to balance after each step.
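Before kicking anything off I would of course double-check the current profile layout and free space with something like this (the mount point is again just a placeholder for my pool):
# btrfs fi usage /mnt2/mypool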