Question about btrfs replace for a new disk

Hi guys,

I’ve had a drive in my raid1 pool looking like it might be on the way out, so I bought a matching replacement but hadn’t put it in yet.

Last night my Rockstor box crashed, and when I plugged a monitor in it was showing read/write errors on /dev/sdb. I rebooted this morning and it seemed OK, but I felt it was time to swap that drive. I have been following the http://rockstor.com/docs/data_loss.html guide, including a reboot because my hardware doesn’t support hot swap.

It’s a fresh drive, so I didn’t wipe it. I’m up to:

btrfs replace start <devid_of_the_failed_drive> /dev/sdb /mnt2/mypool

In my case, the removed drive was /dev/sdb (the new one has the same letter) and the existing drive in the pool is /dev/sda, which is the opposite of the docs. So I typed:

btrfs replace start /dev/sdb /dev/sda /mnt2/mypool

The output was:

btrfs replace start /dev/sdb /dev/sda /mnt2/Red2x3
/dev/sda appears to contain an existing filesystem (btrfs).
ERROR: use the -f option to force overwrite of /dev/sda

Naturally I’m hesitant to force this, as I don’t want it to remove data from the good drive. Can I get a little guidance? The btrfs manual online says “start srcdev targetdev path”. I would have thought srcdev would be the good drive with the data, but this seems to be the opposite of the Rockstor docs.

I also didn’t do anything like a balance prior to removing the dying drive. Should I be worried that some data could be lost, or should the good drive have kept a complete copy?

Thanks in advance

Hi @mattyvau,

I think you’re missing the mark on the drive letter assignment in your command.

The btrfs replace command looks like this:

# btrfs replace start <failed_disk_id> <disk_to_use_as_replacement> <btrfs_mountpoint>

First, you need to get the devid of the failed disk (basically, the disk’s number in the pool). You could point at the device path instead, but I believe that would be a bad idea here since your old and new drives share the same letter.

# btrfs fi show /mnt2/Red2x3
Label: none  uuid: 67b4821f-16e0-436d-b521-e4ab2c7d3ab7
    Total devices 6 FS bytes used 5.47TiB
    devid    1 size 1.81TiB used 1.71TiB path /dev/sda3
    devid    2 size 1.81TiB used 1.71TiB path /dev/sdb3
    devid    3 size 1.82TiB used 1.72TiB path /dev/sdc1
    devid    4 size 1.82TiB used 1.72TiB path /dev/sdd1
    devid    5 size 2.73TiB used 2.62TiB path /dev/sde1
    *** Some devices missing

In the above example (shamelessly ganked from the BTRFS wiki) you can see that the filesystem has 6 devices, one of which is missing. Devids 1 through 5 are listed, so the missing one is devid 6.
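
If you’d rather not eyeball it, one (untested) way to pull out just the devid numbers so the gap stands out, using the same mountpoint as above:

# btrfs filesystem show /mnt2/Red2x3 | grep -oP 'devid\s+\K[0-9]+' | sort -n

Whichever number between 1 and the “Total devices” count is absent from that list is the devid you want.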

Now we’re ready to start the replacement:

# btrfs replace start 6 /dev/sdb /mnt2/Red2x3
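
One caveat, since it’s exactly the error from the original post: if the target disk has ever held a filesystem, btrfs will refuse to overwrite it unless you add -f. Only do that once you’re certain the target really is the new, blank drive and not the one still holding your data:

# btrfs replace start -f 6 /dev/sdb /mnt2/Red2x3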

You can monitor the status with:

# btrfs replace status /mnt2/Red2x3

I suggest opening another shell in order to also monitor for damaged files during the replacement:

# dmesg -wH | grep BTRFS | grep path | grep -oP "^.*path: \K.*?(?=(\)|$))"
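
(For reference: -w follows the kernel log live, -H gives human-readable timestamps, and the final grep -oP prints just the file path portion of each matching BTRFS message, i.e. the text after “path: ” up to a closing parenthesis or the end of the line, so any damaged file shows up by name.)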

Hope this gets you back on your feet.


Thanks @Haioken,

I missed the part about what the devid actually is when I was reading up.

I did a btrfs fi show before removing the disk, since the docs said to take note of the device IDs.

# btrfs fi show
Label: 'Red2x3'  uuid: 00c9d8d3-3a53-47b4-b73c-b216ebeec87d
    Total devices 2 FS bytes used 843.30GiB
    devid    1 size 2.73TiB used 859.06GiB path /dev/sda
    devid    2 size 2.73TiB used 859.06GiB path /dev/sdb

Now the output is as expected from your post.

# btrfs fi show
Label: 'Red2x3'  uuid: 00c9d8d3-3a53-47b4-b73c-b216ebeec87d
    Total devices 2 FS bytes used 843.30GiB
    devid    1 size 2.73TiB used 861.06GiB path /dev/sda
    *** some devices missing

So from this I gather I would need to use:

btrfs replace start 2 /dev/sdb /mnt2/Red2x3

Assuming that your new disk is /dev/sdb, that should be correct.

Note that with only 1 disk to read from, your replacement is probably going to take a while.

Thanks very much. The process seems to be working, and I have your dmesg suggestion running in another screen instance. I’m away from home for the next day so it can take its time - 1.6% done 🙂

Congrats, sounds like it’s on its way back to health!


Marking this to learn from, the troubleshooting here is very useful.

I would like to revive this topic as my current use case is similar to the one described. I have a pool defined as “single” for data (and raid1 for metadata) and would like to replace a drive which is about to fail with a different one of larger size. Would it be enough to execute btrfs replace start ID /dev/new/drive /mount/point and btrfs would take care of the resize, or would I need to perform some additional steps?

Hi,
I likely won’t have the answer for you, but could you mention the current version of Rockstor you’re running and what your pools look like (you mentioned a similar scenario, but I am assuming not exactly the same)?


Well, it is quite simple actually.
I have a pool of 4x4TB and 2x2TB drives.
The pool config is data=single, metadata=raid1, system=raid1.
The reason behind it is that I do not care if I lose data, but I do care to know if the data is corrupted.
One of the 4TB drives is functional but on its way out. I have an 8TB drive and want to use the btrfs replace command to remove the failing drive, insert the new drive, and extend the capacity of the pool. Preferably without the need to balance after each step.
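
For what it’s worth, the general shape of that sequence would look something like this (an untested sketch; the devid placeholders, /dev/sdX and /mnt2/yourpool all stand in for your actual pool). First note the failing drive’s devid, then start and watch the replace:

# btrfs filesystem show /mnt2/yourpool
# btrfs replace start <devid_of_failing_drive> /dev/sdX /mnt2/yourpool
# btrfs replace status /mnt2/yourpool

The replace leaves the new 8TB disk sized at the 4TB it stood in for, so to claim the extra space you then grow that device (btrfs fi show will tell you which devid slot the new disk now occupies):

# btrfs filesystem resize <devid>:max /mnt2/yourpool

With data=single, new writes should simply start using the added space; a balance is only needed if you want existing chunks redistributed.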