Turns out the spare drive has a bad sector or two. Not enough for SMART to say it's bad, but it's causing BTRFS to have a cow.
It's not a big deal as it's an older drive and I have a new 4TB on order. I'm just wondering what the procedure would be to replace it and have Rockstor/BTRFS migrate the data.
To help folks address this thread in isolation, could you copy in the info for the pool concerned, i.e. whether it is the system drive or a member of a multi-drive pool, for example. I know you have other forum threads that likely have this info, but there's no harm in repeating it here, and it helps folks chip in who are more fly-by readers of the forum. The output of the following commands should help inform folks of your btrfs setup:
btrfs fi show
and
btrfs fi usage /mnt2/pool-name-here
In short, if the pool is a multi-device one and its device count, including this suspect device, is at the minimum for its btrfs raid profile, then you can, via the Rockstor Web-UI, either:
Add a new member, then when the resulting balance is complete remove the suspect member.
Reduce the raid level to a point where the suspect device can be removed without resorting to a degraded mount option.
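For reference, a rough sketch of the command-line equivalents of those two Web-UI options; the device names, mount point, and raid levels here are placeholders, not taken from this thread:

```shell
# Option 1: add a new member, balance, then remove the suspect device.
# /dev/sdX (new) and /dev/sdY (suspect) are placeholder device names.
btrfs device add /dev/sdX /mnt2/pool-name-here
btrfs balance start /mnt2/pool-name-here
btrfs device remove /dev/sdY /mnt2/pool-name-here

# Option 2: reduce the raid level first (e.g. raid1 down to single data,
# dup metadata) so the suspect device can then be removed.
btrfs balance start -dconvert=single -mconvert=dup /mnt2/pool-name-here
btrfs device remove /dev/sdY /mnt2/pool-name-here
```

Note the device removal itself triggers an internal data migration off the outgoing drive before it completes.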
In both cases there will be stress applied to the 'suspect' device. A further, command-line-only, option is to use the as-yet-unsupported btrfs option of replace. See the following issue we have open for some notes on this btrfs capability:
As indicated in that issue, we do already have a Web-UI placeholder for this. I'm not entirely certain what the UI will make of this going on underneath it, but once it's completed it should self-correct. It will likely be a little confused while the command-line replace action is in progress.
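As a sketch of that command-line replace, assuming /dev/sdY is the failing device and /dev/sdZ its replacement (both placeholder names):

```shell
# Start the replace; -r reads from other mirrors where possible,
# which eases stress on the failing device (only helps on redundant
# raid profiles).
btrfs replace start -r /dev/sdY /dev/sdZ /mnt2/pool-name-here

# The replace runs in the background; check its progress with:
btrfs replace status /mnt2/pool-name-here
```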
But in a multi-device pool scenario the result is a transparent transition of the outgoing drive's data to the rest of the pool, within the remit of the btrfs raid level at the time. The pool will also remain accessible during this transition, albeit at reduced performance, as it will likely be very busy fulfilling the data migration/re-raid operations.
I was just looking for a general overview and this answered my question.
The reason I just needed the general overview is that the bad drive is the one I am using for testing transfer speeds, and it only contains copies of data that exist elsewhere. I also set up a container running BackupPC to back up my non-cloud-based systems, and I'm using that drive as the storage for that test.
So I can lose it and there is no issue. I just thought it would be a good chance to work with and test the replacement process in the UI, if there was one.
In this case it is a single disk pool.
nas:~ # btrfs fi show
Label: 'plat-cloud-backup'  uuid: 156482f2-05b9-46c7-8b0e-98cd72370e52
        Total devices 1 FS bytes used 613.12GiB
        devid    1 size 2.73TiB used 616.06GiB path /dev/sdd
nas:~ # btrfs fi usage /mnt2/plat-cloud-backup/
Overall:
    Device size:                   2.73TiB
    Device allocated:            616.06GiB
    Device unallocated:            2.13TiB
    Device missing:                  0.00B
    Used:                        615.06GiB
    Free (estimated):              2.13TiB    (min: 1.06TiB)
    Data ratio:                       1.00
    Metadata ratio:                   2.00
    Global reserve:              512.00MiB    (used: 0.00B)

Data,single: Size:612.00GiB, Used:611.19GiB
    /dev/sdd      612.00GiB

Metadata,DUP: Size:2.00GiB, Used:1.93GiB
    /dev/sdd        4.00GiB

System,DUP: Size:32.00MiB, Used:112.00KiB
    /dev/sdd       64.00MiB

Unallocated:
    /dev/sdd        2.13TiB
OK, so especially given the data is not critical, you could simply add a disk to this pool, maintaining its single btrfs raid status (dup metadata), and then, once that has finished and you have a two-disk single btrfs raid level pool, remove the suspect device from that pool. Thereby, somewhat long-windedly, you maintain the pool arrangement/configuration while transitioning its substrate: the backing device, in this case. This would also give you a feel for how Rockstor represents the interim stages. It is slightly risky, as you have a known failing disk and are not simultaneously moving to a redundant raid scenario, but still doable in the context of throw-away data. And hopefully there will be no new occurrences of bad sectors that might otherwise kill a pool in a non-redundant raid arrangement. Options are necessarily reduced when working at the lowest device-count limit, so it is always best to aim for at least a +1 device count above the btrfs raid lower device-count limit, as you then circumvent the need to address both redundancy and raid level. But a single disk can only be (non-degraded) at the btrfs single raid level anyway.
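Sketched as commands for this particular pool, using the mount point and suspect device (/dev/sdd) from the output above; the new device name /dev/sde is an assumption:

```shell
# Add the new drive to the existing single-profile pool.
btrfs device add /dev/sde /mnt2/plat-cloud-backup

# Optional: spread the existing data across both devices.
btrfs balance start /mnt2/plat-cloud-backup

# Then remove the suspect drive; btrfs migrates its data off first,
# so this step can take a while.
btrfs device remove /dev/sdd /mnt2/plat-cloud-backup
```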
Yes, 'errors' can cover a variety of situations; some will be blockers and others not. But the balance enacted on adding a device is not critical to its addition to the pool, in an add-then-remove scenario. It's always tricky if there are errors, given they can affect the pool in so many ways.
Thanks for sharing your findings and your eventual workaround; much appreciated, and it can serve as a guide for others facing similar dead ends re options within the Web-UI.
Definitely. Fancy putting in a pull request? Unfortunately it's a non-trivial feature that requires extensive testing across all the various drive/btrfs raid level permutations, so it will take quite some attention to get sorted. However, from your report it looks like the Web-UI ended up sorting itself out in the end, post the replace command-line intervention. So that's good to know.
Glad you've got this sorted, and thanks again for the feedback. We are on shifting sands with all our dependencies: btrfs is constantly being updated, and poorly pools are very difficult to predict re capabilities. The balance enacted post Web-UI drive addition is non-critical; if it fails it will generally just mean the data is not spread out evenly. But the 'internal' balance enacted on a drive removal is quite a different animal: it is auto-initiated by the filesystem itself (with drive addition we just run a balance afterwards, as that is what most folks expect). This integrated drive-removal balance also doesn't show up in btrfs balance status commands; we infer its function to try and present it similarly within the Web-UI.
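As an illustration of that difference (mount point is a placeholder): while a device removal is in progress, the balance status command reports nothing, so the migration has to be inferred from per-device usage instead:

```shell
# During an in-progress 'btrfs device remove' this typically reports
# "No balance found" even though data is actively being migrated:
btrfs balance status /mnt2/pool-name-here

# Whereas the allocated space on the outgoing device can be watched
# shrinking towards zero as its data is moved to the other members:
btrfs device usage /mnt2/pool-name-here
```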
Re the re-size required after replace: yes, that's a regular question on the btrfs mailing list. It is known behaviour currently, and when we do implement a replace function we should either just do this automatically, or add a tick-box to ask for it, or something.
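For reference, the post-replace resize in question; the devid (1) and mount point here are assumptions based on the pool output earlier in the thread:

```shell
# After replacing onto a larger drive, the filesystem still reports the
# old size until the new device is grown to its maximum capacity:
btrfs filesystem resize 1:max /mnt2/plat-cloud-backup
```

The `devid:` prefix matters on multi-device pools, where a bare `max` would only resize devid 1 by default.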
@phillxnet
I just opened a bug report on the git repo. It seems that when the drive is replaced and the errors cleared, it does not clear the Web interface DEV Error. So it looks like the NAS still has a bad pool after it is fixed.
As to putting in a pull request, I'm not a programmer. I work as a Sr Cloud Infrastructure Engineer. While I can program in a small handful of languages, it's just hack coding to get the job done. lol
Note: Part of the reason I am posting stuff like this is to document it so others who run into the issue can find a quicker solution to the problem. Just helping with the documentation so to speak.
Excellent and much appreciated. I've responded to the issue on GitHub, and it's hopefully influenced the creation of a couple of others. All good stuff, and thanks for your diligence in all of this. The docs (like all other bits) are constantly in need of attention, but we have had quite the community contribution there, so that's fantastic. But, as always, there is always much to do.
Bit by bit, and thanks again for sharing your findings and workarounds.
I have yet to successfully replace a 'DEAD' drive in either a Raid-0 or Raid-10 configuration. My only resort has been to reinstall Rockstor, import the pools I want to keep, then recreate the bad pool with the replaced drive. It works, but it takes hours to copy from backups.
Would like to see some work on this!
And thanks for the command line method, I may try that next time.