Pool not working after HDD crash

One of the 4 devices in my raid5 pool died (hardware failure) and isn’t even recognised in the BIOS anymore.

So I removed the entry from the disk screen in the web-UI.
But the pool doesn’t “start” or mount. Here is some output:

[root@vm-nas videos]# btrfs fi show
Label: 'rockstor_rockstor'  uuid: c1ebf744-1014-4ec6-9ba7-691ff20897ad
        Total devices 1 FS bytes used 1.75GiB
        devid    1 size 13.91GiB used 5.04GiB path /dev/sda3

warning, device 3 is missing
checksum verify failed on 207500640256 found 0F5FF06C wanted 0BD2F649
checksum verify failed on 207500640256 found 0F5FF06C wanted 0BD2F649
bytenr mismatch, want=207500640256, have=9210045573432974129
ERROR: cannot read chunk root
Label: 'storage'  uuid: 0af818cf-40ea-4770-b396-e041369b094d
        Total devices 4 FS bytes used 895.86GiB
        devid    1 size 1.82TiB used 310.38GiB path /dev/sdb
        devid    2 size 1.82TiB used 310.38GiB path /dev/sdc
        devid    4 size 1.82TiB used 310.38GiB path /dev/sdd
        *** Some devices missing


So devid 3 is missing. But how do I remove it and get the raid running again? Or is there no chance and is my data gone?
3 of the 4 devices are still working as expected…

sda = boot device (not part of “storage” pool)
sdb-sde = storage pool … one of them is now in hardware heaven

one try:
btrfs device delete missing /mnt2/bilder/ results in:
ERROR: error removing device 'missing': no missing devices found to remove

Please help, as there are some important files there (yeah, I know, backups … but mine is some days old).

@skeal Welcome to the Rockstor community.

Sorry to hear of your drive passing on. I’m afraid I can’t go through this with you right now as it’s rather poor timing for me (as it is for you, by the sounds of it), but I believe there is a known issue / quirk with raid5/6 when replacing devices, and it may be that this affects your setup. I believe the recommendation for raid5/6 is to first add an additional drive and then remove the missing one. This procedure is described in the Rockstor official docs, specifically in Data loss Prevention and Recovery in RAID5/6 Pools. But take particular care: the first degraded mount may be the only one where you get write access; subsequent mounts may be read-only, which leaves you with no option to repair, only to retrieve the data, and consequently you would thereafter have to rebuild the array and then restore the data.
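
For what it’s worth, here is a minimal sketch of that add-then-remove sequence, assuming the pool is mounted degraded at /mnt2/storage (the usual Rockstor mount point for a pool named ‘storage’) and that the new drive shows up as /dev/sdf; both the mount point and the device name are placeholders, so adjust them to your system:

mount -o degraded /dev/sdb /mnt2/storage     # first (possibly only) writable degraded mount
btrfs device add /dev/sdf /mnt2/storage      # add the replacement drive first
btrfs device delete missing /mnt2/storage    # then drop the missing devid; btrfs rebuilds onto the remaining drives
btrfs fi show                                # confirm the pool no longer reports "Some devices missing"

The ‘device delete missing’ step is the one that rewrites the lost drive’s data, so expect it to take a long time, and it needs that read-write degraded mount mentioned above.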

Also note that btrfs parity raid levels (raid5 and raid6) have recently been re-classified as not production ready. So if you do end up having to rebuild I’d go for raid1 or raid10.

Hope it works out and maybe someone else on the forum can jump in and advise as I know there are other forum members who have successfully repaired their raid 5/6 pools.

When I run into things like this I usually check the btrfs wiki FAQ and work from there. They have a section on replacing failed drives: Using Btrfs with Multiple Devices - btrfs Wiki. Although I won’t claim to fully understand it, I will mention that there appears to be a pretty nasty problem found somewhat recently with BTRFS RAID 5/6 setups, where replacing a drive could do unpleasant things to your data checksums. Someone with more knowledge may be able to chime in. Here’s a link to the Rockstor forum thread that may have some more information as well: Btrfs Raid 5/6 warning


So you mean classic Raid 1, where 4 x 2TB drives result in 2 TB of space? That’s not an option :slight_smile:

Raid 10 would also result in 2TB … That’s not the right way for a private setup, I think. Or I do 2 pools with 2 x Raid 1, resulting in 4TB … also not nice, but a little better.

@skeal

I agree, however btrfs raid1 is actually 2-way, not n-way like mdadm raid. There are plans to expand its capability to specify the number of copies, but given it’s currently only 2-way, you would get roughly half of the total disk space for storage.
So (4 x 2 TB drives) / 2 gives you 4 TB of usable space, and in a single pool.

There is a section on Redundancy profiles, btrfs raid levels, in the Rockstor official documentation on the Pools page.

Hope that helps. Also note that btrfs can convert a pool’s raid level ‘on the fly’, though this often takes a long time depending on the raid levels involved and the amount of data stored.
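
As a rough illustration of such a conversion (assuming the pool is mounted at /mnt2/storage; substitute your own mount point), it would look something like:

btrfs balance start -dconvert=raid1 -mconvert=raid1 /mnt2/storage   # rewrite data and metadata chunks as raid1
btrfs balance status /mnt2/storage                                  # progress can be checked at any time

The balance rewrites every chunk in the new profile, which is why it can take many hours on a pool holding a lot of data.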

Ok, thank you. I’ve ordered 2 new 3 TB NAS HDDs. The other drives aren’t fresh anymore either… so better to use the new ones and keep the others for a less important pool, like tv-recordings etc.

Then I will also try to restore the pool with a replacement drive as described in the documentation… Hopefully it works :slight_smile:

I generally use RAID10; I wouldn’t use RAID5/6 without a UPS that can gently power the machine off. The risk with RAID5/6 is power-off events, which is the reason RAID cards have a battery-backed cache. If you don’t have a UPS that can gently power off the unit, I would NOT use RAID5/6.

RAID10 with 4 x 3TB drives should give you 6TB (I have this in my system). I also have 6 x 2TB drives in the other pool I use.

I do have a UPS, but after switching to the new OS I sadly haven’t integrated it yet.

So then I’ll use Raid1. When using 2x3TB and 2x2TB, how much space is available?

Tomorrow the new drives arrive and I’ll try to recover files… Is there a chance to get at the data by mounting a disk directly? That would be enough; no need for a working pool anymore then.

And … I know, read the f*** manual, but… I don’t get it:
I had sda(n) for the basic Rockstor system and sdb-sde for my storage pool. One of these is now gone due to the hardware error. How do I then do https://btrfs.wiki.kernel.org/index.php/Using_Btrfs_with_Multiple_Devices#Replacing_failed_devices correctly with the new drive?

mount -o degraded /dev/sdb /mnt2
btrfs replace start 3 /dev/sde /mnt2
btrfs replace status /mnt2

like this?