No notification on disk failure

Hi there,

As I am evaluating Rockstor and currently testing the basic features, I did the following:

  • created a RAID5 on two SATA disks (pool, share and Samba share)
  • copied data onto it
  • now I simulate a disk failure: I unplug disk 2
  • data is still accessible - good!
  • /var/log/messages says what it should say:

BTRFS: bdev /dev/sdb errs: wr 10, rd 0, flush 0, corrupt 0, gen 0
BTRFS: bdev /dev/sdb errs: wr 11, rd 0, flush 0, corrupt 0, gen 0
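
Those counters can also be queried on demand with “btrfs device stats”. Here is a minimal sketch of how one could poll them and flag non-zero values (my own little test script, not Rockstor code; the mount point /mnt2/mypool is just an example):

import subprocess

def check_pool_errors(mountpoint):
    # run "btrfs device stats <mountpoint>" and collect any non-zero counters
    out = subprocess.run(["btrfs", "device", "stats", mountpoint],
                         capture_output=True, text=True, check=True).stdout
    errors = {}
    for line in out.splitlines():
        # lines look like: "[/dev/sdb].write_io_errs   10"
        if not line.strip():
            continue
        counter, value = line.rsplit(None, 1)
        if int(value) != 0:
            errors[counter] = int(value)
    return errors

if __name__ == "__main__":
    errs = check_pool_errors("/mnt2/mypool")
    if errs:
        print("WARNING: btrfs device errors detected:", errs)

Run from cron, that would at least give a crude alert.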

An email is sent 10 minutes later with a warning from smartd: “Device: /dev/sdb [SAT], unable to open device”

But: I get no notification via the Dashboard or an instant email about the btrfs errors.
Only when I go to Storage > Disks do I see a trash bin icon with the tooltip “Disk is unusable because it is offline.”

I guess it should normally display a warning if there is something wrong with a disk. Am I missing something, or is it a bug?

Regards,
maxhq


Did you really mean RAID5? That should have failed to create in the first place, seeing as RAID5 requires a minimum of 3 disks…

It doesn’t matter for btrfs.

“Note that the minimum number of devices required for RAID5 is 2. In case of a 2 device RAID5 filesystem, one device has data and the other has parity data. Similarly, for RAID6, the minimum is 3 devices.” from https://btrfs.wiki.kernel.org/index.php/Using_Btrfs_with_Multiple_Devices#Raid_5_and_Raid6

Interesting, given that when I tried to remove a dead disk (only dead because I did something stupid to it) from a 3-disk RAID5, it complained that the minimum level of redundancy wouldn’t be met.

Was it balanced? (extra words here for spam filter)

Nope, the drive replacement is still ongoing; about 105GB left, I think.

Edit

Also @suman, it would be nice if we could force the GUI to mount a degraded array, since doing it manually doesn’t mount the subvols automatically (I don’t think it does, unless I missed a trick), and there might be some data you want to copy off in the meantime.
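
For the record, the manual route I mean looks roughly like this (just a sketch; /dev/sda, /mnt2/mypool and the share names are placeholders, not paths Rockstor generates):

import os
import subprocess

def mount_degraded(device, pool_mnt, shares):
    # mount the pool itself in degraded mode
    subprocess.run(["mount", "-o", "degraded", device, pool_mnt], check=True)
    # the per-share subvolumes are not mounted automatically, so do it by hand
    for share in shares:
        share_mnt = os.path.join("/mnt2", share)
        os.makedirs(share_mnt, exist_ok=True)
        subprocess.run(["mount", "-o", "degraded,subvol=" + share,
                        device, share_mnt], check=True)

mount_degraded("/dev/sda", "/mnt2/mypool", ["share1", "share2"])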

@Dragon2611 Can you please add two details: 1. Where did it complain about the minimum redundancy level? 2. Was that after a restart or instantly?
Thanks a lot!

It was in the CLI; I don’t remember precisely, so btrfs-progs rather than Rockstor.

I think I accidentally tried to remove the missing disk before adding the replacement; the docs aren’t exactly stellar on rebuilding an array.

I have just done a similar test: 4 disks in a RAID1 mirror config. I pulled the power and data cords from disk 3. I am able to access all of my data and shares, but did not get any notification of disk failure from the GUI. If I go to the Storage > Disks menu I see the trash can with the same tooltip (“Disk is unusable because it is offline.”)

It has only been a few minutes and I have not performed a balance yet. I am waiting for an e-mail or some form of error message to appear before I continue. Is there any kind of fix in the works for this yet?


Yes, the disk failure notification feature is one of the top issues and we’ll be working on it soon. @lakshmipathi_g may find your test useful as he’s looking into using udev/pyudev for this.
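
For anyone curious, the pyudev idea is roughly the following (only a sketch of the approach, not the actual implementation): watch block device events and raise an alert when a disk disappears.

import pyudev

context = pyudev.Context()
monitor = pyudev.Monitor.from_netlink(context)
monitor.filter_by(subsystem="block")

# poll() blocks until the next udev event for a block device
for device in iter(monitor.poll, None):
    if device.action == "remove" and device.device_type == "disk":
        # this is where a dashboard alert / email notification would be triggered
        print("Disk removed:", device.device_node)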

Great @suman!

I waited a few hours and started a balance. The three remaining disks rebuilt the RAID1 in a few hours. The status page now reads 100% and zero errors; however, my pool size has not changed from when I had four disks.

Before: 4x500GB SATA in one pool. Capacity: 931GB usable
Now: 3x500GB SATA in one pool. Capacity: 931GB usable

How do I get the pool/share capacity to update after balancing?

What does a RAID1 with more than 2 disks mean? Doesn’t the capacity make sense, then?

RAID1 with 2+ disks means that you’ll always be on a working RAID1 as long as total disks - failed disks > 1 (so with total disks = 4 and 2 simultaneously failed disks you’re still on a good RAID1; with total disks = 2 and 1 failed disk… bye bye RAID1).
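
Spelled out literally (just illustrating the condition above, nothing more):

def still_a_working_raid1(total_disks, failed_disks):
    # the rule of thumb from this post: more than one disk must remain
    return total_disks - failed_disks > 1

print(still_a_working_raid1(4, 2))  # True  -> still on a good RAID1
print(still_a_working_raid1(2, 1))  # False -> bye bye RAID1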

Flyer

On RAID1 you can have 2 disks and 1 failure and still be good. On RAID5/6 things are a bit different.

Edit: Hot-spare and hot-swap support is in the works, so a bit of patience; I think it will get here. Patches have been submitted on the mailing list, so now it is down to bugfixing etc.

Oooh, hot-spare/hot-swap should REALLY help a lot. According to the memtest calculator it really reduces the risk inherent in RAID5/6 rebuilds (other than the RAID write hole).


Here is the link to the mailing list post describing hot-spares etc.:
http://www.spinics.net/lists/linux-btrfs/msg48916.html


Just subscribed to stable updates (we’ve got to support the project, right?).

Started testing in a VBox VM with one boot vdisk and two data vdisks and saw the same behavior on RAID1. I hope the notification feature will be ready soon, as this is important.


Really appreciate your support! This is an important feature indeed. Hope to put it behind us soon.


Hey guys,

I’ve been away from the project for about a year now and I am back considering Rockstor again for a new deployment. The missing disk failure notification feature was one of the reasons I didn’t use Rockstor last time around.

Can anyone update me on this now? Does the current release support disk failure detection & notification?