Raid5/6 stability

g6094199 · April 28, 2017, 5:11pm

As I follow the btrfs kernel mailing list on a daily basis, I can say this: don’t trust raid5/6! And you shouldn’t for some kernel releases to come. Why? Because there are still serious bugs that haven’t even been located yet! Some bugs have been located and fixed, but the fixes are NOT in for-next or mainline! Only a few have been located, fixed and accepted for 4.11 and/or for-next.

Liu Bo (Oracle) and Qu Wenruo (Fujitsu) deserve the most mention here, imho. They are putting a lot of effort into making raid5/6 stable. With 4.11 we will have a working scrub and detection (and auto-correction, as far as I remember) of damaged data on file open.
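If you want to check whether a box is already on 4.11 or later before relying on scrub, something like the following could work. This is only a minimal sketch: the function names are made up for illustration, and it assumes the release string starts with "major.minor" (as in "4.11.0-1.el7.elrepo.x86_64"). It only compares version numbers, so it says nothing about distro backports or about whether raid5/6 is actually safe.

```python
import platform
import re


def parse_kernel_release(release):
    """Return (major, minor) from a release string like '4.11.0-1.el7.elrepo.x86_64'.

    Hypothetical helper: assumes the string starts with 'major.minor'.
    """
    match = re.match(r"(\d+)\.(\d+)", release)
    if match is None:
        raise ValueError("unrecognised kernel release: %r" % release)
    return int(match.group(1)), int(match.group(2))


def is_at_least_4_11(release):
    """True if the kernel version number is 4.11 or newer.

    This is a plain version comparison, not a check for specific patches.
    """
    return parse_kernel_release(release) >= (4, 11)


if __name__ == "__main__":
    running = platform.release()  # same string as `uname -r` on Linux
    try:
        if is_at_least_4_11(running):
            print("%s: 4.11 or newer" % running)
        else:
            print("%s: older than 4.11" % running)
    except ValueError:
        print("%s: could not parse kernel release" % running)
```

Tuples are compared element by element, so (4, 9) correctly sorts before (4, 11), which a plain string comparison would get wrong.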

A major problem that I’ve been waiting 1.5 years to see fixed is a working offline scrub (aka “fsck”). I lost 12TB while doing a disk replace: the new disk died during the replace, the famous “can’t open_ctree” error occurred, and the fs was gone. Qu has done a lot of work here and has an offline scrub function in his tree, but it hasn’t been reviewed for some months now. There isn’t much interest from the devs in bringing this into the kernel at the moment. I expect it maybe in 4.14, so don’t hold your breath. Nobody knows what errors will turn up once we can recover the fs… I don’t think this thing will be stable this year or the next.

There is still a discussion about marking raid5/6 as unstable and telling the user so at creation time, which shows the unstable state of the whole code.

But to be positive: there is progress (see and test 4.11)! There are some devs working hard on fixing the problems, but often they are hard to find and fix! Feedback is always welcome! Just join the kernel list and post your problem with the latest mainline! Since Rockstor uses an ELRepo kernel, it’s on the current stable most of the time. Bug reports here will also help!

Personally, I’m using a Rockstor NAS as backup storage at work and as a home NAS. But on production systems I’m running ZFS, which is not as flexible as BTRFS but rock stable. Sadly, I wouldn’t recommend ZFS for consumer usage, even with NAS4Free, because it’s very complex and a lot of work has to be done on the CLI to get it tailored to your needs. It also needs a huge amount of resources, which results in higher hardware and energy costs.