BTRFS RAID Parity beyond RAID5/6

I am going to repost from other threads/forums.

http://blog.ronnyegner-consulting.de/2014/12/10/parity-based-redundancy-raid56triple-parity-and-beyond-on-btrfs-and-mdadm-dec-2014/comment-page-1/#comment-784446

So, just to summarise the above post;
Patches for BTRFS and the Linux kernel to support additional parity, up to six parities.
Patches here-> http://lwn.net/Articles/588106/

This would really let Rockstor shine in the NAS domain with much higher levels of RAID protection. This is especially good as drives approach 10TB and beyond. Otherwise the only real option is RAID1+0, to ensure drive rebuilds don’t become excessively long and potentially dangerous.
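To put a rough number on the rebuild concern, here is a minimal back-of-the-envelope sketch. The 150 MB/s sustained throughput is purely an assumed figure for illustration (it is not from the linked posts), and the helper name `rebuild_hours` is made up. It only gives a best-case floor: a real rebuild on a busy array will be slower.

```python
def rebuild_hours(capacity_tb: float, throughput_mb_s: float) -> float:
    """Best-case hours to stream one whole drive sequentially."""
    if capacity_tb <= 0 or throughput_mb_s <= 0:
        raise ValueError("capacity and throughput must be positive")
    capacity_bytes = capacity_tb * 1e12            # decimal TB, as on drive labels
    seconds = capacity_bytes / (throughput_mb_s * 1e6)
    return seconds / 3600


# ASSUMPTION: 150 MB/s sustained, no other load on the array.
for tb in (2, 4, 10):
    print(f"{tb} TB drive: about {rebuild_hours(tb, 150):.1f} h minimum")
```

Under that assumption a 10TB drive needs roughly 18.5 hours just to be written once end to end, and the array has reduced redundancy for that whole window.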

Designated hot spares would also be useful.

Cheers

@GIDDION Thanks for linking to this thread from the related:-
http://forum.rockstor.com/t/question-about-raid5-6-stability/56

I suggest that future contributors to this thread consider the above linked thread as well, especially given that the linked thread has more history and currently greater engagement; note, however, that it primarily addresses RAID 5/6 and not beyond, as this thread does.

Sorry if this has already been addressed, but as of the end of July 2016 the RAID56 functionality of btrfs is effectively labelled “do not use” (see https://btrfs.wiki.kernel.org/index.php/RAID56, https://www.mail-archive.com/linux-btrfs@vger.kernel.org/msg55161.html)

Moreover, it doesn’t appear that there is any actual effort going into developing this functionality, as the primary development sponsor (Facebook) does not require it for their use case.

The question to the Rockstor devs is, therefore: what is your strategy to provide single-box parity RAID functionality? Perhaps a GUI integration for megacli? Or maybe offer ZFS as an option?

ZFS is an option … just the legality of it is questionable. But it would address a lot of people’s problems. (Though let’s not forget that ZFS raidz (5, 6, and this triple parity thingy) does not allow single disk addition to the pool :smiley: so the “normal people” that can afford to buy one disk every few months are f**** :* good luck !)

Also, 10TB disks are a different animal altogether. Due to the write technology used on those, they don’t write in a single-sector fashion but in “fields”, where updating one part of the data overwrites another part, so an update has to read a massive amount of data (2MB and more !!!) and then write it back in one go.

http://www.anandtech.com/show/7290/seagate-to-ship-5tb-hdd-in-2014-using-shingled-magnetic-recording
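To get a feel for that read-modify-write cost, here is a worst-case sketch. It assumes the “2MB and more” figure above is the size of one shingled band, that updates are band-aligned, and that every update rewrites each band it touches in full. The function name is invented for illustration, and real drives may buffer writes and do better than this.

```python
import math


def smr_write_amplification(update_bytes: int, band_bytes: int) -> float:
    """Bytes physically rewritten per byte logically updated (worst case)."""
    if update_bytes <= 0 or band_bytes <= 0:
        raise ValueError("sizes must be positive")
    # ASSUMPTION: the update is band-aligned and each touched band is rewritten whole.
    bands_touched = math.ceil(update_bytes / band_bytes)
    return bands_touched * band_bytes / update_bytes


BAND = 2 * 1024 * 1024        # 2 MiB band, taken from the "2MB and more" figure
print(smr_write_amplification(4096, BAND))   # one 4 KiB metadata-sized update
print(smr_write_amplification(BAND, BAND))   # a full-band sequential write
```

A small random update pays for the whole band, while a full-band sequential write pays nothing extra, which is why the workload pattern matters so much on these disks.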

So if you have non-sequential IO (gosh, let’s say a metadata update) you will “seek the disk to death”. Seagate doesn’t care because on paper their HDD looks blazing fast; in reality you get punishing seek performance. Of course btrfs’s 30-second in-memory data checkpointing helps, but it is still far from ideal. ZFS has a solution for it with SSD cache, but there are people popping up complaining that a failed SSD destroyed the whole array (not to say that bcache and btrfs are any more stable / reliable … they’re not … I would not use this combo for holding my pron backup :smiley: ). Again FUN !

Those 10TB disks are designed for “cold storage”, where you don’t modify your data at all, just add to it. Btrfs (AND ZFS) was designed for people that actually use their storage & modify stuff - that’s why there are Rock-ons here that work pretty well (FreeNAS has their chroot jails and those deliver too !). Imagine putting an ownCloud on one of those 10TB disks? OC has a MySQL backing store - this pushes data down to disk in a very random pattern, plenty of it will get hidden in extents due to small size … this results in an even more random pattern … FUN !

Somebody did a fantastic comparison on the FreeNAS page which clearly points out how many active developers there are in each project (I think for btrfs it was 2 :smiley: ). So if RAID 5 & 6 is such a deal breaker for you guys … come up with something ! There is a foolproof written tutorial on how to contribute some code (even I could follow it); help to get some of the burning issues out of the way and mash up something that allows simple pool creation in ZFS and we’re golden ! We may even convince all the FreeNAS people to come to us and abandon their project, right? (could not resist some good old sarcasm :slight_smile: )

Hi all,
I was happily coding away on Rockstor issues when, looking at this forum, I found 3 different threads all about the same f…antastic topic:

:see_no_evil: :hear_no_evil: :speak_no_evil: Btrfs RAID 5/6 :see_no_evil: :hear_no_evil: :speak_no_evil:

Some considerations/questions over btrfs & links on page:

  • Does btrfs suffer from the write hole?? Yes, for sure. So let’s go with the old ext4 and build raid5 without panicking… no, I was kidding, ext4 may have write holes too.
  • Btrfs kernel wiki page changes: a big extra red box was added with another alert about unstable raid 5/6 after a parity problem in July 2016 (data switched over disks / possibly recovered-not really recovered data)
  • Have you checked all the mail-archive messages till the end?? (last update on 21 Sep about checksum and parity matters / to be solved in user space and kernel space or user space before kernel, etc etc etc)… so, some cool guys care about it.
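For anyone unsure what the write hole in the first point actually is, here is a toy sketch using plain XOR parity over two data blocks. It is not btrfs or mdadm code, and the block contents and the `xor_blocks` helper are invented for illustration. It only shows why stale parity silently gives back wrong data once a disk is lost.

```python
from functools import reduce


def xor_blocks(*blocks: bytes) -> bytes:
    """XOR equal-length blocks byte by byte (RAID 5 style parity)."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


# A consistent stripe: two data blocks plus their parity.
d0, d1 = b"AAAA", b"BBBB"
parity = xor_blocks(d0, d1)

# While parity is current, a lost d1 can be rebuilt from d0 and parity.
assert xor_blocks(d0, parity) == d1

# Write hole: d0 is rewritten on disk, but power is lost before the
# matching parity update reaches its disk.
d0_new = b"CCCC"
stale_parity = parity

# Later the disk holding d1 fails; the rebuild uses new d0 and stale parity.
rebuilt_d1 = xor_blocks(d0_new, stale_parity)
print(rebuilt_d1 == d1)   # False: the rebuild "succeeds" but returns garbage
```

Note that d1 was never written to, yet it is the block that comes back wrong, which is what makes this failure mode so nasty.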

Last but not least, because I think a lot of people mix up different things:
the first time I explained raid to my mother (a long time ago, maybe I was 19-20) she said “Ohh cool, it’s like having 2 copies of the same book in my bookcase! If one gets lost there’s always another one” (Yep, my mother is an avid reader and accidentally happened to have 2 copies of the same book :sweat_smile:)
My answer : “You’re right, but you’re wrong too: if the bookcase goes on fire you have 0 copies, because they were all in the same place”

What does it mean?? Raid1, Raid5, Raid6, parity or not, Raid is not Backup (I think it’s better to remember it)

Please, if you are interested in btrfs raid 5/6, consider using one single thread

Flyer/Mirko

P.S.: Talking about ZFS cool raid5/6 raidz / raidz2 : hope no one will ever have to enjoy disk resilvering, but if it happens keep calm and have a big5/6 mug of coffee…