Question about RAID5/6 stability

Hello, I have been following Rockstor for a couple of months with great interest.


Now I have purchased a second-hand rackmount server to hold some leftover HDDs, and I want to use Rockstor on it.
My server is an IBM / Xyratex HS-1235e, which is a rebranded Intel SSR212MC2 reference design.
I installed Rockstor on it and aside from a few small issues (for which I will open a new discussion) it is working great.

This server has twelve 3.5" bays and I have ten 2 TB disks. I would like to use RAID 6, but btrfs RAID 5/6 is still marked as unstable.
What are people’s experiences with RAID 5/6 on Rockstor? Is it stable enough for home office use?
Will the RAID survive one or two disk failures?
Or would I be better off using RAID 10 or RAID 1?

Thanks for your interest in Rockstor and welcome to our community! Feel free to participate here and on our GitHub; your feedback and contributions are very important to improving Rockstor.

You are right about the issues with raid5/6. I am looking forward to the 3.19 kernel, where things might improve. But right now we don’t test it much. Sorry, at this point I can’t give you any reassurance with respect to raid5/6.

I use Rockstor extensively, as do some other users. I use and stress-test raid0, raid1 and raid10 for my needs and haven’t lost any data in the last couple of years!

If raid5/6 support does improve in 3.19 (expected in about a month), you may be able to convert your raid1 pools to raid5/6. But, as they say, restrictions apply.
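
If it comes to that, the conversion itself is just a balance with convert filters on the command line. A rough sketch (the mount point below is only an example):

```
# Convert both data and metadata of an existing raid1 pool to raid5;
# /mnt2/mypool is a placeholder for your pool's mount point.
btrfs balance start -dconvert=raid5 -mconvert=raid5 /mnt2/mypool

# Watch progress from another shell:
btrfs balance status /mnt2/mypool
```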

Thanks for your reply.
Raid 1 it is going to be! (once I get the disks unstuck)

Can someone provide an update on the status here?

The BTRFS wiki hasn’t been updated since Feb and it refers to v3.8. I notice the latest Rockstor is running v4.
Have the write hole issues been resolved?

I think my post here is along these lines; it could be combined with this one and possibly moved to features.
http://forum.rockstor.com/t/btrfs-raid-parity-beyond-raid5-6/154/1

Hi Rob

From a feature point of view, RAID 5/6 support is complete, meaning that the long-missing features, drive replacement and scrubbing, were added in kernel 3.19. The latest Rockstor version as of today uses kernel 4.0.2 and thus includes the features above.
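
For anyone who wants to try those two operations by hand, they are plain btrfs-progs commands. A rough sketch (device names and the mount point are made up for illustration):

```
# Replace a failing member device in place (device names are examples):
btrfs replace start /dev/sdc /dev/sdh /mnt2/mypool
btrfs replace status /mnt2/mypool

# Scrub the pool and check the result:
btrfs scrub start /mnt2/mypool
btrfs scrub status /mnt2/mypool
```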

Is it well tested and bulletproof? No, and RAID 5/6 fixes will keep arriving with the next couple of kernel versions.

The RAID 5/6 write hole is not fixed, by the way, as this is a fundamental problem in the way RAID 5/6 is currently implemented.

I personally switched to RAID 5 on my 10 TB NAS for home use. Let’s see how this turns out; so far I don’t have any problems. I have a full backup of my data on another machine and an additional offsite backup of my most crucial data, so it would be just an annoyance if the NAS completely screws up. And I don’t use it for business…

Having a backup is crucial anyway, as RAID is NOT a backup! Software is never fault-free, and other things such as flooding, fire or lightning can completely destroy all data.

Regarding the patch you mentioned: Rockstor does not have the manpower to maintain its own kernel versions, and I personally think they should not apply any patches to the kernel. How can you guarantee a patch does not break anything? To me, this is a no-go.

To sum up, I believe you should ask yourself if you can live with a small chance of data loss. If yes, try RAID 5/6 and give some feedback here in the forum. If data integrity is crucial to you (business data, …), then I’d wait at least another couple of kernel versions and go with RAID 1, as this is considered stable by the BTRFS guys (http://www.phoronix.com/scan.php?page=news_item&px=MTgzMzM). And don’t forget to schedule a regular backup.
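
For the scheduling part, even something as simple as a cron entry pushing the share to another box is a start. A minimal sketch, with the paths and host name made up:

```
# Hypothetical /etc/cron.d/nas-backup entry: nightly rsync of the share to a backup host at 02:30
30 2 * * * root rsync -a --delete /mnt2/mypool/share/ backuphost:/backup/share/
```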


I totally agree here. We ship with the elrepo kernel and that’s as far as we are willing to go in terms of providing a bleeding-edge kernel. Given that BTRFS is still rapidly evolving, it would take a real kernel programmer, or more than one, to maintain a separate fork and keep it up to date. Having said that, I do like this feature and hope for it to be part of BTRFS some day.

I’ve been testing 4.1, which includes a bunch of improvements, for a couple of weeks now, and I’m planning to update Rockstor from 4.0.2 (current) to 4.1 soon. It looks like a lot of good changes (e.g. quota enhancements) are going into 4.2. I am really looking forward to them!

tl;dr: RAID 5/6 definitely still has its problems. Over the past week I had a drive fail in a raid6 pool and it ultimately took out the entire pool. At least with 4.0.2, the implementation still has a hard time dealing with drive failure. Regular scrubs and monitoring your HDD health are vital if you’re going to run raid56 right now. I added a .forward file to the /root directory with my email address, so I now get emails on any errors that trigger mail to root, which happens to include drive errors that make it into dmesg.
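
The .forward part is nothing Rockstor-specific; the file just holds the address that root’s local mail should be forwarded to (the address is obviously a placeholder, and a working MTA that can relay off the box is assumed):

```
# Forward all local mail for root, including those error notifications, to an external address
echo "admin@example.com" > /root/.forward
```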

What I’ve learned, though, is that if you have a drive problem, you should unmount the pool, kick out the drive, then remount degraded and begin the repair process. If you do this, everything seems to work fine. Rockstor does not yet have a way to do this in the GUI; however, it also wouldn’t be necessary with a complete, or at least mature, btrfs raid56 implementation, so I wouldn’t consider it a gap in the product.
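
Roughly, the command-line sequence I mean looks like this; device names and the mount point are illustrative only, and the exact steps can vary with kernel and progs versions:

```
# Unmount the pool, then remount it degraded once the bad drive is out
umount /mnt2/mypool
mount -o degraded /dev/sdb /mnt2/mypool   # any surviving member device works here

# Either drop the missing device and let btrfs rebuild onto the remaining disks...
btrfs device delete missing /mnt2/mypool
# ...or replace it with a fresh disk:
# btrfs replace start <failed-devid> /dev/sdX /mnt2/mypool

# A scrub afterwards verifies checksums across the repaired pool
btrfs scrub start /mnt2/mypool
```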

In my particular case the failing drive was dropping out and then coming back online. The problem is that when it came back online, it was not recognized as being out of sync and requiring a rebuild; it was scooped back up, no questions asked. The inconsistency of the drive was incorporated into the filesystem, with no way for the filesystem to know what was right and what was wrong. This ultimately led to corruption of the entire filesystem and a ctree that could no longer be read and could not be recovered.

@seijirou Thanks for testing and for the thorough explanation. Things are looking noticeably better in 4.1 in general. Rockstor will move to 4.1 soon.

Is this something you can reproduce on demand to test a different kernel?

@suman

I don’t know, but it is my intention to find out. I’m going through a badblocks sweep of 7 drives right now, excluding the one I know spectacularly fails and drops off about 740 GB into a dd read (1.5 TB capacity). I plan on building a new raid6 pool out of these drives and attempting to recreate the problem in a more controlled fashion. I will share whatever I find on the btrfs mailing list and here as well.
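
For reference, the sweep is just a read-only pass per drive; the commands below are one reasonable way to do it, not necessarily exactly what I ran (device name is an example):

```
# Non-destructive read-only surface scan of a single drive
badblocks -sv /dev/sdd

# Or a plain sequential read with dd, watching where it errors out:
dd if=/dev/sdd of=/dev/null bs=1M
```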

I can say that the 4.1.1-1 kernel as well as the 4.1 progs were unable to fix the problem, but it may be that with a 4.1+ kernel and/or progs the problem couldn’t or wouldn’t happen in a way that is then uncorrectable.
If I can make any headway reproducing the problem on 4.0.2 I will document everything I can find, and then start over on 4.1.1-1 and see if it’s different.

I am just migrating all my data to a RAID5 pool and wanted to ask about the status of raid5 on btrfs. I know I should have done this before my migration started, but I am confident anyhow.

I also wanted to ask about raid5 with different disk sizes; any experience? I want to use 3x3TB and one 500GB disk which is available, so why not use it :wink:

I also wanted to ask: wouldn’t this topic be better suited for the BTRFS subforum?

I am running my Rockstor with a 7-disk RAID6 setup.

It has been pretty solid, and it even recovered from a pretty bad crash that happened while I was removing a disk in preparation for replacing it.

My setup is 4x1.5 TiB (1.4 TiB reported by Rockstor) and 3x2 TiB (1.8 TiB), and this gives me 7.85 TiB of available space.
It seems BTRFS just distributes data across the available disks, making sure there’s sufficient parity. What rules it follows I don’t know, but I do tend to see less data being put onto the smaller disks as they start to fill up, especially after running a balance (which, by the way, takes forever).
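
For reference, that balance is the standard command; the usage filter in the second example is an optional way to limit how much gets rewritten and speed things up a bit (the mount point is just an example):

```
# Full balance across the pool (slow on a mostly full RAID6 pool):
btrfs balance start /mnt2/mediapool

# Only rewrite block groups that are less than 50% full, which is usually much quicker:
btrfs balance start -dusage=50 -musage=50 /mnt2/mediapool
```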

My experience is that RAID56 is pretty solid, and all the tools for adding/removing/replacing disks work fine on the command line.
Much of that functionality is not present in the web UI for RAID56 yet; they are probably waiting for the 4.2 kernel release, as it’s supposed to be even better with RAID56 setups.

I feel pretty safe with my RAID6 setup, having already had bad things happen to the pool and having been able to recover from them without losing data.


You are right. Just changed it.


This sounds very promising :smile:
Thanks for that. I am still copying data from disk to disk, which is going to take another night, but after that my raid5 will be fully functional and I am curious to find out :smile:

I just came from FreeNAS and almost did not try Rockstor because of this thread. I wonder how many of the cases where a bad drive took out the whole array involved Seagate 1.5 and 3 TB drives, as I fell victim to them and found out it was drive issues. Here is a survey of drives by the storage company Backblaze, who released their drive failure rates; long story short, I will never buy Seagate again. So, does this mean RAID 5/6 with btrfs is as stable as if I had built it using md on Debian with, say, ext4 or jfs or xfs?

I would not say raid5/6 is that stable; at least the recovery code seems to have some problems/gaps here and there, which may or may not affect your recovery.

For myself, I will wait; for a company I work at, I would wait longer; for data that is backed up, sure, why not.

Hi there!
Just installed Rockstor and was so ready to get going, especially when I saw that I could mix and match my different drive sizes. But then I came across this thread; I guess I was just sort of cruising around the forum. Am I to gather that raid5 recovery is not working at all?

I’ll be jamming in 1x2TB, 2x1TB and 1x500GB to start off with, then add 2x3TB when I get data migrated.

Aside from my worries, looks damn promising :slight_smile:

Is it a good idea to be using different-size drives in a redundant RAID? Or were you thinking of doing RAID 0 first to get all the data into one pool, and then upgrading?

raid56 is working; AFAIK, from this official site from Feb 2015, where it got introduced in 3.19, it should be more or less stable, but it still has the write hole.

Everyone has their own definition of stable; for me it’s not stable enough that I want to use it in “production”. raid0/1/10 are good, though. The long answer says it all :smiley:

felixbrucker: Good answers.

I actually like the pragmatic answer in your link also :smile:

I personally use BTRFS in RAID6 for my media files, and am not all that concerned about stability.
My pool has already recovered from what I think of as pretty serious errors (crashes during resizes, a failing disk and so on) and I haven’t lost any data.
But I do have my most important files backed up on another NAS / in the cloud, so a crash would not be a complete disaster.

Rockstor / BTRFS does not handle recovery all that well on RAID56 yet, so you have to be comfortable reading online documentation and using the command line.