Question about RAID5/6 stability

herbert · September 28, 2015, 7:49am

Hi,

I migrated my system from plain Ubuntu Server to Rockstor, starting a few weeks ago and finishing a few days ago.
My process was far from straightforward; if you read my posts you'll understand.

My recommendation: if you have the disks available for data migration, create the pool with the RAID level you need from the beginning. This will save you a lot of time balancing your disks. At the moment Rockstor is not very good at giving you feedback on the status of a running balance, so you have to use the CLI anyway if you want to stay updated.

If you do not have enough disks to create the pool with the right RAID level and move your data from A to B, this is the process I chose:

I took a single empty 3TB disk and added it to Rockstor, then moved data from another disk onto it. I added the newly cleared disk (another 3TB) to the pool and converted to RAID0 for the disk space. I moved the data from the third 3TB disk to the RAID0 pool and added that disk as well. After adding all my disks and moving all my data to BTRFS on a Rockstor pool, I converted to RAID5.

On my HP NL40 every balance of the data (around 6TB) took between 12 and 24 hours. During this time it is better not to move or add new data to the system, otherwise it takes much longer. Worst case, your disks run full and the balance stops. That happened to me as well, because I started the RAID5 conversion with too little free space available.
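As a rough sanity check on those numbers, the sketch below converts "about 6TB in 12 to 24 hours" into an average throughput. It assumes decimal terabytes and that the whole data set is rewritten exactly once per balance; a real balance also moves metadata and competes with other I/O, so treat the result as an order of magnitude only.

```python
def average_throughput_mb_s(data_tb: float, hours: float) -> float:
    """Average rate in MB/s (decimal units) needed to move data_tb terabytes in `hours`."""
    total_bytes = data_tb * 1e12   # decimal TB, as drive vendors count
    seconds = hours * 3600
    return total_bytes / seconds / 1e6

# Figures from the post above: roughly 6TB balanced in 12 to 24 hours.
for hours in (12, 24):
    rate = average_throughput_mb_s(6, hours)
    print(f"6 TB in {hours} h is about {rate:.0f} MB/s")
```

That works out to roughly 139 MB/s at the fast end and 69 MB/s at the slow end, which is why extra writes during a balance stretch it out so noticeably on a small box.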

suman · November 24, 2015, 3:38pm

Really helpful comment @herbert, thanks! I'd just add that we, the core contributors of the Rockstor project, are mostly occupied with developing features on top of BTRFS, and as a result don't really get to benchmark pool balancing performance and other BTRFS behavior at the moment. So user feedback like this is really useful to everybody in the community!

Dragon2611 · November 26, 2015, 5:04pm

RAID5 on a 4.3.x kernel was able to recover from me accidentally zapping the wrong drive in Proxmox (I clicked Next one too many times during a reinstall and forgot to change it from /dev/sda).

That said, I now back up most stuff to Hubic using Duplicati, just in case I do something stupid like that again.
I might get a second server to run Rockstor on at some point.

Learning2NAS · December 24, 2015, 5:39am

Good news, guys. There is a discussion going on in the btrfs mailing list that indicates RAID 5/6 is stable as of the 4.4 kernel, which is about to be released. Yay!

The bad news, according to that discussion, is that btrfs still doesn’t have a way to monitor file-system health and alert the user to any failures. Frankly, I’m more worried about disk health than file-system health. Should this really be a concern?

seijirou · December 28, 2015, 4:21pm

In that same mailing list only yesterday it came out in a ‘Raid5/6 crashed on scrub’ thread that “[Btrfs] doesn’t yet have a concept of faulty devices, and then I’ve seen it get confused when drives reappear with new drive designations (not uncommon).”

d549e5 · December 28, 2015, 4:37pm

Yeah, we have run into that new drive designation problem a few times now with repurposed disks. Basically the problem is bad enough that the only real solution is to erase and start from scratch. But I do feel that BTRFS has been maturing very quickly over the last few months.

Learning2NAS · December 31, 2015, 1:37am

This situation is sad. When will I be able to use Rockstor as my real NAS? I want to support this software SO BADLY. It has/does everything a NAS should do, but the underlying file system is still not ready. What the heck is everyone doing with Rockstor in the meantime? Demoing the capabilities with data that is backed up elsewhere? I don't have that much spare storage lying around, so I'm precluded from doing that.

d549e5 · December 31, 2015, 9:07am

Well, the system itself works very well. And as they say, RAID is not a backup and neither is BTRFS RAID.

Using pulled disks like we did is asking for problems in any RAID system. You are right that other RAID6 systems would handle the situation better. But using healthy disks in Rockstor still allows you to mix and match drive sizes and use checksumming to guard against bitrot.

And of course you need a backup of your data. Don't have enough space? Use something like CrashPlan. Don't have the bandwidth either? Buy one of those shingled 8TB archive disks and back up to that.

Rockstor is cutting edge storage and yes, BTRFS is not all there yet. But as I wrote, it is maturing very quickly, and the NAS system that Rockstor’s team built on top of it is already very good and getting better all the time.

LFletcher · February 26, 2016, 9:58am

Has there been much progress on the RAID5/6 stability/supportability in recent months?

bdarcus · February 26, 2016, 9:12pm

I'm using it for a home NAS, storing photos (backed up to Amazon Cloud Drive) and serving up music files. I have no problems with it, though I am using RAID10.

Spectre694 · February 27, 2016, 4:04pm

@LFletcher

The RAID 5/6 implementation itself is quite stable. The only problems I know of are that scrubs are still ridiculously slow and free space reporting still doesn't work.

That said, I couldn't use RAID 6 for my main array because of the speed problem. My RAID 10 scrubs at around 1 TiB an hour; the RAID 6 at around 100 GiB an hour.
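To put those rates in perspective for a bigger pool, here is a small sketch that estimates scrub duration. It assumes the quoted rates stay constant across the whole pool, which real scrubs will not do exactly (speed depends on disk count, load and kernel version), and the 100 TiB pool size is a hypothetical figure chosen because later posts in this thread discuss 100TB+ boxes.

```python
GIB_PER_TIB = 1024

def scrub_hours(pool_tib: float, rate_gib_per_hour: float) -> float:
    """Hours to scrub pool_tib TiB at a constant rate given in GiB per hour."""
    return pool_tib * GIB_PER_TIB / rate_gib_per_hour

# Rates quoted in the post above: about 1 TiB/h for RAID 10, 100 GiB/h for RAID 6.
rates = {"RAID 10": 1024.0, "RAID 6": 100.0}
pool_tib = 100  # hypothetical pool size for illustration

for level, rate in rates.items():
    hours = scrub_hours(pool_tib, rate)
    print(f"{level}: {hours:.0f} h (~{hours / 24:.1f} days)")
```

Under these assumptions a 100 TiB RAID 6 pool would take about 1024 hours (roughly six weeks) per scrub versus about 100 hours for RAID 10, which is the gap behind the scale concerns raised further down the thread.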

BTRFS as a whole, though, still doesn't have disk monitoring or a way to drop flaky disks.

LFletcher · March 1, 2016, 12:12pm

Thanks for the response.

I'm looking to run a 100TB+ box, and since I want to maximize space, RAID10 isn't going to be a viable solution.

If scrubs are really slow, and there are issues with free space reporting, drive monitoring, etc., it seems it's still a long way from where I would be happy to use it. Which is a shame.

Spectre694 · March 2, 2016, 5:15pm

No, btrfs is certainly not a solution at that scale yet. At a minimum, btrfs would need hot spares plus a raidz3 equivalent for that, and top-level striping would be desirable too, on top of actually being able to scrub that much data in under a year.

ZFS or a hardware storage appliance are your only realistic options for that scale that I know of at this time.

mojo · March 8, 2016, 11:40pm

Hello,

I'm new to the forum here. Our company has just purchased a 45Drives unit that shipped with Rockstor. I'm going through the threads here to become familiar with things and came across this post. What is the current state of Rockstor with a RAID 5 or 6 configuration at a larger scale, such as 100TB to 250TB? We were planning on three RAID 6 stripes of 15 drives each, so it would be 90TB x 3. RAID 1 or 10 is not an option.

We are also making this our main storage for a while to come, so we need SMART status to keep us accurately informed.

Thanks,

Spectre694 · March 9, 2016, 1:33am

Same as my posts here and here from about a week ago; the only change with the newer 4.4.3 kernel is slightly faster RAID 10 scrubs. Those original posts were made with the 4.3.3 kernel.

BTRFS (the underlying FS) is still not ready for that kind of scale. As much as I like Rockstor (admittedly it is still growing too, though quite well and quickly), until BTRFS improves you should be looking at ZFS on FreeNAS, since you already have the hardware. Hot spares and alternate/nested RAID levels should also be considered.

Also, 90TB is not the space you will have, as you didn't subtract the redundancy. Fifteen 6TB disks in a RAID 6 will net around 78TB of usable space.
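The arithmetic behind that figure can be sketched as follows. This assumes equal-size disks and ignores filesystem metadata overhead, so real usable space will come out somewhat lower; the RAID 10 line is included only for comparison with the "RAID 1 or 10 is not an option" constraint above.

```python
TB = 10**12   # drive vendors quote decimal terabytes
TIB = 2**40   # many tools report binary tebibytes

def raid6_usable_bytes(disks: int, disk_bytes: int) -> int:
    """Usable bytes of one RAID 6 array of equal-size disks: two disks' worth go to parity."""
    if disks <= 2:
        raise ValueError("need more disks than the two parity disks")
    return (disks - 2) * disk_bytes

def raid10_usable_bytes(disks: int, disk_bytes: int) -> int:
    """Usable bytes when every block is stored twice (RAID 10 / RAID 1)."""
    return disks * disk_bytes // 2

disk = 6 * TB
per_array = raid6_usable_bytes(15, disk)
print(f"one 15-disk RAID 6: {per_array / TB:.0f} TB ({per_array / TIB:.1f} TiB)")
print(f"three such arrays: {3 * per_array / TB:.0f} TB")
print(f"all 45 disks as RAID 10: {raid10_usable_bytes(45, disk) / TB:.0f} TB")
```

So each 15-disk RAID 6 gives about 78TB (about 70.9 TiB as most tools will display it), three of them about 234TB, versus about 135TB if the same 45 disks were mirrored.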

And finally, this always bears repeating:

RAID is NOT a backup. RAID IS availability/performance.

bug11 · March 9, 2016, 7:22am

Well, I just installed Rockstor on my new server and tried to import my existing RAID5 setup. This was the message, and when I used WinSCP to look in the mount folder, all the files were gone.

EDIT: had to delete everything and start over. Luckily it was just a few TB of movies/series

Dragon2611 · May 28, 2016, 10:45am

Did you try mounting via the CLI? It's possible the order of the drives had changed and it just needed a poke.

Edit: Oops, didn't realise that post was 2 months ago.

Learning2NAS · February 27, 2017, 9:55pm

Here I am, again, reviving a dead thread.

After this post was made, an issue was found with the RAID5/6 code in BTRFS. I think everyone knows about that by now because it was a big deal for a little while. The error has since been corrected and, for several months, the patch has been rolled into the current kernel. Everything seems to be up and running better than ever.

My RAID1 config is tight on space and I would like to convert to a RAID6, but not at the risk of losing production data (some of which has not been backed up yet because my off-site replication is bandwidth limited). Anyone using RAID5/6 since the fix? What are your results? Any Rockstor contributors able to weigh in on RAID5/6 status?

Spectre694 · February 27, 2017, 10:10pm

Can you post a link to that info? AFAIK they only fixed one of the big scrub issues. So it's still pretty broken: more or less usable, but recoveries on it are hit or miss, about what they used to be.

EDIT: I know I stated it was fairly stable earlier in this thread, but since then multiple showstopping bugs have been discovered, and a rather blasé attitude towards fixing RAID 5/6 has caused me to change my mind.

amattia · February 28, 2017, 8:08am

I've been following the btrfs dev list since September, and they have fixed a lot of raid56 bugs.
If I remember correctly, they fixed all the known bugs except the write hole.
Sadly, the patches were not committed in time to be included in kernel 4.10,
so I think they will be included in 4.11 (and maybe backported to 4.9 LTS?).
But even with the patches, the devs don't consider raid56 stable (see the current status links).

Old msg with a listing of the known bugs:
https://www.spinics.net/lists/linux-btrfs/msg60736.html

Patch msg:
https://www.spinics.net/lists/linux-btrfs/msg62581.html

Patch test msg:
https://www.spinics.net/lists/linux-btrfs/msg63022.html

Current status:
https://www.spinics.net/lists/linux-btrfs/msg62280.html
https://www.spinics.net/lists/linux-btrfs/msg62274.html