Looking for 2-drive fault tolerance on 6-drive array

I’ve used RAID5 before without any issues, but then moved over to raid1c3 for convenience. The scrubbing took way too long, and because I was concerned about the potential impact of the write-hole I felt the need to run scrubs maybe more often than usual (you should also read up on how a scrub should be executed per device to avoid further delays, which I didn’t know at the time).
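For reference, btrfs lets you start a scrub against a single member device rather than the whole mounted pool, which is what the per-device advice refers to. A minimal sketch (device paths are placeholders for your actual pool members):

```bash
# Scrub one member device at a time instead of the whole pool at once
# (device paths are examples only; substitute your pool's members).
btrfs scrub start /dev/sda
btrfs scrub status /dev/sda   # wait for this one to finish before moving on
btrfs scrub start /dev/sdb
```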
On the other hand, a lot depends on your use case, too. If, for example, the system spends the majority of its time serving up files rather than having data written to it, then the risk of data loss due to unclean shutdowns is conceivably lower. Of course, having a UPS connected further reduces that risk (but that’s an additional expense).
In the 8 or 9 years I’ve been running Rockstor, I have had only one drive outright fail; a couple started to show issues and were replaced. Only recently did I have one go out quickly and a second one (from the same production month) start to show some issues. Again, the risk of dual failure will (not exclusively) be a function of your use case.

And I would be remiss not to state that no RAID config is a backup, so I assume you’ve put some thought into how to protect your data in addition to running it on Rockstor.
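If it helps, one common approach is shipping read-only snapshots to a separate pool or machine via btrfs send/receive (Rockstor’s replication feature builds on the same mechanism). A rough sketch, with placeholder share and pool paths:

```bash
# Take a read-only snapshot of a share and send it to a separate backup pool.
# Paths are placeholders; adjust to your own shares and backup destination.
btrfs subvolume snapshot -r /mnt2/main_pool/share /mnt2/main_pool/.snaps/share-$(date +%F)
btrfs send /mnt2/main_pool/.snaps/share-$(date +%F) | btrfs receive /mnt2/backup_pool/
```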

Here are some resources that discuss some more in-depth challenges (and some workarounds) on RAID5/6 (granted, they’re a couple of years old, so they might not all be valid anymore):

https://lore.kernel.org/linux-btrfs/20200627030614.GW10769@hungrycats.org/

In any case, if you’re going to do it, I’d say raid6 for data and raid1c4 for metadata, since btrfs lets you set the data and metadata profiles separately. That likely gives you the best compromise of risk vs. space utilization.
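In case it’s useful, here’s roughly what that looks like at the btrfs level, assuming you were doing it by hand at the command line rather than through Rockstor (device names and mount point are placeholders):

```bash
# Create a new 6-drive pool with raid6 data and raid1c4 metadata
# (device names are examples only).
mkfs.btrfs -d raid6 -m raid1c4 /dev/sd[a-f]

# Or convert an existing mounted pool's profiles via a balance:
btrfs balance start -dconvert=raid6 -mconvert=raid1c4 /mnt/pool
```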

@phillxnet recommended this in another thread some time ago as well:

and it’s something I’ve been contemplating at some point, given my use case of mostly read-from storage, but I haven’t got around to investing in it yet.
