Rockstor in VM on Hardware Raid

@b_b Hello again.
Re:
1, 2, 3 - all clear I think. Assuming the minimum system requirements are met - predominantly access to a static, reliable serial number for said devices.
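
To make that serial requirement concrete, here is a minimal sketch (assuming lsblk from util-linux with JSON support) that flags devices reporting missing or duplicate serials - exactly the situation the minimum requirements warn about:

```python
#!/usr/bin/env python3
# Minimal sketch: flag block devices with missing or duplicate serial numbers,
# the situation our minimum system requirements warn about.
# Assumes lsblk (util-linux) with --json support is available.
import json
import subprocess
from collections import Counter

def device_serials():
    out = subprocess.run(
        ["lsblk", "--json", "--nodeps", "-o", "NAME,SERIAL,MODEL"],
        capture_output=True, text=True, check=True,
    ).stdout
    return json.loads(out)["blockdevices"]

def main():
    devices = device_serials()
    counts = Counter((d.get("serial") or "") for d in devices)
    for d in devices:
        serial = d.get("serial") or ""
        if not serial:
            print(f"/dev/{d['name']}: no serial reported - unsuitable as a pool member")
        elif counts[serial] > 1:
            print(f"/dev/{d['name']}: serial '{serial}' is not unique - unsuitable")
        else:
            print(f"/dev/{d['name']}: serial '{serial}' looks usable")

if __name__ == "__main__":
    main()
```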

That’s a tricky one - as from the btrfs point of view it kind of does, and I think that was the meaning here. Hardware raid adds disk-failure redundancy (underneath - if in a redundant profile of course) but it does so blind to the consequences. There is also the possibility of passing two raid-backed virtual drives through:

  • but you cover that in point 5. So agreed on that point. But we have the single point of failure that both will likely sit behind the same raid controller - a multi-port HBA has this same single point of failure: however the HBA doesn’t invisibly mix the drives!! So I would say passing two virtual drives is better, but it still suffers from having something unaware of data integrity (hw or md/dm raid) underneath, undermining something that is aware of it (a checksumming CoW fs).

We agree I think on 5 as true - there-abouts. But the key element here (re 6.) is that drives are directly managed by the kernel - i.e. the same ‘house/ball-park’ as btrfs itself. The particular subsystem responsible for that type of block device has the call here. But it will be far more ‘native’ than what is essentially dumb hw raid. HW raid just wants to present the OS with a presumed (in this case) redundant-of-sorts block device. But it will drop an entire drive to do this, and potentially good copies of other data in the ‘deal’. That was all we had at one point: we now have software raid that can be more clever about this. Especially when that raid system is integrated into the fs.

We also have to bring flexibility into the mix here. We have thus far only considered redundancy: a system that continues to serve is of more use than one that must be replaced, or taken offline to rebuild a newer, bigger pool. Btrfs can do this online - while serving. That is another element of data integrity - a system’s ability to adapt to changing use. See the online growing/shrinking and re-raid capabilities that are also a win (for some).
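
To make that flexibility concrete, here is a rough sketch of what online growth and a re-raid look like at the command level (Rockstor’s Web-UI wraps the same kind of operations); the device name and mount point below are placeholders only:

```python
#!/usr/bin/env python3
# Sketch only: grow a mounted btrfs pool and convert its raid profile online.
# /dev/sdX and /mnt2/mypool are placeholders - substitute real values; run as root.
import subprocess

POOL_MOUNT = "/mnt2/mypool"  # hypothetical mount point
NEW_DEVICE = "/dev/sdX"      # hypothetical new whole-disk member

def run(cmd):
    print("+", " ".join(cmd))
    subprocess.run(cmd, check=True)

# 1. Add the new device while the pool stays mounted and serving.
run(["btrfs", "device", "add", NEW_DEVICE, POOL_MOUNT])

# 2. Convert data and metadata to the raid1 profile, again online; the
#    balance redistributes existing chunks across the member devices.
run(["btrfs", "balance", "start",
     "-dconvert=raid1", "-mconvert=raid1", POOL_MOUNT])

# 3. Progress can be watched from another shell with:
#    btrfs balance status /mnt2/mypool
```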

I quite like this breakdown - after some discussion we should consider patching it in somewhere in the docs, as this type of discussion comes up from time to time. I am no expert but it’s pretty much as I see things - given our Minimum System Requirements (specifically the ‘drive’/device serial thing).

We don’t actually cater to this. We are all-in on btrfs, as our entire design is built around its unique capabilities.

Why? - Without hw raid and its often proprietary (or firmware-linked) drive format, one can move a set of drives from one machine to another. So I do not agree here, if we are talking of the summary presented in 7. Btrfs is an in-kernel filesystem: given an equal or later kernel, these exact same drives can be attached to equivalent ports on another machine - in any arrangement. It is very much not like hw raid in that respect.
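
For illustration, moving a btrfs pool between machines boils down to something like the following sketch (btrfs-progs assumed; the mount point and UUID are placeholders):

```python
#!/usr/bin/env python3
# Sketch: re-attach a btrfs pool whose drives were moved to another machine.
# Assumes btrfs-progs is installed; no matching raid controller is needed
# because the pool metadata lives on the drives themselves. Run as root.
import subprocess

def run(cmd):
    print("+", " ".join(cmd))
    return subprocess.run(cmd, capture_output=True, text=True, check=True).stdout

# 1. Have btrfs discover all member devices now attached to this machine.
run(["btrfs", "device", "scan"])

# 2. List the pools found, with their labels, UUIDs and member devices.
print(run(["btrfs", "filesystem", "show"]))

# 3. Mount using the pool UUID from the output above (any single member
#    device path would also do once all members have been scanned), e.g.:
print("Then: mount UUID=<pool-uuid-from-above> /mnt2/mypool")
```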

Again: only sort of. If both virtual drives are from the same raid controller, they are not independent: a key design element that btrfs assumes with its chunk-based raid. Each chunk in, say, raid1 (two copies per data/metadata chunk) is placed on what are assumed to be independent devices. But if they are from the same raid controller they are not - they just look that way.
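
If you want to see that chunk placement for yourself, something like the following sketch (placeholder mount point) asks btrfs how its block groups are spread across the member devices:

```python
#!/usr/bin/env python3
# Sketch: show how btrfs lays out block groups (chunks) across member devices.
# In a raid1 pool each data/metadata chunk should land on two *different*
# members - which only helps if those members really are independent drives.
# /mnt2/mypool is a placeholder mount point; run as root.
import subprocess

POOL_MOUNT = "/mnt2/mypool"

for cmd in (
    ["btrfs", "filesystem", "df", POOL_MOUNT],           # profiles in use (raid1 etc.)
    ["btrfs", "filesystem", "usage", "-T", POOL_MOUNT],  # per-device allocation table
):
    print("+", " ".join(cmd))
    print(subprocess.run(cmd, capture_output=True, text=True, check=True).stdout)
```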

That is what makes this discussion interesting to a wider audience I think. We have to be pragmatic with the resources we have. But I’m hoping this thread helps to clarify the associated risks; and to inform folks that Rockstor, btrfs really, can only do what its foundations allow. And it can be completely hobbled if not instantiated correctly. Hence our advice as it stands. But that is all very well and ideal: much can be mitigated with other systems of, say, regular back-up. And with btrfs comes potentially more back-up options and snapshot capability etc. So definitely we have to be pragmatic in real-world situations. But also realistic.

Say one has a VM instance with a single-dev raid-backed pool that senses catastrophic corruption because ‘what-ever’. You lose all data access, as the Pool as a whole may well be lost, with no repair scenario/options. But you were not returned corrupt data - potentially for years!! You now know this data store is defunct - that is the first step of data integrity - availability should not trump validity. I.e. ‘Oh good, I got access to my data - it’s all nonsense - but still.’ So concerns of uptime etc. are also a thing with storage designs. Restores can take ages and cost continuity of access. But checksumming and CoW can really help with ensuring data remains data - not nonsense.

Availability is another thing: I think it is best rolled in with data integrity - hw raid can’t do that in the same way, and likewise independent sw raid. It has its own concerns: not the pool’s concerns. Btrfs, when resourced for its volume management, brings flexibility and continuity. All is managed under the same roof as it were. But yes, there is still the block layer of the kernel etc. (but that can be undermined by hw raid is my thinking here).
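
To put a concrete handle on that ‘you were not returned corrupt data’ point, here is a minimal sketch (placeholder mount point again) of the checks btrfs offers: a scrub re-reads everything against its checksums, and the per-device counters record what was found or fixed:

```python
#!/usr/bin/env python3
# Sketch: verify a pool's data against its checksums and report per-device
# error counters. /mnt2/mypool is a placeholder; run as root. A scrub reads
# everything, so it can take a long time on a large pool.
import subprocess

POOL_MOUNT = "/mnt2/mypool"

def run(cmd):
    print("+", " ".join(cmd))
    result = subprocess.run(cmd, capture_output=True, text=True)
    print(result.stdout or result.stderr)
    if result.returncode != 0:
        print(f"(exited {result.returncode} - inspect the output above)")

# 1. Foreground scrub: reads all data/metadata and verifies every checksum.
run(["btrfs", "scrub", "start", "-B", POOL_MOUNT])

# 2. Summary of the last scrub (errors found / corrected).
run(["btrfs", "scrub", "status", POOL_MOUNT])

# 3. Cumulative per-device error counters (read/write/corruption etc.).
run(["btrfs", "device", "stats", POOL_MOUNT])
```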

Yes, a number of the major NAS providers have now moved over to btrfs. But I don’t know of any that have also dropped the legacy mdraid (dedicated in-kernel software raid). They all have a very large investment in that - and it’s good. More mature than the drive management in btrfs - but not integrated and not nearly as flexible. They have basically substituted their prior fs (ext2/3/4 I think it was) with btrfs to get snapshots, send/receive, CoW etc.

Kind of yes: but I would say that mdraid is far better than almost any hardware raid. Even given it was carried predominantly by one person for quite some time. Plus you have that portability thing, where any mdraid drive can be taken to any equal or newer linux kernel if the hardware has the same port - or can carry an adaptor. That is definitely not the case with hw raid. Plus all its management (outside the drive of course) is in-kernel. A key in-house advantage. The kernel is then the one-stop shop for the entire storage subsystem - and all in software down as far as is reasonable. So fixable as we go - not left with bugs that never get fixed because the seller is more interested in the next hw raid model and its maintenance - if there is any.
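
And for comparison, that mdraid portability looks roughly like this in practice (a sketch; mdadm assumed present on the receiving machine):

```python
#!/usr/bin/env python3
# Sketch: re-assemble an mdraid array on a machine the drives were moved to.
# Assumes mdadm is installed; the array metadata lives on the drives, so no
# particular raid controller is required on the receiving machine. Run as root.
import subprocess

def run(cmd):
    print("+", " ".join(cmd))
    result = subprocess.run(cmd, capture_output=True, text=True)
    print(result.stdout or result.stderr)

# 1. Scan all block devices and assemble any arrays found from their metadata.
run(["mdadm", "--assemble", "--scan"])

# 2. Show what the kernel's md subsystem is now managing.
with open("/proc/mdstat") as f:
    print(f.read())
```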

In time I think we will see the likes of Synology move from mdraid onto the more flexible, forward-looking device management within btrfs itself. But they are a large ship and it takes time to turn such things.

You may also be interested, in time, in our next ‘layer’. Once we get our dependencies all up to scratch (current testing channel) I at least hope to begin laying the groundwork for GlusterFS. Another whole layer of redundancy/availability. But steady now - lots to do as we stand.

Hope that helps. I am no expert but this is how I see things, and I like to think Rockstor is fleshing out its capabilities as we go. I am entirely happy with our choice of technologies: we just need to get them all up-to-date and shiny before we branch out feature-wise.
