Using different RAID types for metadata and data

KarstenV · August 31, 2018, 6:29am

Forgive me if this has been discussed before, but I haven’t seen it mentioned here in Rockstor’s forum.

I have been reading up on the state of RAID5/6.

The general consensus seems to be that in the later kernels it's stable and usable.

The write hole is still a problem, and BTRFS’s error recovery options still leave something to be desired.

Since I’m running my NAS behind a UPS, I think the write hole is less of a problem.

One poster on the BTRFS mailing list suggests that you run BTRFS with -draid5 -mraid1 (or -draid6 -mraid1), which would keep the data in RAID5/6 and the metadata in raid1.
This should allow even better error correction, since BTRFS relies on the metadata being intact for the correction to do its job. And since raid1 is not susceptible to the write hole, metadata would be OK even after a power interruption, and BTRFS would at least be able to see which files contain incorrect data, even if it can’t fix them.
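
For reference, a minimal sketch of what that suggestion looks like as an mkfs.btrfs invocation. The helper name `mkfs_cmd` and the device paths are made up for illustration; only the `-d` (data profile) and `-m` (metadata profile) options come from mkfs.btrfs itself, and none of this is Rockstor code. It only builds the command, it does not run it.

```python
def mkfs_cmd(devices, data_raid="raid6", meta_raid="raid1"):
    """Build (not run) an mkfs.btrfs command with separate data/metadata profiles."""
    if not devices:
        raise ValueError("at least one device is required")
    # -d sets the data profile, -m sets the metadata profile.
    return ["mkfs.btrfs", "-d", data_raid, "-m", meta_raid] + list(devices)


# Hypothetical device names, purely illustrative.
cmd = mkfs_cmd(["/dev/sda", "/dev/sdb", "/dev/sdc", "/dev/sdd"])
print(" ".join(cmd))
```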

So my idea was to run my pool in this setup (-draid6 -mraid1). Of course after the kernel has been updated in Rockstor to one containing the latest BTRFS fixes.

Would this work with Rockstor in its current state / setup?

Would it be possible to add this setup to the Web UI?

vesper1978 · August 31, 2018, 11:05am

Hi @KarstenV

I am not a Rockstor developer, but it looks like it is not currently possible to set draid and mraid to different values.

Here’s the current code where both draid and mraid get set during BTRFS pool creation:

And here’s the current code where they get set during changing RAID levels:

github.com

rockstor/rockstor-core/blob/master/src/rockstor/fs/btrfs.py#L214-L228


def pool_raid(mnt_pt):
    # TODO: propose name change to get_pool_raid_levels(mnt_pt)
    o, e, rc = run_command([BTRFS, 'fi', 'df', mnt_pt])
    # data, system, metadata, globalreserve
    raid_d = {}
    for l in o:
        fields = l.split()
        if (len(fields) > 1):
            block = fields[0][:-1].lower()
            raid = fields[1][:-1].lower()
            if block not in raid_d:
                raid_d[block] = raid
    if (raid_d['metadata'] == 'single'):
        raid_d['data'] = raid_d['metadata']
    return raid_d

Hopefully this is a feature that can get added in a future version of Rockstor

phillxnet · August 31, 2018, 11:46am

@vesper1978 Thanks for posting / looking into this one.

That bit of code only retrieves the existing raid level (get) and is used to inform the pool model of what actually exists.

Well done digging them out by the way.

There are a few other places, mainly when changing raid levels, that would need changing. Not impossible but would need a fair bit of rigour to make sure we don’t break anything as the project has always assumed they are both set equally, with the odd caveat: as ‘quoted’.

@KarstenV I had seen the discussion of the btrfs mailing list re accompanying parity data raid with raid1 metadata. This is an interesting idea and maybe we could offer a kind of custom setting which equates to this. But again this may require a model change or an across the board change in how we represent and read/confirm existing raid levels.

There is also the unit test for the strangely named ‘pool_raid’ here:

github.com

rockstor/rockstor-core/blob/master/src/rockstor/fs/tests/test_btrfs.py#L61-L70


      
          
def setUp(self):
    self.patch_run_command = patch("fs.btrfs.run_command")
    self.mock_run_command = self.patch_run_command.start()
    # # setup mock patch for is_mounted() in fs.btrfs
    # self.patch_is_mounted = patch('fs.btrfs.is_mounted')
    # self.mock_is_mounted = self.patch_is_mounted.start()
    # setup mock patch for mount_root() in fs.btrfs
    self.patch_mount_root = patch("fs.btrfs.mount_root")
    self.mock_mount_root = self.patch_mount_root.start()

That might be the best place to start re code changes as then we know we can at least correctly retrieve this mixed scenario. We could then track the entire treatment of raid levels through the code and ensure we can deal with splitting the data and metadata levels. Bit of a job and in the critical path so not really a priority, at least for me, currently but we could edge up to this.
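
As a rough idea of what "correctly retrieve this mixed scenario" could mean, here is a stand-alone sketch of the same parsing loop that pool_raid uses, fed with sample 'btrfs fi df' output for a raid6 data / raid1 metadata pool. The function name `parse_raid_levels` and the sample figures are made up for illustration, and the metadata 'single' caveat in pool_raid is deliberately left out. This is not the existing unit test, just the kind of check one might start from.

```python
def parse_raid_levels(lines):
    """Map each block group type to its raid profile from 'btrfs fi df' lines."""
    levels = {}
    for line in lines:
        fields = line.split()
        if len(fields) > 1:
            block = fields[0][:-1].lower()  # "Data," -> "data"
            raid = fields[1][:-1].lower()   # "RAID6:" -> "raid6"
            # Keep the first profile seen for each block group type.
            levels.setdefault(block, raid)
    return levels


# Sample output for a mixed pool; the sizes are invented.
sample = [
    "Data, RAID6: total=2.00GiB, used=1.00GiB",
    "System, RAID1: total=32.00MiB, used=16.00KiB",
    "Metadata, RAID1: total=1.00GiB, used=112.00KiB",
    "GlobalReserve, single: total=16.00MiB, used=0.00B",
]
levels = parse_raid_levels(sample)
print(levels["data"], levels["metadata"])
```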

What are the thoughts on simply offering 2 additional levels, as a first off, ie raid5/meta1 and raid6/meta1? This way we can more easily maintain the ‘ease of use’ element of simplifying the raid levels presentation, while at the same time extending our capabilities. Fair bit of work there but definitely doable.

Agreed, any takers for drawing up the changes needed as a start? Remember to take into account the limitations we place on raid level conversions. These have been established by experimentation and would need to be re-tested with these 2 additional ‘levels’. And we would need to carry through to all UI elements the, initially twin, values of data and metadata.
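
One way the "2 additional levels" idea might look in code, purely as a sketch: a single user-facing level name is translated into a (data, metadata) pair, and every existing level keeps meaning "both set equally". The names `COMBINED_LEVELS` and `split_level` are hypothetical and do not exist in Rockstor.

```python
# Hypothetical mapping from one user-facing level name to
# (data profile, metadata profile). Illustrative only.
COMBINED_LEVELS = {
    "raid5/meta1": ("raid5", "raid1"),
    "raid6/meta1": ("raid6", "raid1"),
}


def split_level(level):
    """Return (data, metadata) profiles for a user-facing raid level name."""
    if level in COMBINED_LEVELS:
        return COMBINED_LEVELS[level]
    # All other levels: data and metadata are set equally.
    return (level, level)


print(split_level("raid6/meta1"))
print(split_level("raid10"))
```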

Also note the start_balance single element ‘convert’ which would have to become a little more sophisticated, ie maybe a tuple.

github.com

rockstor/rockstor-core/blob/master/src/rockstor/fs/btrfs.py#L1407-L1427


      
        return -1
    # if no exception, and no caught WARNING, find the max 2015/qgroup
    res = 0
    for l in o:
        if re.match("{}/".format(QID), l) is not None:
            cid = int(l.split()[0].split("/")[1])
            if cid > res:
                res = cid
    return res


def qgroup_create(pool, qgroup=PQGROUP_DEFAULT):
    """
    When passed only a pool an attempt will be made to ascertain if quotas are
    enabled, if not '-1/-1' is returned as a flag to indicate this state.
    If quotas are enabled then the highest available quota of the form
    2015/n is selected and created, if possible (Read-only caveat).
    If passed both a pool and a specific qgroup an attempt is made, given the
    same behaviour as above, to create this specific group: this scenario is
    primarily used to re-establish prior existing qgroups post quota disable,
    share manipulation, quota enable cycling.

All doable but we have to tread carefully here as this code has taken ages to stabilise.
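
To illustrate the ‘convert’ becoming a tuple, here is a minimal sketch that builds (but does not run) a balance command from a (data, metadata) pair. The function name `balance_convert_args` and the mount point are hypothetical and this is not the existing start_balance code; the `-dconvert=` and `-mconvert=` filters are the standard btrfs balance options.

```python
def balance_convert_args(mnt_pt, convert=None):
    """Build a 'btrfs balance start' command as a list of arguments.

    convert: None for no conversion, or a (data, metadata) tuple of
    target raid profiles.
    """
    cmd = ["btrfs", "balance", "start"]
    if convert is not None:
        data_raid, meta_raid = convert
        cmd.append("-dconvert={}".format(data_raid))
        cmd.append("-mconvert={}".format(meta_raid))
    cmd.append(mnt_pt)
    return cmd


# Hypothetical mount point, purely illustrative.
print(" ".join(balance_convert_args("/mnt2/mypool", ("raid6", "raid1"))))
```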

That may have been in the thread posted here:

Hope that helps and I’m in principle in favour of this capability being added, but I’d like to keep things simple as most will be best sticking to the basic defaults and too much mixing of raid levels could get pretty messy, ie when working out the minimum drive count requirement and raid level changes etc.
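
On the minimum drive count point, a small sketch of how a mixed pool might be handled: take the larger of the two per-profile minimums. The table values are an assumption based on the mkfs.btrfs documentation of the time and should be checked against the btrfs version in use and against Rockstor's own limits, which may differ. `MIN_DEVICES` and `min_devices` are hypothetical names.

```python
# Assumed minimum device counts per profile (verify before relying on them).
MIN_DEVICES = {
    "single": 1,
    "raid0": 2,
    "raid1": 2,
    "raid10": 4,
    "raid5": 2,
    "raid6": 3,
}


def min_devices(data_raid, meta_raid):
    """A mixed pool needs enough drives for its most demanding profile."""
    return max(MIN_DEVICES[data_raid], MIN_DEVICES[meta_raid])


print(min_devices("raid6", "raid1"))
```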

Maybe ideas on how this might be presented / represented to the user. I favour the simple approach first but would welcome a UI representation of at least the data and metadata and that shouldn’t be too much more complexity (hopefully).

KarstenV · August 31, 2018, 6:49pm

Whoa, a couple of long answers

My short answer is that this is something that would be a nice addition, with a short explanation as to what would be the benefits.

Perhaps one day RAID5/6 will be fixed to a degree that this would not be needed, but until then it would be a worthwhile thing, improving the chances of recovery from errors.

And a simple addition of the extra levels of raid would be the straightforward way to do it UI wise.