Drop enforcing the raid level when balancing?

phillxnet · February 6, 2022, 7:27pm

Because a raid1 has 2 disk minimum, you can’t remove a disk from it without resorting to a degraded mount. That’s btrfs. We guide you through this in more real, rather than contrived circumstances. I.e. if a disk was missing due to failure.

The balance involved with a move from raid1 to single has to complete. It’s probably still working in the background so there are still raid1 content that is awaiting the full balance that has to complete. You have to be patience here as the last re-raid wizard indicates. And if you watch the data allocated to each drive and free space you likely will see it move. Also the pool details should show the new raid level once the operation is complete.

We have not changed this area of this code since it was last tested. You may have an issue but you may also have found a corner case. Difficult to tell without a reproducer. How much data did you have on these drives. A nice info checker is the usage command:

btrfs fi usage /mnt2/pool-name-here

It should tell you the amount of differing raid blocks on the pool. Also note that in some settings where there is no or very little data there can be corner cases.

For the:

You must still have some raid1 content. Take a closer look via that command to see. Btrfs wise we are default openSUSE and rely on them to make-stuff-so. And given they employ a few of the major contributors I think we are in good hands. When in doubt: more info. If you are testing the limits look for a simple reproducer from blanked disk so we have an exact case to look out for and to see if this is a known upstream issue.

Hope that helps and let us know if you can find the raid1 entry here. But Rockstor will not allow removal of a disk that results in bringing the pool below it’s minimum as that would necessarily enter to pool into degraded territory and so the degraded proviso (custom mount option) would be required to do that.

Or we may just have a failure to update the raid level in the db. Single is not a common raid profile given it’s not raid. But this raid level is informed by a simple bit of code here:

github.com

rockstor/rockstor-core/blob/master/src/rockstor/fs/btrfs.py#L405-L419


      
              return total_devices - attached_devids
          
          
          def degraded_pools_found():
              """
              Primarily intended to indicate the existence of any degraded pools, managed
              or otherwise. Originally used by data_collector to feed real time Web-UI
              indicators. Non-managed pool coverage allows for the indication of a
              degraded mount requirement pre-import or on fresh disaster recovery
              installs.
              :return: Number of degraded pools as indicated by any line ending in
              "missing" following an associated "Label" line.
              """
              # --raw used to minimise pre-processing of irrelevant 'used' info (units).
              cmd = [BTRFS, "fi", "show", "--raw"]

And that is called on each Web-UI initiated refresh of the pool. You may just have to refresh the Pools page for it to allow the removal.

Sorry to not be of more help here, lots of things thrown together. If you end up having what looks like a specific issue it’s best to focus on that issue in a specific thread in the forum to help folks follow just that issue. Here we began discussing a design decision then moved to raid level change bugs. Also note that command line operations can interfere with running Rockstor processes and no raid level change is imidiate. Snapshots are almost. But raid level changes with consequent balances (what we do for the reasons discussed earlier and years of doing it this way) take time. Almost all data has to be re-written. In almost all cases on the pool front the pool is the source of truth. But the Web-UI indicates the user preference which we are oblighed to enforce.

Cheers and keep the analysis coming. All good. But remember we need exact and ideally small reproducers for bug reports. And we have had not other reports of this type for a few years now. But hey, they have to start somewhere so if this is a corner case or an obvious thing that has been missed then great. But exact reproducers from clean drives is the way to expose an issue and help anyone trying to fix it to have the ability to know their fix has worked. Bar tracking down the exact cause of course.

Hope that helps and again thanks for your engagement here. I myself am on a hiatus of sorts after the years long push to get to our new v4 “Built on openSUSE” base. But there are still things happening in the background, some of which will help to prove the function of all key capabilities which will be nice. And we hope to use this new testing setup (based on openQA) to establish a better safety net going forward where we have to immediately make thousands of changes to up-our-anty re technology versions. All good and all in good time hopefully. This will be the start of our next testing channel release which I have yet to announce bar a mention in our first v4 stable release notes: