Cannot remove "detached" disks

@Azzazel Thanks for the tests.

Yes, the testing channel for our legacy CentOS base is currently way behind. 3.9.1-16 was the last CentOS-based testing channel release and is now over 2 years old:

At that time it was essentially the same as our 3.9.2 stable channel release, linked above. See the following thread for the background and some more links relating to this more recent history:

So this is good news, and it confirms my suspicion that we may be putting at least one cart before at least one horse; but ideally we want to confirm this with 3.9.2-50 code. For this, Appman is at your service:

https://appman.rockstor.com/

As a Stable subscriber you can now change your Appliance ID as and when you fancy, with immediate effect. So in your case you could change it to that of your temporary btrfs raid1 test setup, so that it can be updated to the latest stable via your existing subscription: just use your existing activation code after the Appliance ID change has been entered. You would of course want to change this back to your main machine thereafter, as otherwise it would no longer receive updates. So, given this facility, you can easily upgrade your temporary test machine to the latest stable for the purpose of this test.

Note however that since this test machine is already on the latest testing release, if you move it now to the Stable subscription it will (instantly) claim to have updated, but that is a bug. The bug is now fixed, but only in the very code it fails to offer as an upgrade, because it thinks it is already up to date. It's a rather dark bug that way :unamused:
So make sure to run

yum update rockstor

when going from the last CentOS testing release (3.9.1-16) to Stable, to ensure you are actually running 3.9.2-50 or later (the latest as of now). Otherwise the Web-UI reports the available version as if it were the installed one. Not good. If going direct from a prior stable install, i.e. from the ISO itself, then no worries: it should be fine and should update as expected. Confirm via:

yum info rockstor
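
If in any doubt, you can also query the package database directly; either of the following (standard rpm/yum usage, nothing Rockstor-specific) reports what is actually installed:

# Both report the currently installed rockstor package version.
rpm -q rockstor
yum list installed rockstor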

So in short, via our yet-to-be-officially-announced-here-on-the-forum Appman (though it's referenced in the Update Channels doc entry), you can upgrade your testing machine to the latest Stable. Just remember to change your Appliance ID in Appman back to your regular machine thereafter; otherwise it will receive no more updates. This new Appman facility is meant as a convenience for those who help to support Rockstor's development and who, in doing so, have incurred an inconvenience: the whole activation code rigmarole. This was found to be required as a prior donations-only system just didn't work. I didn't donate, for one; before I became a contributor, that is. So I'm hoping Appman helps to redress that balance at least a bit.

Anyway, if you could please repeat this experiment but with the 2-years-newer code of 3.9.2-50, which, most relevantly in this case, has very many changes to device management (hopefully only improvements), that would be great. Sorry to ask this of you, but you do seem to be up for it :slight_smile:

Thanks for these. To ease comparing them here I have omitted the partitions for clarity and ordered the drives as sda, sdb, sdc:

## Before
NAME="/dev/sda" MODEL="Virtual disk " SERIAL="6000c2994c673296d1cc4f7a9a87e24d" SIZE="16G" TRAN="" VENDOR="VMware " HCTL="3:0:0:0" TYPE="disk" FSTYPE="" LABEL="" UUID=""
NAME="/dev/sdb" MODEL="Virtual disk " SERIAL="6000c29a33240a77529fccf8ce00ff19" SIZE="200G" TRAN="" VENDOR="VMware " HCTL="3:0:1:0" TYPE="disk" FSTYPE="btrfs" LABEL="test_raid1" UUID="7123d669-be9c-4ca0-bf5c-c3a18622836d"
NAME="/dev/sdc" MODEL="Virtual disk " SERIAL="6000c2964bd0af6aa755d22714f7e770" SIZE="200G" TRAN="" VENDOR="VMware " HCTL="3:0:2:0" TYPE="disk" FSTYPE="btrfs" LABEL="test_raid1" UUID="7123d669-be9c-4ca0-bf5c-c3a18622836d"
## After
NAME="/dev/sda" MODEL="Virtual disk " SERIAL="6000c29d42b475d90dd0df22a2391814" SIZE="16G" TRAN="" VENDOR="VMware " HCTL="2:0:0:0" TYPE="disk" FSTYPE="" LABEL="" UUID=""
NAME="/dev/sdb" MODEL="Virtual disk " SERIAL="6000c2981f1f665fa00609faff46782f" SIZE="200G" TRAN="" VENDOR="VMware " HCTL="2:0:1:0" TYPE="disk" FSTYPE="btrfs" LABEL="test_raid1" UUID="7123d669-be9c-4ca0-bf5c-c3a18622836d"
NAME="/dev/sdc" MODEL="Virtual disk " SERIAL="6000c2944e04e183a591cc26bd7c7a36" SIZE="200G" TRAN="" VENDOR="VMware " HCTL="2:0:2:0" TYPE="disk" FSTYPE="btrfs" LABEL="test_raid1" UUID="7123d669-be9c-4ca0-bf5c-c3a18622836d"

We have confirmation that all 3 drives have effectively been replaced, from Rockstor's perspective, by 3 new drives: the device names, sizes, and even the pool's btrfs UUID are unchanged, but every serial number differs. And this is what it indicates within the Web-UI also. So it is behaving exactly as expected. This situation does not, as far as I'm aware, have a parallel with real disks/devices, and Rockstor's remit is to track real devices. So yes, your backup restore system is at least confusing for Rockstor, as it works to track devices irrespective of their content, at least once a pool is requested as managed via creation within the Web-UI or import.
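
For a quick before/after check of just the identity field in play here, plain lsblk (again nothing Rockstor-specific) suffices:

# Only the serials need comparing: if they all differ across a
# restore, Rockstor sees 3 new disks, whatever their content.
lsblk -d -n -o NAME,SERIAL /dev/sda /dev/sdb /dev/sdc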

An interesting test case as it goes. Do please consider re-running it, via the Appman trick, on our latest code if you would. I'm expecting the same behaviour re the detached disks, but would like it confirmed that the newer code also lets you remove those 'detached' disks post restore, as per your testing channel experiment, i.e. with btrfs raid1.

You could, for the time being, by way of a workaround, convert the pool to raid1 via the Web-UI and, once that has finished its potentially lengthy balance, remove the detached disks thereafter. I'm not likely to get to look more closely at that code for a bit, but your report has highlighted a potential improvement. The consequences in your case are, though, down to a corner case: all drives being supplanted by re-incarnations of themselves.

Our current blanket ban on removing a device from a raid0 pool remains legitimate, as that is not something that is possible under any circumstances (other than the group simultaneous re-incarnation scenario you have reproduced). If we were to allow the removal of a device from a btrfs raid0 only when it was detached, then in the scenario where a SATA cable becomes detached a user might inadvertently think they could just remove that disk. But they cannot: re-attaching it and then changing the raid profile to one with redundancy is their option. I'll have a think, as there are likely many more common corner cases than yours that we are probably missing, and I'm hoping to start the testing channel off again soon to explore these, as well as the required modernisation. As these are potentially non-trivial critical code changes, we have to tread carefully.
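For reference, the raid1 conversion the Web-UI performs boils down to a btrfs balance with convert filters; a rough command-line equivalent follows, where /mnt2/your_pool stands in for your pool's actual mount point (Rockstor mounts pools under /mnt2). The Web-UI route is preferred as it keeps Rockstor's own records in step:

# Convert data and metadata chunks to raid1; this rewrites every
# chunk, so it can take a long time on a well-filled pool.
btrfs balance start -dconvert=raid1 -mconvert=raid1 /mnt2/your_pool

# Check progress from another shell:
btrfs balance status /mnt2/your_pool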

But if you are really game (read: you have good and proven backups) the following code segment is what is giving you grief (correct as of posting this, but it may change later):

This is true for the Stable channel only, as there have been many changes throughout since the last testing release.

You might also note a further constraint in place just before the above code here:

Hope that helps and thanks for helping to test Rockstor’s limits re drive re-incarnation :slight_smile:
