Moving pool on top of LUKS in-place

Hi,

I have a 3 disk pool on raid-1 and I'm thinking of moving it on top of LUKS (like my DIY backup NAS on openSUSE). I ordered a fourth disk, so does Rockstor go crazy if I just do cryptsetup on the new disk, then btrfs replace the 3rd disk with the new 4th? After that is completed, wipe the old 3rd disk, btrfs replace the 2nd with it and so on, and at the end just wipe the 1st disk and add it back as the 4th disk of the pool?

@Jorma_Tuomainen Hello again.

Re:

This is entirely possible within Rockstor but do NOT use the btrfs replace route:

Rockstor is as yet relatively untested re btrfs replace, and has no ‘native’ support (read: recognition) for the states of a pool during a replace. But as of 3.9.2-49/50 the disk removal is much improved. I would thus strongly suggest you take the disk add then disk removal route. I believe the btrfs code for this is also more mature, so hopefully safer on that front too. I know that replace is quicker, but from a Rockstor point of view I would definitely make sure you are on at least 3.9.2-50 and then add/remove, add/remove, add/remove: waiting for each in turn to finish of course. That is much safer, and as stated Rockstor should then know what’s happening and also, via the associated Pool details page, be able to give you a rough idea of the progress. Note though that, as with all such procedures, your machine will be very much less responsive during this task. I would also suggest disabling quotas via the Web-UI before this operation, as that will massively speed up proceedings and lessen the possibility of the Web-UI timing out due to low level system load during such big operations. You can always turn them back on again via the Web-UI. I state this as there were known issues re enabling quotas, such as having to execute certain related commands twice for them to fully take effect!
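
For reference, this is roughly what one add-then-remove cycle corresponds to at the btrfs level; Rockstor’s Web-UI drives the equivalent operations for you, and the pool mount point and device names below are placeholders only:

# disable quotas first to speed things up (re-enable later via the Web-UI)
btrfs quota disable /mnt2/mypool

# one cycle: add the new (LUKS-mapped) device, then remove an old one;
# the delete blocks until that disk's data has been relocated
btrfs device add /dev/mapper/luks-xxxx /mnt2/mypool
btrfs device delete /dev/sdc /mnt2/mypool

# watch the outgoing disk's allocation drain before starting the next cycle
btrfs device usage /mnt2/mypool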

So to the more specific element of LUKS: when Rockstor LUKS support was developed it was actually part of its remit that it facilitate moving an entire pool, disk by disk, from non LUKS to full LUKS. Definitely go with the default of full disk though, i.e. no partitions on the data disks.

The pull request that initially introduced LUKS was:

And it’s definitely worth a read of the comments there as one of the test scenarios reads as follows:

“… an existing 3 disk raid 1 pool totalling 225 GB was taken through the following semi typical scenario for moving an existing pool from non LUKS formatted devices to LUKS formatted devices and their consequent mapped counterpart Open LUKS Volumes.”

So that looks to fit your remit. Although the data involved was ‘play’ size it was enough to prove the concept in practice on real (and very constrained) hardware. But note that at the time we had far less capable disk / pool reporting and so we then acknowledged the caveat of

“N.B. pending existing code issue to open here re no UI feedback during this removal.”

Which was duly opened:

and was closed / fixed as of Stable Channel 3.9.2-49/50 via:

Remember to give each disk in the pool set the same LUKS passphrase. This is not essential but way less hassle, as otherwise you will have to remember which passphrase goes with which disk, and you may be asked for them (during boot, say, if that is your choice) in varying orders. This will not result in the same encryption on each disk as there is still a random salt involved at the LUKS level.
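
If you want to reassure yourself on that point, each disk’s LUKS header can be inspected (the device name here is just an example):

# show the per-device LUKS header; each keyslot carries its own random salt,
# so identical passphrases still yield different on-disk encryption
cryptsetup luksDump /dev/sdb | grep -A2 -i salt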

I would also advise reading our Rockstor official docs LUKS HowTo: LUKS Full Disk Encryption, which takes you through how Rockstor ‘views’/presents LUKS and its associated ‘special’/mapped devices.

An important point here is that underneath Rockstor obviously uses vanilla LUKS, but it makes some assumptions, mainly for ease of use and automation purposes, about such things as how the mapped devices are named. As a consequence it is required that you set up LUKS via the Rockstor Web-UI. But this does not mean one cannot then use this configuration elsewhere of course, as it is only one valid configuration ‘style’. Just that any random way of naming / configuring LUKS and its associated devices is not going to work on Rockstor. It has to be this way if we are to keep our code base sustainable. There is method in how we have done it and hopefully that will become clear upon going through the process.

As a final recommendation I would advise that you set up a parallel arrangement within a virtual machine with at least 5-10 GB data disks and go through the entire process there first. And make sure you understand the consequences of all the actions you take, e.g. the “Boot up configuration” option. And once all is done re-assure yourself via a few reboots etc.

Now as this requires at least the very latest Stable release you could, via:

https://appman.rockstor.com/

Change your existing subscription’s Appliance ID over to that of the test virtual machine install. Then once you are satisfied with how the procedure is to unfold, you can move your Appliance ID in Appman back to that of your real machine. The effect should be immediate so no worries on waiting. Your existing activation code will remain unaffected, which means you need not do anything to the ‘real’ machine: it will just experience a ‘lapse’ in Stable Channel subscription authentication as its Appliance ID is temporarily (for the duration of the test) invalidated, until you come to re-assert its Appliance ID thereafter.

Hope that helps and good luck. Also this is undoubtedly a major operation with many moving parts, literally as well as logically, so I strongly suggest you do the suggested reading and the suggested testing of the method on a virtual machine mock-up. And run the very latest version of Rockstor of course. Plus make sure this is not your only copy of the data, as this is also a very strenuous exercise for the computer as a whole and specifically the disks and, therefore, a common failure point: the PSU. Memory is also well tested, with your entire data passing through it many times.

P.S. As this is such a big operation, also ensure that you actually are running the latest stable via:

yum info rockstor

as when moving from testing to stable it can sometimes ‘show’ the latest version but not actually be running it. That should be OK when going straight to stable though: I’m thinking of the VM setup that may end up tripping you up otherwise.


Good points, and I’m running the newest stable. I was planning on whole disk anyway (the setup my backup NAS has) and the same passwords. And I should avoid most of the problems since the root disk will stay unencrypted.

I do have the backups anyway so I might not do the mock-up installation; it should be straightforward enough. But I will stick with the GUI and add/remove instead of replace.

Also, “no auto unlock” seems to be the only option that can be unlocked via the web GUI or ssh and still keep the data safe if someone takes the whole server with them?

@Jorma_Tuomainen Re:

There is currently no Web-UI way to unlock, unless you count our terminal within the Web-UI of course.

I would count the “Manual passphrase via local console” as secure in this case, as otherwise the assailant doing the ‘lifting’ would also have to ensure no power interruption. I.e. entering the passphrase during the next boot would again be required, hence secure. The entered passphrase stands aside from Rockstor and is totally controlled by the LUKS system. It’s an option the LUKS system controls and is baked into the relevant systemd units. This option is really the minimum viable option in my opinion, as the ‘lower’ option of via ssh is super janky. But of course you would require a local terminal on the machine, as the passphrase for “Manual passphrase via local console” is requested during boot and boot will not complete without it, i.e. it waits indefinitely for this input.

Also note the Web-UI warning language re the “No auto unlock” option: “Only single device pools are recommended.” So I would, in your circumstance, recommend “Manual passphrase via local console”, where on each boot/reboot you have to enter the passphrase via the local console. This protects from box theft as the passphrase is left in memory only long enough to do its job. Power off / interruption means no passphrase, so locked devices. And this option will work ‘out of the box’, as in no ssh-ing in.
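
For orientation, these boot-up choices end up as standard /etc/crypttab entries which systemd acts on at boot; the lines below are purely illustrative, with placeholder mapper names, UUIDs and keyfile path, so do check what Rockstor actually writes on your install:

# /etc/crypttab (illustrative only)
# <mapper name>   <device>        <key source>              <options>
# "none" means systemd prompts for the passphrase at boot (the middle option);
# a keyfile path means the device is auto unlocked without a prompt
luks-1111-aaaa    UUID=1111-aaaa  none                      luks
luks-2222-bbbb    UUID=2222-bbbb  /root/keyfile-2222-bbbb   luks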

We are basically missing an ‘unlock on demand’ option via the Web-UI. This would require improvements / enhancements to Rockstor’s mounting mechanism which are currently not quite up to this. But much has been improved in that direction.

Hope that helps to clarify things: the middle “Manual” option would work to secure the box against theft, if the theft involved a power interruption. The lowest option is really only there for completeness, where Rockstor has nothing to do with unlocking at all. But then the middle option is essentially this also, as it uses systemd built-ins to unlock via the local console during boot; Rockstor’s part in this is just writing the config to do that.

The LUKS container configuration Web-UI screen covers all of this hopefully. For a breakdown see Step 2: Boot up Configuration docs section.

Local console is kind of an inconvenience since a) the server has no display or keyboard in it, b) the IPMI console is a pita to use compared to ssh.

On my backup NAS I have vanilla openSUSE with a script which just unlocks the devices and mounts the pool. Since that is the backup of the NAS it has RAID-0.
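
For context, such a script can be quite short; a minimal sketch along those lines, with placeholder UUIDs, mapper names and mount point:

#!/bin/bash
# unlock each member disk (prompts for its passphrase), then mount the pool
for uuid in 1111-aaaa 2222-bbbb 3333-cccc; do
    cryptsetup open "/dev/disk/by-uuid/${uuid}" "luks-${uuid}"
done
btrfs device scan                                # make sure btrfs sees all members
mount /dev/mapper/luks-1111-aaaa /mnt/backup     # any member device will do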

@Jorma_Tuomainen

Ok yes, that does rather limit your options.

This could be tricky as Rockstor does the mounts usually, but if it finds them mounted in the usual place with the usual options you should be good. You could look at what happens when Rockstor does the unlocking / mounting and copy the minimum part of that. We definitely have more flexibility to build in here and I think a Web-UI ‘mount this’ option would be good, but as always there are a number of corner cases that spring up.

Good luck with this and do share what you come up with. Definitely doable, and I tested the ‘ssh unlock’ option which from memory invoked some systemd script that took the passphrase. But as this was ‘per drive’ I was concerned about multi disk pool unlocking, hence the suggestion for single device pools only. But if your script ensured all drives were unlocked in very short succession, and you made sure never to use this script with a degraded mount option, then you might be OK. Otherwise you might end up with a degraded mount as soon as the minimum number of devices appeared for that pool, and then as the last drive appeared you could end up with a split-brain scenario. Hence the single drive suggestion in the docs.

I definitely want to add a mount / unmount Web-UI capability which would address this outright (assuming it tied into LUKS) but it just isn’t there yet. Good luck, and given there is no fstab entry for Rockstor data drives you might be OK; but if you are using the Web-UI it will routinely check if a pool is mounted and if not it will attempt to mount it. If this happens part of the way through your script you could mash your pool, but I think this would only be likely if a degraded option is used, as otherwise btrfs refuses to mount any pool with missing devices, so therein may be your loophole. It’s definitely worth taking a look at our source code as all functions on our side are there. But when working on the LUKS support I did find that systemd did quite a few things in the background re the creation of the mapped devices. And if I remember correctly some would take a while to appear, as it seemed to be on a polling/timing type arrangement. We just configure LUKS but on the other hand expect all things to be mounted, and if not we mount them, but we do have an expectation that these mounts are done at boot.
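
One hedge against the degraded-mount risk mentioned above would be for the script to refuse to mount until btrfs reports no missing members; a rough guard along those lines, with placeholder pool label and mount point:

# only mount if every member device has appeared after unlocking
if btrfs filesystem show mypool | grep -qi "missing"; then
    echo "pool mypool still has missing devices, not mounting" >&2
    exit 1
fi
mount LABEL=mypool /mnt/backup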

See how you get on, and do take a look at that LUKS pull request as then you can see the part played by Rockstor and the part played by regular LUKS.

What about disabling the Rockstor start at boot (since it has no use before the pool is online anyway) and making the script decrypt the LUKS disks and then start Rockstor (which should mount the pool)?

@Jorma_Tuomainen Re:

Yes, that looks like the best way to do it in your circumstance. All Rockstor systemd services begin with “rockstor” and can be listed via:

 ls -la /etc/systemd/system/rockstor*
-rw-r--r-- 1 root root 217 Oct 15 18:48 /etc/systemd/system/rockstor-bootstrap.service
-rw-r--r-- 1 root root 204 Oct 15 18:48 /etc/systemd/system/rockstor-pre.service
-rw-r--r-- 1 root root 227 Oct 15 18:48 /etc/systemd/system/rockstor.service

So systemctl disable on all of those:

systemctl disable rockstor-pre rockstor rockstor-bootstrap
Removed /etc/systemd/system/multi-user.target.wants/rockstor-pre.service.
Removed /etc/systemd/system/multi-user.target.wants/rockstor.service.
Removed /etc/systemd/system/multi-user.target.wants/rockstor-bootstrap.service.

so they don’t auto start on boot. Then if you start the last to run (the top one), rockstor-bootstrap, it should invoke the rockstor service, which in turn invokes rockstor-pre (itself depending on postgresql, and which runs initrock to begin with). We may have more robustness work to do in these services so do keep an eye on them; any suggestions are always welcome.

Post this disabling of all Rockstor’s main services, and a reboot thereafter, you should have nothing mounted under /mnt2:

mount | grep "/mnt2"

There are other rockstor related systemd services such as the one established if one sets hdparm parameters but that’s not relevant here.

So the execution order of the main ‘start up’ systemd units is as follows:

rockstor-pre
rockstor
rockstor-bootstrap

But their defined inter-dependencies (via their respective After= and Requires= entries) mean that starting the last will invoke, and wait on, each of the others in turn before it attempts to start itself.
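
If you want to confirm that chaining on your own install, the declared ordering and requirements of each unit can be read back without editing anything:

# inspect the After= / Requires= declarations of each Rockstor unit
systemctl show -p Requires -p After rockstor-bootstrap.service
systemctl show -p Requires -p After rockstor.service
systemctl show -p Requires -p After rockstor-pre.service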

So with all of them disabled, and a consequent reboot, Rockstor is completely inactive.

You can then do your bit here.

Then an invocation of the top service and you should be good, assuming there is sufficient time for, or a dependency on, the appearance of the Rockstor-expected LUKS mapped drive names.
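
If the timing of the mapped device appearance does prove troublesome, your script could simply wait for the expected device nodes before starting the service; a rough sketch, with placeholder mapper names:

# block until every expected mapped device node exists, then start Rockstor
for dev in /dev/mapper/luks-1111-aaaa /dev/mapper/luks-2222-bbbb; do
    until [ -e "$dev" ]; do sleep 1; done
done
udevadm settle                      # let udev finish any pending device events
systemctl start rockstor-bootstrap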

post:

systemctl start rockstor-bootstrap

you should then have the rockstor managed mounts back in place:

mount | grep "/mnt2"                                                                                                                         
/dev/vda3 on /mnt2/ROOT type btrfs (rw,relatime,space_cache,subvolid=259,subvol=/@/.snapshots/1/snapshot)
/dev/vda3 on /mnt2/home type btrfs (rw,relatime,space_cache,subvolid=260,subvol=/@/home)

The above is an example openSUSE variant with no data pools attached but you get the idea.

Hope that helps and I think you’ve hit on the ideal way, with your circumstance/requirements, to go about this. Do keep us posted and consider sharing your results as it may be that others find this approach useful.

I may have missed something here but I’m sure it will come up if need be. And do remember that Rockstor has very specific expectations on the configuration of LUKS but only enacts them via standard LUKS config files. So as long as you use the Web-UI to configure stuff you should be good, and can then view what it has done via the normal LUKS config it establishes.

Since I need to open the encryption with cryptsetup, are there any requirements for the device names?

@Jorma_Tuomainen Re:

Yes, very much so:

So I would strongly recommend that you set up LUKS with one of the other options and then look to see what the names are. You can always change the “boot up configuration” after having viewed what names are used. You will then get a good look at all the names used. Copying them in your own script should do it. But from memory I’m pretty sure they are configured in LUKS’s default config files anyway, so you just need to invoke the systemd unlock mechanism. See how you get on. But best to set up as recommended in the Web-UI first and work backwards from there. And as I say you can jump between settings. If particularly sensitive then just use the middle option and temporarily attach a keyboard. Or as previously suggested do the entire thing in a VM first so you know what has to be done and how the devices are named. It will make sense to you once you see it, and is not an uncommon configuration pattern for machine configured LUKS setups.
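
In practice that means the names Rockstor expects should already be recorded in /etc/crypttab, so a manual unlock need not hard-code anything; a rough sketch, assuming a single shared passphrase and no keyfiles in play:

#!/bin/bash
# read the shared passphrase once, then open every container listed in
# /etc/crypttab under exactly the mapper names the Rockstor config expects
read -r -s -p "LUKS passphrase: " PASS; echo
grep -v '^[[:space:]]*#' /etc/crypttab | while read -r name device _key _opts; do
    [ -z "$name" ] && continue
    case "$device" in UUID=*) device="/dev/disk/by-uuid/${device#UUID=}" ;; esac
    # printf without a newline so the passphrase bytes match the interactive one
    printf '%s' "$PASS" | cryptsetup open --key-file=- "$device" "$name"
done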

Hope that helps.

I’ll put the keyfile on until I get all the disks encrypted, so then I can do it in one go with the names of all the disks.

@Jorma_Tuomainen Re:

Sounds best until you’re familiar with what Rockstor sets up. Bit by bit is best in this case. Also note, from the Step 2: Boot up Configuration section of Rockstor’s official LUKS Full Disk Encryption docs, we have in the first screenshot of that section of the Web-UI:

“Non native or manually configured keyfiles are reported as “(custom)” and left ‘as is’. But once a non keyfile config option is submitted a return to keyfile config will yield a default Rockstor native keyfile generation and registration. No keyfile is currently deleted, custom or native.”

The most relevant element here is that last point: no keyfile is currently deleted.

That same Web-UI page gives the location and example name of a Rockstor native keyfile. This may be important to you as you have indicated a desire not to have one in the final configuration. Hence the Web-UI note that Rockstor will not delete them once created. This is intentional, as otherwise we may inadvertently lock users out of their data; once they have requested a keyfile they may not understand/appreciate its auto removal if they experiment with other settings. So do make sure to read all Web-UI and doc entries as all, and more, of what I have spoken of here should also be in the Web-UI instructions and referenced docs. Our primary aim in Rockstor is to democratise technology. Ultimately, in the more advanced cases such as LUKS and security, this is a very difficult task and we have erred on the side of caution re data access preservation rather than absolute security.
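
If, once everything is settled and you have moved to a passphrase-on-boot configuration, you do want to retire a native keyfile entirely, that is a manual step outside Rockstor; roughly as follows, with placeholder device and keyfile names, and only after confirming the passphrase slot works:

# confirm the passphrase alone can unlock before touching the keyfile slot
cryptsetup open --test-passphrase /dev/sdb
# remove the key slot the keyfile occupies, then destroy the file itself
cryptsetup luksRemoveKey /dev/sdb /root/keyfile-example
shred -u /root/keyfile-example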

Of course, as again indicated in the Web-UI and docs, you could also do system drive encryption which would ensure the safety of the keyfiles, but that just moves the ‘enter passphrase on boot’ option to much, much earlier, and is a far more difficult problem at that point with no local terminal access.

Hope that helps. Not all of it is relevant to your particular questions, but it may help others know what options Rockstor offers re LUKS.

1st disk is balancing, big disks :slight_smile: I’ll do the add/removal from webui with key files and secure the thing after whole pool is working properly on top of luks.

Finished a little while ago, all 4 data disks now on top of LUKS. Trying to move to manual mounting once I have time. Quick script to parse /etc/crypttab etc.

@Jorma_Tuomainen Glad you got the pool moved to LUKS and thanks for the update.

Hope the manual mounting then invoking of Rockstor services works out. Should do though.