I’m new to Rockstor and there seem to be quite a few issues around recovering from a failed boot drive. My question is this: If I get the system configured the way I want it can I power it down, image the boot drive to a spare drive, verify it works, and use that spare drive to recover the system at a much later date should the main boot drive fail? My concern is if updates are applied in the meantime and/or critical data regarding the pool is stored on the boot drive, might that not work?
Is this a viable strategy, or is there more of a real-time relationship between the boot drive and the data pool, such that popping in a boot drive that's, say, a year old may cause problems?
The basic advice for a failed boot drive seems to be starting over with the data drives removed, adding the drives and importing them, restoring the last configuration backup, and crossing your fingers for good luck. And that, apparently, doesn’t restore Rock-ons? There are several posts here regarding people having trouble recovering from a failed boot drive.
I consider easy recovery from a failed boot drive especially important as Rockstor doesn’t support mirrored boot drives and many are using USB thumb drives for the boot drive which are relatively prone to failure due to Rockstor’s frequent writes to the boot drive.
I have myself, on a number of occasions, recommended only fast, good quality USB sticks, as the regular ones, as you allude to, are not really up to it. OpenMediaVault have had a similar experience with users employing poor quality USB key system ‘disks’. I’ve had good results from the ‘ssd on a stick’ SanDisk Extreme USB 3.0, but it’s unfortunately been replaced with the Go, which looks to be a downgrade. It’s always best of course to use a real SSD or HDD, but that’s not always an option on the hardware people want to run Rockstor on. But that doesn’t make it our ‘fault’. We do still make quite a few writes, which comes from our enterprise Linux base; given the price of reasonable hardware these days, that outweighs our borderline compatibility with poor quality USB system disks. But you might be interested in our long term plans to move to a more transactional-update, read-only-root type scenario. As I say, though, that’s a long term plan: we need to complete our move to an openSUSE base before working to support their transactional server install option (quite some challenges along the way for that one).
I don’t see why not. Obviously this will work better if you avoid creating any shares (btrfs subvols) in the meantime. In my in-house development related testing of Rockstor I regularly exchange / move Rockstor system drives between data drives. In fact, if you take care not to delete ‘detached-…’ disks, one can then transition the system disk back to its original system and it will invisibly pick up from where it was with that pool, as those original ‘temporarily detached’ disks will simply be seen as re-attached. But of course the disks associated with the second system will then show up as ‘detached-…’ themselves.
Rockstor really only mounts the pools and subvols (shares, clones, snapshots) it has previously been asked to import, and reads them as it finds them. If its database is found to be missing, say, a new subvol (assuming that subvol is created as another Rockstor instance would create it, i.e. at a certain ‘level’), then it simply updates its database (stored on the system disk). Likewise, if a share (btrfs subvolume) was deleted during the offline time of your system image backup, then upon first boot, or in fact a Web-UI page refresh, the db is just updated with this info.
Also note that there have been many improvements to the pool / share import, db / pool maintenance, and the entire project really since the last testing channel release, as the stable channel is now far ahead. Please see the following for the improvements made:
Last released update 8 days ago, with regular releases over the project’s 6 years on GitHub.
The stable channel subscription is part of our development sustainability ‘plan’.
On imaging the system disk though, do make sure you use a system that is btrfs aware; not all are. A simple dd would of course be the most basic approach. And do make sure not to connect both Rockstor system disks at the same time: with an image based copy this would actually be catastrophic, as btrfs will confuse the 2 system btrfs volumes because they share the same UUID, which is very bad. Rockstor itself doesn’t like having 2 pools (volumes) with the same label, but that’s really just a Web-UI confusion element. If your imaging system is set to auto mount btrfs volumes, however, then you will have a very bad day: once the image is made, if your imaging system does decide to auto mount, you will have 2 btrfs volumes that share the same UUID, it won’t be unique, and you will have corruption of one or both. This is a btrfs level thing, though, and as long as you don’t mount the volumes you should be OK. Well worth knowing, however.
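As a minimal sketch of that image-and-verify step (simulated here with an ordinary file standing in for the system disk; on real hardware you would substitute the actual, unmounted device node, e.g. a hypothetical /dev/sdX, for the source):

```shell
# Stand-in for the Rockstor system disk; on real hardware use the raw,
# UNMOUNTED device node (e.g. /dev/sdX -- name is hypothetical) instead.
SRC=system-disk.bin
dd if=/dev/urandom of="$SRC" bs=1M count=4 2>/dev/null  # simulation only

# Image to a FILE, not a raw device, so a second attached btrfs volume
# with a duplicate UUID never exists on this system.
dd if="$SRC" of=rockstor-system.img bs=1M conv=fsync 2>/dev/null

# Verify the copy before trusting it: the two checksums must match.
sha256sum "$SRC" rockstor-system.img
```

On a real device you might also add `status=progress` to dd for feedback on a long copy; the checksum comparison is the part that matters before relying on the image.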
That’s kind of the general advice for any given system / data type arrangement, and a lot easier than taking someone through a full-on repair procedure, especially given the re-install method’s accessibility and the relative unfamiliarity of many NAS users with the Linux command line.
Not yet, anyway. However, if the advice in our documentation is followed, all configuration and data for all your previous Rock-ons are preserved on the data pool, with its associated redundancy level. Rock-ons are docker based, and docker images are designed to be ‘burned’ and have no data stored inside them; they are by design transient. But yes, it would be good if we could add at least their auto re-install to our config restore. If you fancy having a go at coding that you are welcome. I may have a go myself at some point, but I think the likes of @Flox or @suman may beat me to it.
It does if you use mdraid, which is currently better for a mirrored root arrangement anyway, given the better grub support. Please see our Mirroring Rockstor OS using Linux Raid guide, which has seen a number of contributors / testers. It’s fairly messy, as our upstream CentOS installer doesn’t really want to do it, so one has to be pretty persuasive. But then we have plans, and fairly active ongoing development (the last 6 months), to move to openSUSE as our base; that way we inherit their more capable and flexible installer and their kernel maintenance / updates, and given they employ a number of the btrfs developers they form a more natural home for us. This leaves us to concentrate only on the Rockstor specific code. We would also inherit their capability to boot to a prior (auto made) root snapshot, as they default (with a system disk >16GB) to enabling their snapper based system of reverting almost the entirety of / to a prior instance: another much desired feature in failed update scenarios.
But we only surface basic info on the mdraid arrangement, so maintenance is strictly a command line affair. In time we hope to support ‘native’ raid for your system disk, i.e. btrfs based, but this is a way off as yet and is not yet supported by openSUSE, who are leading the way on btrfs root and already include grub patches to facilitate extended use of btrfs for root (the boot-to-snapshot thing).
Incidentally, from my perspective / experience on the forum, the most common cause of a failed update is the user rebooting in the middle of a large system update. We have also had a few db migration failures but of late, the last couple of years, they have mostly self corrected on system reboot.
Hope that helps to update you on Rockstor and your options, and do remember the ‘care to be taken’ with imaging btrfs file systems, as ultimately you will have, upon image completion (or significant progress therein), 2 btrfs pools with identical UUIDs. But of course if you are only imaging to a file, rather than a raw device, then this is not an issue, as there is rarely an auto mount systemd service for loopback mounting. You can then, once you have disconnected the source system disk, attach the ‘to be duplicate’ device and restore the image file to that device, thereby avoiding 2 raw devices attached simultaneously to the same system (very bad).
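A minimal sketch of that restore step (again simulated with ordinary files; on real hardware the image file would have been made earlier, the source system disk physically disconnected, and the spare device — a hypothetical /dev/sdY — attached in its place):

```shell
# Ordinary files stand in for the image and the spare raw device here;
# the device name /dev/sdY mentioned above is hypothetical.
IMG=rockstor-system.img
SPARE=spare-disk.bin

# Placeholder image content, for this simulation only.
dd if=/dev/urandom of="$IMG" bs=1M count=4 2>/dev/null

# Restore the image onto the spare; conv=fsync flushes before dd exits.
dd if="$IMG" of="$SPARE" bs=1M conv=fsync 2>/dev/null

# Confirm the restored copy matches the image byte for byte.
cmp "$IMG" "$SPARE" && echo "restore verified"
```

Because the source disk is disconnected throughout, the two identical-UUID btrfs volumes never sit on the same running system, which is the failure mode described above.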
Thanks for the impressive and very detailed reply! I was planning to use something like Parted Magic’s disk image/clone tool. It should not care what the file system is but I would, of course, test the copy and not just assume it worked.
I do understand development resources for Rockstor, like UnRaid, are limited and hence must be prioritized to the most beneficial and critical tasks. If I end up using Rockstor (so far I am just testing) I will absolutely buy a 3 year subscription to support the effort.
As for a USB flash drive for the OS it is unfortunate Sandisk no longer makes the old SSD-based Extreme. Sandisk also has next to zero technical information on their USB drives. They don’t tell you what kind of flash they use, what kind of wear leveling they have, expected write endurance, etc.
I think the best option for a USB flash drive is an “industrial” drive based on SLC flash with both static and dynamic wear leveling; choose the largest size you can afford to spread out the wear. One excellent, but expensive, option is some of the Swissbit drives. But it’s likely cheaper to buy or build a small external USB 3 SSD if there are no options for an internal SSD.