Restore from snapshot and now I can't access anything

Hi, I recently had a kernel panic ("not syncing: VFS: unable to mount root fs on unknown-block(0,0)"), which, from what I gathered reading a bunch of other threads, came from a suspected bad update.

I was finally able to boot into a snapshot and restored to that snapshot by doing a snapper rollback. This got everything booting into a read/write state and allowed the Web-UI to run. However, it presented some new problems: I was stuck at the setup screen until I put in new credentials, as my old username gave something to the effect of "this user already exists", so I created a new one and got in.
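For anyone else hitting this, the flow was roughly the following (a sketch; the actual rollback is a manual, destructive step, so it is shown as a comment):

```shell
# Step 1: list root snapshots to find one that still boots.
if command -v snapper >/dev/null 2>&1; then
    snapper list
    status="listed"
else
    status="snapper-not-found"
fi

# Step 2 (manual): reboot and pick that snapshot from the GRUB menu.
# Step 3 (manual, destructive): from inside the booted snapshot, run
#   snapper rollback
# to promote it to the new default root.
echo "$status"
```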

This is when I saw that my pools weren't up, or at least that's what it looks like in the Web-UI and to everything else. I was able to see that the pool was actually mounted, and I could browse the files just like normal. I also cannot log in as the old user; it tells me "bad credentials" even though I'm sure they are correct.

My biggest question is: how do I fix this and sync up the existing data with this rollback? Or is it best to just cut my losses, reinstall Rockstor, and import the pool?

I'm on Rockstor 4.0.9-0, if that's helpful.

@alex549us3 Welcome to the Rockstor community.

The root snapshot you rebooted into likely already had this user in existence, or the Rockstor db still did; the db isn't rolled back the way the root system is. If it's the former, one can just delete that user from the system so that Rockstor's setup program can go ahead and create it.
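If it's the former (a lingering system user), a sketch of freeing the name up, with a hypothetical username, would be:

```shell
# Hypothetical username; substitute whatever name the setup screen rejected.
OLDUSER="myolduser"

# Only remove it if it actually exists on this system.
if id "$OLDUSER" >/dev/null 2>&1; then
    # -r also removes the user's home directory; run as root.
    userdel -r "$OLDUSER"
    result="removed"
else
    result="no-such-user"
fi
echo "$result"
```

Rockstor's setup can then re-create the user as normal.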

Incidentally, the fact that the Rockstor first-boot / setup screen came up prompting you to create a new user strongly suggests that you have returned to a very early snapshot, i.e. one taken at the very beginning of the install. So you are pretty close to a new install on the Rockstor front anyway.

Basically, if the file "/opt/rockstor/.initrock" doesn't exist, our initrock script deletes the database anyway (i.e. starts afresh). This suggests the early snapshot, but the existing Rockstor-created user suggests a later snapshot. Tricky.
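You can check which scenario applies directly from the command line:

```shell
# If this flag file is missing, Rockstor's initrock script rebuilds its db
# on next service start (the fresh-install behaviour described above).
if [ -f /opt/rockstor/.initrock ]; then
    state="later-snapshot"   # db kept: old users and config expected
else
    state="early-snapshot"   # db wiped afresh: setup screen expected
fi
echo "$state"
```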

Given the above, it may be simpler, if you're not happy/interested in messing about, to do a re-install. Given you are now running essentially a fresh install Rockstor-config-wise (the .initrock-missing-so-db-wiped scenario indicated by the setup screen), you may as well. That's assuming you didn't create any shares (btrfs subvols) on the system disk, which is discouraged anyway.

Rollbacks in no way affect anything other than the "ROOT" pool, which is for the most part the system files, not the additional pools' data files.

Hope that helps.

We incidentally have the following issue to make this process more robust/friendly:

When you first boot into a snapshot it's read-only, which basically breaks Rockstor's ability to present the Web-UI. It would be better all around, not just in the first stage of a boot-to-snapshot scenario, if we coped better with this and presented an error in the Web-UI along the lines of:

“System/ROOT pool Read-Only: many functions will be inoperable.”

That type of thing. Maybe with a later feature to 'sense' if this is a rollback snapshot (tricky) and offer the ability to enforce it as the reboot default. All in good time, but bigger fish to fry just yet.

And take a look at that issue, as you may have other, later snapshots you can boot into instead, given you now know our Web-UI doesn't work in a read-only situation. They are all still there for you to try, but you have to boot into them read-write for our bit to function as expected.
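To see what other root snapshots exist to try, something like:

```shell
# List root snapshots with their numbers and dates.
if command -v snapper >/dev/null 2>&1; then
    snapper list
    src="snapper"
else
    # Fallback on an openSUSE-style btrfs root: inspect the snapshot subvols.
    ls /.snapshots 2>/dev/null || echo "no /.snapshots directory on this machine"
    src="fallback"
fi
```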

Hope that helps.


Yeah, this was the most recent snapshot that was working, though it was only from a couple of weeks ago, and that was at least a month or so after the first installation, so I'm not sure why the snapshot seems to be so far back.

I will probably try to reinstall though to see if that helps fix things later today when I have a bit more time. It’s good to know though at least for now that the data is safe, since that’s separate.

It may be useful to know that this is a virtualized instance through Proxmox, though I'm passing through all disks except the main boot disk.

I’ll reply and update this post once I reinstall to note how that goes.

I would love to see a better system for booting from a read-only snapshot, as I was able to boot but unable to see if anything significant was going on. I look forward to that being implemented, in case this ever happens again.


@alex549us3
Re:

Yes, me too. Boot-to-snapshot is new to us, as we didn't have this in our prior CentOS base; we've inherited it wholesale from openSUSE. We just need time to adapt to a read-only root so that folks can still use our Web-UI, even given its inevitable restrictions in such a scenario.

Cheers. And keep us updated on the re-install re-import.

I’m assuming you didn’t take a config backup:
https://rockstor.com/docs/interface/system/config_backup.html

Also do give any feedback on how we might improve our pool import docs:
https://rockstor.com/docs/interface/storage/disks.html#import-btrfs-pool

Hope that helps.


Ok I did that and I’m still hitting an error:

Failed to import any pool on device db id (6). Error: (Error running a command. cmd = /usr/bin/mount -t btrfs -o subvolid=259 /dev/disk/by-id/ata-QEMU_HARDDISK_QM00013 /mnt2/4k. rc = 32. stdout = ['']. stderr = ['mount: /mnt2/4k: wrong fs type, bad option, bad superblock on /dev/sdf, missing codepage or helper program, or other error.', '']). 
Traceback (most recent call last):
  File "/opt/rockstor/src/rockstor/storageadmin/views/disk.py", line 856, in _btrfs_disk_import
    import_shares(po, request)
  File "/opt/rockstor/src/rockstor/storageadmin/views/share_helpers.py", line 239, in import_shares
    mount_share(nso, "{}{}".format(settings.MNT_PT, s_in_pool))
  File "/opt/rockstor/src/rockstor/fs/btrfs.py", line 667, in mount_share
    return run_command(mnt_cmd)
  File "/opt/rockstor/src/rockstor/system/osi.py", line 201, in run_command
    raise CommandException(cmd, out, err, rc)
CommandException: Error running a command. cmd = /usr/bin/mount -t btrfs -o subvolid=259 /dev/disk/by-id/ata-QEMU_HARDDISK_QM00013 /mnt2/4k. rc = 32. stdout = ['']. stderr = ['mount: /mnt2/4k: wrong fs type, bad option, bad superblock on /dev/sdf, missing codepage or helper program, or other error.', '']

I think I see what's going on here: I had a share called 4k, but I thought I had deleted it. After clicking import I can ls /mnt2/ and I see 4k ROOT home media. media is the only pool I thought I had, and the one I care about. When I ls /mnt2/media I see all the files I expect to be there.

I just checked the labels of all 6 disks that I'm trying to import, using btrfs fi label /dev/disk/by-id/&lt;a-pool-member-disk-name&gt; as listed in the disks section, and all came back with media, nothing about 4k, so I'm not sure where that pool is coming from. When I run btrfs filesystem show -d I get back exactly what I'd expect:

Label: 'media'  uuid: e15291be-1c02-47aa-b626-5d7f6192d5f9
	Total devices 6 FS bytes used 6.96TiB
	devid    1 size 4.55TiB used 1.42TiB path /dev/sdb
	devid    2 size 4.55TiB used 1.42TiB path /dev/sdc
	devid    3 size 4.55TiB used 1.42TiB path /dev/sdd
	devid    4 size 4.55TiB used 1.42TiB path /dev/sde
	devid    5 size 4.55TiB used 1.42TiB path /dev/sdf
	devid    6 size 4.55TiB used 1.42TiB path /dev/sdg

So basically, how do I re-import this, especially since it's already mounted? I would have thought that mounting it would make it show up in the Web-UI.


@alex549us3 Re:

As a note to help understand what's going on: we mount both the entire pool (btrfs volume), via its label, and all its shares, via their names, at /mnt2.
So for instance
/mnt2/pool-label (Entire pool mount)
/mnt2/share-name (Each share)

Two points here. One, relating to the above: we mount both pools and shares alongside each other at /mnt2/mount-point. Two: each disk is, in turn, reporting the label of its owning pool. So you are asking the same question of the overall pool (what is my label?) from each of its individual members. Btrfs is funny that way: you can address the pool from many angles, including from each of its individual members.

I think, from what you are saying, there is no pool named/labelled 4k. But as you see a mount point (directory) of "4k" within /mnt2/, you are assuming a pool. I'm hoping the above addresses this.
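A quick way to see this flat layout for yourself (pure illustration; findmnt just reads the mount table):

```shell
# Show every btrfs mount under /mnt2: pools (mounted by label) and shares
# (mounted by name) sit side by side at the same directory level.
mounts=$(findmnt -t btrfs -o TARGET,SOURCE,OPTIONS 2>/dev/null | grep /mnt2 || true)
echo "${mounts:-no btrfs mounts under /mnt2 on this machine}"
```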

Given:

This looks odd somehow; device "db" is strange.

As you say, the mount has worked but the import has failed. This can happen when stuff times out. One way around this can be to disable quotas on the pool first, via the command line. Then all functions are faster, and this can help with imports.
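A sketch of the quota-disable step, assuming your pool is labelled 'media' and so mounted at /mnt2/media:

```shell
# Disabling quotas speeds up the subvol scanning done during import.
POOL_MNT="/mnt2/media"   # adjust to /mnt2/<your-pool-label>

if [ -d "$POOL_MNT" ]; then
    btrfs quota disable "$POOL_MNT"
else
    echo "$POOL_MNT not found; adjust POOL_MNT to your pool's mount point"
fi
```

Quotas can be turned back on later with `btrfs quota enable` or from the Web-UI.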

We basically look at what pools are found, make mount points for each and mount each pool, then, in-turn, scan for all subvols and snapshots and analyse which ones to show on the Web-UI. This can take a little while and can end up timing out part way through.

Could this be a failed attempt to mount the subvol (share) you had thought was now deleted? This btrfs subvol mount (share mount) could be what's holding up the entire import process.
I.e. you mention:

Is this pool healthy? You could try a scrub via the command line.
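A command-line scrub, again assuming the pool label 'media' and its /mnt2/media mount point:

```shell
POOL_MNT="/mnt2/media"   # adjust to /mnt2/<your-pool-label>

if [ -d "$POOL_MNT" ]; then
    btrfs scrub start "$POOL_MNT"    # runs in the background
    btrfs scrub status "$POOL_MNT"   # re-run this to watch progress
else
    echo "$POOL_MNT not found; adjust POOL_MNT to your pool's mount point"
fi
```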

It may be that you have a rough subvol (share in Rockstor speak) and it needs deleting via the command line, assuming its data is not to be recovered.

In short, if the pool failed to import you can try again, and disabling quotas via the command line first may well help. But we have a mount failure reported here for a subvol. As indicated above, we mount the entire pool at its top level and then each subvol as well, all at the same directory level of /mnt2/pool-label-or-share-name-here.

I'm afraid I'm a little unclear about the current state. I'm assuming there are no pools showing in the Web-UI; is this correct? In which case I'm betting you have this rough 4k subvol (share) that you thought you deleted but which isn't properly deleted, and so can't be mounted, and is in turn failing the whole import.

Let us know the current state. But it looks like you have a partial import and it's being blocked by a duff subvol. Try removing this subvol via the command line, then re-try the import.
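A sketch of that removal; the path assumes the pool label 'media' with the suspect '4k' subvol directly under the pool's top level:

```shell
# WARNING: destroys the subvol and its data; only proceed if '4k' is truly unwanted.
SUBVOL="/mnt2/media/4k"   # adjust to /mnt2/<pool-label>/<share-name>

if [ -d "$SUBVOL" ]; then
    btrfs subvolume delete "$SUBVOL"
else
    echo "$SUBVOL not found; adjust SUBVOL to the suspect subvol's path"
fi
```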

If you have a partial import where the pool is showing but not all its shares, then fixing the blocking share (subvol) via the command line still works: on each and every reboot the entire pool is rescanned for any new subvols. And if you have properly addressed the supposed duff subvol, the boot process will continue to import the remaining shares and snapshots.

On a partial pool import: do not delete the pool. That would be bad, and it is not required.

My apologies: I basically need to know what the Web-UI shows on the pool front. But the errors indicate a failed subvol import, which tallies with your mention of a subvol you thought had been deleted.

Btrfs is a tad fractal: each subvol is like a filesystem in its own right. It isn't quite one, as each subvol needs its parent, but it looks and behaves like filesystems within filesystems. That is why we mount each subvol, to treat it as such. We also need an overall pool mount in order to instruct overall-pool-type stuff like quotas and overarching (default) mount options.

Hope that helps, and let us know how you get on with addressing that suspected blocker of the 4k share (btrfs subvol). And you can always do a pool-wide scrub from the command line.


Disabling quotas via the command line is what ended up finally fixing the import!

Thanks for all the detailed explanations; I think I understand things a bit better now. I'm finally cracking into more of the btrfs docs, and that should only help in the long run! :slight_smile:
