Unable to import BTRFS pool into Rockstor 4.0.8-0

Superfish1000 · September 17, 2021, 11:42pm

I have just finished building the installer for Rockstor on openSUSE Leap 15.3 and have gotten everything installed and working. The only issue I have had thus far is getting my old pool imported into the new OS.

I have not done anything to the OS apart from completing the installer and signing in through the web GUI.


# uname -a
Linux Mnemosyne 5.3.18-59.19-default #1 SMP Tue Aug 3 14:11:23 UTC 2021 (055c4fd) x86_64 x86_64 x86_64 GNU/Linux
# btrfs version
btrfs-progs v4.19.1

On the initial import attempt there is a pause while the system appears to load, and then I am presented with the following error:
Traceback (most recent call last):
  File "/opt/rockstor/src/rockstor/storageadmin/views/disk.py", line 850, in _btrfs_disk_import
    do.save()
  File "/opt/rockstor/eggs/Django-1.8.16-py2.7.egg/django/db/models/base.py", line 734, in save
    force_update=force_update, update_fields=update_fields)
  File "/opt/rockstor/eggs/Django-1.8.16-py2.7.egg/django/db/models/base.py", line 762, in save_base
    updated = self._save_table(raw, cls, force_insert, force_update, using, update_fields)
  File "/opt/rockstor/eggs/Django-1.8.16-py2.7.egg/django/db/models/base.py", line 827, in _save_table
    forced_update)
  File "/opt/rockstor/eggs/Django-1.8.16-py2.7.egg/django/db/models/base.py", line 877, in _do_update
    return filtered._update(values) > 0
  File "/opt/rockstor/eggs/Django-1.8.16-py2.7.egg/django/db/models/query.py", line 580, in _update
    return query.get_compiler(self.db).execute_sql(CURSOR)
  File "/opt/rockstor/eggs/Django-1.8.16-py2.7.egg/django/db/models/sql/compiler.py", line 1062, in execute_sql
    cursor = super(SQLUpdateCompiler, self).execute_sql(result_type)
  File "/opt/rockstor/eggs/Django-1.8.16-py2.7.egg/django/db/models/sql/compiler.py", line 840, in execute_sql
    cursor.execute(sql, params)
  File "/opt/rockstor/eggs/Django-1.8.16-py2.7.egg/django/db/backends/utils.py", line 64, in execute
    return self.cursor.execute(sql, params)
  File "/opt/rockstor/eggs/Django-1.8.16-py2.7.egg/django/db/utils.py", line 98, in __exit__
    six.reraise(dj_exc_type, dj_exc_value, traceback)
  File "/opt/rockstor/eggs/Django-1.8.16-py2.7.egg/django/db/backends/utils.py", line 64, in execute
    return self.cursor.execute(sql, params)
OperationalError: deadlock detected
DETAIL: Process 14386 waits for ShareLock on transaction 1807; blocked by process 14438.
Process 14438 waits for ShareLock on transaction 1805; blocked by process 14386.
HINT: See server log for query details.
CONTEXT: while updating tuple (1,86) in relation "storageadmin_disk"

On any subsequent attempts I receive this error:
Traceback (most recent call last):
  File "/opt/rockstor/src/rockstor/storageadmin/views/disk.py", line 856, in _btrfs_disk_import
    import_shares(po, request)
  File "/opt/rockstor/src/rockstor/storageadmin/views/share_helpers.py", line 239, in import_shares
    mount_share(nso, "{}{}".format(settings.MNT_PT, s_in_pool))
  File "/opt/rockstor/src/rockstor/fs/btrfs.py", line 667, in mount_share
    return run_command(mnt_cmd)
  File "/opt/rockstor/src/rockstor/system/osi.py", line 201, in run_command
    raise CommandException(cmd, out, err, rc)
CommandException: Error running a command. cmd = /usr/bin/mount -t btrfs -o subvolid=22240 /dev/disk/by-id/ata-WDC_WD80EFAX-68LHPN0_7SGJSA4C /mnt2/test-config. rc = 32. stdout = ['']. stderr = ['mount: /mnt2/test-config: wrong fs type, bad option, bad superblock on /dev/sdl, missing codepage or helper program, or other error.', '']

After the initial failure the array has been mounted and is accessible under the original mount point, but it does not appear in the web panel.

Any advice would be appreciated.

Superfish1000 · September 18, 2021, 2:10am

Update: following lightglitch’s discovery in the thread “Import Pools and Shares after install”, I have removed the @transaction.atomic decorator.
This seems to have bypassed the first error, but the second still persists. The FS is mounting, but there appears to be an issue mounting the subvolumes (shares).

After checking dmesg I found this.


[  559.444044] BTRFS info (device sdk): disk space caching is enabled
[  559.447776] btrfs: cannot remount RW, RAID56 is supported read-only, load module with allow_unsupported=1

This appears to be related to the use of RAID6 in BTRFS, which is marked as unsupported and is therefore disabled by default.
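For anyone checking whether they are hitting the same condition, here is a minimal shell sketch that looks for this kernel message in dmesg output. The function name has_raid56_readonly_msg is made up for illustration; the search string is taken verbatim from the log line quoted above, and other kernels may word it differently.

```shell
#!/bin/sh
# Succeeds (exit status 0) if the kernel log text on stdin contains the
# "RAID56 is supported read-only" message quoted in this post.
# has_raid56_readonly_msg is an illustrative helper name, not an existing tool.
has_raid56_readonly_msg() {
    grep -q 'RAID56 is supported read-only'
}

# Typical use (reading the kernel log may require root):
#   dmesg | has_raid56_readonly_msg && echo "btrfs RAID5/6 pool is limited to read-only"
```

If the message is absent, a read-only or failed mount most likely has a different cause and this thread's workaround may not apply.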

Will update after enabling.

UPDATE
Pool auto-mounted after making the change to allow unsupported modes. Shares appear to have imported as well; however, they are all marked as unmounted.
Link to doc for enabling unsupported modes.

TLDR:
echo 1 > /sys/module/btrfs/parameters/allow_unsupported
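Before and after running that command it can help to confirm what the module currently reports. The sketch below reads the same sysfs path; the function name btrfs_unsupported_state and its optional path argument are hypothetical, and the accepted values (Y/N or 1/0) are an assumption about how this SUSE-specific parameter reads back.

```shell
#!/bin/sh
# Report whether the btrfs module currently allows unsupported features.
# The default path is the one used in the post above; the function name and
# the optional path argument are illustrative only.
btrfs_unsupported_state() {
    param="${1:-/sys/module/btrfs/parameters/allow_unsupported}"
    if [ ! -r "$param" ]; then
        # Module not loaded, or this kernel has no such parameter.
        echo "unknown"
        return 0
    fi
    # Assumption: boolean parameters read back as Y/N; accept 1/0 as well.
    case "$(cat "$param")" in
        Y|y|1) echo "enabled" ;;
        N|n|0) echo "disabled" ;;
        *)     echo "unknown" ;;
    esac
}

# Typical use:
#   btrfs_unsupported_state
```

Writing to the sysfs file only changes the running module, so the value is expected to revert on the next boot unless it is also set persistently.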

Superfish1000 · September 18, 2021, 4:51am

Update and Apparent Resolution

The final resolution to this issue appears to involve a few parts. First, there was a damaged share titled test-config on my pool. I booted into the CentOS-based Rockstor and deleted this share; it was empty anyway.

Once this was complete, the resolution appears to be the solution posted by lightglitch in the aforementioned post, followed by a reboot (potentially not needed) and the enabling of unsupported modes in the BTRFS module as mentioned in my second post. This allowed the pool and all of the shares to import as expected.

Update… Again…
This does not work after reboot. The issue with OpenSuse not loading the BTRFS module returns after reboot.
I appear to have resolved it by adding a module options file to the /etc/modprobe.d/ folder.
Inside the new file I made, /etc/modprobe.d/01-btrfs.conf, I added the line:
options btrfs allow_unsupported=1
This must then be followed by running mkinitrd to rebuild the initial ramdisk so the option is applied at boot. Or something… I’m very tired at this point.
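The two steps above can be sketched as a small helper that adds the option line only if it is not already present. The function name ensure_btrfs_option and its optional path argument are hypothetical; the file name and option line are the ones from this post.

```shell
#!/bin/sh
# Append the btrfs option line to a modprobe config file unless that exact
# line is already there. ensure_btrfs_option is an illustrative name.
ensure_btrfs_option() {
    conf="${1:-/etc/modprobe.d/01-btrfs.conf}"
    line="options btrfs allow_unsupported=1"
    # -x matches the whole line, -F treats the pattern as a fixed string.
    if [ -f "$conf" ] && grep -qxF "$line" "$conf"; then
        return 0
    fi
    printf '%s\n' "$line" >> "$conf"
}

# Typical use, as root, followed by the initrd rebuild described above:
#   ensure_btrfs_option && mkinitrd
```

Running the helper twice leaves a single option line in the file, which keeps the config tidy if the step is repeated after a reinstall.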

Source:
https://forums.opensuse.org/showthread.php/535605-How-to-correctly-blacklist-a-driver-module-My-blacklisted-Broadcom-drivers-are-still-loaded

phillxnet · September 18, 2021, 10:04am

@Superfish1000 Hello again.
Re:

Well done. I’m super curious as to why you needed this “allow_unsupported” however. Have you in the past enabled something like space_cache_v2 or run a super new kernel, or used non Rockstor supported raid levels or something: e.g. metadata at c2 c3 or the like?

Yes, your modprobe addition gets the unsupported option in place and then via the mkinitrd into the boot ramdisk used by the kernel. Very curious. Do keep us posted; once you’ve had a little rest of course :).

Thanks for sharing your findings on this one. Super useful for anyone running into the same issue. I’m just a little concerned about why, in your case, a custom option was needed. Did you run a newer kernel in your CentOS than is now in your Leap 15.3 based Rockstor instance? If so it may well have silently enabled a ‘feature’ that is yet to be enabled/backported in the Leap 15.3 kernel.

But at least we now have your findings here for others. But I would advise folks don’t try this unless they have the same clear indication of this being required:

The btrfs parity raids of 5/6 have been enabled for a long time now. Hence my question as to a more advanced variant like using a c2 c3 metadata or the like.

Also, shame you had that rough subvol which didn’t help.

Regarding the db deadlock, we have had quite the success overall on imports for a googly while now but on larger and/or slower (high number of snapshots) volumes we still see some issues that can require some manual mounting first and/or the disabling of quotas to speed things up. See @Klapaucius story/post here:

Likely again related to failing to wait or manage our import in such cases.

Nice report, bar the wearing nature of your experience of course.

Again, thanks for sharing your findings. I suspect we will have to do some deeper work on the import before our v5 or whatever release.

Superfish1000 · September 18, 2021, 5:42pm

BTRFS RAID5,6 is not officially supported by OpenSuse and so it completely fails to load my pool. The exact error is in my second post. It will only allow mounting as read only in that mode, so the pool import fails and no shares are loaded. To my memory, I never enabled anything special for BTRFS. I just used the RAID5/RAID6 modes through the Rockstor GUI when setting up my NAS and have continued to run RAID6.

If I’m understanding correctly, this is purely because OpenSuse does not support BTRFS RAID5,6 and disables it. The link I added was for SLES12 but this one indicates that RAID5,6 is still disabled on SLES15.

Honestly not sure on this one. I didn’t want to dig into SQL or reading the code too much and once I found the workaround mentioned I stopped digging. For what it’s worth, I have not run quotas in over a year.

Additionally and likely unrelated to this issue, I have found that some Rockons are freezing indefinitely at the install stage. I have had one going since last night and it appears to be completely locked.
I have had to reboot to resolve as restarting the docker service fails because of some uncleared temp files, or so the error seems to indicate. I’ll make a new thread for that when/if I find anything useful.

Thank you as always for the feedback and advice.

Superfish1000 · September 19, 2021, 12:19am

It seems like this is still an issue. It seems like I may be completely unable to write data to the mounted shares despite having enabled write with the unsupported flag.
I have attempted to add some docker containers and they appear to install, and then freeze when attempting to access the RAID6 FS.

I attempted to copy a file into one of the other shares on the FS and ran into a similar result.

It seems like RAID6 may not function at all.

Edit:

I have performed a sanity check using the ISO uploaded and built by pallab in this thread, and it seems like everything has started and imported without issue; however, it still does not allow new file creation.

It seems like I have broken something with the creation of the ISO itself, but something seems to be broken between the filesystem on my pool and the version of BTRFS on the new version of Rockstor.

phillxnet · September 19, 2021, 11:10am

@Superfish1000 Thanks for the additional info/feedback. We ourselves have had a warning against using the Raid5/6 for some time but not out-right disabled it. The btrfs unsupported status is not the same as the distro support status but we need to keep an eye on this for sure.

But Rockstor auto enables quotas upon import, something we really need to re-address. Especially given it can take a good few minutes for this re-enablement to be enacted and mid-process there can be some strange quota reports from the various and numerous btrfs commands we run while doing an import.

As you say if there is no write access then things are a little constrained. We do have work to do in this scenario re better timeouts. To ‘kill’ database ‘knowledge’ we have a brute force script whose mechanism we intend to add to an advanced area for Rock-ons.

Take a look at @Flox post here for this “delete-rockon” script’s use:

Briefly:

 /opt/rockstor/bin/delete-rockon <rock-on_name>

That’s progress. Please note however that there are, as of yet, no official pre-built installers. I’ll contact you directly via forum private messaging for the details of an ongoing closed beta test in that area.

Not necessarily. You may just have a poorly pool, and in turn it is throwing us off track re the unsupported and read only. Given you have not in the past used newer kernels, and the newer kernels have better ‘protections’ than we had in our prior v3 CentOS based variants, you may just be experiencing the btrfs design decision to go read-only upon finding insanity of sorts. And insanity is frankly more likely in the much younger parity raid levels of 5/6 within btrfs. Hence its lack of support / production recommendation in our upstream and within our own doc and Web-UI. I would recommend a back-up refresh and re-assessment of raid levels. One avenue that is unfortunately as yet unavailable in Rockstor is to use a raid1 or raid1c2/3 for metadata while maintaining the 5/6 parity for data. We hope to at least surface this within the Web-UI in time but it’s not likely to be in the near future. Unless of course we get more developer input in this direction. You never know. I fancy seeing this myself but a little hands-full on my end currently.
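As a starting point for that re-assessment, the data and metadata profiles a pool currently uses can be read from `btrfs filesystem df <mountpoint>`, which prints one line per block group type. The sketch below only parses that output; the function name pool_profiles is hypothetical, and the /mnt2/<pool_name> mount point in the usage comment is an assumption based on the /mnt2 paths seen earlier in this thread.

```shell
#!/bin/sh
# Read `btrfs filesystem df` output on stdin and print "<type> <profile>"
# for the Data, Metadata and System lines, e.g. "Data RAID6".
# pool_profiles is an illustrative helper name.
pool_profiles() {
    # Split fields on "," or ":" plus any following spaces, so that a line
    # like "Data, RAID6: total=..." yields $1="Data" and $2="RAID6".
    awk -F'[,:] *' '/^(Data|Metadata|System),/ { print $1, $2 }'
}

# Typical use (as root; replace the placeholder with your pool's mount point):
#   btrfs filesystem df /mnt2/<pool_name> | pool_profiles
```

A pool showing a parity profile for both Data and Metadata is the case discussed above, where metadata also sits on raid5/6.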

Another note on the ISO build that may account for your differing experience. Whenever you build a new installer it pulls in the latest upstream (openSUSE) updates. And to their enormous credit the openSUSE / SuSE teams aggressively backport btrfs fixes/features. So you may just have received a fix that favoured your situation re the import/mount or whatever. Or the initial mounts you attempted helped to clean up some of the potential pool issues that had accumulated un-noticed by the prior kernel.

Keep an eye out for forum PM notices for that closed beta test of our pending pre-built installer.

Thanks again for the detailed report/feedback; much appreciated.

stray_tachyon · September 22, 2021, 2:49am

Hi. I also have a raid 5 pool built many years before you guys disabled it. Should I be concerned if I upgrade to Rockstor 4?

Thanks

phillxnet · September 22, 2021, 11:43am

@stray_tachyon Hello again, long time no see.
Re:

We have not disabled it. We do not recommend its use for production via a tooltip that may well have not been there when you initialised your first pool. We added this warning in version 3.8.15 released November 2016 (and a little earlier in the testing channel that preceded it):

via the following pull request:
https://github.com/rockstor/rockstor-core/pull/1375
linked to the following issue at the time:
https://github.com/rockstor/rockstor-core/issues/1372
as indicated in the issue, this addition was in response to the equivalent of the following page at the time:

https://btrfs.wiki.kernel.org/index.php/Status

which now indicates the parity raids (5/6) as “Unstable” whereas previously our warning was added due to its “experimental, not production-ready” status.

I’m still uncertain myself of the origin of @Superfish1000’s difficulties. Remember that they also had a rough (and empty) subvolume that refused/held-up the original import. We don’t concentrate on the 5/6 support as it’s never been recommended but you should at least have read only access to the pool if nothing else, assuming it’s importable. As always it would be sound practice to refresh your backups prior to any major change such as supplanting one OS (CentOS for v3 Rockstor) for our new “Built on openSUSE” endeavour (v4).

Hope that helps. And for context your data is likely in far better hands under our new OS base than our older one given we at least now have upstream support of the fs and a far far newer btrfs stack as a result. We used to use elrepo kernels in v3 and frankly did a bad job of keeping them updated. We now no longer have that responsibility given our new openSUSE base. Plus they aggressively back-port btrfs fixes and improvements into their kernel flavours so there’s that. OpenSUSE/SuSE are a major player in the development of btrfs so that can only be good for us.

Superfish1000 · October 30, 2021, 9:05pm

It seems like something went wrong with my creation of the installation media. After using the media created by another fellow user, I have been able to get a pool configured for RAID6.
I did, however, have to completely rebuild my array, migrating all my data off and restoring from backup.
I do not know if this was in any way related to the migration process/incompatibility between BTRFS versions, or some sort of accumulated error on my pool.
I do believe that it is important to note however, as I have until this point been mostly trusting my NAS to not lose my data.
Obviously, this is not proper data security, but it is none the less how I have been operating.
To @phillxnet’s confusion, I wish I could shed more light myself. I am truly not sure what caused the issues I was having. I would be happy to provide the ISO I generated if anyone wants to dig into it to see what I may have done wrong or what may have been changed unexpectedly.
My only guess, is that it had something to do with the creation process and that SLES15 specifically says it is disabled by default.

I could be wrong, but I suspect this is getting overridden during the media creation process and something in that chain broke when I created my ISO.

I apologize for the confusion this post may have caused, I know it is not organized and is rather hard to digest.

To the point, I would like to state my experience with correctly generated install media thus far.

I have been using the Suse based version of Rockstor for a while now and have been relatively happy. After the initial hiccups, it is running as expected and has so far been stable. As much as I was irritated by the data migration I had to perform to get what was likely a broken pool worth of data loaded on to the new OS, it has given me the chance to start fresh on a newly generated BTRFS pool. I have recently run a scrub and it has come back clean across the 43.6TB worth of data I have stored on my NAS. I have been getting quite acceptable performance and otherwise have nothing to report, which I count as a success.

My advice for migration

After the difficulties I have had, I highly recommend ensuring you have a proper backup of your data beforehand. This is always the correct practice, but in the case of BTRFS RAID5/6 I think it is all the more relevant as they are still considered unsupported. I am still not certain why the pool did not cleanly import on to my Suse install after the corrections I made. Because of this I would caution that it may also be necessary to regenerate the pool and migrate the data as opposed to simply importing the pool, as it was in my case. This is likely not the case, but I would keep it in mind.

Admittedly I had been having multiple issues with my previous install which were likely related to damage caused to the pool; I deleted snapshots from the pool that then bricked my Rockon installation, making it impossible to install/reinstall certain Rockons. I tried for weeks to repair Docker, but ultimately gave up and hoped it would be fixed by upgrading.

I hope this serves to clear up things a bit and that it gives some needed clarity to my issue.

phillxnet · October 31, 2021, 11:41am

@Superfish1000

Yes, but note that one of our main reasons for the “Built on openSUSE” endeavour was to update our users’ btrfs code, both kernel and userspace, by several years. We have also for several years had a warning against using the parity raids of 5/6. And have never actually recommended them. This is in keeping with upstream advice.

This refers to Suse Linux Enterprise Server. There is an ongoing ‘merge’ by way of binary compatibility between Leap and SLES at 15.3 but they do have different defaults. But again we advise against the parity raids.

Agreed, that is a silver lining as you now know your pool is free from earlier kernel creation issues. It was a concern of ours and we had hoped that the move to space_cache_v2 would have been completed by now but it is not. That will provide a further boost to reliability and performance. Once upstream has fully ratified and merged it and it’s been backported to whatever Leap version we can hopefully provide an easy migration over to this.

To be clear, are you actually running Suse Linux Enterprise Server or is your system based on an openSUSE Leap version? They are getting progressively closer but are still distinct.

Thanks for the follow-up. All good advice. When importing from a 4.10/4.12 mainline kernel to Leap 15.3’s backports there are many years of fixes/changes that have taken place. And to be frank we started out early so caught the earlier days of btrfs. Still we are here now and your feedback has been much appreciated. Incidentally the new fedora install also tries to hide / guide folks away from the parity raids of 5/6 so that’s interesting. I’ve not tried it myself but heard of this in an interview with Neal Gompa I think it was.

I think what you experienced was, as you seem to now suspect, a damaged pool. This was partially alleviated by enabling ‘unsupported’ options which were not 5/6 as such but likely other capabilities that partially worked around your possibly poorly pool state. But I really don’t know.

I’m not convinced of this. And each time you mount a pool it gets changed very slightly. Possibly for the better is the hope. And it may well have been that your earlier mount just helped. Or the later installer had more updates that helped you out. Each time you build the installer it pre-installs all upstream updates. This could of course work in reverse for an update that introduces regressions but it’s worth noting never-the-less.

So all in well done on getting through that rather rough transition. It’s an obvious concern but I’m pretty confident that the vast majority of pools will import just fine. And if not then disabling quotas (as you already did) and also disabling Rockstor’s code that re-enables them on import may help; especially on larger pools. We didn’t get that far with your endeavour but it’s yet another option.
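For reference, quotas on a mounted btrfs pool can be turned off with `btrfs quota disable <path>`. The following is a minimal sketch only: the helper names run and disable_pool_quotas and the DRY_RUN variable are made up for illustration, and the /mnt2/<pool_name> mount point is an assumption based on the /mnt2 paths seen earlier in this thread. It does not touch the Rockstor code that re-enables quotas on import.

```shell
#!/bin/sh
# Print the command instead of executing it when DRY_RUN=1.
# DRY_RUN and run are conventions local to this sketch.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Disable btrfs quotas on the pool mounted at the given path.
# disable_pool_quotas is an illustrative name, not a Rockstor command.
disable_pool_quotas() {
    if [ -z "$1" ]; then
        echo "usage: disable_pool_quotas <pool mount point>" >&2
        return 2
    fi
    run btrfs quota disable "$1"
}

# Typical use (as root; replace the placeholder with your pool's mount point):
#   DRY_RUN=1; disable_pool_quotas /mnt2/<pool_name>   # preview only
#   DRY_RUN=0; disable_pool_quotas /mnt2/<pool_name>   # actually run it
```

Previewing with DRY_RUN=1 first is a cheap way to confirm the target path before changing anything on a large pool.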

Cheers and keep us informed. All constructive feedback is good and nothing is perfect, especially software as it goes: too many moving parts, and progressively more a house of cards. Which is why in part I recommend bare metal btrfs: removes a whole bunch of ancient layers that have all their own bugs and card stacks. This includes having hardware raid underneath incidentally.

Superfish1000 · October 31, 2021, 5:56pm

Thank you for the correction. I have never used SUSE in any flavor, so I must plead ignorance here.

This is an incorrect use of terminology on my part. I am using openSUSE. I attempted, and apparently failed, to create the install media following the guides on the forum/git repo.

The error I received specifically stated that “RAID56 supported read-only”, but it also mentions “disk space caching.”
I must again plead ignorance if these could be interrelated.

The installer that I ultimately used was actually older. The version I used, created by pallab, was using Leap 15.2, whereas the one I attempted to create was using 15.3.

Absolutely. I just hope that if anyone runs into the same issues I did that they might be able to find a solution faster by getting to learn from my mistakes and misunderstandings.

Thank you again @phillxnet for providing corrections.

Superfish1000 · February 15, 2022, 3:45am

Update: On the latest ISO release the syntax appears to be options btrfs allow_unsupported=1
for the file created in /etc/modprobe.d/
In my case, I created /etc/modprobe.d/01-btrfs.conf