Error importing Pools, shares and snapshots

deigo · January 13, 2022, 11:07pm


Brief description of the problem

Today I installed the latest Rockstor 4.0.9-0 (Leap 15.3), and when I started to import my pool the “Houston, we have a problem” error came up. Beside the drive is a small arrow; when you click it, it asks if you want to import pools, shares and snapshots. I clicked that, and then this error came up.

Error Traceback provided on the Web-UI

Traceback (most recent call last):
  File "/opt/rockstor/src/rockstor/storageadmin/views/disk.py", line 856, in _btrfs_disk_import
    import_shares(po, request)
  File "/opt/rockstor/src/rockstor/storageadmin/views/share_helpers.py", line 239, in import_shares
    mount_share(nso, "{}{}".format(settings.MNT_PT, s_in_pool))
  File "/opt/rockstor/src/rockstor/fs/btrfs.py", line 667, in mount_share
    return run_command(mnt_cmd)
  File "/opt/rockstor/src/rockstor/system/osi.py", line 201, in run_command
    raise CommandException(cmd, out, err, rc)
CommandException: Error running a command. cmd = /usr/bin/mount -t btrfs -o subvolid=259 /dev/disk/by-id/ata-WDC_WD80EFZX-68UW8N0_R6GP7YBY /mnt2/movies. rc = 32. stdout = ['']. stderr = ['mount: /mnt2/movies: wrong fs type, bad option, bad superblock on /dev/sdd, missing codepage or helper program, or other error.', '']

phillxnet · January 14, 2022, 11:22am

@deigo Hello again.

I’m assuming this is a prior Rockstor v3 pool. Can you tell us anything about it? We have had some import failures but have got around them all so far. Is this by chance from a btrfs raid 5 or 6 instance? Did you used to run a newer kernel in your prior v3 CentOS based instance? That can cause this issue and there is a workaround.

Some folks have had issues importing some larger pools or ones that have many snapshots; or both. In these cases disabling quotas on the pool, via the command line, before re-attempting the import has worked. So more information would help here.

The missing codepage thing can be a one-off and trying to import using another member of the same pool can then sometimes work. Or pressing the rescan button. Or sometimes a reboot and re-try can also help.

What is the output of:

btrfs fi show

We’ve not yet had an import that ultimately failed, but we have had a few hiccups along the way. And in the case of those who have previously run customised v3 instances and ‘exotic’ raid setups we also have a workaround, but let’s first hear more about the system you are running this Rockstor instance on, and some info about the pool. Was the pool healthy before you did the re-install using v4?

Hope that helps and we can take it from there. V4 is a superset of v3 and there has been no change in the pool structure, but the kernel and btrfs-progs are years newer, and this can sometimes cause issues on the initial import. So more info please, and let’s take it from there.

phillxnet · January 14, 2022, 11:33am

@deigo Re:

See the following recent forum thread for an example of this:

Where @alex549us3’s failed import report looks pretty much identical to yours. And in that case:

So you are likely seeing the same timing issue. It can take quite some time to build quota info, and that can somehow throw the btrfs import process: it looks like a single share mount attempt (I’m assuming “movies” was a share / btrfs sub-vol) fails and takes the entire pool import with it. We mount not only the pool/btrfs-volume but also each individual share/btrfs-subvolume.

Hope that helps.

deigo · January 14, 2022, 9:29pm

Hello Phil, I was using Linux 4.12.4-1.el7.elrepo.x86_64 with Rockstor ver 3.9.2-57. I have now installed 4.0.9-0 and that is where the problem is; the pools were created in the 3.9.2-57 version. I have 4x 8TB WD Red drives in a btrfs raid 5. I have 6 shares, not including root. When I try “btrfs fi show” the response is command not found (not sure how to run it). I am using the “System Shell” in the System menu tab within the Rockstor web interface to do that. Correct? I noticed that my Stable updates subscription had lapsed so I purchased a new year. Not really sure how to use the su cmd, learning slowly.

Tom

deigo · January 14, 2022, 10:59pm

Got the btrfs fi show info

deigo · January 14, 2022, 11:07pm

I had issues with the root password that I had to fix. I reinstalled and that fixed the problem. On the fresh install I went to Disks, clicked the down arrow, and got the same error. As I was doing that I noticed something going on in the command line so I took a picture of it; sending it in case it helps.

phillxnet · January 15, 2022, 1:15pm

@deigo Hello again, and thanks for copying this to the forum from the support email.
Re:

Well done.

Yes, in our openSUSE Leap 15.3 upstream they have enforced read-only for the parity raid levels of 5 and 6. They are a lot younger than the other raid levels and have not been recommended for production yet. We have this as a warning in our docs and on the later Web-UI. That is why we see the message you have pictured above:

… btrfs: cannot remount RW, RAID56 is supported read-only, load module with allow_unsupported=1

Great. Yes, for others reading: the root password is set during the install at the following stage within the installer:
https://rockstor.com/docs/installation/installer-howto.html#enter-desired-root-user-password

Yes, well done. There is often important info on the console. Also note that you can see such messages from within the Log Manager in Rockstor’s Web-UI. Also note the following command:

journalctl

and if you add “-f” after it you can see live output. Definitely worth looking up if you are interested in such things.

I’m a little surprised it was the same error. And we may be able to get a read-only import by first disabling quotas on this pool via the command line, but we also have the following issues here:

… BTRFS info (device sde): bdev /dev/sdd errs: wr 0, rd 0, flush 0, corrupt 9, gen 0
… BTRFS info (device sde): bdev /dev/sdc errs: wr 0, rd 0, flush 0, corrupt 1, gen 0

In this report we see how btrfs uses any one of the pool (read btrfs volume) members to reference the whole pool (in this case it’s using device member sde).

But we then have 2 reports of 2 different device members having experienced corruption:
/dev/sdd corrupt 9
/dev/sdc corrupt 1
This is not a good show, and unless you have two faulty drives we should find the cause.
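
As a rough illustration of reading such reports, the sketch below pulls the per-device counters out of kernel log lines in the format quoted above and lists only the devices with non-zero values. The function names are made up for this example, and the regular expressions are an assumption based purely on those two lines, not on any Rockstor code.

```python
import re

# Matches the "bdev <device> errs: ..." part of the kernel log lines quoted above.
BDEV_RE = re.compile(r"bdev\s+(?P<dev>\S+)\s+errs:\s*(?P<counters>.*)")
# Matches one "name value" counter; the space is optional to tolerate "gen0".
COUNTER_RE = re.compile(r"([a-z]+)\s*(\d+)")


def parse_bdev_errors(line):
    """Return (device, {counter: value}) for a 'bdev ... errs:' line, else None."""
    match = BDEV_RE.search(line)
    if match is None:
        return None
    counters = {
        name: int(value)
        for name, value in COUNTER_RE.findall(match.group("counters"))
    }
    return match.group("dev"), counters


def devices_with_errors(lines):
    """Map each device to its non-zero counters, skipping clean devices."""
    flagged = {}
    for line in lines:
        parsed = parse_bdev_errors(line)
        if parsed is None:
            continue
        device, counters = parsed
        nonzero = {name: value for name, value in counters.items() if value}
        if nonzero:
            flagged[device] = nonzero
    return flagged


sample = [
    "BTRFS info (device sde): bdev /dev/sdd errs: wr 0, rd 0, flush 0, corrupt 9, gen 0",
    "BTRFS info (device sde): bdev /dev/sdc errs: wr 0, rd 0, flush 0, corrupt 1, gen 0",
]
print(devices_with_errors(sample))
# {'/dev/sdd': {'corrupt': 9}, '/dev/sdc': {'corrupt': 1}}
```

Lines could be fed in from saved journalctl output; anything that does not contain a "bdev ... errs:" section is simply ignored.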

I would suggest that you ensure your system’s memory is good. Bad memory can cause corruption.

Take a look at memtest86+ or one of its offshoots for this, as anything more we do with this pool or system is at risk if the system memory is faulty, which is what I suspect here.

The following doc has a section on testing memory:
Pre-Install Best Practice (PBP), Rockstor documentation
and more specifically the first subsection:
Memory Test (memtest86+)

Before the next thing we need to do to get you up and running, we must first ensure your system memory is good. So you must first do one, or ideally two or more, complete memory tests using memtest86+ or an offshoot that you can boot from. Our old installer has this built in so you can use it from there. Just be sure not to re-install of course.

Let us know how you get on with that, as 2 Pool members having proven corruption does suggest your memory is faulty. That means anything else we do could make things worse. So check this and if you are lucky/unlucky, depending on how you look at it, you may just be able to remove half the memory and be left with a solid system.

Also be sure to re-seat all SATA connections etc. as that can lead to similar issues, but that would also likely show up in some of the read and write stats.

For now your existing data can rest where it is, but make sure your memory tests good before proceeding as that’s important for our next and likely last step in getting you going again.

Hope that helps and let us know how you get on.

deigo · January 15, 2022, 5:03pm

Hello Phil, thanks again for helping. I replaced the RAM after I found corruption on my original 3.9 Rockstor, so that could have caused the corruption, not 100% sure. I am running memtest (on the new RAM) now and will let it continue most of the day while I am out.

[quote="phillxnet, post:7, topic:8208"]
… btrfs: cannot remount RW, RAID56 is supported read-only, load module with allow_unsupported=1
[/quote]
I am assuming I run this command while logged in as root; just my way of understanding as a newbie.

If memtest finds no errors and I run the “allow_unsupported=1”, will my drives still work as they do now? I have the original install of 3.9 on another SSD, and until I get 4.0.9-0 working and seeing my drives I just switch back and forth.

phillxnet · January 15, 2022, 6:13pm

@deigo Hello again.
Re:

OK, that may explain it. A pool will retain a ‘memory’ of such events and this may be just that. You can clear it once you are sure of the new memory but we can get to that shortly.

Great. Better to have this reassurance at this stage.

The “allow_unsupported=1” is an option we can pass to the loaded btrfs module. It is not a console (read terminal) command. But we don’t want to do that as we would be second guessing those who put this in place in the first place. I.e. the openSUSE/SuSE SLE folks who directly resource and in some cases employ the btrfs developers.

And so rather than take that route, which seems unwise, I’ve prepared the following How-to:

https://rockstor.com/docs/howtos/stable_kernel_backport.html

Where we explain and step folks through installing the latest stable backported kernel maintained by openSUSE/SuSE in their OBS (Open Build Service https://openbuildservice.org/), where they build all their distros and associated repositories. This way we give folks a fighting chance with the known-to-be-teething btrfs parity raid levels. It’s not ideal, but it serves those who absolutely require a parity raid, say for the 2 disk failure capability of btrfs raid6, or those who, as may be so in your case, unwittingly chose a parity raid. In the latter case folks can then either resume their use of the parity raid in read-write mode, as they have done previously with older versions of Rockstor, or have the option, given it’s read-write, to modify the raid level in-place to one more generally considered production ready, assuming drive space allows for this.

Given you are going to be moving ahead a few years in btrfs development I would advise against returning to the now years old kernel & btrfs-progs in 3.9 once you have read-write mounted in the newer kernels in Leap 15.3. Especially once you take the kernel backport route. Basically once you have imported into the new backported kernel you should avoid attempting to mount that pool again in such an old kernel as is in our legacy CentOS based offering.

Hope that helps and let us know when you are ready to try the import on the new v4. And be sure to upgrade to 4.1.0 and follow that how-to to enable the backported stable kernel, then reboot, before you attempt the import.

If all else fails you may well be OK re-mounting in 3.9 but try and set aside enough time to avoid that scenario. The on-disk format of btrfs has not changed but each rw mount does modify the pool. And in your case the pool may well be poorly as there is evidence of the corruption in the pool member records. This corruption may well have damaged the pool. And the best place to repair that damage and subsequently mount the result is the newest stable kernel. Which tallies with your requirement to use such a kernel given your parity raid predicament :).

Hope that helps.

GeoffA · January 15, 2022, 6:21pm

I’m going to add something here which might need me to wear tough headwear.

As soon as you are up and running again with a viable and functioning v4 Rockstor, get away from raid5/6.

They are not properly supported by BTRFS, and to be honest I don’t see the point of them any more. These raid levels were created when drive costs were much higher than now, and were a way to get some level of data redundancy without high expense. Times have moved on, and I don’t think they’re relevant any more.

Move to a raid 10 setup, assuming your current data usage will allow that with your current drive size/count.

There, I’ve said it. I can relax now.

Oh yeah, make sure you have decent backups… just in case

GeoffA · January 15, 2022, 6:25pm

Forgot to type that moving to a different raid level with Rockstor is insanely easy, and doesn’t involve any downtime.

That’s another benefit of BTRFS.

Can you tell I like BTRFS and Rockstor? lol

deigo · January 15, 2022, 6:43pm

Yes, will definitely move on. I am using 3.9 as a platform until I am able to import my pools into 4.0, which will happen soon. I have never noticed corruption in 3.9 with any of my pools and everything works; there were a few corrupted files, which I deleted, and then I balanced and scrubbed.

deigo · January 15, 2022, 6:48pm

Thanks for the input. So after everything is up and running I can change my raid level? Does it keep all my files and shared folders intact?

GeoffA · January 15, 2022, 6:52pm

Yes, changing raid level has no impact on shared folders at all. You can run the re-raid in the Rockstor GUI, and just carry on using those same shared folders, SMB shares etc.

If you have an even number of drives, and your current data usage will fit in half of that raw storage, I’d go for a raid 10. Raid 1 is an alternative of course.
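
To make the “fits in half of that raw storage” check concrete, here is a back-of-the-envelope sketch. It assumes equal-sized drives and ignores metadata overhead and the free space a balance needs, so treat the numbers as rough estimates only. The function names and the 10 TB usage figure are hypothetical; the 4 x 8 TB layout is the one mentioned earlier in this thread.

```python
def usable_tb(drive_count, drive_tb, level):
    """Approximate usable space in TB for equal-sized drives (simplified model)."""
    raw = float(drive_count * drive_tb)
    if level in ("raid1", "raid10"):
        return raw / 2                # every block is stored twice
    if level == "raid5":
        return raw - drive_tb         # one drive's worth of parity
    if level == "raid6":
        return raw - 2 * drive_tb     # two drives' worth of parity
    raise ValueError(f"unsupported level: {level}")


def fits_after_reraid(drive_count, drive_tb, used_tb, level):
    """True if the current data usage fits within the target level's estimate."""
    return used_tb <= usable_tb(drive_count, drive_tb, level)


# Drive layout from this thread: 4 x 8 TB, currently raid5.
print(usable_tb(4, 8, "raid5"))     # 24.0
print(usable_tb(4, 8, "raid10"))    # 16.0
# used_tb=10 is a made-up figure; the thread does not state the actual usage.
print(fits_after_reraid(4, 8, used_tb=10, level="raid10"))  # True
```

In other words, on these assumptions a move from raid5 to raid10 on four 8 TB drives drops the estimate from about 24 TB to about 16 TB, so the existing data needs to fit comfortably under the lower figure.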

GeoffA · January 15, 2022, 6:54pm

I’d recommend having current backups of course. That’s very important, not because I doubt the ability of Rockstor, but you never know when a drive might die, the power might die etc

deigo · January 16, 2022, 12:04am

Hey, thanks. When I set up my new 4.0 Rockstor I messed up my host name: what my OS sees was, as an example, TOMSERVER and is now admin. Do you know how to change that?

deigo · January 16, 2022, 12:07am

Hey Phil, as an FYI I did the memtest, and after 12 passes there were no errors, so I went ahead and did the backport update. Now everything works, yeah! I did, though, mess up on something else: when starting the install it asked for a hostname and I thought it meant something else, so I named it admin. I would like to change that back; could you let me know how? Thanks.

Hooverdan · January 16, 2022, 12:33am

You should be able to change under the “Appliances” Menu (under System):

When you click on the Host Name it allows you to change it.

deigo · January 16, 2022, 1:02am

Hey Dan, thanks for replying, but I just figured it out from the root directory. The cmd is nano /etc/HOSTNAME: back out the old name, replace it with the new name, and reboot. Easy peasy, thanks.

deigo · January 16, 2022, 1:20am

So Geoff, if I was thinking about changing raid to, say, 10, I would go to Pools and just click on re-raid; is that correct?