Can't add/clone new share due to qgroup inconsistencies (with quotas off)

This is still on my CentOS-based 3.9.2-57 version with a recent kernel update.

I was trying to perform some cleanup of Rock-on shares.
I cloned an existing share using the WebUI. Once the cloning process was done, I removed the original share that I had cloned from. So far so good. 24 hours later I tried to continue by cloning another share into a new one, and then ran into the issue below. I also tried creating a new share directly on the same pool (not the OS drive, by the way), but that failed as well.


Traceback (most recent call last):
  File "/opt/rockstor/src/rockstor/rest_framework_custom/generic_view.py", line 41, in _handle_exception
    yield
  File "/opt/rockstor/src/rockstor/storageadmin/views/share.py", line 180, in post
    pqid = qgroup_create(pool)
  File "/opt/rockstor/src/rockstor/fs/btrfs.py", line 1127, in qgroup_create
    max_native_qgroup = qgroup_max(mnt_pt)
  File "/opt/rockstor/src/rockstor/fs/btrfs.py", line 1082, in qgroup_max
    o, e, rc = run_command([BTRFS, 'qgroup', 'show', mnt_pt], log=True)
  File "/opt/rockstor/src/rockstor/system/osi.py", line 176, in run_command
    raise CommandException(cmd, out, err, rc)
CommandException: Error running a command. cmd = /usr/sbin/btrfs qgroup show /mnt2/<pool name removed>. rc = 1. stdout = ['']. stderr = ['WARNING: qgroup data inconsistent, rescan recommended', 'ERROR: cannot find the qgroup 0/7301', "ERROR: can't list qgroups: No such file or directory", '']

Note: I scrubbed my pool name from the error message.

No quotas are enabled (and haven't been for years).
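For anyone wanting to check the same thing outside the Web-UI, here is a rough Python sketch of what the failing call amounts to (this is not Rockstor's actual code; `POOL_MNT` is a placeholder for your own mount point). It just runs the same `btrfs qgroup show` command the traceback shows and reports the qgroup warnings/errors instead of raising:

```python
#!/usr/bin/env python3
# Rough sketch only: run the same command the traceback shows failing and
# report the qgroup warnings/errors rather than raising an exception.
# POOL_MNT is a placeholder (assumption), not taken from the Rockstor code.
import subprocess

BTRFS = "/usr/sbin/btrfs"
POOL_MNT = "/mnt2/<pool name>"  # replace with the real mount point


def qgroup_show(mnt_pt):
    """Return (stdout, stderr, returncode) of 'btrfs qgroup show <mnt_pt>'."""
    result = subprocess.run(
        [BTRFS, "qgroup", "show", mnt_pt],
        capture_output=True, text=True,
    )
    return result.stdout, result.stderr, result.returncode


if __name__ == "__main__":
    out, err, rc = qgroup_show(POOL_MNT)
    if rc != 0:
        # With a stale/orphaned qgroup entry this is where the
        # "cannot find the qgroup 0/NNNN" error shows up, which is what
        # trips up qgroup_create() during share creation in the Web-UI.
        print(f"qgroup show failed (rc={rc}):\n{err}")
    else:
        print(out)
```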

I suspect that a clone/delete share operation the day before might have done something to the consistency of the quota definitions (even if they are not active). I wasn't entirely sure how to perform a rescan from the UI (since the error message suggested that), so I used the command line:

/usr/sbin/btrfs quota rescan -s /mnt2/<pool name>
to see whether any rescans were running, and when there weren't any:

/usr/sbin/btrfs quota rescan /mnt2/<pool name>

After completion (a couple of minutes of checking with the above -s option), this didn't seem to resolve the issue.
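For reference, the same check/rescan sequence can be scripted. A minimal sketch follows, using the standard btrfs-progs -s (status) and -w (start and wait) options and the same placeholder mount point as above:

```python
#!/usr/bin/env python3
# Minimal sketch of the rescan sequence used above: check whether a rescan is
# already running (-s), then start one and block until it finishes (-w).
# POOL_MNT is a placeholder; adjust for your own pool.
import subprocess

BTRFS = "/usr/sbin/btrfs"
POOL_MNT = "/mnt2/<pool name>"  # replace with the real mount point


def rescan_status(mnt_pt):
    """Return the output of 'btrfs quota rescan -s <mnt_pt>'."""
    return subprocess.run([BTRFS, "quota", "rescan", "-s", mnt_pt],
                          capture_output=True, text=True).stdout.strip()


def rescan_and_wait(mnt_pt):
    """Start a quota rescan and wait for completion; return the exit code."""
    return subprocess.run([BTRFS, "quota", "rescan", "-w", mnt_pt],
                          capture_output=True, text=True).returncode


if __name__ == "__main__":
    print(rescan_status(POOL_MNT))
    if rescan_and_wait(POOL_MNT) == 0:
        print("rescan finished:", rescan_status(POOL_MNT))
    else:
        # A non-zero exit here can also mean quotas are disabled on the pool.
        print("rescan could not be started or failed")
```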

In the meantime, I started a scrubbing operation on the pool, as I realized that my scheduled scrub hadn't been running for quite some time for some reason (separate topic, I think).

Any suggestions?

@Hooverdan Hello again.

I've addressed your other issue re:

and it seems that from:

All bets are off, I'm afraid. The output from kernels, and their often but not always associated btrfs-progs, changes all the time, and our Web-UI has to move with that. Hence my question about kernels in that other thread. Newer kernels have also changed how they deal with quotas, and we have made some progress on that front; but again only in the v4 variant. The btrfs parsing in our CentOS variant is now years behind what it is in our v4 'Built on openSUSE' variant, and in the latter we have a known kernel version to work against. We had that in the CentOS variant too, but we had to do that kernel maintenance ourselves and frankly we failed at it, so it is now, in v4, back in the hands of those most expert at it: upstream. We still have a moving target of sorts though, e.g.:

So what we need are reports such as yours (highly detailed), but on our current efforts and using what we 'dish out'. Changes in the output of kernel/btrfs-progs can basically break our Web-UI. That is bad, and we try to fail elegantly wherever possible, such as we did in your last post re scrub, presumably using the newer kernel you mention.
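To give a rough idea of what I mean by failing elegantly, something along these lines (illustration only, not our actual code; the message strings are taken from your traceback above) recognises the known qgroup-inconsistency errors and degrades to a safe default instead of surfacing an unhandled exception in the Web-UI:

```python
# Illustration only (not the actual Rockstor code): tolerate known qgroup
# inconsistency messages from 'btrfs qgroup show' so a share create can
# degrade gracefully rather than crash when kernel/btrfs-progs output changes.
import subprocess

BTRFS = "/usr/sbin/btrfs"
KNOWN_QGROUP_ERRORS = (
    "qgroup data inconsistent",
    "cannot find the qgroup",
    "can't list qgroups",
)


def qgroup_show_tolerant(mnt_pt):
    """Return qgroup show stdout lines, or None if the qgroups look broken."""
    result = subprocess.run([BTRFS, "qgroup", "show", mnt_pt],
                            capture_output=True, text=True)
    if result.returncode != 0:
        if any(msg in result.stderr for msg in KNOWN_QGROUP_ERRORS):
            # Degrade gracefully: the caller can skip qgroup handling and
            # suggest a quota rescan instead of failing the share create.
            return None
        raise RuntimeError(f"unexpected btrfs failure: {result.stderr}")
    return result.stdout.splitlines()
```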

Ensure you get your pool into as healthy a state as possible (scrub) and make sure it will mount rw on Rockstor reboot. Then migrate to the v4 "Built on openSUSE" variant, where you can hopefully take advantage of improvements we have made there. And if the kernels there are still not new enough, we are at least much closer to them and interested in, and capable of, releasing updates to address up-and-coming changes, such as the above GitHub issue.

Hope that helps, and I know it's a pain, but we had to move OS for a number of reasons. And in doing so we have dodged a few disasters along the way, one of which was our poor record on maintaining our own kernels, albeit unmodified ELRepo kernel-ml releases.


@Hooverdan Re:

And given your penchant for such things :slight_smile: , I looked up @kageurufu's recent post on hoicking up the kernels within an openSUSE Leap install, such as we only slightly deviate from, re their JeOS, in our own v4 installers.

Wasn't sure if you had caught this, and it looks to be relevant to your own potential migration if you are uncomfortable with downgrading your kernel. However, I would point out that the openSUSE folks do fairly aggressively backport a load of btrfs stuff, so the actual kernel version no longer really reflects the contained btrfs version. But that is another story, and you may already be ahead, btrfs-wise, in your CentOS instance with its custom kernel version compared to what is found in these backports. Hence the above reference.

Hope that helps. Also, you might as well go with our Leap 15.3 installer profiles now, as 15.2 approaches EOL.


Yes, like in the other thread on scrubbing, I promise I will move shortly :). Like @GeoffA, I just need to get my not unsubstantial backup in order first. I thought it interesting that it would pop up on a sub-version change of the kernel. btrfs-progs actually was not updated along with the latest kernel, as there wasn't a newer version, so that's why I thought it was strange that I would now suddenly run into a quota issue. But then again, I hadn't deleted a share in a while, which might have done something ā€¦
The scrub completed with no major errors, and a couple of reboots addressed some sudden docker permission issues that I had encountered. So, the pool looks healthy.
Before I play around more with share cloning and creating new ones, I will probably prioritize the openSUSE upgrade, and hope I won't have to go down @kageurufu's rabbit hole.

Thanks.


Well, like I mentioned in another thread, I took the step! Running my main NAS now on openSUSE-based Rockstor 4.0.9.
I didn't have to downgrade kernels or change btrfs-progs versions, fortunately (for once I did not run into Murphy's law everywhere I went :slight_smile: ).

In the new installation, after adding the RAID pool back in using the WebUI, I noticed this output on the terminal about orphan qgroup relation entries.

And, interestingly, the share I had tried to clone under the 3.9.2-57 version, the one that gave me the error messages, was automagically there in the WebUI after the import ā€¦
So, I guess for now, I can consider this resolved. Letā€™s see what happens.

As usual, thanks for all the support with my self-inflicted problems, because I couldn't leave well enough alone :slight_smile:


Excellent move @Hooverdan - welcome to the World Of 4 :slight_smile:
So, backups all ok and in order I trust? :slight_smile:

Thank you.
So far, so good! Did some comparisons (fortunately not all the data on the NAS is backup worthy) and all seems to be in order. Will have to try the share cloning/creating a new one soon.
