Btrfs Balance error

Brief description of the problem

3-disk RAID5, 840GB used out of 6.46TB. Ran a balance, which went on for several hours. At completion, it gave this error message:
Error running a command. cmd = btrfs balance start --full-balance /mnt2/CEHPool. rc = 1. stdout = ['']. stderr = ["ERROR: error during balancing '/mnt2/CEHPool': No space left on device", 'There may be more info in syslog - try dmesg | tail', '']
The dmesg log is empty.

Detailed step by step instructions to reproduce the problem

I haven’t run balance again; I want to investigate the issue first. I can’t understand the “No space left on device” part of the message given 12% space usage.

@ceh-u Yes, the parity raid levels can be a little flaky, especially on the maintenance / repair side. But as you say, with loads of room you shouldn’t get that error. There have, however, been some bugs / omissions in free space calculations in the parity raid levels for some time. I believe one was addressed more recently, which is good as that one caused issues for Rockstor on space availability reporting. These omissions show up as “not supported” type messages in the output of, for example:

btrfs fi usage /mnt2/CEHPool

Where “fi” is shorthand for “filesystem”.

Another thought I had was that you may actually be running a Leap 15.1 based install if you copied and pasted my ‘first draft’ example command to build the installer, given you were one of our earliest testers. @Flox pointed this out in the following issue:

and I fixed the example command in the following pull request around 10 days ago now:

If so, sorry about that. I’d got the title and explanatory text correct but, by mistake, cut and pasted the example command from a Leap 15.1 run.

Definitely worth checking: if your kernel version (top right of the Rockstor Web-UI) is around 4.12 then you are on a Leap 15.1 based build. If it’s around 5.3 then you are on a Leap 15.2 based build. The mouse-over popup on that text will also give the Leap version your install is based on:

[Screenshot: Leap version mouse-over tool-tip]
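
The same check can be made from a terminal. The following is a minimal sketch, assuming the rough kernel-to-Leap mapping given above (4.12 for Leap 15.1, 5.3 for Leap 15.2). The helper name leap_base_from_kernel is made up for illustration, and /etc/os-release is the standard file in which a distribution states its own version.

```shell
# Hypothetical helper: map a kernel release string to the Leap base,
# using only the approximate versions mentioned in this thread.
leap_base_from_kernel() {
    case "$1" in
        4.12.*) echo "15.1" ;;
        5.3.*)  echo "15.2" ;;
        *)      echo "unknown" ;;
    esac
}

# Apply it to the running kernel.
leap_base_from_kernel "$(uname -r)"

# Cross-check against what the distribution itself reports, if available.
if [ -r /etc/os-release ]; then
    grep -E '^(NAME|VERSION_ID)=' /etc/os-release || true
fi
```

If the two answers disagree, the /etc/os-release one is the more trustworthy, as the kernel mapping here is only approximate.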

In both cases openSUSE aggressively back-ports btrfs fixes / improvements, but more back-ports are likely with the Leap 15.2 kernel given its newer base.

It’s also a known issue with some maintenance / repair tasks on the parity raid levels of 5 & 6 that it can take a few attempts for them to complete. So given you are giving it another go you may just find that earlier attempts have cleared the way and it completes as expected. This is one of the reasons we currently dissuade folks from the parity raid levels. But as your own research has found, once you know its foibles and performance issues (usually during repair) it can be a workable option within these limitations.

Post the output of the usage command above and give the balance another go, I say. It should pick up from where it left off last time, as it makes changes as it goes along. Many chunks may already have been balanced prior to the out of space message, so a fresh run may complete just fine. And do check you weren’t affected by my accidental use of a Leap 15.1 kiwi profile in that first draft of the rockstor-installer instructions example command.
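
Since these tasks can take a few attempts, one option is to wrap the balance in a small retry loop. This is only a sketch: retry is a hypothetical helper, not a btrfs or Rockstor facility, and the attempt count of 3 is an arbitrary assumption. Checking the usage output between attempts is still worthwhile.

```shell
# Hypothetical helper (not part of btrfs or Rockstor): run a command up to
# MAX times and stop at the first success.
retry() {
    max=$1
    shift
    attempt=1
    while [ "$attempt" -le "$max" ]; do
        if "$@"; then
            echo "succeeded on attempt $attempt"
            return 0
        fi
        echo "attempt $attempt of $max failed" >&2
        attempt=$((attempt + 1))
    done
    return 1
}

# Harmless demonstration with a command that always succeeds.
retry 3 true

# Intended use, as root (left commented out as a balance can run for hours):
# retry 3 btrfs balance start --full-balance /mnt2/CEHPool
```

The helper returns non-zero if every attempt fails, so it can be followed by a look at dmesg before trying anything further.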

Hope that helps.

OK, my install is 15.1-based. btrfs version shows 4.19.1. I can’t get at console output on my mac, but here’s a screenshot.

How would I get to the 15.2 base? Can I do that without redoing my current setup?

BTW, I am registered in Appman but I lost my access - how do I get that?

OK, that’s consistent with a cut and paste of the now corrected example kiwi-ng installer build command:

kiwi-ng --profile=Leap15.1.x86_64 --type oem system build --description ./ --target-dir /home/kiwi-images/

which should have read 15.2 of course. As per the prior referenced pull request change:

(Click on “Files Changed” tab)

And from your picture of the ‘btrfs fi usage’ command you can see the multiple:

WARNING: RAID56 detected, not implemented

referenced earlier, indicating the incomplete support for some usage functions. This is one of the indications of the parity raids being less well developed; they are basically just younger. But note that this shortcoming has now been fixed in upstream btrfs. The same command on your “/” pool would not have those warnings.
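
To check for these warnings without reading the whole output, the usage command can be filtered. A small sketch, assuming the warning text is exactly as quoted above and that it may arrive on stderr (hence the 2>&1); count_raid56_warnings is a made-up helper name.

```shell
# Hypothetical helper: count the lines on stdin that carry the RAID56 warning.
# grep -c exits non-zero when the count is 0, so '|| true' keeps that quiet.
count_raid56_warnings() {
    grep -c 'RAID56 detected, not implemented' || true
}

# Only attempted where btrfs and the pool mount point actually exist.
if command -v btrfs >/dev/null 2>&1 && [ -d /mnt2/CEHPool ]; then
    btrfs fi usage /mnt2/CEHPool 2>&1 | count_raid56_warnings
fi
```

A count of 0 after an update would suggest the installed btrfs-progs includes the fix for this reporting gap.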

You should be able to bring up the terminal program and do a:

ssh root@ip-of-rockstor-machine-here

and enter the root password you set up during the install. And “exit” once you are done with that terminal access.

Two ways:
1: Rockstor Config backup and download to a client machine, re-install (after building a 15.2 profile installer), import data pool, re-apply config backup. See: Configuration Backup and Restore
You will need to set each added user’s password after this as they will be random upon config restore. The Web-UI Admin user, set up on the first Web-UI screen, can do this within the Web-UI. Or use the root user, of course, via the command line.

2: In place “Distribution Update”, which should be pretty much along the normal lines of an in-place update for a regular Leap 15.1 to Leap 15.2.

This latter method hasn’t yet been approached in our docs, given Leap 15.2 was our intended first ‘general consumption’ target, so I’ll make a first draft attempt below. Your 15.1 was the result of us having used 15.0, 15.1, and finally 15.2 during our 2 year transition phase from CentOS to Leap 15.2. My typo and your cut and paste have landed you with a Leap 15.1. That is perfectly viable given it’s far more tested as a distro than 15.2, but we are aiming at Leap 15.2 for re-launch to get the newer kernel and btrfs stack. Given some folks will inevitably have difficulties via kernel regressions, the Leap 15.1 option is a fall-back position for us: it has the same base kernel version as our CentOS variant ran (4.12), so it is very likely to work equally well on their existing hardware. But the Leap 15.2 profile and its 5.3 kernel base is obviously preferred. We were just trying to cater for our existing clients on the kernel side while still offering newer btrfs backports.

Distribution update (draft)

There are numerous openSUSE docs on how this might be done. One example can be found here:
SDB:System upgrade
Essentially you:

1: Make sure all updates are in place and you have rebooted thereafter. This ensures all updates are in play.
2: Backup your existing repos just in case via:

cp -Rv /etc/zypp/repos.d /etc/zypp/repos.d.Old

and to restore these you would, if need be, execute:

cp -f /etc/zypp/repos.d.Old/* /etc/zypp/repos.d/

3: Check visually your existing repos via:

zypper lr -u

4: Change them from 15.1 to 15.2 if hardcoded (the default ones are) via:

sed -i 's/15\.1/15\.2/g;s/15_1/15_2/g' /etc/zypp/repos.d/*.repo

Which searches for “15.1/15_1” and replaces them with “15.2/15_2” respectively for all repos in all repo files.

5: Ensure visually all repos are changed over via:

zypper lr -u

6: Import all new keys and refresh the new repos themselves via:

zypper --gpg-auto-import-keys ref

7: Do a Distribution Update downloading all packages in advance via:

zypper dup --no-recommends --download-in-advance

Note: be sure to read and answer carefully all multi-choice questions on the way.
The likely safest answer in most cases will be to accept a vendor change i.e.:

Problem: problem with installed package dracut-kiwi-lib-9.21.3-lp151.1.1.x86_64
 Solution 1: install dracut-kiwi-lib-9.20.5-lp152.5.4.1.x86_64 (with vendor change)
  obs://build.opensuse.org/Virtualization:Appliances  -->  openSUSE
 Solution 2: keep obsolete dracut-kiwi-lib-9.21.3-lp151.1.1.x86_64

Choose from above solutions by number or skip, retry or cancel [1/2/s/r/c/d/?] (c): 1

The above vendor change is simply moving from a prior OBS repo source (used in our installer build) to the ‘regular’ “openSUSE” variant/version native to the new ‘distro’ version.

Note also: if you see in the overview that “systemd-presets-branding-rockstor” package is going to be removed then cancel (c) and add our following OBS repository via:

zypper --non-interactive addrepo --refresh -p97 http://download.opensuse.org/repositories/home:/rockstor:/branches:/Base:/System/openSUSE_Leap_15.2/ home_rockstor_branches_Base_System
zypper --gpg-auto-import-keys ref

This relates to an in-progress fix regarding the current-as-of-writing omission of this repo in our installer. Post ‘fix’ installers should not be affected by this omission.

Once all multi-choice options have been answered, a large summary of changes will be shown along with indications of actions required once these changes have been enacted. Most likely:

    Note: System reboot required.

Continue via “y” and you should be on your way.

Once all packages have been downloaded and installed/updated/etc. you should see the following:

Core libraries or services have been updated.
Reboot is required to ensure that your system benefits from these updates.

whereupon you can:

sync
shutdown -r now

Please note that the first boot screen, the Grub options screen, will still show the original Rockstor installer version. This is expected and is purely cosmetic.

All the above steps will require the root user.
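
Before running the in-place sed from step 4 on the real repo files, the substitution can be previewed on a scratch copy. This is a sketch only: the file contents below are invented placeholders (example.invalid is not a real repository host), and preview_repo_switch is a hypothetical helper that runs the same sed expression without -i, so nothing is modified.

```shell
# Hypothetical helper: same expression as step 4, but without -i, so the
# result goes to stdout and the file itself is left untouched.
preview_repo_switch() {
    sed 's/15\.1/15\.2/g;s/15_1/15_2/g' "$1"
}

# Scratch file with placeholder contents standing in for a real .repo file.
scratch=$(mktemp)
cat > "$scratch" <<'EOF'
[example-repo]
name=Example repository for 15.1
baseurl=http://example.invalid/openSUSE_Leap_15.1/
path=/example/15_1/
EOF

# Show what the file would look like after the change, then clean up.
preview_repo_switch "$scratch"
rm -f "$scratch"
```

On the real system the same helper could be pointed at each file in /etc/zypp/repos.d/ and the output compared with the original before committing to the sed -i.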

(Tested on a Rockstor 4 build on openSUSE Leap 15.1 variant installed from our installer recipe)

########## end of proposed Distro doc draft entry ###############

Yes, via the Distribution update (draft) method proposed above. But as Leap 15.2 is our intended first public release, and we are still in release candidate phase on our 15.2 variant, this method has had no real community testing. And given your testing activities/reports to date you may be just the candidate :).

If by this you mean your password no longer works (or has been misplaced), then try clicking on the “Forgot Password” link in the Sign In screen. That should send you instructions on how to proceed. I’ve checked your Appman account and you look to have registered and had your email verified successfully.

Hope that helps.

Thank you very much for such comprehensive advice. I will certainly work on the Distribution update later today and report.

In my post, it might have been clearer if I had used the proper words “activation code”, required for activating update access.

OK, your Distribution update worked perfectly. My Rockstor is on Leap 15.2 and hasn’t missed a beat!
Many thanks.

@ceh-u Glad the distro update worked out OK and thanks for testing these draft instructions. I’ll get to fixing that missing repo in the rockstor-installer repo soon. We now have an issue open for it though:

And re:

Yes, Appman doesn’t yet have an Activation code reset facility, but in time we should be able to add this. For now, unfortunately, this is still done via email request to support. But most folks can grab it from when they were emailed it originally.

Apologies if I failed to answer this same question via your support email by the way.

Hope that helps.

The btrfs version in Leap 15.2 (Linux kernel 5.3.18) is still 4.19.1, which dates to October 2018. I’m running balance again to see if there’s any difference.
I’ll email support for my Update Channel activation code.

No worries on that; this is all openSUSE managed. We no longer have any part in kernel versions or btrfs-progs versions, and they make the decisions on the exact version numbers. Plus, if you look up the major kernel version it is also relatively old, but that doesn’t take into account the back-ports. What you could do is look at the changelog for each package and see what they (openSUSE) have back-ported to these major version numbers. They cherry-pick stuff, you see, as the bleeding edge is not always the best to use. And given openSUSE/SUSE employ a good number of btrfs devs, they are in a good position to identify what is less risky to back-port. It is also often the case that there are teething problems with major version updates, so they hold off on them. For an example of the discrepancy between major package version number release date and back-port dates see @Flox’s post here:

Where you see that in a Leap 15.1 4.12 kernel there was a one month old btrfs back-port, yet 4.12 (the major version) was released in 2017. So distribution packages often stick to older major versions and cherry-pick patches, often hundreds of them. Hence our decision to go with openSUSE who are, as mentioned, well placed to make these decisions.
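
One way to see those back-ports is to search the package changelogs for btrfs entries. A hedged sketch, assuming an rpm-based install where the kernel package is named kernel-default (the package name is an assumption; adjust it to whatever rpm -qa reports on your system); btrfs_changelog_lines is a hypothetical helper.

```shell
# Hypothetical helper: keep only the lines on stdin that mention btrfs,
# case-insensitively. '|| true' avoids a failure status when nothing matches.
btrfs_changelog_lines() {
    grep -i 'btrfs' || true
}

# rpm -q --changelog prints the changelog stored in an installed package.
# kernel-default is an assumed package name; the block is skipped without rpm.
if command -v rpm >/dev/null 2>&1; then
    rpm -q --changelog kernel-default 2>/dev/null | btrfs_changelog_lines | head -n 20 || true
fi
```

The dates on the matching entries, rather than the package's major version number, give the better picture of how recent the btrfs code is.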

Yes, we look to have an email issue on that one and I have started a fresh email thread to resolve this. I’m also going to PM you in the forum to help resolve what’s going on there. I have your original activation email as having been delivered, and have also copied it into a thread we have ongoing in support, so it’s all a bit strange, but we will get it sorted one way or another. Likely it was just a quoted email that was hidden and required a click to expand, or the like. I’ll copy into the PM what I’ve also just sent you re the activation code, which itself was a copy of what was sent on 8th August. But at least we also have the forum.

Hope that helps.

Very interesting about the back-porting. I hadn’t realised they did that. Thanks for the explanation.
BTW, I ran balance again today just to see if it was OK and whether there was any difference. It finished much more quickly; I didn’t time it but it was probably about 1/4 of the time. And there were no error messages.

I emailed you separately about the activation code, now in hand.

@ceh-u Thanks for the report.

And yes, the back-port thing can be quite confusing. We have had similar reports about the version of ssh that was in our CentOS variant. Again the reporter was considering only upstream versioning and not the distro patches / back-ports. All not very clear really, but the distros have their reasons to stick to ‘release’ versions, as their users (the likes of us, for example) build upon those versions. There are also concerns, more for kernel versions, of certification on some hardware platforms.

Glad to hear your balance worked OK this time. It’s not easy to know the reason for the speed-up, or if the earlier balance had done most of the work already, but the Leap 15.2 base does look to be a nice one for our re-launch. And given some kernel back-ports can depend on the base kernel version, we are likely to receive newer kernel elements of btrfs in the Leap 15.2 kernel than in the Leap 15.1 variant. But I’m no expert on this. The kernel package changelog and @Flox’s prior referenced post are the key to finding out on that front.

But more on the speed theme: there have been a ton of quota fixes of late, and quite a few have been performance related, so that and the parity raid level improvements that are coming along all the time are all most welcome. Plus, now that we are not dependent on our own currently modest means, we should receive all kernel and btrfs updates as and when the openSUSE/SUSE folks deem them appropriate. We just have to keep an eye on breaking changes, so keep up the reports, as we depend on them to spot such changes. We have plans to build a comprehensive openQA test suite, but everything takes time and, as always, contributions are welcome. In time we should get to a position where we can guarantee functionality across all our key features; but along with all our other efforts it’s mostly down to available resources.

Thanks again for all your testing and reporting.
