Trouble using newer kernel/btrfs-progs

Just a heads up: if you are compiling your own kernel/btrfs-progs and using them with Rockstor, it looks like some of the parsing logic around btrfs scrub is breaking. This causes the scrub pages to show a status of "unknown".

I am currently running btrfs-progs v5.2.1 and kernel 5.2.3.

At some point in the 5 series (maybe 5.1 or 5.2, not sure), the output of

btrfs scrub status -R /mnt2/<PoolName>

has changed. Before it looked like:

scrub status for b81bb956-547c-4bb1-948a-78329fbcab18
	scrub started at Sat Aug 31 10:00:01 2019 and finished after 00:02:53
	data_extents_scrubbed: 246073
	tree_extents_scrubbed: 14808
	data_bytes_scrubbed: 3190272000
	tree_bytes_scrubbed: 242614272
	...

The new format is:

UUID:             b81bb956-547c-4bb1-948a-78329fbcab18
Scrub started:    Sat Aug 31 10:00:01 2019
Status:           finished
Duration:         0:02:53
	data_extents_scrubbed: 246073
	tree_extents_scrubbed: 14808
	data_bytes_scrubbed: 3190272000
	tree_bytes_scrubbed: 242614272
	...
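For reference, the status and duration in the new header can be pulled out with a few lines of Python. This is just a throwaway illustration of the new layout, not the proposed patch:

```python
def parse_new_header(lines):
    """Return (status, duration_in_seconds) from the new-style header."""
    status, duration = None, None
    for line in lines:
        if line.startswith("Status:"):
            # split only on the first colon; the value follows the label
            status = line.split(":", 1)[1].strip()
        elif line.startswith("Duration:"):
            # value looks like "0:02:53" (h:mm:ss)
            h, m, s = line.split(":", 1)[1].strip().split(":")
            duration = int(h) * 3600 + int(m) * 60 + int(s)
    return status, duration

sample = [
    "UUID:             b81bb956-547c-4bb1-948a-78329fbcab18",
    "Scrub started:    Sat Aug 31 10:00:01 2019",
    "Status:           finished",
    "Duration:         0:02:53",
]
print(parse_new_header(sample))  # ('finished', 173)
```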

I’m not a Python expert, but I think what needs to change is in src/rockstor/fs/btrfs.py in the scrub_status method.

For the status checks, out[1] should change to out[3]
For the duration checks, out[1] should change to out[4]
For the data_bytes scrubbed, out[2:-1] should change to out[5:-1]

I’m not sure how exactly to make and test the change on my local install. Can I just update the btrfs.py file? Do I need to somehow build/compile it? Or will the changes to the python file just be picked up automatically?


@kupan787 Hello again and nice find.

Yes, we have been expecting this for our pending openSUSE based offerings: their LEAP15.x releases get periodic btrfs backports, and Tumbleweed pretty much uses current kernels / btrfs-progs every few days.

As to the upstream origin of this change, I think the following btrfs-progs change is related:

with another very minor formatting change re “Scrub resumed: …” in the following:

Yes I’d say that’s spot on:

Yes, but re:

No, but you will have to restart the rockstor service for your code changes to take effect.

systemctl restart rockstor

Yes, once the above rockstor service restart has been completed.

If you fancy making a GitHub issue for this that would be great, and of course if you end up sorting out the parsing then even better. Though I'd ask that you indicate its openSUSE relevance to Rockstor within the title (ie start with "[openSUSE]"), as we are not likely to be servicing any more kernels for our CentOS variants given our move to openSUSE, where we, more properly, depend on our upstream for such things. Though it would be great if you, within the issue text, mention it's relevant for those using newer kernels on the CentOS Rockstor offerings.

If you do end up getting a fully working parse fix for this it would be much appreciated if you could submit a pull request with your findings as we are going to need this fix in our pending openSUSE offerings. The following doc/wiki links are relevant here:

Contributing to Rockstor - Overview and specifically its Developers subsection.

And with regard to developing and testing on the proposed openSUSE distros we have the following wiki within the forum:

The main difficulty here is in maintaining backward compatibility, ie if the code finds itself on an older CentOS install, such as when an ongoing CentOS install is updated with this code, we need to ensure that system doesn't experience any regressions: breaking our current CentOS based offerings just to fix our openSUSE offerings is not what we want to be doing. Especially given it's going to take a while for everybody to migrate over once we have the new rpms rolled and released and our planned new installer out. In some places I've used distro() from the python-distro package to inform code paths. But of course this assumes that the relevant backport in LEAP15.x is in place, so it can get a little tricky.
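As a sketch of that distro-gating idea (my illustration, not Rockstor's actual code; the id strings python-distro returns for the openSUSE variants are an assumption here), the decision can be factored to take the distro id as a plain string, which keeps it testable:

```python
def use_new_scrub_parser(distro_id):
    """Decide which scrub-output parser to use, given distro.id() output.

    distro_id is the string returned by distro.id() from the python-distro
    package. The openSUSE id strings below are assumed values; CentOS (and
    anything unrecognised) falls through to the legacy parser. Note a
    btrfs-progs version check would be more robust, since LEAP backports
    make distro id alone an imperfect proxy.
    """
    return distro_id in ("opensuse", "opensuse-leap", "opensuse-tumbleweed")
```

In practice this would be called as `use_new_scrub_parser(distro.id())`.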

The python-distro dependency and use was added in the following pr:

See how you get on, and if nothing else this thread can help to inform the changes we need to make to accommodate this change going forward. Also note that the master branch is currently ahead of the latest CentOS rpm, as we are shortly to release another stable channel rpm.

Hope that helps and thanks again for bringing this up and sharing your findings.


Not a problem! I’ll give it a whirl and see if I can’t spin up a VM and try and get a fix working.


So I was close. I forgot arrays start at 0 and not 1 (I should have known that!), so my indices were off by one.

The change would be as follows to the file:

/opt/rockstor/src/rockstor/fs/btrfs.py

In the method, scrub_status,

if len(out) > 1:
    if re.search('interrupted', out[2]) is not None:
        stats['status'] = 'halted'
        # extract the duration from the "Duration:" line, eg:
        # "Duration:         0:00:09"
        dfields = out[3].split()[-1].split(':')
        stats['duration'] = ((int(dfields[0]) * 60 * 60) +
                             (int(dfields[1]) * 60) + int(dfields[2]))
    elif re.search('running', out[2]) is not None:
        stats['status'] = 'running'
    elif re.search('finished', out[2]) is not None:
        stats['status'] = 'finished'
        # extract the duration from the "Duration:" line, eg:
        # "Duration:         0:02:53"
        dfields = out[3].split()[-1].split(':')
        stats['duration'] = ((int(dfields[0]) * 60 * 60) +
                             (int(dfields[1]) * 60) + int(dfields[2]))
    elif re.search('aborted', out[2]) is not None:
        stats['status'] = 'cancelled'
        # duration comes from the same "Duration:" line as above.
        # TODO: we have code duplication here re finished clause above.
        dfields = out[3].split()[-1].split(':')
        stats['duration'] = ((int(dfields[0]) * 60 * 60) +
                             (int(dfields[1]) * 60) + int(dfields[2]))
    else:
        return stats
else:  # we have an unknown status as out is 0 or 1 lines long.
    return stats
for l in out[4:-1]:
    fields = l.strip().split(': ')
    if fields[0] == 'data_bytes_scrubbed':
        stats['kb_scrubbed'] = int(fields[1]) / 1024
    else:
        stats[fields[0]] = int(fields[1])
return stats
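As a quick sanity check of the trailing stats loop, it can be exercised standalone against a couple of the sample lines from earlier in the thread (throwaway snippet, Python 3 division assumed):

```python
# Two stats lines as they appear in the scrub output (tab-indented).
sample = [
    "\tdata_extents_scrubbed: 246073",
    "\tdata_bytes_scrubbed: 3190272000",
]
stats = {}
for l in sample:
    # strip the leading tab, then split "key: value"
    fields = l.strip().split(': ')
    if fields[0] == 'data_bytes_scrubbed':
        # stored as KB for the Web-UI
        stats['kb_scrubbed'] = int(fields[1]) / 1024
    else:
        stats[fields[0]] = int(fields[1])
print(stats)  # {'data_extents_scrubbed': 246073, 'kb_scrubbed': 3115500.0}
```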

I’m looking to see if I can somehow detect the version of btrfs installed, and if it is less than 5.1.2, run the old code and if newer than 5.1.2 run the new code. I’ll need to read up on the links you sent over.
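One hedged sketch of that version detection (again my own illustration, not existing Rockstor code; the exact `btrfs version` output wording and the 5.1.2 cut-off are assumptions to be verified against the btrfs-progs changelog):

```python
import re
from subprocess import check_output

def parse_progs_version(version_line):
    """Parse eg 'btrfs-progs v5.2.1' into a comparable tuple like (5, 2, 1)."""
    match = re.search(r"v(\d+)\.(\d+)(?:\.(\d+))?", version_line)
    # a missing patch component (eg "v4.19") is treated as 0
    return tuple(int(g or 0) for g in match.groups())

def btrfs_progs_version():
    # `btrfs version` is assumed to print a line such as "btrfs-progs v5.2.1"
    return parse_progs_version(check_output(["btrfs", "version"]).decode())

def legacy_scrub_output(version):
    # Assumed cut-off: the old "scrub status for <uuid>" header style
    # was dropped somewhere around btrfs-progs 5.1.2.
    return version < (5, 1, 2)
```

Tuple comparison makes the version test straightforward, eg `legacy_scrub_output((4, 19, 0))` is true while `legacy_scrub_output((5, 2, 1))` is false.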


@phillxnet, so I was just trying to setup a clean dev environment, but on the build step (bin/buildout), I ran into the following error:

Installing js-libraries.
Downloading http://rockstor.com/downloads/jslibs-dev/lib.tgz
While:
Installing js-libraries.
Error: Error downloading extends for URL http://rockstor.com/downloads/jslibs-dev/lib.tgz: HTTP Error 404: Not Found

Looks like maybe there is a missing file that used to be hosted on the Rockstor website?

@kupan787 Ah yes. Sorry about that. Sorted now and confirmed via:

wget http://rockstor.com/downloads/jslibs-dev/lib.tgz
md5sum lib.tgz
78f723aafcd05684f41193778fb0e26a  lib.tgz

re:

and:

This was an omission on my part related to our server moves, which have kept me otherwise engaged for quite some time now.

This is a single point of failure we ideally need to remove. Plus it's a large file that we have to keep downloading. Better if we could source the upstream distribution points of all these libraries and then pin them somehow according to their distribution mechanisms. But then we have many points of failure for this single js libraries requirement. We will have to have a think about this, as I'd like to remove it as a build dependency; it may become unsustainable as more folks build Rockstor going forward. Suggestions welcome.

This file is a tgz of our curated https://github.com/rockstor/rockstor-jslibs repository. But pulling all those libs from that repo each time may also be unsustainable as we may end up upsetting GitHub.

This needs more thought, and again any suggestions would be welcome, as I'd like to improve this mechanism going forward. But pinning all of those libs from their upstream distribution points is non-trivial and prone to more breakage, especially given the age of many of them. We really need to rationalise our use of these and cut out all that we can. Anyone wishing to chip in with a better solution would be most welcome. Plus if anyone familiar with js libs could help with the rationalisation, that would also be great. They are now > 600K even when zipped!

Sorry about that and thanks for the heads up.

Sorry, one more question :slight_smile:

If I make a change to a JS file (for example scrub_details_view.js), how do I get the changes to show up on my test environment?

I made the change to both of the instances of the file I found:

static/storageadmin/js/views/scrub_details_view.js
src/rockstor/storageadmin/static/storageadmin/js/views/scrub_details_view.js 

But even after restarting the rockstor service and clearing my browser cache, I don't see the changes. Is there anything else that needs to be done to see the JS changes? Sorry, I am new to python web development.

I’ll answer myself :slight_smile:

bin/django collectstatic --noinput -i admin -v 0

Doing this built all my static files, and I see my changes.

@kupan787 Hello again and sorry for slow response.

Yes as you’ve already worked our it is required, when changing mainly the non python stuff, that you have to do a collectstatic thing. We do have this in the previously referenced:

But that doc does cover quite a lot really. More specifically in the Change -> Test cycle subsection we have:

"
If you made any javascript, html or css changes, you need to collect static files with this command:

[root@build_vm ]# /path/to/build_dir/bin/buildout -c /path/to/build_dir/buildout.cfg install collectstatic

Then, refresh the browser to test new changes in the WebUI.
"

Which is essentially what you arrived at anyway.

Take care with developing on any live system, as it is commonplace for our development scripts to wipe the database, so that we are then at a known state re the db.

Well done for persevering, and again apologies for not picking up on your question sooner.

Hope that helps and thanks for sharing your findings re the build process.