Trouble using newer kernel/btrfs-progs

Just a heads up: if you are compiling your own kernel/btrfs-progs and using them with Rockstor, it looks like some of the parsing logic around btrfs scrub breaks. This causes the scrub pages to show a status of "unknown".

I am currently running btrfs-progs v5.2.1 and kernel 5.2.3.

At some point in the 5 series (maybe 5.1 or 5.2, not sure), the output of

btrfs scrub status -R /mnt2/<PoolName>

has changed. Before it looked like:

scrub status for b81bb956-547c-4bb1-948a-78329fbcab18
	scrub started at Sat Aug 31 10:00:01 2019 and finished after 00:02:53
	data_extents_scrubbed: 246073
	tree_extents_scrubbed: 14808
	data_bytes_scrubbed: 3190272000
	tree_bytes_scrubbed: 242614272
	...

The new format is:

UUID:             b81bb956-547c-4bb1-948a-78329fbcab18
Scrub started:    Sat Aug 31 10:00:01 2019
Status:           finished
Duration:         0:02:53
	data_extents_scrubbed: 246073
	tree_extents_scrubbed: 14808
	data_bytes_scrubbed: 3190272000
	tree_bytes_scrubbed: 242614272
	...

I’m not a Python expert, but I think what needs to change is in src/rockstor/fs/btrfs.py in the scrub_status method.

For the status checks, out[1] should change to out[3]
For the duration checks, out[1] should change to out[4]
For the data_bytes scrubbed, out[2:-1] should change to out[5:-1]
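
Rather than counting lines, the header could even be read by key, which would sidestep getting the indices exactly right. Just a rough, untested sketch for discussion (not Rockstor's actual code), assuming out is the command output split into lines:

# Sketch only: read the new-style `btrfs scrub status -R` header by key
# instead of by fixed line index. Untested, for illustration only.
def parse_new_scrub_header(out):
    header = {}
    for line in out:
        if line.startswith('\t'):
            break  # the indented raw counters start here
        if ':' in line:
            key, value = line.split(':', 1)
            header[key.strip()] = value.strip()
    # eg header['Status'] == 'finished', header['Duration'] == '0:02:53'
    return header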

I’m not sure how exactly to make and test the change on my local install. Can I just update the btrfs.py file? Do I need to somehow build/compile it? Or will the changes to the python file just be picked up automatically?

@kupan787 Hello again and nice find.

Yes, we have been expecting this for our pending openSUSE based offerings: Leap 15.x gets periodic btrfs backports, and Tumbleweed pretty much tracks current kernels / btrfs-progs every few days.

As to the origin of the upstream change, I think the following btrfs-progs pull request is related:

https://github.com/kdave/btrfs-progs/pull/177

with another very minor formatting change re “Scrub resumed: …” in the following:

https://github.com/kdave/btrfs-progs/pull/190/files

Yes, I’d say that’s spot on: the place to change is the scrub_status method in src/rockstor/fs/btrfs.py.

Re “Can I just update the btrfs.py file?”: yes. And re “Do I need to somehow build/compile it?”: no, but you will have to restart the rockstor service for your code changes to take effect:

systemctl restart rockstor

And re “will the changes to the python file just be picked up automatically?”: yes, once the above rockstor service restart has been completed.

If you fancy making a GitHub issue for this that would be great, and of course if you end up sorting out the parsing then even better. Though I’d ask that you indicate its openSUSE relevance to Rockstor within the title (ie start with “[openSUSE]”), as we are not likely to be servicing any more kernels for our CentOS variants given our move to openSUSE, where we, more properly, depend on our upstream for such things. Though it would be great if, within the issue text, you mention it’s also relevant for those using newer kernels on the CentOS Rockstor offerings.

If you do end up getting a fully working parse fix for this it would be much appreciated if you could submit a pull request with your findings as we are going to need this fix in our pending openSUSE offerings. The following doc/wiki links are relevant here:

Contributing to Rockstor - Overview, and specifically its Developers subsection.

And with regard to developing and testing on the proposed openSUSE distros we have the following wiki within the forum:

The main difficulty here is in maintaining backward compatibility, ie if the code finds itself on an older CentOS install, such as when an ongoing CentOS install is updated with this code, we need to ensure that system doesn’t experience any regressions; breaking our current CentOS based offerings just to fix our openSUSE offerings is not what we want to be doing. Especially given it’s going to take a while for everybody to migrate over once we have the new rpms rolled and released and our planned new installer out. In some places I’ve used the distro package (python-distro) to inform code paths. But of course this assumes that the relevant backport in LEAP15.x is in place, so it can get a little tricky.
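
By way of illustration only, something like the following is what I mean by using the distro package to inform code paths (a sketch, not our actual code, and as noted it doesn’t account for the LEAP15.x backport situation):

# Sketch only: pick a parsing code path based on the detected distribution.
# Assumes the python-distro package is available (import name: distro).
import distro


def expect_new_scrub_output():
    # eg 'opensuse-leap', 'opensuse-tumbleweed', 'centos'
    dist_id = distro.id()
    # Assumption: openSUSE installs ship a btrfs-progs new enough for the
    # new scrub status layout, while our CentOS base generally does not
    # (unless, as in this thread, the user has built their own).
    return dist_id.startswith('opensuse')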

The python-distro dependency and use was added in the following pr:
https://github.com/rockstor/rockstor-core/pull/1990

See how you get on; if nothing else this thread can help to inform the changes we need to make to accommodate this change going forward. Also note that the master branch is currently ahead of the latest CentOS rpm, as we are shortly to release another stable channel rpm.

Hope that helps and thanks again for bringing this up and sharing your findings.

Not a problem! I’ll give it a whirl and see if I can’t spin up a VM and try and get a fix working.

So I was close. I forgot arrays start at 0 and not 1 (I should have known that!), so my indices were off by 1.

The change would be as follows to the file:

/opt/rockstor/src/rockstor/fs/btrfs.py

In the method, scrub_status,

if len(out) > 3:  # we need at least the UUID, started, Status and Duration lines
    if re.search('interrupted', out[2]) is not None:
        stats['status'] = 'halted'
        # extract the duration from the "Duration:" line (out[3]) eg:
        # "Duration:         0:00:09"
        # (assumes interrupted scrubs also report a plain Duration line)
        dfields = out[3].split()[-1].split(':')
        stats['duration'] = ((int(dfields[0]) * 60 * 60) +
                             (int(dfields[1]) * 60) + int(dfields[2]))
    elif re.search('running', out[2]) is not None:
        stats['status'] = 'running'
    elif re.search('finished', out[2]) is not None:
        stats['status'] = 'finished'
        # extract the duration from the "Duration:" line (out[3]) eg:
        # "Duration:         0:02:53"
        dfields = out[3].split()[-1].split(':')
        stats['duration'] = ((int(dfields[0]) * 60 * 60) +
                             (int(dfields[1]) * 60) + int(dfields[2]))
    elif re.search('aborted', out[2]) is not None:
        stats['status'] = 'cancelled'
        # extract the duration from the "Duration:" line (out[3]) as above.
        # TODO: we have code duplication here re finished clause above.
        dfields = out[3].split()[-1].split(':')
        stats['duration'] = ((int(dfields[0]) * 60 * 60) +
                             (int(dfields[1]) * 60) + int(dfields[2]))
    else:
        return stats
else:  # we have an unknown status as out is too short for the new header.
    return stats
# the remaining indented lines hold the raw counters, eg:
# "data_bytes_scrubbed: 3190272000"
for l in out[4:-1]:
    fields = l.strip().split(': ')
    if fields[0] == 'data_bytes_scrubbed':
        stats['kb_scrubbed'] = int(fields[1]) / 1024
    else:
        stats[fields[0]] = int(fields[1])
return stats

I’m looking to see if I can somehow detect the version of btrfs-progs installed and, if it is older than 5.1.2, run the old code, and if newer run the new code. I’ll need to read up on the links you sent over.
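
Something along these lines is what I have in mind (an untested sketch; the 5.1.2 cut-off is my guess at where the output changed, and I’m not sure yet how Rockstor would prefer to structure it):

# Sketch: detect the installed btrfs-progs version to choose between the
# old and new scrub status parsers. Untested; error handling omitted.
import re
import subprocess
from distutils.version import LooseVersion


def btrfs_progs_version():
    # `btrfs version` prints eg "btrfs-progs v5.2.1"
    out = subprocess.check_output(['btrfs', 'version']).decode()
    match = re.search(r'v([\d.]+)', out)
    return LooseVersion(match.group(1)) if match else None


def use_new_scrub_parse():
    version = btrfs_progs_version()
    # Assumption: treat 5.1.2 and newer as having the new output format.
    return version is not None and version >= LooseVersion('5.1.2')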

@phillxnet, so I was just trying to set up a clean dev environment, but on the build step (bin/buildout) I ran into the following error:

Installing js-libraries.
Downloading http://rockstor.com/downloads/jslibs-dev/lib.tgz
While:
Installing js-libraries.
Error: Error downloading extends for URL http://rockstor.com/downloads/jslibs-dev/lib.tgz: HTTP Error 404: Not Found

Looks like maybe there is a missing file that used to be hosted on the Rockstor website?

@kupan787 Ah yes. Sorry about that. Sorted now and confirmed via:

wget http://rockstor.com/downloads/jslibs-dev/lib.tgz
md5sum lib.tgz
78f723aafcd05684f41193778fb0e26a  lib.tgz

Re the 404 on that download: this was an omission on my part, related to our server moves which have kept me otherwise engaged for quite some time now.

This is a single point of failure we ideally need to remove. Plus it’s a large file that we have to keep downloading. Better if we could source all these libraries from their upstream distribution points and then pin them somehow according to their distribution mechanisms. But then we would have many points of failure for this single js libraries requirement. We will have to have a think about this, as I’d like to remove it as a dependency in the build; it may become unsustainable as more folks build Rockstor going forward. Suggestions welcome.

This file is a tgz of our curated https://github.com/rockstor/rockstor-jslibs repository. But pulling all those libs from that repo each time may also be unsustainable as we may end up upsetting GitHub.

This needs more thought and again any suggestions would be welcome, as I’d like to improve this mechanism going forward. But pinning all of those libs from their upstream distribution points is non-trivial and prone to more breakage, especially given the age of many of them. We really need to rationalise our use of these and cut out all that we can. Anyone wishing to chip in with a better solution would be most welcome. Plus if anyone familiar with js libs could help with the rationalisation that would also be great. They are now > 600K even when zipped!

Sorry about that and thanks for the heads up.

Sorry, one more question :)

If I make a change to a JS file (for example scrub_details_view.js), how do I get the changes to show up on my test environment?

I made the change to both of the instances of the file I found:

static/storageadmin/js/views/scrub_details_view.js
src/rockstor/storageadmin/static/storageadmin/js/views/scrub_details_view.js 

But even after restarting the rockstor service and clearing my browser cache, I don’t see the changes. Is there anything else that needs to be done to see the JS changes? Sorry, I am new to Python web development.

I’ll answer myself :)

bin/django collectstatic --noinput -i admin -v 0

Doing this built all my static files, and I see my changes.

@kupan787 Hello again and sorry for slow response.

Yes, as you’ve already worked out, when changing mainly the non-Python stuff you are required to do a collectstatic step. We do have this in the previously referenced:

But that doc does cover quite a lot really. More specifically in the Change -> Test cycle subsection we have:

"
If you made any javascript, html or css changes, you need to collect static files with this command:

[root@build_vm ]# /path/to/build_dir/bin/buildout -c /path/to/build_dir/buildout.cfg install collectstatic

Then, refresh the browser to test new changes in the WebUI.
"

Which is essentially what you arrived at anyway.

Take care with developing on any live system, as our development scripts commonly wipe the database so that we are then at a known state re the db.

Well done for persevering and again apologies for not picking up on your question sooner.

Hope that helps and thanks for sharing your findings re the build process.