Built on openSUSE testing channel live (early-adopters/developers only)

@kupan787, @phillxnet,

I just tested that quickly, but noticed that the current Leap15.2 beta is still using Btrfs-progs 4.19.1 (with kernel 5.3.18) so the btrfs scrub satus output format is still OK for us. Only tumbleweed should have this problem. I know it’s only a matter of time, though, so it looks like we still need to account for the btrfs-progs version when applying this fix.
For instance, I have:

rockdev:~ # btrfs --version
btrfs-progs v4.19.1 
rockdev:~ # uname -a
Linux rockdev 5.3.18-lp152.11-default #1 SMP Wed Apr 15 12:02:19 UTC 2020 (71e1af4) x86_64 x86_64 x86_64 GNU/Linux

…which gives me:

rockdev:~ # btrfs scrub status -R /mnt2/test_pool
scrub status for 18ed12fc-fa61-4076-b522-3e37f4e6e9d7
        scrub started at Wed Apr 29 08:14:05 2020 and finished after 00:00:00
        data_extents_scrubbed: 1
        tree_extents_scrubbed: 18
        data_bytes_scrubbed: 65536
        tree_bytes_scrubbed: 294912
        read_errors: 0
        csum_errors: 0
        verify_errors: 0
        no_csum: 16
        csum_discards: 0
        super_errors: 0
        malloc_errors: 0
        uncorrectable_errors: 0
        unverified_errors: 0
        corrected_errors: 0
        last_physical: 1179648000

…that itself is picked up nicely by the webUI:
image

I don’t know whether or not Leap 15.2 will ship with newer btrgs-progs version, however.

1 Like

So I am still on CentOS, but I’ve been upgrading my kernel and btrfs progs:

[root@rocknas ~]# btrfs --version
btrfs-progs v5.6
[root@rocknas ~]# uname -a
Linux rocknas 5.6.7-1.el7.elrepo.x86_64 #1 SMP Wed Apr 22 18:05:11 EDT 2020 x86_64 x86_64 x86_64 GNU/Linux

And here is what btrfs scrub looks like:

[root@rocknas ~]# btrfs scrub status -R /mnt2/MainPool
UUID:             23cad14d-6fec-4482-997c-d62021de832a
Scrub started:    Wed Apr 15 00:05:04 2020
Status:           finished
Duration:         10:49:34
	data_extents_scrubbed: 660905136
	tree_extents_scrubbed: 2753807
	data_bytes_scrubbed: 43241772498944
	tree_bytes_scrubbed: 45118373888
	read_errors: 0
	csum_errors: 0
	verify_errors: 0
	no_csum: 491699192
	csum_discards: 0
	super_errors: 0
	malloc_errors: 0
	uncorrectable_errors: 0
	unverified_errors: 0
	corrected_errors: 0
	last_physical: 5688272945152

And without the -R:

[root@rocknas ~]# btrfs scrub status /mnt2/MainPool
UUID:             23cad14d-6fec-4482-997c-d62021de832a
Scrub started:    Wed Apr 15 00:05:04 2020
Status:           finished
Duration:         10:49:34
Total to scrub:   40.03TiB
Rate:             1.03GiB/s
Error summary:    no errors found

A while back I tried to do some work to identify the btrfs progs version, and update the output. It is a little “hack-y”, so if I get some time, I’ll try and clean it up and submit a pull request. But I also made a few changes to my UI, to incorporate some of the fields that I think are new (not sure if they are in the old version of not):

image

Here I added a SCRUB rate column. I also made the End Time display the ETA while a scrub is in progress.

image

Here I added the ETA column and the Scrub Rate column.

These changes were obviously for myself per my preference, so might not fit into the overall direction of Rockstor.

However, the fixes to parse the SCRUB output will be needed once you get to a more modern btrfs-progs. I’m surprised that tumbleweed would have a newer kernel, but still be using the older btrfs-progs.

2 Likes

@kupan787 Thanks for following up on this one.
I for one like both of your Web-UI addition, but we will have to wait for the btrfs-progs update in Leap15.2beta to have those otherwise we just get blank columns. Unless your game to do a fail over in the Web-UI layer say something like, ie ‘not available’ when running on older btrfs-progs. I think @Flox was saying that the Tumbleweed does have newer btrfsprogs only the Leap15.2beta is trailing on that front. But for the Tumbleweed we have the deprecated python2 packages issue, hence the Leap15.2beta target currently. Once we are done with the feature parity and establishing the new Stable channel for the Leap15.2, once it’s out of beta, we can begin with our Python 3 move. And some progress has been made on that front already but the distro move has been a higher priority.

1 Like

Here is hoping I did this right (my first time doing a pull to another project)!

2 Likes

There are a couple of other items that the newer kernels/btrfs-progs offer. Namely:

  • zstd compression
  • RAID1C3 and RAID1C4

Adding the zstd compression I think should be pretty straight forward (I got it half implemented on my end right now). If I get some time in the next few weeks, I’ll try and put together another pull request for that (if you guys are ok with it).

As for RAID1C3 and RAID1C4, by themselves they are interesting due to how btrfs does RAID1. However, from what I have read, they are more investing when used for the Metadata on a RAID5 or RAID6. In the current setup, I don’t think Rockstor has a way to specify the RAID level of the Data vs the Metadata. So this kind of an enhancement might be a little more involved.

Not sure if you guys have a roadmap, and if adding these features are on it, but might be something to consider down the road once the cut over to openSUSE is done.

3 Likes

@kupan787 Re:

That would be great. And yes, should just be allowing it as an option and reporting it correctly there after. But theres also the kernel version issue but soon we are going to have to set a minimum kernel version we ‘support’ and it’s likely to be whatever Leap15.2 arrives with and it’s associated back-ports.

Agreed, but we could start by surfacing within the Web-UI the raid levels and take it from there. We currently fudge these a little and favour the data for reporting I believe, from memory. But getting them both in the Web-UI and through-out the models etc would be great. But this ‘level’ of stuff has to have unit test coverage and there exists already a test for the reporting bit so looking to that would be a good starting point.

Again from memory I think we have the facility to track this but just don’t, so you may well find stuff in place already. Again it’s been a while since I looked at the code from that angle. Also many fencing issues with regard to what we don’t allow will have to be revised once we introduce new levels, i.e. min number of disk etc. But yes it would be fantastic to at least start to report the metadata as well as data raid levels. And from that unit test it looks like we collect the lot already at the lowest level. It’s just fudged later to represent only data.

Exactly. Our current focus is on feature parity with our CentOS variant (very close) so we can establish the new ‘Built on openSUSE’ Stable Channel rpms. There after we can steam into this stuff in the testing channel, but we first have to do the little matter of the Python 3 changeover !! But all doable and in progress.

Bit by Bit, and I’m looking forward to where we are going on all of this.

1 Like

@kupan787 Re your pull request. I’m all ready to merge but just want to give you the chance to do a linked issue. That way we get clearer attribution in the Changelog. We have some other pending changes to these same files that I’d like to rebase on top of your changes before pr submission so if you could pop in a simple issue then I can get the links sorted and the pr merged. I can then do a Fixes issue-number in the merge commit and it makes things easier to find / track later on. And gives us a clean way to create the changelog that is then displayed within the Web-UI.

Cheers. And thanks again for stepping up to this one.

Done.

1 Like

Quick heads up for the early adopters:

3.9.2-58 testing will require a reboot after the rockstor package update.

The system pool is now mounted differently to address one of the last major feature parity issues when compared with our current Stable channel release.

  • Old ROOT pool mount point, boot-to-snap config:

2-58-ROOT-mount

  • New ROOT pool mount point 3.9.2-58 post reboot:

2-58-ROOT-mount

N.B. No ROOT/system pool shares or snapshots will be visible within the Web-UI until after the required reboot.

Enjoy, and remember to keep those early adopter reports coming in. We are definitely getting there now.

I will formally announce this release shortly, if all goes well here, in a fresh forum thread aimed at a wider audience as we edge ever close to feature parity with our current (3.9.2-57) Stable Channel release (CentOS only for now). I think we are now ready to call 3.9.2-58 our beta testing release. Again, as always, all constructive input is welcome. But this looks more like a beta to me now. Packages available shortly for Leap 15.1, Leap 15.2 beta (our main target), and Tumbleweed (if you already have installed, or can find, the required python 2 dependencies).

Hope that helps.

1 Like

After the update the shares on my root pool seem a bit messed up. The pool previously had snapshots enabled (default setting, I never did anything to the root pool). All the custom shares (so the non-home share) are now at /@/.snapshots/1/snapshot/ and seem to have lost all data.

Further, the root pool now has a red “quotas disabled” status.

EDIT: I just checked, the data is still there on disk, it’s just that the path of the shares no longer matches.

@freaktechnik Hello again.
Did you read the bit about having to reboot.

Have you now rebooted and still see issues on the root pool re default shares.

Hope that helps.

Yes, I have rebooted twice. The home share (/@/home) was migrated correctly, but any other shares on the root pool have not had their mount point adjusted.

@freaktechnik Could you start a new thread on this.

But note that user shares on the root pool were non functional for boot-to-snap config in openSUSE, hence the fix. So any you made were likely in the wrong place and if created from Rockstor’s Web-UI, and boot-to-snap is configured as you indicate with the .

bit then they would have disapeared upon page reload.

Only shares created after 3.9.2-58 via the Web-UI on ROOT in the ‘Built on openSUSE’ will be in the correct place. Prior created ones are going to have to be deleted as they existing in the wrong place and will likely be lost to the Web-UI on a boot-to-snap rollback. You will likely have to remove all prior user created shares to remove the confusion.

Lets continue in another thread as your shares are likely non standard unless they were created within 3.9.2-58. Also log counterparts to indicate exactly what’s happened and a subvol list via command line in that thread will help. But likely you just need to clean up (delte) those ‘confused’ shares as we were essentially doing it wrong before 3.9.2-58 on the boot-to-snap config and I believe we are now doing it correctly. Plus of all the default shares only “home” should be surfaced. No other server install defaults should show up at all in the Web-UI.

Hope that helps.

1 Like

will there be migration process from centos to opensuse? Id be willing to schedule a cut over if I can import my shares and config.

Running mine on a

  • ProLiant DL380 G7
  • Processor Type:Intel® Xeon® CPU X5690 @ 3.47GHz
  • Logical Processors:24
  • Ram 143.99 GB
  • Disk : 2x 1.8TB ent Micron ssd. ( pool ) 2x 1tb pioneer ssd. 3x 500gb ssd. 1x 200gb ssd.
  • Networking 10GB

@Roger_Lund Hello again,

That’s a nice machine :slight_smile: .

In as far as config back and restore post re-install, yes. And if you use a different system disk then if all falls on it’s face you can simply boot from your old system drive and it’s shouldn’t have noticed anything having happened. But probably best to try and avoid going backwards in kernel version. Just in case. Btrfs on disk format has not changed though but still, there have been now hundreds of btrfs fixes between our now legacy CentOS base and our ‘in testing’ Leap15.2 RC base.

Configuration Backup and Restore

So once the new system is installed one would import the pool as normal and then apply the previously saved and downloaded config backup, then apply it. Not perfect as doesn’t cover everything but the pool import is the same code we have been developing all along and it’s now fairly stable. Or at least we haven’t had a report of it failing for some time now.

For more up-to-date info on the ‘Built on openSUSE’ variant I’ve now transitioned to posts it’s progress on the new ‘Beta’ thread:

In your particular hardware case though the following issue may be relevant, but probably only if you have a similar disk controller. But I’m pretty sure I can fix that issue once I find a moment to concentrate on it:

Check out that issue and see if it is likely that you have the same controller as if so you will want to wait until that is fixed on our side.

Hope that helps, but as yet we haven’t promoted any of the testing rpms to the ‘built on openSUSE’ Stable channel as they are just not quite there yet.

1 Like

In this case, i’m running two SAS3008 PCI-Express Fusion-MPT SAS-3 cards.

I think i’ll have to grab another USB drive to install to next time i’m in the pop, and attach it.

1 Like

Hi All

Since it seems that Rockstor 4 is a stable release candidate, I took the plunge this weekend to build up my OpenSUSE system on a new SATA SSD (no native NVMe booting on my old motherboard). The old one is in a safe place for now (looking like I won’t need it though - only missed a single VM disk image that was on a share in the CentOS root partition, since copied over to the same place as the rest of them)

Pools were imported fine (despite starting to get curtailed API responses from the “Shares” page on my old CentOS based system, causing the page to not render and no rockons could be added due the same API web call curtailing). Also I moved my appliance ID over just fine thanks to the tools for it (it took me longer to find my activation key from 2 years ago).

Only thing that didn’t seem to work was the importing of my Samba or SFTP share list - which is no great loss, as I did not have that many and I read the config backup by eye to recreate them. Interestingly NFS worked perfectly fine. Maybe I was just impatient, if I get a chance I might build a VM at some point with a spoofed share list in a pool and see if it happens again. I’ve rebooted a couple of times since (network config changes - so might not have the original logs around import now, but will see)

Also, if anyone has a SuperMicro X10-SLH/X10-SLM, for the LEAP 15.2 install, UEFI booting is OK, but might want to disable secure boot on the summary before package installs. For me the installer would not complete (no activity shown after packages completion) with secureboot enabled - could just a red herring though, I use a pile of repurposed ex-workstation bits and pieces.

System seems nice and quick now, loving it. OpenSUSE package for KVM is great too, no more git pulls for the UEFI support and dodgy XML hacks to use it - it’s baked in.

2 Likes

I am not sure if this is the correct thread to post issues, however after the install (OpenSUSES 15.2, installer iso created with kiwi-ng on freshly installed and updated 15.2) i keep getting those messages on the console and in dmesg:

[404887.813445] audit: backlog limit exceeded
[404887.813505] audit: audit_backlog=65 > audit_backlog_limit=64
[404887.815997] audit: type=1400 audit(1599054448.661:817848324): apparmor="DENIED" operation="ptrace" profile="docker-default" pid=22193 comm="apps.plugin" requested_mask="read" denied_mask
="read" peer="unconfined"
[404887.817281] audit: audit_lost=12218207 audit_rate_limit=0 audit_backlog_limit=64

The rate is one event every second.
Upon checking AppArmor is disabled and auditd is not present on the system.
Could you please advise?

@lexxa First off well done on a successful installer build.

As to your log entry, I’ve not yet seen this myself. Is this directly from the resulting install. Or did it appear after applying updates, or have you, since install, installed anything. And if so what.

Let us know if you end up tracking this one down.

If anyone else is seeing this, or is more up on what this is about then please chip in. I’m afraid I’m a little tied for time at the moment so can’t really look into this apparent log spam right now.

Hope that helps.

Hi @lexxa,

Tahnsk for the report… Similar to @phillxnet, I’ve not seen that one, but it may be container-specific… I presume you have a rock-on (or at least a docker container) running, isn’t it? If yes, could you turn it/them off and identify which one(s) is causing this log event?

I can’t have a look at my systems at the moment but maybe the docker documentation on their AppArmor profile may help:

AppArmor shouldn’t be running indeed, but maybe we have not completely disabled it or something else re-triggered it… you did specify auditd wasn’t running, though… curious.

We’ll hopefully get to the bottom of this and identify whether we should tweak our docker settings accordingly.

1 Like