Unresponsive system

One remark: it seems that space_cache (v1, I guess) is already added by default:

/dev/sdh on /mnt2/mount type btrfs (rw,relatime,degraded,space_cache,subvolid=263,subvol=/mount)

I guess that's because I manually mounted it as degraded and it was not managed by Rockstor.
From the manual: ( nospace_cache since: 3.2, space_cache=v1 and space_cache=v2 since 4.5, default: space_cache=v1 ), so somehow until now I've been using some kind of space cache :slight_smile:

I have activated it now: /dev/sdh on /mnt2/mount type btrfs (rw,relatime,degraded,space_cache=v2,clear_cache,subvolid=265,subvol=/mount)
I'll monitor the latency graph to see if there is any improvement (considering that the device delete missing is not running now).

So far the Rockstor Web-UI is working fine; no issues as of now with this new mount:
image

Seems that the latency improved over the last hour :slight_smile: so I think this should be enabled by default on Leap 15.2, as I can see no reason not to on a 5.x kernel.

image

That's not my understanding. Once you've done the suggested remount, or a mount with the 2 options (one to wipe the existing cache and one to specify the v2 version), thereafter all mounts will default to using the new v2 version, and so Rockstor will simply honour this as it has no specific preference either way: it just goes with the default. But you can't use the Web-UI to administer this change as it doesn't know about this option and so will reject it. Hence the need to do a one-off mount/remount at the command line to move that pool's default over to the v2 space cache. Thereafter all regular mounts will just use the new default. Hope that's clear enough.
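As a sketch of that one-off change (assuming the pool from earlier in this thread, mounted at /mnt2/mount; requires root):

```shell
# One-off: wipe the old v1 cache and build the v2 free-space tree.
# After this, subsequent plain mounts of this pool default to v2.
mount -o remount,clear_cache,space_cache=v2 /mnt2/mount

# Verify: later mounts should show space_cache=v2 without being asked for it.
findmnt -no OPTIONS /mnt2/mount
```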

Haven't seen this before; I wonder if this is a beta issue in 15.2 only. Let's see if this settles down with some updates, and if not then open a GitHub issue on it and specify the exact way to reproduce along with your distro variant and testing channel rpm.

It's the default, but I don't believe we add it; as it's the default I'm guessing that's why it's being shown as such. Still, you can use that to see if the change 'takes' and is in play thereafter.

All btrfs mounts use space_cache by default, is my understanding, but v1 only. And one has to explicitly state v2 at least once, and wipe the old v1 in the process (hence the 2 combined remount/mount options), to redefine the default, for that pool, to space_cache v2.

I believe the manual is out of date now, and I also think what it is saying there is that v2 was introduced in the 4.5 kernel, not that it became a different default.

Nice.

but note that you only want to use space_cache=v2 in combination with clear_cache on the initial mount to change the default. On subsequent boots I'd only expect to see space_cache=v2 and not clear_cache. Sorry, I haven't played with this myself, so do keep an eye on how it presents itself. I'm assuming this is the same mount where you changed the default and used both options, and that on subsequent boots we won't see the clear_cache option there.

Good news at last.

As for having Rockstor enable a non-default mount option, I'm a little uncomfortable with this. I'd far rather stick to the file system defaults, and upstream has not yet made this the new default. But we could have a doc entry on how to do this, and add these 'change pool default' options to our Web-UI allowed mount options; that shouldn't be too difficult. We could then instruct folks to add those 2 options (and I think we do a remount ourselves anyway), and then they could presumably just remove them again. Otherwise the clear_cache will end up wiping the cache on every boot, which I think is a bad thing. But still, adding and then removing via the Web-UI is far better, and a first step, compared to having to drop to the cli and stop all Rockstor services etc. I'll take a look soon hopefully, as I'm hoping it may be an easy addition to our mount option filters. We haven't added any of these for ages either: the last additions were:

And incidentally we have the following issue for the space_cache=v2 pool mount option:

Where I have now, in the latter, added this forum thread as a reference.

It will be very interesting to see how you get on with this. So do keep us informed. Hopefully it will speed up your disk removal when that time arrives again.


I've been playing with balance as this process is easy to stop. I can see a speed increase of maybe 10-15%, but I'm being subjective here as my official measuring tool is based on "feelings" :slight_smile: . I still don't think that's enough.

New ideas:

  1. I've been wondering whether enabling cfq would help, or if bfq is better? I don't have experience with those :slight_smile:
    As of now I have: # cat /sys/block/sda/queue/scheduler
    [mq-deadline] kyber bfq none

  2. My FS usage is 97%; is this affecting the delete performance? I've seen some recommendations that BTRFS needs to be <90%. From what I know, the way BTRFS works shouldn't be impacted by this, and that was only the case for ZFS, which needs to be less than 80% full to not impact write performance.

  3. I've seen that there are many, many fixes in Tumbleweed and it also runs btrfs-progs 5.6. I'm planning to upgrade and try next week. Any hints on the Rockstor repos and whether I need to enable/disable something else?
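On points 1 and 2, a couple of quick runtime checks, as a sketch (sda and the mount point are placeholders; requires root). Note that cfq was removed from the blk-mq kernels used in 5.x, so bfq is the candidate there:

```shell
# The bracketed entry is the active I/O scheduler; bfq can be selected live.
cat /sys/block/sda/queue/scheduler      # [mq-deadline] kyber bfq none
echo bfq > /sys/block/sda/queue/scheduler

# Allocation per chunk type, to judge how full the pool really is:
btrfs filesystem usage /mnt2/mount
```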

Thanks

@shocker Hello again.

Re:

97% is way beyond safe I would say, and yes, 80% plus is where you need to start addressing space. Btrfs is similarly affected by lack of 'breathing room', as it's easier and quicker to make a new chunk and populate it than to find every free block within existing chunks. That's one of the benefits of a balance: it can free up new space rather than having to use all the little bits left over from partly used chunks. And each chunk is normally 1 GB. But then, as I understand it, a balance will start by trying to create new free chunks and shuffling blocks into those fresh chunks. I'm afraid I'm a little weak on knowledge at this level though.
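A common way to claw back that breathing room is a usage-filtered balance that only rewrites mostly-empty chunks (a sketch; the mount point is a placeholder and this requires root):

```shell
# Rewrite only data chunks that are less than 25% used, compacting their
# blocks into fewer chunks and returning the freed chunks to the pool.
btrfs balance start -dusage=25 /mnt2/mount

# Check progress of a running balance:
btrfs balance status /mnt2/mount
```

This is much quicker than a full balance since nearly-full chunks are left alone.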

Interesting. But bear in mind that the btrfs stuff in that kernel is less 'curated', so you will truly be on the cutting edge. But if that is what you need. Whereas the openSUSE/SUSE team cherry-pick more for the backports in the Leap releases. But as long as you understand that, then great.

To your question re repos: it's actually a little better on that front, as it goes, given that 15.2 is in beta and so has no shells repo. Whereas Tumbleweed is 'ever-present', so you can use the 'native' repos. So really it's just a change to the testing repo for the Rockstor rpm, plus using the Tumbleweed shells repo from OBS. But do make sure to honour the priority on that shells repo, as there is more in there than just the shellinabox that we need as a dependency.

Shells repo:


has details for both the Leap15.1 (15.2 to be added when available) and already has the Tumbleweed shells repo adding commands.

Rockstor Testing repo (early-adopters/developers only)


has for some time had the Tumbleweed rpm repo info also as well as how to import the Rockstor key.

Note that I've now just updated the latter to include Leap 15.2 beta repo instructions (a bit late for you :slight_smile: ) and our intention to use this for our next Stable release. I've also, hopefully, made it clearer and quicker to follow. Let me know if I've missed anything as you try it out.

So good luck, but do remember that the Tumbleweed kernel is, as far as I'm aware, pretty much mainline with a few relatively minor openSUSE/SUSE patches to enable things like boot-to-snapshot, so it is far less tested/curated than Leap's. But its newer tech may be of use to you. Just remember that it is more common for corruption bugs to be released there, as at that age far fewer folks have actually run those kernels. However, if you end up having to turn to the btrfs mailing list for your issues, you will be in a far more advantageous position than if you were not running a cutting edge kernel. In that regard it's a fantastic resource for us, and it enables those in need to run these very new kernels.

Thanks
I'll use Tumbleweed just to try to recover. Afterwards I'll do a fresh reinstall of Leap 15.2.
As of now I do have a full backup, but unfortunately the restore will take a week due to the huge size of the backup.

I’ll try Tumbleweed with or without Rockstor as to be honest the only thing I’m using is a nfs server and that’s it :slight_smile:

@shocker

OK, keep us posted on any progress / performance reports.

Tumbleweed is now installed. I changed the repo for Rockstor during installation, but it seems that it is not working: the Web-UI is giving me an Internal Server Error.
I also reinstalled Rockstor just to ensure that everything is ok.

rockstor.log is blank.

supervisord.log:
2020-04-21 09:33:32,784 INFO spawned: 'data-collector' with pid 8036
2020-04-21 09:33:33,338 INFO success: ztask-daemon entered RUNNING state, process has stayed up for > than 2 seconds (startsecs)
2020-04-21 09:33:33,496 INFO exited: data-collector (exit status 1; not expected)
2020-04-21 09:33:35,500 INFO spawned: 'data-collector' with pid 8045
2020-04-21 09:33:36,186 INFO success: nginx entered RUNNING state, process has stayed up for > than 5 seconds (startsecs)
2020-04-21 09:33:36,186 INFO success: gunicorn entered RUNNING state, process has stayed up for > than 5 seconds (startsecs)
2020-04-21 09:33:36,213 INFO exited: data-collector (exit status 1; not expected)
2020-04-21 09:33:39,832 INFO spawned: 'data-collector' with pid 8054
2020-04-21 09:33:40,583 INFO exited: data-collector (exit status 1; not expected)
2020-04-21 09:33:41,585 INFO gave up: data-collector entered FATAL state, too many start retries too quickly

@shocker

This suggests you are doing an in-place distro transition. Is that the case? All Rockstor rpms are built on, and subsequently installed into, their respective distros prior to release, and for the testing channel they are at least tested through the initial setup screen and on to the dashboard and update screen. And given each rpm is built 'natively', there can be subtle differences. So if you did a transition, the rpm re-install may not have enforced all file changes. As a result of the above build procedure I have the following:

tw-20200417

However that was updated post the rpm build/install, but I'm pretty sure it was built and freshly installed on a 20200413 TW version, I think.

Try a fresh install via:

systemctl stop rockstor rockstor-pre rockstor-bootstrap
zypper remove rockstor
rm -rf /opt/rockstor/
zypper install rockstor
systemctl start rockstor

and if you want to use the same admin user again, as the above will initiate a fresh initial setup screen, you will also need to:

userdel old-admin-user

That way the setup screen won’t complain about that user already existing on the system.

Then visit the Web-UI and do a fresh setup sequence.

All of the above assumes no prior source builds, which should also be wiped if they exist. Just in case.

That is obviously a full config reset, but if you have been doing live distro migrations then it's probably for the best, as you then guarantee a complete set of natively built support files. The above procedure has just worked on my freshly (yesterday) zypper dup'ed 20200417 install using the indicated latest published testing rpm for Tumbleweed.

See how that goes. Otherwise some pointers/investigation at your end is a possible next step as I don’t see the same behaviour here.

Hope that helps.

Tried it, no improvement. It seems that something went wrong with the self-generated SSL cert and that was the root cause.
I have installed Tumbleweed from scratch and it's working now :slight_smile:

image

@shocker Thanks for the update, and glad you're now up and running.

Re:

So that’s a new one. We will have to keep an eye on that.

So were you doing in-place distro transitions, ie just changing repos and moving from Leap15.1 to Leap15.2beta and then to Tumbleweed?

Yes, Leap 15.1 -> Leap 15.2 Beta (no issues) -> Tumbleweed latest snapshot.
During the upgrade I changed the Rockstor repo from /leap/15.2/ to /tumbleweed/, and during the upgrade there was 1 package from Rockstor, so it worked.

Also, in nginx.conf you need to remove "ssl on;" and instead add ssl to the listen 443 directive, i.e. "listen 443 default_server ssl;" (/opt/rockstor/etc/nginx/nginx.conf).
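As a sketch of that fix: nginx deprecated the standalone "ssl on;" directive in favour of the ssl flag on the listen directive. Demonstrated here on a throwaway copy; on a real Rockstor install the file is /opt/rockstor/etc/nginx/nginx.conf.

```shell
# Make a sample config with the old-style directives.
conf=$(mktemp)
printf '%s\n' 'ssl on;' 'listen 443 default_server;' > "$conf"

# Drop "ssl on;" and move the ssl flag onto the listen directive.
sed -i -e '/^\s*ssl on;$/d' \
       -e 's/^listen 443 default_server;$/listen 443 default_server ssl;/' "$conf"

cat "$conf"   # -> listen 443 default_server ssl;
```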

tail -f supervisord_data-collector_stderr.log

from .exceptions import (
  File "/opt/rockstor/eggs/gevent-1.1.2-py2.7-linux-x86_64.egg/gevent/builtins.py", line 93, in __import__
    result = _import(*args, **kwargs)
  File "/usr/lib/python2.7/site-packages/urllib3/exceptions.py", line 2, in <module>
    from .packages.six.moves.http_client import IncompleteRead as httplib_IncompleteRead
  File "/opt/rockstor/eggs/gevent-1.1.2-py2.7-linux-x86_64.egg/gevent/builtins.py", line 93, in __import__
    result = _import(*args, **kwargs)
  File "/usr/lib/python2.7/site-packages/urllib3/packages/__init__.py", line 3, in <module>
    from . import ssl_match_hostname
ImportError: cannot import name ssl_match_hostname

tail -f gunicorn.log

import urllib3
  File "/usr/lib/python2.7/site-packages/urllib3/__init__.py", line 7, in <module>
    from .connectionpool import HTTPConnectionPool, HTTPSConnectionPool, connection_from_url
  File "/usr/lib/python2.7/site-packages/urllib3/connectionpool.py", line 11, in <module>
    from .exceptions import (
  File "/usr/lib/python2.7/site-packages/urllib3/exceptions.py", line 2, in <module>
    from .packages.six.moves.http_client import IncompleteRead as httplib_IncompleteRead
  File "/usr/lib/python2.7/site-packages/urllib3/packages/__init__.py", line 3, in <module>
    from . import ssl_match_hostname
ImportError: cannot import name ssl_match_hostname

Hope that helps.

P.S. Why isn't certbot integrated by default to generate a cert for SSL in case a valid domain is used? It could be something like:
if domain resolv to ip then use certbot to generate ssl for nginx
else use rockstor default ssl
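That pseudocode might look something like this as a shell sketch. To be clear, this is hypothetical: HOSTNAME and PUBLIC_IP are illustrative variables, and this is not an existing Rockstor feature.

```shell
# If the host's public DNS name resolves to this machine, try certbot;
# otherwise keep the default self-signed certificate.
if [ "$(dig +short "$HOSTNAME")" = "$PUBLIC_IP" ]; then
    certbot certonly --webroot -w /var/lib/certbot -d "$HOSTNAME"
else
    echo "falling back to the default self-signed cert"
fi
```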

I'm playing with balance to check the performance. One interesting thing is that it is maxing out a single CPU core.
Does btrfs know about multi-threading? Or is it something that I need to activate?

image

@shocker Re:

Thanks for the feedback, much appreciated.

That's a tricky one. I'd first like to establish an opt-in utility for Let's Encrypt because, as always, there are 'details'. Take a look at the following issue we have open for this

and that's just the Let's Encrypt part:

And of course not all Let's Encrypt mechanisms are available to all folks, hence my gravitating towards a dedicated port 80 solution in that thread.

Pull requests are always welcome if there are any takers on this. And of course it would be great to have this, but for many folks it's just not going to work unless they also have port forwarding enabled (if on a private network), or can only use DNS auth, or whatever. So maybe this is something folks could discuss in a dedicated thread, using that issue as a base for where we are up to. And yes, once we have an established robust system we can maybe effectively integrate it, but I don't quite see it being that easy just yet. Too many variables and the like, and folks need to be able to get into their fresh install as soon and as easily as possible. If you fancy starting a thread with some ideas and a link to that GitHub issue, we can see if there is interest/takers on getting this sorted.

In parts very much so, in others very much not so. And remember you are also using the youngest variant via the parity raids. It's actually quite a mix, from what little I've heard of the architecture at this level. Again, these architectural questions are better asked of the linux-btrfs mailing list, or of other btrfs-specialised mailing lists/forums. The Rockstor dev team is very much a user of this software; we don't as yet have the capability to contribute back to it. Although I've made a trivial doc contribution (not yet merged), and I believe a forum member has had a successful doc PR merged, and Suman made some contributions to the wiki. But nothing on the programming front that I know of. It's super highly specialised, and (usually) entirely non-trivial.

Out of curiosity, why are you doing a balance prior to returning the pool to a healthy state, i.e. a non-degraded mount? Or is the 'balance' you are referencing here an 'internal' one initiated by a missing device delete command? When the pool is mounted degraded it is not representative of its usual state: that of not being mounted degraded. So best to get it mounting regularly before you do anything else. How many missing devices does this pool now have?
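For reference, a quick way to answer that last question from the command line (a sketch; the mount point is a placeholder and this requires root):

```shell
# Lists each pool's member devices; absent members are reported as missing.
btrfs filesystem show /mnt2/mount

# Per-device read/write/corruption error counters for the mounted pool:
btrfs device stats /mnt2/mount
```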

Shame about all those unused cpu cores; alas the future is as yet not here, apparently. Do keep us posted, as this is turning into quite a journey. And does your use case, post restoring this pool, allow for say a raid10 migration? That, along with some of the usual tweaks, i.e. quotas off, noatime, & space_cache=v2, looks to be the best all-round performer. But it only has single drive redundancy unfortunately. Otherwise there's something like the emerging mix of parity raid for data and raid1c3/c4 for metadata. Not within Rockstor's capability yet, and still also too young for my liking, but still. Another note on the performance front is some more recent tweaks where you can have metadata favour non-rotational drives. But again not yet released, though looking promising. Can't find the linux-btrfs mailing list entry for it currently. However, as we are now moving to an upstream supported kernel, these goodies should trickle down to us as and when.

Also, have you taken a look at the default performance comparisons of the btrfs raid levels done earlier this year as part of an article by Phoronix: https://www.phoronix.com/scan.php?page=article&item=linux55-ssd-raid&num=1 Might be interesting given your trials. It's all apples and oranges with the other file systems listed, as they are using mdraid, but it is interesting to see the difference across the btrfs raid levels for each type of load. Plus they are all ssd drives, so not representative of spinning rust; so there's that.

“Btrfs was tested with its native RAID capabilities while the other file-systems were using MD RAID. Each file-system was run with its default mount options.”

Popping in here given this thread has quite a few performance related elements now.

Hope that helps.


I think for a start this can be a checkbox that can only be activated if your hostname resolves to your IP. I know it will not be for everyone, but at least it would cover those with public IPs.
Then it can evolve to have an mDNS mechanism behind it, with port forwarding etc. :slight_smile:
It's just an idea to make it more fancy :slight_smile:

I have subscribed to btrfs-linux and asked the question there :slight_smile:

I remember that I read something similar a few years ago, but it seems this one is updated. I'll give it a go, thanks :slight_smile:

Currently, I'm trying to understand if going to Ceph will be a solution going forward for my raid6 system. As of now I only have two storage devices; not sure if that makes sense, but I'm seeing that it's very flexible (that's why I started with BTRFS over ZFS): you can add/remove any HDD, add another one of a different size, etc., whereas ZFS is quite hard to manage. If you get stuck, for example, with 60 old HDDs in one vdev, you cannot upgrade them to newer HDDs with a bigger capacity etc. BTRFS was exactly what I was looking for for my needs, and performance is great, except when you have an HDD failure and troubleshooting becomes a hobby :slight_smile:

@shocker Re:

Yes, the cluster file systems are very interesting, and particularly nice for large file systems requiring varying/flexible levels of redundancy. I have ambitions to integrate an easy Web-UI initiated Gluster/GlusterFS system (https://www.gluster.org/) within Rockstor, as I like the idea of each 'node' still independently being a file system where you can still read files 'regular like'. This I had planned could be run either intra-Rockstor, across multiple pools within a single Rockstor instance, or, for the more discerning redundancy lovers, across Rockstor instances, i.e. inter-Rockstor. Our existing replication system already has some inter-instance communication, so we may be able to build on that to aid auto-config of such a thing. And Samba apparently has the capability to work with Gluster to pick from available nodes to effectively do failover, or to allow for 'downing' an entire Rockstor Gluster node for maintenance/upgrade etc. But alas I have my hands quite full with our distro migration and then our technical debt. But it's definitely something I've been looking into and fancy seeing to fruition during my maintainership of Rockstor. And I really like what I've seen of Gluster so far. Plus it seems a natural extension to the redundant storage problem.

Just a thought.

I forgot about Gluster; I remember I played with it in 2010 :slight_smile: but this is just a brick over an existing FS, if I'm remembering correctly? You cannot have raw r/w on the HDD with it to create a cluster raid like Ceph? Or have they evolved? I'll check it to see what's new. Last time it was something extremely easy to deploy, 2-3 command lines :slight_smile:

New update.
I waited for more updates on Tumbleweed and tried the device delete missing again yesterday. I have the same situation: it is taking ages and r/w performance is ~95% degraded.

Going forward I think I have 3 options (all of them imply migrating the data over and destroying the pool, and I already have 7 shiny new 16TB HDDs to start with):

  1. Create small BTRFS Raid5 pools (4-5 HDDs per pool), or create Raid5: 7 x 8TB, Raid5: 7 x 8TB, Raid5: 5 x 14TB, Raid5: 7 x 16TB (just to have small pools that recover faster, and so that my entire storage is not degraded at once). Then make a union pool with BeeGFS (Gluster is way too slow with parallel tasks; MergerFS - never tested). I don't know if I'll take more risks with BTRFS, to be honest :slight_smile:
  2. The same scenario as above but with ZFS. ZFS already has a union pool feature, or I'll add BeeGFS on top just to be able to expand in the future with another enclosure.
  3. LizardFS v3.12 with EC8:2 (MooseFS Pro would be a better choice but it's way too expensive for my needs), or Ceph v15.2.1 with EC8:2.

Pros:

  1. BTRFS: flexibility to grow the pool with bigger disks; I don't need to replace all of them in a pool. In theory, faster pool recovery after the device is replaced (in theory :slight_smile:). Last but not least: the awesome Rockstor team members :slight_smile:
  2. ZFS: Stability. Everything works as intended.
  3. LizardFS (free, high performance, easy to maintain); MooseFS (active dev, better performance, supported over time); Ceph (active development, lots of documentation and use cases, good performance).

Cons:

  1. BTRFS: the current issue that I have :slight_smile:
  2. ZFS: in order to migrate to bigger drives I need to upgrade all of them in one of my created pools before growing the disk space. This means a higher investment and a longer time rebuilding the pool 5-7 times. Also the risk of rebuilding over and over with a raid5 :slight_smile:. If I go with a larger raid6 then the HDD upgrades will take longer and be more and more expensive.
  3. LizardFS: dead development; MooseFS: crazy expensive; Ceph: high requirements for top performance (RAM for storage; a desktop CPU to keep CephFS performance up, as it runs single-threaded).

I know I'm going a little off-topic, but I'd appreciate it if you guys could add some wisdom based on the sysadmin/storage experience that you have :slight_smile:

Cheers

I'll answer my own question; maybe others will find this info interesting :slight_smile:

The way forward:

  • OS: openSUSE Leap 15.1 or 15.2 with Rockstor on top.
  • OS Raid1: 2 x SSDs with BTRFS.
  • Erasure coding 8:2 with LizardFS.
  • Metadata on BTRFS Raid1 (2 x NVMes).
  • Every HDD will be created as an individual BTRFS filesystem, and on top LizardFS will handle the chunks.

TL;DR you are not getting rid of me yet, I’ll stick around for a while :slight_smile:

After performing some tests on VMs, it seems that LizardFS offers the same read speed as BTRFS Raid6. I also tested BeeGFS, but it's slower than BTRFS Raid6/LizardFS (~20-25%). My first concern was the use of FUSE for the mount-point, but it seems they have adopted fuse3 and nfs-ganesha, so it's good to go :slight_smile:
