Big issue when activating from testing to stable channel, impacting services, shares and Web-UI

roberto0610 · November 27, 2021, 3:31pm

Now my Web-UI is not working. My SMB share has super slow read/write performance, down to the kilobyte range.

I have 20TB of new production data?

I wonder: if I re-install from ISO (USB) and activate the Stable channel right after installation, will I be able to access my data from my one and only disk? It is a hardware RAID6 made from 12x SAS drives…

As I already have 20TB of production data in this Rockstor???

phillxnet · November 28, 2021, 2:06pm

@roberto0610 Hello again and thanks for bringing this from support email to the forum.

With regard to your title reference:

“… when activating from testing to stable channel …”

This is potentially misleading as there is no stable channel for your Leap 15.3 profiled installer, or more accurately we don’t yet have an rpm repository set up for it. That was one of the problems we covered in the support email and I’m mentioning it here for context. And we have had this reported here on the forum also, e.g. :

This is primarily down to our OS move. Otherwise the last stable rpm would be available and the repository would be ‘live’. But given we have an outstanding email notifications issue, and one or two additions to pop in before what will hopefully be 4.1.0-0 as our first stable release, we can’t yet establish the repository. But it’s not a problem if one has subscribed, as whenever we first create the repo it will take effect. And until then it’s ignored anyway.

So what we are looking for here in your sudden slow down is something from another repository, one that does exist, that has slowed things down massively. That is most likely to be a kernel thing, and that in itself is somewhat unlikely; but it’s the only thing that is likely to have changed in your update. Our installer, and I believe the one you used, incorporates 4.0.9-0 as RC10 from testing. 4.1.0-0 should appear soon in testing, and if it is well received the exact same rpm will be used to populate our yet to exist 15.3 stable repository.

So the only thing I can suggest here, as per the support email, is to take a look in:

less /var/log/zypp/history

for suspect packages that have upset your setup somehow.
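As a rough aid for that search, here is a minimal sketch that narrows the history down to installed packages, optionally matching a name pattern such as the kernel. It assumes the usual pipe-separated zypp history layout (date, action, package name, version, arch, requester, repo alias, ...); the function name `zypp_installs` is just made up for illustration.

```shell
#!/bin/sh
# zypp_installs [HISTORY_FILE] [NAME_REGEX]
# Hypothetical helper: prints "date|package|version|repo" for every
# "install" entry whose package name matches NAME_REGEX (default: any).
# Assumes the pipe-separated layout of /var/log/zypp/history.
zypp_installs() {
    history_file=${1:-/var/log/zypp/history}
    pattern=${2:-.}
    awk -F'|' -v pat="$pattern" '
        /^#/ { next }                    # skip comment lines
        $2 == "install" && $3 ~ pat {    # install actions only
            print $1 "|" $3 "|" $4 "|" $7
        }
    ' "$history_file"
}
```

For example, `zypp_installs /var/log/zypp/history '^kernel'` should list only kernel package installs, with their dates and source repository, which can then be compared against when the slow down started.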

If it is kernel related, for example, you may even be able to reboot into the previous kernel and see if all is well again. Or do a boot to snapshot and enable that snapshot as the default to put you back where you were. See the upstream capabilities for this that we inherit, and enable by default within our installer, irrespective of the system drive size (hence our min drive size spec of 16GB).

3.3 System Rollback by Booting from Snapshots:
https://doc.opensuse.org/documentation/leap/archive/15.0/reference/html/book.opensuse.reference/cha.snapper.html#sec.snapper.snapshot-boot

Note that this rollback will have to be managed from the command line as we still have the following outstanding issue:

github.com/rockstor/rockstor-core

[NG] boot to snapshot robustness re read only root

opened 03:14PM - 09 May 20 UTC

phillxnet

When booting to a read only snapshot of root in our NG (Next Generation) 'Built …on openSUSE' variant it would be preferable if we were able to start our various processes enough to inform the user of a successful boot. There would be obvious limited capabilities given the read only root fs (for now at least). This would help folks establish if they have at least achieved a successful boot from a Web-UI perspective prior to enabling that snapshot for the next boot via the: ``` snapper rollback ``` command.

But once rolled back you should be good. Note that this does not roll back Rockstor or its database though. But in your case the installer had 4.0.9-0 and that is our latest version to date anyway.

Alternatively your system may be in the throes of running a system disk balance. That can also impact performance, but not likely on the hardware I imagine you are using. At least not to the degree that would be required to cause Web-UI time-outs, which is what you mention here. In the support email I understood you also had SMB service issues.
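For reference, the command line side of that rollback can be sketched roughly as follows, going by the upstream openSUSE documentation linked above: first pick the snapshot from the boot menu, and only once booted into it run the steps below. The `run` wrapper and `DRY_RUN` variable are hypothetical helpers added here so the steps can be previewed safely; they are not part of snapper or Rockstor.

```shell
#!/bin/sh
# DRY_RUN and run() are illustrative helpers only (an assumption of this
# sketch). With DRY_RUN=1 (the default) commands are printed, not executed.
DRY_RUN=${DRY_RUN:-1}

run() {
    if [ "$DRY_RUN" = 1 ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# To be used only after booting into the chosen snapshot via the boot menu.
rollback_from_booted_snapshot() {
    run snapper list        # review the available snapshots first
    run snapper rollback    # make the booted snapshot the new default
}
```

Calling `rollback_from_booted_snapshot` with `DRY_RUN=0` as root would actually run the commands, after which a reboot is needed for the new default to take effect.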

All in, I think you are actually looking at a potential hardware issue here. Sometimes HDDs can end up going really really slowly prior to failing in a more obvious way. That can end up hanging a system for disruptive amounts of time. Hence Web-UI time-outs and likewise SMB time-outs or the like.

Note that putting hardware raid underneath btrfs is not desirable and will undermine btrfs’s ability to self repair. It can also cover up the type of hard drive failure I describe above. We have recently added this advice to our:

Minimum system requirements: Quick start — Rockstor documentation

Rockstor is a complete Linux distribution “Built on openSUSE” intended for direct hardware installation. Virtual Machine installs can work but are not recommended without full drive or preferably whole drive controller pass-through. Hardware raid underneath btrfs, Rockstor’s chosen filesystem, will weaken data integrity assurances. Raid controllers, if used, should thus be configured to HBA / JBOD operation.

And a recent btrfs mailing list quote from Zygo Blaxell 4 hours ago has:

Use btrfs raid1 instead of hardware RAID1, i.e. expose each disk
separately through the RAID interface to btrfs. This will enable btrfs
to correct errors and isolate faults if one of your drives goes bad.
You can also use iostat to see if one of the drives is running much
slower than the other, which might be an early indication of failure
(and it might be the only indication of failure you get, if your drive’s
firmware doesn’t support SCTERC and hides failures).

https://lore.kernel.org/linux-btrfs/8747149.faa9ddba.17d5b575f6b@tnonline.net/T/#mfefdade26124684a24338d98dde8cd3dc425898d

I’ve seen drives slow down massively myself, with no other indication of failure or smart reporting. Quite frustrating.
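To make the iostat suggestion from that quote a little more concrete, here is a minimal sketch that flags devices whose average read or write wait time is above a limit. It only helps where drives are exposed individually (HBA / JBOD), as a hardware raid volume shows up as a single device. It assumes the extended `iostat -x` report from the sysstat package, with `r_await` and `w_await` columns in milliseconds; the function name `slow_devices` and the default limit of 100 ms are arbitrary choices for illustration.

```shell
#!/bin/sh
# slow_devices [LIMIT_MS]
# Hypothetical helper: reads `iostat -x` style output on stdin and prints
# devices whose r_await or w_await exceeds LIMIT_MS (default 100).
slow_devices() {
    awk -v limit="${1:-100}" '
        $1 ~ /^Device/ {                 # header row: locate columns by name
            r = 0; w = 0
            for (i = 1; i <= NF; i++) {
                if ($i == "r_await") r = i
                if ($i == "w_await") w = i
            }
            next
        }
        NF == 0 { r = 0; w = 0; next }   # blank line ends the device table
        r && w && NF >= r && NF >= w && (($r + 0) > limit || ($w + 0) > limit) {
            print $1, "r_await=" $r, "w_await=" $w
        }
    '
}
```

Something like `LC_ALL=C iostat -x 5 2 | slow_devices 100` would then take two reports five seconds apart and list any drive that stands out from its peers.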

There shouldn’t be any issue with re-importing. Especially if you’ve already successfully done so in the prior v4 instance. And if you haven’t used the system drive for anything then that’s all the better given it’s best to keep a strict separation between the system drive and the data drives.

I’m not sure I’ve been of much help here, but popping down these notes in the hope that it can help to narrow down what may have happened here. But we haven’t ourselves made any new releases since the 4.0.9-0 update here:

And then only to the existing v4 testing channel repositories for Leap 15.2 & 15.3 based installs.

Also for others reading this, as I covered it in our email chat (apologies for the repeat to yourself), we have recently updated/clarified our update mechanisms within the Web-UI. See the following section for the two ways one can update, which differ in important ways. Especially if one is running a testing channel install:

Install updates from the Web-UI: Installation — Rockstor documentation

Hope that helps. And maybe others can chip in here with suggestions as to what may be going on here.