Root SSD died, advice on replacement best practice?

Hi All,

I need to reinstall my RS box, as the Transcend SSD hosting the root OS has decided it’s time to die.

A couple of questions regarding the current state.

  • I am running the latest Rockstor from the stable branch. Has the issue with M.2 SSDs for the root OS been resolved, or should I stick with SATA SSDs?
  • Once installation has been completed, if I add a second SSD to the root BTRFS pool, will this break things? I don’t want to be in this position again.
  • What’s the progress with the pending move to openSUSE? Is it better for me to manually install the openSUSE version following the dev notes thread, or should I expect to see a lot of breakage if I do?

Those who have seen me here before know I don’t mind getting my hands dirty, but I’d also prefer the system to be relatively stable and usable as my mrs uses this box for streaming.

Throw your answers at me as quickly as possible, as the box is in pieces on my bench and I’m running out to get bits shortly! :o)

Cheers.

So the SSD has gone and died on you? No SMART warning or anything? That’s unfortunate to say the least… I can’t answer your questions due to lack of knowledge, but I do wish you all the best!

Scurries off to do a SMART checkup on his own Transcend SSD…

Disk showed bad sectors previously, but unfortunately some tool (not me of course, I’d never do this) decided to ignore it.
I’m trying the openSUSE Tumbleweed route. I had a fight getting the OS installed, but I’m there now and just beginning the rest of the setup.

And now I have nothing to watch over dinner :angry:

OK, so Rockstor itself is up, but the openSUSE version is not detected and the interface is extremely slow, considering this is the same appliance it was running on previously.
Is there maybe some gunicorn or nginx config I can tweak to fix the speed?

Hi @Haioken,

I’ve personally never had the chance to try the Tumbleweed flavor on real hardware (only VMs), but never noticed slowness… Maybe it’s because I haven’t had the chance to use it on a “full-fledged” system with substantial pool(s) and share(s).
You did mention you had a hard time installing it, however, so maybe there’s something there. I usually simply follow the dev notes, and you mentioned you would do that as well, but do you remember what the difficulties were? Off the top of my head, I believe these notes are up to date and sufficient to get TW up and running, but I may be forgetting something…

I’m sorry I can’t provide clearer help for now :frowning:

@Haioken Hello again, and welcome to the still rather small team of Rockstorians on openSUSE.

As @Flox states, this is very much an indication of something amiss, especially the missing “OpenSUSE version” bit.
For comparison, I have the following here:

Tumbleweed-version-20190806

There is definitely something amiss if things are slower, as there are quite a few speedups in both our code (in current master) and upstream, of course. I have seen this slow interface and missing version before, but not for a while. I think at the time it was down to a library version downloaded during the build that later corrected itself (an upstream fix, I believe). Let us know how it goes and, as @Flox states, look carefully at the openSUSE dev wiki entry.

OK just found a reference to where I’ve seen this “no version” before i.e. from a pull request that ended up not being necessary as upstream fixed it before we needed to pin:

https://github.com/rockstor/rockstor-core/pull/1996

From there I said: “… fresh source build no longer fails to display Web-UI header info such as time, Rockstor and kernel version, etc.”
It may be that we have something similar happening again. Take a look at that closed and unmerged PR and see if it looks like what you may have run into. It may be that we need to start pinning python-engineio to the last working version if it has again broken gevent for us.

Thanks for the report by the way.

Oh, and regarding an element of your original question: neither the Rockstor code nor openSUSE yet supports multi-device btrfs on root. There’s some old cranky code I have to fix in Rockstor (extending our btrfs-in-partition capability for data drives to also work with the system drive), and the upstream grub/btrfs interplay is still problematic for multi-device system disks. It should all be sorted in time, just not yet.

Hope that helps. No speed tweaks have been necessary here. Hope you get to the bottom of it, and do chip in here with anything you find. Also keep an eye on open issues / pull requests, as you’re rather on the bleeding edge with this build. But progressively less so :slight_smile:.

@flox, @phillxnet
I installed via the dev notes and attached wiki, with one modification: I tried to set up a multi-disk root pool for better redundancy (something about a Transcend SSD earlier), which always failed to boot afterward.
I’ve digressed and am up and running with a single disk with no redundancy - which of course makes me very comfortable.

@phillxnet
I can see that the egg supplied on my version is 3.9.3.

Updating setup.py to pin the version to the last known working one (3.3.0) resolves the UI display issue. I can now see “ROCKSTOR UNKNOWN VERSION” (which I expect), the OS and the time.

I saw vast improvements in the UI speed this morning as well, both before and after the rebuild. Perhaps some cache was being built in the background; I suspect it has something to do with the multitude of snapshots on the system from my prior installation.
Note that the UI still isn’t quite as quick as it was previously.

I’ve got all of my Rockons running again except for pi-hole. I’ve tried both the diginc and my own custom rockon of the official pi-hole docker, but both fail when starting lighttpd with:

fdevent_linux_sysepoll.c.148: assertion failed: NULL != ev->epoll_events
./run: line 4:   704 Aborted                 lighttpd -D -f /etc/lighttpd/lighttpd.conf
Stopping lighttpd
lighttpd: no process found
Starting lighttpd
fdevent_linux_sysepoll.c.148: assertion failed: NULL != ev->epoll_events
./run: line 4:   725 Aborted                 lighttpd -D -f /etc/lighttpd/lighttpd.conf
Stopping lighttpd
lighttpd: no process found

Not sure what to do about this yet. Can confirm that attempting to start lighttpd in the container yields effectively the same result. This is from my pi-hole official, however pi-hole-diginc does the same:

[ root@rockout (pass 104s) /usr/opt ]# docker exec -it pi-hole bash
root@4b1e7e3c8d6a:/# lighttpd -D -f /etc/lighttpd/lighttpd.conf
fdevent_linux_sysepoll.c.148: assertion failed: NULL != ev->epoll_events
Aborted

@Haioken Thanks for the heads up on that egg version info. I’ll keep an eye on it. They fixed it quite quickly the last time it happened, so it may be that the same will happen again. It’s a downside of the source builds of course; at least where we don’t pin library versions anyway. And pinning is double-edged in itself.

It could be that as a recent install there was a background openSUSE initiated scrub going on? Also note that with a load of snapshots it can take quite a while to establish quotas info.

Did you previously run with quotas disabled? I believe a new install will default to enabling quotas. That might explain the difference.

Thanks for all the feedback and glad you’re ‘mostly’ managing on the openSUSE variant. Keep the reports coming as we have a way to go just yet (but getting there).

I checked and no scrubs were running. However, as you mentioned, it could have been gathering the quota info.

Quotas were disabled on the filesystem prior to reinstallation, and you’re correct that they seem to be enabled again now.
Thanks for the heads up on quotas being enabled by default, I’ve gone and disabled them again.
The interface does already seem a lot snappier.
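
For anyone wanting to do the same from a shell, here is a rough sketch of checking for and disabling quotas on a pool. The function name and the /mnt2/mypool path are placeholders I made up for illustration (substitute your own pool's mount point), and I believe the same toggle is also available from the Web-UI. It relies on btrfs qgroup show failing when quotas are not enabled.

```shell
#!/bin/sh
# Disable btrfs quotas on a pool only if they are currently enabled.
# Needs root. The default path below is a placeholder, not a real pool.
disable_quotas_if_enabled() {
    pool_mnt="${1:-/mnt2/mypool}"
    # 'btrfs qgroup show' fails when quotas are not enabled, so its exit
    # status doubles as the "are quotas on?" check.
    if btrfs qgroup show "$pool_mnt" >/dev/null 2>&1; then
        echo "quotas enabled on $pool_mnt, disabling"
        btrfs quota disable "$pool_mnt"
    else
        echo "quotas not enabled on $pool_mnt (or not a mounted btrfs path)"
    fi
}
```

Calling disable_quotas_if_enabled /mnt2/yourpool a second time should simply report that quotas are not enabled.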


Got PiHole working again.
Not sure why, but Linux’s “epoll” event mechanism is not available to the container, so I had to change the lighttpd event handler to poll or select (I used “poll”).
Changing the event handler in the pihole container’s /etc/lighttpd/lighttpd.conf worked.

docker stop pihole && \
docker start pihole && \
docker exec -it pihole bash -c "echo 'server.event-handler = \"poll\"' >> /etc/lighttpd/lighttpd.conf"

This allows pihole to function as expected.
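
In case it helps anyone repeating this: the one-liner above appends the setting every time it is run, so here is a minimal sketch of a variant that only appends it once. The function name set_poll_handler is my own invention, and I'm assuming it is run inside the container (for example after docker exec -it pihole bash) against the same /etc/lighttpd/lighttpd.conf path.

```shell
#!/bin/sh
# Append the poll event handler to a lighttpd config unless that exact
# line is already present. Usage: set_poll_handler /path/to/lighttpd.conf
set_poll_handler() {
    conf="$1"
    line='server.event-handler = "poll"'
    # -F: match a literal string, -x: match the whole line, -q: no output
    if ! grep -qxF "$line" "$conf"; then
        printf '%s\n' "$line" >> "$conf"
    fi
}
```

Usage would be set_poll_handler /etc/lighttpd/lighttpd.conf. I'd also assume the edit is lost if the container is recreated rather than just restarted, unless that file lives on a volume, so it may need reapplying after reinstalling the Rock-on.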
