Sadly I don’t have a reproducible error, but my Rockstor kernel is crashing after boot, sometimes after 10 to 20 minutes of uptime, sometimes longer. While the system is up I can use the web UI and SSH in. Zypper shows no updates available. The system is running Rockstor 5.0.15 on Leap 15.6, and has two Rock-ons running, Jellyfin and Syncthing. While the system is running, both of these work.
Then out of the blue, kernel panic!
Looking at the console, the reported reasons vary, ranging from an L3 cache error, through a fatal exception in interrupt, to the latest, “Hard LOCKUP”.
Using old parts, I’ve replaced the CPU, cooler, power supply, system disk, RAM and case, so I’m left with just the motherboard and pool disks.
The console stays in 80x24 text mode so it’s impossible to see the full panic messages; is there any logging I can turn on in Rockstor/OpenSUSE to see more of what’s happening here?
Sorry to hear of your trouble. From a generic “hardware” point of view, one could check several things.
Several older motherboards (especially those with AMD chipsets) required the CMOS battery to be in good working condition (3.0 V or higher) to operate properly.
Running “Memtest86” from a USB stick with all cores enabled at the same time is a good test to do.
Making sure you have clean power and good ground is sometimes helpful.
Sometimes trying a single memory stick in each slot can discover problems.
On the software end, you could try a clean install of Rockstor without any RockOns installed.
Just off the top of this old man’s head, that is about all I could suggest without knowing a thing about the setup. I’ve seen strange things in my time, from a power line glitch caused by a refrigerator turning on, to a hard drive starting up causing shutdowns, to external LAN cables being spiked.
Thanks. I should be OK on troubleshooting the hardware front. The machine’s connected to a UPS so the power should be good. Interesting thought on the battery - I’ll check that.
I’m wondering if there’s anything I can enable on OpenSUSE to capture crash logs, though?
It would need verification, but the Leap default used to be to not persist system logs across reboots. If that is still the case, persistence would need to be enabled first so that you can look at the logs from the previous boot that failed.
I’ll try to find that reference again as it used to be well documented in the Leap docs.
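In the meantime, here is a minimal sketch of a journald drop-in that asks systemd to keep logs across reboots. The file name `persistent.conf` is my own choice (any `.conf` file under `/etc/systemd/journald.conf.d/` should be read), and I have not verified what the current Leap 15.6 default is, so treat this as an assumption to check rather than a confirmed fix.

```ini
# /etc/systemd/journald.conf.d/persistent.conf
# (file name is an assumption; create the directory as root if it does not exist)
[Journal]
# Keep the journal on disk under /var/log/journal instead of only in /run
Storage=persistent
```

After creating the file as root, `systemctl restart systemd-journald` (or a reboot) should apply it. Following the next crash, `journalctl --list-boots` shows which boots were retained, and `journalctl -k -b -1` prints the kernel messages from the previous boot. One caveat: with a hard lockup the last few messages may never reach the disk, so the log can end just before the interesting part.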
If the power supply itself has an issue, the UPS might not be helpful. The issue I’ve had in the past showed up as spontaneous reboots because the power supply itself was not stable… I had to chuck it and put a new one in.
Seems to have been progressive, as it started crashing after only a few moments of memtest86. Pool disks are now working in another system on the same power cable, so with all the replacements that machine had been a recipient of, I guess it was the motherboard, aged nine years. Thanks, folks.