Today, when I got home, my Rockstor box had crashed. This never happens, so I was quite surprised.
I rebooted it but it ended up with this:
The first errors are normal, I see them at every boot, but the rest is not normal.
Is this recoverable, or will I have to reinstall?
I really hope I won’t have to reinstall, and even more that I haven’t lost any data.
The system boots off a 16GB mSATA SSD.
The pool resides on 7 disks running RAID6.
Can you try with kernel-ml-4.2.5-1.el7.elrepo.x86_64? Your error looks like mine did: “appeared twice”. See https://github.com/rockstor/rockstor-core/issues/1069
Trying kernel-ml-4.2.5-1.el7.elrepo.x86_64 doesn’t work either.
The system seems to stop at the same place during bootup, but without any errors, even with debugging turned on.
A little frustrating. But I trust that a reinstall will fix it, if all else fails.
“device not ready” looks bad. Are the cables okay? Are the SATA cables connected to an add-in card rather than to the motherboard? If so, check that the card is seated properly.
You could also try a newer kernel: http://elrepo.org/people/ajb/devel/kernel-ml/el7/x86_64/RPMS/kernel-ml-4.5.0-0.rc2.el7.elrepo.x86_64.rpm
The “device not ready” error is not important; it’s a known bug with the onboard SATA controller, and there is a workaround in place.
The system has worked with this error since the first day, and booted every time. That’s not the problem.
I have tried unplugging all my hard drives, so that only the boot drive is attached, and the system still freezes during bootup, but without any errors.
It seems I will have to reinstall.
I think you want to reinstall.
I really don’t.
I will have to go through all the hassle of setting up shares, NFS, Samba, Rock-ons and so on.
But I will get the latest ISO and start the reinstall, though probably not until tomorrow.
Okay, then if it were my Rockstor box, I would reboot with debugging enabled (again), then wait 10 minutes to see if that error changes.
What would happen after 10 minutes?
You haven’t had a crash; it looks like a hang. If you wait, you may get a timeout and a better error message.
I have restarted the system with debugging enabled. Will wait and see if anything happens.
16 minutes later, no change.
I will let it sit through the night, and see if anything has shown up tomorrow.
Nothing happened overnight, so now the system is reinstalled, and up and running again.
The pool was OK, and I have access to all shares.
It would be nice to have known what went wrong, but I guess I’ll have to accept it as a random event.
Glad you got it all back @KarstenV. Are you running 3.8-11 with the 4.3.3 kernel?
Yes, running 3.8-11.06 with kernel 4.3.3-1.el7.elrepo.x86_64.
The reinstall even fixed the permission denied problem I had with SMART on one of my disks.
The only thing I still need to get running is Plex. If I reuse the same shares as on the last install, I hope Plex will pick up the settings stored in them?
I’m glad to hear you didn’t lose data.