Sorry I’ve been very quiet the last 6 months or so, not been very involved.
I had lots of console errors last night relating to my system drive (NVMe, still on Rockstor 3.9.2-57, 213G free on /).
It was fine the day before, but last night, as I went to shut down from the UI, I couldn’t connect. I couldn’t ssh in either, and the machine wasn’t showing up in the DHCP leases on the router. I plugged in a monitor and keyboard to see what was going on.
It happened last week, too: I couldn’t connect to the UI or by ssh, though I didn’t connect a monitor and keyboard on that occasion.
Also on the console there was:
systemd-journald Failed to send WATCHDOG=1 notification message: Transport endpoint is not connected
I guess if the system disk is having problems, logging to it is going to be compromised.
After a hard shutdown the system restarted successfully, but there doesn’t seem to be anything in the logs about it… again, I guess if the disk/filesystem completely went haywire you wouldn’t see anything.
I’ll have a closer look today. Any advice on things to check/maintenance to do?
The two most common hardware failures are PSUs and RAM. You could try a memtest86+ boot disk such as the one we link to in our “Pre-Install Best Practice (PBP)” doc section: https://rockstor.com/docs/installation/pre-install-howto.html#pre-install
Failing memory will of course corrupt ‘in-flight’ filesystem transactions, and this in turn can lead to filesystem corruption, or at least checksum errors. And with only a single disk, btrfs has no second copy to draw on for its repair. Unfortunately our system disk also defaults to single metadata.
Another thing to check is the cables. They can sometimes work loose via thermal cycling / vibration etc.
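One way to look for those checksum errors is the per-device error counters that `btrfs device stats /` prints (run as root). Below is a minimal Python sketch that parses that output and lists any counter above zero. The device name and the numbers in SAMPLE are made up for illustration and are not from the system in this thread; the helper names are hypothetical too.

```python
import re

# Matches lines such as "[/dev/nvme0n1p3].corruption_errs  0"
STAT_LINE = re.compile(r"^\[(?P<dev>[^\]]+)\]\.(?P<counter>\w+)\s+(?P<value>\d+)$")


def parse_device_stats(text):
    """Return {device: {counter: value}} parsed from `btrfs device stats` output."""
    stats = {}
    for line in text.splitlines():
        m = STAT_LINE.match(line.strip())
        if m:
            stats.setdefault(m["dev"], {})[m["counter"]] = int(m["value"])
    return stats


def nonzero_counters(stats):
    """Return a sorted list of (device, counter, value) for every counter above zero."""
    return sorted(
        (dev, name, value)
        for dev, counters in stats.items()
        for name, value in counters.items()
        if value > 0
    )


# Illustrative sample only (hypothetical device and values).
SAMPLE = """\
[/dev/nvme0n1p3].write_io_errs    0
[/dev/nvme0n1p3].read_io_errs     0
[/dev/nvme0n1p3].flush_io_errs    0
[/dev/nvme0n1p3].corruption_errs  2
[/dev/nvme0n1p3].generation_errs  0
"""

if __name__ == "__main__":
    for dev, name, value in nonzero_counters(parse_device_stats(SAMPLE)):
        print(f"{dev}: {name} = {value}")
```

To use it on a live system you would feed it the real command output instead of SAMPLE, for example via `subprocess.run(["btrfs", "device", "stats", "/"], capture_output=True, text=True).stdout`. A `btrfs scrub start /` beforehand makes btrfs read and verify everything on the pool, so the counters reflect the whole disk rather than only the blocks read recently.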
I built the system just before Christmas. I’ll run the memtest86+ and check the cables.
The system disk is an M.2 NVMe drive (2280, but the MB only has a screw hole for a 2242, so it’s just held in place by friction in the slot). I could not find a 2242 M.2 NVMe drive for love nor money.
I have a 2242 m.2 SATA disk, but that would take up a SATA place in the MB SATA controller, which I wanted to keep for data disk swapping/upgrading. Might get a SATA PCI controller to compensate if I replace the system disk with the m.2 sata. In Sept I really must think of installing 4.0.x, and using a fresh disk might be an idea.
EDIT: I’ve just found that 2242 m.2 nvme drives are available again!