Aaaah I just noticed as I pasted that, is it okay for two different physical disks (sdb and sdc) to have the same LABEL and UUID? No, I don’t have any hardware RAID enabled.
Output looks normal to me. The LABEL and UUID are the same because those two drives belong to the same pool; a pool in Rockstor is a BTRFS filesystem, and every member device of one reports the same label and UUID. Since this is not a core Rockstor issue, I am unable to provide insightful feedback. Sorry, but here are a couple of things I’d try next.
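(If you want to double-check that the shared label/UUID is just pool membership, a quick check looks like this — a sketch only, with the device names taken from your paste:)

```
# List btrfs filesystems and their member devices; sdb and sdc should
# show up under the same filesystem label and UUID
btrfs filesystem show

# blkid reports the same filesystem UUID for every member device of a
# btrfs pool; each device still gets its own unique UUID_SUB
blkid /dev/sdb /dev/sdc
```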
1. You said you were able to boot from 3.19 once but it hung on reboot. Rockstor automatically sets the default kernel to 3.18.1 on every boot. Is it possible that on reboot you were going back to 3.18.1, or did you manually select 3.19 from grub and confirm that 3.19 also has the same problem? (See the snippet after this list for a quick way to check which kernel actually booted.)
2. During the install, did you set the hostname to nas? If you do reinstall, try not setting the hostname during install; you can do it during the web-ui setup. I doubt it’s related to the label issue, but it’s a worthy experiment.
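To check which kernel actually booted and which one grub will pick by default (assuming a CentOS-based Rockstor install; the kernel path in the last command is only illustrative):

```
# Which kernel is running right now?
uname -r

# Which kernel will grub boot by default?
grubby --default-kernel

# Persistently select a specific kernel (path is an example; adjust to
# whatever `grubby --info=ALL` lists on your system):
grubby --set-default=/boot/vmlinuz-3.19.x
```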
Ok, thanks for the input so far. Yes, I was manually selecting 3.19 from Grub. And yes, I gave the hostname during install, so I’ll try it in the web-ui setup if I exhaust other options.
Hi @suman, unfortunately I was never able to resolve these issues and ended up moving off of Rockstor for my NAS, so I’m unable to test it any further right now. I had tried many different combinations of kernels and install options but kept hitting the hang during boot.
Based on similar issues reported by other users, I tried things like reordering/disabling various systemd options but still couldn’t get it to boot reliably. I made some progress by disabling some auto-mount points in /etc/fstab and then writing a script to mount them after boot had finished (roughly the approach sketched below), but even then it didn’t always boot reliably.
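For anyone trying the same workaround, it looks roughly like this (device, mount point, and options are made up for illustration):

```
# /etc/fstab -- mark the data filesystem noauto so boot doesn't block on it
UUID=xxxx-xxxx  /mnt/data  btrfs  noauto,defaults  0 0
```

```
#!/bin/sh
# mount-late.sh -- run after boot (e.g. from rc.local) to mount the
# filesystems kept out of the automatic boot path
mount /mnt/data
```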
I don’t think any of this really has anything to do with Rockstor, with the possible exception that its startup script seemed to be running twice (sorry, I can’t find specifics on which script since I didn’t write it down), but even disabling that didn’t really fix the problem.
Isn’t removing the rhgb and quiet parameters purely cosmetic? Or does rhgb require drivers that may not be available on certain hardware? If so, does it not fall back to a text boot screen?
I am not exactly sure, but I know it fixes it. My thinking is that rhgb requires the framebuffer to do more work at higher resolutions, and older hardware with onboard graphics doesn’t like it.
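For anyone who wants to try this, the usual way to drop rhgb and quiet on a CentOS-based install is roughly:

```
# One-off test: at the grub menu press 'e', delete 'rhgb quiet' from the
# linux/linux16 line, then boot with Ctrl-x.

# Persistent change: remove the two words from GRUB_CMDLINE_LINUX in
# /etc/default/grub, then regenerate the config:
grub2-mkconfig -o /boot/grub2/grub.cfg
```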
Hi
Had the same error and found out the problem was … the hard disk was full!!
For some strange reason my 500GB system-only (non-shared) disk was full.
I’m moving to a 1TB hard disk and will later find out what could be causing this.
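If anyone else hits this, checking for a full system disk is quick (paths are the usual defaults; adjust as needed):

```
# Classic free-space check
df -h /

# btrfs can be full at the allocation level even when df looks fine,
# so also check at the filesystem level:
btrfs filesystem df /
btrfs filesystem show
```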
I see the same error after I added 2 WD Red drives. I see the following errors repeating over and over with different values for the timestamps and PIDs:

```
[uptime in s?] systemd-udevd[different pids?]: worker [another pid] /devices/pci0000:00/0000:00:1f.2/ata4/host3/target/…/block/sdc timeout: kill it
[uptime in s?] systemd-udevd[different pids]: timeout '/bin/mknod /dev/btrfs-control c 10 234'
<above line repeats 5 times>
[uptime in s?] seq [some number] /devices/pci /block/sdc killed
```

The above repeats, with additional lines like /devices/system/memory/memory65 timeout: kill it.
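Those “timeout: kill it” messages come from systemd-udevd killing a worker that took too long. One thing worth trying (a hedged suggestion — it only helps if the worker is slow rather than truly stuck) is raising the udev event timeout, which reasonably recent systemd supports:

```
# /etc/udev/udev.conf -- raise the per-event timeout (seconds)
event_timeout=300
```

The same knob is available as a kernel command line parameter: udev.event-timeout=300.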
I tried all of the above methods but none worked. At first I also thought that removing quiet and rhgb from the kernel line fixed the problem, but that was not the case: after some days the problem occurred again.
Also, my disk is not full. I tried disabling the crashkernel feature as well, but no luck.
I tried this on four completely different hardware setups.
First was a HP microserver N54L.
Second was an Asrock Q1900 with a J1900 CPU.
Third was an older AMD E-350 Board.
The last one is an Asrock AM1B-ITX with an Athlon 5350.
On all systems the same problem after some time.
I also tried different SSDs as the system disk, and different hard drives for data, but all of these data drives were WD Red.
I tried different SATA cables and different cases. All with the same problem.
The only thing they have in common: all the cases (HP Microserver, a 4-bay case from Inter-Tech, and now an NSC-200 from U-Nas) have a backplane for the hard drives.
I have now pulled out the drives and connected them directly to the mainboard, but I don’t have much hope for this.
I am running out of ideas here; the last remaining explanation is a software problem.
Last info on this: I tried Rockstor versions from 3.8-10 up to the current 3.8-12, all with the same issue.
OK, this does not seem to be hardware related.
The backplane is also not the problem: I had connected the hard drives directly to the mainboard, and it happened there too.
Here are two “Screenshots”.
The first shows where the system hangs.
Now this happens on 4 different systems. So this seems to be a software issue.
I cannot find a single piece of real hardware on which Rockstor boots reliably.
Only in VMs does there seem to be no problem.
Many users may not notice this problem, because storage servers are not rebooted that often.
EDIT: One more piece of info:
Today I also tried with no data hard drive connected, only the system drive. It hangs on boot in this case as well.
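To narrow down where a boot hangs (assuming systemd, as on Rockstor’s CentOS base), one option is to make the journal persistent and enable the emergency debug shell:

```
# Keep the journal across reboots so a hung boot can be inspected later
mkdir -p /var/log/journal
systemctl restart systemd-journald

# After a hang and a hard reset, inspect the previous boot:
journalctl -b -1

# Or boot once with the kernel parameter systemd.debug-shell=1 to get a
# root shell on tty9 (Ctrl+Alt+F9) while the hang is happening.
```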
Is this on KVM? Did you try acpi=off? It also looks like ECC isn’t on (not sure that’s an issue for Rockstor, but it is for other NAS distros).
It looks like there is a sound card error; could you remove the sound card or disable it in the BIOS and see if that improves? (Do you need that sound card?) If it’s a VM, you should be able to disable it on the VM.
What hypervisor are you using? What are the VM settings for storage driver, memory, and BIOS?
I have this same issue just today on the latest update of Rockstor: 1x 64GB SSD as the boot drive and 10x 500MB configured as one pool for storage. It’s just a test/play unit at work, so it’s not really in use right now.
I will leave it for the day and see if it “heals” itself as Suman said up there somewhere; is that btrfs fixing the drives?
If not, I’ll scrap it tomorrow and reinstall the OS.
Did you try booting with the network cable disconnected? Did it hang anyway?
Completely disabling NFS is maybe not the ideal solution.
Did modifying nfs-config.service not work in your case?
That was my suggestion in the other thread.
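For reference, the usual way to modify a unit like nfs-config.service is a drop-in override rather than editing the shipped file. A sketch (the [Unit] ordering shown is only an example of the kind of change involved, not the exact fix from the other thread):

```
# Opens an editor and saves the result as
# /etc/systemd/system/nfs-config.service.d/override.conf
systemctl edit nfs-config.service

# Example override contents (illustrative):
#   [Unit]
#   After=network-online.target
#   Wants=network-online.target

systemctl daemon-reload
```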