Building opensuse installer with kernel backport 4.1.0-0 not successfully installing [Solved]

I am trying to build an installer according to the GitHub instructions for 4.1.0-0 on a brand new Leap 15.3 virtual machine installation (I installed the basic development tools via YaST) with the kernel backport repos uncommented. The ISO is created successfully, with no obvious errors reported during that process.
However, when I try a test install, Rockstor fails to start after the configuration completes successfully and the process finishes so that I can log in.
Looking at the messages it starts with the avahi service and goes downhill from there. Some of the error messages indicate that no user was found for a given service (e.g. for avahi). When I then checked my existing 4.1 install, there are many more users set up than what I can see on the test installations (e.g. avahi, postfix, postgres, etc.).

To check whether my kiwi file editing caused any issues, I re-pulled the installer repo and only updated the version to 4.1.0-0 where indicated. Again, the build went through and the ISO was created, but the install showed the same symptoms as above.

Finally, not that it should matter, I installed using the EFI option on VirtualBox for another installation attempt, with the same result.

I am wondering where else I can look to see what might have gone wrong during the ISO build. The kiwi result files don’t show me an obvious culprit.

@Hooverdan Hello again.
Re:

You don’t need dev tools for this as such, just the indicated packages within the Readme. Did you use the boxbuild method?

Is the problem here that the first re-boot, post install, fails? In that case it may be efi related. See the following: https://github.com/rockstor/rockstor-installer/issues/82

But it sounds like you are booting, at least initially, via legacy BIOS boot.

Take care here. You only want the stable backport kernel repo and the associated filesystems repo uncommented. You must have both a kernel repo and the filesystems repo unremarked. Also avoid the Kernel:HEAD repo, that’s too cutting edge. If it is also uncommented then you will end up with a Kernel:HEAD kernel that is from the last few days. You don’t mention this but I thought you may have uncommented the entire section by accident.
So only the repository sections below:
<!-- Kernel Stable Backports -->
and:
<!-- btrfs-progs backport via filesystems repo -->

You can build with zero edits. You will end up with an un-versioned filename for the installer but it should still work.

I’m not really sure what’s gone on here actually. Favour the boxbuild method and if you are already doing that then try the ‘by-hand’ native build. Also note the instructions regarding using the upstream, not the default, kiwi-ng program install if not using the boxbuild method, i.e.: the

sudo zypper addrepo http://download.opensuse.org/repositories/Virtualization:/Appliances:/Builder/openSUSE_Leap_15.3/ appliance-builder
sudo zypper install python3-kiwi btrfsprogs gfxboot qemu-tools gptfdisk e2fsprogs squashfs xorriso dosfstools

bit.

I’m due to look at this again here soon so keep us posted on what you find. Also always try with no edits first just in case. It may well be we have had an update in upstream that has thrown something compatibility wise that we have not yet seen since our last installer builds.

Hope that helps and well done on at least getting an apparent installer built.

@phillxnet, thanks for the detailed response. I should have been a little clearer in my details.
I have not successfully used the box-build method, as all my current Linux boxes are running inside Virtualbox and it’s been hard to get any type of KVM/QEMU to work. So I opted to use the “legacy” method.

On the Kernel backport, I should have been clearer, I only uncommented the Stable/btrfs-progs sections that you also mentioned above (I did read the warning in the comments :slight_smile: )

I did use the upstream kiwi-ng stuff, and not the distro-provided packages.

In the end I built (more than) 3 installer versions:

  1. Plain Build, no file edits
  2. Version adjustments in file
  3. stable kernel backport and filesystem enabled.
  4. to ensure my VM config was not screwed up, I also did a test install using the official installer, worked just fine without failures

All three were bootable and ran through the installation screens successfully. Unfortunately, the start-up of services then failed, as mentioned, and it went downhill from there. Of course, with all of those prerequisites not coming up properly, Rockstor did not stand a chance.

A reboot did not fix the issues, but repeated them. At the end of that I do get my login prompt and an opensuse operating system, just no rockstor running.

I tried both the EFI and non-EFI options (a VirtualBox setting) for the various installs, all with the same end result, unfortunately:
image

And, in all cases (probably among other things) the list of created users was suspiciously short.

The journalctl output starts with this (I skipped additional lines around these messages, but can provide all of the details, if interested):

Feb 26 18:55:01 localhost rpcbind[479]: cannot get uid of 'rpc': Success
Feb 26 18:55:01 localhost systemd[1]: Failed to start RPC Bind.
...

continues on to:

Feb 26 18:55:02 localhost systemd[1]: Failed to start Activation of DM RAID sets.

Feb 26 18:55:02 localhost systemd-tmpfiles[521]: /usr/lib/tmpfiles.d/postgresql.conf:2: Failed to resolve user 'postgres': No such process

Feb 26 18:55:02 localhost avahi-daemon[523]: Failed to find user 'avahi'.
Feb 26 18:55:02 localhost systemd[1]: avahi-daemon.service: Main process exited, code=exited, status=255/EXCEPTION

Feb 26 18:55:02 localhost dbus-daemon[524]: dbus[524]: Unknown username "avahi" in message bus configuration file
Feb 26 18:55:02 localhost dbus-daemon[524]: dbus[524]: Unknown group "avahi" in message bus configuration file
Feb 26 18:55:02 localhost dbus-daemon[524]: dbus[524]: Unknown username "polkitd" in message bus configuration
Feb 26 18:55:02 localhost systemd[547]: postgresql.service: Failed to determine user credentials: No such process
Feb 26 18:55:02 localhost systemd[547]: postgresql.service: Failed at step USER spawning

Feb 26 18:55:02 localhost chronyd[563]: Fatal error : Could not get chrony uid/gid
Feb 26 18:55:02 localhost systemd[1]: chronyd.service: Failed with result 'exit-code'.
Feb 26 18:55:02 localhost systemd[1]: Failed to start NTP client/server.

Feb 26 18:55:02 localhost atd[571]: Cannot get uid for at: Success
Feb 26 18:55:02 localhost systemd[1]: atd.service: Main process exited, code=exited, status=1/FAILURE

and then of course this:

Feb 26 18:55:04 localhost systemd[1]: rockstor-pre.service: Main process exited, code=exited, status=1/FAILURE
Feb 26 18:55:04 localhost systemd[1]: rockstor-pre.service: Failed with result 'exit-code'.
Feb 26 18:55:04 localhost systemd[1]: Failed to start Tasks required prior to starting Rockstor.
Feb 26 18:55:04 localhost systemd[1]: Dependency failed for RockStor startup script.
Feb 26 18:55:04 localhost systemd[1]: Dependency failed for Rockstor bootstrapping tasks.
Feb 26 18:55:04 localhost systemd[1]: rockstor-bootstrap.service: Job rockstor-bootstrap.service/start failed with result 'dependency'.
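A quick way to confirm which of the accounts named in these messages are actually missing on an installed system might look like the following. This is only a minimal sketch: the helper name `missing_users` is made up for illustration, and the account list is taken from the journal lines above rather than from any authoritative list of what the installer should create.

```shell
#!/bin/sh
# Print every account name from the argument list that does not exist
# on this system. `missing_users` is a hypothetical helper name.
missing_users() {
    for user in "$@"; do
        # `id -u` exits non-zero when the account cannot be resolved.
        if ! id -u "$user" >/dev/null 2>&1; then
            printf '%s\n' "$user"
        fi
    done
}

# Account names taken from the journal messages above.
missing_users rpc postgres avahi polkitd chrony at
```

Running the same call on a known-good install and on the failing one would show at a glance how the two differ.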

so … that’s where I am :weary:

Well, @phillxnet’s comment on the build box made me go back once more to see whether I could get it to work. And, it seems that it did!!!

I realized the main item missing on the VM instance was nested virtualization, which I activated by setting this switch:
[image: VirtualBox nested virtualization setting]

The virtualization activation (KVM/QEMU) using YaST (since I was doing this on an openSUSE VM) was then “operational”.
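To check from inside the guest whether nested virtualization is really exposed, something like the sketch below could be used. It assumes the usual Linux locations (`/proc/cpuinfo` for the CPU flags and `/dev/kvm` for the KVM device); the function name `kvm_status` and its return strings are invented for illustration.

```shell
#!/bin/sh
# Report whether hardware virtualization looks usable on this machine.
# Both paths are parameters only so the check can be tried against
# sample files; the defaults are the usual Linux locations.
kvm_status() {
    cpuinfo="${1:-/proc/cpuinfo}"
    kvm_dev="${2:-/dev/kvm}"
    # vmx (Intel) or svm (AMD) among the CPU flags means the
    # virtualization extensions are visible to this (guest) system.
    if ! grep -Eqw 'vmx|svm' "$cpuinfo" 2>/dev/null; then
        echo "no-cpu-flags"
    elif [ ! -e "$kvm_dev" ]; then
        echo "no-kvm-device"
    else
        echo "ok"
    fi
}

kvm_status
```

A result of `no-cpu-flags` would suggest the nested virtualization switch is not active for the VM, while `no-kvm-device` would point at the KVM setup inside the guest.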

The only other thing I believe I had to ensure was installed was python3-devel, based on some other things I was reading about KVM and running kiwi. I didn’t go back and try it without.

The next hurdle for me was that the pip command seemed to resolve to a pip version aligned with the Python 2.7 on that machine. So I adjusted the command line a bit, and then it was off to the races:

zypper in git
git clone https://github.com/rockstor/rockstor-installer.git
cd rockstor-installer/

and then (using pip3, instead of just pip):
python3 -m venv kiwi-env
./kiwi-env/bin/pip3 install kiwi kiwi-boxed-plugin
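To see which Python a given pip is tied to, `pip --version` reports it in parentheses at the end of its output. The sketch below pulls that version out so the system-wide pip can be compared with the one inside the virtual environment. The helper name `pip_python_version` is hypothetical, and the sample line in the comment is only illustrative of the output format.

```shell
#!/bin/sh
# Extract the Python version from a `pip --version` line, e.g.
#   pip 20.0.2 from /usr/lib/python3.6/site-packages/pip (python 3.6)
# (the version numbers and path in this sample are illustrative only).
pip_python_version() {
    sed -n 's/.*(python \([0-9][0-9.]*\)).*/\1/p'
}

# Compare the system-wide pip with the one in the virtual environment,
# skipping whichever of the two is not present.
for candidate in pip ./kiwi-env/bin/pip3; do
    if command -v "$candidate" >/dev/null 2>&1; then
        printf '%s -> python %s\n' "$candidate" \
            "$("$candidate" --version | pip_python_version)"
    fi
done
```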

finally:
./kiwi-env/bin/kiwi-ng --profile=Leap15.3.x86_64 --type oem system boxbuild --box-memory 2G --box-smp-cpus=1 --box leap -- --description ./ --target-dir ./images

Since the VM was not all that big, I deviated from the standard 8G/4CPU default. It took a little longer (a bit less than 30 minutes), but created the image(s).

I test installed successfully:

  • vanilla (no kiwi file changes)
  • stable backport kernel and filesystem and version number updates (some kiwi file changes)

Both of them installed successfully and the WebUI came right up (basic config and pool import) and survived reboots. The vanilla install ran the original 5.3 kernel and the backport install showed the latest 5.16.

One thing to note, since of course I made that mistake very early on: when uncommenting lines in the kiwi file, ensure that the “comment” characters at the end of the line are removed as well, otherwise errors will occur. E.g.

> <!--       <source path="https://download.opensuse.org/repositories/Kernel:/stable:/Backport/standard/"/>-->
> <!--    </repository>-->

to:

<source path="https://download.opensuse.org/repositories/Kernel:/stable:/Backport/standard/"/>
</repository>
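A rough way to spot leftover closing markers before starting a build is to look for lines that contain `-->` but no `<!--`. The sketch below does that with grep; the function name `stray_comment_closers` is made up, and the sample file only reuses lines quoted in this thread. Genuine multi-line comments would be reported too, so hits are candidates to inspect rather than definite errors.

```shell
#!/bin/sh
# List lines (with line numbers) that carry a closing comment marker
# (-->) without an opening one (<!--) on the same line.
stray_comment_closers() {
    grep -n -e '-->' "$1" | grep -v -e '<!--'
}

# Small demonstration on a throwaway file: the second line still has
# the trailing marker from a half-finished uncomment.
sample=$(mktemp)
cat > "$sample" <<'EOF'
<!-- Kernel Stable Backports -->
      <source path="https://download.opensuse.org/repositories/Kernel:/stable:/Backport/standard/"/>-->
    </repository>
EOF
stray_comment_closers "$sample"
rm -f "$sample"
```

The same function could be pointed at the edited kiwi description file in the cloned repo.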

So, boxed plugin works without too much effort on a VirtualBox VM as well. One just has to put their mind to it :slight_smile:

I will mark this as solved, as I don’t think I will further investigate the non-box build option issues I ran into for now. Next, maybe I will check whether I can get the Windows-based Vagrant build to work again (because that doesn’t currently work either).

@Hooverdan Well done, chuffed you’re now sorted on this front. We don’t yet use the boxbuild method for the official installer builds but will probably switch to that method in time. So I’ll take another look soon at the non-boxbuild method to see if I get the same result as you did re missing users. It may just be a transient thing with package versions or the like though, i.e. something that may be fixed by the time I try this. Although package install usually creates the required users, so I really don’t know what happened there.

Re:

Nice find. I’ve created the following as a result:

If you fancy doing the honours on an associated PR then do make a note in that issue to avoid duplication of effort. Thanks again for reporting this. Much appreciated.

OK, I’ll double check this soon and we can add as a dependency if need be. Should be a dependency for kiwi-ng install but it may still have been missed upstream.

Yes, that is a bit messy. I think the indentation was distracting if I did a block remark so left it this way.

I’ll be having a go at this again soon myself so can report any findings in this thread. Cheers.

Yes, I did consider not putting this in the same repo but thought it might get more attention/up-keep than it has actually received. There were early issues with the available VM images and the like which meant it was dysfunctional for a while. It would be good to have it as a working option but if it doesn’t receive much attention I’ll likely remove it, as otherwise it’s an unwelcome support surface / frustration point. But for now I think it’s OK to have it in the same repo to help folks have a single point of reference for the installer build. Plus there was, at one time, quite a bit of community effort in that direction. I’ve not tried it myself and we don’t reference it from the top readme as for me at least it’s way too much magic (moving parts) that we already have in place. The VM image availability has already failed it numerous times now. Plus we now have a similar issue actually in our boxbuild method.

I’m sticking for now with a native build in production. But the boxbuild method does offer the assurance of a known appropriate ‘platform’. Especially useful when we are mid-term on the Leap versions.

Thanks again for your as-usual rigour, and do make a note on that issue if you fancy doing the PR.

Thanks for all your comments. Let’s hope the direct build issues are transient, as you’ve mentioned.

One thing I wanted to point out on the box build method is that I observed that the KVM within a VM (VirtualBox) setup does not like to write to a shared folder (from the VM to the host system). Despite the fact that the user/group rights appear correct, it refuses to write out directly to a non-VM-native directory.
I found the same behavior when playing with changing the Vagrant approach to using the box build (which I got to work by the way), which is in line with what was reported here (though there the focus was on using WSL2 to build installers):

@Hooverdan Interesting. Again too much magic in magic really.
Re:

Maybe we could add a note to this end. Likely this should be in its own issue/PR. Do you fancy making at least the issue, as you have the current knowledge in-head as it were? I’d really like to have this entire DIY installer build as straightforward as possible without adding the way-too-much magic of the vagrant abstraction.

Also:

Pull request possibly ??

I’ll put the issue up. Just wanted to clarify this is not only a Vagrant problem I ran into. I had the same issue using a VM with Leap on it when I started out with the “system build” flavor. I wanted the iso to be written to the shared folder on the host OS and it didn’t like that either (hence I quoted that earlier forum post).
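One possible way around this, not verified in this thread, would be to let the build write to a VM-local target directory and copy the finished ISO to the shared folder afterwards. A minimal sketch of that copy step follows; the function name `copy_isos` is made up, and any directory passed to it (for example a shared folder mount point) is an assumption of the caller rather than something the installer repo defines.

```shell
#!/bin/sh
# Copy finished installer images from a VM-local build directory to
# another location, such as a shared folder, and print how many files
# were copied. Both directories are supplied by the caller.
copy_isos() {
    src_dir="$1"
    dest_dir="$2"
    copied=0
    for iso in "$src_dir"/*.iso; do
        [ -e "$iso" ] || continue   # the glob matched nothing
        cp -- "$iso" "$dest_dir"/ && copied=$((copied + 1))
    done
    echo "$copied"
}

# Hypothetical usage after a build into ./images:
#   copy_isos ./images /path/to/shared/folder
```

Whether a plain copy succeeds where the build's direct write fails would still need to be confirmed on the affected setup.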

Added here:

Let’s discuss, and then I can possibly do a PR for it as well.
