Hello,
Just migrated from CentOS to openSUSE Leap 15.1 and found that the NFS service status is not correct in the WebGui.
What I did:
backed up the configuration on the latest Rockstor version on CentOS
installed openSUSE Leap 15.1
installed Rockstor, imported and mounted btrfs
imported the configuration
rebooted the system
It seems that NFS is starting on boot and it's working OK, but on the Rockstor web interface I have this:
It seems that during install something got unsynchronized.
If I'm enabling the NFS trigger in the WebGui, the nfs daemon stops. If I'm disabling it, then it starts.
In order to get it in sync, I have disabled it in the WebGui and on the CLI, and now it's working fine.
I do think it's a bug, as the status should be read from whether the process is actually running, rather than by issuing blind toggle commands.
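For what it's worth, a minimal sketch of reading the live state from systemd, rather than trusting a stored flag, might look like this (the helper names `parse_is_active` and `service_is_active` are hypothetical, not the actual Rockstor helpers):

```python
import subprocess

def parse_is_active(output):
    # `systemctl is-active` prints a single state word such as
    # "active", "inactive" or "failed"; only "active" means running.
    return output.strip() == "active"

def service_is_active(name):
    # Probe the live systemd state instead of trusting a db flag.
    result = subprocess.run(
        ["systemctl", "is-active", name],
        capture_output=True, text=True
    )
    return parse_is_active(result.stdout)
```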
It seems there is a bug due to the restoration. Also, Samba appears to be enabled even though it isn't on the system. I think the config restore set the trigger in the db even though it's not enabled on the machine.
By open do you mean enabled? We generally use systemd commands to start/stop services, and a variety of mechanisms, depending on the service, to assess the current state. But again, it's mostly systemd commands; from your report, though, some of these look to be toggling whatever service state is found, which is obviously not correct. Given we haven't had this reported before, it's a little strange, as there is little difference in these regards between our CentOS base and the 'Built on openSUSE' endeavour.
Curious.
From memory, the restore service state mechanism is meant to only enable a service if that service was previously enabled when the config was saved. The service state restore mechanism was last updated in 3.9.2-51:
[Config Backup/Restore] Restore Service status. Fixes #2087 @FroggyFlox
So taking a look at that issue and its linked pull request should lead to the related code. And a glance at the following file:
should give an indication of the triggers used to inform the Web-UI switch reports of service status. The intention is that the system is the single point of truth for service state, and we update the db whenever the page is refreshed, with the state as reported at the lowest level by the above file's various mechanisms.
Definitely worth taking a look if you fancy, as it's all pretty readable, and we do have some switching functions in there that may be the root of this behaviour. I.e. we may be inadvertently switching something when we should be explicitly enabling/disabling. But again, we haven't had reports of this, at least in more recent years. So this out-of-sync state may be, as you say, an artefact of the more recent restore mechanism.
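To illustrate the switching-vs-explicit point: rather than flipping whatever state happens to be found, a Web-UI handler could always issue an explicit enable/disable pair. A rough sketch, with made-up helper names (not the actual services.py functions):

```python
import subprocess

def systemctl_actions(enable):
    # Map the desired end state to explicit systemctl verbs, so the
    # outcome never depends on the state the unit is already in.
    return ["enable", "start"] if enable else ["disable", "stop"]

def set_service_state(name, enable):
    # Apply both verbs unconditionally; idempotent on systemd's side.
    for verb in systemctl_actions(enable):
        subprocess.run(["systemctl", verb, name], check=True)
```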
Thanks again for the reports. I can't look into these issues just yet as I'm deep in some other stuff, but we now have these reports, which is great. Keep us informed of any progress, and do take a look at that services.py file if you suspect stuff originates at that level.
@shocker Also do take a look at @Flox's pull request, linked to from the issue I've highlighted re the service state restoration, as it is, per the @Flox norm, excellently presented and tested. So there is a ton of pointers there for just this kind of down-the-road investigation. I.e. an explanation of how this feature works and the like. It may be we missed something that's causing your inverted service reporting that just didn't register with either of us at the time and didn't show up in our testing prior to me merging it.
Likely to be useful to take a read if you end up looking into this.
The Samba toggle in the WebGui under services was showing it as enabled. By checking the status on openSUSE with service smb status, I saw that it was disabled. I manually enabled smb from the CLI, then played with the web toggle on/off, and everything went back to normal.
The problem is that I cannot reproduce this now as it was the first state after migration.
I have checked the code above, and the behavior is really strange. Is something imported into the db with the current state of the service from the backup config?
I will also try to play with this later on; as I'm not using smb, I can test it.
Thanks for this pointer, @phillxnet… I quickly looked at my recent PR on the matter (for the config restore bit), and I actually am using the current db info to check whether the service is currently ON or OFF. I don't know whether that can help explain this report, but we may want to switch to using service_status() from system.services instead. See the relevant snippet below:
I unfortunately won't have time to look into that further for the next few days, as I'm still out of any free time due to work, but I wanted to point that out before forgetting. If you agree on switching to using systemctl status instead of the db info, I'll work on it as soon as I can find some free time, unless somebody else beats me to it.
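As a sketch of that proposed change: during restore, the decision to act would compare the saved state against the live systemctl answer rather than the db row. The function below is illustrative only, not the actual system.services API:

```python
def should_toggle(saved_enabled, live_active):
    # Only issue a state change when the saved config and the live
    # systemd state disagree; the db row is not consulted at all.
    return saved_enabled != live_active
```

With this, a service saved as enabled but found inactive would be started, and one already matching its saved state would be left untouched.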
I finally could find some time to look into that one and think I found the origin of the problem. If I'm correct, it seems to be yet another illustration of small differences between distributions that make or break things for us. In this case, it all seems to relate to how systemd files are dealt with.
TL;DR: CentOS uses a symlink nfs.service --> nfs-server.service, whereas Leap 15.2 uses an alias service nfs.service to refer to the nfs-client. As Rockstor uses systemctl status nfs to check the status of the NFS service, it was seen as OFF and surfaced as such in the webUI (which is correct, as the nfs client was indeed off). Checking the status of nfs-server instead should fix that.
For reference, hereās the related code that checks for the status of the nfs service:
This, in the end, runs the systemctl status nfs command, which is equivalent to the following:
in CentOS:
[root@rockdev ~]# systemctl status nfs
● nfs-server.service - NFS server and services
Loaded: loaded (/usr/lib/systemd/system/nfs-server.service; disabled; vendor preset: disabled)
Active: inactive (dead) since Mon 2020-05-04 05:19:21 PDT; 32s ago
Process: 3596 ExecStopPost=/usr/sbin/exportfs -f (code=exited, status=0/SUCCESS)
Process: 3593 ExecStopPost=/usr/sbin/exportfs -au (code=exited, status=0/SUCCESS)
Process: 3590 ExecStop=/usr/sbin/rpc.nfsd 0 (code=exited, status=0/SUCCESS)
Main PID: 2116 (code=exited, status=0/SUCCESS)
May 04 05:18:16 rockdev systemd[1]: Starting NFS server and services...
May 04 05:18:16 rockdev systemd[1]: Started NFS server and services.
May 04 05:19:21 rockdev systemd[1]: Stopping NFS server and services...
May 04 05:19:21 rockdev systemd[1]: Stopped NFS server and services.
in Leap 15.2:
rockdev:~ # systemctl status nfs
● nfs.service - Alias for NFS client
Loaded: loaded (/usr/lib/systemd/system/nfs.service; disabled; vendor preset: disabled)
Active: inactive (dead)
This is due to the following symlink in CentOS:
[root@rockdev ~]# ls -lhtr /usr/lib/systemd/system/nfs*
-rw-r--r-- 1 root root 567 Aug 8 2019 /usr/lib/systemd/system/nfs-utils.service
-rw-r--r-- 1 root root 1.1K Aug 8 2019 /usr/lib/systemd/system/nfs-server.service
-rw-r--r-- 1 root root 395 Aug 8 2019 /usr/lib/systemd/system/nfs-mountd.service
-rw-r--r-- 1 root root 330 Aug 8 2019 /usr/lib/systemd/system/nfs-idmapd.service
-rw-r--r-- 1 root root 375 Aug 8 2019 /usr/lib/systemd/system/nfs-config.service
-rw-r--r-- 1 root root 413 Aug 8 2019 /usr/lib/systemd/system/nfs-client.target
-rw-r--r-- 1 root root 350 Aug 8 2019 /usr/lib/systemd/system/nfs-blkmap.service
lrwxrwxrwx 1 root root 19 Mar 7 16:43 /usr/lib/systemd/system/nfs-rquotad.service -> rpc-rquotad.service
lrwxrwxrwx 1 root root 18 Mar 7 16:43 /usr/lib/systemd/system/nfs-idmap.service -> nfs-idmapd.service
lrwxrwxrwx 1 root root 17 Mar 7 16:43 /usr/lib/systemd/system/nfs-lock.service -> rpc-statd.service
lrwxrwxrwx 1 root root 16 Mar 7 16:43 /usr/lib/systemd/system/nfs-secure.service -> rpc-gssd.service
lrwxrwxrwx 1 root root 18 Mar 7 16:43 /usr/lib/systemd/system/nfs.service -> nfs-server.service
lrwxrwxrwx 1 root root 17 Mar 7 16:43 /usr/lib/systemd/system/nfslock.service -> rpc-statd.service
rockdev:~ # cat /usr/lib/systemd/system/nfs.service
[Unit]
Description=Alias for NFS client
# The systemd alias mechanism (using symlinks) isn't rich enough.
# If you "systemctl enable" an alias, it doesn't enable the
# target.
# This service file creates a sufficiently rich alias for nfs-client
# (which is the canonical upstream name)
# "start", "stop", "restart", "reload" on this will do the same to nfs-client.
# "enable" on this will only enable this service, but when it starts, that
# starts nfs-client, so it is effectively enabled.
# nfs-server.d/nfsserver.conf is part of this service.
Requires= nfs-client.target
PropagatesReloadTo=nfs-client.target
[Service]
Type=oneshot
RemainAfterExit=yes
ExecStart=/bin/true
[Install]
WantedBy=multi-user.target
Of course, all we care about in our case is the nfs-server, not the client. This is why you observed a discrepancy between the nfs-server status on the machine and what the webUI was reporting.
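So the fix reduces to querying nfs-server directly, which should be valid on both distributions, since CentOS merely symlinks nfs.service to nfs-server.service. A sketch of the corrected probe (the helper name is illustrative, not the actual Rockstor function):

```python
def nfs_status_command():
    # Query nfs-server explicitly: on CentOS, nfs.service is a symlink
    # to nfs-server.service, while on Leap, nfs.service aliases the NFS
    # *client*, so "nfs-server" is the unit whose status we care about.
    return ["systemctl", "status", "nfs-server"]
```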