openSUSE Leap 15.1 - NFS status error

Hello,
Just migrated from CentOS to openSUSE Leap 15.1 and found that the NFS service status is not reported correctly in the Web UI.

What I did:
backed up the configuration on the latest Rockstor version on CentOS
installed openSUSE Leap 15.1
installed Rockstor, imported and mounted the btrfs pool
imported the configuration
rebooted the system

NFS seems to start on boot and work fine, but the Rockstor Web UI shows it as off.

The log file is error-free. What's the command used to capture the status? I might try it in the CLI to see what the outcome is.

It seems that during the install something got out of sync.
If I enable the NFS toggle in the Web UI, the nfs daemon stops; if I disable it, the daemon starts.
To get them back in sync, I disabled it both in the Web UI and on the CLI, and now it's working fine.

I do think it's a bug, as the status should be read from whether the process is actually running, not inferred by firing off blind commands :slight_smile:
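
For illustration, something like this (just a sketch; nfs-server is the unit name I'd expect) reads the real state rather than guessing:

# Sketch only: ask systemd for the real state instead of issuing blind
# start/stop commands. "nfs-server" is an assumed unit name here.
import subprocess

def is_running(unit: str) -> bool:
    # "systemctl is-active <unit>" prints "active" and exits 0 when the
    # unit is up; otherwise it prints "inactive"/"failed" with a non-zero exit.
    out = subprocess.run(
        ["systemctl", "is-active", unit],
        capture_output=True, text=True,
    )
    return out.stdout.strip() == "active"

print(is_running("nfs-server"))  # True only when the server is up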

It seems there is a bug tied to the restoration. Samba also shows as enabled even though it isn't on the system. I think the config restore set the toggle in the db even though the service isn't enabled on the machine.

@shocker

Thanks for the reports, all useful stuff.

Could you clarify what you mean by Samba being "open": do you mean enabled?

We generally use systemd commands to start/stop services, and a variety of mechanisms, depending on the service, to assess their current state. Again, it's mostly systemd commands, but from your report some of these look to be switching whatever service state is found, which is obviously not correct. Given we haven't had this reported before, it's a little strange, as there is little difference in these regards between our CentOS base and the 'Built on openSUSE' endeavour.

Curious.

From memory, the restore mechanism is meant to only enable a service if that service was enabled when the config was saved. The service state restore mechanism was last updated in 3.9.2-51:

Taking a look at that issue and its linked pull request should lead to the related code. And a glance at the following file:

should give an indication of the triggers used to inform the Web-UI switches' reports of service status. The intention is that the system is the single point of truth for service state: we update the db whenever the page is refreshed, with the state as reported at the lowest level by that file's various mechanisms.
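
As a rough sketch of that intention (hypothetical names, not the actual services.py code):

# Hypothetical sketch: on page refresh, read the live systemd state and
# assign it to the db record outright; never flip the stored value.
import subprocess

service_db = {"nfs": False}  # stand-in for the db-backed status field

def live_state(unit: str) -> bool:
    out = subprocess.run(["systemctl", "is-active", unit],
                         capture_output=True, text=True)
    return out.stdout.strip() == "active"

def refresh_status(name: str, unit: str) -> bool:
    # Explicit assignment from the system, the single point of truth.
    # A blind toggle here is exactly what would invert the reporting.
    service_db[name] = live_state(unit)
    return service_db[name]

refresh_status("nfs", "nfs-server")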

Definitely worth taking a look if you fancy, as it's all pretty readable, and we do have some switching functions in there that may be the root of this behaviour. I.e. we may be inadvertently switching something when we should be explicitly enabling/disabling. But again, we haven't had reports of this, at least in more recent years. So this out-of-sync state may be, as you say, an artefact of the more recent restore mechanism.

Thanks again for the reports. I can't look into these issues just yet as I'm deep in some other stuff, but we now have these reports, which is great. Keep us informed of any progress, and do take a look at that services.py file if you suspect the issue originates at that level.


@shocker Also do take a look at @Flox's pull request, linked from the issue I've highlighted re the service state restoration, as it is, per the @Flox norm, excellently presented and tested. So there is a ton of pointers there for just this kind of down-the-road investigation, i.e. an explanation of how this feature works and the like. It may be we missed something that's causing your inverted service reporting that just didn't register with either of us at the time and didn't show up in our testing prior to me merging it.

It's likely to be a useful read if you end up looking into this.

Cheers.


The Samba toggle in the Web UI under Services was showing it as enabled. Checking the status on openSUSE with service smb status showed it was disabled. I manually enabled smb from the CLI, then played with the web toggle on/off, and everything went back to normal.
The problem is that I cannot reproduce this now, as it was the initial state right after migration.
I have checked the code above, and the behaviour I saw is really strange :slight_smile: Is the current state of the service from the backup config imported into the db?

I will also try to play with this later on; as I'm not using smb, I can test it.

Thank you for the feedback!

Thanks for this pointer, @phillxnet… I quickly looked at my recent PR on the matter (for the config restore bit), and I am actually using the current db info to check whether the service is currently ON or OFF. I don't know whether that helps explain this report, but we may want to switch to using service_status() from system.services instead.
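
Roughly, the difference is as follows (hypothetical names throughout; the exact snippet is in the PR):

# Hypothetical sketch contrasting the two approaches; not the PR's code.
import subprocess

service_db = {"nfs": True}  # a stale db record claiming NFS is ON

def status_from_db(name: str) -> bool:
    # What the restore logic trusts today: the last value the db recorded.
    return service_db[name]

def status_from_systemd(unit: str) -> bool:
    # What a service_status()-style check would report: the live state.
    out = subprocess.run(["systemctl", "is-active", unit],
                         capture_output=True, text=True)
    return out.stdout.strip() == "active"

# When the db is stale these disagree, and the restore acts on bad data.
print(status_from_db("nfs"), status_from_systemd("nfs-server"))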

I unfortunately won't have time to look into that further for the next few days, as I'm still short on free time outside work, but I wanted to point it out before forgetting. If you agree on switching to using systemctl status instead of the db info, I'll work on it as soon as I have some free time, unless somebody else beats me to it.


It seems to be a problem with how this status is read. I manually stopped and started it from the console:

Cheers


Hi @shocker,

I finally found some time to look into this one, and I think I found the origin of the problem. If I'm correct, it seems to be yet another illustration of the small differences between distributions that make or break things for us. In this case, it all relates to how the systemd files are dealt with.

TL;DR: CentOS uses a symlink nfs.service --> nfs-server.service, whereas Leap 15.2 uses an alias service nfs.service that refers to the nfs client. As Rockstor uses systemctl status nfs to check the status of the NFS service, it was seen as OFF and surfaced as such in the Web UI (which is correct in a sense, as the nfs client was indeed off). Checking the status of nfs-server instead should fix that.

For reference, the related code that checks the status of the nfs service boils down to the following (a simplified sketch, not the verbatim source):
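
# Simplified sketch, not the verbatim source: the check shells out to
# systemctl with the short unit name "nfs", which is exactly where the
# CentOS/Leap difference bites.
import subprocess

def nfs_status() -> bool:
    result = subprocess.run(["systemctl", "status", "nfs"],
                            capture_output=True, text=True)
    return result.returncode == 0  # 0 when active; 3 when inactive/dead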

This, in the end, runs the systemctl status nfs command, which is equivalent to the following:

  • in CentOS:
[root@rockdev ~]# systemctl status nfs
● nfs-server.service - NFS server and services
   Loaded: loaded (/usr/lib/systemd/system/nfs-server.service; disabled; vendor preset: disabled)
   Active: inactive (dead) since Mon 2020-05-04 05:19:21 PDT; 32s ago
  Process: 3596 ExecStopPost=/usr/sbin/exportfs -f (code=exited, status=0/SUCCESS)
  Process: 3593 ExecStopPost=/usr/sbin/exportfs -au (code=exited, status=0/SUCCESS)
  Process: 3590 ExecStop=/usr/sbin/rpc.nfsd 0 (code=exited, status=0/SUCCESS)
 Main PID: 2116 (code=exited, status=0/SUCCESS)

May 04 05:18:16 rockdev systemd[1]: Starting NFS server and services...
May 04 05:18:16 rockdev systemd[1]: Started NFS server and services.
May 04 05:19:21 rockdev systemd[1]: Stopping NFS server and services...
May 04 05:19:21 rockdev systemd[1]: Stopped NFS server and services.
  • in Leap 15.2:
rockdev:~ # systemctl status nfs
● nfs.service - Alias for NFS client
   Loaded: loaded (/usr/lib/systemd/system/nfs.service; disabled; vendor preset: disabled)
   Active: inactive (dead)

This is due to the following symlink in CentOS:

[root@rockdev ~]# ls -lhtr /usr/lib/systemd/system/nfs*
-rw-r--r-- 1 root root  567 Aug  8  2019 /usr/lib/systemd/system/nfs-utils.service
-rw-r--r-- 1 root root 1.1K Aug  8  2019 /usr/lib/systemd/system/nfs-server.service
-rw-r--r-- 1 root root  395 Aug  8  2019 /usr/lib/systemd/system/nfs-mountd.service
-rw-r--r-- 1 root root  330 Aug  8  2019 /usr/lib/systemd/system/nfs-idmapd.service
-rw-r--r-- 1 root root  375 Aug  8  2019 /usr/lib/systemd/system/nfs-config.service
-rw-r--r-- 1 root root  413 Aug  8  2019 /usr/lib/systemd/system/nfs-client.target
-rw-r--r-- 1 root root  350 Aug  8  2019 /usr/lib/systemd/system/nfs-blkmap.service
lrwxrwxrwx 1 root root   19 Mar  7 16:43 /usr/lib/systemd/system/nfs-rquotad.service -> rpc-rquotad.service
lrwxrwxrwx 1 root root   18 Mar  7 16:43 /usr/lib/systemd/system/nfs-idmap.service -> nfs-idmapd.service
lrwxrwxrwx 1 root root   17 Mar  7 16:43 /usr/lib/systemd/system/nfs-lock.service -> rpc-statd.service
lrwxrwxrwx 1 root root   16 Mar  7 16:43 /usr/lib/systemd/system/nfs-secure.service -> rpc-gssd.service
lrwxrwxrwx 1 root root   18 Mar  7 16:43 /usr/lib/systemd/system/nfs.service -> nfs-server.service
lrwxrwxrwx 1 root root   17 Mar  7 16:43 /usr/lib/systemd/system/nfslock.service -> rpc-statd.service

In Leap 15.2, we have instead:

rockdev:~ # ls -lhtr /usr/lib/systemd/system/nfs*
-rw-r--r-- 1 root root 567 Apr  6 09:50 /usr/lib/systemd/system/nfs-utils.service
-rw-r--r-- 1 root root 715 Apr  6 09:50 /usr/lib/systemd/system/nfs.service
-rw-r--r-- 1 root root 859 Apr  6 09:50 /usr/lib/systemd/system/nfsserver.service
-rw-r--r-- 1 root root 826 Apr  6 09:50 /usr/lib/systemd/system/nfs-server.service
-rw-r--r-- 1 root root 231 Apr  6 09:50 /usr/lib/systemd/system/nfs-mountd.service
-rw-r--r-- 1 root root 244 Apr  6 09:50 /usr/lib/systemd/system/nfs-idmapd.service
-rw-r--r-- 1 root root 433 Apr  6 09:50 /usr/lib/systemd/system/nfs-client.target
-rw-r--r-- 1 root root 317 Apr  6 09:50 /usr/lib/systemd/system/nfs-blkmap.service

/usr/lib/systemd/system/nfs-client.target.d:
total 4.0K
-rw-r--r-- 1 root root 84 Apr  6 09:50 nfs.conf

/usr/lib/systemd/system/nfs-server.service.d:
total 8.0K
-rw-r--r-- 1 root root 101 Apr  6 09:50 options.conf
-rw-r--r-- 1 root root  96 Apr  6 09:50 nfsserver.conf

/usr/lib/systemd/system/nfs-mountd.service.d:
total 4.0K
-rw-r--r-- 1 root root 105 Apr  6 09:50 options.conf
rockdev:~ # cat /usr/lib/systemd/system/nfs.service
[Unit]
Description=Alias for NFS client
# The systemd alias mechanism (using symlinks) isn't rich enough.
# If you "systemctl enable" an alias, it doesn't enable the
# target.
# This service file creates a sufficiently rich alias for nfs-client
# (which is the canonical upstream name)
# "start", "stop", "restart", "reload" on this will do the same to nfs-client.
# "enable" on this will only enable this service, but when it starts, that
# starts nfs-client, so it is effectively enabled.
# nfs-server.d/nfsserver.conf is part of this service.

Requires= nfs-client.target
PropagatesReloadTo=nfs-client.target

[Service]
Type=oneshot
RemainAfterExit=yes
ExecStart=/bin/true

[Install]
WantedBy=multi-user.target

Of course, all we care about in our case is the nfs server, not the client. This is why you observed a discrepancy between the nfs-server status on the machine and what the Web UI was reporting.
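
A minimal sketch of the kind of fix I have in mind (hypothetical mapping, not the final code):

# Hypothetical mapping: translate Rockstor's internal service name to the
# canonical systemd unit, so "nfs" always targets the server. The
# nfs-server.service unit exists on both CentOS and Leap 15.2.
SYSTEMD_UNIT = {"nfs": "nfs-server"}

def unit_for(service_name: str) -> str:
    # Fall back to the service's own name for units that already match.
    return SYSTEMD_UNIT.get(service_name, service_name)

print(unit_for("nfs"))  # nfs-server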

I’ll work on a fix hopefully shortly.

Hope this helps,


Cool, thank you for the feedback. Maybe switching to service_status will be the way forward for maintaining the two distributions :slight_smile:

A GitHub issue has now been created to track progress on its resolution:
https://github.com/rockstor/rockstor-core/issues/2160#issue-612539865

A PR has now been submitted:
https://github.com/rockstor/rockstor-core/pull/2161#issue-413499608
