[SOLVED over CentOS updates] 3.8.16 upgrade killed rockstor - glibc / samba library bug

Well, "killed" might be a bit strong, but after a reboot the GUI service isn’t running and none of my shares are mounted, so it might as well be dead.

This was an upgrade from 3.8.15-0 (Stable tree).

Dec 16 10:35:09 BN-NAS1 initrock[11917]: 2016-12-16 10:35:09,817: firewalld stopped and disabled
Dec 16 10:35:11 BN-NAS1 kernel: route[11974]: segfault at 968 ip 00007f3307a6b915 sp 00007ffd444c6c80 error 4 in libpthread-2.17.so[7f3307a65000+17000]
Dec 16 10:35:13 BN-NAS1 kernel: route[11975]: segfault at 968 ip 00007f6ed452f915 sp 00007ffd54c047d0 error 4 in libpthread-2.17.so[7f6ed4529000+17000]
Dec 16 10:35:15 BN-NAS1 kernel: route[11976]: segfault at 968 ip 00007fe4106b7915 sp 00007fffad740af0 error 4 in libpthread-2.17.so[7fe4106b1000+17000]
Dec 16 10:35:17 BN-NAS1 kernel: route[11977]: segfault at 968 ip 00007f19e9d34915 sp 00007ffd8deaf7e0 error 4 in libpthread-2.17.so[7f19e9d2e000+17000]
Dec 16 10:35:19 BN-NAS1 kernel: route[11978]: segfault at 968 ip 00007f1695a8d915 sp 00007fff42e14790 error 4 in libpthread-2.17.so[7f1695a87000+17000]
Dec 16 10:35:21 BN-NAS1 kernel: route[11979]: segfault at 968 ip 00007f9d518d0915 sp 00007ffffab77590 error 4 in libpthread-2.17.so[7f9d518ca000+17000]
Dec 16 10:35:23 BN-NAS1 kernel: route[11980]: segfault at 968 ip 00007f2d8a041915 sp 00007ffeb51e6cb0 error 4 in libpthread-2.17.so[7f2d8a03b000+17000]
Dec 16 10:35:26 BN-NAS1 kernel: route[11981]: segfault at 968 ip 00007fee368c4915 sp 00007fff2854de90 error 4 in libpthread-2.17.so[7fee368be000+17000]
Dec 16 10:35:28 BN-NAS1 kernel: route[11983]: segfault at 968 ip 00007f3dd95ed915 sp 00007ffd5e79e1b0 error 4 in libpthread-2.17.so[7f3dd95e7000+17000]
Dec 16 10:35:30 BN-NAS1 kernel: route[11984]: segfault at 968 ip 00007f9ddd997915 sp 00007fff223dee10 error 4 in libpthread-2.17.so[7f9ddd991000+17000]
Dec 16 10:35:32 BN-NAS1 kernel: route[11985]: segfault at 968 ip 00007f2f1469f915 sp 00007fff40cfdbd0 error 4 in libpthread-2.17.so[7f2f14699000+17000]
Dec 16 10:35:34 BN-NAS1 kernel: route[11987]: segfault at 968 ip 00007f516aa2a915 sp 00007fff7f86d0e0 error 4 in libpthread-2.17.so[7f516aa24000+17000]
Dec 16 10:35:36 BN-NAS1 kernel: route[11988]: segfault at 968 ip 00007f963279d915 sp 00007fff858b01f0 error 4 in libpthread-2.17.so[7f9632797000+17000]
Dec 16 10:35:38 BN-NAS1 kernel: route[11989]: segfault at 968 ip 00007f6b5753a915 sp 00007ffea7a10080 error 4 in libpthread-2.17.so[7f6b57534000+17000]
Dec 16 10:35:40 BN-NAS1 kernel: route[11990]: segfault at 968 ip 00007f1b4de7b915 sp 00007ffcf1c57670 error 4 in libpthread-2.17.so[7f1b4de75000+17000]
Dec 16 10:35:41 BN-NAS1 nmbd[2577]: [2016/12/16 10:35:41.485207,  0] ../source3/nmbd/nmbd_namequery.c:109(query_name_response)
Dec 16 10:35:41 BN-NAS1 nmbd[2577]:   query_name_response: Multiple (2) responses received for a query on subnet 10.0.244.191 for name FLYING-BEAST<1d>.
Dec 16 10:35:41 BN-NAS1 nmbd[2577]:   This response was from IP 10.0.244.2, reporting an IP address of 10.0.244.2.
Dec 16 10:35:42 BN-NAS1 kernel: route[11991]: segfault at 968 ip 00007f5823f91915 sp 00007fffd05e8e20 error 4 in libpthread-2.17.so[7f5823f8b000+17000]
Dec 16 10:35:44 BN-NAS1 kernel: route[11992]: segfault at 968 ip 00007f7d2ddbc915 sp 00007ffe72ccfc90 error 4 in libpthread-2.17.so[7f7d2ddb6000+17000]
Dec 16 10:35:46 BN-NAS1 kernel: route[11993]: segfault at 968 ip 00007f81e1325915 sp 00007ffe859b8cf0 error 4 in libpthread-2.17.so[7f81e131f000+17000]
Dec 16 10:35:48 BN-NAS1 kernel: route[11994]: segfault at 968 ip 00007f3fd6744915 sp 00007ffecafe9b00 error 4 in libpthread-2.17.so[7f3fd673e000+17000]
Dec 16 10:35:48 BN-NAS1 systemd[1]: Got notification message for unit systemd-journald.service
Dec 16 10:35:48 BN-NAS1 systemd[1]: systemd-journald.service: Got notification message from PID 1181 (WATCHDOG=1)
Dec 16 10:35:48 BN-NAS1 systemd[1]: systemd-journald.service: got WATCHDOG=1
Dec 16 10:35:50 BN-NAS1 kernel: route[11995]: segfault at 968 ip 00007f40c92ff915 sp 00007ffc96aa5c20 error 4 in libpthread-2.17.so[7f40c92f9000+17000]
Dec 16 10:35:52 BN-NAS1 kernel: route[11996]: segfault at 968 ip 00007f4d2471b915 sp 00007ffcf29a5510 error 4 in libpthread-2.17.so[7f4d24715000+17000]
Dec 16 10:35:54 BN-NAS1 kernel: route[11997]: segfault at 968 ip 00007f7f9edac915 sp 00007ffdf7992500 error 4 in libpthread-2.17.so[7f7f9eda6000+17000]
Dec 16 10:35:56 BN-NAS1 kernel: route[11998]: segfault at 968 ip 00007f5603db9915 sp 00007ffc0b759c80 error 4 in libpthread-2.17.so[7f5603db3000+17000]
Dec 16 10:35:58 BN-NAS1 kernel: route[11999]: segfault at 968 ip 00007f8eee575915 sp 00007ffe84cb56d0 error 4 in libpthread-2.17.so[7f8eee56f000+17000]
Dec 16 10:36:00 BN-NAS1 kernel: route[12000]: segfault at 968 ip 00007f116f855915 sp 00007ffe99a661f0 error 4 in libpthread-2.17.so[7f116f84f000+17000]
Dec 16 10:36:02 BN-NAS1 kernel: route[12001]: segfault at 968 ip 00007f04d5901915 sp 00007fff90698ed0 error 4 in libpthread-2.17.so[7f04d58fb000+17000]
Dec 16 10:36:04 BN-NAS1 kernel: route[12003]: segfault at 968 ip 00007fa66af30915 sp 00007ffd50f713c0 error 4 in libpthread-2.17.so[7fa66af2a000+17000]
Dec 16 10:36:06 BN-NAS1 kernel: route[12004]: segfault at 968 ip 00007f770fb44915 sp 00007ffe82b7fc90 error 4 in libpthread-2.17.so[7f770fb3e000+17000]
Dec 16 10:36:08 BN-NAS1 kernel: route[12005]: segfault at 968 ip 00007f2dbcafd915 sp 00007ffd7e58cc90 error 4 in libpthread-2.17.so[7f2dbcaf7000+17000]
Dec 16 10:36:10 BN-NAS1 kernel: route[12006]: segfault at 968 ip 00007fef0b122915 sp 00007ffd8f12dad0 error 4 in libpthread-2.17.so[7fef0b11c000+17000]
Dec 16 10:36:10 BN-NAS1 initrock[11917]: 2016-12-16 10:36:10,465: Waited too long and tried too many times. Quiting.
Dec 16 10:36:10 BN-NAS1 initrock[11917]: Traceback (most recent call last):
Dec 16 10:36:10 BN-NAS1 initrock[11917]: File "/opt/rockstor/bin/initrock", line 45, in <module>
Dec 16 10:36:10 BN-NAS1 initrock[11917]: sys.exit(scripts.initrock.main())
Dec 16 10:36:10 BN-NAS1 initrock[11917]: File "/opt/rockstor/src/rockstor/scripts/initrock.py", line 447, in main
Dec 16 10:36:10 BN-NAS1 initrock[11917]: raise e
Dec 16 10:36:10 BN-NAS1 initrock[11917]: system.exceptions.CommandException: Error running a command. cmd = ['/usr/sbin/route']. rc = -11. stdout = ['']. stderr = ['']
Dec 16 10:36:10 BN-NAS1 systemd[1]: Received SIGCHLD from PID 11917 (initrock).
Dec 16 10:36:10 BN-NAS1 systemd[1]: Child 11917 (initrock) died (code=exited, status=1/FAILURE)
Dec 16 10:36:10 BN-NAS1 systemd[1]: Child 11917 belongs to rockstor-pre.service
Dec 16 10:36:10 BN-NAS1 systemd[1]: rockstor-pre.service: main process exited, code=exited, status=1/FAILURE
Dec 16 10:36:10 BN-NAS1 systemd[1]: rockstor-pre.service changed start -> failed
Dec 16 10:36:10 BN-NAS1 systemd[1]: Job rockstor-pre.service/start finished, result=failed
Dec 16 10:36:10 BN-NAS1 systemd[1]: Failed to start Tasks required prior to starting Rockstor.
Dec 16 10:36:10 BN-NAS1 systemd[1]: Sent message type=signal sender=n/a destination=n/a object=/org/freedesktop/systemd1 interface=org.freedesktop.systemd1.Manager member=JobRemoved cookie=510 reply_cookie=0 error=n/a
Dec 16 10:36:10 BN-NAS1 systemd[1]: Sent message type=signal sender=n/a destination=n/a object=/org/freedesktop/systemd1 interface=org.freedesktop.systemd1.Manager member=JobRemoved cookie=1506 reply_cookie=0 error=n/a
Dec 16 10:36:10 BN-NAS1 systemd[1]: Job rockstor.service/start finished, result=dependency
Dec 16 10:36:10 BN-NAS1 systemd-logind[2105]: Got message type=signal sender=:1.0 destination=n/a object=/org/freedesktop/systemd1 interface=org.freedesktop.systemd1.Manager member=JobRemoved cookie=1506 reply_cookie=0 error=n/a
Dec 16 10:36:10 BN-NAS1 systemd-logind[2105]: Got message type=signal sender=:1.0 destination=n/a object=/org/freedesktop/systemd1 interface=org.freedesktop.systemd1.Manager member=JobRemoved cookie=1507 reply_cookie=0 error=n/a
Dec 16 10:36:10 BN-NAS1 systemd-logind[2105]: Got message type=signal sender=:1.0 destination=n/a object=/org/freedesktop/systemd1 interface=org.freedesktop.systemd1.Manager member=JobRemoved cookie=1508 reply_cookie=0 error=n/a
Dec 16 10:36:10 BN-NAS1 systemd-logind[2105]: Got message type=signal sender=:1.0 destination=n/a object=/org/freedesktop/systemd1 interface=org.freedesktop.systemd1.Manager member=JobRemoved cookie=1509 reply_cookie=0 error=n/a
Dec 16 10:36:10 BN-NAS1 systemd-logind[2105]: Got message type=signal sender=:1.0 destination=n/a object=/org/freedesktop/systemd1/unit/rockstor_2dpre_2eservice interface=org.freedesktop.DBus.Properties member=PropertiesChanged cookie
Dec 16 10:36:10 BN-NAS1 systemd-logind[2105]: Got message type=signal sender=:1.0 destination=n/a object=/org/freedesktop/systemd1/unit/rockstor_2dpre_2eservice interface=org.freedesktop.DBus.Properties member=PropertiesChanged cookie
Dec 16 10:36:10 BN-NAS1 systemd-logind[2105]: Got message type=signal sender=org.freedesktop.DBus destination=n/a object=/org/freedesktop/DBus interface=org.freedesktop.DBus

[root@BN-NAS1 log]# ip route
default via 10.0.244.1 dev eth0 proto static metric 100
10.0.244.0/24 dev eth0 proto kernel scope link src 10.0.244.191 metric 100
[root@BN-NAS1 log]# route
Kernel IP routing table
Destination Gateway Genmask Flags Metric Ref Use Iface
Segmentation fault
[root@BN-NAS1 log]# ip route
default via 10.0.244.1 dev eth0 proto static metric 100
10.0.244.0/24 dev eth0 proto kernel scope link src 10.0.244.191 metric 100
[root@BN-NAS1 log]#
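
The `rc = -11` in the initrock traceback lines up with the "Segmentation fault" above: Python's `subprocess` module reports a child killed by signal N as return code -N, and signal 11 on Linux is SIGSEGV. A minimal sketch, assuming Rockstor's run_command simply passes that return code through (`describe_rc` is a hypothetical helper, not part of Rockstor):

```python
import signal


def describe_rc(rc):
    """Turn a subprocess-style return code into a readable description."""
    if rc < 0:
        # Negative codes mean the child was terminated by signal number -rc.
        return "killed by " + signal.Signals(-rc).name
    return "exited with status %d" % rc


print(describe_rc(-11))
print(describe_rc(1))
```

On a Linux box the first call should name SIGSEGV, which is a quick way to tell a crash in the called binary from an ordinary non-zero exit.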

https://access.redhat.com/discussions/751073

Shouldn’t Rockstor be calling ip route instead of route?

@suman

I tried patching initrock.py to call /scripts/iproute.sh instead (which in turn just calls /usr/sbin/ip route), since I couldn’t work out how to handle spaces in run_command (I’m not a coder and know nothing about Python).

That gets further, but it now barfs on the interface checks:

Dec 16 12:01:21 BN-NAS1 initrock[13201]: system.exceptions.CommandException: Error running a command. cmd = ['/usr/sbin/ifconfig', '100']. rc = 1. stdout = ['']. stderr = ['100: error fetching interface information: Device not found', '']
Dec 16 12:01:21 BN-NAS1 systemd[1]: Received SIGCHLD from PID 13201 (initrock).
Dec 16 12:01:21 BN-NAS1 systemd[1]: Child 13201 (initrock) died (code=exited, status=1/FAILURE)
Dec 16 12:01:21 BN-NAS1 systemd[1]: Child 13201 belongs to rockstor-pre.service
Dec 16 12:01:21 BN-NAS1 systemd[1]: rockstor-pre.service: main process exited, code=exited, status=1/FAILURE
Dec 16 12:01:21 BN-NAS1 systemd[1]: rockstor-pre.service changed start -> failed
Dec 16 12:01:21 BN-NAS1 systemd[1]: Job rockstor-pre.service/start finished, result=failed
Dec 16 12:01:21 BN-NAS1 systemd[1]: Failed to start Tasks required prior to starting Rockstor.

Edit:

Ahh I see ip route outputs in a slightly different format.
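
The `ifconfig 100` failure suggests the code took the last field of the default route line: in the `ip route` output shown earlier that field is the metric (100), not the interface. A minimal sketch of parsing the `ip route` format by keyword instead of by position; `default_route` is a hypothetical helper for illustration, not the code in initrock.py:

```python
def default_route(ip_route_output):
    """Return (gateway, interface) from `ip route` text, or None if there is no default route."""
    for line in ip_route_output.splitlines():
        fields = line.split()
        if not fields or fields[0] != "default":
            continue
        # `ip route` prints keyword/value pairs, so look each value up by its
        # keyword; the last field of the line is the metric, not the interface.
        gateway = fields[fields.index("via") + 1] if "via" in fields else None
        iface = fields[fields.index("dev") + 1] if "dev" in fields else None
        return gateway, iface
    return None


# Sample text copied from the `ip route` output earlier in this thread.
sample = """default via 10.0.244.1 dev eth0 proto static metric 100
10.0.244.0/24 dev eth0 proto kernel scope link src 10.0.244.191 metric 100"""

print(default_route(sample))
```

Looking values up after `via` and `dev` keeps the parser working even if extra fields such as `proto` or `metric` are added or dropped.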

Right, after a horrible hack job on initrock.py I’ve got the services started. (I stripped out the function that checks the IP/interface.)

@suman Not a proper fix, I know, and I’m not sure why route suddenly segfaults, but the consensus seems to be that you should be calling ip route on CentOS 7 based machines.

https://www.centos.org/forums/viewtopic.php?t=60499
https://bugs.centos.org/view.php?id=12392
https://bugs.centos.org/view.php?id=12386

@suman Found these while testing a GitHub issue about Samba (ntp isn’t starting either).

EDIT: https://bugs.centos.org/view.php?id=12308

It’s actually all down to a high-priority glibc bug; it’s not related to Rockstor itself, but to CentOS.

M.

My opinion and suggestion to Rockstor users:

Since this is glibc related, please avoid manual patching and wait for a patch (on the Rockstor side we can only do the same).

M.

P.S.: Arch Linux, Debian and others are having the same issue, so don’t worry about this; it’s probably going to be solved ASAP.

Really helpful information here, @Dragon2611 and @Flyer. I am investigating the problem. From Matthew’s feedback it seems we should just stop using the net-tools package. After my experiments we may have a hotfix, with a few more issues to work on as a longer-term fix in the next cycle.

This is the second time in our project’s history that a storm of upstream updates has coincided with a stable update release and left us in an unfortunate mess. We plan to be more proactive and make Rockstor services leaner and as solid as possible by being very careful about external tooling.

Really appreciate your input @Dragon2611.

The other option is for Rockstor to host its own repo so that it controls the updates to the OS, but this also makes you responsible for ensuring any security updates are pushed out in a timely fashion.

Adding some more info for CentOS 1611:

https://wiki.centos.org/Manuals/ReleaseNotes/CentOS7#head-281c090cc4fbc6bb5c7d4cd82a266fce807eee7c

We should have deprecated net-tools long ago. Shame on us, especially given that we barely use the tools from the package anyway. Do you mind trying this out?

  1. `cd /opt/rockstor/src/rockstor/scripts/`
  2. `curl -O https://gist.githubusercontent.com/schakrava/c6b4410de52905ea10ff45d9d9a2acc3/raw/e2dcf57103eb01dac217666604f61bfbd5d1b9ed/initrock.py`
  3. `systemctl restart rockstor-pre`

I am thinking about rolling out a hotfix rpm and removing the dependency on net-tools. But if you could confirm that the above change works, that would be really appreciated.

Removed my hackjob’d file and re-downloaded it from the link provided to ensure it overwrote cleanly.

I believe that has worked. The restart didn’t throw much in the way of errors, just some unknown values for smbd, but that’s probably some deprecated custom config setting.

[root@BN-NAS1 scripts]# systemctl status rockstor-pre
● rockstor-pre.service - Tasks required prior to starting Rockstor
Loaded: loaded (/etc/systemd/system/rockstor-pre.service; enabled; vendor preset: disabled)
Active: active (exited) since Fri 2016-12-16 22:22:46 GMT; 1min 5s ago
Main PID: 20861 (code=exited, status=0/SUCCESS)
CGroup: /system.slice/rockstor-pre.service

Dec 16 22:22:40 BN-NAS1 initrock[20861]: 2016-12-16 22:22:40,201: Checking for flash and Running flash optimizations if appropriate.
Dec 16 22:22:42 BN-NAS1 initrock[20861]: 2016-12-16 22:22:42,285: Updating the timezone from the system
Dec 16 22:22:42 BN-NAS1 initrock[20861]: 2016-12-16 22:22:42,286: system timezone = Europe/London
Dec 16 22:22:42 BN-NAS1 initrock[20861]: 2016-12-16 22:22:42,290: Updating sshd_config
Dec 16 22:22:42 BN-NAS1 initrock[20861]: 2016-12-16 22:22:42,295: sshd_config already has the updates. Leaving it unchanged.
Dec 16 22:22:42 BN-NAS1 initrock[20861]: 2016-12-16 22:22:42,296: Running prepdb...
Dec 16 22:22:44 BN-NAS1 initrock[20861]: 2016-12-16 22:22:44,556: stopping firewalld...
Dec 16 22:22:44 BN-NAS1 initrock[20861]: 2016-12-16 22:22:44,818: firewalld stopped and disabled
Dec 16 22:22:46 BN-NAS1 initrock[20861]: 2016-12-16 22:22:46,794: rockstor service looks correct. Not updating.
Dec 16 22:22:46 BN-NAS1 initrock[20861]: 2016-12-16 22:22:46,796: rockstor-bootstrap.service looks correct. Not updating.

https://bugs.centos.org/view.php?id=12419

:wink:

Hi all, fortunately we have some news: other users are reporting the same issue on the CentOS bug tracker :slight_smile:

Hi guys,
here is a small workaround for this (it is not a patch!)

Edit /etc/nsswitch.conf, find the hosts line and change it from
hosts: files wins dns myhostname
to
hosts: files dns wins myhostname

Having dns before wins avoids the ping segmentation fault and ntpd getting killed.
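
For anyone who prefers to script the edit, here is a rough sketch of the same reordering in Python. It only transforms text (reading and writing /etc/nsswitch.conf is left to you), the function name `put_dns_before_wins` is made up for illustration, and it assumes the hosts line has no bracketed action items such as `[NOTFOUND=return]`:

```python
def put_dns_before_wins(nsswitch_text):
    """Return nsswitch.conf text with 'dns' moved ahead of 'wins' on the hosts line."""
    lines = nsswitch_text.split("\n")
    for i, line in enumerate(lines):
        # Commented-out lines start with '#', so they are left untouched.
        if not line.strip().startswith("hosts:"):
            continue
        sources = line.strip()[len("hosts:"):].split()
        if "dns" in sources and "wins" in sources:
            if sources.index("wins") < sources.index("dns"):
                sources.remove("dns")
                sources.insert(sources.index("wins"), "dns")
                # Only rewrite the line when the order actually changed.
                lines[i] = "hosts: " + " ".join(sources)
    return "\n".join(lines)


before = "passwd: files\nhosts: files wins dns myhostname\n"
print(put_dns_before_wins(before))
```

Keep a backup copy of the original file before writing anything back, since this is a workaround rather than a fix.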

M.

Hi all,
some updates:

https://www.centos.org/forums/viewtopic.php?f=47&t=60499
https://bugs.centos.org/view.php?id=12419

Under Rockstor (and CentOS in general) the nsswitch.conf hack only partially solves it: after rebooting your machine ntpd will still fail to start, so a manual systemctl start ntpd is required. Please keep this in mind until samba 4.4.4-10 arrives (which will probably fix it).

Mirko
