since a few days my rockstor system is not booting up probably. The “rockstor-bootstrap” service is taking forever and will never finish activating. Systemd tells me:
● rockstor-bootstrap.service - Rockstor bootstrapping tasks
Loaded: loaded (/etc/systemd/system/rockstor-bootstrap.service; enabled; vendor preset: disabled)
Active: activating (start) since Sat 2016-08-06 23:10:06 CEST; 11min ago
Main PID: 3566 (bootstrap)
CGroup: /system.slice/rockstor-bootstrap.service
└─3566 /usr/bin/python /opt/rockstor/bin/bootstrap
The web interface is available but very slow. Rock-On (docker) and other services are not starting up (systemd status):
State: starting
Jobs: 10 queued
Failed: 0 units
Since: Sat 2016-08-06 23:08:49 CEST; 15min ago
Since it is a systemd service, you might try journalctl -u rockstor-bootstrap, or else manually look through the logs since last boot with journalctl -xb.
thanks for your reply. The journalctl -u rockstor-bootstrap is empty, I think because it never gets activated. journalctl -u rockstor seems normal:
[root@majestix ~]# journalctl -xu rockstor
-- Logs begin at Sun 2016-08-07 15:39:28 CEST, end at Sun 2016-08-07 15:49:38 CEST. --
Aug 07 15:40:49 majestix supervisord[3550]: 2016-08-07 15:40:49,655 CRIT Supervisor running as root (no user in config file)
Aug 07 15:40:49 majestix supervisord[3550]: 2016-08-07 15:40:49,675 INFO RPC interface 'supervisor' initialized
Aug 07 15:40:49 majestix supervisord[3550]: 2016-08-07 15:40:49,675 CRIT Server 'unix_http_server' running without any HTTP authentication checking
Aug 07 15:40:49 majestix supervisord[3550]: 2016-08-07 15:40:49,676 INFO supervisord started with pid 3550
Aug 07 15:40:50 majestix supervisord[3550]: 2016-08-07 15:40:50,681 INFO spawned: 'nginx' with pid 3564
Aug 07 15:40:50 majestix supervisord[3550]: 2016-08-07 15:40:50,687 INFO spawned: 'gunicorn' with pid 3565
Aug 07 15:40:50 majestix supervisord[3550]: 2016-08-07 15:40:50,690 INFO spawned: 'data-collector' with pid 3566
Aug 07 15:40:50 majestix supervisord[3550]: 2016-08-07 15:40:50,693 INFO spawned: 'ztask-daemon' with pid 3567
Aug 07 15:40:52 majestix supervisord[3550]: 2016-08-07 15:40:52,774 INFO success: data-collector entered RUNNING state, process has stayed up for > than 2 seconds (startsecs)
Aug 07 15:40:52 majestix supervisord[3550]: 2016-08-07 15:40:52,776 INFO success: ztask-daemon entered RUNNING state, process has stayed up for > than 2 seconds (startsecs)
Aug 07 15:40:55 majestix supervisord[3550]: 2016-08-07 15:40:55,781 INFO success: nginx entered RUNNING state, process has stayed up for > than 5 seconds (startsecs)
Aug 07 15:40:55 majestix supervisord[3550]: 2016-08-07 15:40:55,782 INFO success: gunicorn entered RUNNING state, process has stayed up for > than 5 seconds (startsecs)
I cannot find any hint on a problem in journalctl -xb
This is odd, your bootstrapping process seems to hang already at /usr/bin/python /opt/rockstor/bin/bootstrap. Try running this command manually. The first thing this should do is perform a btrfs device scan, but it should generate output whether or not that succeeds.
BTRFS device scan complete
Error connecting to Rockstor. Is it running?
Exception occured while bootstrapping. This could be because rockstor.service is still starting up. will wait 2 seconds and try again. Exception: HTTPConnectionPool(host='127.0.0.1', port=8000): Max retries exceeded with url: /api/commands/bootstrap (Caused by <class 'httplib.BadStatusLine'>: '')
Error connecting to Rockstor. Is it running?
Exception occured while bootstrapping. This could be because rockstor.service is still starting up. will wait 2 seconds and try again. Exception: HTTPConnectionPool(host='127.0.0.1', port=8000): Max retries exceeded with url: /api/commands/bootstrap (Caused by <class 'httplib.BadStatusLine'>: '')
Error connecting to Rockstor. Is it running?
Exception occured while bootstrapping. This could be because rockstor.service is still starting up. will wait 2 seconds and try again. Exception: HTTPConnectionPool(host='127.0.0.1', port=8000): Max retries exceeded with url: /api/commands/bootstrap (Caused by <class 'httplib.BadStatusLine'>: '')
Looks like that rockstor.service is not booting up probably. I wonder why this output is not returned by the service.
Do the system journals and/or rockstor.log give any clues on why this is so? Have you tried restarting the rockstor service or system? I can see that your gunicorn server is getting worker process timeouts, which is surprising given the long (120 seconds) timeout limit, but I can’t so far figure out the cause with the system components apparently running fine.
I did not count how many times I restarted the system. Also tried to restart the services. This did not help at all. As far as I understand the components involved, the timeout occurs somewhere between the gunicorn server and the nginx server? But why does the UI than runs fine? Its slow, but it works.
The system is not the fastest, but should be fine: AMD Turion II Neo, 2,2GHz, 10GB RAM.
Yesterday I tried to reinit rockstor (deleting the .initrock and running the command). This did not work also. Except that I now can not reactivate my stable subscription.
I use the docker daemon for my own containers without using rock-on ui. But this should not affect, since the docker environment is not up at this state.
Well, if neither journald nor /opt/rockstor/var/log contains any further clues about this timeout, I’m afraid someone with more knowledge of the web components will have to look into this. I’m fairly new to web development and still learning the ropes of the code.
It might help to also report this issue at the github repo. All I could suggest as a workaround is to manually increase the timeout values in /opt/rockstor/etc/supervisord.conf and restart rockstor.service, but frankly two minutes should be plenty and your system is even slightly faster than mine.
I did some more research today and found out, that docker is not removing subvolumes when deleting/removing containers or images. So I got 1001 subvolumes laying around. Could this be the reason the bootstrap/gunicorn runs into timeout?