Rock-ons Web UI can fail to load due to a failed task

I started writing this earlier to post in troubleshooting, but while writing it I kept thinking of new things to try until I eventually solved my issue. I figured I’d post it here anyway in case it helps anyone else:


I was trying to create a new rockon and was having some trouble with it: it kept not showing up in the “installed” page, and instead the logs would say:

/opt/rockstor/var/log/rockstor.log.2:[03/Oct/2018 23:22:41] ERROR [storageadmin.views.rockon:82] Rockon (Node.js app from git) is in pending state but there is no pending or failed task for it.
/opt/rockstor/var/log/rockstor.log.2:[03/Oct/2018 23:39:33] ERROR [storageadmin.views.rockon:82] Rockon (Node.js web app from git repo) is in pending state but there is no pending or failed task for it.

(I renamed it once just to try and get a fresh start, so both lines appear a few times.)
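If you want to pull entries like these out of the logs programmatically, here is a small sketch. The regular expression is only inferred from the two lines above, not taken from Rockstor's logging configuration, and `parse_log_line` is a hypothetical helper name of my own.

```python
import re

# Pattern inferred from the sample lines above (an assumption, not Rockstor's
# documented format): [timestamp] LEVEL [module:lineno] message
LOG_RE = re.compile(
    r"\[(?P<timestamp>[^\]]+)\] "
    r"(?P<level>[A-Z]+) "
    r"\[(?P<source>[^\]:]+):(?P<lineno>\d+)\] "
    r"(?P<message>.*)"
)


def parse_log_line(line):
    """Return a dict of the log fields, or None if the line does not match.

    Uses search() so a leading 'filename:' prefix from grep is ignored.
    """
    match = LOG_RE.search(line)
    if match is None:
        return None
    entry = match.groupdict()
    entry["lineno"] = int(entry["lineno"])
    return entry


if __name__ == "__main__":
    sample = (
        "/opt/rockstor/var/log/rockstor.log.2:[03/Oct/2018 23:22:41] ERROR "
        "[storageadmin.views.rockon:82] Rockon (Node.js app from git) is in "
        "pending state but there is no pending or failed task for it."
    )
    print(parse_log_line(sample))
```

Traceback lines do not match the pattern and come back as `None`, so they are easy to skip when filtering for ERROR entries.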

I played around a bit with docker on the command line and confirmed that everything worked when I started the image that way, so I’m still not sure where the initial hiccup came from.

I rebooted the entire system and re-started the rockons service a few times, and that didn’t seem to help. Then I realized that rockstor now has a setting for disabling q-groups. I had manually disabled them a while back, so I went and flipped the setting. I think that was the last change I made before the service stopped working entirely :confused:

I tried removing my .json file, but that didn’t help.

Now whenever I go to the rockons page, I get this:

Houston, we've had a problem.
Found a failed Task (7bae3f90-d405-4840-bae9-bf58019aa46b) in the future of a pending Task (3a0daaa9-fca4-4540-9ed7-98e3b1096732).

            Traceback (most recent call last):
  File "/opt/rockstor/eggs/gunicorn-19.7.1-py2.7.egg/gunicorn/workers/sync.py", line 68, in run_for_one
    self.accept(listener)
  File "/opt/rockstor/eggs/gunicorn-19.7.1-py2.7.egg/gunicorn/workers/sync.py", line 27, in accept
    client, addr = listener.accept()
  File "/usr/lib64/python2.7/socket.py", line 202, in accept
    sock, addr = self._sock.accept()
error: [Errno 11] Resource temporarily unavailable

My rockstor.log has a few interesting things, including this:

[05/Oct/2018 10:05:11] ERROR [storageadmin.views.rockon_helpers:127] RockOn matching query does not exist.
Traceback (most recent call last):
  File "/opt/rockstor/src/rockstor/storageadmin/views/rockon_helpers.py", line 122, in install
	rockon = RockOn.objects.get(id=rid)
  File "/opt/rockstor/eggs/Django-1.8.16-py2.7.egg/django/db/models/manager.py", line 127, in manager_method
	return getattr(self.get_queryset(), name)(*args, **kwargs)
  File "/opt/rockstor/eggs/Django-1.8.16-py2.7.egg/django/db/models/query.py", line 334, in get
	self.model._meta.object_name
DoesNotExist: RockOn matching query does not exist.

After seeing that, I went and created a duplicate rockon json file, except with the old name and rebooted, but it didn’t seem to help. That got me this line, so maybe I’m not remembering the old name quite right:

After that, I tried going into the database. I found my two entries in storageadmin_rockon and followed the trail of foreign_key constraints until I was able to delete both of the rockon json file entries from the db. Then I deleted both files from disk and rebooted. It didn’t help.

I looked at my logs again and noticed something new, though:

[05/Oct/2018 11:58:13] ERROR [storageadmin.util:44] Exception: Rock-on (64) does not exist.
Traceback (most recent call last):
  File "/opt/rockstor/src/rockstor/storageadmin/views/rockon_id.py", line 67, in post
    rockon = RockOn.objects.get(id=rid)
  File "/opt/rockstor/eggs/Django-1.8.16-py2.7.egg/django/db/models/manager.py", line 127, in manager_method
    return getattr(self.get_queryset(), name)(*args, **kwargs)
  File "/opt/rockstor/eggs/Django-1.8.16-py2.7.egg/django/db/models/query.py", line 334, in get
    self.model._meta.object_name
DoesNotExist: RockOn matching query does not exist.

I found where that was coming from in the code and noticed some stuff about ztask-daemon. I went back into psql and looked at the django_ztask_task table and found that it had 3 failed entries. I deleted all three and my web UI started loading again.
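For anyone wanting to try the same cleanup, here is a minimal sketch of the idea, not Rockstor's own code. It runs against an in-memory SQLite stand-in so it is safe to experiment with; the real table lives in Rockstor's PostgreSQL database. The column names `uuid` and `failed` (a timestamp that stays NULL unless the task failed) are my assumption about the django_ztask_task layout, and the demo rows are made up, so check the real schema in psql first and back up the database before deleting anything.

```python
import sqlite3


def find_failed_tasks(conn):
    """Return the uuids of tasks whose 'failed' timestamp is set."""
    rows = conn.execute(
        "SELECT uuid FROM django_ztask_task "
        "WHERE failed IS NOT NULL ORDER BY uuid"
    ).fetchall()
    return [row[0] for row in rows]


def delete_failed_tasks(conn):
    """Delete every failed task and return how many rows were removed."""
    cursor = conn.execute(
        "DELETE FROM django_ztask_task WHERE failed IS NOT NULL"
    )
    conn.commit()
    return cursor.rowcount


def make_demo_db():
    """Build an in-memory stand-in table with hypothetical columns and rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE django_ztask_task ("
        "uuid TEXT PRIMARY KEY, function_name TEXT, failed TEXT)"
    )
    conn.executemany(
        "INSERT INTO django_ztask_task VALUES (?, ?, ?)",
        [
            ("failed-task-1", "demo.install", "2018-10-05 10:05:11"),
            ("failed-task-2", "demo.install", "2018-10-05 11:58:13"),
            ("pending-task-1", "demo.install", None),  # not failed, must survive
        ],
    )
    conn.commit()
    return conn


if __name__ == "__main__":
    demo = make_demo_db()
    print("failed before:", find_failed_tasks(demo))
    print("deleted:", delete_failed_tasks(demo))
    print("failed after:", find_failed_tasks(demo))
```

Listing the failed rows before deleting them is deliberate: it gives you a chance to confirm you are only removing the stuck entries and not a task that is still pending.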


So, on a vaguely related note, it would be nice to be able to just add an arbitrary docker image with settings from the web UI, instead of having to ssh in, fiddle with JSON, and risk breaking things. I guess that’s a feature request.


And, lastly, probably unrelated to the above but worth mentioning, I also have this entry in the logs a few times:

[05/Oct/2018 09:50:20] ERROR [storageadmin.util:44] Exception: Errors occurred while processing updates for the following Rock-ons (HTTP to HTTPS redirect: ['Cannot change image of the container (redirect-http-to-https) as it belongs to an installed Rock-on (HTTP to HTTPS redirect). Uninstall it first and try again.', 'Traceback (most recent call last):\n  File "/opt/rockstor/eggs/gunicorn-19.7.1-py2.7.egg/gunicorn/workers/sync.py", line 68, in run_for_one\n    self.accept(listener)\n  File "/opt/rockstor/eggs/gunicorn-19.7.1-py2.7.egg/gunicorn/workers/sync.py", line 27, in accept\n    client, addr = listener.accept()\n  File "/usr/lib64/python2.7/socket.py", line 202, in accept\n    sock, addr = self._sock.accept()\nerror: [Errno 11] Resource temporarily unavailable\n']).
Traceback (most recent call last):
  File "/opt/rockstor/src/rockstor/storageadmin/views/rockon.py", line 107, in post
	self._create_update_meta(r, rockons[r])
  File "/opt/rockstor/eggs/Django-1.8.16-py2.7.egg/django/utils/decorators.py", line 145, in inner
	return func(*args, **kwargs)
  File "/opt/rockstor/src/rockstor/storageadmin/views/rockon.py", line 194, in _create_update_meta
	handle_exception(Exception(e_msg), self.request)
  File "/opt/rockstor/src/rockstor/storageadmin/util.py", line 47, in handle_exception
	trace=traceback.format_exc())
RockStorAPIException: ['Cannot change image of the container (redirect-http-to-https) as it belongs to an installed Rock-on (HTTP to HTTPS redirect). Uninstall it first and try again.', 'Traceback (most recent call last):\n  File "/opt/rockstor/eggs/gunicorn-19.7.1-py2.7.egg/gunicorn/workers/sync.py", line 68, in run_for_one\n    self.accept(listener)\n  File "/opt/rockstor/eggs/gunicorn-19.7.1-py2.7.egg/gunicorn/workers/sync.py", line 27, in accept\n    client, addr = listener.accept()\n  File "/usr/lib64/python2.7/socket.py", line 202, in accept\n    sock, addr = self._sock.accept()\nerror: [Errno 11] Resource temporarily unavailable\n']

@nfriedly Hello again.

Yes, it wasn’t long before that setting appeared that disabling quotas on Rockstor-managed pools broke a whole host of stuff. It took quite a bit of work to make everything ‘quota disabled’ compatible. Not sure if that led to some of your issues though. But Rockstor was previously most upset (broken) by disabled quotas. We had to steam on with that one, as when we switched to docker-ce it ended up disabling quotas for us, nice. See:

The graphical disable button does nothing but show the current state and enable or disable quotas on the associated pool, i.e. there is no non-transient db counterpart to that setting: the source of truth is the pool quota state. But it took until shortly before that inline edit was added for us to work out the various issues that disabled quotas had brought.

Well done, bit of a journey that one, and thanks for sharing your findings.