Unable to install any rockons after fresh install

Dunkelheit · November 8, 2018, 3:06am

Hi! I have just installed rockstor on my new server rack and the experience has been great until I started looking into rockons… I managed to install one (http to https redirect) fine, but after that I have been unable to install any at all; even the http to https rockon fails to install now. I’ll upload the log file here: https://dunkelheit.co.uk/i/Ethical_Attwatersprairiechicken_7iYgh.log and have another go at fixing it myself. So far I have tried the following: rebooting, reinstalling, completely wiping all drives and installing (I managed to get one rockon after this, then it wouldn’t install any others), checking loads of topics on the forums, and asking some people I work with. Thanks in advance for any help - Dunkelheit

phillxnet · November 8, 2018, 7:41pm

@Dunkelheit Welcome to the Rockstor community.

Glad to hear that, and yes Rock-ons is a newer (read: less debugged) feature, but not nearly as bad as your report indicates.

That’s rather a strange experience all in. Could you confirm that you are following the advice in our Rock-ons (Docker Plugins) docs, and specifically the Rock-ons root setup? And that you are using a dedicated share for the specific rock-ons needs? Older versions of Rockstor still expose /root, which is not recommended for this, or any, use really: hence its later removal as a listed share.

From a quick look at your logs you also don’t appear to have quotas enabled on your rockstor_rockstor (system) pool. Quotas disabled is only a supported option in the stable channel release (after updates have been applied). Which version of Rockstor is this, and have you, after selecting whichever update channel, also applied all updates? This can take a while so be patient. And then it’s also best to reboot to make sure they are all in place.

You seem to have run into a number of issues from your log:

Apart from the quotas already mentioned:

[08/Nov/2018 01:18:37] ERROR [storageadmin.util:44] exception: Operation not permitted on this Share(root) because it is a special system Share
Traceback (most recent call last):

[08/Nov/2018 01:43:54] ERROR [storageadmin.util:44] exception: Share(rock-ons) cannot be deleted because it is in use by Rock-on service. If you must delete anyway, select the force checkbox and try again.
So you did try to have a separate share for rock-ons. But be sure to use this only for the Rock-ons-root. Make other shares for each requirement an individual rock-on has. Don’t re-use this share for anything else.

And if you want to delete it, turn off the docker service and reconfigure it to use another share for its ‘rock-ons-root’. Then, once that’s established, you should be able to delete it as the docker system will no longer be associated with it.

[08/Nov/2018 01:44:20] ERROR [storageadmin.views.pool:519] Exception while updating disk state: 'PoolDetailView' object has no attribute '_update_disk_state'
This one looks like you have applied updates but not rebooted, or just not refreshed your browser. Again it’s a strange one.

OK, let’s start with the actual Rockstor version number (run the following as the root user):

yum info rockstor

There is a scenario where the version shown in the Web-UI is the available rather than the installed one, and you need to update to pick up the ‘fix’ for that. But the Web-UI offers no update if it doesn’t think there is one. That one is a bit annoying. The above yum command should give us an idea of where your system is at. You must select a channel and apply all the available updates though, as there are now many.
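To make the installed-versus-available distinction concrete, here is a minimal Python sketch that reads the two sections of `yum info` output separately. The sample text and its version numbers are made-up placeholders, not real Rockstor releases, and `parse_yum_info` / `version_string` are hypothetical helper names used only for illustration.

```python
# Placeholder sample in the usual `yum info <package>` layout.
# The version/release numbers are invented for illustration only.
SAMPLE = """\
Installed Packages
Name        : rockstor
Arch        : x86_64
Version     : 1.2.3
Release     : 4
Repo        : installed

Available Packages
Name        : rockstor
Arch        : x86_64
Version     : 1.2.4
Release     : 5
"""


def parse_yum_info(text):
    """Split yum info output into 'installed' and 'available' field dicts."""
    sections = {}
    current = None
    for line in text.splitlines():
        stripped = line.strip()
        if stripped in ("Installed Packages", "Available Packages"):
            current = stripped.split()[0].lower()
            sections[current] = {}
        elif current and ":" in line:
            key, _, value = line.partition(":")
            key = key.strip()
            if key:  # skip wrapped continuation lines, which have no key
                sections[current][key] = value.strip()
    return sections


def version_string(fields):
    """Combine Version and Release the way package versions are usually quoted."""
    return "{}-{}".format(fields["Version"], fields["Release"])


info = parse_yum_info(SAMPLE)
print("installed:", version_string(info["installed"]))
if "available" in info:
    print("available:", version_string(info["available"]))
```

The point is only that the "Installed Packages" block is the one that says what is actually running; an "Available Packages" block, when present, describes an update that has not been applied yet.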

Let us know if those docs help. I also remember we did have issues with the whole Rock-ons thing playing up after every update; that hasn’t happened for a while now, so I think it was associated only with the testing channel, which is now a fair bit older than stable, which has moved from an older docker to docker-ce.

Hope that helps, and let’s see what happens if you avoid using /home or /root for anything docker-related and you don’t turn quotas off: although there was a docker issue that forced quotas off from under us:

github.com/rockstor/rockstor-core

docker-ce dictating pool quota enabled disabled status

opened 04:45PM - 12 Mar 18 UTC

phillxnet

After moving to docker-ce in stable 3.9.2-3 (pr #1865) it was found that docker-ce, upon starting (ie on boot or when its service was switched on) would disable quotas. This at the time was very disruptive to many basic Rockstor functions so a work around was found in: "rock-ons-root host pool quota disabled by docker-ce" #1872 and implemented in: "rock-ons-root host pool quota disabled by docker-ce. Fixes #1872" #1873 essentially adding " '--storage-opt', 'btrfs.min_space=1G' " to dockerd's initialization. This 'work around' effected a re-enabling of quotas by dockerd almost directly after it had initially disabled them, which it still did.

This quota cycling of the affected pool on every boot was alleviated somewhat by improving Rockstor's quota disabled/cycling capabilities under issue: "improve quotas not enabled behaviour" #1869 and pr: "improve quotas not enabled behaviour. Fixes #1869" #1874.

Given the above recent improvement in quota disabled behaviour a Web-UI selector for quotas / pool was introduced via issues: "[New feature] Add option to disable BTRFS quota/qgroups" #1592 and "Feature: Quota rebuild script?" #1785 and implemented in pr: "Add option to disable BTRFS quota-qgroups. Fixes #1592" #1903.

So we now entertain the user selectable option of Quotas Disabled. But given that a pool's quota state is remembered by the pool itself and all our existing mechanisms observe and are informed by the behaviour we have again surfaced an issue with docker-ce dictating pool quota status. This, to the issue author, looks to be the same upstream issue as was observed in the first referenced issue above (re-referenced here for ease: https://github.com/docker/for-linux/issues/78).

It is not clear how we are to proceed as currently docker-ce is now dictating the final quota state for the pool that hosts its rock-ons-root share (subvol). This is inappropriate and not in keeping with btrfs defaults of adopting the last quota status set, which the rest of Rockstor observes. A workaround for those wishing to adopt a quota disabled state for a pool currently hosting the rock-ons-root is to re-create their rock-ons-root on another pool (where enabled quotas are acceptable) or potentially to revert the changes in #1873; but the latter would simply have docker-ce dictate that quotas were disabled rather than enabled shortly after boot and again circumvent the user setting (added in #1903) and the btrfs default behaviour of maintaining the last setting requested.

Suggestions welcome: however, in the issue author's opinion, docker-ce should not dictate/hard wire quota status, irrespective of its initialisation parameters. And as such this is viewed as an upstream bug that can only be addressed upstream.

However, that was fixed in later stable updates, and the offending docker-ce was itself only introduced in the stable channel.

Thanks for reporting your findings, and let’s first see what that yum command states and take it from there. I suspect you have updates that have not fully taken effect, or no updates, although a reboot normally sorts the regular issues with some old and some new code; and far more regularly it’s simply a web page refresh that fixes issues after updates, as we use a lot of JavaScript.

So I’d suggest starting over on a fresh install, then selecting the update channel that you fancy: stable is much newer code but testing is still fairly workable (we had to change priorities so we now do more regular stable updates instead). Then update to the ‘new’ rockstor version: all system updates come along with it as well and there are also many of them. And with the provided yum command make sure you actually have that version installed. Then reboot and go about your experiments. And make sure not to select a previously used rock-ons-root, just in case it’s got ‘confusing’ content or the like. Just to keep things simple for the time being. You should be able to import your existing pool if you have already populated it. But as a precaution against software / human error, when re-installing you might want to disconnect your data drives and only re-connect them once all updates and the associated reboot have taken effect. And don’t connect them live; we aren’t that good at that bit just yet.

Hope that helps and let us know how it goes. Well done for persevering thus far and looking to the forum for answers. Your report is not one I have heard of before (ie only the first rock-on works, others then fail) but the rock-ons system is in need of improvement (better guidance) and in fact has received some updates in the stable channel by forum member @Flox but that was for extending their options capability.

Dunkelheit · November 8, 2018, 9:27pm

Thanks for the quick response, but while trying to fix things myself I decided to try the boot drive in a different server that I was in the middle of setting up, and everything worked fine on this. After moving it back to the previous system things seemed to work fine. If anything does change I will update you, but as of now there are no problems at all. I have no idea as to what “fixed” my problem but if I find out I will be sure to tell everyone. Thanks - Dunk

EDIT: I have the slight suspicion that a yum update may have had a part to play in this. Everything seemed to work fine until I updated all the packages through yum, I will try to test this by doing a fresh install and trying some rockons before and after a yum update

Dunkelheit · November 9, 2018, 3:11am

So, after a fresh install this is the error that I receive for every rockon, both before and after yum updates. There is a slight difference between each error; this is the most recent one (download the log below).

[09/Nov/2018 03:01:57] ERROR [storageadmin.views.command:85] deadlock detected
DETAIL: Process 7092 waits for ShareLock on transaction 6602; blocked by process 7086.
Process 7086 waits for ShareLock on transaction 6604; blocked by process 7092.
HINT: See server log for query details.
Traceback (most recent call last):
  File "/opt/rockstor/src/rockstor/storageadmin/views/command.py", line 80, in _refresh_pool_state
    p.save()
  File "/opt/rockstor/eggs/Django-1.8.16-py2.7.egg/django/db/models/base.py", line 734, in save
    force_update=force_update, update_fields=update_fields)
  File "/opt/rockstor/eggs/Django-1.8.16-py2.7.egg/django/db/models/base.py", line 762, in save_base
    updated = self._save_table(raw, cls, force_insert, force_update, using, update_fields)
  File "/opt/rockstor/eggs/Django-1.8.16-py2.7.egg/django/db/models/base.py", line 827, in _save_table
    forced_update)
  File "/opt/rockstor/eggs/Django-1.8.16-py2.7.egg/django/db/models/base.py", line 877, in _do_update
    return filtered._update(values) > 0
  File "/opt/rockstor/eggs/Django-1.8.16-py2.7.egg/django/db/models/query.py", line 580, in _update
    return query.get_compiler(self.db).execute_sql(CURSOR)
  File "/opt/rockstor/eggs/Django-1.8.16-py2.7.egg/django/db/models/sql/compiler.py", line 1062, in execute_sql
    cursor = super(SQLUpdateCompiler, self).execute_sql(result_type)
  File "/opt/rockstor/eggs/Django-1.8.16-py2.7.egg/django/db/models/sql/compiler.py", line 840, in execute_sql
    cursor.execute(sql, params)
  File "/opt/rockstor/eggs/Django-1.8.16-py2.7.egg/django/db/backends/utils.py", line 64, in execute
    return self.cursor.execute(sql, params)
  File "/opt/rockstor/eggs/Django-1.8.16-py2.7.egg/django/db/utils.py", line 98, in __exit__
    six.reraise(dj_exc_type, dj_exc_value, traceback)
  File "/opt/rockstor/eggs/Django-1.8.16-py2.7.egg/django/db/backends/utils.py", line 64, in execute
    return self.cursor.execute(sql, params)
OperationalError: deadlock detected
DETAIL: Process 7092 waits for ShareLock on transaction 6602; blocked by process 7086.
Process 7086 waits for ShareLock on transaction 6604; blocked by process 7092.
HINT: See server log for query details.

I have looked in htop for a process with the ID 7086 or 7092 but I cannot find one. I also cannot kill any process with said IDs as it “Does not exist”. I would look into the files mentioned in the log but Python is the only language that I do not know.

I look forward to any help. Thanks - Dunk

(Full log can be downloaded here: https://dunkelheit.co.uk/i/rockstor.log )

phillxnet · November 9, 2018, 1:37pm

@Dunkelheit Well done for trying to pursue this, but yum update will not update the ISO-included Rockstor code until you pick an update channel, with stable being the latest now and serving to help support Rockstor’s development.

From your log excerpt this looks very much like a db access race of some sort, i.e. the “deadlock detected” element. I and others have fixed some of the possible causes of these in stable, and there were also some earlier fixes in the testing channel code (still newer than the ISO Rockstor code).

So given the above we really need the output from the command requested previously to know what you are actually running, Rockstor code wise. Especially given the prior mentioned chicken and egg bug we had.
I.e. without the output of:

yum info rockstor

I at least cannot further participate.

The processes you looked for (7086 and 7092) are likely processes involved in one or more transactional database functions that, once found to be deadlocked (by Django code), were reverted / killed.
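To show what the DETAIL lines in that log are saying, here is a small Python sketch that turns them into a "who is blocked by whom" map and follows it until it loops back on itself. It only reads the two lines already quoted in this thread; `wait_for_graph` and `find_cycle` are hypothetical helper names for illustration, not part of Rockstor or Django.

```python
import re

# The two DETAIL lines exactly as they appear in the log excerpt above.
DETAIL = """\
Process 7092 waits for ShareLock on transaction 6602; blocked by process 7086.
Process 7086 waits for ShareLock on transaction 6604; blocked by process 7092.
"""

WAIT_RE = re.compile(
    r"Process (\d+) waits for (\w+) on transaction (\d+); "
    r"blocked by process (\d+)\."
)


def wait_for_graph(detail):
    """Map each waiting process ID to the process ID that blocks it."""
    return {int(m.group(1)): int(m.group(4)) for m in WAIT_RE.finditer(detail)}


def find_cycle(graph, start):
    """Follow 'blocked by' links from start; return the loop found, or None."""
    path = []
    pid = start
    while pid in graph and pid not in path:
        path.append(pid)
        pid = graph[pid]
    if pid in path:
        return path[path.index(pid):]
    return None


graph = wait_for_graph(DETAIL)
print("waits on:", graph)
print("cycle:", find_cycle(graph, 7092))
```

Each process is waiting on the other, so neither can ever finish on its own. That circular wait is what "deadlock detected" reports, and it is consistent with the two process IDs no longer being around by the time anyone looks for them.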

I wouldn’t let not knowing Python put you off. If you have read any other programming language, or just plain English, then Python is actually surprisingly readable. However, db deadlocking is notoriously difficult to diagnose, so it is maybe not the best place to get started with regard to code contribution / bug fixing. But I would highly encourage you to look at the code anyway as it’s all pretty readable (bar, I would say, the docker management code, ironically) and getting to be progressively better documented/commented as we go.

You might like to take a look at our Community Contributions doc section and specifically the Developers subsection.

And from another quick look at your second, generously supplied, log we have:

Error while processing remote metastore at http://rockstor.com/rockons/root.json. Lower level exception: HTTPConnectionPool(host='rockstor.com', port=80): Max retries exceeded with url: /rockons/root.json (Caused by <class 'socket.error'>: [Errno 11] Resource temporarily unavailable)

That’s simply a web timeout: we are getting a little popular and occasionally exceed what the hosting for root.json can serve; that file is ultimately hosted at our GitHub rockon-registry repo. Not sure of the exact nature of that end of things; project lead @suman may be able to comment on this. You actually have quite a few of those. Rockstor calls out to get a list of the latest available Rock-ons.
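For what it is worth, a transient timeout like this is usually handled by trying again after a pause. The following is a generic Python sketch of retry with a growing delay, and not a description of how Rockstor itself fetches root.json. `fetch_with_retry` and `flaky_fetch` are hypothetical names, and the fake fetch merely simulates two "Resource temporarily unavailable" failures so the example runs without any network access.

```python
import time


def fetch_with_retry(fetch, attempts=4, base_delay=1.0, sleep=time.sleep):
    """Call fetch(); on an OSError wait base_delay, then double it, and retry."""
    last_error = None
    for attempt in range(attempts):
        try:
            return fetch()
        except OSError as exc:  # socket-level errors are OSErrors in Python 3
            last_error = exc
            if attempt < attempts - 1:
                sleep(base_delay * 2 ** attempt)
    raise last_error


# Simulated remote call: fails twice, then succeeds (illustration only).
calls = {"count": 0}


def flaky_fetch():
    calls["count"] += 1
    if calls["count"] < 3:
        raise OSError(11, "Resource temporarily unavailable")
    return "root.json contents"


delays = []  # record the waits instead of actually sleeping
result = fetch_with_retry(flaky_fetch, sleep=delays.append)
print(result, "after", calls["count"], "calls; waits:", delays)
```

Passing `sleep` in as a parameter is just a convenience so the waiting can be recorded rather than performed when trying the sketch out.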

Anyway, let’s get that Rockstor code (package) version number and take it from there. Then everyone can see the actual version of the code you are running and possibly test for replicating this issue. Is there anything ‘odd’ about your hardware setup, for example? Best to describe your setup and your choice of shares for the Rock-on system, along with the individual Rock-ons’ share config: i.e. pics of your Disks, Pools, and Shares, for example, plus the exact steps to reproduce this (but before that just try subscribing to an update channel and getting whatever update it offers). You didn’t mention doing this in your reports and I haven’t seen it reported in the logs (I may have missed it there though).