Request: better rockon update protocol

DrC · July 19, 2021, 8:27am

Currently:

root.json contains a list of:
'human project name' : 'project.json'
for each rockon

root.json and all project.json files are pulled every update. Failure to pull them all seems to result in an aborted update with no results.

This data seems to be held in memory and not written to the rockons share (or maybe it isn’t written until it’s fully downloaded, so any timeout aborts the whole process).

rockon update pulls a meta list from /rockons/root.json
(as http and gets redirected to https, so fix that and halve your web requests)
root.json has a list of all projects and is parsed
each project file is then pulled down, one after the other
(some seem to be pulled as http and redirected to https, again doubling requests)
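
To make the request count concrete, here is a minimal sketch of the flow described above. It is not Rockstor's actual code: the function name `update_all`, the injected `fetch` callable, and the `example.invalid` base URL are all hypothetical, and the fetcher is passed in so the sketch runs without a network.

```python
import json
from typing import Callable, Dict


def update_all(
    fetch: Callable[[str], str],
    base: str = "https://example.invalid/rockons",  # placeholder, not the real server
) -> Dict[str, dict]:
    """Fetch root.json, then every project file listed in it, all-or-nothing."""
    # root.json maps 'human project name' -> 'project.json'
    root = json.loads(fetch(f"{base}/root.json"))
    projects: Dict[str, dict] = {}
    for name, filename in root.items():
        # Any exception raised here propagates, so the partial results
        # gathered so far are thrown away with the aborted update.
        projects[name] = json.loads(fetch(f"{base}/{filename}"))
    return projects
```

With N projects this always costs 1 + N requests per update (more if each one is first redirected from http to https), whether or not anything changed.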

Suggested:

Store the rockons list on the local rockon share. Then the problem just becomes how to update that. Here we have already split the problem (using rockons) into two distinct parts. The rockons code in the web interface uses the local store of rockons, so the problem now is how to update that local list.

Create ‘version 2’ fetch url at: /rockons/v2/

generate /rockons/v2/root.json (as outlined in the new format below) from data in existing /rockons/root.json

generate /rockons/v2/root.sha from sha hash on root.json

The format of /rockons/v2/root.json is now a list of
'human project name' : { url:'project.json', sha:'sha checksum', ...other fields?... }
(which should be easy to script from the existing /rockons/root.json)
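
A rough sketch of such a generator script, under some assumptions: the post only says "sha", so SHA-256 is my choice here; `build_v2` and `sha256_hex` are hypothetical names; and the `url` field simply reuses the existing filename, leaving the project files where they are.

```python
import hashlib
import json
from pathlib import Path


def sha256_hex(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


def build_v2(v1_dir: Path, v2_dir: Path) -> None:
    """Write v2/root.json and v2/root.sha from an existing v1 tree."""
    # v1 root.json: {'human project name': 'project.json', ...}
    v1_root = json.loads((v1_dir / "root.json").read_text(encoding="utf-8"))

    v2_root = {}
    for name, filename in v1_root.items():
        digest = sha256_hex((v1_dir / filename).read_bytes())
        v2_root[name] = {"url": filename, "sha": digest}

    v2_dir.mkdir(parents=True, exist_ok=True)
    # Serialise once and hash exactly the bytes that get written,
    # so root.sha always matches the published root.json.
    root_bytes = json.dumps(v2_root, indent=2, sort_keys=True).encode("utf-8")
    (v2_dir / "root.json").write_bytes(root_bytes)
    (v2_dir / "root.sha").write_text(sha256_hex(root_bytes) + "\n", encoding="utf-8")
```

Running this whenever a project file changes would keep both generated files in step with the existing tree.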

rockon update pulls the root.sha
if the sha does not match the local copy, pull root.json into local storage
(if it did match, no need to pull again, just use the local copy)
walk projects in root.json and pull any project.json files where the local sha doesn’t match the sha entry in root.json
(at this point you know how many downloads you will need to do so you can put up a progress bar)
delete any local projects not in root.json
(you now have a complete list of rockons in the minimum number of requests)
(I would also have a local directory for manually added projects which is always added, overriding projects in root.json for easy development)
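
The steps above could look roughly like this. It is only a sketch under stated assumptions: SHA-256 as the hash, flat filenames in the `url` field, a local store holding nothing but root.json and project files, and a hypothetical `fetch(name)` callable that returns the bytes of a file under /rockons/v2/. The manual-override directory is left out.

```python
import hashlib
import json
from pathlib import Path
from typing import Callable


def sha256_hex(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


def sync(fetch: Callable[[str], bytes], store: Path) -> int:
    """Bring `store` in line with the remote tree; return the number of project files pulled."""
    store.mkdir(parents=True, exist_ok=True)
    root_path = store / "root.json"

    # 1. Always pull the tiny root.sha first.
    remote_sha = fetch("root.sha").decode("ascii").strip()

    # 2. Pull root.json only if the local copy is missing or differs.
    if not root_path.exists() or sha256_hex(root_path.read_bytes()) != remote_sha:
        root_path.write_bytes(fetch("root.json"))
    root = json.loads(root_path.read_bytes())

    # 3. Work out which project files are missing or out of date.
    wanted = {entry["url"]: entry["sha"] for entry in root.values()}
    stale = [
        url
        for url, sha in wanted.items()
        if not (store / url).exists() or sha256_hex((store / url).read_bytes()) != sha
    ]

    # len(stale) is known at this point, so a progress bar could be sized here.
    for url in stale:
        (store / url).write_bytes(fetch(url))

    # 4. Delete local project files that root.json no longer lists.
    for path in list(store.glob("*.json")):
        if path.name != "root.json" and path.name not in wanted:
            path.unlink()

    return len(stale)
```

If a run dies partway through step 3, the files already written keep their matching sha, so the next run only pulls what is still missing.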

Positives

much less webserver load
loading can be incremental (if it times out halfway through, then when run again, anything that was successfully fetched before the timeout will not be fetched again, as its sha will match and so it will be considered ‘fetched’ already)
only changed files (plus root.json) are pulled on changes
it’s easy to pull the rockons list from somewhere else (i.e., a git repo into a local directory to simulate an update without needing to use the inbuilt update at all) (or maybe you could share it on an nfs directory and just symlink it into the local rockons share if you have a cluster of rockons)

negatives

requires reworking of rockon update code and change in file format.
requires script to generate /rockons/v2/ from /rockons/root.json

Why use root.sha as the first fetch? Well, I figured it’s very small, so the size of root.json doesn’t matter and it can probably be fetched in one packet. You could also include a url in that root.sha pointing to somewhere else to fetch the json files from, so you would only need high-capacity hosting for root.sha: it would be fetched on every update, but the json files would only be pulled when there was a change. Then you could host the root.sha on google or amazon cloud and it would be speedy.

optional extras

Extend format to add utc time to each entry in root.json so that an update can show the ‘last updated’ entry for a project.

If you stored the date you installed a rockon, you could then easily identify ones that have been updated since. (I don’t know, I haven’t been able to get rockons working at all so I don’t know what a successful page of them looks like, haha).

You could even have version1 and version2 both living in the same tree. Rather than root.json, version2 would use root.v2.json and root.v2.sha (or something) and all the other files would be the same.

github?

As an afterthought, why not just offload it all onto github and have the rockons update just do a git pull from the github repo? Why reinvent this wheel? Pull it from github into a directory in /mnt2/whatever-rocksons-share/plugin-data and have the rockons page work from there?

You don’t even need git, as you can fetch the raw pages from github via http, so you could move it off your webserver completely.

phillxnet · July 19, 2021, 10:46am

@DrC Re:

Yes we just moved to https only on that server. And “… fix that …” would involve the past few years of Rockstor installs all having to update, post moving the requests to https in the first place of course.

I’m assuming from that, that you mean ship with a Rock-on list that is up to date at release/build time. That has a number of issues associated with it in itself I’m afraid, as the available Rock-on information is encoded in the local PostgreSQL database that backs our Django Web-UI framework.

Yes we do this with the jslibs during a build. And with some config files to see if they need updating during each boot.

Again this is difficult to reflect in the DB. I’m feeling like this is another whole level of complexity to address something that is really not a problem unless one is on a very slow connection. And the use case of no connection is moot, as one can’t install anything anyway in that case. We are, in your suggestion, not starting from a clean slate each time (as per our current arrangement) but from an ever-increasing number of initial states, depending on which Rockstor version one installed from. I like the simplicity of always starting from scratch, as then we only have to test whether a clean system can update itself from that known state. I.e. no migration from prior state, at least from initial install. I appreciate that we then update from a myriad of states, so there is that. But the initial state is always blank and always has been, so we can always return a system to that if something goes astray. Whereas if we ship with a moving target of initial states, that introduces more variance for little benefit. Simple often wins in the long run is what I’m thinking. And we pull down significantly less than a single typical web page view with every update. So there may not be a problem to solve here.

We do have that, see the rockstor/rockon-registry repository on GitHub (hosted registry for Rock-ons), re:

Upload the file to /opt/rockstor/rockons-metastore/[app].json. Hit update in the Web-UI and install your brand new Rock-on!

There are some good ideas here. Thanks for your thoughts. I particularly like the checksum download; however, if nothing but a single Rock-on definition has changed then this checksum would have to cover the entire repository. Again, entirely doable, and I think this is the nugget of agreement re improving things: post moving to server-side compression of course. Plus older existing systems simply ignore this new checksum and continue doing the brute force method.
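
One way a single checksum covering the entire repository might be computed is sketched below. Everything here is an assumption for illustration: `registry_digest` and `needs_full_update` are hypothetical names, SHA-256 is my choice of hash, and the definitions are taken to be `*.json` files in one directory.

```python
import hashlib
from pathlib import Path
from typing import Optional


def registry_digest(registry_dir: Path) -> str:
    """One digest over every definition file, so any single change alters it."""
    overall = hashlib.sha256()
    # Sort by filename so the result does not depend on directory listing order.
    for path in sorted(registry_dir.glob("*.json")):
        file_sha = hashlib.sha256(path.read_bytes()).hexdigest()
        # Hash a 'name sha' manifest line per file, much like sha256sum output.
        overall.update(f"{path.name} {file_sha}\n".encode("utf-8"))
    return overall.hexdigest()


def needs_full_update(remote_digest: str, last_seen: Optional[str]) -> bool:
    """True when the client should fall back to the existing brute-force update."""
    return remote_digest != last_seen
```

A newer client would fetch only this digest first and skip the rest when it matches the value saved after its last successful update. Older clients that never ask for it would carry on exactly as before.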

This is against the terms of service, is my understanding. But they do now host a packaging service, so that may be doable. There is always value in hosting your own mechanism, and in this case we do actually test, internally, against GitHub before publishing to the main server. This gives us a real-world scenario to test against before we release each rock-on change. And incidentally it’s painfully slow to update directly from GitHub. You can try this yourself via code edits, as we do for testing. It takes ages to retrieve all the files. I think they may rate limit that kind of access actually. So yes, hosting this kind of thing directly from GitHub is not the way we are going, unless they open a packaging service that matches our needs. Plus, as stated, there can be value in hosting one’s own services. And also downsides of course :).

Yes we do this prior to publicly publishing. It’s a pretty nice test: given it’s so slow, it can double check our function in slow network environments.

All great ideas by the way. But as always the details are the killer. But definitely some stuff to discuss further here. @Flox is master of Rock-ons so he may well have input on some of these ideas also.

To know more about the mechanisms involved see @Flox’s excellent developer write-up here:

You might like that.

Thanks again for your input. We would definitely like to refine the system, but ideally not at the expense of added complexity. Hence an initial move to a checksum to see if anything else at all needs to be pulled. In most cases this would avoid a ton of otherwise required calls/processing etc.

Flox · July 19, 2021, 12:10pm

Thanks a lot for all your input and ideas and for taking the time to lay them out like that, @DrC!

I’m unfortunately short on time right now to give your propositions the serious look they deserve, but I wanted to briefly react as that is something that has been on our minds for a little while now. What you will read below should thus be considered only as that–first reactions–and proper thoughts on it will hopefully follow soon thereafter.

First, let me state that I agree with you that we need to improve our current mechanism and we could benefit from not going with a full refresh every time. As you mentioned, our current mechanism consists of fetching the entire list of rock-ons, and then re-populating our database using that information, even for rock-ons for which nothing has changed. Doing so only for rock-ons for which a change has been made is thus the area most likely to offer performance gains. Your checksum idea is thus particularly interesting here.

@phillxnet made a very important point on the need to be able to refer to a “ground truth” if needed, though, so we may want to offer two update mechanisms: (1) quick update (using delta), and (2) full update as currently done. Of course, we always strive to keep things as simple as possible for the end user, so we may want to keep only the quick update in the Rock-ons page, and the full update in a future “System settings”-like separate page, or something like that.

I never had the time to properly identify the real source of the timeout here, but as this update procedure is very heavy on database interaction, my first guess was always there. We will have substantial upgrades at our disposal once we have addressed our technical debt, which means we might be able to substantially reduce the occurrence of such timeouts through improvements to our existing update mechanism.

A lot to do, but as mentioned, I fully agree with the need to rework this, so thanks a lot for initiating that conversation. I will try to find time to give your propositions the attention they deserve and will hopefully be more useful soon.

DrC · July 19, 2021, 12:33pm

I’m going to try and find where in the code it reads the updates, to put some more debugging in as my first peek into the source, and hopefully I can find the timeout.

I should say that I’m a grumpy old unix admin from way back, so I tend to miss some of the finer points on databases and web user interfaces, so coming from that ‘old man’ school…

I don’t get why you put the json into a database at all. If you are always pulling the json down to merge into the database, why not just use the json itself?

I’m probably missing some ‘win’ from putting it into a database, but my initial reaction would be that the rockons code would run from the provided configs. See, in my suggestion it never even occurred to me to put it into a database; yeah, I’m that ‘text based’, hahaha.

With text files, getting back to ground truth is just: remove the directory and pull it again.

Flox · July 19, 2021, 1:25pm

You should find interesting details in the wiki post @phillxnet linked in his message:
Rock-on framework implementation

… in particular the section titled Rock-ons catalog (list).

Hopefully I’ll be able to find time for the rest soon.