Another take on "Unknown internal error doing a POST to /api/rockons/update"

Brief description of the problem

I’ve seen this discussed before and have verified that wget is installed and that I can use it to download the root.json file myself from a system shell. None of the other threads contained anything I found applicable.

I am having issues updating my rockons list. This is a fresh install from a kiwi installer, updated to 4.0.6-0 via the GUI.
I can’t find any traces of what’s going wrong (which probably is due to my poor understanding of the inner workings of Rockstor) save brief entries in nginx logs.

<workstation ip> - - [27/Mar/2021:17:13:53 +0100] "POST /api/rockons/update HTTP/1.1" 504 569 "https://<rockstor server ip>/home" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/89.0.4389.90 Safari/537.36"

and gunicorn logs.

[2021-03-27 16:33:10 +0000] [10737] [INFO] Booting worker with pid: 10737
[2021-03-27 17:13:53 +0000] [1366] [CRITICAL] WORKER TIMEOUT (pid:10737)

What would you have me check next?

Detailed step by step instructions to reproduce the problem

Go to ROCK-ONS
Choose “all” tab
Click Update
Wait 120 seconds

Web-UI screenshot

Error Traceback provided on the Web-UI

None

In the meantime I will install a vanilla Leap 15.2 and do a fresh install of Rockstor.
I had the same problem before updating from my own installer version to 4.0.6-0.

@greven Thanks for the report.

We’ve not had any other reports of this; however, it could be that it’s simply taking too long to get all the rock-ons. We have seen this before due to the number of calls made on the server and have recently removed a redundant call in the code where every single Rockstor instance would update its Rock-ons list every time a dashboard was refreshed. This was an obvious waste of resources both on the client and the server.

The wget command to get the index file of root.json is a good test for access but is not the whole story. Once that file is retrieved each and every rock-on definition file is then individually downloaded and analyzed to incorporate/update the Rock-on definition in the local database.
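To see where the time goes across both stages, something like the following rough Python sketch could be run from the Rockstor machine. It is not Rockstor's own code. It assumes root.json is a JSON object mapping each rock-on name to the file name of its definition, and that those files sit next to root.json under a base URL you supply (the same location you tested with wget). The function names are hypothetical.

```python
import json
import time
import urllib.request
from typing import Callable, Dict, Tuple


def http_get(url: str, timeout: float = 30.0) -> bytes:
    """Fetch a URL and return the raw response body."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.read()


def time_rockon_fetch(
    base_url: str, fetch: Callable[[str], bytes] = http_get
) -> Tuple[float, Dict[str, float]]:
    """Time the index download, then each definition file it lists.

    Assumption: root.json maps a rock-on name to the file name of its
    definition, relative to base_url.
    """
    base = base_url.rstrip("/")

    # Stage 1: the index file (what the wget test covers).
    start = time.monotonic()
    index = json.loads(fetch(f"{base}/root.json"))
    index_secs = time.monotonic() - start

    # Stage 2: every definition file, downloaded and parsed one by one.
    per_file: Dict[str, float] = {}
    for name, filename in index.items():
        t0 = time.monotonic()
        json.loads(fetch(f"{base}/{filename}"))
        per_file[name] = time.monotonic() - t0
    return index_secs, per_file
```

If the index time plus the sum of the per-file times gets anywhere near the 120 seconds after which the error appears, the slowest entries in the per-file timings would be the place to look first.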

Is your hardware particularly slow on the CPU side, for instance? In that case it could take more than 2 minutes to do this initial process. Or is the internet connection slow? This is the area I would look to first.

Also I’d stick to DIY installer installs myself as installing the rpm on a vanilla Leap 15.2 requires quite a few hoops to jump through, such as ipv6 kernel command line disable, that we detail here:

and shouldn’t be required unless you intend to set up a development environment.

Plus it’s easier to reproduce what you may have there from the installer than from the numerous dev env setup steps.

That’s a curious one and points again to something timing out. Look further into that log to see what has timed out in that case. We do have a new dependency on a Huey library and it may be that it has a problem on your hardware.

If you could tell us more about the speed of the underlying system it may help to narrow this down, as the Rock-on refresh is quite an intensive process, especially initially, and more so as our list grows of course. However we do have a Rock-on pruning process underway that is currently led by forum member @Hooverdan under the following Rock-on repo issue:

https://github.com/rockstor/rockon-registry/issues/281

Sorry not to be of more help on this one just yet.

Also take a look at the following excellent technical wiki entry by @Flox, lead dev on Rock-ons, for a background on what’s going on re the Rock-ons update process:

It should help to narrow down where the bottleneck is in your system or to identify a recent bug which is likely down to me I’m afraid.

Keep us posted and again, I’d persevere with the DIY installer variant as there are fewer variables in what your system config is. Plus it is way smaller than any Rockstor derived from a regular Leap 15.2 install. We essentially started with a Leap JeOS variant and added only the critical dependencies for the Rockstor rpms from there on. Try taking a look at the comparative sizes of your installs to get an idea of the difference, assuming you got as far as the Leap 15.2 based attempt.

Hope that helps and do keep us posted.

The hardware has a few miles on its clock but is still running fine. It’s an
i7-3770K
2x4 GB RAM@1333MHz
for testing; it will be 4x4 or 4x8 when I’m done.

As for the internet connection, I’m on a 300/300 fibre connection, easily reaching 10+ MB/s when downloading packages.

The vanilla Leap with testing rpms gives the same result for me.

I will have a look at the links you provided.

Definitely not a slow machine and connection indeed!
If you’re systematically hitting this error, then I would lean towards something unrelated to connection slowness… Could you have a look in the logs? In particular, the rockstor.log might have more information (accessible either through the logs manager or at /opt/rockstor/var/logs/rockstor.log).

And now it works… no change since the last attempt, which failed; it just worked.

@greven Thanks for the update/follow up. And glad you’re now sorted.

But it is still dissatisfying not to have at least a potential cause here. Agreed re speed and what @Flox said. Obviously not that in your case. We do have folk pushing the recommended spec and running for example Pi3 instances, I’m thinking of you @chrstphrchvz :slight_smile: , who has also contributed key speedups which is great. But on such slow hardware we still regularly time out / fail in all sorts of places. A few fewer thanks to @chrstphrchvz’s recent contribution but still, obviously not the case on this hardware.

So if you could take a look at the @Flox suggested log for the time of these timeouts it may help to shed light on this occurrence. It is always best if we can get as much info as possible: it’s not always obvious at the time what may be happening, but once we have the info it can help to inform future analysis or otherwise prompt folks who have had a similar ‘transient’ experience of failure to also report it, again hopefully with more info.
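To line up the timeouts with rockstor.log, it may help to pull the exact timestamps out of the gunicorn log first. The Python sketch below is only an illustration: it assumes the gunicorn lines look like the two quoted earlier in this thread (bracketed timestamp, pid, level, then message), and the function name is hypothetical.

```python
import re
from datetime import datetime
from typing import Iterable, List, Tuple

# Matches lines such as:
# [2021-03-27 17:13:53 +0000] [1366] [CRITICAL] WORKER TIMEOUT (pid:10737)
LINE_RE = re.compile(r"^\[(?P<ts>[^\]]+)\] \[\d+\] \[(?P<level>\w+)\] (?P<msg>.*)$")
TIMEOUT_RE = re.compile(r"WORKER TIMEOUT \(pid:(\d+)\)")


def find_worker_timeouts(lines: Iterable[str]) -> List[Tuple[datetime, int]]:
    """Return (timestamp, worker pid) for every WORKER TIMEOUT line."""
    hits: List[Tuple[datetime, int]] = []
    for line in lines:
        parsed = LINE_RE.match(line.strip())
        if parsed is None:
            continue  # not in the assumed gunicorn format, skip it
        timeout = TIMEOUT_RE.search(parsed.group("msg"))
        if timeout is None:
            continue
        when = datetime.strptime(parsed.group("ts"), "%Y-%m-%d %H:%M:%S %z")
        hits.append((when, int(timeout.group(1))))
    return hits
```

Feeding it the lines of the gunicorn log gives a list of timestamps; the two minutes of rockstor.log before each one are the stretch worth reading closely.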

Thanks again for the report and engaging on the forum. Much appreciated.

I am “happy” to let you know that it keeps intermittently failing on my machine… :slight_smile:

Mostly it works but…

Rockstor Logs (as read from gui) contains nothing
Gunicorn (WebUI) (as read from gui) contains nothing but the following:

[CRITICAL] WORKER TIMEOUT (pid:1390)

Interesting… Out of curiosity, when that happens, does it error out right after clicking the “Update” button or after some time? My guess would be the latter as it is not systematically failing, but I’d rather make sure of it.

There is the option of setting the log level on your machine to DEBUG to see if we can have more help with the logs, but if it is indeed a timeout issue, it won’t necessarily be more helpful. In any case, if you are “game” to try that, you can set it that way (all from memory so hoping the path below is correct):

/opt/rockstor/bin/debug-mode ON

You should now be able to see DEBUG lines in rockstor.log (if not, you might need to restart the rockstor service: systemctl stop rockstor && systemctl start rockstor).

To go back to normal log level (INFO), you can simply run the same command but with the OFF flag.

It is indeed timeout behaviour: it takes about 120 seconds for the error notice to pop up. I’ll enable debug mode and see what we can learn.

I have the same 120 sec WORKER TIMEOUT bug, but it happens 100% of the time, so Rock-Ons are not available. My system is not as fast, but not slow either: a four-core 64-bit 2.2 GHz CPU with 8 GB memory.

Using current build Rockstor-Leap15.4-generic.x86_64-4.5.8-0.install.iso

To keep track of this symptom, I have created an issue on GitHub:
