[SOLVED] Custom Rockon perpetually uninstalling

Hi,
on Rockstor 4.0.9 I am encountering a curious behavior for the first time. I played around with a custom version of MinIO that I created. It installed. However, when I am trying to uninstall it, it is perpetually stuck in the uninstall mode (the spinning cog on a greyed out Rockon).
After some time, based on some other comments, I used the /opt/rockstor/bin/delete-rockon <rockon_name> option. It cleaned out the database, however the task is still hanging at ‘uninstall’.

Finally, I went for a reboot, but that did not have any effect either.

In the logs I only see this (when I’m on the page, I presume):

[04/Jan/2022 19:59:50] ERROR [system.osi:199] non-zero code(1) returned by command: ['/usr/bin/docker', 'rm', 'MinIO_dw']. output: [''] error: ['Error: No such container: MinIO_dw', '']
[04/Jan/2022 19:59:52] INFO [storageadmin.views.rockon:103] Rockon (MinIO_dw) state pending and no pending task: assuming task is mid execution.
[04/Jan/2022 19:59:56] INFO [storageadmin.views.rockon:103] Rockon (MinIO_dw) state pending and no pending task: assuming task is mid execution.
[04/Jan/2022 20:00:00] INFO [storageadmin.tasks:63] Task [install] completed OK
[04/Jan/2022 20:01:40] INFO [storageadmin.tasks:55] Now executing Huey task [stop], id: ff837932-f295-4786-97eb-a1f69961a2c3.
[04/Jan/2022 20:01:40] INFO [storageadmin.tasks:63] Task [stop] completed OK
[04/Jan/2022 20:02:30] INFO [storageadmin.tasks:55] Now executing Huey task [uninstall], id: e2386433-f168-4c96-992d-8c1b701f1847.
[04/Jan/2022 20:02:31] INFO [storageadmin.tasks:63] Task [uninstall] completed OK
[04/Jan/2022 20:02:31] INFO [storageadmin.views.rockon:103] Rockon (MinIO_dw) state pending and no pending task: assuming task is mid execution.
[04/Jan/2022 20:02:43] INFO [storageadmin.views.rockon:103] Rockon (MinIO_dw) state pending and no pending task: assuming task is mid execution.
[04/Jan/2022 20:02:57] INFO [storageadmin.views.rockon:103] Rockon (MinIO_dw) state pending and no pending task: assuming task is mid execution.

And that continues until today.
When I check via the console, the image/container doesn’t exist anymore, yet this task still seems to be hanging around.

Any other suggestions?

@Hooverdan
Not quite sure how this may have happened, and a proper fix could follow once we have a reproducer. But for now you could try wiping the Huey state to see if that is part of what’s getting twisted up here.
So you could first logout of the Rockstor Web-UI and then stop all rockstor processes:

systemctl stop rockstor-bootstrap rockstor rockstor-pre

then wipe the current Huey status which is held in the following files:

-rw-r--r-- 1 root root 28672 Jan  5 18:12 rockstor-tasks-huey.db
-rw-r--r-- 1 root root 32768 Jan  5 18:33 rockstor-tasks-huey.db-shm
-rw-r--r-- 1 root root     0 Jan  5 18:33 rockstor-tasks-huey.db-wal

so say, assuming an rpm install:

rm /opt/rockstor/rockstor-tasks-huey.*

Then restart all the Rockstor services (I’ve left out the hdparm, and ipv6 check services in this case but those shouldn’t matter).

systemctl start rockstor-bootstrap

Which should in turn invoke rockstor.service and it in turn rockstor-pre.service via dependencies.

You should then see those three huey files re-appear. Only now they know nothing of the past.
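
For anyone who wants to double-check what the `rm` above would touch before running it, here is a minimal Python sketch. It assumes the rpm install path `/opt/rockstor` mentioned above and that the Rockstor services are already stopped; the function name `wipe_huey_state` and its `dry_run` parameter are made up for illustration and are not part of Rockstor.

```python
from pathlib import Path


def wipe_huey_state(base_dir="/opt/rockstor", dry_run=True):
    """Find (and optionally delete) rockstor-tasks-huey.* files in base_dir.

    Hypothetical helper, not shipped with Rockstor. It assumes the Rockstor
    services have already been stopped. With dry_run=True nothing is deleted
    and the matching file names are only reported.
    """
    matches = sorted(Path(base_dir).glob("rockstor-tasks-huey.*"))
    if not dry_run:
        for path in matches:
            path.unlink()
    return [path.name for path in matches]
```

Calling `wipe_huey_state()` with the defaults only lists the files; passing `dry_run=False` removes them, which is the equivalent of the `rm` line above.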

This is an interesting one actually as I’ve not seen this before. It may be we need to incorporate such a ‘clean’ into our emergency/advanced delete-rockon script. Huey is also used for such things as scrubs and disk removal and other long running stuff but still, all those will restart at the filesystem level anyway, or can be re-initiated. We only ‘task’ it to keep an eye on it and not drag down (read stop) the Web-UI thread.

Let us know if that helps.

@phillxnet, thank you, unfortunately that hasn’t helped.

after stopping the rockstor processes I noticed that the db-shm and db-wal files are automatically gone, so I only had to remove the remaining huey db file.
After restarting as per your suggestion, the uninstall is still on the Rockon page.
So, I decided to first do a clean reboot and then execute the above steps, but unfortunately the end result was the same.

In the logs there is only this repeated entry (for the last few hours through the stop and reboots):

[05/Jan/2022 11:52:04] INFO [storageadmin.views.rockon:103] Rockon (MinIO_dw) state pending and no pending task: assuming task is mid execution.

I guess, the message is kicked out in this area of the rockon.py when it checks on status?

So, maybe somewhere in the storageadmin a status needs to be “tweaked”?
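
To illustrate why wiping the Huey files alone may not change anything, here is a rough sketch of the kind of check that could produce the log line above. This is not the actual rockon.py code: the `RockOnRow` class, the `stuck_rockons` function, and the state values such as `pending_uninstall` are assumptions made up for the example.

```python
from dataclasses import dataclass


@dataclass
class RockOnRow:
    """Stand-in for one row of the storageadmin_rockon table (fields assumed)."""
    name: str
    state: str  # assumed values: "installed", "available", "pending_uninstall"


def stuck_rockons(rows, pending_task_names):
    """Return names whose state is pending although no queued task covers them."""
    stuck = []
    for row in rows:
        if row.state.startswith("pending") and row.name not in pending_task_names:
            # Roughly the situation behind "state pending and no pending task".
            stuck.append(row.name)
    return stuck


rows = [
    RockOnRow("MinIO_dw", "pending_uninstall"),
    RockOnRow("Unpackerr", "installed"),
]
# An empty task set mimics a freshly wiped Huey queue.
print(stuck_rockons(rows, pending_task_names=set()))  # prints ['MinIO_dw']
```

If the logic is anything like this, the pending state lives in the database row rather than in Huey, so the entry would keep being reported until that row is changed or removed.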

Not sure whether this is the correct “status” field that’s relevant, but the MinIO had an exit error code that’s visible in the storageadmin_rockon table:

[image: screenshot of the MinIO_dw entry in the storageadmin_rockon table]

Could that confuse the above evaluation, or am I totally looking in the wrong place :slight_smile:?

@Hooverdan Re: the exit error code in the storageadmin_rockon table:

Nice find. Yes I would think this could be upsetting something. Good to know it’s not the Huey getting stuck on something. Likely we need to also clear this db entry in our emergency clean-up script to account for such corner cases.

@Flox When you get the time I think this may be more in your area.

We’ve seen that one every now and then and that’s when the force delete script saves the day. Now you say you used it and it didn’t help but I wonder if this was before the Huey task list was wiped… Maybe you wiped but then there was still a pending task…

@Hooverdan, for the sake of confirmation, could you try to:

  1. force delete the Rock-on
  2. wipe Huey task list as per @phillxnet instructions but do not restart the rockstor-bootstrap service yet
  3. force delete the Rock-on again just to make sure (I actually am not sure that would work without the rockstor service running)
  4. restart Rockstor
  5. Force refresh of your browser, just to make sure there is no weird caching somewhere…

You could also check the storageadmin_rockon table as you did in between each step to see at what point it gets resolved (if it does).

Sorry if I can’t be of more help for the moment; I’m too short on time, unfortunately.

@Flox, ok, so I followed your task list:

  1. force delete the Rockon - since I had tried that before, the message was Rock-On(MiniIO_dw) does not exist
  2. Log out of Web-UI
  3. stop rockstor services
  4. force delete the Rockon (actually runs) - same message Rock-On(MiniIO_dw) does not exist
  5. wiped the Huey task list
  6. force delete the Rockon (actually runs) - same message Rock-On(MiniIO_dw) does not exist
  7. checked the storageadmin_rockon table - entry remains as in the screenshot above
  8. restart Rockstor
  9. Cleared the cache (in fact used a different browser even)
  10. checked the storageadmin_rockon table - entry remains as in the screenshot above
  11. Logged back in and checked Rockons WebUI

Unfortunately, no difference. It’s still hanging at uninstalling

I’ll have to find some time to have a better look at that one then… In the meantime:

I suppose the “MiniIO_dw” in the messages you quoted is just a typo from when you wrote that forum post, isn’t it? (MiniIO vs MinIO)

@Flox, maaaan. stupid when one uses the history function to perpetuate a typo!!! You’re, as usual, brilliant. That was actually the reason why it didn’t perform the deletion anymore. I went back, and during my earlier testing attempts I had the right spelling, but during this latest iteration that looked like it was continuing to uninstall I apparently snuck in an additional lower case i.
How embarrassing. Yes, after correcting this, it actually performed the deletion (as expected also from the storageadmin_rockon table) and the issue is gone.

Thank you!!! I guess, I better start using less confusing nomenclature when I am testing.

Glad it’s just that!
I’m looking forward to having that script integrated in the webUI… you wouldn’t even need to worry about exact name or case then (ideally)!
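
As a rough idea of how a name check could catch this sort of typo, here is a small sketch using Python's standard `difflib`. It is not how delete-rockon works today; the function name `suggest_rockon_names`, the `cutoff` value, and the list of known names are assumptions for illustration only.

```python
import difflib


def suggest_rockon_names(typed, known_names, cutoff=0.8):
    """Return known Rock-on names that closely match the typed name.

    Hypothetical helper: the comparison ignores case, and cutoff is an
    arbitrary similarity threshold between 0 and 1.
    """
    lowered = {name.lower(): name for name in known_names}
    close = difflib.get_close_matches(
        typed.lower(), list(lowered), n=3, cutoff=cutoff
    )
    return [lowered[match] for match in close]


# The misspelling from this thread, checked against two example names.
print(suggest_rockon_names("MiniIO_dw", ["MinIO_dw", "Unpackerr"]))
# prints ['MinIO_dw']
```

With something like this, a "does not exist" message could be followed by a "did you mean MinIO_dw?" hint instead of leaving the typo unnoticed.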

Whilst tinkering around with Rockstor’s codebase and re-installing a new Rockon (Unpackerr) I’m trying to publish, I had the exact same issue in version 4.6.0-0.

Since the new versions use poetry, one has to call the script a little differently:
/opt/rockstor # poetry run delete-rockon <rockon_name>
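
For scripts that need to cope with both invocation styles mentioned in this thread, something along these lines might work. It is only a sketch: it assumes that the presence of `bin/delete-rockon` under the install directory is a usable way to tell the two setups apart, which is not confirmed here, and the function name `delete_rockon_command` is made up.

```python
from pathlib import Path


def delete_rockon_command(rockon_name, install_dir="/opt/rockstor"):
    """Return (argv, cwd) for invoking delete-rockon.

    Assumption: if <install_dir>/bin/delete-rockon exists, the older direct
    call is used; otherwise the poetry form is run from install_dir.
    """
    legacy_script = Path(install_dir) / "bin" / "delete-rockon"
    if legacy_script.exists():
        return [str(legacy_script), rockon_name], None
    return ["poetry", "run", "delete-rockon", rockon_name], install_dir
```

The returned `argv` and `cwd` could then be passed to `subprocess.run(argv, cwd=cwd)`; the function itself only builds the command and does not execute anything.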
