Error when clicking on pool

sirhcjw · July 18, 2015, 5:24am

After upgrading to 3.8.3, I am still seeing this error.

Thoughts? @suman

gkadillak · July 18, 2015, 3:36pm

Is there any output in the logs? The logs are found at /opt/rockstor/var/log/rockstor.log

sirhcjw · July 18, 2015, 10:12pm

Yes, here are the errors:

[18/Jul/2015 17:10:37] ERROR [storageadmin.util:38] request path: /api/pools/pool01/scrub method: GET data: <QueryDict: {}>
[18/Jul/2015 17:10:37] ERROR [storageadmin.util:39] exception: invalid literal for int() with base 10: 'after'
Traceback (most recent call last):
  File "/opt/rockstor/src/rockstor/rest_framework_custom/generic_view.py", line 40, in _handle_exception
    yield
  File "/opt/rockstor/src/rockstor/storageadmin/views/pool_scrub.py", line 46, in get_queryset
    self._scrub_status(pool, disk)
  File "/opt/rockstor/eggs/Django-1.6.2-py2.7.egg/django/db/transaction.py", line 399, in inner
    return func(*args, **kwargs)
  File "/opt/rockstor/src/rockstor/storageadmin/views/pool_scrub.py", line 56, in _scrub_status
    cur_status = scrub_status(pool, disk.name)
  File "/opt/rockstor/src/rockstor/fs/btrfs.py", line 627, in scrub_status
    stats['duration'] = int(out[1].split()[-2])
ValueError: invalid literal for int() with base 10: 'after'

sirhcjw · July 19, 2015, 9:37am

btrfs scrub status /mnt2/pool01
scrub status for cdc962b5-f6f6-45b4-bfc0-17316b3c1d7f
scrub started at Mon Jul 13 22:03:23 2015 and finished after 01:03:48
total bytes scrubbed: 1.30TiB with 1632 errors
error details: read=4 csum=1628
corrected errors: 1632, uncorrectable errors: 0, unverified errors: 0

Returns this.
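The traceback points at stats['duration'] = int(out[1].split()[-2]) in fs/btrfs.py. With the output above, out[1] is the "scrub started at ..." line, and because the duration is printed as a single HH:MM:SS token at the end of that line, the second-to-last whitespace-separated token is the word "after". The indexing appears to assume a line ending in "... after <N> seconds", where the second-to-last token would be the number. Below is a minimal sketch of a more tolerant parser that accepts either form; the function name parse_scrub_duration and both patterns are hypothetical, written for illustration, and not Rockstor's actual code.

```python
import re

# Hypothetical patterns (assumptions, not Rockstor code) for the duration at
# the end of the "scrub started at ..." line of `btrfs scrub status`.
_HMS_RE = re.compile(r"after (\d+):(\d{2}):(\d{2})")  # "... after 01:03:48"
_SECONDS_RE = re.compile(r"after (\d+) seconds")      # "... after 3828 seconds"


def parse_scrub_duration(line):
    """Return the scrub duration in seconds, or None if none is found."""
    m = _HMS_RE.search(line)
    if m:
        hours, minutes, seconds = (int(g) for g in m.groups())
        return hours * 3600 + minutes * 60 + seconds
    m = _SECONDS_RE.search(line)
    if m:
        return int(m.group(1))
    # No recognizable duration (e.g. a line format this sketch does not know).
    return None


line = ("scrub started at Mon Jul 13 22:03:23 2015 "
        "and finished after 01:03:48")
print(parse_scrub_duration(line))  # 3828
```

Matching on the text around the duration, rather than on a fixed token position, means a change in how the duration is printed yields None instead of an exception that breaks the whole pool page.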

From looking at the code and the error, I figured it was not handling what was being returned, so I kicked off another scrub and, presto, it was fixed.

So there is a bug.

Can this please be looked at and addressed.

Thanks

Chris

phillxnet · July 19, 2015, 10:17am

@sirhcjw and @gkadillak, well done, people. I have opened a GitHub issue recording this info, and given its prominence it should receive a high priority. However, I think this one is in @suman-only territory for the time being, so it may be a little while, as he is currently out and about promoting Rockstor via conference appearances.
Nice find.

KarstenV · July 19, 2015, 8:39pm

I have actually seen the same error.

In my case the error appeared after I asked the system to shut down while a scrub was running.

It told me it did a graceful shutdown.
When I restarted the system, I got exactly the same error as shown in the original message when selecting the pool that was being scrubbed.

I initiated a new scrub from the command line with "btrfs scrub start /mnt2/pool", went to the pool in the web interface, and the error was gone when I clicked it.

So it would seem the system did not do a completely graceful shutdown.

phillxnet · July 19, 2015, 8:48pm

@KarstenV I don’t think this error indicates a non-graceful shutdown; it seems to be a bug in the way the UI interprets the state of the btrfs scrub, i.e. in certain situations it errors when trying to interpret perfectly sane responses from the btrfs subsystem. It should be taken in hand soon via the referenced issue.

KarstenV · July 19, 2015, 8:54pm

OK.

Just something I experienced.

I would think it wise for the system to check for running scrubs / balances and end them before shutting down. I would guess this is already implemented, but my experience showed otherwise.

But thank you for the answer.

phillxnet · July 19, 2015, 9:14pm

@KarstenV I’m afraid I am not yet up on this part of Rockstor, so I don’t really know; it was just the impression I got from the log entry. It might be that a scrub, once initiated, is continued / restarted if it fails to complete due to a reboot. I expect @suman can clear this up once he gets done with other outstanding bits and bobs. Sorry not to be of more help / reassurance on this; given that the error is in reading the scrub state, we may have a chicken-and-egg problem here. Thanks for reporting your findings thus far.

KarstenV · July 20, 2015, 6:36am

Well, there is definitely something wrong.

The scrub I initiated last evening ended this morning. The results were like this:

As soon as the scrub ended I tried to click the relevant pool in the web-interface and got the error:

I will add that during the scrub I was able to click on my pool and see its status without error messages.

So there is definitely something that’s not right.

KarstenV · July 22, 2015, 8:25am

I have managed to make my pool clickable again.

I was trying out some commands I saw in a video about btrfs maintenance.

The command was “btrfs scrub start -B -d /dev/sdx”.

The -B option tells btrfs scrub not to run in the background (-d prints separate statistics for each device). You can then exit with Ctrl-C.

After running a partial scrub and exiting this way, the pool was accessible in the web interface again.

This is not a fix to the original problem, but it does make it possible to access the pool, and add/remove disks / do other maintenance.

Hope other people are able to make their pools accessible again.

phillxnet · July 22, 2015, 9:17am

@KarstenV That’s an excellent find, thanks for submitting it. I have quoted your findings in the ongoing GitHub issue #752 to assist in the fix and to collect triggers / info / workarounds. It looks like at least one cause of this error is now pretty much known. This is a great help. We now have at least 2 open issues that may be related to the root cause of this, so it’s going to be gratifying to have this one done and dusted. @suman has recently confirmed an extension to the usual release candidate testing period going forward, and I believe there are plans to further extend the automated test coverage, so hopefully show stoppers such as this won’t be as common. Here’s hoping.

TheRavenKing · July 23, 2015, 6:27pm

@phillxnet @suman

Error!
invalid literal for int() with base 10: 'after'

It looks more like the wrong type of variable is used here: perhaps int() instead of float().
I have no idea about the code or where to find it, but I think that’s the problem.
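For what it is worth, a quick check in a Python shell suggests the int()/float() choice on its own is not the culprit: the token being converted is the word "after" rather than a number, and float() rejects it just as int() does. The sample line below is copied from the btrfs scrub status output quoted earlier in the thread; the indexing mirrors the failing line in the traceback.

```python
# Second line of the `btrfs scrub status` output quoted earlier in the thread.
line = "scrub started at Mon Jul 13 22:03:23 2015 and finished after 01:03:48"

# Same indexing as the failing stats['duration'] line in the traceback.
token = line.split()[-2]
print(token)  # after

failures = {}
for convert in (int, float):
    try:
        convert(token)
    except ValueError as exc:
        # Record the error message for each conversion that rejects the token.
        failures[convert.__name__] = str(exc)

print(sorted(failures))  # ['float', 'int']
```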

Looking at the code on GitHub, I notice you mix size = models.IntegerField() and size = models.BigIntegerField(default=0).

Also, if you want to accept null values in a BooleanField such as offline = models.BooleanField(default=False), use NullBooleanField instead.

Hope this helps.

suman · August 1, 2015, 12:33am

Just a quick update: the fix has been rolled out in the 3.8-4 update. Thanks for all your contributions.

TheRavenKing · August 2, 2015, 3:17pm

@suman Thanks, you’re welcome. I will test it.