So I had a scheduled scrub task from when I first built the VM; sadly, at some point it got stuck and I resumed it from the CLI.
The problem is Rockstor thinks it’s still running, so it doesn’t start a new one, meaning the Pool hasn’t been scrubbed for nearly a year (oops!).
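For reference, resuming the stuck scrub from the CLI was done with something along these lines (just a sketch; the mount point matches the status output below):
# resume a previously interrupted scrub on the pool's mount point
btrfs scrub resume /mnt2/Pool1/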
[root@rock-home ~]# btrfs scrub status /mnt2/Pool1/
scrub status for 48e9fe04-9f49-4c33-b9d7-12aefe745674
scrub resumed at Fri Aug 19 22:10:20 2016 and finished after 90:42:26
total bytes scrubbed: 5.46TiB with 0 errors
423 | August 1st 2016, 3:00:00 am | August 1st 2016, 3:00:07 am | running
Is there any way to force Rockstor to show the job as complete?
Edit:
Going to delete the task and create a new one and see if that fixes it.
I’m having something similar but didn’t think to investigate. I’ll see if it’s the same issue, but in my case it’s scrubs not starting. I power down nightly but made sure scrubs start 30 minutes after wake.
Traceback (most recent call last):
File "/usr/lib64/python2.7/logging/handlers.py", line 76, in emit
if self.shouldRollover(record):
File "/usr/lib64/python2.7/logging/handlers.py", line 154, in shouldRollover
msg = "%s\n" % self.format(record)
File "/usr/lib64/python2.7/logging/__init__.py", line 724, in format
return fmt.format(record)
File "/usr/lib64/python2.7/logging/__init__.py", line 464, in format
record.message = record.getMessage()
File "/usr/lib64/python2.7/logging/__init__.py", line 324, in getMessage
msg = str(self.msg)
TypeError: __str__ returned non-string (type list)
Logged from file pool_scrub.py, line 78
Traceback (most recent call last):
File "/usr/lib64/python2.7/logging/handlers.py", line 76, in emit
if self.shouldRollover(record):
File "/usr/lib64/python2.7/logging/handlers.py", line 154, in shouldRollover
msg = "%s\n" % self.format(record)
File "/usr/lib64/python2.7/logging/__init__.py", line 724, in format
return fmt.format(record)
File "/usr/lib64/python2.7/logging/__init__.py", line 464, in format
record.message = record.getMessage()
File "/usr/lib64/python2.7/logging/__init__.py", line 324, in getMessage
msg = str(self.msg)
TypeError: __str__ returned non-string (type list)
Logged from file pool_scrub.py, line 39
@magicalyak Thanks for trialing the delete/re-add route, and given your cron entry (cheers) you now have the new pool API format (id rather than name), so could you please post the output of the following:
btrfs scrub status -R /mnt2/pool-name-here
But from a quick look we have a bug in error message formatting that is making this a little more tricky to fathom, ie it expects a string but is receiving a list! I don’t yet see how our pid is a list at this point.
Could you see if a manual scrub attempt (using the force option) clears this up, as I’m not sure yet what’s happening? That is assuming there are no currently ongoing scrubs.
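If you’d rather try that from the command line than the Web-UI, the rough btrfs equivalent is something like the following sketch (substitute your own pool mount point; the -f flag forces a new scrub even if the existing scrub state looks stale):
# force-start a fresh scrub despite stale/damaged scrub state
btrfs scrub start -f /mnt2/pool-name-here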
As can be seen, the problem seems to be in the simple validation of the pool id, as far as the TypeError goes, but the larger problem is why the pool id lookup is incorrect. I haven’t seen this before. Have you rebooted recently, ie since we made the pool API changes? Or at least restarted the rockstor service?
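For completeness, restarting the rockstor service from the command line is usually just the following (a sketch, assuming the stock systemd unit name):
# restart the main Rockstor service so it picks up the pool API changes
systemctl restart rockstor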
This looks to be a different problem to the others reported here.
I’ll try tonight but I reboot nightly and use wake at 6am (giving the machine a full hour before attempting a scrub). So I’m sure the new stuff is loading. I recall a manual scrub working a few weeks ago but I’ll verify today or tomorrow at the latest.
Thanks for contributing to this forum thread by way of confirmation. Very much looks like you and @magicalyak now have examples of this issue. I have had another quick look at this and have as a result opened an issue:
I now have a suspicion of what has happened but need a little more time to investigate. As I currently don’t have a reproducer here I may well have to ask you both to test a proposed fix (pending); assuming I don’t manage to reproduce this issue locally.
Thanks for your persistence on this one and I hope to have a proposed fix in due course, all interim progress will be noted in the referenced issue.
Happy to test. Just let me know. Btw a scrub run manually on the CLI works fine.
Also is there a feature request to add a “run now” option for scheduled tasks?
Could you both look to see if you have any scrubs in the ‘Task history’ (clicking on the “Last run” entry) for the problem pool that are in status ‘running’ or ‘started’? I’ve got a hunch. It may be that if there are multiple instances of these states in the history we trip ourselves up.
I checked the task history, see the appended screen dump. There are only finished and “error” entries. The last four ended with “error”, and there is one more “error” in between successful ones. Another thing I notice is that they all appear to finish in about 2 seconds. However, if I do btrfs scrub status -R the scrub is correctly reported as taking almost 3 hours, so I assume the Web-UI reporting is just a cosmetic issue.
I will check later tonight if I can run scrub jobs manually and/or if it helps to delete and reschedule the job. Another thought: what if I manually clear the history file?
[root@nitrogen ~]# btrfs scrub status -R /mnt2/pool-1
scrub status for 2815c029-9214-4217-9571-2d7fb501da0c
scrub started at Mon Jul 24 01:42:02 2017 and finished after 02:48:09
data_extents_scrubbed: 58144551
tree_extents_scrubbed: 318195
data_bytes_scrubbed: 3700702498816
tree_bytes_scrubbed: 5213306880
read_errors: 0
csum_errors: 0
verify_errors: 0
no_csum: 223040
csum_discards: 0
super_errors: 0
malloc_errors: 0
uncorrectable_errors: 0
unverified_errors: 0
corrected_errors: 0
last_physical: 3778576121856
Funny you should say that, as if one removes a scheduled scrub via the bin icon in Actions it removes the history of that scrub, and reinstating the same scrub will not give access to the old scrub history. However the pool details page retains a list of all scrubs done, manual (via Web-UI) or scheduled. My current thought is that you both have duplicate pid (in this case the process id of the scrub job) entries against a pool in your storageadmin_poolscrub database tables. I’ll try and work out an SQL command to extract this info by way of a test. I’m a bit rubbish at SQL so this may take me some time.
Thanks folks, I think that’s given me another angle to explore. Back in a bit.
OK, so I think this might be worth a try. Please execute the following commands (carefully) as root to see if your systems have duplicate pid (process id) entries for a given pool in storageadmin_poolscrub:
psql -U rocky storageadmin
(enter the password when prompted: rocky)
SELECT pool_id,pid,count(*) FROM storageadmin_poolscrub GROUP BY pool_id,pid HAVING count(*) >1;
and if you could paste the output (if any) here that might help; I’m still a little in the dark on this one unfortunately.
And to exit the postgresql console enter the following:
\q
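As an aside, the same check can probably be done non-interactively in one go; a sketch only, assuming the stock database credentials (user rocky, password rocky, database storageadmin):
# run the duplicate-pid query without entering the interactive psql console
PGPASSWORD=rocky psql -U rocky -d storageadmin -c "SELECT pool_id, pid, count(*) FROM storageadmin_poolscrub GROUP BY pool_id, pid HAVING count(*) > 1;"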
Also, do either of you have > 1 scrub jobs listed as started or running in the pool’s detail page scrub tab?
I’ll have to leave this one for a bit and circle back around to it as this may not be the tree to bark up.