Scrubs all end with status unknown

I’m having a small problem with my Rockstor install.

When a new scrub is started, the UI almost instantly returns a scrub with a status of "unknown", instead of the normal "Running" (followed by "finished"). And if I click the status, there is no data to display.

In the CLI, I can see the scrub being started, and after some time (depending on what I scrub) it finishes with status 0, which is a normal, uneventful scrub AFAIR.
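For reference, I'm watching it from the CLI with the usual status command against the pool's mount point, e.g.:

btrfs scrub status /mnt2/RSPool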

What could be going on?

I am on 4.5.9-1.

@KarstenV does it make a difference in behavior if you manually start one versus observing one that was scheduled?


Behavior is the same either way.

Scheduled scrubs start at the scheduled time, but an end time is not shown in the task history, and the status is unknown.

Under the pool's scrub history it's shown like this:

@KarstenV Hello again.
Is the pool mounted degraded, or does it have a sticky degraded mount option?
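If you want to check from the command line, findmnt (part of util-linux, so it should already be on the system) can list the active options for all btrfs mounts; a degraded pool would show "degraded" in the OPTIONS column:

findmnt -t btrfs -o TARGET,SOURCE,OPTIONS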

I’m thinking of the following issue:

But this doesn't match your observed "unknown" status. Might be worth giving that issue a quick read, however.

Hope that helps.


They're not mounted degraded.

I am able to use the pool as usual and write to it.

The rockstor boot pool and my main RAID1 pool are mounted normally, and both show the same behaviour.

Here are the mount options for my RAID1 pool: rw,relatime,space_cache,subvolid=5,subvol=/

I have never had to force a mount or something like that.

"btrfs fi sh" (short for "btrfs filesystem show") gives this:

Label: 'ROOT' uuid: 4ac51b0f-afeb-4946-aad1-975a2a26c941
Total devices 1 FS bytes used 10.42GiB
devid 1 size 72.46GiB used 20.06GiB path /dev/sdg4

Label: 'RSPool' uuid: 12bf3137-8df1-4d6b-bb42-f412e69e94a8
Total devices 7 FS bytes used 5.62TiB
devid 9 size 2.73TiB used 2.31TiB path /dev/sde
devid 10 size 1.82TiB used 1.40TiB path /dev/sdc
devid 11 size 1.82TiB used 1.40TiB path /dev/sdd
devid 12 size 1.82TiB used 1.40TiB path /dev/sdf
devid 13 size 2.73TiB used 2.31TiB path /dev/sdb
devid 15 size 2.73TiB used 2.31TiB path /dev/sdh
devid 16 size 3.64TiB used 1.23TiB path /dev/sda

And even though the GUI shows the status as unknown, the scrub runs fine in the background.

"btrfs scrub status" gives this for, e.g., my root drive:

Scrub started: Thu May 25 21:05:05 2023
Status: finished
Duration: 0:01:27
Total to scrub: 10.61GiB
Rate: 124.85MiB/s
Error summary: no errors found

The last scrub for my main pool (a scheduled one):

Scrub started: Sat Apr 1 22:00:03 2023
Status: finished
Duration: 7:25:34
Total to scrub: 11.24TiB
Rate: 440.87MiB/s
Error summary: no errors found

There is nothing wrong with my pools, I think; the GUI just doesn't catch that the scrub started, and therefore can't report the status or end result.


@KarstenV Thanks for the extra info.

These days we use both the regular btrfs scrub status and its raw variant btrfs scrub status -R, and our most recent work in this area was the following, which added more tests to our scrub interpretation code:

Could you also give the output of the raw scrub command for your data drive, plus the output of the btrfs version command, so we can modify our scrub status command parser accordingly:

btrfs version
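And for the raw variant, something like the following, substituting your pool's mount point:

btrfs scrub status -R /mnt2/RSPool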

I.e. see:

What I'm thinking here is that there has been a btrfs backport since we last visited that code, and we are then misinterpreting the btrfs output as if it were from a legacy btrfs version - i.e. the proposed backport has now outdated our assumptions based on the btrfs versioning.
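A quick way to compare the two sides of that pairing, the userspace btrfs-progs on one hand and the running kernel (and hence the kernel-side btrfs) on the other, is:

btrfs version
uname -r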

Hope that helps, and thanks for the engagement on this issue.


I'll try to provide the requested data 🙂

I ran a scrub overnight on my RAID1 Pool, so the data should be more up to date.

Here are the -R results of my boot pool and my RAID1 pool.

Boot Pool:

btrfs scrub status -R /dev/sdg4
UUID: 4ac51b0f-afeb-4946-aad1-975a2a26c941
Scrub started: Thu May 25 21:05:05 2023
Status: finished
Duration: 0:01:27
data_extents_scrubbed: 286399
tree_extents_scrubbed: 24498
data_bytes_scrubbed: 10987966464
tree_bytes_scrubbed: 401375232
read_errors: 0
csum_errors: 0
verify_errors: 0
no_csum: 226394
csum_discards: 2456215
super_errors: 0
malloc_errors: 0
uncorrectable_errors: 0
unverified_errors: 0
corrected_errors: 0
last_physical: 43911217152

RAID1 pool:

btrfs scrub status -R //mnt2/RSPool
UUID: 12bf3137-8df1-4d6b-bb42-f412e69e94a8
Scrub started: Thu May 25 21:20:22 2023
Status: finished
Duration: 7:19:25
data_extents_scrubbed: 188784274
tree_extents_scrubbed: 850863
data_bytes_scrubbed: 12348365783040
tree_bytes_scrubbed: 13940539392
read_errors: 0
csum_errors: 0
verify_errors: 0
no_csum: 801856
csum_discards: 3013935884
super_errors: 0
malloc_errors: 0
uncorrectable_errors: 0
unverified_errors: 0
corrected_errors: 0
last_physical: 1361572790272

"btrfs version" gives this:

btrfs-progs v6.2.1

I think I am still on Leap 15.3.


As a side note, if I do a zypper refresh I get errors:

zypper refresh
Repository 'Kernel_stable_Backport' is up to date.
Repository 'Leap_15_3' is up to date.
Repository 'Leap_15_3_Updates' is up to date.
Repository 'Rockstor-Testing' is up to date.
Retrieving repository 'filesystems' metadata ...[error]
Repository 'filesystems' is invalid.
[filesystems|https://download.opensuse.org/repositories/filesystems/15.3/] Valid metadata not found at specified URL
History:

Please check if the URIs defined for this repository are pointing to a valid repository.
Skipping repository 'filesystems' because of the above error.
Repository 'home_rockstor' is up to date.
Repository 'home_rockstor_branches_Base_System' is up to date.
Repository 'Update repository of openSUSE Backports' is up to date.
Repository 'Update repository with updates from SUSE Linux Enterprise 15' is up to date.

Could this have something to do with the problems?

I’ll quickly answer that one: I personally do not think so.
You're seeing this error because this repository is no longer published by upstream for Leap 15.3; they only publish for Leap 15.4 and 15.5 now. The only consequence, however, is that you will no longer receive updates for the packages from that repository (which include btrfsprogs).

If you have not updated your kernel since then, you may still be on kernel 6.2, but it looks like the kernel in the Kernel_stable_Backport repo is now on 6.3, so you may be out of sync between btrfsprogs and kernel versions. I can't remember for sure, but the name of the package may be kernel-default if you'd like to verify:

zypper info kernel-default
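As an aside, "cat /etc/os-release" will confirm which Leap release you are actually on. And if the repository error gets noisy in the meantime, disabling the dead repo should be harmless and reversible (its alias is 'filesystems', as shown in your zypper output):

zypper modifyrepo --disable filesystems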

That being said, it may not be the cause of your issue... I don't remember the exact format that Rockstor expects from btrfs scrub status -R, so we'll have to wait for @phillxnet's expertise here.

Hope this helps,


"zypper info kernel-default" reported this:

zypper info kernel-default
Retrieving repository 'filesystems' metadata ...[error]
Repository 'filesystems' is invalid.
[filesystems|https://download.opensuse.org/repositories/filesystems/15.3/] Valid metadata not found at specified URL
History:

Please check if the URIs defined for this repository are pointing to a valid repository.
Warning: Skipping repository 'filesystems' because of the above error.
Some of the repositories have not been refreshed because of an error.
Loading repository data...
Warning: Repository 'Leap_15_3_Updates' appears to be outdated. Consider using a different mirror or server.
Warning: Repository 'Update repository of openSUSE Backports' appears to be outdated. Consider using a different mirror or server.
Reading installed packages...

Information for package kernel-default:

Repository : Kernel_stable_Backport
Name : kernel-default
Version : 6.3.4-lp154.2.1.gc5b4604
Arch : x86_64
Vendor : obs://build.opensuse.org/Kernel
Installed Size : 299.7 MiB
Installed : Yes
Status : up-to-date
Source package : kernel-default-6.3.4-lp154.2.1.gc5b4604.nosrc
Upstream URL : https://www.kernel.org/
Summary : The Standard Kernel
Description :
The standard kernel for both uniprocessor and multiprocessor systems.

Source Timestamp: 2023-05-25 04:46:56 +0000
GIT Revision: c5b4604852c852b09d2b5d0753f5c34058b4f1c3
GIT Branch: stable

So it looks like the kernel is perhaps on a newer version than btrfs-progs. I'll let @phillxnet comment on that.


I just checked on my environment (with the kernel backport).

The kernel is the same as yours:
Version : 6.3.4-lp154.2.1.gc5b4604

"btrfs version" returns 6.3

(and that’s on Leap 15.4)
so that would indicate that, with the filesystems repo no longer updating under Leap 15.3, the btrfs-progs are out of sync, since they come from that repo...
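If you want to double-check that on your system, zypper reports which repository a package was installed from (the openSUSE package name is btrfsprogs):

zypper info btrfsprogs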

For fun, I started a new scrub to make sure one was running on the latest versions above, and the status correctly adjusted to "running", and the stats output was populated correctly... again, not solving it, but just providing some additional observation.

WebUI output:

vs. terminal output (I was not synced when refreshing, so the values obviously differ a bit)



Today I finally got the time to update my Rockstor system to Leap 15.4.

I followed the guide available here:

https://rockstor.com/docs/howtos/15-3_to_15-4.html

That went fine, and the system now reports Leap 15.4 and no longer errors out on the filesystems repo during "zypper refresh".

So everything seems fine.

Except my scrubs still error out immediately...
So it does not seem that the out-of-sync repository was the problem.

Any new ideas?
Any new Ideas?