RAID5 Balance Failing - Leap 15.6 - v5.0.14

Prior to jumping to 5.0.14 I was on stable and would run a daily balance on my media pool (raid5-1). Ever since the update to 5.0.14 (fresh install, pools imported), this balance has failed almost immediately, with nothing showing up in the dmesg logs other than "balance: ended with status: -5". I've tried both my previously scripted balance and the balance command from the web-ui, and both fail. All the data appears to be there, and the configs for my docker containers on this pool all come up without issue.
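
For reference, the scripted version boils down to something like this (the usage/limit values match the "balance: start" line in the dmesg output below; the wrapper itself is just a sketch):

#!/bin/bash
# Sketch of the daily balance job; filter values match the
# "balance: start -dusage=85,limit=10 ..." line in the dmesg output below.
# System chunks pick up the metadata filters implicitly.
POOL=/mnt2/Media

btrfs balance start -dusage=85,limit=10 -musage=85,limit=10 "$POOL" \
  || logger -t media-balance "balance failed with exit code $?"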

Dmesg output:

rockstor1:~ # dmesg | grep "BTRFS"
[    8.424742] BTRFS: device label ROOT devid 1 transid 69930 /dev/sdi4 scanned by (udev-worker) (430)
[    8.559169] BTRFS: device label Media devid 5 transid 4294346 /dev/sda scanned by (udev-worker) (422)
[    8.564140] BTRFS: device label Media devid 6 transid 4294346 /dev/sdb scanned by (udev-worker) (432)
[    8.577875] BTRFS: device label Media devid 7 transid 4294346 /dev/sdd scanned by (udev-worker) (412)
[    8.583802] BTRFS: device label Scratch devid 1 transid 7194986 /dev/sdf scanned by (udev-worker) (440)
[    8.587710] BTRFS: device label Media devid 4 transid 4294346 /dev/sdh scanned by (udev-worker) (426)
[    8.601289] BTRFS: device label Scratch devid 2 transid 7194986 /dev/sde scanned by (udev-worker) (446)
[    8.656960] BTRFS: device label Media devid 3 transid 4294346 /dev/sdc scanned by (udev-worker) (408)
[    9.228776] BTRFS: device label rockon_root devid 1 transid 57030 /dev/sdk scanned by (udev-worker) (429)
[    9.303109] BTRFS info (device sdi4): using crc32c (crc32c-intel) checksum algorithm
[    9.303191] BTRFS info (device sdi4): disk space caching is enabled
[    9.311898] BTRFS info (device sdi4): enabling ssd optimizations
[    9.311957] BTRFS info (device sdi4): auto enabling async discard
[   71.428145] BTRFS info (device sdk): using crc32c (crc32c-intel) checksum algorithm
[   71.428241] BTRFS info (device sdk): use no compression
[   71.428303] BTRFS info (device sdk): using free space tree
[   71.460014] BTRFS info (device sdk): enabling ssd optimizations
[   71.460083] BTRFS info (device sdk): auto enabling async discard
[   71.499551] BTRFS info (device sdf): using crc32c (crc32c-intel) checksum algorithm
[   71.499631] BTRFS info (device sdf): use no compression
[   71.499686] BTRFS info (device sdf): disk space caching is enabled
[   80.439927] BTRFS info (device sdc): using crc32c (crc32c-intel) checksum algorithm
[   80.440010] BTRFS info (device sdc): use no compression
[   80.440065] BTRFS info (device sdc): disk space caching is enabled
[  381.150452] BTRFS info (device sdc): balance: start -dusage=85,limit=10 -musage=85,limit=10 -susage=85,limit=10
[  381.152000] BTRFS info (device sdc): relocating block group 45761820557312 flags data|raid5
[  392.063408] BTRFS info (device sdc): found 18 extents, stage: move data extents
[  392.742247] BTRFS info (device sdc): found 18 extents, stage: update data pointers
[  393.190904] BTRFS info (device sdc): relocating block group 43267484745728 flags data|raid5
[  411.871904] BTRFS info (device sdc): balance: ended with status: -5
[  562.973437] BTRFS info (device sdc): balance: start -d -m -s
[  562.974682] BTRFS info (device sdc): relocating block group 45767189266432 flags data|raid5
[  573.400294] BTRFS info (device sdc): found 18 extents, stage: move data extents
[  573.811803] BTRFS info (device sdc): found 18 extents, stage: update data pointers
[  574.195688] BTRFS info (device sdc): relocating block group 45766115524608 flags metadata|raid1
[  584.170300] BTRFS info (device sdc): found 11066 extents, stage: move data extents
[  585.039617] BTRFS info (device sdc): relocating block group 45757525590016 flags data|raid5
[  608.600024] BTRFS info (device sdc): found 34 extents, stage: move data extents
[  608.902413] BTRFS info (device sdc): found 34 extents, stage: update data pointers
[  609.243256] BTRFS info (device sdc): relocating block group 45753230622720 flags data|raid5
[  631.624949] BTRFS info (device sdc): found 35 extents, stage: move data extents
[  631.994548] BTRFS info (device sdc): found 35 extents, stage: update data pointers
[  632.382191] BTRFS info (device sdc): relocating block group 45748935655424 flags data|raid5
[  654.662641] BTRFS info (device sdc): found 34 extents, stage: move data extents
[  655.093814] BTRFS info (device sdc): found 34 extents, stage: update data pointers
[  655.371374] BTRFS info (device sdc): relocating block group 45744640688128 flags data|raid5
[  679.541004] BTRFS info (device sdc): found 36 extents, stage: move data extents
[  679.961845] BTRFS info (device sdc): found 36 extents, stage: update data pointers
[  680.507792] BTRFS info (device sdc): relocating block group 45740345720832 flags data|raid5
[  704.115503] BTRFS info (device sdc): found 33 extents, stage: move data extents
[  704.475737] BTRFS info (device sdc): found 33 extents, stage: update data pointers
[  704.823672] BTRFS info (device sdc): relocating block group 45736050753536 flags data|raid5
[  726.714378] BTRFS info (device sdc): found 35 extents, stage: move data extents
[  727.030713] BTRFS info (device sdc): found 35 extents, stage: update data pointers
[  727.348683] BTRFS info (device sdc): relocating block group 45731755786240 flags data|raid5
[  750.702239] BTRFS info (device sdc): found 33 extents, stage: move data extents
[  751.082542] BTRFS info (device sdc): found 33 extents, stage: update data pointers
[  751.431438] BTRFS info (device sdc): relocating block group 45727460818944 flags data|raid5
[  772.952816] BTRFS info (device sdc): found 34 extents, stage: move data extents
[  773.267128] BTRFS info (device sdc): found 34 extents, stage: update data pointers
[  773.587739] BTRFS info (device sdc): relocating block group 45723165851648 flags data|raid5
[  797.411807] BTRFS info (device sdc): found 35 extents, stage: move data extents
[  797.762061] BTRFS info (device sdc): found 35 extents, stage: update data pointers
[  798.121735] BTRFS info (device sdc): relocating block group 45636192763904 flags data|raid5
[  823.464527] BTRFS info (device sdc): balance: ended with status: -5
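
(For what it's worth, status -5 is just the kernel's -EIO, i.e. the same Input/output error reported at the command line; a quick sanity check of the mapping:)

# decode errno 5 on Linux
python3 -c 'import errno, os; print(errno.errorcode[5], os.strerror(5))'
EIO Input/output error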

btrfs fi show output:

rockstor1:~ # btrfs fi show
Label: 'ROOT'  uuid: 46befa9e-b399-480d-b10d-338a4c875615
        Total devices 1 FS bytes used 8.23GiB
        devid    1 size 230.82GiB used 10.05GiB path /dev/sdi4

Label: 'rockon_root'  uuid: 3214523d-00c7-4503-94cd-e4a28380ba2b
        Total devices 1 FS bytes used 27.77GiB
        devid    1 size 238.47GiB used 32.02GiB path /dev/sdk

Label: 'Scratch'  uuid: 0eec16fe-5dd6-46d8-b325-0778e9b4598a
        Total devices 2 FS bytes used 1.13TiB
        devid    1 size 1.82TiB used 1.14TiB path /dev/sdf
        devid    2 size 1.82TiB used 1.14TiB path /dev/sde

Label: 'Media'  uuid: 525f9c00-5f14-43ad-abdb-65f1ef24a9f9
        Total devices 5 FS bytes used 7.18TiB
        devid    3 size 3.64TiB used 1.93TiB path /dev/sdc
        devid    4 size 3.64TiB used 1.93TiB path /dev/sdh
        devid    5 size 3.64TiB used 1.93TiB path /dev/sda
        devid    6 size 3.64TiB used 1.93TiB path /dev/sdb
        devid    7 size 3.64TiB used 1.93TiB path /dev/sdd

Media pool usage:

rockstor1:~ # btrfs fi usage /mnt2/Media
Overall:
    Device size:                  18.19TiB
    Device allocated:              9.65TiB
    Device unallocated:            8.54TiB
    Device missing:                  0.00B
    Device slack:                    0.00B
    Used:                          8.98TiB
    Free (estimated):              7.36TiB      (min: 4.80TiB)
    Free (statfs, df):             7.36TiB
    Data ratio:                       1.25
    Metadata ratio:                   2.00
    Global reserve:              512.00MiB      (used: 0.00B)
    Multiple profiles:                  no

Data,RAID5: Size:7.71TiB, Used:7.17TiB (93.10%)
   /dev/sdc        1.93TiB
   /dev/sdh        1.93TiB
   /dev/sda        1.93TiB
   /dev/sdb        1.93TiB
   /dev/sdd        1.93TiB

Metadata,RAID1: Size:9.00GiB, Used:8.12GiB (90.18%)
   /dev/sdc        4.00GiB
   /dev/sdh        4.00GiB
   /dev/sda        3.00GiB
   /dev/sdb        3.00GiB
   /dev/sdd        4.00GiB

System,RAID1: Size:32.00MiB, Used:544.00KiB (1.66%)
   /dev/sdc       32.00MiB
   /dev/sdh       32.00MiB

Unallocated:
   /dev/sdc        1.71TiB
   /dev/sdh        1.71TiB
   /dev/sda        1.71TiB
   /dev/sdb        1.71TiB
   /dev/sdd        1.71TiB

Any thoughts on how I can proceed, or how I can get more info about what is causing the failure?

@criddle, interesting that it would fail right after the 15.6 upgrade. It would be a big coincidence if you also had a disk or other hardware failure at exactly that time.

When you run it at the command line, are you running it without filters? Or have you tried the usage=percentage filter (e.g. `-dusage=90`) to focus only on the low-usage block groups?

But that still doesn't explain why you can't see any failure/error codes …

Hey Dan,

Yes, I've tried my regular usage and limit cutoffs, as well as lowering them to 10% just now. Same odd behavior: it works on a couple of block groups and then I get the I/O error, but nothing else actionable.
This command gets the following response (with the corresponding dmesg lines below):

rockstor1:~ # btrfs balance start -v -dusage=10 -musage=10 /mnt2/Media
Dumping filters: flags 0x7, state 0x0, force is off
  DATA (flags 0x2): balancing, usage=10
  METADATA (flags 0x2): balancing, usage=10
  SYSTEM (flags 0x2): balancing, usage=10
ERROR: error during balancing '/mnt2/Media': Input/output error
There may be more info in syslog - try dmesg | tail
[29775.107183] BTRFS info (device sdc): balance: start -dusage=10 -musage=10 -susage=10
[29775.108856] BTRFS info (device sdc): relocating block group 43370563960832 flags system|raid1
[29775.417120] BTRFS info (device sdc): found 31 extents, stage: move data extents
[29775.678171] BTRFS info (device sdc): relocating block group 43190175334400 flags data|raid5
[29780.391602] BTRFS info (device sdc): found 69 extents, stage: move data extents
[29782.433781] BTRFS info (device sdc): found 69 extents, stage: update data pointers
[29783.165350] BTRFS info (device sdc): relocating block group 43185880367104 flags data|raid5
[29785.913789] BTRFS info (device sdc): found 16 extents, stage: move data extents
[29786.591831] BTRFS info (device sdc): found 16 extents, stage: update data pointers
[29787.001477] BTRFS info (device sdc): relocating block group 43168700497920 flags data|raid5
[29792.207023] BTRFS info (device sdc): balance: ended with status: -5

Did you run a scrub recently, or ever? Maybe there is a checksum error on one of the files (for some unknown reason, since that would not be related to your upgrade per se).
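
If not, it would be worth kicking one off in the foreground so you see the result immediately, something like:

# foreground scrub (-B) with per-device stats (-d)
btrfs scrub start -Bd /mnt2/Media

# or, for a background scrub, check progress later with
btrfs scrub status /mnt2/Media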

Do you see possibly any S.M.A.R.T. warnings?
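
Something along these lines would give a quick health verdict for each Media pool member (device names taken from your `btrfs fi show` above):

# quick SMART health verdict for each Media pool member
for d in /dev/sd{a,b,c,d,h}; do echo "== $d =="; smartctl -H "$d"; done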


No SMART warnings on any of my drives, but I did attempt a scrub this morning and was immediately given this error:

ERROR: scrubbing /mnt2/Media failed for device id 3: ret=-1, errno=5 (Input/output error)

And checking dmesg then showed some corrupt files and these scary-looking messages:

[261967.034419] BTRFS error (device sdc): unrepaired sectors detected, full stripe 7231560744960 data stripe 0 errors 14-15
[261967.057551] BTRFS info (device sdc): scrub: not finished on devid 3 with status: -5
[261967.148063] BTRFS error (device sdc): unrepaired sectors detected, full stripe 7231569395712 data stripe 2 errors 14-15
[261967.148565] BTRFS info (device sdc): scrub: not finished on devid 6 with status: -5
[261969.784958] BTRFS error (device sdc): unrepaired sectors detected, full stripe 7231648301056 data stripe 1 errors 15
[261969.819694] BTRFS info (device sdc): scrub: not finished on devid 5 with status: -5

None of the devices in the array show any errors in their stats, even as these corruption messages come up:

rockstor1:~ # btrfs device stats /mnt2/Media
[/dev/sdc].write_io_errs    0
[/dev/sdc].read_io_errs     0
[/dev/sdc].flush_io_errs    0
[/dev/sdc].corruption_errs  0
[/dev/sdc].generation_errs  0
[/dev/sdh].write_io_errs    0
[/dev/sdh].read_io_errs     0
[/dev/sdh].flush_io_errs    0
[/dev/sdh].corruption_errs  0
[/dev/sdh].generation_errs  0
[/dev/sda].write_io_errs    0
[/dev/sda].read_io_errs     0
[/dev/sda].flush_io_errs    0
[/dev/sda].corruption_errs  0
[/dev/sda].generation_errs  0
[/dev/sdb].write_io_errs    0
[/dev/sdb].read_io_errs     0
[/dev/sdb].flush_io_errs    0
[/dev/sdb].corruption_errs  0
[/dev/sdb].generation_errs  0
[/dev/sdd].write_io_errs    0
[/dev/sdd].read_io_errs     0
[/dev/sdd].flush_io_errs    0
[/dev/sdd].corruption_errs  0
[/dev/sdd].generation_errs  0
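
(These counters persist until explicitly cleared, so nothing recent should have slipped by unnoticed; once this is resolved they can be zeroed so any new errors stand out:)

# reset the per-device error counters once this is resolved
btrfs device stats -z /mnt2/Media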

I should add that the last successful scrub was Aug 1; September was skipped because I forgot to re-enable the job after the new install :upside_down_face:
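
As a backstop for the web-ui job, a plain cron entry works too, e.g. a monthly scrub:

# crontab entry: scrub the Media pool at 03:00 on the 1st of each month
0 3 1 * * /usr/sbin/btrfs scrub start /mnt2/Media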

This seems to be newer btrfs kernel functionality, unfortunately not yet described in the btrfs documentation. I found this mailing list discussion … which is not good news, but it also seems possible to remediate (with some data loss, unfortunately):

https://lore.kernel.org/all/c626e033-eaec-4392-9215-b57dcb165b4e@suse.com/T/

Since this seems to be happening at the core of btrfs, I suspect nobody on the forum has a good answer for you here. What you could do is ask for help from upstream btrfs:

https://rockstor.com/docs/data_loss.html#asking-for-help
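
When you do, it helps to gather the basics they will ask for up front, something like:

# collect the details an upstream report usually needs
uname -a
btrfs --version
btrfs fi show
btrfs fi usage /mnt2/Media
dmesg | grep -i btrfs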


Thanks for this, Dan! Through this and the libera.chat #btrfs channel I was able to find that in situations like mine, when given an error similar to

[340187.174817] BTRFS error (device sdc): unrepaired sectors detected, full stripe 7231569395712 data stripe 2 errors 14-15

this for loop can help track down the files:

for i in {14..15} ; do btrfs insp logical -o $((7231569395712 + (5 * 64 + $i) * 4096)) /mnt2/Media ; done

where the range in braces is your errors range, the large number is the full-stripe address from the error, and 5 is the number of devices in the affected pool.
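
Or, with the same values pulled out as placeholders for anyone adapting it:

# fill these in from your own "unrepaired sectors" dmesg line
full_stripe=7231569395712   # the "full stripe" address
err_lo=14; err_hi=15        # the "errors" range
nr_devices=5                # devices in the affected pool

for i in $(seq "$err_lo" "$err_hi"); do
    btrfs insp logical -o $(( full_stripe + (nr_devices * 64 + i) * 4096 )) /mnt2/Media
done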

Cleaning out the affected files now, then running another scrub.
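
For anyone following along later, the cleanup step amounts to something like this (the path here is hypothetical; logical-resolve prints the real ones):

# hypothetical path: use whatever logical-resolve actually reported
rm "/mnt2/Media/path/to/affected-file"   # restore from backup afterwards
btrfs scrub start -Bd /mnt2/Media        # confirm the pool now scrubs clean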


An addendum: when the above doesn't return any files, you can step through the entire stripe set instead:

for x in $(seq 0 $(( (nr_disks - 1) * 16 - 1)) ); do btrfs insp logical -o $((Address + $x * 4096)) /mountpoint; done | sort -u
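
For example, with this pool's numbers filled in (5 disks means 4 data stripes of 16 sectors, so offsets 0 through 63):

nr_disks=5
Address=7231569395712   # the "full stripe" address from dmesg
for x in $(seq 0 $(( (nr_disks - 1) * 16 - 1 )) ); do
    btrfs insp logical -o $(( Address + x * 4096 )) /mnt2/Media
done | sort -u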