Balance failed after adding new disks

Hello all.

I have a 10-disk RAID10 setup with the following disks:

2x1TB
2x2TB
2x3TB
4x4TB

Previously I had an 8-disk setup, which was fully balanced: the same disks minus two of the 4TBs.

So I added the two new 4TB drives via the GUI, and it started a balance. Everything was going well until this morning. The GUI is showing that the balance failed with the following message:

Error running a command. cmd = btrfs balance start -mconvert=raid10 -dconvert=raid10 /mnt2/MainPool. rc = 1. stdout = ['']. stderr = ["ERROR: error during balancing '/mnt2/MainPool': No space left on device", 'There may be more info in syslog - try dmesg | tail', '']

There is plenty of space; the overview tab shows:

Space free - 7.02 TB
Space used - 5.79 TB

What seems very strange is that I got the out-of-space error while adding drives.

At some point, it looks like a new balance was initiated automatically, not via the GUI. If I SSH into the box and run btrfs balance status, it shows a balance as running.
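For reference, this is roughly what I am running over SSH to check on it (the mount point is the one from the error above):

btrfs balance status /mnt2/MainPool
dmesg | tail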

So, I’m not exactly sure where I sit. I think my questions are:

  1. Why did the GUI balance say it ran out of space while adding disks?
  2. What does a failed balance mean for my data?
  3. If the second balance completes, should I assume everything is ok?

Could it be that BTRFS thinks I am out of space, when in reality I am not? Do I need to execute the command:

btrfs filesystem resize devid:amount /mount-point

Or does the GUI do this automatically when adding disks and rebalancing? Sorry, I am new to BTRFS and just trying to understand how best to run things.
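My understanding (please correct me if I am wrong) is that btrfs device add already uses the whole device, so an explicit resize should only be needed if a device somehow wasn't added at its full size. Something like the following should show the size each devid reports, and grow a device to its maximum if needed; the devid 9 here is just a placeholder for one of the new drives:

btrfs filesystem show /mnt2/MainPool
btrfs filesystem resize 9:max /mnt2/MainPool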

I will take over this thread, since I am getting the same error in a similar scenario.

However, the funny thing is that everything seems to be working fine. Should I be worried, can I live my life in peace, or is there a more specific forum I should ask this question on?

Thanks! Here is the actual issue, with logs:

  1. Using Rockstor 4.6.0.0 (Linux: 5.14.21-150400.24.38-default) with two 8TB disks in RAID1. The disks still have 50% free space. The pool has no compression, and even though quotas are enabled, Rockstor overrides that by disabling them anyway.
  2. Bought and installed another 8TB disk
  3. Web-UI > STORAGE > Pools > the data pool > Resize/ReRaid Pool > Add disk > Also change RAID profile from RAID1 to RAID1C3
  4. Balance operation completes to 100%, but with an error:

Web-UI:

Traceback (most recent call last): File "/opt/rockstor/.venv/lib/python2.7/site-packages/huey/api.py", line 360, in _execute task_value = task.execute() File "/opt/rockstor/.venv/lib/python2.7/site-packages/huey/api.py", line 724, in execute return func(*args, **kwargs) File "/opt/rockstor/src/rockstor/fs/btrfs.py", line 2101, in start_balance raise e CommandException: Error running a command. cmd = btrfs balance start -mconvert=raid1c3 -dconvert=raid1c3 -f /mnt2/data. rc = 1. stdout = ['']. stderr = ["ERROR: error during balancing '/mnt2/data': No space left on device", 'There may be more info in syslog - try dmesg | tail', '']

dmesg:

[ 1107.215171] BTRFS info (device sdc): disk added /dev/sdb
[ 1111.396154] BTRFS info (device sdc): balance: start -f -dconvert=raid1c3 -mconvert=raid1c3 -sconvert=raid1c3
[ 1111.396316] BTRFS info (device sdc): setting incompat feature flag for RAID1C34 (0x800)
[ 1111.397158] BTRFS info (device sdc): relocating block group 4685873283072 flags metadata|raid1c3
[…]
[32265.087954] BTRFS info (device sdc): found 3071 extents, stage: move data extents
[32265.587761] BTRFS info (device sdc): relocating block group 22020096 flags system|raid1
[32265.754801] BTRFS info (device sdc): 1 enospc errors during balance
[32265.754817] BTRFS info (device sdc): balance: ended with status: -28

btrfs filesystem show:

Label: 'ROOT' uuid: 45772de2-2706-496e-aa07-d272ca6f7abd
Total devices 1 FS bytes used 14.85GiB
devid 1 size 230.82GiB used 17.27GiB path /dev/nvme0n1p4

Label: 'data' uuid: 1f587ed3-eb52-4a74-adbc-ef4d7d9152c3
Total devices 3 FS bytes used 2.92TiB
devid 1 size 7.28TiB used 2.95TiB path /dev/sdc
devid 2 size 7.28TiB used 2.95TiB path /dev/sda
devid 3 size 7.28TiB used 2.95TiB path /dev/sdb

btrfs fi df /mnt2/data:

Data, RAID1C3: total=2.94TiB, used=2.91TiB
System, RAID1C3: total=32.00MiB, used=544.00KiB
Metadata, RAID1C3: total=10.00GiB, used=4.45GiB
GlobalReserve, single: total=512.00MiB, used=0.00B

btrfs fi usage /mnt2/data:

Overall:
Device size: 21.83TiB
Device allocated: 8.85TiB
Device unallocated: 12.98TiB
Device missing: 0.00B
Used: 8.75TiB
Free (estimated): 4.35TiB (min: 4.35TiB)
Free (statfs, df): 4.35TiB
Data ratio: 3.00
Metadata ratio: 3.00
Global reserve: 512.00MiB (used: 0.00B)
Multiple profiles: no

Data,RAID1C3: Size:2.94TiB, Used:2.91TiB (99.07%)
/dev/sdc 2.94TiB
/dev/sda 2.94TiB
/dev/sdb 2.94TiB

Metadata,RAID1C3: Size:10.00GiB, Used:4.45GiB (44.49%)
/dev/sdc 10.00GiB
/dev/sda 10.00GiB
/dev/sdb 10.00GiB

System,RAID1C3: Size:32.00MiB, Used:544.00KiB (1.66%)
/dev/sdc 32.00MiB
/dev/sda 32.00MiB
/dev/sdb 32.00MiB

Unallocated:
/dev/sdc 4.33TiB
/dev/sda 4.33TiB
/dev/sdb 4.33TiB

@aremiaskfa I am not entirely sure, but I have seen metadata run out of space while the actual data space was just fine.

Do you have a lot of snapshots?
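If you want to check, something like this should list only the snapshot subvolumes on the pool (path taken from your output above):

btrfs subvolume list -s /mnt2/data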

You could try to "reclaim" space by running a zero-usage balance (which should be really fast) and see whether it frees enough allocated-but-empty chunks for the next full balance to go through without errors.

btrfs balance start -dusage=0 -musage=0 /mnt2/data
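To see whether it actually did anything, you could compare the "total" numbers from btrfs fi df before and after; if empty chunks were reclaimed, those totals should drop while the used numbers stay the same:

btrfs filesystem df /mnt2/data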

Thanks for the help!

I have about 10 snapshots; I never made a snapshot manually.

I ran the command you gave me, and it reported that some chunks had to be relocated.
Then I upped the usage number from 0 to 5, then 25, and finally 50, which relocated even more chunks.
Relocating chunks also lowered the "total" numbers in the btrfs filesystem df output.
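For reference, the filtered balances looked roughly like this, with the same usage value for data and metadata at each step:

btrfs balance start -dusage=5 -musage=5 /mnt2/data
btrfs balance start -dusage=25 -musage=25 /mnt2/data
btrfs balance start -dusage=50 -musage=50 /mnt2/data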

Afterwards I copied the full balance command that the Web-UI had run and that failed after I added the new disk. Before actually running it, I added the ,soft option to the -mconvert and -dconvert switches so the conversion would skip the chunks that already have the target profile.
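In other words, something like this, based on the command from the traceback above:

btrfs balance start -mconvert=raid1c3,soft -dconvert=raid1c3,soft -f /mnt2/data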

Now it says "Done, had to relocate 0 out of 2990 chunks", which hopefully means I'm out of the woods 🙂
