BTRFS Pool Conversion From RAID 1 to RAID 5 Fails

Lightsword · February 27, 2016, 10:14pm

I ran into a problem adding a new HDD to an existing pool and re-balancing it to a new RAID level.
I started with two 4TB WD Red drives in a BTRFS RAID 1 pool. I checked that I am running the latest version, 3.8-11.20. I created a new topic because I didn’t see one that matched exactly, and the similar ones were from almost a year ago.
The pool uses the whole disks and is almost completely full. I purchased a third 4TB WD Red drive and ran the short and long SMART tests on it with the WD Desktop tool. It passed both, so I installed the drive in my Rockstor server and fired it up. The system booted, saw the drive, and let me add it to the existing pool. The Add new disk utility seemed to work: it added the third disk to the pool and started the balance last night. As the balance started, I did something stupid and kicked off a command-line rsync transfer from an old server I’m working to decommission. I then realized I should wait, so I cancelled the rsync transfer and left the server alone until this morning.

I woke up to the share now being 7.28TB with only 1.97GB free. The balance command reported this result:

:2 failed February 27th 2016, 8:04:43 am 0 Error running a command. cmd = ['btrfs', 'balance', 'start', '-mconvert=raid5', '-dconvert=raid5', '/mnt2/Cluster1']. rc = 1. stdout = ['']. stderr = ["ERROR: error during balancing '/mnt2/Cluster1' - No space left on device", 'There may be more info in syslog - try dmesg | tail', '']

The output of dmesg | tail is:

[root@blackhole ~]# dmesg | tail
[27291.841106] BTRFS info (device sdb): relocating block group 100952702976 flags 17
[27292.080869] BTRFS info (device sdb): relocating block group 99878961152 flags 17
[27292.330631] BTRFS info (device sdb): relocating block group 98805219328 flags 17
[27292.559545] BTRFS info (device sdb): relocating block group 97731477504 flags 17
[27292.809210] BTRFS info (device sdb): relocating block group 96657735680 flags 17
[27293.037859] BTRFS info (device sdb): relocating block group 95583993856 flags 17
[27293.276553] BTRFS info (device sdb): relocating block group 94510252032 flags 17
[27293.516306] BTRFS info (device sdb): relocating block group 93436510208 flags 17
[27293.766125] BTRFS info (device sdb): relocating block group 92362768384 flags 17
[27294.923425] BTRFS info (device sdb): 3729 enospc errors during balance
[root@blackhole ~]#
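
The `flags 17` value in the relocation lines above is a bitmask. As a small sketch, assuming the block group flag bits used by the btrfs on-disk format (DATA = 0x1, RAID1 = 0x10, RAID5 = 0x80, and so on), it decodes to RAID1 data chunks, which suggests the balance was still working through the old RAID1 data when it hit the ENOSPC errors. The helper name `decode_block_group_flags` is made up for illustration.

```python
# Block group flag bits (BTRFS_BLOCK_GROUP_* in the kernel headers).
BLOCK_GROUP_FLAGS = {
    1 << 0: "DATA",
    1 << 1: "SYSTEM",
    1 << 2: "METADATA",
    1 << 3: "RAID0",
    1 << 4: "RAID1",
    1 << 5: "DUP",
    1 << 6: "RAID10",
    1 << 7: "RAID5",
    1 << 8: "RAID6",
}


def decode_block_group_flags(flags):
    """Return the names of all flag bits set in a block group flags value."""
    return [name for bit, name in BLOCK_GROUP_FLAGS.items() if flags & bit]


# The value reported by dmesg in this thread.
print(decode_block_group_flags(17))  # ['DATA', 'RAID1']
```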

I’d be happy to pull logs if someone wouldn’t mind explaining how. I didn’t find anything in the forums or on the wiki about pulling logs, and I didn’t want to mess with it further and risk breaking more. My suspicion is that there’s a corrupted file or that the free space is being reported incorrectly. The options I considered were: look through the share to find and delete the file, remove the new HDD and re-add it to the pool, or retry the balance and scrub the disks. In short, I’m happy to try things, but I didn’t want to tinker further without checking what my best course of action would be. I have SSH access, am somewhat comfortable with Linux and BTRFS, and am happy to test things if that’ll help. Thanks so much!

Christopher

Lightsword · February 28, 2016, 9:09pm

#Additional Info

btrfs fi df shows what appears to be two pools, one RAID1 and one RAID5. Is it possible that the system was partway through migrating from one to the other when it failed?
[root@blackhole ~]# btrfs fi df /mnt2/Cluster1/
Data, RAID1: total=3.63TiB, used=3.63TiB
Data, RAID5: total=7.62MiB, used=5.25MiB
System, RAID1: total=32.00MiB, used=560.00KiB
Metadata, RAID1: total=5.00GiB, used=4.28GiB
GlobalReserve, single: total=512.00MiB, used=0.00B
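
One way to spot a half-finished conversion is to look for a chunk type that appears under more than one profile. The following is a minimal sketch that parses the `btrfs fi df` text pasted above; the function names and the regular expression are my own assumptions about the line layout, not part of any btrfs tooling.

```python
import re
from collections import defaultdict

# Output copied from the post above.
FI_DF_OUTPUT = """\
Data, RAID1: total=3.63TiB, used=3.63TiB
Data, RAID5: total=7.62MiB, used=5.25MiB
System, RAID1: total=32.00MiB, used=560.00KiB
Metadata, RAID1: total=5.00GiB, used=4.28GiB
GlobalReserve, single: total=512.00MiB, used=0.00B
"""

# Matches lines such as "Data, RAID1: total=3.63TiB, used=3.63TiB".
LINE_RE = re.compile(r"^(\w+), (\w+): total=([^,\s]+), used=([^,\s]+)$")


def profiles_by_type(text):
    """Map each chunk type (Data, Metadata, ...) to the profiles listed for it."""
    profiles = defaultdict(list)
    for line in text.splitlines():
        match = LINE_RE.match(line.strip())
        if match:
            profiles[match.group(1)].append(match.group(2))
    return dict(profiles)


def mixed_types(text):
    """Return only the chunk types that appear with more than one profile."""
    return {t: p for t, p in profiles_by_type(text).items() if len(p) > 1}


print(mixed_types(FI_DF_OUTPUT))  # {'Data': ['RAID1', 'RAID5']}
```

Here only Data shows two profiles, with a few MiB in RAID5 and the rest still RAID1, which fits a conversion that stopped almost immediately after it started.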

btrfs fi show produced the following results:

[root@blackhole ~]# btrfs fi show
Label: 'rockstor_rockstor' uuid: 8581c98a-e0e1-4b59-8e3d-1a69182989dc
Total devices 1 FS bytes used 1.51GiB
devid 1 size 110.88GiB used 110.88GiB path /dev/sda3

Label: 'Cluster1' uuid: 0df824ef-cae4-4208-b7ea-d8fa4f12b095
Total devices 3 FS bytes used 3.64TiB
devid 1 size 3.64TiB used 3.64TiB path /dev/sdc
devid 2 size 3.64TiB used 3.64TiB path /dev/sdb
devid 3 size 3.64TiB used 7.62MiB path /dev/sdd

[root@blackhole ~]# btrfs fi show /mnt2/Cluster1/
Label: 'Cluster1' uuid: 0df824ef-cae4-4208-b7ea-d8fa4f12b095
Total devices 3 FS bytes used 3.64TiB
devid 1 size 3.64TiB used 3.64TiB path /dev/sdc
devid 2 size 3.64TiB used 3.64TiB path /dev/sdb
devid 3 size 3.64TiB used 7.62MiB path /dev/sdd
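
The devid lines above give a size and an allocated ("used") figure per device, so the unallocated space on each one can be worked out. This is a rough sketch under my own assumptions about the line layout; the figures are the rounded values that `btrfs fi show` prints, so a result of zero only means "less than the rounding". A balance that converts profiles has to allocate new chunks, and a RAID5 chunk needs room on more than one device, so two fully allocated devices would be consistent with the ENOSPC errors.

```python
import re

# Device lines copied from the post above.
FI_SHOW_DEVICES = """\
devid 1 size 3.64TiB used 3.64TiB path /dev/sdc
devid 2 size 3.64TiB used 3.64TiB path /dev/sdb
devid 3 size 3.64TiB used 7.62MiB path /dev/sdd
"""

UNITS = {"B": 1, "KiB": 1024, "MiB": 1024**2, "GiB": 1024**3, "TiB": 1024**4}
DEVICE_RE = re.compile(r"^devid\s+(\d+)\s+size\s+(\S+)\s+used\s+(\S+)\s+path\s+(\S+)$")
SIZE_RE = re.compile(r"^([\d.]+)([KMGT]iB|B)$")


def to_bytes(size):
    """Convert a string such as '3.64TiB' to a (rounded) number of bytes."""
    number, unit = SIZE_RE.match(size).groups()
    return float(number) * UNITS[unit]


def unallocated_per_device(text):
    """Return {devid: size - used} in bytes for each devid line."""
    result = {}
    for line in text.splitlines():
        match = DEVICE_RE.match(line.strip())
        if match:
            devid, size, used, _path = match.groups()
            result[int(devid)] = to_bytes(size) - to_bytes(used)
    return result


for devid, free in unallocated_per_device(FI_SHOW_DEVICES).items():
    print(f"devid {devid}: about {free / UNITS['TiB']:.2f} TiB unallocated")
```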

I’ve deduced that I overfilled my two RAID1 disks, which killed the BTRFS balance and the size change. I deleted the file that had been partially transferred from my old server and checked again: Rockstor now showed free space on the server. The pool was still shown as mounted as RAID1, not RAID5, so I started the conversion to RAID5 again from the Rockstor WebGUI. We’ll see how it goes from here!