[SOLVED] BTRFS Balance abnormally slow

G_Man_be · October 30, 2019, 1:34pm

Hello everybody!

I come to this community to seek some advice. I will try to make my story short and to the point

I had media_pool with sdf (6TB) and sdh (4TB) in single.
The size of the pool was 8TB and the total used space in the pool is 7.08TB
Last week I bought another 8TB disk and added it to my pool, but I also wanted to convert the pool to RAID1. So I started the conversion from the UI, and this is the command that is running: “btrfs balance start -mconvert=raid1 -dconvert=raid1 -f /mnt2/media_pool”
Since the 24th (6 days ago), only 11% of the balance is done! When I check the new disk (sdj) only 834GB of data is present.
So my question is like all the other questions on the internet: is this normal? I know that a balance will be slow, but only 834GB of data written in 6 days seems very, very slow to me…
If I check iostat, the write rate on the new disk is only about 1.7 MB/s:

Linux 4.12.4-1.el7.elrepo.x86_64 (rockstor.home)        10/30/2019      _x86_64_        (8 CPU)

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           0.54    0.40   11.09    0.26    0.00   87.72

Device:            tps    kB_read/s    kB_wrtn/s    kB_read    kB_wrtn
sda               0.00         0.06         0.00      37384       1580
sdb               0.04         1.32         0.04     769632      20848
sdc               0.00         0.09         0.00      52196       2092
sdd               0.04         1.31         0.03     765704      19004
sde               4.05        17.84       188.30   10408231  109878947
sdf               6.39      1679.18         1.00  979848208     581424
sdg               7.83       817.42       942.79  476989384  550145460
sdh               6.59      1678.68         1.00  979558376     581424
sdi               6.59       804.69       919.13  469560160  536335888
sdj               4.19         6.70      1695.85    3908408  989576004

Please note that I have no snapshots of this pool and I stopped all the services accessing this pool (besides Plex, which is not heavily used).
Here is my version of Rockstor:

Installed Packages
Name : rockstor
Arch : x86_64
Version : 3.9.2
Release : 50
Size : 85 M
Repo : installed
From repo : Rockstor-Stable

Output of the following command:
btrfs balance status -v /mnt2/media_pool/

Balance on '/mnt2/media_pool/' is running
829 out of about 7324 chunks balanced (830 considered), 89% left
Dumping filters: flags 0xf, state 0x1, force is on
DATA (flags 0x100): converting, target=16, soft is off
METADATA (flags 0x100): converting, target=16, soft is off
SYSTEM (flags 0x100): converting, target=16, soft is off
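
As a rough sanity check on those numbers, here is a minimal sketch that parses the `chunks balanced` line and extrapolates linearly. It assumes the balance keeps its current pace, which it may not, since block groups vary a lot in how long they take. The function names `parse_balance_status` and `estimate_days_left` are made up for this illustration; the 6 days elapsed is the figure quoted above.

```python
import re

STATUS_RE = re.compile(r"(\d+) out of about (\d+) chunks balanced")

def parse_balance_status(text):
    """Return (done, total) chunk counts from `btrfs balance status` output."""
    match = STATUS_RE.search(text)
    if match is None:
        raise ValueError("no 'chunks balanced' line found")
    return int(match.group(1)), int(match.group(2))

def estimate_days_left(done, total, elapsed_days):
    """Linear extrapolation: assumes every chunk takes the same time."""
    if done <= 0:
        raise ValueError("no progress yet, cannot extrapolate")
    return (total - done) * elapsed_days / done

status = "829 out of about 7324 chunks balanced (830 considered), 89% left"
done, total = parse_balance_status(status)
days_left = estimate_days_left(done, total, 6)
print(f"{done / total:.1%} done, about {days_left:.0f} days left")
```

At this pace the remaining 89% would need roughly 47 more days, which is why the rate looks wrong rather than merely slow.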

There are no particular messages in dmesg:

kern :info : [Wed Oct 30 13:49:46 2019] BTRFS info (device sdg): relocating block group 10064043704320 flags data
kern :info : [Wed Oct 30 13:50:02 2019] BTRFS info (device sdg): found 2596 extents
kern :info : [Wed Oct 30 13:55:10 2019] BTRFS info (device sdg): found 2596 extents
kern :info : [Wed Oct 30 13:55:10 2019] BTRFS info (device sdg): relocating block group 10062969962496 flags data
kern :info : [Wed Oct 30 13:55:23 2019] BTRFS info (device sdg): found 2127 extents
kern :info : [Wed Oct 30 14:27:42 2019] BTRFS info (device sdg): found 2127 extents
kern :info : [Wed Oct 30 14:27:43 2019] BTRFS info (device sdg): relocating block group 10061896220672 flags data
kern :info : [Wed Oct 30 14:27:59 2019] BTRFS info (device sdg): found 2217 extents
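
The timestamps on the `relocating block group` lines give a per-block-group timing. The sketch below measures the gap between consecutive relocations; `relocation_durations` is a hypothetical helper name. The throughput figure assumes each data block group is about 1 GiB, which fits the logical offsets above being exactly 1 GiB apart but is still an assumption.

```python
import re
from datetime import datetime

RELOC_RE = re.compile(r"\[(.+?)\] BTRFS info .*relocating block group (\d+)")

def relocation_durations(log_text):
    """Return [(block_group, seconds)] between consecutive 'relocating' lines."""
    starts = []
    for line in log_text.splitlines():
        m = RELOC_RE.search(line)
        if m:
            when = datetime.strptime(m.group(1), "%a %b %d %H:%M:%S %Y")
            starts.append((int(m.group(2)), when))
    # Pair each start with the next one; the last block group is still running.
    return [
        (group, (next_when - when).total_seconds())
        for (group, when), (_, next_when) in zip(starts, starts[1:])
    ]

LOG = """\
kern :info : [Wed Oct 30 13:49:46 2019] BTRFS info (device sdg): relocating block group 10064043704320 flags data
kern :info : [Wed Oct 30 13:55:10 2019] BTRFS info (device sdg): relocating block group 10062969962496 flags data
kern :info : [Wed Oct 30 14:27:43 2019] BTRFS info (device sdg): relocating block group 10061896220672 flags data
"""

for group, seconds in relocation_durations(LOG):
    # Assumption: a data block group holds about 1 GiB (1024 MiB).
    print(f"block group {group}: {seconds:.0f} s, ~{1024 / seconds:.1f} MiB/s")
```

For the excerpt above that is a bit over 5 minutes for the first block group and about 33 minutes for the second, so somewhere between roughly 0.5 and 3 MiB/s.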

So is it really normal to be so slow? One of my colleagues did a balance of a similar size in about 2 days.
Is something wrong? I see no particular SMART errors either…
In any case, thank you very much for your advice!

Edit: I have found this post, so I disabled the quotas on that pool. I will let you know if it speeds things up.
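
For anyone wanting to do the same from the command line, quotas are turned off with `btrfs quota disable <mount point>` (and back on with `btrfs quota enable`). Below is a small sketch that wraps the call; the helper names are hypothetical, and it defaults to a dry run that only returns the command, since actually running it needs root and a mounted pool.

```python
import subprocess

def quota_disable_command(mount_point):
    """Build the btrfs-progs call that turns quotas off for a mounted pool."""
    return ["btrfs", "quota", "disable", mount_point]

def disable_quotas(mount_point, dry_run=True):
    """Run the command (needs root), or just return it when dry_run is set."""
    cmd = quota_disable_command(mount_point)
    if not dry_run:
        # check=True raises CalledProcessError if btrfs exits non-zero.
        subprocess.run(cmd, check=True)
    return cmd

# Dry run: show what would be executed for the pool from this thread.
print(" ".join(disable_quotas("/mnt2/media_pool")))
```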

George.

phillxnet · October 30, 2019, 2:32pm

@G_Man_be A very belated welcome to the Rockstor community.

So yes, quotas are currently the main slow-down / speed-up factor in btrfs, so disabling them should hopefully speed things up a tad. On our side we need to provide newer kernels/btrfs-progs, as there are many upstream improvements (or, in the case of our pending new Linux base, openSUSE, many back-ports to address this issue), and we could also use more ‘intelligent’ balance filters. But in the case of a raid change it’s still best if every last bit of data / metadata is converted, so there is little we can do on that front.

Let us know how it goes.

G_Man_be · October 30, 2019, 4:14pm

Thank you very much @phillxnet !
Disabling the quotas solved the issue. It’s been two hours since I disabled them, and it has already balanced 12% (more than it did in 6 days). So it is really not as bad as I thought.
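
To put a number on the difference, here is a back-of-the-envelope comparison of the two rates. It treats both percentages as averages over their whole period, which is only an approximation; `percent_per_hour` is a made-up helper name.

```python
def percent_per_hour(percent_done, hours):
    """Average balance progress rate over a period."""
    return percent_done / hours

before = percent_per_hour(11, 6 * 24)  # 11% in 6 days, quotas enabled
after = percent_per_hour(12, 2)        # 12% in 2 hours, quotas disabled
speedup = after / before
print(f"before: {before:.3f} %/h, after: {after:.1f} %/h, about {speedup:.0f}x faster")
```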

Maybe you should put this in the docs somewhere, as a procedure to disable quotas before balancing.

And maybe also change the message in this popup:

To me, it did not ring any bells because I have less than 10TB of data and not that many shares…
In any case, thank you very much for the help!

George.

phillxnet · October 30, 2019, 4:35pm

@G_Man_be Glad it’s moving along a little better now.

Yes, we do need to improve the messaging a little more there. I’m hoping that these massive slowdowns will be far less marked as we move to a more modern btrfs, i.e. with our openSUSE move. But all in good time. We should hopefully have a testing version ready for announcement / official release fairly soon. Plus, as of 3.9.2-49/50 our side of things has received a fairly significant speed-up, so there have been some improvements along the way. We just need to get folks onto the newer btrfs and see how the performance is from there on.

We have thought of auto-disabling quotas during a balance, but I think that rather oversteps the mark. Maybe a tick box along the lines of ‘disable quotas during this operation’, with a note about the significant speed-up, or something.

Thanks for the update, and I hope it gets its act together shortly.