Thanks for the info @McFaul. I wonder if this is caused by a btrfs thread. This link might help you find out which thread is causing it for the most part.
How is your Pool, Share and Snapshot makeup? How many do you have and how are they distributed?
After a little more troubleshooting I now think it is the BTRFS thread.
I have two pools, a 12 drive one and an 8 drive one, with two shares on the 12 drive pool and one share on the 8 drive pool. It’s just a giant media storage box, so nothing fancy, and no Rock-ons or anything complicated like that.
Two weeks ago I was doing a scrub on the 8 drive pool and it said that one of the disks had failed. I removed the disk, both physically and from the BTRFS pool, and rebalanced so I had a 7 drive pool with no missing devices. Once the drive was out, I did a full surface scan of the “failed” drive and it was fine, so I did a full disk erase (on my Windows machine) to zero the drive. Then I put it back in the Rockstor box and re-added it to the (now 7 drive) pool; however, it hasn’t balanced yet.
Now, the reason I think it is the BTRFS thread: after a few hours, the CPU is now basically idle… but if I try to copy a file to the 8 drive pool, that thread goes back up to 100% and stays there… and then the file transfer rate goes to zero and the copy times out with “an unexpected network error occurred”.
Hello,
I recently had the same problem, so I went to the btrfs IRC channel for help (since I did not get any answer here).
In my case it occurred mainly during writing (and the whole system froze or slowed down). I got some advice there, and in the end the cause in my case was simple:
In principle I ran out of space (well, “space”). It really depends on the current state of your pool: try “btrfs fi show /pool” and “btrfs fi df /pool” and you will get the most important info.
If used space is close to total, that is bad, and so is having no unallocated space. Basically btrfs is trying to find some free chunks or something like that (and this can happen for metadata as well as for data).
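To make the “no unallocated space” check a bit more concrete, here is a minimal Python sketch that reads the per-device lines of “btrfs fi show” and works out how much of each device is still unallocated (size minus used). It assumes the output was produced with the --raw option so that sizes are plain byte counts; the sample text, the function names and the 10% threshold are made up for illustration and are not taken from anyone’s real pool.

```python
import re

# Matches device lines of `btrfs fi show --raw`, for example:
#   devid    1 size 4000000000000 used 3990000000000 path /dev/sda
DEVID_LINE = re.compile(
    r"devid\s+(\d+)\s+size\s+(\d+)\s+used\s+(\d+)\s+path\s+(\S+)"
)


def unallocated_per_device(show_raw_output):
    """Return {device_path: (size, used, unallocated)}, all in bytes."""
    devices = {}
    for line in show_raw_output.splitlines():
        match = DEVID_LINE.search(line)
        if match:
            size, used = int(match.group(2)), int(match.group(3))
            # "used" on a devid line is space already allocated to chunks,
            # so size - used is what is still unallocated on that device.
            devices[match.group(4)] = (size, used, size - used)
    return devices


def nearly_full(devices, min_free_fraction=0.10):
    """List devices whose unallocated share is below the given fraction."""
    return sorted(
        path
        for path, (size, used, free) in devices.items()
        if size > 0 and free / size < min_free_fraction
    )


# Illustrative sample only (hypothetical two-device pool).
SAMPLE = """\
Label: 'pool'  uuid: 00000000-0000-0000-0000-000000000000
\tTotal devices 2 FS bytes used 3000000000000
\tdevid    1 size 4000000000000 used 3990000000000 path /dev/sda
\tdevid    2 size 4000000000000 used 1000000000000 path /dev/sdb
"""

if __name__ == "__main__":
    devices = unallocated_per_device(SAMPLE)
    for path, (size, used, free) in devices.items():
        print(path, "unallocated bytes:", free)
    print("nearly full:", nearly_full(devices))
```

On a real system you would paste in (or capture) the output of “btrfs fi show --raw /pool” instead of the sample text. A device that shows up as nearly full while others have plenty of room is a hint that the pool is unevenly allocated.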
I was advised that the best thing is to keep at least 10% free, but I think much more may be needed. You can also try balancing your pool, which can help ( https://btrfs.wiki.kernel.org/index.php/FAQ ).
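As a rough illustration of why a balance can help, the sketch below parses “btrfs fi df” output and reports, for each block group type, how much space is allocated but not actually used. That gap is roughly what a balance can compact and hand back as unallocated space. It assumes the --raw option was used so values are in bytes; the sample numbers and the function name are hypothetical.

```python
import re

# Matches lines of `btrfs fi df --raw`, for example:
#   Data, RAID1: total=2000000000000, used=1200000000000
DF_LINE = re.compile(r"^(\w+),\s*(\w+):\s*total=(\d+),\s*used=(\d+)")


def allocation_slack(df_raw_output):
    """Return a list of (type, profile, allocated_but_unused, used_ratio)."""
    rows = []
    for line in df_raw_output.splitlines():
        match = DF_LINE.match(line.strip())
        if match:
            kind, profile = match.group(1), match.group(2)
            total, used = int(match.group(3)), int(match.group(4))
            ratio = used / total if total else 0.0
            rows.append((kind, profile, total - used, ratio))
    return rows


# Illustrative sample only (hypothetical numbers).
SAMPLE_DF = """\
Data, RAID1: total=2000000000000, used=1200000000000
System, RAID1: total=33554432, used=294912
Metadata, RAID1: total=4294967296, used=4187593113
GlobalReserve, single: total=536870912, used=0
"""

if __name__ == "__main__":
    for kind, profile, slack, ratio in allocation_slack(SAMPLE_DF):
        print(f"{kind} ({profile}): {slack} bytes allocated but unused, "
              f"{ratio:.0%} of allocation in use")
```

In this made-up sample the data chunks are only partly filled, so a balance would have something to reclaim, while metadata is almost completely used, which is the situation where btrfs needs unallocated space to create new metadata chunks.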
As I understand it, this is a common problem with btrfs.
I hope this helps a bit. But of course it doesn’t have to be what is happening in your case.
You are exactly right. I knew that some of the devices were very low on space (but some had LOTS of space), and I have been trying to run a balance… and it said it was working (it didn’t give a low space error), but it wasn’t actually moving anything.
So then I figured it may not have had enough space on those devices to do the balance, so I deleted a bunch of files, and now the balance is running (and I can see the full devices emptying and the empty device filling).
This has also “cured” my 100% CPU usage… but I had not associated the two until I saw your post, so thanks for that!
Hi @PumaDAce, interesting topic covering an important, well-known btrfs issue.
Regarding btrfs fi and so on: it does return the info you need, but it takes 2 steps and some reading (this is why some people, checking only with fi show, think they are OK and then run out of space).
btrfs-progs contributors are working on a better solution to collect real space usage (e.g. from btrfs-progs 4.5 we’ll have btrfs fi du to collect more reliable info in a single command).