I have a friend with a Rockstor NAS that I manage remotely for him. A fairly large hardware failure in the motherboard and/or HBAs took a couple of disks with it, and one of those disks was part of a ~20TB RAID6 array. We verified the remaining disks; all but one were usable, so we put them in a new machine and mounted the array degraded via a fresh Rockstor install. Then we added a new drive and I started a btrfs dev replace
operation to rebuild the parity onto the new disk. Note that I’m doing all of this over SSH.
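For reference, the replace was kicked off roughly like this (the devid, target device, and mount point below are placeholders for illustration, not necessarily the exact values I used):

# List the devices and their devids; the missing disk is the devid that is absent
# (or shown as missing, depending on the btrfs-progs version).
btrfs filesystem show /mnt/array

# Replace the missing devid (assumed here to be 5) with the new drive.
# -r tells btrfs to avoid reading from the source device and rebuild
# from the remaining mirrors/parity instead.
btrfs replace start -r 5 /dev/sdf /mnt/array

# Progress can be checked at any time with:
btrfs replace status /mnt/array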
After about 24 hours of rebuilding, at roughly 9% complete, the system crashed with out-of-memory errors. Processes kept getting killed until the system would not respond at all. Switching to a new tty didn’t help.
After a reboot I mounted the array in degraded mode again. It picked up the new drive immediately, and to my surprise it resumed the replace operation all by itself as soon as the filesystem was mounted, with no interaction from me.
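If it helps, the remount and the check that showed the resumed replace looked roughly like this (device path and mount point are placeholders again):

# Mount with the degraded option so btrfs accepts the missing device.
mount -o degraded /dev/sdb /mnt/array

# This showed the replace running again and its percentage complete.
btrfs replace status /mnt/array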
I’ve been watching the machine’s resource usage closely all day, and after a lot of troubleshooting I’ve nailed down roughly what’s using up this insane amount of RAM, and I have a very rough idea of what’s causing it.
Take a look at this screenshot of htop: it reports that almost the entirety of the memory usage is user processes (the green bar), yet it shows no process using an unusual amount of RAM.
free showed me that the high RAM usage was buffer/cache, which should show up as blue/yellow in the RAM usage graph in htop:
# free -h
              total        used        free      shared  buff/cache   available
Mem:            15G        923M        1.0G         16M         13G        5.0G
Swap:          7.9G          0B        7.9G
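To watch how the usage grows over time, the kind of logging I’ve been doing looks roughly like this (the log path and interval are arbitrary choices of mine):

# Append a timestamped snapshot of the bio* slab caches every 5 minutes (run as root).
while true; do
    date
    grep '^bio' /proc/slabinfo
    sleep 300
done >> /root/slab-growth.log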
This led me to slabtop, which narrows down the culprit (output truncated for readability):
# slabtop -o -s=a
Active / Total Objects (% used)    : 33431432 / 33664160 (99.3%)
Active / Total Slabs (% used)      : 1346736 / 1346736 (100.0%)
Active / Total Caches (% used)     : 78 / 114 (68.4%)
Active / Total Size (% used)       : 10512136.19K / 10737701.80K (97.9%)
Minimum / Average / Maximum Object : 0.01K / 0.32K / 15.62K

    OBJS   ACTIVE  USE OBJ SIZE   SLABS OBJ/SLAB CACHE SIZE NAME
32493650 32492775  99%    0.31K 1299746       25  10397968K bio-1
  323505   323447  99%    0.19K   15405       21     61620K dentry
  176680   176680 100%    0.07K    3155       56     12620K btrfs_free_space
  118208    41288  34%    0.12K    3694       32     14776K kmalloc-128
   94528    43378  45%    0.25K    2954       32     23632K kmalloc-256
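As a cross-check, the kernel’s own accounting can be compared against the slab totals; my assumption is that bio-1 is counted under SUnreclaim, which would match it never being given back under memory pressure:

# Compare total slab memory against the reclaimable/unreclaimable split.
grep -E '^(Slab|SReclaimable|SUnreclaim)' /proc/meminfo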
Observe the entry at the top of the list, sorted by objects: bio. This grows slowly out of control as the replace operation runs, until the system starts killing off processes. Since this is SLAB usage, the RAM should be freed automatically when it’s needed, but this RAM is not being released. I suspect, given that htop thinks this memory is in user space rather than kernel space, that the kernel cannot free it up. Running sync && echo 3 > /proc/sys/vm/drop_caches clears up the slightly high dentry usage, but it does not affect the bio usage.
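For completeness, this is roughly how I compared the two caches before and after dropping caches (just a sketch; the grep pattern matches the two cache names from the slabtop output above):

# Snapshot the dentry and bio-1 object counts, drop caches, then snapshot again.
grep -E '^(dentry|bio-1) ' /proc/slabinfo
sync && echo 3 > /proc/sys/vm/drop_caches
grep -E '^(dentry|bio-1) ' /proc/slabinfo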
I’ve asked on the btrfs IRC channel and mailing list but haven’t gotten any solutions.
I’m tempted to cancel the replace operation and run btrfs dev delete missing instead, which should in theory get me to the same place as the replace, just down a different path with a lot more writes. Does anyone know if that’s safe to do?
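If cancelling mid-replace is safe (which is exactly what I’d like confirmed before I try it), I assume the sequence would look roughly like this, with the device path and mount point again as placeholders:

# Stop the in-progress replace; as I understand it, the target drive only becomes
# a member of the filesystem when the replace finishes, so it should come back empty.
btrfs replace cancel /mnt/array

# Add the new drive as a normal member, then remove the missing device record,
# which rewrites the missing device's data onto the remaining drives.
btrfs device add /dev/sdf /mnt/array
btrfs device delete missing /mnt/array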
Not being a programmer, I’m not sure whether to report this bug against CentOS, the kernel itself, or btrfs. The bio subsystem seems to be documented only for programmers, and I can’t find a way to clear up the RAM usage.