I have a friend with a Rockstor NAS that I manage remotely for him. A fairly large hardware failure in the motherboard and/or HBAs took a couple of disks with it. One of them was in a ~20TB RAID6 array. We verified the remaining disks; all but one were usable, so we put them in a new machine and mounted the array degraded via a fresh Rockstor install. Then we added a new drive and I started a `btrfs dev replace` operation to rebuild the parity onto the new disk. Note that I'm doing all of this over SSH.
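For reference, the whole sequence looked roughly like this; the device names, devid, and mount point here are placeholders, not the real ones:

```
# Mount the degraded filesystem (one member missing)
mount -o degraded /dev/sdb /mnt/pool

# Look up the devid of the missing disk
btrfs filesystem show /mnt/pool

# Replace the missing devid (assumed to be 5 here) with the new drive
btrfs replace start 5 /dev/sdg /mnt/pool

# Check progress at any time
btrfs replace status /mnt/pool
```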
After about 24 hours of rebuilding, at about 9% complete, the system crashed with out-of-memory errors. Processes kept getting killed until the system would not respond at all; switching to a new tty didn't help.
After a reboot I mounted the array in degraded mode again. It picked up the new drive immediately and, to my surprise, resumed the replace operation on its own right after mounting, with no interaction from me.
I've been watching the machine's resource usage closely all day, and after a lot of troubleshooting I've narrowed down roughly what's using up this insane amount of RAM, and I have a very rough idea of what's causing it.
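For anyone who wants to watch the same numbers, a simple logging loop along these lines works (a sketch only; the log path and interval are arbitrary):

```
# Append a timestamped memory and slab snapshot every minute
while true; do
    {
        date
        free -h
        slabtop -o -s c | head -n 15   # top slab caches by size
        echo
    } >> /root/memlog.txt
    sleep 60
done
```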
Take a look at this screenshot of htop. It reports that almost the entirety of the memory usage is user processes (the green bar), yet it shows no processes using any unusual amount of RAM.
`free` showed me that the high RAM usage was buffer/cache, which should show up as blue/yellow in the RAM usage graph in htop:
```
# free -h
              total        used        free      shared  buff/cache   available
Mem:            15G        923M        1.0G         16M         13G        5.0G
Swap:          7.9G          0B        7.9G
```
This led me to `slabtop`, which narrows down the culprit (truncated for readability):
```
# slabtop -o -s=a
 Active / Total Objects (% used)    : 33431432 / 33664160 (99.3%)
 Active / Total Slabs (% used)      : 1346736 / 1346736 (100.0%)
 Active / Total Caches (% used)     : 78 / 114 (68.4%)
 Active / Total Size (% used)       : 10512136.19K / 10737701.80K (97.9%)
 Minimum / Average / Maximum Object : 0.01K / 0.32K / 15.62K

    OBJS   ACTIVE  USE OBJ SIZE  SLABS OBJ/SLAB CACHE SIZE NAME
32493650 32492775  99%    0.31K 1299746       25  10397968K bio-1
  323505   323447  99%    0.19K   15405       21     61620K dentry
  176680   176680 100%    0.07K    3155       56     12620K btrfs_free_space
  118208    41288  34%    0.12K    3694       32     14776K kmalloc-128
   94528    43378  45%    0.25K    2954       32     23632K kmalloc-256
```
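To confirm it's `bio-1` specifically that keeps growing, the raw counts can also be read straight out of `/proc/slabinfo` (interval arbitrary):

```
# Fields per line: name, active_objs, num_objs, objsize, objperslab, ...
watch -n 10 "grep '^bio' /proc/slabinfo"
```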
Observe the top of the list, sorted by object count: `bio`. This cache grows slowly out of control as the replace operation runs, until the system starts killing off processes. Since this is SLAB usage, the RAM should be freed automatically when it's needed; however, it is not being released. Given that htop thinks this memory is in user space, not kernel space, I suspect the kernel cannot free it. Running `sync && echo 3 > /proc/sys/vm/drop_caches` clears up the slightly high `dentry` usage, but it does not affect the `bio` slab.
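A before/after comparison makes the non-effect easy to see:

```
# Snapshot the two caches, drop all reclaimable caches, snapshot again
grep -E '^(bio|dentry)' /proc/slabinfo
sync
echo 3 > /proc/sys/vm/drop_caches
grep -E '^(bio|dentry)' /proc/slabinfo   # dentry shrinks, bio-1 does not
```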
I've asked in the btrfs IRC channel and on the mailing list but haven't gotten any solutions.
I'm tempted to cancel the replace operation and run `btrfs dev delete missing` instead, which should in theory get me to the same place as the replace, just down a different path with a lot more writes. Does anyone know if that's safe to do?
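If I do go that route, the sequence I have in mind is below (mount point and device name are placeholders; corrections welcome if the ordering is wrong):

```
# Abort the in-flight replace; replace is designed to be cancelable
btrfs replace cancel /mnt/pool

# Make the new drive a full member first, so the rebuilt
# data has somewhere to go
btrfs device add /dev/sdg /mnt/pool

# Rebuild onto the remaining devices and drop the dead one
btrfs device delete missing /mnt/pool
```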
Not being a programmer, I'm not sure whether to report this bug to CentOS, the kernel itself, or btrfs. The `bio` layer seems to be documented only for programmers, and I can't find a way to clear up the RAM usage.
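When I do file a report somewhere, I assume the useful minimum is version and state information along these lines:

```
uname -a                          # kernel version
btrfs --version                   # btrfs-progs version
btrfs filesystem show             # device layout of the array
btrfs replace status /mnt/pool    # state of the running replace
dmesg | grep -i btrfs             # any kernel-side messages
```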