I have a friend with a Rockstor NAS that I manage remotely for him. A fairly large hardware failure in the motherboard and/or HBAs took a couple of disks with it, and one of those disks was part of a ~20TB RAID6 array. We verified the remaining disks; all but one were usable, so we put them in a new machine and mounted the array degraded via a fresh Rockstor install. Then we added a new drive and I started a btrfs dev replace
operation to rebuild the parity onto the new disk. Note that I’m doing all of this over SSH.
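For reference, the replace was kicked off roughly like this (the devid, target device, and mount point below are placeholders for illustration, not necessarily the exact values I used):

# List the devices and their devids; the missing disk is the devid that is absent
# (or shown as missing, depending on the btrfs-progs version).
btrfs filesystem show /mnt/array

# Replace the missing devid (assumed here to be 5) with the new drive.
# -r tells btrfs to avoid reading from the source device and rebuild
# from the remaining mirrors/parity instead.
btrfs replace start -r 5 /dev/sdf /mnt/array

# Progress can be checked at any time with:
btrfs replace status /mnt/array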
After about 24 hours of rebuilding, at roughly 9% complete, the system crashed with out-of-memory errors. Processes kept getting killed until the system would not respond at all. Switching to a new tty didn’t help.
After a reboot I mounted the array in degraded mode again. It picked up the new drive immediately, and to my surprise it resumed the replace operation all by itself as soon as the filesystem was mounted, with no interaction from me.
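If it helps, the remount and the check that showed the resumed replace looked roughly like this (device path and mount point are placeholders again):

# Mount with the degraded option so btrfs accepts the missing device.
mount -o degraded /dev/sdb /mnt/array

# This showed the replace running again and its percentage complete.
btrfs replace status /mnt/array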
I’ve been watching the machine’s resource usage closely all day, and after a lot of troubleshooting I’ve nailed down roughly what’s using up this insane amount of RAM, and I have a very rough idea of what’s causing it.
Take a look at this screenshot of htop: it reports that almost the entirety of the memory usage is user processes (the green bar), yet it shows no process using an unusual amount of RAM.
free showed me that the high RAM usage was buffer/cache, which should show up as blue/yellow in the RAM usage graph in htop:
# free -h
              total        used        free      shared  buff/cache   available
Mem:            15G        923M        1.0G         16M         13G        5.0G
Swap:          7.9G          0B        7.9G
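To watch how the usage grows over time, the kind of logging I’ve been doing looks roughly like this (the log path and interval are arbitrary choices of mine):

# Append a timestamped snapshot of the bio* slab caches every 5 minutes (run as root).
while true; do
    date
    grep '^bio' /proc/slabinfo
    sleep 300
done >> /root/slab-growth.log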
This led me to slabtop, which narrows down the culprit (output truncated for readability):
# slabtop -o -s=a
Active / Total Objects (% used)    : 33431432 / 33664160 (99.3%)
Active / Total Slabs (% used)      : 1346736 / 1346736 (100.0%)
Active / Total Caches (% used)     : 78 / 114 (68.4%)
Active / Total Size (% used)       : 10512136.19K / 10737701.80K (97.9%)
Minimum / Average / Maximum Object : 0.01K / 0.32K / 15.62K

    OBJS   ACTIVE  USE OBJ SIZE   SLABS OBJ/SLAB CACHE SIZE NAME
32493650 32492775  99%    0.31K 1299746       25  10397968K bio-1
  323505   323447  99%    0.19K   15405       21     61620K dentry
  176680   176680 100%    0.07K    3155       56     12620K btrfs_free_space
  118208    41288  34%    0.12K    3694       32     14776K kmalloc-128
   94528    43378  45%    0.25K    2954       32     23632K kmalloc-256
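As a cross-check, the kernel’s own accounting can be compared against the slab totals; my assumption is that bio-1 is counted under SUnreclaim, which would match it never being given back under memory pressure:

# Compare total slab memory against the reclaimable/unreclaimable split.
grep -E '^(Slab|SReclaimable|SUnreclaim)' /proc/meminfo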
Observe the entry at the top of the list, sorted by objects: bio. This grows slowly out of control as the replace operation runs, until the system starts killing off processes. Since this is SLAB usage, the RAM should be freed automatically when it’s needed, but this RAM is not being released. I suspect, given that htop thinks this memory is in user space rather than kernel space, that the kernel cannot free it up. Running sync && echo 3 > /proc/sys/vm/drop_caches clears up the slightly high dentry usage, but it does not affect the bio usage.
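For completeness, this is roughly how I compared the two caches before and after dropping caches (just a sketch; the grep pattern matches the two cache names from the slabtop output above):

# Snapshot the dentry and bio-1 object counts, drop caches, then snapshot again.
grep -E '^(dentry|bio-1) ' /proc/slabinfo
sync && echo 3 > /proc/sys/vm/drop_caches
grep -E '^(dentry|bio-1) ' /proc/slabinfo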
I’ve asked on the btrfs IRC channel and mailing list but haven’t gotten any solutions.
I’m tempted to cancel the replace operation and run btrfs dev delete missing instead, which should in theory get me to the same place as the replace, just down a different path with a lot more writes. Does anyone know if that’s safe to do?
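If cancelling mid-replace is safe (which is exactly what I’d like confirmed before I try it), I assume the sequence would look roughly like this, with the device path and mount point again as placeholders:

# Stop the in-progress replace; as I understand it, the target drive only becomes
# a member of the filesystem when the replace finishes, so it should come back empty.
btrfs replace cancel /mnt/array

# Add the new drive as a normal member, then remove the missing device record,
# which rewrites the missing device's data onto the remaining drives.
btrfs device add /dev/sdf /mnt/array
btrfs device delete missing /mnt/array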
Not being a programmer, I’m not sure whether to report this bug against CentOS, the kernel itself, or btrfs. The bio subsystem seems to be documented only for programmers, and I can’t find a way to clear up the RAM usage.