NMI watchdog: BUG: soft lockup

Hi,

With the latest update (or at least a very recent update), I have noticed several of these popping up in my logs:

NMI watchdog: BUG: soft lockup - CPU#7 stuck for 23s! [kswapd1:222]

I first thought it might be a hardware issue, with maybe one of my CPUs acting up. However, on closer scrutiny it happens on all of them, which I think might rule out a hardware issue.
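For anyone wanting to check whether the lockups are spread across CPUs in the same way, here is a minimal sketch that tallies soft lockup messages per CPU and per task. It assumes the log lines have the same shape as the one quoted above; the names `tally_lockups` and `read_dmesg` are made up for illustration.

```python
import re
import subprocess
from collections import Counter

# Matches kernel log lines of the form quoted above, e.g.
# NMI watchdog: BUG: soft lockup - CPU#7 stuck for 23s! [kswapd1:222]
LOCKUP_RE = re.compile(
    r"soft lockup - CPU#(?P<cpu>\d+) stuck for (?P<secs>\d+)s! "
    r"\[(?P<task>.+):(?P<pid>\d+)\]"
)


def tally_lockups(lines):
    """Count soft lockup reports per CPU number and per task name."""
    per_cpu = Counter()
    per_task = Counter()
    for line in lines:
        match = LOCKUP_RE.search(line)
        if match:
            per_cpu[int(match.group("cpu"))] += 1
            per_task[match.group("task")] += 1
    return per_cpu, per_task


def read_dmesg():
    """Hypothetical helper: return the kernel ring buffer as a list of lines.

    Assumes the `dmesg` command is available and readable by the current user.
    """
    result = subprocess.run(["dmesg"], capture_output=True, text=True, check=True)
    return result.stdout.splitlines()
```

Calling `tally_lockups(read_dmesg())` returns two counters. If the per-CPU counts are spread fairly evenly while one task dominates the per-task counts, that would be consistent with the problem following the task and not a single core.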

There are some write-ups about this bug on the net, but I haven’t really been able to nail down what causes it yet.

Any thoughts?

Peter.


I too am having this exact same issue. It has been going on pretty much since I installed Rockstor. Here are some of the kernel functions that are running and eating up large amounts of CPU when this happens:

37.91% [kernel] [k] native_queued_spin_lock_slowpath
27.23% [kernel] [k] multi_cpu_stop

11.97% [kernel] [k] smp_call_function_single
9.72% [kernel] [k] native_queued_spin_lock_slowpath
9.10% [kernel] [k] smp_call_function_many
4.27% [kernel] [k] __list_del_entry

I’m on 4.8.7-1.el7.elrepo.x86_64, with 32GB of ECC DDR4 and a dual-socket, 12-core Broadwell-E Xeon.

My 2 cents:
On a Rockstor development environment I ran a yum update and got a kernel panic afterwards. I solved it by reinstalling 4.8.7-1.el7.elrepo.x86_64.

@suman & @phillxnet ?

Mirko

I’ve never updated these servers via yum; I have always let Rockstor handle the updates. But I’m thinking about denying internet access to these production machines, as they have become very unstable. Today I had a complete system stop, where the entire server ground to a halt. I managed to ssh into it (after waiting several minutes for a prompt), and could see that the machine was swapping like crazy. There were several crashes in the log and kswapd was at 100% CPU. I’m now logging continuously on these machines.
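As a rough illustration of that kind of logging (not what is actually running on these machines), the sketch below samples the swap-in and swap-out page counters from /proc/vmstat at a fixed interval and prints how much they changed. The function names and the 10-second default interval are assumptions.

```python
import time

# Cumulative counters in /proc/vmstat for pages swapped in and out.
SWAP_KEYS = ("pswpin", "pswpout")


def parse_vmstat(text):
    """Parse 'name value' lines from /proc/vmstat into a dict of ints."""
    counters = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 2 and parts[1].isdigit():
            counters[parts[0]] = int(parts[1])
    return counters


def swap_delta(before, after):
    """Return how many pages were swapped in and out between two samples."""
    return {key: after.get(key, 0) - before.get(key, 0) for key in SWAP_KEYS}


def log_swap_activity(interval=10.0, path="/proc/vmstat"):
    """Print a timestamped swap-in/swap-out delta every `interval` seconds."""
    with open(path) as f:
        previous = parse_vmstat(f.read())
    while True:
        time.sleep(interval)
        with open(path) as f:
            current = parse_vmstat(f.read())
        delta = swap_delta(previous, current)
        stamp = time.strftime("%Y-%m-%d %H:%M:%S")
        print(stamp, "pswpin", delta["pswpin"], "pswpout", delta["pswpout"], flush=True)
        previous = current
```

Running `log_swap_activity()` with its output redirected to a file gives a timeline that can be lined up against the soft lockup timestamps in the kernel log.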

These CPU soft lockups are still happening after the latest update. They cause 100% CPU usage by a process called kswapd1, followed by multiple crashes.