Unresponsive system

Flox · January 8, 2020, 10:42pm

Allow me to chip in on this one:

I would personally try Leap 15.1 first: I consider it a lot safer than Tumbleweed, and its kernel has received a ton of backports from newer kernels. As a result, from a Btrfs perspective, it is far ahead of a generic 4.12 kernel. See below for examples:

On a freshly updated Leap 15.1 system, for instance, you can see that the kernel package was updated just a month ago:

rockdev:~ # zypper info kernel-default
Loading repository data...
Reading installed packages...


Information for package kernel-default:
---------------------------------------
Repository     : Main Update Repository
Name           : kernel-default
Version        : 4.12.14-lp151.28.36.1
Arch           : x86_64
Vendor         : openSUSE
Installed Size : 311.1 MiB
Installed      : Yes
Status         : up-to-date
Source package : kernel-default-4.12.14-lp151.28.36.1.nosrc
Summary        : The Standard Kernel
Description    :
    The standard kernel for both uniprocessor and multiprocessor systems.


    Source Timestamp: 2019-12-06 13:50:27 +0000
    GIT Revision: 8f4a495fffe8cb92ade0b0afc0abe10de21a1d4a
    GIT Branch: openSUSE-15.1

If we look at the Btrfs-related changes, we have the following (listing only the first 40 matching changelog lines, i.e. the most recent entries):

rockdev:~ # rpm -q --changelog kernel-default | grep btrfs | head -n 40
- btrfs: tracepoints: Fix bad entry members of qgroup events
- btrfs: tracepoints: Fix wrong parameter order for qgroup  events
- btrfs: qgroup: Always free PREALLOC META reserve in
  btrfs_delalloc_release_extents() (bsc#1155179).
- btrfs: block-group: Fix a memory leak due to missing
  btrfs_put_block_group() (bsc#1155178).
- btrfs: Ensure btrfs_init_dev_replace_tgtdev sees up to date
- btrfs: remove wrong use of volume_mutex from
  btrfs_dev_replace_start (bsc#1154651).
- btrfs: Ensure replaced device doesn't have pending chunk allocation (bsc#1154607).
- btrfs: qgroup: Fix reserved data space leak if we have  multiple
- btrfs: qgroup: Fix the wrong target io_tree when freeing
- btrfs: relocation: fix use-after-free on dead relocation  roots
- Btrfs: do not abort transaction at btrfs_update_root() after
- Btrfs: remove BUG() in btrfs_extent_inline_ref_size
- Btrfs: convert to use btrfs_get_extent_inline_ref_type
- blacklist.conf: Add invalid btrfs commits
  patches.suse/btrfs-add-missing-inode-version-ctime-and-mtime-upda.patch.
- btrfs: start readahead also in seed devices (bsc#1144886).
- btrfs: clean up pending block groups when transaction commit
- btrfs: handle delayed ref head accounting cleanup in abort
- btrfs: add cleanup_ref_head_accounting helper (bsc#1050911).
- btrfs: fix pinned underflow after transaction aborted
- btrfs: Fix delalloc inodes invalidation during transaction abort
- btrfs: Split btrfs_del_delalloc_inode into 2 functions
- btrfs: track running balance in a simpler way (bsc#1145059).
- btrfs: use GFP_KERNEL in init_ipath (bsc#1086103).
- btrfs: scrub: add memalloc_nofs protection around init_ipath
- patches.suse/Btrfs-kill-btrfs_clear_path_blocking.patch:
  patches.suse/btrfs-reloc-also-queue-orphan-reloc-tree-for-cleanup-to-avoid-bug_on.patch.
  patches.suse/btrfs-fix-wrong-ctime-and-mtime-of-a-directory-after.patch.
- btrfs: qgroup: Check bg while resuming relocation to  avoid
  patches.suse/btrfs-reloc-also-queue-orphan-reloc-tree-for-cleanup-to-avoid-bug_on.patch.
- btrfs: don't double unlock on error in btrfs_punch_hole
- btrfs: reloc: Also queue orphan reloc tree for cleanup to avoid  BUG_ON() (bsc#1133612).
  patches.suse/0001-btrfs-extent-tree-Fix-a-bug-that-btrfs-is-unable-to-.patch.
- btrfs: qgroup: Don't scan leaf if we're modifying reloc  tree
- btrfs: extent-tree: Use btrfs_ref to refactor
  btrfs_free_extent() (bsc#1063638 bsc#1128052 bsc#1108838).
- btrfs: extent-tree: Use btrfs_ref to refactor

Now, looking at the most recent change listed there…

- btrfs: tracepoints: Fix bad entry members of qgroup events

… we can see that it was actually submitted last October for inclusion in kernel 5.4-rc5 (unless I am misreading this, of course):
https://lkml.org/lkml/2019/10/22/440

Now, looking at a freshly updated Tumbleweed system:

rockdev:~ # zypper info kernel-default
Loading repository data...
Reading installed packages...


Information for package kernel-default:
---------------------------------------
Repository     : Main Repository (OSS)
Name           : kernel-default
Version        : 5.3.12-2.2
Arch           : x86_64
Vendor         : openSUSE
Installed Size : 163.1 MiB
Installed      : Yes
Status         : up-to-date
Source package : kernel-default-5.3.12-2.2.nosrc
Summary        : The Standard Kernel
Description    :
    The standard kernel for both uniprocessor and multiprocessor systems.


    Source Timestamp: 2019-11-21 07:21:43 +0000
    GIT Revision: a6f60814d3dbf81b05caf84e6143251ca14f5f37
    GIT Branch: stable

and most importantly:

rockdev:~ #  rpm -q --changelog kernel-default | grep btrfs | head -n 40
- btrfs: block-group: Fix a memory leak due to missing
  btrfs_put_block_group() (bnc#1151927).
- btrfs: don't needlessly create extent-refs kernel thread
- btrfs: tracepoints: Fix wrong parameter order for qgroup events
- btrfs: tracepoints: Fix bad entry members of qgroup events
- btrfs: fix uninitialized ret in ref-verify (bnc#1151927).
- btrfs: fix incorrect updating of log root tree (bnc#1151927).
- btrfs: fix balance convert to single on 32-bit host CPUs
- btrfs: allocate new inode in NOFS context (bnc#1151927).
- btrfs: relocation: fix use-after-free on dead relocation roots
- btrfs: delayed-inode: Kill the BUG_ON() in
  btrfs_delete_delayed_dir_index() (bnc#1151927).
- btrfs: extent-tree: Make sure we only allocate extents from
- btrfs: tree-checker: Add ROOT_ITEM check (bnc#1151927).
- btrfs: Detect unbalanced tree with empty leaf before crashing
- btrfs: fix allocation of free space cache v1 bitmap pages
- btrfs: Relinquish CPUs in btrfs_compare_trees (bnc#1151927).
- btrfs: adjust dirty_metadata_bytes after writeback failure of
- btrfs: qgroup: Fix the wrong target io_tree when freeing
- btrfs: qgroup: Fix reserved data space leak if we have multiple
- btrfs: Fix a regression which we can't convert to SINGLE profile
  writeback attempts (btrfs hangup).
- btrfs: trim: Check the range passed into to prevent overflow
- btrfs: qgroup: Don't hold qgroup_ioctl_lock in
  btrfs_qgroup_inherit() (bnc#1012628).
- btrfs: Flush before reflinking any extent to prevent NOCOW write
- btrfs: fix minimum number of chunk errors for DUP (bnc#1012628).
- btrfs: tree-checker: Check if the file extent end overflows
- btrfs: inode: Don't compress if NODATASUM or NODATACOW set
- btrfs: shut up bogus -Wmaybe-uninitialized warning
- btrfs: correctly validate compression type (bnc#1012628).
  patches.suse/btrfs-8447-serialize-subvolume-mounts-with-potentially-mi.patch
- btrfs: start readahead also in seed devices (bnc#1012628).
  patches.kernel.org/5.1.8-025-btrfs-qgroup-Check-bg-while-resuming-relocation.patch
- btrfs: correct zstd workspace manager lock to use spin_lock_bh()
- btrfs: qgroup: Check bg while resuming relocation to avoid
- btrfs: reloc: Also queue orphan reloc tree for cleanup to
- btrfs: don't double unlock on error in btrfs_punch_hole
- btrfs: Check the compression level before getting a workspace
- Btrfs: do not abort transaction at btrfs_update_root() after

As you can see from the most recent entries alone, the Leap 15.1 kernel is only missing 3 Btrfs-related changes compared to the Tumbleweed one.
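If you would rather not compare the two lists by eye, here is a minimal sketch. It assumes you first save each kernel's full changelog to a text file on its own machine (the file names and function names below are hypothetical, just for illustration). It keeps only entry headlines that mention Btrfs (case-insensitively, so entries starting with "Btrfs:" are caught too), strips the trailing bsc#/bnc# reference, squeezes repeated spaces, and then uses comm to print the headlines present in the second changelog but not in the first. Since long entries are wrapped across lines in the changelog, only the first line of each entry is compared, so treat the result as a starting point rather than an exact count.

```shell
#!/bin/sh
# Sketch only: list Btrfs changelog headlines found in one kernel but not another.
# Assumes each changelog was saved beforehand, e.g. (hypothetical file names):
#   rpm -q --changelog kernel-default > leap-151.changelog    # on Leap 15.1
#   rpm -q --changelog kernel-default > tumbleweed.changelog  # on Tumbleweed

# Print normalized Btrfs entry headlines from a changelog file:
# - keep only lines starting with "- " (first line of each entry) mentioning btrfs
# - drop a trailing bug reference such as " (bsc#1144886)." or " (bnc#1012628)."
# - squeeze runs of spaces so "qgroup  events" equals "qgroup events"
btrfs_headlines() {
  grep -i '^- .*btrfs' "$1" \
    | sed -e 's/ *(b[sn]c#[^)]*)\.\{0,1\}$//' -e 's/  */ /g' -e 's/ *$//' \
    | LC_ALL=C sort -u
}

# Print headlines present in changelog $2 but missing from changelog $1.
missing_from_first() {
  first=$(mktemp) second=$(mktemp)
  btrfs_headlines "$1" > "$first"
  btrfs_headlines "$2" > "$second"
  # comm -13 suppresses lines unique to file 1 and lines common to both
  LC_ALL=C comm -13 "$first" "$second"
  rm -f "$first" "$second"
}
```

For example, running missing_from_first leap-151.changelog tumbleweed.changelog would list the Btrfs entries that only show up in the Tumbleweed kernel's changelog.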

Hope this helps,