Pool Disappeared

I rebooted Rockstor (v3.9.1-0) - had to restart the server hardware. The whole thing has been working for 6-8 months with no problem (Rockstor under Proxmox - disks passed through). The physical server has been restarted before, so that’s not new. This time, my main pool vanished. The disks are there, as is the default pool that gets created at installation. But the big pool (NAS_pool1) is gone. Absolutely nothing happened other than the power cycle - no updates, no changing anything, just shutdown and startup.

On the ‘disks’ page, I see the “click to import data” button for the disks, but if I try that, I get the following error:

    Traceback (most recent call last):
      File "/opt/rockstor/src/rockstor/storageadmin/views/disk.py", line 700, in _btrfs_disk_import
        mount_root(po)
      File "/opt/rockstor/src/rockstor/fs/btrfs.py", line 252, in mount_root
        run_command(mnt_cmd)
      File "/opt/rockstor/src/rockstor/system/osi.py", line 115, in run_command
        raise CommandException(cmd, out, err, rc)
    CommandException: Error running a command. cmd = /bin/mount /dev/disk/by-label/NAS_pool1 /mnt2/NAS_pool1. rc = 32. stdout = ['']. stderr = ['mount: wrong fs type, bad option, bad superblock on /dev/sdb,', ' missing codepage or helper program, or other error', '', ' In some cases useful info is found in syslog - try', ' dmesg | tail or so.', '']

The last few lines from dmesg are:
[ 124.287266] BTRFS info (device sdb): disk space caching is enabled
[ 124.287277] BTRFS info (device sdb): has skinny extents
[ 124.288978] BTRFS error (device sdb): failed to read the system array: -5
[ 124.307136] BTRFS error (device sdb): open_ctree failed
[ 147.561919] BTRFS info (device sdb): disk space caching is enabled
[ 147.561922] BTRFS info (device sdb): has skinny extents
[ 147.562888] BTRFS error (device sdb): failed to read the system array: -5
[ 147.578189] BTRFS error (device sdb): open_ctree failed
[ 182.429457] BTRFS info (device sdb): disk space caching is enabled
[ 182.429459] BTRFS info (device sdb): has skinny extents
[ 182.430491] BTRFS error (device sdb): failed to read the system array: -5
[ 182.445946] BTRFS error (device sdb): open_ctree failed

I suppose I could start over - again - and redownload everything from backup - again - but disk pools shouldn’t just go away. According to SMART both disks are good. No indication of a disk failure that I can see.

Any ideas? I'm going to leave it alone for a few days in the hope that someone will have a suggestion for what to do. Otherwise, next month will bring another big S3 bill.

Thanks.

@RankAmateur A belated welcome to the Rockstor community.

The mount error and the dmesg lines you posted are essentially mount and btrfs shorthand for 'pool is not healthy', and as Rockstor currently has no significant UI component for aiding in pool repair, your options are the obvious wipe, re-create the fs, and restore from backup, as you indicated, or to embark on attempting a pool repair via the command line (a rough sketch of the usual first steps follows below). Upon a successful repair you should be able to re-import the pool, if it doesn't simply re-appear on the next boot after the repair. Rockstor's behaviour in this scenario is a little better in the latest testing channel release (it identifies the pool as unmounted, in red, rather than not showing it at all), but it is not significantly different with regard to having to repair the pool via the command line.
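
For reference, a minimal sketch of the non-destructive steps usually tried first, run as root; the device name and mount point below are taken from your error output and are assumptions that may need adjusting on your system:

    # try a read-only mount that falls back to an older tree root
    # (on older kernels the option is '-o recovery' rather than 'usebackuproot')
    mount -o ro,usebackuproot /dev/disk/by-label/NAS_pool1 /mnt2/NAS_pool1

    # if the superblock itself is suspect, look for a usable backup copy
    btrfs rescue super-recover -v /dev/sdb

    # read-only consistency check; avoid 'btrfs check --repair' until advised,
    # as it can make a damaged pool worse
    btrfs check --readonly /dev/sdb

If the read-only mount succeeds, copying the important data off first is the safest next move before any repair attempt.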

So the short of it is: your pool is not healthy and so will not mount. This is default btrfs behaviour. Sometimes issues only show up at mount time, hence your circumstance. Have you had regular scrubs scheduled? These can also help with finding and resolving issues. It may help others here advise on the best course of action, pool-repair-wise, if you could post the output of:

    btrfs fi show

run as root.
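
If it's easier, the output can be gathered like this (the by-label path is just taken from the earlier mount error):

    btrfs fi show                                # all detected btrfs filesystems
    btrfs fi show /dev/disk/by-label/NAS_pool1   # or target the affected pool by label
    dmesg | tail -n 40                           # recent kernel messages around the failed mount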

Hope that helps.

I have experienced (one time) that my pool disappeared and btrfs gave me the same "open_ctree failed" error.
Also after a routine reboot.

I went through some troubleshooting, but couldn't get the pool to mount, not even degraded. It was very mysterious and frustrating.
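
By 'not even degraded' I mean mount attempts along these lines (the device and mount point are placeholders; mine were different):

    # read-only, degraded mount attempt for when a member device is missing or unreadable
    mount -o ro,degraded /dev/sdb /mnt/recovery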

I thought I had lost all data, so I decided to reinstall Rockstor. After the reinstall, the pool was suddenly visible again; I imported it, and it has worked without problems since then.

I still don't know what the problem was back then, and I don't know whether I should write it off as a btrfs error or a Rockstor / CentOS problem.

But before doing anything drastic, try the reinstall; it could fix your problem.

Wow, that actually helped. Thank you! So, one of the disks is working fine and I'm not sure about the other. The pool appears OK in the GUI, with both drives assigned, and I can access the data. But btrfs fi show tells me that only one of the drives is actually functioning in the pool (physically, they're both fine). I can't post the output because I'm not at the server right now.

I tried to remove the bad drive from the pool via the GUI, and it looked like it was working. But once the balance was over, nothing had actually changed in the GUI. I can attempt to add it back into the pool via the CLI (roughly along the lines of the sketch below), but I'm a little afraid that I'll lose data without being immediately aware that it's lost.
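
For my own notes, the rough CLI sequence I have in mind, run as root; /mnt2/NAS_pool1 and /dev/sdc are assumptions for the pool mount point and the drive to re-add, so they would need checking first:

    btrfs fi show /mnt2/NAS_pool1               # which devices the mounted pool currently uses
    btrfs device stats /mnt2/NAS_pool1          # per-device error counters
    btrfs device add /dev/sdc /mnt2/NAS_pool1   # re-add the drive to the pool
    # depending on the raid profile, a convert filter may be wanted here,
    # e.g. 'btrfs balance start -dconvert=raid1 -mconvert=raid1 /mnt2/NAS_pool1'
    btrfs balance start /mnt2/NAS_pool1         # spread data back across both devices
    btrfs scrub start -B /mnt2/NAS_pool1        # verify checksums once the balance finishes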

I get that this isn't quite production software at this point, and that's OK. I'm not sure I understand how the GUI and CLI give different information, though. That concerns me.

You’re welcome.

I was just sharing my experiences.

I think we can safely say that neither Rockstor nor btrfs handles disk failures very elegantly. And there are strange bugs present.

I find it disturbing that an OS reinstall makes a btrfs pool go from completely unworkable to fully accessible. That shouldn't happen unless something has gone wrong with the OS itself. That can happen, but not because of a simple reboot…
