Disk Fail problem

hscott · June 7, 2022, 2:17pm

I had a new disk fail right after adding it to the pool. Now I have lost the pool and can’t resize the pool to remove the failed drive from the pool. How do I remove the “detached” drive without losing all stored pool data?
Error information:

Houston, we’ve had a problem.

Pool member / raid edits require an active mount. Please see the “Maintenance required” section.

Traceback (most recent call last):
  File "/opt/rockstor/eggs/gunicorn-19.7.1-py2.7.egg/gunicorn/workers/sync.py", line 68, in run_for_one
    self.accept(listener)
  File "/opt/rockstor/eggs/gunicorn-19.7.1-py2.7.egg/gunicorn/workers/sync.py", line 27, in accept
    client, addr = listener.accept()
  File "/usr/lib64/python2.7/socket.py", line 206, in accept
    sock, addr = self._sock.accept()
error: [Errno 11] Resource temporarily unavailable

phillxnet · June 7, 2022, 7:56pm

@hscott Welcome to the Rockstor community.

Your options here depend on the btrfs-raid level you chose when creating the pool, and on whether that level had sufficient redundancy for the number of disks that failed within it: one in your case. Also you mention:

If the add operation (or rather the implicit balance thereafter) failed to complete, you may also not have full redundancy, e.g. if you were adding a disk to a single pool and moving to raid1. I think this is likely still recoverable in most situations, though, as the new disk was the one bringing the additional redundancy.

From your error message you look to have an unmounted pool situation. So you may need, as per the advice within the pool details page of the Web-UI, to use a degraded mount option. Btrfs refuses to mount a pool with missing members. The Web-UI should be indicating this ‘missing’ member status for the pool. I’m assuming here you are running our v4 or at least a stable v3, as both have these guides built in for how to proceed.

Initially take a look at our newly re-written howto that covers some of this and links back to other relevant parts of the docs:
“Data Loss-prevention and Recovery in Rockstor” https://rockstor.com/docs/data_loss.html

Be sure to read it all and follow the various links. But in short, the Web-UI elements within v4 (and later v3 stable) should help you get sorted if it’s possible. And for an overview of the system’s btrfs pool states, the following command line may help for further posts here:

btrfs fi show

executed as the root user. It may help others here know the rough arrangement of your setup.

It is quite normal for btrfs to not mount a pool when it is missing a drive. You just need to add a degraded mount option and, as per the Web-UI elements detailed in the above guide, you may also want to use a ro (read only) option first to refresh your backups. But again the redundancy level is key here, so more info would be needed for folks to help here on the forum, and that guide will help you to answer some of your questions initially.
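For anyone more comfortable on the command line, here is a rough sketch of what a read-only degraded mount looks like outside the Web-UI. Treat it as an illustration only: /dev/sda is a placeholder for any surviving pool member, the mount point /mnt/recovery and the DRY_RUN switch are made up for this example, and by default the script only prints the command rather than running it.

```shell
#!/bin/sh
# Sketch only: build a read-only, degraded btrfs mount command.
# Assumptions: the device and mount point are placeholders supplied by the
# caller; DRY_RUN is a hypothetical switch introduced for this example.

degraded_mount_cmd() {
    device="$1"       # any surviving member of the pool
    mount_point="$2"  # an existing, empty directory
    # 'ro' and 'degraded' are standard btrfs mount options.
    printf 'mount -t btrfs -o ro,degraded %s %s\n' "$device" "$mount_point"
}

cmd=$(degraded_mount_cmd /dev/sda /mnt/recovery)
echo "$cmd"

# Only run the command for real when explicitly asked to (needs root).
if [ "${DRY_RUN:-1}" = "0" ]; then
    sh -c "$cmd"
fi
```

Mounting ro first means nothing on the pool is changed while you refresh backups. The Web-UI route described in the guide remains the one Rockstor itself tracks.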

Let us know how it goes. And do keep in mind that if you are running a v3 testing there is little or no help from the Web-UI in this case. But our new V4 has all that we currently have available. Although if you are considering an install of v4 to attempt the repair do keep in mind that the initial import will be of a poorly pool and there is a special doc section for this.

The most specific part you need to be aware of within that guide is the following:
Web-UI and Data-Integrity threat monitoring
subsection:
Pool degraded indicators: https://rockstor.com/docs/data_loss.html#pool-degraded-indicators
but do read the entire doc as it’s really not very long.

Hope that helps.

hscott · June 7, 2022, 11:00pm

This is the response I get. -bash: btrfs: command not found


Flox · June 8, 2022, 12:52am

Hi there @hscott,

The btrfs command that @phillxnet listed requires root privileges, and I believe he assumed you would be logged in as root; I think this is why you see this error.
Try either logging in as root via ssh or, when logged in as a regular user like you did in your screenshot, running the command with sudo: sudo btrfs fi show
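As a small sketch of the two options, the snippet below picks the right form of the command depending on the user id. The helper name root_prefix is made up for this example, and the script only prints the command lines instead of running them.

```shell
#!/bin/sh
# Sketch: decide whether a root-only command needs a 'sudo ' prefix.
# root_prefix is a hypothetical helper; it takes an optional uid and
# falls back to the uid of the current session.
root_prefix() {
    uid="${1:-$(id -u)}"
    if [ "$uid" -eq 0 ]; then
        printf ''        # already root: no prefix needed
    else
        printf 'sudo '   # regular user: go through sudo
    fi
}

echo "as root (uid 0):  $(root_prefix 0)btrfs fi show"
echo "as regular user:  $(root_prefix 1000)btrfs fi show"
echo "in this session:  $(root_prefix)btrfs fi show"
```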

Hope this helps,

hscott · June 8, 2022, 4:59am

It probably would work if I had the sudo password.

I don’t remember it in the setup.

Thank you,

Scot

phillxnet · June 8, 2022, 11:10am

@hscott Re:

This password is the one for the ‘root’ user and was set during a v4 install:

‘Rockstor’s “Built on openSUSE” installer’: https://rockstor.com/docs/installation/installer-howto.html

in the following install stages:

‘Enter Desired root User Password’: https://rockstor.com/docs/installation/installer-howto.html#enter-desired-root-user-password
‘Confirm root User Password’: https://rockstor.com/docs/installation/installer-howto.html#confirm-root-user-password

That may help to jog your memory, assuming this is a v4 install.

If you have a relatively simple setup and have no data on the system disk (the “ROOT” pool) then you could just re-install and then try importing the pool via the “poorly pool route” documented here:

‘Import unwell Pool’: https://rockstor.com/docs/interface/storage/disks.html#import-unwell-pool

As your pool is missing a member, it will be classed as unwell and so will require a degraded mount option such as is used in that example.

It’s always a good idea to take a note of the root password set during install, as it will always be required for more system critical maintenance. There are procedures to reset it if you have local access, but a re-install may just be simpler. Plus, if you are still on v3, it is best you move to v4 anyway, as the btrfs code in our new upstream is far more mature.

If you do have data on the current system disk then that complicates things, hence our Web-UI warnings against creating shares on the system disk: it violates system/data separation. But if this is the case, your fresh install could be to a new system disk. Once the pool is repaired via the Web-UI advice given and that data loss prevention guide, you can always move the pool back to the system booted from the original system disk, copy the system disk resident data over to the data pool, and then move the pool back to the new install.

Let us know how it goes. Without root access the system is just not fully accessible to you. However, you can still enable the ro,degraded mount options I referenced from within the Web-UI and follow the Pool specific “Maintenance required” contextually sensitive advice given there, all without the root password.

In short, we must know if you are on v3 testing, v3 stable (and updated), or v4. And you really must maintain root password awareness to leave your options open. But the re-install route really only takes a few minutes on most basic home setups. Plus, if you have a recent config save file, you can use this facility to aid the re-setup after any re-install.

‘Configuration Backup and Restore’: https://rockstor.com/docs/interface/system/config_backup.html

Throw yourself a bone here and give us some more info on this setup; others here are then more likely to be able to help. I.e. we currently know nothing about the raid level used, the Rockstor version used, the update channel used, or the number of disks before or after, but we can assume one disk was involved that is now missing. It was new, but not necessarily new hardware, as it has been reported as failed on adding. But was that a failed add, or an interrupted balance thereafter? (If it was the latter, then the poorly pool import advice above would also require the given example option of skip_balance.) This way you can help with resourcing the knowledge here that may help get you up and running again. The btrfs command was to get some of this info, but it is all available within the Web-UI and from your knowledge of this system’s history. But without the root password you are still limited in your options, so there is that. And such command line facilities are quicker and easier than multiple screenshots in some cases.

Hope that helps.

hscott · June 8, 2022, 1:53pm

Thank you for the help.

hscott · June 8, 2022, 4:01pm

Here is the info on the drives. I have added the fourth drive for the pool.

scotth@snet:/> sudo btrfs fi show
Label: ‘ROOT’ uuid: 4ac51b0f-afeb-4946-aad1-975a2a26c941
Total devices 1 FS bytes used 1.72GiB
devid 1 size 463.70GiB used 3.77GiB path /dev/nvme0n1p4

bad tree block 5557561327616, bytenr mismatch, want=5557561327616, have=0
Couldn’t read tree root
Label: ‘scott’ uuid: 30b80bf1-6650-4e59-9cbd-fd9f59b35531
Total devices 4 FS bytes used 2.82TiB
devid 1 size 5.46TiB used 1.42TiB path /dev/sda
devid 2 size 5.46TiB used 1.42TiB path /dev/sdb
devid 3 size 5.46TiB used 290.00GiB path /dev/sdd
*** Some devices missing

Label: ‘scott’ uuid: 7f91929c-cb4e-45c4-aa56-30e1fa7983b6
Total devices 1 FS bytes used 128.00KiB
devid 1 size 5.46TiB used 2.02GiB path /dev/sdc

phillxnet · June 11, 2022, 2:52pm

@hscott Hello again:
Re:

I’m afraid I don’t understand this, are you saying you command line added this drive?

Which brings me to another point. You currently have 2 pools labeled identically. This is a situation that is not compatible with, or normally even achievable via, the Web-UI:

Label: ‘scott’ uuid: 30b80bf1-6650-4e59-9cbd-fd9f59b35531
Label: ‘scott’ uuid: 7f91929c-cb4e-45c4-aa56-30e1fa7983b6

This will definitely massively confuse Rockstor’s Web-UI. Disks are added to pools via a specific btrfs command. Here you appear to have created 2 pools (canonical reference is the uuid) with the same label. You must first remove the unwanted single disk pool with the duplicate label. If there is no data on it you can just do a “wipefs -a /dev/disk/by-id” and don’t get that name wrong. Or relabel that drive via its uuid. As it stands you have 3 pools: the system pool, and two other pools with 3 disks and 1 disk respectively. They are not associated, and giving pools the same name is not Rockstor compatible.
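To make the duplicate label and the missing member easier to spot, here is a rough sketch that scans “btrfs fi show” style output with awk. The sample text is trimmed from the output posted above, with the plain single quotes the real command prints. The function names are made up for this example, and labels containing spaces are not handled.

```shell
#!/bin/sh
# Sketch: find duplicate pool labels and pools with missing members in
# 'btrfs fi show' style output. The sample below is trimmed from the thread.

sample_output() {
cat <<'EOF'
Label: 'ROOT'  uuid: 4ac51b0f-afeb-4946-aad1-975a2a26c941
    Total devices 1 FS bytes used 1.72GiB
    devid    1 size 463.70GiB used 3.77GiB path /dev/nvme0n1p4

Label: 'scott'  uuid: 30b80bf1-6650-4e59-9cbd-fd9f59b35531
    Total devices 4 FS bytes used 2.82TiB
    devid    1 size 5.46TiB used 1.42TiB path /dev/sda
    devid    2 size 5.46TiB used 1.42TiB path /dev/sdb
    devid    3 size 5.46TiB used 290.00GiB path /dev/sdd
    *** Some devices missing

Label: 'scott'  uuid: 7f91929c-cb4e-45c4-aa56-30e1fa7983b6
    Total devices 1 FS bytes used 128.00KiB
    devid    1 size 5.46TiB used 2.02GiB path /dev/sdc
EOF
}

# Print every label (quotes included) that is used by more than one pool.
duplicate_labels() {
    awk '$1 == "Label:" && $3 == "uuid:" { count[$2]++ }
         END { for (l in count) if (count[l] > 1) print l }'
}

# Print the uuid of every pool that reports missing devices.
pools_missing_devices() {
    awk '$1 == "Label:" && $3 == "uuid:" { uuid = $4 }
         /Some devices missing/ { print uuid }'
}

dups=$(sample_output | duplicate_labels)
missing=$(sample_output | pools_missing_devices)
echo "duplicate labels: $dups"
echo "pools with missing devices: $missing"
```

On a live system the same functions could be fed from “sudo btrfs fi show” instead of the sample text.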

Again answers to all prior questions will help here, i.e. history and version / update channel as if you are on too old a version you may as well use v4 to do the repair. Raid level of pool with missing prior member. Some, if not all, history. But no version of Rockstor can create or maintain pools of the same label.

And once you have the single drive pool (uuid 7f91929c-cb4e-45c4-aa56-30e1fa7983b6) physically detached from the system, you should be able to at least approach the repair of the 3 disk (once 4, presumably, in intent) pool more easily. Rockstor currently mounts by pool label/name, hence this being an incompatible scenario.
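To help pick out the right physical drive, the sketch below lists the /dev/disk/by-id names that point at a given device node; those names normally include the drive model and serial number. The function name by_id_names and the BY_ID_DIR override are made up for this example (the override only exists so the function can be tried against a test directory), and it assumes a readlink that supports -f.

```shell
#!/bin/sh
# Sketch: list the /dev/disk/by-id names that resolve to a given device node.
# by_id_names and BY_ID_DIR are hypothetical names introduced for this example.
by_id_names() {
    target=$(readlink -f "$1")
    dir="${BY_ID_DIR:-/dev/disk/by-id}"
    for link in "$dir"/*; do
        # Skip the unexpanded glob when the directory is empty or absent.
        [ -e "$link" ] || continue
        if [ "$(readlink -f "$link")" = "$target" ]; then
            basename "$link"
        fi
    done
}

# /dev/sdc is the single disk pool member in the output posted above.
by_id_names /dev/sdc
```

Device names like /dev/sdc can change between boots, so it is worth re-checking the mapping right before pulling or wiping anything.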

Hope that helps.