phillxnet
December 15, 2019, 12:22pm
@iecs Hello again, and thanks for the report.
However, as this is from our now retired CentOS testing channel, it has been superseded by our Stable channel. Your issue may in part be due to a still-ongoing balance, but as the legacy testing channel was unable to monitor / assess this from the Web-UI, it is difficult to tell whether one is still running; especially so in the case of an ongoing disk removal.
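If you are comfortable on the command line, the usual way to check is shown below; the pool mount path is only an example and should be replaced with your own:
```
# A regular balance would show up here, but the 'internal' balance that
# follows a disk removal typically reports nothing:
btrfs balance status /mnt2/your_pool_name

# An in-progress removal instead shows the departing device at size 0.00B
# with a 'used' figure that shrinks between runs:
btrfs fi show /mnt2/your_pool_name
```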
As from Stable release version 3.9.2-49:

3.9.2-49
Merged end September 2019
Released 2nd October 2019
I am chuffed, and somewhat relieved, to finally release what is one of our largest and longest awaited releases for some time now. I present the 81st Rockstor. 
In this release we welcome and thank first time Rockstor contributor @Psykar  for a 2 in one fix for long standing issues of table sorting; nice: 
Fix size sorting on snapshots / shares pages. Fixes #1368  #1878  @Psykar  on-Github 
Thanks also to @Flox  for a wide range of non …
   
 
specifically: #1722 @phillxnet
where #1722 was a biggie:

rockstor:master ← phillxnet:1722_pool_resize_disk_removal_unknown_internal_error_and_no_UI_counterpart
opened 05:46PM - 27 Jan 19 UTC

Fix disk removal timeout failure re "Unknown internal error doing a PUT .../remove" by asynchronously executing 'btrfs dev remove'. The pool_balance model was extended to accommodate what are arbitrarily named (within Rockstor) 'internal' balances: those automatically initiated upon every 'btrfs dev delete' by the btrfs subsystem itself. A complication of 'internal' balances is their invisibility via 'btrfs balance status'. An inference mechanism was thus constructed to 'fake' the output of a regular balance status so that our existing Web-UI balance surfacing mechanisms could be extended to serve these 'internal' variants similarly. The new state of device 'in removal' and the above-mentioned inference mechanism required that we now track and update devid and per device allocation. These were added as disk model fields and surfaced appropriately at the pool details level within the Web-UI.
Akin to regular balances, btrfs dev delete 'internal' balances were found to negatively impact Web-UI interactivity. This was in part alleviated by refactoring the lowest levels of our disk/pool scan mechanisms. In essence this refactoring significantly reduces the number of system and python calls required to attain the same system wide dev / pool info and simplifies low level device name handling. Existing unit tests were employed to aid in this refactoring. Minor additional code was required to account for regressions (predominantly in LUKS device name handling) that were introduced by these low level device name code changes.
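By way of a purely command-line analogue of that mechanism (a minimal sketch only, not the PR's code; the device and pool mount path are example values):
```
# Start the removal in the background so the caller is not blocked:
btrfs device remove /dev/sda /mnt2/your_pool_name &

# 'btrfs balance status' does not report the resulting 'internal' balance:
btrfs balance status /mnt2/your_pool_name

# ...so progress is instead inferred from the departing device's 'used'
# figure shrinking towards zero until it leaves the listing:
while btrfs fi show /mnt2/your_pool_name | grep -q '/dev/sda'; do
    btrfs fi show /mnt2/your_pool_name | grep '/dev/sda'
    sleep 60
done
```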
Summary:
- Execute device removal asynchronously.
- Monitor the consequent 'internal' balances by existing mechanisms where possible.
- Only remove pool members' pool associations once their associated 'internal' balance has finished.
- Improve low level efficiency/clarity re device/pool scanning by moving to a single call of the lighter get_dev_pool_info() rather than calling the slower get_pool_info() once per btrfs disk; get_pool_info() is retained for pool import duties as its structure is ideally suited to that task. Multiple prior temp_name/by-id conversions are also avoided.
- Improve user messaging re system performance / Web-UI responsiveness during a balance operation, regular or 'internal'.
- Fix bug re reliance on "None" disk label, removing a fragility concerning disk-pool association within the Web-UI.
- Improve auto pool labeling subsystem by abstracting and generalising ready for pool renaming capability.
- Improve pool uuid tracking and add related Web-UI element.
- Add Web-UI element in balance status tab to identify regular or 'internal' balance type.
- Add devid tracking and related Web-UI element.
- Use devid Disk model info to ascertain pool info for detached disks.
- Add per device allocation tracking and related Web-UI element.
- Fix prior TODO: re btrfs in partition failure point introduced in git tag 3.9.2-32.
- Fix prior TODO: re unlabeled pools caveat.
- Add pool details disks table ‘Page refresh required’ indicator keyed from devid=0.
- Add guidance on common detached disk removal reboot requirement (only affects older kernels).
- Remove a low level special case for LUKS dev matching (mapped devices) which affected the performance of all dev name by-id look-ups.
- Add TODO re removing legacy formatted disk raid role pre openSUSE move.
- Update scan_disks() unit tests for new 'path included' output.
- Address TODO in scan_disks() unit tests and normalise on pre-sort method.
Fixes #1722 
And by way of a trivial application of the added per device allocation:
Fixes #1918 
"Incorrect size calculation while removing disk from disk pool"
@suman Ready for review.
Please note that this pr assumes the prior merge of:
"regression in unit tests - environment outdated since 3.9.2-45. Fixes #1993" #1994 (Fixes unit tests)
"pin python-engineio to 2.3.2 as recent 3.0.0 update breaks gevent. Fixes #1995" #1996 (Fixes basic build fail)
and:
"Implement Add Labels feature for already-installed Rock-Ons. Fixes #1998" #1999 (has a prior storageadmin db migration 0007_auto_20181210_0740.py - I’m trying to keep our migrations path simple)
Testing:
All existing osi and btrfs unit tests were confirmed to pass prior to and post this pr (given #1994). However, as indicated above, the scan_disks() unit tests required modification, but only to accommodate the new behaviour introduced in scan_disks() where we now request all device paths from lsblk. From the osi unit tests' point of view this was a cosmetic change in test data: no functional changes were made bar a trivial robustness improvement by way of an existing TODO.
Many of the system configurations used to originally generate the osi unit test data were also tested in their install instance counterparts (ie bios raid system disk, LUKS, btrfs in partition, etc) and were also used during development to help ensure minimal regression.
A full functional test on real hardware was also conducted over multiple cycles of removing (and, where appropriate, re-adding a disk after 'wipefs -a'). These tests are detailed in the comments below and indicate expected behaviour in both legacy CentOS and openSUSE (Tumbleweed in this case) installs.
Caveats:
Our keying from devid = 0 (for 'Page refresh required' UI element) may cause confusion during a disk replace (as yet unimplemented: see issue #1611 ) as it is understood that currently within btrfs one of the two disks involved during a 'btrfs replace start ...' operation is temporarily assigned a devid of 0. The cited issue can address this as and when needed. 
 
which in turn required quite a few other improvements, made after the last legacy testing channel release, to be in place first.
The main linked issue in that pull request was:

opened 06:19PM - 01 Jun 17 UTC; closed 02:10PM - 09 Jul 19 UTC

Thanks to forum member Noggin for highlighting this behaviour. Occasionally, when removing a disk from a pool, there can be a UI timeout directly after the last dialog entitled "Resize Pool / Change RAID level for ..." which acts as the last confirmation of the configured operation:

There is then no UI 'balance' indicated while the removal is in progress, yet the UI indicates that a balance is in progress when a balance is attempted (only attempted by Noggin as I did not attempt to execute a balance whilst the removal was in progress).
```
btrfs balance status /mnt2/time_machine_pool/
No balance found on '/mnt2/time_machine_pool/'
```
The pool resize is, however, indicated by the requested disks having their size 'demoted' to zero and showing a reduced usage with subsequent executions of **btrfs fi show**:
```
Label: 'time_machine_pool'  uuid: 8f363c7d-2546-4655-b81b-744e06336b07
	Total devices 4 FS bytes used 31.57GiB
	devid    3 size 149.05GiB used 17.03GiB path /dev/sdd
	devid    4 size 0.00B used 5.00GiB path /dev/sda
	devid    5 size 149.05GiB used 23.03GiB path /dev/mapper/luks-d36d39ea-c0b3-4355-b0c5-bd3248e6bbfe
	devid    6 size 149.05GiB used 23.00GiB path /dev/mapper/luks-d7524e90-4d9e-4772-932f-d1407b6b5fe7
```
and then later on:
```
Label: 'time_machine_pool'  uuid: 8f363c7d-2546-4655-b81b-744e06336b07
	Total devices 4 FS bytes used 32.57GiB
	devid    3 size 149.05GiB used 18.03GiB path /dev/sdd
	devid    4 size 0.00B used 2.00GiB path /dev/sda
	devid    5 size 149.05GiB used 24.03GiB path /dev/mapper/luks-d36d39ea-c0b3-4355-b0c5-bd3248e6bbfe
	devid    6 size 149.05GiB used 24.00GiB path /dev/mapper/luks-d7524e90-4d9e-4772-932f-d1407b6b5fe7
```
As can be seen, devid 4 is having its pool usage reduced (from 5.00GiB to 2.00GiB) between runs. In the above example the disk removal completed successfully, however there was never a UI indication of its 'in progress' nature, or any record of a balance having taken place at that time.
For reference, Noggin's forum thread is suspected of indicating the same behaviour as my observations during final testing of pr #1716, which also led to the creation of this issue (details of the preceding steps are available in that pr):
https://forum.rockstor.com/t/cant-remove-failed-drive-from-pool-rebalance-in-progress/3319
where a 3.8.16-16 (3.9.0 iso install) version exhibited the same behaviour (pre #1716 merge). 
 
which in turn links to the following forum thread:

[Please complete the below template with details of the problem reported on your Web-UI. Be as detailed as possible. Community members, including developers, shall try and help. Thanks for your time in reporting this issue! We recommend purchasing commercial support for expedited support directly from the developers.]

In effect you just have to wait for the initial 'remove disk' internal balance to finish. Your system should then return to normal function as-is. It was a non-trivial task for Rockstor's Web-UI to track these 'internal' disk removal events, but this is now accomplished in versions 3.9.2-49 and later. Our last legacy CentOS testing channel release, however, predates this improvement.
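Should you wish to confirm that the removal is still progressing while you wait, re-running the command quoted in the issue above (the pool path is again only an example) should show the departing device's 'used' figure falling until the device drops out of the listing:
```
# Refreshes every 60 seconds; completion is indicated by the device
# disappearing from the output.
watch -n 60 btrfs fi show /mnt2/your_pool_name
```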
Hope that helps.