Error running a command. cmd = /sbin/btrfs filesystem show /dev/disk/by-id/sda3. rc

coronic · February 17, 2020, 11:30am

Brief description of the problem

While trying to scan disks in the WebUI of a freshly installed Rockstor VM on ESXi, the following error traceback is displayed and no disks are shown.
I checked sda3 with parted and it does contain a btrfs filesystem.

Detailed step by step instructions to reproduce the problem

Install Rockstor in a virtual machine
Log in to the WebUI
Go to Storage > Disks
Click “Rescan”
Error appears!

Web-UI screenshot

[image: rockstor-bug]

Error Traceback provided on the Web-UI


Traceback (most recent call last):
  File "/opt/rockstor/src/rockstor/rest_framework_custom/generic_view.py", line 41, in _handle_exception
    yield
  File "/opt/rockstor/src/rockstor/storageadmin/views/disk.py", line 377, in post
    return self._update_disk_state()
  File "/opt/rockstor/eggs/Django-1.8.16-py2.7.egg/django/utils/decorators.py", line 145, in inner
    return func(*args, **kwargs)
  File "/opt/rockstor/src/rockstor/storageadmin/views/disk.py", line 322, in _update_disk_state
    p.uuid = btrfs_uuid(dob.name)
  File "/opt/rockstor/src/rockstor/fs/btrfs.py", line 1055, in btrfs_uuid
    [BTRFS, 'filesystem', 'show', '/dev/disk/by-id/%s' % disk])
  File "/opt/rockstor/src/rockstor/system/osi.py", line 115, in run_command
    raise CommandException(cmd, out, err, rc)
CommandException: Error running a command. cmd = /sbin/btrfs filesystem show /dev/disk/by-id/sda3. rc = 1. stdout = ['']. stderr = ['ERROR: not a valid btrfs filesystem: /dev/disk/by-id/sda3', '']

coronic · February 17, 2020, 11:57am

I found out the device file /dev/disk/by-id/sda3 didn’t even exist.
I don’t know why Rockstor tries to access it, but I created a symlink to /dev/sda3 and now it seems to work.

It would be even better without this dirty workaround, but I don’t know where to start debugging the WebUI.
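For anyone else hitting this: the traceback shows btrfs_uuid (in fs/btrfs.py) building the device path as '/dev/disk/by-id/%s' % disk, and the error message shows that disk was the bare kernel name sda3. Since no by-id link of that name existed, btrfs was handed a path that was not there, and reported it as "not a valid btrfs filesystem". A minimal Python 3 sketch of a pre-check that would surface the real cause more clearly; by_id_path is a hypothetical helper for illustration only, not Rockstor code:

```python
import os

# Standard udev location for persistent by-id links.
BY_ID_DIR = "/dev/disk/by-id"


def by_id_path(disk_name, by_id_dir=BY_ID_DIR):
    """Hypothetical helper (not Rockstor code): build the by-id path the same
    way as the btrfs_uuid call in the traceback, but fail early with a clear
    message if udev never created that link."""
    path = "%s/%s" % (by_id_dir, disk_name)  # mirrors '/dev/disk/by-id/%s' % disk
    # os.path.exists follows symlinks, so a dangling link also counts as missing.
    if not os.path.exists(path):
        raise FileNotFoundError(
            "%s does not exist; the device probably has no serial, "
            "so udev created no by-id name for it" % path
        )
    return path
```

On the affected VM, by_id_path("sda3") would raise with a message pointing at the missing by-id name rather than at the filesystem, which is where the actual problem was.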

phillxnet · February 17, 2020, 4:21pm

@coronic Welcome to the Rockstor community.
Re:

The by-id device link doesn’t exist because you have likely not met the minimum system requirements:

Docs - Quick Start - Minimum system requirements (which is the first section):

All drives must have unique serial numbers (real drives do); not all VM systems default to this.

and it also offers a fix for VMware:

For VMware ensure you have disk.EnableUUID="true" in your .vmx file
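A quick way to double-check this setting is to read the VM’s .vmx file directly. The sketch below is a simplified illustration, not a full .vmx parser: it assumes the usual one key = "value" pair per line and compares keys and the value case-insensitively (an assumption). Note that curly quotes, as can appear when the setting is copied from a web page, do not count as a valid value here.

```python
import re

# Assumed .vmx line shape: key = "value" (straight double quotes only).
_VMX_LINE = re.compile(r'^\s*([^=#\s]+)\s*=\s*"([^"]*)"\s*$')


def parse_vmx(text):
    """Return a dict of lower-cased keys to raw string values."""
    settings = {}
    for line in text.splitlines():
        m = _VMX_LINE.match(line)
        if m:
            # Keys are compared case-insensitively in this sketch (assumption).
            settings[m.group(1).lower()] = m.group(2)
    return settings


def enable_uuid_set(text):
    """True if disk.EnableUUID is present and set to "true" (any case)."""
    value = parse_vmx(text).get("disk.enableuuid", "")
    return value.strip().lower() == "true"
```

For example, enable_uuid_set(open("rockstor.vmx").read()) would return True for a file containing disk.EnableUUID = "TRUE", where rockstor.vmx is a placeholder name for your VM’s config file.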

This is most likely your issue. The symlink workaround

will not suffice. If you cannot assign or surface a unique serial for each and every device, then your hypervisor setup is not compatible with Rockstor. All real devices have serial numbers, and Rockstor depends on them because it depends on the by-id names; these are in turn generated by udev, which in turn needs those serials. KVM and VMware can both generate these serials.

Hope that helps. And you should remove that symlink, as it will come back to haunt you and massively confuse Rockstor, which is very strongly dependent on by-id names.

coronic · February 18, 2020, 11:37am

Hello Philip,
thank you for your quick response.

disk.EnableUUID="true" was set for my VM from the beginning, so I don’t think that’s the problem. The WebUI also shows all drives correctly.

Fortunately I’m only running it virtual for evaluation. The final setup will be on bare metal, so I hope there will be no such problem.

Best regards
Mario

phillxnet · February 18, 2020, 12:53pm

@coronic Hello again.

That’s very strange, as I’ve only ever seen that setting as the cause of this issue. As stated, we depend on the by-id names, and they are usually missing when a drive has no serial, which is VMware’s default; hence the suggestion. So I currently have no idea why you are not getting these by-id names.

It may be helpful to see the output of the following command within your particular config, as we have many folks using VMware successfully once that setting is in place:

ls -la /dev/disk/by-id/

All your drives and their respective partitions should be represented.
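To spot exactly which devices are missing, that listing can be cross-checked against the kernel’s block devices. A minimal sketch, assuming the standard /sys/class/block and /dev/disk/by-id locations (both can be passed in as parameters, e.g. for testing); skipping loop, ram and zram devices is an assumption about what is not of interest here:

```python
import os


def devices_missing_by_id(sys_block="/sys/class/block", by_id="/dev/disk/by-id"):
    """Return kernel names (e.g. 'sda', 'sda3') that no by-id link points at."""
    covered = set()
    if os.path.isdir(by_id):
        for entry in os.listdir(by_id):
            # Each by-id entry is normally a symlink such as ../../sda3;
            # resolve it and keep only the kernel device name.
            target = os.path.realpath(os.path.join(by_id, entry))
            covered.add(os.path.basename(target))
    missing = []
    for name in sorted(os.listdir(sys_block)):
        # Skip virtual devices that are not expected to have by-id names.
        if name.startswith(("loop", "ram", "zram")):
            continue
        if name not in covered:
            missing.append(name)
    return missing
```

On the affected VM this would be expected to report names such as sda3; on a correctly configured system none of your data drives or their partitions should appear in the result.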

Incidentally, our next release offers warnings about missing serials, but that won’t be out until we release our next ISO, which won’t be based on CentOS but on openSUSE Leap 15.1, so everything will change in the near future. I mention this because the serial issue comes up very often, but I’ve yet to see a missing-serial issue in VMware that wasn’t sorted by that setting. What is your configuration, such that there is no serial? Are you maybe ‘passing through’ real drives or the like? That may account for the lost serials, or at least the missing by-id names.

Also, out of curiosity, what version of Rockstor are you evaluating?

Hope that helps. And yes, Rockstor was always intended to run on bare metal; given our Linux base, I see hardware compatibility as one of our strengths, at least once we get our openSUSE move sorted.

coronic · February 19, 2020, 9:09am

Hello,

The problem occurred initially in Rockstor 3.9.1 right after the install. But after I updated to the newest Testing channel release and rebooted, it went away. Now all my drives are visible under /dev/disk/by-id.

For folks looking at the thread in the future, my ESXi VM configuration:

Thank you very much for your help and quick response.

Best regards
Mario

phillxnet · February 19, 2020, 10:23am

@coronic Thanks for the update, that’s really helpful.

Yes, our ISO is now a little over 2.5 years old as of writing this.

I now think what has happened in your case is that an upstream update (from the Red Hat/CentOS folks) to libblkid or udev has sorted your issue, so the Rockstor code was then able to pick up the now-existing by-id names. Good to know. We have seen this with, for example, NVMe devices.

And although newer, our now-deceased CentOS Testing channel is only 3 months younger, so it is still over 2 years out of date:

The 3.9.2 Stable channel update was the same as the 3.9.1-16 Testing channel release for our CentOS-based offering. Please see the following forum thread for some background on this:

Whereas our latest CentOS Stable channel release, and our openSUSE-only Testing channel releases, are now 8 days old as of writing, at 3.9.2-53.

And for our modern testing channel releases, our “Built on openSUSE” variant, see the following thread:

We are currently aiming to achieve feature parity (or near enough) in our openSUSE Testing channel before we establish a new ‘Built on openSUSE’ Stable channel and spin off a new openSUSE-only Testing channel to address our fairly massive technical debt. That way our current supporters in the CentOS Stable channel will have a workable system to transition to, with upstream kernel/btrfs updates, boot to snapshot, etc., while we work through our backlog within the Testing channel: moving to Python 3 and Django 2, possibly a new build system, etc.

Note that if you do fancy evaluating our most recent code on the current CentOS base, you should first read the following post, which addresses a critical bug when moving to Stable channel updates from the now-defunct CentOS-based Testing channel:

Hope that helps and thanks again for sharing your findings, much appreciated.