Shares showing up unmounted, but I think it is failing before mounting during SMART startup

Sweepy · May 7, 2023, 5:54am

Problem:
Getting an error when booting and SMART is starting (SMART is showing failed in the startup output). I think this is leading to some shares not being mounted.

Backstory:
This is a bit of a long story. I realized that I had run out of space after I accidentally wrote to the wrong mount location while doing a backup with rsync. I was able to delete that location, but somehow the data had been copied to .snapshots as well. This left me out of space on startup, and because of it the webUI would not start. The only fix I knew was to manually delete the snapshot in the .snapshots folder. At first I couldn't, because it gave me the error “unable to delete due to filesystem being read-only”. I then found a forum post where @phillxnet told someone to use “btrfs property set /dir/you/want ro false”, so I ran that on the directory I wanted to delete, and it let me delete it. I deleted the snapshot (some /mnt links are still in there because it says I can’t delete those… not sure about that), but everything else is gone and I got my space back.

After all of that, the webUI was able to start again, but I realized that some of my shares were not mounting. I looked in the logs, and it seems there is now an issue with SMART: it appears to throw an error when trying to load the Seagate external drive I have. The weird thing is that the shares on that drive seem to be mounting properly. Below is the log output:

[07/May/2023 01:09:01] ERROR [storageadmin.views.disk:480] Error running a command. cmd = /usr/sbin/smartctl --info /dev/disk/by-id/usb-Seagate_Expansion_NA82LGB6-0:0. rc = 2. stdout = ['smartctl 7.2 2021-09-14 r5237 [x86_64-linux-5.14.21-150400.24.60-default] (SUSE RPM)', 'Copyright (C) 2002-20, Bruce Allen, Christian Franke, www.smartmontools.org', '', 'Read Device Identity failed: scsi error unsupported field in scsi command', '', "A mandatory SMART command failed: exiting. To continue, add one or more '-T permissive' options.", '']. stderr = ['']
Traceback (most recent call last):
  File "/opt/rockstor/src/rockstor/storageadmin/views/disk.py", line 477, in _update_disk_state
    do.name, do.smart_options
  File "/opt/rockstor/src/rockstor/system/smart.py", line 338, in available
    [SMART, "--info"] + get_dev_options(device, custom_options)
  File "/opt/rockstor/src/rockstor/system/osi.py", line 227, in run_command
    raise CommandException(cmd, out, err, rc)
CommandException: Error running a command. cmd = /usr/sbin/smartctl --info /dev/disk/by-id/usb-Seagate_Expansion_NA82LGB6-0:0. rc = 2. stdout = ['smartctl 7.2 2021-09-14 r5237 [x86_64-linux-5.14.21-150400.24.60-default] (SUSE RPM)', 'Copyright (C) 2002-20, Bruce Allen, Christian Franke, www.smartmontools.org', '', 'Read Device Identity failed: scsi error unsupported field in scsi command', '', "A mandatory SMART command failed: exiting. To continue, add one or more '-T permissive' options.", '']. stderr = ['']
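The rc = 2 in this log is smartctl's exit status, which the smartctl(8) man page documents as a bitmask rather than a single error code. A minimal Python sketch (not Rockstor code; `decode_smartctl_rc` is a hypothetical helper) that spells out the bits: rc = 2 sets only bit 1, the "device open failed / no IDENTIFY DEVICE structure / low-power mode" case, which lines up with the "Read Device Identity failed" line above.

```python
from typing import List

# Bit meanings paraphrased from the RETURN VALUES section of smartctl(8).
SMARTCTL_EXIT_BITS = {
    0: "Command line did not parse",
    1: "Device open failed, device did not return an IDENTIFY DEVICE structure, "
       "or device is in a low-power mode",
    2: "Some SMART or other ATA command to the disk failed, "
       "or there was a checksum error in a SMART data structure",
    3: "SMART status check returned 'DISK FAILING'",
    4: "Prefail attributes found at or below threshold",
    5: "SMART status 'DISK OK' but some attributes were at or below threshold in the past",
    6: "The device error log contains records of errors",
    7: "The device self-test log contains records of errors",
}


def decode_smartctl_rc(rc: int) -> List[str]:
    """Return the documented meaning of every bit set in a smartctl exit status."""
    return [msg for bit, msg in sorted(SMARTCTL_EXIT_BITS.items()) if rc & (1 << bit)]


if __name__ == "__main__":
    # The exit status reported in the log above.
    for reason in decode_smartctl_rc(2):
        print(reason)
```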

I have no idea whether any of this is why the shares are not being mounted. I thought I would ask if anyone could think of a solution, but I am pretty close to the “reinstall everything” step if I can’t find a path forward. I would have started reinstalling already, but I am not confident that I could do that without losing data on my pools, or that I know the correct process to set them up again once everything is reinstalled…

Thanks for any help that anyone can give me.

Sweepy

phillxnet · May 7, 2023, 4:27pm

@Sweepy Hello again.
Re:

Yes, the ‘ROOT’ pool (system drive) is subject to some default snapshots. So although you may have deleted the current ‘version’ of this accidental system drive data, it had already by then been incorporated into a snapshot. We also have boot-to-snapshot enabled by default.

All sorts of things go wrong once the system drive runs out of space.

I don’t think the SMART error is related, actually; it just means that SMART can’t get the required info from the drive.

I.e. Rockstor is querying the drive with the following command:

/usr/sbin/smartctl --info /dev/disk/by-id/usb-Seagate_Expansion_NA82LGB6-0:0

And there is a non-zero return code (rc = 2), so we bail. But notice the smartmontools report/error message:

“A mandatory SMART command failed: exiting. To continue, add one or more ‘-T permissive’ options.”

We do support this ‘custom’ smartmontools option. Take a look at the following Howto section:
https://rockstor.com/docs/howtos/smart.html#disk-custom-s-m-a-r-t-options

So you could try this option. Incidentally, it’s not uncommon for USB-attached devices to need custom SMART options like that, and some have no support for SMART at all. You may be able to get some SMART function from this device with the permissive (-T) option, so that smartctl is less fussy about the missing SCSI command. Could be worth a try.
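Per the traceback above, Rockstor builds this command as `[SMART, "--info"] + get_dev_options(device, custom_options)`. A minimal sketch of that idea, assuming the custom options are a single string split shell-style; `build_info_cmd` and `smart_info` are hypothetical stand-ins, not Rockstor's actual `get_dev_options`. In Rockstor itself the option is entered as the disk's custom S.M.A.R.T. option per the Howto above; the sketch only shows the resulting command.

```python
import shlex
import subprocess
from typing import List

SMARTCTL = "/usr/sbin/smartctl"


def build_info_cmd(device: str, custom_options: str = "") -> List[str]:
    """Hypothetical helper: smartctl --info argv plus user-supplied custom options.

    Custom options such as "-T permissive" are split shell-style and placed
    before the device path, following smartctl's documented "[options] device" usage.
    """
    return [SMARTCTL, "--info"] + shlex.split(custom_options) + [device]


def smart_info(device: str, custom_options: str = "") -> subprocess.CompletedProcess:
    """Run the command (needs root) and capture output; the caller checks returncode."""
    return subprocess.run(
        build_info_cmd(device, custom_options),
        capture_output=True,
        text=True,
        check=False,
    )


if __name__ == "__main__":
    dev = "/dev/disk/by-id/usb-Seagate_Expansion_NA82LGB6-0:0"
    # Print the command rather than running it.
    print(" ".join(build_info_cmd(dev, "-T permissive")))
```

Running the printed command by hand as root is a quick way to check whether `-T permissive` gets past the identity error before saving it as the disk's custom option.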

As for reinstalling: depending on the complexity of your setup, this is likely not that big a task. And given your system has basically been strained in the past (a completely full system drive), you may at least get a nice clean setup again. And remember there is the pool import option, followed by a config restore (if you saved and also downloaded a config backup), if you end up going that route:

Import BTRFS Pool: Disks — Rockstor documentation
Configuration Backup and Restore: Configuration Backup and Restore — Rockstor documentation

And it’s always good to know, and to test, the route to bare-metal recovery.

The config backup and restore is not perfect, but it serves many folks’ needs. And you can always add back by hand any missing config. Again, it’s good to know this can be done.

It shouldn’t be required, but if your system underwent, say, a partial update while its system drive was choked up, that in turn further complicates things. So it’s good not to be too precious about the system drive and your setup/config. We currently have no redundancy option for the system drive, so it’s basically one hardware failure away from toast anyway. I’d really like to have this improved in time, but alas our plate is rather full currently. We are also awaiting some upstream improvements regarding multi-device system drives.

So not of much help, I’m afraid, as more info would be required. You could look further into the Rockstor logs directly after a reboot, as they may have some pointers as to why stuff is not getting mounted.
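A minimal sketch for pulling only the ERROR entries out of the log after a reboot, assuming the log lives at `/opt/rockstor/var/log/rockstor.log` and uses the line format seen in the log excerpt above; both are assumptions, so adjust `LOG_PATH` (a hypothetical name) if your install differs. Traceback continuation lines do not match the pattern, so read the full log around any hit.

```python
import re
from pathlib import Path
from typing import Iterable, Iterator, Tuple

# Assumed location of the Rockstor log; adjust if your install differs.
LOG_PATH = Path("/opt/rockstor/var/log/rockstor.log")

# Matches lines like:
# [07/May/2023 01:09:01] ERROR [storageadmin.views.disk:480] Error running a command...
LINE_RE = re.compile(
    r"^\[(?P<ts>[^\]]+)\] (?P<level>[A-Z]+) \[(?P<where>[^\]]+)\] (?P<msg>.*)$"
)


def error_entries(lines: Iterable[str]) -> Iterator[Tuple[str, str, str]]:
    """Yield (timestamp, source, message) for every ERROR-level log line."""
    for line in lines:
        m = LINE_RE.match(line.rstrip("\n"))
        if m and m.group("level") == "ERROR":
            yield m.group("ts"), m.group("where"), m.group("msg")


if __name__ == "__main__":
    if LOG_PATH.exists():
        with LOG_PATH.open(errors="replace") as fh:
            for ts, where, msg in error_entries(fh):
                print(ts, where, msg[:120])
    else:
        print(f"{LOG_PATH} not found; adjust LOG_PATH for your install")
```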

If you have another candidate drive to serve as a future, or even temporary, system drive, then you can try the whole process out with this ‘stand-in’ drive. But do not leave the current and future system drives attached to the system at the same time. This won’t work: Rockstor gets confused if there are two versions of itself attached simultaneously.

So basically you could (bypassing config save and backup for simplicity):

1. Shut down.
2. Remove the system drive.
3. Disconnect all data drives.
4. Connect the new ‘stand-in’ system drive.
5. Boot up the installer.
6. Install to the new ‘stand-in’ drive.
7. Shut down.
8. Reconnect all data drives.
9. Boot up and import the data drives.
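The steps above can be modelled as an ordered checklist. A minimal, purely illustrative Python sketch (the drive labels, `STEPS` and `walk` are hypothetical) that replays the steps against the set of attached drives and refuses any state where the original and stand-in system drives are attached together, the one configuration warned against above.

```python
from typing import Callable, List, Set, Tuple

# Hypothetical drive labels used only for this illustration.
OLD_SYS = "original system drive"
NEW_SYS = "stand-in system drive"
DATA = "data drives"

Step = Tuple[str, Callable[[Set[str]], None]]

STEPS: List[Step] = [
    ("Shut down", lambda d: None),
    ("Remove the system drive", lambda d: d.discard(OLD_SYS)),
    ("Disconnect all data drives", lambda d: d.discard(DATA)),
    ("Connect the stand-in system drive", lambda d: d.add(NEW_SYS)),
    ("Boot the installer", lambda d: None),
    ("Install to the stand-in drive", lambda d: None),
    ("Shut down", lambda d: None),
    ("Reconnect all data drives", lambda d: d.add(DATA)),
    ("Boot and import the data drives", lambda d: None),
]


def walk(steps: List[Step], attached: Set[str]) -> Set[str]:
    """Apply each step in order (mutating `attached`), rejecting two attached system drives."""
    for name, action in steps:
        action(attached)
        if {OLD_SYS, NEW_SYS} <= attached:
            raise RuntimeError(f"after '{name}': both system drives are attached")
    return attached


if __name__ == "__main__":
    print(sorted(walk(STEPS, {OLD_SYS, DATA})))
```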

You can then tell whether this process is going to work. If it doesn’t, you still have the original system drive detached and ready to drop back in, replacing the failed import attempt. But our import is actually pretty well field-tested now.

A proper write up of a similar process (including config backup and restore) is covered in our following Howto:
Migrating from Legacy V3 to V4 “Built on openSUSE”: https://rockstor.com/docs/howtos/v3_to_v4.html

Not directly relevant (assuming you are already on V4), but still worth a read, as it’s basically the new install, import, config restore arrangement that is required for folks to transition from our V3 (CentOS based) to our V4 (“Built on openSUSE”).

I wonder if your data pool is actually a little poorly. Have you scrubbed it? Scrubbing is a btrfs process that test-reads every bit of stored data to prove its integrity. And if you have used a redundant btrfs raid profile, there is a good chance that any duff (corrupted) data can be restored to health from its redundant copy.
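A scrub can be started from the command line with `btrfs scrub start` and checked with `btrfs scrub status`. A minimal Python sketch that only builds those commands, assuming a pool mounted at a hypothetical `/mnt2/my-pool` (Rockstor mounts pools under `/mnt2/`); `scrub_cmd`, `scrub_status_cmd` and `run` are hypothetical helpers. `-B` keeps the scrub in the foreground so its summary is printed when it finishes. Running the commands needs root, so the example only prints them.

```python
import subprocess
from typing import List


def scrub_cmd(mount_point: str, foreground: bool = True) -> List[str]:
    """Build a 'btrfs scrub start' command for a mounted btrfs filesystem.

    -B keeps the scrub in the foreground and prints statistics when it finishes.
    """
    cmd = ["btrfs", "scrub", "start"]
    if foreground:
        cmd.append("-B")
    return cmd + [mount_point]


def scrub_status_cmd(mount_point: str) -> List[str]:
    """Build the command that reports progress/results of the last scrub."""
    return ["btrfs", "scrub", "status", mount_point]


def run(cmd: List[str]) -> subprocess.CompletedProcess:
    """Execute a command (as root) without raising on a non-zero exit status."""
    return subprocess.run(cmd, capture_output=True, text=True, check=False)


if __name__ == "__main__":
    # Hypothetical pool mount point; substitute your own pool name.
    pool = "/mnt2/my-pool"
    # Dry run: print the commands instead of executing them.
    print(" ".join(scrub_cmd(pool)))
    print(" ".join(scrub_status_cmd(pool)))
```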

So there is far less risk if you have, or can source, another system disk candidate. Also, as you have likely already realised, all system disk data will be gone with a re-install. But that data was either accidental (as in your report) or throw-away anyway, given its non-redundant backing.

Hope that helps at least a bit.

Sweepy · May 12, 2023, 4:02pm

@phillxnet I was able to get back to where I was before. Thanks for the insight. I successfully recovered all my data from the external backup drive I had. I would still like to set up the pool of hard drives I had, but I have a question about that which I can raise in another thread. Thanks for your time and patience.