Can't start rock-ons

Hi,

I'm running the newest stable release (all updates applied) and I get the following error:

Houston, we’ve had a problem.

Unknown internal error doing a POST to /api/sm/services/docker/start

empty traceback.

I did manage to start it with:
dockerd --log-driver=journald --storage-driver btrfs --storage-opt btrfs.min_space=1G --data-root /mnt2/rockstor_rockstor00/rockons

but I don't know why it stopped working after a reboot (starting it via systemd does not work).

It seems that home00, rockons and root00 are unmounted; I tried enabling quotas and rebooting, but no cigar. I have only run stable updates…

Sorry to read you’re having troubles.
Is this a fresh install in which this never worked, or has it been working before?

Off the top of my head, I would think that some helpful information may reside in the logs:

  • /opt/rockstor/var/log/rockstor.log

Does systemd give you some error information when you try to start the docker service?


It's the unmounted shares that are the reason; I haven't had time to figure out why they don't mount yet. It worked fine before. Manually starting dockerd works; it's just that the default /mnt2/rockons is not mounted. The rest of the shares work without problems.

I get this error in the log:
ERROR [storageadmin.views.command:70] Skipping Pool (rockstor_rockstor00) mount as there are no attached devices. Moving on.

The data is there and the system does boot from that disk (nvme).

@Jorma_Tuomainen Hello again.

OK, that’s good; looks like you are getting to the bottom of the cause for this one.

i.e. the system doesn't recognise any currently attached devices for that pool (it sees them all as detached), and so never proceeds to the share scan and mount associated with that pool (later in that same file, under command == 'bootstrap'), because it thinks no disks are attached.
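To illustrate the behaviour (a minimal hypothetical sketch of the logic just described, not the verbatim Rockstor source; the Pool/Disk classes and helper names here are assumptions):

import logging
from dataclasses import dataclass, field

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)

@dataclass
class Disk:
    name: str
    detached: bool = False

@dataclass
class Pool:
    name: str
    disks: list = field(default_factory=list)

def bootstrap_pools(pools):
    # Mirror of the described behaviour: a pool whose member disks all
    # appear detached is skipped before any share scan/mount happens.
    for pool in pools:
        if not any(not d.detached for d in pool.disks):
            logger.error('Skipping Pool (%s) mount as there are no '
                         'attached devices. Moving on.', pool.name)
            continue
        logger.info('Would scan and mount shares for pool %s', pool.name)

# With the system disk mis-detected as detached, only 'Data' gets mounted:
bootstrap_pools([
    Pool('rockstor_rockstor00', [Disk('nvme0n1p4', detached=True)]),
    Pool('Data', [Disk('sda'), Disk('sdb')]),
])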

The simplest workaround for now may be to use another pool for your rock-ons share, as the system pool has a few quirks that are in the process of being improved upon. But let's try some things first.

The code in question is in storageadmin/views/command.py (the source of the 'Skipping Pool' log line you quoted).

As such, could you give us a screen grab of your Disks page and your Pools page?

This may be nvme related (a Rockstor bug), or it may have come from Rockstor failing to adapt to system disk changes post install / first boot. Have there been any system disk changes leading up to this rockstor_rockstor00 anomaly?

Hope that helps; let's see what those pages look like before we move on. It may be that there is a simple workaround, but we first need to see those pages. It's probably best in the long run not to use the system pool for data, but the rockons share is replaceable, and I do the same on one of my machines here just to make better use of its SSD-based system drive.

Yep, well I now run dockerd manually:
dockerd --log-driver=journald --storage-driver btrfs --storage-opt btrfs.min_space=1G --data-root /mnt2/rockstor_rockstor00/rockons

(then just ctrl+z, bg, disown %1)

And I had an uptime of ~50 days before the reboot, so it's some update that happened during that time.

@Jorma_Tuomainen Thanks for sharing your current workaround.

Could you please edit this post so that the images are inline (i.e. not an external reference)? You should be able to drag and drop the images into the forum web page.

Thanks. That way they are displayed inline within the forum web page and should be more stable going forward.

Also could you post your:

yum info rockstor

Just to make sure.

Done and:

Installed Packages
Name : rockstor
Arch : x86_64
Version : 3.9.2
Release : 31
Size : 79 M
Repo : installed
From repo : Rockstor-Stable
Summary : RockStor -- Store Smartly
License : GPL
Description : RockStor -- Store Smartly

@Jorma_Tuomainen Great, and thanks for the edit and version confirmation.

And can you confirm that the rockstor_rockstor00 pool is listed as having only a detached disk under the Disks column: it’s cut off in your second image.

Cheers.

Yep, only that. It’s the nvme root-disk (which is not detached :))

@Jorma_Tuomainen

OK, great.

So my current suspicion is that this is my fault, sorry. I recently found and fixed a rather long-term and elusive bug concerning Rockstor with 27 or more drives when the system drive was also named sda (plus a few other caveats). During the development of this fix there were a number of inadvertent side effects on disk recognition. For each I developed a test to ensure the 'fixed' code (at the drive recognition level) functioned as expected, and as it had done previously. The aim of course was to introduce no regressions and only add the fix. In the case of an nvme system drive I now suspect I missed the mark, and there currently exists no testing, regression or otherwise, for this arrangement: which did work at least at some point, as evidenced by your and others' reports. The change I suspect was for issue:
https://github.com/rockstor/rockstor-core/issues/1925
and was fixed by the changes in pull request:
https://github.com/rockstor/rockstor-core/pull/1946
which was released in Rockstor stable channel version 3.9.2-31 (20 days ago as of writing).

If you would indulge me a little more, I would like to ask that you further provide what I hope will be enough info for me to create a test to reproduce your issue: a missing/detached nvme system disk, i.e. it was there, and after an update it is no longer recognised as attached. This should help avoid the same regression going forward, and should also help with fixing whatever went wrong in the first place.

The procedure required on your end is a little cumbersome, but given your supplied workaround it should be trivial for you. Essentially I require the same procedure and info as I requested from @kingwavy, referenced in the indicated issue, which in turn grew from the following forum thread:

More specifically in my 14th May post in that thread:

Repeating the request in this thread for ease and slightly modified for this instance:

Could you also post the output of the following commands:

btrfs fi show

and

ls -la /dev/disk/by-id/

and

lsblk -P -o NAME,MODEL,SERIAL,SIZE,TRAN,VENDOR,HCTL,TYPE,FSTYPE,LABEL,UUID
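As an aside, a minimal sketch of parsing that lsblk -P key="value" output into per-device dicts (an assumption about the approach, not the verbatim scan_disks() code):

import re
import subprocess

LSBLK = ['lsblk', '-P', '-o',
         'NAME,MODEL,SERIAL,SIZE,TRAN,VENDOR,HCTL,TYPE,FSTYPE,LABEL,UUID']

def lsblk_devices():
    # Each output line is a series of KEY="value" pairs; fold each
    # line into a dict keyed by column name.
    out = subprocess.run(LSBLK, capture_output=True, text=True).stdout
    return [dict(re.findall(r'(\w+)="([^"]*)"', line))
            for line in out.splitlines() if line.strip()]

# e.g. [{'NAME': 'sdb', 'MODEL': 'WDC WD100EFAX-68', 'SERIAL': '7PKNDX1C', ...}, ...]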

and when the lsblk command above has a serial entry such as:

SERIAL=""

we fall back to udev via the get_disk_serial() function (in system/osi.py), which in turn parses the output of the following command. If you could execute this command on one of the drives showing the above empty serial (if any), it may help track this bug down:

udevadm info --name=devname-here
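By way of illustration, the udev fallback just described might look something like this (a sketch only; not the verbatim get_disk_serial() implementation):

import subprocess

def udev_serial(device_name):
    # When lsblk reports SERIAL="", query udev directly and pick out
    # the ID_SERIAL_SHORT property, as described above.
    out = subprocess.run(
        ['udevadm', 'info', '--name={}'.format(device_name)],
        capture_output=True, text=True, check=True).stdout
    for line in out.splitlines():
        if line.startswith('E: ID_SERIAL_SHORT='):
            return line.partition('=')[2].strip()
    return ''

# e.g. udev_serial('nvme0n1') -> 'BTPY72910KCW128A' on the system shown below.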

Also, if you could remove the "#" and following space in front of the relevant debug logging line in your installed version, located here:

/opt/rockstor/src/rockstor/system/osi.py

and then enable debug logging via:

/opt/rockstor/bin/debug-mode on

Then either reboot or restart the rockstor service via:

systemctl restart rockstor

We should then be able to see in your logs what scan_disks() is passing to _update_disk_state(), and confirm (or otherwise) my suspicion that the nvme disk is simply not being parsed correctly by scan_disks(), or at least narrow down where the problem originates. Look in the main rockstor log for the debug output, either via the UI component in System - Log Manager, or in:

/opt/rockstor/var/log/rockstor.log
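For orientation, a rough sketch of the handoff being described (the function bodies here are invented stand-ins for illustration; only the names and the shape of the debug line come from this thread):

import logging

logging.basicConfig(level=logging.DEBUG)
logger = logging.getLogger('system.osi')

def scan_disks(min_size):
    # Stand-in: the real function parses lsblk output into Disk tuples;
    # the suspicion in this thread is that the nvme device never makes
    # it into this list.
    return [{'name': 'sda'}, {'name': 'sdb'}]

def _update_disk_state():
    for d in scan_disks(min_size=1048576):
        # the uncommented debug line produces output of this shape:
        logger.debug('disks item = %s', d)
    # The db is then reconciled: devices present in the db but absent
    # from scan_disks() end up flagged as detached.

_update_disk_state()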

So, given this is suspected to be a recent regression, it would be good to get this one sorted as soon as possible, and to have some tests so that the same doesn't happen again.

Sorry to have to ask all of this, but it would be invaluable to get this output so that we can be assured that your particular instance is catered for, as it is currently the only reproducer we have had reported.

Once we narrow down what is causing this bug I will open an issue with the relevant details and a fix can then be logged against that issue.

Thanks again for your help with this one and for helping to support Rockstor development via a stable subscription.

Here:

[root@rockstor ~]# btrfs fi show
Label: 'rockstor_rockstor00' uuid: 4a05477f-cd4a-4614-b264-d029d98928ab
Total devices 1 FS bytes used 7.46GiB
devid 1 size 110.81GiB used 52.02GiB path /dev/nvme0n1p4

Label: 'Data' uuid: d2f76ce6-85fd-4615-b4f8-77e1b6a69c60
Total devices 2 FS bytes used 6.56TiB
devid 1 size 9.10TiB used 6.58TiB path /dev/sdb
devid 2 size 9.10TiB used 6.58TiB path /dev/sda

[root@rockstor ~]# ls -la /dev/disk/by-id/
total 0
drwxr-xr-x 2 root root 320 Aug 4 18:56 .
drwxr-xr-x 8 root root 160 Aug 4 18:56 ..
lrwxrwxrwx 1 root root 9 Aug 4 18:56 ata-WDC_WD100EFAX-68LHPN0_7PKNDX1C -> ../../sdb
lrwxrwxrwx 1 root root 9 Aug 4 18:56 ata-WDC_WD100EFAX-68LHPN0_7PKP0MNC -> ../../sda
lrwxrwxrwx 1 root root 13 Aug 4 18:56 nvme-eui.0000000001000000e4d25c19f1e04d01 -> ../../nvme0n1
lrwxrwxrwx 1 root root 15 Aug 4 18:56 nvme-eui.0000000001000000e4d25c19f1e04d01-part1 -> ../../nvme0n1p1
lrwxrwxrwx 1 root root 15 Aug 4 18:56 nvme-eui.0000000001000000e4d25c19f1e04d01-part2 -> ../../nvme0n1p2
lrwxrwxrwx 1 root root 15 Aug 4 18:56 nvme-eui.0000000001000000e4d25c19f1e04d01-part3 -> ../../nvme0n1p3
lrwxrwxrwx 1 root root 15 Aug 4 18:56 nvme-eui.0000000001000000e4d25c19f1e04d01-part4 -> ../../nvme0n1p4
lrwxrwxrwx 1 root root 13 Aug 4 18:56 nvme-INTEL_SSDPEKKW128G7_BTPY72910KCW128A -> ../../nvme0n1
lrwxrwxrwx 1 root root 15 Aug 4 18:56 nvme-INTEL_SSDPEKKW128G7_BTPY72910KCW128A-part1 -> ../../nvme0n1p1
lrwxrwxrwx 1 root root 15 Aug 4 18:56 nvme-INTEL_SSDPEKKW128G7_BTPY72910KCW128A-part2 -> ../../nvme0n1p2
lrwxrwxrwx 1 root root 15 Aug 4 18:56 nvme-INTEL_SSDPEKKW128G7_BTPY72910KCW128A-part3 -> ../../nvme0n1p3
lrwxrwxrwx 1 root root 15 Aug 4 18:56 nvme-INTEL_SSDPEKKW128G7_BTPY72910KCW128A-part4 -> ../../nvme0n1p4
lrwxrwxrwx 1 root root 9 Aug 4 18:56 wwn-0x5000cca251f38e60 -> ../../sdb
lrwxrwxrwx 1 root root 9 Aug 4 18:56 wwn-0x5000cca251f3d4af -> ../../sda

[root@rockstor ~]# lsblk -P -o NAME,MODEL,SERIAL,SIZE,TRAN,VENDOR,HCTL,TYPE,FSTYPE,LABEL,UUID
NAME="sdb" MODEL="WDC WD100EFAX-68" SERIAL="7PKNDX1C" SIZE="9.1T" TRAN="sata" VENDOR="ATA " HCTL="1:0:0:0" TYPE="disk" FSTYPE="btrfs" LABEL="Data" UUID="d2f76ce6-85fd-4615-b4f8-77e1b6a69c60"
NAME="sda" MODEL="WDC WD100EFAX-68" SERIAL="7PKP0MNC" SIZE="9.1T" TRAN="sata" VENDOR="ATA " HCTL="0:0:0:0" TYPE="disk" FSTYPE="btrfs" LABEL="Data" UUID="d2f76ce6-85fd-4615-b4f8-77e1b6a69c60"
NAME="nvme0n1" MODEL="INTEL SSDPEKKW128G7 " SERIAL="BTPY72910KCW128A" SIZE="119.2G" TRAN="" VENDOR="" HCTL="" TYPE="disk" FSTYPE="" LABEL="" UUID=""
NAME="nvme0n1p3" MODEL="" SERIAL="" SIZE="7.8G" TRAN="" VENDOR="" HCTL="" TYPE="part" FSTYPE="swap" LABEL="" UUID="d33115d8-3d8c-4f65-b560-8ebf72d08fbc"
NAME="nvme0n1p1" MODEL="" SERIAL="" SIZE="200M" TRAN="" VENDOR="" HCTL="" TYPE="part" FSTYPE="vfat" LABEL="" UUID="53DC-1323"
NAME="nvme0n1p4" MODEL="" SERIAL="" SIZE="110.8G" TRAN="" VENDOR="" HCTL="" TYPE="part" FSTYPE="btrfs" LABEL="rockstor_rockstor00" UUID="4a05477f-cd4a-4614-b264-d029d98928ab"
NAME="nvme0n1p2" MODEL="" SERIAL="" SIZE="500M" TRAN="" VENDOR="" HCTL="" TYPE="part" FSTYPE="ext4" LABEL="" UUID="497a9eda-a655-4fc4-bad8-2d9aa8661980"

[root@rockstor ~]# udevadm info --name=nvme0n1p3
P: /devices/pci0000:00/0000:00:1c.4/0000:03:00.0/nvme/nvme0/nvme0n1/nvme0n1p3
N: nvme0n1p3
S: disk/by-id/nvme-INTEL_SSDPEKKW128G7_BTPY72910KCW128A-part3
S: disk/by-id/nvme-eui.0000000001000000e4d25c19f1e04d01-part3
S: disk/by-partuuid/e1a514c1-5b8c-41af-980a-5aa8a35ecef9
S: disk/by-path/pci-0000:03:00.0-nvme-1-part3
S: disk/by-uuid/d33115d8-3d8c-4f65-b560-8ebf72d08fbc
E: DEVLINKS=/dev/disk/by-id/nvme-INTEL_SSDPEKKW128G7_BTPY72910KCW128A-part3 /dev/disk/by-id/nvme-eui.0000000001000000e4d25c19f1e04d01-part3 /dev/disk/by-partuuid/e1a514c1-5b8c-41af-980a-5aa8a35ecef9 /dev/disk/by-path/pci-0000:03:00.0-nvme-1-part3 /dev/disk/by-uuid/d33115d8-3d8c-4f65-b560-8ebf72d08fbc
E: DEVNAME=/dev/nvme0n1p3
E: DEVPATH=/devices/pci0000:00/0000:00:1c.4/0000:03:00.0/nvme/nvme0/nvme0n1/nvme0n1p3
E: DEVTYPE=partition
E: ID_FS_TYPE=swap
E: ID_FS_USAGE=other
E: ID_FS_UUID=d33115d8-3d8c-4f65-b560-8ebf72d08fbc
E: ID_FS_UUID_ENC=d33115d8-3d8c-4f65-b560-8ebf72d08fbc
E: ID_FS_VERSION=2
E: ID_PART_ENTRY_DISK=259:0
E: ID_PART_ENTRY_NUMBER=3
E: ID_PART_ENTRY_OFFSET=1435648
E: ID_PART_ENTRY_SCHEME=gpt
E: ID_PART_ENTRY_SIZE=16252928
E: ID_PART_ENTRY_TYPE=0657fd6d-a4ab-43c4-84e5-0933c84b4f4f
E: ID_PART_ENTRY_UUID=e1a514c1-5b8c-41af-980a-5aa8a35ecef9
E: ID_PART_TABLE_TYPE=gpt
E: ID_PATH=pci-0000:03:00.0-nvme-1
E: ID_PATH_TAG=pci-0000_03_00_0-nvme-1
E: ID_SERIAL=INTEL SSDPEKKW128G7_BTPY72910KCW128A
E: ID_SERIAL_SHORT=BTPY72910KCW128A
E: ID_WWN=eui.0000000001000000e4d25c19f1e04d01
E: MAJOR=259
E: MINOR=3
E: PARTN=3
E: SUBSYSTEM=block
E: TAGS=:systemd:
E: USEC_INITIALIZED=72618

Nothing in the logs about the nvme disk, only sda and sdb (the hdds).

I’ll try reboot later.

@Jorma_Tuomainen Thanks for these.

And given the rockstor_rockstor00 pool (vol) is on p4 (p3 is the swap partition, which we exclude by another mechanism):

Could I also have:

udevadm info --name=nvme0n1p4

i.e. for the entry showing the empty SERIAL="" in your lsblk output above:

The info that I need will only show up after the edit I requested on that source file, and then only once debug mode is enabled and the rockstor service restarted (or, as you say, after a reboot). Both the edit (uncommenting that debug logging line) and the service restart (by whatever means) are required for the log entries to start appearing; they should then appear directly after every Disks, Pools, or Shares page refresh, and on every boot.

They should look something like:

disks item = Disk(name='sda3', model='PERC H710', serial='6848f690e936450018b7c3a11330997b', size=277558067, transport=None, vendor='DELL', hctl='0:2:0:0', type='part', fstype='btrfs', label='rockstor_rockstor', uuid='7f7acdd7-493e-4bb5-b801-b7b7dc289535', parted=True, root=True, partitions={}) 

except relevant to your own disks, and there should be one for each 'relevant' device.
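For context, those lines read like instances of a named tuple along these lines (field names inferred from the printed output above; the exact definition in osi.py is an assumption):

import collections

# Field names taken from the 'disks item = Disk(...)' log line above.
Disk = collections.namedtuple('Disk', [
    'name', 'model', 'serial', 'size', 'transport', 'vendor', 'hctl',
    'type', 'fstype', 'label', 'uuid', 'parted', 'root', 'partitions'])

example = Disk(name='sda3', model='PERC H710',
               serial='6848f690e936450018b7c3a11330997b', size=277558067,
               transport=None, vendor='DELL', hctl='0:2:0:0', type='part',
               fstype='btrfs', label='rockstor_rockstor',
               uuid='7f7acdd7-493e-4bb5-b801-b7b7dc289535',
               parted=True, root=True, partitions={})
print(example)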

If you could paste the complete list generated after each Disks page refresh, that should do it. I'm expecting that there are in fact only sda and sdb entries, but it is best to paste the exact output so I can create a test to reproduce this and start working on a fix. Plus I need absolute confirmation that the nvme partition is not presented along with your data drives, to definitively pin down the location of the bug.

Thanks for your time on this one.

Bit by bit.

[root@rockstor ~]# udevadm info --name=nvme0n1p4
P: /devices/pci0000:00/0000:00:1c.4/0000:03:00.0/nvme/nvme0/nvme0n1/nvme0n1p4
N: nvme0n1p4
S: disk/by-id/nvme-INTEL_SSDPEKKW128G7_BTPY72910KCW128A-part4
S: disk/by-id/nvme-eui.0000000001000000e4d25c19f1e04d01-part4
S: disk/by-label/rockstor_rockstor00
S: disk/by-partuuid/87e031b7-6353-44af-bbc7-b674d8a8c85e
S: disk/by-path/pci-0000:03:00.0-nvme-1-part4
S: disk/by-uuid/4a05477f-cd4a-4614-b264-d029d98928ab
E: DEVLINKS=/dev/disk/by-id/nvme-INTEL_SSDPEKKW128G7_BTPY72910KCW128A-part4 /dev/disk/by-id/nvme-eui.0000000001000000e4d25c19f1e04d01-part4 /dev/disk/by-label/rockstor_rockstor00 /dev/disk/by-partuuid/87e031b7-6353-44af-bbc7-b674d8a8c85e /dev/disk/by-path/pci-0000:03:00.0-nvme-1-part4 /dev/disk/by-uuid/4a05477f-cd4a-4614-b264-d029d98928ab
E: DEVNAME=/dev/nvme0n1p4
E: DEVPATH=/devices/pci0000:00/0000:00:1c.4/0000:03:00.0/nvme/nvme0/nvme0n1/nvme0n1p4
E: DEVTYPE=partition
E: ID_FS_LABEL=rockstor_rockstor00
E: ID_FS_LABEL_ENC=rockstor_rockstor00
E: ID_FS_TYPE=btrfs
E: ID_FS_USAGE=filesystem
E: ID_FS_UUID=4a05477f-cd4a-4614-b264-d029d98928ab
E: ID_FS_UUID_ENC=4a05477f-cd4a-4614-b264-d029d98928ab
E: ID_FS_UUID_SUB=05b3b2a3-25c3-4593-b922-2e3180122b52
E: ID_FS_UUID_SUB_ENC=05b3b2a3-25c3-4593-b922-2e3180122b52
E: ID_PART_ENTRY_DISK=259:0
E: ID_PART_ENTRY_NUMBER=4
E: ID_PART_ENTRY_OFFSET=17688576
E: ID_PART_ENTRY_SCHEME=gpt
E: ID_PART_ENTRY_SIZE=232380416
E: ID_PART_ENTRY_TYPE=ebd0a0a2-b9e5-4433-87c0-68b6b72699c7
E: ID_PART_ENTRY_UUID=87e031b7-6353-44af-bbc7-b674d8a8c85e
E: ID_PART_TABLE_TYPE=gpt
E: ID_PATH=pci-0000:03:00.0-nvme-1
E: ID_PATH_TAG=pci-0000_03_00_0-nvme-1
E: ID_SERIAL=INTEL SSDPEKKW128G7_BTPY72910KCW128A
E: ID_SERIAL_SHORT=BTPY72910KCW128A
E: ID_WWN=eui.0000000001000000e4d25c19f1e04d01
E: MAJOR=259
E: MINOR=4
E: PARTN=4
E: SUBSYSTEM=block
E: TAGS=:systemd:
E: USEC_INITIALIZED=72776

I did that, but without a reboot (only the code change, enabling logging, and restarting rockstor) I get only:
DEBUG [system.osi:494] disks item = Disk(name='sda', model='WDC WD100EFAX-68', serial='7PKP0MNC', size=9771050598, transport='sata', vendor='ATA', hctl='0:0:0:0', type='disk', fstype='btrfs', label='Data', uuid='d2f76ce6-85fd-4615-b4f8-77e1b6a69c60', parted=False, root=False, partitions={})
DEBUG [system.osi:494] disks item = Disk(name='sdb', model='WDC WD100EFAX-68', serial='7PKNDX1C', size=9771050598, transport='sata', vendor='ATA', hctl='1:0:0:0', type='disk', fstype='btrfs', label='Data', uuid='d2f76ce6-85fd-4615-b4f8-77e1b6a69c60', parted=False, root=False, partitions={})

nothing about nvme disk, only the:
ERROR [storageadmin.views.command:70] Skipping Pool (rockstor_rockstor00) mount as there are no attached devices. Moving on.

@Jorma_Tuomainen This is great work, thanks.

I should now have enough to get under way with this issue, but I have two more issues (currently) pending in my queue.

It now very much looks like the issue is with scan_disks() failing to identify the nvme device as the system disk, hence the:

and the higher-order code segments consider it 'detached' as a result: since it was once known to them.
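As a purely hypothetical illustration of the kind of pitfall involved (not the confirmed root cause): nvme partition names don't follow the sdXN pattern, so naive parent-device matching can go wrong:

import re

def naive_base_dev(part_name):
    # Stripping trailing digits finds the parent for 'sda3' -> 'sda',
    # but nvme partitions use a 'p' separator, so the result is wrong.
    return re.sub(r'\d+$', '', part_name)

print(naive_base_dev('sda3'))       # sda       (correct)
print(naive_base_dev('nvme0n1p4'))  # nvme0n1p  (wrong: parent is nvme0n1)
# An unmatched parent would leave the pool with no attached devices,
# which is exactly the 'detached' symptom seen here.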

I will, once I've started, create an issue detailing my initial findings and link back to this forum thread. I shall then update this thread upon the issue's resolution.

It would be great if you could hang in there just a little longer while I sort this one out, as a 'real life' confirmation of the fix at your end would be pukka.

Thanks again, and I hope to get on to this issue as time allows: unless of course someone beats me to it, that is.

Not in that big of a hurry since I have a workaround for docker :slight_smile:

@Jorma_Tuomainen OK and cheers.

I'd like to get it sorted sooner rather than later though, as it's a regression and the associated code / tests are still fairly fresh in my head, so I'll see how it goes.

As always many things to juggle.