Issues: Can't wipe some drives & No SMART diagnostics & Disk Serial Numbers not legitimate

Thanks @phillxnet! I’ve just sent the logs to support. Let’s see what happens.

Peter.

@priel71 OK, I’ve had a quick look into your serial issue. Although it is possible to add a serial number to an md device via the indicated udev rule, that serial isn’t picked up by the current lsblk that we use. However, we have in place a fail-over system that queries udev directly, and it does pick up the udev-applied serial number; but because an md device occurs multiple times in the lsblk output, our system then disregards the serial and substitutes a fake one. That in turn leads to your observed Warning, even with a serial number successfully applied (via udev) and retrieved (via udev) for the md device. I have only proved this behaviour with a non-system-disk md, but suspect it also applies to the system disk.
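For illustration, such a rule lives in a file under /etc/udev/rules.d/ and looks something like the following; the file name and serial value here are placeholders of my own rather than the exact rule indicated:

# /etc/udev/rules.d/99-md-serial.rules (placeholder file name; serial value is made up)
KERNEL=="md0", SUBSYSTEM=="block", ENV{ID_SERIAL}="md0-serial-0001"

It takes effect after a rules reload and re-trigger:

udevadm control --reload-rules
udevadm trigger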

So the short of it is that this is not a scenario we have yet accounted for. I have opened a GitHub issue to track progress on this one, but given it’s not that common (though a nice idea) I’m not too hopeful for a quick fix. I think the devs also need to discuss which way we want to go on this one, i.e. what md arrangements, if any, we can support.

If we were able to account for an md-based system drive install, for example, it may only be in a limited form to keep things simple and predictable. Could you explain what the mirror arrangement is on your system drive? I am assuming you created an md0 out of 2 devices, mirrored them, and then installed all 3 system drive partitions on this one md0 device. Is that correct?

Do you also use the grub flag that allows for booting the md device degraded?

Thanks and sorry I couldn’t sort this one for you, at least for now.

Hi @phillxnet, thanks for your continued help looking into this. I was actually quite surprised that Rockstor decided to create an md array from the hardware raid built into the Supermicro motherboard. I assumed that it would just see it as a single disk, since I set it up in the Intel RAID manager at boot time to simply mirror the two disks that it could see.

I can’t really give you any more info than that; it all happened during the install. I can send you some diagnostic / info files if you like… output from lspci, fstab etc.?

Peter.

@priel71 Thanks for your offer to help, and I’m afraid I was a little overly wordy in the last post; I had just been rooting around in the code to find the strange behaviour I was seeing with the multi device I had set up to test the proposed fix for your serial problem. In the process I had neglected the detail you specified, that this was a hardware raid; however it may well fall foul of the issue / bug I found for Linux software raid devices, which also show up as md (multi device) names. So if you are game, it would be very useful / helpful if you could paste the output of the following commands executed on your Rockstor as root.

/usr/bin/lsblk -P -o NAME,MODEL,SERIAL,SIZE,TRAN,VENDOR,HCTL,TYPE,FSTYPE,LABEL,UUID

and

udevadm info --name=md0 | grep ID_SERIAL

but in this last command, replace md0 with your system drive’s device name. It should then return up to two lines containing the serial number of that device, if any.
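For reference, on a regular disk the grep typically matches lines of this form (the model / serial values here are invented for illustration):

E: ID_SERIAL=WDC_WD20EFRX-68EUZN0_WD-WCC4EXAMPLE1
E: ID_SERIAL_SHORT=WD-WCC4EXAMPLE1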

Also if you could send a screen grab of the Storage - Disks page that would be great.

Sorry to be so demanding, but I think there is an opportunity here to root out a possible bug in an area I’ve recently visited, so I’m just trying to sort it while it’s fresh. Images can be attached to posts via the “Upload” button.

Thanks and all in your own time of course, no real rush.

Hi @phillxnet, not demanding at all. I’m happy that someone is taking time to just look into it!
I ran the above commands, and also a fdisk -l just so you have an idea of the disks attached.
I can’t do the screenshots right now, but will send them to you tomorrow once I’m back in the office.

And to return the favour… no rush, enjoy your Xmas first! :smile:

Peter.

PS. Just tried uploading a zip file with the output txt files, but the forum only accepts images. So forgive the log paste below…

udevadm output:

P: /devices/virtual/block/md126
N: md126
L: 100
S: disk/by-id/md-uuid-2cc19c3c:3490fbf2:68b26e35:4bbd68cf
S: md/System
E: DEVLINKS=/dev/disk/by-id/md-uuid-2cc19c3c:3490fbf2:68b26e35:4bbd68cf /dev/md/System
E: DEVNAME=/dev/md126
E: DEVPATH=/devices/virtual/block/md126
E: DEVTYPE=disk
E: ID_PART_TABLE_TYPE=dos
E: MAJOR=9
E: MD_CONTAINER=/dev/md/imsm0
E: MD_DEVICES=2
E: MD_DEVICE_sdm_DEV=/dev/sdm
E: MD_DEVICE_sdm_ROLE=0
E: MD_DEVICE_sdn_DEV=/dev/sdn
E: MD_DEVICE_sdn_ROLE=1
E: MD_DEVNAME=System
E: MD_LEVEL=raid1
E: MD_MEMBER=0
E: MD_MON_THIS=../md127
E: MD_UUID=2cc19c3c:3490fbf2:68b26e35:4bbd68cf
E: MINOR=126
E: SUBSYSTEM=block
E: SYSTEMD_WANTS=mdmonitor.service mdmon@md127.service
E: TAGS=:systemd:
E: USEC_INITIALIZED=69792

lsblk output:

NAME="sda" MODEL="ST2000NM0023 " SERIAL="5000c50056565637" SIZE="1.8T" TRAN="sas" VENDOR="SEAGATE " HCTL="0:0:0:0" TYPE="disk" FSTYPE="btrfs" LABEL="RAID" UUID="b217cdf3-1390-487b-982b-136272978c51" NAME="sdb" MODEL="ST2000NM0023 " SERIAL="5000c5005663af0f" SIZE="1.8T" TRAN="sas" VENDOR="SEAGATE " HCTL="0:0:1:0" TYPE="disk" FSTYPE="btrfs" LABEL="RAID" UUID="b217cdf3-1390-487b-982b-136272978c51" NAME="sdc" MODEL="ST2000NM0023 " SERIAL="5000c500569db947" SIZE="1.8T" TRAN="sas" VENDOR="SEAGATE " HCTL="0:0:2:0" TYPE="disk" FSTYPE="" LABEL="" UUID="" NAME="sdd" MODEL="ST2000NM0023 " SERIAL="5000c500565cfaeb" SIZE="1.8T" TRAN="sas" VENDOR="SEAGATE " HCTL="0:0:3:0" TYPE="disk" FSTYPE="btrfs" LABEL="RAID" UUID="b217cdf3-1390-487b-982b-136272978c51" NAME="sde" MODEL="ST2000NM0023 " SERIAL="5000c5005663bca7" SIZE="1.8T" TRAN="sas" VENDOR="SEAGATE " HCTL="0:0:4:0" TYPE="disk" FSTYPE="btrfs" LABEL="RAID" UUID="b217cdf3-1390-487b-982b-136272978c51" NAME="sdf" MODEL="ST2000NM0023 " SERIAL="5000c50056639fa7" SIZE="1.8T" TRAN="sas" VENDOR="SEAGATE " HCTL="0:0:5:0" TYPE="disk" FSTYPE="btrfs" LABEL="RAID" UUID="b217cdf3-1390-487b-982b-136272978c51" NAME="sdg" MODEL="ST2000NM0023 " SERIAL="5000c5005663b263" SIZE="1.8T" TRAN="sas" VENDOR="SEAGATE " HCTL="0:0:6:0" TYPE="disk" FSTYPE="" LABEL="" UUID="" NAME="sdh" MODEL="ST2000NM0023 " SERIAL="5000c500569f1f5f" SIZE="1.8T" TRAN="sas" VENDOR="SEAGATE " HCTL="0:0:7:0" TYPE="disk" FSTYPE="" LABEL="" UUID="" NAME="sdi" MODEL="ST2000NM0023 " SERIAL="5000c5005663c1db" SIZE="1.8T" TRAN="sas" VENDOR="SEAGATE " HCTL="0:0:8:0" TYPE="disk" FSTYPE="btrfs" LABEL="RAID" UUID="b217cdf3-1390-487b-982b-136272978c51" NAME="sdj" MODEL="ST2000NM0023 " SERIAL="5000c50056639d37" SIZE="1.8T" TRAN="sas" VENDOR="SEAGATE " HCTL="0:0:9:0" TYPE="disk" FSTYPE="btrfs" LABEL="RAID" UUID="b217cdf3-1390-487b-982b-136272978c51" NAME="sdk" MODEL="ST2000NM0023 " SERIAL="5000c5005663b527" SIZE="1.8T" TRAN="sas" VENDOR="SEAGATE " HCTL="0:0:10:0" TYPE="disk" FSTYPE="btrfs" LABEL="RAID" UUID="b217cdf3-1390-487b-982b-136272978c51" NAME="sdl" MODEL="ST2000NM0023 " SERIAL="5000c5005663ac63" SIZE="1.8T" TRAN="sas" VENDOR="SEAGATE " HCTL="0:0:11:0" TYPE="disk" FSTYPE="btrfs" LABEL="RAID" UUID="b217cdf3-1390-487b-982b-136272978c51" NAME="sdm" MODEL="ST9250610NS " SERIAL="9XE0CT40" SIZE="232.9G" TRAN="sata" VENDOR="ATA " HCTL="2:0:0:0" TYPE="disk" FSTYPE="isw_raid_member" LABEL="" UUID="" NAME="md126" MODEL="" SERIAL="" SIZE="221.2G" TRAN="" VENDOR="" HCTL="" TYPE="raid1" FSTYPE="" LABEL="" UUID="" NAME="md126p1" MODEL="" SERIAL="" SIZE="500M" TRAN="" VENDOR="" HCTL="" TYPE="md" FSTYPE="ext4" LABEL="" UUID="17ec29b4-9284-4cb9-a0dc-10ae8c7cf720" NAME="md126p2" MODEL="" SERIAL="" SIZE="4G" TRAN="" VENDOR="" HCTL="" TYPE="md" FSTYPE="swap" LABEL="" UUID="6e15f5e8-5504-4950-9b94-a7da76ec6a95" NAME="md126p3" MODEL="" SERIAL="" SIZE="216.7G" TRAN="" VENDOR="" HCTL="" TYPE="md" FSTYPE="btrfs" LABEL="rockstor_system" UUID="4b445c9e-e130-42fa-87c5-6116db4dc662" NAME="sdn" MODEL="ST9250610NS " SERIAL="9XE0CTBG" SIZE="232.9G" TRAN="sata" VENDOR="ATA " HCTL="3:0:0:0" TYPE="disk" FSTYPE="isw_raid_member" LABEL="" UUID="" NAME="md126" MODEL="" SERIAL="" SIZE="221.2G" TRAN="" VENDOR="" HCTL="" TYPE="raid1" FSTYPE="" LABEL="" UUID="" NAME="md126p1" MODEL="" SERIAL="" SIZE="500M" TRAN="" VENDOR="" HCTL="" TYPE="md" FSTYPE="ext4" LABEL="" UUID="17ec29b4-9284-4cb9-a0dc-10ae8c7cf720" NAME="md126p2" MODEL="" SERIAL="" SIZE="4G" TRAN="" VENDOR="" HCTL="" TYPE="md" FSTYPE="swap" LABEL="" 
UUID="6e15f5e8-5504-4950-9b94-a7da76ec6a95" NAME="md126p3" MODEL="" SERIAL="" SIZE="216.7G" TRAN="" VENDOR="" HCTL="" TYPE="md" FSTYPE="btrfs" LABEL="rockstor_system" UUID="4b445c9e-e130-42fa-87c5-6116db4dc662"

fdisk output:

Disk /dev/sdm: 250.1 GB, 250059350016 bytes, 488397168 sectors
Units = sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disk label type: dos
Disk identifier: 0x000bc92c

Device Boot Start End Blocks Id System
/dev/sdm1 * 2048 1026047 512000 83 Linux
/dev/sdm2 1026048 9414655 4194304 82 Linux swap / Solaris
/dev/sdm3 9414656 463888383 227236864 83 Linux

Disk /dev/sdn: 250.1 GB, 250059350016 bytes, 488397168 sectors
Units = sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disk label type: dos
Disk identifier: 0x000bc92c

Device Boot Start End Blocks Id System
/dev/sdn1 * 2048 1026047 512000 83 Linux
/dev/sdn2 1026048 9414655 4194304 82 Linux swap / Solaris
/dev/sdn3 9414656 463888383 227236864 83 Linux

Disk /dev/sda: 2000.4 GB, 2000398934016 bytes, 3907029168 sectors
Units = sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 4294966784 bytes

Disk /dev/sdf: 2000.4 GB, 2000398934016 bytes, 3907029168 sectors
Units = sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 4294966784 bytes

Disk /dev/sde: 2000.4 GB, 2000398934016 bytes, 3907029168 sectors
Units = sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 4294966784 bytes

Disk /dev/sdb: 2000.4 GB, 2000398934016 bytes, 3907029168 sectors
Units = sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 4294966784 bytes

Disk /dev/sdd: 2000.4 GB, 2000398934016 bytes, 3907029168 sectors
Units = sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 4294966784 bytes

Disk /dev/sdc: 2000.4 GB, 2000398934016 bytes, 3907029168 sectors
Units = sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes

Disk /dev/sdg: 2000.4 GB, 2000398934016 bytes, 3907029168 sectors
Units = sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 4294966784 bytes

Disk /dev/sdh: 2000.4 GB, 2000398934016 bytes, 3907029168 sectors
Units = sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes

Disk /dev/sdj: 2000.4 GB, 2000398934016 bytes, 3907029168 sectors
Units = sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 4294966784 bytes

Disk /dev/sdl: 2000.4 GB, 2000398934016 bytes, 3907029168 sectors
Units = sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 4294966784 bytes

Disk /dev/sdi: 2000.4 GB, 2000398934016 bytes, 3907029168 sectors
Units = sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 4294966784 bytes

Disk /dev/sdk: 2000.4 GB, 2000398934016 bytes, 3907029168 sectors
Units = sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 4294966784 bytes

Disk /dev/md126: 237.5 GB, 237510852608 bytes, 463888384 sectors
Units = sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disk label type: dos
Disk identifier: 0x000bc92c

Device Boot Start End Blocks Id System
/dev/md126p1 * 2048 1026047 512000 83 Linux
/dev/md126p2 1026048 9414655 4194304 82 Linux swap / Solaris
/dev/md126p3 9414656 463888383 227236864 83 Linux

@priel71 Thanks for the quick response; that info is perfect (with bonus fdisk info thrown in). Cheers. As I suspected, the lsblk output does contain multiple duplicate lines for the md device, just as with the software raid I tried here. So once we have a fix for the issue opened from this thread, to deal better with that duplication, I think the udev rule to add a serial number should work for you. At your leisure for the screenshot; thanks.

That’s the hope anyway.
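As an aside, if anyone wants to check their own setup for the lsblk duplication described above, a rough one-liner (not Rockstor’s own parsing) is:

/usr/bin/lsblk -P -o NAME | sort | uniq -d

which prints once any device line that appears multiple times, as the md126 lines do in the output above.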

And here’s the screenshot. Cheers!

Peter.

@priel71 Just to let you know that the existing smart issue “no SMART info on MSA70” that I originally linked to has now been addressed in the testing channel updates, version 3.8-10.12, thanks to your logs / smart report submission earlier in this thread. I am hopeful that this will address at least some of your above reported smart issues once these fixes are included in the next stable release.

Also, the issue opened as a result of this thread, i.e. “enhance scan_disks to handle md block devices”, was addressed a little earlier in the testing channel, at version 3.8-10.10. Again this should, all being well, end up in the next stable release too. As a consequence I am hoping that after the next stable release update (or in the testing channel as of 3.8-10.12 onwards) you will be able to assign a serial to your md device via the udev method described above, and have Rockstor not complain about the serial number of that device. Your hardware arrangement is a little different from that used in the issue, but it may just be that you have to assign a unique serial to each of the partitions also, i.e. one for each of md126, md126p2, and md126p3. We can cross that bridge once the next stable release is out and you are able to trial the fixes as they currently stand.
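If it does come to one serial per device node, a single udev rule using udev’s kernel-name substitution might cover them all; this is an untested sketch of mine rather than a confirmed fix:

# /etc/udev/rules.d/99-md-serials.rules (hypothetical; %k expands to md126, md126p1, etc.)
KERNEL=="md126*", SUBSYSTEM=="block", ENV{ID_SERIAL}="rockstor-%k"

That would give each node a distinct serial derived from its kernel name.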

Thanks for your assistance and patience concerning these issues; any feedback you have on these fixes, when the time comes, would be much appreciated.

Cheers.


Hi @phillxnet, happy new year! That’s great news; I might pull down the testing branch and try it out on an identical server later today. If you need any more feedback, logs etc., I do have that machine to muck around with, as it hasn’t gone into the production network yet.

Peter.

@priel71 Thanks, and happy new year to you too. Having two identical machines, one subscribed to each of the testing and stable update channels, is obviously ideal, so that any issues can be caught and fixes tested prior to their release to the stable channel; but it does mean one machine will almost always be ahead of the other, which is not necessarily practical. If you can trial the testing channel on an identical machine then great; let us know how it goes. It would be best if you can make the same decisions / selections in the installer as well, to rule out any other install differences. Remember you will still have to use the udev rule trick to ascribe serial numbers to the md devices though.

As for logs, there was one thing that couldn’t be verified as working: the parsing of the “Error Logs” and the “Self-Test Logs”, as the drive detailed in the smartctl logs you sent had no errors and had never undergone, or had no record of, any self-tests. However it was assumed that the format of this output would be the same as what is currently parsed correctly, so I expect this to be fine. But I would keep an eye on the improved “Raw S.M.A.R.T error log” section inside the “Error Logs” tab, as that may be the only place this controller / drive combo actually reports errors.
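For reference, if you want to eyeball the same data outside of the Rockstor UI, the underlying smartmontools queries are along these lines (the device name is just an example):

smartctl -l error /dev/sda      # the drive's SMART error log
smartctl -l selftest /dev/sda   # any recorded self-test results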

Obviously, if you do find one of the drives is reporting errors then please let us know if it is formatted / represented OK in the relevant tabs; if not, we can have another go at improving that once we get a file dump of what the output looks like. There are still improvements to be made, but bit by bit.

Thanks for offering to help still further. Your assistance is greatly appreciated.

@priel71 Another update that relates back to your posted info. As a spin-off from another issue, I found that I could reproduce an install very similar to your own, i.e. a bios raid1 system disk pair. My resulting disks screen was also akin to your own, though unfortunately without the rather marvellous 12-strong set of data disks (screen grab).


The sda and sdb entries here are the raid members that the md126 devices are made from; you had equivalent entries, only they were off screen in your screen grab.
This is essentially Rockstor not being able to interpret this kind of bios raid install.

So with reference to:-

And having now reached the next stable release, and moved into the new testing channel, we are at that bridge and have been able to chip a little more off your displayed issues (building on top of the fixes already mentioned):

As of 3.8-11.09 (testing channel updates)

The udev rule workaround suggested earlier is no longer necessary, as via “improve bios raid handling on system disk” your setup should now sort out its own serial numbers from the md devices and display things as intended (almost).

On my bios raid1 system disk test setup here, the Disks page now looks like this (screen grab):-
Note the raid member count, the serial numbers of the member disks [with raid index], and the raid level, all indicated in the Model column of the system md partition.

I’m afraid we still have the two raid members inappropriately flagged with the cogs offering to wipe them, but bit by bit; this problem is understood, so in time it should also get addressed.

So, assuming you have not already added the udev rules suggested earlier, from 3.8-11.09 onwards you should no longer need to.

Hope that helps.

Hi @phillxnet, I recently did install Rockstor on an identical server; however I did not choose the testing channel, as I needed to run some real-world production tests on it and didn’t want to add any beta issues into the mix. I’m adding infiniband support on these servers, so it’s enough of a headache getting that to work!

This is how my disks look on this server (sorry, I missed 2 disks which were on page 2). It was a clean install of 3.8-11, and SMART seems to be working better; however, as you mentioned above, the MDRAID stuff is still detected in a strange way. Here’s the weird part though… if you look at the screenshot below, you see 3 disks that aren’t part of any pool (sde, sdg & sdm). I know I selected them all when I set up the initial pool, and after I first noticed this I added them again to the pool via the resize function… but the same thing happened; it was back to 9 drives after a while. I’ve just added them again, and will check periodically to see if they stick to the designated pool.

Peter.

Here is page 2.

Peter

Btw, it didn’t take long. The 3 disks I just added to the pool just decided to leave again… so back to 9.

Ok, now after a reboot they are back in the pool. Will keep an eye on it…

@priel71 Another quick update re bios raid on system disk installs such as yours.
Regarding my previous comment:-

As of the 3.8-11.11 testing channel release, the “wipe me” cogs no longer appear on mdraid members (screen grab).

I know you are currently on stable, but as the current testing channel is now 25 days or so old, the next stable updates release is pending. Just a heads-up that these issues are being addressed and should hopefully fall into place on your stable channel systems with the next release.

How are the 3 reluctant pool members doing; have they stayed put since that last reboot?


Hi @phillxnet! I’m sorry for such a delayed response. I was just doing some sysadmin on these servers, and I have attached a screen grab of how it looks currently. It’s fully updated with the latest release.

Hope that helps… I haven’t had any issues with the drives though; I am having some other errors, but I will post about those in another thread.

Cheers,

Peter.

@priel71 That’s great, thanks; good to see those mdraid members without their ‘delete me’ cogs these days.

Nice.

Funny though… a completely identical server shows this in the disk layout:

@priel71 You can safely delete the “detached-” drives, although keep in mind we still have the bug I created and haven’t yet fixed, whereby you first have to “Rescan” just prior to removing (via the bin icon) a detached drive.

The difference may simply be that those drives were once present and have yet to be removed. Also, the serials are odd; could they be left over from some manual udev additions or something? Remember, in the early days we had to assign serials for md devs via manual udev rules.

Anyway, see how binning those drives works out. The “detached-” naming is used if a low-level disk scan finds no evidence of anything attached that matches the device’s db-stored serial number, i.e. a device with a serial of “md126p2” etc. was once attached and is no longer found to be attached.

Hope that helps.

This server is going down for maintenance soon, and once I’ve moved the data off I’ll give it a shot. :wink:

Thanks!
