It seems that in some cases Linux (as shown by lsblk -o+SERIAL) doesn’t report the drive’s serial number, but reports the enclosure’s serial number instead. This means that with some USB-attached drive enclosures (I’m using a StarTech SDOCK2U313, for example) we see drives with duplicated serial numbers:
# lsblk -o+SERIAL
NAME MAJ:MIN RM   SIZE RO TYPE MOUNTPOINTS SERIAL
...
sdb    8:16   0 931.5G  0 disk             BF0987654321
sdc    8:32   0 931.5G  0 disk             BF0987654321
sdd    8:48   0 931.5G  0 disk             CE0987654321
sde    8:64   0 931.5G  0 disk             CE0987654321
sdf    8:80   0 931.5G  0 disk             447987654321
sdg    8:96   0 931.5G  0 disk             447987654321
This upsets Rockstor greatly: it reports “Warning! Disk unusable as pool member - serial number is not legitimate or unique.” and the disk becomes unusable. I appreciate the need for something unique per drive, but I don’t think the serial number field is the right place to look for that; see “The Truth About USB Device Serial Numbers” (SANS).
My testing shows that the WWN (World Wide Name, see the Wikipedia article of that name) is probably the identifier Rockstor wants to use (lsblk -o+SERIAL,WWN). It passes through my docks from the actual HDD and shows up correctly in the OS.
Is there any reason why Rockstor couldn’t switch to using WWN instead of SERIAL to allow for the use of USB-attached drive enclosures?
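For anyone wanting to repeat the comparison on their own hardware, here is a minimal sketch that groups disks by SERIAL and by WWN and lists any value shared by more than one disk. It assumes the JSON shape produced by `lsblk -J -d -o NAME,SERIAL,WWN` (a `blockdevices` list with lowercase column names as keys). The function names are hypothetical, the WWN values in the sample are made up for illustration, and none of this is Rockstor code.

```python
import json
from collections import defaultdict


def parse_lsblk_json(text):
    """Return the list of device dicts from `lsblk -J` style JSON."""
    return json.loads(text)["blockdevices"]


def duplicated(devices, key):
    """Return {value: [device names]} for values of `key` shared by 2+ disks."""
    groups = defaultdict(list)
    for dev in devices:
        value = dev.get(key)
        if value:  # skip disks that report nothing for this column
            groups[value].append(dev["name"])
    return {value: names for value, names in groups.items() if len(names) > 1}


# Sample in the assumed shape of `lsblk -J -d -o NAME,SERIAL,WWN` output.
# Serials are taken from the listing above; the WWNs are invented.
SAMPLE = """
{"blockdevices": [
  {"name": "sdb", "serial": "BF0987654321", "wwn": "0x5000c500a0000001"},
  {"name": "sdc", "serial": "BF0987654321", "wwn": "0x5000c500a0000002"},
  {"name": "sdd", "serial": "CE0987654321", "wwn": "0x5000c500a0000003"},
  {"name": "sde", "serial": "CE0987654321", "wwn": "0x5000c500a0000004"}
]}
"""

if __name__ == "__main__":
    devices = parse_lsblk_json(SAMPLE)
    print("duplicate serials:", duplicated(devices, "serial"))
    print("duplicate WWNs:   ", duplicated(devices, "wwn"))
```

To run it against a real machine, feed it the text from `lsblk -J -d -o NAME,SERIAL,WWN` instead of the sample. An empty result for WWN alongside a non-empty result for SERIAL would match what I am seeing with my docks.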
@greyltc Welcome to the Rockstor community forum.
Re:
The reasoning and references we used at the time this decision was made are all laid out here:
Take a look at that doc and see what you think. Part of our problem is actually retrieving the serial numbers, and we may just have to switch to other mechanisms for retrieving them, rather than throwing them out. We used a Red Hat guide at the time as our canonical reference. Note that we use udev to retrieve our serials and all hardware must have serials, but some hardware just obfuscates these. Likely some hardware will obfuscate WWN names in the same way. It was a choice between the two, as I remember, and a transition from one to the other would likely be entirely doable, but we must take great care here. We don’t want to throw one set of hardware hacks out for another. Also note that your reference is to USB devices. The USB bus has come up a number of times on the btrfs mailing list as an unreliable medium for storage, especially for multi-device setups, as the bus has a tendency to drop unrelated devices when a single instance fails. That is far less common on the storage-oriented buses.
But switching to WWN names is entirely doable I think, though it may not be the cure-all in all situations: I think in some settings we actually fall back to this. And yes, there are many enclosures (which we mark as unusable when found) that share this obfuscation of serials. But a drive serial is a hardware-based unique identifier: if an enclosure (only some USB enclosures to date) obscures this, it is not appropriate. It loses hardware info and is thus on the path to hardware raid of sorts.
In short:
We have had zero issues reported that are not USB enclosure related (mostly multi-device).
USB is not an advised storage bus.
If you take a look around the forum at previous posts on this issue, there have been suggestions that hdparm or smartmontools can retrieve serials where udev fails. That would be a far less invasive change for us.
It would be a great help if you could test your setup and your enclosure for these tools’ ability to retrieve the hosted drives’ serials, as that is the root of the problem here: udev not retrieving serials when we need them. But the switch-out is not quite as simple as it seems, as we also depend upon udev presenting names accordingly in such setups, partition arrangements and so on. But we do have the option to extend udev via appropriate rules, for instance.
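As a starting point for that test, here is a minimal sketch that pulls the serial out of the text printed by `smartctl -i /dev/sdX` or `hdparm -I /dev/sdX`. It assumes both tools print a line beginning with "Serial Number:" (smartctl prints "Serial number:" for SCSI devices, hence the case-insensitive match). The function name and the sample text are made up for illustration and this is not how Rockstor currently retrieves serials. Some USB bridges may also need `smartctl -d sat` before they pass anything through.

```python
import re

# Matches "Serial Number:   XYZ" with optional leading whitespace,
# as printed by `smartctl -i` and `hdparm -I` (assumed output format).
_SERIAL_LINE = re.compile(
    r"^[ \t]*Serial Number:[ \t]*(.+?)[ \t]*$",
    re.IGNORECASE | re.MULTILINE,
)


def serial_from_tool_output(text):
    """Return the first reported serial number, or None if none is found."""
    match = _SERIAL_LINE.search(text)
    return match.group(1) if match else None


# Invented sample in the general shape of `smartctl -i` output.
SAMPLE = """\
Device Model:     EXAMPLE HDD 1000
Serial Number:    WX11A1234567
LU WWN Device Id: 5 0014ee 2b1234567
"""

if __name__ == "__main__":
    print(serial_from_tool_output(SAMPLE))
```

If the value returned for each drive in the dock differs from what lsblk shows in its SERIAL column, and differs between the two bays, that would suggest these tools reach the real drive serial where udev does not.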
Mainly we have to take great care with such low-level changes: device serials have served us for many years now in the vast majority of cases. We don’t want to throw the baby out with the bath water here for a few poorly behaved and ill-advised bus devices. Convenience is not competency. So we advise folks to use less generic buses, and ones that have a proven competency in storage. On this note, we have had issues retrieving NVMe serials I believe, but that was resolved by the udev in our upstream OS being updated (many years ago, that was). A serial is definite and a WWN is just another serial, but if it works better (these days) then great, we should definitely consider it. But the change is so low down that it would have to have complete test coverage with many real-world examples, such as our own tests now cover for drive serial.
Incidentally, we originally used the kernel canonical /dev/sda names, at least until early Intel NUC machines started changing them on every boot. The move to serials was our first attempt at resolving that, and I fully acknowledge there may now be better alternatives, if they are in fact better. Grass greener and all that.
Do take a look at the referenced wiki article, and also note that a btrfs developer who has done a lot of drive management work themselves suggested using device serial numbers to distinguish one device from another, and stated this was more in the realm of an app, which we are. Sorry again, but I have no time to retrieve a reference. Needless to say, we thought carefully about this, and the only failures have been in multi-device USB external enclosures, and USB is not a recommended bus for storage. Ergo, low priority against our technical debt.
Note:
The USB controllers we black-list:
Maybe, given the bath water and the current reliability outside USB, what we could do here is start by treating USB devices differently, given they are the known failure point. But again there are other ramifications (udev name expectations in the code). However, it is all doable and in time could lead to a less fragile system than we have currently. That said, we don’t want to end up writing our own udev: we use its names exclusively. As this is not a priority for the core developers, as we are all working on getting our dependencies in shape again, you could look to our current critical code paths regarding the WWN idea. It may well be that it is now better represented within udev and would actually slide in nicely with what we have been ‘managing’ with these past few years.
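To make the "treat USB differently" idea concrete, here is a rough sketch of one possible policy: keep using the serial on every other bus, and on USB prefer the WWN when udev reports one. It assumes udev exposes the ID_BUS, ID_SERIAL_SHORT and ID_WWN properties for the device, as seen in `udevadm info --query=property --name=/dev/sdX` output, which will depend on the device and the rules in use. The function names, the policy itself and the sample values are hypothetical and not taken from our code.

```python
def parse_udev_properties(text):
    """Parse KEY=VALUE lines (udevadm property output) into a dict."""
    props = {}
    for line in text.splitlines():
        if "=" in line:
            key, value = line.split("=", 1)
            props[key.strip()] = value.strip()
    return props


def choose_identifier(props):
    """Return (kind, value) for a device under the hypothetical policy.

    Non-USB devices keep using the serial. USB devices prefer the WWN,
    falling back to the serial when no WWN is reported.
    """
    serial = props.get("ID_SERIAL_SHORT")
    wwn = props.get("ID_WWN")
    if props.get("ID_BUS") == "usb" and wwn:
        return ("wwn", wwn)
    if serial:
        return ("serial", serial)
    return (None, None)  # nothing usable reported


# Invented samples; the WWN value is made up for illustration.
USB_SAMPLE = """\
DEVNAME=/dev/sdb
ID_BUS=usb
ID_SERIAL_SHORT=BF0987654321
ID_WWN=0x5000c500a0000001
"""
SATA_SAMPLE = """\
DEVNAME=/dev/sda
ID_BUS=ata
ID_SERIAL_SHORT=WX11A1234567
ID_WWN=0x50014ee2b1234567
"""

if __name__ == "__main__":
    for sample in (USB_SAMPLE, SATA_SAMPLE):
        props = parse_udev_properties(sample)
        print(props["DEVNAME"], choose_identifier(props))
```

Keeping the serial as the default means the behaviour outside USB stays exactly as it is today, which is the part we know works. Whether the WWN survives every enclosure, and how this interacts with partitions, LUKS and the system drive handling, would still need real-world testing.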
Thanks for your suggestion here. I would personally like to have as pragmatic an approach to hardware as possible; I’m just concerned about such deep changes. But I also appreciate that we have some hacks that could just go away, which would be nice. Note also that we have some legacy treatment of partitions for our system drive that is not there in our data drive interpretation. So we definitely have some low-level stuff to tidy, and maybe this suggestion is a potential ‘out’ or ‘through’ way. It all depends on whether udev serves us similarly. It is after all the interface layer for device naming provided by modern Linux, and we are heavily invested in it: I think we existed before udev and systemd, as it goes.
Hope that helps, at least for context. And do take a look at the code and our developer docs: https://rockstor.com/docs/contribute/contribute.html#developers
as there may end up being an easy way to test your proposition, at least with your setup. I think btrfs within a partition with WWN may be the kicker here, or possibly the LUKS support, or the system drive support for mdraid (not recommended, and may be removed once grub can do btrfs multi-device).