I have read the following threads and they don’t solve my issue.
This is what I have before installing new drives:
2x internal SSD for Rockstor, configured with mdraid.
6x 4TB “WDC WD40EFRX-68W” (RAID10, main datastore)
1x 2TB Seagate Barracuda (Unused)
They all work fine until I try to add more drives; then the UUIDs disappear and I see the error (Topic) on the Rockstor console.
I have tried adding:
4x 750GB Seagate Barracuda drives
or
5x 2TB Seagate Barracuda drives
I first thought the 750GB drives were bad and causing this issue, but I have since validated that this is not true. The 2TB drives were in use within another system only an hour earlier, and they cause the same weird problem when added.
It seems I can only have my base configuration, plus 2x more drives, without things crapping out and mucking up the UUIDs. If I remove the drives, it returns to normal and is fine.
Could this be related to my PSU?
At boot time I know the drives will draw more current and maybe my 450W is not sufficient, and this is causing instability in the drive. What is the recommendation here?
Or there is a limitation in the number of drives that can be managed by the system.
Or there is a bug.
Or something else I have missed.
Please help.
lsblk looks like this for the previously good 4TB drives; the others are just missing.
NAME="sdc" MODEL="WDC WD40EFRX-68W" SERIAL="" SIZE="3.7T" TRAN="sas" VENDOR="ATA " HCTL="6:0:0:0" TYPE="disk" FSTYPE="" LABEL="" UUID=""
NAME="sdd" MODEL="WDC WD40EFRX-68W" SERIAL="" SIZE="3.7T" TRAN="sas" VENDOR="ATA " HCTL="6:0:1:0" TYPE="disk" FSTYPE="" LABEL="" UUID=""
NAME="sde" MODEL="WDC WD40EFRX-68W" SERIAL="" SIZE="3.7T" TRAN="sas" VENDOR="ATA " HCTL="6:0:2:0" TYPE="disk" FSTYPE="" LABEL="" UUID=""
I do think it is power related, but it is a strange bug. I just don’t want to go out and buy a new, larger PSU only to find that this is (potentially) some other bug.
@GIDDION Hello again. I am just about to write a technical manual wiki entry on device / serial management in Rockstor and want to reference it here, so hang in there. And yes, there is a suspected bug in serial management that I have yet to root out, but I have only seen it occur on NVMe devices so far, i.e. in the following forum thread:
In that no db entry for serial should be null, and yet in the above thread this was found to be the case: hence the suspected bug. But I would rather have more info before creating a targeted issue on this, as it is still a little hazy.
Maybe we can root it out here if you are also affected and game.
I’ll get this wiki entry done first, then circle back around to your serial issue. But as a quick note, your lsblk readout does indicate that all of those drives are not reporting their serial. Try the udevadm commands in the above referenced forum thread, but on your problem drives, and see if the serial numbers are extracted correctly then (posting the full output of both here will also help): if lsblk reports no serial, Rockstor falls back to trying udevadm to retrieve them. Sorry, I need to read more on your issue as reported, but I will also need the wiki. Also note the contents of your rockstor.log (System - Logs Manager) when you press the Rescan button on the Disks page. Essentially Rockstor parses the lsblk output to know what drives are connected and manages them via their serial numbers, which are required to be unique.
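To make the lsblk side of this concrete, here is a minimal Python sketch of the kind of check described above: it parses `lsblk -P` style KEY="value" lines and lists the devices that report an empty SERIAL. This is not Rockstor's actual scan code; the function names are invented for illustration, and the second sample line (device name, model and serial) is made up.

```python
import shlex


def parse_lsblk_line(line):
    """Turn one `lsblk -P` line of KEY="value" pairs into a dict."""
    fields = {}
    for token in shlex.split(line):
        key, _, value = token.partition("=")
        fields[key] = value
    return fields


def devices_missing_serial(lsblk_output):
    """Return the NAME of every device whose SERIAL field is empty."""
    missing = []
    for line in lsblk_output.splitlines():
        if not line.strip():
            continue
        fields = parse_lsblk_line(line)
        if not fields.get("SERIAL", "").strip():
            missing.append(fields.get("NAME"))
    return missing


# First line: as posted in this thread. Second line: hypothetical device.
SAMPLE = (
    'NAME="sdc" MODEL="WDC WD40EFRX-68W" SERIAL="" SIZE="3.7T" TRAN="sas" '
    'VENDOR="ATA " HCTL="6:0:0:0" TYPE="disk" FSTYPE="" LABEL="" UUID=""\n'
    'NAME="sdz" MODEL="EXAMPLE DISK" SERIAL="EXAMPLE123" SIZE="1.8T" '
    'TRAN="sata" VENDOR="ATA " HCTL="1:0:0:0" TYPE="disk" FSTYPE="" '
    'LABEL="" UUID=""\n'
)

print(devices_missing_serial(SAMPLE))
```

Any device this lists would be a candidate for the udevadm fallback mentioned above.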
Hope that helps.
Back in a bit.
Edit: and screen grabs of the Disks page would be good. Thanks.
@GIDDION We now have a “Device management in Rockstor” technical manual entry, so at least that’s there to refer to now. It is mainly intended as a developer reference, but still it needed to be done. I await your info as previously requested in this thread.
Thanks, I will have a look when I can. We just had a baby, so time is limited.
This is important to me, so I will endeavour to action it asap.
I also have a number of other tests to try, including rescanning the bus manually. I will also look at updating the FW of the HBA, but this requires UEFI, something I have not done before, so it may take me a little while to figure it out.
@GIDDION First off congratulations on the new baby.
My current thoughts are that you have a dodgy drive or port, as the following excerpt from your logs indicates a blk_update_request I/O error, always on the same sector. Could it be that this drive (the most likely culprit, I think) is throwing off a controller and causing all drives attached to that controller, or part thereof, to fail detection?
From the Rockstor perspective, if lsblk doesn’t report a drive then it doesn’t exist; hence the bunch of detached / removed drives whenever this problem appears, as the db knows they used to be attached and they are no longer attached. That is how we determine a detached device. Incidentally, in testing these drives are now given a “detached-uuid4” name to make this clearer, rather than just a uuid4 name.
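The detached-device logic just described boils down to a set difference between what the db remembers and what the latest scan reports. A rough Python sketch, assuming devices are tracked by serial; the function names, the sample serials and the exact shape of the "detached-" name are illustrative assumptions, not Rockstor's code:

```python
import uuid


def find_detached(db_serials, scanned_serials):
    """Serials the db remembers that the latest scan no longer reports."""
    return sorted(set(db_serials) - set(scanned_serials))


def detached_name():
    """Build a placeholder name in the "detached-uuid4" style (format assumed)."""
    return "detached-" + uuid.uuid4().hex


# Hypothetical serials, for illustration only.
known = ["SER-A", "SER-B", "SER-C"]
seen_now = ["SER-A"]
for serial in find_detached(known, seen_now):
    print(serial, "->", detached_name())
```

This is why a controller that stops presenting several drives at once shows up as a whole bunch of detached entries in one go.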
It also seems that the “WDC WD40EFRX-68W” drive, when it was sdc with the above quoted errors, also failed to report its serial (i.e. it was not showing in the lsblk output), and as a result udev wasn’t able to assign a by-id name, as these require a serial. Hence the no-serial Web-UI report.
Now I know that earlier on you tried adding 4 drives and things went wrong so you sensibly tried adding 4 completely different drives and the same thing occurred. This is very strange but judging from the quoted log entry something is definitely not happy at quite a low level here.
Essentially, lsblk has to see the drives for Rockstor to work with them, and either it or udevadm also has to be able to extract their serials. This in turn means we get a by-id name, which is soon to be required as well, but that is essentially just an extension of needing a serial in the first place.
A quick note on the screen grab pics, they are really low res so I can hardly make them out. If you just drag and drop the original it will be uploaded and auto sized and then, when clicked on, will provide a full or near enough, version.
So in your last post you say adding an additional drive throws things off; is that if you add any drive, and on any spare port? There are no limits on Rockstor drive capacity beyond those of the underlying hardware and CentOS, by the way. There is currently a known limit of 9 partitions on the system drive, but that is irrelevant here.
Can’t quite get away from the “blk_update_request: I/O error”. Could it be that your interface has more connections than it is able to present to the OS simultaneously with its current config? That is, it can present only a maximum of 8 drives to the system but can connect, say, twice as many (in its current or maybe any config, another question): one can connect 16 real drives but only present them to the OS as 8 virtual drives (via a hw raid arrangement). Sorry, just guessing here, but it is rather strange. That might explain why after 8 drives you have issues, as the card fails to report the devices correctly. Also, Rockstor currently doesn’t deal with multipath devices; not sure if that’s relevant here.
For the time being I would stick to the low level output of lsblk, keep an eye on the logs for the I/O errors, and ensure udevadm is able to see serials for all devices. Without those, the underlying config of the hardware or OS is the problem. Once that is sorted we are back into Rockstor land. So continue with your hw juggling diagnostics and we should have more information with which to work this problem.
Incidentally, the bug I suspected in the previously referenced issue does not seem to be in play here, at least not yet anyway. There, a device did present its serial via udev, but Rockstor misinterpreted it as it didn’t recognise the device type / name and messed up when enforcing unique serials; the device also had to be a system drive, I believe.
I think you are close to sorting this one simply by elimination; maybe someone with more experience with your particular hardware can chime in with wiser words. What is the hardware arrangement, i.e. the SAS controller make / model / spec etc.?
The LSI HBA (quad port HBA) port/cable is a SAS 4-lane connection to the SAS backplane of my Norco RPC-4224 chassis, which enables 4 drives per cable/connection. So, with a quad port card I can physically connect to 4 of the backplane ports, which is how it is. Supporting 16 drives directly without any virtual drive stuff going on, I would have to believe.
I will connect up the 6x 2TB HDDs I have instead of the 6x 4TB drives and report that output, then add the 4TB drives back into the setup and report that, all without using the very old 750GB drives.
As for the potential faulty drive, I thought this at first too, and went through 12x 750GB HDD, writing in big black text “DEAD”, but the drives work fine if they are below the 8 drive threshold.
I will eliminate the potential issue of the 750GB HDDs, which I previously did, by only using the 2TB and 4TB drives. We know the 4TB drives work fine, as they have all my data on them. I also know the 2TB drives work fine, as they were in use (in an alternate system) only recently (before the data was relocated to the new 4TB configuration).
By pulling the 4TB drives, setting up the 2TB drives in a configuration, gathering data, and then adding the 4TB drives back in, we should see the same weird behaviour. That eliminates a potential drive issue and a port/backplane issue, as different ports/backplanes are used for each drive type.
Hi @phillxnet, I have been doing some catching up on the various responses you’ve made on this topic. I was hoping that you could clarify something for me. I have just installed Rockstor on a Pi 4 and have an ORICO 4 Bay USB 3.0 to SATA enclosure.
I have run the command $ lsblk -P -o NAME,MODEL,SERIAL,SIZE,TRAN,VENDOR,HCTL,TYPE,FSTYPE,LABEL,UUID | grep -i -e d93a79d7-d6c3-4e1b-8b8d-ecf697c48cb6 and the results were as follows:
It seems as though in this instance the serials for the HDDs are available and unique. Is there anything you can advise on how I could use these with my setup?
@redplague Welcome to the Rockstor community forum.
Re:
Hopefully I can:
As you have likely read, some of these have been problematic in that they don’t issue unique serials. Or they issue serial numbers relevant only to the particular bay, which means Rockstor can only track bays, not actual devices.
But:
Yes, it does, doesn’t it. Those serials look like regular Seagate serials or the like. Each drive will actually have its serial printed on it somewhere (sometimes this is on the very end), so if those serials match exactly what you see in this list then it’s job done and all is well. If that is the case, let us know the exact model of this enclosure, as ORICO make some nice equipment and it’s a shame they based some on low end, basically faulty, chips that obfuscate
Our current code to flag known problematic enclosures has the following notes/entries:
It would be useful to know if udev equally returns unique and hopefully original serials as well. It likely will.
Our future preferred serial retrieval will likely be via udev directly, i.e. this procedure:
So double check what the following command returns, serial info wise, for each of your drives in that enclosure:
udevadm info --name=device_name
where “device_name” is, from the above procedure’s docstring:
:param device_name: eg /dev/sda as per lsblk output used in scan_disks()
The likelihood is they will match those returned by lsblk.
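If you want to script that double check, a small Python sketch along the following lines could pull the serial-related properties out of saved `udevadm info` output. It only assumes that properties appear on "E: KEY=value" lines; the function name and the sample text (model and serial included) are invented for illustration and are not taken from Rockstor's code.

```python
def udev_serials(udevadm_output):
    """Extract ID_SERIAL and ID_SERIAL_SHORT from `udevadm info` text."""
    wanted = ("ID_SERIAL", "ID_SERIAL_SHORT")
    found = {}
    for line in udevadm_output.splitlines():
        line = line.strip()
        if not line.startswith("E: "):
            continue  # only "E:" lines carry device properties
        key, sep, value = line[len("E: "):].partition("=")
        if sep and key in wanted:
            found[key] = value
    return found


# Hypothetical output, for illustration only.
SAMPLE = """\
N: sda
E: DEVNAME=/dev/sda
E: ID_SERIAL=EXAMPLE_MODEL_EXAMPLE123
E: ID_SERIAL_SHORT=EXAMPLE123
"""

print(udev_serials(SAMPLE))
```

Comparing the ID_SERIAL_SHORT value for each drive against the SERIAL column from lsblk then shows whether the two sources agree.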
As far as how you could use these drives, if the above double check pans out, they should be usable however you like. You are still presumably running through a single USB port, but at least this means that when the USB bus falters, which it is known to do on various whims, all of the enclosed drives are simultaneously inaccessible. And likewise they all then come back, as one, after the bus has reset. So I would just advise that you do not create pools that have members both within and outside of this enclosure. I.e. if all of a pool’s (or pools’) members are within the enclosure, they will come and go as one with the vagaries of the USB bus. This avoids a common problem where one drive is on one USB bus (adapter) and another drive (in the same pool) is on another USB bus: one bus goes down and activity continues to the remaining drive, then the first drive returns after a USB bus blip and you potentially have a split-brain situation in the making. Btrfs is still very sensitive to drives dropping out and then returning.
Let us know how it goes and what the performance is like. It could be quite a nice setup. Maybe send a picture of this enclosure if possible. We are always on the look-out for well behaved/reported devices, so do keep us informed of how this device holds up under use.
I see from the lsblk output that the drives are LABEL=“btrfs-raid10”, is this from you already having tried them out within Rockstor or from a prior life they have had?
Hope that helps and that this is in fact a perfectly usable multi-drive external enclosure.
I can confirm that those are indeed the Seagate Barracuda drive serial numbers. I found them printed on the original boxes the drives arrived in as SN:xxxxxxx.
Comparing this output to the other drives in the enclosure revealed:
Rockstor:~ # udevadm info --name=/dev/sda | grep -i -e serial
E: ID_SERIAL=External_USB3.0_DISK00_20170331000DA-0:0
E: ID_SERIAL_SHORT=20170331000DA
Rockstor:~ # udevadm info --name=/dev/sdb | grep -i -e serial
E: ID_SERIAL=External_USB3.0_DISK01_20170331000DA-0:1
E: ID_SERIAL_SHORT=20170331000DA
Rockstor:~ # udevadm info --name=/dev/sdc | grep -i -e serial
E: ID_SERIAL=External_USB3.0_DISK02_20170331000DA-0:2
E: ID_SERIAL_SHORT=20170331000DA
Rockstor:~ # udevadm info --name=/dev/sdd | grep -i -e serial
E: ID_SERIAL=External_USB3.0_DISK03_20170331000DA-0:3
E: ID_SERIAL_SHORT=20170331000DA
Unfortunately it looks as though all the drives return incorrect serial information and it is not unique according to udevadm.
This is a leftover from when I manually created a raid10 configuration on the command line: sudo mkfs.btrfs -f -L "btrfs-raid10" -m raid10 -d raid10 /dev/sdb /dev/sdc /dev/sdd /dev/sde
I have attached an image of the enclosure even though it’s not as well behaved as I had hoped. I would like to have been able to use it with Rockstor:
One more question: When I look in the Rockstor UI at the existing manually and externally created raid setup, I can see that I have the option to import pools from the first disc that does not show a warning. What do you expect would happen if I did import the pool and tried to use the setup as usual within Rockstor?
and the same ID_SERIAL_SHORT=20170331000DA across all devices also. This also explains your disk page serial warnings. The first drive is not marked with a warning as it was simply the first to be found, with what initially looks like an unflagged serial number. It is only once the system realises that there is then a second/third/fourth repeat that it marks them as repeats. So basically this is not going to work. An import will not get you anything more than further down a path of confusion regarding disk management. Rockstor needs an anchor with which to track devices, and that is serial numbers. But they are all the same via udev!
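The "first one escapes the warning" behaviour is what you would expect from any scan that flags a serial only once it has already been seen. A minimal Python sketch of that idea, using the ID_SERIAL_SHORT values from this thread; the function name and data layout are assumptions for illustration, not Rockstor's implementation:

```python
def flag_repeat_serials(devices):
    """Map each device name to True if its serial was already seen earlier.

    `devices` is a list of (name, serial) pairs in scan order.
    """
    seen = set()
    flags = {}
    for name, serial in devices:
        flags[name] = serial in seen  # first occurrence is never flagged
        seen.add(serial)
    return flags


# Serials as reported by udevadm for the enclosure in this thread.
enclosure = [
    ("sda", "20170331000DA"),
    ("sdb", "20170331000DA"),
    ("sdc", "20170331000DA"),
    ("sdd", "20170331000DA"),
]

for name, repeated in flag_repeat_serials(enclosure).items():
    print(name, "repeat serial" if repeated else "ok")
```

Here sda comes out unflagged and sdb, sdc and sdd are marked as repeats, even though all four are equally unusable for tracking.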
Interesting also is how the model column is also populated, i.e. USB3.0 DISK00, USB3.0 DISK01, etc.
There is hardware obfuscation afoot here and that is problematic for us. I’m just a little surprised we didn’t pick-up on the earlier serial numbers within lsblk’s output.
And having a look at your copied in full output from:
udevadm info --name=/dev/sda
I don’t see a single reference to the actual serial anywhere. Yes lsblk has it! I was certain that lsblk, these days, used udev to get this info. But if it has retrieved the serial, there is hope for us to somehow do the same.
If you can find a standard program to retrieve the ‘real’ serial as per what is on the drive we could pop-in some compatibility for these types of devices. Maybe it can be retrieved via for example smartmontools or the like?
Apologies for offering little way around this currently. Ideally I would need one, in-house, to experiment with. But given you have this, in-hand, do let us know if you find a quick/simple method to retrieve a drive’s serial from its dev name. That is all we need in this case. Then we can, on seeing such devices, revert to this back-up method. Udev is somehow playing along with this obfuscation, by design or by accident.
Let us know how your investigations go. But as-is this is not Rockstor compatible and I’d really like to add support for these devices but they are currently just not behaving like regular independent drives on the same bus: as they have the same serial (as per udev)!! I’ll puzzle some more as I go and hopefully in time we can add a clause for these devices as they are rather nice.
Hope that helps, in some way. But there may well be a 3rd way to retrieve the original hardware assigned serials that we are currently just missing. That would be the clincher for gaining compatibility.
@redplague I couldn’t resist, had to do a quick search again on this.
Re:
This has cropped up before actually, and I think it is what I was thinking of and instead defaulted to smartmontools. Apparently hdparm can tell us serials!!
What do the enclosure drives return when you try hdparm on them as I’ve just done here:
hdparm -i /dev/sdX
for each of the drive names there-in.
If it can retrieve them we may have a potential work-around.
Hope that helps and further suggestions welcome. What we are after really is something cheap/quick and built in. That we can parse rapidly ideally.
@redplague Thanks for the persistent testing here.
Re:
It does indeed. Funny how the -i failed though. Possibly due to some hard disk type assumption or other, and good to know this. And I’m guessing the capital ‘I’ option (“… directly from drive”) works around this.
But what-ever this is a nice find. If you or someone else doesn’t beat me to it I think what we need there is some code testing to see if this is all we need to support these devices. My current worry is that we have elsewhere some assumption on the structure of the by-id name udev assigned.
Can’t look myself currently, but it may be that this is non-critical and can be addressed as an independent issue. But it’s hopeful to have found a way to split out these drives’ serials from their enclosure obfuscation and get their ‘real’ serials.
A quick hack, to be done on a non-production system, is to have your copy of get_disk_serial() use hdparm -I to retrieve the serial, instead of its current use of udevadm. But I’m still a little perplexed that our initial lsblk reading of the serial is not working as intended. It’s still the default I believe.
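For anyone wanting to try that hack, here is a rough Python 3 sketch of an hdparm-based lookup. It assumes `hdparm -I` prints a "Serial Number:" line for the device and that the command is run with sufficient privileges; the helper names and the sample text are made up, and this is not what get_disk_serial() currently does.

```python
import re
import subprocess

# Matches the "Serial Number:" line printed in the `hdparm -I` identify block.
SERIAL_RE = re.compile(r"^\s*Serial Number:\s*(\S+)\s*$", re.MULTILINE)


def parse_hdparm_serial(hdparm_output):
    """Return the serial found in `hdparm -I` text, or None if absent."""
    match = SERIAL_RE.search(hdparm_output)
    return match.group(1) if match else None


def hdparm_serial(device_name):
    """Run `hdparm -I` on e.g. /dev/sda and return its serial, or None."""
    result = subprocess.run(
        ["hdparm", "-I", device_name],
        capture_output=True,
        text=True,
        check=False,
    )
    if result.returncode != 0:
        return None
    return parse_hdparm_serial(result.stdout)


# Hypothetical output, for illustration only.
SAMPLE = """\
/dev/sdX:

ATA device, with non-removable media
\tModel Number:       EXAMPLE MODEL
\tSerial Number:      EXAMPLE123
\tFirmware Revision:  0001
"""

print(parse_hdparm_serial(SAMPLE))
```

Keeping the parsing separate from the subprocess call means the parser can be tested against saved output from the enclosure without needing the hardware attached.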
Again I’ll have to focus in on this area of the code again soon with this new info. But do let us know if you end up doing any more investigations.