Warning! Disk unusable as pool member - serial number is not legitimate or unique

Greg_Simpson · November 28, 2017, 2:15pm

rockstor 3.9.1-0

upgraded the emby server rockon tonight with catastrophic results

performed reinstall of rockstor (after two months of flawless 24x7 operation - very nice)

4 SAS drives in raid 10 config (btrfs) so they all imported ok

one boot/os drive (sata) that was overwritten during install now throwing the current warning
i’m assuming that there’s a db file somewhere that keeps a record of the serial of the drive and that this has changed during the reinstall

from the error message (posted here)

Traceback (most recent call last):
  File "/opt/rockstor/src/rockstor/storageadmin/views/disk_smart.py", line 43, in _validate_disk
    return Disk.objects.get(id=did)
  File "/opt/rockstor/eggs/Django-1.8.16-py2.7.egg/django/db/models/manager.py", line 127, in manager_method
    return getattr(self.get_queryset(), name)(*args, **kwargs)
  File "/opt/rockstor/eggs/Django-1.8.16-py2.7.egg/django/db/models/query.py", line 334, in get
    self.model._meta.object_name
DoesNotExist: Disk matching query does not exist.

there’s some sort of sql query to validate the list of disks and it’s getting bent out of shape with the btrfs partition on the sata drive not matching what was there previously - that btrfs “partition” is also a member of a standalone pool that i use for the rockons (only emby server at this stage)

where and what do i have to patch/zap/run a sql command to fix this?

Greg_Simpson · November 28, 2017, 3:00pm

i’ve just had a read of this article on the wiki

very interesting

if you’re having trouble managing the raw devices before any levels of meaningful abstraction come into play, why not make the devices themselves responsible for that task?

and, yes, i know you can’t go changing drive firmware, but if btrfs is grabbing hold of the device well before the OS gets its hands on it shouldn’t it be possible to create a “btrfs” partition?

reserve some space on the drive and implement a pull mechanism that includes a history of the machines it’s been in, security rules, data importance, what roles it’s had, pools, shares et al

make the drive a repository of the meta-data that btrfs seems to be keeping in its db somewhere

this would allow boot safe unique identification but also allow transposition of drives within a server cluster if the metadata were to flow up some sort of data tree within the cluster

just a thought

Greg_Simpson · November 28, 2017, 3:12pm

okay - the btrfs fi show command gives me a list of the current UUIDs for the drives attached to the server at the moment

i’m not seeing a “fake” or a “tmp” uuid here as per the article

how do i find out where the OLD btrfs serial/identifier is/was and sort this out - until then i can’t create a share on the 250GB of space i’ve got to allow the rockons some space to play around with

phillxnet · November 28, 2017, 6:46pm

@Greg_Simpson A belated welcome to the Rockstor community to you.

The article describes how Rockstor manages / tracks the disks. This is below the level of the filesystem / volume manager (btrfs), and the ‘trick’ that we use to re-label duplicate serials as “fake-uuid” does not draw on btrfs’s knowledge at all. The uuid in that name is simply auto-generated on each scan as a placeholder: the rest of the code must have a serial, so we run with a fake one. “fake-” is the easy identifier, and the “uuid” component is an arbitrary unique part, unrelated to anything else and unlikely to be accidentally duplicated.
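To make the placeholder idea concrete, here is a minimal sketch. It is not Rockstor's actual code: the function name `assign_placeholder_serials`, the sample serials, and the rule that the first device seen keeps the real serial are all assumptions made for illustration.

```python
import uuid


def assign_placeholder_serials(devices):
    """Return {device_name: serial}, replacing missing or repeated serials.

    `devices` is a list of (name, serial) tuples. Assumption: the first
    device seen with a given serial keeps it; later ones (and devices with
    no serial at all) get a "fake-<uuid>" placeholder.
    """
    seen = set()
    result = {}
    for name, serial in devices:
        if not serial or serial in seen:
            # Regenerated on every call, so the placeholder is never stable
            # between scans and carries no meaning of its own.
            serial = "fake-" + str(uuid.uuid4())
        seen.add(serial)
        result[name] = serial
    return result


# Hypothetical scan: two partitions report the same serial, one reports none.
scan = [("sda3", "SERIAL-A"), ("sda5", "SERIAL-A"), ("sdb", "")]
serials = assign_placeholder_serials(scan)
```

Every device ends up with some serial, so later code that requires one keeps working, while the "fake-" prefix makes the affected devices easy to spot.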

I’m wondering if your issue is down to a custom partitioning that is not supported by Rockstor, this is from your comment:

where “that” is unclear to me. I’m probably picking up on the obvious. Could you post an image of the disks page to help clarify?

Also your suggestions re disk management are interesting but at the wrong level. We use the serial to track drives regardless of their contents, i.e. to answer such questions as “that drive did this” even when that drive was removed, or is now mysteriously blank for instance. We can then use our db to serve a lower level of knowledge re attached / detached disks than that of btrfs, which currently manages its own knowledge of those disks by what is on them. Glad you found the article interesting. It’s more of a dev / design guide as we used to get a lot of questions re the serial obsession: hence the subtitle. Should also help with on-boarding devs. Thanks for your comments / suggestions.

What is puzzling is that the db essentially re-writes itself every time the disk page / attached disks are re-scanned. And a fresh install starts with a fresh db. If an existing entry (by serial) is found then it will be updated with the new info.
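A rough sketch of that "update by serial on every rescan" behaviour, using a plain dict in place of the real database. The function name `rescan`, the record fields, and the choice to keep unseen serials as detached records are assumptions for illustration only.

```python
def rescan(db, attached):
    """Update `db` (a dict keyed by serial) from a fresh scan.

    `attached` maps serial -> info dict for the disks currently present.
    Known serials are updated in place, new ones are added, and records
    whose serial was not seen this time are kept but marked as detached.
    """
    for serial, info in attached.items():
        record = db.setdefault(serial, {})
        record.update(info)
        record["attached"] = True
    for serial, record in db.items():
        if serial not in attached:
            record["attached"] = False
    return db


# Hypothetical history: SER-A moves from sda to sdb, then disappears.
db = {"SER-A": {"name": "sda", "attached": True}}
rescan(db, {"SER-A": {"name": "sdb"}, "SER-B": {"name": "sdc"}})
rescan(db, {"SER-B": {"name": "sdc"}})
```

Because the serial is the key, a drive that changes device name between boots still maps to the same record, and a removed drive leaves a trace behind.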

Let’s see what the disk page looks like, as Rockstor does have some hard limits that are not shared by btrfs, in order that we encourage system disk / data disk separation, with just a little nod to flexibility in that direction.

In short did you do a default install, ie simple 3 auto partitions on system disk?

Just trying to work out your hardware / btrfs arrangement re disks in what pools etc.

Cheers.

Greg_Simpson · November 29, 2017, 1:54am

thanks phillxnet,

i have created a new share for the rockons in the main disk pool to work around this problem (the share was originally in the rockstor_rockstor pool)

here are the disks

and the pools

and the shares

this was initially a vanilla rockstor installation - one sata disk as boot and os (partitioned automatically by install)
and the 4 sas drives as data in raid10 config

upgraded the emby server docker and things went rapidly downhill from there (no ssh to rockstor box, no network to it - emby web interface wouldn’t connect and neither would rockstor webgui) - centos was still running but none of the docker processes were running - the ssh daemon had failed as well

reinstalled rockstor on 250GB sata and it came up okay - imported the btrfs volumes and they slotted back into place
just that warning about the serial for the sata drive and the fact that there’s now a 230GB device and a 3.2GB device both in the rockstor_rockstor pool.

cheers for your help
greg.

Greg_Simpson · November 29, 2017, 2:21am

if i have a look at the partitions

so something’s happened during the reinstall - it should have created a boot partition (sda1 - check) an OS partition (sda2 - check) and the rest as a btrfs partition (sda3 - check) but i’m not sure what this 3.2GB extended partition is at the end of the disk and whether that is what rockstor is complaining about - it certainly shows up and is usable

just delete it and rescan the drive?

Greg_Simpson · November 29, 2017, 1:59pm

ok - i think i got my head around it

by default rockstor creates a home and root pool using the os drive - i shouldn’t use it for anything but that

i’ve downsized the sata drive to a 10k 80GB drive that i had spare and let the re-install do its work
all successful - imported btrfs volumes and away we go

reinstall of emby server and the minor niggle of setting up the emby stuff again but all very easy

very happy with rockstor

thankyou very much for your help

phillxnet · November 29, 2017, 2:23pm

@Greg_Simpson I think you’ve found it.

Thanks for the pics. From both the disks and pools pages we see that rockstor_rockstor is associated with 2 partitions. This is a no go for Rockstor so it reacts by labeling one of the devices as having a duplicate serial, hence the warning. The duplicate serial comes from both associated partitions existing on the same physical disk, and so having the same hardware serial. Therein lies the issue, i.e. part5 is the anomaly that is not ‘allowed’, so when scanning disks a serial ‘clash’ is encountered.
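The clash can be pictured with a small sketch: group the pool's member devices by hardware serial and see which serial turns up more than once. The function name `find_serial_clashes` and the serial string are hypothetical, and this is not how Rockstor itself is written.

```python
from collections import defaultdict


def find_serial_clashes(members):
    """Group pool member devices by hardware serial and return the clashes.

    `members` is a list of (device, serial) tuples. Partitions inherit the
    serial of the physical disk they live on, so two partitions of one disk
    always report the same serial.
    """
    by_serial = defaultdict(list)
    for device, serial in members:
        by_serial[serial].append(device)
    return {s: devs for s, devs in by_serial.items() if len(devs) > 1}


# Hypothetical pool with two member partitions on the same physical disk.
pool_members = [("sda3", "DISK-SERIAL-1"), ("sda5", "DISK-SERIAL-1")]
clashes = find_serial_clashes(pool_members)
```

With one member partition per physical disk the returned dict would be empty, which is the arrangement Rockstor expects on the system disk.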

The extended partition is a ‘container’ partition for instances where the normal limit of 4 partitions is insufficient. sda5 (the problem partition) is a logical partition within that extended partition, itself a ‘hacky’ workaround for the limit of 4. I.e. installers, when partitioning an msdos type partition table, will often create an extended partition if more than 3 partitions are requested. That way the 4th (would be sda4) doesn’t then block the creation of any further partitions.
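As a small illustration of the numbering that follows from this: on an msdos (MBR) partition table under Linux, numbers 1 to 4 are the slots in the table itself and logical partitions start at 5. The helper below is a hypothetical sketch that only handles simple sdXN style names.

```python
import re


def classify_mbr_partition(device):
    """Classify a Linux partition name on an msdos (MBR) partition table.

    Numbers 1-4 are the four slots in the table itself (primary, or one
    extended container); numbers 5 and up are logical partitions that live
    inside the extended partition.
    """
    match = re.fullmatch(r"[a-z]+(\d+)", device)
    if match is None or int(match.group(1)) < 1:
        raise ValueError("not a partition name: %r" % device)
    number = int(match.group(1))
    return "primary or extended" if number <= 4 else "logical"


kinds = {name: classify_mbr_partition(name) for name in ("sda3", "sda4", "sda5")}
```

So a device named sda5 on such a disk implies that an extended partition exists in one of the first four slots, even if only three other partitions are in use.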

The share is not really the issue, although it is better not to use the system disk for data, as import / data disk transfer is then easier. The cause of the serial issue is that system disks, in Rockstor, cannot have any additional partitions.

Could you establish if your rockstor_rockstor is on sda3 or sda5? The following command should tell us:

btrfs fi show

Rockstor is potentially confused about this as it only accounts for a single non swap non /boot partition on the system disk.
That way we can work out which of those partitions needs to be removed to return the system disk to Rockstor ‘spec’. I’d suggest the install was not completely standard on this one but as you state, if we remove the offending partition so we only have /boot, swap, and the rockstor_rockstor host partition you should be good to go and the disk page should stop complaining of the duplicate serial.
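For anyone who would rather script this check, here is a minimal sketch that pulls the member devices per pool out of `btrfs fi show` style text. The sample text is made up for illustration (it is not output from the system in this thread), and the function name `pool_devices` is hypothetical.

```python
import re

# Invented sample in the general layout printed by `btrfs fi show`.
SAMPLE = """\
Label: 'rockstor_rockstor'  uuid: 11111111-2222-3333-4444-555555555555
\tTotal devices 2 FS bytes used 1.50GiB
\tdevid    1 size 100.00GiB used 4.00GiB path /dev/sda3
\tdevid    2 size 10.00GiB used 0.00B path /dev/sda5
"""


def pool_devices(fi_show_output):
    """Map each pool label to the device paths listed under it."""
    pools = {}
    current = None
    for line in fi_show_output.splitlines():
        label = re.match(r"Label:\s+'?([^'\s]+)'?\s+uuid:", line)
        if label:
            current = label.group(1)
            pools[current] = []
            continue
        path = re.search(r"\bpath\s+(\S+)", line)
        if path and current is not None:
            pools[current].append(path.group(1))
    return pools


devices = pool_devices(SAMPLE)
```

On a real system the text would come from running the command itself (as root), and a pool that lists more than one device on the system disk would point at the partition to remove.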

OK, looks like we have overlapping posts in time as you have just responded. And apparently sorted it. I’ll post anyway as then if others have the same issue it might help.

From your latest post:

Pretty much, except home and root are actually subvolumes within a single rockstor_rockstor pool which is hosted on a single (usually sda3) btrfs formatted partition.

Ideally as then you have system / data separation. Greatly helps with re-installs / system disk failure. And given it’s currently a pain to have redundancy on the system disk this partially negates that lack of redundancy as it then only concerns that which can easily be re-installed (the Rockstor system) and all data is preserved as it would reside on other physical drives.

Again, if the config and data of the emby rock-on were on the data drives then you could simply point the fresh install’s rock-on at them and pick up where you left off. Whereas after a clean re-install where the system disk held the data for that or any rock-on, it would need configuring from scratch.

Super. We do have a way to go just yet but the effort is concerted and ongoing.

You’re welcome. We probably need a Rockstor pool arrangement primer to add to the docs but I suspect this would best fit under the following existing rockstor-doc repo issue:

Hope that helps and chuffed you have it sorted now.

Greg_Simpson · November 29, 2017, 2:47pm

thanks so much phillip

just for the record
this

plus this

and these

and some of these


one of these

and this

equals one of these (mostly from bits i had lying around)

Greg_Simpson · November 29, 2017, 2:48pm

it had to be the asrock board for this project lol

Greg_Simpson · November 29, 2017, 2:59pm

thanks for the primer

i used to be in IT years ago with IBM mainframes and did some sysprog and DBA work so i have an affinity with things that go round and round

i was going to go down the zfs path but the forums said that btrfs was doing good things so i took a punt on rockstor - glad i did

is there a huge difference between zfs and btrfs - it seems that there are some similar concepts - drives as storage objects only and then the abstraction and control layers are layered on top of that rather than the old model of a partition driving your storage architecture from the bottom up

i can remember when you had control of cylinder/head addressing - oh for those days again
hope those pics made you laugh

phillxnet · November 29, 2017, 3:18pm

@Greg_Simpson
Nice.

Pop rivets and all. But for the more disassembly minded there are always rivnuts / rivet nuts: space permitting.

On the coincidence / collision side (your ASRock / Rockstor reference) we also have a very similar logo/brand colour arrangement with be quiet so a recent build I did had one of their excellent PSUs looking nicely matched (from the rear) with the accompanying Rockstor sticker (on a black case of course). One of those stickers might also suit your case, come to think of it.

Yes, plenty of stuff yet to arrive in both btrfs and Rockstor (mostly in the drive failure / repair camp) but all are in the works. Ie it would be nice not to have to drop to the terminal when things go south for instance.

Essentially it. I really like it but then I had a stint on LVM and have been bugged by partition table variations / schemes / bugs for ever so to cast a bare disk as a pool member and do everything else in btrfs is really neat. Ie volume management and file management in one. We currently stick to partitions on the system disk due to grub limitations re btrfs so that’s a shame but hey ho all in good time.

Now now; let’s not get carried away. Especially over the 1024 cylinder limit etc. My first HDD was 10 MB !!! crazy.

Greg_Simpson · November 29, 2017, 4:03pm

ok yes 1024 cylinder limit but i recall we had to write new code for the 3380 ibm mainframe drives because they had two actuators in them so it went from cylinder cylinder head head to cylinder cylinder head head head if i recall that far back

but jumping forward a century or two

your thoughts on OO ?

do we start thinking about the lowest level being objects (in the true object model) and inherit from the base object up the class structure

blue sky here - but if at the “user” level they see an “object” of class:x we then just throw that object at an object of class:Store_Persistent and let the object tree take care of it

the file system then disappears and becomes an object handler that deals in persistent objects

there’s enough thought that had been put into database technology that there’s no reason that filesystem/device driver/bios layer couldn’t act as the “object server” for persistent instances

disks would then be merely bit buckets (as they should be) not an ongoing concern about partition sizes et al - with the whole partition model we basically just scaled up the “how many floppies does it take to store this” and then added some problems
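a toy sketch of that idea in python, purely illustrative: the class name StorePersistent echoes the class:Store_Persistent above, and the put/get methods are hypothetical. it also still leans on a directory of files underneath, which is exactly the layer the idea would want to replace.

```python
import pickle
import tempfile
from pathlib import Path


class StorePersistent:
    """Toy object store: keeps pickled objects as files under one directory."""

    def __init__(self, root):
        self.root = Path(root)
        self.root.mkdir(parents=True, exist_ok=True)

    def put(self, key, obj):
        # The caller hands over an object; where the bytes land is hidden.
        (self.root / key).write_bytes(pickle.dumps(obj))

    def get(self, key):
        # Rebuild the object from its stored bytes.
        return pickle.loads((self.root / key).read_bytes())


store = StorePersistent(tempfile.mkdtemp())
store.put("playlist", {"title": "road trip", "tracks": [3, 1, 2]})
restored = store.get("playlist")
```

the caller only ever deals in objects and keys, never in paths or partition sizes.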

the first hard drive i ever touched was a 3340 - you loaded it into a drive enclosure the size of a top loading washing machine - and hoped the heads didn’t crash

Greg_Simpson · November 29, 2017, 5:34pm

absolutely with the stickers - i will order some

i came from an IBM background and have been watching linux since ubuntu 10.04

now a convert gnome 3.26 on ubuntu 17.10

but the most gratifying thing in dealing with the non conformity of the various distros - or even down to individual software and how they handle their own fonts et al - when i took my flatmate who was a computer luddite and knew only the most shallow layer of windows

when i showed him linux and started the education process - a true convert - to the point where he is just starting to dip into the console

THAT’S a user friendly OS

phillxnet · November 29, 2017, 6:11pm

@Greg_Simpson

Showing your age there a little. Yes I did touch one of those, or its kin, one time: and it was still in service. But not at home obviously. I thought the partially evacuated vertical tape cabinets were a little more impressive. At least to look at. Like ‘proper’ technology - lots of moving parts to see. Hey ho - onward and downward.

I’m personally looking forward to playing with clustered file systems. I’d love to add a ‘one click’ (or not many more) Glusterfs setup to Rockstor but we have a thing or two to do before such shenanigans.

Ultimately something has to talk to / monitor / manage the metal - hence our drive serial tracking.

Whilst on the object storage front, I think there’s a docker image somewhere that sets up S3 compatible object storage. That would make a nice Rock-on, pull request anyone?

Yes I particularly like the ethos of GNU / linux. It’s akin to the scientific approach but in software development. All in the open and sharing each others efforts. Bit hit-and-miss from time to time but better that than the proprietary alchemist approach. Also does wonders for democratising computer tech, at least on the software front.

Fedora (gnome) on main machine with Ubuntu on secondaries myself, though yet to upgrade to 17.10. Bar the numerous Rockstor ( CentOS ) installs of course.

If you fancy taking a peek / poke at Rockstor’s innards anytime it’s actually quite approachable. Though we are working on improving the in-code comments; but they do - mostly - exist. The majority of the ‘work’ is done in Python so it’s fairly approachable. The following Contributing to Rockstor - Overview doc section is a good place to start - but not on your production machine of course as it wipes the db when rebuilding.

I’ll try and post pics of my custom PSU mounts made from metal coasters with an angle grinder (cutting disk) some time.

Greg_Simpson · December 1, 2017, 1:17pm

please post them - would be interested

on a different tangent

the “server” you saw has an lsi sas controller card talking to 4 x toshiba sas drives via an sff 8087 breakout cable

just got an 8087 - 8087 cable and installed the hp backplane on the drive cage that i “encouraged” to mate with the pc

just want a single cable basically and why not use some old hardware just because it’s there

at power up the drive leds on the backplane quickly cycle through their color gamut and then everything boots just fine

EXCEPT the activity leds don’t fire on the backplane

is it worth looking at ledmon/ledctl - installing it on centos or is that going to break something with rockstor - i don’t know how “tailored” your centos release is for rockstor so i’d hate to inadvertently break something

cheers
greg

phillxnet · December 1, 2017, 7:53pm

@Greg_Simpson

On that front you should be good. Rockstor is pretty much CentOS but with a much much newer kernel and btrfs-tools and a handful of more updated packages (less so as they are updated in CentOS). So your main concern is the newer kernel version that we pull straight from elrepo as their kernel-ml (main line). Our rockstor package also sprinkles a few files here and there during initial setup / boot, i.e. systemd files and some custom config files, but nothing that should interfere with your ledmon/ledctl idea.

Do keep us posted on this one. But note that the newer kernel may also mean you have to compile the relevant package as it may expect the really old kernel of standard CentOS.

Hope that helps.

Greg_Simpson · December 2, 2017, 7:38am

on some of the forums it mentions that os control of the drives can cause this (jbod rather than raid mode)

i’ll download source tonight and start looking at rockstor from a programmatic perspective

python will be a new one for me though - but hey i speak about 10 other languages so at least the python curve won’t be too steep

i’m pretty sure it’s a disconnect between the lsi card and the hp backplane - because the leds cycle through their colours at power up - which tells me they’re working - but when the lsi card initialises there’s no activity

i’ll do some digging and see if there’s any functions within the lsi microcode that can be called to control the leds from rockstor itself

show errors and management functions (like a btrfs scrub or a snapshot) with custom led colour and flashing combinations

who do i approach to plan out a framework for

planning this functionality (if it doesn’t already exist)
executing it (i’ll write the code but integration effort (polishing) would need to be done by someone with way more experience)
is there any particular platform for dev you favour (yes i’ve got a github signin)
i’ve been doing OO stuff for quite some time (since the 80’s) so some of my design flavour follows that particular experience
and as always - if it can’t be done in IBM 360 assembly language - it’s quite possibly not worth doing LOL

i’ll start by seeing if i can’t get something running that takes control of the leds - the source for ledmon/ledctl should help with that
cheers,
greg

Greg_Simpson · December 2, 2017, 10:29am

oops RTFM - just did - thanks