Duplicate Appliance IDs Generated

[Please complete the template below with details of the problem reported on your Web-UI. Be as detailed as possible. Community members, including developers, will try to help. Thanks for your time in reporting this issue! We recommend purchasing commercial support for expedited help directly from the developers.]

Brief description of the problem

I’m busy migrating some Thecus NAS devices to RockStor. However, I’m unable to link the 2 appliances I have built because they both have the same unique Appliance ID. How could this have happened? The model numbers of each hardware device are different, with different host names and different IPs. They were both installed from the same USB key. They both run from an internal SATA SSD device.

Detailed step by step instructions to reproduce the problem

Install latest stable build … 3.9.1-0


Error Traceback provided on the Web-UI

Traceback (most recent call last):
  File "/opt/rockstor/src/rockstor/rest_framework_custom/generic_view.py", line 41, in _handle_exception
    yield
  File "/opt/rockstor/src/rockstor/storageadmin/views/appliances.py", line 110, in post
    appliance.save()
  File "/opt/rockstor/eggs/Django-1.8.16-py2.7.egg/django/db/models/base.py", line 734, in save
    force_update=force_update, update_fields=update_fields)
  File "/opt/rockstor/eggs/Django-1.8.16-py2.7.egg/django/db/models/base.py", line 762, in save_base
    updated = self._save_table(raw, cls, force_insert, force_update, using, update_fields)
  File "/opt/rockstor/eggs/Django-1.8.16-py2.7.egg/django/db/models/base.py", line 846, in _save_table
    result = self._do_insert(cls._base_manager, using, fields, update_pk, raw)
  File "/opt/rockstor/eggs/Django-1.8.16-py2.7.egg/django/db/models/base.py", line 885, in _do_insert
    using=using, raw=raw)
  File "/opt/rockstor/eggs/Django-1.8.16-py2.7.egg/django/db/models/manager.py", line 127, in manager_method
    return getattr(self.get_queryset(), name)(*args, **kwargs)
  File "/opt/rockstor/eggs/Django-1.8.16-py2.7.egg/django/db/models/query.py", line 920, in _insert
    return query.get_compiler(using=using).execute_sql(return_id)
  File "/opt/rockstor/eggs/Django-1.8.16-py2.7.egg/django/db/models/sql/compiler.py", line 974, in execute_sql
    cursor.execute(sql, params)
  File "/opt/rockstor/eggs/Django-1.8.16-py2.7.egg/django/db/backends/utils.py", line 64, in execute
    return self.cursor.execute(sql, params)
  File "/opt/rockstor/eggs/Django-1.8.16-py2.7.egg/django/db/utils.py", line 98, in __exit__
    six.reraise(dj_exc_type, dj_exc_value, traceback)
  File "/opt/rockstor/eggs/Django-1.8.16-py2.7.egg/django/db/backends/utils.py", line 64, in execute
    return self.cursor.execute(sql, params)
IntegrityError: duplicate key value violates unique constraint "storageadmin_appliance_uuid_key"
DETAIL:  Key (uuid)=(REMOVED) already exists.

Wow…from 10 days ago. Having read this, it is not the same issue, but I will provide the same details requested.

OK I can confirm that “cat /sys/class/dmi/id/product_uuid” produces the same UUID on each server, and it matches the appliance ID.
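For anyone else wanting to verify this on their own hardware, a few lines of Python can read and compare the DMI product UUIDs from two machines (a sketch only; `read_product_uuid` and `same_appliance_id` are hypothetical helper names, not Rockstor code):

```python
# Hypothetical helpers (not Rockstor's own code) for checking whether two
# machines report the same DMI product UUID, which Rockstor uses as the
# basis of its Appliance ID.
import uuid


def read_product_uuid(path="/sys/class/dmi/id/product_uuid"):
    """Read the DMI product UUID and return it in canonical lower-case form."""
    with open(path) as f:
        raw = f.read().strip()
    # uuid.UUID() validates the string and normalises case and formatting.
    return str(uuid.UUID(raw))


def same_appliance_id(uuid_a, uuid_b):
    """True if two product UUID strings represent the same UUID."""
    return uuid.UUID(uuid_a) == uuid.UUID(uuid_b)
```

Run `read_product_uuid()` on each server; if `same_appliance_id()` returns True for the pair, the boards share a UUID and linking the appliances will collide as above.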


One is an N8900 and the other is an N16000PRO, however, I can see that the N16000PRO also uses the N8900 system board.


I can see in here …

that the duplicate appliance ID is already included as a fake…

5C4606FA-192F-453A-B299-7B088C63BB9B

So I’m not sure what to do now.


Hi @nigel, and welcome to the community!

I’m rather limited on time, so please excuse my rather brief and dry answers. This area is @phillxnet’s expertise, but I wonder whether you are running a Rockstor version predating the addition of your product ID as a fake appliance ID, hence you are not benefitting from that fix. If that is the case, and given you’ve just installed, it might be a good opportunity to try out our new Rockstor 4 version. While we only have a DIY installer for now, it might be a good choice for you here; see @phillxnet’s post in the thread you linked to, for instance:

If you are indeed running Rockstor-3.9.1-0, it would represent a very substantial update for you, as that version is now very old. There might be the option of manually adding your ID (as in the code you linked to) to your local file and restarting the rockstor service, but I’ve never tried that, so I’m unsure of the consequences.

Sorry again for being very brief and probably not as helpful as I would have liked, but I do hope this still helps.


You are correct: that fake UID is not in here…

/opt/rockstor/src/rockstor/system/osi.py

However, I’ve downloaded the latest version from the Rockstor website, so I’m not sure how I could get anything newer. I also don’t want to run something that isn’t production ready (e.g. version 4). I’d love to use it when it’s ready, but it isn’t yet.

I attempted to add the UID to that file and restarted rockstor, but it hasn’t changed the appliance ID. Hopefully @phillxnet can help! I’ve got 3 of these appliances to configure, and would gladly purchase activation, but from what I can see, I’m actually unable to activate, because this is a fake UUID, and the activation is based on that.

I’m therefore stuck! I can download the latest version (3.9.2-60) of the source. Is there a way I can manually update the system using this?

@nigel Well done on getting this far already. The fake Appliance ID problem has been quite the annoyance / challenge for us. And as you have found, this is already a known one to us, and we have a chicken-and-egg situation.

The Appliance ID of a machine is established when you first install it, so changing this after the initial setup is not sufficient. You could try extending that list in the code directly after a fresh install and before doing the initial admin sign-up process. You will likely still need to restart the rockstor process. I’ve not actually tried this myself.
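As a rough illustration of the idea (names and list contents here are my own, not the actual osi.py implementation): at first install, a product UUID found on the known-fake list gets replaced with a randomly generated one:

```python
# Illustrative sketch only: names are hypothetical, and this is not the
# actual osi.py code. The idea: boards that ship the same placeholder
# product_uuid on every unit get a random UUID instead at first install.
import uuid

KNOWN_FAKE_UUIDS = {
    # Example entry, normalised to lower case.
    "5c4606fa-192f-453a-b299-7b088c63bb9b",
}


def appliance_uuid(product_uuid):
    """Use the board's product_uuid unless it is a known fake."""
    if product_uuid.lower() in KNOWN_FAKE_UUIDS:
        return str(uuid.uuid4())
    return product_uuid
```

Extending the equivalent list before the initial admin sign-up is what the workaround amounts to.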

But really, given that Rockstor 4 is the future and we are now at release candidate 7, in version 4.0.6:

I would strongly consider doing your fresh installs using a DIY installer build via the following method:

It’s a fair bit of messing about for now, but what you end up with will be what we are maintaining going forward. We no longer release updates for the CentOS variant anyway, and by most accounts our beta ‘Built on openSUSE’ is now ahead in many cases of our prior CentOS variant, now stuck at 3.9.2-57. Plus, if you do find any issues, we can hopefully fix them, whereas our CentOS variant is now legacy.

Just a thought.

Let us know how you get on with adding that appliance ID to the fake list, plus the service restart, before the initial admin user creation step. But really, given that this problem, as you found yourself, has already been addressed and released in our new installer, if you can get over the hurdle of building that installer you will be in a far better position.

Let us know which route you fancy taking, but 4 is actually now better behaved and has a far newer (by years) btrfs foundation than our CentOS variant. We are itching to release the final Stable with an accompanying installer, but all in good time, as we need our latest changes to be tested in the field first.

So, given you are doing new installs, I’d definitely give the 4 variant a fresh look. And if you have difficulties with the DIY build method, let us know in a specific post, as there are many forum members who have now successfully accomplished this task, and it is one we wish to spread general knowledge about anyway: the end result is a fully updated ISO, which includes all upstream updates as of the time of the build. This is a super useful facility to know how to use and may well help you in the future, especially on newer hardware that needs these fixes (such as the newer Appliance ID fixes in our code’s case). So give that another consideration, as it may be the better path in the long run for where you spend your time in getting up and running.

Let us know how you get on and which direction you want to go in on this one.

Cheers.


Thanks @phillxnet. If I do attempt the Rockstor 4 option, is it seamless to transfer the contents? I have 10TB of data on each NAS and am manually synchronising at the moment. I’d rather import the BTRFS pool after the initial install than have to copy everything over again. Will this import operation work between version 3 and 4?


@nigel I have done this before: importing a data pool from v3 to v4, and it worked fine for me. @phillxnet may wish to add more detail (e.g. caveats, if there are any).
Just make sure you have a suitable backup, just in case… :slight_smile:


Thanks @GeoffA. The other NAS is the backup!


Cool. It seems my middle name is ‘Backup’. I’m working on it… :slight_smile:
Let us know how things transpire.


@phillxnet OK, I have managed to install Rockstor 4 on a test NAS, a partially empty Thecus N8900 (same model as above). I am able to add the previous NAS without the duplicate key error; however, this is because the new version 4 uses lower case instead of upper case.

3.9 : 5C4606FA-192F-453A-B299-7B088C63BB9B
4.0 : 5c4606fa-192f-453a-b299-7b088c63bb9b

I therefore think you need to update your file /opt/rockstor/src/rockstor/system/osi.py to include lower-case versions, as it’s obviously not triggering whatever process is required to generate a properly unique ID for the system. I also flashed the latest BIOS for this device, so maybe that provides a lower-case return from /sys/class/dmi/id/product_uuid, or maybe it’s openSUSE vs CentOS.
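The mismatch is easy to demonstrate in a couple of lines of Python (a sketch of the suspected failure mode, not the actual osi.py logic):

```python
# Sketch of the suspected 3.x-vs-4 mismatch: a plain string membership
# test is case sensitive, so an upper-case fake-list entry never matches
# a lower-case /sys/class/dmi/id/product_uuid reading.
import uuid

fake_list = ["5C4606FA-192F-453A-B299-7B088C63BB9B"]  # upper case, as in 3.9
reported = "5c4606fa-192f-453a-b299-7b088c63bb9b"     # lower case, as on 4.0

naive_match = reported in fake_list  # False: the case differs

# Comparing as UUIDs (or lower-casing both sides) matches correctly.
robust_match = any(uuid.UUID(f) == uuid.UUID(reported) for f in fake_list)
```

So a case-insensitive comparison (or normalising both sides to lower case) would make the fake-list check robust to this difference.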


I did the following:

  1. Re-installed RS4
  2. Accessed via ssh before doing initial config
  3. Added the lower case UID to the osi.py file
  4. Rebooted
  5. Went through initial configuration

Now I finally have a unique appliance ID. So I guess this process would probably work with 3.9 too. I will now see if RS4 is stable enough for my uses.


@nigel Well done on building the 4 installer and getting the install/Appliance ID thing sorted. And thanks for narrowing down a new suspected bug re:

and

Nice find. We have suspected a case issue for some time now but unfortunately not gotten around to narrowing it down. But:

Looks to have proven it. Well done.

I’m voting for this myself, although it could be a change in the kernel that is incidental but either way it’s a 3 vs 4 thing we need to sort.

I’d ask that you open a new GitHub issue in our rockstor-core repo:

with your findings, i.e. “fake product_uuid matching should be case insensitive” or the like,

and link back to this forum thread for context.

Thanks for an excellent, and patient, work through of this bug. Much appreciated. This very much looks like a candidate for inclusion prior to our stable 4 release so once you have created that GitHub issue I’ll add it to the ISO milestone so hopefully the next time you install it will be a far simpler affair.

And thanks for confirming that code change prior to the initial setup. Nice to know we can fall back on that method. And feel free to chip in on similar forum threads as they pop up given your experimentation in this area.

I ask that you create the GitHub issue as it then simplifies our attribution side of things.

Thanks again for sharing your findings and proving the work-around.


@GeoffA The import did “work”… kind of. Failed on attempt 1, succeeded on attempt 2.

However, it then spun on a core permanently doing a btrfs qgroup operation, and continually disabled and enabled the quota. In the end I removed the pool and recreated, and it is synchronising across using rsync over nfs. I’m not too keen on the replication setup in Rockstor - seems overly complicated using snapshots and prone to error (which I got lots of).

@nigel That’s a strange one for sure, and not something I experienced. Sorry I am not able to throw any light on the issues you encountered.

Re replication, I’m a die-hard rsync user for any syncing - that’s how my backups are done.

@phillxnet Issue raised on GitHub…


@nigel, Given:

and

This sounds like you were on the edge of the timeout here, and the quota re-calculation can take quite some time. There is a guide to roughly how long in the tooltip of the quota enable/disable. It may well be that you could have just waited this out.

Yes, we definitely have some robustness work to do there. We are also working within the current constraint that btrfs replication cannot survive a network outage: i.e. there is no ability to resume a replication. Snapshots are required for the replication process, and once they are all established it can be a very quick method for replicating a subvol from machine to machine, as it only copies the changes and can ‘know’ what these are without having to scan all files, which is what rsync has to do. But yes, it’s complicated and currently in need of more attention, both in our realm and upstream.
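For the curious, the incremental mechanism boils down to a btrfs send/receive pair; the sketch below just constructs the command lines (paths and snapshot names are hypothetical, and nothing is executed here):

```python
# Illustrative only: build (but do not run) the command lines for an
# incremental btrfs replication. With a shared parent snapshot on both
# ends, btrfs streams only the delta, with no per-file scanning.
def build_incremental_send(parent_snap, new_snap):
    """'btrfs send -p parent new' streams only changes since parent."""
    return ["btrfs", "send", "-p", parent_snap, new_snap]


def build_receive(dest_path):
    """'btrfs receive dest' replays that stream into dest."""
    return ["btrfs", "receive", dest_path]
```

In practice the two ends are piped together, e.g. `btrfs send -p <parent> <new> | ssh dest btrfs receive <pool-mount>`; rsync, by contrast, must walk and compare every file on each run.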

I believe there are a number of upstream improvements to btrfs replication in the works, but as some will require a protocol/format change, their addition is being held back until it’s more certain that the next protocol will be sufficient for some time.

Hope that helps.
