Replication. Snapshot does not exist on the system. So cannot use it

riceru · May 5, 2026, 9:38pm

I’ve performed several replications. I’ve encountered some errors due to network outages, which I resolved by deleting the latest snapshot. The problem is that now I’m constantly getting an error that I don’t know how to fix.

[29/Apr/2026 18:18:06] ERROR [smart_manager.replication.sender:79] Id: a8c07302-2bf56a42-561e-4a2a-b3d9-f082fdd25df1-5. b'Snapshot(/mnt2/rockstor-datos.snapshots/bbg-duplicati/bbg-duplicati_5_replication_56) does not exist on the system. So cannot use it.'. Exception: b'Snapshot(/mnt2/rockstor-datos.snapshots/bbg-duplicati/bbg-duplicati_5_replication_56) does not exist on the system. So cannot use it.'

But when I look at the snapshots, I see that the bbg-duplicati_5_replication_56 snapshot is there.

# ls -la /mnt2/rockstor-datos/.snapshots/bbg-duplicati/
total 0
drwxr-xr-x 1 root        root        186 Apr 29 18:18 .
drwxr-xr-x 1 root        root        638 Mar 30 20:20 ..
drwxr-x--- 1 bbgVolcado bbgVolcado  30 Apr 14 23:18 bbg-duplicati_5_replication_56
drwxr-x--- 1 bbgVolcado bbgVolcado  30 Apr 14 23:18 bbg-duplicati_5_replication_58
drwxr-x--- 1 bbgVolcado bbgVolcado  30 Apr 14 23:18 bbg-duplicati_5_replication_63

# btrfs-list --show-id --show-parent /mnt2/rockstor-datos
WARNING: to get refer/excl size information, please enable qgroups (btrfs quota enable /mnt2/rockstor-datos)
NAME                                                                               ID PARENT    TYPE     EXCL  MOUNTPOINT
rockstor-datos                                                                      -      -      fs    4.43T (raid1, 8.30T/12.73T free, 65.15%)
   [main]                                                                           5      - mainvol       -  /mnt2/rockstor-datos
   bbg-duplicati                                                                 827      5  subvol       -  /mnt2/bbg-duplicati
      .snapshots/bbg-duplicati/bbg-duplicati_5_replication_56                  1516      5  rosnap       -
      .snapshots/bbg-duplicati/bbg-duplicati_5_replication_58                  1541      5  rosnap       -
      .snapshots/bbg-duplicati/bbg-duplicati_5_replication_63                  1556      5  rosnap       -

I can’t find anything related to the error in journalctl.

What can I do to get the replications working again?

Thank you

Hooverdan · May 6, 2026, 8:16pm

@riceru hello again.

Could you run this command:

btrfs subvolume /mnt2/rockstor-datos.snapshots/bbg-duplicati/bbg-duplicati_5_replication_56 show

In the coding there seems to be a a condition where the btrfs subvolume <mount point> show command is used to check for this.

Though i am wondering whether the deletion of the last snapshot has caused some chain reaction that the coding for _refresh_rt is not ready to handle.

github.com/rockstor/rockstor-core

src/rockstor/smart_manager/replication/sender.py

fd8f1bc53


      
                  share_path, self.max_snap_retain, regex="_replication_"
              )
              if oldest_snap is not None:
                  logger.debug(f"Id: {self.identity}. Deleting old snapshot: {oldest_snap}")
                  self.msg = f"Failed to delete snapshot: {oldest_snap}. Aborting.".encode(
                      "utf-8"
                  )
                  if self.delete_snapshot(self.replica.share, oldest_snap):
                      return self._delete_old_snaps(share_path)
          
          def _refresh_rt(self):
              # for incremental sends, the receiver tells us the latest successful
              # snapshot on it. This should match self.rt in most cases. Sometimes,
              # it may not be the one refered by self.rt(latest) but a previous one.
              # We need to make sure to *only* send the incremental send that
              # receiver expects.
              self.msg = "Failed to validate/refresh ReplicaTrail.".encode("utf-8")
              if self.rlatest_snap is None:
                  # Validate/update self.rt to the one that has the expected Snapshot
                  # on the system.
                  for rt in ReplicaTrail.objects.filter(

If I read it correctly, you should get this message, if the Rockstor database say “I got the snapshot” but the file system (on the receiving side) says “I can’t find it”.

If the database says “I can’t find it”, and the file system does “have it” (or even if it doesn’t), the message would be No succeeded trail found for ...

But I might have this backwards.

Hooverdan · May 6, 2026, 8:17pm

Either way, I suspect you might need to force a full replication again. But maybe @Flox or @phillxnet can confirm that.

riceru · May 8, 2026, 6:44am

@Hooverdan, thanks for your help.

Before reading your comments, I was running some tests and accidentally deleted snapshot 56. I’m going to have to do a full replication again.
I’ll keep this thread open and update it with the results of the replication, whether it completes successfully or the error occurs again.

Thanks

riceru · May 11, 2026, 9:52pm

Hello.

Unfortunately, the problem has occurred again.

Replication worked fine quite a few times, but at one point it failed.

Got EMPTY on error command message from the receiver while transmitting fsdata. Aborting.

Then I deleted the snapshot to run the replication again, and it completed successfully (though too quickly), but from that point on, the error I mentioned at the start of this thread kept recurring.

[11/May/2026 07:53:05] ERROR [smart_manager.replication.sender:79] Id: a8c07302-2bf56a42-561e-4a2a-b3d9-f082fdd25df1-6. b'Snapshot(bbg-duplicati_6_replication_82) does not exist on the system. So cannot use it.'. Exception: b'Snapshot(/mnt2/rockstor-datos.snapshots/bbg-duplicati/bbg-duplicati_6_replication_82) does not exist on the system. So cannot use it.'

The output of the command @Hooverdan mentioned looks fine.

# btrfs subvolume show bbg-duplicati_6_replication_82
.snapshots/bbg-duplicati/bbg-duplicati_6_replication_82
        Name:                   bbg-duplicati_6_replication_82
        UUID:                   f9425c1d-7a99-354d-8866-bce8a5e92ba5
        Parent UUID:            40c65286-10b2-9144-80ec-a2cf6e1d4fa4
        Received UUID:          -
        Creation time:          2026-05-09 12:44:05 +0200
        Subvolume ID:           1614
        Generation:             38705
        Gen at creation:        38705
        Parent ID:              5
        Top level ID:           5
        Flags:                  readonly
        Send transid:           0
        Send time:              2026-05-09 12:44:05 +0200
        Receive transid:        0
        Receive time:           -
        Snapshot(s):
        Quota group:            0/1614
          Limit referenced:     -
          Limit exclusive:      -
          Usage referenced:     121.72GiB
          Usage exclusive:      240.00KiB

What surprised me is that there’s a missing slash in the path indicated in the log.

The log shows:

/mnt2/rockstor-datos.snapshots/bbg-duplicati/......

But the path is:

/mnt2/rockstor-datos/.snapshots/bbg-duplicati/......

I’m very confused. I don’t know how to fix this so I don’t have to start over, since the problem could happen again.

Thanks in advance for the help.

Hooverdan · May 11, 2026, 11:55pm

I’m sorry to hear that it failed again. The replication has a few open items, and has not been tended to for a while (dev capacity limitations).

How many times did it successfully replicate, do you know? 4 times, 10 times?
Also, do you have quotas enabled (mostly because there was an issue some time ago, where quota group management cause an issue during the replication)?

Well, I don’t know exactly the implications, however, I can tell that in the above linked _refresh_rtin thesender.pythere are two different place where thesnap_path` is defined. In one case it contains the slash:

github.com/rockstor/rockstor-core

src/rockstor/smart_manager/replication/sender.py

fd8f1bc53


      
          # it may not be the one refered by self.rt(latest) but a previous one.
          # We need to make sure to *only* send the incremental send that
          # receiver expects.
          self.msg = "Failed to validate/refresh ReplicaTrail.".encode("utf-8")
          if self.rlatest_snap is None:
              # Validate/update self.rt to the one that has the expected Snapshot
              # on the system.
              for rt in ReplicaTrail.objects.filter(
                  replica=self.replica, status="succeeded"
              ).order_by("-id"):
                  snap_path = f"{settings.MNT_PT}{self.replica.pool}/.snapshots/{self.replica.share}/{self.rt.snap_name}"
                  if is_subvol(snap_path):
                      return rt
              # Snapshots from previous succeeded ReplicaTrails don't actually
              # exist on the system. So we send a Full replication instead of
              # incremental.
              return None
          
          if len(self.rlatest_snap) == 0:
              # Receiver sends empty string when it fails to reply back to an
              # incremental send request with an appropriate parent snapshot

and re-defined in the other case (closer to where the error message is thrown that you see) where the slash is missing:

github.com/rockstor/rockstor-core

src/rockstor/smart_manager/replication/sender.py

fd8f1bc53


      
              f"btrfs-send. Sender picked {self.rt.snap_name} but Receiver wants "
              f"{self.rlatest_snap}, which takes precedence."
          ).encode("utf-8")
          for rt in ReplicaTrail.objects.filter(
              replica=self.replica, status="succeeded"
          ).order_by("-id"):
              if rt.snap_name == self.rlatest_snap:
                  self.msg = f"{self.msg}. successful trail found for {self.rlatest_snap}".encode(
                      "utf-8"
                  )
                  snap_path = f"{settings.MNT_PT}{self.replica.pool}.snapshots/{self.replica.share}/{self.rlatest_snap}"
                  if is_subvol(snap_path):
                      self.msg = f"Snapshot({snap_path}) exists in the system and will be used as the parent".encode(
                          "utf-8"
                      )
                      logger.debug(f"Id: {self.identity}. {self.msg}")
                      return rt
                  self.msg = f"Snapshot({snap_path}) does not exist on the system. So cannot use it.".encode(
                      "utf-8"
                  )
                  raise Exception(self.msg)

In both cases it checks whether the snap_path is a subvolume - on the second place, if the missing slash causes it to always return and error message (which results in the one you see), then that might be the root cause and needs to be fixed. So this could explain why you are running into this problem.

Don’t know how comfortable you feel around your system (and whether this is your production environment or just a test system), but if you’re up for small experiment (I know, you’ve already been replicating many times now) and adjust that line 207 to include the / (and restart rockstor) and your issue goes away, this could be the solution for the problem. Otherwise, I have to see whether I can replicate your symptoms within a test setup.

@phillxnet, @Flox any other suggestions?

Hooverdan · May 12, 2026, 3:54am

I set up a simple replication:
Both VMs I’m running on the same host machine, sender is on Leap 15.6, receiver on Leap 16.0 both have Rockstor version 5.5.3-0
btrfs sending system:

btrfs --version
btrfs-progs v6.5.1

receiving system:

btrfs --version
btrfs-progs v6.14

I was able to get to 72 snapshots (drives are not full), and then some key violation occurred during the 73rd replication:

~# tail -n200 /opt/rockstor/var/log/rockstor.log

[11/May/2026 18:50:07] INFO [smart_manager.replication.sender:335] Id: 6f32cb58-f849-4c93-bc65-6ebda422c66d-1. Sending incremental replica between /mnt2/rockwurst/.snapshots/abs-config/abs-config_1_replication_71 -- /mnt2/rockwurst/.snapshots/abs-config/abs-config_1_replication_72
[11/May/2026 18:51:06] ERROR [storageadmin.util:45] Exception: duplicate key value violates unique constraint "storageadmin_snapshot_share_id_name_10142bd3_uniq"
DETAIL:  Key (share_id, name)=(4, abs-config_1_replication_73) already exists.
Traceback (most recent call last):
  File "/opt/rockstor/.venv/lib/python3.13/site-packages/django/db/backends/utils.py", line 105, in _execute
    return self.cursor.execute(sql, params)
           ~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^
  File "/opt/rockstor/.venv/lib/python3.13/site-packages/psycopg/cursor.py", line 117, in execute
    raise ex.with_traceback(None)
psycopg.errors.UniqueViolation: duplicate key value violates unique constraint "storageadmin_snapshot_share_id_name_10142bd3_uniq"
DETAIL:  Key (share_id, name)=(4, abs-config_1_replication_73) already exists.

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/opt/rockstor/src/rockstor/rest_framework_custom/generic_view.py", line 40, in _handle_exception
    yield
  File "/opt/rockstor/src/rockstor/storageadmin/views/snapshot.py", line 169, in post
    ret = self._create(
        share,
    ...<4 lines>...
        writable=writable,
    )
  File "/root/.local/share/pypoetry/python/cpython@3.13.11/lib/python3.13/contextlib.py", line 85, in inner
    return func(*args, **kwds)
  File "/opt/rockstor/src/rockstor/storageadmin/views/snapshot.py", line 147, in _create
    s.save()
    ~~~~~~^^
  File "/opt/rockstor/.venv/lib/python3.13/site-packages/django/db/models/base.py", line 902, in save
    self.save_base(
    ~~~~~~~~~~~~~~^
        using=using,
        ^^^^^^^^^^^^
    ...<2 lines>...
        update_fields=update_fields,
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    )
    ^
  File "/opt/rockstor/.venv/lib/python3.13/site-packages/django/db/models/base.py", line 1008, in save_base
    updated = self._save_table(
        raw,
    ...<4 lines>...
        update_fields,
    )
  File "/opt/rockstor/.venv/lib/python3.13/site-packages/django/db/models/base.py", line 1169, in _save_table
    results = self._do_insert(
        cls._base_manager, using, fields, returning_fields, raw
    )
  File "/opt/rockstor/.venv/lib/python3.13/site-packages/django/db/models/base.py", line 1210, in _do_insert
    return manager._insert(
           ~~~~~~~~~~~~~~~^
        [self],
        ^^^^^^^
    ...<3 lines>...
        raw=raw,
        ^^^^^^^^
    )
    ^
  File "/opt/rockstor/.venv/lib/python3.13/site-packages/django/db/models/manager.py", line 87, in manager_method
    return getattr(self.get_queryset(), name)(*args, **kwargs)
           ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^
  File "/opt/rockstor/.venv/lib/python3.13/site-packages/django/db/models/query.py", line 1873, in _insert
    return query.get_compiler(using=using).execute_sql(returning_fields)
           ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^
  File "/opt/rockstor/.venv/lib/python3.13/site-packages/django/db/models/sql/compiler.py", line 1882, in execute_sql
    cursor.execute(sql, params)
    ~~~~~~~~~~~~~~^^^^^^^^^^^^^
  File "/opt/rockstor/.venv/lib/python3.13/site-packages/django/db/backends/utils.py", line 79, in execute
    return self._execute_with_wrappers(
           ~~~~~~~~~~~~~~~~~~~~~~~~~~~^
        sql, params, many=False, executor=self._execute
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    )
    ^
  File "/opt/rockstor/.venv/lib/python3.13/site-packages/django/db/backends/utils.py", line 92, in _execute_with_wrappers
    return executor(sql, params, many, context)
  File "/opt/rockstor/.venv/lib/python3.13/site-packages/django/db/backends/utils.py", line 100, in _execute
    with self.db.wrap_database_errors:
         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/rockstor/.venv/lib/python3.13/site-packages/django/db/utils.py", line 91, in __exit__
    raise dj_exc_value.with_traceback(traceback) from exc_value
  File "/opt/rockstor/.venv/lib/python3.13/site-packages/django/db/backends/utils.py", line 105, in _execute
    return self.cursor.execute(sql, params)
           ~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^
  File "/opt/rockstor/.venv/lib/python3.13/site-packages/psycopg/cursor.py", line 117, in execute
    raise ex.with_traceback(None)
django.db.utils.IntegrityError: duplicate key value violates unique constraint "storageadmin_snapshot_share_id_name_10142bd3_uniq"
DETAIL:  Key (share_id, name)=(4, abs-config_1_replication_73) already exists.
[11/May/2026 18:51:06] ERROR [smart_manager.replication.sender:79] Id: 6f32cb58-f849-4c93-bc65-6ebda422c66d-1. b'Failed to create snapshot: abs-config_1_replication_73. Aborting.'. Exception: 500 Server Error: Internal Server Error for url: http://127.0.0.1:8000/api/shares/4/snapshots/abs-config_1_replication_73
[11/May/2026 18:52:04] ERROR [storageadmin.util:45] Exception: Snapshot (abs-config_1_replication_73) already exists for the share (abs-config).
NoneType: None

After 10 attempts, the replication stopped

[11/May/2026 19:00:04] ERROR [smart_manager.replication.sender:79] Id: 6f32cb58-f849-4c93-bc65-6ebda422c66d-1. b'Failed to create snapshot: abs-config_1_replication_73. Aborting.'. Exception: 500 Server Error: Internal Server Error for url: http://127.0.0.1:8000/api/shares/4/snapshots/abs-config_1_replication_73
[11/May/2026 19:01:02] ERROR [smart_manager.replication.listener_broker:154] Maximum attempts(10) reached for Sender(6f32cb58-f849-4c93-bc65-6ebda422c66d_1). A new one will not be started and the Replica task will be disabled.
[11/May/2026 19:01:03] ERROR [smart_manager.replication.listener_broker:336] Failed to start a new Sender for Replication Task(1). Exception: Maximum attempts(10) reached for Sender(6f32cb58-f849-4c93-bc65-6ebda422c66d_1). A new one will not be started and the Replica task will be disabled.

When checking on the receiving system, I see only:

# ls -la
drwxr-xr-x 1 root root 108 May 11 18:51 .
drwxr-xr-x 1 root root 108 May 11 17:39 ..
drwx------ 1 2000 2000 428 May 11 18:12 abs-config_1_replication_71
drwx------ 1 2000 2000 428 May 11 18:12 abs-config_1_replication_72

When checking the sending system:

# ls -la
total 0
drwxr-xr-x 1 root     root     108 May 11 18:52 .
drwxr-xr-x 1 root     root     232 May 11 17:39 ..
drwx------ 1 stalwart stalwart 428 May 11 18:12 abs-config_1_replication_72
drwx------ 1 stalwart stalwart 428 May 11 18:12 abs-config_1_replication_73

So, removing snapshot 73 from the sending system using btrfs:

 btrfs subvolume delete /mnt2/rockwurst/.snapshots/abs-config/abs-config_1_replication_73

confirmed on sending system

# ls -la
drwxr-xr-x 1 root     root      54 May 11 20:32 .
drwxr-xr-x 1 root     root     232 May 11 17:39 ..
drwx------ 1 stalwart stalwart 428 May 11 18:12 abs-config_1_replication_72

Restarting the replication process via the WebUI. New snapshots are sent:

[11/May/2026 20:34:05] INFO [smart_manager.replication.sender:335] Id: 6f32cb58-f849-4c93-bc65-6ebda422c66d-1. Sending incremental replica between /mnt2/rockwurst/.snapshots/abs-config/abs-config_1_replication_72 -- /mnt2/rockwurst/.snapshots/abs-config/abs-config_1_replication_73
[11/May/2026 20:35:06] INFO [smart_manager.replication.sender:335] Id: 6f32cb58-f849-4c93-bc65-6ebda422c66d-1. Sending incremental replica between /mnt2/rockwurst/.snapshots/abs-config/abs-config_1_replication_73 -- /mnt2/rockwurst/.snapshots/abs-config/abs-config_1_replication_84
[11/May/2026 20:36:08] INFO [smart_manager.replication.sender:335] Id: 6f32cb58-f849-4c93-bc65-6ebda422c66d-1. Sending incremental replica between /mnt2/rockwurst/.snapshots/abs-config/abs-config_1_replication_84 -- /mnt2/rockwurst/.snapshots/abs-config/abs-config_1_replication_85
[11/May/2026 20:37:07] INFO [smart_manager.replication.sender:335] Id: 6f32cb58-f849-4c93-bc65-6ebda422c66d-1. Sending incremental replica between /mnt2/rockwurst/.snapshots/abs-config/abs-config_1_replication_85 -- /mnt2/rockwurst/.snapshots/abs-config/abs-config_1_replication_86
[11/May/2026 20:38:06] INFO [smart_manager.replication.sender:335] Id: 6f32cb58-f849-4c93-bc65-6ebda422c66d-1. Sending incremental replica between /mnt2/rockwurst/.snapshots/abs-config/abs-config_1_replication_86 -- /mnt2/rockwurst/.snapshots/abs-config/abs-config_1_replication_87
[11/May/2026 20:39:07] INFO [smart_manager.replication.sender:335] Id: 6f32cb58-f849-4c93-bc65-6ebda422c66d-1. Sending incremental replica between /mnt2/rockwurst/.snapshots/abs-config/abs-config_1_replication_87 -- /mnt2/rockwurst/.snapshots/abs-config/abs-config_1_replication_88
[11/May/2026 20:40:08] INFO [smart_manager.replication.sender:335] Id: 6f32cb58-f849-4c93-bc65-6ebda422c66d-1. Sending incremental replica between /mnt2/rockwurst/.snapshots/abs-config/abs-config_1_replication_88 -- /mnt2/rockwurst/.snapshots/abs-config/abs-config_1_replication_89
[11/May/2026 20:41:07] INFO [smart_manager.replication.sender:335] Id: 6f32cb58-f849-4c93-bc65-6ebda422c66d-1. Sending incremental replica between /mnt2/rockwurst/.snapshots/abs-config/abs-config_1_replication_89 -- /mnt2/rockwurst/.snapshots/abs-config/abs-config_1_replication_90
[11/May/2026 20:42:07] INFO [smart_manager.replication.sender:335] Id: 6f32cb58-f849-4c93-bc65-6ebda422c66d-1. Sending incremental replica between /mnt2/rockwurst/.snapshots/abs-config/abs-config_1_replication_90 -- /mnt2/rockwurst/.snapshots/abs-config/abs-config_1_replication_91
[11/May/2026 20:43:08] INFO [smart_manager.replication.sender:335] Id: 6f32cb58-f849-4c93-bc65-6ebda422c66d-1. Sending incremental replica between /mnt2/rockwurst/.snapshots/abs-config/abs-config_1_replication_91 -- /mnt2/rockwurst/.snapshots/abs-config/abs-config_1_replication_92
...

I assume, the jump in snapshot numbers is related to the counter continuing for every failure (i.e. 10), hence picking back up at 84.

I will continue to observe, but so far no failures like observed by @riceru

Hooverdan · May 12, 2026, 3:47pm

After 113 replications, it fails again with this error:

[11/May/2026 21:05:11] ERROR [smart_manager.replication.sender:79] Id: 6f32cb58-f849-4c93-bc65-6ebda422c66d-1. b'b\'receiver-init-error\' received for 6f32cb58-f849-4c93-bc65-6ebda422c66d-1. extended reply: b"Receiver(b\'6f32cb58-f849-4c93-bc65-6ebda422c66d-1\') already exists. Will not start a new one.". Aborting.'. Exception: b'b\'receiver-init-error\' received for 6f32cb58-f849-4c93-bc65-6ebda422c66d-1. extended reply: b"Receiver(b\'6f32cb58-f849-4c93-bc65-6ebda422c66d-1\') already exists. Will not start a new one.". Aborting.'

and the next replication then shows the same error message (but not preceeded by a detailed call stack:

[11/May/2026 21:06:04] ERROR [storageadmin.util:45] Exception: Snapshot (abs-config_1_replication_114) already exists for the share (abs-config).
NoneType: None
[11/May/2026 21:06:04] ERROR [smart_manager.replication.sender:79] Id: 6f32cb58-f849-4c93-bc65-6ebda422c66d-1. b'Failed to create snapshot: abs-config_1_replication_114. Aborting.'. Exception: 500 Server Error: Internal Server Error for url: http://127.0.0.1:8000/api/shares/4/snapshots/abs-config_1_replication_114
[11/May/2026 21:07:04] ERROR [storageadmin.util:45] Exception: Snapshot (abs-config_1_replication_114) already exists for the share (abs-config).
NoneType: None
[11/May/2026 21:07:04] ERROR [smart_manager.replication.sender:79] Id: 6f32cb58-f849-4c93-bc65-6ebda422c66d-1. b'Failed to create snapshot: abs-config_1_replication_114. Aborting.'. Exception: 500 Server Error: Internal Server Error for url: http://127.0.0.1:8000/api/shares/4/snapshots/abs-config_1_replication_114
[11/May/2026 21:08:04] ERROR [storageadmin.util:45] Exception: Snapshot (abs-config_1_replication_114) already exists for the share (abs-config).
NoneType: None
[11/May/2026 21:08:04] ERROR [smart_manager.replication.sender:79] Id: 6f32cb58-f849-4c93-bc65-6ebda422c66d-1. b'Failed to create snapshot: abs-config_1_replication_114. Aborting.'. Exception: 500 Server Error: Internal Server Error for url: http://127.0.0.1:8000/api/shares/4/snapshots/abs-config_1_replication_114
[11/May/2026 21:09:04] ERROR [storageadmin.util:45] Exception: Snapshot (abs-config_1_replication_114) already exists for the share (abs-config).
NoneType: None

and then stops replication after 10 attempts again. So, the error has changed a little bit, but results in the same as earlier.

So once again, remove “latest” snapshot:

btrfs subvolume delete /mnt2/rockwurst/.snapshots/abs-config/abs-config_1_replication_114

and turn the replication back on in the WebUI.

This time the sending system is running into the gpg lock issue. It seems that this process is holding an extended lock for the gpg, hence the replication cannot proceed. This is likely due to the fact, that I turned off the VMs over night and started them again today.

 # ps -aux | grep 1480
root       332  0.0  0.0   5812  2048 pts/0    S+   08:23   0:00 grep 1480
root      1480  0.0  0.1 256208  7280 ?        Ss   07:25   0:00 keyboxd --homedir /root/.gnupg --daemon

in the mail that’s sent by the task we can see as much:

gpg: Note: database_open 134217901 waiting for lock (held by 1480) ...
gpg: Note: database_open 134217901 waiting for lock (held by 1480) ...
gpg: Note: database_open 134217901 waiting for lock (held by 1480) ...
gpg: Note: database_open 134217901 waiting for lock (held by 1480) ...
gpg: Note: database_open 134217901 waiting for lock (held by 1480) ...
gpg: keydb_search failed: Connection timed out
gpg: public key decryption failed: No secret key
gpg: decryption failed: No secret key
Traceback (most recent call last):
  File "/opt/rockstor/.venv/bin/send-replica", line 3, in <module>
    from scripts.scheduled_tasks.send_replica import main
  File "/opt/rockstor/src/rockstor/scripts/__init__.py", line 23, in <module>
    django.setup()
    ~~~~~~~~~~~~^^
  File "/opt/rockstor/.venv/lib/python3.13/site-packages/django/__init__.py", line 19, in setup
    configure_logging(settings.LOGGING_CONFIG, settings.LOGGING)
                      ^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/rockstor/.venv/lib/python3.13/site-packages/django/conf/__init__.py", line 81, in __getattr__
    self._setup(name)
    ~~~~~~~~~~~^^^^^^
  File "/opt/rockstor/.venv/lib/python3.13/site-packages/django/conf/__init__.py", line 68, in _setup
    self._wrapped = Settings(settings_module)
                    ~~~~~~~~^^^^^^^^^^^^^^^^^
  File "/opt/rockstor/.venv/lib/python3.13/site-packages/django/conf/__init__.py", line 166, in __init__
    mod = importlib.import_module(self.SETTINGS_MODULE)
  File "/root/.local/share/pypoetry/python/cpython@3.13.11/lib/python3.13/importlib/__init__.py", line 88, in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
           ~~~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/rockstor/src/rockstor/settings.py", line 121, in <module>
    SECRET_KEY = keyring.get_password("rockstor", "SECRET_KEY")
  File "/opt/rockstor/.venv/lib/python3.13/site-packages/keyring/core.py", line 65, in get_password
    return get_keyring().get_password(service_name, username)
           ~~~~~~~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/rockstor/.venv/lib/python3.13/site-packages/keyring_pass/__init__.py", line 183, in get_password
    ret = command(
        [self.pass_binary, "show", self.get_key(servicename, username)]
    )
  File "/opt/rockstor/.venv/lib/python3.13/site-packages/keyring_pass/__init__.py", line 26, in command
    return subprocess.check_output(cmd, **kwargs)
           ~~~~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^
  File "/root/.local/share/pypoetry/python/cpython@3.13.11/lib/python3.13/subprocess.py", line 472, in check_output
    return run(*popenargs, stdout=PIPE, timeout=timeout, check=True,
           ~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
               **kwargs).stdout
               ^^^^^^^^^
  File "/root/.local/share/pypoetry/python/cpython@3.13.11/lib/python3.13/subprocess.py", line 577, in run
    raise CalledProcessError(retcode, process.args,
                             output=stdout, stderr=stderr)
subprocess.CalledProcessError: Command '['pass', 'show', 'python-keyring/rockstor/SECRET_KEY']' returned non-zero exit status 2.

This particular issue is not related to the replication functionality per se, there is an open issue on github for it already:

github.com/rockstor/rockstor-core

gpg: Note: database_open ... waiting for lock

opened 08:51AM - 26 Apr 26 UTC

phillxnet

In development only scenarios (to date) have lead to a locked gpg database durin…g build.sh's execution: ``` gpg: Note: database_open 134217901 waiting for lock (held by 18139) ... gpg: Note: database_open 134217901 waiting for lock (held by 18139) ... gpg: Note: database_open 134217901 waiting for lock (held by 18139) ... ``` with consequent timeouts on password-store management. The indicated PID in the above instance was: >keyboxd --homedir /root/.gnupg --daemon ## Possible work-around: ``` gpgconf --kill gpg-agent gpgconf --kill keyboxd rm -f ~/.gnupg/public-keys.d/pubring.db.lock gpgconf --reload gpg-agent ``` Issue opened to help collaboratively explore what is cause this GPG db lock to be held detrimentally. It may be that we need a preventative measure within build.sh to avoid the possibility that in-production updates are affected similarly.

I did find the corresponding messages where zypper times out fetching a list of updates, so similar.

So, to continue for now, killing off the process following the “possible workaround” mentioned:

gpgconf --kill gpg-agent
gpgconf --kill keyboxd
rm -f ~/.gnupg/public-keys.d/pubring.db.lock
gpgconf --reload gpg-agent

Restarted:

[12/May/2026 08:34:04] INFO [smart_manager.replication.sender:335] Id: 6f32cb58-f849-4c93-bc65-6ebda422c66d-1. Sending incremental replica between /mnt2/rockwurst/.snapshots/abs-config/abs-config_1_replication_113 -- /mnt2/rockwurst/.snapshots/abs-config/abs-config_1_replication_114
[12/May/2026 08:35:07] INFO [smart_manager.replication.sender:335] Id: 6f32cb58-f849-4c93-bc65-6ebda422c66d-1. Sending incremental replica between /mnt2/rockwurst/.snapshots/abs-config/abs-config_1_replication_114 -- /mnt2/rockwurst/.snapshots/abs-config/abs-config_1_replication_125
[12/May/2026 08:36:07] INFO [smart_manager.replication.sender:335] Id: 6f32cb58-f849-4c93-bc65-6ebda422c66d-1. Sending incremental replica between /mnt2/rockwurst/.snapshots/abs-config/abs-config_1_replication_125 -- /mnt2/rockwurst/.snapshots/abs-config/abs-config_1_replication_126
...

For completeness, here’s a log excerpt on the receiving system after the most recent restart:

[12/May/2026 08:35:04] INFO [storageadmin.views.snapshot:61] Supplanting share (6f32cb58-f849-4c93-bc65-6ebda422c66d_abs-config) with snapshot (abs-config_1_replication_112).
[12/May/2026 08:35:04] INFO [storageadmin.views.snapshot:103] Moving snapshot (/mnt2/rocksalami/.snapshots/6f32cb58-f849-4c93-bc65-6ebda422c66d_abs-config/abs-config_1_replication_112) to prior share's pool location (/mnt2/rocksalami/6f32cb58-f849-4c93-bc65-6ebda422c66d_abs-config)
[12/May/2026 08:35:04] INFO [fs.btrfs:1636] Pool: rocksalami ignoring update_quota on -1/-1
[12/May/2026 08:36:04] INFO [storageadmin.views.snapshot:61] Supplanting share (6f32cb58-f849-4c93-bc65-6ebda422c66d_abs-config) with snapshot (abs-config_1_replication_113).
[12/May/2026 08:36:04] INFO [storageadmin.views.snapshot:103] Moving snapshot (/mnt2/rocksalami/.snapshots/6f32cb58-f849-4c93-bc65-6ebda422c66d_abs-config/abs-config_1_replication_113) to prior share's pool location (/mnt2/rocksalami/6f32cb58-f849-4c93-bc65-6ebda422c66d_abs-config)
[12/May/2026 08:36:04] INFO [fs.btrfs:1636] Pool: rocksalami ignoring update_quota on -1/-1
...

Hooverdan · May 12, 2026, 4:18pm

This time the replication seemed to have failed with the same error a bit faster. After replication 145 it fails with the same duplicate error message. Logs/task mails look the same, so not posting them. I’ll do one more “fix” and restart. This time I didn’t wait for the 10 retries, but disabled the replication after a few fails, removed the offending subvolume and then restarted the replication service.

This time it immediately went back into the same error, again stuck at snapshot 146 (duplicate message). Not sure why that is. Waiting for the 10 retries and its timeout and performing the same procedure again, removing the 146 subvolume and restart the service.

Doing that, it seemed to have worked, and the replication is starting again, replicating 146 and then jumping to 161 (this time due to the fact that there were the 4 failed replications before i manually turned off the service and then the immediately following 10 failed retries after turning it on).

[12/May/2026 09:15:05] INFO [smart_manager.replication.sender:335] Id: 6f32cb58-f849-4c93-bc65-6ebda422c66d-1. Sending incremental replica between /mnt2/rockwurst/.snapshots/abs-config/abs-config_1_replication_145 -- /mnt2/rockwurst/.snapshots/abs-config/abs-config_1_replication_146
[12/May/2026 09:16:08] INFO [smart_manager.replication.sender:335] Id: 6f32cb58-f849-4c93-bc65-6ebda422c66d-1. Sending incremental replica between /mnt2/rockwurst/.snapshots/abs-config/abs-config_1_replication_146 -- /mnt2/rockwurst/.snapshots/abs-config/abs-config_1_replication_161
...

Hooverdan · May 12, 2026, 7:56pm

After repeating this cycle another 4 times or so, I could not replicate @riceru error on my setup. However, I am wondering why this duplicate snapshot error occurs in my case. Wondering whether the database is updated periodically with snapshot information independent of the replication happening; and when they are too close together timing wise, the duplicate error will happen, because the snapshot is already comprehended in the database before the replication coding gets to it …

riceru · May 12, 2026, 9:26pm

Hello.

When it started failing, it had already completed 11 successful replications. And I don’t have quotas enabled.

It’s a production machine, but for now I only have one replica, so I tried adding a slash to line 207 in the sender.py file, as you suggested. Then I restarted the server. I deleted the snapshot that was created with the last failed replication and rescheduled the replication.

It started up fine and is transferring data. Since it’s several GB, it will take a few hours.
I’ll update you with the result.

Thanks