Once I set this up at around 19:45, so 5 mins before the first replication would start, I didn’t touch the system until an hour later or so, to ensure that I would not interfere with anything during the replication process. As one can see during the second and third replication, really no data is sent, since I did not do anything with the data set I had generated prior to the replication configuration.
Sender:
Receiver:
Sending machine: quotas enabled, shares set up below with some data (the warning is a quota warning, the disk itself has 6GB of which 5.74GB were consumed, but the quota is set to 1GB:
Receiving machine only had these shares:
rockwurst:/mnt2/Replication # tree
.
├── dir_1
│ ├── dir_1
│ │ ├── file_1
│ │ ├── file_2
│ │ └── file_3
│ ├── dir_2
│ │ ├── file_1
│ │ ├── file_2
│ │ └── file_3
│ └── dir_3
│ ├── file_1
│ ├── file_2
│ └── file_3
└── index.sh
Replication Send Process setup:
Send Process:
Receive process:
After the first replication there were two additional shares on the receiving machine, however I forgot to capture it. After disabling the Replication, this is how the share picture looks like:
Replication Receiver History:
(After first three replications, it showed still as successful, the screenshot is after the last attempt):
Snapshots on sending system after last failure (I don’t remember whether there were more initially, or not):
Snapshots list on receiving system (same here, not sure whether/how those listings changed over time):
Log on sending machine between 7:50 and 8:50pm (/opt/rockstor/var/log/rockstor.log
):
...
[16/Sep/2024 19:50:04] INFO [smart_manager.replication.sender:341] Id: 6f32cb58-f849-4c93-bc65-6ebda422c66d-5. Sending full replica: /mnt2/rockwurst/.snapshots/Replication/Replication_5_replication_1
[16/Sep/2024 20:00:03] INFO [smart_manager.replication.sender:335] Id: 6f32cb58-f849-4c93-bc65-6ebda422c66d-5. Sending incremental replica between /mnt2/rockwurst/.snapshots/Replication/Replication_5_replication_1 -- /mnt2/rockwurst/.snapshots/Replication/Replication_5_replication_24
[16/Sep/2024 20:10:02] INFO [smart_manager.replication.sender:335] Id: 6f32cb58-f849-4c93-bc65-6ebda422c66d-5. Sending incremental replica between /mnt2/rockwurst/.snapshots/Replication/Replication_5_replication_24 -- /mnt2/rockwurst/.snapshots/Replication/Replication_5_replication_25
[16/Sep/2024 20:20:03] ERROR [smart_manager.replication.sender:79] Id: 6f32cb58-f849-4c93-bc65-6ebda422c66d-5. b'unexpected reply(b\'receiver-error\') for 6f32cb58-f849-4c93-bc65-6ebda422c66d-5. extended reply: b"b\'Failed to promote the oldest Snapshot to Share.\'. Exception: 500 Server Error: Internal Server Error for url: http://127.0.0.1:8000/api/shares/15/snapshots/Replication_5_replication_1/repclone". Aborting'. Exception: b'unexpected reply(b\'receiver-error\') for 6f32cb58-f849-4c93-bc65-6ebda422c66d-5. extended reply: b"b\'Failed to promote the oldest Snapshot to Share.\'. Exception: 500 Server Error: Internal Server Error for url: http://127.0.0.1:8000/api/shares/15/snapshots/Replication_5_replication_1/repclone". Aborting'
[16/Sep/2024 20:30:02] ERROR [storageadmin.util:44] Exception: Snapshot (Replication_5_replication_26) already exists for the share (Replication). NoneType: None
[16/Sep/2024 20:30:02] ERROR [smart_manager.replication.sender:79] Id: 6f32cb58-f849-4c93-bc65-6ebda422c66d-5. b'Failed to create snapshot: Replication_5_replication_26. Aborting.'. Exception: 500 Server Error: Internal Server Error for url: http://127.0.0.1:8000/api/shares/14/snapshots/Replication_5_replication_26
[16/Sep/2024 20:40:02] ERROR [storageadmin.util:44] Exception: Snapshot (Replication_5_replication_26) already exists for the share (Replication). NoneType: None
[16/Sep/2024 20:40:02] ERROR [smart_manager.replication.sender:79] Id: 6f32cb58-f849-4c93-bc65-6ebda422c66d-5. b'Failed to create snapshot: Replication_5_replication_26. Aborting.'. Exception: 500 Server Error: Internal Server Error for url: http://127.0.0.1:8000/api/shares/14/snapshots/Replication_5_replication_26
[16/Sep/2024 20:50:02] ERROR [storageadmin.util:44] Exception: Snapshot (Replication_5_replication_26) already exists for the share (Replication). NoneType: None
[16/Sep/2024 20:50:02] ERROR [smart_manager.replication.sender:79] Id: 6f32cb58-f849-4c93-bc65-6ebda422c66d-5. b'Failed to create snapshot: Replication_5_replication_26. Aborting.'. Exception: 500 Server Error: Internal Server Error for url: http://127.0.0.1:8000/api/shares/14/snapshots/Replication_5_replication_26
[16/Sep/2024 21:00:02] ERROR [storageadmin.util:44] Exception: Snapshot (Replication_5_replication_26) already exists for the share (Replication). NoneType: None
[16/Sep/2024 21:00:02] ERROR [smart_manager.replication.sender:79] Id: 6f32cb58-f849-4c93-bc65-6ebda422c66d-5. b'Failed to create snapshot: Replication_5_replication_26. Aborting.'. Exception: 500 Server Error: Internal Server Error for url: http://127.0.0.1:8000/api/shares/14/snapshots/Replication_5_replication_26
…
Log for receiving machine (/opt/rockstor/var/log/rockstor.log
):
...
[16/Sep/2024 20:20:02] INFO [storageadmin.views.snapshot:61] Supplanting share (6f32cb58-f849-4c93-bc65-6ebda422c66d_Replication) with snapshot (.snapshots/6f32cb58-f849-4c93-bc65-6ebda422c66d_Replication/Replication_5_replication_1).
[16/Sep/2024 20:20:03] ERROR [system.osi:287] non-zero code(1) returned by command: ['/usr/sbin/btrfs', 'qgroup', 'destroy', '0/257', '/mnt2/fresse_storage']. output: [''] error: ['ERROR: unable to destroy quota group: Device or resource busy', '']
[16/Sep/2024 20:20:03] ERROR [storageadmin.util:44] Exception: Error running a command. cmd = /usr/sbin/btrfs qgroup destroy 0/257 /mnt2/fresse_storage. rc = 1. stdout = ['']. stderr = ['ERROR: unable to destroy quota group: Device or resource busy', '']
Traceback (most recent call last):
File "/opt/rockstor/src/rockstor/storageadmin/views/clone_helpers.py", line 92, in create_repclone
remove_share(share.pool, share.name, PQGROUP_DEFAULT)
File "/opt/rockstor/src/rockstor/fs/btrfs.py", line 1196, in remove_share
qgroup_destroy(qgroup, root_pool_mnt)
File "/opt/rockstor/src/rockstor/fs/btrfs.py", line 1501, in qgroup_destroy
return run_command([BTRFS, "qgroup", "destroy", qid, mnt_pt], log=True)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/opt/rockstor/src/rockstor/system/osi.py", line 289, in run_command
raise CommandException(cmd, out, err, rc)
system.exceptions.CommandException: Error running a command. cmd = /usr/sbin/btrfs qgroup destroy 0/257 /mnt2/fresse_storage. rc = 1. stdout = ['']. stderr = ['ERROR: unable to destroy quota group: Device or resource busy', '']
[16/Sep/2024 20:20:03] ERROR [smart_manager.replication.receiver:100] b'Failed to promote the oldest Snapshot to Share.'. Exception: 500 Server Error: Internal Server Error for url: http://127.0.0.1:8000/api/shares/15/snapshots/Replication_5_replication_1/repclone
[16/Sep/2024 20:30:02] ERROR [smart_manager.replication.receiver:100] b'Failed to promote the oldest Snapshot to Share.'. Exception: Share matching query does not exist.
[16/Sep/2024 20:40:02] ERROR [smart_manager.replication.receiver:100] b'Failed to promote the oldest Snapshot to Share.'. Exception: Share matching query does not exist.
[16/Sep/2024 20:50:02] ERROR [smart_manager.replication.receiver:100] b'Failed to promote the oldest Snapshot to Share.'. Exception: Share matching query does not exist.
[16/Sep/2024 21:00:02] ERROR [smart_manager.replication.receiver:100] b'Failed to promote the oldest Snapshot to Share.'. Exception: Share matching query does not exist.
...
Journalctl sending system for first replication:
...
Sep 16 19:50:00 fresse CRON[13351]: (root) CMD (/opt/rockstor/.venv/bin/st-snapshot 2 \*-*-*-*-*-*)
Sep 16 19:50:01 fresse CRON[13350]: (root) CMDEND (/opt/rockstor/.venv/bin/st-snapshot 2 \*-*-*-*-*-*)
Sep 16 19:50:01 fresse postfix/pickup[9736]: 3855A50354: uid=0 from=<root>
Sep 16 19:50:01 fresse postfix/cleanup[13441]: 3855A50354: message-id=<20240917025001.3855A50354@localhost.localdomain>
Sep 16 19:50:01 fresse postfix/qmgr[1306]: 3855A50354: from=<root@localhost.localdomain>, size=1290, nrcpt=1 (queue active)
Sep 16 19:50:01 fresse postfix/local[13443]: 3855A50354: to=<root@localhost.localdomain>, orig_to=<root>, relay=local, delay=0.18, delays=0.16/0.01/0/0.02, dsn=2.0.0, status=sent (delivered to mailbox)
Sep 16 19:50:01 fresse postfix/qmgr[1306]: 3855A50354: removed
Sep 16 19:51:56 fresse systemd[1]: Starting Hostname Service...
Sep 16 19:51:56 fresse systemd[1]: Started Hostname Service.
Sep 16 19:52:26 fresse systemd[1]: systemd-hostnamed.service: Deactivated successfully.
Sep 16 19:53:00 fresse NetworkManager[866]: <info> [1726541580.7696] dhcp4 (enp0s8): state changed new lease, address=192.168.56.112
Sep 16 19:53:00 fresse systemd[1]: Starting Network Manager Script Dispatcher Service...
Sep 16 19:53:00 fresse systemd[1]: Started Network Manager Script Dispatcher Service.
Sep 16 19:53:01 fresse dns-dnsmasq.sh[14517]: <debug> NETWORKMANAGER_DNS_FORWARDER is not set to "dnsmasq" in /etc/sysconfig/network/config -> exit
Sep 16 19:53:01 fresse nm-dispatcher[14528]: /etc/NetworkManager/dispatcher.d/ntp: line 11: netconfig: command not found
Sep 16 19:53:01 fresse nm-dispatcher[14498]: req:1 'dhcp4-change' [enp0s8], "/etc/NetworkManager/dispatcher.d/ntp": complete: process failed with Script '/etc/NetworkManager/dispatcher.d/ntp' exited with status 127
Sep 16 19:53:01 fresse NetworkManager[866]: <warn> [1726541581.2191] dispatcher: (20) /etc/NetworkManager/dispatcher.d/ntp failed (failed): Script '/etc/NetworkManager/dispatcher.d/ntp' exited with status 127
Sep 16 19:53:11 fresse systemd[1]: NetworkManager-dispatcher.service: Deactivated successfully.
Sep 16 19:55:00 fresse CRON[15233]: (root) CMD (/opt/rockstor/.venv/bin/st-snapshot 2 \*-*-*-*-*-*)
Sep 16 19:55:02 fresse CRON[15232]: (root) CMDEND (/opt/rockstor/.venv/bin/st-snapshot 2 \*-*-*-*-*-*)
Sep 16 19:55:02 fresse postfix/pickup[9736]: 5654E503BF: uid=0 from=<root>
Sep 16 19:55:02 fresse postfix/cleanup[15367]: 5654E503BF: message-id=<20240917025502.5654E503BF@localhost.localdomain>
Sep 16 19:55:02 fresse postfix/qmgr[1306]: 5654E503BF: from=<root@localhost.localdomain>, size=1290, nrcpt=1 (queue active)
Sep 16 19:55:02 fresse postfix/local[15369]: 5654E503BF: to=<root@localhost.localdomain>, orig_to=<root>, relay=local, delay=0.16, delays=0.13/0.01/0/0.02, dsn=2.0.0, status=sent (delivered to mailbox)
Sep 16 19:55:02 fresse postfix/qmgr[1306]: 5654E503BF: removed
Sep 16 19:58:00 fresse NetworkManager[866]: <info> [1726541880.7556] dhcp4 (enp0s8): state changed new lease, address=192.168.56.112
Sep 16 19:58:00 fresse systemd[1]: Starting Network Manager Script Dispatcher Service...
Sep 16 19:58:00 fresse systemd[1]: Started Network Manager Script Dispatcher Service.
Sep 16 19:58:00 fresse dns-dnsmasq.sh[16366]: <debug> NETWORKMANAGER_DNS_FORWARDER is not set to "dnsmasq" in /etc/sysconfig/network/config -> exit
Sep 16 19:58:00 fresse nm-dispatcher[16375]: /etc/NetworkManager/dispatcher.d/ntp: line 11: netconfig: command not found
Sep 16 19:58:00 fresse nm-dispatcher[16353]: req:1 'dhcp4-change' [enp0s8], "/etc/NetworkManager/dispatcher.d/ntp": complete: process failed with Script '/etc/NetworkManager/dispatcher.d/ntp' exited with status 127
Sep 16 19:58:00 fresse NetworkManager[866]: <warn> [1726541880.8460] dispatcher: (21) /etc/NetworkManager/dispatcher.d/ntp failed (failed): Script '/etc/NetworkManager/dispatcher.d/ntp' exited with status 127
Sep 16 19:58:10 fresse systemd[1]: NetworkManager-dispatcher.service: Deactivated successfully.
...
This is essentially repeated every 10 minutes for the first three replications. For the fourth replication at 8:20pm (which is the first one failing), some additional message are logged during that time (not sure they’re related or not):
...
Sep 16 20:20:00 fresse CRON[23607]: (root) CMD (/opt/rockstor/.venv/bin/st-snapshot 2 \*-*-*-*-*-*)
Sep 16 20:20:00 fresse CRON[23606]: (root) CMDEND (/opt/rockstor/.venv/bin/st-snapshot 2 \*-*-*-*-*-*)
Sep 16 20:20:00 fresse postfix/pickup[9736]: DD97E505ED: uid=0 from=<root>
Sep 16 20:20:00 fresse postfix/cleanup[23696]: DD97E505ED: message-id=<20240917032000.DD97E505ED@localhost.localdomain>
Sep 16 20:20:00 fresse postfix/qmgr[1306]: DD97E505ED: from=<root@localhost.localdomain>, size=1290, nrcpt=1 (queue active)
Sep 16 20:20:00 fresse postfix/local[23698]: DD97E505ED: to=<root@localhost.localdomain>, orig_to=<root>, relay=local, delay=0.18, delays=0.15/0.01/0/0.02, dsn=2.0.0, status=sent (delivered to mailbox)
Sep 16 20:20:00 fresse postfix/qmgr[1306]: DD97E505ED: removed
Sep 16 20:20:03 fresse systemd[1]: mnt2-6f32cb58\x2df849\x2d4c93\x2dbc65\x2d6ebda422c66d_Replication.mount: Deactivated successfully.
Sep 16 20:23:00 fresse NetworkManager[866]: <info> [1726543380.7602] dhcp4 (enp0s8): state changed new lease, address=192.168.56.112
Sep 16 20:23:00 fresse systemd[1]: Starting Network Manager Script Dispatcher Service...
Sep 16 20:23:00 fresse systemd[1]: Started Network Manager Script Dispatcher Service.
Sep 16 20:23:00 fresse dns-dnsmasq.sh[24626]: <debug> NETWORKMANAGER_DNS_FORWARDER is not set to "dnsmasq" in /etc/sysconfig/network/config -> exit
Sep 16 20:23:00 fresse nm-dispatcher[24635]: /etc/NetworkManager/dispatcher.d/ntp: line 11: netconfig: command not found
Sep 16 20:23:00 fresse nm-dispatcher[24612]: req:1 'dhcp4-change' [enp0s8], "/etc/NetworkManager/dispatcher.d/ntp": complete: process failed with Script '/etc/NetworkManager/dispatcher.d/ntp' exited with status 127
Sep 16 20:23:00 fresse NetworkManager[866]: <warn> [1726543380.8472] dispatcher: (26) /etc/NetworkManager/dispatcher.d/ntp failed (failed): Script '/etc/NetworkManager/dispatcher.d/ntp' exited with status 127
Sep 16 20:23:10 fresse systemd[1]: Started Daily Cleanup of Snapper Snapshots.
Sep 16 20:23:10 fresse systemd[1]: NetworkManager-dispatcher.service: Deactivated successfully.
Sep 16 20:23:10 fresse systemd[1]: Starting DBus interface for snapper...
Sep 16 20:23:10 fresse systemd[1]: Started DBus interface for snapper.
Sep 16 20:23:10 fresse systemd[1]: snapper-cleanup.service: Deactivated successfully.
Sep 16 20:24:11 fresse systemd[1]: snapperd.service: Deactivated successfully.
Sep 16 20:25:00 fresse CRON[25204]: (root) CMD (/opt/rockstor/.venv/bin/st-snapshot 2 \*-*-*-*-*-*)
Sep 16 20:25:01 fresse CRON[25203]: (root) CMDEND (/opt/rockstor/.venv/bin/st-snapshot 2 \*-*-*-*-*-*)
Sep 16 20:25:01 fresse postfix/pickup[9736]: BB32B50655: uid=0 from=<root>
Sep 16 20:25:01 fresse postfix/cleanup[25294]: BB32B50655: message-id=<20240917032501.BB32B50655@localhost.localdomain>
Sep 16 20:25:01 fresse postfix/qmgr[1306]: BB32B50655: from=<root@localhost.localdomain>, size=1290, nrcpt=1 (queue active)
Sep 16 20:25:01 fresse postfix/local[25296]: BB32B50655: to=<root@localhost.localdomain>, orig_to=<root>, relay=local, delay=0.16, delays=0.13/0.01/0/0.02, dsn=2.0.0, status=sent (delivered to mailbox)
Sep 16 20:25:01 fresse postfix/qmgr[1306]: BB32B50655: removed
Sep 16 20:28:00 fresse NetworkManager[866]: <info> [1726543680.7618] dhcp4 (enp0s8): state changed new lease, address=192.168.56.112
Sep 16 20:28:00 fresse systemd[1]: Starting Network Manager Script Dispatcher Service...
Sep 16 20:28:00 fresse systemd[1]: Started Network Manager Script Dispatcher Service.
Sep 16 20:28:00 fresse dns-dnsmasq.sh[26163]: <debug> NETWORKMANAGER_DNS_FORWARDER is not set to "dnsmasq" in /etc/sysconfig/network/config -> exit
Sep 16 20:28:00 fresse nm-dispatcher[26172]: /etc/NetworkManager/dispatcher.d/ntp: line 11: netconfig: command not found
Sep 16 20:28:00 fresse nm-dispatcher[26149]: req:1 'dhcp4-change' [enp0s8], "/etc/NetworkManager/dispatcher.d/ntp": complete: process failed with Script '/etc/NetworkManager/dispatcher.d/ntp' exited with status 127
Sep 16 20:28:00 fresse NetworkManager[866]: <warn> [1726543680.8576] dispatcher: (27) /etc/NetworkManager/dispatcher.d/ntp failed (failed): Script '/etc/NetworkManager/dispatcher.d/ntp' exited with status 127
...
For some reason on the receiving system, no journalctl
entries before today are visible (receiving system runs on Tumbleweed with Kernel 6.10.9 and btrfs-progs 6.10.1, while sending system runs on Leap 15.6 with Kernel 6.4.0 and btrfs-progs 6.5.1, both on Rockstor 5.0.14-0).
@Stevek from your recollection, did you observe anything in addition to the above?