Replication Failed - 2 systems on same LAN

I have just installed 4.5.8 on 2 machines and both are running fine.

I am trying to get Replication working and have done the following:

1 Created Appliances on both machines, both with the same 5 Samba shares
2 Added the other system's IP address under Appliances
3 Turned on Replication Services on both systems
4 Defined a Send Task on one system and turned on replication
5 Defined a Receive Task on the other system and turned on Replication
6 Set the task to run every 5 minutes
7 On the sending system, I see the following in the "Replication history for SyncDocs -> (Documents on 007f0100-039d4722-2xxxxxxxx… : fcwdata)" table:

11 Documents_2_replication_1 | May 7th 2023, 12:35:03 pm | May 7th 2023, 12:35:03 pm | failed in a few seconds | 0 or < 1KB at N/A/sec

8 When I hover over the "Status" field I see: "Failed to create snapshot: Documents_2_replication_1. Aborting … Exception: 500 Server Error for url: http://127.0.0.1:8000/api/shares/2/snapshots/Documents_2_replication_1"
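One thing worth checking here is whether a leftover snapshot with that same name already exists on the source pool, which could plausibly make a fresh snapshot-create call fail. A hedged diagnostic sketch, assuming Rockstor's usual /mnt2/&lt;pool&gt; mount convention (the path and cause are assumptions, not confirmed from the error alone):

```shell
# Look for an existing snapshot whose name matches the one the
# replication task failed to create. "/mnt2/fcwdata" is an assumed
# mount point based on Rockstor's /mnt2/<pool> convention.
snap_name="Documents_2_replication_1"
pool_mount="/mnt2/fcwdata"
if command -v btrfs >/dev/null 2>&1 && [ -d "$pool_mount" ]; then
    btrfs subvolume list "$pool_mount" | grep "$snap_name" \
        || echo "no existing snapshot named $snap_name"
else
    echo "btrfs tools or pool mount not present on this host"
fi
```

If a stale subvolume of that name turns up, removing or renaming it before the next scheduled run would be the thing to try.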

What have I done wrong?

@fwinograd Hello again.

Possibly nothing, but did you make sure to select the network interface used by the replication service?

set-network-interface

I’ve seen that missed before, I think. Our documentation on replication is unfortunately weak, and anyone familiar with the procedure who wishes to update or flesh it out is welcome to.

Hope that helps.

2 Likes

Yes, I do have the interface selected on both machines.
I could help update the documentation if I can get it to work.

Are there error logs I should look at to try to narrow down the reason for the error?

1 Like

One thought I had: is it a problem that both servers have the same name, although their IP addresses are different?

@fwinograd Hello again.

They must have different names, yes, and, likely even more important, they must have different Appliance IDs. Let us know if this is an issue; changing the Appliance ID can be tricky, and it shouldn’t actually be necessary, as each install should define its own during installation.
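A minimal sketch for confirming the two peers are distinguishable; run it on each box and compare the output (the "rockstor-b" name below is a placeholder, not a required value):

```shell
# Print the name this machine will present; if both peers print the
# same thing, rename one of them.
if command -v hostname >/dev/null 2>&1; then
    this_host=$(hostname)
else
    this_host=$(cat /proc/sys/kernel/hostname 2>/dev/null || echo unknown)
fi
echo "This appliance reports hostname: $this_host"
# To rename one of them on a systemd-based install such as Rockstor:
#   hostnamectl set-hostname rockstor-b
```

The Appliance ID itself is visible in the Web-UI under the Appliances listing, so comparing the two entries there is the quickest check on that front.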

The Appliance ID is used as a unique identifier and is involved in Replication. We also hope to use the Appliance ID in our future (mid-to-long-term) GlusterFS plans (https://www.gluster.org/).

Regarding logs: the ones you have already seen extracts from within the Web-UI, plus the generic one at:

/opt/rockstor/var/log/rockstor.log

The general system log (journalctl at the command line) will also be of use, as we basically just wrap btrfs send/receive so that it is wise to our shares (btrfs subvolumes). And remember that it takes 5 successful replication events to settle down into its stable state.
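A hedged log-hunting sketch pulling replication-related lines from both sources mentioned above; it is guarded so it degrades gracefully on hosts where either is absent:

```shell
# Grep the Rockstor application log and the system journal for
# replication and btrfs send/receive activity.
rockstor_log="/opt/rockstor/var/log/rockstor.log"
if [ -f "$rockstor_log" ]; then
    grep -i replication "$rockstor_log" | tail -n 20
else
    echo "no rockstor.log on this host"
fi
if command -v journalctl >/dev/null 2>&1; then
    # btrfs send/receive activity tends to surface in the journal
    journalctl --no-pager 2>/dev/null | grep -iE 'btrfs.*(send|receive)' | tail -n 20
fi
```

Running this shortly after a failed scheduled run should put the relevant lines near the bottom of the output.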

Visible from our Logs Manager: System → Logs Manager
Which also suffers from a low/undocumented state, though it has much affordance (ease of use).
https://rockstor.com/docs/interface/system/logs_manager.html

We have the following documentation on how to contribute to our documentation:
https://rockstor.com/docs/contribute/contribute_documentation.html

So if you do fancy having a dabble there, that would be dandy. The code is often ahead of the documentation, but we do work at keeping the main elements relevant; however, we are a small team that is mostly fully occupied given all the large changes of late.

There is no issue as of yet for the Logs Manager, but we do have a long-standing issue for the Replication improvements:

and what we have is way old now: Replication — Rockstor documentation

Hope that helps.

2 Likes