[SOLVED] Replication failures in v4

I believe I've followed the instructions for setting up replication, but I'm still stymied. I'm trying to set up replication between my 'prod' machine ('prod' is a loose term on a home network) and a test machine. Admittedly, the test machine is not a great hardware setup: it's an i7 laptop with 4GB of memory, booting from an 8GB USB memory stick, with a second, mechanical 149GB USB drive.

Each appliance has the other registered successfully. 'rockstor.local' is my prod machine and 'rock2.local' will host the replica. Based on other posts here, I have verified that the appliance IDs are unique, and I also tried deleting the snapshot between failures. I even went so far as to reinstall rock2 from scratch, wiping the 149GB drive in the process.

rockstor.local is running v4.0.4-0 and rock2.local is running v4.0.6-0. Both OSs are up to date as of today.

On my initial replication, the sender (rockstor) logged an init error from the receiver, and the receiver (rock2) logged 'No response received from broker.' This sounded like it could have been a network timeout, even though both machines are on the same wired 1Gb/s network.

I deleted the snapshot from the sender, and on the next run it logged that it was unable to create the share (see screenshot). It did create a new snapshot. The receiver logged nothing. This latter scenario is what I was running into before wiping and reinstalling rock2.

So I'm a bit lost, and it may be that I'm missing the obvious. Suggestions are welcome, and I'm happy to retrieve logs if needed.

Thank you,


@wdc Hello again, and thanks for the detailed report.

Replication setup does still have some fragility. However, everything here looks OK.

But have you checked the replication service's configuration via its 'spanner' icon? That is, visit the Services page, click the spanner next to the replication service, and select the network interface this service should use. Just a thought, as I seem to remember some fragility there.

Replication was one of the first things we proved working in v4, and I don't think we have changed anything that would obviously have broken it. We also need to redo our replication documentation.

Let us know if this helps.


Excellent thought, but alas, I had done that. I wasn't certain I had, so I verified it and restarted the service on both machines.

I also explicitly entered the remote machine's IP on the replication send task; still no luck. Lastly, I considered that my messing with share security could have been the cause, so I created a new test share, left its security at the Rockstor defaults, dropped a file into it, and created a new replication task. Same failure.

On the left is the share security I have in place; alongside it is the (unmodified) share security for my test share.

So I’m still puzzled and I’ll keep looking.


SOLVED (probably, time will tell)
I discovered an oddity this morning. A couple of replications worked and then failed: the IP of the receiver had changed. "Gosh, that is strange," I said. (No, I didn't say that; what I really said should not be repeated here.)
For some reason, my router decided to ignore the IP address reservation for the receiving machine's MAC. I've had this router for 5 or 6 years and this is a first. So I gave the receiving Rockstor machine a static address, and so far, so good. If you're curious, I still have a new address reservation in the router, outside the DHCP scope.
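Since the root cause here was the receiver's address silently changing, a small pre-flight check can flag it before a replication window. This is only a minimal sketch under stated assumptions, not part of Rockstor: the hostname and expected IP below are placeholders, and resolving a `.local` name assumes the machine running the check has working mDNS name resolution.

```python
import socket
from typing import Callable, Tuple

# Hypothetical placeholders: substitute your receiver's hostname and the
# static address you assigned to it.
RECEIVER_HOST = "rock2.local"
EXPECTED_IP = "192.168.1.50"


def receiver_ip_drifted(
    host: str,
    expected_ip: str,
    resolve: Callable[[str], str] = socket.gethostbyname,
) -> Tuple[bool, str]:
    """Resolve `host` and report whether it no longer matches `expected_ip`.

    Returns (drifted, current_ip). A resolution failure is treated as
    drift with an empty current_ip, since replication would fail too.
    """
    try:
        current = resolve(host)
    except OSError:  # socket.gaierror is a subclass of OSError
        return True, ""
    return current != expected_ip, current


if __name__ == "__main__":
    drifted, current = receiver_ip_drifted(RECEIVER_HOST, EXPECTED_IP)
    if drifted:
        print(f"WARNING: {RECEIVER_HOST} resolves to {current or 'nothing'}, "
              f"expected {EXPECTED_IP}")
    else:
        print(f"{RECEIVER_HOST} is still at {EXPECTED_IP}")
```

The resolver is passed in as a parameter so the check can be exercised without touching the network; run as a script, it uses the system resolver.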

Thank you again, @phillxnet. I appreciate the ‘consult’ you provided.


Weird… it still fails about 50% of the time, but I'll blame something in the networking and/or the slowness of a cobbled-together receiving machine. At some point I'll dig further into the system logs for oddities related to disk or network. For now, my gut tells me it's slow hardware leading to timeouts, or something still flaky in the networking.
