Replication - Failure

Fellow #ROCKERS, your thoughts please.
Two appliances, both registered with each other, both with the replication service on and a scheduled event configured. The replication barely lasts a second before this error appears.

No IPs have changed.

Any ideas? Thanks in advance. Dave

[16/Feb/2018 19:25:04] ERROR [smart_manager.replication.sender:74] Id: 06621353-4BB0-41E9-8615-017696E81528-3. unexpected reply(receiver-error) for 06621353-4BB0-41E9-8615-017696E81528-3. extended reply: Failed to validate the source share(FileStorage) on sender(uuid: 06621353-4BB0-41E9-8615-017696E81528 ) Did the ip of the sender change?. Exception: [u'Invalid api end point: https://fs.redbeards.home:443/api/shares/FileStorage']. Aborting. Exception: unexpected reply(receiver-error) for 06621353-4BB0-41E9-8615-017696E81528-3. extended reply: Failed to validate the source share(FileStorage) on sender(uuid: 06621353-4BB0-41E9-8615-017696E81528 ) Did the ip of the sender change?. Exception: [u'Invalid api end point: https://fs.redbeards.home:443/api/shares/FileStorage']. Aborting

@dvanremortel Thanks for the report but this has already been fixed as of stable channel release 3.9.2-13 (current stable channel release version is 3.9.2-15):

See GitHub tags:

via the linked issue:

https://github.com/rockstor/rockstor-core/issues/1853

and its associated pull request:

The indicated api URL contains the share name, whereas we now use share ids. It was quite a bumpy path changing the whole project over (pools were also changed similarly), but replication was, as far as I'm aware, the last regression candidate. We still have some code tidying and 'nice to have' items that need to be re-asserted.
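To illustrate the change: the failing endpoint in the log above embeds the share *name*, while fixed releases address shares by numeric *id*. A minimal sketch of the difference (the function names, the example id, and the exact URL layout here are illustrative assumptions, not the project's actual code):

```python
# Hypothetical sketch of the endpoint change described above.
# Older code built the share API URL from the share *name*:
def old_share_endpoint(host, share_name):
    # e.g. https://fs.redbeards.home:443/api/shares/FileStorage
    return "https://{}:443/api/shares/{}".format(host, share_name)

# Fixed releases address the share by its numeric *id* instead:
def new_share_endpoint(host, share_id):
    # e.g. https://fs.redbeards.home:443/api/shares/3
    return "https://{}:443/api/shares/{}".format(host, share_id)

print(old_share_endpoint("fs.redbeards.home", "FileStorage"))
print(new_share_endpoint("fs.redbeards.home", 3))
```

A sender on the old name-based scheme talking to a receiver on the new id-based scheme (or vice versa) would fail validation exactly as in the log, which is why both ends need the fix.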

See also in-progress changes re testing channel releases discussed towards the end of the following thread:

Hope that helps.

Thank you for the replies. I have checked both servers' versions and each states it is higher than the number indicated. Also, I have TWO servers: one is registered with production updates, the other is a standard non-registered server with testing updates.
The replication job shows as OFF but it was on previously. I have included the versions of both servers and the replication configuration windows for both as well.

Any ideas?

@dvanremortel As per:

Unfortunately the highest testing updates version is currently 3.9.1-16 (which one of your servers indicates) and it does not yet have the fix. Currently only the stable channel has this fix, and both the sender and the receiver require it, as they 'chat' back and forth and there were fixes on both sides to move to the new share and pool api.

Yes, after a number of failures it will auto-disable to avoid spamming the logs. If you enable email notifications it will email you on each event, including the auto-disable.

Also note that to resume a failed replication task it may be necessary to delete the last snapshot created for this purpose on the sender side (via the UI is fine); it should be named obviously enough. There are robustness improvements to be made there that I hope to get around to soon, if another dev doesn't beat me to it of course. Also note that I'm unsure of the state your replication tasks are in, as given the required changes in the fix there was no point in testing between the current testing and stable variants.

If in doubt, delete the current tasks and, once you have both sender and receiver 'up to spec', re-create them and start fresh.

Hope that helps and thanks for helping to support Rockstor development.

Thank you. When will this capability be in the testing release? I'm doing a PoC and this was a key feature.

thanks, Dave

@dvanremortel Oops, got my version number wrong here:

Meant to be 3.9.2-13, sorry. I’ve corrected in my post above.

Probably in the next release, I expect; however, this is down to project lead @suman and is waiting on the update mechanism / docs that need to be in place first, as referenced earlier.

Hope that helps.