Replication of all snapshots

Marenz · February 2, 2020, 12:40pm

I’ve read the documentation on share replication, but I still have a few questions.

Would a replication send all created snapshots, too? Can I configure it to do that?

E.g. my plan is to create periodic snapshots and I want all of them replicated to my second system.
I am trying to work out as many details as possible before doing the actual move.

Related question: Can I back up/replicate a snapshot from my laptop's /home to Rockstor?

phillxnet · February 2, 2020, 6:15pm

@Marenz hello again.

No and No.

Although it does use snapshots itself. The docs need to be improved on this feature, and I have some almost finished technical wiki docs, from when I last worked in that area of the code, to publish on the forum when I get time.

Rockstor’s replication system aims to replicate a share, btrfs subvol, and does it by snapshotting that share and then using btrfs send receive to send that snapshot to the target Rockstor system. On running a second time it takes another snapshot and sends the difference from the first snapshot (from my now fuzzy memory of this) and keeps doing this but will ‘tidy’ up old snapshots so that once it reaches a stable state (after 5 replication events) you will see the final state that it then maintains.
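As a rough illustration of the cycle described above (a first full send, then incremental sends against the previous snapshot, with old snapshots rotated out), here is a minimal Python model. It is only a sketch: the snapshot names, the `keep=3` retention limit, and the `ReplicaSide` / `run_replication` helpers are assumptions made for illustration, it does not call btrfs, and it does not model the strangely named first share or the exact number of events Rockstor needs to settle.

```python
from dataclasses import dataclass, field


@dataclass
class ReplicaSide:
    """One end (sender or receiver) of a replication pair (hypothetical model)."""
    keep: int = 3                                  # assumed retention limit
    snapshots: list = field(default_factory=list)

    def add(self, name):
        """Record a new snapshot and rotate out the oldest ones."""
        self.snapshots.append(name)
        removed = self.snapshots[:-self.keep]      # everything beyond the limit
        self.snapshots = self.snapshots[-self.keep:]
        return removed


def run_replication(events, keep=3):
    """Simulate `events` replication runs and log what each one would send."""
    sender, receiver = ReplicaSide(keep), ReplicaSide(keep)
    log = []
    for n in range(1, events + 1):
        name = f"share_replication_{n}"            # hypothetical snapshot name
        # The parent is the newest snapshot already present before this run.
        parent = sender.snapshots[-1] if sender.snapshots else None
        kind = "full" if parent is None else "incremental"
        sender.add(name)
        receiver.add(name)
        log.append((name, kind, parent))
    return sender, receiver, log


if __name__ == "__main__":
    sender, receiver, log = run_replication(6)
    for name, kind, parent in log:
        print(f"{name}: {kind} send (parent: {parent})")
    print("retained on both ends:", sender.snapshots)
```

Running it for six events shows one full send followed by five incremental ones, with only the newest three snapshots retained on each side in this simplified model.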

It’s actually a fairly complex process and there are some non-trivial bits of code to accomplish this.

So I would advise that you set up a small share and let it go through 5 to 6 replication events (if the share is small you could schedule them quite frequently) and take a look at the replication-specific snapshot names that are created at each stage. Note that after the very first replication event you will have a strangely named share (actually a snapshot). Ignore it; it will be auto-removed after the second or third subsequent replication event. And don’t delete any of these obviously named shares, as that will break the replication and you will have to start it over.

Replication worked when I last checked in our latest Stable release and in our last CentOS based ISO release, but is broken in our now deprecated, but otherwise working, CentOS testing channel release. It was one of the reasons we never released a stable at the end of the CentOS testing channel. 3.9.1-16 would have been released as 3.9.2 stable and put out to ISO, but replication and a few other things were just broken, so we continued on with the relevant fixes in the Stable Channel.

On your related question: I don’t see why not, as long as your /home is btrfs; assuming you are suggesting btrfs send receive here. You would have to set things up so that the target was understood by Rockstor, i.e. an existing share, and put the snapshots in a location understood by Rockstor. It’s actually going to be quite tricky, come to think of it, and I’m afraid I won’t have time, with all that is required for the move, to help much at all with this. But you can always look at the mount points used by Rockstor and set up your send receive accordingly. This will all be command line stuff, however; you are not likely to be able to tie this into the Rockstor to Rockstor replication, as it uses its messaging system etc. But if you are doing plain btrfs stuff, and your resulting subvols (which includes snapshots) are understood by Rockstor, then you should be good.
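For the plain command line route, the following Python sketch only builds the commands involved (a read-only snapshot, then `btrfs send` piped over ssh into `btrfs receive`); it does not run them. All paths, the `root@rockstor` host, and the `build_send_commands` helper are hypothetical placeholders, and the target directory would have to be taken from the mount points Rockstor actually uses on your system.

```python
import shlex


def build_send_commands(subvol, snap_dir, snap_name, remote, target_dir, parent=None):
    """Return (snapshot argv, shell pipeline string) for one send/receive run.

    All arguments are caller-supplied placeholders; nothing is executed here.
    """
    snap_path = f"{snap_dir}/{snap_name}"
    # btrfs send requires a read-only snapshot, hence -r.
    snapshot_cmd = ["btrfs", "subvolume", "snapshot", "-r", subvol, snap_path]

    send_cmd = ["btrfs", "send"]
    if parent is not None:
        # -p makes the send incremental relative to an earlier snapshot
        # that already exists on both ends.
        send_cmd += ["-p", f"{snap_dir}/{parent}"]
    send_cmd.append(snap_path)

    receive_cmd = ["ssh", remote, "btrfs", "receive", target_dir]
    pipeline = f"{shlex.join(send_cmd)} | {shlex.join(receive_cmd)}"
    return snapshot_cmd, pipeline


if __name__ == "__main__":
    snap, pipe = build_send_commands(
        subvol="/home",
        snap_dir="/home/.snapshots",          # hypothetical, must be on the same btrfs
        snap_name="home-2",
        remote="root@rockstor",               # hypothetical host
        target_dir="/mnt2/pool/laptop_home",  # hypothetical Rockstor-side path
        parent="home-1",
    )
    print(shlex.join(snap))
    print(pipe)
```

Leaving out `parent` gives the initial full send; later runs pass the previous snapshot name so that only the difference is transferred. Both ends need root privileges for these commands.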

It would be a fantastic feature to have a Web-UI initiated ‘receive my btrfs /home’ type setup that ended up prompting you to enter stuff on the client, and you would then be done. Oh how we can dream, but the priority of that feature is likely to put it into the distant future with our current backlog of issues.

Hope that helps.

Marenz · February 2, 2020, 7:14pm

I am thinking how exactly the feature of transferring snapshots could look.
One easy way would be that each time a scheduled snapshot is done, it is also replicated. Logic that cleans up old snapshots would need to be extended to do that on the remote machine, too.

Would there be interest in such a feature? I could give this a shot. Or maybe you have a different idea of how this could work…

phillxnet · February 2, 2020, 8:07pm

@Marenz, Re:

If you are at all interested I would encourage you to take a look at the existing code for replication. Personally I think the focus should be on improving that code, as it has some outstanding fragility; there are references to this within the code itself from when I was last there. But the mechanism you roughly describe is pretty much that which is used in the replication, sort of.

Try out what we have and take it through its full cycle, noting the state of snapshots for the given share at both ends. And make sure to go at least 6 consecutive replications along to reach the steady state (it should be on the 5th replication run, from memory).

But be warned that this code area is non-trivial and draws on many of the other share / snapshot / clone technologies that we employ individually across the rest of the subvol management areas of the Web-UI and associated backend. It’s not for the faint-hearted, but any help in that area would be great. And there may be a clever trick we can just ‘bolt on’ to what there is that will significantly aid that area of the code and its user utility. But from memory it’s not going to be a quick fix / addition, though it could also be pretty rewarding to improve / extend.

Also note that testing changes in that code is very time consuming. I ran several multi-hundred repetitions of the code prior to releasing it last time, and this ended up being quite time consuming. Plus you have to account for overlapping requests and the like, i.e. a user-configured request to duplicate TBs of data every few minutes is not going to fly. But it does back off, from what I remember, and looks for a prior still-running task and the like.

Logging is also far from trivial in the case of replication, as there is often zero feedback from the btrfs tools for quite some time. I think, again from memory, that either the receiver or the sender is far more chatty that way. You will see this during a large transfer: one side is only able to say a replication task is ongoing, while the other end will give you a near-live update of its speed (which is often pretty slow, by the way).
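To make the ‘overlapping requests’ point concrete, here is a minimal Python sketch of one generic way to skip a scheduled run while an earlier one is still in progress. It is not Rockstor's implementation; the `ReplicationTask` class and its `transfer` callable are assumptions made purely for illustration.

```python
import threading


class ReplicationTask:
    """Hypothetical guard: at most one transfer runs at a time."""

    def __init__(self, transfer):
        self._transfer = transfer        # callable doing the actual work
        self._lock = threading.Lock()    # held while a transfer is running
        self.skipped = 0                 # scheduled runs that backed off

    def trigger(self):
        """Run the transfer unless a previous run still holds the lock."""
        if not self._lock.acquire(blocking=False):
            self.skipped += 1            # back off instead of queueing up
            return False
        try:
            self._transfer()
            return True
        finally:
            self._lock.release()


if __name__ == "__main__":
    overlapping = []
    # The transfer itself fires another trigger, imitating a schedule that
    # comes round again before the first run has finished.
    task = ReplicationTask(lambda: overlapping.append(task.trigger()))
    print("first run completed:", task.trigger())
    print("overlapping run completed:", overlapping[0])
    print("runs skipped:", task.skipped)
```

The non-blocking `acquire` is the design choice that matters here: a run that cannot start immediately is dropped rather than queued, so a short schedule interval cannot pile up transfers behind a slow one.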

Would be great to get some more eyes on that code so do dive in if you fancy.

My last pull request on that code, which “mostly” fixed it from its prior broken state in our last released CentOS testing channel variant (because of necessary API changes in that testing run), was:

github.com/rockstor/rockstor-core

Fix replication regression re share api change. Fixes #1853

rockstor:master ← phillxnet:1853_suspected_replication_regression_re_share_api_change

opened 07:24PM - 24 Jan 18 UTC

phillxnet

+270 -70

Prior function was restored by updating the relevant api urls in the replication system. But in one non-critical check this was not managed. As a result the non-critical but desirable internal sanity check was, for the time being, remarked out and TODOs added to signify required future attention. Also includes a number of replication UI fix-ups including table formatting, sort capability, appropriate ordering, and consistency with the rest of the UI. A core mechanic of the replication system was also abstracted such that it could benefit from the existing share / snapshot management system. This alteration made more explicit, in code, a quirk involving the initial subvol transferred and how it requires special treatment given its unique nature in the context of the existing share / snapshot / clone / import structures.

Summary:
- Update internal replication api urls to fit new share by id scheme.
- Remove problematic non-critical internal check (TODO added for later).
- Create new repclone command to improve replication share / snap handling, ie more transactional by using existing structures via api.
- Minor improvements were added in the new repclone system, ie sys refresh qgroup transfer, auto mounting etc.
- Ensure we import initial btrfs receive quirk subvol with replica flag and add logging to indicate its quirk nature.
- Improve replication tables formatting, text, sort ability / order.
- Reduce and mirror (between send and receive) rep-snap-count.
- Fix bugs in receive status update, includes utilising the pending state.
- Improve user communication re data transferred / rate table cell text.
- Remove broken pool link in replication overview table.
- Improve debug logging re share / snap import.
- Minor steps / TODOs towards future share name/subvol_name separation.
- TODOs towards centralising mount path creation.
- Various TODOs added for future consideration.

N.B. this pr assumes the prior application of the following (currently pending) pull request: "remove immutable flag prior to share delete. Fixes #1882" pr #1883. That fix also addresses an observed breakage during replication runs and was used (pre-applied) during all testing of this pr's code.

Fixes #1853 (if #1883 is pre-applied)

@schakrava Ready for review.

Testing included several hundred individual replication events (btrfs send receive hosts) with runs up to 150 ish. A real hardware arrangement was also tested where the source machine's (sender) share had more data than could be replicated in the chosen scheduled replication interval (10 mins). Additional send receive pairs were not activated and all data was successfully transferred.

Note that with the in-pr rep-snap-count change to 2, 3 snapshots are stored both on the sender and against the receiver's share counterpart. And that for a replication cycle to reach its final state, which will in turn thereafter be maintained by rotation, 5 replication events must be completed: the initial full-send followed by 4 incremental sends.

There are still known limitations and bugs in the replication code post pr, but these can be addressed more specifically in pending issues to be opened in due course, and depend on the review outcome of this pr.

But note that there have been many improvements and some fairly deep changes to disk, pool, and subvol management since then. It can at least serve to give an indication of the files concerned.

So do look at what we have already as you can then see more of the complexity involved. Plus it would be far more valuable to have this system improved / made more robust than to have additional functionality that may also come with more fragility. A key point here is that btrfs is currently unable to resume a replication that has been interrupted. A non trivial problem to ‘work around’. There are upstream plans however to ‘up’ the btrfs send / receive protocol but they are waiting until other stuff settles before making what would be breaking changes. Probably best in the long run.

Good luck and take your time looking through those code areas as they are non-trivial, but quite interesting actually, as there is a message passing system in place between the sender and the receiver at the Rockstor level, from what I can remember. And if we could get this to be more robust (see my comments in that pr) then we would have quite a powerful system with which to ‘orchestrate’ this otherwise delicate process. And maybe even extend it to manage your proposed functional expansion.

Good luck.

Tex1954 · February 27, 2022, 7:05pm

I know this is old, but I just ran into a problem when I leaned over to check a monitor plug when some part of me hit the power button on my backup NAS.

Well, it happened to be in the middle of a replication update (a 5hr task in this case), and that bombed the process. I would also point out that this was a NEW first-time replication, as the main NAS had changed a bit, requiring a reset of one pool (it went from 2.7 to 3.0 TBs).

I could not get it running again until I deleted all the snapshots, shares, and pool and started from scratch.

My question is, have we progressed enough to improve the recovery process yet?

Also, can we speed up the data transfer? Right now I only get between 145 MB/s and 190 MB/s, and my network shares can do 350-390 MB/s.

phillxnet · February 27, 2022, 7:31pm

@Tex1954 Re:

Only the last snapshot created on the target was likely required here, I suspect. Unless of course there was more damage than just an interrupted replication task.

The btrfs send/receive upon which our replication is built does not yet have a resume capability. But we definitely can improve our end on the wrapper side. This is a tricky code area, though, and it takes a long time to prove any fixes or improvements, as even with a small data set it still takes 5 cycles to ‘settle’.

Not sure on this front actually. It is doing a generic btrfs send/receive under the hood; I’m not sure if that has transmission speed issues currently. It does have to do a bunch of stuff to ‘know’ what to send and it may be that this is taking some time, hence the non line speed. Are those speeds measured via the network or via the replication task reports themselves? The latter are likely spot readings that may not be that reliable. We simply grab from the command line what the send/receive command is outputting. Might be worth checking what the bandwidth on the network actually is.
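One way to compare spot readings against what the network actually carried is to average over an interval from two byte-counter samples. The Python sketch below shows only that arithmetic; the function names are hypothetical, the counter values are made up, and it assumes the figures quoted in this thread are decimal megabytes per second.

```python
def average_rate_mb_s(bytes_start, bytes_end, seconds):
    """Average throughput in MB/s (decimal) between two byte-counter samples."""
    if seconds <= 0:
        raise ValueError("seconds must be positive")
    return (bytes_end - bytes_start) / seconds / 1_000_000


def mb_s_to_gbit_s(mb_s):
    """Convert MB/s to Gbit/s: 8 bits per byte, 1000 Mbit per Gbit."""
    return mb_s * 8 / 1000


if __name__ == "__main__":
    # Made-up samples: 1.9 GB moved during a 10 second window.
    rate = average_rate_mb_s(0, 1_900_000_000, 10)
    print(f"{rate:.0f} MB/s is about {mb_s_to_gbit_s(rate):.2f} Gbit/s")
```

The byte counters could come from any interface statistics source sampled at the start and end of the window; averaging over tens of seconds smooths out the spot readings mentioned above.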

But yes, that area of our code is in need of some more attention, but it’s a non-trivial bit of code and so must be approached with care, and likely more time than one originally assumes. It would be good to have more eyes on that code if any folks familiar with our code base and the btrfs send/receive care to take a look. When last I was there, fixing up from a prior break via a system-wide pool/share api change some years ago, I left a number of notes on what we need to do to improve its robustness.

Hope that helps, at least for some context.

Tex1954 · February 27, 2022, 7:45pm

Done extensive tests on 10Gbps LAN. The Rockstor Dashboard Network display speeds (when the correct eth port is selected) seem to be accurate compared to other setups; winderz, linux, Rockstor or whatever.

I know it must be a terribly intricate procedure to initiate and support a replication in the manner Rockstor uses, so no sweat if it isn’t the do-all be-all at the moment.

A UPS and more careful positioning of body parts seem to be my best answer! LOL!

In fact, one of the last things for me to do is place a cover over the “ON” switches to avoid such catastrophes. (I’ve done this in the past.)