Question about Replication

I have recently gotten really invested in building a fully self-hosted ecosystem and am currently using Rockstor as my NAS storage. I built a somewhat beefy setup for this so I could also run a Plex media server with local access to the data, removing the need for SMB shares.

I am relatively new to working directly with a NAS. I would like to back up the entirety of my NAS, including the data, to another backup device, in the event I experience any failure that would result in the loss of data.

From my understanding after reading documentation on Rockstor replication, I believe this is what I am looking for.

I just had a few questions about this I was hoping for some assistance with.

  1. In order to back up data and shares on the second Rockstor instance with replication, do I need to have the same hard drive sizes? My thought is that it is based on shares, so if I have an 8 TB share on the main NAS, I would need at least an HDD large enough to support an 8 TB share on the instance being replicated to.

  2. Is this the best solution for having an entire backup of my NAS, data included? I do not want to store data on anything that is not self-hosted.

I plan to support this project once I finish making sure it will work for me the way I need it to.

I appreciate any feedback.

@m1dori this will be interesting, so if/when you have set this up, I would be curious to see how that setup looks.

Yes, Rockstor replication can be the simplest way of handling your backup. As other people on this forum have said, you should always have backups of your data :slight_smile:

As for the target: since we’re using btrfs and you will likely run some type of RAID level, your target system should have at least the same amount of available space. When you’re using the Rockstor replication service, your target system will also have to be set up as a Rockstor instance.

You should probably not think about it in terms of HDD size, but more in terms of share or pool size. With btrfs (and, granted, other filesystems as well), the actual HDD size is not that pertinent; what matters is the size of the pool, which can consist of one 8 TB drive, or two 4 TB drives, or some other mix that ends up providing you with 8 TB of available space. So, if your NAS currently has 8 TB available as storage (say, in a RAID10c3 using 3 x 8 TB disks), you expect the data on that box to max out at 6 TB, and the remaining 2 TB is data that’s not relevant for backup, then your target system should be set up to have at least 6 TB as well. This is of course a little oversimplified, since things like metadata, snapshots, etc. also take space on the source system (and the target system).

But it won’t matter whether you represent the 6 TB on the backup system (assuming btrfs) with the same HDD setup, or whether you use, say, 2 x 6 TB and 2 x 5 TB (again with RAID10c3)… the calculation is probably not quite accurate, but I hope you get what I am trying to say.
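If you want to sanity-check the usable space on either side, with the RAID profile taken into account, btrfs can report it directly; the pool name below is just a placeholder for your own mount point, and the command is run as root:

btrfs filesystem usage /mnt2/pool-name-here

The “Free (estimated)” figure is the one to compare against the size of the data you intend to replicate.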

I’m sure there are other opinions on what’s best out there. You could of course also use something like rsync at the command line to send this to a non-Rockstor appliance for backup, or other Linux-based backup approaches.
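As a rough illustration of that rsync route (the host name and paths are made up, and note this copies plain files rather than btrfs snapshots, so you lose the efficiency of send/receive):

rsync -aAXH --delete /mnt2/pool-name-here/share-name/ root@backup-box:/srv/backup/share-name/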

1 Like

@Hooverdan I appreciate your explanation, and that does make more sense. I am awaiting a few small parts to build my second, less powerful Rockstor instance for replication. I will let you know how it turns out when it is complete.

I ended up just building them in mATX cases, due to the cost of NAS and ITX cases being too much. This case is white, so hopefully it looks cool next to the black one I already have for the first instance.

Thanks again for your help!

2 Likes

I have two servers set up and have one replication task running as a test. I can’t seem to get it to stop failing, and I am unsure where to start.

This is the error from the sending server:
[screenshot]

Error from the receiving server:
[screenshot]

They are on the same subnet, so there are no firewall rules necessary for them to talk.
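(If it comes to it, a quick way to double-check that the receiver is reachable on the replication listener port is something like the following; the IP is a placeholder and 10002 is only an example, so substitute whatever port the replication service on the receiving system is actually configured to listen on.)

nc -zv 192.168.1.50 10002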

I am also getting this error on the replication system.

[screenshot]

I’m not sure what happened, but I think it started to work.

I had a 73 GB share replicate successfully, and now I am trying a much larger 2 TB one. I will reply with details.

1 Like

@m1dori Thanks for the running update there.

Keep in mind that replication is sensitive (read: fragile) to interruptions. This is just the nature of btrfs send-receive currently: it can’t handle a partial send-receive. In time this may improve, but if you are all on a local LAN this is far less likely to bite. See the following issue, which details this failure from our perspective but is shared by other btrfs-based systems (e.g. Btrbk) that also use btrfs send-receive: Resume / partial send question · Issue #94 · digint/btrbk · GitHub
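In practice an interrupted transfer leaves a partial, incomplete subvolume behind on the target, and since receive cannot resume, that leftover has to be removed before the replication can be sent again. At the command line that amounts to something like the following (the path is purely illustrative; the Rockstor-specific steps are in the post linked just below):

btrfs subvolume delete /mnt2/target-pool/.snapshots/share-name/incomplete_replication_snapshot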

This is how a failed send/receive event surfaced in Rockstor, complete with the details of how to work around it:

Incidentally, the DataTables warning is unrelated and goes away with a browser refresh. One of our table JavaScript libraries has a bug which surfaces from time to time. We will be updating that in time, and it will likely be better behaved along the way.

Hope that helps. And do take a look at that issue of ours, as it looks very similar to your own and could help with getting going again if a network situation does arise.

2 Likes

@phillxnet Thank you for your feedback. I think I broke it when I was being impatient…

I have a few more questions about replication that hopefully I can find answers for.

I was able to successfully replicate both the shares below.

However, note how the second, larger share still has a snapshot share. Is this just because it is taking time to process the snapshot and extract it to the destination share?

I noticed that the snapshot for the m1dori share has disappeared.

The other question is: when there is a change to the file system and a replication task is started, I assume it will only process the new items in the share for replication?

UPDATE 1: I want to add that the .snapshot has now gone away, but no data is in the replication share.

UPDATE 2: Okay, so I see that the reason it states 0 is because quotas are disabled. They cannot, however, be turned back on; I have tried multiple times and it keeps defaulting to disabled. The share seems to have replicated properly, however. I assume this may be a bug.

Thank you for your help!

@m1dori Hello again.
Re:

Pretty much, yes. But more specifically it’s down to our replication taking 5 replication events (from memory) to settle down into its stable state. We keep the last 3, I think it is, so once the 5th event has taken place you should no longer see these strange .snapshot shares appearing. It’s a bit messy on our part, but the replication subsystem is non-trivial, so it takes time and attention which we are often short of. In time these rogue shares should no longer be surfaced. An attempt was made to hide them, but it failed and represents an aesthetic bug of sorts.

Yes, that one has likely now settled and so only has 3 snapshots (I think it is) at its origin, and no odd .snapshot ones.

Correct. Only the changes are transmitted, hence the need to keep consistency on both sides, and why it only works in one direction. Btrfs knows which blocks on disk have changed between the two snapshots (the one just taken and the last one used in the send), and will send the changes at the block level (or btrfs’s equivalent).
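In raw btrfs terms the mechanism looks roughly like the following; Rockstor drives this over the network for you, so the paths and snapshot names here are purely illustrative:

# First replication: the whole read-only snapshot is sent.
btrfs send /mnt2/pool/.snapshots/share/snap_1 | btrfs receive /mnt2/backup-pool/share
# Later replications: -p names the shared parent snapshot, so only the blocks
# that changed between snap_1 and snap_2 travel to the target.
btrfs send -p /mnt2/pool/.snapshots/share/snap_1 /mnt2/pool/.snapshots/share/snap_2 | btrfs receive /mnt2/backup-pool/share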

OK, so that last replication (btrfs send) event managed to reach the stable state of 5 repetitions (replication sends), so our strange surfacing bug no longer shows these shares/subvols-by-a-strange-name.

Yes, quotas have been super expensive in the past, and are getting better all the time - but only gradually. I think modern kernels are now quite workable with btrfs quotas enabled. Plus we don’t yet report quotas as fully as we could. But all in good time.

You will likely have to enable it via the command line. We have seen this happen before - it may be related to imported pools, or to a hack we had to put in way back that is still there, re-enabling quotas twice as once wasn’t enough at one point! So look to enabling quotas on the pool via the command line. Thereafter you should be able to disable/re-enable via the Web-UI.

Sorted.

Thanks for the full-length test/report here. What version/channel of Rockstor (and what OS base) are you currently seeing all this on, incidentally?

1 Like

@phillxnet Could you detail the commands I need to use to enable Quotas when you have time please?

@m1dori Hello again. Apologies for not posting that the first time.

In our code we do:

Which equates to you running at the command line (as the ‘root’ user) the following command:

btrfs quota enable /mnt2/pool-name-here

(substitute ‘disable’ if need be)
Disable is instant.
Enable will take time to re-calculate all the sizes involved, so give it time to settle after enabling.
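If you want to check whether that recalculation is still in progress, the quota rescan status can be queried, again as root and with your own pool’s mount point substituted for the placeholder:

btrfs quota rescan -s /mnt2/pool-name-here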

From your screenshots you look to have called your Pools “3TB” and “8TB”!
Given pools can be resized live (and we don’t yet support Web-UI pool renaming - sorry about that), this could be a source of confusion in the future.

Hope that helps.

2 Likes

@phillxnet Hello again!

Thank you for the code. I tried running it from the shell and got:
[screenshot]

I used PuTTY to run the command, and it had no output.

[screenshot]

Is this correct, or am I missing something?

Thanks again!

@m1dori in order to run btrfs commands you need root access. So either use sudo to elevate your privileges, or - as you did - use PuTTY and log in as the root user to execute the command.

And, yes, it’s normal that you don’t get a success message. It should eventually show up in the Web-UI as Enabled on the pool screen.

Before:

btrfs quota enable /mnt2/rockleap

After:

To check that quota groups have been created (if you have shares already created) you could also run a quota show on the pool, e.g.:

btrfs qgroup show /mnt2/rockleap

Thank you for that clarification.

I am sorry but I’m having a few more issues.

The show command does not work for me.

When I try to turn on quotas, it simply will not enable, even at the command line.

I also had a replication task that seems to have broken: it had a .snap share and the replicated share for over 3 days. I am getting the following error when trying to delete the .snapshot share.

    Traceback (most recent call last):
      File "/opt/rockstor/src/rockstor/storageadmin/views/share.py", line 381, in delete
        remove_share(share.pool, share.subvol_name, share.pqgroup, force=force)
      File "/opt/rockstor/src/rockstor/fs/btrfs.py", line 1019, in remove_share
        toggle_path_rw(subvol_mnt_pt, rw=True)
      File "/opt/rockstor/src/rockstor/system/osi.py", line 657, in toggle_path_rw
        return run_command([CHATTR, attr, path])
      File "/opt/rockstor/src/rockstor/system/osi.py", line 227, in run_command
        raise CommandException(cmd, out, err, rc)
    CommandException: Error running a command. cmd = /usr/bin/chattr -i /mnt2/8TB/.snapshots/0000-0000-000000000000_Backup_Kids/Backup_Kids_6_replication_1. rc = 1. stdout = ['']. stderr = ['/usr/bin/chattr: Read-only file system while setting flags on /mnt2/8TB/.snapshots/-0000-0000-000000000000_Backup_Kids/Backup_Kids_6_replication_1', '']

The other issue is that the share that was replicating properly has now broken and is giving the error below.
[screenshot]

This issue was discussed further up, so I will read through that information on how to fix it; I just wanted to not leave out any details in case there is a connection.

Thank you!

the show quota groups command was my error, sorry about that. I corrected it above as well, but the example should have said:
btrfs qgroup show /mnt2/rockleap
(or the command preceded by sudo, if you’re running it from the shell)

When quotas are not enabled, it will give you an error when running the above command.

The example I showed was on a simple pool/share setup, so the enablement was pretty instantaneous. Based on this comment from above:

It might take some time before it shows as enabled.

As for this quota enablement possibly breaking the replication, I don’t really know; @phillxnet might need to think about that one…

Here is the output I receive.

[screenshot]

This is the error you have explained.

The issue is that I run the command to enable quotas:
[screenshot]

I will wait hours and it still will not enable. Is there something else I need to try by chance?

@m1dori Hello again.
Re:

You could first try disabling at the command line, then enabling thereafter, still at the command line. The equivalent of turning them off and on again to reset them, just at the terminal rather than via the Web-UI. Thereafter they are likely to be better behaved via the Web-UI option to do the same.
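That is roughly the following, run as the root user, with your own pool’s mount point substituted for the placeholder:

btrfs quota disable /mnt2/pool-name-here
btrfs quota enable /mnt2/pool-name-here

As before, the enable will then need time to rescan before the result shows up.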

Re your replication: it is likely that there was a communication issue or interruption. This is not something btrfs send/receive can handle; it is addressed earlier in this thread via an issue reference that contains the work-around.

Hope that helps.

@phillxnet @Hooverdan Thank you both for your support on this and time.

I have successfully turned quotas back on; disabling first seems to have fixed the issue.

I cannot delete this .snapshot share.

I am thinking maybe it has to do with the fact that it is “unmounted”, but I am not certain. I tried rebooting the system, and it remounted the share, but I get the same error when trying to delete.

    Traceback (most recent call last):
      File "/opt/rockstor/src/rockstor/storageadmin/views/share.py", line 381, in delete
        remove_share(share.pool, share.subvol_name, share.pqgroup, force=force)
      File "/opt/rockstor/src/rockstor/fs/btrfs.py", line 1019, in remove_share
        toggle_path_rw(subvol_mnt_pt, rw=True)
      File "/opt/rockstor/src/rockstor/system/osi.py", line 657, in toggle_path_rw
        return run_command([CHATTR, attr, path])
      File "/opt/rockstor/src/rockstor/system/osi.py", line 227, in run_command
        raise CommandException(cmd, out, err, rc)
    CommandException: Error running a command. cmd = /usr/bin/chattr -i /mnt2/8TB/.snapshots/0000-0000-000000000000_Backup_Kids/Backup_Kids_6_replication_1. rc = 1. stdout = ['']. stderr = ['/usr/bin/chattr: Read-only file system while setting flags on /mnt2/8TB/.snapshots/0000-0000-000000000000_Backup_Kids/Backup_Kids_6_replication_1', '']

Update:
I tried to use the command line to delete the .snap and it states that it is read-only. I used root to try to change the ownership with chmod and it still says it’s read-only.

Thanks in advance.
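(A sketch of one possible cause, not confirmed in this thread: subvolumes created by btrfs receive carry the read-only subvolume property, which chmod/chown cannot change. The property can be inspected, and a received snapshot is normally removed with btrfs subvolume delete as root rather than rm; the path below is taken from the error output, with the appliance UUID left elided as a placeholder.)

btrfs property get /mnt2/8TB/.snapshots/<uuid>_Backup_Kids/Backup_Kids_6_replication_1 ro
btrfs subvolume delete /mnt2/8TB/.snapshots/<uuid>_Backup_Kids/Backup_Kids_6_replication_1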

Hello again

I just wanted to say that I appreciate all the support you both provided in trying to make this work.

I am going to continue to use Rockstor as my main NAS system, because I am interested in seeing it develop. I believe it provides enough benefit to pay for the updates and support the project. I have switched to TrueNAS as my backup, with Cloud Sync.

Thank you again!

2 Likes