Disk structure under /mnt2 and replication question

Hi guys, I’ve got a how-to question.

I have set up 2 Rockstor machines so that I can sync/replicate the shares, but I cannot figure out how to do this with Replicate. When I run Replicate on a share it puts the data into a .snapshots directory with a long name (and I’m not sure where else), but it doesn’t copy into a share of the same name, as I would expect a replica to be a copy. The Music directory is ~170 GB, so having all these copies is going to fill the disk quickly. (It is not fully populated yet.) I’m sure that with a whole lot of reading I might be able to figure it out, but I could really use some help.

How do I go from share A on one machine to share A on the second machine with Replicate? I saw some posts about rsync; is that the solution to my problem? I don’t have the skill to fix this myself, but it sure would be a nice addition to Replicate. Or do I just take the data that is now in the replication directory and copy it into the share myself, or will that mess something up?

Secondly, under /mnt2 there is a directory structure that matches what I am talking about above. I understand that a share is put into /mnt2/pool_name/share_name, but the share_name directory is also mounted directly under /mnt2. I’m confused.

Take the Music directory for example: it is in the Raid-Pool. There is a copy under /mnt2/ and there is a copy in /mnt2/Raid-Pool:

Vault2:/ # cd mnt2
Vault2:/mnt2 # ls -al
total 48
drwxr-xr-x 1 root root 556 Sep 8 14:31 .
drwxr-xr-x 1 root root 228 Sep 4 10:06 ..
drwxr-xr-x 1 root root 262 Sep 8 14:31 .snapshots
drwxrwxrwx 1 root root 0 Sep 5 15:50 Documents
drwxrwxrwx 1 root root 59134 Sep 8 17:09 Music
drwxrwxrwx 1 root root 0 Sep 5 15:49 Music_Playlist
drwxrwxrwx 1 root root 0 Sep 5 15:50 Music_Podcast
drwxr-xr-x 1 root root 0 Sep 5 15:49 Music_config
drwxrwxrwx 1 root root 0 Sep 5 15:51 Photos
drwxr-xr-x 1 root root 16 Sep 4 10:07 Pool-1
drwxr-xr-x 1 root root 98 Sep 4 11:09 Pool-2
drwx--x--- 1 root root 168 Sep 8 14:17 ROCK-ONS
drwxr-xr-x 1 root root 400 Sep 8 14:31 Raid-Pool
drwxr-xr-x 1 root root 0 Sep 5 18:16 c0450590-f2b9-de11-8a1b-90e6ba7a072b_Documents
drwxr-xr-x 1 root root 0 Sep 8 14:31 c0450590-f2b9-de11-8a1b-90e6ba7a072b_Music
drwxr-xr-x 1 root root 0 Sep 6 14:09 c0450590-f2b9-de11-8a1b-90e6ba7a072b_Photos
drwxrwxrwx 1 filebrowser root 26 Sep 4 11:11 fb-config
drwxrwxrwx 1 filebrowser root 0 Sep 4 11:08 fb-data
drwxrwxrwx 1 filebrowser root 28 Sep 4 11:13 fb-database
drwxrwxrwx 1 root root 24 Sep 4 10:59 nginx-config
drwxrwxrwx 1 root root 48 Sep 4 10:58 nginx-data

Vault2:/mnt2 # cd Raid-Pool
Vault2:/mnt2/Raid-Pool # ls -al
total 16
drwxr-xr-x 1 root root 400 Sep 8 14:31 .
drwxr-xr-x 1 root root 556 Sep 8 14:31 ..
drwxr-xr-x 1 root root 262 Sep 6 14:09 .snapshots
drwxrwxrwx 1 root root 0 Sep 5 15:50 Documents
drwxrwxrwx 1 root root 59134 Sep 8 17:09 Music
drwxrwxrwx 1 root root 0 Sep 5 15:49 Music_Playlist
drwxrwxrwx 1 root root 0 Sep 5 15:50 Music_Podcast
drwxr-xr-x 1 root root 0 Sep 5 15:49 Music_config
drwxrwxrwx 1 root root 0 Sep 5 15:51 Photos
drwxr-xr-x 1 root root 0 Sep 5 18:16 c0450590-f2b9-de11-8a1b-90e6ba7a072b_Documents
drwxr-xr-x 1 root root 0 Sep 8 14:31 c0450590-f2b9-de11-8a1b-90e6ba7a072b_Music
drwxr-xr-x 1 root root 0 Sep 6 14:09 c0450590-f2b9-de11-8a1b-90e6ba7a072b_Photos

The btrfs replication works on the concept of copying a subvolume, i.e. a snapshot, to another machine. As the replication continues to operate over time, additional snapshots will be created. However, btrfs snapshots are very space efficient (managing only the deltas between an earlier and a later snapshot), unless you have major changes to the data you’re replicating all the time.
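
Under the hood this is the btrfs send/receive pattern. A minimal sketch of the idea (the pool/share names and the ssh transport are illustrative only; Rockstor uses its own sender/receiver services rather than ssh, but the semantics are the same):

# take a read-only snapshot of the share on the sending machine
btrfs subvolume snapshot -r /mnt2/Pool-1/Music /mnt2/Pool-1/.snapshots/Music/Music_replication_1

# first replication event: stream the whole snapshot to the receiver
btrfs send /mnt2/Pool-1/.snapshots/Music/Music_replication_1 | ssh receiver "btrfs receive /mnt2/Raid-Pool/.snapshots/"

# later events: take a new snapshot and send only the delta against the previous one
btrfs subvolume snapshot -r /mnt2/Pool-1/Music /mnt2/Pool-1/.snapshots/Music/Music_replication_2
btrfs send -p /mnt2/Pool-1/.snapshots/Music/Music_replication_1 /mnt2/Pool-1/.snapshots/Music/Music_replication_2 | ssh receiver "btrfs receive /mnt2/Raid-Pool/.snapshots/"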

@phillxnet can probably provide a better/detailed answer to your concerns.

For your second question:
Here is a wiki-fied post from a long time ago, so it’s not accurate in all points anymore:

See the sections for pool, pool mount point, share and share mount point.

One is the subvolume with the underlying data; the other is the mount point (exposing the data to be used by things like the Web-UI, Samba, Rock-ons, etc.).
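
As a rough illustration (the device name and options here are assumptions, not taken from your system): the pool itself is mounted once at /mnt2/<pool_name>, which shows its subvolumes as directories, and each share subvolume is then mounted a second time directly under /mnt2/<share_name>:

# pool mount point: the whole btrfs volume (top-level subvolume)
mount -t btrfs -o subvolid=5 /dev/sdb /mnt2/Raid-Pool

# share mount point: the same filesystem again, but only the Music subvolume
mount -t btrfs -o subvol=/Music /dev/sdb /mnt2/Music

# verify: both entries show the same source device, just different subvol= options
findmnt -t btrfs

So /mnt2/Music and /mnt2/Raid-Pool/Music are the same data exposed twice, not two copies consuming extra space.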

Dan, I am getting an error on 2 of my replications. I’m afraid I messed something up trying to get these to work. I was in /mnt2 and might have deleted something trying to get it to work.

[11/Sep/2024 16:11:04] INFO [storageadmin.views.snapshot:61] Supplanting share (c0450590-f2b9-de11-8a1b-90e6ba7a072b_Documents) with snapshot (.snapshots/c0450590-f2b9-de11-8a1b-90e6ba7a072b_Documents/Documents_2_replication_1).
[11/Sep/2024 16:11:04] ERROR [storageadmin.util:44] Exception: Error running a command. cmd = /usr/sbin/btrfs property set /mnt2/Raid-Pool/.snapshots/c0450590-f2b9-de11-8a1b-90e6ba7a072b_Documents/Documents_2_replication_1 ro false. rc = 1. stdout = ['']. stderr = ['ERROR: cannot flip ro->rw with received_uuid set, use force if you really want that', '']
Traceback (most recent call last):
  File "/opt/rockstor/src/rockstor/storageadmin/views/clone_helpers.py", line 94, in create_repclone
    set_property(snap_path, "ro", "false", mount=False)
  File "/opt/rockstor/src/rockstor/fs/btrfs.py", line 2314, in set_property
    return run_command(cmd)
           ^^^^^^^^^^^^^^^^
  File "/opt/rockstor/src/rockstor/system/osi.py", line 289, in run_command
    raise CommandException(cmd, out, err, rc)
system.exceptions.CommandException: Error running a command. cmd = /usr/sbin/btrfs property set /mnt2/Raid-Pool/.snapshots/c0450590-f2b9-de11-8a1b-90e6ba7a072b_Documents/Documents_2_replication_1 ro false. rc = 1. stdout = ['']. stderr = ['ERROR: cannot flip ro->rw with received_uuid set, use force if you really want that', '']
[11/Sep/2024 16:11:04] ERROR [smart_manager.replication.receiver:100] b'Failed to promote the oldest Snapshot to Share.'. Exception: 500 Server Error: Internal Server Error for url: http://127.0.0.1:8000/api/shares/38/snapshots/Documents_2_replication_1/repclone

Also, I set up a testing share and got it to replicate, but the files I put in it never arrive in the testing share on the other machine. They are supposed to, correct?

Info from the receive job:

ID Snapshot Start Time End Time Status Duration Data transferred at rate
33 testing_11_replication_33 September 11th 2024, 2:40:03 pm September 11th 2024, 2:40:04 pm succeeded in a few seconds 0 or < 1KB at N/A/sec
32 testing_11_replication_1 September 11th 2024, 2:29:03 pm September 11th 2024, 2:29:04 pm succeeded in a few seconds 35.00 KB at 32.11 KB/sec

Yes, I believe it should. I think we probably need @phillxnet or someone who has used the replication service more than I have to give you a better answer.

Here is the other error I’m getting:
[11/Sep/2024 16:11:04] ERROR [smart_manager.replication.sender:79] Id: c0450590-f2b9-de11-8a1b-90e6ba7a072b-12. b'unexpected reply(b\'receiver-error\') for c0450590-f2b9-de11-8a1b-90e6ba7a072b-12. extended reply: b"b\'Failed to promote the oldest Snapshot to Share.\'. Exception: 500 Server Error: Internal Server Error for url: http://127.0.0.1:8000/api/shares/38/snapshots/Documents_2_replication_1/repclone". Aborting'. Exception: b'unexpected reply(b\'receiver-error\') for c0450590-f2b9-de11-8a1b-90e6ba7a072b-12. extended reply: b"b\'Failed to promote the oldest Snapshot to Share.\'. Exception: 500 Server Error: Internal Server Error for url: http://127.0.0.1:8000/api/shares/38/snapshots/Documents_2_replication_1/repclone". Aborting'
[11/Sep/2024 16:58:04] ERROR [smart_manager.replication.sender:79] Id: c0450590-f2b9-de11-8a1b-90e6ba7a072b-13. b'unexpected reply(b\'receiver-error\') for c0450590-f2b9-de11-8a1b-90e6ba7a072b-13. extended reply: b"b\'Failed to promote the oldest Snapshot to Share.\'. Exception: 500 Server Error: Internal Server Error for url: http://127.0.0.1:8000/api/shares/39/snapshots/Music_3_replication_1/repclone". Aborting'. Exception: b'unexpected reply(b\'receiver-error\') for c0450590-f2b9-de11-8a1b-90e6ba7a072b-13. extended reply: b"b\'Failed to promote the oldest Snapshot to Share.\'. Exception: 500 Server Error: Internal Server Error for url: http://127.0.0.1:8000/api/shares/39/snapshots/Music_3_replication_1/repclone". Aborting'

@Stevek I can chip-in a little on this one, along with what @Hooverdan has cleared up thus far:

Re:

We have need of the long-overdue replication docs section improvements, I think:

The last time we tended to the replication system, where each replication job concerns itself with only a single share, was in the following GitHub pull request, where there is also a proof of function following the changes presented there (now merged). We had updated some dependencies earlier in the current testing phase and that had broken the function we had before. This PR restored that same function under our newer Python etc.:

Re:

That is the stage when the weirdly named snapshots:

is turned into a proper share and the strangely named ‘share’ disappears from the Web-UI.

The missing docs have again caused some confusion. Folks expect the first replication to have copied over a share. It does, to a remote snapshot-type share. Only after 3 replication events does the target share appear. We do this to have some more redundancy. The btrfs send/receive system that our replication wraps sends the differences between a share and a snapshot. Hence we create snapshots of the source share and send the difference: this results in a strangely named first send, and only after 3 send (replication) events do things smooth out on the Web-UI. That first visibility of the strangely named snapshot/share thing is actually a bug. A confusing one, but it shows some of the initial function. When previously working on replication I attempted to hide that weirdly named share, but the fact that snapshots/shares/subvolumes are all simply btrfs subvolumes caused some confusion in the code as well!
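
For reference, the ‘promotion’ step is essentially the oldest received (read-only) snapshot being flipped to read-write so it can take over as the target share; roughly along these lines (path placeholders, illustration only; the property set command is the same one visible in the error log earlier in this thread):

# received snapshots arrive read-only on the target machine
btrfs property get /mnt2/Raid-Pool/.snapshots/<source-uuid>_Music/Music_2_replication_1 ro
# ro=true

# promotion flips the oldest one to read-write so it can act as the share
btrfs property set /mnt2/Raid-Pool/.snapshots/<source-uuid>_Music/Music_2_replication_1 ro false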

I would suggest that your ‘oldest Snapshot to Share’ event could have been from you making changes while the process was still settling.

Do as you did before and create a test share with identifiable small content. Delete all existing send/receive tasks and start out fresh, sending only this small share. Set the replication task to happen every 5 or 10 mins (if dealing with very small amounts of data) and watch what happens. After 5 replication events it settles into a stable state, where each replication shuffles along the 3 snapshots held at each end (send/receive). It’s confusing without docs, especially when unfamiliar with the other things we do here, i.e. the double mount etc. Btrfs has a lot of apparent ‘magic’ and sometimes it’s tricky to get to grips with what’s going on. The Rockstor Web-UI attempts to present what is complex in a simple manner: that it fails to do this in some areas is the basic answer here. Plus we have some bugs that push us in the wrong direction. However, overall we are progressing, and the replication code/procedure is in need of more attention, folks familiar with it, better docs, and explanations.
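
If you want to watch that shuffle from the command line while the schedule runs, something like the following on each end should show the three snapshots rotating (pool names here are placeholders, adjust to your systems):

# list only snapshot subvolumes for the pool
btrfs subvolume list -s /mnt2/<pool_name>

# the replication snapshots live under the pool's .snapshots directory
ls -al /mnt2/<pool_name>/.snapshots/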

Hope that helps, at least with some context.

Thanks @phillxnet Phil and @Hooverdan Dan

I had 3 successful replications, then on the 4th I got an error. I have been messing with this for a week.

[16/Sep/2024 13:27:03] INFO [fs.btrfs:1492] Mount Point: /mnt2/Data_1 has indeterminate quota status, skipping qgroup show.
Try 'btrfs quota disable /mnt2/Data_1'.
[16/Sep/2024 13:27:04] ERROR [smart_manager.replication.sender:79] Id: c0450590-f2b9-de11-8a1b-90e6ba7a072b-28. b'unexpected reply(b\'receiver-error\') for c0450590-f2b9-de11-8a1b-90e6ba7a072b-28. extended reply: b"b\'Failed to promote the oldest Snapshot to Share.\'. Exception: 500 Server Error: Internal Server Error for url: http://127.0.0.1:8000/api/shares/10/snapshots/testing_28_replication_69/repclone". Aborting'. Exception: b'unexpected reply(b\'receiver-error\') for c0450590-f2b9-de11-8a1b-90e6ba7a072b-28. extended reply: b"b\'Failed to promote the oldest Snapshot to Share.\'. Exception: 500 Server Error: Internal Server Error for url: http://127.0.0.1:8000/api/shares/10/snapshots/testing_28_replication_69/repclone". Aborting'

What else can I give you to help with this?

Why the wait to create the share?

@Stevek Hello again,
Re:

Was this with a clean setup, i.e. no snapshots at either end (sender/receiver), and the receiver does not have a share by the same name? The replication system creates the share after a few replication tasks. If it already exists, then the replication system cannot promote a replication snapshot to a share of the same name.

Start out clean and get to know what to expect, with a small share on a frequent replication schedule, then watch both ends. Also, the same failure can be reported in a more informative manner from one end than the other, so always check the failure report from both ends.

Hope that helps.

That is what I did. I have a small number of files in the share ‘testing’. It replicated 3 times successfully and the 4th failed trying to create the share. I do not have a share named ‘testing’ on the receiver.

Also, on the sending system I all of a sudden have this:
[screenshot]

I rebooted and it is the same. HELP

@phillxnet I set up a new share with a different name so there would not be any overlap, and replication fails on the 4th and 5th run with the same error: unable to copy the replication to a share.

Once it fails, it deletes one of the weird shares with all the numbers and the share name at the end, but it does not remove the replication share, and the file system for that share ends up being RO, and then everything after that is garbage. The only way I can get rid of that replication is to use the btrfs command @Hooverdan gave me.

Just tried going from Vault2 to Vault, which is the opposite direction from all of the problems reported above, so I’m thinking it is a bug.

I set up a similar scenario just now. In my case it also failed after the third replication. I will post my setup and details I can find tomorrow.

Once I set this up at around 19:45, i.e. 5 mins before the first replication would start, I didn’t touch the system until an hour later or so, to ensure that I would not interfere with anything during the replication process. As one can see, during the second and third replications essentially no data is sent, since I did not do anything with the data set I had generated prior to the replication configuration.

Sender:

Receiver:

Sending machine: quotas enabled, shares set up below with some data (the warning is a quota warning; the disk itself has 6GB of which 5.74GB were consumed, but the quota is set to 1GB):

Receiving machine only had these shares:


rockwurst:/mnt2/Replication # tree
.
├── dir_1
│   ├── dir_1
│   │   ├── file_1
│   │   ├── file_2
│   │   └── file_3
│   ├── dir_2
│   │   ├── file_1
│   │   ├── file_2
│   │   └── file_3
│   └── dir_3
│       ├── file_1
│       ├── file_2
│       └── file_3
└── index.sh

[screenshot]

Replication Send Process setup:

Send Process:

Receive process:

After the first replication there were two additional shares on the receiving machine; however, I forgot to capture that. After disabling the replication, this is how the share picture looks:

Replication Receiver History:

(After the first three replications it still showed as successful; the screenshot is after the last attempt):

Snapshots on sending system after last failure (I don’t remember whether there were more initially, or not):

Snapshots list on receiving system (same here, not sure whether/how those listings changed over time):

Log on sending machine between 7:50 and 8:50pm (/opt/rockstor/var/log/rockstor.log):

...
[16/Sep/2024 19:50:04] INFO [smart_manager.replication.sender:341] Id: 6f32cb58-f849-4c93-bc65-6ebda422c66d-5. Sending full replica: /mnt2/rockwurst/.snapshots/Replication/Replication_5_replication_1

[16/Sep/2024 20:00:03] INFO [smart_manager.replication.sender:335] Id: 6f32cb58-f849-4c93-bc65-6ebda422c66d-5. Sending incremental replica between /mnt2/rockwurst/.snapshots/Replication/Replication_5_replication_1 -- /mnt2/rockwurst/.snapshots/Replication/Replication_5_replication_24

[16/Sep/2024 20:10:02] INFO [smart_manager.replication.sender:335] Id: 6f32cb58-f849-4c93-bc65-6ebda422c66d-5. Sending incremental replica between /mnt2/rockwurst/.snapshots/Replication/Replication_5_replication_24 -- /mnt2/rockwurst/.snapshots/Replication/Replication_5_replication_25

[16/Sep/2024 20:20:03] ERROR [smart_manager.replication.sender:79] Id: 6f32cb58-f849-4c93-bc65-6ebda422c66d-5. b'unexpected reply(b\'receiver-error\') for 6f32cb58-f849-4c93-bc65-6ebda422c66d-5. extended reply: b"b\'Failed to promote the oldest Snapshot to Share.\'. Exception: 500 Server Error: Internal Server Error for url: http://127.0.0.1:8000/api/shares/15/snapshots/Replication_5_replication_1/repclone". Aborting'. Exception: b'unexpected reply(b\'receiver-error\') for 6f32cb58-f849-4c93-bc65-6ebda422c66d-5. extended reply: b"b\'Failed to promote the oldest Snapshot to Share.\'. Exception: 500 Server Error: Internal Server Error for url: http://127.0.0.1:8000/api/shares/15/snapshots/Replication_5_replication_1/repclone". Aborting'

[16/Sep/2024 20:30:02] ERROR [storageadmin.util:44] Exception: Snapshot (Replication_5_replication_26) already exists for the share (Replication). NoneType: None

[16/Sep/2024 20:30:02] ERROR [smart_manager.replication.sender:79] Id: 6f32cb58-f849-4c93-bc65-6ebda422c66d-5. b'Failed to create snapshot: Replication_5_replication_26. Aborting.'. Exception: 500 Server Error: Internal Server Error for url: http://127.0.0.1:8000/api/shares/14/snapshots/Replication_5_replication_26

[16/Sep/2024 20:40:02] ERROR [storageadmin.util:44] Exception: Snapshot (Replication_5_replication_26) already exists for the share (Replication). NoneType: None

[16/Sep/2024 20:40:02] ERROR [smart_manager.replication.sender:79] Id: 6f32cb58-f849-4c93-bc65-6ebda422c66d-5. b'Failed to create snapshot: Replication_5_replication_26. Aborting.'. Exception: 500 Server Error: Internal Server Error for url: http://127.0.0.1:8000/api/shares/14/snapshots/Replication_5_replication_26

[16/Sep/2024 20:50:02] ERROR [storageadmin.util:44] Exception: Snapshot (Replication_5_replication_26) already exists for the share (Replication). NoneType: None

[16/Sep/2024 20:50:02] ERROR [smart_manager.replication.sender:79] Id: 6f32cb58-f849-4c93-bc65-6ebda422c66d-5. b'Failed to create snapshot: Replication_5_replication_26. Aborting.'. Exception: 500 Server Error: Internal Server Error for url: http://127.0.0.1:8000/api/shares/14/snapshots/Replication_5_replication_26

[16/Sep/2024 21:00:02] ERROR [storageadmin.util:44] Exception: Snapshot (Replication_5_replication_26) already exists for the share (Replication). NoneType: None

[16/Sep/2024 21:00:02] ERROR [smart_manager.replication.sender:79] Id: 6f32cb58-f849-4c93-bc65-6ebda422c66d-5. b'Failed to create snapshot: Replication_5_replication_26. Aborting.'. Exception: 500 Server Error: Internal Server Error for url: http://127.0.0.1:8000/api/shares/14/snapshots/Replication_5_replication_26
...

Log for receiving machine (/opt/rockstor/var/log/rockstor.log):

...
[16/Sep/2024 20:20:02] INFO [storageadmin.views.snapshot:61] Supplanting share (6f32cb58-f849-4c93-bc65-6ebda422c66d_Replication) with snapshot (.snapshots/6f32cb58-f849-4c93-bc65-6ebda422c66d_Replication/Replication_5_replication_1).

[16/Sep/2024 20:20:03] ERROR [system.osi:287] non-zero code(1) returned by command: ['/usr/sbin/btrfs', 'qgroup', 'destroy', '0/257', '/mnt2/fresse_storage']. output: [''] error: ['ERROR: unable to destroy quota group: Device or resource busy', '']

[16/Sep/2024 20:20:03] ERROR [storageadmin.util:44] Exception: Error running a command. cmd = /usr/sbin/btrfs qgroup destroy 0/257 /mnt2/fresse_storage. rc = 1. stdout = ['']. stderr = ['ERROR: unable to destroy quota group: Device or resource busy', '']

Traceback (most recent call last):
File "/opt/rockstor/src/rockstor/storageadmin/views/clone_helpers.py", line 92, in create_repclone
remove_share(share.pool, share.name, PQGROUP_DEFAULT)
File "/opt/rockstor/src/rockstor/fs/btrfs.py", line 1196, in remove_share
qgroup_destroy(qgroup, root_pool_mnt)
File "/opt/rockstor/src/rockstor/fs/btrfs.py", line 1501, in qgroup_destroy
return run_command([BTRFS, "qgroup", "destroy", qid, mnt_pt], log=True)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/opt/rockstor/src/rockstor/system/osi.py", line 289, in run_command
raise CommandException(cmd, out, err, rc)
system.exceptions.CommandException: Error running a command. cmd = /usr/sbin/btrfs qgroup destroy 0/257 /mnt2/fresse_storage. rc = 1. stdout = ['']. stderr = ['ERROR: unable to destroy quota group: Device or resource busy', '']

[16/Sep/2024 20:20:03] ERROR [smart_manager.replication.receiver:100] b'Failed to promote the oldest Snapshot to Share.'. Exception: 500 Server Error: Internal Server Error for url: http://127.0.0.1:8000/api/shares/15/snapshots/Replication_5_replication_1/repclone

[16/Sep/2024 20:30:02] ERROR [smart_manager.replication.receiver:100] b'Failed to promote the oldest Snapshot to Share.'. Exception: Share matching query does not exist.

[16/Sep/2024 20:40:02] ERROR [smart_manager.replication.receiver:100] b'Failed to promote the oldest Snapshot to Share.'. Exception: Share matching query does not exist.

[16/Sep/2024 20:50:02] ERROR [smart_manager.replication.receiver:100] b'Failed to promote the oldest Snapshot to Share.'. Exception: Share matching query does not exist.

[16/Sep/2024 21:00:02] ERROR [smart_manager.replication.receiver:100] b'Failed to promote the oldest Snapshot to Share.'. Exception: Share matching query does not exist.
...

Journalctl sending system for first replication:

...
Sep 16 19:50:00 fresse CRON[13351]: (root) CMD (/opt/rockstor/.venv/bin/st-snapshot 2 \*-*-*-*-*-*)
Sep 16 19:50:01 fresse CRON[13350]: (root) CMDEND (/opt/rockstor/.venv/bin/st-snapshot 2 \*-*-*-*-*-*)
Sep 16 19:50:01 fresse postfix/pickup[9736]: 3855A50354: uid=0 from=<root>
Sep 16 19:50:01 fresse postfix/cleanup[13441]: 3855A50354: message-id=<20240917025001.3855A50354@localhost.localdomain>
Sep 16 19:50:01 fresse postfix/qmgr[1306]: 3855A50354: from=<root@localhost.localdomain>, size=1290, nrcpt=1 (queue active)
Sep 16 19:50:01 fresse postfix/local[13443]: 3855A50354: to=<root@localhost.localdomain>, orig_to=<root>, relay=local, delay=0.18, delays=0.16/0.01/0/0.02, dsn=2.0.0, status=sent (delivered to mailbox)
Sep 16 19:50:01 fresse postfix/qmgr[1306]: 3855A50354: removed
Sep 16 19:51:56 fresse systemd[1]: Starting Hostname Service...
Sep 16 19:51:56 fresse systemd[1]: Started Hostname Service.
Sep 16 19:52:26 fresse systemd[1]: systemd-hostnamed.service: Deactivated successfully.
Sep 16 19:53:00 fresse NetworkManager[866]: <info> [1726541580.7696] dhcp4 (enp0s8): state changed new lease, address=192.168.56.112
Sep 16 19:53:00 fresse systemd[1]: Starting Network Manager Script Dispatcher Service...
Sep 16 19:53:00 fresse systemd[1]: Started Network Manager Script Dispatcher Service.
Sep 16 19:53:01 fresse dns-dnsmasq.sh[14517]: <debug> NETWORKMANAGER_DNS_FORWARDER is not set to "dnsmasq" in /etc/sysconfig/network/config -> exit
Sep 16 19:53:01 fresse nm-dispatcher[14528]: /etc/NetworkManager/dispatcher.d/ntp: line 11: netconfig: command not found
Sep 16 19:53:01 fresse nm-dispatcher[14498]: req:1 'dhcp4-change' [enp0s8], "/etc/NetworkManager/dispatcher.d/ntp": complete: process failed with Script '/etc/NetworkManager/dispatcher.d/ntp' exited with status 127
Sep 16 19:53:01 fresse NetworkManager[866]: <warn> [1726541581.2191] dispatcher: (20) /etc/NetworkManager/dispatcher.d/ntp failed (failed): Script '/etc/NetworkManager/dispatcher.d/ntp' exited with status 127
Sep 16 19:53:11 fresse systemd[1]: NetworkManager-dispatcher.service: Deactivated successfully.
Sep 16 19:55:00 fresse CRON[15233]: (root) CMD (/opt/rockstor/.venv/bin/st-snapshot 2 \*-*-*-*-*-*)
Sep 16 19:55:02 fresse CRON[15232]: (root) CMDEND (/opt/rockstor/.venv/bin/st-snapshot 2 \*-*-*-*-*-*)
Sep 16 19:55:02 fresse postfix/pickup[9736]: 5654E503BF: uid=0 from=<root>
Sep 16 19:55:02 fresse postfix/cleanup[15367]: 5654E503BF: message-id=<20240917025502.5654E503BF@localhost.localdomain>
Sep 16 19:55:02 fresse postfix/qmgr[1306]: 5654E503BF: from=<root@localhost.localdomain>, size=1290, nrcpt=1 (queue active)
Sep 16 19:55:02 fresse postfix/local[15369]: 5654E503BF: to=<root@localhost.localdomain>, orig_to=<root>, relay=local, delay=0.16, delays=0.13/0.01/0/0.02, dsn=2.0.0, status=sent (delivered to mailbox)
Sep 16 19:55:02 fresse postfix/qmgr[1306]: 5654E503BF: removed
Sep 16 19:58:00 fresse NetworkManager[866]: <info> [1726541880.7556] dhcp4 (enp0s8): state changed new lease, address=192.168.56.112
Sep 16 19:58:00 fresse systemd[1]: Starting Network Manager Script Dispatcher Service...
Sep 16 19:58:00 fresse systemd[1]: Started Network Manager Script Dispatcher Service.
Sep 16 19:58:00 fresse dns-dnsmasq.sh[16366]: <debug> NETWORKMANAGER_DNS_FORWARDER is not set to "dnsmasq" in /etc/sysconfig/network/config -> exit
Sep 16 19:58:00 fresse nm-dispatcher[16375]: /etc/NetworkManager/dispatcher.d/ntp: line 11: netconfig: command not found
Sep 16 19:58:00 fresse nm-dispatcher[16353]: req:1 'dhcp4-change' [enp0s8], "/etc/NetworkManager/dispatcher.d/ntp": complete: process failed with Script '/etc/NetworkManager/dispatcher.d/ntp' exited with status 127
Sep 16 19:58:00 fresse NetworkManager[866]: <warn> [1726541880.8460] dispatcher: (21) /etc/NetworkManager/dispatcher.d/ntp failed (failed): Script '/etc/NetworkManager/dispatcher.d/ntp' exited with status 127
Sep 16 19:58:10 fresse systemd[1]: NetworkManager-dispatcher.service: Deactivated successfully.
...

This is essentially repeated every 10 minutes for the first three replications. For the fourth replication at 8:20pm (which is the first one failing), some additional messages are logged during that time (not sure whether they’re related or not):

...
Sep 16 20:20:00 fresse CRON[23607]: (root) CMD (/opt/rockstor/.venv/bin/st-snapshot 2 \*-*-*-*-*-*)
Sep 16 20:20:00 fresse CRON[23606]: (root) CMDEND (/opt/rockstor/.venv/bin/st-snapshot 2 \*-*-*-*-*-*)
Sep 16 20:20:00 fresse postfix/pickup[9736]: DD97E505ED: uid=0 from=<root>
Sep 16 20:20:00 fresse postfix/cleanup[23696]: DD97E505ED: message-id=<20240917032000.DD97E505ED@localhost.localdomain>
Sep 16 20:20:00 fresse postfix/qmgr[1306]: DD97E505ED: from=<root@localhost.localdomain>, size=1290, nrcpt=1 (queue active)
Sep 16 20:20:00 fresse postfix/local[23698]: DD97E505ED: to=<root@localhost.localdomain>, orig_to=<root>, relay=local, delay=0.18, delays=0.15/0.01/0/0.02, dsn=2.0.0, status=sent (delivered to mailbox)
Sep 16 20:20:00 fresse postfix/qmgr[1306]: DD97E505ED: removed
Sep 16 20:20:03 fresse systemd[1]: mnt2-6f32cb58\x2df849\x2d4c93\x2dbc65\x2d6ebda422c66d_Replication.mount: Deactivated successfully.
Sep 16 20:23:00 fresse NetworkManager[866]: <info> [1726543380.7602] dhcp4 (enp0s8): state changed new lease, address=192.168.56.112
Sep 16 20:23:00 fresse systemd[1]: Starting Network Manager Script Dispatcher Service...
Sep 16 20:23:00 fresse systemd[1]: Started Network Manager Script Dispatcher Service.
Sep 16 20:23:00 fresse dns-dnsmasq.sh[24626]: <debug> NETWORKMANAGER_DNS_FORWARDER is not set to "dnsmasq" in /etc/sysconfig/network/config -> exit
Sep 16 20:23:00 fresse nm-dispatcher[24635]: /etc/NetworkManager/dispatcher.d/ntp: line 11: netconfig: command not found
Sep 16 20:23:00 fresse nm-dispatcher[24612]: req:1 'dhcp4-change' [enp0s8], "/etc/NetworkManager/dispatcher.d/ntp": complete: process failed with Script '/etc/NetworkManager/dispatcher.d/ntp' exited with status 127
Sep 16 20:23:00 fresse NetworkManager[866]: <warn> [1726543380.8472] dispatcher: (26) /etc/NetworkManager/dispatcher.d/ntp failed (failed): Script '/etc/NetworkManager/dispatcher.d/ntp' exited with status 127
Sep 16 20:23:10 fresse systemd[1]: Started Daily Cleanup of Snapper Snapshots.
Sep 16 20:23:10 fresse systemd[1]: NetworkManager-dispatcher.service: Deactivated successfully.
Sep 16 20:23:10 fresse systemd[1]: Starting DBus interface for snapper...
Sep 16 20:23:10 fresse systemd[1]: Started DBus interface for snapper.
Sep 16 20:23:10 fresse systemd[1]: snapper-cleanup.service: Deactivated successfully.
Sep 16 20:24:11 fresse systemd[1]: snapperd.service: Deactivated successfully.
Sep 16 20:25:00 fresse CRON[25204]: (root) CMD (/opt/rockstor/.venv/bin/st-snapshot 2 \*-*-*-*-*-*)
Sep 16 20:25:01 fresse CRON[25203]: (root) CMDEND (/opt/rockstor/.venv/bin/st-snapshot 2 \*-*-*-*-*-*)
Sep 16 20:25:01 fresse postfix/pickup[9736]: BB32B50655: uid=0 from=<root>
Sep 16 20:25:01 fresse postfix/cleanup[25294]: BB32B50655: message-id=<20240917032501.BB32B50655@localhost.localdomain>
Sep 16 20:25:01 fresse postfix/qmgr[1306]: BB32B50655: from=<root@localhost.localdomain>, size=1290, nrcpt=1 (queue active)
Sep 16 20:25:01 fresse postfix/local[25296]: BB32B50655: to=<root@localhost.localdomain>, orig_to=<root>, relay=local, delay=0.16, delays=0.13/0.01/0/0.02, dsn=2.0.0, status=sent (delivered to mailbox)
Sep 16 20:25:01 fresse postfix/qmgr[1306]: BB32B50655: removed
Sep 16 20:28:00 fresse NetworkManager[866]: <info> [1726543680.7618] dhcp4 (enp0s8): state changed new lease, address=192.168.56.112
Sep 16 20:28:00 fresse systemd[1]: Starting Network Manager Script Dispatcher Service...
Sep 16 20:28:00 fresse systemd[1]: Started Network Manager Script Dispatcher Service.
Sep 16 20:28:00 fresse dns-dnsmasq.sh[26163]: <debug> NETWORKMANAGER_DNS_FORWARDER is not set to "dnsmasq" in /etc/sysconfig/network/config -> exit
Sep 16 20:28:00 fresse nm-dispatcher[26172]: /etc/NetworkManager/dispatcher.d/ntp: line 11: netconfig: command not found
Sep 16 20:28:00 fresse nm-dispatcher[26149]: req:1 'dhcp4-change' [enp0s8], "/etc/NetworkManager/dispatcher.d/ntp": complete: process failed with Script '/etc/NetworkManager/dispatcher.d/ntp' exited with status 127
Sep 16 20:28:00 fresse NetworkManager[866]: <warn> [1726543680.8576] dispatcher: (27) /etc/NetworkManager/dispatcher.d/ntp failed (failed): Script '/etc/NetworkManager/dispatcher.d/ntp' exited with status 127
...

For some reason on the receiving system, no journalctl entries before today are visible (receiving system runs on Tumbleweed with Kernel 6.10.9 and btrfs-progs 6.10.1, while sending system runs on Leap 15.6 with Kernel 6.4.0 and btrfs-progs 6.5.1, both on Rockstor 5.0.14-0).

@Stevek from your recollection, did you observe anything in addition to the above?

To ensure that there wasn’t any quota processing getting in the way, I recreated the scenario, but this time I disabled quotas at the pool level on the receiving system.

It did not seem to make a difference; it still failed at the fourth replication with the same error message on the receiving system:

[17/Sep/2024 13:55:02] INFO [storageadmin.views.snapshot:61] Supplanting share (6f32cb58-f849-4c93-bc65-6ebda422c66d_Replication) with snapshot (.snapshots/6f32cb58-f849-4c93-bc65-6ebda422c66d_Replication/Replication_6_replication_1).
[17/Sep/2024 13:55:03] ERROR [storageadmin.util:44] Exception: Error running a command. cmd = /usr/sbin/btrfs property set /mnt2/fresse_storage/.snapshots/6f32cb58-f849-4c93-bc65-6ebda422c66d_Replication/Replication_6_replication_1 ro false. rc = 1. stdout = ['']. stderr = ['ERROR: cannot flip ro->rw with received_uuid set, use force option -f if you really want unset the read-only status. The value of received_uuid is used for incremental send, consider making a snapshot instead. Read more at btrfs-subvolume(8) and Subvolume flags.', '']
Traceback (most recent call last):
  File "/opt/rockstor/src/rockstor/storageadmin/views/clone_helpers.py", line 94, in create_repclone
    set_property(snap_path, "ro", "false", mount=False)
  File "/opt/rockstor/src/rockstor/fs/btrfs.py", line 2314, in set_property
    return run_command(cmd)
           ^^^^^^^^^^^^^^^^
  File "/opt/rockstor/src/rockstor/system/osi.py", line 289, in run_command
    raise CommandException(cmd, out, err, rc)
system.exceptions.CommandException: Error running a command. cmd = /usr/sbin/btrfs property set /mnt2/fresse_storage/.snapshots/6f32cb58-f849-4c93-bc65-6ebda422c66d_Replication/Replication_6_replication_1 ro false. rc = 1. stdout = ['']. stderr = ['ERROR: cannot flip ro->rw with received_uuid set, use force option -f if you really want unset the read-only status. The value of received_uuid is used for incremental send, consider making a snapshot instead. Read more at btrfs-subvolume(8) and Subvolume flags.', '']
[17/Sep/2024 13:55:03] ERROR [smart_manager.replication.receiver:100] b'Failed to promote the oldest Snapshot to Share.'. Exception: 500 Server Error: Internal Server Error for url: http://127.0.0.1:8000/api/shares/18/snapshots/Replication_6_replication_1/repclone

Using the -f option in the above command manually seems to change the ro property to rw:

/usr/sbin/btrfs property get /mnt2/fresse_storage/.snapshots/6f32cb58-f849-4c93-bc65-6ebda422c66d_Replication/Replication_6_replication_1
ro=false

Interestingly, when I shoehorned the -f into the code and tried this again, it suddenly didn’t seem to recognize the -f flag.

system.exceptions.CommandException: Error running a command. cmd = /usr/sbin/btrfs property set -f /mnt2/fresse_storage/.snapshots/6f32cb58-f849-4c93-bc65-6ebda422c66d_Replication/Replication_7_replication_1 ro false. rc = 1. stdout = ['']. stderr = ["btrfs property: unknown token 'set -f'", 'usage: btrfs property get/ ...

But in any case, it is still failing after the third replication.
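
For reference, the force flag has to be passed to btrfs as its own argument; the manual workaround presumably looked something like this (same path as in the log above):

# force the ro->rw flip despite received_uuid being set
btrfs property set -f /mnt2/fresse_storage/.snapshots/6f32cb58-f849-4c93-bc65-6ebda422c66d_Replication/Replication_6_replication_1 ro false

# confirm the flip
btrfs property get /mnt2/fresse_storage/.snapshots/6f32cb58-f849-4c93-bc65-6ebda422c66d_Replication/Replication_6_replication_1 ro

The "unknown token 'set -f'" error above suggests that, when patching the code, "-f" would need to be appended to the command list as a separate element rather than concatenated onto "set".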

@Hooverdan Dan, you seem to have done a great job documenting exactly what I am seeing, including the disappearing number/replication-name share being removed, and the replication share going RO and having to use btrfs. One thing I noticed is that I started getting messages like ‘BTRFS info (device xxxx): qgroup scan completed (inconsistency flag cleared)’, especially if I touch any files on my systems. Before the failing replication I was not getting these errors. Re: your last comment... it seems that once the failure happens the system acts differently.

Both my systems are on the latest code, so I would expect the same behavior, but even the logwatch command acts differently: on one system when entered with no parms, and on the other it runs with no output.

Let me know if I can do anything to help figure this out. I would still like to understand why it waits until the third replication to move to a share. When I replicate something I expect to see the data when the job completes, not after three runs which might be days apart.

UPDATE... I also have this issue now: I cannot re-enable quotas. If I do it in the GUI it completes but then comes right back as disabled. Any thoughts on how I can fix this?

Name Size Usage Quotas Raid Active mount options / Status Compression Extra mount options Disks Actions
Data-2_1TB 931.51 GB 251.20 GB (26.97 %) Enabled single rw,relatime,space_cache=v2,subvolid=5,subvol=/ no None "ata-ST31000528AS_5VP1AAS4 "
Data-3_1TB 931.51 GB 251.20 GB (26.97 %) Enabled single rw,relatime,space_cache=v2,subvolid=5,subvol=/ no None "ata-ST31000528AS_6VP3GE7X "
Data_1 931.51 GB 220.22 GB (23.64 %) Disabled single rw,relatime,space_cache=v2,subvolid=5,subvol=/ no None "ata-WDC_WD10EARX-00N0YB0_WD-WMC0S0882589 "

You could try to do this at the command line. I seem to remember some thread where that was happening in the UI, but it was possible to do it using the command line directly, e.g.:

btrfs quota enable /mnt2/Data-1
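
As a quick follow-up check (assuming the pool is mounted under /mnt2 as usual), listing its qgroups should work once quotas are actually enabled:

# enable quotas, then list the qgroups for the pool
btrfs quota enable /mnt2/Data_1
btrfs qgroup show /mnt2/Data_1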

I ran the command and there was no change. I have made copies of the data. Should I delete the original shares, then remove and re-add the disk, or is there another option?

Documents 931.51 GB 0 bytes 0 bytes rw,relatime,space_cache=v2,subvolid=261,subvol=/Documents Data_1 (rw,relatime,space_cache=v2,subvolid=5,subvol=/) Quotas Disabled Same as Pool
Documents-1 150.00 GB 10.02 GB 10.02 GB rw,relatime,space_cache=v2,subvolid=262,subvol=/Documents-1 Data-3_1TB (rw,relatime,space_cache=v2,subvolid=5,subvol=/) Quotas Enabled Same as Pool
home 57.56 GB 16.00 KB 16.00 KB rw,relatime,ssd,discard=async,space_cache,subvolid=259,subvol=/@/home ROOT (rw,relatime,ssd,discard=async,space_cache,subvolid=256,subvol=/@) Quotas Enabled Same as Pool
Music 931.51 GB 0 bytes 0 bytes rw,relatime,space_cache=v2,subvolid=259,subvol=/Music Data_1 (rw,relatime,space_cache=v2,subvolid=5,subvol=/) Quotas Disabled Same as Pool
Music-1 250.00 GB 193.72 GB 193.72 GB rw,relatime,space_cache=v2,subvolid=263,subvol=/Music-1 Data-3_1TB (rw,relatime,space_cache=v2,subvolid=5,subvol=/) Quotas Enabled Same as Pool
Newone 1.00 GB 824.00 KB 824.00 KB rw,relatime,space_cache=v2,subvolid=257,subvol=/Newone Data-3_1TB (rw,relatime,space_cache=v2,subvolid=5,subvol=/) Quotas Enabled Same as Pool
Photos 931.51 GB 0 bytes 0 bytes rw,relatime,space_cache=v2,subvolid=260,subvol=/Photos Data_1 (rw,relatime,space_cache=v2,subvolid=5,subvol=/) Quotas Disabled Same as Pool
Photos-1 200.00 GB 46.50 GB 46.50 GB rw,relatime,space_cache=v2,subvolid=264,subvol=/Photos-1 Data-3_1TB (rw,relatime,space_cache=v2,subvolid=5,subvol=/) Quotas Enabled Same as Pool
ROCK-ONS 5.00 GB 232.00 KB 232.00 KB rw,relatime,space_cache=v2,subvolid=256,subvol=/ROCK-ONS Data-3_1TB (rw,relatime,space_cache=v2,subvolid=5,subvol=/) Quotas Enabled Same as Pool

Fundamentally, Rockstor does (still) not enforce quotas as a constraint; however, they help the dashboard show more relevant usage for a given share. So you don’t have to go down the path you described above, unless you really want to.

Did you get any output when you enabled the quotas via command line?

Since I am not sure whether that quota status is persisted somewhere or re-determined every time you refresh that screen, can you run:

btrfs qgroup show -reF /mnt2/Data-1

As an aside, are you running the command line as the root user (e.g. via PuTTY, or an elevated prompt in the built-in shell on the UI)?

Thx for all your help…

No output after the command:
Vault:~ # btrfs quota enable /mnt2/Data_1
Vault:~ # btrfs quota

Here is the output from your suggestion:

Vault:~ # btrfs qgroup show -reF /mnt2/Data_1
ERROR: cannot find the qgroup 0/266
ERROR: can't list qgroups: No such file or directory

I do get output if I list the other drives, but not on the one with the issue. Should I create one?

I am using TERMIUS and logged on as root.
Vault:~ # whoami
root