Mysterious share - unmounted, read-only and immutable


Brief description of the problem

I have 2 appliances A and B.

Detailed step by step instructions to reproduce the problem

I have 2 appliances, A and B. The share [appliance A’s ID]_SHARE_Lan_1/SHARE_Lan_1_1_replication_1 appeared twice on B, apparently replicated from A. I have deleted one, but the second is unmounted and cannot be deleted, apparently because the file system is read-only and the i (immutable) attribute is set. Running lsattr as root reveals no attributes at all; chattr -i returns “read only”, just as in the GUI.
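
For reference, the checks I ran against the stuck subvolume looked roughly like this (same path as in the error further down):

# Look for the immutable (i) attribute on the subvolume itself; this showed no attributes set.
lsattr -d /mnt2/Pool1/.snapshots/[appliance A's ID]_SHARE_Lan_1/SHARE_Lan_1_1_replication_1

# Attempt to clear the immutable flag; this fails with "Read-only file system", matching the Web-UI error.
chattr -i /mnt2/Pool1/.snapshots/[appliance A's ID]_SHARE_Lan_1/SHARE_Lan_1_1_replication_1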

SHARE_Lan_1 exists on appliance A and is scheduled to replicate to B. It is also scheduled for snapshots on A. It contains one text file, called Test.

The replication occurred twice, at 11.00pm yesterday and 3.42am this morning (as I was testing, I had changed the scheduled time after the first run).

At 11.00pm the Share appeared on B as [appliance A’s ID]_SHARE_Lan_1/SHARE_Lan_1_1_replication_1, but without the Test file.

At 3.42am the Share appeared again on B, as [appliance A’s ID]_SHARE_Lan_1/SHARE_Lan_1_1_replication_2. I didn’t look to see whether the Test file was inside.

At the same time [appliance A’s ID]_SHARE_Lan_1/SHARE_Lan_1_1_replication_1 appeared as a second share, though this time apparently in the snapshots folder: snapshots are not scheduled on appliance B and [appliance A’s ID]_SHARE_Lan_1/SHARE_Lan_1_1_replication_1 does not appear in the ‘Snapshots’ tab on B.

Thinking to start afresh, I removed all SAMBA exports on B, then deleted all shares from B until I came to [appliance A’s ID]_SHARE_Lan_1/SHARE_Lan_1_1_replication_1. As you will see, it will not delete, apparently because the immutable tag is set.

Web-UI screenshot

I can’t post the screenshot, as it reveals appliance A’s ID. However, the error text reads:
Houston, we’ve had a problem.
Failed to delete the share (.snapshots/[appliance A's ID]_SHARE_Lan_1/SHARE_Lan_1_1_replication_1). Error from the OS: Error running a command. cmd = /usr/bin/chattr -i /mnt2/Pool1/.snapshots/[appliance A's ID]_SHARE_Lan_1/SHARE_Lan_1_1_replication_1. rc = 1. stdout = ['']. stderr = ['/usr/bin/chattr: Read-only file system while setting flags on /mnt2/Pool1/.snapshots/[appliance A's ID]_SHARE_Lan_1/SHARE_Lan_1_1_replication_1', '']

Error Traceback provided on the Web-UI

Traceback (most recent call last):
  File "/opt/rockstor/src/rockstor/storageadmin/views/share.py", line 381, in delete
    remove_share(share.pool, share.subvol_name, share.pqgroup, force=force)
  File "/opt/rockstor/src/rockstor/fs/btrfs.py", line 1179, in remove_share
    toggle_path_rw(subvol_mnt_pt, rw=True)
  File "/opt/rockstor/src/rockstor/system/osi.py", line 676, in toggle_path_rw
    return run_command([CHATTR, attr, path])
  File "/opt/rockstor/src/rockstor/system/osi.py", line 246, in run_command
    raise CommandException(cmd, out, err, rc)
CommandException: Error running a command. cmd = /usr/bin/chattr -i /mnt2/Pool1/.snapshots/[appliance A's ID]_SHARE_Lan_1/SHARE_Lan_1_1_replication_1. rc = 1. stdout = ['']. stderr = ['/usr/bin/chattr: Read-only file system while setting flags on /mnt2/Pool1/.snapshots/[appliance A's ID]_SHARE_Lan_1/SHARE_Lan_1_1_replication_1', '']

Thanks for any help you can give.

@john_H Welcome to the Rockstor community forum.

The replication service is in need of documentation. From memory, it takes 5 replication events in total to settle into a stable state. This is because it keeps the last 3 snapshots and only starts cleaning up once those 3 snapshots exist on both the sender and the receiver. In the early stages there is a strange (read: Web-UI misinterpreted) share that is auto-removed after a while. It may well be that you have misinterpreted this share and intervened by hand in a process that was still ‘in-process’.

To get familiar with the whole procedure: try setting up a small share with, as you have done, a small amount of data, and schedule it rapidly, say once every 5 minutes (assuming 5 minutes is enough for the transfer). That way you get to see the whole process and it should then make more sense. Our replication system is based around btrfs send/receive, which requires read-only (ro) snapshots that are then compared to the prior state so that only the changes are sent over. We have our own wrapper around btrfs send/receive so that we can report on the procedure and add things like the multiple-snapshot history.
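
For a rough picture of the underlying mechanism our wrapper drives, here is a minimal sketch. It is illustrative only: Rockstor pipes the stream via its own zmq-based service rather than ssh, and the snapshot names and paths below are made up for the example.

# Sender: take a read-only snapshot of the share (btrfs send requires ro snapshots).
btrfs subvolume snapshot -r /mnt2/Pool1/SHARE_Lan_1 /mnt2/Pool1/.snapshots/SHARE_Lan_1/snap_2

# First event: a full send. Later events: an incremental send against the previous
# snapshot (-p), so only the changes since snap_1 cross the wire.
btrfs send -p /mnt2/Pool1/.snapshots/SHARE_Lan_1/snap_1 /mnt2/Pool1/.snapshots/SHARE_Lan_1/snap_2 | ssh applianceB "btrfs receive /mnt2/Pool1/.snapshots/SHARE_Lan_1/"

# Receiver: the stream lands as a read-only subvolume, which is why received
# replication snapshots show up as read-only on appliance B.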

The last work we did on this sub-system was to update it to our new Python, and to take account of changes in zmq, which we use to pass stdout/stdin between the sender and the receiver.

I know this doesn’t answer your question entirely, but the above PR has some pics of what to expect, and without docs it is not appreciated on first use that there is a settle period; that has likely thrown your interpretation of its normal function. We do still have a weakness, inherited from the base mechanism of btrfs send/receive, where a network failure requires a reset of sorts: usually a delete of the last sending snapshot.

Let us know how you get on with further experimentation. Our wrapper has the potential to work around / mitigate said weakness re intermittent networks, but alas we need to attract more developers to become familiar and contribute for this potential to be realised. But as of the last testing, in the referenced pull request, the latest testing release was functioning as per prior stable.

For further experimentation you will need to start from scratch with a new replication schedule etc. I’ve been meaning to document this whole process for some time, but as the project maintainer I have quite the juggling job :). But all in good time, and do continue to report your findings, as this all helps show what difficulties folk have when first encountering our replication system.

I’m pretty sure what you are describing is ‘normal’ for our replication as-is. There is definite weirdness early on, within the first 5 replication events, that can lead to share/snapshot confusion within the Web-UI, which stems from a share/snapshot/clone all essentially being btrfs sub-volumes. And our Web-UI has a bug where it misinterprets replication snapshots as shares: but only in the early stages. We also block the deletion of these snapshots as they are associated with the replication service.
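
As a quick way to see that ambiguity for yourself (commands only, run as root; the exact subvolume names on your pool will differ):

# Shares, clones, snapshots and received replication snapshots all appear here
# as ordinary btrfs subvolumes; nothing at this level marks one as a 'share'.
btrfs subvolume list /mnt2/Pool1

# Inspecting a received replication snapshot shows a 'Received UUID' and the read-only flag.
btrfs subvolume show /mnt2/Pool1/.snapshots/[appliance A's ID]_SHARE_Lan_1/SHARE_Lan_1_1_replication_1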

So take a look at the screenshots in the above-referenced last update of our replication system and it may clear up part of how it works.

Hope that helps, at least for a bit of context. We have mostly been modernising our back-end in this last testing phase and plan to address more front-end Web-UI improvements in the next testing phase. So it is as yet unclear when the replication service itself might be improved.


Thanks very much: that’s really helpful.

I will let you know what happens. If nature doesn’t heal the problem as you suggest, am I safe to run:

btrfs property set [filename] ro false

so I can delete?


@john_H Hello again,

As long as you first delete the associated replication task and any exports of the associated shares, you can wipe the btrfs subvols by whatever means. Rockstor treats the Pool itself as the source of truth, at least that is our intention, so a Web-UI page refresh (or two) should pick up any by-hand subvol modifications. It can just get upset if those btrfs subvols are still involved with other activities, like the replication service, and we likely have some bugs in handling this kind of low-level intervention beneath what Rockstor normally manages for itself.
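
If you do end up doing the cleanup by hand, a minimal sketch, assuming the replication task and exports are already gone and using the path from your earlier error, would be along these lines:

# Make the received subvolume writable again (this is the read-only property your chattr ran into).
btrfs property set /mnt2/Pool1/.snapshots/[appliance A's ID]_SHARE_Lan_1/SHARE_Lan_1_1_replication_1 ro false

# Clear the immutable flag if it is set, then delete the subvolume.
chattr -i /mnt2/Pool1/.snapshots/[appliance A's ID]_SHARE_Lan_1/SHARE_Lan_1_1_replication_1
btrfs subvolume delete /mnt2/Pool1/.snapshots/[appliance A's ID]_SHARE_Lan_1/SHARE_Lan_1_1_replication_1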

Hope that helps.


It worked! The natural process, that is. Thanks again!
