Internal implementation of Pools, Shares, Snapshots and Clones

This is a wikified post documenting the system level definitions and implementation of Storage structures in Rockstor, namely, Pools, Shares, Snapshots and Clones. I’ll also document how they are laid out, mounted etc… If you’d like to suggest improvements, this is the post to edit or reply to. Feedback from advanced BTRFS users is very much appreciated.

A Pool is simply a BTRFS filesystem, and Pools can only be created with whole disk drives, not partitions. This is a Rockstor restriction, not a BTRFS one. The raid profile chosen by the user is used for both data and metadata. The command invoked to create the Pool is mkfs.btrfs -f -d <raid> -m <raid> -L <pool name> /dev/<disk1_name> /dev/<disk2_name> .... In the case of the single raid profile, the metadata profile is changed to dup. The filesystem label is set to the Pool’s name, which must be less than 255 characters in length.
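A minimal sketch of this logic, including the single-to-dup metadata switch described above. All pool and disk names are hypothetical, and the command is echoed for review rather than executed, since mkfs.btrfs destroys existing data on the target disks:

```shell
# Build the pool-creation command. Names are placeholders; the command is
# echoed rather than run because mkfs.btrfs is destructive.
make_pool_cmd() {
    pool="$1"; raid="$2"; shift 2
    meta="$raid"
    # For the "single" data profile, metadata is switched to "dup".
    [ "$raid" = "single" ] && meta="dup"
    echo "mkfs.btrfs -f -d $raid -m $meta -L $pool $*"
}

make_pool_cmd mypool raid1 /dev/sdb /dev/sdc
make_pool_cmd mypool single /dev/sdb
```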

###Pool mount point
After its creation, the Pool is mounted at /mnt2/<pool name>. If /dev/disk/by-label/<pool name> exists, the mount command is mount /dev/disk/by-label/<pool name> /mnt2/<pool name>. If not, then one of the Pool’s disks is used like so: mount /dev/<disk name> /mnt2/<pool name>.

Even though Pools are mounted, the mount points are not made public in any way by Rockstor. The intended storage containers to be used by end users are Shares.

If a compression algorithm is chosen, then a mount option compress=<algo> is added. Similarly, any extra mount options are also added. For example, if the user picks lzo compression and the noatime mount option, then the mount command becomes mount /dev/disk/by-label/<pool name> /mnt2/<pool name> -o noatime,compress=lzo.
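As a sketch of how the mount command is assembled from the chosen compression algorithm and any extra options (pool name and options are hypothetical; the command is echoed for review rather than executed):

```shell
# Assemble the pool mount command; compression and extra mount options are
# appended together as a single -o list.
make_pool_mount_cmd() {
    pool="$1"; opts="$2"
    cmd="mount /dev/disk/by-label/$pool /mnt2/$pool"
    [ -n "$opts" ] && cmd="$cmd -o $opts"
    echo "$cmd"
}

make_pool_mount_cmd mypool "noatime,compress=lzo"
make_pool_mount_cmd mypool ""
```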

###Pool deletion
When a Pool is deleted, it is simply unmounted (umount /mnt2/<pool name>) and removed from the database. The actual filesystem is untouched, which is why it can later be imported from the UI. Importing just reverses these steps.

A Share is a BTRFS subvolume. It’s created with this command: btrfs subvolume create -i <qgroup_id> /mnt2/<pool name>/<share name>.

Qgroups are used to enforce size restrictions of Shares. However, the implementation is incomplete as of 4.1.x and has been temporarily disabled as it was throwing out of space errors due to wrong accounting. I’ll complete this section when we add the support back again.

###Share mount point
Similar to a Pool, a Share is mounted at /mnt2/<share name>. The command used is mount -t btrfs -o subvol=<share name> /dev/<disk name> /mnt2/<share name>, where <disk name> is the name of one of the Pool’s disks.
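The creation and mount steps for a new Share can be sketched together like so. The qgroup flag (-i <qgroup_id>) is left out here since quota support is currently disabled; all names are hypothetical, and the commands are echoed for review rather than executed:

```shell
# Emit the create-then-mount command sequence for a new Share.
new_share_cmds() {
    pool="$1"; share="$2"; disk="$3"
    echo "btrfs subvolume create /mnt2/$pool/$share"
    echo "mount -t btrfs -o subvol=$share /dev/$disk /mnt2/$share"
}

new_share_cmds mypool myshare sdb
```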

###Share level compression
If compression is set at the Share level, then it’s turned on with this command: btrfs property set /mnt2/<share name> compression <compression algo>. Question: If compression is set at the Pool level and then at the Share level, what is the resulting behavior?

###Share deletion
When a Share is deleted, it’s unmounted (umount /mnt2/<share name>) and then deleted with btrfs subvolume delete /mnt2/<pool name>/<share name>.

A Snapshot is a BTRFS snapshot of a Subvolume (aka Share). Snapshots of a Share are placed in /mnt2/<pool name>/.snapshots/<share name>/<snapshot name>.

A read-only snapshot is created with btrfs subvolume snapshot -r /mnt2/<pool name>/<share name> /mnt2/<pool name>/.snapshots/<share name>/<snapshot name>. For read-write snapshots, the -r flag is omitted.
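A sketch of this read-only/read-write toggle, with hypothetical names; the command is echoed for review rather than executed:

```shell
# Build the snapshot command; "ro" adds the -r flag for a read-only
# snapshot, while "rw" omits it.
make_snap_cmd() {
    pool="$1"; share="$2"; snap="$3"; mode="$4"
    flag=""
    [ "$mode" = "ro" ] && flag=" -r"
    echo "btrfs subvolume snapshot$flag /mnt2/$pool/$share /mnt2/$pool/.snapshots/$share/$snap"
}

make_snap_cmd mypool myshare snap1 ro
make_snap_cmd mypool myshare snap2 rw
```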

###Visibility of Snapshots to end users
Snapshots are not mounted by default like Shares or Pools, but the admin can choose whether a Snapshot should be visible. If so, it’s mounted under the Share mount point like so: mount -o subvol=.snapshots/<share name>/<snapshot name> /dev/<disk name> /mnt2/<share name>/.<snapshot name>. Once a Snapshot is mounted under its Share’s mount point as a hidden directory, it becomes visible to the end user of the Share.

###Snapshot deletion
Deleting a Snapshot is the same as deleting a Share since both are BTRFS subvolumes. The command is btrfs subvolume delete /mnt2/<pool name>/.snapshots/<share name>/<snapshot name>.

##Cloning (a Share or a Snapshot)
Cloning is a Rockstor feature that translates to creating a read-write Snapshot of a Subvolume (Share) or of a Snapshot. So, both Shares and Snapshots can be cloned, and the end result of the cloning process is a new, first-class Share.

The command to clone a Share is btrfs subvolume snapshot /mnt2/<share name> /mnt2/<new clone/share name>.

The command to clone a Snapshot is btrfs subvolume snapshot /mnt2/<pool name>/.snapshots/<share name>/<snapshot name> /mnt2/<new clone/share name>.
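Both variants can be sketched side by side; each is a plain read-write snapshot whose destination becomes the new Share. Names are hypothetical, and the commands are echoed for review rather than executed:

```shell
# Clone an existing Share into a new Share.
clone_share_cmd() {
    share="$1"; new="$2"
    echo "btrfs subvolume snapshot /mnt2/$share /mnt2/$new"
}
# Clone an existing Snapshot into a new Share.
clone_snap_cmd() {
    pool="$1"; share="$2"; snap="$3"; new="$4"
    echo "btrfs subvolume snapshot /mnt2/$pool/.snapshots/$share/$snap /mnt2/$new"
}

clone_share_cmd myshare myclone
clone_snap_cmd mypool myshare snap1 myclone2
```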

Even though Clones are internally BTRFS Snapshots, they are presented to the user like any other Share. This can lead to interesting and somewhat unpredictable changes in usage accounting as files are deleted from the source of the Clone.

##Rolling back a Share
A Share can be rolled back to a past state represented by one of its Snapshots. It’s a four-step process.

  1. The Share is unmounted (umount /mnt2/<share name>)
  2. The Share’s subvolume is deleted (btrfs subvolume delete /mnt2/<pool name>/<share name>)
  3. The Snapshot is moved into its place (mv /mnt2/<pool name>/.snapshots/<share name>/<snapshot name> /mnt2/<pool name>/<share name>)
  4. The Share is mounted again (mount -t btrfs -o subvol=<share name> /dev/<disk name> /mnt2/<share name>)
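The steps above can be sketched as a single sequence. All names are hypothetical, and the commands are echoed for review rather than executed, since the rollback deletes the current Share subvolume:

```shell
# Emit the four rollback commands in order: unmount the Share, delete its
# subvolume, move the Snapshot into its place, and remount.
rollback_cmds() {
    pool="$1"; share="$2"; snap="$3"; disk="$4"
    echo "umount /mnt2/$share"
    echo "btrfs subvolume delete /mnt2/$pool/$share"
    echo "mv /mnt2/$pool/.snapshots/$share/$snap /mnt2/$pool/$share"
    echo "mount -t btrfs -o subvol=$share /dev/$disk /mnt2/$share"
}

rollback_cmds mypool myshare snap1 sdb
```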

Regarding the visibility of snapshots: it seems it’s not working as of now. Auto-snaps with the visibility box ticked don’t show up in the share, and neither do manually created ones. When this is working again, this should work too.

Edit: I just tried the commands that should be executed internally. If a folder already exists for the mount it works just fine, but it really seems the mount of the subvol is not being executed.

This is very interesting. However, I seem to have the problem of having a pool without disks (I removed them, not sure if via the UI or by detaching the (last) disk because of a faulty drive) that still had a share. I now can’t remove the share and thus also not the pool. How can I remove them from the database?

I think the root cause is similar to the one in this issue. I’ll update this thread once the issue is fixed.

A few questions to start :smile:

  • Why do shares at /mnt2/sharename and snapshots at /mnt2/poolname share the same namespace?
  • Does it make sense to mention /export/thing?
  • subvolume id versus subvolume names?
  • Could you rollback a share with a remount instead?

@suman ?

Hey @roweryan, sorry it took me a while to reply. I’m not sure I fully understand the question. Snapshots are placed outside of Shares to reduce clutter. If placed inside the Share, each snapshot would list the share and all of the previous snapshots, which gets confusing. But again, I’m not sure what the question is here. Please elaborate.

Could you elaborate?

Are you suggesting there is a better way to rollback compared to the procedure currently followed?

Just a quick update that the issue is now closed.