Any news on the deduplication features of BTRFS?

I have a 32 TB FreeNAS backup server that is filling up right now and could REALLY benefit from dedupe!
Unfortunately ZFS only does inline deduplication, which is dog slow, even with 32 GB of RAM… :frowning:

Right now I am evaluating Windows Server 2016. It shows great dedupe ratios: 1.7 TB of Veeam Backup (virtual machine) test data shows around 700 GB of space saved! But unfortunately RAID 5/6 write performance is REALLY slow (around 70 MB/s average with 5 disks + one parity disk). There is also no easy SMART monitoring or email alerting. (Also, this morning I found that the server had, without any obvious reason, kicked out all the drives in my drive shelf! After a reboot, all disks and the volume were up again. Not very comforting…)

So I have just tried a small Rockstor setup, and I really like the concept, but the missing dedupe feature is a killer right now… :frowning:

Martin, welcome to the forums. Rockstor does use compression on btrfs volumes, and I generally use lzo (which is faster, but you can also pick another algorithm that is slower but gives better compression ratios). Dedupe really just eliminates duplicate blocks of data and empty blocks of data. In essence, it just combines things (which is honestly a better name, because dedupe is like saying I’m undoing a duplication… but I digress!)

You can obviously use dedupe technology on volumes that run over btrfs (such as a Windows Server 2016 volume or Veeam).

I’m unsure which system is giving you issues: is it FreeNAS, Windows or Rockstor (referring to your 2nd paragraph)? What setup are you using with regard to disks and RAID card (and how did you configure it)?

Well, I think I described the issue quite well. But I will try to elaborate…

My data is backup archives from Veeam Backup. This data is already highly compressed, so I gain nothing from additional filesystem compression. The backups are already deduplicated to some degree (dedupe per job, not across the whole volume). But my tests on Windows Server 2016 with offline deduplication have shown that I can gain around 40% further space savings with Windows volume deduplication. So I know I have data that is dedupe friendly.
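For anyone who wants to check the "around 40%" figure, here is a quick back-of-the-envelope calculation. It is only a sketch: the function name `dedupe_savings` is made up for illustration, and it assumes the 1.7 TB test set is 1700 GB with about 700 GB saved, as in my first post.

```python
def dedupe_savings(logical_gb, saved_gb):
    """Return (fraction of space saved, dedupe ratio) for a data set.

    logical_gb: size of the data before deduplication
    saved_gb:   space reported as saved by the dedupe pass
    """
    stored_gb = logical_gb - saved_gb  # what actually remains on disk
    return saved_gb / logical_gb, logical_gb / stored_gb


# Assumption: 1.7 TB test data = 1700 GB, roughly 700 GB saved.
saved_fraction, ratio = dedupe_savings(1700.0, 700.0)
print(f"saved: {saved_fraction:.0%}, ratio: {ratio:.2f}:1")
# prints "saved: 41%, ratio: 1.70:1"
```

Whether the same ratio would hold across the full 32 TB is not something this small test can prove; it only shows that the data is worth deduplicating.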

I could move my storage platform to Windows Server 2016, but besides good deduplication there is really not much to like about this option: slow RAID 5 write speeds, no nice web GUI, no easy reporting, no bitrot protection, etc.

Right now I am running on FreeNAS, which is very fast and stable. Unfortunately it only does inline deduplication, and this is veeeery slow for me. So enabling dedupe on ZFS is not really an option.

So my last option is moving to the BTRFS filesystem… I found Rockstor, and I really like it. Good web UI, very easy to use, and BTRFS has good performance, scrubbing and much more. The only thing I am missing is the dedupe tools for BTRFS (simple CLI support would be good enough). I know other Linux distros have these tools, but I am struggling to get them onto Rockstor's underlying OS.

The hardware is a quad-core Xeon E5 server with 32 GB RAM and an LSI SAS (not RAID) controller connected to 12 × 4 TB SAS disks.

Hope this helps understanding…

Hi @Damged, welcome to Rockstor again.

You’re right when you say that Rockstor currently lacks a deduplication tool (via UI or CLI), and it would probably be a good add-on (@suman).

At this time btrfs-tools (used by Rockstor for all btrfs-related operations) doesn’t have a built-in command to perform deduplication, and there are only a few tools for that (see the Deduplication page on the BTRFS kernel wiki).

When I first read your post, and with the FreeNAS/NAS4Free dedupe RAM requirements in mind, I took a look at bedup and duperemove. I excluded in-band dedupe because the BTRFS kernel wiki describes it as RAM hungry, and we don’t want to clone the “bad min requirements” of FreeNAS/NAS4Free. Some tests should be performed to find the tool that best fits Rockstor.

Hope we’ll let you move to Rockstor and leave WIN2016 :wink:

To follow up, it looks like bedup is a good tool, but it’s not stable on kernel v4+.
There is active development and they are testing, but it’s limited to the mailing list right now (you may want to test it, especially if you already have backups and can reinstall).
I’m by no means an expert, but it sounds like inline dedupe is what you want. Right now that’s possible with btrfs and the latest kernel version, but support for the version Rockstor uses (4.3.3) is still being developed (at least for bedup).
http://blog.gmane.org/gmane.comp.file-systems.btrfs/page=17

Hi @magicalyak, I know Rockstor is a few versions behind the current btrfs-tools release (I saw that it is at version 4.5 while looking at the btrfs rescan issue, already solved), and I think @suman is still evaluating an upgrade (current Rockstor with 4.3.3 works fine and is stable). But even with native support for dedupe, I would never choose an inline deduplication solution (personal opinion). The risk is ending up like FreeBSD ZFS inline dedupe, asking for tons of RAM :scream:

@magicalyak:
Thank you for looking into this! The missing support for kernel 4+ explains why I could not get bedup working from the command line. I had not seen that dev thread.
Perhaps duperemove, which seems to be a newer alternative, could be a possibility?
My simple Linux experience just didn’t cut it for figuring out how to install it… :frowning:

And you’re a bit wrong in your assumptions about inline vs offline deduplication…

On the contrary - I totally agree with Flyer Mirko Arena here. It is NOT inline deduplication we want. The RAM needs to be big enough to contain the entire lookup table of block checksums, or else everything becomes extremely slow (just like I am experiencing with ZFS, even with 32 GB RAM).
The BTRFS wiki also states this: “This typically requires large amounts of RAM to store the lookup table of known block hashes. Patches are currently being worked on.”
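To give a feel for why the lookup table gets so big, here is a rough worst-case estimate. This is a sketch under stated assumptions, not a measurement: the 128 KiB block size and the 320 bytes per table entry are assumed values for illustration (the latter is a rule-of-thumb figure often quoted for ZFS), and the function name `dedup_table_bytes` is made up.

```python
def dedup_table_bytes(pool_bytes, block_bytes, entry_bytes):
    """Worst-case lookup table size: one entry per block, every block unique."""
    blocks = -(-pool_bytes // block_bytes)  # ceiling division
    return blocks * entry_bytes


POOL_BYTES = 32 * 10**12   # 32 TB pool, as in this thread
BLOCK_BYTES = 128 * 1024   # ASSUMPTION: 128 KiB blocks
ENTRY_BYTES = 320          # ASSUMPTION: rough per-entry cost, not a measured value

table = dedup_table_bytes(POOL_BYTES, BLOCK_BYTES, ENTRY_BYTES)
print(f"{table / 2**30:.1f} GiB")
```

With these assumed numbers the table comes out at roughly 73 GiB, more than twice the 32 GB of RAM in my server. Smaller blocks make it worse, and real data with many duplicate blocks makes it better, so treat this only as an order-of-magnitude illustration.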

Instead we want offline / out-of-band / batch deduplication, like we can get from tools such as bedup or duperemove.
These tools can be run at night or during low-activity hours to compare blocks and remove duplicate entries, leaving full-speed access to the filesystem in the daytime. (This is also the way Microsoft is doing it in Server 2012 / 2016, btw.)
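To illustrate the batch idea (and only the idea), here is a minimal sketch of the "scan and compare" half of an offline pass. It is not how bedup or duperemove are implemented: it just hashes fixed-size blocks and reports which ones occur more than once, and it never touches the data or asks the filesystem to share anything. The block size and the function names are assumptions made up for this example.

```python
import hashlib
import os
from collections import defaultdict

BLOCK_SIZE = 128 * 1024  # ASSUMPTION: fixed 128 KiB blocks, for illustration only


def find_duplicate_blocks(root, block_size=BLOCK_SIZE):
    """Map each block checksum seen more than once to its (path, offset) locations."""
    seen = defaultdict(list)
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            with open(path, "rb") as fh:
                offset = 0
                while True:
                    block = fh.read(block_size)
                    if not block:
                        break
                    digest = hashlib.sha256(block).hexdigest()
                    seen[digest].append((path, offset))
                    offset += len(block)
    # Keep only checksums that occur at more than one location.
    return {d: locs for d, locs in seen.items() if len(locs) > 1}


def reclaimable_bytes(duplicates, block_size=BLOCK_SIZE):
    """Upper bound on the space a dedupe pass could free (all but one copy per block)."""
    return sum((len(locs) - 1) * block_size for locs in duplicates.values())
```

Something like `reclaimable_bytes(find_duplicate_blocks("/mnt/backups"))` (the path is just a placeholder) could be run from a nightly job to see how much a real dedupe run might save. The whole table lives only for the duration of the scan, which is the point of doing it offline instead of inline.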


duperemove is in the repositories (at least if you are running testing), so all you need to do to install it is a quick “yum install duperemove”, and then you can test it. It has worked pretty well in my testing so far.

I kinda nuked my pool while moving to another test server with more RAM. No disks are showing up in the Rockstor UI… :frowning:
Probably just a disabled HBA that can be fixed in the BIOS…

When I get it running again I will share my experience with duperemove.

Guys, try it:

Hi guys, a few days back I posted this new tool, dduper, with some performance numbers and a quick demo. Check it out and let me know if you have thoughts/suggestions.
