SSD Read / Write Caching

Hello


First time using Rockstor, although I’ve been lurking for a bit. I was wondering if/when SSD read and write caching will be implemented? That and deduplication are the biggest things I need for my production machine. I’ve never really liked the limitations of ZFS with regard to adding drives, and other than the SSD caching and dedup, Rockstor looks perfect.


Thanks!
Marshall
2 Likes

Thanks for starting this discussion. We’ve gotten several requests for dedupe so it’s definitely on our roadmap.


For SSD cache, do you (or anyone else on the forum) have any specific recommendations?

Are there any bcache users out there willing to share their experience?

I’ve read a bit about bcache, but I haven’t had any direct experience with it. 


The idea of having a couple of SSD drives (raid0) as a very fast cache for inexpensive slow disks sounds very promising. I’ve seen the benefits with ZFS.

Dedup would be fantastic, but for me personally, SSD caching is way more important.

(ps, I was the one that wrote y’all on twitter)

Reading some more, it really sounds like there are just a couple of commands to add bcache to Rockstor. There aren’t a lot of parameters to mess with, which would make for a very clean UI. I’m getting some SSDs in next week that I’d like to try. Maybe I’ll go mess with it via the command line since it’s baked into the kernel already.
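
For reference, here’s roughly what that command-line setup looks like with the stock bcache tools, assuming /dev/sdb is the SSD and /dev/sdc is the HDD (device names are placeholders; this follows the upstream bcache docs rather than anything Rockstor-specific):

    # WARNING: make-bcache wipes any existing signatures on the devices
    make-bcache -B /dev/sdc      # format the backing HDD -> /dev/bcache0
    make-bcache -C /dev/sdb      # format the SSD as a cache set

    # If udev doesn't pick them up automatically, register them by hand
    echo /dev/sdc > /sys/fs/bcache/register
    echo /dev/sdb > /sys/fs/bcache/register

    # Attach the backing device to the cache set
    CSET_UUID="paste-uuid-shown-by: ls /sys/fs/bcache/"
    echo "$CSET_UUID" > /sys/block/bcache0/bcache/attach

    # The filesystem then goes on the combined device
    mkfs.btrfs /dev/bcache0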


Please let me know if I’m wrong about this: on the write side, you would write directly to the SSDs, and in the background the SSDs would move the data to the HDDs.
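
From what I’ve read, that’s bcache’s writeback mode; the default is writethrough, where a write is only acknowledged once it has reached the HDD. Switching modes is just a sysfs write (a sketch, assuming the /dev/bcache0 device from earlier):

    # Acknowledge writes once they hit the SSD; flush to the HDD in the background
    echo writeback > /sys/block/bcache0/bcache/cache_mode

    # The currently active mode is shown in [brackets]
    cat /sys/block/bcache0/bcache/cache_mode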

Sounds great! Please share your experience with bcache.

I’m not sure how bcache could work with btrfs in a reasonable way, but I admit I haven’t looked at it very hard or long.


bcache works on block devices, and btrfs is not in the business (yet) of creating virtual block devices.

As far as I can tell, bcache only supports one block device as a “backing” device (spinning disk).

So you’d have to do something funky: partition your SSD to use a slice of it as a cache for each backing device (which won’t play well with the idea of adding drives later), dedicate an SSD per HDD ($$), or find some way to abstract a bunch of HDDs into a single block device, and that gets ugly pretty quickly (loopback devices, anybody?).

lvmcache is another option, but aren’t LVM and btrfs more or less mutually exclusive?

Thanks for your feedback. It would work best with Rockstor if a pool could be the backing device for bcache, but a pool is not a block device, it’s a filesystem tree. From the docs, bcache is filesystem agnostic, works at the block device level, and can back an arbitrary number of devices per cache. This is contrary to what you are saying.


Assuming you can have more than one backing device per cache, I wonder if that’s good enough for Rockstor.
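
If the docs are right about that, a multi-device pool on top of bcache might look something like this rough sketch (hypothetical device names; one SSD cache set shared by three backing HDDs):

    make-bcache -C /dev/sdb      # the shared SSD cache set
    make-bcache -B /dev/sdc      # -> /dev/bcache0
    make-bcache -B /dev/sdd      # -> /dev/bcache1
    make-bcache -B /dev/sde      # -> /dev/bcache2

    # Attach every backing device to the same cache set UUID
    CSET_UUID="paste-uuid-shown-by: ls /sys/fs/bcache/"
    for attach in /sys/block/bcache{0,1,2}/bcache/attach; do
        echo "$CSET_UUID" > "$attach"
    done

    # Build the btrfs pool on the cached devices
    mkfs.btrfs -d raid1 -m raid1 /dev/bcache0 /dev/bcache1 /dev/bcache2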

Yup, looking more closely, bcache does permit configuring an arbitrary number of backing devices; my mistake.


I’ll put it on the list of things to play with!

Sorry for the necromancy, but before anyone gets their hearts set on bcache, you might want to take a look here:
https://wiki.archlinux.org/index.php/Bcache

I cobbled together a NAS using Arch and btrfs + EnhanceIO over a couple of afternoons, so I’m sure it could use some tweaking. For certain workloads, like the repetitive use of files, I’ve noticed huge performance increases. Before EnhanceIO I was sustaining 34MB/s writes and 95MB/s reads over a Gigabit wired network. With EnhanceIO I see 107MB/s read and write sustained. The before and after is using CIFS to transfer ISOs from a notebook PC (Aorus X3+ V2 w/ 3 mSATA SSDs in RAID0) to the NAS and back again. Considering the overhead cost of TCP traffic, I have effectively slammed against my bandwidth cap on a Gigabit Ethernet network. I have not started to experiment with bonding both of my NAS’s interfaces, so I don’t know what performance would look like in that scenario. Could be better, might be disappointing.
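
For anyone curious, EnhanceIO caches are managed per source block device with its eio_cli tool; roughly like this (device names are hypothetical, and the flags are quoted from memory of the EnhanceIO README, so double-check them against your version):

    # Create a writeback, LRU cache named "nas_cache":
    #   -d = source (slow) device, -s = SSD, -m = mode, -p = replacement policy
    eio_cli create -d /dev/sdc -s /dev/sdg -p lru -m wb -c nas_cache

    # List configured caches and their statistics
    eio_cli info

    # Tear the cache down again
    eio_cli delete -c nas_cache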

To provide a sense of scale, my NAS hardware is as follows:

  • ASRock C2759D4I Mainboard (w/ Intel Avoton 8-core Atom processor), two 10/100/1000Mbps Ethernet ports + 1 10/100 IPMI-over-Ethernet port
  • 32GB ECC Unbuffered RAM (from Crucial; V-Color, Mushkin, Kingston, Geil, Adata, and Hynix didn’t work)
  • 6x WD Red 6TB HDD (~ 33TB NAS storage)
  • 1x Crucial M4-CT128M4SSD2 128GB SSD (boot drive)
  • 1x Crucial M500 960GB SSD (cache drive)

It’s not exactly a wimpy rig. I think with anything less than a Core 2 Duo or AMD A4, the IOPS of the SSD would exceed the capacity of the processor, so bear that in mind. The Avoton, for reference, is roughly equivalent to an E3-1200v2 Xeon. Certainly overkill for a home NAS but probably right on for SMB use.

Back to EnhanceIO. I’ve little experience with EnhanceIO, so my observations should not be taken as an exhaustive or conclusive analysis. It does work really well for repetitive loads. I’ve migrated and booted a VMware VM (Windows 2012 R2 w/ Exchange 2010) from this NAS, and performance was excellent for our small environment (roughly 225 accounts, about 150 actual users, light use). I don’t have any throughput metrics for this use, but I did notice that the longer we used it, the better performance got, until it peaked after about 48 hours.

At this point, this box is tedious to manage from the command line alone, so I’m looking for something a little easier to work with. I will say that I had some trouble getting EnhanceIO to compile on CentOS 7, which is why I went the Arch route in the first place. That being said, I don’t consider this box production ready because Arch is notorious for borking installs with an update. If someone wants to continue actively exploring caching with btrfs on Rockstor, let me know. I’ll be installing it shortly on the NAS above and experimenting.

1 Like

I haven’t dug into bcache because I noticed warnings along the lines of “bcache can corrupt your btrfs filesystem if used with it” when searching for btrfs and bcache. As for EnhanceIO, which I haven’t tested yet but definitely will: it sounds promising since it’s an offspring of Facebook’s own caching project (flashcache), and they want to / have migrated to btrfs.

EnhanceIO looked promising, but has essentially been abandoned since 2012, so I would be hesitant to use it.

Bcache’s sole developer now has a new project (bcachefs) keeping him busy, so I doubt there will be much development happening on bcache in the near future.

The biggest problem all these options have is a lack of redundancy in the cache device, resulting in filesystem corruption if the cache device fails. It seems to be one of those planned but never realised features. Writethrough or read-only caching should still be safe, but it misses most of the performance benefit.
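
For anyone experimenting with bcache in the meantime, it’s easy to check how exposed you are from the command line (a sketch, assuming a /dev/bcache0 device; writethrough is bcache’s default mode):

    # State of the backing device: "clean", "dirty", "no cache", etc.
    cat /sys/block/bcache0/bcache/state

    # Data that currently exists only on the cache device (writeback mode)
    cat /sys/block/bcache0/bcache/dirty_data

    # Fall back to the safer writethrough mode
    echo writethrough > /sys/block/bcache0/bcache/cache_mode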

1 Like

To my understanding, CoW transactions (like in ZFS) are atomic, so corrupting the fs should never happen during normal operation, even with a failing cache device, as long as normal writes still reach btrfs. But yes, the data in the top layer might be inconsistent (VMs for example); essentially the latest data in the cache is missing from the fs, but the old data is still there.

Regarding EnhanceIO, I didn’t know they stopped development. Sad to hear; let’s hope there is / will be something else soon.

I had noticed EnhanceIO hadn’t been updated much, but the last update in the git repo was 8 months ago.
If flashcache is the only maintained option then I guess that’s the one I’ll work with, but I’ll need to spend some time this weekend trying again to get Rockstor to install on my hardware. It crashes during post-installation and then boots to either a kernel panic or a shell, depending on how many times I reboot.

FWIW, I did find a bug filed against EnhanceIO concerning Ubuntu and filesystem corruption.

As someone who’s been using bcache on my main desktop with a BTRFS backing device for about a year, I thought I’d be a good person to bump this with some info.

  1. The bcache/BTRFS corruption issue was present in kernels <3.19. It has since been resolved and is considered stable.

  2. bcachefs does not detract from the development of bcache. bcache is by nature a filesystem of its own, or at least incorporates all of the critical elements of one, since it is a layer for writing and reading data on a storage device. bcachefs is just the logical extension of bcache into a full-fledged CoW filesystem with native caching device support, like ZFS currently offers. Development of bcachefs will not impact the stability or progress of bcache. They are essentially the same thing.

  3. bcache performance gains are pretty incredible. My setup is a 256GB SSD in front of a 2TB BTRFS file system hosting Arch Linux. With bcache as my root device, my startup times are as if I’m running just an SSD. I have writeback enabled, so copying data to my computer from fast USB 3.0 flash drives is nearly instantaneous. I have not had any issues with writeback corruption despite a couple of hard crashes and power failures since I started using it.

  4. A caching device is good for your RAID array. Funneling reads through a caching device means less seeking and wear on your backing array disks. For example, let’s say you’re hosting TV shows on your NAS and you have six people who are going to individually watch an episode after it is initially downloaded. Instead of six reads from the array for the episode, you’ve got one, to the caching device. The rest of the reads are going to be cache hits. If you use writeback, it could be zero reads, as the file is cached before it has a chance to be written to the array. You’ve reduced the number of reads to your backing array for that file by over 80%. My overall cache hit ratio since I set up bcache quite a long time ago is 91%. I have about 600GB of data on the backing device.

  5. bcache provides an incredible amount of statistics that would look pretty on the dashboard; see the sysfs sketch below. (That’s a reason, right?)

bcache is an awesome tool. I highly recommend incorporating it.
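
For the curious, those statistics are all plain sysfs files, so they’d be easy for a dashboard to poll; a quick sketch of where they live (per backing device, plus per cache set):

    # Lifetime stats for one backing device (there are also
    # five-minute/hour/day windows next to stats_total)
    cat /sys/block/bcache0/bcache/stats_total/cache_hit_ratio
    cat /sys/block/bcache0/bcache/stats_total/cache_hits
    cat /sys/block/bcache0/bcache/stats_total/cache_misses
    cat /sys/block/bcache0/bcache/stats_total/bypassed

    # Cache-set wide: how much of the cache is still available
    cat /sys/fs/bcache/*/cache_available_percent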

5 Likes

Isn’t Bcache built into the kernel these days?

I am moving to Rockstor from Openfiler, and I noticed Openfiler used available memory as cache as well… Although SSD cache is a great idea, I would like to be able to max out my memory and use that as well.

Caching to RAM is a native feature of the Linux kernel. Unless Rockstor has specifically disabled support for it in their kernel, it will already be enabled by default.
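
You can watch it working with nothing but stock tools, and tune how much dirty data is allowed to sit in RAM before writeback kicks in (the values below are examples only, not recommendations):

    # "buff/cache" is RAM currently used for cached file data and metadata
    free -h

    # Percent of RAM that may hold dirty pages before background/forced writeback
    sysctl vm.dirty_background_ratio vm.dirty_ratio

    # Example tuning only: caches more writes in RAM, at the cost of more
    # data at risk if the box loses power
    sysctl -w vm.dirty_background_ratio=10
    sysctl -w vm.dirty_ratio=30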

1 Like

I just thought I’d chime in that I’m looking at FreeNAS and Rockstor, and SSD Cache support is, I think, the sole reason I’ll probably end up on FreeNAS. It’s too important for achieving decent speeds in our environment. Otherwise, I think Rockstor looks as good or better for our needs. I guess lack of an Amazon S3 backup plugin is the other thing, but that’s easier to work around.

Not much point until NIC teaming/bonding arrives; otherwise most SSD cache users will hit a 1Gb network bottleneck.

I’m definitely interested in SSD caching as well, possibly using bcache or dm-cache.

I do have experience with dm-cache on top of LVM/ext4, and it’s pretty amazing at lowering response times for local loads such as SQL. Or, more interesting for Rockstor: transcoding speeds up as well!
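
For comparison, here’s roughly what that looks like when dm-cache is driven through LVM (lvmcache), with hypothetical names: a volume group vg0 holding a slow logical volume data on the HDDs, and an SSD at /dev/sdb:

    # Add the SSD to the volume group
    vgextend vg0 /dev/sdb

    # Create a cache pool on the SSD and attach it to the slow LV
    lvcreate --type cache-pool -L 100G -n data_cache vg0 /dev/sdb
    lvconvert --type cache --cachepool vg0/data_cache vg0/data

    # Check the result; the dm-cache status line includes hit/miss counters
    lvs -a vg0
    dmsetup status vg0-data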