re: Sandisk (apparently, they bought Fusion-io, and were in turn bought out by WD).
So…it depends on how they calculate drive writes per day.
Some people assume only 8 hours of operation per day, whereas for me (and presumably for you, because the system runs 24/7), I tend to count 24 hours of operation per day.
Given that: 22000 TB / 6.4 TB = 3437.5 total drive writes of write endurance, and 3437.5 / (5 years * 365 days/year, i.e. the warranty life) = 1.883562 DWPD.
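If it helps, here's that same DWPD math as a quick back-of-the-envelope sketch (assuming 24/7 operation and the 5-year warranty period, as above):

```python
# Rough DWPD math for the 6.4 TB Sandisk/Fusion-io card, assuming a 24/7
# duty cycle and a 5-year warranty period (both as stated above).
endurance_tb  = 22000        # total write endurance, in TB
capacity_tb   = 6.4          # drive capacity, in TB
warranty_days = 5 * 365      # 5-year warranty life, in days

total_drive_writes = endurance_tb / capacity_tb    # 3437.5 full drive writes
dwpd = total_drive_writes / warranty_days          # ~1.88 DWPD

print(f"{total_drive_writes} drive writes total, {dwpd:.6f} DWPD")
```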
It’s definitely better than, say, the Intel or Samsung SSDs, which range from 0.3 DWPD to 0.7 DWPD, but because I ran into this problem earlier this year, my eyes are on Micron drives. (There are some WD/HGST datacenter SSDs that go up to 11 DWPD.)
I haven’t spent a great deal of time testing U.2 PCIe/NVMe SSDs in RAID0 because that’s currently out of my reach in terms of budget/capital expenditures.
So I live with what I am able to get.
re: PCIe SSDs
In THEORY they should be able to have all of the PCIe 2.0 x8 lanes available to them, but it really depends on how you have it set up.
The Sandisk PCIe SSD apparently uses a PCIe 2.0 x8 link (40 Gbps raw).
Also, apparently, the Dell R510 and the Dell PERC H700 both top out at PCIe 2.0 x8, which means that unless the drives are SAS or SATA 3 Gbps, if you have 12 SAS/SATA 4 TB HGST drives running at 6 Gbps each, the drives would demand more total theoretical bandwidth than the PCIe 2.0 x8 slot is able to provide.
For the same reason, the drives on my Broadcom/Avago/LSI MegaRAID 9341-8i 8-port SAS/SATA 12 Gbps card can actually outstrip the bandwidth available to the RAID card itself: a PCIe 3.0 x8 slot can only deliver 64 Gbps, but the card, if fully populated with SAS 12 Gbps drives, would demand a total of 96 Gbps (rough numbers in the sketch below). (But this isn’t a problem for me because I’m still only using SATA 6 Gbps drives.)
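To put rough numbers on both of those cases (these are raw link rates and nominal per-drive interface speeds, ignoring encoding and protocol overhead, so treat them as ballpark figures only):

```python
# Ballpark demand vs. supply, using raw PCIe link rates and nominal
# per-drive interface speeds (no encoding/protocol overhead accounted for).
pcie2_x8_gbps = 8 * 5.0    # PCIe 2.0: 5 GT/s per lane -> 40 Gbps raw for x8
pcie3_x8_gbps = 8 * 8.0    # PCIe 3.0: 8 GT/s per lane -> 64 Gbps raw for x8

# Dell R510 / PERC H700 case: 12 drives at SAS/SATA 6 Gbps each
h700_demand = 12 * 6       # 72 Gbps demanded vs. 40 Gbps available

# MegaRAID 9341-8i case: 8 ports fully populated with SAS 12 Gbps drives
megaraid_demand = 8 * 12   # 96 Gbps demanded vs. 64 Gbps available

print(f"H700:    {h700_demand} Gbps demanded vs {pcie2_x8_gbps:.0f} Gbps slot")
print(f"9341-8i: {megaraid_demand} Gbps demanded vs {pcie3_x8_gbps:.0f} Gbps slot")
```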
With four SATA 6 Gbps HDDs writing to four SATA 6 Gbps SSDs, I can max it out at 3 GB/s locally for a very short period of time before it quickly settles down to around 500 MB/s across the two RAID0 arrays. From my testing/benchmarking, the advertised speeds are often peak, buffered speeds. Unbuffered speeds can sometimes be even slower than what the advertised random IOPS numbers work out to when you convert them to MB/s (most vendors publish the block size they ran the test with, so the conversion is straightforward; see the sketch below).
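For what it's worth, the IOPS-to-MB/s conversion I'm talking about is just IOPS times block size; the numbers below are made-up example figures, not from any particular datasheet:

```python
# Convert an advertised random IOPS rating into MB/s, given the block size
# the vendor tested with. These are hypothetical example numbers.
iops             = 100_000   # advertised random IOPS (hypothetical)
block_size_bytes = 4096      # 4 KiB test block size (hypothetical)

throughput_mb_s = iops * block_size_bytes / 1_000_000   # ~410 MB/s
print(f"{throughput_mb_s:.1f} MB/s")
```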
In other words, it has been my experience that in actual, practical usage, the advertised numbers are almost meaningless. I’ve been moving around about 100 TB of data (as I mentioned, preparing the data to be written to tape), and with four nodes and a headnode on 100 Gbps IB, plus two NAS units (which max out at 10 Gbps SFP+ speeds), my headnode has been reading/writing data at around 200-ish MB/s when dealing with lots of tiny files. If I am moving around very large files (1 TB+), then I can write at up to 1800 MB/s, but only very briefly/momentarily. It’s not stable. (I’m using iotop to measure this.)
The only way for me to test anything faster would be to use U.2 NVMe/PCIe 3.0 x4 drives, but there isn’t a hardware RAID adapter for that. The Broadcom that’s available is just an HBA, not a RAID HBA.
The other consideration is how often you think you’re going to replace the hardware. With mechanically rotating disks, if I have a lot of them like you do, I don’t ever have to replace them - at least not due to write endurance. With SSDs, you WILL have to replace them eventually.
My point is that, like you, I’ve embarked on this journey as well, but I have to balance the PCIe 3.0 lane bandwidth between the actual storage devices, the GPU (even if it is something very small/with very little in the way of functional features), and networking.
My Mellanox ConnectX-4 takes up a full PCIe 3.0 x16 slot. My current GTX Titan that’s in the headnode takes up another PCIe 3.0 x16 slot. And my MegaRAID 9341-8i takes up a PCIe 3.0 x8 slot. My Core i7-4930K can only supply 40 PCIe 3.0 lanes.
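Counting it up (the x16/x16/x8 figures are just what those three cards occupy in my headnode):

```python
# PCIe 3.0 lane budget on the Core i7-4930K headnode (40 CPU lanes total).
cpu_lanes = 40
cards = {
    "Mellanox ConnectX-4 (100 Gbps IB)": 16,
    "GTX Titan (GPU)":                   16,
    "MegaRAID 9341-8i (RAID)":            8,
}

used = sum(cards.values())    # 40 lanes -- fully spoken for
print(f"{used}/{cpu_lanes} lanes used, {cpu_lanes - used} lanes left over")
```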
I’ve been looking at moving to either AMD Ryzen Threadripper 3000-series or AMD EPYC, but I’m waiting to see if AMD is going to release a 64-core 3rd gen Threadripper (and how many PCIe 4.0 lanes it’s going to come with), because now I am PCIe lane bound. And the bare system alone is going to be almost $5500 US, so…it can be quite expensive. (If I go with AMD EPYC, that can jump to $13000 US.)
re: iSCSI
Now that I have all of this IB stuff, what I would be looking at, if I were to deploy iSCSI would be ISER. You might have a use for that as well.
I haven’t personally set up iSCSI on my QNAP NAS, but they have tutorials on how to do it, and they make it pretty easy now with the pretty GUI. Yes, I’ve done some stuff with port forwarding.
re: eBay
eBay is awesome for this.
re: render farm
Depends on what kind of a render farm you’re looking to deploy. If you’re talking about Adobe Premiere, unfortunately, I haven’t tried that. In theory, it might be doable, and I think that Adobe has a network rendering engine so that you can add render nodes to it via the network, but I don’t have any experience configuring that.
My work is centered around mechanical engineering/high performance computing/computer aided engineering/finite element analysis/computational fluid dynamics.
Someone gave me some scripts and instructions on how to set up Blender network rendering (for a Blender networked render farm), but I haven’t tried to deploy that yet, as I am getting ready to switch to a different Linux distro (CAE Linux, built off of Xubuntu 16.04) for my mechanical engineering stuff. (I’m testing different distros to find the one that works with as many of my engineering applications as possible.)
But that would be on my to-do list once I can get the core engineering cluster back up and running with this other distro as it would be a nice “fringe” benefit to be able to have a Blender networked rendering farm available because I would have the rest of the hardware and infrastructure already there to support it.