But I’ve never gotten transfer over 10Gbps.
That will depend on how your storage system is set up. I have four HGST 6 TB SATA 6 Gbps 7200 rpm HDDs in RAID0 with a Broadcom/Avago/LSI MegaRAID SAS 12 Gbps 9341-8i, so the theoretical maximum bandwidth should only be about 24 Gbps (4 * 6 Gbps). I can hit that if I transfer data to my other RAID0 array locally on the system, but over RDMA, I think I've maxed out at around 16 Gbps (out of a theoretically possible 24 Gbps) running NFSoRDMA.
In other words, I don’t have enough spindles to be able to max that out any further.
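If you want to figure out whether it's the fabric or the storage that's holding you back, the perftest tools can measure the raw RDMA bandwidth with no disks in the path. Something like this should work (the mlx4_0 device name and the IP are just examples – use whatever ibstat shows on your cards):

# ib_write_bw -d mlx4_0 --report_gbits -a

on one node, and then on the other node, point it at the first node's IPoIB address:

# ib_write_bw -d mlx4_0 --report_gbits -a 10.0.0.1

If that comes back much higher than what you see through NFSoRDMA, then it's the storage array (or the filesystem/NFS stack) that's the bottleneck rather than the network.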
Just to clarify, I don’t have a managed switch. My switch is externally managed, so one of my headnodes runs OpenSM as the subnet manager and that is what makes the switch work. Without it, my IB switch won’t work/run.
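If you want to double-check that opensm is actually up and managing the fabric, a quick sanity check on the headnode looks something like this (assuming the RHEL/CentOS-style packages that I'm using):

# systemctl status opensm
# sminfo
# ibstat

sminfo (from infiniband-diags) reports which subnet manager the fabric is seeing, and ibstat should show the ports as Active once the subnet manager has swept the fabric.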
Yeah, I'm moving my systems pretty much all to 100 Gbps IB. The reason for it is that I already got the switch (which is probably one of the most expensive pieces of the entire puzzle), so now that I have 36 ports of 100 Gbps capacity, the rest is just buying the cards used off eBay (which is how I got into it in the first place, after watching one of the videos from Linus Sebastian of Linus Tech Tips). Compared to the retail prices, I'm buying the cards at a discount of more than 50%, and I don't mind the used hardware. The cables were the next most expensive component; for short distances, I used copper direct attach cables (QSFP28 to QSFP28 to the switch).
I got the switch because I have four nodes, which means there is a minimum of 6 pair-wise connections that I can make. The cards each have two ports, but that means I don't have enough total ports to just direct-connect all four nodes to each other, so I had to get the switch to be able to do that. And then I added in my first headnode to manage the cluster, and now I am bringing online a second headnode because it has the LTO-8 backup drive. So one of the headnodes processes and prepackages the data to be written to tape and the other headnode writes it to tape. (Writing about 10 TB of data over GbE was estimated to take 25 hours, so I need to bring up the second headnode with 100 Gbps IB so that it can write at the LTO drive speed of 300 MB/s and cut that down to a theoretical 8 hours instead.)
For the price, again, with the exception of the switch and the cables, the $ per gigabit per port is actually cheaper for me to go with 100 Gbps Infiniband than it would have been to go with 10GbE (either Cat6 RJ45 or SFP+ fiber). I still don't see too many switches that have something like 10-12 10GbE ports with 100GbE uplink port(s). So with me using 100 Gbps IB, my network won't be the bottleneck anymore. It's other stuff (storage arrays, etc.).
The Mellanox installation manual for the MLNX_OFED drivers also has instructions for uninstalling it, should you choose to.
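From memory, the uninstall is basically just running the script that ships with it, either from the extracted installer directory or the copy that the install drops into /usr/sbin – but double-check the manual for your exact MLNX_OFED version:

# ./uninstall.sh

or

# /usr/sbin/ofed_uninstall.sh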
Again, I think it's really only critical if you plan on using NFSoRDMA. If you're using a Windows system, it will depend on whether your IB host is also Windows or whether it's Linux. If you have a Linux system that can run opensm but your IB host runs Windows, you might actually try SMB Direct instead, at which point it's functionally and practically irrelevant whether you use the inbox driver or the MLNX_OFED driver on the Linux box, because you're only using it to run opensm.
To the best of my knowledge, you can't "criss-cross" protocols like that – meaning have the Linux side run NFSoRDMA and the Windows side run SMB Direct and have the two talk to each other. I don't think it works like that. I also don't know whether Windows can mount an NFSoRDMA export (like Linux can).
If your clients are Windows and you want really fast speeds, that suggests that the host would need to be Windows as well so that you can run SMB Direct, which also means that you need a third system running Linux to run opensm.
Otherwise, I’m not sure if you will see much of a benefit because the real advantage of Infiniband is its RDMA capabilities.
Conversely, if you have one of the Mellanox ConnectX-3 cards (or ConnectX-4 40 Gbps IB cards) where the ports are configurable, you might be able to set the port(s) to run in Ethernet mode (mlxconfig -d <device> set LINK_TYPE_P1=2, or something like that), and that would make it so that you don't need opensm.
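If you want to try that, it's roughly something like this with the Mellanox firmware tools (the device name below is just an example – mst status will show you the real one on your system):

# mst start
# mst status
# mlxconfig -d /dev/mst/mt4103_pciconf0 set LINK_TYPE_P1=2

where 1 = Infiniband and 2 = Ethernet, and you'll need to reboot (or reload the driver) for the port type change to take effect.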
And yes, the other method that I described/proposed for running opensm does work. You just have to make sure that you install the opensm package in the OS, in addition to running
# yum -y groupinstall 'Infiniband Support'
so that you will be able to get opensm up and running.
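On CentOS/RHEL, that part is basically just (the package and service names are from my systems, so adjust for your distro):

# yum -y install opensm
# systemctl enable opensm
# systemctl start opensm

and then you can verify it with systemctl status opensm or sminfo like I mentioned above.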
Like I said, this is how I have my cluster configured right now, and it works because I wanted the NFSoRDMA capabilities, since my entire cluster runs on Linux. (And because I have a Linux system that runs opensm, in theory I could make everything else run IB on Windows, since opensm runs my externally managed switch, but I don't have that set up right now.)
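In case it helps, the NFSoRDMA part of my setup boils down to roughly this (the export path and hostname are just examples, and the module names can differ a bit between the inbox drivers and MLNX_OFED, so treat this as a sketch).

On the server, once the normal NFS export is already working:

# modprobe svcrdma
# echo "rdma 20049" > /proc/fs/nfsd/portlist

And on the client:

# modprobe xprtrdma
# mount -o rdma,port=20049 server:/export /mnt/export

20049 is the standard port for NFS over RDMA.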
The one catch/caution is that if you're going to get an IB switch, the ports on the IB switch are NOT configurable as to their port type. (Which I think is stupid, because they have that capability on their cards but they choose not to put it on their switches. The 100 GbE switches cost a lot more per port than their 100 Gbps IB switches.)
So just keep that in mind. But it should work for you.
Let me know if you have questions.
Thanks.