Phase 10 is complete.


After two months of work I now consider phase 10 of the monster network complete.
Accomplishments:

– rebuilt both Core i3 servers into Xeon E5 servers to allow more virtual servers to be hosted, with both more horsepower and a higher maximum memory ceiling (256GB+)
– upgraded the Infiniband switch from SDR to DDR (8Gbps to 16Gbps) due to issues with the old switch and wanting more speed (and I got a great deal)
– removed the final AMD server due to power usage and replaced it with a Xeon E3.
– ended up adding an LSI 9260 RAID controller to increase the IOPS on the SAN; happy so far with the performance increase.
– all servers now have IPMI, allowing easy remote management and full control by SCVMM.
– rebuilt the SAN storage, switching from a Linux-based solution (ESOS) to a Windows iSCSI target.  Along with that change I rebuilt the storage and added automatic storage tiering and a 20GB write-back cache using Windows Storage Spaces (a rough PowerShell sketch follows this list).
– all power supplies are now 80 Plus Bronze rated or higher.
– with all the upgrades and rebuilds I ended up lowering overall power usage by 20-40 watts, not too bad!
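
I won't walk through the whole build here, but for anyone wondering what the tiering and write-back cache piece looks like, this is roughly the kind of PowerShell involved. Treat it as a minimal sketch only: the pool name, tier sizes, and resiliency setting below are made-up placeholders, not my actual layout.

```powershell
# Pool every disk that is eligible for pooling
$disks = Get-PhysicalDisk -CanPool $true
New-StoragePool -FriendlyName "SANPool" `
    -StorageSubSystemFriendlyName "Storage Spaces*" `
    -PhysicalDisks $disks

# Define an SSD tier and an HDD tier for automatic storage tiering
$ssdTier = New-StorageTier -StoragePoolFriendlyName "SANPool" -FriendlyName "SSDTier" -MediaType SSD
$hddTier = New-StorageTier -StoragePoolFriendlyName "SANPool" -FriendlyName "HDDTier" -MediaType HDD

# Create a tiered virtual disk with a 20GB write-back cache
New-VirtualDisk -StoragePoolFriendlyName "SANPool" -FriendlyName "iSCSIDisk" `
    -StorageTiers $ssdTier, $hddTier -StorageTierSizes 100GB, 1TB `
    -ResiliencySettingName Simple -WriteCacheSize 20GB
```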

Failures:
– all was not perfect.  The initial plan was to virtualize the iSCSI target server and then cluster it, but the performance hit was sizeable.  I then found out that the only fix for that was SR-IOV, which isn’t fully supported by the chipset on the motherboard.  Bummer.  So I’m back to a single point of failure for now.

Phase 11 planning:

I’ve already started planning the next stage of the design, but due to both an increased workload and the oncoming bicycling season, it’s going to be both smaller in scope and slower to finish.

– prepare for the removal of sharonapple.  I prefer having the hosts, and I’ve always wanted to have them, but sharonapple is an oddity.  The motherboard uses different memory than the other Xeon E5 boards I have now, and it’s limited to 32GB of RAM.  The unplanned rebuild of the SAN host onto a Xeon also made it possible to use that box as a Hyper-V host as well.  While far from best practice, in a lab I don’t think it matters.  This would also lower power usage by about 80 watts.

– addition of a standalone pfSense router box.  I’ve had some issues with virtualizing pfSense on Hyper-V, so I may get a cheap Intel Atom board and load it on there.

Introduction to Infiniband Pt. 2 (Linux and VMware platforms)


In part one I went over some basic Infiniband terms and concepts.  In this part I am going to go over Infiniband basics for VMware and Linux.  I will not go into full detail on either, as I no longer run VMware and only one of my machines runs Linux, but there are things you should know that will save you problems.

Infiniband under Linux:

GOOD NEWS!!  You have picked about the best platform to run Infiniband on.  Linux has very mature Infiniband drivers available for almost every brand and type of Infiniband card out there, along with the most expansive set of Infiniband troubleshooting and diagnostic tools available (the OFED stack and the infiniband-diags package).  You can either install the software on your favorite distribution or use a distribution that comes with it pre-installed.  I ended up loading Ubuntu on a spare workstation with a spare Infiniband card and used it for troubleshooting while I was setting mine up.  It depends on your distribution, but there is a good tutorial here for Debian and here for Ubuntu.  Both of those articles give very good instruction on basic setup, and until you get more familiar with Infiniband, I would try getting two Linux machines to talk, just to test your knowledge (a quick sanity-check sketch follows the two distributions below).  Tools like ibstat, ibnetdiscover and ibping are your friends!  Another advantage is that there is an eIPoIB shim for some virtualization platforms under Linux that lets virtual machines share your Infiniband fabric.  I’ve never gotten that far, but you see mentions of it in places like here.  If you don’t want to “roll your own”, you can download distributions that are already set up to use Infiniband:

ESOS:  This seems to be the best solution, and it is the one I use here for the Monster Network.  The software is actively being developed, with new versions almost every week.  It supports automatic storage tiering via Btier, iSCSI, Infiniband, FC, SSD caching, RAID cards, and storage clustering.  The main drawback to ESOS (for me) is that it has no web GUI, so you have to manage the software via SSH or the console with a text-based interface called the “TUI”; it is a little intimidating at first, but user friendly.

Openfiler:  I originally planned on using this, but while the software seemed really nice, the free version does not give you a GUI for Infiniband setup (which I wanted), and it does not seem to be as actively developed as ESOS.  But if you want a nice web GUI and don’t mind having to do some command-line work, it seems to work really well.
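
Before trusting any turnkey distribution, it is worth confirming that two plain Linux machines can actually see each other over the fabric, as mentioned above.  A minimal sanity check might look like this on Debian or Ubuntu (package names and the GUID placeholder are just examples, adjust for your setup):

```bash
# Install the user-space Infiniband diagnostics
sudo apt-get install infiniband-diags

# Check that the HCA is recognized and the port state is Active
ibstat

# Walk the fabric and list everything the subnet manager knows about
ibnetdiscover

# Reachability test: run "ibping -S" on one node, then from the other node:
ibping -G <remote_port_guid>
```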

Infiniband under VMware:

You have picked a platform where you have to choose your hardware carefully, but it will work.  Before you purchase any Infiniband HCAs, make SURE that the card is either on the VMware HCL or that you can find VMware drivers for it.  I originally tried to use Mellanox Infinihost III cards, but they weren’t supported on VMware 5.1.  I had to upgrade to a ConnectX card, which technically isn’t supported either; I had to use an older driver and downgrade the firmware on the card to get it to work correctly, and I don’t know if that trick will work with VMware 5.5.  Start here to read up on what I had to go through.  Note that with the ConnectX card you do not get SRP, you have to use IPoIB and iSCSI.  So if you’re using VMware, make sure to purchase only ConnectX-2 and ConnectX-3 cards if you don’t want a headache.

Introduction to Infiniband Pt. 1


As some of you may know, I have a very over-engineered lab, and as part of that I run an Infiniband SAN.  When I went for Infiniband there really wasn’t much information about what it is or how to use it, only that it’s a cheap way to get a crazy fast SAN connection, which isn’t the whole story.  Over the next few weeks I will cover what Infiniband is and how to make an informed decision about whether or not it’s right for you.

I will start by going over some of the terminology that comes with Infiniband.  This will lay the groundwork for the next few posts.  One of the mistakes I made with Infiniband was not understanding the terminology; I just made the false assumption that it’s like Ethernet, only faster.

Infiniband Speeds:

  1. SDR: Single Data Rate.  This is referred to as 10Gbps, although with the encoding overhead you’ll only see about 8Gbps.  This is the cheapest speed to start with for Infiniband; switches run about $200.  SDR connections use a CX4 connector.
  2. DDR: Double Data Rate.  This is referred to as 20Gbps, when it’s really only 16Gbps after the encoding overhead.  DDR switches start around $400-$500.  DDR connections still use a CX4 connector.
  3. QDR: Quad Data Rate.  This is referred to as 40Gbps, when it’s really only 32Gbps after the encoding overhead.  QDR switches start around $1000.  QDR and above connections use a QSFP connector.
  4. FDR-10, FDR & EDR: these run at 40Gbps, 56Gbps and 100Gbps respectively.  The encoding overhead on these is only about 3%.  They are beyond the scope of this article.

HCA:  

Host Channel Adapter; this is what the Infiniband PCI adapter is called.

Infiniband Cable Types: 

Infiniband cables are expensive: SDR/DDR cables run around $30-$50 each, and QSFP cables can go for $70-$100 at least.  You have to be very careful when shopping for cables, as Infiniband CX4 cables look very similar to SAS SFF-8470 cables.  If the price seems too good to be true, it’s probably not an Infiniband cable.  Cables come in both copper and fiber varieties; the optical cables are lighter and come in longer lengths, but tend to be more expensive.

  1. CX4:  This is the cable type used for both SDR and DDR Infiniband.  The catch is that while DDR CX4 cables are backward compatible with SDR, the reverse is not true.  Cables for sale on the internet don’t always specify which type of CX4 they are, so if it doesn’t say “DDR” on it, make sure to ask.  These connectors tend to be made of steel and are very heavy duty.  They come in either “pinch” style or “latch” style; either works on adapters and switches.
  2. QSFP:  This is used for QDR and above connections, but it is backward compatible, so you can get a QSFP-to-CX4 adapter cable.  QSFP cables have a 2-3 inch plug that inserts into the switch and adapter for a more secure connection than a CX4 connector.  They usually have a pull tab dangling off the plug to remove the cable after it’s inserted.

Signaling Rate:

Each Infiniband speed has a base per-lane signaling rate: for SDR it’s 2.5Gbps, for DDR it’s 5Gbps, and for QDR it’s 10Gbps.  Sometimes you will see cables, adapters and switches marked “1X”, “4X” or “12X”; this refers to how many lanes of traffic are carried by that link.  Mostly you will see 4X connections, which give you a 10Gbps SDR connection or a 20Gbps DDR connection.  Some of the higher-end switches offer 12X connections, which give you a 30Gbps SDR connection or a 60Gbps DDR connection.  12X uses a different type of connector and cabling, so if you buy a switch with 12X ports, make sure you have 12X connectors on the cards and 12X cables to connect them.
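
To tie the link width and the encoding overhead together, usable throughput is just the number of lanes times the per-lane signaling rate, with the 8b/10b encoding tax taken off:

```latex
\text{usable rate} = \text{lanes} \times \text{per-lane signaling rate} \times \tfrac{8}{10}
```

So a 4X SDR link works out to 4 × 2.5Gbps × 0.8 = 8Gbps, 4X DDR to 16Gbps, and 4X QDR to 32Gbps, which matches the “real” numbers in the speed list above.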

Infiniband Switches:

Infiniband, like Ethernet, has switches, and they come in either managed or unmanaged varieties.  Unless the switch specifically says “managed”, it’s not.  Infiniband is different in that you can daisy chain the adapters together to avoid using a switch, but you will experience some performance loss.  Obviously, daisy chaining requires dual-port cards.

Subnet Manager:

For an Infiniband fabric to be fully functional, you must have at least one subnet manager running.  A subnet manager assigns a unique identifier to each adapter and builds a routing table for the fabric.  You can have multiple subnet managers running for failover, but only one can be active at a time; the second one you add will detect the first one running and switch to a passive mode.  Most managed switches include a subnet manager, but if your switch isn’t managed, you’ll have to run the subnet manager on one of the connected nodes.  OpenSM is included with most drivers.
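
If you end up running the subnet manager on a Linux node rather than a switch, a minimal OpenSM setup is only a couple of commands (Debian/Ubuntu package name shown; assumes your HCA is already recognized):

```bash
# Install and start OpenSM on one node attached to the fabric
sudo apt-get install opensm
sudo opensm -B        # -B detaches OpenSM to run in the background

# Verify that a subnet manager is now active on the fabric
sminfo

# Port state on the attached HCAs should move from Initializing to Active
ibstat
```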

Infiniband Adapters:

These come in many shapes and sizes.  I try to stay with Mellanox and Voltaire hardware, as they are still in the Infiniband business and have a very active support community.  You will see “Infinihost III” cards for about $30 each, but I would go for at least ConnectX cards, or if you’re running Windows, go for ConnectX-2 cards, as the driver support is better.

IP over Infiniband:

IPoIB is used to run IP over Infiniband, so you can assign IP addresses and ping other machines.  Depending on your OS it’s either installed by default (Windows) or has to be enabled (Linux).  IPoIB lets you use iSCSI and other IP-based applications over your fast Infiniband fabric.  IPoIB does add some overhead, depending on what you’re running, but it’s usually around 25%.  IPoIB is also not bridgeable, so even though Windows sees it as just another network adapter, you will be unable to share your Infiniband network with your virtual machines.  The only way to do that is with network virtualization.
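
For the Linux side, enabling IPoIB is usually just a module and an address.  A rough sketch (the ib0 interface name and the addresses are examples):

```bash
# Load the IPoIB module (many distributions load it automatically)
sudo modprobe ib_ipoib

# Optional: connected mode allows a much larger MTU and better throughput
echo connected | sudo tee /sys/class/net/ib0/mode
sudo ip link set ib0 mtu 65520

# Give the interface an address and test against another IPoIB node
sudo ip addr add 10.10.10.1/24 dev ib0
sudo ip link set ib0 up
ping 10.10.10.2
```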

SRP:

SCSI RDMA Protocol; this is the alternative to using IPoIB and iSCSI.  SRP basically runs SCSI commands directly over the Infiniband fabric, giving you a very low latency, high speed connection without the overhead of IPoIB.  This is only available on Infiniband and certain 10Gb Ethernet adapters, and for SRP to be available, the drivers must support it.
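
On a Linux initiator, SRP discovery is normally handled by srp_daemon from the OFED tools.  A rough sketch of what logging in to an SRP target looks like (flag meanings are from the srp_daemon man page; check your version):

```bash
# Load the SRP initiator module
sudo modprobe ib_srp

# Scan the fabric once and log in to any SRP targets found (-e execute, -o run once)
sudo srp_daemon -e -o

# The target's LUNs should now show up as ordinary SCSI disks
lsblk
```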

SCST:

Generic SCSI target subsystem for Linux.  This is the software used on Linux to give you Infiniband or FC targets.  There are different ways to get SCST: either add it to an existing Linux install or use a distribution that comes with it, such as ESOS or Openfiler.  SCST is not supported under Windows, so if you want a lower-priced or free Infiniband target server, you will be using SCST on Linux; there is no cheap or easy way to run an Infiniband target server under Windows unless you use IPoIB and iSCSI.
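
To give a feel for what SCST configuration looks like, here is a minimal /etc/scst.conf sketch that exports one block device over SRP.  The device path and target name are placeholders, so treat this as an illustration rather than a drop-in config:

```
# /etc/scst.conf (illustrative only)
# Export one local block device...
HANDLER vdisk_blockio {
    DEVICE disk01 {
        filename /dev/sdb
    }
}

# ...over the SRP target driver. The target name below is a placeholder;
# real SRP target names are derived from the HCA port GUID.
TARGET_DRIVER ib_srpt {
    TARGET ib_srpt_target_0 {
        enabled 1
        LUN 0 disk01
    }
}
```

Once the SCST modules are loaded, applying the file is a single `scstadmin -config /etc/scst.conf`.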

This is the first part of my Infiniband introduction; next I will explain some setup issues and what makes it good and bad.

Links for further Research:

http://www.openfabrics.org OpenFabrics is the home of open-source driver software, mostly for Mellanox adapters.  While the site is still running, the forums are dead and the website hasn’t shown activity in a while.  The driver still does not support Server 2012 or 2012 R2.

http://community.mellanox.com/welcome Active support site for Mellanox Infiniband products.  Help there is generally useful and you’ll usually get a reply to help requests.

http://www.servethehome.com/ A website with good information and many people using Infiniband that are active on their forums.

Free Tools worth their weight in gold Part 1: ESOS


Here in the Monster Network, I try to do things as cheaply and efficiently as possible.  When the network was first coming together I decided I wanted a SAN for clustering.  Originally it was a Windows Storage Server device running iSCSI target software, with the cluster members connected via MPIO over dual gigabit connections.  This worked well for quite a while, but I never saw throughput go much over 1 gigabit, and it never seemed to run as fast as I thought it would.  So I finally sat down, sketched out what I had and what I wanted to do, and started doing research.  That led me to either Fibre Channel or Infiniband, and my requirements led me to pick Infiniband, for reasons I’ll cover in a later post.

As part of that decision I could no longer use Windows, as the drivers do not support an Infiniband target.  My research led me to a driver package called SCST, which exists solely on Linux.  As I said in the first sentence, I also try to be efficient, and having to re-learn Linux, install packages, and deal with all that junk just didn’t work for me.  After further research, I located a software package called ESOS (Enterprise Storage Operating System), which is essentially a Linux build with all the needed packages pre-installed and configured.  You simply copy it to a flash drive and boot off it, and it gives you a nice curses-based text console for configuration called the TUI (Text-based User Interface).  This software allows me to use Infiniband for my SAN traffic.

Now, as I will cover in a later post (possibly to be called “Infiniband: it’s great for Linux, but sucks for Windows”), part of what makes Infiniband so awesome is RDMA, which offloads a lot of the processing to the Infiniband card, making storage incredibly fast, as long as the drivers support it.  I use Mellanox ConnectX cards for my Infiniband traffic.  The drivers for these cards do not support RDMA under Server 2012; the cards are no longer supported by Mellanox, and while there are open-source drivers for them, those drivers only work on 2008 R2.  So that left me back at using iSCSI for traffic, but this time over an 8Gb connection instead of 1Gb.  Luckily ESOS supports both RDMA and iSCSI, and you can mix and match.

I will write more on this later, but ESOS has been for the most part rock solid here in the Monster Network.  The main issues I have had with it so far have been caused by my inexperience with Linux or by defective cables or hardware.  While ESOS doesn’t support most of the cool things that Windows has, like storage tiering or storage pools, it does have software RAID support, hardware RAID support, three types of SSD storage acceleration, tape drive emulation, and monitoring.  It will also dump performance data to a MySQL server, along with sending e-mail alerts.  And since ESOS boots from a flash drive, it’s not installed on a platter, so it won’t be taken out by a bad hard drive.
