The Monster Network is earning its name early this year!

I haven’t been keeping the blog up to date with the changes, but this year has been crazy so far.

1.  I finally got fed up with the poor SAN performance and replaced the final two 4+ year old drives in the RAID-50 array.  They were WD Green drives, and I think I was being bitten by one or both of them going bad, plus the TLER issue with that type of drive in a RAID.

2. I rebuilt one of the servers into a new case, which exposed some strange issues on my Core i3 hosts when pfSense is hosted on them: things like poor bandwidth and DNS problems, even though both the host and the guest have plenty of RAM and CPU cycles.

3.  I had a heatsink break off one of my ConnectX InfiniBand cards, causing the card to cook itself. The only reason I found out was that my SAN fabric was dropping out at random times, except when that host was turned off.  I opened it up as part of the rebuild and saw the heatsink hanging off the card.

4. I apparently also have a bad InfiniBand cable, which I’m trying to track down.

5. My InfiniBand switch is managed (Voltaire ISR9024-M), but the serial port uses a mini-USB connector for some stupid reason, so I’m unable to research the InfiniBand issues above. Trying to track down a cable for it has proved difficult at best.

6. My pfSense VM corrupted itself, so I had to reload pfSense from scratch and redo the VLAN assignments from memory…joy

7. Trying to get a host to simply boot off a USB drive proved much more difficult than it should have been.  You have to flip the “removable” bit in the drive’s firmware, but the tools to do that are very hard to find and use.  I also ruined 2 flash drives in the process, at which point I gave up and bought an SSD replacement.

8. The power supply went out on a host and had to be replaced.

9. The host I rebuilt now has an issue with the NICs where not all of them are passing VLAN tags from the trunk ports, so I’m unable to team them until I diagnose which ports are the problem and why it’s happening.

10. I reused one of the WD Green drives I pulled from the SAN box for local VM storage, and whenever the server was rebooted, the drive letter was lost.  I posted my frustration on Facebook, and someone suggested that I had a corrupt partition table and should completely wipe the drive and start from scratch, which fixed it.

Now that most of the above are fixed, I’m tackling the final items in my current lab rebuild phase:

1. Configure Btier on my ESOS SAN target (automatic storage tiering)

2. Move remaining VMs to clustered storage

3. Rebuild the last Core i3 server into a Xeon server
