USS 7110 apocalypse

We use a USS 7110 to provide shared storage to our 2 ESX servers via iSCSI. For the most part the device is rather tremendous. The other part... well, not so much. As it turns out, the USS 7110 is actually just a Sun Fire X4240 running Solaris 11 with 16 HDDs: 2 for the OS and the rest for data. Our array was configured to use Double Parity with a spare. Very good resiliency with minimal sacrifice of space, IMO.
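To put a rough number on "minimal sacrifice of space", here is a back-of-envelope sketch. It assumes the 14 data disks form a single RAID-Z2 (double parity) group of 13 disks plus 1 hot spare; the appliance may really build several narrower RAID-Z2 groups, and ZFS metadata overhead is ignored, so treat the result as an upper bound rather than what the BUI will report.

```python
# Back-of-envelope usable capacity for the data pool described above.
# ASSUMPTION: the 14 data disks form one RAID-Z2 (double parity) group of
# 13 disks plus 1 hot spare. The appliance may actually build several
# narrower RAID-Z2 groups, which would lower the usable fraction.

TOTAL_BAYS = 16
OS_DISKS = 2
SPARES = 1
PARITY_PER_GROUP = 2  # RAID-Z2 keeps two disks' worth of parity per group


def usable_capacity(total=TOTAL_BAYS, os_disks=OS_DISKS, spares=SPARES,
                    parity=PARITY_PER_GROUP, groups=1):
    """Return (data_disks, usable_disks, fraction) for a RAID-Z2 layout.

    ZFS metadata and reserved space are ignored, so this is an upper bound.
    """
    data_disks = total - os_disks          # bays left for the data pool
    in_groups = data_disks - spares        # disks actually inside RAID-Z2 groups
    if in_groups % groups:
        raise ValueError("disks do not divide evenly into groups")
    usable = in_groups - groups * parity   # subtract the parity of each group
    return data_disks, usable, usable / data_disks


if __name__ == "__main__":
    data, usable, frac = usable_capacity()
    print(f"{usable} of {data} data disks usable ({frac:.1%})")
```

Under that assumption about 11 of the 14 data disks' worth of space is usable (roughly 78.6%), versus about half with mirroring, while still surviving two simultaneous disk failures.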

The city of Wayzata sent out their Dream Team to do something involving machinery and digging, etc., in front of our building. Even though there are literally hundreds of those little flags marking whatever it is they mark, they apparently managed to cut through the power feed. The UPS did not signal the ESX servers or the array to shut down, so everything simply crashed. This happens quite frequently, so I figured it was not a big deal. I was horribly mistaken.

The actual significance of what took place is probably quite minimal. A file or 2 on the OS disks of the "appliance" became corrupt, and it would only partially boot. Unfortunately, one of those files was the dbus.xml used by SMF, which prevented HAL from loading, which in turn prevented rmvolmgr from loading. (I'll explain how/why I know all this later.) After multiple reboots (because that is what a real admin does, right?), still no love from the array. I put a call in to Oracle. They walked me through a few basic checks and had me run a few commands. Still nothing. :-( They basically threw in the towel and said "I hope you have backups". Awesome...
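The failure above is a dependency cascade: one broken service takes down everything that (directly or indirectly) requires it. The sketch below models that with a deliberately simplified, hypothetical graph using only the three services named in the post (dbus, hal, rmvolmgr). Real SMF dependencies have many more edges and dependency types (require_all, optional_all, and so on) that this ignores.

```python
# Sketch of how one broken SMF service can take others down with it.
# ASSUMPTION: a simplified, hypothetical dependency graph using the service
# names from the post (dbus -> hal -> rmvolmgr). Real SMF dependencies have
# more edges and grouping types, which this sketch ignores.

from collections import deque

# service -> services it requires (hypothetical, simplified edges)
DEPENDS_ON = {
    "dbus": [],
    "hal": ["dbus"],
    "rmvolmgr": ["hal"],
}


def blocked_services(depends_on, failed):
    """Return every service that cannot start because `failed` (or something
    it transitively requires) is down. Breadth-first walk over reverse edges."""
    # Build reverse edges: service -> services that require it
    dependents = {name: [] for name in depends_on}
    for svc, reqs in depends_on.items():
        for req in reqs:
            dependents.setdefault(req, []).append(svc)

    down = {failed}
    queue = deque([failed])
    while queue:
        current = queue.popleft()
        for svc in dependents.get(current, []):
            if svc not in down:
                down.add(svc)
                queue.append(svc)
    return down - {failed}


if __name__ == "__main__":
    print(sorted(blocked_services(DEPENDS_ON, "dbus")))  # ['hal', 'rmvolmgr']
```

On a live Solaris box, `svcs -x` and `svcs -d`/`svcs -D` are the usual way to see this chain for real; the script only illustrates why a single corrupt file could leave the appliance "partially booted".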

So, as I sat there sulking over this debacle... I started to formulate a plan. I happen to have an identical array in my basement, which had not been used in quite some time. So, in a nutshell... I fired up the alt-array, configured it to resemble the dead array (which was a waste of time, in hindsight), shut it down, pulled the 2 OS drives and went back to the office.

NOTE: the following assumes you have an array that "almost works" and has a completely functional ILOM.
Also - your warranty is ABSOLUTELY VOID if you follow these instructions. I imagine if you had maintenance on your device, you wouldn't be in this boat anyhow.


  • 1st step -- power down the broken array and remove ALL the drives. And by "remove" I mean pull them out just far enough that the connectors in the back are not engaged. I don't recommend pulling them all the way out of the chassis, as they will probably get mixed up.
  • pull out the corrupt OS drives. Set them aside.
  • install only 1 of the alt-drives in Slot 0 (lower left hand corner)
  • power on the USS using the ILOM web interface
  • ssh to the ILOM
  • start the console
  • connect to "the shell"
  • remove all the zpool entries which point to disks that aren't there
  • cleanly/gracefully shut the box down again
  • push the data drives back in so they are fully seated, then power on the array
  • connect to the BUI (https://10.10.31.54:215/)
  • import the zpool
  • set up the shares again, and you're golden.
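The trickiest judgment call in the steps above is deciding which zpool entries "point to disks that aren't there". Here is a minimal sketch of that decision, assuming you have already read pool membership from `zpool status` in the shell. The pool names and device names are hypothetical placeholders, and the code only identifies candidates; it does not remove anything.

```python
# Sketch of the "remove zpool entries which point to disks that aren't there"
# decision: find pools the booted system still knows about but whose disks
# are not attached.
# ASSUMPTION: pool and device names are hypothetical placeholders; on the real
# box this information would come from `zpool status` in the shell.

def stale_pools(pool_devices, present_devices):
    """Return pool names (sorted) whose member devices are ALL absent.

    A pool with at least one device present (e.g. a mirrored OS pool running
    on a single disk) is only degraded, so it is deliberately NOT flagged.
    """
    present = set(present_devices)
    return sorted(
        pool for pool, devices in pool_devices.items()
        if devices and not any(dev in present for dev in devices)
    )


if __name__ == "__main__":
    # Hypothetical layout: mirrored OS pool on slots 0/1, data pool elsewhere.
    pools = {
        "system": ["c1t0d0", "c1t1d0"],
        "datapool": ["c1t2d0", "c1t3d0", "c1t4d0"],
    }
    # Only the single alt OS drive in slot 0 is attached at this point.
    print(stale_pools(pools, ["c1t0d0"]))  # ['datapool']
```

The "all devices absent" rule matters here: with only one OS drive in slot 0, the OS pool is missing its mirror partner too, and treating it as stale would be a very bad day.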

I will be attempting to document this in a reasonable format (and not just blog about it). This may be the first document I publish on my blog. ;-) If you happen to come across this and would like a detailed explanation, contact me. I'm happy to explain all that went down (no pun intended) on that fateful evening.
