
USS 7110 apocalypse

We use a USS 7110 to provide shared storage to our 2 ESX servers via iSCSI. For the most part the device is rather tremendous. The other part... well, not so much. As it turns out, the USS 7110 is actually just a Sun x4240 running Solaris 11 with 16 HDDs: 2 for the OS, and the rest for data. Our array was configured to use Double Parity with a spare. Very good resiliency with minimal sacrifice of space, IMO.
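To put a rough number on "minimal sacrifice of space", here is a back-of-the-envelope sketch. It assumes the 14 non-OS drives form a single double-parity group plus one hot spare; the appliance may actually split them into several narrower groups, which would cost more parity drives. The function name `usable_drives` is made up for illustration.

```python
# Assumption: one double-parity group plus one hot spare across the
# 14 non-OS drives. The real appliance layout may differ.

def usable_drives(total, os_drives, spares, parity_per_group, groups=1):
    """Return how many drives' worth of capacity is left for data."""
    pool = total - os_drives - spares
    parity = parity_per_group * groups
    if pool <= parity:
        raise ValueError("not enough drives for the requested layout")
    return pool - parity

total = 16
os_drives = 2
data = usable_drives(total, os_drives=os_drives, spares=1, parity_per_group=2)
pool_drives = total - os_drives
print(f"{data} of {pool_drives} pool drives hold data ({data / pool_drives:.0%})")
```

Under that assumption, 11 of the 14 non-OS drives end up holding data, with 2 going to parity and 1 to the spare.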

The city of Wayzata sent out their Dream Team to do something involving machinery and digging, etc. in front of our building. Even though there are literally hundreds of those little flags marking whatever it is they mark, they somehow managed to go through the power feed, apparently. The UPS did not signal the ESX servers or the array to shut down, so everything simply crashed. This happens quite frequently, so I figured it was not a big deal. I was horribly mistaken.

The actual significance of what took place is probably quite minimal. A file or 2 became corrupt on the OS disks of the "appliance" and it would only partially boot. Unfortunately, one of those files was the dbus.xml used by SMF, which prevented HAL from loading, which subsequently prevented rvolmgr from loading. (I'll explain how/why I know all this later.) After multiple reboots, because that is what a real admin will do, right... still no love from the array. I put a call in to Oracle. They walked me through a few basic checks and advised me to run a few commands. Still, nothing. :-( They basically threw in the towel and said "I hope you have backups". Awesome...

So, as I sat there sulking over this debacle... I started to formulate a plan. I happen to have an identical array in my basement, which had not been used in quite some time. So, in a nutshell... I fired up the alt-array, configured it to resemble the dead array (which was a waste of time, in hindsight), shut it down, pulled the 2 OS drives and went back to the office.

NOTE: the following assumes you have an array that "almost works" and has a completely functional ILOM.
Also - your warranty is ABSOLUTELY VOID if you follow these instructions. I imagine if you had maintenance on your device, you wouldn't be in this boat anyhow.


  • 1st step -- power down the broken array and remove ALL the drives. And by "remove" I mean pull them out far enough so the connections in the back are not engaged. I don't recommend pulling them from the chassis as they will probably get mixed up.
  • pull out the corrupt OS drives. Set them aside.
  • install only 1 of the alt-drives in Slot 0 (lower left hand corner)
  • power on the USS using the ILOM web interface
  • ssh to the ILOM
  • start the console
  • connect to "the shell"
  • remove all the zpool entries which point to disks that aren't there
  • cleanly/gracefully shut the box down again
  • power on the array
  • connect to the BUI (https://10.10.31.54:215/)
  • import the zpool
  • set up the shares again... and you're golden.

I will be attempting to document this in a reasonable format (and not just blog about it). This may be my first document I publish in my blog. ;-) If you happen to come across this and would like a detailed explanation, contact me. I'm happy to explain all that went down (no pun intended) on that fateful evening.


