Ran into a very bothersome issue over the weekend.
I pushed a package installation automatically via Satellite to ALL of my Linux hosts at 11pm on Sunday evening. Around 9am on Monday I was invited to a bridge to discuss an Oracle issue: the Listener was not allowing any new connections to the database. The fun begins...
So, we discuss what changes had been applied recently, and my software push is mentioned. I had tested this software push on all our non-prod hosts 8 weeks prior, and everything had been fine. So... what the hell happened?
As it turns out, the number of hosts affected was less than 1%. However, the way the problem was communicated made it sound much worse, so at this point I assume that the approved deployment to ALL our hosts broke EVERY.SINGLE.BOX in our environment.
I start to review the actual output of the software push and notice a package that had been updated but was not included in my testing 8 weeks ago: PCRE. So I pull that RPM down, look at its scripts, and notice that it runs ldconfig. I also check all files that were updated in /etc (using find) and notice that /etc/prelink.cache had been updated. I review the prelink cache and everything appears fine, but that file is just a collection of "paths" in a binary/data file.
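For anyone wanting to repeat that review, here is a rough sketch of the two checks. The helper name recently_changed and the 720-minute window are my own placeholders, not the exact commands I ran that morning, and the package file name is deliberately left generic.

```shell
# Hypothetical helper: list regular files under a directory that were
# modified within the last N minutes (stays on one filesystem).
recently_changed() {
    dir="$1"
    minutes="$2"
    find "$dir" -xdev -type f -mmin "-$minutes" 2>/dev/null | sort
}

# Example usage (commented out so nothing runs by accident):
#   rpm -qp --scripts ./pcre-<version>.rpm   # show the scriptlets of a downloaded RPM
#   recently_changed /etc 720                # files in /etc touched in the last 12 hours
```

The rpm query only reads the package file, so it is safe to run on a production box.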
We find a Red Hat article indicating that if you have an Oracle host with an SGA greater than 200GB, you should disable the prelink cache. Huh?
I uninstall all the packages from the night before. Problem persists. Ugh...
I then disable the prelink cache (prelink -ua undoes prelinking on all binaries and libraries):
# prelink -ua
# ls -l /etc | grep prelink
Problem goes away. Progress... sort of?
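A side note on making the "disabled" state stick: as far as I know, on RHEL 5/6 style hosts a daily cron job re-prelinks everything unless PRELINKING=no is set in /etc/sysconfig/prelink. The sketch below assumes that file layout; the function name disable_prelinking is made up for illustration, and I did not run this during the incident.

```shell
# Hypothetical helper: flip PRELINKING to "no" in a prelink sysconfig file,
# keeping a .bak copy of the original next to it.
disable_prelinking() {
    conf="${1:-/etc/sysconfig/prelink}"   # assumed default location
    [ -f "$conf" ] || { echo "no such file: $conf" >&2; return 1; }
    cp -p "$conf" "$conf.bak" &&
    sed 's/^PRELINKING=.*/PRELINKING=no/' "$conf.bak" > "$conf"
}

# Example usage as root (commented out on purpose):
#   disable_prelinking
#   prelink -ua
```

Without the config change, a manual prelink -ua is presumably only good until the next cron run.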
I recreate the cache...
# prelink -va
# ls -l /etc | grep prelink
Still no issue. Hmm... So I re-install all the packages from the earlier deployment. Still no issue.
Summary: it would seem that the /etc/prelink.cache file somehow became invalid. I'm not sure if there was a special character in there, or if the path was screwed up by a new package installation (making Oracle find a binary in the wrong order).
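If I had to test the "wrong order" theory next time, one option would be to record which libraries a binary resolves to before and after a push and diff the two. This is only a sketch: the function names are hypothetical, and the Oracle binary path in the usage comment is an assumption about a typical install.

```shell
# Drop the load address that ldd prints in parentheses, so that two
# snapshots differ only when a library name or path actually changed.
strip_addresses() {
    sed 's/ *(0x[0-9a-fA-F]*)$//'
}

# Hypothetical helper: sorted list of the libraries a binary resolves to.
lib_snapshot() {
    ldd "$1" 2>/dev/null | strip_addresses | sort
}

# Example usage (commented out; the path is an assumption):
#   lib_snapshot "$ORACLE_HOME/bin/oracle" > /tmp/libs.before
#   ... push packages ...
#   lib_snapshot "$ORACLE_HOME/bin/oracle" > /tmp/libs.after
#   diff /tmp/libs.before /tmp/libs.after
```

Stripping the addresses matters here because prelinking changes them, which would otherwise make every line show up as a difference.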
Anyhow, I recommend Google searching whether prelinking is a good or bad thing. It's quite a spectacle to see all the discussions out there. I'm still on the fence, but I'm leaning towards thinking that prelinking is a good thing.