

Achieving IT Service Quality: The Temptation of Band-Aids


Our BlogTalk Radio show guests Chris Oleson, Mike Hagan, and Christophe DeMoss have given me permission to reprint this excerpt from their book, Achieving IT Service Quality: The Opposite of Luck.

Perhaps after reading this brief section, which I found so “right on” in my own IT work with clients, you will see why I’ve been talking up this book! You can download a chapter and connect with Chris, Mike, and Christophe at their blog: ooluck.com

The Temptation of Band-Aids

The memory leak example is a good one to continue with to discuss something very important: avoiding long-term use of Band-Aids.  In order to survive, organizations that are unable, unwilling, or just don’t know how to get to true root cause will inevitably resort to Band-Aids as a way of getting by.

We were once brought in to help a particularly troubled sales application where a memory leak bug caused a server to run out of memory in the middle of a particularly critical production day. When we pushed for root cause, we were told the cause of the failure had been human error. This seemed quite possible to us. We began to wonder where the human error had been made. Was it during the testing process that allowed the bug to make it into production? Was the human error during the coding process that created the bug in the first place? We were eager to hear where the human error had occurred so the root cause could be addressed.

Imagine our surprise when we were told the root cause was an administrator having inadvertently disabled a cron[11] job to automatically reboot the server every night to clear the memory that was being consumed by the memory leak. You see, the memory leak was a known problem for more than a year, and somewhere along the line someone had decided that doing an automated nightly bounce of the server was the right permanent solution. In the mind of the manager performing the root cause investigation, the problem had been solved by the nightly auto-restart, and the mistake occurred when the administrator inadvertently turned off the job that kicked off the reboot. Clearly, we had different ideas than the manager about what root cause actually means. He had stopped several whys short instead of looking deeper.

In this particular case, a Band-Aid had been mistaken for the root cause corrective action. Instead of fixing the code bug causing the memory leak (and fixing the process that allowed the code bug to make it into production), the decision was consciously made to just apply the Band-Aid. This prevented reoccurrence, but also masked the root problem. To be sure, writing a bounce script is a lot easier than diagnosing and fixing a memory bug, and is a valid intermediate step to take in case the leak takes a few weeks to fix, test, and deploy, but the easy way out continues to put you at risk over the long term. The root problem continued to lurk in the environment until, just like The Ostrich Postulate says it will, it reoccurred at an even more inopportune time. Had the root problem been fixed when first identified a year earlier, the repeat outage could have been prevented.

And on top of all of that, it turns out that increases in processing volumes had caused memory use on the server to grow, so the leak effectively became larger. Since running out of memory on the server causes an incident, the nightly bounce would have only worked for a few more weeks before it was no longer enough: memory would have been consumed by midday, and daily outages would have begun to occur. This is a classic example of The Ostrich Postulate in action.

[End of Excerpt]
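
To make the arithmetic in that last paragraph concrete, here is a minimal Python sketch of how growing volume shrinks the window a nightly bounce can cover. It is not from the book. The function names, the memory figures, and the assumption that the leak rate grows by a fixed fraction each week are all hypothetical, chosen only for illustration.

```python
def hours_until_exhaustion(total_mb, baseline_mb, leak_mb_per_hour):
    """Hours after a restart until the leak has consumed all free memory."""
    if leak_mb_per_hour <= 0:
        raise ValueError("leak rate must be positive")
    return (total_mb - baseline_mb) / leak_mb_per_hour


def weeks_until_bounce_fails(total_mb, baseline_mb, leak_mb_per_hour,
                             weekly_growth, bounce_interval_hours=24.0):
    """Weeks until memory runs out before the next scheduled bounce.

    Assumption (not from the excerpt): the leak rate scales with
    processing volume, and volume grows by `weekly_growth` each week
    (0.05 means 5 percent).
    """
    if weekly_growth <= 0:
        raise ValueError("weekly growth must be positive")
    weeks = 0
    rate = leak_mb_per_hour
    # Keep going while a freshly restarted server still survives
    # longer than the gap between two bounces.
    while hours_until_exhaustion(total_mb, baseline_mb, rate) > bounce_interval_hours:
        rate *= 1 + weekly_growth
        weeks += 1
    return weeks


if __name__ == "__main__":
    # Made-up numbers: 16 GB server, 4 GB used right after a restart,
    # 400 MB leaked per hour, volume growing 5 percent per week.
    print(hours_until_exhaustion(16384, 4096, 400))          # 30.72
    print(weeks_until_bounce_fails(16384, 4096, 400, 0.05))  # 6
```

With these made-up numbers the server survives about 30 hours after a restart today, so the nightly bounce looks safe, yet only six weeks of growth are enough to push exhaustion inside the 24-hour window. The point is the same one the authors make: the Band-Aid has an expiration date, and nobody is tracking it.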

I’m sure you can see how down to earth and plainly written this book is. This is why I highly recommend it to anyone involved in IT or who has to deal with performance issues in the workplace at all. Whether you work for a big enterprise or a small IT company, the practical advice in this book is invaluable! Even those involved with building reports and metrics for the IT division will find it useful!

I have no reason – no monetary compensation other than perhaps an Amazon affiliate payment if you buy through my site – to promote this book. It was given to me free prior to the guests being on my show, but I’m telling you, I CANNOT put it down!

I am sharing this with you because I think YOU – my readers – can benefit from this book. This is a MUST READ!  It has been an enormous help to me already in my work!

[11] Cron (which stands for “command run on”): A feature in UNIX systems that allows commands or scripts to run automatically at a scheduled time.



Achieving IT Service Quality: The Opposite of Luck (Paperback)

By (author) Chris Oleson, Mike Hagan, Christophe DeMoss



