Network Computing
The Business of IT
Centerfold
App Monitoring Minus False Alarms

  February 6, 2003
  By Kelly Higgins



No harm, no alarm: That's Kaiser Permanente's litmus test for keeping applications management manageable.

Kaiser's Sun Alpha Support Services (SASS) group came up with this approach for deciding which elements of an application to manage. By minimizing the false and extraneous alarms its application monitors generate, Kaiser makes sure serious problems aren't overlooked just because technicians are distracted by spikes in a disk's I/O rate.

SASS, which monitors hundreds of Kaiser's OpenVMS and Tru64 Unix systems and applications, is selective about that monitoring. It might keep track of the disk capacity of its laboratory application, but not the I/O rate to each and every disk. That kind of performance typically isn't noticeable to users anyway, and the alarms it would generate could distract SASS from real problems with an application.


The high-stakes applications, including Kaiser's laboratory, medical records and pharmacy tools, are the reason the nonprofit health-care organization puts so much stock in managing its applications--and doing it efficiently. If one application in the organization's laboratory system chain fails, for instance, it affects the medical staff and patients, too.

"We know within a minute if an application is having a problem," says Chip Gauthier, manager of SASS for the Oakland, Calif.-based HMO. "Instead of waiting for users to call in, we're more proactive."

Like most businesses, Kaiser operates with an especially tight IT budget these days. There is little wiggle room to hire more labor. So the organization relies heavily on monitoring tools to help track the company's growing population of servers and applications.

Kaiser's SASS group recently cut some of its overhead by adding a new management console, Heroix Corp.'s RoboCentral, to centralize alarm monitoring. The system eliminated the need for a full-time staffer to gather and consolidate alarms, Gauthier says.

Code Red

One of Kaiser's high-stakes applications, Gauthier says, is its family of OpenVMS-based lab applications. Kaiser's lab instruments are connected to the applications, so test results are sent to the system automatically. When a nurse orders a blood test using Kaiser's medical-record-management application, that order feeds into the lab-management system. If the lab application suffers a disk crash, the nurse can't get the test results for a patient, and that patient's treatment could be delayed.

So Kaiser runs application-monitoring tools to detect problems before or right as they occur. "In a network as large and complex as ours, no one person can understand what's going on," says Ralph Wagenet, a senior technician for Kaiser who handles day-to-day application-monitoring duties for the SASS group.

Kaiser's technicians write interfaces for these applications and load the Heroix RoboMon software-based monitors onto the application server. When an application's response time exceeds a predefined threshold, the monitor generates an alarm. The alarm is automatically fed into RoboCentral, and from there, the alarms and information are sent to the organization's Tivoli Enterprise Console management system, which forwards alarms to Kaiser's helpdesk application. There a trouble ticket is opened and the appropriate technician is paged.
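The alarm path described above can be sketched roughly as follows. This is a minimal illustration in Python, not Kaiser's or Heroix's implementation: the Alarm class, the check_response_time and forward functions, and the stage labels are all hypothetical names chosen for the example, and none of them corresponds to a RoboMon, RoboCentral or Tivoli API.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple


@dataclass
class Alarm:
    """Hypothetical alarm record; the fields are assumptions for illustration."""
    application: str
    response_seconds: float
    threshold_seconds: float


def check_response_time(application: str,
                        response_seconds: float,
                        threshold_seconds: float) -> Optional[Alarm]:
    """Return an Alarm only when the measured response time exceeds the threshold."""
    if response_seconds > threshold_seconds:
        return Alarm(application, response_seconds, threshold_seconds)
    return None


def forward(alarm: Alarm,
            stages: List[Tuple[str, Callable[[Alarm], None]]]) -> List[str]:
    """Hand the alarm to each stage in order and return the names of the stages visited."""
    visited = []
    for name, handler in stages:
        handler(alarm)
        visited.append(name)
    return visited
```

In this sketch each stage is just a callable, so the central console, the enterprise console and the helpdesk step (opening a ticket, paging a technician) could be swapped or reordered without touching the threshold check.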

The monitors track various conditions. One set of monitors Kaiser built checks whether transactions between its primary and backup databases are working. The monitor is set to restart the transaction or, if the problem escalates, to page a technician.
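A restart-then-escalate monitor of this kind might look something like the sketch below. The function name, the max_restarts parameter and the three callables are assumptions made for the example; the article does not say how many restarts Kaiser attempts before paging.

```python
from typing import Callable


def handle_transaction_check(is_healthy: Callable[[], bool],
                             restart: Callable[[], None],
                             page: Callable[[], None],
                             max_restarts: int = 2) -> str:
    """Return the action taken: 'ok', 'restarted' or 'paged'.

    is_healthy, restart and page are hypothetical hooks supplied by the caller.
    """
    if is_healthy():
        return "ok"
    for _ in range(max_restarts):
        restart()                # try the automatic fix first
        if is_healthy():
            return "restarted"   # recovered without waking anyone up
    page()                       # problem persisted, so escalate to a person
    return "paged"
```

The point of the design is that a technician is paged only after the automatic remedy has been tried and has failed.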

But there's a catch: To get the monitors to work with the applications, Kaiser has to write custom interfaces because many applications don't come with performance-monitoring APIs. And, Wagenet says, you need a way to detect specific activities, such as when an interface within an application fails. "It's hard to find a hook that gets you the data you really want, like the status of an interface inside an application," he says.

In addition to RoboMon, RoboCentral and Tivoli, SASS uses Fortel's SightLine, a management tool that tracks application performance. RoboMon and SightLine aren't integrated, but SightLine sends its performance analysis to Tivoli.

Wagenet says most problems with an application aren't performance-related. They are more about the application's availability, such as when a disk crashes or a controller fails.

And, as Kaiser has learned, redundancy built into the network and systems architecture can create problems of its own. Kaiser's large number of backup servers, databases and network switches makes tracking applications more complicated. Software bugs or a disk failure in a primary or secondary server can go undetected because the network automatically shifts traffic to the backup device. "Sometimes you don't know something's broken until another piece you're backing up breaks," Wagenet says. "It's important to have monitors detect these failures or else a second failure can result" and wipe out the application.
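One way to picture the problem Wagenet describes is to treat a redundant pair as having three states rather than two. The sketch below is only an illustration; the function names and the state labels are assumptions, not part of any tool named in the article.

```python
def redundancy_status(primary_ok: bool, backup_ok: bool) -> str:
    """Classify a redundant pair; 'degraded' means users see no outage yet."""
    if primary_ok and backup_ok:
        return "healthy"
    if primary_ok or backup_ok:
        return "degraded"   # service is up, but one more failure takes it down
    return "down"


def should_alarm(status: str) -> bool:
    """Alarm on anything short of full redundancy, not only on an outage."""
    return status != "healthy"
```

A monitor that checked only whether the application responds would report the degraded state as healthy, which is exactly how a first failure goes unnoticed.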

False Alarm

Yet even with Kaiser's careful planning and selection of what it watches in its applications, it still isn't immune to false alarms. Wagenet and other SASS technicians often chase down what prove to be extraneous alarms, like those caused when a threshold for a dial-up connection isn't set correctly. One way to stem this particular type of alarm, Wagenet says, is to build in automatic retries for a dial-up connection rather than sounding an alarm every time the connection fails or drops.
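The retry idea can be sketched as below. This is a hedged example: connect_with_retries, its parameters and the default of three attempts are hypothetical, and the connect callable stands in for whatever actually brings up the dial-up link.

```python
import time
from typing import Callable, Tuple


def connect_with_retries(connect: Callable[[], bool],
                         attempts: int = 3,
                         delay_seconds: float = 0.0,
                         sleep: Callable[[float], None] = time.sleep) -> Tuple[bool, int]:
    """Call connect() up to `attempts` times; return (succeeded, tries_used)."""
    for attempt in range(1, attempts + 1):
        if connect():
            return True, attempt
        if attempt < attempts:
            sleep(delay_seconds)   # brief pause before the next try
    return False, attempts
```

An alarm would be raised only when the first element of the result is False, so a single dropped connection that recovers on the next try never reaches a technician.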

The frequency of alarms is inconsistent, too. Some RoboMon monitors trigger alarms every five minutes, while others don't trigger often enough, Wagenet says. Sometimes the team gets multiple alarms for the same problem. The good news is that you can keep customizing the monitors to rein in alarm overkill or adjust frequency, he says. But it's not possible to eliminate false alarms entirely without the risk of missing relevant ones, Wagenet says.
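A common way to cut repeated alarms for the same problem is to suppress duplicates inside a time window. The sketch below assumes that approach; the AlarmSuppressor class, its key scheme and the window length are hypothetical and are not drawn from RoboMon.

```python
from typing import Dict


class AlarmSuppressor:
    """Drop repeat alarms for the same key that arrive within a time window."""

    def __init__(self, window_seconds: float) -> None:
        self.window_seconds = window_seconds
        self._last_sent: Dict[str, float] = {}

    def should_send(self, key: str, now: float) -> bool:
        """Return True if an alarm for `key` should go out at time `now` (in seconds)."""
        last = self._last_sent.get(key)
        if last is not None and now - last < self.window_seconds:
            return False            # same problem, still inside the window
        self._last_sent[key] = now  # remember when this key last got through
        return True
```

The window length carries the trade-off Wagenet points to: a longer window means fewer duplicate pages, but it also raises the chance that a new, relevant alarm with the same key is held back.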

Next for Kaiser is a big database upgrade for its laboratory system in the organization's data center in Corona, Calif. Kaiser is replacing its old MUMPS-based database with InterSystems' Caché for storing different types of patient data. Bottom line: The SASS group will have to retire some of its old application monitors and create new ones that support the new database.

It still will take some sleuthing to pinpoint problems the monitors identify in the new or newly tweaked applications. Even though Kaiser can customize its monitors to watch specific functions, managing applications is not a science. "There's a lot of trial and error," Wagenet says. "We look at what breaks and try to figure out why it's breaking and how to reduce the probability of it breaking again."

Tell us about your network and we may profile it in a future issue. Send e-mail to centerfold@nwc.com or call (516) 562-5914.

The Hard Sell
It came down to sleep deprivation. Kaiser Permanente's Sun Alpha Support Services (SASS) technicians were getting paged at all hours of the day and night for everything from major outages to a printer on the fritz, so the group decided to do something about it.

Fast forward nine years: Not only does the SASS group get more shut-eye, but it's still using the latest version of the same application-monitoring solution the health-care organization purchased back then to keep tabs on its sensitive medical, laboratory and other applications. Kaiser spends about $46,000 per year on upgrades to the Heroix RoboMon software--now in version 7.0C--and its new management console, RoboCentral.

What sold Kaiser's senior management on the tools then, and now, is SASS's estimate that the application monitors save the organization's labs and other clients about $12,000 per hour in application-outage costs, says Chip Gauthier, manager of SASS for the Oakland, Calif.-based company. "Once I showed my clients and lab technicians they were going to get improved application availability, [the implementation] got easy approval" within the organization, he says.
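The two figures quoted in this sidebar allow a rough break-even check. The calculation below is a back-of-the-envelope sketch, not Kaiser's cost analysis: it assumes the $12,000 hourly figure applies uniformly to every avoided outage hour and counts only the annual upgrade spending, leaving out staff time and other costs. The variable names are made up for the example.

```python
# Figures quoted in the article
annual_upgrade_cost = 46_000    # dollars per year for RoboMon and RoboCentral upgrades
outage_cost_per_hour = 12_000   # dollars per hour of application outage (SASS estimate)

# Hours of outage the monitors would need to prevent each year to cover the upgrades
break_even_hours = annual_upgrade_cost / outage_cost_per_hour

print(f"Break-even: {break_even_hours:.2f} avoided outage hours per year")
```

Under those assumptions, the upgrade spending pays for itself once the monitors prevent a little under four hours of outage in a year.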

Sticking with the same application-monitoring tools has also been a necessity given the rules and interfaces SASS has developed for the tools over the years and because it's just plain expensive to make a change. "It does the job, so there's no reason to go out and replace it," Gauthier says.

It also doesn't make sense to invest in new tools for the aging Digital Alpha OpenVMS and Tru64 Unix platforms. Kaiser won't install any new applications on these servers, which run the organization's key laboratory, medical and pharmacy applications, Gauthier says. That doesn't mean Kaiser will tear out the Alpha platforms running its key applications any time soon. "It takes a lot of money to migrate to a new radiology or lab system," Gauthier says. "As long as the software vendors support the platform, it's not going away."

Meanwhile, Gauthier says health-care organizations need to do more than just ping their network devices and servers. It's crucial to give users service-level agreements like Kaiser's that guarantee uptime, he says. "My job is to have the systems and applications available 24x7 at any cost," he says.

15 Minutes
Chip Gauthier: Manager, Sun Alpha Support Services (SASS), Kaiser Permanente, Oakland, Calif.

Chip Gauthier, 47, has spent 20 of his 24 years in IT with Kaiser Permanente. He is responsible for the healthcare organization's HP Alpha and Sun server platforms, which run Kaiser's massive laboratory information system and other mission-critical applications. Gauthier's team uses monitoring tools to keep tabs on hundreds of OpenVMS and Tru64 Unix systems and applications.

Education: B.S. in Hospital/Health Care Business Management, California State University, Dominguez Hills

If I Knew Then What I Know Now: I would have gotten a lot more sleep. We knew coding and/or scripting our own monitoring, however crude, would save the company capital dollars. But we couldn't keep up with the demand of proactive problem resolution and high availability requirements [that came out of our implementation]. We found it was nice to have an escalation procedure for anything that died--from a printer to whatever.

Next Time I Will: Install and roll out the application monitoring tools a lot faster across all my platforms. Our proactive application and system monitoring provides high availability to the clinical applications and the lab, radiology and other technicians who use them.

What Sealed the Deal: The deciding factor for the application monitoring tool we picked was a detailed cost analysis that included the savings we would get from reduced outages to our clinical application systems.

Biggest Mistake Made in Technology Circles Today: Not thinking outside the box and asking "What if?"

Just for Fun: My name is Chip and I'm a golf-a-holic. I also like to water ski off the back of my three-man Sea-Doo personal watercraft.

Wheels: Pontiac Grand Am. My wife thought it looked cool.

Biggest Bet Ever Made: That I wouldn't still be working for Kaiser Permanente after 20 years. Kaiser has always remained ahead of the IT curve, and the ever-changing technical environment and challenges here have kept me on board.









Copyright © 2008 United Business Media Limited