DATA CENTERS

  • 11/24/2010
    10:00 AM
  • Network Computing
  • News

Automation Not The Solution For Human Error

We've all had our share of misfortunes with IT devices and services that have failed to perform as expected in an increasingly information-centric world. But as much as we may want to fault the technology, it appears that we are to blame in the majority of cases, at least as far as data-center outages are concerned. The solution is not to replace humans with lights-out automation, but to provide better training, processes and procedures, says Julian Kudritzki, vice president of the Uptime Institute. "It's the same things over and over causing the failures, either the lack of processes, procedures and training, or the procedures are not followed."

The institute recently published the Operational Sustainability standard to address the human factor. According to a recent survey from the Ponemon Institute, 95 percent of U.S. data centers have had an unplanned outage.

Respondents averaged 2.48 complete data-center shutdowns over a two-year period, with an average duration of 107 minutes. Complete shutdowns are frequent, but more localized outages are more common still: row- or rack-based outages occurred an average of 6.8 times, with an average duration of 152 minutes, and rack- and server-based downtime occurred an average of 11.2 times during the same two-year timeframe, with an average duration of 153 minutes. While not the biggest factor, accidental EPO (emergency power off)/human error accounted for 51 percent of the outages.

Kudritzki says human error is in fact a bigger problem, accounting for up to 70 percent of data-center outages. The institute has been gathering Abnormal Incident Reports from more than 100 of the largest, most critical sites globally since 1994. With just under 5,000 reports in, including 500 on full data-center shutdowns, more than 73 percent of events were attributed to human factors.

The problem of human error also seems to be worsening, he adds. "When looked at over the last one-and-a-half to two years, we've actually seen a slight uptick in process-related failures. There's a lot of work we need to do as an industry to address this."

