Special Coverage Series

Network Computing

Oregon Storage Debacle Highlights Need To Plan For Failure

When a storage upgrade went haywire for the state of Oregon, the results were disastrous. But the IT gaffe serves as a lesson for all in contingency planning.

Every so often, a public IT failure provides a reminder of why it’s important that IT folks pay attention to the little things. Just such an incident landed the state of Oregon in the news recently, when a storage upgrade went awry, causing what a state spokesman reportedly called a “catastrophic failure” that cut off the state’s storage area network from the agencies it serves.

The impact of the failure was felt throughout the state. Child support payments and unemployment checks for new recipients were delayed. Employees couldn’t access email. The forestry service lost access to maps, the state’s job search portal crashed, and overnight computing processes were interrupted. In other words, it really was a catastrophic failure--the kind of nightmare scenario that IT folks dread.

Naturally, blame has to be assigned, and early indications are that the state is pointing the finger at its storage vendor, Hitachi.

But Greg Schulz, a senior advisory consultant with research firm StorageIO, said there’s plenty of blame to go around in such situations, and that the state may also want to take a look in the mirror.

“Ultimately, you--the deployer--are responsible,” Schulz said. “Did everything get outsourced to Hitachi, or did they have oversight? They have to do a post-mortem, address how it happened, how it could have been prevented, and look at what their options are.”

Schulz also said there’s an opportunity to learn from the storage fiasco--both for the state of Oregon and for IT departments everywhere. One of those lessons, he said, is to always have contingency plans in place.

“Fundamental IT 101 is that all technology will fail, despite what the vendors tell you,” Schulz said. And the most likely time for technology to fail, he noted, is when people are involved: doing configurations, making changes or updates, or performing upgrades.

The prospect of such failures should motivate organizations to perform more due diligence when buying and deploying storage and other infrastructure technology so that they can minimize potential damage. Specifically, there are three steps organizations should take, according to Schulz:

• When making buying decisions, companies should think hard about how they’re going to use new tools. Businesses that jump into a technology purchase without thinking through use scenarios may run into problems down the road.

• Vendors might say they can address any issues online, but Schulz suggests asking them, before you make a purchase, to put you on the phone with another customer under a non-disclosure agreement so you can ask candidly what to expect when things don’t go quite right.

• Be clear about the availability you require from each of your applications, and make sure you replicate the ones with high-availability requirements in a parallel system to protect them from inevitable failures.
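
As a rough illustration of the third step, the sketch below converts an availability target into a yearly downtime budget and flags which applications would warrant a parallel system. The application names, the targets, and the 99.9% cutoff are hypothetical assumptions chosen for the example. They are not figures from Oregon or from Schulz.

```python
# Illustrative only: application names, targets, and the replication
# threshold are assumptions, not data from the Oregon incident.
HOURS_PER_YEAR = 365 * 24  # 8,760 hours, ignoring leap years


def allowed_downtime_hours(availability_pct: float) -> float:
    """Return the yearly downtime budget implied by an availability target."""
    return HOURS_PER_YEAR * (1 - availability_pct / 100)


def needs_replica(availability_pct: float, threshold_pct: float = 99.9) -> bool:
    """Flag applications whose target meets or exceeds the assumed threshold."""
    return availability_pct >= threshold_pct


# Hypothetical per-application availability requirements (percent).
targets = {
    "benefit-payments": 99.99,
    "email": 99.9,
    "internal-wiki": 99.0,
}

for app, pct in targets.items():
    budget = allowed_downtime_hours(pct)
    plan = "replicate to parallel system" if needs_replica(pct) else "restore from backup"
    print(f"{app}: {pct}% target, {budget:.2f} h/year downtime budget, {plan}")
```

Writing the targets down this way makes the trade-off explicit: under these assumptions, a 99.9% target leaves under nine hours of downtime per year, which a lengthy storage outage could exhaust on its own.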

Schulz suspects that Oregon could have minimized the damage of its recent incident if it had ensured higher availability for its services in the event of failure. Rather than throw its vendor under the bus, he said, the state should focus on answering a fundamental question as it troubleshoots its storage area network: How could this have been prevented?

The simple act of asking such questions could mean that Oregon reduces a future large-scale failure to a relative blip.

“You need to isolate and contain faults to prevent them from becoming a disaster,” Schulz said. “If anything can happen, it will. If there is that chance that it can happen, you mitigate it.”


