8 Hardest Storage Lessons Learned

Battle-scarred IT veterans talk about experiences that left a lasting impression

May 8, 2007

Network Computing

We don't usually think of storage as character building. But anyone who's ever spent more than a few years in a data center has a few battle scars -- and stories -- from experiences that left a deep impression. Byte and Switch talked to a number of storage professionals about what happened, what they learned, and what they'd do differently if faced with the same conditions again.

The surprising upshot is that many of their lessons weren't about technology. Sure, there's a smattering of the "Boy, did we really underestimate the I/O queue depth when we moved over to our Fibre Channel SAN" variety.

But many of the IT pros we spoke to checked in with common-sense lessons and discomforting experiences, ranging from user feedback (ungrateful) to the nature of vendors (lies and half-truths). Our sample shared how they learned to navigate the shifting, uncertain currents of internal politics after being burned by their own bosses, senior execs, or department heads they had misread or failed to cultivate. Of course, there was also a bit of technology that didn't work as promised, along with gremlins in the infrastructure that lay dormant until the master plan for upgrades got activated.

Good times, the sort any employed storage professional never forgets. From the storage trenches come these lessons learned the hard way.

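The Lessons:

Least Expensive Isn't Always Cheapest
Trust, But Verify
Copper Show Stopper
Take No Snap For Granted
Control Issues
When a Vendor Stumbles, Badly
The Broader the Input, the Better
Open-Source Pandora's Box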

The Editors, Byte and Switch

Least Expensive Isn't Always Cheapest

Five years ago, when SANs were first all the rage, Joseph Foran recalls the RFP his company put out. He was working for a filtration manufacturer at the time, and, like most enterprises, the company was looking to centralize its storage and get a handle on the "runaway growth" of data in its database and ERP systems.

EMC, Hitachi Data Systems, IBM, and Network Appliance responded, and Foran says they went through three separate rounds with the vendors. "Sales guys came in with brief overviews, and we saw demos at customer sites. Then we did a financial piece where we were going to get all the information that we needed, what it would cost, and what our upgrade paths would be," says Foran, now director of information technology for FSW Inc., a nonprofit healthcare services company in Bridgeport, Conn. "That part no one completed successfully."

EMC and Hitachi made the shortlist, and then EMC undercut its price on what was then the highest end of the Clariion line, before customers got steered over to Symmetrix. Foran says price, coupled with EMC's nearby headquarters in Hopkinton, Mass., made it a clear choice for the filtration maker.

That may have been the happiest part of the whole process, as Foran describes it. The SAN "under-performed and a lot of features in the RFP went from 'promised delivery' to 'value add' very quickly," he says. Then came the disk upgrade prices, which were about twice the cost of upgrading a NetApp filer or Hitachi Thunder. Instead of paying them, he moved up to a higher-end EMC model with sufficient capacity and speed. "It resulted in about $100,000 of additional cost," Foran grimaces.

"There was a big disconnect between what we expected and what they actually delivered," he says. His advice? "I'd say get every last detail in ink. We probably still would have gone with EMC because of their strength and the closeness of their headquarters, and level of support -- even if they hadn't been the least expensive."

The ironic P.S. on this incident is that Foran learned after leaving the company that his successor ripped out the SAN he'd put in. The vendor that replaced it? EMC.

Trust, But Verify

It's tough to find a storage pro these days who buys into vendor hype. Indeed, judging from SNIA's latest study, vendor credibility with IT customers is disturbingly low. (See Users Blast Vendors in SNIA Survey.)

But slipups happen. A few years back, Raul Robledo, now a storage specialist with the Affinion Group, a relationship management company, was working for another employer when he had his first -- and last -- run-in with vendor hype.

"I've found in this industry that you really need to do benchmarks, or do a proof of concept on any product you're considering," Robledo says. "Vendors tell you what you want to hear."

His former employer had a large network populated with every leading IT vendor -- Brocade, Hitachi, HP, 3PAR, NetApp, Microsoft, and Sun were among them. His boss decided to implement software from an SRM startup called AppIQ, which HP later bought and incorporated into its Storage Essentials products (remember, this was a few years back). (See HP Reshuffles More Software and HP Sticks to SRM Guns.)

"I think we were so impressed with what the product could do that we decided to buy without testing," Robledo says. When asked, he says his boss made the final decision.

But neither vendor nor customer had done due diligence. When it came time to install the software, there were a number of unpleasant surprises. It turned out the company Robledo worked for didn't have the right software revision levels in its switches, HBA drivers, or Oracle databases to support the new package.

Had they known ahead of time how long it would take to set things right, they wouldn't have acted so soon or invested so much. "We were frustrated," Robledo says. "We purchased a product to help monitor our SAN, and we weren't able to get it off the ground."

As a result, the ROI took much longer to realize, as the team spent months scheduling downtime to install the patches and firmware upgrades its chosen SRM tool required. It all worked out eventually, but Robledo hasn't forgotten his lesson: Test or be tested.

Copper Show Stopper

The smallest detail can completely derail even the largest storage implementations, warns Josh Howard, storage specialist in the consulting division of CDW.

Howard went on to describe a snafu that occurred during a recent project at a major consulting firm. "We had an issue with a SAN-to-SAN replication," he says. "When we got there to do the install, there wasn't the ability to connect from the SAN routing architecture to the LAN architecture."

Specifically, there wasn't a single copper RJ45 connection on the firm's LAN switching architecture. Copper connections are typically deployed as a cheaper alternative to optical interfaces, but this client relied on optical interconnects.

The exec admits he was thrown by the missing RJ45. "I have probably done at least a hundred SAN replication projects over the years, but this was the first time that something like this had cropped up -- you kind of assume that everybody has copper ports," he says.

He considered adding an optical transceiver to the switch, but this was soon ruled out. "The [SAN] hardware vendor that we were working with did not support it."

Ultimately, Howard and his team opted to add copper connections to the client's switching infrastructure, although this was not without its own frustrations. "The modules needed were not in stock -- that was a tiny detail, but it put us back about a week in the implementation project," he adds.

Building the copper interfaces into the client's architecture cost about 1 percent of the project's total $250,000 price tag, according to the consultant, who confirmed that EMC's MirrorView product was used for replication.

For Howard, the experience underlines the importance of meticulous planning when it comes to a storage overhaul. "You have got to plan -- you need to do an inventory of what you want to connect to the new architecture," he says, adding that he now makes a point of asking his clients about their copper connections.

Take No Snap For Granted

Whitney Kuszmaul isn't one for loose ends. The network manager for the Cleveland Indians baseball team has had just three instances of downtime in eight months on an infrastructure that includes an ever-expanding 8-Tbyte SAN with an EMC CX400 array, 26 servers (19 of them virtual via VMware), multiple remote field workstations from IBM, backup software from CommVault, and a disk-to-disk-to-tape video backup system from Overland Storage -- just to name some highlights.

Indeed, Kuszmaul's 2004 installation of the Indians' digitized video system, which records every pitch and every at-bat for every player in any Indians event, has drawn industry attention at various shows and conferences. (See Tape Triumphs Despite Bad Rap.)

But one of Kuszmaul's most painful IT lessons came not from inaccessible clips of C.C. Sabathia, but from a snafu involving email.

"There's always something to learn, but I remember when I had to restore Exchange from scratch," Kuszmaul says. He was installing an update from Microsoft, so he used the VMware Volume Snapshot Manager (VSM) to record the state of Exchange before installing the service pack. After applying the upgrade, he went to remove the snapshot -- and found his data corrupted.

"When the patch was done and we went to remove the snapshot, the node hung and corrupted the data store. We had to take everything out to apply the original VM. We killed the snapshot, then laid down the most recent backup," Kuszmaul recalls.

There was no impact to his network, but Kuszmaul learned something nonetheless. Calls to VMware revealed that the Indians' Exchange setup had too many transaction logs for the snapshot utility to handle.

Kuszmaul had assumed the snapshot utility in his virtual environment could safely handle the Exchange transaction logs -- and therein lay the problem. "I learned that you don't do VSM snaps on transaction-based applications like SQL and Exchange."

In other words, take nothing for granted.

Control Issues

Call Steve Damadeo old-fashioned. He didn't have an unlimited budget, but he also didn't believe in buying storage gear just to sit on lots of extra, fallow capacity. "What I didn't want to have happen was this idea of 'Let's throw money at the problem, and expand and expand and expand,' " says the IT operations supervisor for Festo Corp., a manufacturing concern in Hauppauge, N.Y. "We needed some sort of internal control of information."

Translated, that meant limiting the amount of data each user could store on the server and instituting policies about data types and acceptable use (and enforcing them, but more about that in a bit). Damadeo supports 220 users onsite and about eight remote users.

All this came to a head a couple of years ago when Festo's North American operations went from direct attached storage to a SAN. "We bought an HP EVA1000. Ironically, you think the hardware issue is going to be the biggest problem, I've had almost no issue." In three years, only one or two drives have had to be replaced, he says.

"My mistake -- and I helped write the presentation for executive management -- was not making internal controls part of the deal," Damadeo says. "And we let the internal control part fall away -- quotas and limiting what is stored and what is not. We don't need to store lots of copies of the same files, or movies, or 5-6 Gbytes of mail per person. The hard part is keeping all the egregious nonsense off the network."

Why so cranky? Turns out the main file server blew up before the SAN was purchased and implemented. "Try telling 300 people we were re-building and they'd have to wait while we restored the system" from direct attached, he says. He and his crew stayed until 3 a.m. to bring the network back.

Festo had a couple of policies that were instituted on the SAN when it went live in 2004. First, the SAN handles only email and file server data; smaller applications have been kept on direct attached. Second, Festo decreed that all personnel data would be stored on network drives, not external storage. "I've had to revisit that with our HR director," Damadeo says, adding that old habits die hard.

In August 2005, Festo adopted a flexible quota of 500 Mbytes per user, after Damadeo measured a 150 percent increase in the amount of music being stored on the SAN. "The quotas rectified that in a hurry," he says. Users who find they need substantially more space must go through an upgrade request process.

"We don't have a quota on email, but I anticipate we'll likely add them in the next year -- and get raked over the coals by execs" as a result, Damadeo says. "We do not, to date, restrict department shares or non-project based data, like we do with normal user data. We will be putting into use an archival data and destruction system for those. What that's designed to do is prevent us from carrying forward antiquated data that hasn't been touched in 10 years that's still deemed necessary."

Getting executive buy-in for IT upgrades can be challenging, but that's not the thorniest issue Damadeo runs up against. "Where it becomes difficult is to get those same people's support for enforcement of rules they've asked us to implement. The people you think are going to back you are often the people who are going to sabotage you."

When a Vendor Stumbles, Badly

Jonathan Wynn was in a bind. He'd picked the wrong vendor, and someone in finance had signed the paperwork to install a new business process automation (BPA) system for his employer, Del Monte Foods.

"We had sort of a loose RFP," Wynn, now the manager of advanced technologies and collaborative services at Del Monte, recalls. "We were sold a bill of goods."

It wasn't entirely his fault. Wynn, who works as an in-house liaison to Microsoft, ensuring that Redmond's technologies are adopted effectively by his company, was given a recommendation by Microsoft contacts on the new BPA vendor -- who, as it happens, had supplied Del Monte with a solid solution in another part of IT. He also had a clear set of requirements for creating a system that could gather content across the full range of Del Monte's business applications, from Exchange and SharePoint to mainframe-based financials.

"The vendor came in with a good sales guy, and he had a little engineer with him, a kid sort of wet behind the ears, eager to help," Wynn recalls. "They sold me a bill of goods."

But when it came time to install the software, there was trouble. "They had misrepresented themselves. They couldn't do it. They just couldn't do the job. They were struggling with the first line of code… It was a horrible experience."

It got more horrible as Wynn tried to fix things. Though Del Monte had staff ready to help the vendor with the installation coding, after about three weeks it was plain the best option was to remove the vendor from the network. "It's terrible the battling that goes back and forth when you try to disengage with a vendor. I had gone down a terribly dark route."

Since Wynn and his team had confirmed the sale, lawsuits were threatened. "There was a lot of mudslinging and meanness," Wynn recalls. Finally, he had to pay an undisclosed sum to settle the matter. "I wasn't going to fight them in court. It had just spiraled out of control."

Wynn knew his credibility was damaged. But he was determined to rise above it. He sought and found an alternative vendor. "They came in and didn't get on the bandwagon badmouthing what had gone on. They just said, 'Let's not talk about it, let's just do it.' "

He also verbalized his intentions to his colleagues. "I said, 'Give me a chance. I've got it now.' " To the new vendor he said, "I need people who'll be with me for the long haul. I'm going to be successful, and I'll go to bat for you later if you do right by me now."

The tide turned. The new BPA solution worked like a charm. The lawyers stopped calling after about two or three months. And when Wynn visited Redmond and told his Microsoft contacts who his new vendor was, they approved: "Great vendor. You're on the right path."

The Broader the Input, the Better

By his own admission, Paul Macht was pretty impressed with himself. As his daughter Jesse prepared to head off to college, Macht, senior enterprise architect of IT for Duke Medicine and Duke University Health System, assembled the latest processor, motherboard, chipset, and memory so she'd have the most screamin' PC on campus.

"We flipped it on and the desktop was there almost immediately. And I asked her what she thought, and she sorta shrugged and said 'It's OK,' " he recalls. He laughs about it now, but says it was a good lesson that users aren't going to reward you for things working fast or even correctly. "Typical user reaction is either 'It's slow and it sucks,' or 'It's OK.' And my daughter saying that reinforced the idea that people aren't going to give you praise for a computer that performs well."

How does that play out in the data center with storage? It's Macht's position that unless you combine the perspectives of the Oracle DBA, the system DBA, and the storage architect, you'll probably end up with a blasé user or a project that's fraught with bugs and problems.

"It's my job to bring those three layers together in the architecture design. No matter how we act as individuals, we're never going to get more than an 'It's OK,' unless we act as a single group for large, deployed applications."

Macht knows a thing or two about "large, deployed applications." In addition to overseeing the IT needs for the school of medicine and the three hospitals that make up the Duke University Health System, he works with more than 10,000 end users and an IT department of about 500. With four data centers and more than 500 Tbytes of data in play, Macht has worked on keeping data available for applications supporting critical care, finance and accounting, and other ancillary functions.

And a broad-based conversation is equally useful for something as basic as how the SAN, databases, and systems each stripe data. "If they don't talk, you'd have a 25 percent success rate, but if they do talk you get closer to 90 percent."

Those kinds of odds make it all worth talking about.

Open-Source Pandora's Box

One IT manager explains that one of his toughest lessons was learned during a major storage deployment that relied heavily on open-source software. "We were doing a large, complex deployment of hardware and services," says Shlomi Harif, director of network systems and support at the Austin Independent School District.
