DR on the Sly

Unannounced tests, timed for when admins are on vacation, improve a healthcare provider's DR times

August 25, 2006


You may have the most cutting-edge technology at work, but you're sunk without properly documented disaster recovery procedures, as one major healthcare provider found out through regular, unannounced tests. (See DR Picks Up Steam and Users Describe DR Detriments.)

Chris Panagiotopoulos, IT director of LifeBridge Health, says that was the lesson he learned after running surprise DR tests last year to see if his hospitals could recover quickly from losing servers in their main data center.

Panagiotopoulos began running DR tests in March 2005 to see if his team could restore the main data center at Sinai Hospital in Baltimore from an identical SAN set up 10 miles away in one of the network's other hospitals. What began as testing of one lost server at a time evolved into integrated testing of all servers simultaneously by last April as recovery times continued to drop.

"We tell them, 'you can always expect at least one unannounced test a quarter,'" Panagiotopoulos says. "Needless to say, they love me for that."

To further endear himself to staff, he doesn't pick testing times randomly. He waits until key personnel are out, to see if their subordinates can facilitate recovery.

"We run simulations where we declare disasters at all 11 systems attached to our SAN that have gone offline. Then we see how fast we can bring up systems at our DR site," Panagiotopoulos says. "And I'll do the tests when I know primary administrators are out on vacation, and see if the backup folks can bring us back."

Panagiotopoulos says that leads to interesting exchanges when his admins return from vacation. "In the beginning, procedures were not as detailed as they should have been," he says. "When a guy comes back from vacation, he'll hear 'You made me look bad. Your documentation is not good.'"

The upshot: Panagiotopoulos says procedures are documented well enough now that the hospital system meets its goal of restoring the SAN in less than two hours. When he first started testing, he says, most servers took at least four hours to bring back, and some as long as seven hours, because procedures were not properly documented. And he doesn't stop at bringing back servers.

"After they bring the systems up, I turn it over to an application team to do validation," he says. "We want to make sure the patient registration system is really up and can conduct business."
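The pass criteria described here (systems back within the two-hour goal, then confirmed by an application team) could be scored along these lines. This is only a minimal sketch: the system names, recovery times, and the `DrillResult` structure are hypothetical, and nothing here reflects tooling LifeBridge actually uses.

```python
from dataclasses import dataclass
from datetime import timedelta

# Target taken from the article: restore in less than two hours.
RECOVERY_TARGET = timedelta(hours=2)


@dataclass
class DrillResult:
    system: str               # hypothetical system name
    recovery_time: timedelta  # declared disaster -> system up at the DR site
    validated: bool           # application team confirmed it can conduct business


def passed(result: DrillResult, target: timedelta = RECOVERY_TARGET) -> bool:
    """A system passes only if it came up within the target AND was validated."""
    return result.recovery_time < target and result.validated


def failed_systems(results):
    """Return the names of systems that missed the target or failed validation."""
    return [r.system for r in results if not passed(r)]


# Illustrative data only; these timings are invented for the example.
results = [
    DrillResult("patient-registration", timedelta(hours=1, minutes=40), True),
    DrillResult("payroll", timedelta(hours=4), True),
    DrillResult("email", timedelta(hours=1, minutes=15), False),
]
failed = failed_systems(results)
```

Treating validation as a separate condition mirrors the point in the quote: a server that boots but cannot register patients does not count as recovered.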

The testing paid off earlier this year when LifeBridge moved 200 servers to a new production data center at Sinai over a six-month period. Due to configuration issues on the mainframe server, the IT staff could not reboot when trying to bring up the production server a few days after the migration. They had to fail back to the old data center using the same procedures from their unannounced simulations.

"Three years ago, our recovery strategy was tape," Panagiotopoulos says. "It would literally take days upon days to recover. Now it's two hours under simulated conditions. Whenever we do upgrades or have downtime for preventative maintenance, we have those systems up and running just in case."

LifeBridge set up its DR site in late 2004, installing EMC Symmetrix 8430 SANs with 25 Tbytes apiece in the primary and secondary data centers. Panagiotopoulos added two EMC Centera archiving systems with 20 Tbytes capacity to archive records from its Picture Archiving and Communications Systems (PACS) digital radiography system. (See PACS Poses Storage Challenge.) PACS store 64-bit images, which can take up more than 20 megabytes per image.

The SAN serves more than 6,000 users spread over four hospitals. LifeBridge stores its clinical systems, payroll, patient registration systems, and email on its Symmetrix SANs. PACS servers are connected to Symmetrix, which has faster response times than Centera. After six months, PACS images are moved off to Centera.
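The age-based placement of PACS images can be pictured as a simple rule. The sketch below assumes "six months" means roughly 183 days and uses a hypothetical `storage_tier` function; the article does not say how the move is actually triggered or implemented.

```python
from datetime import date, timedelta

# Assumption: "six months" approximated as 183 days.
ARCHIVE_AFTER = timedelta(days=183)


def storage_tier(acquired: date, today: date) -> str:
    """Return which tier an image would live on under the six-month rule."""
    if today - acquired > ARCHIVE_AFTER:
        return "centera"    # older images go to the archive tier
    return "symmetrix"      # recent images stay on the faster SAN


# Example dates are illustrative only.
old_image = storage_tier(date(2006, 1, 1), date(2006, 8, 25))
new_image = storage_tier(date(2006, 7, 1), date(2006, 8, 25))
```

The design trade-off is the one the article names: recent images stay where response times are fastest, and older ones move to the archive.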

LifeBridge still maintains an IBM TotalStorage 3584 Tape Library "as a last measure in case something were really to go wrong."

Panagiotopoulos says that, like many healthcare facilities, LifeBridge has gone from a tech laggard to a leader. Besides state-of-the-art patient testing systems such as PACS, LifeBridge also uses a Vocera wireless system that lets hospital staff reach any other hospital personnel in the building through voice commands.

"We've gone here from being about five years behind in technology to cutting edge now," he says.

Dave Raffo, News Editor, Byte and Switch

  • EMC Corp. (NYSE: EMC)

  • IBM Corp. (NYSE: IBM)

  • Vocera Communications Inc.
