From a storage-recovery perspective, a full-redundancy strategy also implies real-time or symmetrical mirroring. As data is written to storage platforms in the primary location, it is also written to identical storage platforms at the backup site. Typically, a high-speed network exists between the facilities to support these mirroring operations.
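The write path described above can be sketched in a few lines. This is a toy model rather than any vendor's implementation: the class and method names are hypothetical, and it assumes the mirror acknowledges a write only after both sites hold the data, a policy the article does not spell out.

```python
class MirroredVolume:
    """Toy model of a volume mirrored between two sites (hypothetical names)."""

    def __init__(self):
        self.primary = {}  # block number -> data held at the primary site
        self.backup = {}   # block number -> data held at the backup site

    def write(self, block, data):
        # Assumption: the write is applied at both sites before it is
        # acknowledged, so the two copies never diverge.
        self.primary[block] = data
        self.backup[block] = data
        return True  # acknowledgment, returned only after both copies exist

    def in_sync(self):
        # True when the backup site holds exactly what the primary holds.
        return self.primary == self.backup


volume = MirroredVolume()
volume.write(0, b"meter readings")
volume.write(1, b"trouble tickets")
```

In a real deployment the second assignment is a transfer across the inter-site network, which is why the link between the facilities must be fast enough to keep pace with production writes.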
This is the strategy Daniel Crespo-Dubie, general manager of IT communications, and Joe Phillips, manager of systems programming, are developing to protect the mission-critical operations of KeySpan Energy Corp., Brooklyn, N.Y. Their mirroring strategy is enabled by three central facts: the availability of two data centers interconnected by a home-grown fiber network, the presence of a centralized storage infrastructure and top-down corporate agreement about the critical nature of continuous access to key applications.
KeySpan has grown over the past five years through a series of corporate acquisitions and mergers, including both energy and telecommunications companies, to become one of the largest energy and services companies serving the Northeast. In the process, several data centers have come under the aegis of the holding company. These have recently been consolidated into two entities.
Crespo-Dubie notes that the two sites are connected via a privately owned fiber optic network originally deployed to handle ESCON traffic between mainframes and peripheral devices. When a redundancy strategy was established for business continuity, network bandwidth had to be expanded to support the mirroring of storage.
"We had 48 strands of fiber connecting the sites that were maxed out by mainframe connectivity requirements," Crespo-Dubie says. "When we decided to add mirroring for open-systems storage, we had two options: hang more fiber for several millions of dollars or use multiplexing." He notes that, in addition to the expense, the prospect of running 30 additional fiber pairs created a maintenance concern.
Crespo-Dubie went to the street for an alternative solution and considered fiber multiplexing products from Cisco Systems, IBM and Nortel Networks. Cisco's Metro 1500 Dense Wavelength Division Multiplexer (DWDM) got the nod following a competitive analysis. The Cisco product "gave us 32 simultaneous conversations per pair and helped us to make the best use of the fiber we had already hung," Crespo-Dubie says. The DWDM solution, at approximately $250,000, also cost significantly less than the alternatives.
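A rough back-of-the-envelope check shows why multiplexing was attractive. The sketch below uses the two figures quoted in the article (30 additional fiber pairs, 32 conversations per pair) and assumes, for illustration only, that each of those additional pairs would have carried a single link that can ride on one DWDM channel. The function name is hypothetical.

```python
import math

CHANNELS_PER_PAIR = 32  # "32 simultaneous conversations per pair" (from the article)
ADDITIONAL_LINKS = 30   # the 30 extra fiber pairs otherwise required (from the article)


def pairs_needed(links, channels_per_pair):
    """Fiber pairs required when each pair carries several multiplexed links."""
    # Assumption: one link occupies exactly one DWDM channel.
    return math.ceil(links / channels_per_pair)


pairs = pairs_needed(ADDITIONAL_LINKS, CHANNELS_PER_PAIR)
```

Under that assumption, a single multiplexed pair would cover the traffic that would otherwise have required all 30 new pairs. The actual channel plan KeySpan used is not described in the article.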
Deploying adequate network bandwidth removed one of the obstacles to the storage-recovery solution envisioned by the ad hoc team charged with contingency planning, which Phillips headed. He notes that one challenge was to consolidate open systems and mainframe storage in a coherent infrastructure that could be managed and mirrored efficiently. The technology operations group agreed on a storage infrastructure that comprised EMC Symmetrix 5700 and 8730 series disk arrays interconnected using Brocade Communications Systems Fibre Channel switches. Two 5700s were deployed at each data center to serve the storage-mirroring requirements for mainframe data, amounting to approximately 5 TB. Additionally, dual 8730 arrays were deployed at each location to mirror nearly 3 TB of open-systems data. The strategy leveraged EMC's Symmetrix Remote Data Facility (SRDF) mirroring software and used switched Fibre Channel as the interconnect protocol.
"Symmetrical mirroring," Phillips says, "lets us recover storage for critical applications in less than an hour." He notes that the solution was implemented more quickly with mainframe-based data than it has been with open systems, where the effort is ongoing.
Some of KeySpan's business units, Phillips explains, continue to use servers with their own dedicated SSA (Serial Storage Architecture) disk arrays. He adds that he is attempting to convince the holdout business units to migrate their storage to the EMC platform by stressing that their data, which is backed up to tape via IBM's Adstar Distributed Storage Manager (ADSM), will require five days to recover in the event of an outage. Moreover, he notes, IBM has said it plans to drop support for ADSM in the near future in favor of Tivoli Storage Manager (TSM). In the meantime, KeySpan is in the process of deploying TSM for enterprisewide storage management.
Another challenge to the mirroring strategy, according to Phillips, is the identification of mission-critical applications. One mainframe application that immediately comes to mind for both Phillips and Crespo-Dubie is the Computer Aided Restoration of Electrical Services (CARES). CARES is used to correlate trouble reports from the company's energy customers so that transmission problems can be identified quickly and repaired. Crespo-Dubie, who regards CARES as the No. 1 application-recovery target, says the company's well-earned reputation for better-than-average restoration practices is directly linked to the application.
"During a winter storm," Crespo-Dubie says, "we experience demand spikes and display the load on our power grid based on input from systems that capture feeder information." When an outage occurs, this data provides one method for localizing the fault. CARES adds to this capability by taking additional input from the "more than 1 million telephone calls per hour" that KeySpan receives during such a storm. It correlates the calls with the grid display, which in some cases helps to localize the fault further.
According to Phillips, the mirroring solution is tested twice a year through a process of real-time failover from one data center to the other. "Our data centers are both production facilities," he notes, "but they have sufficient spare storage capacity on each site to enable the load to be switched over in the event of an emergency."
The energy company IT manager points out that, with KeySpan's mirroring solution, storage recovery is possible at the flick of a switch.
Jon William Toigo is an independent consultant. Send your comments on this article to him at jtoigo@intnet.net.