Network Computing is part of the Informa Tech Division of Informa PLC

6 Critical Data Center Concepts

Today, integration and cross-training are becoming the name of the game in IT. Networking pros help out with storage, and data center engineers make sure they stay up to date on security. But there are certain data center topics the entire IT team must learn and keep in mind. This article provides a brief overview of these concepts.

Disaster recovery
When designing a data center, we choose reliable equipment, components and technology, but eventually every system fails. Thus, resiliency is an important aspect of the design plan. Disaster recovery is the response and remediation process that a company follows after a planned or unplanned failure. Businesses often have a secondary data center used mostly for backup. This facility can take over in the case of a primary data center failure, but the recovery time will depend on business requirements such as cost, distance between data centers, application criticality, and scope of the recovery.

The amount of data loss a company can tolerate, also known as its recovery point objective (RPO), is a very important parameter. Recovery time can range from two hours to days or even weeks, depending on the company’s applications. Highly critical applications require less downtime and less data loss. For that reason, a disaster avoidance solution might be a better option for businesses with many highly critical applications.

Disaster avoidance
Early action can be taken to protect business-critical applications from failure, thereby avoiding downtime and data loss. Consider a scenario in which you have been informed that a storm is coming and will hit your data center within six hours. If all the application workloads can be moved to another company’s data center, your operations can proceed normally.

Three critical elements play a key role in disaster avoidance. First, you need to know that a disaster is coming. Second, the two facilities should be close enough to perform live migration, which VMware calls vMotion. (Note, however, that VMware is discussing long-distance vMotion.) Third, the data link between the data centers must have enough bandwidth to carry all the memory pages involved in the live migration of the virtual machines.
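The third element lends itself to a back-of-the-envelope check: can the memory of the virtual machines be copied across the link before the warning window closes? The following is a minimal sketch, not a sizing tool. The function names, the `efficiency` factor (the usable share of the nominal link rate), and the `dirty_page_factor` (extra traffic from pages that change during the copy and must be sent again) are illustrative assumptions, not figures from any vendor.

```python
def migration_hours(total_memory_gib: float,
                    link_gbps: float,
                    efficiency: float = 0.7,
                    dirty_page_factor: float = 1.2) -> float:
    """Rough number of hours needed to copy VM memory between sites.

    efficiency and dirty_page_factor are assumed values for illustration;
    measure them in your own environment before relying on the result.
    """
    if link_gbps <= 0 or not 0 < efficiency <= 1:
        raise ValueError("link_gbps must be > 0 and efficiency in (0, 1]")
    # GiB -> bits, inflated by the assumed re-send overhead for dirty pages.
    bits_to_send = total_memory_gib * 2**30 * 8 * dirty_page_factor
    # Nominal link rate in bits per second, reduced to the assumed usable share.
    usable_bits_per_second = link_gbps * 1e9 * efficiency
    return bits_to_send / usable_bits_per_second / 3600


def fits_in_window(total_memory_gib: float,
                   link_gbps: float,
                   window_hours: float,
                   **assumptions: float) -> bool:
    """True if the estimated copy time fits within the warning window."""
    return migration_hours(total_memory_gib, link_gbps, **assumptions) <= window_hours


if __name__ == "__main__":
    # The storm scenario: six hours of warning, a 10 Gbps inter-site link.
    for memory_gib in (4096, 20480):
        hours = migration_hours(memory_gib, link_gbps=10)
        verdict = "fits" if fits_in_window(memory_gib, 10, 6) else "does not fit"
        print(f"{memory_gib} GiB: about {hours:.1f} h, {verdict} in a 6 h window")
```

The estimate covers memory only. Storage that is not already replicated to the second site would add to the transfer and is outside this sketch.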

Although disaster avoidance includes the term "disaster," this process can also be used for maintenance purposes. There are times you may migrate a workload to another data center for routine maintenance.

Many other technologies can be included to support a seamless user experience during disaster avoidance. IP localization, clustering, inbound traffic optimization, and load balancing are just a few.

Active-active data centers
Rather than running all applications in one data center with another for backup, some businesses choose to distribute applications across more than one data center. If you have two data centers, for example, you can run some applications in one data center and some in the second. Or the same application could run from both data centers in scale-out fashion. Both are considered an active-active data center model because the two data centers are actively used and resources are not sitting idle. Data loss and recovery time are expected to be zero or near zero.

From a compute, storage, applications and network services standpoint, the design of an active-active model differs from a traditional design. Replication, stretch clustering, scale-out design, IP localization, layer 2 extension, load balancing, and state synchronization are the main characteristics of an active-active data center design.

RPO and RTO
We have already mentioned data loss and recovery time, as well as RPO. The recovery time objective (RTO) is the target time within which applications must come back up after a planned or unplanned failure. Backup technologies can be used to avoid data loss, but replication technologies at the host, application or storage level are also used to decrease RPO.
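To make the two objectives concrete, the sketch below compares what actually happened in an outage against them. Data loss is measured from the last good replica or backup to the failure, and downtime from the failure to the point where applications are back up. The function names and timestamps are hypothetical and chosen only for illustration.

```python
from datetime import datetime, timedelta


def measure_recovery(last_replica: datetime,
                     failure: datetime,
                     restored: datetime) -> tuple[timedelta, timedelta]:
    """Return (data_loss_window, downtime) for a single outage."""
    data_loss_window = failure - last_replica   # compared against the RPO
    downtime = restored - failure               # compared against the RTO
    return data_loss_window, downtime


def meets_objectives(data_loss_window: timedelta,
                     downtime: timedelta,
                     rpo: timedelta,
                     rto: timedelta) -> bool:
    """True only if both objectives were met."""
    return data_loss_window <= rpo and downtime <= rto


if __name__ == "__main__":
    # Hypothetical outage: last replica at 11:45, failure at 12:00, back up at 13:30.
    loss, down = measure_recovery(
        last_replica=datetime(2024, 1, 1, 11, 45),
        failure=datetime(2024, 1, 1, 12, 0),
        restored=datetime(2024, 1, 1, 13, 30),
    )
    ok = meets_objectives(loss, down, rpo=timedelta(minutes=30), rto=timedelta(hours=2))
    print(f"data loss window: {loss}, downtime: {down}, objectives met: {ok}")
```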

To support active-active data centers, avoid traffic hair-pinning, and provide read/write capability to storage, you may need to implement a solution specific to your storage environment. This can help reduce RTO, because storage at both data centers can be leveraged by applications.

To reduce RPO at the storage level, a synchronous replication solution can be implemented. Latency between the sites must stay within the limit the replication solution supports, or application performance will suffer.
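The latency constraint is largely a matter of distance, because a synchronous write is not acknowledged until the remote site confirms it. The sketch below estimates the minimum delay added to each write, assuming the common rule of thumb of roughly 5 microseconds per kilometer for light in optical fiber. That constant, the function names, and the example latency budget are assumptions for illustration. Real fiber paths are longer than the straight-line distance, and equipment and protocol overhead add more.

```python
# Assumed rule of thumb: light travels through fiber at about 200,000 km/s,
# which is roughly 5 microseconds per kilometer in one direction.
ONE_WAY_US_PER_KM = 5.0


def added_write_latency_ms(fiber_km: float, round_trips: int = 1) -> float:
    """Minimum propagation delay added to each synchronous write, in ms.

    round_trips is an assumption: some replication protocols need more
    than one exchange between sites per write.
    """
    if fiber_km < 0 or round_trips < 1:
        raise ValueError("fiber_km must be >= 0 and round_trips >= 1")
    round_trip_us = 2 * fiber_km * ONE_WAY_US_PER_KM
    return round_trip_us * round_trips / 1000.0


def within_latency_budget(fiber_km: float, budget_ms: float, round_trips: int = 1) -> bool:
    """True if propagation delay alone stays inside the given budget."""
    return added_write_latency_ms(fiber_km, round_trips) <= budget_ms


if __name__ == "__main__":
    # Hypothetical 5 ms budget; check your replication product's documented limit.
    for km in (10, 100, 1000):
        ms = added_write_latency_ms(km)
        print(f"{km} km of fiber: at least {ms:.2f} ms per write, "
              f"within 5 ms budget: {within_latency_budget(km, 5.0)}")
```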

East-west and north-south traffic
Applications residing in the data center can be built in many tiers. Three-tier application architecture is well known and commonly implemented by developers. Applications may interact with services such as Active Directory, DNS, and DHCP, and can also interact with other applications. Within or between data centers, if an application talks with another application, with its own tiers, with other components, or with common services, the traffic pattern is called east-west.

When web servers talk with application servers or application servers talk with database servers, these interactions are examples of an east-west traffic pattern. Applications can be accessed from campus networks, branch offices, the WAN, or the Internet. In this case, applications are considered to be to the south, so the communication is defined as north-south.
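The distinction can be expressed as a simple rule on a flow's endpoints: if both sit inside data center address space, the flow is east-west; if only one does, it is north-south. The sketch below applies that rule with Python's standard `ipaddress` module. The prefixes and the function names are hypothetical and stand in for whatever address plan your data centers actually use.

```python
import ipaddress

# Hypothetical address plan: one prefix per data center.
DATA_CENTER_PREFIXES = [
    ipaddress.ip_network("10.10.0.0/16"),  # assumed: data center A
    ipaddress.ip_network("10.20.0.0/16"),  # assumed: data center B
]


def in_data_center(address: str) -> bool:
    """True if the address falls inside any data center prefix."""
    ip = ipaddress.ip_address(address)
    return any(ip in prefix for prefix in DATA_CENTER_PREFIXES)


def classify_flow(source: str, destination: str) -> str:
    """Label a flow by where its two endpoints sit."""
    source_inside = in_data_center(source)
    destination_inside = in_data_center(destination)
    if source_inside and destination_inside:
        return "east-west"      # within or between data centers
    if source_inside or destination_inside:
        return "north-south"    # campus, branch, WAN, or Internet to an application
    return "external"           # neither endpoint is in a data center


if __name__ == "__main__":
    flows = [
        ("10.10.1.5", "10.10.2.7"),     # web server to application server
        ("10.10.2.7", "10.20.3.9"),     # application server to a database in the other site
        ("203.0.113.8", "10.10.1.5"),   # Internet client to web server
    ]
    for source, destination in flows:
        print(source, "->", destination, ":", classify_flow(source, destination))
```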

Leaf-and-spine architecture
The leaf-and-spine network architecture, or distributed core, was developed to allow traffic to scale and to travel in east-west routes in addition to the more traditional north-south patterns. The model consists of two components: spine switches and leaf switches. The main idea of the leaf-and-spine architecture is that every leaf connects to every spine. In general, leaves are not connected to each other and spines are not connected to each other.
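The wiring rule is easy to model. The sketch below builds the set of links for a fabric in which every leaf connects to every spine and then lists the paths between two leaves. The switch names and function names are hypothetical. It illustrates two consequences of the rule: the number of links equals leaves times spines, and any two leaves are always exactly one spine apart, with one equal-length path per spine.

```python
from itertools import product


def build_fabric(num_leaves: int, num_spines: int):
    """Return (leaves, spines, links) for a full leaf-to-spine mesh."""
    leaves = [f"leaf{i}" for i in range(1, num_leaves + 1)]
    spines = [f"spine{i}" for i in range(1, num_spines + 1)]
    # Every leaf connects to every spine; no leaf-leaf or spine-spine links.
    links = set(product(leaves, spines))
    return leaves, spines, links


def leaf_to_leaf_paths(source: str, destination: str, spines, links):
    """All leaf -> spine -> leaf paths between two different leaves."""
    return [
        (source, spine, destination)
        for spine in spines
        if (source, spine) in links and (destination, spine) in links
    ]


if __name__ == "__main__":
    leaves, spines, links = build_fabric(num_leaves=4, num_spines=2)
    print("links:", len(links))
    for path in leaf_to_leaf_paths("leaf1", "leaf3", spines, links):
        print(" -> ".join(path))
```

Adding a spine adds one more path between every pair of leaves, which is how the design scales east-west capacity.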

For specific and critical workloads, leaf devices can be connected to each other, providing even lower latency. On top of the physical leaf-and-spine architecture, it's possible to deploy Layer 2 or Layer 3 protocols such as TRILL, FabricPath, SPB, or VXLAN. Cisco recently announced that its ACI fabric uses VXLAN tunnels on top of a leaf-and-spine architecture with the Nexus 9000 series switches.