I've been thinking a lot about WAN vMotion over the past few months, in no small part because a client asked me to test how their product improved the process. I'm not the only one: Cisco's OTV and EMC's VPLEX are directly targeted at enabling the movement of virtual machines from one location to another. Where disaster recovery solutions are designed to minimize service outages and data loss after a disaster strikes, WAN vMotion (or WAN Live Migration with Hyper-V) enables the new practice of disaster avoidance. For instance, an IT department can move VMs from its Mobile data center to Hattiesburg when a hurricane is expected in a couple of days.
I see three different but related problems that have to be solved to make WAN vMotion practical. The first is that the network the servers connect to, whether it carries user or Internet traffic, has to exist in both locations. For the past 20 years network guys have routed all their WAN connections to prevent broadcast and spanning tree traffic from crowding out real data. Technologies like MPLS and OTV can bridge the subnets while filtering out the stuff we don't want across the WAN, and while making sure spanning tree doesn't disable the high-speed WAN link just because the ISDN backup link is up. The second problem is squeezing the vMotion into available bandwidth. That's a basic WAN optimization problem that Riverbed, F5, NetEx, and others can address with TCP optimization, deduplication, and compression.
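To get a feel for the bandwidth problem, here's a rough sketch of whether a vMotion's memory copy fits a WAN link. All the numbers (VM size, link speed, 2:1 reduction ratio) are illustrative assumptions, not vendor specs:

```python
# Back-of-the-envelope check: how long does a vMotion's memory copy
# take across a WAN link? Illustrative assumptions throughout.

def vmotion_transfer_seconds(vm_memory_gb, link_mbps, reduction_ratio=2.0):
    """Rough time to copy a VM's memory image across a WAN link.

    reduction_ratio models what a WAN optimizer (dedupe + compression)
    might achieve; 2:1 is a guess, and real results vary widely.
    """
    bits = vm_memory_gb * 8 * 1024**3              # memory image in bits
    effective_bps = link_mbps * 1_000_000 * reduction_ratio
    return bits / effective_bps

# A 16 GB VM over a 100 Mb/s link with 2:1 reduction:
seconds = vmotion_transfer_seconds(16, 100, 2.0)
print(f"{seconds / 60:.1f} minutes")  # ~11.5 minutes, before any
                                      # dirty-page re-copy passes
```

The real transfer is worse than this, since pages dirtied during the copy have to be re-sent, which is exactly why shrinking the stream matters.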
The storage side is the most complicated. While you can move a VM's disks from Mobile to Hattiesburg, that means moving gigabytes or terabytes of data, and moving all the critical data may take more than the two days' notice the weather service provides. Plus, we already replicate data to Hattiesburg, because there are some disasters we can't predict. What we really want is for the storage systems in both locations to present the same identity (WWN and all) to the VMware hosts, so that when a VM is migrated from one location to the other, it can access its data from a local storage system. Compellent's Live Volume and HDS' HAM are partial solutions to this problem, but EMC's VPLEX does it right.
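The two-days'-notice problem is easy to quantify. A sketch, using assumed figures (1 Gb/s link, 70% effective utilization to cover protocol overhead and contention):

```python
# How much data can actually cross the WAN in the two days of warning
# a hurricane forecast gives you? Illustrative numbers only.

def tb_movable(link_gbps, hours, efficiency=0.7):
    """Terabytes transferable in `hours` over a link; `efficiency`
    models protocol overhead and contention (0.7 is a guess)."""
    bits = link_gbps * 1e9 * efficiency * hours * 3600
    return bits / 8 / 1e12          # bits -> decimal terabytes

# Two days (48 hours) over a 1 Gb/s link:
print(f"{tb_movable(1, 48):.1f} TB")   # ~15 TB
```

Fifteen-odd terabytes sounds like a lot until it's the entire contents of a data center, which is why having the data already replicated to the other site matters so much.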
With VPLEX Metro, the VPLEX appliances in the two data centers present the same identity for the LUNs they manage. They also maintain a coherent cache in both locations, sending updates across the link between the sites. Since VMs don't share VMDKs, and any one VM runs in only one data center at a time, each VMDK, or at least the disk blocks it occupies, is written in only one data center at a time. VPLEX Metro is based on synchronous replication, so it's limited to links of about 60 miles; it can't handle the latency of longer links.
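The distance limit falls straight out of physics: a synchronous write can't be acknowledged until it has made a round trip to the remote site. A quick sketch of the propagation delay alone, assuming light travels through fiber at roughly two-thirds of c (about 200 km per millisecond):

```python
# Why synchronous replication caps out at short distances: every write
# makes a round trip before it's acknowledged.

FIBER_KM_PER_MS = 200.0      # ~2/3 the speed of light in glass
KM_PER_MILE = 1.609

def sync_write_rtt_ms(distance_miles):
    """Round-trip propagation delay over fiber, ignoring switching
    and array latency (which only make things worse)."""
    km = distance_miles * KM_PER_MILE
    return 2 * km / FIBER_KM_PER_MS

print(f"{sync_write_rtt_ms(60):.2f} ms")    # ~1 ms added to every write
```

One extra millisecond per write is tolerable; stretch the link to 600 miles and every synchronous write eats 10 ms of propagation delay before the array even does any work, which is why longer distances demand asynchronous replication.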
I'm looking forward to VPLEX Geo, which is asynchronous and can work across longer distances, so you can migrate from Mobile to Atlanta, because no one really wants to build a data center in Hattiesburg, MS. Then we can migrate a VM to Atlanta, have the VPLEXen buffer writes until the async replication catches up, and then reverse the replication relationship between the two arrays so production becomes DR and vice versa, all without the users even noticing. EMC is promising such technology, even for disk arrays that weren't born in Hopkinton. That will be a brave new world.

Disclosure: Of the companies mentioned in this blog post, I am currently working on projects for EMC and NetEx.