Private Cloud Lessons You Can't Learn From Amazon, Google, Etc
March 12, 2012
When discussing private cloud, questions often come up about how the big players do it: Google, Amazon, Microsoft, and so on. The thinking is that the large-scale data centers they run can teach us lessons about smaller-scale private cloud infrastructure, which, on the surface, seems to make sense: take the lessons learned in big data centers about scale, efficiency, and reliability and apply them to smaller private cloud deployments. This method is not, however, without problems. Very little of what the large public cloud providers do is actually applicable to a private cloud, for two reasons: scale and application.
A recent Wired article discusses Facebook's data center design methodology, with a focus on the custom hardware they design and use. The article covers custom servers and storage, among other things. Facebook uses custom hardware to gain efficiency and serviceability, an approach that works for them because of the economies of scale at which they operate.
Without the buying power that comes from purchasing thousands of servers and disks, this approach would be prohibitively expensive. A private cloud build-out will not reach the scale required to make it cost effective, so enterprise hardware from the major vendors will need to be used instead of custom equipment. In some cases this will mean traditional enterprise systems; in others, stripped-down, off-the-shelf versions designed for larger scale.
The second piece of the puzzle is the application(s) the infrastructure is designed for. Take Google as an example: the majority of the infrastructure they run supports one thing, search. A private cloud is very different in that the infrastructure must support a wide array of enterprise applications. Purpose-built, single-application infrastructure is a very different problem from a private infrastructure running a multitude of services.
With a single-service infrastructure, the I/O, data, and compute needs are much better defined, so it's possible to standardize much more tightly on hardware that closely fits the known demands. Additionally, the application itself can be written to provide reliability rather than relying on the underlying hardware for uptime: hardware failures are transparent to the overall service, and nodes can be replaced, or added for capacity, on the fly.
This doesn't mean that no lessons can or should be learned from these large-scale data centers. Many design concepts and operational methodologies do translate. The catch is understanding which lessons apply at the scale of your design and applying them correctly, and understanding the purpose of your intended infrastructure is key to doing so. Start with requirements and work your way up from there.
Disclaimer: This post is not intended as an endorsement for any of the vendors/products mentioned. All companies are used for example purposes only.