We in the storage world have long viewed snapshots as a core feature of a modern disk array. As we virtualize the servers in our data centers, many of us have realized that consolidating multiple servers into a common data store has made array-based snapshots a lot less valuable in our brave new data center.
Back in the days when an array LUN held the data for a single server, I would take a snap before I applied patches or made any other significant change to a server just to be sure I could easily--and quickly--revert to a known good state. My mission-critical servers took periodic snaps during the day for the same reason.
True, as I wrote in "All Snapshots Are Not Created Equal," some disk arrays are more efficient with their snapshots than others, and I never really believed the vendors who said they could take snapshots every minute with no impact on performance. But snapshots let me sleep better at night. On a system with good snapshots, I could even create a read-write clone of the production data for the developers without copying all the data.
Once I started putting multiple virtual servers in a common data store, things changed. For a snapshot to provide a viable fallback position for all but the most dire of situations, it has to hold not just a picture of the data as it existed on the disk, but also a consistent picture of the state of the application. Getting an application-consistent snapshot requires coordination between the application, which has to quiesce itself and flush its buffers to disk, and the storage system, which has to know when the application is ready so it can take the snap.
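To make that coordination concrete, here is a minimal Python sketch of the quiesce, flush and snap sequence. It is a toy model, not any vendor's API: the `App` class, its in-memory `buffer` and `disk` lists, and both snapshot functions are hypothetical names invented for illustration.

```python
from contextlib import contextmanager


class App:
    """Hypothetical application with an in-memory write buffer."""

    def __init__(self):
        self.buffer = []       # writes accepted but not yet on disk
        self.disk = []         # writes that have reached disk
        self.quiesced = False

    def write(self, record):
        if self.quiesced:
            raise RuntimeError("application is quiesced; writes are paused")
        self.buffer.append(record)

    def flush(self):
        # Move everything in the buffer to disk.
        self.disk.extend(self.buffer)
        self.buffer.clear()

    @contextmanager
    def quiesce(self):
        """Pause new writes and flush buffers; resume writes on exit."""
        self.quiesced = True
        self.flush()
        try:
            yield
        finally:
            self.quiesced = False


def crash_consistent_snapshot(app):
    # The storage system copies only what is on disk, so buffered writes are missed.
    return list(app.disk)


def app_consistent_snapshot(app):
    # The application quiesces first, so the snap sees a flushed, stable state.
    with app.quiesce():
        return list(app.disk)
```

In this model, a write still sitting in the buffer is missing from the crash-consistent copy but present in the application-consistent one, and the application resumes accepting writes as soon as the snap returns.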
If you have even a handful of VMs in a data store, it becomes effectively impossible to quiesce all those applications at the same time for a snapshot. Any storage system snapshot of that data store is therefore only crash-consistent--that is, your data is only as consistent as it would be if the server had crashed. Crash-consistent data is better than no data at all, but it can lead to lost files or a database engine that takes hours to run a consistency check before allowing your users to get back to work.
What we need are storage systems that take snapshots on a per-VM, rather than per-LUN, basis so they can coordinate snapshots with the applications in each VM via the Windows Volume Shadow Copy Service (VSS) or scripts--the way we've been able to do for physical servers. Of course, that will require storage systems to see a vSphere, or other hypervisor, data store as more than just a logical disk, and to track which data belongs to each VM. VMware's promised vVols are one way to address the problem--luckily, a few vendors have come up with their own solutions without waiting for VMware.
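The difference between the two approaches can be sketched in a few lines of Python. This is a toy model under stated assumptions, not how any array or vVols actually works: the `VM` class, `quiesce_and_flush` (a stand-in for a VSS request or a pre-snapshot script in the guest), and both snapshot functions are hypothetical.

```python
class VM:
    """Hypothetical VM: buffered writes plus the data it owns on the data store."""

    def __init__(self, name):
        self.name = name
        self.buffer = []   # writes the guest application has not flushed yet
        self.disk = []     # this VM's data as the array sees it

    def write(self, record):
        self.buffer.append(record)

    def quiesce_and_flush(self):
        # Stand-in for a VSS request or a pre-snapshot script inside the guest.
        self.disk.extend(self.buffer)
        self.buffer.clear()


def per_lun_snapshot(vms):
    # The array sees one logical disk and copies whatever has reached it,
    # without coordinating with any guest: crash-consistent at best.
    return {vm.name: list(vm.disk) for vm in vms}


def per_vm_snapshots(vms):
    # The array knows which data belongs to which VM, so it can quiesce
    # and snap each VM on its own instead of all of them at once.
    snaps = {}
    for vm in vms:
        vm.quiesce_and_flush()
        snaps[vm.name] = list(vm.disk)
    return snaps
```

With two VMs that each hold one unflushed write, the per-LUN snapshot in this sketch captures neither write, while the per-VM pass captures both, one VM at a time.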