Compromising Virtualization

Virtualization in its pure form has totally stateless servers, with user data stored on networked drives. This is a great computer science concept, with a good deal of appeal, but it has run into the realities of shared storage farms for a while, forcing compromises in implementation that reduce the level of virtualization considerably and which add some risk to cloud usage.

It was recognized early on that storage performance was going to be a problem, since most server connections even a couple of years back were just 1 Gigabit Ethernet, which just about matched the throughput of a single disk drive. With as many as 64 virtual instances per server, this clearly was too slow, and it led to situations such as virtual desktop boot storms that created huge levels of dissatisfaction among end-users.

To speed up performance, hypervisor developers looked to a variety of deduplication options, all aimed at reducing the number of times common code is loaded into a server. The hypervisor itself is often loaded from a thumb flash drive, and a logical extension is to keep a single common version of the operating system used in each server. While this makes the servers stateful, it can be argued that it's only true at the platform level not at the user level, so the system is still truly virtual.

Many application environments, such as a LAMP stack, also are very standard, and a logical extension of the thought process is to add these to the image cache in each server, again without compromising user state. If the application had low levels of user I/O to storage, this essentially resolved the problems.

It's worth noting that a complementary approach uses all-flash arrays for the user storage, with 10GbE allowing apps with high I/O levels to perform much better.

However, with cloud-based services for big data analytics looking to be a high-growth segment, instances that could handle the storage performance issues were needed. Large memory sizes for in-memory databases were no problem, but adequate I/O to feed that memory was. Most in-memory database systems use PCIe SSD to load data and require a low latency for write operations.

The cloud service providers' answer to this problem is instance storage. Since most in-memory instances use a whole server, and the rest are limited to two or four per server, why not add an "instance store" to the virtual instance? In other words, this is an SSD that gives the same performance as a direct-attached drive in an in-house server.

This is a very elegant compromise, and performance-wise, offers the same levels as a dedicated SSD. However, there's a real risk to security that has to be addressed. An early security "hole" in virtualization was the fact that DRAM held the state, and data, of a user when it was torn down. An astute hacker could have opened a new instance on any machine and seen the previous user's data as a result; memory need to be purged between users. That problem has been fixed for a while, but the instance storage concept re-opens the problem, with the added issues that purging a terabyte of instance storage is non-trivial.

There are two primary concerns with instance stores. One is where the server fails -- the SSD records the state of the user's app at the point of failure. Unlike networked storage, the option to encrypt data at rest is problematic due to performance considerations. With a zero-repair approach to servers, data could remain for several years in the CSP's shop, but the likely outcome will be the drive getting crushed prior to the servers being sold on, or at least overwritten.

The other concern is more serious. If power fails or instance tear-down occurs, the SSD is still stateful. It has to be purged before re-use. This isn't trivial, since SSDs continually recycle deleted blocks into the spares pool, and only overwrite the data when the block is up for re-use. Malware could find those un-erased blocks and read the data out, compromising security in a big way.

On a hard drive, the traditional fix is to write random data to the deleted blocks. Irrespective of whether this is done by the user or by the hypervisor, this will not work with SSD because of the over-provisioning, especially if multiple users share the drive. Ideally, a vendor-provided utility would be needed to do the job properly, but having this work with proprietary partitioning from the hypervisor vendor complicates things.

One solution is the method used in Microsoft Azure, where metadata indicates if the block is a valid read for the new user. If the block hasn't been written, the data is returned as zeroes, protecting the original user's security. However, most other CSPs appear to be silent on this issue.

At PCIe SSD speeds, Azure's method will compromise performance. Further, the Azure solution leaves users' data on the (former) instance storage and out of their control for an extended time. It also does not explicitly discuss the over-provisioned spare pool with SSD and, as I'm writing this column, these appear to be unresolved problems.

With big data including healthcare information and financials, un-erased data could well be a problem for compliance to HIPAA and SOX. You should request a detailed explanation of your CSP's approach to the issue. In-house cloud operations also should explore any exposures.

Long term, we'll continue this debate when persistent NVDIMM and flash's successors enter the picture. These and the Hyper-Memory Cube approach increase bandwidth, which will force CSPs to use the technologies in a stateful way. This issue won't go away soon!