Jake McTigue

Network Computing Blogger


Snapshot Caveats On VMware VI4

A typical morning in IT: The phone rings first thing in the morning and a mail server is down. Since e-mail is the lifeblood of every organization, this is the type of problem that can seriously ruin your day. The cause: a VMware ESX snapshot had gone awry, and its disk files had used up all available space on the datastore. On powering up the virtual machine, I was greeted with an error message that the "RedoLog" was corrupt and the machine needed to be powered off. My subsequent investigation into what went wrong and how to fix it revealed that the information available online was woefully inadequate at describing the problem and providing a remedy.

Snapshots, according to the VMware admin guides, are a short-term preventative measure for otherwise risky server maintenance tasks. They are meant to be taken immediately before a risky task and kept only for the minimum time needed to confirm that the server is stable and ready to provide services. The upside to snapshots is that they provide a near-instantaneous method of reverting to a previous configuration. For network administrators and consultants used to fragile servers and dangerous tasks, they are a godsend. However, very few VMware administrators use snapshots correctly, and we're asking for trouble by taking snapshots gratuitously. It's easy to take a snapshot of a favored virtual machine for a rainy day, but there are caveats and they can quickly get serious.

Snapshots wreak havoc on the underlying file structure of VMFS file systems. Each time you snapshot a virtual machine, ESX stops writing to the current .vmdk file and starts a new file. The changes, called a delta, are written to the new file instead of the original .vmdk. The problem is that as multiple snapshots are taken, the ESX host must consult each file in the chain of snapshots in order to ascertain the state of a given VM. This hurts performance while the machine is in use, and it also violates the original size constraints of the source .vmdk file. Snapshots can continue to grow even beyond the original disk size. This becomes a problem on servers that undergo a high rate of data change, because day in and day out, the server continues to consume free disk space on the datastore until a low space condition occurs. After a while, the file structure begins to look like this:
[Figure: VMDK_Structure.png]
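The chain behavior described above can be illustrated with a small toy model. This is only a sketch for intuition, not VMware's on-disk format: the `SnapshotChain` class and its methods are hypothetical names, and disk blocks are represented as plain dictionary entries. It shows two things from the paragraph above: a read may have to consult every file in the chain, and the total data stored across the chain can exceed what the original disk holds.

```python
from typing import Dict, List, Optional


class SnapshotChain:
    """Toy model (hypothetical) of a base disk plus a chain of delta files."""

    def __init__(self) -> None:
        # layers[0] stands in for the original .vmdk; later entries are deltas.
        self.layers: List[Dict[int, bytes]] = [{}]

    def write(self, block: int, data: bytes) -> None:
        # Writes always land in the newest file, never in older ones.
        self.layers[-1][block] = data

    def take_snapshot(self) -> None:
        # Freeze the current file and start a new, empty delta.
        self.layers.append({})

    def read(self, block: int) -> Optional[bytes]:
        # Walk from the newest delta back to the base until the block is found.
        for layer in reversed(self.layers):
            if block in layer:
                return layer[block]
        return None

    def layers_consulted(self, block: int) -> int:
        # Number of files inspected before the block is found.
        for depth, layer in enumerate(reversed(self.layers), start=1):
            if block in layer:
                return depth
        return len(self.layers)

    def stored_blocks(self) -> int:
        # Blocks held across all files; rewritten blocks are counted once per file.
        return sum(len(layer) for layer in self.layers)


chain = SnapshotChain()
chain.write(0, b"base")      # written before any snapshot exists
chain.take_snapshot()
chain.write(1, b"delta1")    # lands in the first delta
chain.take_snapshot()
chain.write(1, b"delta2")    # same block rewritten, lands in the second delta
```

In this model, block 0 is only found after walking through both deltas to the base, and block 1 occupies space in two files at once, which is how a heavily written VM ends up using more datastore space than its original disk size.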
If a datastore runs out of free space, you will get an error that "The RedoLog for "SERVERNAME" has been detected to be corrupt. The virtual machine needs to be powered off. If this problem persists, you need to discard the RedoLog." This is a typical message that occurs when a datastore has run out of space while VMware attempts to commit writes to the disk file. If you can free up some space on the datastore, you can delete all the snapshots on the virtual machine, which reconciles the change files against the original .vmdk and joins the separate disk files into a single file. This can take a very long time, so don't be alarmed if it takes twelve hours to complete.
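One way to catch this before the datastore fills up is to keep an eye on how much space the snapshot delta files are using. The following is a minimal sketch, not a VMware tool: it assumes the datastore's files are readable at some path from wherever the script runs, and that delta files end in `-delta.vmdk`. The function name `delta_usage` and its parameters are hypothetical.

```python
from pathlib import Path
from typing import Dict


def delta_usage(datastore_root: str, suffix: str = "-delta.vmdk") -> Dict[str, int]:
    """Return total bytes of snapshot delta files, keyed by VM directory name.

    Assumption: each VM lives in its own directory under datastore_root and
    its snapshot delta files end with the given suffix.
    """
    usage: Dict[str, int] = {}
    for path in Path(datastore_root).rglob("*" + suffix):
        if path.is_file():
            vm_dir = path.parent.name
            usage[vm_dir] = usage.get(vm_dir, 0) + path.stat().st_size
    return usage
```

Calling `delta_usage` on the datastore path returns a dictionary such as one entry per VM directory with the byte total of its delta files, which can be compared against the free space reported for the datastore.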

To solve this problem you need to have as much free space on the VMware datastore as the total thick-provisioned size of all the snapshotted disks in question. If you don't, VMware will remove all .vmsn snapshot files but disk consolidation will fail. You can fix this, but it's much more labor-intensive and can cause data loss if the disk geometry gets corrupted.

In other words, if you have a 20GB operating system partition and an 80GB data partition, you should have 100GB free on the datastore before you remove all snapshots. If you don't have that much free space, you can use vCenter Converter Standalone to migrate VMs to another datacenter or even download the virtual machine files to an administrative workstation with free storage. Remember, you can always re-upload them later, then browse the datastore, right-click the .vmx file and select import to bring a dead VM back to life.
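The rule of thumb above is simple arithmetic, and it can be written down as a quick pre-flight check. This is only a sketch of the article's guideline, with hypothetical function names; the disk sizes and free space are values you would supply yourself.

```python
from typing import Iterable


def space_needed_gb(provisioned_disk_sizes_gb: Iterable[float]) -> float:
    """Free space needed: the sum of the thick-provisioned sizes of all snapshotted disks."""
    return sum(provisioned_disk_sizes_gb)


def enough_space_to_consolidate(provisioned_disk_sizes_gb: Iterable[float],
                                datastore_free_gb: float) -> bool:
    """True if the datastore has at least as much free space as the guideline asks for."""
    return datastore_free_gb >= space_needed_gb(provisioned_disk_sizes_gb)


# The example from the text: a 20GB operating system disk and an 80GB data disk.
disks_gb = [20, 80]
needed = space_needed_gb(disks_gb)
```

For the 20GB and 80GB disks in the example, `needed` comes to 100, so a datastore with only 60GB free would fail the check and the VM should be moved or copied elsewhere first.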
