Jasmine McTigue

Network Computing Blogger


Snapshot Caveats On VMware VI4

A typical morning in IT: The phone rings first thing and a mail server is down. Since e-mail is the lifeblood of every organization, this is the type of problem that can seriously ruin your day. The cause: a VMware ESX snapshot had gone awry, and its disk files had used up all available space on the datastore. On powering up the virtual machine, I was greeted with an error message that the "RedoLog" was corrupt and the machine needed to be powered off. My subsequent investigation into what went wrong and how to fix it showed that the information available online was woefully inadequate at describing the problem and providing a remedy.

Snapshots, according to the VMware admin guides, are a short-term preventative measure for otherwise risky server maintenance tasks. They are meant to be taken immediately before a risky task and kept only long enough to confirm that the server is stable and ready to provide services. The upside to snapshots is that they provide a near-instantaneous method of reverting to a previous configuration. For network administrators and consultants used to fragile servers and dangerous tasks, they are a godsend. However, very few VMware administrators use snapshots correctly, and we're asking for trouble by taking snapshots gratuitously. It's easy to take a snapshot of a favored virtual machine for a rainy day, but there are caveats, and they can quickly get serious.

Snapshots wreak havoc on the underlying file structure of VMFS file systems. Each time you snapshot a virtual machine, ESX stops writing to the current .vmdk file and starts a new file. The changes, called a delta, are written to the new file instead of the original .vmdk. The problem is that as multiple snapshots are taken, the ESX host must consult each file in the chain of snapshots to ascertain the state of a given VM. This degrades performance while the machine is in use, and it also violates the original size constraints of the source .vmdk file. Snapshots can continue to grow, even beyond the original disk size. This becomes a problem on servers with a high rate of data change, because day in and day out the server keeps consuming free disk space on the datastore until a low-space condition occurs. After a while, the file structure begins to look like this:
[Figure: VMDK_Structure.png]
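
The chain behavior described above can be illustrated with a toy model. This is only a sketch, not VMware code: the `DiskChain` class, its method names, and the block-level bookkeeping are assumptions made up for illustration. It shows why a read may have to consult every file in the chain, and how the blocks stored across all deltas can add up to more than the base disk holds.

```python
# Toy model of a snapshot chain. Not VMware code: the class, method names
# and block-level bookkeeping are illustrative assumptions only.

class DiskChain:
    def __init__(self, num_blocks):
        # The base disk holds every block; it is the only fixed-size layer.
        self.layers = [{i: "base" for i in range(num_blocks)}]

    def snapshot(self):
        # Freeze the current top layer and start an empty delta above it.
        self.layers.append({})

    def write(self, block, data):
        # Writes always land in the newest layer.
        self.layers[-1][block] = data

    def read(self, block):
        # Walk from the newest delta back toward the base disk.
        # Returns the data and the number of layers consulted.
        for consulted, layer in enumerate(reversed(self.layers), start=1):
            if block in layer:
                return layer[block], consulted
        raise KeyError(block)

    def blocks_stored(self):
        # Total blocks held across the base disk and all deltas.
        return sum(len(layer) for layer in self.layers)


disk = DiskChain(num_blocks=4)
for generation in range(3):
    disk.snapshot()
    for block in range(3):  # a busy server rewrites most of its blocks
        disk.write(block, f"gen{generation}")

print(disk.read(0))          # recently written: found in the newest delta
print(disk.read(3))          # never rewritten: every layer is consulted
print(disk.blocks_stored())  # more than the 4 blocks of the base disk
```

In this model, three snapshots on a four-block disk leave 13 blocks on the datastore, and a read of an untouched block has to pass through all four files before it is satisfied.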
If a datastore runs out of free space, you will get an error that "The RedoLog for "SERVERNAME" has been detected to be corrupt. The virtual machine needs to be powered off. If this problem persists, you need to discard the RedoLog." This is the typical message when a datastore has run out of space while VMware attempts to commit writes to the disk file. If you can free up some space on the datastore, you can delete all the snapshots on the virtual machine, which reconciles the change files against the original .vmdk and joins the separate disk files into a single file. This can take a very long time, so don't be alarmed if it takes twelve hours to complete.

To solve this problem, you need as much free space on the VMware datastore as the total thick-provisioned size of all the snapshotted disks in question. If you don't, VMware will remove all .vmsn snapshot files, but disk consolidation will fail. You can fix this, but it's much more labor intensive and can cause data loss if the disk geometry gets corrupted.

In other words, if you have a 20GB operating system partition and an 80GB data partition, you should have 100GB free on the datastore before you run the operation that removes all snapshots. If you don't have that much free space, you can use vCenter Converter Standalone to migrate VMs to another datacenter, or even download the virtual machine files to an administrative workstation with free storage. Remember, you can always re-upload them later, then browse the datastore, right-click the .vmx file, and select import to bring a dead VM back to life.
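
The sizing rule above is simple arithmetic, and it can be sketched as a quick pre-flight check. This is a minimal sketch under stated assumptions: the function names are hypothetical, the free-space figure is supplied by hand, and nothing here talks to ESX or vCenter.

```python
import shutil

GB = 1024 ** 3


def required_free_bytes(provisioned_sizes_gb):
    # Rule of thumb from the article: free space should equal the total
    # thick-provisioned size of every snapshotted disk.
    return sum(provisioned_sizes_gb) * GB


def can_consolidate(provisioned_sizes_gb, free_bytes):
    # True when the datastore has enough room to remove all snapshots.
    return free_bytes >= required_free_bytes(provisioned_sizes_gb)


def free_bytes_at(path):
    # Free space at a locally mounted path (assumption: the storage is
    # visible as an ordinary directory on the machine running this).
    return shutil.disk_usage(path).free


disks_gb = [20, 80]  # the OS and data partitions from the example
print(required_free_bytes(disks_gb) // GB)             # 100
print(can_consolidate(disks_gb, free_bytes=60 * GB))   # False
print(can_consolidate(disks_gb, free_bytes=120 * GB))  # True
```

If the check comes back False, that is the cue to free up space or move the VM elsewhere before attempting to remove the snapshots.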

Jasmine McTigue is the IT manager for Carwild Corp. She is responsible for IT infrastructure and has worked on numerous customer projects as well as ongoing network management and support throughout her 10-plus-year career.

