Recovering From RedoLog Corrupt Errors On VMware ESX/ESXi

RedoLog corrupt errors are a common issue on VMware ESX/ESXi hosts where virtual machines are snapshotted and the datastore is allowed to run low on space. Use this step-by-step guide to get your machine back in business.

Jasmine McTigue

March 11, 2010

In my last entry, I discussed basic best practices for using snapshots in VMware environments. Today I want to get a little more technical by talking about recovery options for virtual machines which will not boot because of snapshot errors.

When you issue a "delete all snapshots" command from the context menu of the Virtual Infrastructure (VI) client and there is not enough disk space to complete the operation, VMware has a nasty tendency to remove the snapshot state files (.vmsn) and leave you with a non-functional VM with no snapshots listed. When you try to power on the virtual machine in question, you will get the familiar: "The RedoLog for "SERVERNAME" has been detected to be corrupt. The virtual machine needs to be powered off. If this problem persists, you need to discard the RedoLog."

Unfortunately, because you already tried to reconcile snapshots with insufficient disk space, there are no longer any .vmsn files on the datastore and there are no snapshots listed in the snapshot manager. Because of this, you can no longer issue the "remove all snapshots" command and consequently can't fix the problem from the VI client GUI.

Start by freeing up space on the datastore equal to the total size of the disks attached to the VM. Sit down at the console or start up an SSH session to your ESX host. Change directory to the datastore and virtual machine folder in question. The disks for the fragmented VM will be split into as many different files as there are snapshots. In order to repair the disk files, we need to clone the fragmented disks to a new file. Run the command:

vmkfstools -i vmname.vmdk vmname-repaired.vmdk

You will see the system cloning the disk to a new file. Once it is done, you will have a vmname-repaired.vmdk and a vmname-repaired-flat.vmdk file. Use a Linux-based text editor to edit the original vmname.vmx file and look for a line that says scsi0:0.fileName = "vmname-000001.vmdk" or something similar. Use the text editor to replace vmname-000001.vmdk with the newly repaired vmname-repaired.vmdk. Repeat the cloning process with any other disks attached to your machine and you're done! Power on the virtual machine and you're ready to roll.
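The space check and the .vmx edit described above can be sketched as two small shell helpers. This is a minimal sketch rather than a VMware-provided tool: the function names flat_bytes and repoint_disk are hypothetical, and it assumes the base disk extents follow the usual name-flat.vmdk pattern and that the disk entry in the .vmx file looks like the line quoted above.

```shell
#!/bin/sh
# Hypothetical helpers; neither function is part of any VMware tooling.

# Print the combined size in bytes of the *-flat.vmdk extents in a VM folder,
# as a rough figure for how much free datastore space the clone will need.
flat_bytes() {
    dir=$1
    total=0
    for f in "$dir"/*-flat.vmdk; do
        [ -f "$f" ] || continue          # the glob matched nothing
        # Field 5 of `ls -l` is the size; this avoids reading the extent itself.
        size=$(ls -l "$f" | awk '{print $5}')
        total=$((total + size))
    done
    echo "$total"
}

# Rewrite a disk reference in a .vmx file, keeping the original as <file>.bak.
repoint_disk() {
    vmx=$1; old=$2; new=$3
    cp "$vmx" "$vmx.bak" || return 1
    # Escape dots so sed matches the old file name literally.
    old_re=$(printf '%s' "$old" | sed 's/\./\\./g')
    sed "s|= *\"$old_re\"|= \"$new\"|" "$vmx.bak" > "$vmx"
}
```

For example, running flat_bytes . inside the VM folder prints a byte count to compare against the free space reported for the datastore, and repoint_disk vmname.vmx vmname-000001.vmdk vmname-repaired.vmdk makes the substitution described above while leaving vmname.vmx.bak as a fallback.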

This method works great most of the time; however, it can fail because of mangled virtual disk geometry. This happens when the disk attempts to commit writes on the most recent snapshot and fails, rendering the most recent snapshot file unusable. When this occurs, the clone process will inexplicably error out and you won't be able to complete the clone step above. The workaround for this is to target the next snapshot back in the chain. For example:

vm.vmdk has three snapshots. The first snapshot is named vm-000001.vmdk, the second vm-000002.vmdk, and the third and final snapshot vm-000003.vmdk. Instead of targeting the clone operation at the parent vm.vmdk file, we can target it at the second snapshot. The syntax for this is:

vmkfstools -i vm-000002.vmdk vm-repaired.vmdk

This command will commit the changes up to and including snapshot two to a single vmdk, which you can then reference in the .vmx file. This process does lose the changes in the final snapshot, but it can be the only way to recover a corrupt .vmdk file without restoring from backup. Good luck!
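When a VM has many snapshots, picking the next one back in the chain can be scripted. The sketch below is only an illustration under stated assumptions: previous_snapshot is a hypothetical helper name, and it relies on the snapshot descriptors being named vm-000001.vmdk, vm-000002.vmdk and so on, as in the example above, so that sorted order matches snapshot order.

```shell
#!/bin/sh
# Hypothetical helper: print the descriptor one step back from the newest
# snapshot of the given VM, or return non-zero if fewer than two exist.
previous_snapshot() {
    dir=$1; vm=$2
    prev=""; last=""
    # The pattern matches vm-NNNNNN.vmdk only, so vm.vmdk, vm-flat.vmdk and
    # the vm-NNNNNN-delta.vmdk files are skipped. Glob results are sorted,
    # and zero-padded numbers sort in snapshot order.
    for f in "$dir/$vm"-[0-9][0-9][0-9][0-9][0-9][0-9].vmdk; do
        [ -e "$f" ] || continue          # the glob matched nothing
        prev=$last
        last=$f
    done
    [ -n "$prev" ] || return 1
    basename "$prev"
}
```

With the three snapshots from the example, previous_snapshot . vm run inside the VM folder prints vm-000002.vmdk, which is the file name to hand to the vmkfstools -i command shown above.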

About the Author

Jasmine McTigue

Principal, McTigue Analytics

Jasmine McTigue is principal and lead analyst of McTigue Analytics and an InformationWeek and Network Computing contributor, specializing in emergent technology, automation/orchestration, virtualization of the entire stack, and the conglomerate we call cloud. She also has experience in storage and programmatic integration. Jasmine began writing computer programs in Basic on one of the first IBM PCs; by 14 she was building and selling PCs to family and friends while dreaming of becoming a professional hacker. After a stint as a small-business IT consultant, she moved into the ranks of enterprise IT, demonstrating a penchant for solving "impossible" problems in directory services, messaging, and systems integration. When virtualization changed the IT landscape, she embraced the technology as an obvious evolution of service delivery even before it attained mainstream status and has been on the cutting edge ever since. Her diverse experience includes system consolidation, ERP, integration, infrastructure, next-generation automation, and security and compliance initiatives in healthcare, public safety, municipal government, and the private sector.
