03/11/2010, 2:50 PM

Recovering From RedoLog Corrupt Errors On VMware ESX/ESXi

RedoLog corrupt errors are a common issue on VMware ESX/ESXi hosts where virtual machines have snapshots and the datastore is allowed to run low on space. Use this step-by-step guide to get your virtual machine back in business.
In my last entry, I discussed basic best practices for using snapshots in VMware environments. Today I want to get a little more technical by talking about recovery options for virtual machines which will not boot because of snapshot errors.

When you issue a "Delete All" snapshots command from the context menu of the Virtual Infrastructure (VI) Client and the datastore does not have enough free space to complete the operation, VMware has a nasty tendency to remove the snapshot metadata files (.vmsd and .vmsn) and leave you with a non-functional VM with no snapshots listed. When you try to power on the virtual machine, you will get the familiar error: "The RedoLog for "SERVERNAME" has been detected to be corrupt. The virtual machine needs to be powered off. If this problem persists, you need to discard the RedoLog."

Unfortunately, because you already tried to consolidate the snapshots without enough disk space, the snapshot metadata files are gone from the datastore and no snapshots are listed in Snapshot Manager. As a result, you can no longer issue the "Delete All" command, so you can't fix the problem from the VI Client GUI.

Start by freeing up space on the datastore equal to the total size of the disks attached to the VM. Log in at the host console or open an SSH session to your ESX host, then change to the virtual machine's folder on the datastore (datastores are mounted under /vmfs/volumes/). Each of the fragmented VM's disks is split into a base disk plus one delta file per snapshot. To repair the disk files, we need to clone the fragmented disks to a new file. Run the command:

vmkfstools -i vmname.vmdk vmname-repaired.vmdk
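Before running the clone, it can help to check the two preconditions above from the host shell. The following is a minimal POSIX sh sketch, not part of the original procedure: the hypothetical helper `space_check` compares the combined apparent size of the `*-flat.vmdk` base disk files in the current folder with the free space `df -Pk .` reports, and the hypothetical helper `vmdk_chain` lists a disk's snapshot chain by following the `parentFileNameHint` line in each text descriptor back to the base disk. It assumes you are in the VM's folder, that the descriptors are the small text `.vmdk` files, and that the host's `df` accepts `-P` and `-k`; neither function modifies anything.

```shell
#!/bin/sh
# Hypothetical pre-flight helpers (not from the original article).
# Run from the VM's folder on the datastore; nothing here modifies files.

# space_check: add up the apparent sizes of the base disk data files
# (*-flat.vmdk) in the current folder and compare the total with the free
# space df reports for this filesystem. Assumes "df -Pk" is supported.
space_check() {
    need_kb=0
    for f in ./*-flat.vmdk; do
        [ -f "$f" ] || continue                        # glob matched nothing
        bytes=$(ls -ln "$f" | awk '{print $5}')
        need_kb=$((need_kb + (bytes + 1023) / 1024))   # round up to KB
    done
    free_kb=$(df -Pk . | awk 'NR==2 {print $4}')
    echo "need ${need_kb} KB, free ${free_kb} KB"
    [ "$free_kb" -ge "$need_kb" ]                      # status 0 if enough space
}

# vmdk_chain: starting from a text descriptor, print each descriptor in
# the snapshot chain by following its parentFileNameHint="..." line.
# The base disk's descriptor has no such line, which ends the walk.
vmdk_chain() {
    desc="$1"
    hops=0
    while [ -n "$desc" ]; do
        if [ ! -f "$desc" ]; then
            echo "missing: $desc" >&2
            return 1
        fi
        echo "$desc"
        desc=$(sed -n 's/^parentFileNameHint="\(.*\)"$/\1/p' "$desc")
        hops=$((hops + 1))
        if [ "$hops" -gt 64 ]; then                    # guard against a looping chain
            echo "chain too long or looping" >&2
            return 1
        fi
    done
}
```

For example, running `space_check` and then `vmdk_chain vmname-000002.vmdk` (a hypothetical name; use the highest-numbered descriptor in your folder) prints each file in that disk's chain down to the base disk. A `missing:` message means a link in the chain could not be found.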


Finally Recovered my Data Server (3 weeks offline)


It may seem overstated to say "you saved my life," but this article was the only one in 3 weeks that helped me recover my Data Server VM on ESXi 5.1. It turned out that I did run out of space, and although I was doing data backups, I didn't back up my virtual machine offline. I will, going forward.

I just wanted to let you know that your article was appreciated because we so often do not get positive feedback on these things.

Thanks again,

Jim H


Great post.