Recovering From RedoLog Corrupt Errors On VMware ESX/ESXi
RedoLog corrupt errors are a common issue on VMware ESX/ESXi hosts where virtual machines are snapshotted and the datastore is allowed to run low on space. Use this step-by-step guide to get your machine back in business.
March 11, 2010
In my last entry, I discussed basic best practices for using snapshots in VMware environments. Today I want to get a little more technical by talking about recovery options for virtual machines which will not boot because of snapshot errors.
When you run Delete All in the Snapshot Manager (opened from the VM's context menu in the Virtual Infrastructure (VI) Client) and the datastore doesn't have enough free space to complete the operation, VMware has a nasty tendency to remove the snapshot metadata (the .vmsd file) while leaving the snapshot delta disks behind. You end up with a non-functional VM that lists no snapshots. When you try to power on the virtual machine in question, you will get the familiar: "The RedoLog for "SERVERNAME" has been detected to be corrupt. The virtual machine needs to be powered off. If this problem persists, you need to discard the RedoLog."
Unfortunately, because you already tried to consolidate the snapshots with insufficient disk space, the snapshot metadata is gone from the datastore and the Snapshot Manager lists no snapshots. Because of this, you can no longer run Delete All from the VI Client, and consequently you can't fix the problem from the GUI.
Start by freeing up space on the datastore equal to the total size of the disks attached to the VM. Sit down at the console or start an SSH session to your ESX host, then change directory to the virtual machine's folder on the datastore (under /vmfs/volumes/). Each of the VM's disks is now split across a base disk plus one delta disk per snapshot. To repair the disk files, we need to consolidate each chain by cloning it to a new file. Run the command:
vmkfstools -i vmname-000003.vmdk vmname-repaired.vmdk
Here vmname-000003.vmdk stands for the most recent snapshot disk in the chain, normally the file the VM's .vmx points at. Cloning that file, rather than the base vmname.vmdk, is what folds the base disk and every snapshot into one new disk; cloning the base disk alone would discard all of the snapshot changes. You will see the system cloning the disk to a new file. Once it finishes you will have vmname-repaired.vmdk and vmname-repaired-flat.vmdk files. Use a text editor on the host, such as vi, to edit the original vmname.vmx file and look for a line that says scsi0:0.fileName = "vmname-000003.vmdk" or something similar. Replace vmname-000003.vmdk with the newly repaired vmname-repaired.vmdk. Repeat the cloning process with any other disks attached to your machine and you're done! Power on the virtual machine and you're ready to roll.
This method works well most of the time, but it can fail because of mangled virtual disk geometry. This happens when the disk attempts to commit writes to the most recent snapshot and fails, rendering that snapshot file unusable. When this occurs, the clone process will error out and you won't be able to complete the clone step above. The workaround is to target the next snapshot back in the chain. For example:
Say vm.vmdk has three snapshots: the first is vm-000001.vmdk, the second vm-000002.vmdk, and the third and final one vm-000003.vmdk. Instead of targeting the clone operation at the final snapshot, vm-000003.vmdk, we target the second snapshot. The syntax for this is:
vmkfstools -i vm-000002.vmdk vm-repaired.vmdk
This command commits the base disk and the changes up to and including snapshot two to a single .vmdk, which you can then reference in the .vmx file just as before. This process loses the changes in the final snapshot, but it can be the only way to recover a corrupt .vmdk file without restoring from backup. Good luck!
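If you have several disks to repair, the steps above can be scripted from the console. The following is a minimal sketch, assuming a POSIX shell on the host, that each descriptor's parentFileNameHint entry names a file in the current folder, and that each disk appears in the .vmx as a line like scsi0:0.fileName = "...". The helper names (snapshot_chain, set_vmx_disk, repair_disk) are hypothetical; only vmkfstools -i comes from the procedure above.

```shell
#!/bin/sh
# Hypothetical helpers (illustrative names, not VMware tools).
# Run them from the VM's folder on the datastore.

# Print a disk's snapshot chain, newest first, by following the
# parentFileNameHint entry in each .vmdk descriptor. Pass the small
# descriptor file (e.g. vm-000003.vmdk), never a -flat or -delta data file.
snapshot_chain() {
    disk=$1
    depth=0
    # Arbitrary depth limit guards against a malformed, looping chain.
    while [ -n "$disk" ] && [ "$depth" -lt 64 ]; do
        echo "$disk"
        # The base disk has no parentFileNameHint line, so this yields ""
        # and the loop ends.
        disk=$(sed -n 's/^parentFileNameHint="\(.*\)"$/\1/p' "$disk")
        depth=$((depth + 1))
    done
}

# Set <device>.fileName (e.g. scsi0:0) in a .vmx file to a new disk,
# keeping the untouched original as <file>.vmx.bak.
set_vmx_disk() {
    vmx=$1 dev=$2 new=$3
    cp "$vmx" "$vmx.bak" || return 1
    sed "s|^\($dev\.fileName *= *\"\)[^\"]*\"|\1$new\"|" "$vmx.bak" > "$vmx"
}

# Clone (consolidate) the chain ending at <src> into <dst>, then point
# <device> at the clone. The .vmx is left alone if the clone fails.
repair_disk() {
    vmx=$1 dev=$2 src=$3 dst=$4
    vmkfstools -i "$src" "$dst" || return 1
    set_vmx_disk "$vmx" "$dev" "$dst"
}
```

For the three-snapshot example, snapshot_chain vm-000003.vmdk should list vm-000003.vmdk, vm-000002.vmdk, vm-000001.vmdk and vm.vmdk, so the second line of its output is the fallback clone source. The normal repair would be repair_disk vm.vmx scsi0:0 vm-000003.vmdk vm-repaired.vmdk; if that clone fails, remove any partial vm-repaired files and rerun it with vm-000002.vmdk as the source. Because repair_disk only edits the .vmx after vmkfstools reports success, and the .bak copy is kept, a failed attempt leaves the VM's configuration as it was.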