Committing snapshots generates a content ID mismatch error


I had a big problem Monday AM on one of my core SAP VM instances, which also happens to have a SQL DB server on it. Our VCB process finishes up late Sunday night. If you’re not aware of how VCB works, it basically creates a snapshot of the virtual machine, then mounts the now readable parent VMDK on a proxy server where your backup agent resides. Once the backup is complete, the snapshot is committed. That wasn’t the case Monday AM: the VM crashed and I was paged. The snapshot didn’t commit and the parent VMDK could not be found, so I had to manually set the ParentCID in the delta disk’s VMDK descriptor file. When I finally got the VM back online, the SQL DB was corrupt :( Luckily I had a full SQL backup from the night before.

This is where VMware KB 1007969 comes into the story…

Symptoms

  • Performing a commit of a snapshot fails
  • The virtual machine shuts down abruptly during snapshot commit
  • Performing a snapshot commit generates the error:
    Content ID mismatch
  • Powering on the virtual machine generates the error:
    Content ID mismatch
  • The virtual machine log contains the following:
    Sep 11 03:01:45.328: vmx| DISKLIB-LINK : Attach: Content ID mismatch (d504c2f0 != 62e0e8bf).
    Sep 11 03:01:45.331: vmx| DISKLIB-CHAIN : "/vmfs/volumes/48a1b01c-67422c6d-f5aa-00188b50e0ff/test/w2k3-lsi-64.vmdk" : failed to open (The parent virtual disk has been modified since the child was created).
    Sep 11 03:01:45.336: vmx| DISKLIB-VMFS : "/vmfs/volumes/48a1b01c-67422c6d-f5aa-00188b50e0ff/186-testing/w2k3-lsi-64-000001-delta.vmdk" : closed.
    Sep 11 03:01:45.342: vmx| DISKLIB-VMFS : "/vmfs/volumes/48a1b01c-67422c6d-f5aa-00188b50e0ff/186-testing/w2k3-lsi-64-000013-delta.vmdk" : closed.
    Sep 11 03:01:45.348: vmx| DISKLIB-VMFS : "/vmfs/volumes/48a1b01c-67422c6d-f5aa-00188b50e0ff/test/w2k3-lsi-64-flat.vmdk" : closed.
    Sep 11 03:01:45.352: vmx| DISKLIB-LIB : Failed to open '/vmfs/volumes/48a1b01c-67422c6d-f5aa-00188b50e0ff/186-testing/w2k3-lsi-64-000001.vmdk' with flags 0xa (The parent virtual disk has been modified since the child was created).
    Sep 11 03:01:45.355: vmx| DISK: Cannot open disk "/vmfs/volumes/48a1b01c-67422c6d-f5aa-00188b50e0ff/186-testing/w2k3-lsi-64-000001.vmdk": The parent virtual disk has been modified since the child was created (18).
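
If you have a pile of vmware.log files to check, a small script can pull out the two CIDs from the DISKLIB-LINK line. This is only a rough sketch: the function name is my own, and the pattern assumes the message looks exactly like the one quoted above.

```python
import re

# Matches the "Content ID mismatch (xxxxxxxx != yyyyyyyy)" text from the
# DISKLIB-LINK log line quoted above. Both values are 8-digit hex CIDs.
MISMATCH_RE = re.compile(
    r"Content ID mismatch \(([0-9a-fA-F]{8}) != ([0-9a-fA-F]{8})\)"
)

def find_cid_mismatches(log_text):
    """Return a list of (first_cid, second_cid) pairs found in the log text."""
    return MISMATCH_RE.findall(log_text)

sample = (
    "Sep 11 03:01:45.328: vmx| DISKLIB-LINK : Attach: "
    "Content ID mismatch (d504c2f0 != 62e0e8bf)."
)
print(find_cid_mismatches(sample))
```

The pair tells you which two CID values to look for when you go through the disk descriptor files.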

Resolution

VMware is aware of, and actively investigating, the issue.

When a snapshot delete is requested:

  1. The CID of the disk being combined into is updated.
  2. The virtual disk is updated with changes.
  3. The CIDs of the children (that are not being removed) are updated.

If a failure occurs during the combine process (I/O errors or running out of disk space), the combine process aborts. The CIDs of the supporting child files are never updated, resulting in a mismatch.
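
To make the failure mode concrete, here is a toy model of the three steps. It is not VMware's code: the dictionaries, the `commit` function, and the CID values are all made up for illustration. It only shows why aborting after step 1 leaves a child pointing at a CID that no longer exists.

```python
class CombineAborted(Exception):
    """Raised by the toy model when the data copy fails (I/O error, no space)."""

def commit(target, children, new_cid, fail_during_copy=False):
    # Step 1: the CID of the disk being combined into is updated.
    target["CID"] = new_cid
    # Step 2: the disk is updated with the changes; this is where it can abort.
    if fail_during_copy:
        raise CombineAborted("combine aborted during data copy")
    # Step 3: the children that remain are pointed at the new CID.
    for child in children:
        child["parentCID"] = new_cid

# Hypothetical values, not taken from any real disk.
base = {"CID": "aaaa0001"}
child = {"CID": "bbbb0001", "parentCID": "aaaa0001"}

try:
    commit(base, [child], new_cid="aaaa0002", fail_during_copy=True)
except CombineAborted:
    pass

# The parent has moved on to a new CID, but the child still holds the old one.
print(base["CID"], child["parentCID"])
```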

Warning: Do not perform a Go To and do not Revert to the parent snapshot.

You must correct the snapshot parent/child relationship.

To correct the parent/child relationship:

  1. Log in to the ESX Server console and verify the CID of all the virtual disks. The current snapshot disks are identified in the virtual machine configuration file (.vmx).
  2. Examine the virtual disk header files to verify the CID and ParentCID of each member to ensure that they match all the way up the tree.
  3. When the one that does not match is found, update the ParentCID of the child to match the CID of the file next up the chain.
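
Steps 2 and 3 come down to comparing each disk's ParentCID with the CID of the disk above it. The sketch below shows that check in Python, assuming the descriptor is the usual small text file with `CID=`, `parentCID=` and `parentFileNameHint=` lines. The function names, file names and CID values are hypothetical, and the script only reads and reports; it does not edit anything.

```python
def parse_descriptor(text):
    """Extract CID, parentCID and parentFileNameHint from descriptor text."""
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.partition("=")
        if sep and key.strip() in ("CID", "parentCID", "parentFileNameHint"):
            fields[key.strip()] = value.strip().strip('"')
    return fields

def find_mismatches(chain):
    """chain: list of (name, fields), ordered from newest child to base disk.

    Returns (child_name, parent_name, child_parentCID, parent_CID)
    for every link where the two values differ.
    """
    broken = []
    for (child_name, child), (parent_name, parent) in zip(chain, chain[1:]):
        if child["parentCID"].lower() != parent["CID"].lower():
            broken.append(
                (child_name, parent_name, child["parentCID"], parent["CID"])
            )
    return broken

# Made-up descriptor contents; ffffffff marks a disk with no parent.
base_txt = 'version=1\nCID=aaaa0002\nparentCID=ffffffff\ncreateType="vmfs"\n'
snap_txt = (
    'version=1\nCID=bbbb0001\nparentCID=aaaa0001\n'
    'parentFileNameHint="vm.vmdk"\n'
)

chain = [
    ("vm-000001.vmdk", parse_descriptor(snap_txt)),
    ("vm.vmdk", parse_descriptor(base_txt)),
]

for child, parent, have, want in find_mismatches(chain):
    print(f"{child}: parentCID={have}, but {parent} has CID={want}")
```

In practice you would feed it the text of copies of the real descriptor files, in the order given by following `parentFileNameHint` from the disk named in the .vmx. Back up every descriptor before changing a ParentCID by hand.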

Note: For more information related to performing these steps, see Consolidating Snapshots (1007849).

The virtual machine powers on normally at this point.

You can safely continue to use the snapshot or perform the commit operation again. VMware recommends performing the commit operation so that all changes in the delta are written down to the next level of snapshot, or to the base disk if there was only one level of snapshot.