Random Reboots with VMware ESX 3.5 U3

This goes back to a posting on December 12th about VMware HA’s Virtual Machine Monitoring rebooting guests after a VMotion, which was originally brought up by Scott Lowe and Duncan Epping. Well, a reader on Scott’s site has been talking with him about this problem and, even after disabling Virtual Machine Monitoring, it hasn’t gone away.

Here is the excerpt from Scott’s website:

I’ve been communicating with a reader who is experiencing random reboots of virtual machines on his HA/DRS-enabled cluster running VMware ESX 3.5 Update 3. At first, I thought his problem was related to the bug with VM failure monitoring that I discussed here, but upon further discussion the random reboots are continuing to occur even when VM failure monitoring is disabled. The only relief the reader has been able to find thus far has been to completely disable VMware HA on his cluster, which—to be honest—is a less than acceptable solution.

After a little bit of digging around, I turned up this VMware Communities thread, in which several other users also indicate they are seeing the same kinds of problems. The thread closes out by referencing this post by Duncan Epping regarding the VM failure monitoring bug. Clearly, though, this bug should not be affecting users who do not have VM failure monitoring enabled. I also found this blog post about another user having the issue, although it sounds like his problem was solved by disabling VM failure monitoring.

Further research turned up this KB article on a post-Update 3 patch that may address some of the random reboot issues. Judging from the KB article, it looks like the random reboots may be caused due to an unexpected interaction between VMotion and an option to automatically upgrade VMware Tools. This is just speculation, of course, but the symptoms seem to fit.

Have any other users out there experienced this problem? If so, what was the fix, if any? It sounds like there may be more to this issue than perhaps I first suspected.

Well, we’ll need to keep on top of it, but here is one theory that I left as a comment on Scott’s site:

The VMotion/VMware Tools upgrade theory might be a possibility. Remember, when doing a VMotion the virtual machine is actually started on a new host. If VMware Tools is configured to upgrade automatically during boot, it is quite possible that a “hole” left open in U3 could turn this into an unwanted feature.

VM Running -> Automatically Upgrade VMware Tools Enabled -> VMotion to New Host -> VMware Tools Upgrade Initiated -> Guest O/S Reboots

This also makes me think about how, in U3 of the free edition of ESXi, the programmers left the read/write RCLI commands in there... What’s going on over at the VMware QC department?

This is all speculation. I would be curious to know whether it is the same VMs rebooting over and over, or whether each VM reboots once and then never again. The latter would be more icing on the cake for the VMware Tools theory.
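If you want to check this on your own cluster, one rough way is to collect the unexpected reboot events and tally them per VM. The sketch below is only an illustration: the function name `classify_reboots`, the (VM name, timestamp) input format, and the sample entries are all made up for this example, and how you actually gather the events (guest event logs, VirtualCenter events, and so on) is left to you.

```python
from collections import Counter


def classify_reboots(events):
    """Split VMs into those that rebooted repeatedly and those that rebooted once.

    `events` is an iterable of (vm_name, timestamp) pairs. This input format is
    an assumption made for the example, not something ESX produces directly.
    Returns two sorted lists of VM names: (repeat_offenders, one_time).
    """
    counts = Counter(vm_name for vm_name, _ in events)
    repeat_offenders = sorted(vm for vm, n in counts.items() if n > 1)
    one_time = sorted(vm for vm, n in counts.items() if n == 1)
    return repeat_offenders, one_time


# Made-up sample data, for illustration only.
sample_events = [
    ("vm-a", "2008-12-20 02:14"),
    ("vm-b", "2008-12-20 02:31"),
    ("vm-a", "2008-12-21 11:05"),
]

repeat, once = classify_reboots(sample_events)
print("Rebooted more than once:", repeat)
print("Rebooted exactly once:", once)
```

If nearly every affected VM lands in the "exactly once" list, that would fit the VMware Tools upgrade theory; a list of repeat offenders would point somewhere else.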

I will keep you all posted.