Defunct cimservera processes freeze VMware ESX 3.5


Earlier today Virtual IEF posted a blog entry about being unable to access an ESX host: not by SSH, not by the VI Client (VIC), and not even through vCenter. Evidently this is caused by the hardware monitoring agents (HP, Dell, etc.) auto-detecting the management server and failing authentication. As a result, cimservera is spawned hundreds if not thousands of times, generating so much local CPU load and network traffic that the host cannot respond within the thresholds for SSH or the VPX agents.

Cimservera is an authentication daemon, and a defect was discovered that leaves a defunct process behind on each failed login. This issue is not specific to certain management agents. It was observed on idle VMware ESX hosts that had hardware management agents installed and had been discovered by their corresponding management application. A high number of defunct processes can produce various symptoms, all resulting from the VMware ESX Service Console failing to spawn new processes.
Symptoms include:
  • Unable to log in to the VMware ESX host through SSH
  • Unable to log in on the local Service Console
  • HA errors
If you are able to log on to the Service Console, you can verify this issue with the following command, which may show anywhere from a few to a few thousand defunct cimservera processes:
# ps -ef
root 6232 0.0 0.0 0 0 ? Z Sep24 0:00 [cimservera <defunct>]
root 6377 0.0 0.0 0 0 ? Z Sep24 0:00 [cimservera <defunct>]
root 6496 0.0 0.0 0 0 ? Z Sep24 0:00 [cimservera <defunct>]
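On a host with thousands of processes, scanning the listing by eye is impractical, so a count is more useful. The following is a minimal sketch, not something taken from the VMware KB: the function name count_defunct_cimservera is made up for illustration, and it assumes the ps on the Service Console accepts the standard procps option -eo stat,comm (state in the first column, command name in the second).

```shell
#!/bin/sh
# Hypothetical helper: count zombie cimservera processes.
# Reads "STAT COMMAND" lines on stdin, e.g. from: ps -eo stat,comm
# A process state beginning with Z marks a defunct (zombie) process.
count_defunct_cimservera() {
    awk '$1 ~ /^Z/ && $2 == "cimservera" { n++ } END { print n + 0 }'
}

# Assumed usage on the Service Console:
#   ps -eo stat,comm | count_defunct_cimservera
```

Reading from stdin keeps the helper usable on saved ps output as well, which helps when the host is already too starved to run many commands.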
In addition, /var/log/messages will show the failed logins:
Nov 17 18:29:32 blr-cpd-018 cimservera[505]: user “root” failed to authenticate
Nov 17 18:29:34 blr-cpd-018 cimservera[506]: user “root” failed to authenticate
Nov 17 18:29:36 blr-cpd-018 cimservera[507]: user “root” failed to authenticate
Nov 17 18:29:39 blr-cpd-018 cimservera[508]: user “root” failed to authenticate
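To get a feel for how often authentication is failing, the matching log entries can be counted. This is only a sketch under assumptions: the function name count_cimservera_auth_failures is hypothetical, and the pattern is based solely on the sample lines shown above, so other builds may log a slightly different message.

```shell
#!/bin/sh
# Hypothetical helper: count cimservera authentication failures in a syslog file.
# $1 = log file to scan (defaults to /var/log/messages).
# grep -c prints the number of matching lines rather than the lines themselves.
count_cimservera_auth_failures() {
    grep -c 'cimservera\[[0-9]*\]: .*failed to authenticate' "${1:-/var/log/messages}"
}

# Assumed usage:
#   count_cimservera_auth_failures
#   count_cimservera_auth_failures /var/log/messages.1
```

A count that keeps growing by one every few seconds, as in the sample above, suggests a management application is still polling the host with bad credentials.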
To resolve this issue, log in to the console of the host and restart the pegasus service:
service pegasus restart
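If you would rather restart pegasus only once the number of defunct processes has become a problem, the decision can be wrapped in a small function. This is a rough sketch, not a VMware-recommended procedure: the function name restart_pegasus_if_needed, the default threshold of 100, and the DRY_RUN variable are all assumptions made for illustration. By default it only prints the command it would run.

```shell
#!/bin/sh
# Hypothetical helper: restart pegasus when the defunct process count is too high.
# $1 = number of defunct cimservera processes you counted
# $2 = threshold (assumed default: 100)
# DRY_RUN=1 (the default here) prints the command instead of running it.
restart_pegasus_if_needed() {
    count=$1
    limit=${2:-100}
    if [ "$count" -gt "$limit" ]; then
        if [ "${DRY_RUN:-1}" = "1" ]; then
            echo "service pegasus restart"
        else
            service pegasus restart
        fi
    else
        echo "no restart needed ($count <= $limit)"
    fi
}

# Assumed usage, with a count obtained beforehand:
#   restart_pegasus_if_needed 2500        # prints the restart command
#   DRY_RUN=0 restart_pegasus_if_needed 2500   # actually restarts pegasus
```

Keeping the dry run as the default means a typo in the count or threshold cannot restart the service by accident.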
More information on this can be found in VMware KB 1007887.