How to fix an ESX host disconnected from vCenter

When an ESX host becomes disconnected from vCenter, you can no longer manage the host or add, remove, stop, or restart its virtual machines from vCenter. I recently ran into this with an ESX 4.1 host that was stuck in a disconnected state. I followed VMware Knowledge Base article 1003490, but it did not resolve the problem. Investigating further, I noticed that an abnormal number of hostd processes were running and that the watchdog-hostd.PID file was missing, even after restarting the VMware services. What worked for me was killing the hostd processes first and then following the instructions in the KB article. Here's a quick summary of the steps needed to re-establish connectivity between vCenter and the ESX host:

1. Log in as root to your ESX 4.1.x server and type:

cd /var/run/vmware

2. Now that you are in the correct directory, type:

ls -l vmware-hostd.PID watchdog-hostd.PID
You should see output similar to the following:
/var/run/vmware # ls -l vmware-hostd.PID watchdog-hostd.PID
-rw-r--r-- 1 root root 8 Oct 9 18:35 vmware-hostd.PID
-rw-r--r-- 1 root root 9 Oct 9 18:35 watchdog-hostd.PID

3. If the watchdog-hostd.PID file is missing, list the running hostd processes by typing:

ps -g | grep hostd
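
The file check from steps 2 and 3 can also be scripted. The following is a minimal sketch, not something taken from the KB article: the function name missing_pid_files is made up for this example, and it assumes a POSIX-style shell is available on the host.

```shell
#!/bin/sh
# Sketch: list which of the two hostd PID files are absent.
# The directory argument defaults to the one used in the steps above.
# missing_pid_files is a hypothetical helper name, not a VMware tool.
missing_pid_files() {
    run_dir="${1:-/var/run/vmware}"
    for f in vmware-hostd.PID watchdog-hostd.PID; do
        # -f is true only when the path is an existing regular file
        [ -f "$run_dir/$f" ] || echo "$f"
    done
}

# Example call (prints nothing when both files are present):
# missing_pid_files /var/run/vmware
```

If the function prints watchdog-hostd.PID, you are in the situation described in this post and can continue with the process listing.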

4. You will see many spawned processes, similar to the output below:

13393994 13446434 vix-async-pipe 13446434 13446434 hostd
13451993 13446434 hostd-worker 13446434 13446434 hostd
13469052 13446434 hostd-worker 13446434 13446434 hostd
13446434 13446434 hostd-worker 13446434 13446434 hostd
13446457 13446434 hostd-poll 13446434 13446434 hostd
13446458 13446434 hostd-worker 13446434 13446434 hostd
13450577 13446434 vix-high-p 13446434 13446434 hostd
13467054 13446434 vix-async-pipe 13446434 13446434 hostd
13413808 13446434 hostd-worker 13446434 13446434 hostd
13458865 13446434 vix-poll 13446434 13446434 hostd
13200897 13446434 vix-async-pipe 13446434 13446434 hostd
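
Rather than picking the shared ID out of the listing by eye, you could let awk find it. This is only a sketch under two assumptions: that the second column holds the common parent ID, as in the sample output above, and that awk is available on the host. The function name hostd_parent_id is hypothetical.

```shell
#!/bin/sh
# Sketch: read process listing lines on stdin and print the value that
# appears most often in column 2 among lines whose last field is "hostd".
# Assumes the column layout shown in the sample output above.
hostd_parent_id() {
    awk '$NF == "hostd" { count[$2]++ }
         END {
             best = ""; max = 0
             # pick the ID shared by the largest number of hostd lines
             for (id in count) if (count[id] > max) { max = count[id]; best = id }
             if (best != "") print best
         }'
}

# Example call on the host:
# ps -g | grep hostd | hostd_parent_id
```

Counting occurrences, instead of taking the first line, keeps a stray match such as the grep process itself from being picked, since it shows up only once. Compare the result with the listing before acting on it.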

5. To clear the extra hostd processes, identify the ID from which all the hostd processes are being spawned (in the example above, that is 13446434). Kill that main hostd process and remove its stale PID file by typing:

kill -9 13446434
cd /var/run/vmware
rm vmware-hostd.PID

6. Then restart the management services:

/sbin/services.sh restart
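
To avoid typos when working on a host that is already in trouble, the commands from steps 5 and 6 can be generated for review before you run anything. The sketch below only prints commands and executes nothing. The function name print_recovery_commands is made up for this example, and the absolute path plus the restart argument for services.sh are my assumption about how that script is normally invoked.

```shell
#!/bin/sh
# Sketch: print (not run) the recovery commands for a given hostd parent ID.
# print_recovery_commands is a hypothetical helper name.
print_recovery_commands() {
    parent_id="$1"
    # Reject an empty argument or anything containing a non-digit.
    case "$parent_id" in
        ''|*[!0-9]*)
            echo "usage: print_recovery_commands <numeric id>" >&2
            return 1
            ;;
    esac
    echo "kill -9 $parent_id"
    echo "rm /var/run/vmware/vmware-hostd.PID"
    # Assumption: services.sh lives in /sbin and accepts "restart".
    echo "/sbin/services.sh restart"
}

# Example call, using the ID from the sample output above:
# print_recovery_commands 13446434
```

Printing first and running by hand keeps a kill -9 from ever being issued against an ID you have not looked at.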

If you have any questions or comments, please feel free to reach me by email at aperez@crossrealms.ca

~Alex Perez
aperez@crossrealms.ca