I have an active ticket open with VMware Support, but there has been little to no movement on it so far.
The Auto Deploy service on our VCSA 5.5 Update 3 instance does not seem to be working at all. The Dell blades we use (M620 and M630) PXE boot successfully and load the undionly.kpxe.vmw-hardwired iPXE image from our TFTP server. But when it comes time for the host to register itself with VCSA/vCenter, it just sits there retrying over and over.
Looking at the rbd-cgi and ssl_access logs on VCSA, I see messages like this:
2015-12-14 19:19:31,504 [16959]INFO:director:('127.0.0.1', 39967) - - "GET /EflEZrwVSgt7cbdI/vmw/rbd/host/9b8b6f89331f6e278d9a712d8ef5d17b/config?bootmac=<OBSFUSCATED> HTTP/1.1" 503 -
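To see whether every request is getting a 503 or only some of them, the log can be tallied by method and status code. This is only a minimal sketch: the regular expression is inferred from the single sample line above, so other rbd-cgi lines may not match it, and the helper names (parse_line, count_statuses) are made up for illustration. The log file path is taken from the command line rather than assumed.

```python
import re
import sys
from collections import Counter

# Pattern inferred from the one sample line above (an assumption, not a
# documented format). The bracketed number is captured as "tag" because
# its meaning is not confirmed by the log itself.
LINE_RE = re.compile(
    r'^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}) '
    r'\[(?P<tag>\d+)\](?P<level>[A-Z]+):(?P<logger>[^:]+):'
    r'.*?"(?P<method>[A-Z]+) (?P<path>\S+) HTTP/[\d.]+" (?P<status>\d{3})'
)


def parse_line(line):
    """Return a dict of fields for a matching line, or None otherwise."""
    match = LINE_RE.search(line)
    return match.groupdict() if match else None


def count_statuses(lines):
    """Count (method, status) pairs across all lines that match the pattern."""
    counts = Counter()
    for line in lines:
        parsed = parse_line(line)
        if parsed:
            counts[(parsed["method"], parsed["status"])] += 1
    return counts


if __name__ == "__main__":
    # Usage: python tally.py <path-to-rbd-cgi-log>
    with open(sys.argv[1], encoding="utf-8", errors="replace") as handle:
        for (method, status), n in sorted(count_statuses(handle).items()):
            print(method, status, n)
```

If the only status that ever shows up is 503, that points at the Auto Deploy service itself rather than at individual hosts or deploy rules.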
The blade console shows that it's trying to run the tramp file in iPXE, which contains the following (which I believe is the default):
#!gpxe
set retry-delay 20
post /vmw/rbd/host-register?bootmac=${mac}
I've already tried re-registering the Auto Deploy service and rebuilding the deploy rules multiple times, but the behavior persists. Does anyone have any ideas on what I can try next? I was thinking of just rebuilding vCenter from scratch and re-attaching the running ESXi hosts.
One oddity is that while the blades have obviously obtained a DHCP address, I get a lot of ping timeouts when I ping a blade's address while it is trying to register with VCSA. I'm not sure whether this is a limitation in iPXE. I've even tried interrupting the boot sequence and dropping into the iPXE console (via Ctrl-B), but I didn't find anything out of the ordinary.