Details
Technical task
Resolution: Done
P1: Critical
Description
In long-running Windows VMs we can observe that the IP address may change over time. This breaks long-running processes such as provisioning, where we write the log output to a TCP connection. When the IP address changes, the next write to the existing connection fails: the TCP stack reports an error, which surfaces as a write error at the golang level.
The symptom appears to be that at a certain point, at least during provisioning, the network is reconfigured. The DHCP Discover message that is sent out contains a request for the same IP (to preserve it), but the returned DHCP Offer carries a new address.
On the DHCP server side it looks as if the server might probe the requested address with an ICMP echo request, decide that the IP is taken, and therefore choose a new IP.
I think we may sometimes see the same effect in the VMware environment.
I'm contemplating whether we should make the communication between the agent and coin robust against these scenarios. We already retry the RPC calls back "home", and we could also add support for re-establishing the TCP connection used for logging.
What remains is a window between the agentLaunched message from the agent to coin and the call from coin to the agent with the build request. During that window the IP must not change.