This week, one of our gateway servers stopped transmitting to the management group.
After we cleared all the heartbeat alerts from our console and put the gateway into maintenance mode, we started troubleshooting.
The odd thing was that the event log complained about connectivity issues, while a telnet session over port 5723 to our management server worked perfectly.
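For anyone who wants to script the same check we did by hand with telnet, here is a minimal sketch in Python. The function name `port_is_open` and the timeout value are our own choices for illustration; only port 5723 comes from the situation above. Keep in mind that it tests nothing more than whether a TCP connection can be opened, which is exactly why it can succeed while the event log still complains.

```python
import socket


def port_is_open(host: str, port: int = 5723, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port can be opened in time."""
    try:
        # create_connection resolves the host name and performs the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name resolution failures.
        return False


# Example call (the host name is a placeholder, not one of our servers):
# port_is_open("ms01.example.local")
```

A result of True only tells you the network path and the listener are there. It says nothing about whether the agent and the management server can actually talk to each other above the TCP layer.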
Since this was an important production system, we decided to reinstall the gateway to restore monitoring of the attached client servers ASAP. After the reinstall, the gateway appeared to be okay again.
But soon its state went gray in the Operations Console, and the GW’s event log was flooded with ugly messages.
Luckily, the systems reporting to the gateway stayed healthy, limiting the problem to the GW’s own agent.
After reviewing the steps taken to reinstall the gateway, we noticed that cumulative update 5 had been deployed. Our RMS and management servers are still running cumulative update 4, awaiting a maintenance cycle.
Could this issue be pinned on the fact that there was a patch difference? We reinstalled the gateway with the CU4-patch, and the board went fully green again!
Lesson learned: keep everything on the same patch level within a single topology, or things break… (why does this sound so logical all of a sudden? :))
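The lesson above can be turned into a small sanity check to run before patching or reinstalling a server. This is only a sketch: the server names and the `find_patch_mismatches` helper are hypothetical, and the inventory is typed in by hand. How you collect the real patch levels from your environment is up to you and is not shown here.

```python
def find_patch_mismatches(levels: dict[str, str], reference: str) -> dict[str, str]:
    """Return the servers whose patch level differs from the reference server."""
    expected = levels[reference]
    return {name: cu for name, cu in levels.items() if cu != expected}


# Hypothetical inventory mirroring the situation in this post:
# RMS and MS on CU4, the freshly reinstalled gateway on CU5.
inventory = {"RMS": "CU4", "MS1": "CU4", "GW1": "CU5"}

print(find_patch_mismatches(inventory, reference="RMS"))  # {'GW1': 'CU5'}
```

An empty result means every server in the list matches the reference. Anything else is worth a second look before you blame the network.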