A while back I ran into a peculiar situation where I had to set up Cisco MSE High Availability (HA). I’ll explain the peculiar part, but first, here are the requirements for MSE HA:
- MSE Virtual Appliance supports only 1:1 HA.
- One secondary MSE can support up to two primary MSEs.
- HA supports Network Connected and Direct Connected.
- Only MSE Layer-2 redundancy is supported. Both the health monitor IP and virtual IP must be on the same subnet and accessible from the Network Control System (NCS). Layer-3 redundancy is not supported.
- Health monitor IP and virtual IP must be different.
- You can use either manual or automatic failover.
- You can use either manual or automatic failback.
- Both the primary and secondary MSE should be on the same software version.
- Every active primary MSE is backed up by another inactive instance. The secondary MSE becomes active only after the failover procedure is initiated.
- The failover procedure can be manual or automatic.
- There is one software and database instance for each registered primary MSE.
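To make the addressing rules above concrete, here is a minimal Python sketch of a pre-check you could run before configuring HA. It assumes a single virtual IP owned by the primary and one health monitor IP per MSE, all given as CIDR strings. The function name, parameters, version strings, and example addresses are hypothetical and not part of any Cisco tool or API.

```python
import ipaddress


def check_mse_ha(primary_hm, secondary_hm, virtual_ip,
                 primary_version, secondary_version):
    """Hypothetical pre-check against the MSE HA rules listed above.

    Addresses are CIDR strings such as "10.10.20.11/24". Returns a list
    of problems; an empty list means no rule was violated.
    """
    addrs = {
        "primary health monitor IP": ipaddress.ip_interface(primary_hm),
        "secondary health monitor IP": ipaddress.ip_interface(secondary_hm),
        "virtual IP": ipaddress.ip_interface(virtual_ip),
    }
    problems = []

    # Layer-2 redundancy only: every address must sit in the same subnet.
    subnets = {iface.network for iface in addrs.values()}
    if len(subnets) > 1:
        problems.append("addresses are not all in one subnet: "
                        + ", ".join(sorted(str(s) for s in subnets)))

    # The virtual IP must differ from the health monitor IPs.
    vip = addrs["virtual IP"].ip
    for name in ("primary health monitor IP", "secondary health monitor IP"):
        if addrs[name].ip == vip:
            problems.append(f"{name} and virtual IP must be different")

    # Primary and secondary should run the same software version.
    if primary_version != secondary_version:
        problems.append(
            f"software versions differ: {primary_version} vs {secondary_version}")

    return problems


if __name__ == "__main__":
    # Secondary placed in a separate DC2 subnet without Layer-2 extension.
    print(check_mse_ha("10.10.20.11/24", "10.20.20.11/24", "10.10.20.10/24",
                       "8.0", "8.0"))
    # → ['addresses are not all in one subnet: 10.10.20.0/24, 10.20.20.0/24']
```

The subnet check is what fails when the secondary simply lives in a different data centre's routed subnet, which is the constraint that made Layer-2 extension necessary here.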
My problem arose from the fourth point above: only MSE Layer-2 redundancy is supported, not Layer-3.
I had two Cisco MSE Virtual Appliances (primary and secondary) but no virtual server infrastructure at the local site, so I had to install the primary in DC1 and the secondary in DC2.
The only way to meet the Layer-2 requirement across the two data centres was Overlay Transport Virtualization (OTV), which extends Layer-2 connectivity across any transport in an operationally optimized way. With the help of our data centre engineers we got OTV up and running, but the heartbeat between the primary and secondary still did not come up.
Cisco TAC was my next call, but after two weeks they were still looking at logs. Then one day, to my surprise, the heartbeat was up and the MSE HA solution was working.
I then matched the time the heartbeat came up against the changes made on the network, and the resolution was…
the Maximum Transmission Unit (MTU). The link between the two DCs was initially configured with the standard 1500-byte MTU; changing it to jumbo frames solved the issue.
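One plausible explanation for why jumbo frames fixed it (TAC never confirmed a root cause) is OTV's encapsulation overhead. The sketch below assumes the commonly cited 42 bytes of OTV overhead and that encapsulated OTV traffic is not fragmented; check the OTV documentation for your platform before relying on either figure. Under those assumptions, a full-size 1500-byte packet grows to 1542 bytes and cannot cross a 1500-byte transport. The helper names and the 9000-byte jumbo value are illustrative assumptions, not values from this deployment.

```python
# Assumed OTV encapsulation overhead in bytes; verify for your platform.
OTV_OVERHEAD_BYTES = 42


def min_transport_mtu(payload_mtu, overhead=OTV_OVERHEAD_BYTES):
    """Smallest transport MTU that carries a full-size packet after OTV
    encapsulation (simplified: overhead is added to the IP MTU)."""
    return payload_mtu + overhead


def fits_over_otv(payload_mtu, transport_mtu, overhead=OTV_OVERHEAD_BYTES):
    """True if a full-size packet still fits once encapsulated.

    Assumes OTV traffic is not fragmented, so anything larger than the
    transport MTU is dropped rather than split.
    """
    return min_transport_mtu(payload_mtu, overhead) <= transport_mtu


if __name__ == "__main__":
    print(fits_over_otv(1500, 1500))   # False: standard MTU between DCs
    print(min_transport_mtu(1500))     # 1542
    print(fits_over_otv(1500, 9000))   # True: assumed 9000-byte jumbo MTU
```

Any transport MTU of at least 1542 bytes would satisfy this simplified model; jumbo frames simply clear it with a wide margin.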
Cisco TAC seemed surprised, as MTU had never been a consideration for them, but that is how things work sometimes.
For more on Cisco MSE HA, check out this link, which I also used as a reference: