Networking can be very challenging. A simple mistake can cause a widespread network outage and millions of dollars in losses. Let me share one such embarrassing story…
Back in 2003, I worked as an NCE for a semiconductor company in Singapore. One of the backup switches in the data center, which served the application servers for the semiconductor fab, had failed. While we were waiting for the replacement, we had to honor an urgent request to create an L2 segment on the primary switch.
It was a simple change, so I went ahead and created the segment. I checked with the ticket requester that the segment was working as expected.
Four hours later, around midnight, I received a call from my supervisor. Apparently, network connectivity to an entire data center was down, and they suspected that it was due to the changes I had made earlier.
Long story short, it was a simple error on my part. I had a habit of typing shorthand for network commands. To check the status of an interface, the appropriate command was “show int”, and out of habit I had entered “sh int”. Moreover, this command should be run in privileged mode, but I had mistakenly entered it in interface configuration mode. In that context, the command is interpreted as “shutdown” for the interface, so the main interface was shut down accidentally. Unfortunately, this interface was connected to a large data center, and the loss of connectivity resulted in route convergence issues, routing loops, and application failures. It took over 6 hours to isolate the root cause, as the investigation moved from the application teams to the networking teams.
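To make the failure mode concrete, here is a minimal Python sketch of how the same shorthand can resolve to different commands depending on the CLI mode, and how a simple pre-check could ask for confirmation before a disruptive one runs. The command tables, the `expand` and `check` helpers, and the list of disruptive commands are all hypothetical and heavily trimmed for illustration; this is not how any vendor's parser or any particular automation product works.

```python
# Hypothetical, heavily trimmed command tables; a real device has far more
# keywords per mode, and its actual parser is not reproduced here.
COMMANDS_BY_MODE = {
    "privileged-exec": ["show", "configure", "ping"],
    "interface-config": ["shutdown", "description", "switchport", "speed"],
}

# Commands this sketch treats as disruptive and wants a human to confirm.
DISRUPTIVE = {"shutdown"}


def expand(mode, typed):
    """Return the keywords in `mode` that the first word of `typed` abbreviates."""
    words = typed.split()
    if not words:
        return []
    first = words[0].lower()
    return [cmd for cmd in COMMANDS_BY_MODE[mode] if cmd.startswith(first)]


def check(mode, typed):
    """Classify a typed line as 'ok', 'confirm', 'ambiguous', or 'unknown'."""
    matches = expand(mode, typed)
    if not matches:
        return "unknown"
    if len(matches) > 1:
        return "ambiguous"
    return "confirm" if matches[0] in DISRUPTIVE else "ok"


if __name__ == "__main__":
    # The same keystrokes, checked against each mode's command table.
    for mode in COMMANDS_BY_MODE:
        print(mode, "->", expand(mode, "sh int"), check(mode, "sh int"))
```

In this toy model, “sh” resolves to “show” in privileged mode and to “shutdown” in interface configuration mode, so the check would stop and ask for confirmation in the second case instead of executing silently.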
Luckily for me, compared to other projects, this network was well documented, and hence the problem was resolved quickly.
However, today’s networks are extremely complex, with limited documentation, multiple touch points, and multiple vendors. Network operations teams carry a heavy burden to ensure 99.99% availability.
I am grateful for network automation and orchestration software such as Anuta NCX, which helps avoid such human errors and removes the need to configure multiple devices manually for day-to-day operations.
Do you have a similar story about networking nightmares? Share with us to win some exciting prizes.
We are giving away multiple Samsung Android tablets just in time for the holidays. Just tell us your “Worst Networking Nightmare Story” and email it to kgarrison at anutanetworks.com before 5:00 pm on December 19th. You may also post the story in the comments section below. Make sure to include your name, company name, and email address.
The lucky winners will be notified by email and will be announced on LinkedIn.
Don’t forget to tell all your friends as they also have a chance to win!
– Reddy Bhupathi, Dec 4th, 2014.