Networks, Troubleshooting and a Weird MTU Story
“If you can’t measure it, you can’t improve it.” – Peter Drucker
One of the quotes we love around here comes from Peter Drucker, the management and business guru, on metrics: “If you can’t measure it, you can’t improve it.” You may find the same idea phrased a number of different ways, including “You can’t manage what you don’t measure.” The basic thought is that management should ideally be data driven, and that data is the foundation of any effort to do things better, faster and cheaper.
While this applies to business as a whole, it also applies to networking, and in particular to the fun-filled nexus where networks and applications meet. That junction is a fun one because of the great temptation to shunt problems off onto the other guy. A user complains that an application is somehow broken or underperforming; the network guy asserts that the application is broken, while the application guy insists that his servers are just fine and something bad must be happening on the wire.
Which brings me to a story from “back in the day.” This was so long ago that many shops still ran their mail servers onsite (most had whole data centers onsite), and one of the key roles in many of them was the Exchange Administrator. It was generally not a bad gig as long as nothing broke, but the power and magic of Exchange came at a price in complexity. When things were good, they were very good, and when they were bad, there was a great temptation to trade in that admin account for a paper hat and a spatula down at the local burger place.
In one case, users in a particular office could not get email attachments. They could ping stuff just fine. They could browse the web, hit SMB file shares and do all sorts of other things, up to and including logging into Exchange and sending and receiving email, but anything involving attachments failed.
As you would expect, this was a great mystery. We escalated internally to the Exchange guys, and the conclusion of the lesser wizards was “hmmm, should just work.” We escalated externally to some other Exchange guys, and the conclusion of the greater wizards was “hmmm, should just work.” Then we escalated to the network guys, who said, “This is going to take a little while; this is a new segment on a network from the acquisition and we need to place some taps.” In the fullness of time the wizards from the network team came back and said, “We think we found the problem. Can you try it now?”
We got a user on the line and, like magic, they could get mail attachments again. “Very cool!” I said. “How’d ya do it?”
It turns out that the router for that branch had been set with a lower MTU than the rest of the network. Nobody was quite sure why or how, but it was. That MTU was evidently large enough that ordinary Exchange mail traffic got through, but too small for attachments, presumably because attachments generated full-size packets that the smaller link could not carry. Once the MTU was set in line with the rest of the network, the application, like magic, worked again.
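To make that mechanism a little more concrete, here is a minimal Python sketch of one plausible explanation. The story doesn’t record the actual numbers, so the 1500-byte network MTU and 1400-byte branch MTU below are assumed values, and the helper names are hypothetical. The sketch assumes IPv4 and TCP headers without options (20 bytes each), and assumes the sender sets the Don’t Fragment bit while the ICMP “fragmentation needed” replies that would normally drive path MTU discovery never make it back, so oversized packets are silently dropped (often called a PMTUD black hole). Under those assumptions a short message fits in one small packet, while an attachment is split into full 1500-byte packets that the branch link cannot carry.

```python
# Assumed IPv4 and TCP header sizes (no options).
IP_HEADER = 20
TCP_HEADER = 20


def mss_for_mtu(mtu: int) -> int:
    """Largest TCP payload per segment that fits in one IPv4 packet of size `mtu`."""
    return mtu - IP_HEADER - TCP_HEADER


def packet_sizes(message_bytes: int, mss: int) -> list[int]:
    """IP packet sizes produced when a message is split into MSS-sized segments."""
    sizes = []
    remaining = message_bytes
    while remaining > 0:
        payload = min(remaining, mss)
        sizes.append(payload + IP_HEADER + TCP_HEADER)
        remaining -= payload
    return sizes


def delivered(message_bytes: int, sender_mtu: int, path_mtu: int) -> bool:
    """True if every packet fits through the narrowest link on the path.

    Assumes DF is set and ICMP "fragmentation needed" never reaches the
    sender, so any packet larger than path_mtu is silently dropped.
    """
    mss = mss_for_mtu(sender_mtu)
    return all(size <= path_mtu for size in packet_sizes(message_bytes, mss))


def probe_path_mtu(fits, low: int = 576, high: int = 1500) -> int:
    """Binary search for the largest packet size for which fits(size) is True.

    Assumes fits(low) is True (576 bytes is the IPv4 minimum every host must accept).
    """
    while low < high:
        mid = (low + high + 1) // 2
        if fits(mid):
            low = mid
        else:
            high = mid - 1
    return low


if __name__ == "__main__":
    NETWORK_MTU = 1500  # assumed MTU on the rest of the network
    BRANCH_MTU = 1400   # assumed (hypothetical) MTU on the misconfigured router

    print(delivered(800, NETWORK_MTU, BRANCH_MTU))      # True: small message, one 840-byte packet
    print(delivered(250_000, NETWORK_MTU, BRANCH_MTU))  # False: full-size 1500-byte packets dropped
    print(probe_path_mtu(lambda size: size <= BRANCH_MTU))  # 1400
```

The binary search mirrors what you can do by hand on Linux with iputils `ping -M do -s <size>`, which sends probes with the Don’t Fragment bit set; the packet size on the wire is the payload plus 28 bytes of IPv4 and ICMP headers.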
It was such an odd case that nobody was likely to guess the cause from the symptoms alone, which is why the data provided by network visibility was key to resolving it. Without fairly low-level visibility into the traffic in and out of that office, the MTU probably would never have been questioned.
Which brings me to a white paper we recently published, “Troubleshooting Network Quality of Service and Performance – How Network Visibility Can Help.” Remember Drucker and measuring things so you can improve them? The white paper walks through a number of scenarios where better network visibility, from taps, packet brokers and proactive monitoring, can improve troubleshooting and management of enterprise networks.
Thanks for reading.