ICMP Redirects

I've been trying to figure out why hosts on the Internet hit the 5s DNS timeout so frequently, and ended up on a merry adventure. This is the story I pieced together along the way.

The Route Cache

One way to configure a network is to have a bunch of "dumb" hosts that each have just one (or more) default gateway configured. If there are additional gateways on the network, the hosts learn about them on a host-by-host basis: when a host sends a packet via a gateway that knows of a better first hop for that destination, the gateway sends back an ICMP Redirect message naming the better gateway. After you have received an ICMP Redirect message, you update your route table (also known as the route cache) with a temporary entry saying "this host actually goes via this other non-default gateway".

Dead Gateways

But a problem occurs when a gateway stops working and you have to notice that it has. This can be one of the gateways you learnt about via ICMP Redirects, or it might be that you have multiple default gateways and one is unavailable. If a gateway learnt via ICMP Redirect dies, the idea is that you expire the route cache entry, go back to using the default gateway, and hope that it can tell you somewhere else to go (e.g. because a dynamic routing protocol has updated its routes).
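The redirect-then-expire behaviour described so far can be sketched as a toy model. This is only an illustration, not how any real kernel implements its route cache: the `RouteCache` class, its method names, and the documentation-range addresses are all made up for the example.

```python
class RouteCache:
    """Toy model of a host's route cache (hypothetical, for illustration)."""

    def __init__(self, default_gateway):
        self.default_gateway = default_gateway
        # destination host -> gateway learnt from an ICMP Redirect
        self.redirects = {}

    def learn_redirect(self, destination, gateway):
        # An ICMP Redirect said: reach `destination` via `gateway` instead.
        self.redirects[destination] = gateway

    def next_hop(self, destination):
        # Use the redirect entry if there is one, else the default gateway.
        return self.redirects.get(destination, self.default_gateway)

    def gateway_died(self, gateway):
        # Expire every entry that points at the dead gateway, so those
        # destinations fall back to the default gateway again.
        expired = [d for d, g in self.redirects.items() if g == gateway]
        for destination in expired:
            del self.redirects[destination]
        return expired


cache = RouteCache(default_gateway="192.0.2.1")
cache.learn_redirect("198.51.100.7", "192.0.2.2")
print(cache.next_hop("198.51.100.7"))  # 192.0.2.2
cache.gateway_died("192.0.2.2")
print(cache.next_hop("198.51.100.7"))  # 192.0.2.1
```

The interesting part is what triggers `gateway_died`, which is the detection problem.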

This process is called "dead gateway detection". There are various ways of detecting dead gateways, but RFC 1122 section 3.2.2.1 says that you MUST NOT use ICMP destination host/net unreachable errors as proof of a dead gateway, because these errors may be the result of a routing transient.
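As a rough illustration of that rule, here is a small sketch that sorts ICMP Destination Unreachable codes into "only a hint" and everything else. The code numbers and the hint-only set (0, 1 and 5) are taken from RFC 792 and RFC 1122 section 3.2.2.1; the function and table names are my own and only codes 0 to 5 are covered.

```python
# ICMP Destination Unreachable (type 3) codes 0-5, as listed in RFC 792.
UNREACHABLE_CODES = {
    0: "net unreachable",
    1: "host unreachable",
    2: "protocol unreachable",
    3: "port unreachable",
    4: "fragmentation needed and DF set",
    5: "source route failed",
}

# RFC 1122 3.2.2.1: codes 0, 1 and 5 may result from a routing transient,
# so they are only a hint that the destination is unreachable.
HINT_ONLY_CODES = frozenset({0, 1, 5})


def is_only_a_hint(code):
    """Return True if this unreachable code must not be treated as proof."""
    if code not in UNREACHABLE_CODES:
        raise ValueError(f"code {code} is not covered by this sketch")
    return code in HINT_ONLY_CODES


for code, name in UNREACHABLE_CODES.items():
    kind = "hint only" if is_only_a_hint(code) else "not in the hint-only set"
    print(f"{code}: {name} ({kind})")
```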

So, uh, wasn't this story about DNS?

Now, the Linux kernel reports non-transient errors to the application via POLLERR on connect()ed UDP sockets. Many DNS implementations rely on this feedback to detect problems and to try an alternate approach (for example, trying a different DNS server). But because RFC 1122 treats ICMP destination host/net unreachable errors as transient, the application never gets notified about them, and thus must wait out the full 5s timeout before retrying.
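To make the failover pattern concrete, here is a minimal sketch of a client that tries a list of servers over connect()ed UDP sockets and moves on as soon as poll() reports POLLERR. It is not taken from any real resolver: `query_with_failover` and its parameters are hypothetical, and the payload is just opaque bytes rather than a real DNS query. It assumes a platform where `select.poll` is available, such as Linux.

```python
import select
import socket


def query_with_failover(servers, payload, timeout=5.0):
    """Try each (host, port) in turn; return (server, reply) for the first answer."""
    last_error = None
    for server in servers:
        with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
            try:
                # connect() on UDP sends nothing; it fixes the peer so the
                # kernel can report errors for this peer on this socket.
                sock.connect(server)
                sock.send(payload)
            except OSError as exc:
                last_error = exc.errno
                continue  # could not even send: try the next server now
            poller = select.poll()
            # POLLERR is reported even though only POLLIN is requested.
            poller.register(sock, select.POLLIN)
            events = poller.poll(int(timeout * 1000))
            if not events:
                continue  # nothing heard: the whole timeout was spent
            _, flags = events[0]
            if flags & select.POLLERR:
                # Fast path: the kernel flagged an error on the socket.
                # Reading SO_ERROR fetches and clears the pending error.
                last_error = sock.getsockopt(socket.SOL_SOCKET, socket.SO_ERROR)
                continue
            if flags & select.POLLIN:
                return server, sock.recv(65535)
    raise TimeoutError(f"no server answered (last errno: {last_error})")
```

With this shape, an error the kernel does report (such as port unreachable) costs almost nothing, while a server that fails with host/net unreachable takes the `if not events` branch and eats the whole timeout.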

Ref

* https://sourceware.org/bugzilla/show_bug.cgi?id=24047#add_comment

* https://tools.ietf.org/html/rfc1122#section-3.2.2.1