Hi,
I get DNS lookup timeouts many times a day when running Go HTTP clients on
Kubernetes, where UDP packets are sometimes dropped due to a race condition
in the Linux kernel. I'd like to propose a small standard library change to
help debug these scenarios and get some feedback on the idea and
implementation here before posting it as a Github issue.
For any http.Client use with a timeout set on the transport, when the
timeout is exceeded and an error is returned it can be challenging to
discover the cause of the timeout from the error message. Is it DNS related
or TCP related or something else? Consider the following simulation of a
DNS lookup that does not return within the timeout.
package main
import (
"context"
"net"
"net/http"
"time"
)
var timeout = 50 * time.Millisecond
var host = "http://example.com/"
// delayedDialer introduces a delay when performing DNS lookups.
func delayedDialer(ctx context.Context, network, address string) (net.Conn,
error) {
time.Sleep(2 * timeout)
d := net.Dialer{}
return d.DialContext(ctx, network, address)
}
func main() {
r := &net.Resolver{
PreferGo: true,
Dial: delayedDialer,
}
tr := &http.Transport{
DialContext: (&net.Dialer{
Timeout: timeout,
Resolver: r,
}).DialContext,
}
client := &http.Client{Transport: tr}
_, err := client.Get(host)
if err != nil {
panic(err)
}
}
In the event of a DNS lookup timeout, the output is:
panic: Get "http://example.com/": dial tcp: i/o timeout
In the event of a TCP connection timeout (comment out the time.Sleep()
call) the output is:
panic: Get "http://example.com/": dial tcp 93.184.216.34:80: i/o timeout
The missing IP address in the DNS lookup timeout output is the only clue
that the root cause is a DNS problem, rather than a TCP problem. If users
do not expect to see an IP address in that error message, they won't know
it is missing, and their debugging and testing effort may be wasted
investigating possible TCP-related causes, instead of investigating DNS
(which normally runs over UDP).
The error message could communicate the root cause to users more
clearly. After a DNS lookup timeout, the error returned by client.Get()
looks like this in Go 1.14.3:
&url.Error{
Op: "Get",
URL: "http://example.com/",
Err: &net.OpError{
Op: "dial",
Net: "tcp",
Source: nil,
Addr: nil,
Err: &poll.TimeoutError{},
},
}
Note: The &poll.TimeoutError{} will become &net.timeoutError{} in future
releases after commit d422f54
<https://github.com/golang/go/commit/d422f54619b5b6e6301eaa3e9f22cfa7b65063c8>
.
The resolver in net/lookup.go
<https://github.com/golang/go/blob/master/src/net/lookup.go#L316> is the
source of the *poll.TimeoutError. Instead it could return a *net.DNSError
error when a lookup times out. Doing so would provide additional error
context for all use cases (not only http.Client) and may save debugging
time. For example client.Get() could return:
&url.Error{
Op: "Get",
URL: "http://example.com/",
Err: &net.OpError{
Op: "dial",
Net: "tcp",
Source: nil,
Addr: nil,
Err: &net.DNSError{
Err: "i/o timeout",
Name: "example.com",
Server: "",
IsTimeout: true,
IsTemporary: false,
IsNotFound: false,
},
},
}
This would preserve the existing "i/o timeout" string and the ability to
call err.Timeout(), while adding "lookup example.com: " to the Error()
output and the capability to explicitly check for a *net.DNSError. The
output from my simulator code would also be more explicit:
panic: Get "http://example.com/": dial tcp: lookup example.com: i/o timeout
The following change to implement this works in my usage and passes the
tests in all.bash.
$ git diff
diff --git a/src/net/lookup.go b/src/net/lookup.go
index 5f7119872a..b1a22dd4e6 100644
--- a/src/net/lookup.go
+++ b/src/net/lookup.go
@@ -314,6 +314,9 @@ func (r *Resolver) lookupIPAddr(ctx context.Context,
network, host string) ([]IP
}()
}
err := mapErr(ctx.Err())
+ if t, ok := err.(timeout); ok && t.Timeout() {
+ err = &DNSError{Err: err.Error(), Name: host,
IsTimeout: true}
+ }
if trace != nil && trace.DNSDone != nil {
trace.DNSDone(nil, false, err)
}
I can submit a Github issue and pull request if anyone thinks that this is
a good idea and implementation.
Thanks.
--
You received this message because you are subscribed to the Google Groups
"golang-nuts" group.
To unsubscribe from this group and stop receiving emails from it, send an email
to [email protected].
To view this discussion on the web visit
https://groups.google.com/d/msgid/golang-nuts/26d8e190-107e-4dd1-9de0-cab02fe5699a%40googlegroups.com.