In article <[email protected]>, Tory M Blue <[email protected]> wrote:
> I've running into some issues and trying to diagnose, so maybe folks
> on here can help me with steps to troubleshoot.
>
> Bind 9.6.1-P1
> Fedora Core
>
> What I am experiencing and led to my investigation is a random 5
> second delay in name resolution. Now I know that nslookup/dig resolver
> has a default 5 second retry, if it doesn't get an answer it will try
> the second server listed in the resolv.conf.. So I sort of could
> explain the 5 second delay, didn't understand why it was happening,
> but felt I was getting closer.
>
> So then I started running some network traces (which takes some time,
> as the 5 second delay is very random}, however being patient and
> running enough "time dig host +trace" revealed a few 5 second delays,
> for the most part they are all low ms (as I expect), but a couple were
> 5 second.
>
> The delay occurs in the upper part of dig. (although interesting
> enough not one section shows more than say 175ms, ever).
>
> [tb...@w05 ~]$ time dig apps.domain.com +trace +stats
>
> ; <<>> DiG 9.3.2 <<>> apps.domain.com +trace +stats
> ;; global options: printcmd
> . 317993 IN NS C.ROOT-SERVERS.NET.
> . 317993 IN NS J.ROOT-SERVERS.NET.
> . 317993 IN NS B.ROOT-SERVERS.NET.
> . 317993 IN NS L.ROOT-SERVERS.NET.
> . 317993 IN NS D.ROOT-SERVERS.NET.
> . 317993 IN NS I.ROOT-SERVERS.NET.
> . 317993 IN NS F.ROOT-SERVERS.NET.
> . 317993 IN NS G.ROOT-SERVERS.NET.
> . 317993 IN NS M.ROOT-SERVERS.NET.
> . 317993 IN NS K.ROOT-SERVERS.NET.
> . 317993 IN NS A.ROOT-SERVERS.NET.
> . 317993 IN NS H.ROOT-SERVERS.NET.
> . 317993 IN NS E.ROOT-SERVERS.NET.
>
> <<<<PAUSES HERE>>>>>

I think it's trying to do a reverse lookup of 216.249.24.15 to display
the server name in the message below. This isn't part of the actual
resolution of apps.domain.com, just part of +stats. So it may not be
related to your original problem.

> ;; Query time: 1 msec
> ;; SERVER: 0.0.0.15#53(216.249.24.15)
> ;; WHEN: Sat Feb 27 21:25:21 2010
> ;; MSG SIZE rcvd: 500
>
> net. 172800 IN NS H.GTLD-SERVERS.net.
> net. 172800 IN NS M.GTLD-SERVERS.net.
> net. 172800 IN NS I.GTLD-SERVERS.net.
> net. 172800 IN NS F.GTLD-SERVERS.net.
> net. 172800 IN NS K.GTLD-SERVERS.net.
> net. 172800 IN NS L.GTLD-SERVERS.net.
> net. 172800 IN NS E.GTLD-SERVERS.net.
> net. 172800 IN NS J.GTLD-SERVERS.net.
> net. 172800 IN NS D.GTLD-SERVERS.net.
> net. 172800 IN NS G.GTLD-SERVERS.net.
> net. 172800 IN NS B.GTLD-SERVERS.net.
> net. 172800 IN NS A.GTLD-SERVERS.net.
> net. 172800 IN NS C.GTLD-SERVERS.net.
> ;; Query time: 14 msec
> ;; SERVER: 192.33.4.12#53(C.ROOT-SERVERS.NET)
> ;; WHEN: Sat Feb 27 21:25:21 2010
> ;; MSG SIZE rcvd: 505
>
> domain.com. 172800 IN NS ns1.domain.com.
> domain.com. 172800 IN NS ns2.domain.com.
> ;; Query time: 54 msec
> ;; SERVER: 192.55.83.30#53(M.GTLD-SERVERS.net)
> ;; WHEN: Sat Feb 27 21:25:26 2010
> ;; MSG SIZE rcvd: 104
>
> apps.domain.com. 300 IN A 216.249.24.50
> domain.com. 86400 IN NS ns2.domain.com.
> domain.com. 86400 IN NS ns1.domain.com.
> ;; Query time: 0 msec
> ;; SERVER: 0.0.0.15#53(ns1.domain.com)
> ;; WHEN: Sat Feb 27 21:25:26 2010
> ;; MSG SIZE rcvd: 120
>
>
> real 0m5.090s
> user 0m0.004s
> sys 0m0.004s
>
> So since I finally caught one of these in the wild, I could look at
> the network trace. I was caught off guard when I saw "No such Name"
> "Flags: 0x8483 (Standard query response, No such name)"

It would help if you told us WHICH query elicited this response.

>
> What? I can query my 4 servers (behind a Load balancer or through the
> LB) and the resolve fine, all are running, all have current zone files
> (they are slaves), so I don't understand the "no such name", I have
> no idea why this server is giving this response. And since it's so
> infrequent it makes no sense at all. Servers are not busy, very low
> load, gig network, no saturation, no retransmissions, everything seems
> healthy.
>
> So now I'm wondering if that's the 5 second delay, it sends out a
> request, one server sends back, no such name, so it queries my other
> set of dns servers and get's an immediate response. However all 4
> servers seem fine.
>
> So I've sniffed the traffic, I've looked at what logs I have, is there
> other logs I can enable to catch, watch for this, Is there a possible
> configuration that is out dated wrong?
>
> New to the bind list so not clear what information will allow me to
> help you, help me.
>
> I even thought that maybe I had a bad hint file named.cache file, but
> it appears to be current (well last major update seems to have been
> Dec 08).

Even if you did, one of the first things BIND does when it starts up is
query a root server to get the current root server list, and this is
used instead of the hints.

--
Barry Margolin, [email protected]
Arlington, MA
*** PLEASE don't copy me on replies, I'll read them in the group ***
_______________________________________________
bind-users mailing list
[email protected]
https://lists.isc.org/mailman/listinfo/bind-users
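
As an aside on the "Flags: 0x8483" value quoted from the network trace: the flags word of a DNS header can be unpacked by hand, which may help narrow down what kind of server sent the "No such name" response. The following is only a minimal sketch in Python. The function name decode_dns_flags and the RCODES table are made up for illustration and are not part of BIND or dig; the bit layout assumed is the standard header format from RFC 1035, plus the AD and CD bits added for DNSSEC.

```python
# Sketch: unpack the 16-bit flags word of a DNS message header.
# Bit layout assumed: QR(1) OPCODE(4) AA TC RD RA Z AD CD RCODE(4).
# decode_dns_flags and RCODES are illustrative names, not a BIND API.

RCODES = {
    0: "NOERROR",
    1: "FORMERR",
    2: "SERVFAIL",
    3: "NXDOMAIN",   # what packet sniffers display as "No such name"
    4: "NOTIMP",
    5: "REFUSED",
}


def decode_dns_flags(flags):
    """Return a dict of the individual fields in a DNS header flags word."""
    rcode = flags & 0xF
    return {
        "qr": (flags >> 15) & 1,        # 1 = response, 0 = query
        "opcode": (flags >> 11) & 0xF,  # 0 = standard query
        "aa": (flags >> 10) & 1,        # authoritative answer
        "tc": (flags >> 9) & 1,         # truncated
        "rd": (flags >> 8) & 1,         # recursion desired
        "ra": (flags >> 7) & 1,         # recursion available
        "z": (flags >> 6) & 1,          # reserved
        "ad": (flags >> 5) & 1,         # authentic data (DNSSEC)
        "cd": (flags >> 4) & 1,         # checking disabled (DNSSEC)
        "rcode": RCODES.get(rcode, "RCODE%d" % rcode),
    }


if __name__ == "__main__":
    # The value reported in the trace above.
    for field, value in decode_dns_flags(0x8483).items():
        print(field, value)
```

Under that layout, 0x8483 is a response (QR set) with the AA and RA bits set, RD clear, and RCODE 3 (NXDOMAIN). In other words, whichever server sent it was answering authoritatively that the name does not exist, which is one more reason the query name in that packet matters.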

