Ok, this is really bugging me. The newer version of glibc reduces the problem, but it still exists. Something appears to block network activity during DNS lookups. Perhaps this only happens when the lookup runs into problems, or perhaps normal lookups finish so quickly that the stall isn't noticeable. I don't know whether this is a libc issue, some other low-level library, or a kernel issue, but I can now reproduce it in a controlled way.
Both machines in this test sit behind the router at 192.168.234.254 (WRT). DNS is provided by the ISP at 68.105.28.11, 68.105.29.11, and 68.105.28.12 (cdns*.cox.net). One machine is an old Debian system, which I'm using as a control and to access the router. Running the same commands, its ping to 192.168.234.254 never drops at any point during these tests.

The Ubuntu system has the following network config:

# ifconfig eth0
eth0      Link encap:Ethernet  HWaddr 00:01:6c:ea:99:0b
          inet addr:192.168.234.12  Bcast:192.168.234.255  Mask:255.255.255.0
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:535502 errors:0 dropped:0 overruns:0 frame:0
          TX packets:400989 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:100
          RX bytes:532182373 (532.1 MB)  TX bytes:72156008 (72.1 MB)

If I release the ISP-assigned IP on the router (switch), the Debian system continues to ping. However, I see the following on the Ubuntu system:

# window 1:
$ date; ping -n -c 20 192.168.234.254; date
Tue Sep 1 11:24:05 EDT 2009
PING 192.168.234.254 (192.168.234.254) 56(84) bytes of data.
64 bytes from 192.168.234.254: icmp_seq=1 ttl=63 time=2.68 ms
64 bytes from 192.168.234.254: icmp_seq=2 ttl=63 time=2.38 ms

--- 192.168.234.254 ping statistics ---
20 packets transmitted, 2 received, 90% packet loss, time 19002ms
rtt min/avg/max/mdev = 2.384/2.536/2.688/0.152 ms
Tue Sep 1 11:24:25 EDT 2009

# window 2:
$ date; host www.google.com; date
Tue Sep 1 11:24:04 EDT 2009
;; connection timed out; no servers could be reached
Tue Sep 1 11:24:18 EDT 2009

Essentially, pings to my internal network succeeded until a few seconds into the DNS lookup; after that, no further replies came back.
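The two-window test above can also be driven from a single script, so that both commands are timestamped by the same clock. This is a minimal sketch, assuming Python 3.7+ and that `ping` and `host` are on the PATH; `run_timed` and `run_concurrently` are hypothetical helper names, not part of any existing tool. In the report the lookup started about a second before the ping, so the example starts `host` first with a one-second stagger.

```python
import subprocess
import threading
import time
from datetime import datetime


def run_timed(cmd, results, index):
    """Run cmd to completion and store (start, end, combined output)."""
    start = datetime.now()
    proc = subprocess.run(
        cmd,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,  # merge stderr so no message is lost
        text=True,
    )
    end = datetime.now()
    results[index] = (start, end, proc.stdout)


def run_concurrently(cmds, stagger=0.0):
    """Start each command in its own thread, `stagger` seconds apart,
    wait for all of them, and return their results in order."""
    results = [None] * len(cmds)
    threads = []
    for i, cmd in enumerate(cmds):
        t = threading.Thread(target=run_timed, args=(cmd, results, i))
        t.start()
        threads.append(t)
        if i < len(cmds) - 1:
            time.sleep(stagger)
    for t in threads:
        t.join()
    return results


if __name__ == "__main__":
    # Same commands as the report: the lookup first, then a ping to the
    # LAN router by numeric address (-n, so ping itself does no DNS).
    cmds = [
        ["host", "www.google.com"],
        ["ping", "-n", "-c", "20", "192.168.234.254"],
    ]
    for cmd, (start, end, out) in zip(cmds, run_concurrently(cmds, stagger=1.0)):
        print(f"== {' '.join(cmd)}: {start:%H:%M:%S} -> {end:%H:%M:%S}")
        print(out)
```

Running each command in its own thread keeps the sketch dependency-free; the start/end timestamps play the role of the `date` calls in the manual test.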
From the Debian system, I then renewed the router's IP from the ISP (in this state Ubuntu can't even reach the router by IP, as the ping shows). Once the router had an address from the ISP, I ran the same commands again:

# window 1:
$ date; ping -n -c 20 192.168.234.254; date
Tue Sep 1 11:25:11 EDT 2009
PING 192.168.234.254 (192.168.234.254) 56(84) bytes of data.
64 bytes from 192.168.234.254: icmp_seq=6 ttl=63 time=2.95 ms
64 bytes from 192.168.234.254: icmp_seq=7 ttl=63 time=2.13 ms
64 bytes from 192.168.234.254: icmp_seq=8 ttl=63 time=2.45 ms
64 bytes from 192.168.234.254: icmp_seq=9 ttl=63 time=2.18 ms
64 bytes from 192.168.234.254: icmp_seq=10 ttl=63 time=2.13 ms
64 bytes from 192.168.234.254: icmp_seq=11 ttl=63 time=2.15 ms
64 bytes from 192.168.234.254: icmp_seq=12 ttl=63 time=2.28 ms
64 bytes from 192.168.234.254: icmp_seq=13 ttl=63 time=2.19 ms
64 bytes from 192.168.234.254: icmp_seq=14 ttl=63 time=1.80 ms
64 bytes from 192.168.234.254: icmp_seq=15 ttl=63 time=2.08 ms
64 bytes from 192.168.234.254: icmp_seq=16 ttl=63 time=2.17 ms
64 bytes from 192.168.234.254: icmp_seq=17 ttl=63 time=2.17 ms
64 bytes from 192.168.234.254: icmp_seq=18 ttl=63 time=2.35 ms
64 bytes from 192.168.234.254: icmp_seq=19 ttl=63 time=2.08 ms
64 bytes from 192.168.234.254: icmp_seq=20 ttl=63 time=2.18 ms

--- 192.168.234.254 ping statistics ---
20 packets transmitted, 15 received, 25% packet loss, time 19026ms
rtt min/avg/max/mdev = 1.805/2.224/2.955/0.240 ms
Tue Sep 1 11:25:31 EDT 2009

# window 2:
$ date; host www.google.com; date
Tue Sep 1 11:25:10 EDT 2009
www.google.com is an alias for www.l.google.com.
www.l.google.com has address 209.85.225.99
www.l.google.com has address 209.85.225.103
www.l.google.com has address 209.85.225.104
www.l.google.com has address 209.85.225.147
Tue Sep 1 11:25:17 EDT 2009

As you can see, the pings resume shortly before the DNS resolution completes.
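The timing in the two transcripts can be checked with a little arithmetic: ping sends one probe per second by default, so a sequence number maps to an approximate send time. Below is a minimal Python sketch, assuming iputils-style ping output and treating the `date` timestamps (one-second resolution) as the start times; `replied_seqs`, `missing_seqs`, and `send_time` are hypothetical helper names.

```python
import re
from datetime import datetime, timedelta

# Reply lines look like "64 bytes from ...: icmp_seq=6 ttl=63 time=2.95 ms".
# Some iputils releases print "icmp_req=" instead, so accept both.
REPLY = re.compile(r"icmp_(?:seq|req)=(\d+)")


def replied_seqs(ping_output):
    """Set of sequence numbers that received a reply."""
    return {int(m.group(1)) for m in REPLY.finditer(ping_output)}


def missing_seqs(ping_output, count):
    """Sequence numbers 1..count (ping -c count) that never got a reply."""
    return sorted(set(range(1, count + 1)) - replied_seqs(ping_output))


def send_time(ping_start, seq, interval=1.0):
    """Approximate send time of probe `seq`: the first probe goes out
    right away and the rest follow every `interval` seconds (1 s default)."""
    return ping_start + timedelta(seconds=(seq - 1) * interval)


if __name__ == "__main__":
    # First run from the report (ISP lease released): only two replies.
    run1 = (
        "64 bytes from 192.168.234.254: icmp_seq=1 ttl=63 time=2.68 ms\n"
        "64 bytes from 192.168.234.254: icmp_seq=2 ttl=63 time=2.38 ms\n"
    )
    lost = missing_seqs(run1, 20)
    print(f"lost {len(lost)}/20, first lost seq {lost[0]}")
    # Timestamps printed by `date`; they have one-second resolution.
    dns_start = datetime(2009, 9, 1, 11, 24, 4)
    ping_start = datetime(2009, 9, 1, 11, 24, 5)
    delay = (send_time(ping_start, lost[0]) - dns_start).total_seconds()
    print(f"first unanswered probe sent ~{delay:.0f} s into the lookup")
```

For the first run this prints `lost 18/20, first lost seq 3` and `first unanswered probe sent ~3 s into the lookup`. Applied to the second run with `count=20`, `missing_seqs` returns `[1, 2, 3, 4, 5]` (the reported 25% loss), and `send_time(datetime(2009, 9, 1, 11, 25, 11), 6)` gives 11:25:16, roughly one second before `host` finished at 11:25:17.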
I can't quite make out why or how a DNS lookup would block a ping to a numeric IP address with name resolution disabled (-n), but that appears to be the case. I've also attached a "tcpdump -i eth0 -v" capture of the session to help track down what the network is doing in each scenario. The only syslog entries I see are from tcpdump and from a cron job that makes an IMAP connection to the Debian system:

Sep 1 11:24:03 bmitch-t42 kernel: [82087.299987] device eth0 entered promiscuous mode
Sep 1 11:25:02 bmitch-t42 /USR/SBIN/CRON[31111]: (bmitch) CMD (offlineimap >${HOME}/.offlineimap.log 2>&1)
Sep 1 11:25:39 bmitch-t42 kernel: [82183.164614] device eth0 left promiscuous mode

Let me know how else I can help track down this bug. Thank you.

** Attachment added: "TCP Dump"
   http://launchpadlibrarian.net/31148982/tcpdump.20090901.gz

--
DNS lookups hang network in 2.9-4ubuntu6
https://bugs.launchpad.net/bugs/422016
You received this bug notification because you are a member of Ubuntu Bugs, which is subscribed to Ubuntu.

--
ubuntu-bugs mailing list
ubuntu-bugs@lists.ubuntu.com
https://lists.ubuntu.com/mailman/listinfo/ubuntu-bugs