Public bug reported: Hi, the reproducibility of this is hard, but I've seen it twice in a week now so it is not a stray cosmic ray after all.
I've seen nmap hang that was running with a timeout already which made me wonder. $ nmap -Pn 192.168.122.0/24 --host-timeout 10 In PS the output looks normal usually in a wchan with a timeout: $ ps axlf ... 4 0 10155 0 20 0 25372 17320 poll_s S ? 0:07 nmap -Pn 192.168.122.0/24 --host-timeout 10 Full wchan: cat /proc/10155/wchan poll_schedule_timeout.constprop.0 I found that nmap isn't dead, it is looping it seems. Obviously on GDB you always find it at the timeout 0x00007fe5b73a725a in __GI___select (nfds=nfds@entry=5, readfds=readfds@entry=0x7ffec6db1540, writefds=writefds@entry=0x0, exceptfds=exceptfds@entry=0x0, timeout=timeout@entry=0x7ffec6db1670) at ../sysdeps/unix/sysv/linux/select.c:41 41 ../sysdeps/unix/sysv/linux/select.c: No such file or directory. (gdb) bt #0 0x00007fe5b73a725a in __GI___select (nfds=nfds@entry=5, readfds=readfds@entry=0x7ffec6db1540, writefds=writefds@entry=0x0, exceptfds=exceptfds@entry=0x0, timeout=timeout@entry=0x7ffec6db1670) at ../sysdeps/unix/sysv/linux/select.c:41 #1 0x000055900ff4f838 in pcap_select (p=p@entry=0x55901131ec10, timeout=timeout@entry=0x7ffec6db1670) at netutil.cc:1012 #2 0x000055900ff4fa15 in pcap_select (usecs=999993, p=0x55901131ec10) at netutil.cc:1031 #3 read_reply_pcap (pd=0x55901131ec10, to_usec=999993, accept_callback=accept_callback@entry=0x55900ff4e530 <accept_arp(unsigned char const*, pcap_pkthdr const*, int, size_t)>, p=p@entry=0x7ffec6db16f0, head=head@entry=0x7ffec6db1700, rcvdtime=rcvdtime@entry=0x7ffec6db1770, datalink=0x7ffec6db16ec, offset=0x7ffec6db16f8) at netutil.cc:4234 #4 0x000055900ff56014 in read_arp_reply_pcap (pd=<optimized out>, sendermac=sendermac@entry=0x7ffec6db1792 "", senderIP=senderIP@entry=0x7ffec6db176c, to_usec=<optimized out>, rcvdtime=rcvdtime@entry=0x7ffec6db1770, trace_callback=0x55900ff0ef70 <PacketTrace::traceArp(int, unsigned char const*, unsigned int, timeval*)>) at netutil.cc:4324 #5 0x000055900fefce80 in get_arp_result (USI=0x7ffec6db1890, stime=0x7ffec6db1860) at scan_engine_raw.cc:1557 #6 0x000055900fef5e67 in ultra_scan(std::vector<Target*, std::allocator<Target*> >&, scan_lists*, stype, timeout_info*) () at scan_engine.cc:2546 #7 0x000055900ff0e1cc in arpping (hostbatch=<optimized out>, num_hosts=255) at targets.cc:187 #8 0x000055900ff0ed00 in refresh_hostbatch (hs=<optimized out>, exclude_group=0x559010b87780, ports=0x559010142980 <ports>, pingtype=1) at targets.cc:591 #9 0x000055900ff0ee5d in nexthost (hs=hs@entry=0x7ffec6db2010, exclude_group=exclude_group@entry=0x559010b87780, ports=ports@entry=0x559010142980 <ports>, pingtype=<optimized out>) at targets.cc:644 #10 0x000055900fec8ec3 in nmap_main (argc=<optimized out>, argv=0x7ffec6db3058) at nmap.cc:2063 #11 0x000055900fe9e215 in main (argc=5, argv=0x7ffec6db3058) at main.cc:237 But strace has shown that the syscall is bad # strace -rT -p 10155 strace: Process 10155 attached 0.000000 select(5, [4], NULL, NULL, {tv_sec=0, tv_usec=438201}) = 0 (Timeout) <0.438773> 0.439032 ioctl(-1, TIOCGPGRP, 0x7ffec6db1784) = -1 EBADF (Bad file descriptor) <0.000028> 0.000170 getpgrp() = 0 <0.000021> 0.000092 select(5, [4], NULL, NULL, {tv_sec=0, tv_usec=999992}) = 0 (Timeout) <1.001123> 1.001253 ioctl(-1, TIOCGPGRP, 0x7ffec6db1784) = -1 EBADF (Bad file descriptor) <0.000019> 0.000075 getpgrp() = 0 <0.000018> I expected nmap to give up if the select call went bad - and indeed the code would: 1008 FD_SET(fd, &rfds); 1009 1010 do { 1011 errno = 0; 1012 ret = select(fd + 1, &rfds, NULL, NULL, timeout); 1013 if (ret == -1) { 1014 if (errno == EINTR) 1015 netutil_error("%s: %s", __func__, strerror(errno)); 1016 else 1017 netutil_fatal("Your system does not support select()ing on pcap devices (%s). PLEASE REPORT THIS ALONG WITH DETAILED SYSTEM INFORMATION TO THE nmap-dev MAILING LIST!", strerror(errno)); But glibc returns a good RC on this ioctl error: 1012 ret = select(fd + 1, &rfds, NULL, NULL, timeout); (gdb) n 1013 if (ret == -1) { (gdb) p ret $1 = 1 There isn't much more code in glbic shown in GDB Breakpoint 1, pcap_select (p=p@entry=0x55901131ec10, timeout=timeout@entry=0x7ffec6db1670) at netutil.cc:976 976 in netutil.cc (gdb) c Continuing. Breakpoint 2, __GI___select (nfds=nfds@entry=5, readfds=readfds@entry=0x7ffec6db1540, writefds=writefds@entry=0x0, exceptfds=exceptfds@entry=0x0, timeout=timeout@entry=0x7ffec6db1670) at ../sysdeps/unix/sysv/linux/select.c:39 39 { (gdb) n 41 return SYSCALL_CANCEL (select, nfds, readfds, writefds, exceptfds, (gdb) s Is it a bug in glibc, would it need to return -1 in that case? Note: it also is a bug in nmap to loop on it but they could say "but the answer fromt he socket call is wrong", so lets sort that out first. ** Affects: glibc (Ubuntu) Importance: Undecided Status: New ** Affects: nmap (Ubuntu) Importance: Undecided Status: Incomplete ** Also affects: glibc (Ubuntu) Importance: Undecided Status: New ** Changed in: nmap (Ubuntu) Status: New => Incomplete -- You received this bug notification because you are a member of Ubuntu Bugs, which is subscribed to Ubuntu. https://bugs.launchpad.net/bugs/1861389 Title: nmap hang due to BADF ioctl inside select call is returning good rc To manage notifications about this bug go to: https://bugs.launchpad.net/ubuntu/+source/glibc/+bug/1861389/+subscriptions -- ubuntu-bugs mailing list ubuntu-bugs@lists.ubuntu.com https://lists.ubuntu.com/mailman/listinfo/ubuntu-bugs