On 3/1/18 11:51 AM, William Tu wrote:
> On Thu, Mar 1, 2018 at 10:36 AM, David Ahern <dsah...@gmail.com> wrote:
>> On 3/1/18 10:29 AM, William Tu wrote:
>>> Hi,
>>>
>>> We're running commands below on kernel 4.15.0:
>>> 1) ip netns add at_ns0
>>> 2) ip link add p0 type veth peer name ovs-p0
>>> 3) ip link set p0 netns at_ns0
>>> 4) ip link set dev ovs-p0 up
>>
>> # uname -a
>> Linux kenny-jessie3 4.16.0-rc2+ #162 SMP Thu Mar 1 08:48:58 PST 2018
>> x86_64 GNU/Linux
>>
>> # bash -x /tmp/2
>> + ip netns add at_ns0
>> + ip link add p0 type veth peer name ovs-p0
>> + ip link set p0 netns at_ns0
>> + ip link set dev ovs-p0 up
>>
>> Works fine for me on top of tree.
>>
>> What is the output of 'cat /proc/<pid>/stack' when it hangs?
>>
> root@osb:~/iproute2# ps aux | grep ip
> root 3652 0.0 0.0 11532 884 pts/24 S+ 10:43 0:00 ip
> link add p0 type veth peer name ovs-p0
>
> root@osb:~/iproute2# cat /proc/3652/stack
> [<0>] __skb_wait_for_more_packets+0x103/0x160
> [<0>] __skb_recv_datagram+0x69/0xc0
> [<0>] skb_recv_datagram+0x3f/0x60
> [<0>] netlink_recvmsg+0x59/0x420
> [<0>] ___sys_recvmsg+0xee/0x230
> [<0>] __sys_recvmsg+0x4e/0x90
> [<0>] entry_SYSCALL_64_fastpath+0x24/0x87
> [<0>] 0xffffffffffffffff
>
> if I run strace on "ip link add p0 type veth peer name ovs-p0"
> open("/usr/lib/ip/link_veth.so", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No
> such file or directory)
> sendmsg(3, {msg_name(12)={sa_family=AF_NETLINK, pid=0,
> groups=00000000},
> msg_iov(1)=[{"X\0\0\0\20\0\5\6\315J\230Z\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"...,
> 88}], msg_controllen=0, msg_flags=0}, 0) = 88
> recvmsg(3, {msg_name(12)={sa_family=AF_NETLINK, pid=0,
> groups=00000000}, msg_iov(1)=[{NULL, 0}], msg_controllen=0,
> msg_flags=MSG_TRUNC}, MSG_PEEK|MSG_TRUNC) = 36
> recvmsg(3, {msg_name(12)={sa_family=AF_NETLINK, pid=0,
> groups=00000000},
> msg_iov(1)=[{"$\0\0\0\2\0\0\1\315J\230Z1\24\0\0\0\0\0\0X\0\0\0\20\0\5\6\315J\230Z"...,
> 36}], msg_controllen=0, msg_flags=0}, 0) = 36
>
> Thanks a lot
> William
>
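To make the strace bytes above readable, here is a small decoder sketch. It copies the visible header bytes into `struct nlmsghdr` from `<linux/netlink.h>`. The byte arrays are hand-transcribed from the strace escapes (octal `\315` is 0xcd, `\230` is 0x98, and so on). `decode_hdr` and `decode_ack_error` are hypothetical helper names. Only the first 20 bytes of the reply, the header plus the error code, are decoded. The sketch assumes a little-endian host, because netlink fields are in host byte order and the trace came from an x86_64 machine.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <linux/netlink.h>
#include <linux/rtnetlink.h>

/* First 20 bytes of the 36-byte reply read by the second recvmsg() above.
 * Netlink uses host byte order; these came from an x86_64 (little-endian)
 * machine, so this decoding is only valid on a little-endian host. */
static const unsigned char ack_bytes[20] = {
	0x24, 0x00, 0x00, 0x00,	/* "$\0\0\0"     nlmsg_len          */
	0x02, 0x00,		/* "\2\0"        nlmsg_type         */
	0x00, 0x01,		/* "\0\1"        nlmsg_flags        */
	0xcd, 0x4a, 0x98, 0x5a,	/* "\315J\230Z"  nlmsg_seq          */
	0x31, 0x14, 0x00, 0x00,	/* "1\24\0\0"    nlmsg_pid          */
	0x00, 0x00, 0x00, 0x00,	/* "\0\0\0\0"    struct nlmsgerr.error */
};

/* First 16 bytes (the nlmsghdr) of the 88-byte request passed to sendmsg(). */
static const unsigned char req_bytes[16] = {
	0x58, 0x00, 0x00, 0x00,	/* "X\0\0\0"     nlmsg_len   */
	0x10, 0x00,		/* "\20\0"       nlmsg_type  */
	0x05, 0x06,		/* "\5\6"        nlmsg_flags */
	0xcd, 0x4a, 0x98, 0x5a,	/* "\315J\230Z"  nlmsg_seq   */
	0x00, 0x00, 0x00, 0x00,	/* "\0\0\0\0"    nlmsg_pid   */
};

/* Copy a netlink header out of a raw buffer (memcpy avoids alignment issues). */
static struct nlmsghdr decode_hdr(const unsigned char *buf)
{
	struct nlmsghdr h;

	memcpy(&h, buf, sizeof(h));
	return h;
}

/* Return the error code carried by an NLMSG_ERROR message; 0 means ACK. */
static int decode_ack_error(const unsigned char *buf, size_t len)
{
	int err;

	/* The error code sits right after the (aligned) netlink header. */
	assert(len >= NLMSG_HDRLEN + sizeof(err));
	memcpy(&err, buf + NLMSG_HDRLEN, sizeof(err));
	return err;
}
```

Decoded this way, the 36-byte reply is an `NLMSG_ERROR` message with error 0, which is a positive ACK. It carries the same `nlmsg_seq` as the 88-byte request. That request is type 16 (`RTM_NEWLINK`) with flags 0x605 (`NLM_F_REQUEST | NLM_F_ACK | NLM_F_EXCL | NLM_F_CREATE`).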
I still cannot reproduce the hang, but try this and see if it fixes your problem (whitespace damaged on paste):

diff --git a/lib/libnetlink.c b/lib/libnetlink.c
index 7ca47b22581a..9d692afbc740 100644
--- a/lib/libnetlink.c
+++ b/lib/libnetlink.c
@@ -670,8 +672,9 @@ static int __rtnl_talk_iov(struct rtnl_handle *rtnl, struct iovec *iov,
 			free(buf);
 		if (h->nlmsg_seq == seq)
 			return 0;
-		else
+		else if (i < iovlen)
 			goto next;
+		return 0;
 	}
 
 	if (rtnl->proto != NETLINK_SOCK_DIAG &&
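The effect of the patched branch can be modeled in isolation. This is a simplified sketch, not the libnetlink code. `on_ack`, `on_ack_unpatched` and `enum ack_action` are hypothetical names. The sketch assumes `i` counts the replies read so far, including the current one, because the hunk above does not show how `i` is maintained.

```c
#include <assert.h>
#include <stddef.h>

/* What the receive loop does after a successful ACK (error == 0). */
enum ack_action {
	ACK_RETURN,	/* leave the function and return 0 */
	ACK_WAIT_MORE,	/* "goto next": block in recvmsg() for another reply */
};

/*
 * Hypothetical model of the patched branch.
 *   ack_seq: nlmsg_seq of the ACK just received
 *   seq:     sequence number of the last request that was sent
 *   i:       replies read so far, including this one (assumption)
 *   iovlen:  number of requests batched into the single sendmsg()
 */
static enum ack_action on_ack(unsigned int ack_seq, unsigned int seq,
			      size_t i, size_t iovlen)
{
	assert(i >= 1 && i <= iovlen);

	if (ack_seq == seq)
		return ACK_RETURN;	/* ACK for the last request: done */
	else if (i < iovlen)
		return ACK_WAIT_MORE;	/* more replies are still expected */
	return ACK_RETURN;		/* nothing left to wait for: do not block */
}

/* Pre-patch behaviour for comparison: any non-matching ACK waits again. */
static enum ack_action on_ack_unpatched(unsigned int ack_seq, unsigned int seq)
{
	return ack_seq == seq ? ACK_RETURN : ACK_WAIT_MORE;
}
```

Under these assumptions, an ACK whose sequence number does not match the last request sends the loop back to `next` only while fewer than `iovlen` replies have been read. Once every expected reply is in, the patched branch returns 0. The unpatched branch would instead wait in another recvmsg() for a reply that never comes.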