On Tue, 2017-09-05 at 10:18 -0700, Eric Dumazet wrote: > On Thu, 2017-08-03 at 18:07 +0200, Paolo Abeni wrote: > > __ip_options_echo() uses the current network namespace, and > > currently retrives it via skb->dst->dev. > > > > This commit adds an explicit 'net' argument to __ip_options_echo() > > and update all the call sites to provide it, usually via a simpler > > sock_net(). > > > > After this change, __ip_options_echo() no more needs to access > > skb->dst and we can drop a couple of hack to preserve such > > info in the rx path. > > > > Signed-off-by: Paolo Abeni <pab...@redhat.com> > > --- > > David, Paolo > > This commit (91ed1e666a4ea2e260452a7d7d311ac5ae852cba "ip/options: > explicitly provide net ns to __ip_options_echo()") > > needs to be backported to linux-4.13 stable version to avoid these kind > of crashes [1] > > This is because of MSG_PEEK operation, hitting skb_consume_udp() while > skb is still in receive queue. > > Next read() finding again the skb then can see a NULL skb->dst > > Thanks ! > > [1] > general protection fault: 0000 [#1] SMP KASAN > Dumping ftrace buffer: > (ftrace buffer empty) > Modules linked in: > CPU: 0 PID: 3017 Comm: syzkaller446772 Not tainted 4.13.0+ #68 > Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS > Google 01/01/2011 > task: ffff8801cd0a4380 task.stack: ffff8801cc498000 > RIP: 0010:__ip_options_echo+0xea8/0x1430 net/ipv4/ip_options.c:143 > RSP: 0018:ffff8801cc49f628 EFLAGS: 00010246 > RAX: dffffc0000000000 RBX: ffff8801cc49f928 RCX: 0000000000000000 > RDX: 0000000000000000 RSI: 0000000000000001 RDI: 0000000000000004 > RBP: ffff8801cc49f6b8 R08: ffff8801cc49f936 R09: ffffed0039893f28 > R10: 0000000000000003 R11: ffffed0039893f27 R12: ffff8801cc49f918 > R13: ffff8801ccbcf36c R14: 000000000000000f R15: 0000000000000018 > FS: 0000000000979880(0000) GS:ffff8801db200000(0000) > knlGS:0000000000000000 > CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 > CR2: 00000000200c0ff0 CR3: 00000001cc4ed000 CR4: 00000000001406f0 > DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 > DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 > Call Trace: > ip_options_echo include/net/ip.h:574 [inline] > ip_cmsg_recv_retopts net/ipv4/ip_sockglue.c:91 [inline] > ip_cmsg_recv_offset+0xa17/0x1280 net/ipv4/ip_sockglue.c:207 > udp_recvmsg+0xe0b/0x1260 net/ipv4/udp.c:1641 > inet_recvmsg+0x14c/0x5f0 net/ipv4/af_inet.c:793 > sock_recvmsg_nosec net/socket.c:792 [inline] > sock_recvmsg+0xc9/0x110 net/socket.c:799 > SYSC_recvfrom+0x2dc/0x570 net/socket.c:1788 > SyS_recvfrom+0x40/0x50 net/socket.c:1760 > entry_SYSCALL_64_fastpath+0x1f/0xbe > RIP: 0033:0x444c89 > RSP: 002b:00007ffd80c788e8 EFLAGS: 00000286 ORIG_RAX: 000000000000002d > RAX: ffffffffffffffda RBX: ffffffffffffffff RCX: 0000000000444c89 > RDX: 0000000000000000 RSI: 0000000020bc0000 RDI: 0000000000000004 > RBP: 0000000000000082 R08: 00000000200c0ff0 R09: 0000000000000010 > R10: 0000000000000000 R11: 0000000000000286 R12: 0000000000402390 > R13: 0000000000402420 R14: 0000000000000000 R15: 0000000000000000 > Code: f6 c1 01 0f 85 a5 01 00 00 48 89 4d b8 e8 31 e9 6b fd 48 8b 4d b8 > 48 b8 00 00 00 00 00 fc ff df 48 83 e1 fe 48 89 ca 48 c1 ea 03 <80> 3c > 02 00 0f 85 41 02 00 00 48 8b 09 48 b8 00 00 00 00 00 fc > RIP: __ip_options_echo+0xea8/0x1430 net/ipv4/ip_options.c:143 RSP: > ffff8801cc49f628 > ---[ end trace b30d95b284222843 ]--- > Kernel panic - not syncing: Fatal exception
Thank you Eric for the report! Darn me, I seriously messed-up with the stateless consume. I think we can have similar issues pith ipsec/secpath and MSG_PEEK, even if with less catastropthic outcome. What about the following, which should cover both cases? (only compile tested, I'll test it tomorrow morning my time) --- diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h index d67a8182e5eb..63df75ae70ee 100644 --- a/include/linux/skbuff.h +++ b/include/linux/skbuff.h @@ -885,7 +885,7 @@ void kfree_skb(struct sk_buff *skb); void kfree_skb_list(struct sk_buff *segs); void skb_tx_error(struct sk_buff *skb); void consume_skb(struct sk_buff *skb); -void consume_stateless_skb(struct sk_buff *skb); +void __consume_stateless_skb(struct sk_buff *skb); void __kfree_skb(struct sk_buff *skb); extern struct kmem_cache *skbuff_head_cache; diff --git a/net/core/skbuff.c b/net/core/skbuff.c index e07556606284..f2411a8744d7 100644 --- a/net/core/skbuff.c +++ b/net/core/skbuff.c @@ -753,14 +753,11 @@ EXPORT_SYMBOL(consume_skb); * consume_stateless_skb - free an skbuff, assuming it is stateless * @skb: buffer to free * - * Works like consume_skb(), but this variant assumes that all the head - * states have been already dropped. + * Alike consume_skb(), but this variant assumes that all the head + * states have been already dropped and usage count is one */ -void consume_stateless_skb(struct sk_buff *skb) +void __consume_stateless_skb(struct sk_buff *skb) { - if (!skb_unref(skb)) - return; - trace_consume_skb(skb); if (likely(skb->head)) skb_release_data(skb); diff --git a/net/ipv4/udp.c b/net/ipv4/udp.c index 62344804baae..979e4d8526ba 100644 --- a/net/ipv4/udp.c +++ b/net/ipv4/udp.c @@ -1386,12 +1386,15 @@ void skb_consume_udp(struct sock *sk, struct sk_buff *skb, int len) unlock_sock_fast(sk, slow); } + if (!skb_unref(skb)) + return; + /* In the more common cases we cleared the head states previously, * see __udp_queue_rcv_skb(). */ if (unlikely(udp_skb_has_head_state(skb))) skb_release_head_state(skb); - consume_stateless_skb(skb); + __consume_stateless_skb(skb); } EXPORT_SYMBOL_GPL(skb_consume_udp);