On Wed, 2018-06-06 at 16:28 +0300, Kirill Tkhai wrote:
> On 06.06.2018 16:16, Paolo Abeni wrote:
> > KCM removes the packets from sk_receive_queue in requeue_rx_msgs()
> > without acquiring any lock. Moreover, in R() when the MSG_PEEK
> > flag is not present, the skb is peeked and dequeued with two
> > separate, non-atomic, calls.
> >
> > The above creates room for races, which syzbot has been able to
> > exploit, causing list corruption and kernel oops:
> >
> > kasan: CONFIG_KASAN_INLINE enabled
> > kasan: GPF could be caused by NULL-ptr deref or user memory access
> > general protection fault: 0000 [#1] SMP KASAN
> > Dumping ftrace buffer:
> > (ftrace buffer empty)
> > Modules linked in:
> > CPU: 0 PID: 8484 Comm: syz-executor919 Not tainted 4.17.0-rc7+ #74
> > Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS
> > Google 01/01/2011
> > RIP: 0010:__skb_unlink include/linux/skbuff.h:1844 [inline]
> > RIP: 0010:skb_unlink+0xc1/0x160 net/core/skbuff.c:2921
> > RSP: 0018:ffff8801d012f6f0 EFLAGS: 00010002
> > RAX: 0000000000000286 RBX: ffff8801d6e073c0 RCX: 0000000000000001
> > RDX: dffffc0000000000 RSI: 0000000000000004 RDI: 0000000000000008
> > RBP: ffff8801d012f718 R08: ffffed0038bb3b6d R09: ffffed0038bb3b6c
> > R10: ffffed0038bb3b6c R11: ffff8801c5d9db63 R12: 0000000000000000
> > R13: 0000000000000000 R14: ffff8801c5d9db60 R15: ffff8801d012fce0
> > FS: 0000000000ab7880(0000) GS:ffff8801dae00000(0000) knlGS:0000000000000000
> > CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > CR2: 0000000020e5b000 CR3: 00000001c31fb000 CR4: 00000000001406f0
> > DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> > DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> > Call Trace:
> > kcm_recvmsg+0x48d/0x590 net/kcm/kcmsock.c:1160
> > sock_recvmsg_nosec+0x8c/0xb0 net/socket.c:802
> > ___sys_recvmsg+0x2b6/0x680 net/socket.c:2279
> > __sys_recvmmsg+0x2f9/0xb80 net/socket.c:2391
> > do_sys_recvmmsg+0xe4/0x190 net/socket.c:2472
> > __do_sys_recvmmsg net/socket.c:2485 [inline]
> > __se_sys_recvmmsg net/socket.c:2481 [inline]
> > __x64_sys_recvmmsg+0xbe/0x150 net/socket.c:2481
> > do_syscall_64+0x1b1/0x800 arch/x86/entry/common.c:287
> > entry_SYSCALL_64_after_hwframe+0x49/0xbe
> > RIP: 0033:0x4417a9
> > RSP: 002b:00007ffe27282838 EFLAGS: 00000206 ORIG_RAX: 000000000000012b
> > RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00000000004417a9
> > RDX: 00000000040000f7 RSI: 00000000200002c0 RDI: 0000000000000006
> > RBP: 0000000000000000 R08: 0000000020000200 R09: 00007ffe272829f8
> > R10: 0000000000000060 R11: 0000000000000206 R12: 00000000000001f3
> > R13: 000000000001f871 R14: 0000000000000000 R15: 0000000000000000
> > Code: 00 00 00 49 8d 7d 08 4c 8b 63 08 48 ba 00 00 00 00 00 fc ff df 48 c7
> > 43 08 00 00 00 00 48 89 f9 48 c7 03 00 00 00 00 48 c1 e9 03 <80> 3c 11 00
> > 75 5b 4c 89 e1 4d 89 65 08 48 ba 00 00 00 00 00 fc
> > RIP: __skb_unlink include/linux/skbuff.h:1844 [inline] RSP: ffff8801d012f6f0
> > RIP: skb_unlink+0xc1/0x160 net/core/skbuff.c:2921 RSP: ffff8801d012f6f0
> >
> > To fix the above, we need to use the locked version of the socket dequeue
> > helper in requeue_rx_msgs() and kcm_wait_data is changed to dequeue
> > the available skb when not peeking.
> >
> > RFC -> v1:
> > - use skb_dequeue(), as suggested by Tom
> > - explicitly close the race between skb_peek and skb_unlink
> >
> > Fixes: ab7ac4eb9832 ("kcm: Kernel Connection Multiplexor module")
> > Reported-and-tested-by:
> > [email protected]
> > Signed-off-by: Paolo Abeni <[email protected]>
> > ---
> > This is an RFC, since I'm really new to this area; anyway, syzbot
> > reported success in testing the proposed fix.
> > This is very likely a scenario where the upcoming skb->prev,next ->
> > list_head
> > conversion would have helped a lot, thanks to list poisoning and list debug
> > ---
> > net/kcm/kcmsock.c | 19 ++++++++++++-------
> > 1 file changed, 12 insertions(+), 7 deletions(-)
> >
> > diff --git a/net/kcm/kcmsock.c b/net/kcm/kcmsock.c
> > index d3601d421571..dd2d02bb35ae 100644
> > --- a/net/kcm/kcmsock.c
> > +++ b/net/kcm/kcmsock.c
> > @@ -223,7 +223,7 @@ static void requeue_rx_msgs(struct kcm_mux *mux, struct
> > sk_buff_head *head)
> > struct sk_buff *skb;
> > struct kcm_sock *kcm;
> >
> > - while ((skb = __skb_dequeue(head))) {
> > + while ((skb = skb_dequeue(head))) {
>
> I am trying to understand how the patch protects against the following race:
>
> requeue_rx_msgs()            kcm_recvmsg()
>   skb = skb_dequeue()          skb = kcm_wait_data(peek = true)
>   ...                          ...
>   free skb                     ...
>   ...                          skb_copy_datagram_msg(skb) <--- Use after free?
>
> Isn't a use-after-free possible there?

You are right, this patch does not fix the above race: it addresses a
different one, which happens when recvmsg() is not peeking.

The race itself is not introduced by this code, and I think a separate
patch for the above would be better (we probably need to increment
the skb reference count while peeking and consume the skb after the
copy).

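As a rough illustration of that idea, here is a userspace sketch, not KCM
code and not a proposed patch. Everything in it is an assumption made for
the example: struct buf stands in for an skb, a pthread mutex stands in
for the queue lock, and the hypothetical helpers queue_peek_get() and
buf_put() play the roles that skb_get() and consume_skb() would play in
the kernel. The point is only that the peeker takes its own reference
while still holding the queue lock, so a concurrent dequeue-and-free
cannot release the buffer before the copy is done.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdlib.h>

/* Stand-in for an skb: a refcounted node on a singly linked queue. */
struct buf {
	struct buf *next;
	atomic_int refcnt;
	int data;
};

/* Stand-in for sk_receive_queue plus its lock. */
struct queue {
	pthread_mutex_t lock;
	struct buf *head, *tail;
};

static atomic_int bufs_freed;	/* instrumentation for the example only */

static struct buf *buf_alloc(int data)
{
	struct buf *b = calloc(1, sizeof(*b));

	if (!b)
		return NULL;
	atomic_init(&b->refcnt, 1);	/* the queue/owner reference */
	b->data = data;
	return b;
}

/* Drop one reference; free the buffer when the last one goes away. */
static void buf_put(struct buf *b)
{
	int old = atomic_fetch_sub(&b->refcnt, 1);

	assert(old > 0);
	if (old == 1) {
		free(b);
		atomic_fetch_add(&bufs_freed, 1);
	}
}

static void queue_init(struct queue *q)
{
	pthread_mutex_init(&q->lock, NULL);
	q->head = q->tail = NULL;
}

static void queue_tail(struct queue *q, struct buf *b)
{
	pthread_mutex_lock(&q->lock);
	b->next = NULL;
	if (q->tail)
		q->tail->next = b;
	else
		q->head = b;
	q->tail = b;
	pthread_mutex_unlock(&q->lock);
}

/* Locked dequeue: the queue's reference is handed over to the caller. */
static struct buf *queue_dequeue(struct queue *q)
{
	struct buf *b;

	pthread_mutex_lock(&q->lock);
	b = q->head;
	if (b) {
		q->head = b->next;
		if (!q->head)
			q->tail = NULL;
		b->next = NULL;
	}
	pthread_mutex_unlock(&q->lock);
	return b;
}

/* Peek: take an extra reference before the lock is released. */
static struct buf *queue_peek_get(struct queue *q)
{
	struct buf *b;

	pthread_mutex_lock(&q->lock);
	b = q->head;
	if (b)
		atomic_fetch_add(&b->refcnt, 1);
	pthread_mutex_unlock(&q->lock);
	return b;
}
```

With this shape, a peeking reader calls queue_peek_get(), copies the
data, and then calls buf_put(); if another thread dequeues and drops the
same buffer in between, the memory stays valid until the reader's own
buf_put(). Whether the same pattern maps cleanly onto KCM is a separate
question.
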
Cheers,
Paolo