On 09/11/2015 07:11 AM, David Miller wrote: ...
Looking more deeply into this, I think we have the same exact problem with netlink skbs that use vmalloc memory at skb->head.
Yes, agreed, the test in the patch covered those as well via: if (netlink_skb_is_mmaped(skb) || is_vmalloc_addr(skb->head))
We have a special hack, but only when netlink_skb_clone() is used, to preserve the destructor. But these skbs can escape anywhere and be eventually cloned as we have seen with the mmap stuff.
Yes, it looks like they are currently only used from user space to kernel space. I saw that 3a36515f7294 ("netlink: fix splat in skb_clone with large messages") fixed a couple of these in upper layers with regards to large skbs, so there's a chance that this can be overseen rather easily as well in other places and then only come to light in cases where we allocate more than NLMSG_GOODSIZE, so we don't actually use the alloc_skb() path. :/ So I like your idea below!
I'm wondering if we should do something more precise to fix this, and in a way that handles both the mmap and vmalloc cases.
Perhaps it might also be useful if the kernel would one day want to use netlink_alloc_large_skb() for answers back to user space, or in places where we use network skbs (haven't looked into it with regards to this). Some more comments below:
diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h index 2738d35..77b804c 100644 --- a/include/linux/skbuff.h +++ b/include/linux/skbuff.h @@ -584,8 +584,9 @@ struct sk_buff { fclone:2, peeked:1, head_frag:1, - xmit_more:1; - /* one bit hole */ + xmit_more:1, + clone_preserves_destructor;
( Nit: maybe as clone_preserves_destructor:1 )
+ kmemcheck_bitfield_end(flags1); /* fields enclosed in headers_start/headers_end are copied diff --git a/net/core/skbuff.c b/net/core/skbuff.c index dad4dd3..4a7b8e3 100644 --- a/net/core/skbuff.c +++ b/net/core/skbuff.c @@ -825,7 +825,8 @@ static struct sk_buff *__skb_clone(struct sk_buff *n, struct sk_buff *skb) n->hdr_len = skb->nohdr ? skb_headroom(skb) : skb->hdr_len; n->cloned = 1; n->nohdr = 0; - n->destructor = NULL; + if (!skb->clone_preserves_destructor) + n->destructor = NULL;
I think we also need here: else C(destructor);
C(tail); C(end); C(head);
[...]
diff --git a/net/netlink/af_netlink.c b/net/netlink/af_netlink.c index 7f86d3b..214f1a1 100644 --- a/net/netlink/af_netlink.c +++ b/net/netlink/af_netlink.c @@ -719,6 +719,7 @@ static void netlink_ring_setup_skb(struct sk_buff *skb, struct sock *sk, skb->end = skb->tail + size; skb->len = 0; + skb->clone_preserves_destructor = 1; skb->destructor = netlink_skb_destructor; NETLINK_CB(skb).flags |= NETLINK_SKB_MMAPED; NETLINK_CB(skb).sk = sk; @@ -854,6 +855,14 @@ static void netlink_ring_set_copied(struct sock *sk, struct sk_buff *skb) #define netlink_mmap_sendmsg(sk, msg, dst_portid, dst_group, scm) 0 #endif /* CONFIG_NETLINK_MMAP */
One more thing that came to my mind. For netlink mmap skbs, the skb->clone_preserves_destructor is actually not enough. Already calling into skb_clone() is an issue itself, as the data area is user space buffer, and skb_clone() as well as skb_copy() access skb_shinfo() area. :/ So in that regard netlink mmap skbs are even further restrictive on what we can do than netlink large skbs. Best, Daniel -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html