On 09/11/2015 07:11 AM, David Miller wrote:
...
Looking more deeply into this, I think we have the same exact problem
with netlink skbs that use vmalloc memory at skb->head.

Yes, agreed, the test in the patch covered those as well via:

  if (netlink_skb_is_mmaped(skb) || is_vmalloc_addr(skb->head))

We have a special hack, but only when netlink_skb_clone() is used,
to preserve the destructor.  But these skbs can escape anywhere
and be eventually cloned as we have seen with the mmap stuff.

Yes, it looks like they are currently only used from user space to
kernel space. I saw that 3a36515f7294 ("netlink: fix splat in skb_clone
with large messages") fixed a couple of these in upper layers with
regards to large skbs, so there's a chance that this can be overseen
rather easily as well in other places and then only come to light in
cases where we allocate more than NLMSG_GOODSIZE, so we don't actually
use the alloc_skb() path. :/ So I like your idea below!

I'm wondering if we should do something more precise to fix this,
and in a way that handles both the mmap and vmalloc cases.

Perhaps it might also be useful if the kernel would one day want to
use netlink_alloc_large_skb() for answers back to user space, or in
places where we use network skbs (haven't looked into it with regards
to this).

Some more comments below:

diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
index 2738d35..77b804c 100644
--- a/include/linux/skbuff.h
+++ b/include/linux/skbuff.h
@@ -584,8 +584,9 @@ struct sk_buff {
                                fclone:2,
                                peeked:1,
                                head_frag:1,
-                               xmit_more:1;
-       /* one bit hole */
+                               xmit_more:1,
+                               clone_preserves_destructor;

( Nit: maybe as clone_preserves_destructor:1 )

+
        kmemcheck_bitfield_end(flags1);

        /* fields enclosed in headers_start/headers_end are copied
diff --git a/net/core/skbuff.c b/net/core/skbuff.c
index dad4dd3..4a7b8e3 100644
--- a/net/core/skbuff.c
+++ b/net/core/skbuff.c
@@ -825,7 +825,8 @@ static struct sk_buff *__skb_clone(struct sk_buff *n, 
struct sk_buff *skb)
        n->hdr_len = skb->nohdr ? skb_headroom(skb) : skb->hdr_len;
        n->cloned = 1;
        n->nohdr = 0;
-       n->destructor = NULL;
+       if (!skb->clone_preserves_destructor)
+               n->destructor = NULL;

I think we also need here:

        else
                C(destructor);

        C(tail);
        C(end);
        C(head);
[...]
diff --git a/net/netlink/af_netlink.c b/net/netlink/af_netlink.c
index 7f86d3b..214f1a1 100644
--- a/net/netlink/af_netlink.c
+++ b/net/netlink/af_netlink.c
@@ -719,6 +719,7 @@ static void netlink_ring_setup_skb(struct sk_buff *skb, 
struct sock *sk,
        skb->end     = skb->tail + size;
        skb->len     = 0;

+       skb->clone_preserves_destructor = 1;
        skb->destructor      = netlink_skb_destructor;
        NETLINK_CB(skb).flags |= NETLINK_SKB_MMAPED;
        NETLINK_CB(skb).sk = sk;
@@ -854,6 +855,14 @@ static void netlink_ring_set_copied(struct sock *sk, 
struct sk_buff *skb)
  #define netlink_mmap_sendmsg(sk, msg, dst_portid, dst_group, scm)     0
  #endif /* CONFIG_NETLINK_MMAP */

One more thing that came to my mind. For netlink mmap skbs, the
skb->clone_preserves_destructor is actually not enough.

Already calling into skb_clone() is an issue itself, as the data area is
user space buffer, and skb_clone() as well as skb_copy() access skb_shinfo()
area. :/ So in that regard netlink mmap skbs are even further restrictive
on what we can do than netlink large skbs.

Best,
Daniel
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Reply via email to