On 08/14/2015 12:01 PM, Daniel Borkmann wrote:
On 08/14/2015 10:58 AM, Ken-ichirou MATSUZAWA wrote:
  Hi,

Thank you for taking your time.
Please let me explain these with code samples on gist.
I can not describe and arrange it well, sorry.

     normal socket nflog sample:
     
https://gist.github.com/chamaken/dc0f80c14862e8061c06/raw/2d6da8fff31ef61af77e68713fdb1d71978746a6/nflog.c

set iptables

     iptables -A INPUT -p icmp --icmp-type echo-request \
         -j NFLOG --nflog-group 2 --nflog-threshold 4

monitor nlmon (like netsniff-ng), run this sample and
ping -i 0.2 -c 10 from another hosts. This sample only shows receive
size and nlmsg_type. Same things can be done with rx mmaped socket.

     rx only mmaped nflog sample:
     
https://gist.github.com/chamaken/dc0f80c14862e8061c06/raw/2d6da8fff31ef61af77e68713fdb1d71978746a6/rxring-nflog.c

This sample gets a panic if monitoring nlmon.

     panic message:
     
https://gist.github.com/chamaken/dc0f80c14862e8061c06/raw/2d6da8fff31ef61af77e68713fdb1d71978746a6/mmaped_netlink_panic

I think it's because of accessing a skb_shared_info when releasing
skb, although mmaped netlink skb does not have a skb_shared_info. I
tried to fix this at patch 1 and 2 by introducing helper function
which will not access a skb_shared_info.

And I think nm_status should be set to UNUSED when releasing it so
also tried to fix it patch 3.

Ok, I'm trying to understand the issue: you are saying that whenever
there's an skb_clone on an netlink mmaped skb, we have the situation
that skb->data, skb->head etc points to the mmaped user space buffer
slot, and thus _must_ have no shared info.

Currently, what happens is that the shared info accesses whatever
memory is there in the mmaped region. So when you already do an
skb_clone() you should already get into trouble right there f.e. when
we test for orphaning frags etc (if at the right offset in the mmap
buffer, the tx_flags member would contain a SKBTX_DEV_ZEROCOPY bit).

Ken-ichirou, have you observed this issue only in relation to nlmon?
Haven't checked yet if there are any upper layer netlink consumers that
would call for some reason into skb_clone() as well. I am thinking that
if taps are indeed the only ones affected, it might probably not be
worth adding that much complexity for a fix itself, but to keep it simple
instead. I don't know if there are any real users of netlink mmap, but
if it would really be needed, we could think about it on a net-next basis?
It seems you have some other, separate fixes in your series, so you might
want to submit them separately against the net tree, instead?

 include/linux/netlink.h  |  4 ++++
 net/netlink/af_netlink.c | 12 +++++++-----
 2 files changed, 11 insertions(+), 5 deletions(-)

diff --git a/include/linux/netlink.h b/include/linux/netlink.h
index 9120edb..42cdcd8 100644
--- a/include/linux/netlink.h
+++ b/include/linux/netlink.h
@@ -35,6 +35,10 @@ struct netlink_skb_parms {
 #define NETLINK_CB(skb)                (*(struct 
netlink_skb_parms*)&((skb)->cb))
 #define NETLINK_CREDS(skb)     (&NETLINK_CB((skb)).creds)

+static inline bool netlink_skb_is_mmaped(const struct sk_buff *skb)
+{
+       return NETLINK_CB(skb).flags & NETLINK_SKB_MMAPED;
+}

 extern void netlink_table_grab(void);
 extern void netlink_table_ungrab(void);
diff --git a/net/netlink/af_netlink.c b/net/netlink/af_netlink.c
index 67d2104..4307446 100644
--- a/net/netlink/af_netlink.c
+++ b/net/netlink/af_netlink.c
@@ -238,6 +238,13 @@ static void __netlink_deliver_tap(struct sk_buff *skb)

 static void netlink_deliver_tap(struct sk_buff *skb)
 {
+       /* Netlink mmaped skbs must not access shared info, and thus
+        * are not allowed to be cloned. For now, just don't allow
+        * them to get inspected by taps.
+        */
+       if (netlink_skb_is_mmaped(skb))
+               return;
+
        rcu_read_lock();

        if (unlikely(!list_empty(&netlink_tap_all)))
@@ -278,11 +285,6 @@ static void netlink_rcv_wake(struct sock *sk)
 }

 #ifdef CONFIG_NETLINK_MMAP
-static bool netlink_skb_is_mmaped(const struct sk_buff *skb)
-{
-       return NETLINK_CB(skb).flags & NETLINK_SKB_MMAPED;
-}
-
 static bool netlink_rx_is_mmaped(struct sock *sk)
 {
        return nlk_sk(sk)->rx_ring.pg_vec != NULL;
--
1.9.3




----

With both tx/rx mmaped,

     both tx/rx mmaped nflog sample:
     
https://gist.github.com/chamaken/dc0f80c14862e8061c06/raw/2d6da8fff31ef61af77e68713fdb1d71978746a6/ring-nflog.c

This sample will not work, since msg->msg_iter.type in
netlink_sendmsg() is set to 1 (WRITE) when this sample calls
sendto(). patch 4 fix this by accepting it.

----

After applying patch 1 and 2, rx only sample can work but it behaves
differ from normal one. patch 5 may fix this.

And it also works well with my another code which set frame
nm_status to SKIP and passes it to worker threads and the worker
threads set status to UNUSED, even though ring becomes full.

That my another code may set UNUSED status in random, not
sequensially, so that it seems I need to check whole ring.

Thanks,
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Reply via email to