Hello,

On Sun, 28 Jun 2015, Eric W. Biederman wrote:

> Julian Anastasov <j...@ssi.bg> writes:
> 
> > commit 381c759d9916 ("ipv4: Avoid crashing in ip_error")
> > fixes a problem where processed packet comes from device
> > with destroyed inetdev (dev->ip_ptr). This is not expected
> > because inetdev_destroy is called in NETDEV_UNREGISTER
> > phase and packets should not be processed after
> > dev_close_many() and synchronize_net(). But backlog
> > processing is deactivated later after NETDEV_UNREGISTER_FINAL
> > phase which allows CPUs to initiate processing for new
> > packets during and after the NETDEV_UNREGISTER phase.
> >
> > Fix it by moving flush_backlog after the device driver
> > is stopped and before the synchronize_net() call. This
> > allows dev->ip_ptr to be always valid during packet
> > processing.
> 
> Do you have any evidence that there are other places that care
> besides the location in ip_error?

        Examples like that are not many because code is full with
in_dev checks:

- fib_compute_spec_dst: BUG_ON(!in_dev);

        This may need additional fix...

        But the fact is that sd->process_queue can contain
many packets and the synchronize_net call can stop maximum one
packet of them. So, we can reach NETDEV_UNREGISTER chain
while at the same time process_backlog continues to deliver
more packets. Soon, we may start to work with freed dev->ip_ptr.
The window is very small: from seeing dev->ip_ptr != NULL
to using the in_device fields. We may need slow hardirq that
can interrupt the packet processing and to crash on such
use-after-free case.

> If not this patch should be a canidate for net-next which is currently
> closed until after the merge window.
> 
> The testing is difficult and I was never able to reproduce the actual
> crash.  So I do not know how much evidence should come from testing.

        I guess, chance for crash can be increased if
large value is used for /proc/sys/net/core/netdev_max_backlog
but it looks difficult to reproduce such races.

> That said this patch actually appears wrong.  Nothing in the veth driver
> that I can see will cause packets from the other side to not be
> transmitted when only one side is closed.

        veth looks correct because veth_dellink is called
with head!=NULL.

> Further even assuming that dev_close_many actually works and causes no
> new packets to be processed there is the larger problem of calling
> flush_backlog before syncrhonize_net.   Things happening in rcu
> criticial sections (aka enqueuing packets to the backlog) are not
> expected to stop until after the end of the critical section
> and we are not out of the critical section until after synchronize_net.

        You are right. So, below is my next attempt,
still not tested. I'm trying to avoid processing for
enqueued packets. Sadly, enqueue_to_backlog() is used
both on rx (eg. NAPI) and on loop (eg. loopback_xmit) but
we are only interested to avoid enqueue_to_backlog for
the loop case. And as the looped packet may not notice
the new device state, we have to execute smp_mb on all
other CPUs and to add check in fast path. I hope I'm
doing it correctly...

> So while this work looks interesting, I don't think this is a fix for
> a serious bug, and I don't think you have yet created a correct patch.
> 
> Eric

RFC patch:

Subject: [PATCHv2 net] net: do not process device backlog during
 unregistration

commit 381c759d9916 ("ipv4: Avoid crashing in ip_error")
fixes a problem where processed packet comes from device
with destroyed inetdev (dev->ip_ptr). This is not expected
because inetdev_destroy is called in NETDEV_UNREGISTER
phase and packets should not be processed after
dev_close_many() and synchronize_net(). But backlog
processing is deactivated later after NETDEV_UNREGISTER_FINAL
phase which allows CPUs to initiate processing for new
packets during and after the NETDEV_UNREGISTER phase.

As new packets can be enqueued during processing we have
to drop them by adding new netif_running check after
memory barrier executed on every CPU.

Reported-by: Vittorio Gambaletta <linuxb...@vittgam.net>
Fixes: 6e583ce5242f ("net: eliminate refcounting in backlog queue")
Signed-off-by: Julian Anastasov <j...@ssi.bg>
---
 net/core/dev.c | 9 +++++++++
 1 file changed, 9 insertions(+)

diff --git a/net/core/dev.c b/net/core/dev.c
index aa82f9a..28d76307 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -3670,6 +3670,8 @@ static int __netif_receive_skb_core(struct sk_buff *skb, 
bool pfmemalloc)
        rcu_read_lock();
 
 another_round:
+       if (!netif_running(skb->dev))
+               goto drop;
        skb->skb_iif = skb->dev->ifindex;
 
        __this_cpu_inc(softnet_data.processed);
@@ -5992,6 +5994,11 @@ static void net_set_todo(struct net_device *dev)
        dev_net(dev)->dev_unreg_count++;
 }
 
+static void force_netif_barrier_func(void *arg)
+{
+       /* smp_mb in csd_unlock */
+}
+
 static void rollback_registered_many(struct list_head *head)
 {
        struct net_device *dev, *tmp;
@@ -6029,6 +6036,8 @@ static void rollback_registered_many(struct list_head 
*head)
                dev->reg_state = NETREG_UNREGISTERING;
        }
 
+       /* Make sure other CPUs see the new netif_running state */
+       on_each_cpu(force_netif_barrier_func, NULL, 1);
        synchronize_net();
 
        list_for_each_entry(dev, head, unreg_list) {
-- 
1.9.3

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Reply via email to