John Hurley <john.hur...@netronome.com> wrote:
> TC hooks allow the application of filters and actions to packets at both
> ingress and egress of the network stack. It is possible, with poor
> configuration, that this can produce loops whereby an ingress hook calls
> a mirred egress action that has an egress hook that redirects back to
> the first ingress, and so on. The TC core classifier protects against
> loops when doing reclassification, but there is no protection against a
> packet looping between multiple hooks. This can lead to stack overflow
> panics, among other things.
>
> Previous versions of the kernel (< 4.2) had a TTL count in the tc_verd
> skb member that protected against loops. This was removed and the
> tc_verd variable was replaced by bit fields.
>
> Extend the TC fields in the skb with an additional 2 bits to store the
> TC hop count. This should use existing allocated memory in the skb.
>
> Add the checking and setting of the new hop count to the act_mirred
> file, given that it is the source of the loops. This means that the
> code additions are not in the main datapath.
>
> v1->v2
>  - change from per cpu counter to per skb tracking (Jamal)
>  - move check/update from fast path to act_mirred (Daniel)
>
> Signed-off-by: John Hurley <john.hur...@netronome.com>
> ---
>  include/linux/skbuff.h | 2 ++
>  net/sched/act_mirred.c | 9 +++++++++
>  2 files changed, 11 insertions(+)
>
> diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
> index 2ee5e63..f0dbc5b 100644
> --- a/include/linux/skbuff.h
> +++ b/include/linux/skbuff.h
> @@ -645,6 +645,7 @@ typedef unsigned char *sk_buff_data_t;
>   *	@tc_at_ingress: used within tc_classify to distinguish in/egress
>   *	@tc_redirected: packet was redirected by a tc action
>   *	@tc_from_ingress: if tc_redirected, tc_at_ingress at time of redirect
> + *	@tc_hop_count: hop counter to prevent packet loops
>   *	@peeked: this packet has been seen already, so stats have been
>   *		done for it, don't do them again
>   *	@nf_trace: netfilter packet trace flag
> @@ -827,6 +828,7 @@ struct sk_buff {
>  	__u8			tc_at_ingress:1;
>  	__u8			tc_redirected:1;
>  	__u8			tc_from_ingress:1;
> +	__u8			tc_hop_count:2;
I dislike this; why can't we just use a pcpu counter? The only problem is with recursion/nesting: whenever we hit something that queues the skb for later processing, we're safe again. We can't catch loops in real (physical) setups either, e.g. a bridge looping back on itself.