On Wed, Dec 5, 2018 at 7:57 AM Jiri Wiesner <jwies...@suse.com> wrote: > > The *_frag_reasm() functions are susceptible to miscalculating the byte > count of packet fragments in case the truesize of a head buffer changes. > The truesize member may be changed by the call to skb_unclone(), leaving > the fragment memory limit counter unbalanced even if all fragments are > processed. This miscalculation goes unnoticed as long as the network > namespace which holds the counter is not destroyed. > > Should an attempt be made to destroy a network namespace that holds an > unbalanced fragment memory limit counter the cleanup of the namespace > never finishes. The thread handling the cleanup gets stuck in > inet_frags_exit_net() waiting for the percpu counter to reach zero. The > thread is usually in running state with a stacktrace similar to: > > PID: 1073 TASK: ffff880626711440 CPU: 1 COMMAND: "kworker/u48:4" > #5 [ffff880621563d48] _raw_spin_lock at ffffffff815f5480 > #6 [ffff880621563d48] inet_evict_bucket at ffffffff8158020b > #7 [ffff880621563d80] inet_frags_exit_net at ffffffff8158051c > #8 [ffff880621563db0] ops_exit_list at ffffffff814f5856 > #9 [ffff880621563dd8] cleanup_net at ffffffff814f67c0 > #10 [ffff880621563e38] process_one_work at ffffffff81096f14 > > It is not possible to create new network namespaces, and processes > that call unshare() end up being stuck in uninterruptible sleep state > waiting to acquire the net_mutex. > > The bug was observed in the IPv6 netfilter code by Per Sundstrom. > I thank him for his analysis of the problem. The parts of this patch > that apply to IPv4 and IPv6 fragment reassembly are preemptive measures. > > Signed-off-by: Jiri Wiesner <jwies...@suse.com> > Reported-by: Per Sundstrom <per.sundst...@redqube.se> > ---
Acked-by: Peter Oskolkov <p...@google.com> > net/ipv4/ip_fragment.c | 7 +++++++ > net/ipv6/netfilter/nf_conntrack_reasm.c | 8 +++++++- > net/ipv6/reassembly.c | 8 +++++++- > 3 files changed, 21 insertions(+), 2 deletions(-) > > diff --git a/net/ipv4/ip_fragment.c b/net/ipv4/ip_fragment.c > index d6ee343fdb86..aa0b22697998 100644 > --- a/net/ipv4/ip_fragment.c > +++ b/net/ipv4/ip_fragment.c > @@ -515,6 +515,7 @@ static int ip_frag_reasm(struct ipq *qp, struct sk_buff > *skb, > struct rb_node *rbn; > int len; > int ihlen; > + int delta; > int err; > u8 ecn; > > @@ -556,10 +557,16 @@ static int ip_frag_reasm(struct ipq *qp, struct sk_buff > *skb, > if (len > 65535) > goto out_oversize; > > + delta = - head->truesize; > + > /* Head of list must not be cloned. */ > if (skb_unclone(head, GFP_ATOMIC)) > goto out_nomem; > > + delta += head->truesize; > + if (delta) > + add_frag_mem_limit(qp->q.net, delta); > + > /* If the first fragment is fragmented itself, we split > * it to two chunks: the first with data and paged part > * and the second, holding only fragments. */ > diff --git a/net/ipv6/netfilter/nf_conntrack_reasm.c > b/net/ipv6/netfilter/nf_conntrack_reasm.c > index d219979c3e52..181da2c40f9a 100644 > --- a/net/ipv6/netfilter/nf_conntrack_reasm.c > +++ b/net/ipv6/netfilter/nf_conntrack_reasm.c > @@ -341,7 +341,7 @@ static bool > nf_ct_frag6_reasm(struct frag_queue *fq, struct sk_buff *prev, struct > net_device *dev) > { > struct sk_buff *fp, *head = fq->q.fragments; > - int payload_len; > + int payload_len, delta; > u8 ecn; > > inet_frag_kill(&fq->q); > @@ -363,10 +363,16 @@ nf_ct_frag6_reasm(struct frag_queue *fq, struct sk_buff > *prev, struct net_devic > return false; > } > > + delta = - head->truesize; > + > /* Head of list must not be cloned. */ > if (skb_unclone(head, GFP_ATOMIC)) > return false; > > + delta += head->truesize; > + if (delta) > + add_frag_mem_limit(fq->q.net, delta); > + > /* If the first fragment is fragmented itself, we split > * it to two chunks: the first with data and paged part > * and the second, holding only fragments. */ > diff --git a/net/ipv6/reassembly.c b/net/ipv6/reassembly.c > index 5c3c92713096..aa26c45486d9 100644 > --- a/net/ipv6/reassembly.c > +++ b/net/ipv6/reassembly.c > @@ -281,7 +281,7 @@ static int ip6_frag_reasm(struct frag_queue *fq, struct > sk_buff *prev, > { > struct net *net = container_of(fq->q.net, struct net, ipv6.frags); > struct sk_buff *fp, *head = fq->q.fragments; > - int payload_len; > + int payload_len, delta; > unsigned int nhoff; > int sum_truesize; > u8 ecn; > @@ -322,10 +322,16 @@ static int ip6_frag_reasm(struct frag_queue *fq, struct > sk_buff *prev, > if (payload_len > IPV6_MAXPLEN) > goto out_oversize; > > + delta = - head->truesize; > + > /* Head of list must not be cloned. */ > if (skb_unclone(head, GFP_ATOMIC)) > goto out_oom; > > + delta += head->truesize; > + if (delta) > + add_frag_mem_limit(fq->q.net, delta); > + > /* If the first fragment is fragmented itself, we split > * it to two chunks: the first with data and paged part > * and the second, holding only fragments. */ > -- > 2.16.4 >