On 2019-06-21 1:58 p.m., Joe Stringer wrote:
Hi folks, picking this up again..
[..]
During LSFMM, it seemed like no-one knew quite why the skb_orphan() is
necessary in that path in the current version of the code, and that we
may be able to remove it. Florian, I know you weren't in the room for
that discussion, so raising it again now with a stack trace, Do you
have some sense what's going on here and whether there's a path
towards removing it from this path or allowing the skb->sk to be
retained during ip_rcv() in some conditions?


Sorry - I havent followed the discussion but saw your email over
the weekend and wanted to be at work to refresh my memory on some
code. For maybe 2-3 years we have deployed the tproxy
equivalent as a tc action on ingress (with no netfilter dependency).

And, of course, we had to work around that specific code you are
referring to - we didnt remove it. The tc action code increments
the sk refcount and sets the tc index. The net core doesnt orphan
the skb if a speacial tc index value is set (see attached patch)

I never bothered up streaming the patch because the hack is a bit embarrassing (but worked ;->); and never posted the action code
either because i thought this was just us that had this requirement.
I am glad other people see the need for this feature. Is there effort
to make this _not_ depend on iptables/netfilter? I am guessing if you
want to do this from ebpf (tc or xdp) that is a requirement.
Our need was with tcp at the time; so left udp dependency on netfilter
alone.

cheers,
jamal
commit 4d130b0a883b4aebc36a88ca116746594e176c6a
Author: Jamal Hadi Salim <h...@mojatatu.com>
Date:   Fri Nov 25 15:45:48 2016 -0400

    transparent proxy workaround so we can get the tcaction to work

diff --git a/net/ipv4/ip_input.c b/net/ipv4/ip_input.c
index fa2dc8f692c6..29b303dbbfd9 100644
--- a/net/ipv4/ip_input.c
+++ b/net/ipv4/ip_input.c
@@ -482,8 +482,11 @@ int ip_rcv(struct sk_buff *skb, struct net_device *dev, struct packet_type *pt,
 	memset(IPCB(skb), 0, sizeof(struct inet_skb_parm));
 	IPCB(skb)->iif = skb->skb_iif;
 
-	/* Must drop socket now because of tproxy. */
-	skb_orphan(skb);
+	/* Must drop socket now because of tproxy,
+	 * if we didnt set it already as usable
+	 * */
+	if(skb->tc_index != 0xFFFF)
+		skb_orphan(skb);
 
 	return NF_HOOK(NFPROTO_IPV4, NF_INET_PRE_ROUTING,
 		       net, NULL, skb, dev, NULL,
diff --git a/net/ipv6/ip6_input.c b/net/ipv6/ip6_input.c
index 9ee208a348f5..10148f2eec03 100644
--- a/net/ipv6/ip6_input.c
+++ b/net/ipv6/ip6_input.c
@@ -77,12 +77,16 @@ int ipv6_rcv(struct sk_buff *skb, struct net_device *dev, struct packet_type *pt
 	u32 pkt_len;
 	struct inet6_dev *idev;
 	struct net *net = dev_net(skb->dev);
+	struct sock *orig_sk = NULL;
 
 	if (skb->pkt_type == PACKET_OTHERHOST) {
 		kfree_skb(skb);
 		return NET_RX_DROP;
 	}
 
+	if(skb->tc_index == 0xFFFF)
+		orig_sk = skb->sk;
+
 	rcu_read_lock();
 
 	idev = __in6_dev_get(skb->dev);
@@ -202,8 +206,17 @@ int ipv6_rcv(struct sk_buff *skb, struct net_device *dev, struct packet_type *pt
 
 	rcu_read_unlock();
 
+	if (skb->tc_index == 0xFFFF && !skb->sk && orig_sk)
+	{
+		skb_orphan(skb);
+		skb->sk = orig_sk;
+		skb->destructor = sock_edemux;
+		atomic_inc_not_zero(&skb->sk->sk_refcnt);
+	}
+
 	/* Must drop socket now because of tproxy. */
-	skb_orphan(skb);
+	if(skb->tc_index != 0xFFFF)
+		skb_orphan(skb);
 
 	return NF_HOOK(NFPROTO_IPV6, NF_INET_PRE_ROUTING,
 		       net, NULL, skb, dev, NULL,

Reply via email to