On 2019-06-21 1:58 p.m., Joe Stringer wrote:
Hi folks, picking this up again..
[..]
During LSFMM, it seemed like no-one knew quite why the skb_orphan() is
necessary in that path in the current version of the code, and that we
may be able to remove it. Florian, I know you weren't in the room for
that discussion, so raising it again now with a stack trace, Do you
have some sense what's going on here and whether there's a path
towards removing it from this path or allowing the skb->sk to be
retained during ip_rcv() in some conditions?
Sorry - I havent followed the discussion but saw your email over
the weekend and wanted to be at work to refresh my memory on some
code. For maybe 2-3 years we have deployed the tproxy
equivalent as a tc action on ingress (with no netfilter dependency).
And, of course, we had to work around that specific code you are
referring to - we didnt remove it. The tc action code increments
the sk refcount and sets the tc index. The net core doesnt orphan
the skb if a speacial tc index value is set (see attached patch)
I never bothered up streaming the patch because the hack is a bit
embarrassing (but worked ;->); and never posted the action code
either because i thought this was just us that had this requirement.
I am glad other people see the need for this feature. Is there effort
to make this _not_ depend on iptables/netfilter? I am guessing if you
want to do this from ebpf (tc or xdp) that is a requirement.
Our need was with tcp at the time; so left udp dependency on netfilter
alone.
cheers,
jamal
commit 4d130b0a883b4aebc36a88ca116746594e176c6a
Author: Jamal Hadi Salim <h...@mojatatu.com>
Date: Fri Nov 25 15:45:48 2016 -0400
transparent proxy workaround so we can get the tcaction to work
diff --git a/net/ipv4/ip_input.c b/net/ipv4/ip_input.c
index fa2dc8f692c6..29b303dbbfd9 100644
--- a/net/ipv4/ip_input.c
+++ b/net/ipv4/ip_input.c
@@ -482,8 +482,11 @@ int ip_rcv(struct sk_buff *skb, struct net_device *dev, struct packet_type *pt,
memset(IPCB(skb), 0, sizeof(struct inet_skb_parm));
IPCB(skb)->iif = skb->skb_iif;
- /* Must drop socket now because of tproxy. */
- skb_orphan(skb);
+ /* Must drop socket now because of tproxy,
+ * if we didnt set it already as usable
+ * */
+ if(skb->tc_index != 0xFFFF)
+ skb_orphan(skb);
return NF_HOOK(NFPROTO_IPV4, NF_INET_PRE_ROUTING,
net, NULL, skb, dev, NULL,
diff --git a/net/ipv6/ip6_input.c b/net/ipv6/ip6_input.c
index 9ee208a348f5..10148f2eec03 100644
--- a/net/ipv6/ip6_input.c
+++ b/net/ipv6/ip6_input.c
@@ -77,12 +77,16 @@ int ipv6_rcv(struct sk_buff *skb, struct net_device *dev, struct packet_type *pt
u32 pkt_len;
struct inet6_dev *idev;
struct net *net = dev_net(skb->dev);
+ struct sock *orig_sk = NULL;
if (skb->pkt_type == PACKET_OTHERHOST) {
kfree_skb(skb);
return NET_RX_DROP;
}
+ if(skb->tc_index == 0xFFFF)
+ orig_sk = skb->sk;
+
rcu_read_lock();
idev = __in6_dev_get(skb->dev);
@@ -202,8 +206,17 @@ int ipv6_rcv(struct sk_buff *skb, struct net_device *dev, struct packet_type *pt
rcu_read_unlock();
+ if (skb->tc_index == 0xFFFF && !skb->sk && orig_sk)
+ {
+ skb_orphan(skb);
+ skb->sk = orig_sk;
+ skb->destructor = sock_edemux;
+ atomic_inc_not_zero(&skb->sk->sk_refcnt);
+ }
+
/* Must drop socket now because of tproxy. */
- skb_orphan(skb);
+ if(skb->tc_index != 0xFFFF)
+ skb_orphan(skb);
return NF_HOOK(NFPROTO_IPV6, NF_INET_PRE_ROUTING,
net, NULL, skb, dev, NULL,