On Thu, Oct 12, 2017 at 03:48:07PM -0700, Cong Wang wrote:
> We need a real-time notification for tcp retransmission
> for monitoring.
>
> Of course we could use ftrace to dynamically instrument this
> kernel function too, however we can't retrieve the connection
> information at the same time, for example perf-tools [1] reads
> /proc/net/tcp for socket details, which is slow when we have
> a lots of connections.
>
> Therefore, this patch adds a tracepoint for tcp_retransmit_skb()
> and exposes src/dst IP addresses and ports of the connection.
> This also makes it easier to integrate into perf.
>
> Note, I expose both IPv4 and IPv6 addresses at the same time:
> for a IPv4 socket, v4 mapped address is used as IPv6 addresses,
> for a IPv6 socket, LOOPBACK4_IPV6 is already filled by kernel.
> Also, add sk and skb pointers as they are useful for BPF.
>
> 1. https://github.com/brendangregg/perf-tools/blob/master/net/tcpretrans
>
> Cc: Eric Dumazet <eduma...@google.com>
> Cc: Alexei Starovoitov <alexei.starovoi...@gmail.com>
> Cc: Hannes Frederic Sowa <han...@stressinduktion.org>
> Cc: Brendan Gregg <brendan.d.gr...@gmail.com>
> Cc: Neal Cardwell <ncardw...@google.com>
> Signed-off-by: Cong Wang <xiyou.wangc...@gmail.com>
> ---
>  include/trace/events/tcp.h | 68 ++++++++++++++++++++++++++++++++++++++++++++++
>  net/core/net-traces.c      |  1 +
>  net/ipv4/tcp_output.c      |  3 ++
>  3 files changed, 72 insertions(+)
>  create mode 100644 include/trace/events/tcp.h
>
> diff --git a/include/trace/events/tcp.h b/include/trace/events/tcp.h
> new file mode 100644
> index 000000000000..749f93c542ab
> --- /dev/null
> +++ b/include/trace/events/tcp.h
> @@ -0,0 +1,68 @@
> +#undef TRACE_SYSTEM
> +#define TRACE_SYSTEM tcp
> +
> +#if !defined(_TRACE_TCP_H) || defined(TRACE_HEADER_MULTI_READ)
> +#define _TRACE_TCP_H
> +
> +#include <linux/ipv6.h>
> +#include <linux/tcp.h>
> +#include <linux/tracepoint.h>
> +#include <net/ipv6.h>
> +
> +TRACE_EVENT(tcp_retransmit_skb,
> +
> +	TP_PROTO(struct sock *sk, struct sk_buff *skb, int segs),
> +
> +	TP_ARGS(sk, skb, segs),
> +
> +	TP_STRUCT__entry(
> +		__field(void *, skbaddr)
> +		__field(void *, skaddr)
> +		__field(__u16, sport)
> +		__field(__u16, dport)
> +		__array(__u8, saddr, 4)
> +		__array(__u8, daddr, 4)
> +		__array(__u8, saddr_v6, 16)
> +		__array(__u8, daddr_v6, 16)
> +	),
...
>  	if (likely(!err)) {
>  		TCP_SKB_CB(skb)->sacked |= TCPCB_EVER_RETRANS;
> +		trace_tcp_retransmit_skb(sk, skb, segs);

Looks great to me, but why is 'segs' there? It's unused.
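
For anyone consuming saddr_v6/daddr_v6 from user space, here is a minimal
sketch of the v4-mapped layout (::ffff:a.b.c.d) that the commit message
refers to. It is illustrative only and not taken from the patch:
v4_to_mapped_v6() is a hypothetical user-space helper, and it assumes the
IPv4 address is already in network byte order.

```c
#include <assert.h>
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>

/*
 * Hypothetical user-space helper: build the IPv4-mapped IPv6 form
 * (::ffff:a.b.c.d) of an IPv4 address. The first 10 bytes are zero,
 * bytes 10-11 are 0xff, and the last 4 bytes carry the IPv4 address
 * in network byte order.
 */
static void v4_to_mapped_v6(const struct in_addr *v4, struct in6_addr *v6)
{
	assert(v4 != NULL && v6 != NULL);

	memset(v6, 0, sizeof(*v6));
	v6->s6_addr[10] = 0xff;
	v6->s6_addr[11] = 0xff;
	memcpy(&v6->s6_addr[12], &v4->s_addr, sizeof(v4->s_addr));
}
```

A consumer could then test a 16-byte address with IN6_IS_ADDR_V4MAPPED()
and, if it matches, print only the trailing 4 bytes as the IPv4 address.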