On 2/19/19, 10:29 AM, "[email protected] on behalf of Eric Dumazet"
<[email protected] on behalf of [email protected]> wrote:
On 02/18/2019 09:38 PM, brakmo wrote:
> +
> +static __always_inline void get_nrm_pkt_info(struct bpf_sock *sk,
> + struct nrm_pkt_info *pkti)
> +{
> + if (sk->family == AF_INET6 || sk->family == AF_INET) {
> + pkti->is_ip = true;
> + pkti->is_tcp = (sk->protocol == IPPROTO_TCP);
> + if (pkti->is_tcp) {
> + struct bpf_tcp_sock *tp;
> +
> + tp = bpf_tcp_sock(sk);
> + if (tp)
> + pkti->ecn = tp->ecn_flags & TCP_ECN_OK;
> + else
> + pkti->ecn = 0;
> + } else {
> + pkti->ecn = 0;
> + }
> + } else {
> + pkti->is_ip = false;
> + pkti->is_tcp = false;
> + pkti->ecn = 0;
> + }
> +}
>
This looks very strange.
ECN capability is per packet, and does not need access to the original
TCP socket really.
We definitely can use ECN with UDP packets.
IMO this sample looks like a work in progress.
This sample NRM program focuses on TCP, I should have made that more explicit.
I originally was checking the ECN bits on the packet, but someone pointed out
that since I was focusing on TCP, it would be more efficient to just look at
the TCP state (saving having to read the packet header). However, your point is
correct in that I could be wrongly marking packets that I should not mark, such
as pure ACKs. I will go back to the previous version that looks at the ECN bits
in the header and not limit ECN marking to just TCP.
EDT model allows to implement full shaping (not only virtual one)
by twaking/advancing skb->tstamp
(see commit f11216b24219a bpf: add skb->tstamp r/w access from tc clsact
and cg skb progs)
I have a version of an NRM BPF program that uses EDT, so I know what you mean.
However, I decided to send it as a separate patch (including an ingress sample
program).
Implementing shaping with the need of accessing TCP sockets seems a
layering violation,
a lot of things can go wrong with this model.
For instance, you wont be able to offload this.
On the other hand, there are also advantages to having access to the TCP
socket. For example, one has more tools to affect the TCP rate (like the
proposed helper bpf_tcp_enter_cwr() which does not require dropping the
packet). We will need more experience to fully understand the tradeoffs between
the various implementations of rate limiting and/or shaping.
Thank you for your feedback.