RX data

John Fastabend Wed, 17 Jan 2018 12:32:25 -0800

On 01/16/2018 10:20 PM, Alexei Starovoitov wrote:
> On Tue, Jan 16, 2018 at 09:49:16PM -0800, John Fastabend wrote:
>>
>>> but this program will see only first SG ?
>>
>> Correct, to read further into the msg we would need to have a helper
>> or some other way to catch reads/writes past the first 4k and read
>> the next sg. We have the same limitation in cls_bpf.
>>
>> I have started a patch on top of this series with the current working
>> title msg_apply_bytes(int bytes). This would let us apply a verdict to
>> a set number of bytes instead of the entire msg. By calling
>> msg_apply_bytes(data_end - data) a user who needs to read an entire msg
>> could do this in 4k chunks until the entire msg is passed through the
>> bpf prog.
> 
> good idea.
> I think would be good to add this helper as part of this patch set
> to make sure there is a way for user to look through the whole
> tcp stream if the program really wants to.


Sure I'll add it to this series.

> I understand that program cannot examine every byte anyway
> due to lack of loops and helpers, but this part of sockmap api
> should still provide an interface from day one.
> One example would be the program parsing http2 or similar
> where in the header it sees length. Then it can do
> msg_apply_bytes(length)
> to skip the bytes it processed, but still continue within
> the same 64Kbyte chunk when 0 < length < 64k
> 

Yep, this is my use case.

>>> and it's typically going to be one page only ?
>>
>> yep
>>
>>> then what's the value of waiting for MAX_SKB_FRAGS ?
>>>
>> Its not waiting for MAX_SKB_FRAGS its simple copying up to MAX_SKB_FRAGS
>> pages in one call if possible. It seems better to me to run this loop
>> over as much data as we can.
> 
> agree on trying to do MAX_SKB_FRAGS as a 'processing unit',
> but program should still be able to skip or redirect 
> parts of the bytes and not the whole 64k chunk.
> From program point of view it should never see or worry about
> SG list boundaries whereas right now it seems that below code
> is dealing with them (though program doesn't know where sg ends):
> 

The apply_bytes() helper allows for skip and redirect of
parts of the bytes and not the whole 64k chunk.

>> +
>> +            switch (eval) {
>> +            case __SK_PASS:
>> +                    sg_mark_end(sg + sg_num - 1);

All this does is tell the TCP sendpage call that after this sg entry
we don't have anymore data so go ahead and flush it.

>> +                    err = bpf_tcp_push(sk, sg, &sg_curr, flags, true);
>> +                    if (unlikely(err)) {
>> +                            copied -= free_sg(sk, sg, sg_curr, sg_num);
>> +                            goto out_err;
>> +                    }
>> +                    break;
>> +            case __SK_REDIRECT:
>> +                    sg_mark_end(sg + sg_num - 1);
>> +                    goto do_redir;
> ...
>>>> +static int bpf_tcp_ulp_register(void)
>>>> +{
>>>> +  tcp_bpf_proto = tcp_prot;
>>>> +  tcp_bpf_proto.sendmsg = bpf_tcp_sendmsg;
>>>> +  tcp_bpf_proto.sendpage = bpf_tcp_sendpage;
>>>> +  return tcp_register_ulp(&bpf_tcp_ulp_ops);
>>>
>>> I don't see corresponding tcp_unregister_ulp().
>>>
>>
>> There is none. tcp_register_ulp() adds the bpf_tcp_ulp to the list of
>> available ULPs and never removes it. To remove it we would have to
>> keep a ref count on the reg/unreg calls. This would require a couple
>> more patches to the ULP infra and not sure it hurts to leave the ULP
>> reference around...
>>
>> Maybe a follow on patch? Or else it could be a patch in front of this
>> patch.
> 
> I see. I'm ok with leaving that for latter.
> It doesn't hurt to keep it registered. Please add a comment though.
> 

OK will add comment.

Re: [bpf-next PATCH 5/7] bpf: create tcp_bpf_ulp allowing BPF to monitor socket TX/RX data

Reply via email to