On 01/16/2018 10:20 PM, Alexei Starovoitov wrote: > On Tue, Jan 16, 2018 at 09:49:16PM -0800, John Fastabend wrote: >> >>> but this program will see only first SG ? >> >> Correct, to read further into the msg we would need to have a helper >> or some other way to catch reads/writes past the first 4k and read >> the next sg. We have the same limitation in cls_bpf. >> >> I have started a patch on top of this series with the current working >> title msg_apply_bytes(int bytes). This would let us apply a verdict to >> a set number of bytes instead of the entire msg. By calling >> msg_apply_bytes(data_end - data) a user who needs to read an entire msg >> could do this in 4k chunks until the entire msg is passed through the >> bpf prog. > > good idea. > I think would be good to add this helper as part of this patch set > to make sure there is a way for user to look through the whole > tcp stream if the program really wants to.
Sure I'll add it to this series. > I understand that program cannot examine every byte anyway > due to lack of loops and helpers, but this part of sockmap api > should still provide an interface from day one. > One example would be the program parsing http2 or similar > where in the header it sees length. Then it can do > msg_apply_bytes(length) > to skip the bytes it processed, but still continue within > the same 64Kbyte chunk when 0 < length < 64k > Yep, this is my use case. >>> and it's typically going to be one page only ? >> >> yep >> >>> then what's the value of waiting for MAX_SKB_FRAGS ? >>> >> Its not waiting for MAX_SKB_FRAGS its simple copying up to MAX_SKB_FRAGS >> pages in one call if possible. It seems better to me to run this loop >> over as much data as we can. > > agree on trying to do MAX_SKB_FRAGS as a 'processing unit', > but program should still be able to skip or redirect > parts of the bytes and not the whole 64k chunk. > From program point of view it should never see or worry about > SG list boundaries whereas right now it seems that below code > is dealing with them (though program doesn't know where sg ends): > The apply_bytes() helper allows for skip and redirect of parts of the bytes and not the whole 64k chunk. >> + >> + switch (eval) { >> + case __SK_PASS: >> + sg_mark_end(sg + sg_num - 1); All this does is tell the TCP sendpage call that after this sg entry we don't have anymore data so go ahead and flush it. >> + err = bpf_tcp_push(sk, sg, &sg_curr, flags, true); >> + if (unlikely(err)) { >> + copied -= free_sg(sk, sg, sg_curr, sg_num); >> + goto out_err; >> + } >> + break; >> + case __SK_REDIRECT: >> + sg_mark_end(sg + sg_num - 1); >> + goto do_redir; > ... >>>> +static int bpf_tcp_ulp_register(void) >>>> +{ >>>> + tcp_bpf_proto = tcp_prot; >>>> + tcp_bpf_proto.sendmsg = bpf_tcp_sendmsg; >>>> + tcp_bpf_proto.sendpage = bpf_tcp_sendpage; >>>> + return tcp_register_ulp(&bpf_tcp_ulp_ops); >>> >>> I don't see corresponding tcp_unregister_ulp(). >>> >> >> There is none. tcp_register_ulp() adds the bpf_tcp_ulp to the list of >> available ULPs and never removes it. To remove it we would have to >> keep a ref count on the reg/unreg calls. This would require a couple >> more patches to the ULP infra and not sure it hurts to leave the ULP >> reference around... >> >> Maybe a follow on patch? Or else it could be a patch in front of this >> patch. > > I see. I'm ok with leaving that for latter. > It doesn't hurt to keep it registered. Please add a comment though. > OK will add comment.