Dominique Martinet wrote on Sun, Aug 05, 2018:
> (I'm not sure about offset, since we pass the full skb to parse message,
> wouldn't it look at the start of the buffer everytime? Well, offset
> seems to be 0 everytime the first time that check fails so I can
> probably ignore that for now...)

Oh, this might actually not have been such a bad remark; if I have the
client write two "messages" in a single write(), kcm seems to reliably
fail the same way...

Conversely, if I setsockopt(s, IPPROTO_TCP, TCP_NODELAY...) on the
sender socket, *and* make it wait until the kcm socket has been created
before it starts sending data, then the probability of this happening
drops dramatically (I had to let the reproducer run in a loop for
5 minutes, whereas it used to happen within seconds).

So I think the problem is packet aggregation, and strparser not handling
that properly...
The first packet still fails with TCP_NODELAY, but there's probably
aggregation on the recv side as well before the socket is attached to
the multiplexor... I guess the low-probability failure that is still
happening could be similar.

(I also noticed that I had mistakenly believed the problem was that the
first packet contained the second packet's data, because of an
off-by-one in the receiver; it really is the second packet, it only has
the first packet's length)

I've moved my check from kcm_rcv_strparser to just before parse_msg in
__strp_recv and sure enough it fails every time there's an offset.

It's getting late but I'll try adding a pskb_pull in there tomorrow; it
would be better to make the bpf program start with an offset but I don't
think that'll be easy to change...

I'm almost done spamming, thanks for being a good rubber duck! :p
--
Dominique Martinet
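
For illustration, a minimal user-space sketch of the suspected failure
mode. This is not the kernel code: it assumes a made-up framing (2-byte
big-endian payload length followed by the payload), and fake_buf,
read_len, parse_ignoring_offset and parse_with_offset are hypothetical
names standing in for the skb, the bpf program as it behaves today, and
a parser that honours the offset. Two messages are aggregated into one
buffer, as would happen with a single write().

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for an skb holding aggregated messages. */
struct fake_buf {
    const uint8_t *data;
    size_t len;
};

/* Read the 2-byte big-endian payload length at pos and return the
 * total message length (header + payload). Framing is an assumption. */
static size_t read_len(const struct fake_buf *b, size_t pos)
{
    assert(pos + 2 <= b->len);
    return 2 + (((size_t)b->data[pos] << 8) | b->data[pos + 1]);
}

/* Parser that always looks at the start of the buffer, i.e. what
 * happens if the offset is not taken into account. */
static size_t parse_ignoring_offset(const struct fake_buf *b, size_t offset)
{
    (void)offset;
    return read_len(b, 0);
}

/* Parser that starts at the offset of the current message. */
static size_t parse_with_offset(const struct fake_buf *b, size_t offset)
{
    return read_len(b, offset);
}

/* Append one framed message to out, return the bytes written. */
static size_t put_msg(uint8_t *out, const char *payload)
{
    size_t n = strlen(payload);

    out[0] = (uint8_t)(n >> 8);
    out[1] = (uint8_t)(n & 0xff);
    memcpy(out + 2, payload, n);
    return 2 + n;
}

/* Two messages back to back in one buffer: "abc", then "hello!!". */
static size_t build_two_messages(uint8_t *out)
{
    size_t first = put_msg(out, "abc");

    return first + put_msg(out + first, "hello!!");
}
```

With this buffer the second message starts at offset 5. The parser that
ignores the offset reports the first message's length for it, which
matches the symptom above (second packet, first packet's length); the
offset-aware one reports the second message's own length.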