On Sat, Jun 27, 2020 at 23:56, Michael Richardson
(<m...@sandelman.ca>) wrote:
>
>
> Mario, can you confirm my understanding here.
>
Hi Michael.

> In TPACKET3 mode, there are (tp_block_nr) pools of memory.
> Each block is tp_block_size in size, which can be large, e.g. 4M
> (2^22 in the kernel documentation example).
> (We, however, seem to pick a block size which is only just big enough to hold
> the maximum snaplen.)
>
> Each block has a linked list of tp3_hdr, which are interleaved with the packet
> data itself.  The "next" pointer is tp_next_offset.
> It seems from my reading of the code that the kernel returns an entire chain
> of tp3_hdr to us, controlled by a *single* block_status bit.
> That is, we get entire chains of tp3_hdr from the kernel, and we return
> them to the kernel one whole block at a time.
>
> I think this was not the case with tp2: there, packets were passed
> to/from the kernel one at a time, each with its own TP_STATUS_KERNEL
> bit.
>
AFAIK all of this is correct.
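
In code, the flow you describe is basically the walk_block()/flush_block()
pattern from the kernel's packet_mmap.txt example. A rough, untested sketch
(the per-packet callback and the barrier are my own additions):

#include <linux/if_packet.h>
#include <stdint.h>

static void process_block(struct tpacket_block_desc *bd,
                          void (*cb)(struct tpacket3_hdr *))
{
        /* The kernel flips block_status to TP_STATUS_USER once the block,
         * and every packet chained inside it, is ready for user space. */
        if (!(bd->hdr.bh1.block_status & TP_STATUS_USER))
                return;

        struct tpacket3_hdr *ppd = (struct tpacket3_hdr *)
                ((uint8_t *)bd + bd->hdr.bh1.offset_to_first_pkt);

        for (uint32_t i = 0; i < bd->hdr.bh1.num_pkts; i++) {
                cb(ppd);   /* hdr and packet data are interleaved in the block */
                /* tp_next_offset is the "next" pointer of the in-block chain */
                ppd = (struct tpacket3_hdr *)((uint8_t *)ppd + ppd->tp_next_offset);
        }

        /* The whole chain is released at once: a single store hands the
         * entire block back to the kernel. */
        __sync_synchronize();   /* finish reading the block before releasing it */
        bd->hdr.bh1.block_status = TP_STATUS_KERNEL;
}

So whatever you keep around for async I/O has to be tracked per block, not per
packet, because nothing goes back to the kernel until that last store.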

> For a contract, I am trying to improve the write performance by using
> async I/O.  {I also need to associate requests with responses, which makes the
> ordering of operations non-sequential.}
> I therefore do not want to give the blocks back to the kernel until the
> write has concluded, and for this I'm working on a variation of
> linux_mmap_v3() which will call back with groups of packets, through
> a pipeline of "processors", each of which may steal a packet and
> return it later.
>
> I am realizing that I have to keep track of the blocks, not just the
> packets.  I guess my original conceptual thinking was too heavily
> influenced by V2, and I was thinking that V3 had changed things by
> splitting the hdr from the packet, putting the constant-sized hdrs
> into a fixed-size ring, while the packet content was allocated
> as needed.
> I see that I am mistaken, but I'd sure love confirmation.
>
I believe you may be thinking of AF_XDP. As you probably know, libpcap doesn't
have support for it (yet), but I don't think you'll have trouble using
it directly.
I worked briefly with the RX side of it, so I may be able to help you with that.
As you describe, it splits the headers from the packets, sort of.
The packet contents are stored in blocks of a buffer called UMEM. Unlike
PACKET_MMAP, you work with two queues per path, both containing descriptors
that locate the data in UMEM. These descriptors fill the role of the headers.
On the RX side you have the FILL queue, where you store descriptors to tell
the kernel that a given block is free to use, and the RX queue, where the
kernel gives those blocks back when a packet passes the filter[0].
The TX side has a TX queue, where you store descriptors pointing to the data
you want to send in the UMEM buffer, and a COMPLETION queue, where the kernel
gives you the blocks back for reuse after the data has been sent.
IIRC, AF_XDP also lets you queue packets and later send them in a burst on
request, but since I didn't work with that path I'm not 100% certain.

Since the UMEM blocks are fixed-size and one block is used per packet, they
consume more memory, but they are much simpler to use for this and allow
out-of-order release of resources.

[0]: AF_XDP requires eBPF filters to be installed in the kernel.
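
And since I only touched the RX side: the RX/FILL interplay looks roughly like
this with the xsk_* helpers that ship with libbpf (<bpf/xsk.h>). Untested
sketch; the xsk_info struct and the callback are placeholders I made up, and
the rings and UMEM would come from xsk_umem__create()/xsk_socket__create():

#include <bpf/xsk.h>

struct xsk_info {                   /* hypothetical wrapper */
        struct xsk_ring_cons rx;    /* RX queue */
        struct xsk_ring_prod fill;  /* FILL queue */
        void *umem_area;            /* start of the UMEM buffer */
};

static void rx_and_refill(struct xsk_info *xsk,
                          void (*cb)(const void *pkt, __u32 len))
{
        __u32 idx_rx = 0, idx_fq = 0;

        /* Packets the kernel has placed on the RX queue. */
        size_t rcvd = xsk_ring_cons__peek(&xsk->rx, 64, &idx_rx);
        if (!rcvd)
                return;

        /* Reserve the same number of slots on the FILL queue so the
         * UMEM blocks can be handed straight back. */
        if (xsk_ring_prod__reserve(&xsk->fill, rcvd, &idx_fq) != rcvd)
                return;         /* real code would poll/retry here */

        for (size_t i = 0; i < rcvd; i++) {
                const struct xdp_desc *d =
                        xsk_ring_cons__rx_desc(&xsk->rx, idx_rx++);

                cb(xsk_umem__get_data(xsk->umem_area, d->addr), d->len);

                /* One descriptor per block, so blocks can be returned
                 * in any order and at any later time. */
                *xsk_ring_prod__fill_addr(&xsk->fill, idx_fq++) = d->addr;
        }

        xsk_ring_cons__release(&xsk->rx, rcvd);
        xsk_ring_prod__submit(&xsk->fill, rcvd);
}

For your out-of-order case you would simply not put d->addr back on the FILL
queue right away, and refill it whenever your processor releases the packet.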

> I am also considering rewriting packet_mmap.txt :-)
>
> --
> ]               Never tell me the odds!                 | ipv6 mesh networks [
> ]   Michael Richardson, Sandelman Software Works        |    IoT architect   [
> ]     m...@sandelman.ca  http://www.sandelman.ca/        |   ruby on rails    [
>
