[tcpdump-workers] Re: Flush OS buffer before termination

2024-10-20 Thread Guy Harris



> On Oct 20, 2024, at 12:11 AM, Garri Djavadyan wrote:
> 
> On Sat, 2024-10-19 at 23:58 -0700, Guy Harris wrote:
>> On Oct 19, 2024, at 5:01 PM, Garri Djavadyan 
>> wrote:
>> 
>>> I am looking for a way to force tcpdump flush Linux OS buffer
>>> before
>>> terminating. I have checked the man page and the mailing list
>>> archives
>>> but did not manage to find anything related.
>>> 
>>> When I terminate tcpdump process with SIGINT or SIGTERM, the
>>> process
>>> quits immediately, leaving packets in the buffer. I know that the
>>> signal USR2 forces the buffer to be flushed, but it does stop
>>> filling
>>> the buffer and the process remains active.
>>> 
>>> I have to use a very big buffer with a very slow storage, much
>>> slower
>>> than the rate of coming packets received by the filter, and it is
>>> preferred not to lose a single packet after initiating termination
>>> the
>>> process.
>> 
>> OK, so is the buffer to which you're referring the buffer that holds
>> captured packets for tcpdump to read, i.e. the *input* buffer for
>> tcpdump, rather than, for example, the standard I/O buffer containing
>> packet dissection text to be printed or the I/O buffer containing
>> packets to be written to the file specified by -w, i.e. an *output*
>> buffer for tcpdump?
> 
> Correct. I meant the input buffer, specified with the -B flag.

OK, so by "flushing" the buffer - which, for an input buffer, usually means 
discarding everything that's in the buffer and, for an output buffer, usually 
means writing the buffer contents to the target file - you meant "draining" the 
buffer, as in "processing all the packets in the buffer".

> When I terminate tcpdump process with SIGINT or SIGTERM, the process
> quits immediately, leaving packets in the buffer. I know that the
> signal USR2 forces the buffer to be flushed, but it does stop filling
> the buffer and the process remains active.

No, SIGUSR2 flushes the *output* buffer for the file being written to with -w.  
The tcpdump man page does not make that clear; I will update it to do so.

> I have to use a very big buffer with a very slow storage, much slower
> than the rate of coming packets received by the filter, and it is
> preferred not to lose a single packet after initiating termination the
> process.

What do you mean by "with a very slow storage"?  You can set the size with -B, 
but that just tells the capture mechanism in the kernel how big a buffer to 
allocate.  It's not as if it tells it to be stored in some slower form of 
memory.

> There are a few options to overcome the problem. For example,
> by dumping packets to the memory storage first (e.g. /dev/shm)

Presumably meaning you specified "-w /dev/shm" or something such as that?

If so, how does that make a difference?

> Still, I wonder if this can be done by tcpdump itself.

That would require that tcpdump be able to tell the capture mechanism to stop 
capturing packets; otherwise, tcpdump could continue reading packets from the 
buffer and processing them, but it's not as if the capture mechanism will stop 
adding packets to the buffer, so that would behave as if tcpdump continued 
capturing.

There is no current mechanism in libpcap by which tcpdump (or any other program 
using libpcap to capture networking traffic, e.g. Wireshark) can indicate to 
libpcap that it doesn't want any *more* packets from the network device, but 
wants to be able to keep reading from the packets already *in* the buffer until 
the last packet has been retrieved. That means tcpdump can't be told to do that 
with any existing version of libpcap.
___
tcpdump-workers mailing list -- tcpdump-workers@lists.tcpdump.org
To unsubscribe send an email to tcpdump-workers-le...@lists.tcpdump.org


[tcpdump-workers] Re: Flush OS buffer before termination

2024-10-20 Thread Garri Djavadyan
On Sat, 2024-10-19 at 23:58 -0700, Guy Harris wrote:
> On Oct 19, 2024, at 5:01 PM, Garri Djavadyan 
> wrote:
> 
> > I am looking for a way to force tcpdump flush Linux OS buffer
> > before
> > terminating. I have checked the man page and the mailing list
> > archives
> > but did not manage to find anything related.
> > 
> > When I terminate tcpdump process with SIGINT or SIGTERM, the
> > process
> > quits immediately, leaving packets in the buffer. I know that the
> > signal USR2 forces the buffer to be flushed, but it does stop
> > filling
> > the buffer and the process remains active.
> > 
> > I have to use a very big buffer with a very slow storage, much
> > slower
> > than the rate of coming packets received by the filter, and it is
> > preferred not to lose a single packet after initiating termination
> > the
> > process.
> 
> OK, so is the buffer to which you're referring the buffer that holds
> captured packets for tcpdump to read, i.e. the *input* buffer for
> tcpdump, rather than, for example, the standard I/O buffer containing
> packet dissection text to be printed or the I/O buffer containing
> packets to be written to the file specified by -w, i.e. an *output*
> buffer for tcpdump?

Correct. I meant the input buffer, specified with the -B flag.

Regards,
Garri


[tcpdump-workers] Re: Flush OS buffer before termination

2024-10-20 Thread Guy Harris
On Oct 20, 2024, at 2:57 AM, Garri Djavadyan wrote:

>>> I have to use a very big buffer with a very slow storage, much
>>> slower
>>> than the rate of coming packets received by the filter, and it is
>>> preferred not to lose a single packet after initiating termination
>>> the
>>> process.
>> 
>> What do you mean by "with a very slow storage"?  You can set the size
>> with -B, but that just tells the capture mechanism in the kernel how
>> big a buffer to allocate.  It's not as if it tells it to be stored in
>> some slower form of memory.
> 
> Let me show an example. To demonstrate the issue, I am generating 2MB/s
> stream of dummy packets:
> 
> [src]# pv -L 2M /dev/zero | dd bs=1472 > /dev/udp/192.168.0.1/12345
> 
> 
> and dumping them to a storage, with cgroup-v2-restricted write speed of
> 1MB/s:
> 
> [dst]# lsblk /dev/loop0
> NAME  MAJ:MIN RM  SIZE RO TYPE MOUNTPOINTS
> loop0   7:0    0  3.9G  0 loop /mnt/test
> 
> [dst]# cat /sys/fs/cgroup/test/io.max
> 7:0 rbps=max wbps=1024000 riops=max wiops=max
> 
> 
> To temporarily avoid kernel-level drops,

Emphasis on *temporarily* - 2MB/s worth of packet data can only be saved in its 
entirety if you have 2MB/s or greater write speed.

> it is clearly seen that the input buffer is being filled at 1MB/s rate
> (the diff between the generated traffic rate (2MB/s) and the writing
> speed of the storage (1MB/s):
> 
> tcpdump: 0 packets captured, 0 packets received by filter, 0 packets
> dropped by kernel
> tcpdump: 218 packets captured, 715 packets received by filter, 0
> packets dropped by kernel

On all platforms, "packets captured" means "packets read from libpcap and 
written to the capture file".

On Linux, "packets received by filter" means "packets that passed the filter" 
(rather than "packets that were run through the filter, whether or not they 
passed the filter", which is what it means on *BSD/macOS/Solaris 11/AIX; 
unfortunately, you can't get the latter value from Linux and can't get the 
former value from BSD, so that value *can't* be made to mean the same thing on 
all platforms).  It includes packets that passed the filter but could not be 
added to the buffer because the buffer was full.

On Linux, "packets dropped by kernel" means "packets that passed the filter but 
could not be added to the buffer because the buffer was full".

(The pcap_stats man page has an entire paragraph devoted to giving the message 
that the meaning of the statistics differs between platforms.)

I.e., when tcpdump exits, the difference, on Linux, between "packets received 
by filter" and "packets captured" is, indeed, "packets dropped because tcpdump 
exited without draining the packet buffer".  (On *BSD/macOS/Solaris 11/AIX, the 
latter value cannot be determined, as per the above.)
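
As a rough illustration of the Linux semantics described above, the following Python sketch parses a tcpdump statistics report and estimates how many packets are still waiting in the input buffer.  The helper names parse_stats and undrained are made up for this example, and subtracting the "dropped by kernel" count is an inference from the definitions above rather than something tcpdump reports itself.  It does not apply to *BSD/macOS/Solaris 11/AIX.

```python
import re

# Counter names as they appear in tcpdump's statistics report.
PATTERNS = {
    "captured": r"(\d+)\s+packets?\s+captured",
    "received": r"(\d+)\s+packets?\s+received\s+by\s+filter",
    "dropped": r"(\d+)\s+packets?\s+dropped\s+by\s+kernel",
}


def parse_stats(text):
    """Extract the three counters from a statistics report.

    Whitespace (including line wraps added by a mail client) is tolerated
    between the words of each counter.
    """
    stats = {}
    for name, pattern in PATTERNS.items():
        match = re.search(pattern, text)
        if match is None:
            raise ValueError(f"missing counter: {name}")
        stats[name] = int(match.group(1))
    return stats


def undrained(stats):
    """Estimate packets still in the buffer (Linux semantics only).

    Assumption: "received by filter" counts every packet that passed the
    filter, including those dropped because the buffer was full, so the
    drops are subtracted before comparing with "captured".
    """
    return stats["received"] - stats["dropped"] - stats["captured"]


sample = ("tcpdump: 218 packets captured, 715 packets received by filter, "
          "0 packets dropped by kernel")
print(undrained(parse_stats(sample)))  # prints 497
```

With the sample line from the report quoted earlier, 715 - 0 - 218 = 497 packets had passed the filter but had not yet been written out.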

>>> There are a few options to overcome the problem. For example,
>>> by dumping packets to the memory storage first (e.g. /dev/shm)
>> 
>> Presumably meaning you specified "-w /dev/shm" or something such as
>> that?
>> 
>> If so, how does that make a difference?
> 
> I mean I can first dump packets to the lightning-fast RAM storage and
> after being done with the capturing part, copy the dump to the slow
> storage.

I.e., it means that, when you signal tcpdump to exit, it's not as far behind 
the capture mechanism with regard to writing to the capture file, because it's 
stalling less waiting for write() calls to finish (if the write rate limitation 
you mention limits the rate at which write() calls can push data to the file 
descriptor), so the "packets captured" count is larger.

> I see. Thank you so much for the explanation.
> 
> Do you think this case can justify feature requests both for libpcap
> and tcpdump on github?

Yes, as it means that tcpdump (and, potentially, other programs such as 
Wireshark) can write out *all* packets received before being told to stop 
capturing.

The implementations for various platforms would probably have to 1) set a "drop 
all packets" filter on the capture device, 2) possibly put the capture device 
in non-blocking mode (as there's no point in blocking, as no more packets will 
be seen), and 3) cause the packet processing loop in libpcap to quit as soon as 
it finds that there are no more packets available to read.  For programs using 
pcap_loop(), that should be transparent; for programs using pcap_dispatch(), 
they would have to treat a return value of 0, if they've put the capture device 
in "draining mode", as meaning "done" rather than "the packet buffer timeout 
expired and no packets were provided, keep capturing".

tcpdump uses pcap_loop(), so it'd only have to be changed to use the new "stop 
capturing" API.
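
To make the proposed semantics concrete, here is a toy model in Python.  It is not libpcap code and does not use any existing libpcap API: ToyCapture, stop_capturing() and drain() are hypothetical names invented for this sketch.  It only shows how a dispatch-style loop would treat a return value of 0 as "done" once the handle is in draining mode.

```python
from collections import deque


class ToyCapture:
    """Toy stand-in for a capture handle (hypothetical, not libpcap)."""

    def __init__(self, buffered):
        self.buffer = deque(buffered)  # packets already in the input buffer
        self.draining = False

    def deliver(self, packet):
        """Model the capture mechanism adding a packet to the buffer."""
        if not self.draining:  # models the "drop all packets" filter
            self.buffer.append(packet)

    def stop_capturing(self):
        """Hypothetical "stop capturing" call: accept no new packets."""
        self.draining = True

    def dispatch(self, max_packets, callback):
        """Process up to max_packets buffered packets without blocking.

        Returns the number of packets handed to the callback, loosely
        modelled on the return value of pcap_dispatch().
        """
        count = 0
        while self.buffer and count < max_packets:
            callback(self.buffer.popleft())
            count += 1
        return count


def drain(capture, callback, batch=2):
    """Stop capturing, then read until the buffer is empty."""
    capture.stop_capturing()
    total = 0
    while True:
        n = capture.dispatch(batch, callback)
        if n == 0:  # in draining mode, 0 means "done", not "timeout"
            break
        total += n
    return total


saved = []
cap = ToyCapture(["p1", "p2", "p3"])
total = drain(cap, saved.append)
cap.deliver("p4")  # arrives after the stop request, so it is not buffered
print(total, saved)  # prints: 3 ['p1', 'p2', 'p3']
```

A program using a loop-style call would not need the explicit check; in this model that corresponds to calling drain() and letting it return.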


[tcpdump-workers] Re: Flush OS buffer before termination

2024-10-20 Thread Garri Djavadyan
On Sun, 2024-10-20 at 10:27 -0700, Guy Harris wrote:
> On Oct 20, 2024, at 2:57 AM, Garri Djavadyan 
> wrote:
> 
> > > > I have to use a very big buffer with a very slow storage, much
> > > > slower
> > > > than the rate of coming packets received by the filter, and it
> > > > is
> > > > preferred not to lose a single packet after initiating
> > > > termination
> > > > the
> > > > process.
> > > 
> > > What do you mean by "with a very slow storage"?  You can set the
> > > size
> > > with -B, but that just tells the capture mechanism in the kernel
> > > how
> > > big a buffer to allocate.  It's not as if it tells it to be
> > > stored in
> > > some slower form of memory.
> > 
> > Let me show an example. To demonstrate the issue, I am generating
> > 2MB/s
> > stream of dummy packets:
> > 
> > [src]# pv -L 2M /dev/zero | dd bs=1472 > /dev/udp/192.168.0.1/12345
> > 
> > 
> > and dumping them to a storage, with cgroup-v2-restricted write
> > speed of
> > 1MB/s:
> > 
> > [dst]# lsblk /dev/loop0
> > NAME  MAJ:MIN RM  SIZE RO TYPE MOUNTPOINTS
> > loop0   7:0    0  3.9G  0 loop /mnt/test
> > 
> > [dst]# cat /sys/fs/cgroup/test/io.max
> > 7:0 rbps=max wbps=1024000 riops=max wiops=max
> > 
> > 
> > To temporarily avoid kernel-level drops,
> 
> Emphasis on *temporarily* - 2MB/s worth of packet data can only be
> saved in its entirety if you have 2MB/s or greater write speed.

That is right. However, it also depends on how long one needs to
mediate mismatching rates using a large input buffer. For example, with
a 2GB input buffer and 1MB/s rate difference, one could safely be
filling the buffer for more than half an hour. Safe buffer draining
would help a lot in such situations.
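
The estimate can be checked with a few lines of Python.  This is only a back-of-the-envelope sketch: seconds_until_full is a name made up here, decimal units are assumed, and any per-packet overhead inside the kernel buffer is ignored, so the real figure would be somewhat lower.

```python
def seconds_until_full(buffer_bytes, arrival_bps, write_bps):
    """Time until the input buffer overflows, in seconds.

    The buffer only grows by the surplus of the arrival rate over the
    write rate; if there is no surplus it never fills.
    """
    surplus = arrival_bps - write_bps
    if surplus <= 0:
        return float("inf")
    return buffer_bytes / surplus


MB = 1_000_000  # decimal megabyte (assumption)
GB = 1_000 * MB

t = seconds_until_full(2 * GB, 2 * MB, 1 * MB)
print(f"{t:.0f} s = {t / 60:.1f} min")  # prints: 2000 s = 33.3 min
```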


> > it is clearly seen that the input buffer is being filled at 1MB/s
> > rate
> > (the diff between the generated traffic rate (2MB/s) and the
> > writing
> > speed of the storage (1MB/s):
> > 
> > tcpdump: 0 packets captured, 0 packets received by filter, 0
> > packets
> > dropped by kernel
> > tcpdump: 218 packets captured, 715 packets received by filter, 0
> > packets dropped by kernel
> 
> On all platforms, "packets captured" means "packets read from libpcap
> and written to the capture file".
> 
> On Linux, "packets received by filter" means "packets that passed the
> filter" (rather than "packets that were run through the filter,
> whether or not they passed the filter", which is what it means on
> *BSD/macOS/Solaris 11/AIX; unfortunately, you can't get the latter
> value from Linux and can't get the former value from BSD, so that
> value *can't* be made to mean the same thing on all platforms).  It
> includes packets that passed the filter but could not be added to the
> buffer because the buffer was full.
> 
> On Linux, "packets dropped by kernel" means "packets that passed the
> filter but could not be added to the buffer because the buffer was
> full".
> 
> (The pcap_stats man page has an entire paragraph devoted to giving
> the message that the meaning of the statistics differs between
> platforms.)
> 
> I.e., when tcpdump exits, the difference, on Linux, between "packets
> received by filter" and "packets captured" is, indeed, "packets
> dropped because tcpdump exited without draining the packet buffer". 
> (On *BSD/macOS/Solaris 11/AIX, the latter value cannot be determined,
> as per the above.)
> 
> > > > There are a few options to overcome the problem. For example,
> > > > by dumping packets to the memory storage first (e.g. /dev/shm)
> > > 
> > > Presumably meaning you specified "-w /dev/shm" or something such
> > > as
> > > that?
> > > 
> > > If so, how does that make a difference?
> > 
> > I mean I can first dump packets to the lightning-fast RAM storage
> > and
> > after being done with the capturing part, copy the dump to the slow
> > storage.
> 
> I.e., it means that, when you signal tcpdump to exit, it's not as far
> behind the capture mechanism with regards to writing to the capture
> file, because it's stalling less waiting for write() calls to finish
> (if the write rate limitation you mention limits the rate at which
> write() calls can push data to the file descriptor), so the "packets
> captured" count is larger.

Exactly.


> > I see. Thank you so much for the explanation.
> > 
> > Do you think this case can justify feature requests both for
> > libpcap
> > and tcpdump on github?
> 
> Yes, as it means that tcpdump (and, potentially, other programs such
> as Wireshark) can write out *all* packets received before being told
> to stop capturing.
> 
> The implementations for various platforms would probably have to 1)
> set a "drop all packets" filter on the capture device, 2) possibly
> put the capture device in non-blocking mode (as there's no point in
> blocking, as no more packets will be seen), and 3) cause the packet
> processing loop in libpcap to quit as soon as  it finds that there
> are no more packets available to read.  For programs using
> pcap_loop(), that should be transparent; for programs using
> pcap

[tcpdump-workers] Re: Flush OS buffer before termination

2024-10-20 Thread Garri Djavadyan
On Sun, 2024-10-20 at 01:03 -0700, Guy Harris wrote:
> 
> 
> > On Oct 20, 2024, at 12:11 AM, Garri Djavadyan
> >  wrote:
> > 
> > On Sat, 2024-10-19 at 23:58 -0700, Guy Harris wrote:
> > > On Oct 19, 2024, at 5:01 PM, Garri Djavadyan
> > > 
> > > wrote:
> > > 
> > > > I am looking for a way to force tcpdump flush Linux OS buffer
> > > > before
> > > > terminating. I have checked the man page and the mailing list
> > > > archives
> > > > but did not manage to find anything related.
> > > > 
> > > > When I terminate tcpdump process with SIGINT or SIGTERM, the
> > > > process
> > > > quits immediately, leaving packets in the buffer. I know that
> > > > the
> > > > signal USR2 forces the buffer to be flushed, but it does stop
> > > > filling
> > > > the buffer and the process remains active.
> > > > 
> > > > I have to use a very big buffer with a very slow storage, much
> > > > slower
> > > > than the rate of coming packets received by the filter, and it
> > > > is
> > > > preferred not to lose a single packet after initiating
> > > > termination
> > > > the
> > > > process.
> > > 
> > > OK, so is the buffer to which you're referring the buffer that
> > > holds
> > > captured packets for tcpdump to read, i.e. the *input* buffer for
> > > tcpdump, rather than, for example, the standard I/O buffer
> > > containing
> > > packet dissection text to be printed or the I/O buffer containing
> > > packets to be written to the file specified by -w, i.e. an
> > > *output*
> > > buffer for tcpdump?
> > 
> > Correct. I meant the input buffer, specified with the -B flag.
> 
> OK, so by "flushing" the buffer - which, for an input buffer, usually
> means discarding everything that's in the buffer and, for an output
> buffer, usually means writing the buffer contents to the target file
> - you meant "draining" the buffer, as in "processing all the packets
> in the buffer".

Thank you for the correction. Indeed, I should have used "draining"
here.


> > When I terminate tcpdump process with SIGINT or SIGTERM, the
> > process
> > quits immediately, leaving packets in the buffer. I know that the
> > signal USR2 forces the buffer to be flushed, but it does stop
> > filling
> > the buffer and the process remains active.
> 
> No, SIGUSR2 flushes the *output* buffer for the file being written to
> with -w.  The tcpdump man page does not make that clear; I will
> update it to do so.

Hmm. I see. Thank you in advance for updating the man page.


> > I have to use a very big buffer with a very slow storage, much
> > slower
> > than the rate of coming packets received by the filter, and it is
> > preferred not to lose a single packet after initiating termination
> > the
> > process.
> 
> What do you mean by "with a very slow storage"?  You can set the size
> with -B, but that just tells the capture mechanism in the kernel how
> big a buffer to allocate.  It's not as if it tells it to be stored in
> some slower form of memory.

Let me show an example. To demonstrate the issue, I am generating a 2MB/s
stream of dummy packets:

[src]# pv -L 2M /dev/zero | dd bs=1472 > /dev/udp/192.168.0.1/12345


and dumping them to storage with a cgroup-v2-restricted write speed of
1MB/s:

[dst]# lsblk /dev/loop0
NAME  MAJ:MIN RM  SIZE RO TYPE MOUNTPOINTS
loop0   7:0    0  3.9G  0 loop /mnt/test

[dst]# cat /sys/fs/cgroup/test/io.max
7:0 rbps=max wbps=1024000 riops=max wiops=max


To temporarily avoid kernel-level drops, I set a 1GB (sufficient to get
needed packets before it overflows) input buffer:

[dst]# tcpdump -i veth0 -w /mnt/test/udp.pcap -B 1024000


Now, if I start inspecting tcpdump's stats every second:

[dst]# while true; do killall -10 tcpdump; sleep 1; done


it is clearly seen that the input buffer is being filled at a 1MB/s rate
(the difference between the generated traffic rate (2MB/s) and the writing
speed of the storage (1MB/s)):

tcpdump: 0 packets captured, 0 packets received by filter, 0 packets
dropped by kernel
tcpdump: 218 packets captured, 715 packets received by filter, 0
packets dropped by kernel
tcpdump: 890 packets captured, 2145 packets received by filter, 0
packets dropped by kernel
tcpdump: 1575 packets captured, 3575 packets received by filter, 0
packets dropped by kernel
tcpdump: 2246 packets captured, 5005 packets received by filter, 0
packets dropped by kernel
tcpdump: 2931 packets captured, 6435 packets received by filter, 0
packets dropped by kernel
tcpdump: 3603 packets captured, 7867 packets received by filter, 0
packets dropped by kernel
tcpdump: 4288 packets captured, 9440 packets received by filter, 0
packets dropped by kernel
tcpdump: 4960 packets captured, 10870 packets received by filter, 0
packets dropped by kernel
tcpdump: 5645 packets captured, 12300 packets received by filter, 0
packets dropped by kernel
tcpdump: 6317 packets captured, 13730 packets received by filter, 0
packets dropped by kernel
tcpdump: 6988 packets captured, 15160 packets received by filter, 0
packets dropped by kernel
tcpdump: 7675 packets captur