[tcpdump-workers] Request new DLT value for Infiniband link

2012-11-11 Thread Oren Kladnitsky
Hi,

I'd like to request a new DLT/LINKTYPE value for Infiniband traffic 
(DLT_INFINIBAND).
The Infiniband spec is available at:
http://members.infinibandta.org/kwspub/spec/V1r1_2_1.Release_12062007.zip
(registration required).
See "Chapter 5: Data packet format" for the packet layout. The LRH (Local Route
Header) is the first header of all data packets.

Wireshark already supports Infiniband dissection
(epan/dissectors/packet-infiniband.c) under DLT_ERF encapsulation.

I'd like Infiniband to have a dedicated DLT for future pcapng support that
does not require the ERF encapsulation.

Thanks,
Oren Kladnitsky
Staff engineer, Apps and Embedded group manager
Tel:   +972-74-7236370
Cell:  +972-50-7349271

___
tcpdump-workers mailing list
tcpdump-workers@lists.tcpdump.org
https://lists.sandelman.ca/mailman/listinfo/tcpdump-workers


[tcpdump-workers] mmap consumes more CPU

2012-11-11 Thread abhinav narain
hi,
  I just compared the two mechanisms:
(1) Using mmap to fetch packets from kernel to userspace
(2) Using recvfrom() calls to fetch packets

top reports:
(1) 34% memory, 20% CPU usage
(2) 21% memory, 7% CPU usage!

I wanted a performance improvement from mmap, but it is slowing down my small
router during packet capture (I can't use pcap for that; I have modified
skbuff), and the CPU usage is more than twice as high!
The memory increase is fine.
Can anyone suggest what is going on, or how to improve it?

-Abhinav Narain


[tcpdump-workers] regarding usage of recv calls in mmap code

2012-11-11 Thread abhinav narain
hi
I wanted to know why MSG_PEEK is used in the recv() call in the mmap code,
and not recvfrom() with the MSG_TRUNC flag.
The reason I am asking is that my code takes a lot of CPU, which is due to
more looping, I suppose.
The flag description for MSG_PEEK says it doesn't discard the bytes even
after reading from the queue.
Can someone please explain?
I would like to use recvfrom() with MSG_TRUNC; is that fine?

-Abhinav


Re: [tcpdump-workers] vlan tagged packets and libpcap breakage

2012-11-11 Thread Bill Fenner
On Wed, Oct 31, 2012 at 6:20 PM, Guy Harris wrote:
>
> On Oct 31, 2012, at 2:50 PM, Ani Sinha wrote:
>
>> pcap files that already have the tags reinserted should work with
>> current filter code. However, for live traffic, one has to get the tags
>> from CMSG() and then reinsert them into the packet for the current
>> filter to work.
>
> *Somebody* has to do that, at least to packets that pass the filter, before
> they're handed to a libpcap-based application, for programs that expect to
> see packets as they arrived from/were transmitted to the wire to work.
>
> I.e., the tags *should* be reinserted by libpcap, and, as I understand it,
> that's what the
>
> #if defined(HAVE_PACKET_AUXDATA) && defined(HAVE_LINUX_TPACKET_AUXDATA_TP_VLAN_TCI)
> ...
> #endif
>
> blocks of code in pcap-linux.c in libpcap are doing.
>
> Now, if filtering is being done in the *kernel*, and the tags aren't being 
> reinserted by the kernel, then filter code stuffed into the kernel would need 
> to differ from filter code run in userland.  There's already precedent for 
> that on Linux, with the "cooked mode" headers; those are synthesized by 
> libpcap from the metadata returned for PF_PACKET sockets, and the code that 
> attempts to hand the kernel a filter goes through the filter code, which was 
> generated under the assumption that the packet begins with a "cooked mode" 
> header, and modifies (a copy of) the code to, instead, use the special 
> Linux-BPF-interpreter offsets to access the metadata.
>
> The right thing to do here would be to, if possible, do the same, so that the 
> kernel doesn't have to reinsert VLAN tags for packets that aren't going to be 
> handed to userland.

In this case, it would be incredibly complicated to do this just by
postprocessing a set of BPF instructions.  The problem is that when
running the filter in the kernel, the IP header, etc. are not offset,
so "off_macpl" and "off_linktype" would be zero, not 4, while
generating the rest of the expression.  We would also have to insert
code when comparing the ethertype to 0x8100 to instead load the
vlan-tagged metadata, so all jumps crossing that point would have to
be adjusted, and if the "if-false" instruction was also testing the
ethertype, then the ethertype would have to be reloaded (again
inserting another instruction).

Basically, take a look at the output of "tcpdump -d tcp port 22 or
(vlan and tcp port 22)".  Are the IPv4 tcp ports at x+14/x+16, or at
x+18/x+20?  If we're filtering in the kernel, they're at x+14/x+16
whether the packet is vlan tagged or not.  If we're filtering on the
actual packet contents (from a savefile, for example), they're at
x+18/x+20 if the packet is vlan tagged.
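
To make the offset difference concrete, here is a minimal sketch (not libpcap
code) that locates the TCP ports in a raw Ethernet frame carrying IPv4, where
x is the IPv4 header length. The helper name tcp_port_offsets() is
hypothetical, and the sketch assumes at most one 802.1Q tag and ignores IP
fragmentation:

```c
#include <stddef.h>
#include <stdint.h>

#define ETHERTYPE_IPV4 0x0800
#define ETHERTYPE_VLAN 0x8100
#define IP_PROTO_TCP   6

/*
 * Hypothetical helper, for illustration only.
 * On success, stores the byte offsets of the TCP source and destination
 * ports within `frame` and returns 0.  Returns -1 if the frame is too
 * short or is not IPv4/TCP.
 */
int tcp_port_offsets(const uint8_t *frame, size_t len,
                     size_t *sport_off, size_t *dport_off)
{
    size_t l3 = 14;     /* untagged Ethernet header length */
    uint16_t ethertype;
    size_t x;

    if (len < 14)
        return -1;
    ethertype = (uint16_t)((frame[12] << 8) | frame[13]);
    if (ethertype == ETHERTYPE_VLAN) {
        /* Tag is present in the packet data: everything shifts by 4. */
        if (len < 18)
            return -1;
        ethertype = (uint16_t)((frame[16] << 8) | frame[17]);
        l3 = 18;
    }
    if (ethertype != ETHERTYPE_IPV4 || len < l3 + 20)
        return -1;
    if (frame[l3 + 9] != IP_PROTO_TCP)
        return -1;
    x = (size_t)(frame[l3] & 0x0f) * 4;   /* IPv4 header length in bytes */
    if (x < 20 || len < l3 + x + 4)
        return -1;
    *sport_off = l3 + x;        /* x+14 untagged, x+18 tagged */
    *dport_off = l3 + x + 2;    /* x+16 untagged, x+20 tagged */
    return 0;
}
```

With a 20-byte IPv4 header, this yields offsets 34/36 for an untagged frame
and 38/40 when the tag is in the packet data. A filter run in the kernel
against a packet whose tag has been moved to metadata would only ever see the
first case.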

Also, an expression such as 'tcp port 22' would have to have some
instructions added at the beginning, for "vlan-tagged == false", or it
would match both tagged and untagged packets.

This would be much more straightforward to deal with in the code
generation phase, except until now the code generation phase hasn't
known whether the filter is headed for the kernel or not.

  Bill


Re: [tcpdump-workers] Request new DLT value for Infiniband link

2012-11-11 Thread Guy Harris

On Nov 11, 2012, at 4:53 AM, Oren Kladnitsky wrote:

> I'd like to request a new DLT/LINKTYPE value for Infiniband traffic
> (DLT_INFINIBAND).
> The Infiniband spec is available at:
> http://members.infinibandta.org/kwspub/spec/V1r1_2_1.Release_12062007.zip
> (registration required).
> See "Chapter 5: Data packet format" for the packet layout. The LRH (Local Route
> Header) is the first header of all data packets.
>
> Wireshark already supports Infiniband dissection
> (epan/dissectors/packet-infiniband.c) under DLT_ERF encapsulation.
>
> I'd like Infiniband to have a dedicated DLT for future pcapng support that
> does not require the ERF encapsulation.

OK, I've added LINKTYPE_INFINIBAND and DLT_INFINIBAND, with a value of 247.  
The list of link-layer header types at

http://www.tcpdump.org/linktypes.html

should update in 24 hours or so; the description is "Raw InfiniBand frames, 
starting with the Local Routing Header, as specified in Chapter 5 "Data packet 
format" of InfiniBand™ Architectural Specification Release 1.2.1 Volume 1 - 
General Specifications.", with "InfiniBand™ Architectural Specification Release 
1.2.1 Volume 1 - General Specifications" linking to the URL you gave.
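
As an illustration of where the new value ends up, here is a minimal sketch
that writes a classic pcap file header announcing raw InfiniBand frames. It
deliberately avoids libpcap so that it stands alone; the struct and function
names are hypothetical, and the header is written in host byte order:

```c
#include <stdint.h>
#include <stdio.h>

#define LINKTYPE_INFINIBAND 247   /* value assigned in this thread */

/* Classic pcap file header: 24 bytes, one per file. */
struct pcap_file_hdr {
    uint32_t magic;          /* 0xa1b2c3d4: microsecond timestamps */
    uint16_t version_major;  /* 2 */
    uint16_t version_minor;  /* 4 */
    int32_t  thiszone;       /* GMT offset, normally 0 */
    uint32_t sigfigs;        /* timestamp accuracy, normally 0 */
    uint32_t snaplen;        /* maximum bytes stored per packet */
    uint32_t linktype;       /* link-layer header type */
};

/*
 * Hypothetical helper: write a file header for a capture whose packets
 * start with the InfiniBand Local Route Header.  Returns 0 on success.
 */
int write_infiniband_file_header(FILE *fp, uint32_t snaplen)
{
    struct pcap_file_hdr hdr = {
        0xa1b2c3d4u, 2, 4, 0, 0, snaplen, LINKTYPE_INFINIBAND
    };

    return fwrite(&hdr, sizeof hdr, 1, fp) == 1 ? 0 : -1;
}
```

With libpcap itself, the usual route would be pcap_open_dead() with
DLT_INFINIBAND followed by pcap_dump_open(), which writes this header for
you.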


Re: [tcpdump-workers] regarding usage of recv calls in mmap code

2012-11-11 Thread Guy Harris

On Nov 7, 2012, at 10:28 AM, abhinav narain wrote:

> I wanted to know why MSG_PEEK is used in the recv() call in the mmap code,
> and not recvfrom() with the MSG_TRUNC flag.
> The reason I am asking is that my code takes a lot of CPU, which is due to
> more looping, I suppose.
> The flag description for MSG_PEEK says it doesn't discard the bytes even
> after reading from the queue.
> Can someone please explain?
> I would like to use recvfrom() with MSG_TRUNC; is that fine?

The recv() is *not* reading a packet, it's reading an error code.  There 
shouldn't even *be* any skbuffs to read from the socket in the mmapped code 
path - they should be in the memory-mapped buffer.  That's why it's doing a 
recv() with MSG_PEEK.

The comment "A recv() will give us the actual error code." perhaps doesn't 
indicate that clearly enough, but that's what it's doing.  A poll() that 
includes the descriptor for the socket should set the POLLERR flag if there's 
an error condition on the descriptor, such as a "this network interface has 
gone down" indication.  You have to do a recv() from the socket to get the 
error code and clear the error indication so that a subsequent poll() that 
includes that descriptor won't set POLLERR for it.

If that code is being invoked a significant number of times, it means you have 
a problem - you're getting a lot of errors, not packets.
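
The pattern described above can be sketched roughly as follows. This is an
illustration and not the libpcap source; the helper name
pending_socket_error() is hypothetical, and it polls with a zero timeout so
that it never blocks:

```c
#include <errno.h>
#include <poll.h>
#include <sys/socket.h>
#include <sys/types.h>

/*
 * Hypothetical helper: check whether an error condition is pending on
 * `fd`.  Returns 0 if poll() does not report POLLERR, otherwise the
 * errno value obtained from recv().
 */
int pending_socket_error(int fd)
{
    struct pollfd pfd;
    char c;

    pfd.fd = fd;
    pfd.events = POLLIN;
    pfd.revents = 0;

    if (poll(&pfd, 1, 0) < 0)
        return errno;
    if (!(pfd.revents & POLLERR))
        return 0;               /* no error condition on the descriptor */

    /*
     * We are after the error code, not packet data, so MSG_PEEK is
     * enough: in the memory-mapped path there should be nothing queued
     * on the socket to consume anyway.
     */
    if (recv(fd, &c, sizeof c, MSG_PEEK) != -1)
        return 0;               /* what, no error? */
    return errno;               /* e.g. ENETDOWN if the interface went down */
}
```

In a capture loop this check would run only when poll() wakes up with
POLLERR set, so it should be rare; if it dominates a profile, the socket is
reporting errors rather than delivering packets.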


Re: [tcpdump-workers] PCAP file questions...

2012-11-11 Thread Guy Harris

On Nov 11, 2012, at 2:55 PM, barcaroller wrote:

> The libpcap C API provides functions for writing (pcap_dump) and reading 
> (pcap_next) a PCAP file.  I have two questions:
> 
> - How do I remove a packet from a PCAP file using the libpcap C API?

You can't remove a packet from an existing file - pcap files are sequential 
files.

What you *can* do is read a file and write out all the packets, except the ones 
you don't want, to a new file.
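
A minimal sketch of that read-and-rewrite approach, working directly on the
classic pcap file layout so that it stands alone. The function name
pcap_copy_without() and the choice to drop a packet by its index are
assumptions for illustration; the sketch also assumes the file was written in
host byte order. With libpcap, the same loop would be built from
pcap_open_offline(), pcap_next_ex(), and pcap_dump().

```c
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

/* Per-packet record header of a classic pcap file (16 bytes). */
struct pcap_rec_hdr {
    uint32_t ts_sec;    /* timestamp, seconds */
    uint32_t ts_usec;   /* timestamp, microseconds */
    uint32_t incl_len;  /* bytes of packet data stored in the file */
    uint32_t orig_len;  /* length of the packet on the wire */
};

#define MAX_RECORD_LEN 262144u  /* arbitrary sanity cap for this sketch */

/*
 * Hypothetical helper: copy the pcap stream `in` to `out`, leaving out
 * the packet whose 0-based index is `skip_index`.
 * Returns the number of packets written, or -1 on error.
 */
long pcap_copy_without(FILE *in, FILE *out, long skip_index)
{
    uint8_t file_hdr[24];
    struct pcap_rec_hdr rec;
    long index = 0, kept = 0;

    /* The new file gets exactly one file header, copied from the old one. */
    if (fread(file_hdr, sizeof file_hdr, 1, in) != 1 ||
        fwrite(file_hdr, sizeof file_hdr, 1, out) != 1)
        return -1;

    while (fread(&rec, sizeof rec, 1, in) == 1) {
        uint8_t *data;
        int ok = 1;

        if (rec.incl_len > MAX_RECORD_LEN)
            return -1;
        data = malloc(rec.incl_len + 1);    /* +1 avoids malloc(0) */
        if (data == NULL)
            return -1;
        if (rec.incl_len != 0 && fread(data, rec.incl_len, 1, in) != 1)
            ok = 0;
        if (ok && index != skip_index) {
            if (fwrite(&rec, sizeof rec, 1, out) != 1 ||
                (rec.incl_len != 0 &&
                 fwrite(data, rec.incl_len, 1, out) != 1))
                ok = 0;
            else
                kept++;
        }
        free(data);
        if (!ok)
            return -1;
        index++;
    }
    return kept;
}
```

Once the copy succeeds, the new file can be renamed over the old one.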

> - Once I close a PCAP file (pcap_close), I find I cannot re-open it later 
> (pcap_dump_fopen) and append to it.  I get a corrupt file every time.  Are 
> PCAP files not meant to be appended to?

They could, in principle, be appended to, but that can't be done with the 
existing APIs - you'd need an "open for appending" call, which would, unlike 
the "create a new file" calls (pcap_dump_open(), pcap_dump_fopen()), *not* 
write a file header.


Re: [tcpdump-workers] mmap consumes more CPU

2012-11-11 Thread Guy Harris

On Nov 5, 2012, at 11:03 AM, abhinav narain wrote:

> I just compared the two mechanisms:
> (1) Using mmap to fetch packets from kernel to userspace
> (2) Using recvfrom() calls to fetch packets
>
> top reports:
> (1) 34% memory, 20% CPU usage
> (2) 21% memory, 7% CPU usage!
>
> I wanted a performance improvement from mmap, but it is slowing down my small
> router during packet capture (I can't use pcap for that; I have modified
> skbuff), and the CPU usage is more than twice as high!

Have you tried running profiled versions of your code to see where the CPU 
usage is happening?

(By "can't use pcap for that" do you mean that you're not using libpcap, you're 
using your own code that either reads from the socket or memory-maps the 
socket?  If so, make sure you're doing the memory-mapped stuff the same way 
libpcap does - it's *not* trivial to get right, as you'll notice if you see the 
number of changes that have had to be made to that code.)


Re: [tcpdump-workers] PCAP file questions...

2012-11-11 Thread Guy Harris

On Nov 11, 2012, at 5:44 PM, barcaroller wrote:

> On 2012-11-11 23:27:00 +, Guy Harris said:
> 
>> They could, in principle, be appended to, but that can't be done with the 
>> existing APIs - you'd need an "open for appending" call, which would, unlike 
>> the "create a new file" calls (pcap_dump_open(), pcap_dump_fopen()), *not* 
>> write a file header.
> 
> The existing API does allow for:
> 
>   FILE* f = open("a");  // or open("a+")
>   pcap_dump_fopen(f);

pcap_dump_fopen(), in the current Git trunk, calls pcap_setup_dump(), which 
calls sf_write_header(), which writes out a file header, so that call will 
write a file header.  Some older versions have a different code path, but 
they'll still write out a file header.

A pcap file has *one* file header followed by a sequence of zero or more 
packets, each with a packet record header.  A file header is not a valid packet 
record header, so that wouldn't work for *any* number of packets.

As per my mail, what's needed is a routine that doesn't write the file header.
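
For illustration, such a routine might look roughly like the sketch below. It
is not part of the libpcap API discussed here; the names
pcap_open_for_append() and pcap_append_packet() are hypothetical, and the
sketch only handles files written in host byte order with microsecond
timestamps:

```c
#include <stdint.h>
#include <stdio.h>

/* Per-packet record header of a classic pcap file (16 bytes). */
struct pcap_record_hdr {
    uint32_t ts_sec;    /* timestamp, seconds */
    uint32_t ts_usec;   /* timestamp, microseconds */
    uint32_t incl_len;  /* bytes of packet data stored in the file */
    uint32_t orig_len;  /* length of the packet on the wire */
};

/*
 * Hypothetical helper: open an existing pcap file for appending.
 * It checks the magic number of the existing file header and, unlike
 * the "create a new file" calls, writes no file header of its own.
 */
FILE *pcap_open_for_append(const char *path)
{
    uint32_t magic = 0;
    int ok;
    FILE *fp = fopen(path, "rb");

    if (fp == NULL)
        return NULL;
    ok = fread(&magic, sizeof magic, 1, fp) == 1 && magic == 0xa1b2c3d4u;
    fclose(fp);
    if (!ok)
        return NULL;
    return fopen(path, "ab");   /* all writes go to the end of the file */
}

/* Hypothetical helper: append one packet record.  Returns 0 on success. */
int pcap_append_packet(FILE *fp, uint32_t ts_sec, uint32_t ts_usec,
                       const void *data, uint32_t len)
{
    struct pcap_record_hdr rec = { ts_sec, ts_usec, len, len };

    if (fwrite(&rec, sizeof rec, 1, fp) != 1)
        return -1;
    if (len != 0 && fwrite(data, len, 1, fp) != 1)
        return -1;
    return 0;
}
```

A real implementation would also need to confirm that the snapshot length and
link-layer type of the existing file match the capture being appended, and to
cope with files written in the opposite byte order.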

> It does work for a few hundred packets, but then eventually the file gets
> corrupted.

That must be because, until you've written more packets, no write is done to 
the underlying file because the packets are still buffered in the standard I/O 
library routine buffers.  Once an actual write() is done, your file will be 
trashed.
