[tcpdump-workers] Request new DLT value for Infiniband link
Hi,

I'd like to request a new DLT/LINKTYPE value for Infiniband traffic (DLT_INFINIBAND).

Infiniband spec is available at: http://members.infinibandta.org/kwspub/spec/V1r1_2_1.Release_12062007.zip (registration required).
See "Chapter 5.: Data packet format" for the packet layout. LRH (Local Route Header) is the first header of all data packets.

Wireshark already has support for Infiniband dissection (epan/dissectors/packet-infiniband.c) under encap of DLT_ERF.

I'd like Infiniband to have a dedicated DLT for future pcapng support which does not require the ERF encapsulation.

Thanks,
Oren Kladnitsky
Staff engineer, Apps and Embedded group manager
Tel: +972-74-7236370
Cell: +972-50-7349271
___
tcpdump-workers mailing list
tcpdump-workers@lists.tcpdump.org
https://lists.sandelman.ca/mailman/listinfo/tcpdump-workers
[tcpdump-workers] mmap consumes more CPU
Hi,

I just compared two mechanisms:
(1) Using mmap to fetch packets from the kernel to userspace
(2) Using the recvfrom() call to fetch packets

top reports:
(1) 34% memory, 20% CPU usage
(2) 21% memory, 7% CPU usage!

I wanted a performance improvement from mmap, but instead I am slowing down my small router during packet capture (I can't use pcap for that; I have modified skbuff), and CPU usage is more than twice as bad. The memory increase is fine.

Can anyone suggest what is going on, or how to improve it?

-Abhinav Narain
[tcpdump-workers] regarding usage of recv calls in mmap code
Hi,

I wanted to know why MSG_PEEK is used in the recv() call in the mmap code, rather than recvfrom() with the MSG_TRUNC flag. I am asking because my code takes a lot of CPU, which I suppose is due to extra looping. The flag description for MSG_PEEK says it doesn't discard the bytes even after reading them from the queue.

Can someone please explain? I would like to use recvfrom() with MSG_TRUNC. Is that fine?

-Abhinav
Re: [tcpdump-workers] vlan tagged packets and libpcap breakage
On Wed, Oct 31, 2012 at 6:20 PM, Guy Harris wrote:
>
> On Oct 31, 2012, at 2:50 PM, Ani Sinha wrote:
>
>> pcap files that already have the tags reinserted should work with current filter code. However for live traffic, one has to get the tags from CMSG() and then reinsert it back to the packet for the current filter to work.
>
> *Somebody* has to do that, at least to packets that pass the filter, before they're handed to a libpcap-based application, for programs that expect to see packets as they arrived from/were transmitted to the wire to work.
>
> I.e., the tags *should* be reinserted by libpcap, and, as I understand it, that's what the
>
> #if defined(HAVE_PACKET_AUXDATA) && defined(HAVE_LINUX_TPACKET_AUXDATA_TP_VLAN_TCI)
> ...
> #endif
>
> blocks of code in pcap-linux.c in libpcap are doing.
>
> Now, if filtering is being done in the *kernel*, and the tags aren't being reinserted by the kernel, then filter code stuffed into the kernel would need to differ from filter code run in userland. There's already precedent for that on Linux, with the "cooked mode" headers; those are synthesized by libpcap from the metadata returned for PF_PACKET sockets, and the code that attempts to hand the kernel a filter goes through the filter code, which was generated under the assumption that the packet begins with a "cooked mode" header, and modifies (a copy of) the code to, instead, use the special Linux-BPF-interpreter offsets to access the metadata.
>
> The right thing to do here would be to, if possible, do the same, so that the kernel doesn't have to reinsert VLAN tags for packets that aren't going to be handed to userland.

In this case, it would be incredibly complicated to do this by just postprocessing a set of BPF instructions. The problem is that when running the filter in the kernel, the IP header, etc. are not offset, so "off_macpl" and "off_linktype" would be zero, not 4, while generating the rest of the expression.

We would also have to insert code when comparing the ethertype to 0x8100 to instead load the vlan-tagged metadata, so all jumps crossing that point would have to be adjusted, and if the "if-false" instruction was also testing the ethertype, then the ethertype would have to be reloaded (again inserting another instruction).

Basically, take a look at the output of "tcpdump -d tcp port 22 or (vlan and tcp port 22)". Are the IPv4 TCP ports at x+14/x+16, or at x+18/x+20? If we're filtering in the kernel, they're at x+14/x+16 whether the packet is vlan tagged or not. If we're filtering on the actual packet contents (from a savefile, for example), they're at x+18/x+20 if the packet is vlan tagged.

Also, an expression such as 'tcp port 22' would have to have some instructions added at the beginning, for "vlan-tagged == false", or it would match both tagged and untagged packets.

This would be much more straightforward to deal with in the code generation phase, except that until now the code generation phase hasn't known whether the filter is headed for the kernel or not.

Bill
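To make the offset arithmetic above concrete, here is a minimal sketch in C. It is not libpcap code; the names (tcp_port_offsets, tag_in_data, and the two length macros) are made up for illustration. It assumes an Ethernet link layer and IPv4, with "x" being the IPv4 header length in bytes that the generated filter loads into the index register.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define ETHER_HEADER_BYTES 14  /* destination MAC + source MAC + EtherType */
#define VLAN_TAG_BYTES      4  /* 802.1Q TPID + TCI */

/* Byte offsets, from the start of the frame, of the TCP port fields. */
struct tcp_port_offsets {
    size_t src;
    size_t dst;
};

/*
 * ip_hdr_len:  IPv4 header length in bytes (the "x" in the mail above).
 * tag_in_data: true if the 802.1Q tag is physically present in the bytes
 *              the filter sees (e.g. a savefile); false if the tag was
 *              stripped and is only available as metadata (in-kernel case).
 */
static struct tcp_port_offsets
tcp_port_offsets(size_t ip_hdr_len, bool tag_in_data)
{
    struct tcp_port_offsets o;
    size_t l3_start;

    assert(ip_hdr_len >= 20 && ip_hdr_len <= 60);

    l3_start = ETHER_HEADER_BYTES + (tag_in_data ? VLAN_TAG_BYTES : 0);
    o.src = l3_start + ip_hdr_len;      /* x+14 or x+18 */
    o.dst = l3_start + ip_hdr_len + 2;  /* x+16 or x+20 */
    return o;
}
```

With a 20-byte IPv4 header this gives 34/36 when the tag is not in the data and 38/42-4 = 38/40 when it is, which is why one compiled filter cannot serve both cases without knowing where it will run.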
Re: [tcpdump-workers] Request new DLT value for Infiniband link
On Nov 11, 2012, at 4:53 AM, Oren Kladnitsky wrote:

> I'd like to request a new DLT/LINKTYPE value for Infiniband traffic (DLT_INFINIBAND).
> Infiniband spec is available at: http://members.infinibandta.org/kwspub/spec/V1r1_2_1.Release_12062007.zip (registration required).
> See "Chapter 5.: Data packet format" for the packet layout. LRH (Local Route Header) is the first header of all data packets.
>
> Wireshark already has support for Infiniband dissection (epan/dissectors/packet-infiniband.c) under encap of DLT_ERF.
>
> I'd like Infiniband to have a dedicated DLT for future pcapng support which does not require the ERF encapsulation.

OK, I've added LINKTYPE_INFINIBAND and DLT_INFINIBAND, with a value of 247.

The list of link-layer header types at http://www.tcpdump.org/linktypes.html should update in 24 hours or so; the description is "Raw InfiniBand frames, starting with the Local Routing Header, as specified in Chapter 5 "Data packet format" of InfiniBand™ Architectural Specification Release 1.2.1 Volume 1 - General Specifications.", with "InfiniBand™ Architectural Specification Release 1.2.1 Volume 1 - General Specifications" linking to the URL you gave.
Re: [tcpdump-workers] regarding usage of recv calls in mmap code
On Nov 7, 2012, at 10:28 AM, abhinav narain wrote:

> I wanted to know why is MSG_PEEK used in the recv() call in mmap code and not recvfrom() with MSG_TRUNC flag.
> The reason i am asking is .. because I see my code takes a lot of CPU which is due to more looping, I suppose.
> The flag description for MSG_PEEK shows it doesn't discard the bytes even after reading from the queue.
> Can someone please explain.
> I would like to use recvfrom with MSG_TRUNC .. is that fine ?

The recv() is *not* reading a packet, it's reading an error code. There shouldn't even *be* any skbuffs to read from the socket in the mmapped code path - they should be in the memory-mapped buffer. That's why it's doing a recv() with MSG_PEEK.

The comment "A recv() will give us the actual error code." perhaps doesn't indicate that clearly enough, but that's what it's doing. A poll() that includes the descriptor for the socket should set the POLLERR flag if there's an error condition on the descriptor, such as a "this network interface has gone down" indication. You have to do a recv() from the socket to get the error code and clear the error indication so that a subsequent poll() that includes that descriptor won't set POLLERR for it.

If that code is being invoked a significant number of times, it means you have a problem - you're getting a lot of errors, not packets.
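For readers following along, the poll()/recv() pattern described above might look roughly like this. It is a simplified sketch, not the code in pcap-linux.c; the function name pending_socket_error is hypothetical, and the zero poll() timeout is chosen only so the example does not block.

```c
#include <assert.h>
#include <errno.h>
#include <poll.h>
#include <sys/socket.h>

/*
 * Returns 0 if poll() reports no error condition on fd; otherwise returns
 * the errno value (e.g. ENETDOWN) obtained by the recv() call.
 */
static int
pending_socket_error(int fd)
{
    struct pollfd pfd = { .fd = fd, .events = POLLIN, .revents = 0 };
    char c;

    assert(fd >= 0);

    /* Timeout of 0: just sample the current state of the descriptor. */
    if (poll(&pfd, 1, 0) < 0)
        return errno;

    if (!(pfd.revents & POLLERR))
        return 0;   /* no error indication; packets come from the ring */

    /*
     * The recv() here is not expected to return packet data; it is used
     * to fetch the pending error code, as described in the mail above.
     */
    if (recv(fd, &c, sizeof c, MSG_PEEK) != -1)
        return 0;   /* what, no error after all? */

    return errno;
}
```

If a function like this returns non-zero on most iterations of a capture loop, the loop is spending its time on error handling rather than on packets, which matches the diagnosis above.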
Re: [tcpdump-workers] PCAP file questions...
On Nov 11, 2012, at 2:55 PM, barcaroller wrote:

> The libpcap C API provides functions for writing (pcap_dump) and reading (pcap_next) a PCAP file. I have two questions:
>
> - How do I remove a packet from a PCAP file using the libpcap C API?

You can't remove a packet from an existing file - pcap files are sequential files. What you *can* do is read a file and write out all the packets, except the ones you don't want, to a new file.

> - Once I close a PCAP file (pcap_close), I find I cannot re-open it later (pcap_dump_fopen) and append to it. I get a corrupt file every time. Are PCAP files not meant to be appended to?

They could, in principle, be appended to, but that can't be done with the existing APIs - you'd need an "open for appending" call, which would, unlike the "create a new file" calls (pcap_dump_open(), pcap_dump_fopen()), *not* write a file header.
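The "read a file and write out all the packets except the ones you don't want" approach can be sketched without libpcap, working directly on the classic pcap layout (one 24-byte file header, then a 16-byte record header before each packet). Everything named here (copy_filtered, keep_fn, keep_if_not_truncated, MAX_CAPLEN) is hypothetical, and the sketch assumes a microsecond-resolution file written in the host's byte order. The same loop could presumably be written with pcap_open_offline(), pcap_next_ex() and pcap_dump() instead.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define PCAP_MAGIC_USEC   0xa1b2c3d4u  /* classic pcap, microsecond stamps */
#define PCAP_FILE_HDR_LEN 24
#define MAX_CAPLEN        262144u      /* sanity limit chosen for this sketch */

/* Per-packet record header of the classic pcap format. */
struct rec_hdr {
    uint32_t ts_sec;
    uint32_t ts_usec;
    uint32_t caplen;  /* bytes of packet data stored in the file */
    uint32_t len;     /* original length of the packet on the wire */
};

typedef int (*keep_fn)(const struct rec_hdr *rh, const unsigned char *data);

/* Example predicate: drop packets that were cut short at capture time. */
static int
keep_if_not_truncated(const struct rec_hdr *rh, const unsigned char *data)
{
    (void)data;
    return rh->caplen == rh->len;
}

/* Copies in_path to out_path, keeping only packets for which keep() is
 * non-zero. Returns the number of packets written, or -1 on error. */
static long
copy_filtered(const char *in_path, const char *out_path, keep_fn keep)
{
    unsigned char file_hdr[PCAP_FILE_HDR_LEN];
    unsigned char *data = NULL;
    struct rec_hdr rh;
    uint32_t magic;
    long kept = -1;
    FILE *in, *out;

    assert(in_path != NULL && out_path != NULL && keep != NULL);
    if ((in = fopen(in_path, "rb")) == NULL)
        return -1;
    if ((out = fopen(out_path, "wb")) == NULL) {
        fclose(in);
        return -1;
    }
    if (fread(file_hdr, 1, sizeof file_hdr, in) != sizeof file_hdr)
        goto done;
    memcpy(&magic, file_hdr, sizeof magic);
    /* The new file gets exactly one file header, copied from the old one. */
    if (magic != PCAP_MAGIC_USEC ||
        fwrite(file_hdr, 1, sizeof file_hdr, out) != sizeof file_hdr)
        goto done;
    if ((data = malloc(MAX_CAPLEN)) == NULL)
        goto done;

    kept = 0;
    while (fread(&rh, sizeof rh, 1, in) == 1) {
        if (rh.caplen > MAX_CAPLEN ||
            fread(data, 1, rh.caplen, in) != rh.caplen) {
            kept = -1;  /* corrupt or truncated input */
            break;
        }
        if (!keep(&rh, data))
            continue;   /* "removing" a packet = not copying it */
        if (fwrite(&rh, sizeof rh, 1, out) != 1 ||
            fwrite(data, 1, rh.caplen, out) != rh.caplen) {
            kept = -1;
            break;
        }
        kept++;
    }
done:
    free(data);
    fclose(in);
    if (fclose(out) != 0)
        kept = -1;
    return kept;
}
```

A call such as copy_filtered("in.pcap", "out.pcap", keep_if_not_truncated) leaves the input untouched; replacing the original afterwards is a separate rename step.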
Re: [tcpdump-workers] mmap consumes more CPU
On Nov 5, 2012, at 11:03 AM, abhinav narain wrote:

> I just checked the two mechanism :
> (1) Using mmap to fetch packets from kernel to userspace
> (2) Using recvfrom() call to fetch packets
>
> I see top reports
> (1) 34% memory 20% cpu usage
> (2) 21% memory 7% cpu usage !
>
> I wanted a performance improvement using mmap but I am slowing my small router for packet capture( I can't use pcap for that; I have modified skbuff ) and its worst than twice !

Have you tried running profiled versions of your code to see where the CPU usage is happening?

(By "can't use pcap for that" do you mean that you're not using libpcap, you're using your own code that either reads from the socket or memory-maps the socket? If so, make sure you're doing the memory-mapped stuff the same way libpcap does - it's *not* trivial to get right, as you'll notice if you see the number of changes that have had to be made to that code.)
Re: [tcpdump-workers] PCAP file questions...
On Nov 11, 2012, at 5:44 PM, barcaroller wrote:

> On 2012-11-11 23:27:00 +, Guy Harris said:
>
>> They could, in principle, be appended to, but that can't be done with the existing APIs - you'd need an "open for appending" call, which would, unlike the "create a new file" calls (pcap_dump_open(), pcap_dump_fopen()), *not* write a file header.
>
> The existing API does allow for:
>
> FILE* f = open("a"); // or open("a+")
> pcap_dump_fopen(f);

pcap_dump_fopen(), in the current Git trunk, calls pcap_setup_dump(), which calls sf_write_header(), which writes out a file header, so that call will write a file header. Some older versions have a different code path, but they'll still write out a file header.

A pcap file has *one* file header followed by a sequence of zero or more packets, each with a packet record header. A file header is not a valid packet record header, so that wouldn't work for *any* number of packets.

As per my mail, what's needed is a routine that doesn't write the file header.

> It does work for a few hundred packets, but then eventually the file gets corrupted.

That must be because, until you've written more packets, no write is done to the underlying file because the packets are still buffered in the standard I/O library routine buffers. Once an actual write() is done, your file will be trashed.
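As an illustration of what an "open for appending" routine would have to do, here is a rough sketch that works on the file directly rather than through libpcap. The names append_open and append_packet are hypothetical, not libpcap APIs. It assumes a classic microsecond-resolution pcap file in host byte order, and that the caller only appends packets matching the file's existing link-layer type and snapshot length. (Later libpcap releases added pcap_dump_open_append() for this purpose, as far as I know.)

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

#define PCAP_MAGIC_USEC   0xa1b2c3d4u  /* classic pcap, microsecond stamps */
#define PCAP_FILE_HDR_LEN 24

/* Per-packet record header of the classic pcap format. */
struct rec_hdr {
    uint32_t ts_sec;
    uint32_t ts_usec;
    uint32_t caplen;  /* bytes of packet data stored in the file */
    uint32_t len;     /* original length of the packet on the wire */
};

/*
 * Opens an existing pcap file for appending. The existing file header is
 * checked and kept; crucially, no second file header is written.
 * Returns NULL if the file is missing or does not look like a pcap file.
 */
static FILE *
append_open(const char *path)
{
    unsigned char hdr[PCAP_FILE_HDR_LEN];
    uint32_t magic;
    FILE *fp;

    assert(path != NULL);
    fp = fopen(path, "r+b");  /* unlike "a", lets us read the header first */
    if (fp == NULL)
        return NULL;
    if (fread(hdr, 1, sizeof hdr, fp) != sizeof hdr) {
        fclose(fp);
        return NULL;
    }
    memcpy(&magic, hdr, sizeof magic);
    /* Position at end of file so new records follow the existing ones. */
    if (magic != PCAP_MAGIC_USEC || fseek(fp, 0, SEEK_END) != 0) {
        fclose(fp);
        return NULL;
    }
    return fp;
}

/* Writes one record header plus rh->caplen bytes of data. 0 on success. */
static int
append_packet(FILE *fp, const struct rec_hdr *rh, const void *data)
{
    if (fwrite(rh, sizeof *rh, 1, fp) != 1)
        return -1;
    if (rh->caplen != 0 && fwrite(data, 1, rh->caplen, fp) != rh->caplen)
        return -1;
    return 0;
}
```

The contrast with fopen("a") followed by pcap_dump_fopen() is that nothing here writes the 24-byte file header a second time, so the appended bytes remain a valid sequence of packet records.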