Re: [tcpdump-workers] Legacy Linux kernel support
--- Begin Message ---
On Oct 21, 2019, at 5:59 PM, Mario Rugiero via tcpdump-workers wrote:

> I think it's time to summarize, and to propose one last idea.
> I'm following the thread again to try and be as accurate as possible,
> but of course any objections are welcome.
>
> - The oldest officially supported kernel is 3.16, as this is the
>   oldest LTS according to kernel.org.
> - Users of the library must be properly informed when their
>   environment is unsupported, and of the last version that supported it.
>   This should be done both at compile-time and at run-time.
> - SOCK_PACKET goes away. This is already done in master.
> - TPACKET_V1 goes away.

This includes the hack to handle 32-bit userland running on top of a
64-bit kernel; TPACKET_V2 eliminated that problem by making the flags
field a 32-bit integer, even on 64-bit architectures, in the data
structures shared between the kernel and userland. I.e., we also remove
the internal "TPACKET_V1_64" support.

> - TPACKET_V2 stays for immediate-mode support.
>   - As a side-effect, RHEL6 remains supported.

So RHEL6's kernel is pre-3.16 and thus doesn't support TPACKET_V3?

>   - The idea of exploring using non-memory-mapped sockets for this was
>     proposed, and it would be interesting to follow up.
>     For this, I was supposed to check whether that makes a difference
>     regarding how the kernel implements it.
> - The workaround for TPACKET_V3's bug stays, as the fix was only
>   introduced in 3.19.
> - We should explore reaching a solution to immediate-mode that doesn't
>   require TPACKET_V2.
>   - It has to be noted, tho, that any changes to allow that are
>     unlikely to be back-ported to older kernels, so we'd still need
>     TPACKET_V2 for the time being. It'd be a bet for the future.

So you're talking about a TPACKET_V5, or changes to TPACKET_V3 or
TPACKET_V4, to support immediate mode in memory-mapped capture, as
opposed to using non-memory-mapped sockets?

> - Just to acknowledge it, it was proposed to research whether
>   support for AF_XDP makes sense. I think that belongs to its own
>   discussion, tho.

Yes, that's a different mechanism from AF_PACKET.

Does it allow receiving copies of packets that are also handed either to
the kernel networking stack or to other AF_XDP sockets for regular input
processing? That would be needed to allow it to be used for packet
sniffing.

> Now, the idea goes along with the last item.
> I was thinking of proposing a new option for TPACKET_V3 sockets to set
> a deadline.
> I haven't completely decided on the details, but basically it would
> behave somewhat like this:
> - The deadline can take three types of value:
>   - (-1): no deadline, wait until a block is full before marking as
>     ready for user space. This would be the default, so no existing
>     programs change their behavior.

That sounds like the behavior with a timeout set to 0 (with PF_PACKET
sockets, BPF devices, and Solaris DLPI).

>   - (0): expose packets as soon as they arrive. This would act more or
>     less as previous AF_PACKET versions work.

That sounds like immediate mode.

>   - A positive integer: this would be how long to wait

If you mean "deliver a block if it's full or if the timeout expires",
that sounds like the behavior with a non-zero timeout.

So how does this differ from the regular timeout mechanism?

> (I haven't decided on the unit, I'm guessing microseconds should work).

The units are milliseconds in pcap_set_timeout() and with BPF devices.

> I'm not sure what the deadline should be measured from, but I'm
> thinking from when the first packet in the block arrived.

At least for BPF devices, it's "since a read or select was done", rather
than "since the first packet in the block arrived"; I think it might be
"since the first packet in the block arrived" with Solaris DLPI.

> Any ideas on this?
> How should we keep the discussion in sync between the two lists?
> Should I CC the participants on this list on the RFC on the kernel list?

"The two lists" being this list and some Linux list?

Is the Linux list linux-netdev?
--- End Message ---
___
tcpdump-workers mailing list
tcpdump-workers@lists.tcpdump.org
https://lists.sandelman.ca/mailman/listinfo/tcpdump-workers
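The three deadline values proposed in the message above, and the existing behaviors they are compared to in the replies, can be summarized in a small sketch. Everything named here (`enum deadline_mode`, `classify_deadline`) is hypothetical and exists neither in the kernel nor in libpcap; it only restates the mapping discussed in the thread.

```c
/*
 * Hypothetical illustration only: these names are not part of any
 * kernel or libpcap API. They restate the proposal's three cases.
 */
enum deadline_mode {
    DEADLINE_WAIT_FOR_FULL_BLOCK, /* -1: compared to a timeout of 0 */
    DEADLINE_IMMEDIATE,           /*  0: compared to immediate mode */
    DEADLINE_TIMED,               /* >0: compared to a non-zero timeout */
    DEADLINE_INVALID              /* any other negative value */
};

/* Map a proposed deadline value onto the behavior it would select. */
enum deadline_mode classify_deadline(long deadline)
{
    if (deadline == -1)
        return DEADLINE_WAIT_FOR_FULL_BLOCK;
    if (deadline == 0)
        return DEADLINE_IMMEDIATE;
    if (deadline > 0)
        return DEADLINE_TIMED;
    return DEADLINE_INVALID;
}
```

The unit of a positive deadline is left open in the proposal, so the sketch does not assume one.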
Re: [tcpdump-workers] Legacy Linux kernel support
--- Begin Message ---
On Tue, Oct 22, 2019 at 15:08, Guy Harris wrote:

> On Oct 21, 2019, at 5:59 PM, Mario Rugiero via tcpdump-workers wrote:
> > - TPACKET_V2 stays for immediate-mode support.
> >   - As a side-effect, RHEL6 remains supported.
>
> So RHEL6's kernel is pre-3.16 and thus doesn't support TPACKET_V3?

It's 2.6 series (2.6.32?), and TPACKET_V3 was introduced in 3.2.

> > - We should explore reaching a solution to immediate-mode that doesn't
> >   require TPACKET_V2.
> >   - It has to be noted, tho, that any changes to allow that are
> >     unlikely to be back-ported to older kernels, so we'd still need
> >     TPACKET_V2 for the time being. It'd be a bet for the future.
>
> So you're talking about a TPACKET_V5, or changes to TPACKET_V3 or
> TPACKET_V4, to support immediate mode in memory-mapped capture, as
> opposed to using non-memory-mapped sockets?

I was thinking more of a backwards compatible extension to TPACKET_V3
in principle, but that's open for discussion. My thoughts on this are
at the end of my summary.

> > - Just to acknowledge it, it was proposed to research whether
> >   support for AF_XDP makes sense. I think that belongs to its own
> >   discussion, tho.
>
> Yes, that's a different mechanism from AF_PACKET.
>
> Does it allow receiving copies of packets that are also handed either
> to the kernel networking stack or to other AF_XDP sockets for regular
> input processing? That would be needed to allow it to be used for
> packet sniffing.

I haven't looked into it yet, but I'll keep that in mind when I do.

> > Now, the idea goes along with the last item.
> > I was thinking of proposing a new option for TPACKET_V3 sockets to set
> > a deadline.
> > I haven't completely decided on the details, but basically it would
> > behave somewhat like this:
> > - The deadline can take three types of value:
> >   - (-1): no deadline, wait until a block is full before marking as
> >     ready for user space. This would be the default, so no existing
> >     programs change their behavior.
>
> That sounds like the behavior with a timeout set to 0 (with PF_PACKET
> sockets, BPF devices, and Solaris DLPI).
>
> >   - (0): expose packets as soon as they arrive. This would act more or
> >     less as previous AF_PACKET versions work.
>
> That sounds like immediate mode.
>
> >   - A positive integer: this would be how long to wait
>
> If you mean "deliver a block if it's full or if the timeout expires",
> that sounds like the behavior with a non-zero timeout.
>
> So how does this differ from the regular timeout mechanism?

This mechanism would be for the AF_PACKET driver in the Linux kernel,
not for libpcap.
libpcap would only either set a small non-zero deadline on TPACKET_Vx
(x >= 3) or 0 for immediate mode, and just use the default behavior
for non-immediate mode.
The similarity with what libpcap does is not a coincidence.

> > (I haven't decided on the unit, I'm guessing microseconds should work).
>
> The units are milliseconds in pcap_set_timeout() and with BPF devices.

Yes, but for spacing between packets milliseconds may be too coarse.

> > I'm not sure what the deadline should be measured from, but I'm
> > thinking from when the first packet in the block arrived.
>
> At least for BPF devices, it's "since a read or select was done", rather
> than "since the first packet in the block arrived"; I think it might be
> "since the first packet in the block arrived" with Solaris DLPI.

I should read that for inspiration, then.

> > Any ideas on this?
> > How should we keep the discussion in sync between the two lists?
> > Should I CC the participants on this list on the RFC on the kernel list?
>
> "The two lists" being this list and some Linux list?
>
> Is the Linux list linux-netdev?

Yes.

Addendum: I missed one, replacing some device detection boilerplate.
Initially, `if_nameindex` was proposed, but there's already the
`getifaddrs` based implementation that should detect all Linux
interfaces usable by pcap since 2.3.x due to the fact that it counts
AF_PACKET addresses, so we should be able to just remove the
'/proc/net' and '/sys/class/net' crawling when we start expecting 2.4.
--- End Message ---
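A minimal sketch of the enumeration described in the addendum, along the lines of the example in the Linux getifaddrs(3) manual page. The function name `count_packet_interfaces` is made up for illustration and this is not libpcap's actual device-detection code; it only shows that filtering on AF_PACKET entries lists interfaces regardless of whether they have network-layer addresses.

```c
#include <ifaddrs.h>
#include <stdio.h>
#include <sys/socket.h>
#include <sys/types.h>

/*
 * Walk the getifaddrs() list and count the entries whose address
 * family is AF_PACKET, optionally printing each interface name.
 * Returns the count, or -1 if getifaddrs() fails (errno is set).
 */
int count_packet_interfaces(int verbose)
{
    struct ifaddrs *list, *ifa;
    int count = 0;

    if (getifaddrs(&list) == -1)
        return -1;

    for (ifa = list; ifa != NULL; ifa = ifa->ifa_next) {
        /* Some entries carry no address at all; skip them. */
        if (ifa->ifa_addr == NULL)
            continue;
        /* Keep only link-layer entries, not AF_INET/AF_INET6 ones. */
        if (ifa->ifa_addr->sa_family != AF_PACKET)
            continue;
        if (verbose)
            printf("%s\n", ifa->ifa_name);
        count++;
    }

    freeifaddrs(list);
    return count;
}
```

Calling `count_packet_interfaces(1)` prints one name per link-layer entry; an interface with only a link-layer address and no IPv4 or IPv6 address should still appear.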
Re: [tcpdump-workers] Legacy Linux kernel support
--- Begin Message ---
On Oct 22, 2019, at 11:38 AM, Mario Rugiero wrote:

> On Tue, Oct 22, 2019 at 15:08, Guy Harris wrote:
>
>> On Oct 21, 2019, at 5:59 PM, Mario Rugiero via tcpdump-workers wrote:
>>> - TPACKET_V2 stays for immediate-mode support.
>>>   - As a side-effect, RHEL6 remains supported.
>>
>> So RHEL6's kernel is pre-3.16 and thus doesn't support TPACKET_V3?
>
> It's 2.6 series (2.6.32?), and TPACKET_V3 was introduced in 3.2.

I.e., the goal for libpcap support on Linux should be something such as

    it should work on min({kernel for oldest supported enterprise
    distribution}, {oldest "longterm maintenance" kernel release from
    kernel.org})

>>> Now, the idea goes along with the last item.
>>> I was thinking of proposing a new option for TPACKET_V3 sockets to set
>>> a deadline.

...

>> So how does this differ from the regular timeout mechanism?
>
> This mechanism would be for the AF_PACKET driver in the Linux kernel,
> not for libpcap.
> libpcap would only either set a small non-zero deadline on TPACKET_Vx
> (x >= 3) or 0 for immediate mode, and just use the default behavior
> for non-immediate mode.
> The similarity with what libpcap does is not a coincidence.

OK, so TPACKET_V3 currently supports something similar to what BPF
devices support, i.e. "deliver a block if it's full or if the timeout
expires". The timeout is in the tp_retire_blk_tov field of a
tpacket_req3 structure, as handed to a SOL_PACKET/PACKET_RX_RING
setsockopt() call. It's in units of milliseconds; it doesn't refer to
inter-packet spacing, but to the age of the block.

Currently it doesn't deliver empty blocks; libpcap can handle either
"delivers empty blocks" (as that's what BPF devices do) or "doesn't
deliver empty blocks" (as that's what TPACKET_V3 currently does).

The main difference is whether the timeout times out even if no packets
are available; I guess code that wants to be woken up periodically,
perhaps to do other work, even if there's no traffic that passes the
filter, would prefer "time out even if no packets are available".

>> Is the Linux list linux-netdev?
>
> Yes.

OK, I guess I'll have to go back to reading that list. (It's a very
heavy traffic list, and 99.999% of it isn't relevant to packet capture -
all that matters to me is 1) PF_PACKET stuff and 2) stuff involving
device modes such as some ethtool features and monitor-mode/radiotap
support - so I just look at it on occasion.)

> Addendum: I missed one, replacing some device detection boilerplate.
> Initially, `if_nameindex` was proposed, but there's already the
> `getifaddrs` based implementation that should detect all Linux
> interfaces usable by pcap since 2.3.x due to the fact that it counts
> AF_PACKET addresses, so we should be able to just remove the
> '/proc/net' and '/sys/class/net' crawling when we start expecting 2.4.

I.e., getifaddrs() will find interfaces with no networking-layer
addresses (no IPv4/IPv6/etc.) on 2.4 and later kernels?
--- End Message ---
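The setsockopt() call described above can be sketched as follows. The helper names `fill_rx_ring_req` and `request_v3_rx_ring` are invented for illustration, and this is not how libpcap structures its own code; the sketch assumes the caller picks a block size that is a multiple of both the page size and the frame size, and it does not mmap() the ring afterwards.

```c
#include <linux/if_packet.h>
#include <string.h>
#include <sys/socket.h>

/*
 * Fill a tpacket_req3 for an RX ring. The caller is assumed to pass a
 * block_size that is a multiple of the page size and of frame_size;
 * this helper (hypothetical name) does not verify that.
 */
void fill_rx_ring_req(struct tpacket_req3 *req, unsigned int block_size,
                      unsigned int block_nr, unsigned int frame_size,
                      unsigned int retire_timeout_ms)
{
    memset(req, 0, sizeof(*req));
    req->tp_block_size = block_size;
    req->tp_block_nr = block_nr;
    req->tp_frame_size = frame_size;
    req->tp_frame_nr = (block_size / frame_size) * block_nr;
    /* Age of a block, in milliseconds, at which it is retired even
     * if it is not yet full. */
    req->tp_retire_blk_tov = retire_timeout_ms;
}

/*
 * Select TPACKET_V3 on an AF_PACKET socket, then request the ring.
 * Returns 0 on success, -1 (with errno set) on failure.
 */
int request_v3_rx_ring(int fd, const struct tpacket_req3 *req)
{
    int version = TPACKET_V3;

    if (setsockopt(fd, SOL_PACKET, PACKET_VERSION,
                   &version, sizeof(version)) == -1)
        return -1;
    return setsockopt(fd, SOL_PACKET, PACKET_RX_RING, req, sizeof(*req));
}
```

With four 1 MiB blocks, 2048-byte frames and a 100 ms retire timeout, a non-full block that contains at least one packet would be handed to user space once it is about 100 ms old; per the message above, an empty block would not be delivered on current kernels.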
Re: [tcpdump-workers] Legacy Linux kernel support
--- Begin Message ---
On Tue, Oct 22, 2019 at 16:02, Guy Harris wrote:

> I.e., the goal for libpcap support on Linux should be something such as
>
>     it should work on min({kernel for oldest supported enterprise
>     distribution}, {oldest "longterm maintenance" kernel release from
>     kernel.org})

I'm more inclined to oldest longterm from kernel.org only, but I guess
so.

> OK, so TPACKET_V3 currently supports something similar to what BPF
> devices support, i.e. "deliver a block if it's full or if the timeout
> expires". The timeout is in the tp_retire_blk_tov field of a
> tpacket_req3 structure, as handed to a SOL_PACKET/PACKET_RX_RING
> setsockopt() call. It's in units of milliseconds; it doesn't refer to
> inter-packet spacing, but to the age of the block.
>
> Currently it doesn't deliver empty blocks; libpcap can handle either
> "delivers empty blocks" (as that's what BPF devices do) or "doesn't
> deliver empty blocks" (as that's what TPACKET_V3 currently does).
>
> The main difference is whether the timeout times out even if no packets
> are available; I guess code that wants to be woken up periodically,
> perhaps to do other work, even if there's no traffic that passes the
> filter, would prefer "time out even if no packets are available".

I see. We would want a way to signal we want timeouts regardless of
blocks being empty, then, right?

> OK, I guess I'll have to go back to reading that list. (It's a very
> heavy traffic list, and 99.999% of it isn't relevant to packet capture -
> all that matters to me is 1) PF_PACKET stuff and 2) stuff involving
> device modes such as some ethtool features and monitor-mode/radiotap
> support - so I just look at it on occasion.)

Wouldn't CC'ing you keep you in the loop already?

> I.e., getifaddrs() will find interfaces with no networking-layer
> addresses (no IPv4/IPv6/etc.) on 2.4 and later kernels?

Exactly. There's even a code sample showing this in the Linux manual.
--- End Message ---
Re: [tcpdump-workers] Legacy Linux kernel support
--- Begin Message ---
On Oct 22, 2019, at 1:22 PM, Mario Rugiero wrote:

> On Tue, Oct 22, 2019 at 16:02, Guy Harris wrote:
>> I.e., the goal for libpcap support on Linux should be something such as
>>
>>     it should work on min({kernel for oldest supported enterprise
>>     distribution}, {oldest "longterm maintenance" kernel release from
>>     kernel.org})
>>
> I'm more inclined to oldest longterm from kernel.org only, but I guess
> so.

If RHEL 6 matters, oldest longterm from kernel.org only doesn't work,
because RHEL 6 runs 2.6.32, according to

    https://access.redhat.com/articles/3078

so if we're going to support only the oldest longterm maintenance kernel
from kernel.org, we're not going to support RHEL 6 unless TPACKET_V3 has
been backported to the RHEL 6 kernel.

If it's not backported, *and* we continue to use TPACKET_V2 for
immediate mode, then RHEL 6 happens to still be supported to that
extent.

However, if we require any *other* mechanisms that aren't present in the
RHEL 6 kernel, that means no RHEL 6 support.

So I wouldn't claim RHEL 6 support solely on the basis of continued
TPACKET_V2 support - don't rely on the side effect.

>> OK, so TPACKET_V3 currently supports something similar to what BPF
>> devices support, i.e. "deliver a block if it's full or if the timeout
>> expires". The timeout is in the tp_retire_blk_tov field of a
>> tpacket_req3 structure, as handed to a SOL_PACKET/PACKET_RX_RING
>> setsockopt() call. It's in units of milliseconds; it doesn't refer to
>> inter-packet spacing, but to the age of the block.
>>
>> Currently it doesn't deliver empty blocks; libpcap can handle either
>> "delivers empty blocks" (as that's what BPF devices do) or "doesn't
>> deliver empty blocks" (as that's what TPACKET_V3 currently does).
>>
>> The main difference is whether the timeout times out even if no packets
>> are available; I guess code that wants to be woken up periodically,
>> perhaps to do other work, even if there's no traffic that passes the
>> filter, would prefer "time out even if no packets are available".
>>
> I see. We would want a way to signal we want timeouts regardless of
> blocks being empty, then, right?

Either that, or just change TPACKET_V3 to do that.

Originally, TPACKET_V3 delivered wakeups in a bogus fashion:

    https://www.spinics.net/lists/netdev/msg290837.html

(that's the problem we're working around).

The original developer of TPACKET_V3 claimed that empty blocks have to
be delivered:

    https://www.spinics.net/lists/netdev/msg291734.html

but didn't indicate why, so I tried to infer from the patch:

    https://www.spinics.net/lists/netdev/msg291734.html

I have no record of a response (and, for whatever reason, his original
message didn't show up in the netdev archives).

The bogus wakeups were fixed by a later patch:

    https://www.spinics.net/lists/netdev/msg315231.html

That also eliminated delivery of empty blocks to the user. A response
said "This change would break existing applications that have come to
depend on the periodic signal.":

    https://www.spinics.net/lists/netdev/msg315418.html

to which I responded:

    https://www.spinics.net/lists/netdev/msg315425.html

The author of the patch then responded:

    https://www.spinics.net/lists/netdev/msg315510.html
    https://www.spinics.net/lists/netdev/msg315528.html

The code currently has the patch, and doesn't deliver empty blocks.

>> OK, I guess I'll have to go back to reading that list. (It's a very
>> heavy traffic list, and 99.999% of it isn't relevant to packet capture -
>> all that matters to me is 1) PF_PACKET stuff and 2) stuff involving
>> device modes such as some ethtool features and monitor-mode/radiotap
>> support - so I just look at it on occasion.)
>>
> Wouldn't CC'ing you keep you in the loop already?

It might, as long as everybody keeps me on the CC list.

It doesn't look as if you've sent anything to netdev yet about tpacket
changes.
--- End Message ---
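The kernel-version thresholds that come up in this thread (3.16 as the proposed support floor, 3.19 as the release carrying the TPACKET_V3 wakeup fix, 2.6.32 for RHEL 6) suggest a run-time check along these lines. This is only a sketch with invented helper names, not libpcap code, and a distribution kernel may carry backports that a plain version comparison cannot see.

```c
#include <stdio.h>
#include <sys/utsname.h>

/*
 * Parse "major.minor" from a release string such as
 * "2.6.32-754.el6.x86_64". Returns 0 on success, -1 if the string does
 * not start with two dot-separated numbers.
 */
int parse_kernel_release(const char *release, int *major, int *minor)
{
    return sscanf(release, "%d.%d", major, minor) == 2 ? 0 : -1;
}

/* Return 1 if major.minor is at least want_major.want_minor, else 0. */
int version_at_least(int major, int minor, int want_major, int want_minor)
{
    if (major != want_major)
        return major > want_major;
    return minor >= want_minor;
}

/*
 * Return 1 if the running kernel is at least want_major.want_minor,
 * 0 if it is older, -1 if the version could not be determined.
 */
int running_kernel_at_least(int want_major, int want_minor)
{
    struct utsname u;
    int major, minor;

    if (uname(&u) == -1)
        return -1;
    if (parse_kernel_release(u.release, &major, &minor) == -1)
        return -1;
    return version_at_least(major, minor, want_major, want_minor);
}
```

A caller could warn when `running_kernel_at_least(3, 16)` returns 0, and keep the TPACKET_V3 workaround enabled when `running_kernel_at_least(3, 19)` returns 0.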
Re: [tcpdump-workers] Legacy Linux kernel support
--- Begin Message ---
On Tue, Oct 22, 2019 at 18:01, Guy Harris wrote:

> If RHEL 6 matters, oldest longterm from kernel.org only doesn't work,
> because RHEL 6 runs 2.6.32, according to
>
>     https://access.redhat.com/articles/3078
>
> so if we're going to support only the oldest longterm maintenance kernel
> from kernel.org, we're not going to support RHEL 6 unless TPACKET_V3 has
> been backported to the RHEL 6 kernel.
>
> If it's not backported, *and* we continue to use TPACKET_V2 for
> immediate mode, then RHEL 6 happens to still be supported to that
> extent.
>
> However, if we require any *other* mechanisms that aren't present in the
> RHEL 6 kernel, that means no RHEL 6 support.
>
> So I wouldn't claim RHEL 6 support solely on the basis of continued
> TPACKET_V2 support - don't rely on the side effect.

Exactly. I'm against supporting it if it requires extra work. I don't
think libpcap 1.10 is an absolute need in a scenario where you have to
deal with RHEL 6, except possibly for security fixes, but those will
have to be backported by Red Hat anyway.

> Either that, or just change TPACKET_V3 to do that.

Yes, that's what I was proposing.

> Originally, TPACKET_V3 delivered wakeups in a bogus fashion:
>
> ...
>
> The code currently has the patch, and doesn't deliver empty blocks.

I'll read these carefully later, but my take on it is that TPACKET_V3
used to support our use case, so in principle a patch to restore it
could be accepted.
From the previous paragraph, I find it unclear whether it is the posting
of empty blocks or its absence that would break use cases, but I guess
I'll know after reading the mails.

> It doesn't look as if you've sent anything to netdev yet about tpacket
> changes.

I haven't, I wanted to discuss this here first.
--- End Message ---
Re: [tcpdump-workers] Legacy Linux kernel support
--- Begin Message ---
On Oct 22, 2019, at 2:24 PM, Mario Rugiero wrote:

> On Tue, Oct 22, 2019 at 18:01, Guy Harris wrote:
>>
>> If RHEL 6 matters, oldest longterm from kernel.org only doesn't work,
>> because RHEL 6 runs 2.6.32, according to
>>
>>     https://access.redhat.com/articles/3078
>>
>> so if we're going to support only the oldest longterm maintenance
>> kernel from kernel.org, we're not going to support RHEL 6 unless
>> TPACKET_V3 has been backported to the RHEL 6 kernel.
>>
>> If it's not backported, *and* we continue to use TPACKET_V2 for
>> immediate mode, then RHEL 6 happens to still be supported to that
>> extent.
>>
>> However, if we require any *other* mechanisms that aren't present in
>> the RHEL 6 kernel, that means no RHEL 6 support.
>>
>> So I wouldn't claim RHEL 6 support solely on the basis of continued
>> TPACKET_V2 support - don't rely on the side effect.
>>
> Exactly. I'm against supporting it if it requires extra work. I don't
> think libpcap 1.10 is an absolute need in a scenario where you have to
> deal with RHEL 6, except possibly for security fixes, but those will
> have to be backported by Red Hat anyway.

So we'll say "oldest longterm maintenance kernel from kernel.org", and
if it also happens to work on your enterprise Linux with a pre-3.16
kernel, consider yourself lucky; we won't make any effort to support
RHEL 6 or other enterprise distribution releases with pre-3.16 kernels.

>> Either that, or just change TPACKET_V3 to do that.
>>
> Yes, that's what I was proposing.

The proposal was "We would want a way to signal we want timeouts
regardless of blocks being empty, then, right?"; I was suggesting just
delivering empty blocks no matter what, if there's code that depends on
it.

Libpcap itself can work either way, and most libpcap applications used
on both Linux and a platform with BPF devices can work either way, as
they don't get timeouts with an empty buffer on Linux and they do get
them with BPF, so I'm not sure there's a strong need to have TPACKET_V3
support both with an option to specify which one.

>> Originally, TPACKET_V3 delivered wakeups in a bogus fashion:
>>
>> ...
>>
>> The code currently has the patch, and doesn't deliver empty blocks.
>>
> I'll read these carefully later, but my take on it is that TPACKET_V3
> used to support our use case, so in principle a patch to restore it
> could be accepted.

libpcap's use case doesn't require delivery of empty blocks; we make no
promise that pcap_dispatch() or pcap_next() or pcap_next_ex() will
return within the specified timeout interval. I don't think Solaris
DLPI with bufmod delivers empty packets, for example.

There may still be some programs that expect pcap_dispatch() or
pcap_next() or pcap_next_ex() to return after the timeout expires, but
since that doesn't happen on Linux with TPACKET_V3, most programs that
used to do so have probably been changed not to do so.

There may be programs directly using PF_PACKET sockets with TPACKET_V3
that expected empty blocks to be delivered, but given that 3.19 was
released on 2015-02-08, and contained the patch that caused empty blocks
not to be delivered, I suspect most programs that used to do so no
longer do.

> From the previous paragraph, I find it unclear whether it is the posting
> of empty blocks or its absence that would break use cases, but I guess
> I'll know after reading the mails.

It's the *absence* of empty block delivery that was cited as potentially
breaking code.
--- End Message ---
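Since the capture timeout is not guaranteed to wake a program up when no packets arrive, code that needs a periodic wakeup can impose its own timeout on the descriptor. The sketch below uses plain poll(); `wait_readable` is an invented name, and in a libpcap program the descriptor would presumably come from pcap_get_selectable_fd(), which is an assumption about the caller and not something shown here.

```c
#include <poll.h>

/*
 * Wait until fd is readable or timeout_ms milliseconds have elapsed.
 * Returns 1 if the descriptor is readable, 0 if the timeout expired
 * with nothing to read, -1 on error (including an error condition
 * reported on the descriptor without POLLIN).
 */
int wait_readable(int fd, int timeout_ms)
{
    struct pollfd pfd;
    int rc;

    pfd.fd = fd;
    pfd.events = POLLIN;
    pfd.revents = 0;

    rc = poll(&pfd, 1, timeout_ms);
    if (rc <= 0)
        return rc; /* 0: timed out; -1: poll() failed, errno is set */
    return (pfd.revents & POLLIN) ? 1 : -1;
}
```

A loop built on this can do its periodic work whenever the function returns 0 and read packets when it returns 1, regardless of whether the kernel delivers empty blocks.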