On Tue, Sep 8, 2020 at 2:07 AM Xie He <xie.he.0...@gmail.com> wrote: > > Thank you for your comment! > > On Mon, Sep 7, 2020 at 2:41 AM Willem de Bruijn > <willemdebruijn.ker...@gmail.com> wrote: > > > > On Sun, Sep 6, 2020 at 5:18 AM Xie He <xie.he.0...@gmail.com> wrote: > > > > > > This comment is outdated and no longer reflects the actual implementation > > > of af_packet.c. > > > > If it was previously true, can you point to a commit that changes the > > behavior? > > This is my understanding about the history of "af_packet.c": > > 1. Pre git history > > At first, before "needed_headroom" was introduced, "hard_header_len" > was the only way for a driver to request headroom. However, > "hard_header_len" was also used in "af_packet.c" for processing the > header. There was a confusion / disagreement between "af_packet.c" > developers and driver developers about the use of "hard_header_len". > "af_packet.c" developers would assume that all headers were visible to > them through dev->header_ops (called dev->hard_header at that time?). > But the developers of some drivers were not able to expose all their > headers to "af_packet.c" through header_ops (for example, in tunnel > drivers). These drivers still requested the headroom via > "hard_header_len" but this created bugs for "af_packet.c" because > "af_packet.c" would assume "hard_header_len" was the length of the > header visible to them through header_ops. > > Therefore, in Linux version 2.1.43pre1, the FIXME comment was added. > In this comment, "af_packet.c" developers clearly stated that not > exposing the header through header_ops was a bug that needed to be > fixed in the drivers. But I think driver developers were not able to > agree because some drivers really had a need to add their own header > without using header_ops (for example in tunnel drivers). > > In Linux version 2.1.68, the developer of "af_packet.c" compromised > and recognized the use of "hard_header_len" even when there is no > header_ops, by adding the comment I'm trying to change now. But I > guess some other developers of "af_packet.c" continued to treat > "hard_header_len" to be the length of header of header_ops and created > a lot of problems. > > 2. Introduction of "needed_headroom" > > Because this issue has troubled for developers for long, in 2008, > developers introduced "needed_headroom" to solve this problem. > "needed_headroom" has only one purpose - reserve headroom. It is not > used in af_packet.c for processing so drivers can safely use it to > request headroom without exposing the header via header_ops. > > The commit was: > commit f5184d267c1a ("net: Allow netdevices to specify needed head/tailroom") > > After "needed_headroom" was introduced, all drivers that needed to > reserve the headroom but didn't want "af_packet.c" to interfere should > change to "needed_headroom". > > From this point on, "af_packet.c" developers were able to assume > "hard_header_len" was only used for header processing purposes in > "af_packet.c".
Very nice archeology! Thanks for summarizing. > 3. Not reserving the headroom of hard_header_len for RAW sockets > > Another very important point in history is these two commits in 2018: > commit b84bbaf7a6c8 ("packet: in packet_snd start writing at link > layer allocation") > commit 9aad13b087ab ("packet: fix reserve calculation") > > These two commits changed packet_snd to the present state and made it > no long reserve the headroom of hard_header_len for RAW sockets. This > made drivers' switching from hard_header_len to needed_headroom became > urgent because otherwise they might have a kernel panic when used with > RAW sockets. > > > > In this file, the function packet_snd first reserves a headroom of > > > length (dev->hard_header_len + dev->needed_headroom). > > > Then if the socket is a SOCK_DGRAM socket, it calls dev_hard_header, > > > which calls dev->header_ops->create, to create the link layer header. > > > If the socket is a SOCK_RAW socket, it "un-reserves" a headroom of > > > length (dev->hard_header_len), and checks if the user has provided a > > > header of length (dev->hard_header_len) (in dev_validate_header). > > > > Not entirely, a header greater than dev->min_header_len that passes > > dev_validate_header. > > Yes, I understand. The function checks both hard_header_len and > min_header_len. I want to explain the role of hard_header_len in > dev_validate_header. But I feel a little hard to concisely explain > this without simplifying a little bit. Ack. > > > /* > > > Assumptions: > > > - - if device has no dev->hard_header routine, it adds and removes ll > > > header > > > - inside itself. In this case ll header is invisible outside of > > > device, > > > - but higher levels still should reserve dev->hard_header_len. > > > - Some devices are enough clever to reallocate skb, when header > > > - will not fit to reserved space (tunnel), another ones are silly > > > - (PPP). > > > + - If the device has no dev->header_ops, there is no LL header visible > > > + outside of the device. In this case, its hard_header_len should be > > > 0. > > > > Such a constraint is more robustly captured with a compile time > > BUILD_BUG_ON check. Please do add a comment that summarizes why the > > invariant holds. > > I'm not sure how to do this. I guess both header_ops and > hard_header_len are assigned at runtime. (Right?) I guess we are not > able to check this at compile-time. header_ops should be compile constant, and most devices use struct initializers for hard_header_len, but of course you're right. Perhaps a WARN_ON_ONCE, then. > > More about the older comment, but if reusing: it's not entirely clear > > to me what "outside of the device" means. The upper layers that > > receive data from the device and send data to it, including > > packet_snd, I suppose? Not the lower layers, clearly. Maybe that can > > be more specific. > > Yes, right. If a header is visible "outside of the device", it means > the header is exposed to upper layers via "header_ops". If a header is > not visible "outside of the device" and is only used "internally", it > means the header is not exposed to upper layers via "header_ops". > Maybe we can change it to "outside of the device driver"? We can > borrow the idea of encapsulation in object-oriented programming - some > things that happen inside a software component should not be visible > outside of that software component. How about "above"? If sketched as a network stack diagram, the code paths and devices below the (possibly tunnel) device do see packets with link layer header.