On Fri, Oct 9, 2020 at 6:07 PM Cong Wang <xiyou.wangc...@gmail.com> wrote: > > Looking a bit deeper, I doubt the ipgre_header_ops->create is necessary, > because 1) all other tunnels devices do not have it (ip_tunnel_header_ops > only contains ->parse_protocol); 2) GRE headers are pushed in xmit > anyway, so at least SOCK_DGRAM does not need it; 3) ipgre_header() > creates the GRE header, later ipgre_xmit() pulls it back, then __gre_xmit() > builds GRE header again...
Haha, I also don't like ipgre_header_ops->create and want to get rid of it. We are thinking the same :) >From what I see, the flow when sending skbs (when header_ops is used) is like this: 1) First ipgre_header creates the IP header and the GRE base header, leaving space for the GRE optional fields and the "encap" header (the space for the "encap" header should be left before the GRE header, but it is incorrectly left after the GRE header). 2) Then ipgre_xmit pulls the header created by ipgre_header (but leaves the data there). Then it calls __gre_xmit. 3) __gre_xmit calls gre_build_header. gre_build_header will push back the GRE header, read the GRE base header and build the GRE optional fields. 4) Then __gre_xmit calls ip_tunnel_xmit. ip_tunnel_xmit will build the "encap" header by calling ip_tunnel_encap. So what ipgre_header does is actually creating the IP header and the GRE header, and leaving some holes for the GRE optional fields and the "encap" header to be built later. This seems so weird to me. If a user is using an AF_PACKET/RAW socket, the user is supposed to do what the header_ops->create function does (that is, creating two headers and leaving two holes to be filled in later). I think no user would actually do that. That is so weird. Also you said, all other tunnel devices didn't have header_ops->create. I think that is a good argument for getting rid of header_ops->create, too. Also, for IP GRE TAP devices (as I understand, these devices are just like IP GRE devices except it tunnels Ethernet frames instead of network-layer packets), its header_ops will be Ethernet's eth_header_ops (ipgre_tap_setup calls ether_setup, which sets up the Ethernet header_ops). In this case, I think it is logical to keep the header_ops for IP GRE devices as NULL. So that the header_ops of IP GRE devices is consistent with that of IP GRE TAP devices.