On Fri, Oct 9, 2020 at 6:07 PM Cong Wang <xiyou.wangc...@gmail.com> wrote:
>
> Looking a bit deeper, I doubt the ipgre_header_ops->create is necessary,
> because 1) all other tunnels devices do not have it (ip_tunnel_header_ops
> only contains ->parse_protocol); 2) GRE headers are pushed in xmit
> anyway, so at least SOCK_DGRAM does not need it; 3) ipgre_header()
> creates the GRE header, later ipgre_xmit() pulls it back, then __gre_xmit()
> builds GRE header again...

Haha, I also don't like ipgre_header_ops->create and want to get rid
of it. We are thinking the same :)

>From what I see, the flow when sending skbs (when header_ops is used)
is like this:

1) First ipgre_header creates the IP header and the GRE base header,
leaving space for the GRE optional fields and the "encap" header (the
space for the "encap" header should be left before the GRE header, but
it is incorrectly left after the GRE header).

2) Then ipgre_xmit pulls the header created by ipgre_header (but
leaves the data there). Then it calls __gre_xmit.

3) __gre_xmit calls gre_build_header. gre_build_header will push back
the GRE header, read the GRE base header and build the GRE optional
fields.

4) Then __gre_xmit calls ip_tunnel_xmit. ip_tunnel_xmit will build the
"encap" header by calling ip_tunnel_encap.

So what ipgre_header does is actually creating the IP header and the
GRE header, and leaving some holes for the GRE optional fields and the
"encap" header to be built later.

This seems so weird to me. If a user is using an AF_PACKET/RAW socket,
the user is supposed to do what the header_ops->create function does
(that is, creating two headers and leaving two holes to be filled in
later). I think no user would actually do that. That is so weird.

Also you said, all other tunnel devices didn't have
header_ops->create. I think that is a good argument for getting rid of
header_ops->create, too.

Also, for IP GRE TAP devices (as I understand, these devices are just
like IP GRE devices except it tunnels Ethernet frames instead of
network-layer packets), its header_ops will be Ethernet's
eth_header_ops (ipgre_tap_setup calls ether_setup, which sets up the
Ethernet header_ops). In this case, I think it is logical to keep the
header_ops for IP GRE devices as NULL. So that the header_ops of IP
GRE devices is consistent with that of IP GRE TAP devices.

Reply via email to