Re: [RFC] networking structure holes

Eric Dumazet Wed, 01 Nov 2006 22:56:15 -0800

Arnaldo Carvalho de Melo wrote:

On 10/31/06, Eric Dumazet <[EMAIL PROTECTED]> wrote:

Arnaldo Carvalho de Melo wrote:
> Hi,
>
>       I've been working on some DWARF2 utilities and one of them,
> pahole (Poke-a-Hole) can be used to find holes due to alignment rules
> in structs, the full output of:
>
> [EMAIL PROTECTED] net-2.6]$ pahole net/ipv4/tcp.o
>
>       is available at:
>
> http://oops.merseine.nu:81/acme/net.ipv4.tcp.o.pahole
>
>       Just to show what we can find with this tool here is the layout
> of struct net_device, which, barring any cacheline locality optimization,
> has 4 bytes to harvest. David, do you think reordering those fields to
> get 4 bytes back is OK?
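
For readers who have not seen an alignment hole, here is a minimal sketch in C. The structs `holey` and `reordered`, the `HOLE_AFTER` macro and the helper functions are hypothetical names made up for this illustration; they are not from the kernel and not pahole's output. Exact sizes depend on the ABI, so the code only reports and compares them.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical struct for illustration; not a kernel structure. */
struct holey {
    char  flag;   /* padding usually follows so that 'ptr' is aligned */
    void *ptr;
    short id;     /* trailing padding usually follows */
};

/* Same members, ordered from largest alignment to smallest. */
struct reordered {
    void *ptr;
    short id;
    char  flag;
};

/* Padding bytes between the end of member a and the start of member b. */
#define HOLE_AFTER(type, a, b) \
    (offsetof(type, b) - (offsetof(type, a) + sizeof(((type *)0)->a)))

/* Sum of the member sizes; identical for both structs. */
#define MEMBER_BYTES (sizeof(char) + sizeof(void *) + sizeof(short))

size_t padding_holey(void)     { return sizeof(struct holey) - MEMBER_BYTES; }
size_t padding_reordered(void) { return sizeof(struct reordered) - MEMBER_BYTES; }

/* Reordering must never make the struct larger on the ABIs we care about. */
static_assert(sizeof(struct reordered) <= sizeof(struct holey),
              "reordering should not grow the struct");

/* Print the sizes and the hole after 'flag' for the current ABI. */
void report_holes(void)
{
    printf("holey:     size %zu, padding %zu, hole after flag %zu\n",
           sizeof(struct holey), padding_holey(),
           (size_t)HOLE_AFTER(struct holey, flag, ptr));
    printf("reordered: size %zu, padding %zu\n",
           sizeof(struct reordered), padding_reordered());
}
```

Calling `report_holes()` on a given machine shows how many bytes the reordering recovers there; pahole derives the same kind of information from DWARF debug data instead of from the source.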

I just want to bring to your attention that this net_device structure was
re-ordered (by me :)) so that separate cache lines are used on SMP machines.

If you select CONFIG_SMP, you'll probably notice far more holes. But it was a
feature, not laziness.
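
A rough userspace sketch of why such holes can be deliberate: groups of fields written from different contexts are pushed onto separate cache lines, and the padding in between is the price. The struct `dev_sketch`, its fields and the 64-byte `CACHE_LINE` value are assumptions chosen for illustration. This is not the real net_device layout, and the kernel uses its own alignment annotations rather than C11 `alignas`.

```c
#include <assert.h>
#include <stdalign.h>
#include <stddef.h>

#define CACHE_LINE 64   /* assumption: 64-byte cache lines */

/* Hypothetical device struct; field names are illustrative only. */
struct dev_sketch {
    /* Read-mostly configuration. */
    unsigned int mtu;
    unsigned int flags;

    /* Written on every transmit: starts on its own cache line. */
    alignas(CACHE_LINE) unsigned long tx_packets;
    unsigned long tx_bytes;

    /* Written on every receive: starts on yet another cache line. */
    alignas(CACHE_LINE) unsigned long rx_packets;
    unsigned long rx_bytes;
};

/* Index of the cache line that holds a given offset, assuming the
 * object itself starts on a cache-line boundary. */
size_t cache_line_of(size_t offset)
{
    return offset / CACHE_LINE;
}

/* The deliberate holes: each group begins on a line boundary. */
static_assert(offsetof(struct dev_sketch, tx_packets) % CACHE_LINE == 0,
              "tx group must start a cache line");
static_assert(offsetof(struct dev_sketch, rx_packets) % CACHE_LINE == 0,
              "rx group must start a cache line");
```

A hole-finding tool would flag most of this struct as padding, yet removing that padding would put the transmit and receive counters on the same line as the read-mostly fields, so every counter update would invalidate that line on the other CPUs.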


Thanks for commenting on this case!

We can probably move some fields, but very carefully :)


Of course, in time I probably will try to combine valgrind's
cachegrind or some new tool using the same principles I used in OSTRA
to find out working sets of struct members to do automatic
"suggestions" on how to reorder structs to avoid holes while keeping
the relevant struct members close together so as to exploit cacheline
locality effects, like you do so well manually :-)

- Arnaldo

PS.: While we don't have tools to check out that the holes are not a
problem because we want to exploit cacheline locality effects... what
about some comments on the structs to explain that such holes are not
a problem? :-)


I am all for automatic tools, if they can convince human beings :)

For example, I am using an optimization that is quite simple but which was not accepted by the netdev community:

- Moving the struct flowi directly into "struct dst_entry", right after the 'struct dst_entry *next;' pointer.

AFAIK all objects that include a 'struct dst_entry' also include a 'struct flowi', so this is just a small violation of layering.

This really helps because lookups now touch only one cache line per chained item instead of two or three. On loaded routers with 8 items per chain, that's 8 or 16 cache lines the CPU doesn't have to bring into its cache per IP packet.
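
To make the effect concrete, here is a simplified sketch, not the actual patch. `key_sketch` stands in for struct flowi, `item_far` and `item_near` stand in for the chained objects, and `OTHER_BYTES` and the 64-byte `CACHE_LINE` are assumed values. It counts how many cache lines a chain walk touches per item when the lookup key sits far from `next` versus right after it, assuming each item starts on a cache-line boundary.

```c
#include <assert.h>
#include <stddef.h>

#define CACHE_LINE  64    /* assumption: 64-byte cache lines */
#define OTHER_BYTES 128   /* assumption: size of the fields not needed by lookup */

/* Stand-in for the lookup key (struct flowi in the real code). */
struct key_sketch { unsigned int dst, src; unsigned char tos; };

/* Key placed after the other fields, far from 'next'. */
struct item_far {
    struct item_far *next;
    char other[OTHER_BYTES];
    struct key_sketch key;
};

/* Key placed directly after 'next'. */
struct item_near {
    struct item_near *next;
    struct key_sketch key;
    char other[OTHER_BYTES];
};

static_assert(offsetof(struct item_near, key) + sizeof(struct key_sketch)
              <= CACHE_LINE, "next and key must share the first cache line");

/* Distinct cache lines covered by the 'next' pointer and by the key. */
size_t lines_touched(size_t next_off, size_t next_size,
                     size_t key_off, size_t key_size)
{
    size_t n_first = next_off / CACHE_LINE;
    size_t n_last  = (next_off + next_size - 1) / CACHE_LINE;
    size_t k_first = key_off / CACHE_LINE;
    size_t k_last  = (key_off + key_size - 1) / CACHE_LINE;
    size_t lo = n_first < k_first ? n_first : k_first;
    size_t hi = n_last > k_last ? n_last : k_last;
    size_t count = 0;

    for (size_t l = lo; l <= hi; l++)
        if ((l >= n_first && l <= n_last) || (l >= k_first && l <= k_last))
            count++;
    return count;
}

size_t lines_touched_far(void)
{
    return lines_touched(offsetof(struct item_far, next), sizeof(void *),
                         offsetof(struct item_far, key), sizeof(struct key_sketch));
}

size_t lines_touched_near(void)
{
    return lines_touched(offsetof(struct item_near, next), sizeof(void *),
                         offsetof(struct item_near, key), sizeof(struct key_sketch));
}

/* Chain walk for the 'near' layout: compares keys, follows 'next'. */
struct item_near *lookup_near(struct item_near *head, const struct key_sketch *want)
{
    for (struct item_near *it = head; it != NULL; it = it->next)
        if (it->key.dst == want->dst && it->key.src == want->src &&
            it->key.tos == want->tos)
            return it;
    return NULL;
}
```

With these assumed sizes, the far layout touches two lines per item and the near layout one, so an 8-item chain costs 16 lines versus 8 per lookup. The real saving depends on the actual struct sizes and on where the allocator places each object.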


Eric
