Arnaldo Carvalho de Melo wrote:
On 10/31/06, Eric Dumazet <[EMAIL PROTECTED]> wrote:
Arnaldo Carvalho de Melo wrote:
> Hi,
>
> I've been working on some DWARF2 utilities and one of them,
> pahole (Poke-a-Hole) can be used to find holes due to alignment rules
> in structs, the full output of:
>
> [EMAIL PROTECTED] net-2.6]$ pahole net/ipv4/tcp.o
>
> is available at:
>
> http://oops.merseine.nu:81/acme/net.ipv4.tcp.o.pahole
>
> Just to show what we can find with this tool, here is the layout
> of struct net_device, which, barring any cacheline locality optimization,
> has 4 bytes to harvest. David, do you think reordering those fields to
> get 4 bytes back is ok?
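For readers unfamiliar with alignment holes, here is a minimal sketch of the kind of padding pahole reports. The two structs are hypothetical and are not taken from struct net_device; the sizes in the comments assume a typical ABI where uint32_t is 4-byte aligned.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical example, not a kernel struct. */
struct with_holes {
	uint8_t  a;	/* 1 byte, usually followed by a 3-byte hole */
	uint32_t b;	/* must start on a 4-byte boundary */
	uint8_t  c;	/* 1 byte, usually followed by 3 bytes of tail padding */
};

/* Same members, largest first, so the small ones pack together. */
struct reordered {
	uint32_t b;
	uint8_t  a;
	uint8_t  c;
};

/* Bytes in each struct that hold no member data. */
static size_t padding_with_holes(void)
{
	return sizeof(struct with_holes) - (2 * sizeof(uint8_t) + sizeof(uint32_t));
}

static size_t padding_reordered(void)
{
	return sizeof(struct reordered) - (2 * sizeof(uint8_t) + sizeof(uint32_t));
}

/* Reordering never makes this pair larger. */
static_assert(sizeof(struct reordered) <= sizeof(struct with_holes),
	      "reordered layout should not grow");
```

Comparing padding_with_holes() with padding_reordered() shows how many bytes the reordering harvests on a given ABI.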
I just want to bring to your attention that this net_device structure was
re-ordered (by me :)) so that separate cache lines are used on SMP machines.
If you select CONFIG_SMP, you'll probably notice far more holes. But it was a
feature, not laziness.
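A rough userspace sketch of deliberate cache-line separation follows. The struct, its field names, and the 64-byte line size are assumptions for illustration; the kernel uses its own cacheline alignment annotations rather than C11 alignas, and this is not the real net_device layout.

```c
#include <assert.h>
#include <stdalign.h>
#include <stddef.h>

#define CACHE_LINE 64	/* assumed line size, varies by CPU */

/* Hypothetical device struct, not the real net_device. */
struct dev_sketch {
	/* Read-mostly configuration, shared by all CPUs. */
	unsigned int mtu;
	unsigned int flags;

	/*
	 * Intentional hole: the transmit counters are written on every
	 * packet, so they start on their own cache line.
	 */
	alignas(CACHE_LINE) unsigned long tx_packets;
	unsigned long tx_bytes;

	/* Intentional hole: the receive path gets its own line as well. */
	alignas(CACHE_LINE) unsigned long rx_packets;
	unsigned long rx_bytes;
};

static_assert(offsetof(struct dev_sketch, tx_packets) % CACHE_LINE == 0,
	      "tx fields must start a cache line");
static_assert(offsetof(struct dev_sketch, rx_packets) % CACHE_LINE == 0,
	      "rx fields must start a cache line");
```

A hole finder would flag the gap after flags as wasted space, even though it is what keeps writers on one path from invalidating the line read by another.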
Thanks for commenting on this case!
We can probably move some fields, but very carefully :)
Of course, in time I probably will try to combine valgrind's
cachegrind or some new tool using the same principles I used in OSTRA
to find out working sets of struct members to do automatic
"suggestions" on how to reorder structs to avoid holes while keeping
the relevant struct members close together as to exploit cacheline
locality effects, like you do so well manually :-)
- Arnaldo
PS.: While we don't have tools to check out that the holes are not a
problem because we want to exploit cacheline locality effects... what
about some comments on the structs to explain that such holes are not
a problem? :-)
I am all for automatic tools, if they can convince human beings :)
For example, I am using an optimization that is quite simple but which was not
accepted by the netdev community:
- Moving the struct flowi directly into "struct dst_entry", right after the
'struct dst_entry *next;' pointer.
AFAIK all objects that include a 'struct dst_entry' also include a 'struct
flowi', so this is just a small violation of layering.
This really helps because lookups now touch only one cache line per chained
item instead of two or three. On loaded routers with 8 items per chain, that's
8 or 16 cache lines the CPU doesn't have to bring into its cache per IP packet.
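To make the idea concrete, here is a simplified sketch with stand-in types. key_sketch and entry_sketch are hypothetical and much smaller than the real struct flowi and struct dst_entry, and the one-line claim assumes 64-byte cache lines and entries allocated on cache-line boundaries.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in for struct flowi: only a few illustrative key fields. */
struct key_sketch {
	unsigned int dst;
	unsigned int src;
	unsigned int oif;
};

/* Stand-in for struct dst_entry with the key placed right after next. */
struct entry_sketch {
	struct entry_sketch *next;
	struct key_sketch key;		/* compared on every chain step */
	unsigned char rest[200];	/* everything a lookup miss never reads */
};

/* The chain pointer and the key sit together at the start of the entry. */
static_assert(offsetof(struct entry_sketch, key) + sizeof(struct key_sketch) <= 64,
	      "next and key should fit in the first 64 bytes");

/* Walk one hash chain; each step reads only next and key. */
static struct entry_sketch *chain_lookup(struct entry_sketch *head,
					 const struct key_sketch *k)
{
	for (struct entry_sketch *e = head; e; e = e->next)
		if (e->key.dst == k->dst && e->key.src == k->src &&
		    e->key.oif == k->oif)
			return e;
	return NULL;
}
```

With the key stored elsewhere in the containing object, each non-matching entry would cost at least one more cache line fetch before the walk can move on.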
Eric