Rick Jones wrote:
>> On a second thought, do you think we still need one rwlock per hash
>> chain?
>>
>> TCP established hash table entries: 1048576 (order: 12, 16777216 bytes)
>>
>> On this x86_64 machine, we 'waste' 8 MB of RAM for those rwlocks.
>>
>> With RCU, we touch these rwlocks only on TCP connection
>> creation/deletion; maybe we could reduce to one rwlock or a hashed
>> array of 2^N rwlocks (2^N depending on NR_CPUS), like in
>> net/ipv4/route.c?
>
> TCP connection creation/deletion can be a rather frequent occurrence. Web
> servers and all, even with persistent connections. Contention there would
> show up in a SPECweb benchmark, just as an example. SPECweb is a bit
> involved - especially SPECweb2005. The netperf TCP_CRR or TCP_CC tests
> could be used - likely aggregates of them - to measure things, but it
> might take more than a small-CPU-count system to see contention going to
> a "too small" number of locks.
>
> I suspect that one rwlock would be too small :) Hashed locks might work
> - would they introduce more cache misses?
Yes, it could introduce cache misses and SMP cache line bouncing, but
reducing the memory footprint could help some workloads too.

Maybe we could use a special structure with one rwlock per cache line?
This would 'waste' 1/8 of the RAM on 64-bit platforms, and 1/16 or 1/32
on 32-bit platforms, instead of 1/2.
#define PTRS_PER_CACHELINE \
	((L1_CACHE_BYTES - sizeof(rwlock_t)) / sizeof(struct hlist_head))

struct hash_agg_bucket {
	rwlock_t wlock;
	struct hlist_head chains[PTRS_PER_CACHELINE];
};
A division by a constant (7, 15, or 31...) is fast since gcc emits a
multiply instead, so computing the addresses of wlock and the right
hlist_head would not be very expensive.
Eric