Rick Jones wrote:
On second thought, do you think we still need one rwlock per hash chain?

TCP established hash table entries: 1048576 (order: 12, 16777216 bytes)

On this x86_64 machine, we 'waste' 8 MB of RAM for those rwlocks (1048576 locks x 8 bytes each, half of the 16 MB table).

With RCU, we touch these rwlocks only on TCP connection creation/deletion, so maybe we could reduce this to one rwlock, or to a hashed array of 2^N rwlocks (N depending on NR_CPUS), like in net/ipv4/route.c?
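[For reference, a hashed lock array could look like the following sketch, modeled on the net/ipv4/route.c scheme; the names (EHASH_LOCK_SZ, ehash_locks, ehash_lock_addr) are illustrative, not existing kernel identifiers:

#include <linux/spinlock.h>

#ifdef CONFIG_SMP
# define EHASH_LOCK_SZ 256   /* could be scaled from NR_CPUS */
#else
# define EHASH_LOCK_SZ 1
#endif

/* each lock must be rwlock_init()ed once at table setup time */
static rwlock_t ehash_locks[EHASH_LOCK_SZ];

/* Many hash chains share one lock, picked from the low bits of the slot. */
static inline rwlock_t *ehash_lock_addr(unsigned int slot)
{
        return &ehash_locks[slot & (EHASH_LOCK_SZ - 1)];
}
]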

TCP connection creation/deletion can be a rather frequent occurrence: web servers and all, even with persistent connections. Contention there would show up in a SPECweb benchmark, just as an example. SPECweb is a bit involved, especially SPECweb2005. The netperf TCP_CRR or TCP_CC tests could be used, likely aggregates of them, to measure things, but it might take more than a small-CPU-count system to see contention when going to a "too small" number of locks.

I suspect that one rwlock would be too small :) Hashed locks might work - would they introduce more cache misses?

Yes, it could introduce cache misses and SMP cache-line bouncing, but reducing the memory footprint could help some workloads too.

Maybe we could use a special structure with one rwlock per cache line? This would 'waste' 1/8 of the RAM on 64-bit platforms, and 1/16 or 1/32 on 32-bit platforms (depending on L1_CACHE_BYTES), instead of 1/2:

/*
 * Pack one rwlock plus as many chain heads as fit in the rest of
 * a cache line; the rwlock protects every chain in its bucket.
 */
#define PTRS_PER_CACHELINE ((L1_CACHE_BYTES - sizeof(rwlock_t)) / sizeof(struct hlist_head))

struct hash_agg_bucket {
        rwlock_t          wlock;
        struct hlist_head chains[PTRS_PER_CACHELINE];
} ____cacheline_aligned_in_smp; /* keep each bucket on its own cache line */

A division by a constant (7, 15, or 31...) is fast since gcc emits a multiply instead, so computing the addresses of the wlock and the hlist_head would not be very expensive.
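
[For illustration, the address computations could be wrapped like this; tcp_agg_buckets and the helper names are hypothetical:

/* Map a flat chain number onto its bucket, lock and chain head.
 * tcp_agg_buckets[] is an assumed array of struct hash_agg_bucket. */
static inline struct hash_agg_bucket *tcp_agg_bucket(unsigned int hash)
{
        /* gcc turns the division by a constant into a multiply */
        return &tcp_agg_buckets[hash / PTRS_PER_CACHELINE];
}

static inline rwlock_t *tcp_chain_lock(unsigned int hash)
{
        return &tcp_agg_bucket(hash)->wlock;
}

static inline struct hlist_head *tcp_chain_head(unsigned int hash)
{
        return &tcp_agg_bucket(hash)->chains[hash % PTRS_PER_CACHELINE];
}
]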

Eric