Benjamin LaHaise wrote:
> On Tue, Mar 07, 2006 at 01:04:36PM +1100, Nick Piggin wrote:
>> I'd say it will turn out to be more trouble than it's worth, for the
>> miserly cost avoiding one atomic_inc, and one atomic_dec_and_test on
>> page-local data that will be in L1 cache. I'd never turn my nose up at
>> anyone just having a go though :)
>
> The cost is anything but miserly. Consider that every lock instruction is
> a memory barrier which takes your OoO CPU with lots of instructions in flight
> to ramp down to just 1 for the time it takes that instruction to execute.
> That synchronization is what makes the atomic expensive.
Yeah, x86(-64) is a _little_ worse off in that regard because its locks
imply rmbs.
But I'm saying the cost is miserly compared to the likely overheads
of using RCU-ed page freeing, when taken as impact on the system as a
whole.
Though definitely, if we can get rid of atomic ops for free in any low
level page handling functions in mm/, then we want to do that.
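As a rough illustration of the pair of operations being argued about, here is a user-space sketch using C11 <stdatomic.h> rather than the kernel's own atomic_inc and atomic_dec_and_test. The struct and function names (fake_page, fake_get_page, fake_put_page) are hypothetical and are not taken from mm/. With the default sequentially consistent ordering, compilers typically emit lock-prefixed instructions for these calls on x86, which is the cost under discussion.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for a page that carries a reference count. */
struct fake_page {
    atomic_int refcount;
};

/* Analogue of atomic_inc(): take one reference. */
static void fake_get_page(struct fake_page *p)
{
    atomic_fetch_add(&p->refcount, 1);
}

/*
 * Analogue of atomic_dec_and_test(): drop one reference and report
 * whether it was the last one (the count reached zero).
 * atomic_fetch_sub() returns the value held before the subtraction.
 */
static bool fake_put_page(struct fake_page *p)
{
    return atomic_fetch_sub(&p->refcount, 1) == 1;
}
```

Every get/put pair on the fast path costs two such read-modify-write operations, even when the count itself is already in L1 cache.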
> In the case of netperf, I ended up with a 2.5Gbit/s (~30%) performance
> improvement through nothing but microoptimizations. There is method to
> my madness. ;-)
Well... it was wrong too ;)
But as you can see, I'm not against microoptimisations either and I'm
glad others, like yourself, are looking at the problem too.
The 30% number is very impressive. I'd be interested to see what the
stuff currently in -mm is worth.
--
SUSE Labs, Novell Inc.