Benjamin LaHaise wrote:
On Tue, Mar 07, 2006 at 01:04:36PM +1100, Nick Piggin wrote:

I'd say it will turn out to be more trouble than it's worth, for the miserly cost saved by avoiding one atomic_inc and one atomic_dec_and_test on page-local data that will be in L1 cache. I'd never turn my nose up at anyone just having a go though :)
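
For readers following along, the pair of operations under discussion can be sketched in userspace with C11 atomics. This is only an illustration: `struct page_ref`, `page_ref_get` and `page_ref_put` are hypothetical names, not the kernel's API. The kernel's own `atomic_inc` and `atomic_dec_and_test` work on `atomic_t`, and the real page refcounting code is more involved.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for a page's reference count. */
struct page_ref {
    atomic_int count;
};

/* Take a reference: plays the role of atomic_inc(). */
static inline void page_ref_get(struct page_ref *p)
{
    atomic_fetch_add(&p->count, 1);
}

/*
 * Drop a reference: plays the role of atomic_dec_and_test().
 * Returns true when the count reached zero, meaning the caller
 * dropped the last reference and may free the object.
 */
static inline bool page_ref_put(struct page_ref *p)
{
    /* atomic_fetch_sub returns the value held before the subtraction. */
    return atomic_fetch_sub(&p->count, 1) == 1;
}
```

Every get/put pair on a page costs two atomic read-modify-write operations, which is the overhead being weighed here.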


The cost is anything but miserly. Consider that every lock instruction is a memory barrier, which forces your OoO CPU, with lots of instructions in flight, to ramp down to just 1 for the time it takes that instruction to execute. That synchronization is what makes the atomic expensive.
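
One rough way to see this cost on a given machine is to time a loop of plain increments against a loop of atomic increments. The sketch below is a userspace approximation, not the netperf measurement; `now_ns`, `plain_incs` and `atomic_incs` are hypothetical helper names. It assumes a C11 toolchain with `<stdatomic.h>` and `timespec_get`, and it measures only the uncontended, single-threaded case with the counter hot in cache.

```c
#include <stdatomic.h>
#include <stdint.h>
#include <time.h>

/* Wall-clock time in nanoseconds (C11 timespec_get; not monotonic). */
static uint64_t now_ns(void)
{
    struct timespec ts;
    timespec_get(&ts, TIME_UTC);
    return (uint64_t)ts.tv_sec * 1000000000ull + (uint64_t)ts.tv_nsec;
}

/*
 * n plain increments. volatile forces a real load and store on each
 * iteration so the compiler cannot collapse the loop.
 */
static long plain_incs(long n, uint64_t *elapsed_ns)
{
    volatile long counter = 0;
    uint64_t start = now_ns();

    for (long i = 0; i < n; i++)
        counter++;

    *elapsed_ns = now_ns() - start;
    return counter;
}

/*
 * n atomic increments. On x86 this typically compiles to a
 * lock-prefixed read-modify-write instruction.
 */
static long atomic_incs(long n, uint64_t *elapsed_ns)
{
    atomic_long counter;
    atomic_init(&counter, 0);
    uint64_t start = now_ns();

    for (long i = 0; i < n; i++)
        atomic_fetch_add(&counter, 1);

    *elapsed_ns = now_ns() - start;
    return atomic_load(&counter);
}
```

Calling both with the same large `n` and comparing the two elapsed times gives a per-operation estimate. The absolute numbers depend heavily on the CPU, the compiler flags and the surrounding code, so treat the result as indicative only.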


Yeah x86(-64) is a _little_ worse off in that regard because its locks
imply rmbs.

But I'm saying the cost is miserly compared to the likely overheads
of using RCU-ed page freeing, when taken as impact on the system as a
whole.

Though definitely if we can get rid of atomic ops for free in any low
level page handling functions in mm/ then we want to do that.

In the case of netperf, I ended up with a 2.5Gbit/s (~30%) performance improvement through nothing but microoptimizations. There is method to my madness. ;-)


Well... it was wrong too ;)

But as you can see, I'm not against microoptimisations either and I'm
glad others, like yourself, are looking at the problem too.

The 30% number is very impressive. I'd be interested to see what the
stuff currently in -mm is worth.

--
SUSE Labs, Novell Inc.