All,
In the Toolchain Working Group Mans has been doing some examination of SPEC
2000 and SPEC 2006 to see what C Library (glibc) routines impact performance
the most, and are worth tuning.
This has come up with two areas we consider worthy of further investigation:
1) malloc performance
2) Floating-point rounding functions.
This email is interested with the first of these.
Analysis of malloc shows large amounts of time is spent in executing
synchronization primitives even when the program under test is single-threaded.
The obvious 'fix' is to remove the synchronization primitives which will
give a performance boost. This is, of course, not safe and will require
reworking malloc's algorithms to be (substantially) synchronization free.
A quick Google suggests that there are better performing algorithms
available (TCMalloc, Lockless, Hoard, &c), and so changing glibc's algorithm
is something well worth investigating.
Currently we see around 4.37% of time being spent in libc for the whole of
SPEC CPU 2006. Around 75% of that is in malloc related functions (so about
3.1% of the total). One benchmark however spends around 20% of its time in
malloc. So overall we are looking at maybe 1% improvement in the SPEC 2006
score, which is not large given the amount of effort I estimate this is
going to require (as we have to convince the community we have made
everyone's life better).
So before we go any further I would like to see what the view of LEG is
about a better malloc. My questions boil down to:
* Is malloc important - or do server applications just implement their own?
* Do you have any benchmarks that stress malloc and would provide us with
some more data points?
But any and all comments on the subject are welcome.
Thanks,
Matt
--
Matthew Gretton-Dann
Toolchain Working Group, Linaro
_______________________________________________
linaro-toolchain mailing list
linaro-toolchain@lists.linaro.org
http://lists.linaro.org/mailman/listinfo/linaro-toolchain