On Thu, May 12, 2011 at 06:11:59PM +0200, Piotr Wyderski wrote:
> Unfortunately, on x86/x64 both are compiled in a rather poor way:
>
> __sync_increment:
>
> lock addl $0x01,(ptr)
>
> which is longer than:
>
> lock incl (ptr)
GCC already generates lock incl (ptr); which form you get just depends
on which CPU you tune for. The relevant tuning entry:
/* X86_TUNE_USE_INCDEC */
~(m_PENT4 | m_NOCONA | m_CORE2I7 | m_GENERIC | m_ATOM),
So, if you say -mtune=bdver1 or -mtune=k8, it will generate incl.
Where addl is better (e.g. on Atom, where incl is very bad compared to
addl $1, and also with the generic tuning, per the mask above), it will
generate addl instead.
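
A minimal sketch for checking this yourself. The file name incdec.c and
the names counter, bump, value and selfcheck are made up for
illustration; only the __sync_fetch_and_add builtin and the -mtune
values come from GCC itself.

```c
/* incdec.c: hypothetical test file, not part of GCC. */
#include <assert.h>

static int counter;   /* hypothetical shared counter */

/* Atomic increment whose result is discarded.  Because the old value is
   not used, the compiler can implement this with a single locked
   read-modify-write instruction (incl or addl $1, depending on tuning). */
void bump(void)
{
    __sync_fetch_and_add(&counter, 1);
}

/* Read the counter atomically by adding zero and returning the old value. */
int value(void)
{
    return __sync_fetch_and_add(&counter, 0);
}

/* Sanity check: two increments must raise the counter by exactly two. */
void selfcheck(void)
{
    int before = value();
    bump();
    bump();
    assert(value() == before + 2);
}
```

Compile it twice and compare the body of bump in the assembly output:

    gcc -O2 -S -mtune=k8   incdec.c -o k8.s
    gcc -O2 -S -mtune=atom incdec.c -o atom.s

Going by the tuning mask above, I would expect lock incl in the first
and lock addl $1 in the second, but the exact output may differ between
GCC versions.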
Jakub