https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106804

--- Comment #4 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
(In reply to anthony.mikh from comment #3)
> (In reply to Andrew Pinski from comment #2)
> > Clang use of cmov also might produce worse code
> 
> That's exactly why I wrote "seemingly worse". However, the approach used by
> clang uses fewer registers, and this can impact codegen even when this
> function is inlined.

The register allocation of the non-inlined version vs. the inlined one will be
way different, and you cannot really say anything about it, because *&
folding might happen so there are no loads or stores at all. Again, this is a
minor code-gen issue that really does not matter.
I will look into it over the weekend, though, to see exactly what is
happening. I suspect GCC is trying to optimize away the loads too much, which
forces the code gen like this: there are 3 loads and one store in clang's
code gen rather than 2 loads and one store.
The number of micro-ops for GCC's code gen might be better too, depending on
the micro-arch on x86_64, so again it is much more complex, and clang might be
worse even if it emits fewer instructions.


Also look into other targets' code gen; it might be ok there.