https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80878

--- Comment #48 from Luke Dalessandro <ldalessandro at gmail dot com> ---

(In reply to LIU Hao from comment #47)
> (In reply to Luke Dalessandro from comment #46)
> > But if 104688 isn't related to this issue, and thus Jakub's comment was in
> > error, I definitely don't understand the underlying problem and why clang is
> > fine doing it.
> 
> Issue here is that if atomic load is implemented with a call to libatomic
> routines then it's incorrect to implement CAS without a call.

My understanding is that 104688 established that it is correct to implement a
16-byte atomic load with movdqa for aligned addresses on architectures with AVX
support. Hence gcc could inline that load the same way clang does, and inline
cmpxchg16b for compare_exchange/__atomic_compare_exchange{_n} as well, so none
of these operations would need a libatomic call any more.
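
To make the operations in question concrete, here is a minimal sketch (C++17
assumed) of a 16-byte load followed by a compare-exchange. The names Pair and
try_inline_cas are hypothetical and exist only for illustration. The helper
instantiates the load/CAS pair only when the compiler reports the type as
always lock-free, so the file still links without -latomic when it does not.
Which branch you get depends on the compiler and on flags such as -mcx16.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

// Hypothetical 16-byte payload; alignas(16) matters because the aligned
// 16-byte load and cmpxchg16b both require a 16-byte-aligned address.
struct alignas(16) Pair {
    std::uint64_t lo;
    std::uint64_t hi;
};
static_assert(sizeof(Pair) == 16, "Pair must be exactly 16 bytes");
static_assert(alignof(Pair) == 16, "Pair must be 16-byte aligned");

// Hypothetical helper: returns true if the lock-free path was compiled in
// and the compare-exchange succeeded, false if the compiler does not
// report std::atomic<T> as always lock-free.
template <typename T>
bool try_inline_cas(std::atomic<T>& a, T desired) {
    assert(reinterpret_cast<std::uintptr_t>(&a) % alignof(T) == 0);
    if constexpr (std::atomic<T>::is_always_lock_free) {
        // Only instantiated when the compiler promises a lock-free
        // implementation for this type.
        T expected = a.load(std::memory_order_acquire);
        return a.compare_exchange_strong(expected, desired,
                                         std::memory_order_acq_rel,
                                         std::memory_order_acquire);
    } else {
        // Discarded branch above is never instantiated, so no libatomic
        // symbol is referenced from this translation unit.
        (void)a;
        (void)desired;
        return false;
    }
}
```

Calling try_inline_cas(a, Pair{3, 4}) on a std::atomic<Pair> and comparing the
result with std::atomic<Pair>::is_always_lock_free is a quick way to see which
path a given compiler and flag combination takes.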

I can accept that -mcx16 may be the wrong flag to force inlining here, given its
cmpxchg-style name, but it really feels like a sophisticated user who's willing
to live in implementation-defined land should be able to get the same
performance for lock-free code out of gcc that they get out of clang in this
situation.
