https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80878

--- Comment #49 from LIU Hao <lh_mouse at 126 dot com> ---
(In reply to Luke Dalessandro from comment #48)
> So my understanding is that 104688 basically determined that it's correct to
> implement atomic load with movdqa for aligned addresses on architectures
> with AVX support. And hence gcc could inline that in the same way clang
> does, and inline cmpxchg16b for
> compare_exchange/__atomic_compare_exchange{_n} as well. And thus there no
> longer has to be a libatomic call for any of these.

Yes. However, I suspect that inlining these operations might be an ABI break.
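
For concreteness, here is a minimal sketch (mine, not code from this report) of
the load case being discussed: a 16-byte atomic load through the GNU __atomic
builtins. With -mavx, Clang inlines this as an aligned 16-byte vector load;
GCC currently emits a call to libatomic's __atomic_load_16 instead. The name
`load128` is just for illustration.

    /* Explicitly 16-byte aligned, since the "use an aligned vector load"
       argument only applies to aligned addresses.  */
    static _Alignas(16) unsigned __int128 shared;

    unsigned __int128 load128(void)
    {
        /* Clang (-mavx): one aligned 16-byte load.
           GCC: call __atomic_load_16 in libatomic.  */
        return __atomic_load_n(&shared, __ATOMIC_SEQ_CST);
    }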


> I can support the fact that -mcx16 is maybe the wrong flag to use to force
> inlining here given its cmpxchg-style name, but it really feels like a
> sophisticated user that's willing to live in implementation-defined land
> should be able to get the same performance for lock-free code out of gcc
> that it does out of clang in this situation.

May I remind you of comment #42
(https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80878#c42)?

First, CMPXCHG16B can be much slower than an ordinary (8-byte) CMPXCHG:
https://quick-bench.com/q/MZioNHkbBn0soH_KSDyYcKmrrxU
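
The comparison in that benchmark boils down to timing an 8-byte CAS against a
16-byte CAS. A rough standalone sketch of that kind of measurement (my own,
not the linked quick-bench code; the iteration count and timing method are
arbitrary choices) looks like this; build with something like
`gcc -O2 -mcx16 bench.c -latomic`:

    #include <stdint.h>
    #include <stdio.h>
    #include <time.h>

    #define ITERS 100000000u   /* arbitrary iteration count */

    static double seconds(void)
    {
        struct timespec ts;
        clock_gettime(CLOCK_MONOTONIC, &ts);
        return ts.tv_sec + ts.tv_nsec * 1e-9;
    }

    int main(void)
    {
        static uint64_t v8;
        static unsigned __int128 v16;
        double t;

        /* Uncontended 8-byte CAS loop (lock cmpxchg).  */
        t = seconds();
        for (unsigned i = 0; i < ITERS; ++i) {
            uint64_t expected = v8;
            __atomic_compare_exchange_n(&v8, &expected, expected + 1, 0,
                                        __ATOMIC_SEQ_CST, __ATOMIC_SEQ_CST);
        }
        printf("8-byte CAS:  %f s\n", seconds() - t);

        /* Uncontended 16-byte CAS loop (lock cmpxchg16b, or a libatomic
           call that wraps it, depending on compiler and flags).  */
        t = seconds();
        for (unsigned i = 0; i < ITERS; ++i) {
            unsigned __int128 expected = v16;
            __atomic_compare_exchange_n(&v16, &expected, expected + 1, 0,
                                        __ATOMIC_SEQ_CST, __ATOMIC_SEQ_CST);
        }
        printf("16-byte CAS: %f s\n", seconds() - t);
        return 0;
    }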

Second, not all x86-64 processors support CMPXCHG16B, so `-mcx16` is required
to enable it, just as `-mavx` is required for AVX.
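
For the record, the operation gated by `-mcx16` is the 16-byte compare-and-swap
behind __atomic_compare_exchange{,_n} on __int128. A sketch of mine (not code
from this report), with `cas128` as an illustrative name:

    #include <stdbool.h>

    /* CMPXCHG16B is the only instruction that implements this directly,
       and the earliest x86-64 CPUs lack it; that is what -mcx16
       advertises, analogous to -mavx for AVX loads/stores.  Whether the
       compiler inlines it or routes it through libatomic is exactly
       what this report is about.  */
    bool cas128(unsigned __int128 *p,
                unsigned __int128 *expected,
                unsigned __int128 desired)
    {
        return __atomic_compare_exchange_n(p, expected, desired,
                                           /* weak */ false,
                                           __ATOMIC_SEQ_CST,
                                           __ATOMIC_SEQ_CST);
    }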
