https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94649
Yongwei Wu <wuyongwei at gmail dot com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |wuyongwei at gmail dot com

--- Comment #3 from Yongwei Wu <wuyongwei at gmail dot com> ---
Is there really a valid use case for a non-lock-free version of 128-bit CAS?

I am using it in a lock-free data structure. The GCC-generated code is MUCH
slower than the mutex-based version, which defeats the whole purpose of using
it. I am talking about a 10x difference. And the Clang-generated code is more
than 200x faster in my 8-thread contention test.

To me, the current GCC behaviour is not a missed optimization. It is a
pessimization. I am really having a difficult time understanding the rationale
of the current design.
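
For readers unfamiliar with the operation being discussed, here is a minimal sketch of a 128-bit CAS. The comment does not show its code, so everything here is an assumption: `TaggedPtr` and `cas128` are hypothetical names, and the pointer-plus-version-tag layout is only one common reason to want a 16-byte CAS in a lock-free structure. On x86-64 with GCC or Clang, the sketch issues `lock cmpxchg16b` directly through extended asm. On any other target it falls back to a global mutex purely so that it still builds, and that path is of course not lock-free.

```cpp
#include <cassert>
#include <cstdint>
#include <mutex>

// Hypothetical 16-byte payload: a pointer plus a version tag.
struct alignas(16) TaggedPtr {
    void*         ptr;
    std::uint64_t tag;
};
static_assert(sizeof(TaggedPtr) == 16, "sketch assumes a 16-byte payload");

// Hypothetical helper. Atomically replaces *target with desired if *target
// equals expected. Returns true on success; on failure, expected receives
// the value that was actually found in *target.
inline bool cas128(TaggedPtr* target, TaggedPtr& expected, TaggedPtr desired)
{
    // cmpxchg16b requires a 16-byte aligned memory operand.
    assert(reinterpret_cast<std::uintptr_t>(target) % 16 == 0);
#if defined(__x86_64__) && (defined(__GNUC__) || defined(__clang__))
    bool ok;
    // RDX:RAX carries the expected value, RCX:RBX the desired one.
    // ZF is set on success and read back through the "=@ccz" flag output.
    __asm__ __volatile__("lock cmpxchg16b %1"
                         : "=@ccz"(ok), "+m"(*target),
                           "+a"(expected.ptr), "+d"(expected.tag)
                         : "b"(desired.ptr), "c"(desired.tag)
                         : "memory");
    return ok;
#else
    // Fallback so the sketch builds elsewhere: a global lock, NOT lock-free.
    static std::mutex guard;
    std::lock_guard<std::mutex> lock(guard);
    if (target->ptr == expected.ptr && target->tag == expected.tag) {
        *target = desired;
        return true;
    }
    expected = *target;
    return false;
#endif
}
```

The portable spelling of the same operation is `std::atomic<TaggedPtr>::compare_exchange_strong`. Whether the compiler turns that into an inline instruction or a library call is the point the comment is disputing, and `is_lock_free()` on the atomic object reports at run time which behaviour you got.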