https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80878
--- Comment #28 from Yongwei Wu <wuyongwei at gmail dot com> ---
OK, somewhat answering myself. I was not aware that a 128-bit atomic read also has to be implemented using cmpxchg16b, which defeats some non-CAS usage scenarios. The natural question is: which usage scenario is more significant? Or is there a way to support both? I still think lock-free data structures are too important to ignore.
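
To illustrate why a read done through a compare-and-swap defeats non-CAS uses, here is a minimal sketch. It is not GCC's or libatomic's actual code: `load_via_cas` is a hypothetical helper, and it uses a 64-bit value instead of a 128-bit one so that it builds without `-mcx16` or libatomic. The shape is the same as a cmpxchg16b-based load: the only way to observe the value is to attempt a CAS, so the object must be writable.

```cpp
#include <atomic>
#include <cstdint>

// Hypothetical helper (an assumption for illustration, not a GCC API):
// read an atomic using nothing but compare-and-swap, the way a 16-byte
// load has to be done when cmpxchg16b is the only 16-byte atomic
// instruction available.
inline std::uint64_t load_via_cas(std::atomic<std::uint64_t>& a) {
    std::uint64_t expected = 0;
    // If a == 0, the CAS succeeds and stores 0 over 0 (no visible change).
    // Otherwise the CAS fails and `expected` receives the current value.
    a.compare_exchange_strong(expected, 0);
    return expected;
}
```

Note that the parameter cannot be a reference to const: the "read" is a read-modify-write. With cmpxchg16b the destination receives a write cycle even when the comparison fails, so such a load faults on read-only memory and competes for exclusive ownership of the cache line with other readers.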