On Tue, Jul 21, 2020 at 5:46 PM Franz Sirl
<franz.sirl-ker...@lauterbach.com> wrote:
>
> On 2020-07-20 at 20:39, Uros Bizjak via Gcc-patches wrote:
> > Currently, __atomic_thread_fence(seq_cst) on x86 and x86-64 generates
> > an mfence instruction. A dummy atomic instruction (a lock-prefixed
> > instruction or xchg with a memory operand) would provide the same
> > sequential consistency guarantees while being more efficient on most
> > current CPUs. The mfence instruction additionally orders non-temporal
> > stores, which are not relevant for atomic operations and are not
> > ordered by seq_cst atomic operations anyway.
> >
> > 2020-07-20  Uroš Bizjak  <ubiz...@gmail.com>
> >
> > gcc/ChangeLog:
> >     PR target/95750
> >     * config/i386/i386.h (TARGET_AVOID_MFENCE):
> >     Rename from TARGET_USE_XCHG_FOR_ATOMIC_STORE.
> >     * config/i386/sync.md (mfence_sse2): Disable for TARGET_AVOID_MFENCE.
> >     (mfence_nosse): Enable also for TARGET_AVOID_MFENCE. Emit stack
> >     referred memory in word_mode.
> >     (mem_thread_fence): Do not generate mfence_sse2 pattern when
> >     TARGET_AVOID_MFENCE is true.
> >     (atomic_store<mode>): Update for rename.
> >     * config/i386/x86-tune.def (X86_TUNE_AVOID_MFENCE):
> >     Rename from X86_TUNE_USE_XCHG_FOR_ATOMIC_STORE.
> >
> > gcc/testsuite/ChangeLog:
> >     PR target/95750
> >     * gcc.target/i386/pr95750.c: New test.
> >
> > Bootstrapped and regression tested on x86_64-linux-gnu {,-m32}.
> >
> > Uros.
> >
>
> Hi,
>
> I didn't bisect it, but I see a profiledbootstrap ICE that may be related:

Ah, mfence_sse2 can be expanded from __builtin_ia32_mfence
independently of tuning flags. I'm testing the following patch:

--cut here--
diff --git a/gcc/config/i386/sync.md b/gcc/config/i386/sync.md
index c6827037abf..c88750d3664 100644
--- a/gcc/config/i386/sync.md
+++ b/gcc/config/i386/sync.md
@@ -89,8 +89,7 @@
 (define_insn "mfence_sse2"
   [(set (match_operand:BLK 0)
         (unspec:BLK [(match_dup 0)] UNSPEC_MFENCE))]
-  "(TARGET_64BIT || TARGET_SSE2)
-   && !TARGET_AVOID_MFENCE"
+  "TARGET_64BIT || TARGET_SSE2"
   "mfence"
   [(set_attr "type" "sse")
    (set_attr "length_address" "0")
@@ -101,8 +100,7 @@
   [(set (match_operand:BLK 0)
         (unspec:BLK [(match_dup 0)] UNSPEC_MFENCE))
    (clobber (reg:CC FLAGS_REG))]
-  "!(TARGET_64BIT || TARGET_SSE2)
-   || TARGET_AVOID_MFENCE"
+  ""
 {
   rtx mem = gen_rtx_MEM (word_mode, stack_pointer_rtx);
 
--cut here--

Uros.
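
For illustration, here is a minimal sketch of the two ways user code can reach the fence patterns discussed above. It is not the pr95750.c test from the patch; the names shared, fence_via_atomic, fence_via_builtin and publish_with_fences are hypothetical. The assumption is that __atomic_thread_fence goes through the mem_thread_fence expander, where the compiler may choose mfence or a lock-prefixed dummy operation depending on tuning, while __builtin_ia32_mfence asks for mfence explicitly, so mfence_sse2 has to stay available whatever the tuning flags say.

```c
#include <assert.h>

static int shared;

/* Generic sequentially consistent fence. Which instruction the compiler
   emits here (mfence or a lock-prefixed dummy op) is a tuning decision. */
static void fence_via_atomic(void)
{
    __atomic_thread_fence(__ATOMIC_SEQ_CST);
}

/* Explicit mfence request. The builtin needs SSE2, so fall back to the
   generic fence when SSE2 is not enabled (e.g. plain -m32 or non-x86). */
static void fence_via_builtin(void)
{
#if defined(__SSE2__)
    __builtin_ia32_mfence();
#else
    __atomic_thread_fence(__ATOMIC_SEQ_CST);
#endif
}

/* Single-threaded smoke test: both fences must compile and execute. */
int publish_with_fences(void)
{
    shared = 1;
    fence_via_atomic();
    assert(shared == 1);
    shared = 2;
    fence_via_builtin();
    return shared;
}
```

Compiling this with -O2 -S under different -mtune settings and comparing the assembly for the two fence functions is one way to see which pattern each call ends up using.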