On Thu, Jul 6, 2023 at 1:53 PM Uros Bizjak via Gcc-patches <gcc-patches@gcc.gnu.org> wrote: > > On Thu, Jul 6, 2023 at 3:14 AM liuhongt <hongtao....@intel.com> wrote: > > > > For testcase > > > > void __cond_swap(double* __x, double* __y) { > > bool __r = (*__x < *__y); > > auto __tmp = __r ? *__x : *__y; > > *__y = __r ? *__y : *__x; > > *__x = __tmp; > > } > > > > GCC-14 with -O2 and -march=x86-64 options generates the following code: > > > > __cond_swap(double*, double*): > > movsd xmm1, QWORD PTR [rdi] > > movsd xmm0, QWORD PTR [rsi] > > comisd xmm0, xmm1 > > jbe .L2 > > movq rax, xmm1 > > movapd xmm1, xmm0 > > movq xmm0, rax > > .L2: > > movsd QWORD PTR [rsi], xmm1 > > movsd QWORD PTR [rdi], xmm0 > > ret > > > > rax is used to save and restore DFmode value. In RA both GENERAL_REGS > > and SSE_REGS cost zero since we didn't disparage the > > alternative in movdf_internal pattern, according to register > > allocation order, GENERAL_REGS is allocated. The patch add ? for > > alternative (r,v) and (v,r) just like we did for movsf/hf/bf_internal > > pattern, after that we get optimal RA. > > > > __cond_swap: > > .LFB0: > > .cfi_startproc > > movsd (%rdi), %xmm1 > > movsd (%rsi), %xmm0 > > comisd %xmm1, %xmm0 > > jbe .L2 > > movapd %xmm1, %xmm2 > > movapd %xmm0, %xmm1 > > movapd %xmm2, %xmm0 > > .L2: > > movsd %xmm1, (%rsi) > > movsd %xmm0, (%rdi) > > ret > > > > Bootstrapped and regtested on x86_64-pc-linux-gnu{-m32,} > > Ok for trunk? > > > > > > gcc/ChangeLog: > > > > PR target/110170 > > * config/i386/i386.md (movdf_internal): Disparage slightly for > > 2 alternatives (r,v) and (v,r) by adding constraint modifier > > '?'. > > > > gcc/testsuite/ChangeLog: > > > > * gcc.target/i386/pr110170-3.c: New test. > > OK. Some user reports the same issue in unixbench, i looks like an common issue when swap 2 double variable So I'd like to backport this patch to GCC13/GCC12/GCC11, the fix should be generally good and at low risk. Any comments?
> > Thanks, > Uros. > > > --- > > gcc/config/i386/i386.md | 4 ++-- > > gcc/testsuite/gcc.target/i386/pr110170-3.c | 11 +++++++++++ > > 2 files changed, 13 insertions(+), 2 deletions(-) > > create mode 100644 gcc/testsuite/gcc.target/i386/pr110170-3.c > > > > diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md > > index a82cc353cfd..e47ced1bb70 100644 > > --- a/gcc/config/i386/i386.md > > +++ b/gcc/config/i386/i386.md > > @@ -3915,9 +3915,9 @@ (define_split > > ;; Possible store forwarding (partial memory) stall in alternatives 4, 6 > > and 7. > > (define_insn "*movdf_internal" > > [(set (match_operand:DF 0 "nonimmediate_operand" > > - "=Yf*f,m ,Yf*f,?r ,!o,?*r ,!o,!o,?r,?m,?r,?r,v,v,v,m,*x,*x,*x,m ,r > > ,v,r ,o ,r ,m") > > + "=Yf*f,m ,Yf*f,?r ,!o,?*r ,!o,!o,?r,?m,?r,?r,v,v,v,m,*x,*x,*x,m > > ,?r,?v,r ,o ,r ,m") > > (match_operand:DF 1 "general_operand" > > - "Yf*fm,Yf*f,G ,roF,r ,*roF,*r,F ,rm,rC,C ,F ,C,v,m,v,C ,*x,m ,*x,v,r > > ,roF,rF,rmF,rC"))] > > + "Yf*fm,Yf*f,G ,roF,r ,*roF,*r,F ,rm,rC,C ,F ,C,v,m,v,C ,*x,m ,*x, v, > > r,roF,rF,rmF,rC"))] > > "!(MEM_P (operands[0]) && MEM_P (operands[1])) > > && (lra_in_progress || reload_completed > > || !CONST_DOUBLE_P (operands[1]) > > diff --git a/gcc/testsuite/gcc.target/i386/pr110170-3.c > > b/gcc/testsuite/gcc.target/i386/pr110170-3.c > > new file mode 100644 > > index 00000000000..70daa89e9aa > > --- /dev/null > > +++ b/gcc/testsuite/gcc.target/i386/pr110170-3.c > > @@ -0,0 +1,11 @@ > > +/* { dg-do compile { target { ! ia32 } } } */ > > +/* { dg-options "-O2 -fno-if-conversion -fno-if-conversion2" } */ > > +/* { dg-final { scan-assembler-not {(?n)movq.*r} } } */ > > + > > +void __cond_swap(double* __x, double* __y) { > > + _Bool __r = (*__x < *__y); > > + double __tmp = __r ? *__x : *__y; > > + *__y = __r ? *__y : *__x; > > + *__x = __tmp; > > +} > > + > > -- > > 2.39.1.388.g2fc9e9ca3c > > -- BR, Hongtao