https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90283

--- Comment #4 from rsandifo at gcc dot gnu.org <rsandifo at gcc dot gnu.org> ---
(In reply to Martin Liška from comment #3)
> The perf comes from an Intel Skylake server machine.
> 
> The number of fma is very similar:
> grep fma bad.report.txt | wc -l
> 126
> grep fma good.report.txt | wc -l
> 128

Grepping for vfm instead also picks up the vfmsub variants etc., and shows
the same gap of two:

bad.report.txt:167
good.report.txt:169

The distribution also looks similar:

$ sed -n 's/.*\(vfm[^ ]*\).*/\1/p' good.report.txt  | sort | uniq -c
     61 vfmadd132sd
      1 vfmadd132ss
     35 vfmadd213sd
     30 vfmadd231sd
      1 vfmadd231ss
     32 vfmsub132sd
      1 vfmsub213sd
      8 vfmsub231sd
$ sed -n 's/.*\(vfm[^ ]*\).*/\1/p' bad.report.txt  | sort | uniq -c
     60 vfmadd132sd
      1 vfmadd132ss
     35 vfmadd213sd
     29 vfmadd231sd
      1 vfmadd231ss
     29 vfmsub132sd
      1 vfmsub213sd
     11 vfmsub231sd
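
To make the per-mnemonic differences easier to spot than by comparing the two
listings by eye, something along these lines could be used. It is only a
sketch: vfm_hist and vfm_diff are hypothetical helper names, and it assumes
the same report layout that the sed expression above relies on (the mnemonic
followed by a space).

```shell
# Hypothetical helper: histogram of vfm* mnemonics in one report,
# using the same sed expression as above.
vfm_hist() {
    sed -n 's/.*\(vfm[^ ]*\).*/\1/p' "$1" | sort | uniq -c
}

# Hypothetical helper: print "mnemonic count1 count2 delta" for every
# mnemonic whose count differs between report $1 and report $2.
vfm_diff() {
    {
        vfm_hist "$1" | sed 's/^/A /'   # tag rows from the first report
        vfm_hist "$2" | sed 's/^/B /'   # tag rows from the second report
    } | awk '
        $1 == "A" { a[$3] = $2; seen[$3] = 1 }
        $1 == "B" { b[$3] = $2; seen[$3] = 1 }
        END {
            for (m in seen)
                if (a[m] + 0 != b[m] + 0)
                    printf "%s %d %d %+d\n", m, a[m], b[m], b[m] - a[m]
        }' | sort
}

# Usage (file names as in the transcripts above):
#   vfm_diff good.report.txt bad.report.txt
```

Going by the counts listed above, only four mnemonics differ between the two
reports: vfmadd132sd, vfmadd231sd, vfmsub132sd and vfmsub231sd.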

> But the assembly is shuffled quite significantly after the change. Can you
> Richard Sandiford please take a look?

I think I'm going to need more clues as to why the new code is so much
slower in practice.  Could someone more familiar with the architecture
comment?
