> Hi!
> 
> xchg instruction is smaller, in some cases much smaller than 3 moves,
> (e.g. in the testcase 2 bytes vs. 8 bytes), and is not a performance
> disaster, but from Agner Fog tables and
> https://stackoverflow.com/questions/45766444/why-is-xchg-reg-reg-a-3-micro-op-instruction-on-modern-intel-architectures
> it doesn't seem to be something we'd want to use when optimizing for speed,
> at least not on Intel.
> 
> While we have *swap<mode> patterns, those are very unlikely to be triggered
> during combine, usually we have different pseudos in there and the actual
> need for swapping is only materialized during RA.
> 
> The following patch does it when optimizing the insn for size only.
> 
> Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?
> 
> 2019-11-19  Jakub Jelinek  <ja...@redhat.com>
> 
>       PR target/92549
>       * config/i386/i386.md (peephole2 for *swap<mode>): New peephole2.
> 
>       * gcc.target/i386/pr92549.c: New test.

It is very hard to get a testcase, unfortunately, but I got the
following (locally non-reproducible) failure while building Firefox with
LTO+FDO:

[task 2019-12-01T14:38:04.166Z] 14:38:04     INFO -   1080 | }
[task 2019-12-01T14:38:04.166Z] 14:38:04     INFO -        |
[task 2019-12-01T14:38:04.166Z] 14:38:04     INFO -  (insn 5555 1389 1390 41 (parallel [
[task 2019-12-01T14:38:04.166Z] 14:38:04     INFO -              (set (reg:SI 24 xmm4 [orig:187 SR.3778 ] [187])
[task 2019-12-01T14:38:04.166Z] 14:38:04     INFO - (reg:SI 23 xmm3 [orig:104 SR.3780 ] [104]))
[task 2019-12-01T14:38:04.166Z] 14:38:04     INFO -              (set (reg:SI 23 xmm3 [orig:104 SR.3780 ] [104])
[task 2019-12-01T14:38:04.166Z] 14:38:04     INFO - (reg:SI 24 xmm4 [orig:187 SR.3778 ] [187]))
[task 2019-12-01T14:38:04.166Z] 14:38:04     INFO -          ]) "/builds/worker/workspace/build/src/obj-firefox/dist/include/mozilla/WritingModes.h":1117:0 78 {*swapsi}
[task 2019-12-01T14:38:04.167Z] 14:38:04     INFO -       (nil))
[task 2019-12-01T14:38:04.167Z] 14:38:04     INFO -  during RTL pass: rnreg

I guess the problem is that there is no xchg in the SSE instruction set,
so the peephole needs to be more restrictive?

Honza
