On Wed, Jul 25, 2018 at 1:28 AM, Richard Biener
<richard.guent...@gmail.com> wrote:
> On Tue, Jul 24, 2018 at 7:18 PM Segher Boessenkool
> <seg...@kernel.crashing.org> wrote:
>>
>> This patch allows combine to combine two insns into two.  This helps
>> in many cases, by reducing instruction path length, and also allowing
>> further combinations to happen.  PR85160 is a typical example of code
>> that it can improve.
>>
>> This patch does not allow such combinations if either of the original
>> instructions was a simple move instruction.  In those cases combining
>> the two instructions increases register pressure without improving the
>> code.  With this move test register pressure does no longer increase
>> noticably as far as I can tell.
>>
>> (At first I also didn't allow either of the resulting insns to be a
>> move instruction.  But that is actually a very good thing to have, as
>> should have been obvious).
>>
>> Tested for many months; tested on about 30 targets.
>>
>> I'll commit this later this week if there are no objections.
>
> Sounds good - but, _any_ testcase?  Please! ;)
>

Here is a testcase:

For

---
#define N 16
float f[N];
double d[N];
int n[N];

__attribute__((noinline)) void
f3 (void)
{
  int i;
  for (i = 0; i < N; i++)
    d[i] = f[i];
}
---

r263067 improved -O3 -mavx2 -mtune=generic -m64 from

        .cfi_startproc
        vmovaps f(%rip), %xmm2
        vmovaps f+32(%rip), %xmm3
        vinsertf128 $0x1, f+16(%rip), %ymm2, %ymm0
        vcvtps2pd %xmm0, %ymm1
        vextractf128 $0x1, %ymm0, %xmm0
        vmovaps %xmm1, d(%rip)
        vextractf128 $0x1, %ymm1, d+16(%rip)
        vcvtps2pd %xmm0, %ymm0
        vmovaps %xmm0, d+32(%rip)
        vextractf128 $0x1, %ymm0, d+48(%rip)
        vinsertf128 $0x1, f+48(%rip), %ymm3, %ymm0
        vcvtps2pd %xmm0, %ymm1
        vextractf128 $0x1, %ymm0, %xmm0
        vmovaps %xmm1, d+64(%rip)
        vextractf128 $0x1, %ymm1, d+80(%rip)
        vcvtps2pd %xmm0, %ymm0
        vmovaps %xmm0, d+96(%rip)
        vextractf128 $0x1, %ymm0, d+112(%rip)
        vzeroupper
        ret
        .cfi_endproc

to

        .cfi_startproc
        vcvtps2pd f(%rip), %ymm0
        vmovaps %xmm0, d(%rip)
        vextractf128 $0x1, %ymm0, d+16(%rip)
        vcvtps2pd f+16(%rip), %ymm0
        vmovaps %xmm0, d+32(%rip)
        vextractf128 $0x1, %ymm0, d+48(%rip)
        vcvtps2pd f+32(%rip), %ymm0
        vextractf128 $0x1, %ymm0, d+80(%rip)
        vmovaps %xmm0, d+64(%rip)
        vcvtps2pd f+48(%rip), %ymm0
        vextractf128 $0x1, %ymm0, d+112(%rip)
        vmovaps %xmm0, d+96(%rip)
        vzeroupper
        ret
        .cfi_endproc

This is:

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=86752

H.J.