Christophe Lyon <christophe.l...@linaro.org> writes: > On Mon, 30 Jul 2018 at 18:09, Segher Boessenkool > <seg...@kernel.crashing.org> wrote: >> >> On Tue, Jul 24, 2018 at 05:18:41PM +0000, Segher Boessenkool wrote: >> > This patch allows combine to combine two insns into two. This helps >> > in many cases, by reducing instruction path length, and also allowing >> > further combinations to happen. PR85160 is a typical example of code >> > that it can improve. >> > >> > This patch does not allow such combinations if either of the original >> > instructions was a simple move instruction. In those cases combining >> > the two instructions increases register pressure without improving the >> > code. With this move test register pressure does no longer increase >> > noticably as far as I can tell. >> > >> > (At first I also didn't allow either of the resulting insns to be a >> > move instruction. But that is actually a very good thing to have, as >> > should have been obvious). >> > >> > Tested for many months; tested on about 30 targets. >> > >> > I'll commit this later this week if there are no objections. >> >> Done now, with the testcase at >> https://gcc.gnu.org/ml/gcc-patches/2018-07/msg01856.html . >> > > Hi, > > Since this was committed, I've noticed regressions > on aarch64: > FAIL: gcc.dg/zero_bits_compound-1.c scan-assembler-not \\(and: > > on arm-none-linux-gnueabi > FAIL: gfortran.dg/actual_array_constructor_1.f90 -O1 execution test > > On aarch64, I've also noticed a few others regressions but I'm not yet > 100% sure it's caused by this patch (bisect running): > gcc.target/aarch64/ashltidisi.c scan-assembler-times asr 4 > gcc.target/aarch64/sve/var_stride_2.c -march=armv8.2-a+sve > scan-assembler-times \\tadd\\tx[0-9]+, x[0-9]+, x[0-9]+, lsl 10\\n 2 > gcc.target/aarch64/sve/var_stride_4.c -march=armv8.2-a+sve > scan-assembler-times \\tlsl\\tx[0-9]+, x[0-9]+, 10\\n 2 > gcc.target/aarch64/sve/vcond_4.c -march=armv8.2-a+sve > scan-assembler-times \\tfcmeq\\tp[0-9]+\\.d, p[0-7]/z, z[0-9]+\\.d, > #0\\.0\\n 7 > gcc.target/aarch64/sve/vcond_4.c -march=armv8.2-a+sve > scan-assembler-times \\tfcmeq\\tp[0-9]+\\.d, p[0-7]/z, z[0-9]+\\.d, > z[0-9]+\\.d\\n 14 > gcc.target/aarch64/sve/vcond_4.c -march=armv8.2-a+sve > scan-assembler-times \\tfcmeq\\tp[0-9]+\\.s, p[0-7]/z, z[0-9]+\\.s, > #0\\.0\\n 5 > gcc.target/aarch64/sve/vcond_4.c -march=armv8.2-a+sve > scan-assembler-times \\tfcmeq\\tp[0-9]+\\.s, p[0-7]/z, z[0-9]+\\.s, > z[0-9]+\\.s\\n 10 > gcc.target/aarch64/sve/vcond_4.c -march=armv8.2-a+sve > scan-assembler-times \\tfcmge\\tp[0-9]+\\.d, p[0-7]/z, z[0-9]+\\.d, > #0\\.0\\n 21 > gcc.target/aarch64/sve/vcond_4.c -march=armv8.2-a+sve > scan-assembler-times \\tfcmge\\tp[0-9]+\\.d, p[0-7]/z, z[0-9]+\\.d, > z[0-9]+\\.d\\n 42 > gcc.target/aarch64/sve/vcond_4.c -march=armv8.2-a+sve > scan-assembler-times \\tfcmge\\tp[0-9]+\\.s, p[0-7]/z, z[0-9]+\\.s, > #0\\.0\\n 15 > gcc.target/aarch64/sve/vcond_4.c -march=armv8.2-a+sve > scan-assembler-times \\tfcmge\\tp[0-9]+\\.s, p[0-7]/z, z[0-9]+\\.s, > z[0-9]+\\.s\\n 30 > gcc.target/aarch64/sve/vcond_4.c -march=armv8.2-a+sve > scan-assembler-times \\tfcmgt\\tp[0-9]+\\.d, p[0-7]/z, z[0-9]+\\.d, > #0\\.0\\n 21 > gcc.target/aarch64/sve/vcond_4.c -march=armv8.2-a+sve > scan-assembler-times \\tfcmgt\\tp[0-9]+\\.d, p[0-7]/z, z[0-9]+\\.d, > z[0-9]+\\.d\\n 42 > gcc.target/aarch64/sve/vcond_4.c -march=armv8.2-a+sve > scan-assembler-times \\tfcmgt\\tp[0-9]+\\.s, p[0-7]/z, z[0-9]+\\.s, > #0\\.0\\n 15 > gcc.target/aarch64/sve/vcond_4.c -march=armv8.2-a+sve > scan-assembler-times \\tfcmgt\\tp[0-9]+\\.s, p[0-7]/z, z[0-9]+\\.s, > z[0-9]+\\.s\\n 30 > gcc.target/aarch64/sve/vcond_4.c -march=armv8.2-a+sve > scan-assembler-times \\tfcmle\\tp[0-9]+\\.d, p[0-7]/z, z[0-9]+\\.d, > #0\\.0\\n 21 > gcc.target/aarch64/sve/vcond_4.c -march=armv8.2-a+sve > scan-assembler-times \\tfcmle\\tp[0-9]+\\.d, p[0-7]/z, z[0-9]+\\.d, > z[0-9]+\\.d\\n 42 > gcc.target/aarch64/sve/vcond_4.c -march=armv8.2-a+sve > scan-assembler-times \\tfcmle\\tp[0-9]+\\.s, p[0-7]/z, z[0-9]+\\.s, > #0\\.0\\n 15 > gcc.target/aarch64/sve/vcond_4.c -march=armv8.2-a+sve > scan-assembler-times \\tfcmle\\tp[0-9]+\\.s, p[0-7]/z, z[0-9]+\\.s, > z[0-9]+\\.s\\n 30 > gcc.target/aarch64/sve/vcond_4.c -march=armv8.2-a+sve > scan-assembler-times \\tfcmlt\\tp[0-9]+\\.d, p[0-7]/z, z[0-9]+\\.d, > #0\\.0\\n 21 > gcc.target/aarch64/sve/vcond_4.c -march=armv8.2-a+sve > scan-assembler-times \\tfcmlt\\tp[0-9]+\\.d, p[0-7]/z, z[0-9]+\\.d, > z[0-9]+\\.d\\n 42 > gcc.target/aarch64/sve/vcond_4.c -march=armv8.2-a+sve > scan-assembler-times \\tfcmlt\\tp[0-9]+\\.s, p[0-7]/z, z[0-9]+\\.s, > #0\\.0\\n 15 > gcc.target/aarch64/sve/vcond_4.c -march=armv8.2-a+sve > scan-assembler-times \\tfcmlt\\tp[0-9]+\\.s, p[0-7]/z, z[0-9]+\\.s, > z[0-9]+\\.s\\n 30 > gcc.target/aarch64/sve/vcond_4.c -march=armv8.2-a+sve > scan-assembler-times \\tfcmne\\tp[0-9]+\\.d, p[0-7]/z, z[0-9]+\\.d, > #0\\.0\\n 21 > gcc.target/aarch64/sve/vcond_4.c -march=armv8.2-a+sve > scan-assembler-times \\tfcmne\\tp[0-9]+\\.d, p[0-7]/z, z[0-9]+\\.d, > z[0-9]+\\.d\\n 42 > gcc.target/aarch64/sve/vcond_4.c -march=armv8.2-a+sve > scan-assembler-times \\tfcmne\\tp[0-9]+\\.s, p[0-7]/z, z[0-9]+\\.s, > #0\\.0\\n 15 > gcc.target/aarch64/sve/vcond_4.c -march=armv8.2-a+sve > scan-assembler-times \\tfcmne\\tp[0-9]+\\.s, p[0-7]/z, z[0-9]+\\.s, > z[0-9]+\\.s\\n 30 > gcc.target/aarch64/sve/vcond_4.c -march=armv8.2-a+sve > scan-assembler-times \\tfcmuo\\tp[0-9]+\\.d, p[0-7]/z, z[0-9]+\\.d, > z[0-9]+\\.d\\n 252 > gcc.target/aarch64/sve/vcond_4.c -march=armv8.2-a+sve > scan-assembler-times \\tfcmuo\\tp[0-9]+\\.s, p[0-7]/z, z[0-9]+\\.s, > z[0-9]+\\.s\\n 180 > gcc.target/aarch64/sve/vcond_5.c -march=armv8.2-a+sve > scan-assembler-times \\tfcmeq\\tp[0-9]+\\.d, p[0-7]/z, z[0-9]+\\.d, > #0\\.0 14 > gcc.target/aarch64/sve/vcond_5.c -march=armv8.2-a+sve > scan-assembler-times \\tfcmeq\\tp[0-9]+\\.d, p[0-7]/z, z[0-9]+\\.d, > z[0-9]+\\.d 28 > gcc.target/aarch64/sve/vcond_5.c -march=armv8.2-a+sve > scan-assembler-times \\tfcmeq\\tp[0-9]+\\.s, p[0-7]/z, z[0-9]+\\.s, > #0\\.0 10 > gcc.target/aarch64/sve/vcond_5.c -march=armv8.2-a+sve > scan-assembler-times \\tfcmeq\\tp[0-9]+\\.s, p[0-7]/z, z[0-9]+\\.s, > z[0-9]+\\.s 20 > gcc.target/aarch64/sve/vcond_5.c -march=armv8.2-a+sve > scan-assembler-times \\tfcmge\\tp[0-9]+\\.d, p[0-7]/z, z[0-9]+\\.d, > #0\\.0 21 > gcc.target/aarch64/sve/vcond_5.c -march=armv8.2-a+sve > scan-assembler-times \\tfcmge\\tp[0-9]+\\.d, p[0-7]/z, z[0-9]+\\.d, > z[0-9]+\\.d 42 > gcc.target/aarch64/sve/vcond_5.c -march=armv8.2-a+sve > scan-assembler-times \\tfcmge\\tp[0-9]+\\.s, p[0-7]/z, z[0-9]+\\.s, > #0\\.0 15 > gcc.target/aarch64/sve/vcond_5.c -march=armv8.2-a+sve > scan-assembler-times \\tfcmge\\tp[0-9]+\\.s, p[0-7]/z, z[0-9]+\\.s, > z[0-9]+\\.s 30 > gcc.target/aarch64/sve/vcond_5.c -march=armv8.2-a+sve > scan-assembler-times \\tfcmgt\\tp[0-9]+\\.d, p[0-7]/z, z[0-9]+\\.d, > #0\\.0 28 > gcc.target/aarch64/sve/vcond_5.c -march=armv8.2-a+sve > scan-assembler-times \\tfcmgt\\tp[0-9]+\\.d, p[0-7]/z, z[0-9]+\\.d, > z[0-9]+\\.d 56 > gcc.target/aarch64/sve/vcond_5.c -march=armv8.2-a+sve > scan-assembler-times \\tfcmgt\\tp[0-9]+\\.s, p[0-7]/z, z[0-9]+\\.s, > #0\\.0 20 > gcc.target/aarch64/sve/vcond_5.c -march=armv8.2-a+sve > scan-assembler-times \\tfcmgt\\tp[0-9]+\\.s, p[0-7]/z, z[0-9]+\\.s, > z[0-9]+\\.s 40 > gcc.target/aarch64/sve/vcond_5.c -march=armv8.2-a+sve > scan-assembler-times \\tfcmle\\tp[0-9]+\\.d, p[0-7]/z, z[0-9]+\\.d, > #0\\.0 21 > gcc.target/aarch64/sve/vcond_5.c -march=armv8.2-a+sve > scan-assembler-times \\tfcmle\\tp[0-9]+\\.d, p[0-7]/z, z[0-9]+\\.d, > z[0-9]+\\.d 42 > gcc.target/aarch64/sve/vcond_5.c -march=armv8.2-a+sve > scan-assembler-times \\tfcmle\\tp[0-9]+\\.s, p[0-7]/z, z[0-9]+\\.s, > #0\\.0 15 > gcc.target/aarch64/sve/vcond_5.c -march=armv8.2-a+sve > scan-assembler-times \\tfcmle\\tp[0-9]+\\.s, p[0-7]/z, z[0-9]+\\.s, > z[0-9]+\\.s 30 > gcc.target/aarch64/sve/vcond_5.c -march=armv8.2-a+sve > scan-assembler-times \\tfcmlt\\tp[0-9]+\\.d, p[0-7]/z, z[0-9]+\\.d, > #0\\.0 28 > gcc.target/aarch64/sve/vcond_5.c -march=armv8.2-a+sve > scan-assembler-times \\tfcmlt\\tp[0-9]+\\.d, p[0-7]/z, z[0-9]+\\.d, > z[0-9]+\\.d 56 > gcc.target/aarch64/sve/vcond_5.c -march=armv8.2-a+sve > scan-assembler-times \\tfcmlt\\tp[0-9]+\\.s, p[0-7]/z, z[0-9]+\\.s, > #0\\.0 20 > gcc.target/aarch64/sve/vcond_5.c -march=armv8.2-a+sve > scan-assembler-times \\tfcmlt\\tp[0-9]+\\.s, p[0-7]/z, z[0-9]+\\.s, > z[0-9]+\\.s 40 > gcc.target/aarch64/sve/vcond_5.c -march=armv8.2-a+sve > scan-assembler-times \\tfcmne\\tp[0-9]+\\.d, p[0-7]/z, z[0-9]+\\.d, > #0\\.0 7 > gcc.target/aarch64/sve/vcond_5.c -march=armv8.2-a+sve > scan-assembler-times \\tfcmne\\tp[0-9]+\\.d, p[0-7]/z, z[0-9]+\\.d, > z[0-9]+\\.d 14 > gcc.target/aarch64/sve/vcond_5.c -march=armv8.2-a+sve > scan-assembler-times \\tfcmne\\tp[0-9]+\\.s, p[0-7]/z, z[0-9]+\\.s, > #0\\.0 5 > gcc.target/aarch64/sve/vcond_5.c -march=armv8.2-a+sve > scan-assembler-times \\tfcmne\\tp[0-9]+\\.s, p[0-7]/z, z[0-9]+\\.s, > z[0-9]+\\.s 10 > gcc.target/aarch64/sve/vcond_5.c -march=armv8.2-a+sve > scan-assembler-times \\tfcmuo\\tp[0-9]+\\.d, p[0-7]/z, z[0-9]+\\.d, > z[0-9]+\\.d 63 > gcc.target/aarch64/sve/vcond_5.c -march=armv8.2-a+sve > scan-assembler-times \\tfcmuo\\tp[0-9]+\\.s, p[0-7]/z, z[0-9]+\\.s, > z[0-9]+\\.s 45
The SVE failures were caused by it. What combine is doing is definitely valid though, since it's converting two dependent instructions into two independent instructions of equal cost. I think the fix would be to have proper support for conditional comparisons, but that's not a short-term thing. I've filed PR86753 in the meantime and will probably XFAIL the tests for now. Thanks, Richard