Re: [PATCH] combine: Allow combining two insns to two insns

Richard Sandiford Tue, 31 Jul 2018 05:59:28 -0700

Christophe Lyon <[email protected]> writes:
> On Mon, 30 Jul 2018 at 18:09, Segher Boessenkool
> <[email protected]> wrote:
>>
>> On Tue, Jul 24, 2018 at 05:18:41PM +0000, Segher Boessenkool wrote:
>> > This patch allows combine to combine two insns into two.  This helps
>> > in many cases, by reducing instruction path length, and also allowing
>> > further combinations to happen.  PR85160 is a typical example of code
>> > that it can improve.
>> >
>> > This patch does not allow such combinations if either of the original
>> > instructions was a simple move instruction.  In those cases combining
>> > the two instructions increases register pressure without improving the
>> > code.  With this move test register pressure does no longer increase
>> > noticably as far as I can tell.
>> >
>> > (At first I also didn't allow either of the resulting insns to be a
>> > move instruction.  But that is actually a very good thing to have, as
>> > should have been obvious).
>> >
>> > Tested for many months; tested on about 30 targets.
>> >
>> > I'll commit this later this week if there are no objections.
>>
>> Done now, with the testcase at
>> https://gcc.gnu.org/ml/gcc-patches/2018-07/msg01856.html .
>>
>
> Hi,
>
> Since this was committed, I've noticed regressions
> on aarch64:
> FAIL: gcc.dg/zero_bits_compound-1.c scan-assembler-not \\(and:
>
> on arm-none-linux-gnueabi
> FAIL: gfortran.dg/actual_array_constructor_1.f90   -O1  execution test
>
> On aarch64, I've also noticed a few others regressions but I'm not yet
> 100% sure it's caused by this patch (bisect running):
>     gcc.target/aarch64/ashltidisi.c scan-assembler-times asr 4
>     gcc.target/aarch64/sve/var_stride_2.c -march=armv8.2-a+sve
> scan-assembler-times \\tadd\\tx[0-9]+, x[0-9]+, x[0-9]+, lsl 10\\n 2
>     gcc.target/aarch64/sve/var_stride_4.c -march=armv8.2-a+sve
> scan-assembler-times \\tlsl\\tx[0-9]+, x[0-9]+, 10\\n 2
>     gcc.target/aarch64/sve/vcond_4.c -march=armv8.2-a+sve
> scan-assembler-times \\tfcmeq\\tp[0-9]+\\.d, p[0-7]/z, z[0-9]+\\.d,
> #0\\.0\\n 7
>     gcc.target/aarch64/sve/vcond_4.c -march=armv8.2-a+sve
> scan-assembler-times \\tfcmeq\\tp[0-9]+\\.d, p[0-7]/z, z[0-9]+\\.d,
> z[0-9]+\\.d\\n 14
>     gcc.target/aarch64/sve/vcond_4.c -march=armv8.2-a+sve
> scan-assembler-times \\tfcmeq\\tp[0-9]+\\.s, p[0-7]/z, z[0-9]+\\.s,
> #0\\.0\\n 5
>     gcc.target/aarch64/sve/vcond_4.c -march=armv8.2-a+sve
> scan-assembler-times \\tfcmeq\\tp[0-9]+\\.s, p[0-7]/z, z[0-9]+\\.s,
> z[0-9]+\\.s\\n 10
>     gcc.target/aarch64/sve/vcond_4.c -march=armv8.2-a+sve
> scan-assembler-times \\tfcmge\\tp[0-9]+\\.d, p[0-7]/z, z[0-9]+\\.d,
> #0\\.0\\n 21
>     gcc.target/aarch64/sve/vcond_4.c -march=armv8.2-a+sve
> scan-assembler-times \\tfcmge\\tp[0-9]+\\.d, p[0-7]/z, z[0-9]+\\.d,
> z[0-9]+\\.d\\n 42
>     gcc.target/aarch64/sve/vcond_4.c -march=armv8.2-a+sve
> scan-assembler-times \\tfcmge\\tp[0-9]+\\.s, p[0-7]/z, z[0-9]+\\.s,
> #0\\.0\\n 15
>     gcc.target/aarch64/sve/vcond_4.c -march=armv8.2-a+sve
> scan-assembler-times \\tfcmge\\tp[0-9]+\\.s, p[0-7]/z, z[0-9]+\\.s,
> z[0-9]+\\.s\\n 30
>     gcc.target/aarch64/sve/vcond_4.c -march=armv8.2-a+sve
> scan-assembler-times \\tfcmgt\\tp[0-9]+\\.d, p[0-7]/z, z[0-9]+\\.d,
> #0\\.0\\n 21
>     gcc.target/aarch64/sve/vcond_4.c -march=armv8.2-a+sve
> scan-assembler-times \\tfcmgt\\tp[0-9]+\\.d, p[0-7]/z, z[0-9]+\\.d,
> z[0-9]+\\.d\\n 42
>     gcc.target/aarch64/sve/vcond_4.c -march=armv8.2-a+sve
> scan-assembler-times \\tfcmgt\\tp[0-9]+\\.s, p[0-7]/z, z[0-9]+\\.s,
> #0\\.0\\n 15
>     gcc.target/aarch64/sve/vcond_4.c -march=armv8.2-a+sve
> scan-assembler-times \\tfcmgt\\tp[0-9]+\\.s, p[0-7]/z, z[0-9]+\\.s,
> z[0-9]+\\.s\\n 30
>     gcc.target/aarch64/sve/vcond_4.c -march=armv8.2-a+sve
> scan-assembler-times \\tfcmle\\tp[0-9]+\\.d, p[0-7]/z, z[0-9]+\\.d,
> #0\\.0\\n 21
>     gcc.target/aarch64/sve/vcond_4.c -march=armv8.2-a+sve
> scan-assembler-times \\tfcmle\\tp[0-9]+\\.d, p[0-7]/z, z[0-9]+\\.d,
> z[0-9]+\\.d\\n 42
>     gcc.target/aarch64/sve/vcond_4.c -march=armv8.2-a+sve
> scan-assembler-times \\tfcmle\\tp[0-9]+\\.s, p[0-7]/z, z[0-9]+\\.s,
> #0\\.0\\n 15
>     gcc.target/aarch64/sve/vcond_4.c -march=armv8.2-a+sve
> scan-assembler-times \\tfcmle\\tp[0-9]+\\.s, p[0-7]/z, z[0-9]+\\.s,
> z[0-9]+\\.s\\n 30
>     gcc.target/aarch64/sve/vcond_4.c -march=armv8.2-a+sve
> scan-assembler-times \\tfcmlt\\tp[0-9]+\\.d, p[0-7]/z, z[0-9]+\\.d,
> #0\\.0\\n 21
>     gcc.target/aarch64/sve/vcond_4.c -march=armv8.2-a+sve
> scan-assembler-times \\tfcmlt\\tp[0-9]+\\.d, p[0-7]/z, z[0-9]+\\.d,
> z[0-9]+\\.d\\n 42
>     gcc.target/aarch64/sve/vcond_4.c -march=armv8.2-a+sve
> scan-assembler-times \\tfcmlt\\tp[0-9]+\\.s, p[0-7]/z, z[0-9]+\\.s,
> #0\\.0\\n 15
>     gcc.target/aarch64/sve/vcond_4.c -march=armv8.2-a+sve
> scan-assembler-times \\tfcmlt\\tp[0-9]+\\.s, p[0-7]/z, z[0-9]+\\.s,
> z[0-9]+\\.s\\n 30
>     gcc.target/aarch64/sve/vcond_4.c -march=armv8.2-a+sve
> scan-assembler-times \\tfcmne\\tp[0-9]+\\.d, p[0-7]/z, z[0-9]+\\.d,
> #0\\.0\\n 21
>     gcc.target/aarch64/sve/vcond_4.c -march=armv8.2-a+sve
> scan-assembler-times \\tfcmne\\tp[0-9]+\\.d, p[0-7]/z, z[0-9]+\\.d,
> z[0-9]+\\.d\\n 42
>     gcc.target/aarch64/sve/vcond_4.c -march=armv8.2-a+sve
> scan-assembler-times \\tfcmne\\tp[0-9]+\\.s, p[0-7]/z, z[0-9]+\\.s,
> #0\\.0\\n 15
>     gcc.target/aarch64/sve/vcond_4.c -march=armv8.2-a+sve
> scan-assembler-times \\tfcmne\\tp[0-9]+\\.s, p[0-7]/z, z[0-9]+\\.s,
> z[0-9]+\\.s\\n 30
>     gcc.target/aarch64/sve/vcond_4.c -march=armv8.2-a+sve
> scan-assembler-times \\tfcmuo\\tp[0-9]+\\.d, p[0-7]/z, z[0-9]+\\.d,
> z[0-9]+\\.d\\n 252
>     gcc.target/aarch64/sve/vcond_4.c -march=armv8.2-a+sve
> scan-assembler-times \\tfcmuo\\tp[0-9]+\\.s, p[0-7]/z, z[0-9]+\\.s,
> z[0-9]+\\.s\\n 180
>     gcc.target/aarch64/sve/vcond_5.c -march=armv8.2-a+sve
> scan-assembler-times \\tfcmeq\\tp[0-9]+\\.d, p[0-7]/z, z[0-9]+\\.d,
> #0\\.0 14
>     gcc.target/aarch64/sve/vcond_5.c -march=armv8.2-a+sve
> scan-assembler-times \\tfcmeq\\tp[0-9]+\\.d, p[0-7]/z, z[0-9]+\\.d,
> z[0-9]+\\.d 28
>     gcc.target/aarch64/sve/vcond_5.c -march=armv8.2-a+sve
> scan-assembler-times \\tfcmeq\\tp[0-9]+\\.s, p[0-7]/z, z[0-9]+\\.s,
> #0\\.0 10
>     gcc.target/aarch64/sve/vcond_5.c -march=armv8.2-a+sve
> scan-assembler-times \\tfcmeq\\tp[0-9]+\\.s, p[0-7]/z, z[0-9]+\\.s,
> z[0-9]+\\.s 20
>     gcc.target/aarch64/sve/vcond_5.c -march=armv8.2-a+sve
> scan-assembler-times \\tfcmge\\tp[0-9]+\\.d, p[0-7]/z, z[0-9]+\\.d,
> #0\\.0 21
>     gcc.target/aarch64/sve/vcond_5.c -march=armv8.2-a+sve
> scan-assembler-times \\tfcmge\\tp[0-9]+\\.d, p[0-7]/z, z[0-9]+\\.d,
> z[0-9]+\\.d 42
>     gcc.target/aarch64/sve/vcond_5.c -march=armv8.2-a+sve
> scan-assembler-times \\tfcmge\\tp[0-9]+\\.s, p[0-7]/z, z[0-9]+\\.s,
> #0\\.0 15
>     gcc.target/aarch64/sve/vcond_5.c -march=armv8.2-a+sve
> scan-assembler-times \\tfcmge\\tp[0-9]+\\.s, p[0-7]/z, z[0-9]+\\.s,
> z[0-9]+\\.s 30
>     gcc.target/aarch64/sve/vcond_5.c -march=armv8.2-a+sve
> scan-assembler-times \\tfcmgt\\tp[0-9]+\\.d, p[0-7]/z, z[0-9]+\\.d,
> #0\\.0 28
>     gcc.target/aarch64/sve/vcond_5.c -march=armv8.2-a+sve
> scan-assembler-times \\tfcmgt\\tp[0-9]+\\.d, p[0-7]/z, z[0-9]+\\.d,
> z[0-9]+\\.d 56
>     gcc.target/aarch64/sve/vcond_5.c -march=armv8.2-a+sve
> scan-assembler-times \\tfcmgt\\tp[0-9]+\\.s, p[0-7]/z, z[0-9]+\\.s,
> #0\\.0 20
>     gcc.target/aarch64/sve/vcond_5.c -march=armv8.2-a+sve
> scan-assembler-times \\tfcmgt\\tp[0-9]+\\.s, p[0-7]/z, z[0-9]+\\.s,
> z[0-9]+\\.s 40
>     gcc.target/aarch64/sve/vcond_5.c -march=armv8.2-a+sve
> scan-assembler-times \\tfcmle\\tp[0-9]+\\.d, p[0-7]/z, z[0-9]+\\.d,
> #0\\.0 21
>     gcc.target/aarch64/sve/vcond_5.c -march=armv8.2-a+sve
> scan-assembler-times \\tfcmle\\tp[0-9]+\\.d, p[0-7]/z, z[0-9]+\\.d,
> z[0-9]+\\.d 42
>     gcc.target/aarch64/sve/vcond_5.c -march=armv8.2-a+sve
> scan-assembler-times \\tfcmle\\tp[0-9]+\\.s, p[0-7]/z, z[0-9]+\\.s,
> #0\\.0 15
>     gcc.target/aarch64/sve/vcond_5.c -march=armv8.2-a+sve
> scan-assembler-times \\tfcmle\\tp[0-9]+\\.s, p[0-7]/z, z[0-9]+\\.s,
> z[0-9]+\\.s 30
>     gcc.target/aarch64/sve/vcond_5.c -march=armv8.2-a+sve
> scan-assembler-times \\tfcmlt\\tp[0-9]+\\.d, p[0-7]/z, z[0-9]+\\.d,
> #0\\.0 28
>     gcc.target/aarch64/sve/vcond_5.c -march=armv8.2-a+sve
> scan-assembler-times \\tfcmlt\\tp[0-9]+\\.d, p[0-7]/z, z[0-9]+\\.d,
> z[0-9]+\\.d 56
>     gcc.target/aarch64/sve/vcond_5.c -march=armv8.2-a+sve
> scan-assembler-times \\tfcmlt\\tp[0-9]+\\.s, p[0-7]/z, z[0-9]+\\.s,
> #0\\.0 20
>     gcc.target/aarch64/sve/vcond_5.c -march=armv8.2-a+sve
> scan-assembler-times \\tfcmlt\\tp[0-9]+\\.s, p[0-7]/z, z[0-9]+\\.s,
> z[0-9]+\\.s 40
>     gcc.target/aarch64/sve/vcond_5.c -march=armv8.2-a+sve
> scan-assembler-times \\tfcmne\\tp[0-9]+\\.d, p[0-7]/z, z[0-9]+\\.d,
> #0\\.0 7
>     gcc.target/aarch64/sve/vcond_5.c -march=armv8.2-a+sve
> scan-assembler-times \\tfcmne\\tp[0-9]+\\.d, p[0-7]/z, z[0-9]+\\.d,
> z[0-9]+\\.d 14
>     gcc.target/aarch64/sve/vcond_5.c -march=armv8.2-a+sve
> scan-assembler-times \\tfcmne\\tp[0-9]+\\.s, p[0-7]/z, z[0-9]+\\.s,
> #0\\.0 5
>     gcc.target/aarch64/sve/vcond_5.c -march=armv8.2-a+sve
> scan-assembler-times \\tfcmne\\tp[0-9]+\\.s, p[0-7]/z, z[0-9]+\\.s,
> z[0-9]+\\.s 10
>     gcc.target/aarch64/sve/vcond_5.c -march=armv8.2-a+sve
> scan-assembler-times \\tfcmuo\\tp[0-9]+\\.d, p[0-7]/z, z[0-9]+\\.d,
> z[0-9]+\\.d 63
>     gcc.target/aarch64/sve/vcond_5.c -march=armv8.2-a+sve
> scan-assembler-times \\tfcmuo\\tp[0-9]+\\.s, p[0-7]/z, z[0-9]+\\.s,
> z[0-9]+\\.s 45


The SVE failures were caused by it.  What combine is doing is definitely
valid though, since it's converting two dependent instructions into two
independent instructions of equal cost.

I think the fix would be to have proper support for conditional
comparisons, but that's not a short-term thing.  I've filed PR86753
in the meantime and will probably XFAIL the tests for now.

Thanks,
Richard

Re: [PATCH] combine: Allow combining two insns to two insns

Reply via email to