On 27/09/18 18:07, Wilco Dijkstra wrote:
> The popcount expansion uses SIMD instructions acting on 64-bit values.
> As a result a popcount of a 32-bit integer requires zero-extension before
> moving the zero-extended value into an FP register. This patch adds
> support for zero-extended int->FP moves to avoid the redundant uxtw.
> Similarly, add support for a 32-bit zero-extending load->FP register.
> Add a missing 'fp' arch attribute to the related 8/16-bit pattern.
>
> int f (int a)
> {
> return __builtin_popcount (a);
> }
>
> Before:
> uxtw x0, w0
> fmov d0, x0
> cnt v0.8b, v0.8b
> addv b0, v0.8b
> fmov w0, s0
> ret
>
> After:
> fmov s0, w0
> cnt v0.8b, v0.8b
> addv b0, v0.8b
> fmov w0, s0
> ret
>
> Passes regress on AArch64, OK for commit?
>
> ChangeLog:
> 2018-09-27 Wilco Dijkstra <[email protected]>
>
> gcc/
> * config/aarch64/aarch64.md (zero_extendsidi2_aarch64): Add w=r and
> w=m zeroextend alternatives.
This is a bit too terse for my liking. "Add alternatives to zero-extend
into a floating-point register." would be clearer.
> (zero_extend<SHORT:mode><GPI:mode>2_aarch64): Add arch attribute.
>
> testsuite/
> * gcc.target/aarch64/popcnt.c: Test zero-extended popcount.
>
> ---
> diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
> index
> ef2368706e88a551b9d0d2db2385860112bdbdde..c0b39ea876f76b54b31088d8d63d96a9cbcf8b88
> 100644
> --- a/gcc/config/aarch64/aarch64.md
> +++ b/gcc/config/aarch64/aarch64.md
> @@ -1588,13 +1588,16 @@
> )
>
> (define_insn "*zero_extendsidi2_aarch64"
> - [(set (match_operand:DI 0 "register_operand" "=r,r")
> - (zero_extend:DI (match_operand:SI 1 "nonimmediate_operand" "r,m")))]
> + [(set (match_operand:DI 0 "register_operand" "=r,r,w,w")
> + (zero_extend:DI (match_operand:SI 1 "nonimmediate_operand"
> "r,m,r,m")))]
> ""
> "@
> uxtw\t%0, %w1
> - ldr\t%w0, %1"
> - [(set_attr "type" "extend,load_4")]
> + ldr\t%w0, %1
> + fmov\t%s0, %w1
> + ldr\t%s0, %1"
> + [(set_attr "type" "extend,load_4,fmov,f_loads")
> + (set_attr "arch" "*,*,fp,fp")]
> )
>
> (define_insn "*load_pair_zero_extendsidi2_aarch64"
> @@ -1634,7 +1637,8 @@
> and\t%<GPI:w>0, %<GPI:w>1, <SHORT:short_mask>
> ldr<SHORT:size>\t%w0, %1
> ldr\t%<SHORT:size>0, %1"
> - [(set_attr "type" "logic_imm,load_4,load_4")]
> + [(set_attr "type" "logic_imm,load_4,f_loads")
This change looks correct, but it's not part of your ChangeLog entry.
> + (set_attr "arch" "*,*,fp")]
> )
>
> (define_expand "<optab>qihi2"
> diff --git a/gcc/testsuite/gcc.target/aarch64/popcnt.c
> b/gcc/testsuite/gcc.target/aarch64/popcnt.c
> index
> 7e957966d8e81b8633a444bb42944d0da82ae5db..8fb8db82e052bd898aa8513710fd94f892d80b61
> 100644
> --- a/gcc/testsuite/gcc.target/aarch64/popcnt.c
> +++ b/gcc/testsuite/gcc.target/aarch64/popcnt.c
> @@ -19,5 +19,14 @@ foo2 (long long x)
> return __builtin_popcountll (x);
> }
>
> -/* { dg-final { scan-assembler-not "popcount" } } */
> -/* { dg-final { scan-assembler-times "cnt\t" 3 } } */
> +int
> +foo3 (int *p)
> +{
> + return __builtin_popcount (*p);
> +}
> +
> +/* { dg-final { scan-assembler-not {popcount} } } */
> +/* { dg-final { scan-assembler-times {cnt\t} 4 } } */
> +/* { dg-final { scan-assembler-times {fmov\ts} 1 } } */
> +/* { dg-final { scan-assembler-times {fmov\td} 2 } } */
> +/* { dg-final { scan-assembler-times {ldr\ts} 1 } } */
>
OK with those changes.
R.