Hi Remi,

On 12/06/2025 17:02, Richard Sandiford wrote:
> Remi Machet <rmac...@nvidia.com> writes:
> > Add an optimization to aarch64 SIMD converting mvn+shrn into mvni+subhn 
> > which
> > allows for better optimization when the code is inside a loop by using a
> > constant.

It can be helpful for reviewers to show the codegen you get before/after
your patch in the cover letter.

In this case, it might also be helpful to include a brief argument for
why the transformation is correct, e.g. noting that, for a uint16_t x:
  -x = ~x + 1
  ~x = -1 - x
so:
  (u8)(~x >> imm)
is equivalent to:
  (u8)(((u16)-1 - x) >> imm).

> >
> > Bootstrapped and regtested on aarch64-linux-gnu.
> >
> > Signed-off-by: Remi Machet <rmac...@nvidia.com>
> >
> > gcc/ChangeLog:
> >
> >      * config/aarch64/aarch64-simd.md (*shrn_to_subhn_<mode>): Add pattern
> >          converting mvn+shrn into mvni+subhn.
> >
> > gcc/testsuite/ChangeLog:
> >
> >      * gcc.target/aarch64/simd/shrn2subhn.c: New test.
> >
> > diff --git a/gcc/config/aarch64/aarch64-simd.md 
> > b/gcc/config/aarch64/aarch64-simd.md

As Kyrill noted, the formatting of the patch seems to be quite broken.
This diff --git line shouldn't be wrapped, for example.  There are also
quite a few unicode NBSP (non-breaking space) characters in the patch.
These things make it difficult to apply the patch (it needs manual
edits).  Attaching the patch file next time should help.

> > index 6e30dc48934..f49e6fe6a26 100644
> > --- a/gcc/config/aarch64/aarch64-simd.md
> > +++ b/gcc/config/aarch64/aarch64-simd.md
> > @@ -5028,6 +5028,34 @@
> >     DONE;
> >   })
> >
> > +;; convert what would be a mvn+shrn into a mvni+subhn because the use of a
> > +;; constant load rather than not instructions allows for better loop
> > +;; optimization. On some implementations the use of subhn also result 
> > in better
> > +;; throughput.
> > +(define_insn_and_split "*shrn_to_subhn_<mode>"
> > +  [(set (match_operand:<VNARROWQ> 0 "register_operand" "=&w")
> > +    (truncate:<VNARROWQ>
> > +      (lshiftrt:VQN
> > +        (not:VQN (match_operand:VQN 1 "register_operand" "w"))
> > +        (match_operand:VQN 2 "aarch64_simd_shift_imm_vec_exact_top"))))]
> 
> Very minor, sorry, but it would be good to format this so that the
> (truncate ...) lines up with the (match_operand...):
> 
>   [(set (match_operand:<VNARROWQ> 0 "register_operand" "=&w")
>       (truncate:<VNARROWQ>
>         (lshiftrt:VQN
>           (not:VQN (match_operand:VQN 1 "register_operand" "w"))
>           (match_operand:VQN 2 "aarch64_simd_shift_imm_vec_exact_top"))))]
> 
> > +  "TARGET_SIMD"
> > +  "#"
> > +  "&& true"
> > +  [(const_int 0)]
> > +{
> > +  rtx tmp;
> > +  if (can_create_pseudo_p ())
> > +    tmp = gen_reg_rtx (<MODE>mode);
> > +  else
> > +    tmp = gen_rtx_REG (<MODE>mode, REGNO (operands[0]));
> > +  emit_insn (gen_move_insn (tmp,
> > +              aarch64_simd_gen_const_vector_dup (<MODE>mode, -1)));
> 
> This can be simplified to:
> 
>   emit_insn (gen_move_insn (tmp, CONSTM1_RTX (<MODE>mode)));

Is there a reason to prefer emit_insn (gen_move_insn (x,y)) over just
emit_move_insn (x,y)?  I tried the latter locally and it seemed to work.

> 
> > +  emit_insn (gen_aarch64_subhn<mode>_insn (operands[0], tmp,
> > +                        operands[1], operands[2]));
> 
> Sorry for the formatting nit, but: "operands[1]" should line up with
> "operands[0]".
> 
> > +  DONE;
> > +})
> > +
> > +
> >   ;; pmul.
> >
> >   (define_insn "aarch64_pmul<mode>"
> > diff --git a/gcc/testsuite/gcc.target/aarch64/simd/shrn2subhn.c 
> > b/gcc/testsuite/gcc.target/aarch64/simd/shrn2subhn.c
> > new file mode 100644
> > index 00000000000..d03af815671
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.target/aarch64/simd/shrn2subhn.c
> > @@ -0,0 +1,18 @@
> > +/* This test case checks that replacing a not+shift by a sub -1 works. */
> > +/* { dg-do compile } */
> > +/* { dg-additional-options "--save-temps -O1" } */
> 
> The --save-temps isn't needed, since dg-do compile compiles to assembly
> anyway.
> 
> Looks really good otherwise, thanks!  So OK with those changes from my POV,
> but please leave a day or so for others to comment.
> 
> Richard
> 
> > +/* { dg-final { scan-assembler-times "\\tsubhn\\t" 2 } } */
> > +
> > +#include<stdint.h>
> > +#include<arm_neon.h>
> > +#include<stdlib.h>

I think only the arm_neon.h include is required here, so let's drop the
std{int,lib}.h includes?

Otherwise LGTM with the other comments addressed, thanks for the patch.

Alex

> > +
> > +uint8x8_t neg_narrow(uint16x8_t a) {
> > +  uint16x8_t b = vmvnq_u16(a);
> > +  return vshrn_n_u16(b, 8);
> > +}
> > +
> > +uint8x8_t neg_narrow_vsubhn(uint16x8_t a) {
> > +  uint16x8_t ones = vdupq_n_u16(0xffff);
> > +  return vsubhn_u16(ones, a);
> > +}

Reply via email to