Re: [PATCH] AArch64 SIMD: convert mvn+shrn into mvni+subhn

Remi Machet Fri, 13 Jun 2025 09:07:11 -0700


On 6/13/25 11:22, Alex Coplan wrote:
> External email: Use caution opening links or attachments
> 
> 
> Hi Remi,
> 
> On 12/06/2025 17:02, Richard Sandiford wrote:
>> Remi Machet <rmac...@nvidia.com> writes:
>>> Add an optimization to aarch64 SIMD converting mvn+shrn into mvni+subhn
>>> which
>>> allows for better optimization when the code is inside a loop by using a
>>> constant.
> 
> It can be helpful for reviewers to show the codegen you get before/after
> your patch in the cover letter.
> 
> In this case, it might also be helpful to include a brief argument for
> why the transformation is correct, e.g. noting that, for a uint16_t x:
>    -x = ~x + 1
>    ~x = -1 - x
> so:
>    (u8)(~x >> imm)
> is equivalent to:
>    (u8)(((u16)-1 - x) >> imm).
> 
>>>
>>> Bootstrapped and regtested on aarch64-linux-gnu.
>>>
>>> Signed-off-by: Remi Machet <rmac...@nvidia.com>
>>>
>>> gcc/ChangeLog:
>>>
>>>       * config/aarch64/aarch64-simd.md (*shrn_to_subhn_<mode>): Add pattern
>>>           converting mvn+shrn into mvni+subhn.
>>>
>>> gcc/testsuite/ChangeLog:
>>>
>>>       * gcc.target/aarch64/simd/shrn2subhn.c: New test.
>>>
>>> diff --git a/gcc/config/aarch64/aarch64-simd.md
>>> b/gcc/config/aarch64/aarch64-simd.md
> 
> As Kyrill noted, the formatting of the patch seems to be quite broken.
> This diff --git line shouldn't be wrapped, for example.  There are also
> quite a few unicode NBSP (non-breaking space) characters in the patch.
> These things make it difficult to apply the patch (it needs manual
> edits).  Attaching the patch file next time should help.
> 
>>> index 6e30dc48934..f49e6fe6a26 100644
>>> --- a/gcc/config/aarch64/aarch64-simd.md
>>> +++ b/gcc/config/aarch64/aarch64-simd.md
>>> @@ -5028,6 +5028,34 @@
>>>      DONE;
>>>    })
>>>
>>> +;; convert what would be a mvn+shrn into a mvni+subhn because the use of a
>>> +;; constant load rather than not instructions allows for better loop
>>> +;; optimization. On some implementations the use of subhn also result
>>> in better
>>> +;; throughput.
>>> +(define_insn_and_split "*shrn_to_subhn_<mode>"
>>> +  [(set (match_operand:<VNARROWQ> 0 "register_operand" "=&w")
>>> +    (truncate:<VNARROWQ>
>>> +      (lshiftrt:VQN
>>> +        (not:VQN (match_operand:VQN 1 "register_operand" "w"))
>>> +        (match_operand:VQN 2 "aarch64_simd_shift_imm_vec_exact_top"))))]
>>
>> Very minor, sorry, but it would be good to format this so that the
>> (truncate ...) lines up with the (match_operand...):
>>
>>    [(set (match_operand:<VNARROWQ> 0 "register_operand" "=&w")
>>        (truncate:<VNARROWQ>
>>          (lshiftrt:VQN
>>            (not:VQN (match_operand:VQN 1 "register_operand" "w"))
>>            (match_operand:VQN 2 "aarch64_simd_shift_imm_vec_exact_top"))))]
>>
>>> +  "TARGET_SIMD"
>>> +  "#"
>>> +  "&& true"
>>> +  [(const_int 0)]
>>> +{
>>> +  rtx tmp;
>>> +  if (can_create_pseudo_p ())
>>> +    tmp = gen_reg_rtx (<MODE>mode);
>>> +  else
>>> +    tmp = gen_rtx_REG (<MODE>mode, REGNO (operands[0]));
>>> +  emit_insn (gen_move_insn (tmp,
>>> +              aarch64_simd_gen_const_vector_dup (<MODE>mode, -1)));
>>
>> This can be simplified to:
>>
>>    emit_insn (gen_move_insn (tmp, CONSTM1_RTX (<MODE>mode)));
> 
> Is there a reason to prefer emit_insn (gen_move_insn (x,y)) over just
> emit_move_insn (x,y)?  I tried the latter locally and it seemed to work.
> 
>>
>>> +  emit_insn (gen_aarch64_subhn<mode>_insn (operands[0], tmp,
>>> +                        operands[1], operands[2]));
>>
>> Sorry for the formatting nit, but: "operands[1]" should line up with
>> "operands[0]".
>>
>>> +  DONE;
>>> +})
>>> +
>>> +
>>>    ;; pmul.
>>>
>>>    (define_insn "aarch64_pmul<mode>"
>>> diff --git a/gcc/testsuite/gcc.target/aarch64/simd/shrn2subhn.c
>>> b/gcc/testsuite/gcc.target/aarch64/simd/shrn2subhn.c
>>> new file mode 100644
>>> index 00000000000..d03af815671
>>> --- /dev/null
>>> +++ b/gcc/testsuite/gcc.target/aarch64/simd/shrn2subhn.c
>>> @@ -0,0 +1,18 @@
>>> +/* This test case checks that replacing a not+shift by a sub -1 works. */
>>> +/* { dg-do compile } */
>>> +/* { dg-additional-options "--save-temps -O1" } */
>>
>> The --save-temps isn't needed, since dg-do compile compiles to assembly
>> anyway.
>>
>> Looks really good otherwise, thanks!  So OK with those changes from my POV,
>> but please leave a day or so for others to comment.
>>
>> Richard
>>
>>> +/* { dg-final { scan-assembler-times "\\tsubhn\\t" 2 } } */
>>> +
>>> +#include<stdint.h>
>>> +#include<arm_neon.h>
>>> +#include<stdlib.h>
> 
> I think only the arm_neon.h include is required here, so let's drop the
> std{int,lib}.h includes?
> 
> Otherwise LGTM with the other comments addressed, thanks for the patch.
> 
> Alex
> 
>>> +
>>> +uint8x8_t neg_narrow(uint16x8_t a) {
>>> +  uint16x8_t b = vmvnq_u16(a);
>>> +  return vshrn_n_u16(b, 8);
>>> +}
>>> +
>>> +uint8x8_t neg_narrow_vsubhn(uint16x8_t a) {
>>> +  uint16x8_t ones = vdupq_n_u16(0xffff);
>>> +  return vsubhn_u16(ones, a);
>>> +}


Hi Alex,

Thanks for the feedback, regarding using emit_move_insn() no reason, I 
just had forgotten about it... I will fix that as well as the rest of 
your feedback, and will make sure to attach the patch this time as my 
second attempt was just as bad as the first!

Remi

Re: [PATCH] AArch64 SIMD: convert mvn+shrn into mvni+subhn

Reply via email to