Jonathan Wright via Gcc-patches <gcc-patches@gcc.gnu.org> writes:
> Hi,
>
> As subject, this patch rewrites the vmull[_high]_p8 Neon intrinsics to use RTL
> builtins rather than inline assembly code, allowing for better scheduling and
> optimization.
>
> Regression tested and bootstrapped on aarch64-none-linux-gnu and
> aarch64_be-none-elf - no issues.

Thanks for doing this.  Mostly LGTM, but one comment about the patterns:

> […]
> +(define_insn "aarch64_pmull_hiv16qi_insn"
> +  [(set (match_operand:V8HI 0 "register_operand" "=w")
> +     (unspec:V8HI
> +       [(vec_select:V8QI
> +          (match_operand:V16QI 1 "register_operand" "w")
> +          (match_operand:V16QI 3 "vect_par_cnst_hi_half" ""))
> +        (vec_select:V8QI
> +          (match_operand:V16QI 2 "register_operand" "w")
> +          (match_dup 3))]
> +       UNSPEC_PMULL2))]
> + "TARGET_SIMD"
> + "pmull2\\t%0.8h, %1.16b, %2.16b"
> +  [(set_attr "type" "neon_mul_b_long")]
> +)

As things stand, UNSPEC_PMULL2 has the vec_select “built in”:

(define_insn "aarch64_crypto_pmullv2di"
 [(set (match_operand:TI 0 "register_operand" "=w")
       (unspec:TI [(match_operand:V2DI 1 "register_operand" "w")
                   (match_operand:V2DI 2 "register_operand" "w")]
                  UNSPEC_PMULL2))]
  "TARGET_SIMD && TARGET_AES"
  "pmull2\\t%0.1q, %1.2d, %2.2d"
  [(set_attr "type" "crypto_pmull")]
)

So I think it would be more consistent to do one of the following:

(1) Keep the vec_selects in the new pattern, but use UNSPEC_PMULL
    for the operation instead of UNSPEC_PMULL2.
(2) Remove the vec_selects and keep the UNSPEC_PMULL2.

(1) in principle allows more combination opportunities than (2),
although I don't know how likely it is to help in practice.
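
For concreteness, (1) would look something like the following
(untested sketch, assuming UNSPEC_PMULL is given the two V8QI high
halves directly; everything else is unchanged from the patch):

(define_insn "aarch64_pmull_hiv16qi_insn"
  [(set (match_operand:V8HI 0 "register_operand" "=w")
	(unspec:V8HI
	  [(vec_select:V8QI
	     (match_operand:V16QI 1 "register_operand" "w")
	     (match_operand:V16QI 3 "vect_par_cnst_hi_half" ""))
	   (vec_select:V8QI
	     (match_operand:V16QI 2 "register_operand" "w")
	     (match_dup 3))]
	  UNSPEC_PMULL))]
  "TARGET_SIMD"
  "pmull2\\t%0.8h, %1.16b, %2.16b"
  [(set_attr "type" "neon_mul_b_long")]
)

That way the unspec itself describes only the 8b->8h polynomial
multiply and the vec_selects say which halves are being multiplied,
whereas (2) would simply drop the vec_selects and let UNSPEC_PMULL2
imply the high-half selection, as aarch64_crypto_pmullv2di does.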

Thanks,
Richard