Jonathan Wright via Gcc-patches <gcc-patches@gcc.gnu.org> writes:
> Hi,
>
> As subject, this patch rewrites the vmull[_high]_p8 Neon intrinsics to use RTL
> builtins rather than inline assembly code, allowing for better scheduling and
> optimization.
>
> Regression tested and bootstrapped on aarch64-none-linux-gnu and
> aarch64_be-none-elf - no issues.
Thanks for doing this. Mostly LGTM, but one comment about the patterns:

> […]
> +(define_insn "aarch64_pmull_hiv16qi_insn"
> +  [(set (match_operand:V8HI 0 "register_operand" "=w")
> +       (unspec:V8HI
> +         [(vec_select:V8QI
> +            (match_operand:V16QI 1 "register_operand" "w")
> +            (match_operand:V16QI 3 "vect_par_cnst_hi_half" ""))
> +          (vec_select:V8QI
> +            (match_operand:V16QI 2 "register_operand" "w")
> +            (match_dup 3))]
> +         UNSPEC_PMULL2))]
> +  "TARGET_SIMD"
> +  "pmull2\\t%0.8h, %1.16b, %2.16b"
> +  [(set_attr "type" "neon_mul_b_long")]
> +)

As things stand, UNSPEC_PMULL2 has the vec_select “built in”:

(define_insn "aarch64_crypto_pmullv2di"
  [(set (match_operand:TI 0 "register_operand" "=w")
        (unspec:TI [(match_operand:V2DI 1 "register_operand" "w")
                    (match_operand:V2DI 2 "register_operand" "w")]
                   UNSPEC_PMULL2))]
  "TARGET_SIMD && TARGET_AES"
  "pmull2\\t%0.1q, %1.2d, %2.2d"
  [(set_attr "type" "crypto_pmull")]
)

So I think it would be more consistent to do one of the following:

(1) Keep the vec_selects in the new pattern, but use UNSPEC_PMULL
    for the operation instead of UNSPEC_PMULL2.

(2) Remove the vec_selects and keep the UNSPEC_PMULL2.

(1) in principle allows more combination opportunities than (2),
although I don't know how likely it is to help in practice.

Thanks,
Richard
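
P.S. In case it helps, here is a rough, untested sketch of what (1) could
look like: the same pattern as in the patch, but with the vec_selects kept
and UNSPEC_PMULL describing the underlying operation instead of
UNSPEC_PMULL2 (only the unspec name changes; the asm template still emits
pmull2 because the high-half selection is now explicit in the RTL):

(define_insn "aarch64_pmull_hiv16qi_insn"
  [(set (match_operand:V8HI 0 "register_operand" "=w")
        (unspec:V8HI
          [(vec_select:V8QI
             (match_operand:V16QI 1 "register_operand" "w")
             (match_operand:V16QI 3 "vect_par_cnst_hi_half" ""))
           (vec_select:V8QI
             (match_operand:V16QI 2 "register_operand" "w")
             (match_dup 3))]
          UNSPEC_PMULL))]
  "TARGET_SIMD"
  "pmull2\\t%0.8h, %1.16b, %2.16b"
  [(set_attr "type" "neon_mul_b_long")]
)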