On Wed, Oct 21, 2020 at 1:48 PM Uros Bizjak <ubiz...@gmail.com> wrote:
>
> On Wed, Oct 21, 2020 at 11:11 AM Hongyu Wang <wwwhhhyyy...@gmail.com> wrote:
> >
> > Hi,
> >
> > > IIRC, adding a new regclass is O(n^2), so it should be avoided. I
> > > think that the new patterns should follow the same path as vzeroall
> > > and vzeroupper patterns, where we emit the pattern with explicit hard
> > > regs.
> > >
> > > BTW: We do have SSE_FIRST_REG class, but this class was added to solve
> > > some reload problems in the past by marking %xmm0 as likely spilled.
> >
> > Thanks for your suggestion, we have removed the register classes and
> > constraints, and set explicit sse hard registers in the expander. The
> > corresponding patterns are also adjusted,
> >
> > Update and rebased patch.
>
> The attached patch goes only half-way to using explicit registers. As
> said previously, please see how avx_vzeroall expander is generating
> its insn pattern, and how *avx_vzeroall matches the generated pattern
> using "vzeroall_operation" predicate.

For example:

+(define_insn "encodekey128u32"
+  [(set (match_operand:SI 0 "register_operand" "=r")
+        (unspec_volatile:SI
+          [(match_operand:SI 1 "register_operand" "r")
+           (match_operand:V2DI 3 "register_operand" "2")]
+          UNSPECV_ENCODEKEY128U32))

should be generated as:

  (parallel [
    (set ( ... as above ... )
         (unspec_volatile:SI [( ... as above ... )
                              (reg:V2DI 20 xmm0)]
                             UNSPECV_ENCODEKEY128U32))

followed by a series of:

    (set (reg:V2DI 20 xmm0)
         (unspec_volatile:V2DI [(const_int 0)] UNSPECV_ENCODEKEY128U32))

There is no need to duplicate the already listed input operands in
these unspec_volatiles.

This is followed by another series of:

    (set (reg:V2DI 26 xmm6)
         (const_vector:V2DI [(const_int 0) (const_int 0)]))

to tell the optimizer that some registers now hold zero, so the value
in the register can eventually be reused elsewhere.

Finish the parallel with a clobber of flags_reg.

Another example:

+(define_insn "aes<aeswideklvariant>u8"
+  [(set (reg:CCZ FLAGS_REG)
+        (unspec_volatile:CCZ [(match_operand:BLK 0 "memory_operand" "m")
+                              (match_operand:V2DI 9 "register_operand" "1")
+                              (match_operand:V2DI 2 "sse_reg_operand")
+                              (match_operand:V2DI 3 "sse_reg_operand")
+                              (match_operand:V2DI 4 "sse_reg_operand")
+                              (match_operand:V2DI 5 "sse_reg_operand")
+                              (match_operand:V2DI 6 "sse_reg_operand")
+                              (match_operand:V2DI 7 "sse_reg_operand")
+                              (match_operand:V2DI 8 "sse_reg_operand")]
+                             AESDECENCWIDEKL))
+   (set (match_operand:V2DI 1 "register_operand" "=Yz")
+        (unspec_volatile:V2DI [(const_int 0)] AESDECENCWIDEKL))
+   (set (match_dup 2)
+        (unspec_volatile:V2DI [(const_int 0)] AESDECENCWIDEKL))

This should be written as:

  (parallel [
    (set ( ... as above ... )
         (unspec_volatile:CCZ [( ... as above, BLK only ... )]
                              UNSPEC_AESDECENCWIDEKL))

followed by a series of:

    (set (reg:V2DI 20 xmm0)
         (unspec_volatile:V2DI [(reg:V2DI 20 xmm0)] UNSPEC_AESDECENCWIDEKL))

Please see the mentioned expander and pattern for how the above series
are generated and matched.

Uros.