On Wed, Oct 21, 2020 at 1:48 PM Uros Bizjak <[email protected]> wrote:
>
> On Wed, Oct 21, 2020 at 11:11 AM Hongyu Wang <[email protected]> wrote:
> >
> > Hi,
> >
> > > IIRC, adding a new regclass is O(n^2), so it should be avoided. I
> > > think that the new patterns should follow the same path as vzeroall
> > > and vzeroupper patterns, where we emit the pattern with explicit hard
> > > regs.
> > >
> > > BTW: We do have SSE_FIRST_REG class, but this class was added to solve
> > > some reload problems in the past by marking %xmm0 as likely spilled.
> >
> > Thanks for your suggestion, we have removed the register classes and
> > constraints, and
> > set explicit sse hard registers in the expander. The corresponding patterns
> > are also adjusted,
> >
> > Update and rebased patch.
>
> The attached patch goes only half-way to using explicit registers. As
> said previously, please see how avx_vzeroall expander is generating
> its insn pattern, and how *avx_vzeroall matches the generated pattern
> using "vzeroall_operation" predicate.
For example:
+(define_insn "encodekey128u32"
+ [(set (match_operand:SI 0 "register_operand" "=r")
+ (unspec_volatile:SI
+ [(match_operand:SI 1 "register_operand" "r")
+ (match_operand:V2DI 3 "register_operand" "2")]
+ UNSPECV_ENCODEKEY128U32))
should be generated as:
(parallel [
(set ( ... as above ... )
(unspec_volatile:SI [( ... as above ... ) (reg:V2DI 20 xmm0)]
UNSPECV_ENCODEKEY128U32))
followed by a series of:
(set (reg:V2DI 20 xmm0)
(unspec_volatile:V2DI [(const_int 0)] UNSPECV_ENCODEKEY128U32))
There is no need to duplicate the already listed input operands in
the unspec_volatile. This is followed by another series of:
(set (reg:V2DI 26 xmm6)
(const_vector:V2DI [(const_int 0) (const_int 0)]))
to tell the optimizer that these registers now hold zero, so the value
in the register can eventually be reused elsewhere.
Finish the parallel with a clobber of FLAGS_REG.
Another example:
+(define_insn "aes<aeswideklvariant>u8"
+ [(set (reg:CCZ FLAGS_REG)
+ (unspec_volatile:CCZ [(match_operand:BLK 0 "memory_operand" "m")
+ (match_operand:V2DI 9 "register_operand" "1")
+ (match_operand:V2DI 2 "sse_reg_operand")
+ (match_operand:V2DI 3 "sse_reg_operand")
+ (match_operand:V2DI 4 "sse_reg_operand")
+ (match_operand:V2DI 5 "sse_reg_operand")
+ (match_operand:V2DI 6 "sse_reg_operand")
+ (match_operand:V2DI 7 "sse_reg_operand")
+ (match_operand:V2DI 8 "sse_reg_operand")]
+ AESDECENCWIDEKL))
+ (set (match_operand:V2DI 1 "register_operand" "=Yz")
+ (unspec_volatile:V2DI [(const_int 0)] AESDECENCWIDEKL))
+ (set (match_dup 2)
+ (unspec_volatile:V2DI [(const_int 0)] AESDECENCWIDEKL))
This should be written as:
(parallel [
(set ( ... as above ... )
(unspec_volatile:CCZ [( ... as above, BLK only ... )]
UNSPEC_AESDECENCWIDEKL))
followed by a series of:
(set (reg:V2DI 20 xmm0)
(unspec_volatile:V2DI [(reg:V2DI 20 xmm0)] UNSPEC_AESDECENCWIDEKL))
Please see the mentioned expander and pattern for how the above series
are generated and matched.
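
For reference, the vzeroall expander in sse.md looks roughly like the
following. This is reproduced from memory as a sketch, so please check
the exact text in the tree before copying anything from it:

```
(define_expand "avx_vzeroall"
  [(match_par_dup 0 [(const_int 0)])]
  "TARGET_AVX"
{
  int nregs = TARGET_64BIT ? 16 : 8;
  int regno;

  /* One slot for the unspec_volatile plus one set per SSE register.  */
  operands[0] = gen_rtx_PARALLEL (VOIDmode, rtvec_alloc (nregs + 1));

  XVECEXP (operands[0], 0, 0)
    = gen_rtx_UNSPEC_VOLATILE (VOIDmode, gen_rtvec (1, const0_rtx),
			       UNSPECV_VZEROALL);

  /* Each element names an explicit hard register, no constraints.  */
  for (regno = 0; regno < nregs; regno++)
    XVECEXP (operands[0], 0, regno + 1)
      = gen_rtx_SET (gen_rtx_REG (V8SImode, GET_SSE_REGNO (regno)),
		     CONST0_RTX (V8SImode));
})
```

The *avx_vzeroall insn then matches the whole vector with
(match_parallel 0 "vzeroall_operation" ...), and the predicate checks
each element of the parallel. The new patterns would presumably need
their own predicate that checks the sets listed above in the same way.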
Uros.