On Wed, Oct 21, 2020 at 1:48 PM Uros Bizjak <ubiz...@gmail.com> wrote:
>
> On Wed, Oct 21, 2020 at 11:11 AM Hongyu Wang <wwwhhhyyy...@gmail.com> wrote:
> >
> > Hi,
> >
> > > IIRC, adding a new regclass is O(n^2), so it should be avoided. I
> > > think that the new patterns should follow the same path as vzeroall
> > > and vzeroupper patterns, where we emit the pattern with explicit hard
> > > regs.
> > >
> > > BTW: We do have SSE_FIRST_REG class, but this class was added to solve
> > > some reload problems in the past by marking %xmm0 as likely spilled.
> >
> > Thanks for your suggestion. We have removed the register classes and
> > constraints, and set explicit SSE hard registers in the expander. The
> > corresponding patterns are also adjusted.
> >
> > Updated and rebased patch.
>
> The attached patch goes only half-way to using explicit registers. As
> said previously, please see how avx_vzeroall expander is generating
> its insn pattern, and how *avx_vzeroall matches the generated pattern
> using "vzeroall_operation" predicate.

For example:

+(define_insn "encodekey128u32"
+  [(set (match_operand:SI 0 "register_operand" "=r")
+        (unspec_volatile:SI
+          [(match_operand:SI   1 "register_operand" "r")
+           (match_operand:V2DI 3 "register_operand" "2")]
+         UNSPECV_ENCODEKEY128U32))

should be generated as:

(parallel [
  (set ( ... as above ... )
    (unspec_volatile:SI [( ... as above ... ) (reg:V2DI 20 xmm0)]
                        UNSPECV_ENCODEKEY128U32))

followed by a series of:

   (set (reg:V2DI 20 xmm0)
        (unspec_volatile:V2DI [(const_int 0)] UNSPECV_ENCODEKEY128U32))

There is no need to duplicate the already listed input operands in
these unspec_volatiles.

followed by another series of:

   (set (reg:V2DI 26 xmm6)
        (const_vector:V2DI [(const_int 0) (const_int 0)]))

to tell the optimizer that some registers now hold zero, so the value
in the register can eventually be reused elsewhere.

Finally, finish the parallel with a clobber of FLAGS_REG.
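
To make the element order concrete, here is a rough standalone C model
of the vector described above. It is not GCC code: struct elt,
build_parallel and the ELT_* names are made up for this sketch, and the
choice of which xmm registers are written or zeroed is left to the
caller. In the real expander the same shape would be built with
gen_rtx_PARALLEL and XVECEXP, as the avx_vzeroall expander does.

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-ins for RTL expressions; none of these are GCC types.  */
enum elt_kind
{
  ELT_MAIN_SET,      /* (set (match_operand:SI 0) (unspec_volatile:SI ...)) */
  ELT_UNSPEC_SET,    /* (set (reg:V2DI N) (unspec_volatile:V2DI ...)) */
  ELT_ZERO_SET,      /* (set (reg:V2DI N) (const_vector:V2DI ...)) */
  ELT_CLOBBER_FLAGS  /* (clobber (reg FLAGS_REG)) */
};

struct elt
{
  enum elt_kind kind;
  int regno;         /* hard register number, or -1 if not applicable */
};

#define XMM0_REGNO 20  /* as in "(reg:V2DI 20 xmm0)" above */
#define MAX_ELTS 32

/* Fill VEC in the order described above: the main set, the
   unspec_volatile sets, the zeroing sets, then the flags clobber.
   OUT_XMM and ZERO_XMM hold xmm indices chosen by the caller.
   Returns the number of elements written.  */
size_t
build_parallel (struct elt *vec,
                const int *out_xmm, size_t n_out,
                const int *zero_xmm, size_t n_zero)
{
  size_t n = 0;

  assert (n_out + n_zero + 2 <= MAX_ELTS);

  vec[n++] = (struct elt) { ELT_MAIN_SET, -1 };
  for (size_t i = 0; i < n_out; i++)
    vec[n++] = (struct elt) { ELT_UNSPEC_SET, XMM0_REGNO + out_xmm[i] };
  for (size_t i = 0; i < n_zero; i++)
    vec[n++] = (struct elt) { ELT_ZERO_SET, XMM0_REGNO + zero_xmm[i] };
  vec[n++] = (struct elt) { ELT_CLOBBER_FLAGS, -1 };

  return n;
}
```

Called with, say, three written registers and three zeroed registers
(example values only), this yields an eight-element vector with the
main set first and the clobber last.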

Another example:

+(define_insn "aes<aeswideklvariant>u8"
+  [(set (reg:CCZ FLAGS_REG)
+        (unspec_volatile:CCZ [(match_operand:BLK 0 "memory_operand" "m")
+                              (match_operand:V2DI 9  "register_operand" "1")
+                              (match_operand:V2DI 2  "sse_reg_operand")
+                              (match_operand:V2DI 3  "sse_reg_operand")
+                              (match_operand:V2DI 4  "sse_reg_operand")
+                              (match_operand:V2DI 5  "sse_reg_operand")
+                              (match_operand:V2DI 6  "sse_reg_operand")
+                              (match_operand:V2DI 7  "sse_reg_operand")
+                              (match_operand:V2DI 8  "sse_reg_operand")]
+                             AESDECENCWIDEKL))
+   (set (match_operand:V2DI 1 "register_operand" "=Yz")
+        (unspec_volatile:V2DI [(const_int 0)] AESDECENCWIDEKL))
+   (set (match_dup 2)
+        (unspec_volatile:V2DI [(const_int 0)] AESDECENCWIDEKL))

This should be written as:

(parallel [
  (set ( ... as above ... )
    (unspec_volatile:CCZ [( ... as above, BLK only ... )]
                         UNSPEC_AESDECENCWIDEKL))

followed by a series of:

   (set (reg:V2DI 20 xmm0)
        (unspec_volatile:V2DI [(reg:V2DI 20 xmm0)] UNSPEC_AESDECENCWIDEKL))

Please see the mentioned expander and pattern for how the above series
are generated and matched.
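
As a rough illustration of the matching side, the following standalone
C model shows the kind of check a predicate in the style of
"vzeroall_operation" performs on such a parallel. It is not GCC code:
struct elt, series_operation_p and the ELT_* names are hypothetical,
and the real predicate works on rtx with XVECEXP and compares actual
SET sources and destinations.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-ins for RTL expressions; none of these are GCC types.  */
enum elt_kind
{
  ELT_MAIN_SET,    /* (set (reg:CCZ FLAGS_REG) (unspec_volatile:CCZ ...)) */
  ELT_UNSPEC_SET,  /* (set (reg:V2DI N) (unspec_volatile:V2DI [(reg:V2DI N)] ...)) */
  ELT_OTHER        /* anything else */
};

struct elt
{
  enum elt_kind kind;
  int regno;       /* hard register number, or -1 if not applicable */
};

#define XMM0_REGNO 20  /* as in "(reg:V2DI 20 xmm0)" above */

/* Return true if VEC is exactly: the main set followed by NREGS
   unspec sets of xmm0 .. xmm(NREGS-1), in that order.  */
bool
series_operation_p (const struct elt *vec, size_t len, size_t nregs)
{
  assert (vec != NULL);

  /* The length must be the main set plus one set per register.  */
  if (len != nregs + 1 || vec[0].kind != ELT_MAIN_SET)
    return false;

  /* Each following element must set the expected hard register.  */
  for (size_t i = 0; i < nregs; i++)
    if (vec[i + 1].kind != ELT_UNSPEC_SET
        || vec[i + 1].regno != XMM0_REGNO + (int) i)
      return false;

  return true;
}
```

Because every element is checked against a fixed hard register number,
the insn pattern needs neither register constraints nor a new register
class.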

Uros.
