On 08/26/2013 09:13 AM, Kirill Yukhin wrote:
> +(define_split
> +  [(set (match_operand:SWI12 0 "mask_reg_operand")
> +     (any_logic:SWI12 (match_operand:SWI12 1 "mask_reg_operand")
> +                      (match_operand:SWI12 2 "mask_reg_operand")))
> +   (clobber (reg:CC FLAGS_REG))]
> +  "TARGET_AVX512F && reload_completed"
> +  [(set (match_operand:SWI12 0 "mask_reg_operand")
> +     (any_logic:SWI12 (match_operand:SWI12 1 "mask_reg_operand")
> +                      (match_operand:SWI12 2 "mask_reg_operand")))])

You must you match_dup on the new pattern half of define_split.
This pattern must never have triggered during your tests, since
it should have resulted in garbage rtl, and an ICE.

> +(define_insn "kandn<mode>"
> +  [(set (match_operand:SWI12 0 "register_operand" "=r,!Yk")
> +     (and:SWI12
> +       (not:SWI12
> +         (match_operand:SWI12 1 "register_operand" "0,Yk"))
> +       (match_operand:SWI12 2 "register_operand" "r,Yk")))]
> +  "TARGET_AVX512F"
> +  "@
> +   #
> +   kandnw\t{%2, %1, %0|%0, %1, %2}"
> +  [(set_attr "type" "*,msklog")
> +   (set_attr "prefix" "*,vex")
> +   (set_attr "mode" "<MODE>")])

What happened to the bmi andn alternative we discussed?

>  (define_insn "*andqi_1"
> -  [(set (match_operand:QI 0 "nonimmediate_operand" "=qm,q,r")
> -     (and:QI (match_operand:QI 1 "nonimmediate_operand" "%0,0,0")
> -             (match_operand:QI 2 "general_operand" "qn,qmn,rn")))
> +  [(set (match_operand:QI 0 "nonimmediate_operand" "=qm,q,r,!Yk")
> +     (and:QI (match_operand:QI 1 "nonimmediate_operand" "%0,0,0,Yk")
> +             (match_operand:QI 2 "general_operand" "qn,qmn,rn,Yk")))
>     (clobber (reg:CC FLAGS_REG))]
>    "ix86_binary_operator_ok (AND, QImode, operands)"
>    "@
>     and{b}\t{%2, %0|%0, %2}
>     and{b}\t{%2, %0|%0, %2}
> -   and{l}\t{%k2, %k0|%k0, %k2}"
> -  [(set_attr "type" "alu")
> -   (set_attr "mode" "QI,QI,SI")])
> +   and{l}\t{%k2, %k0|%k0, %k2}
> +   #"
> +  [(set_attr "type" "alu,alu,alu,msklog")
> +   (set_attr "mode" "QI,QI,SI,HI")])

Why force the split?  You can write the kand here...

> +(define_insn "<code>hi_1"
> +  [(set (match_operand:HI 0 "nonimmediate_operand" "=r,rm,!Yk")
> +     (any_or:HI
> +      (match_operand:HI 1 "nonimmediate_operand" "%0,0,Yk")
> +      (match_operand:HI 2 "general_operand" "<g>,r<i>,Yk")))
> +   (clobber (reg:CC FLAGS_REG))]
> +  "ix86_binary_operator_ok (<CODE>, HImode, operands)"
> +  "@
> +  <logic>{w}\t{%2, %0|%0, %2}
> +  <logic>{w}\t{%2, %0|%0, %2}
> +  #"
> +  [(set_attr "type" "alu,alu,msklog")
> +   (set_attr "mode" "HI")])

Likewise.

The point being that with optimization enabled, we will have run the split and
gotten all of the performance benefit of eliding the clobber.  But with
optimization disabled, we don't need the split for correctness.

> +(define_insn "kunpckhi"
> +  [(set (match_operand:HI 0 "register_operand" "=Yk")
> +     (ior:HI
> +       (ashift:HI
> +         (match_operand:HI 1 "register_operand" "Yk")
> +         (const_int 8))
> +       (zero_extend:HI (subreg:QI (match_operand:HI 2 "register_operand" 
> "Yk") 0))))]
> +  "TARGET_AVX512F"
> +  "kunpckbw\t{%2, %1, %0|%0, %1, %2}"
> +  [(set_attr "mode" "HI")
> +   (set_attr "type" "msklog")
> +   (set_attr "prefix" "vex")])

Don't write the subreg explicitly.  Instead, use a match_operand:QI, which will
match the whole (subreg (reg)) expression, and also something that the combiner
could simplify out of that.

> +(define_insn "*one_cmplhi2_1"
> +  [(set (match_operand:HI 0 "nonimmediate_operand" "=rm,Yk")
> +     (not:HI (match_operand:HI 1 "nonimmediate_operand" "0,Yk")))]
> +  "ix86_unary_operator_ok (NOT, HImode, operands)"
...
>  (define_insn "*one_cmplqi2_1"
> -  [(set (match_operand:QI 0 "nonimmediate_operand" "=qm,r")
> -     (not:QI (match_operand:QI 1 "nonimmediate_operand" "0,0")))]
> +  [(set (match_operand:QI 0 "nonimmediate_operand" "=qm,r,*Yk")
> +     (not:QI (match_operand:QI 1 "nonimmediate_operand" "0,0,*Yk")))]

Forgotten ! for Yk alternatives.


> +  "TARGET_AVX512F && !ANY_MASK_REG_P (operands [0])"
...
> +;; Do not split instructions with mask registers.
>  (define_split
...
> +   && (! ANY_MASK_REG_P (operands[0])
> +     || ! ANY_MASK_REG_P (operands[1])
> +     || ! ANY_MASK_REG_P (operands[2]))"

This ugliness is why I suggested adding a general_reg_operand in our last
conversation.


r~

Reply via email to