https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101185

--- Comment #3 from Hongtao.liu <crazylht at gmail dot com> ---
(In reply to Hongtao.liu from comment #1)
> Alloc order is just another kind of cost which can be compensated by
> increasing cost of mask->integer and integer->mask.
> 
> With the patch below, pr96814 wouldn't generate any mask instructions except
> for 
> 
>       kmovd   %eax, %k1
>       vpcmpeqd        %ymm1, %ymm1, %ymm1
>       vmovdqu8        %ymm1, %ymm0{%k1}{z}
> 
> which is what we want.
> 
> 
> modified   gcc/config/i386/i386.md
> @@ -1335,7 +1335,7 @@
>  (define_insn "*cmp<mode>_ccz_1"
>    [(set (reg FLAGS_REG)
>       (compare (match_operand:SWI1248_AVX512BWDQ_64 0
> -                     "nonimmediate_operand" "<r>,?m<r>,$k")
> +                     "nonimmediate_operand" "<r>,?m<r>,*k")
>                (match_operand:SWI1248_AVX512BWDQ_64 1 "const0_operand")))]
>    "TARGET_AVX512F && ix86_match_ccmode (insn, CCZmode)"
>    "@
> modified   gcc/config/i386/x86-tune-costs.h
> @@ -2768,7 +2768,7 @@ struct processor_costs intel_cost = {
>    {6, 6, 6, 6, 6},                   /* cost of storing SSE registers
>                                          in 32,64,128,256 and 512-bit */
>    4, 4,                              /* SSE->integer and integer->SSE moves */
> -  4, 4,                              /* mask->integer and integer->mask moves */
> +  6, 6,                              /* mask->integer and integer->mask moves */
I changed intel_cost only to confirm that raising the cost by 1 unit is also
enough for -mtune=intel to prevent mask instructions from being generated.
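
As a rough illustration of the argument in comment #1 (allocation order acts
like a tie-breaking cost, so a higher move cost can override it), here is a
toy model in C. It is not GCC's register allocator: the names toy_class and
pick_class and the two-way comparison are hypothetical, and only the values
4 and 6 are taken from the patch above.

```c
#include <assert.h>

/* Toy model only; this is NOT how GCC's IRA is implemented.
   All names here are hypothetical.  */
enum toy_class { TOY_GPR, TOY_MASK };

/* Pick the cheaper register class.  When both costs are equal,
   fall back to whichever class comes first in allocation order.  */
enum toy_class
pick_class (int gpr_cost, int mask_cost, enum toy_class first_in_order)
{
  if (gpr_cost < mask_cost)
    return TOY_GPR;
  if (mask_cost < gpr_cost)
    return TOY_MASK;
  return first_in_order;   /* tie: allocation order decides */
}

/* Small self-check using the old (4) and new (6) mask move costs
   against an assumed competing cost of 4.  */
void
toy_self_check (void)
{
  /* Equal costs: the result depends only on allocation order.  */
  assert (pick_class (4, 4, TOY_MASK) == TOY_MASK);
  assert (pick_class (4, 4, TOY_GPR) == TOY_GPR);
  /* Higher mask cost: GPR wins even if mask is first in order.  */
  assert (pick_class (4, 6, TOY_MASK) == TOY_GPR);
}
```

In this simplified picture, once the mask cost is strictly higher, the
allocation order no longer matters, which is the effect the cost change is
meant to have.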
