https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101185
--- Comment #3 from Hongtao.liu <crazylht at gmail dot com> ---
(In reply to Hongtao.liu from comment #1)
> Alloc order is just another kind of cost which can be compensated by
> increasing the cost of mask->integer and integer->mask.
>
> With the patch below, pr96814 wouldn't generate any mask instructions
> except for
>
> kmovd %eax, %k1
> vpcmpeqd %ymm1, %ymm1, %ymm1
> vmovdqu8 %ymm1, %ymm0{%k1}{z}
>
> which is what we want.
>
>
> modified   gcc/config/i386/i386.md
> @@ -1335,7 +1335,7 @@
>  (define_insn "*cmp<mode>_ccz_1"
>    [(set (reg FLAGS_REG)
>          (compare (match_operand:SWI1248_AVX512BWDQ_64 0
> -                        "nonimmediate_operand" "<r>,?m<r>,$k")
> +                        "nonimmediate_operand" "<r>,?m<r>,*k")
>                   (match_operand:SWI1248_AVX512BWDQ_64 1 "const0_operand")))]
>    "TARGET_AVX512F && ix86_match_ccmode (insn, CCZmode)"
>    "@
> modified   gcc/config/i386/x86-tune-costs.h
> @@ -2768,7 +2768,7 @@ struct processor_costs intel_cost = {
>    {6, 6, 6, 6, 6},    /* cost of storing SSE registers
>                           in 32,64,128,256 and 512-bit */
>    4, 4,               /* SSE->integer and integer->SSE moves */
> -  4, 4,               /* mask->integer and integer->mask moves */
> +  6, 6,               /* mask->integer and integer->mask moves */

I changed intel_cost only to verify that one more unit of cost is also
enough for -mtune=intel to prevent the generation of mask instructions.