https://gcc.gnu.org/bugzilla/show_bug.cgi?id=121947
Bug ID: 121947
Summary: Improve X86_TUNE_DEST_FALSE_DEP_FOR_GLC implementation
Product: gcc
Version: 15.2.1
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: target
Assignee: unassigned at gcc dot gnu.org
Reporter: hjl.tools at gmail dot com
CC: crazylht at gmail dot com
Target Milestone: ---
Most uses of X86_TUNE_DEST_FALSE_DEP_FOR_GLC are implemented in the insn output code, like this:
(define_insn "<avx512>_<complexopname>_<mode><maskc_name><round_name>"
  [(set (match_operand:VHF_AVX512VL 0 "register_operand" "=&v")
        (unspec:VHF_AVX512VL
          [(match_operand:VHF_AVX512VL 1 "<round_nimm_predicate>" "<int_comm>v")
           (match_operand:VHF_AVX512VL 2 "<round_nimm_predicate>" "<round_constraint>")]
          UNSPEC_COMPLEX_F_C_MUL))]
  "TARGET_AVX512FP16 && <round_mode512bit_condition>"
{
  if (TARGET_DEST_FALSE_DEP_FOR_GLC
      && <maskc_dest_false_dep_for_glc_cond>)
    output_asm_insn ("vxorps\t%x0, %x0, %x0", operands);
  return "v<complexopname><ssemodesuffix>\t{<round_maskc_op3>%2, %1, %0<maskc_operand3>|%0<maskc_operand3>, %1, %2<round_maskc_op3>}";
}
This emits an extra vxorps before every such instruction. These patterns could
instead be split before reload, with the x86_cse pass run afterwards to remove
all redundant vxorps instructions.
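A small test case along the following lines may help when inspecting the generated code. It is a hypothetical reproducer, not taken from this report: the names mul_chain and repro_enabled are made up for illustration, and the expectation that _mm512_fmul_pch is expanded through the pattern above is an assumption.

```c
/* Hypothetical reproducer (not from the bug report).
   The AVX512FP16 part is only compiled when the ISA is enabled,
   e.g. with -mavx512fp16, so the file still builds without it.  */
#ifdef __AVX512FP16__
#include <immintrin.h>

/* Two dependent complex FP16 multiplies.  _mm512_fmul_pch is the
   <immintrin.h> intrinsic for vfmulcph; that it goes through the
   define_insn quoted in this report is an assumption.  */
__m512h
mul_chain (__m512h a, __m512h b, __m512h c)
{
  __m512h t = _mm512_fmul_pch (a, b);
  return _mm512_fmul_pch (t, c);
}
#endif

/* Returns 1 if the AVX512FP16 part above was compiled in, else 0.  */
int
repro_enabled (void)
{
#ifdef __AVX512FP16__
  return 1;
#else
  return 0;
#endif
}
```

Compiling this with something like "gcc -O2 -mavx512fp16 -mtune=sapphirerapids -S" (assuming that tuning enables X86_TUNE_DEST_FALSE_DEP_FOR_GLC) and looking for a vxorps in front of each vfmulcph should show the behavior described above.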