https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104479
Bug ID: 104479
Summary: [12 Regression] cond_op is combined without considering single_use
Product: gcc
Version: 12.0
Status: UNCONFIRMED
Keywords: missed-optimization
Severity: normal
Priority: P3
Component: tree-optimization
Assignee: unassigned at gcc dot gnu.org
Reporter: crazylht at gmail dot com
Target Milestone: ---
Host: x86_64-pc-linux-gnu
Target: x86_64-*-* i?86-*-*

cat test.c

void
mc_weight (unsigned int* __restrict dst, unsigned int* __restrict src,
           int i_width, int i_scale, unsigned int* __restrict y)
{
  for (int x = 0; x < i_width; x++)
    dst[x] = src[x] >> 3 > 255 ? src[x] >> 3 : y[x];
}

gcc -march=icelake-server -O3

gcc 11.2:

        vpsrld  ymm0, YMMWORD PTR [rsi+rax], 3
        vpcmpud k1, ymm0, ymm2, 2
        vmovdqu32       ymm1{k1}, YMMWORD PTR [r8+rax]
        vpcmpud k1, ymm0, ymm2, 6
        vpblendmd       ymm0{k1}, ymm1, ymm0
        vmovdqu YMMWORD PTR [rcx+rax], ymm0

gcc 12:

        vmovdqu ymm1, YMMWORD PTR [rsi+rax]
        vpsrld  ymm2, ymm1, 3
        vpcmpud k1, ymm2, ymm3, 2
        vmovdqu32       ymm0{k1}, YMMWORD PTR [r8+rax]
        vpcmpud k1, ymm2, ymm3, 6
        vmovdqa ymm2, ymm0
        vpsrld  ymm2{k1}, ymm1, 3
        vmovdqu YMMWORD PTR [rcx+rax], ymm2

It's because in match.pd

---------------cut----------------
(for uncond_op (UNCOND_BINARY)
     cond_op (COND_BINARY)
 (simplify
  (vec_cond @0 (view_convert? (uncond_op@4 @1 @2)) @3)
  (with { tree op_type = TREE_TYPE (@4); }
   (if (vectorized_internal_fn_supported_p (as_internal_fn (cond_op), op_type)
        && is_truth_type_for (op_type, TREE_TYPE (@0)))
    (view_convert (cond_op @0 @1 @2 (view_convert:op_type @3))))))
 (simplify
  (vec_cond @0 @1 (view_convert? (uncond_op@4 @2 @3)))
  (with { tree op_type = TREE_TYPE (@4); }
   (if (vectorized_internal_fn_supported_p (as_internal_fn (cond_op), op_type)
        && is_truth_type_for (op_type, TREE_TYPE (@0)))
    (view_convert (cond_op (bit_not @0) @2 @3 (view_convert:op_type @1)))))))
---------------end-------------------

uncond_op + vec_cond is combined into cond_op without considering that the
uncond_op result may have other uses, which causes suboptimal codegen. Here
the shift result also feeds the compares, so gcc 12 emits vpsrld twice (once
unmasked, once masked) plus an extra vmovdqa.
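
The effect described in this report can be modelled per lane in scalar C. This is only an illustrative sketch: lane_uncond, lane_folded, cond_shr, shapes_agree and check_shapes are hypothetical names, not GCC internals, and cond_shr merely stands in for a masked shift (the cond_op). Both shapes compute the same value, but in the folded shape the unmasked shift stays live for the compare, so the shift is performed twice, which appears to match the two vpsrld instructions in the gcc 12 output.

```c
#include <assert.h>
#include <stddef.h>

/* Shape before the fold: the shift result t feeds both the compare and
   the select, so it has two uses. */
unsigned lane_uncond(unsigned src, unsigned y)
{
    unsigned t = src >> 3;      /* unconditional shift (uncond_op) */
    return t > 255 ? t : y;     /* select (vec_cond) picks t or y */
}

/* Hypothetical scalar stand-in for a masked shift (cond_op):
   yields a >> b where mask is set, otherwise the fallback value. */
unsigned cond_shr(int mask, unsigned a, unsigned b, unsigned fallback)
{
    return mask ? a >> b : fallback;
}

/* Shape after the fold: the select became a masked shift, but t is still
   needed for the compare, so the same input is shifted a second time. */
unsigned lane_folded(unsigned src, unsigned y)
{
    unsigned t = src >> 3;      /* still live: feeds the compare */
    int mask = t > 255;
    return cond_shr(mask, src, 3, y);
}

/* Returns 1 when both shapes agree on every element, 0 otherwise. */
int shapes_agree(const unsigned *src, const unsigned *y, size_t n)
{
    for (size_t i = 0; i < n; i++)
        if (lane_uncond(src[i], y[i]) != lane_folded(src[i], y[i]))
            return 0;
    return 1;
}

/* Small self-check on values around the 255 threshold. */
void check_shapes(void)
{
    const unsigned src[] = { 0u, 2040u, 2047u, 2048u, 0xffffffffu };
    const unsigned y[]   = { 1u, 2u, 3u, 4u, 5u };
    assert(shapes_agree(src, y, sizeof src / sizeof src[0]));
}
```

Since the two shapes agree on results, the difference is purely one of cost. A single_use check on @4, as the summary hints, would presumably be one way to avoid the fold in this case; that is an assumption here, not a confirmed fix.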