https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104479
Bug ID: 104479
Summary: [12 Regression] cond_op is combined without considering single_use
Product: gcc
Version: 12.0
Status: UNCONFIRMED
Keywords: missed-optimization
Severity: normal
Priority: P3
Component: tree-optimization
Assignee: unassigned at gcc dot gnu.org
Reporter: crazylht at gmail dot com
Target Milestone: ---
Host: x86_64-pc-linux-gnu
Target: x86_64-*-* i?86-*-*
cat test.c
void
mc_weight (unsigned int* __restrict dst, unsigned int* __restrict src,
           int i_width, int i_scale, unsigned int* __restrict y)
{
  for (int x = 0; x < i_width; x++)
    dst[x] = src[x] >> 3 > 255 ? src[x] >> 3 : y[x];
}
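To compare the two builds at run time, not only by reading the assembly, test.c can be extended with a small self-check. The sketch below is hypothetical: check_mc_weight and its inputs are not part of the report. It uses 16 elements so that, assuming the 8-lane ymm vectorization shown below, at least one full vector iteration runs. With src[i] = i * 256 the shifted value is i * 32, which exceeds 255 exactly when i >= 8.

```c
#include <assert.h>

void
mc_weight (unsigned int* __restrict dst, unsigned int* __restrict src,
           int i_width, int i_scale, unsigned int* __restrict y)
{
  for (int x = 0; x < i_width; x++)
    dst[x] = src[x] >> 3 > 255 ? src[x] >> 3 : y[x];
}

/* Hypothetical self-check: lanes 0..7 must take y[i], lanes 8..15 must
   take src[i] >> 3 == i * 32.  */
static void
check_mc_weight (void)
{
  enum { N = 16 };
  unsigned int src[N], y[N], dst[N];

  for (int i = 0; i < N; i++)
    {
      src[i] = (unsigned int) i * 256;  /* src[i] >> 3 == i * 32 */
      y[i] = (unsigned int) i;
    }

  mc_weight (dst, src, N, 0, y);

  for (int i = 0; i < N; i++)
    assert (dst[i] == (i >= 8 ? (unsigned int) i * 32 : (unsigned int) i));
}
```

Building this with both compilers at -march=icelake-server -O3 and calling check_mc_weight should pass in both cases; the regression is in code quality, not in the results.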
gcc -march=icelake-server -O3
gcc 11.2:
vpsrld ymm0, YMMWORD PTR [rsi+rax], 3
vpcmpud k1, ymm0, ymm2, 2
vmovdqu32 ymm1{k1}, YMMWORD PTR [r8+rax]
vpcmpud k1, ymm0, ymm2, 6
vpblendmd ymm0{k1}, ymm1, ymm0
vmovdqu YMMWORD PTR [rcx+rax], ymm0
gcc 12:
vmovdqu ymm1, YMMWORD PTR [rsi+rax]
vpsrld ymm2, ymm1, 3
vpcmpud k1, ymm2, ymm3, 2
vmovdqu32 ymm0{k1}, YMMWORD PTR [r8+rax]
vpcmpud k1, ymm2, ymm3, 6
vmovdqa ymm2, ymm0
vpsrld ymm2{k1}, ymm1, 3
vmovdqu YMMWORD PTR [rcx+rax], ymm2
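Lane by lane, the two sequences can be modelled in C as below. This is a hand translation, not compiler output. It assumes ymm2 (gcc 11.2) and ymm3 (gcc 12) hold 255 broadcast to every lane, which the excerpt does not show, and reads vpcmpud predicate 2 as LE and 6 as NLE. Register copies such as the vmovdqa are left out, and srl3, lane_gcc11, lane_gcc12 and check_lane are hypothetical names. Both models compute the same value, but the gcc 12 one needs the shift twice.

```c
#include <assert.h>

/* Counts calls to srl3, i.e. vpsrld instructions in each lane model.  */
static int shift_count;

/* Stand-in for one vpsrld by 3 on a single 32-bit lane.  */
static unsigned int
srl3 (unsigned int v)
{
  shift_count++;
  return v >> 3;
}

/* gcc 11.2 sequence.  OLD is whatever the masked-load destination held
   before; it never reaches the result.  */
static unsigned int
lane_gcc11 (unsigned int s, unsigned int y, unsigned int old)
{
  unsigned int t = srl3 (s);              /* vpsrld                      */
  unsigned int tmp = t <= 255 ? y : old;  /* vpcmpud (LE), vmovdqu32{k1} */
  return t > 255 ? t : tmp;               /* vpcmpud (NLE), vpblendmd    */
}

/* gcc 12 sequence: the final select became a masked shift, so a second
   vpsrld is emitted even though T already holds the shifted value.  */
static unsigned int
lane_gcc12 (unsigned int s, unsigned int y, unsigned int old)
{
  unsigned int t = srl3 (s);              /* vpsrld for the compares     */
  unsigned int tmp = t <= 255 ? y : old;  /* vpcmpud (LE), vmovdqu32{k1} */
  unsigned int again = srl3 (s);          /* vpsrld ymm2{k1}             */
  return t > 255 ? again : tmp;           /* k1 from vpcmpud (NLE)       */
}

/* Both models must match the source expression for any input.  */
static void
check_lane (unsigned int s, unsigned int y)
{
  unsigned int want = s >> 3 > 255 ? s >> 3 : y;
  assert (lane_gcc11 (s, y, 0) == want);
  assert (lane_gcc12 (s, y, 0) == want);
}
```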
This is caused by the following patterns in match.pd:
---------------cut----------------
(for uncond_op (UNCOND_BINARY)
     cond_op (COND_BINARY)
 (simplify
  (vec_cond @0 (view_convert? (uncond_op@4 @1 @2)) @3)
  (with { tree op_type = TREE_TYPE (@4); }
   (if (vectorized_internal_fn_supported_p (as_internal_fn (cond_op), op_type)
        && is_truth_type_for (op_type, TREE_TYPE (@0)))
    (view_convert (cond_op @0 @1 @2 (view_convert:op_type @3))))))
 (simplify
  (vec_cond @0 @1 (view_convert? (uncond_op@4 @2 @3)))
  (with { tree op_type = TREE_TYPE (@4); }
   (if (vectorized_internal_fn_supported_p (as_internal_fn (cond_op), op_type)
        && is_truth_type_for (op_type, TREE_TYPE (@0)))
    (view_convert (cond_op (bit_not @0) @2 @3 (view_convert:op_type @1)))))))
---------------end-------------------
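uncond_op + vec_cond is combined into cond_op without checking whether the uncond_op result (@4) has other uses. In the testcase, src[x] >> 3 also feeds both comparisons, so it must be computed anyway. After the combine, gcc 12 emits the shift twice (the plain vpsrld for the compares and the masked vpsrld ymm2{k1} for the select), loads src into a separate register and adds a vmovdqa, which is worse code than gcc 11.2. As the summary says, the patterns presumably should only fire when @4 is single_use.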
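The rewrite itself is value-preserving, which a scalar model makes visible. Following the operand order in the patterns, cond_op @0 @1 @2 @3 computes, per lane, @0 ? (@1 op @2) : @3. The sketch below takes rshift as uncond_op (so cond_op is a conditional right shift), drops the view_converts because every operand here already has the same type, and uses hypothetical model_vec_cond, model_cond_shr and check_patterns helpers in place of the real GIMPLE operations.

```c
#include <assert.h>
#include <stdbool.h>

#define LANES 8  /* one ymm register of 32-bit lanes */

/* Model of vec_cond (m, a, b): per lane, m ? a : b.  */
static void
model_vec_cond (unsigned int *r, const bool *m,
                const unsigned int *a, const unsigned int *b)
{
  for (int i = 0; i < LANES; i++)
    r[i] = m[i] ? a[i] : b[i];
}

/* Model of cond_op with op = rshift: per lane, m ? a >> b : els.
   Shift counts must be below 32.  */
static void
model_cond_shr (unsigned int *r, const bool *m, const unsigned int *a,
                const unsigned int *b, const unsigned int *els)
{
  for (int i = 0; i < LANES; i++)
    r[i] = m[i] ? a[i] >> b[i] : els[i];
}

/* Assert that both match.pd rewrites preserve the value for rshift.  */
static void
check_patterns (const bool *m, const unsigned int *a,
                const unsigned int *b, const unsigned int *e)
{
  unsigned int shr[LANES], lhs[LANES], rhs[LANES];
  bool not_m[LANES];

  for (int i = 0; i < LANES; i++)
    {
      shr[i] = a[i] >> b[i];  /* uncond_op@4 */
      not_m[i] = !m[i];       /* bit_not @0  */
    }

  /* First pattern: vec_cond (@0, @1 >> @2, @3) -> cond_op (@0, @1, @2, @3).  */
  model_vec_cond (lhs, m, shr, e);
  model_cond_shr (rhs, m, a, b, e);
  for (int i = 0; i < LANES; i++)
    assert (lhs[i] == rhs[i]);

  /* Second pattern: vec_cond (@0, @1, @2 >> @3) -> cond_op (~@0, @2, @3, @1),
     with e playing the role of @1.  */
  model_vec_cond (lhs, m, e, shr);
  model_cond_shr (rhs, not_m, a, b, e);
  for (int i = 0; i < LANES; i++)
    assert (lhs[i] == rhs[i]);
}
```

check_patterns holds for any mask, so the combine is not wrong. The cost problem appears only when @4 has another use (here, the comparisons): the unconditional operation has to stay, and the conditional one is added on top of it.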