https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98211
--- Comment #2 from Richard Biener <rguenth at gcc dot gnu.org> ---
Huh. OK, so we do some pointless vectorization (the store is in a BB
ending in __builtin_unreachable()) but the actual issue must be the
live lane extraction into the not vectorized scalar code:
vect_patt_95.26_71 = .VCOND_MASK (mask_patt_91.25_47, _59, _67);
_79 = BIT_FIELD_REF <vect_patt_95.26_71, 16, 0>;
hmm, somehow the VCOND_MASK condition unpacking ends up with
v16_int8 = {1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0}
Looks like this is because we have
_17 = var_12_18(D) != 0;
_11 = test_var_3.4_1 != 0;
_26 = _11 | _17;
_99 = VIEW_CONVERT_EXPR<unsigned char>(_26);
_35 = {_99, _99, _99, _99, _99, _99, _99, _99, _99, _99, _99, _99, _99, _99, _99, _99};
mask_patt_87.23_39 = VIEW_CONVERT_EXPR<vector(16) <signed-boolean:8>>(_35);
mask_patt_91.25_47 = [vec_unpack_lo_expr] mask_patt_87.23_39;
but that doesn't produce the canonical -1 values, which means the bool
pattern is somehow broken. We do code-generate
mask_patt_87.23_39 = VIEW_CONVERT_EXPR<vector(16) <signed-boolean:8>>(_35);
mask_patt_84.24_43 = mask_patt_87.23_39 ^ { 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0 };
mask_patt_91.25_47 = [vec_unpack_lo_expr] mask_patt_84.24_43;
from the
(gdb) p debug (slp_node)
x.c:37:6: note: node 0x3f7e078 (max_nunits=16, refcnt=1)
x.c:37:6: note: op template: patt_84 = patt_87 != 0;
x.c:37:6: note: stmt 0 patt_84 = patt_87 != 0;
x.c:37:6: note: stmt 1 patt_84 = patt_87 != 0;
...
node by choosing BIT_XOR. patt_87 has boolean vector type, but that
is just a cast:
x.c:41:31: note: op template: patt_87 = (<signed-boolean:8>) _26;
x.c:41:31: note: stmt 0 patt_87 = (<signed-boolean:8>) _26;
x.c:41:31: note: stmt 1 patt_87 = (<signed-boolean:8>) _26;
x.c:41:31: note: stmt 2 patt_87 = (<signed-boolean:8>) _26;
x.c:41:31: note: stmt 3 patt_87 = (<signed-boolean:8>) _26;
which I guess is what is wrong, built via
#0 0x0000000002a217c0 in build_mask_conversion (vinfo=0x3e3e510,
mask=<ssa_name 0x7ffff69a3d80 26>,
vectype=<vector_type 0x7ffff699ad20>, stmt_vinfo=0x3e64490)
at /home/rguenther/src/gcc2/gcc/tree-vect-patterns.c:4230
#1 0x0000000002a22497 in vect_recog_mask_conversion_pattern (vinfo=0x3e3e510,
stmt_vinfo=0x3e64490,
type_out=0x7fffffffd060) at
/home/rguenther/src/gcc2/gcc/tree-vect-patterns.c:4457
#2 0x0000000002a25505 in vect_pattern_recog_1 (vinfo=0x3e3e510,
recog_func=0x3b21050 <vect_vect_recog_func_ptrs+272>, stmt_info=0x3e64490)
at /home/rguenther/src/gcc2/gcc/tree-vect-patterns.c:5450
#3 0x0000000002a25a1f in vect_pattern_recog (vinfo=0x3e3e510)
at /home/rguenther/src/gcc2/gcc/tree-vect-patterns.c:5608
which is
/* If rhs1 is a comparison we need to move it into a
   separate statement. */
if (TREE_CODE (rhs1) != SSA_NAME)
  {
    tmp = vect_recog_temp_ssa_var (TREE_TYPE (rhs1), NULL);
    if (rhs1_op0_type
        && TYPE_PRECISION (rhs1_op0_type) != TYPE_PRECISION (rhs1_type))
      rhs1_op0 = build_mask_conversion (vinfo, rhs1_op0,
                                        vectype2, stmt_vinfo);
    if (rhs1_op1_type
        && TYPE_PRECISION (rhs1_op1_type) != TYPE_PRECISION (rhs1_type))
      rhs1_op1 = build_mask_conversion (vinfo, rhs1_op1,
                                        vectype2, stmt_vinfo);
    pattern_stmt = gimple_build_assign (tmp, TREE_CODE (rhs1),
                                        rhs1_op0, rhs1_op1);
    rhs1 = tmp;
    append_pattern_def_seq (vinfo, stmt_vinfo, pattern_stmt, vectype2,
                            rhs1_type);
  }
x.c:41:31: note: === vect_determine_precisions ===
x.c:41:31: note: using normal nonmask vectors for _17 = var_12_18(D) != 0;
x.c:41:31: note: using boolean precision 32 for _11 = test_var_3.4_1 != 0;
x.c:41:31: note: using boolean precision 32 for _26 = _11 | _17;
...
x.c:41:31: note: === vect_pattern_recog ===
x.c:41:31: note: vect_recog_mask_conversion_pattern: detected: iftmp.2_10 =
_26 != 0 ? iftmp.2_22 : iftmp.2_21;
x.c:41:31: note: mask_conversion pattern recognized: patt_95 = patt_91 ?
iftmp.2_22 : iftmp.2_21;
x.c:41:31: note: extra pattern stmt: patt_87 = (<signed-boolean:8>) _26;
x.c:41:31: note: extra pattern stmt: patt_84 = patt_87 != 0;
x.c:41:31: note: extra pattern stmt: patt_91 = (<signed-boolean:16>) patt_84;
Note _26 is not part of the SLP but is splat from the scalar def.
As an SLP improvement we should arguably have splat iftmp.2_10 itself, but
the change probably disabled that.
Still, the above is a latent issue - I'll try to craft a more meaningful
testcase.