https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113091
--- Comment #2 from Feng Xue <fxue at os dot amperecomputing.com> ---
(In reply to Richard Biener from comment #1)
> It's the logic
>
> FOR_EACH_VEC_ELT (SLP_TREE_SCALAR_STMTS (node), i, stmt_info)
> {
> if (svisited.contains (stmt_info))
> continue;
> stmt_vec_info orig_stmt_info = vect_orig_stmt (stmt_info);
> if (STMT_VINFO_IN_PATTERN_P (orig_stmt_info)
> && STMT_VINFO_RELATED_STMT (orig_stmt_info) != stmt_info)
> /* Only the pattern root stmt computes the original scalar value. */
> continue;
> bool mark_visited = true;
> gimple *orig_stmt = orig_stmt_info->stmt;
> ssa_op_iter op_iter;
> def_operand_p def_p;
> FOR_EACH_PHI_OR_STMT_DEF (def_p, orig_stmt, op_iter, SSA_OP_DEF)
> {
> imm_use_iterator use_iter;
> gimple *use_stmt;
> stmt_vec_info use_stmt_info;
> FOR_EACH_IMM_USE_STMT (use_stmt, use_iter, DEF_FROM_PTR (def_p))
> if (!is_gimple_debug (use_stmt))
> {
> use_stmt_info = bb_vinfo->lookup_stmt (use_stmt);
> if (!use_stmt_info
> || !PURE_SLP_STMT (vect_stmt_to_vectorize
> (use_stmt_info)))
> {
> STMT_VINFO_LIVE_P (stmt_info) = true;
>
> specifically the last check. That's supposed to pick up the "main" pattern
> that's now covering the scalar stmt.
>
> But somehow the "main" pattern,
>
> patt_67 = .VEC_WIDEN_MINUS (_1, _3); // _5 = _2 - _4;
> patt_68 = (signed short) patt_67; // _5 = _2 - _4;
> patt_69 = (int) patt_68; // _5 = _2 - _4;
>
> doesn't get picked up here. I wonder what's the orig_stmt and the def
> picked and what original scalar use we end up in where the
> vect_stmt_to_vectorize isn't the "last" pattern. Maybe we really want
This problem happens at slp node:
note: node 0x425bc38 (max_nunits=8, refcnt=1) vector(8) char
note: op template: _1 = *a_50(D);
note: stmt 0 _1 = *a_50(D);
note: stmt 1 _7 = MEM[(char *)a_50(D) + 1B];
note: stmt 2 _13 = MEM[(char *)a_50(D) + 2B];
note: stmt 3 _19 = MEM[(char *)a_50(D) + 3B];
note: stmt 4 _25 = MEM[(char *)a_50(D) + 4B];
note: stmt 5 _31 = MEM[(char *)a_50(D) + 5B];
note: stmt 6 _37 = MEM[(char *)a_50(D) + 6B];
note: stmt 7 _43 = MEM[(char *)a_50(D) + 7B];
The orig_stmt is "_1 = *a_50(D)"
The use stmt is "_2 = (int) _1", whose pattern statement is "patt_64 = (int)
patt_63", which is not referenced by any original or other pattern statements.
Or in other word, the orig_stmt could be absorbed into a vector operation,
without any outlier scalar use.
The fore-mentioned "last check" in vect_bb_slp_mark_live_stmts would make the
orig_stmt to be STMT_VINFO_LIVE_P, which actually implies it has scalar use
(though it should not have), the difference is re-generating the def somewhere,
rather than retaining the original scalar statement. And the following
"vectorizable_live_operation" would account the new operations into
vectorization cost of the SLP instance.
The function vect_bb_vectorization_profitable_p resorts to a recursive way to
identify scalar use, for this case, setting STMT_VINFO_LIVE_P or not would
change scalar cost computation. If we can avoid such fake-liveness adjustment
on the statements we are interested in, vectorization cost could beat scalar
cost, and make the former succeed.
Unvectorized:
mov x2, x0
stp x29, x30, [sp, -48]!
mov x29, sp
ldrb w3, [x1]
ldrb w4, [x1, 1]
add x0, sp, 16
ldrb w9, [x2]
ldrb w8, [x2, 1]
sub w9, w9, w3
ldrb w7, [x2, 2]
ldrb w3, [x1, 2]
sub w8, w8, w4
ldrb w6, [x2, 3]
ldrb w4, [x1, 3]
sub w7, w7, w3
ldrb w10, [x1, 5]
ldrb w3, [x1, 4]
sub w6, w6, w4
ldrb w5, [x2, 4]
ldrb w4, [x2, 5]
sub w5, w5, w3
ldrb w3, [x2, 6]
sub w4, w4, w10
ldrb w2, [x2, 7]
ldrb w10, [x1, 6]
ldrb w1, [x1, 7]
sub w3, w3, w10
stp w9, w8, [sp, 16]
sub w1, w2, w1
stp w7, w6, [sp, 24]
stp w5, w4, [sp, 32]
stp w3, w1, [sp, 40]
bl test
ldp x29, x30, [sp], 48
ret
Vectorized:
mov x2, x0
stp x29, x30, [sp, -48]!
mov x29, sp
ldr d1, [x1]
add x0, sp, 16
ldr d0, [x2]
usubl v0.8h, v0.8b, v1.8b
sxtl v1.4s, v0.4h
sxtl2 v0.4s, v0.8h
stp q1, q0, [sp, 16]
bl test
ldp x29, x30, [sp], 48
ret
> these "overlapping" patterns, but IMHO having "two entries" into
> a chain of scalar stmts is bad and we should link up the whole matched
> sequence to the final "root" instead?
>
> That said, the current code doesn't see that wherever we end up isn't
> dead code (aka fully covered by the vectorization).
>
> IMO vect_stmt_to_vectorize for each of those stmts should end up at
>
> patt_69 = (int) patt_68;