Richard Biener <[email protected]> writes:
> On Wed, Jul 26, 2023 at 11:14 AM Richard Sandiford
> <[email protected]> wrote:
>>
>> Richard Biener <[email protected]> writes:
>> > On Wed, Jul 26, 2023 at 4:02 AM Hao Liu OS via Gcc-patches
>> > <[email protected]> wrote:
>> >>
>> >> > When was STMT_VINFO_REDUC_DEF empty? I just want to make sure that
>> >> > we're not papering over an issue elsewhere.
>> >>
>> >> Yes, I also wonder if this is an issue in vectorizable_reduction. Below
>> >> is the the gimple of "gcc.target/aarch64/sve/cost_model_13.c":
>> >>
>> >> <bb 3>:
>> >> # res_18 = PHI <res_15(7), 0(6)>
>> >> # i_20 = PHI <i_16(7), 0(6)>
>> >> _1 = (long unsigned int) i_20;
>> >> _2 = _1 * 2;
>> >> _3 = x_14(D) + _2;
>> >> _4 = *_3;
>> >> _5 = (unsigned short) _4;
>> >> res.0_6 = (unsigned short) res_18;
>> >> _7 = _5 + res.0_6; <-- The current stmt_info
>> >> res_15 = (short int) _7;
>> >> i_16 = i_20 + 1;
>> >> if (n_11(D) > i_16)
>> >> goto <bb 7>;
>> >> else
>> >> goto <bb 4>;
>> >>
>> >> <bb 7>:
>> >> goto <bb 3>;
>> >>
>> >> It looks like that STMT_VINFO_REDUC_DEF should be "res_18 = PHI
>> >> <res_15(7), 0(6)>"?
>> >> The status here is:
>> >> STMT_VINFO_REDUC_IDX (stmt_info): 1
>> >> STMT_VINFO_REDUC_TYPE (stmt_info): TREE_CODE_REDUCTION
>> >> STMT_VINFO_REDUC_VECTYPE (stmt_info): 0x0
>> >
>> > Not all stmts in the SSA cycle forming the reduction have
>> > STMT_VINFO_REDUC_DEF set,
>> > only the last (latch def) and live stmts have at the moment.
>>
>> Ah, thanks. In that case, Hao, I think we can avoid the ICE by changing:
>>
>> if ((kind == scalar_stmt || kind == vector_stmt || kind == vec_to_scalar)
>> && vect_is_reduction (stmt_info))
>>
>> to:
>>
>> if ((kind == scalar_stmt || kind == vector_stmt || kind == vec_to_scalar)
>> && STMT_VINFO_LIVE_P (stmt_info)
>> && vect_is_reduction (stmt_info))
>>
>> instead of using a null check.
>
> But as seen you will miss stmts that are part of the reduction then?
Yeah, but the code is doing a maximum of all the reductions in the loop:
/* ??? Ideally we'd do COUNT reductions in parallel, but unfortunately
that's not yet the case. */
ops->reduction_latency = MAX (ops->reduction_latency, base * count);
So as it stands, we only need to see each reduction (as opposed to each
reduction statement) once.
But we do want to know the length of each reduction...
> In principle we could put STMT_VINFO_REDUC_DEF to other stmts
> as well. See vectorizable_reduction in the
>
> while (reduc_def != PHI_RESULT (reduc_def_phi))
>
> loop.
Nod. That's where I'd got the STMT_VINFO_LIVE_P thing from.
>> I see that vectorizable_reduction calculates a reduc_chain_length.
>> Would it be OK to store that in the stmt_vec_info? I suppose the
>> AArch64 code should be multiplying by that as well. (It would be a
>> separate patch from this one though.)
>
> I don't think that's too relevant here (it also counts noop conversions).
Bah. I'm loath to copy that loop and just pick out the relevant
statements though.
I suppose if every statement had a STMT_VINFO_REDUC_DEF, aarch64 could
maintain a hash map from STMT_VINFO_REDUC_DEF to total latencies, and
then take the maximum of those total latencies. It sounds pretty
complex though...
Thanks,
Richard