Re: [PATCH] AArch64: Do not increase the vect reduction latency by multiplying count [PR110625]

Richard Sandiford via Gcc-patches Wed, 26 Jul 2023 03:12:39 -0700

Richard Biener <[email protected]> writes:
> On Wed, Jul 26, 2023 at 11:14 AM Richard Sandiford
> <[email protected]> wrote:
>>
>> Richard Biener <[email protected]> writes:
>> > On Wed, Jul 26, 2023 at 4:02 AM Hao Liu OS via Gcc-patches
>> > <[email protected]> wrote:
>> >>
>> >> > When was STMT_VINFO_REDUC_DEF empty?  I just want to make sure that 
>> >> > we're not papering over an issue elsewhere.
>> >>
>> >> Yes, I also wonder if this is an issue in vectorizable_reduction.  Below 
>> >> is the the gimple of "gcc.target/aarch64/sve/cost_model_13.c":
>> >>
>> >>   <bb 3>:
>> >>   # res_18 = PHI <res_15(7), 0(6)>
>> >>   # i_20 = PHI <i_16(7), 0(6)>
>> >>   _1 = (long unsigned int) i_20;
>> >>   _2 = _1 * 2;
>> >>   _3 = x_14(D) + _2;
>> >>   _4 = *_3;
>> >>   _5 = (unsigned short) _4;
>> >>   res.0_6 = (unsigned short) res_18;
>> >>   _7 = _5 + res.0_6;                             <-- The current stmt_info
>> >>   res_15 = (short int) _7;
>> >>   i_16 = i_20 + 1;
>> >>   if (n_11(D) > i_16)
>> >>     goto <bb 7>;
>> >>   else
>> >>     goto <bb 4>;
>> >>
>> >>   <bb 7>:
>> >>   goto <bb 3>;
>> >>
>> >> It looks like that STMT_VINFO_REDUC_DEF should be "res_18 = PHI 
>> >> <res_15(7), 0(6)>"?
>> >> The status here is:
>> >>   STMT_VINFO_REDUC_IDX (stmt_info): 1
>> >>   STMT_VINFO_REDUC_TYPE (stmt_info): TREE_CODE_REDUCTION
>> >>   STMT_VINFO_REDUC_VECTYPE (stmt_info): 0x0
>> >
>> > Not all stmts in the SSA cycle forming the reduction have
>> > STMT_VINFO_REDUC_DEF set,
>> > only the last (latch def) and live stmts have at the moment.
>>
>> Ah, thanks.  In that case, Hao, I think we can avoid the ICE by changing:
>>
>>   if ((kind == scalar_stmt || kind == vector_stmt || kind == vec_to_scalar)
>>       && vect_is_reduction (stmt_info))
>>
>> to:
>>
>>   if ((kind == scalar_stmt || kind == vector_stmt || kind == vec_to_scalar)
>>       && STMT_VINFO_LIVE_P (stmt_info)
>>       && vect_is_reduction (stmt_info))
>>
>> instead of using a null check.
>
> But as seen you will miss stmts that are part of the reduction then?


Yeah, but the code is doing a maximum of all the reductions in the loop:

      /* ??? Ideally we'd do COUNT reductions in parallel, but unfortunately
         that's not yet the case.  */
      ops->reduction_latency = MAX (ops->reduction_latency, base * count);

So as it stands, we only need to see each reduction (as opposed to each
reduction statement) once.

But we do want to know the length of each reduction...

> In principle we could put STMT_VINFO_REDUC_DEF to other stmts
> as well.  See vectorizable_reduction in the
>
>   while (reduc_def != PHI_RESULT (reduc_def_phi))
>
> loop.

Nod.  That's where I'd got the STMT_VINFO_LIVE_P thing from.

>> I see that vectorizable_reduction calculates a reduc_chain_length.
>> Would it be OK to store that in the stmt_vec_info?  I suppose the
>> AArch64 code should be multiplying by that as well.  (It would be a
>> separate patch from this one though.)
>
> I don't think that's too relevant here (it also counts noop conversions).

Bah.  I'm loath to copy that loop and just pick out the relevant
statements though.

I suppose if every statement had a STMT_VINFO_REDUC_DEF, aarch64 could
maintain a hash map from STMT_VINFO_REDUC_DEF to total latencies, and
then take the maximum of those total latencies.  It sounds pretty
complex though...

Thanks,
Richard

Re: [PATCH] AArch64: Do not increase the vect reduction latency by multiplying count [PR110625]

Reply via email to