Hi Richard,
on 2020/7/2 1:20 PM, Kewen.Lin via Gcc-patches wrote:
> on 2020/7/1 11:17 PM, Richard Sandiford wrote:
>> "Kewen.Lin" <[email protected]> writes:
>>> on 2020/7/1 3:53 AM, Richard Sandiford wrote:
>>>> "Kewen.Lin" <[email protected]> writes:
[...]
>> Hmm, OK. But in that case can you update the names of the variables
>> to match? It's confusing to have some nscalars_* variables actually
>> count scalars (and thus have “nitems” equivalents) and other nscalars_*
>> variables count something else (and thus effectively be nitems_* variables
>> themselves).
>>
>
> OK. I'll update names like nscalars_total/nscalars_step and their
> equivalents to nitems_total/... (or would nunits_total be better?)
>
Please ignore this part; I have used the nitems_ prefix for the names. :)
>>>>> + /* Work out how many bits we need to represent the length limit. */
>>>>> + unsigned int nscalars_per_iter_ft = rgl->max_nscalars_per_iter *
>>>>> rgl->factor;
>>>>
>>>> I think this breaks the abstraction. There's no guarantee that the
>>>> factor is the same for each rgroup_control, so there's no guarantee
>>>> that the maximum bytes per iter comes from the last entry. (Also, it'd
>>>> be better to avoid talking about bytes if we're trying to be general.)
>>>> I think we should take the maximum of each entry instead.
>>>>
>>>
>>> Agree! I guess the above "maximum bytes per iter" is a typo and you
>>> meant "maximum elements per iter"? Yes, the code is for length in
>>> bytes, so checking the last entry is only reasonable for that case.
>>> Will update it to check all entries instead.
>>
>> I meant bytes, since that's what the code is effectively calculating
>> (at least for Power). I.e. I think this breaks the abstraction even
>> if we assume the Power scheme of measuring length, since in principle
>> it's possible to mix different vector sizes in the same vector region.
>>
>
> Sorry, I didn't catch the meaning of "it's possible to mix different
> vector sizes in the same vector region". I guess if we are counting
> bytes, the max nunits per iteration should come from the last entry,
> since the last one holds the max bytes as the result of
> max_nscalars_per_iter * factor. But I agree that it breaks the
> abstraction here, since it doesn't apply to lengths in lanes.
>
On further thought, I guess you meant that we could have different vector
sizes for the same loop in the future? Yes, the assumption doesn't hold then.
>
>>>>> + /* Decide whether to use fully-masked approach. */
>>>>> + if (vect_verify_full_masking (loop_vinfo))
>>>>> + LOOP_VINFO_USING_PARTIAL_VECTORS_P (loop_vinfo) = true;
>>>>> + /* Decide whether to use length-based approach. */
>>>>> + else if (vect_verify_loop_lens (loop_vinfo))
>>>>> + {
>>>>> + if (LOOP_VINFO_PEELING_FOR_GAPS (loop_vinfo)
>>>>> + || LOOP_VINFO_PEELING_FOR_ALIGNMENT (loop_vinfo))
>>>>> + {
>>>>> + if (dump_enabled_p ())
>>>>> + dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
>>>>> + "can't vectorize this loop with length-based"
>>>>> +	                     " partial vectors approach because peeling"
>>>>> + " for alignment or gaps is required.\n");
>>>>> + LOOP_VINFO_USING_PARTIAL_VECTORS_P (loop_vinfo) = false;
>>>>> + }
>>>>
>>>> Why are these peeling cases necessary? Peeling for gaps should
>>>> just mean subtracting one scalar iteration from the iteration count
>>>> and shouldn't otherwise affect the main loop. Similarly, peeling for
>>>> alignment can be handled in the normal way, with a scalar prologue loop.
>>>>
>>>
>>> I was thinking of relaxing this later, to avoid handling too many
>>> cases in the first enablement patch. Since the Power hardware level
>>> that is able to support vectors with length also supports unaligned
>>> load/store, I'd need to construct some test cases for them. May I
>>> postpone it a bit? Or would you prefer me to support it here?
>>
>> I've no objection to postponing it if there are specific known
>> problems that make it difficult, but I think we should at least
>> say what they are. On the face of it, I'm not sure why it doesn't
>> Just Work, since the way that we control the main loop should be
>> mostly orthogonal to how we handle peeled prologue iterations
>> and how we handle a single peeled epilogue iteration.
>>
>
> OK, I will remove it and see the impact. By the way, do you think using
> partial vectors for the prologue is something worth trying in the future?
>
I tested the updated patch with this restriction removed. The
LOOP_VINFO_PEELING_FOR_GAPS part looks fine, but
LOOP_VINFO_PEELING_FOR_ALIGNMENT caused one case to fail at execution
with vect-partial-vector-usage=2. So far the patch doesn't handle any
niters_skip cases. I think if we want to support them, we have to add
some handling like what we have for masking, such as mask_skip_niters,
vect_prepare_for_masked_peels, etc.
Would you prefer me to extend the support in this patch series?
>>> Sorry, I might be missing something, but all undetermined lengths are
>>> generated here; are the other places you meant in the documentation or
>>> elsewhere?
>>
>> For example, we'd need to start querying the length operand of the optabs
>> to see what length precision the target uses, since it would be invalid
>> to do this optimisation for IVs that are wider than that precision.
>> The routine above doesn't seem the right place to do that.
>>
>
> OK, but it seems acceptable if the IV is wider than the precision,
> since we allow it to be out of range?
>
Please ignore this question; I agree that we have to avoid that case.
Sorry that I misunderstood it before. :)
BR,
Kewen