"Kewen.Lin" <li...@linux.ibm.com> writes:
> on 2020/5/27 下午6:02, Richard Sandiford wrote:
>> "Kewen.Lin" <li...@linux.ibm.com> writes:
>>> Hi Richard,
>>>
>>> Thanks for your comments!
>>>
>>> on 2020/5/26 锟斤拷锟斤拷8:49, Richard Sandiford wrote:
>>>> "Kewen.Lin" <li...@linux.ibm.com> writes:
>>>>> @@ -626,6 +645,12 @@ public:
>>>>>    /* True if have decided to use a fully-masked loop.  */
>>>>>    bool fully_masked_p;
>>>>>  
>>>>> +  /* Records whether we still have the option of using a length access 
>>>>> loop.  */
>>>>> +  bool can_with_length_p;
>>>>> +
>>>>> +  /* True if have decided to use length access for the loop fully.  */
>>>>> +  bool fully_with_length_p;
>>>>
>>>> Rather than duplicate the flags like this, I think we should have
>>>> three bits of information:
>>>>
>>>> (1) Can the loop operate on partial vectors?  Starts off optimistically
>>>>     assuming "yes", gets set to "no" when we find a counter-example.
>>>>
>>>> (2) If we do decide to use partial vectors, will we need loop masks?
>>>>
>>>> (3) If we do decide to use partial vectors, will we need lengths?
>>>>
>>>> Vectorisation using partial vectors succeeds if (1) && ((2) != (3))
>>>>
>>>> LOOP_VINFO_CAN_FULLY_MASK_P currently tracks (1) and
>>>> LOOP_VINFO_MASKS currently tracks (2).  In pathological cases it's
>>>> already possible to have (1) && !(2), see r9-6240 for an example.
>>>>
>>>> With the new support, LOOP_VINFO_LENS tracks (3).
>>>>
>>>> So I don't think we need the can_with_length_p.  What is now
>>>> LOOP_VINFO_CAN_FULLY_MASK_P can continue to track (1) for both
>>>> approaches, with the final choice of approach only being made
>>>> at the end.  Maybe it would be worth renaming it to something
>>>> more generic though, now that we have two approaches to partial
>>>> vectorisation.
>>>
>>> I like this idea!  I could be wrong, but I'm afraid that we
>>> can not have one common flag to be shared for both approaches,
>>> the check criterias could be different for both approaches, one
>>> counter example for length could be acceptable for masking, such
>>> as length can only allow CONTIGUOUS related modes, but masking
>>> can support more.  When we see acceptable VMAT_LOAD_STORE_LANES,
>>> we leave LOOP_VINFO_CAN_FULLY_MASK_P true, later should length
>>> checking turn it to false?  I guess no, assuming still true, then 
>>> LOOP_VINFO_CAN_FULLY_MASK_P will mean only partial vectorization
>>> for masking, not for both.  We can probably clean LOOP_VINFO_LENS
>>> when the length checking is false, but we just know the vec is empty,
>>> not sure we are unable to do partial vectorization with length,
>>> when we see LOOP_VINFO_CAN_FULLY_MASK_P true, we could still
>>> record length into it if possible.
>> 
>> Let's call the flag in (1) CAN_USE_PARTIAL_VECTORS_P rather than
>> CAN_FULLY_MASK_P to (try to) avoid any confusion from the current name.
>> 
>> What I meant is that each vectorizable_* routine has the responsibility
>> of finding a way of coping with partial vectorisation, or setting
>> CAN_USE_PARTIAL_VECTORS_P to false if it can't.
>> 
>> vectorizable_load chooses the VMAT first, and then decides based on that
>> whether partial vectorisation is supported.  There's no influence in
>> the other direction (partial vectorisation doesn't determine the VMAT).
>> 
>> So once it has chosen a VMAT, vectorizable_load needs to try to find a way
>> of handling the operation with partial vectorisation.  Currently the only
>> way of doing that for VMAT_LOAD_STORE_LANES is using masks.  So at the
>> moment there are two possible outcomes:
>> 
>> - The target supports the necessary IFN_MASK_LOAD_LANES function.
>>   If so, we can use partial vectorisation for the statement, so we
>>   leave CAN_USE_PARTIAL_VECTORS_P true and record the necessary masks
>>   in LOOP_VINFO_MASKS.
>> 
>> - The target doesn't support the necessary IFN_MASK_LOAD_LANES function.
>>   If so, we can't use partial vectorisation, so we clear
>>   CAN_USE_PARTIAL_VECTORS_P.
>> 
>> That's how things work at the moment.  It would work in the same way
>> for lengths if we ever supported IFN_LEN_LOAD_LANES: we'd check whether
>> IFN_LEN_LOAD_LANES is available and record the length in LOOP_VINFO_LENS
>> if so.  If partial vectorisation isn't supported (via masks or lengths),
>> we'd continue to clear CAN_USE_PARTIAL_VECTORS_P.
>> 
>> But equally, if we never add support for IFN_LEN_LOAD_LANES, the current
>> code continues to work with length-based approaches.  We'll continue to
>> clear CAN_USE_PARTIAL_VECTORS_P for VMAT_LOAD_STORE_LANES when the
>> target provides no IFN_MASK_LOAD_LANES function.
>> 
>
> Thanks a lot for your detailed explanation!  This proposal looks good
> based on the current implementation of both masking and length.  I may
> think too much, but I had a bit concern as below when some targets have
> both masking and length supports in future, such as ppc adds masking
> support like SVE.
>
> I assumed that you meant each vectorizable_* routine should record the
> objs for any available partial vectorisation approaches.  If one target
> supports both, we would have both recorded but decide not to do partial
> vectorisation finally since both have records.  The target can disable
> length like through optab to resolve it, but there is one possibility
> that the masking support can be imperfect initially since ISA support
> could be gradual, it further leads some vectorizable_* check or final
> verification to fail for masking, and length approach may work here but
> it gets disabled.  We can miss to use partial vectorisation here.
>
> The other assumption is that each vectorizable_* routine record the 
> first available partial vectorisation approach, let's assume masking
> takes preference, then it's fine to record just one here even if one
> target supports both approaches, but we still have the possiblity to
> miss the partial vectorisation chance as some check/verify fail with
> masking but fine with length.
>
> Does this concern make sense?

There's nothing to stop us using masks and lengths in the same loop
in future if we need to.  It would “just” be a case of setting up both
the masks and the lengths in vect_set_loop_condition.  But the point is
that doing that would be extra code, and there's no point writing that
extra code until it's needed.

If some future arch does support both mask-based and length-based
approaches, I think that's even less reason to make a binary choice
between them.  How we prioritise the length and mask approaches when
both are available is something that we'll have to decide at the time.

If your concern is that the arch might support masked operations
without wanting them to be used for loop control, we could test for
that case by checking whether while_ult_optab is implemented.

Thanks,
Richard

Reply via email to