Hi, Richard.
For case 2, I came up with this idea:

+            Case 2 (SLP multiple rgroup):
+               ...
+               _38 = (unsigned long) n_12(D);
+               _39 = _38 * 2;
+               _40 = MAX_EXPR <_39, 16>;
+               _41 = _40 - 16;
+               ...
+               # ivtmp_42 = PHI <ivtmp_43(4), _41(3)>
+               # ivtmp_45 = PHI <ivtmp_46(4), _39(3)>
+               ...
+               _44 = MIN_EXPR <ivtmp_42, 32>;
+               _47 = MIN_EXPR <ivtmp_45, 32>;
+               _47_2 = MIN_EXPR <_47, 16>;  --------> add
+               _47_3 = _47 - _47_2;  --------> add
+               ...
+               .LEN_STORE (_6, 8B, _47_2, ...);
+               ...
+               .LEN_STORE (_25, 8B, _47_3, ...);
+               _33 = _47_2 / 2;
+               ...
+               .LEN_STORE (_8, 16B, _33, ...);
+               _36 = _47_3 / 2;
+               ...
+               .LEN_STORE (_15, 16B, _36, ...);
+               ivtmp_46 = ivtmp_45 - _47;
+               ivtmp_43 = ivtmp_42 - _44;
+               ...
+               if (ivtmp_46 != 0)
+                 goto <bb 4>; [83.33%]
+               else
+                 goto <bb 5>; [16.67%]
Is it reasonable? Or do you have a better idea for it?
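
As a sanity check on the sequence above, here is a minimal C model of the length arithmetic only. It is a sketch, not vectorizer code: the names (model_case2, VEC_ELEMS, ITER_ELEMS) are made up for illustration, and it assumes 16 elements per vector and 32 per loop iteration, read off the constants in the MIN_EXPRs. It checks that the two split lengths never exceed one vector and that together they cover every element exactly once.

```c
#include <assert.h>
#include <stddef.h>

#define VEC_ELEMS 16u                /* assumed elements per vector (the 16 in MIN_EXPR <_47, 16>) */
#define ITER_ELEMS (2u * VEC_ELEMS)  /* assumed elements per loop iteration (the 32) */

static size_t min_sz(size_t a, size_t b) { return a < b ? a : b; }

/* Hypothetical model of the case 2 length computation with the two
   added statements.  Returns the number of vector iterations, or 0 if
   a length ever exceeds VEC_ELEMS or the lengths do not sum to TOTAL
   (TOTAL plays the role of _39).  */
size_t model_case2(size_t total)
{
    size_t remaining = total;        /* ivtmp_45 */
    size_t covered = 0;
    size_t iters = 0;

    if (total == 0)
        return 0;                    /* the loop is assumed not to be entered */

    do {
        size_t len = min_sz(remaining, ITER_ELEMS);  /* _47 */
        size_t len0 = min_sz(len, VEC_ELEMS);        /* _47_2: first vector */
        size_t len1 = len - len0;                    /* _47_3: second vector */

        if (len0 > VEC_ELEMS || len1 > VEC_ELEMS)
            return 0;                /* would be out of range for LEN_STORE */

        covered += len0 + len1;
        remaining -= len;            /* ivtmp_46 = ivtmp_45 - _47 */
        iters++;
    } while (remaining != 0);

    return covered == total ? iters : 0;
}
```

For example, with 40 elements in total the model runs two iterations, with lengths (16, 16) and then (8, 0).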

Thanks.


juzhe.zh...@rivai.ai
 
From: Richard Sandiford
Date: 2023-05-16 14:57
To: juzhe.zhong@rivai.ai
CC: gcc-patches; rguenther
Subject: Re: [PATCH V7] VECT: Add decrement IV support in Loop Vectorizer
"juzhe.zh...@rivai.ai" <juzhe.zh...@rivai.ai> writes:
>>> The examples are good, but this one made me wonder: why is the
>>> adjustment made to the limit (namely 16, the gap between _39 and _41)
>>> different from the limits imposed by the MIN_EXPR (32)?  And I think
>>> the answer is that:
>
>>> - _47 counts the number of elements processed by the loop in total,
>>>   including the vectors under the control of _44
>
>>> - _44 counts the number of elements controlled by _47 in the next
>>>   iteration of the vector loop (if there is one)
>
>>> And that's needed to allow the IVs to be updated independently.
>
>>> The difficulty with this is that the len_load* and len_store*
>>> optabs currently say that the behaviour is undefined if the
>>> length argument is greater than the length of a vector.
>>> So I think using these values of _47 and _44 in the .LEN_STOREs
>>> is relying on undefined behaviour.
>
>>> Haven't had time to think about the consequences of that yet,
>>> but wanted to send something out sooner rather than later.
>
> Hi, Richard. I totally understand your concern now. I think the undefined
> behavior is more appropriate for RVV, since we have the vsetvli instruction,
> which guarantees this will not cause potential issues. However, for some
> other targets, we may need to use an additional MIN_EXPR to guard against
> the length ever exceeding VF. I think that can be addressed in the future
> when it is needed.
 
But we can't generate (vector) gimple that has undefined behaviour from
(scalar) gimple that had defined behaviour.  So something needs to change.
Either we need to generate a different sequence, or we need to define
what the behaviour of len_load/store/etc. are when the length is out of
range (perhaps under a target hook?).
 
We also need to be consistent.  If case 2 is allowed to use length
parameters that are greater than the vector length, then there's no
reason for case 1 to use the result of the MIN_EXPR as the length
parameter.  It could just use the loop IV directly.  (I realise the
select_vl patch will change case 1 for RVV anyway.  But the principle
still holds.)
 
What does the riscv backend's implementation of the len_load and
len_store guarantee?  Is any length greater than the vector length
capped to the vector length?  Or is it more complicated than that?
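
To make the capping question concrete, here is a small C model. It is only a sketch under a stated assumption: capped_len stands for a hypothetical rule under which any length greater than the vector length is treated as the vector length. Whether any backend actually guarantees that is exactly what is being asked, so nothing here describes real riscv behaviour. Under that assumption, the model compares the lengths produced by the two independent IVs (_47 and _44) with an explicit split of _47 into MIN_EXPR <_47, 16> and the remainder. The names and the 16-element vector size are assumptions taken from the constants in the dump.

```c
#include <assert.h>
#include <stddef.h>

#define VEC_ELEMS 16u   /* assumed elements per vector */

static size_t min_sz(size_t a, size_t b) { return a < b ? a : b; }
static size_t max_sz(size_t a, size_t b) { return a > b ? a : b; }

/* HYPOTHETICAL semantics: an out-of-range length is capped to the
   vector length.  This is an assumption, not a documented guarantee.  */
size_t capped_len(size_t len) { return min_sz(len, VEC_ELEMS); }

/* Returns 1 if, on every iteration, capping _47 and _44 gives the same
   two lengths as splitting _47 explicitly; returns 0 otherwise.
   TOTAL plays the role of _39.  */
int capped_matches_split(size_t total)
{
    size_t iv45 = total;                                 /* ivtmp_45, starts at _39 */
    size_t iv42 = max_sz(total, VEC_ELEMS) - VEC_ELEMS;  /* ivtmp_42, starts at _41 */

    if (total == 0)
        return 1;                    /* the loop is assumed not to be entered */

    do {
        size_t l44 = min_sz(iv42, 2u * VEC_ELEMS);       /* _44 */
        size_t l47 = min_sz(iv45, 2u * VEC_ELEMS);       /* _47 */
        size_t split0 = min_sz(l47, VEC_ELEMS);          /* explicit first-vector length */
        size_t split1 = l47 - split0;                    /* explicit second-vector length */

        if (capped_len(l47) != split0 || capped_len(l44) != split1)
            return 0;

        iv45 -= l47;
        iv42 -= l44;
    } while (iv45 != 0);

    return 1;
}
```

If the check holds for the totals of interest, the two forms differ only in whether the cap is written in gimple or left to the target, which is the consistency point raised above.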
 
Thanks,
Richard
 
