Hi, Richard. For case 2, I've come up with this idea:

+ Case 2 (SLP multiple rgroup):
+ ...
+ _38 = (unsigned long) n_12(D);
+ _39 = _38 * 2;
+ _40 = MAX_EXPR <_39, 16>;
+ _41 = _40 - 16;
+ ...
+ # ivtmp_42 = PHI <ivtmp_43(4), _41(3)>
+ # ivtmp_45 = PHI <ivtmp_46(4), _39(3)>
+ ...
+ _44 = MIN_EXPR <ivtmp_42, 32>;
+ _47 = MIN_EXPR <ivtmp_45, 32>;
+ _47_2 = MIN_EXPR <_47, 16>;     <-------- added
+ _47_3 = _47 - _47_2;            <-------- added
+ ...
+ .LEN_STORE (_6, 8B, _47_2, ...);
+ ...
+ .LEN_STORE (_25, 8B, _47_3, ...);
+ _33 = _47_2 / 2;
+ ...
+ .LEN_STORE (_8, 16B, _33, ...);
+ _36 = _47_3 / 2;
+ ...
+ .LEN_STORE (_15, 16B, _36, ...);
+ ivtmp_46 = ivtmp_45 - _47;
+ ivtmp_43 = ivtmp_42 - _44;
+ ...
+ if (ivtmp_46 != 0)
+   goto <bb 4>; [83.33%]
+ else
+   goto <bb 5>; [16.67%]

Is this reasonable? Or do you have a better idea?
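To check that the two added statements keep every length in range, the proposed loop can be replayed in plain C. This is a minimal sketch under stated assumptions: the 16 in MIN_EXPR <_47, 16> is taken to be the lane count of one byte vector, the "/ 2" is taken to mean each halfword vector holds 8 lanes, and the loop is assumed to run only for n >= 1. The names BYTE_LANES, HALF_LANES, STEP, min_ul and proposed_case2_ok are hypothetical and exist only for this illustration.

```c
#include <assert.h>

/* Hypothetical lane counts read off the GIMPLE (assumptions only):
   one byte vector holds 16 lanes, one halfword vector holds 8.  */
#define BYTE_LANES 16UL
#define HALF_LANES (BYTE_LANES / 2)
#define STEP (2 * BYTE_LANES) /* the 32 in MIN_EXPR <ivtmp_45, 32> */

static unsigned long
min_ul (unsigned long a, unsigned long b)
{
  return a < b ? a : b;
}

/* Replays the proposed Case 2 loop for N >= 1 scalar iterations.
   Returns 1 if every .LEN_STORE length fits its vector and the lengths
   add up to 2 * N byte lanes and N halfword lanes; 0 otherwise.  */
static int
proposed_case2_ok (unsigned long n)
{
  unsigned long ivtmp_45 = n * 2;                         /* _39 */
  unsigned long bytes = 0, halves = 0;

  assert (n >= 1);
  do
    {
      unsigned long len_47 = min_ul (ivtmp_45, STEP);
      unsigned long len_47_2 = min_ul (len_47, BYTE_LANES); /* added */
      unsigned long len_47_3 = len_47 - len_47_2;           /* added */
      unsigned long len_33 = len_47_2 / 2;
      unsigned long len_36 = len_47_3 / 2;

      /* Each length must not exceed the vector it controls.  */
      if (len_47_2 > BYTE_LANES || len_47_3 > BYTE_LANES
          || len_33 > HALF_LANES || len_36 > HALF_LANES)
        return 0;

      bytes += len_47_2 + len_47_3;
      halves += len_33 + len_36;
      ivtmp_45 -= len_47;                                   /* ivtmp_46 */
    }
  while (ivtmp_45 != 0);

  return bytes == 2 * n && halves == n;
}
```

For example, with n = 12 the single iteration has _47 = 24, which splits into _47_2 = 16 and _47_3 = 8, giving halfword lengths of 8 and 4, all within their vectors. Because _47 is always even here, the divisions by 2 are exact.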
Thanks.

juzhe.zh...@rivai.ai

From: Richard Sandiford
Date: 2023-05-16 14:57
To: juzhe.zhong@rivai.ai
CC: gcc-patches; rguenther
Subject: Re: [PATCH V7] VECT: Add decrement IV support in Loop Vectorizer

"juzhe.zh...@rivai.ai" <juzhe.zh...@rivai.ai> writes:
>>> The examples are good, but this one made me wonder: why is the
>>> adjustment made to the limit (namely 16, the gap between _39 and _41)
>>> different from the limits imposed by the MIN_EXPR (32)? And I think
>>> the answer is that:
>
>>> - _47 counts the number of elements processed by the loop in total,
>>>   including the vectors under the control of _44
>
>>> - _44 counts the number of elements controlled by _47 in the next
>>>   iteration of the vector loop (if there is one)
>
>>> And that's needed to allow the IVs to be updated independently.
>
>>> The difficulty with this is that the len_load* and len_store*
>>> optabs currently say that the behaviour is undefined if the
>>> length argument is greater than the length of a vector.
>>> So I think using these values of _47 and _44 in the .LEN_STOREs
>>> is relying on undefined behaviour.
>
>>> Haven't had time to think about the consequences of that yet,
>>> but wanted to send something out sooner rather than later.
>
> Hi, Richard. I totally understand your concern now. I think the undefined
> behavior is more appropriate for RVV since we have the vsetvli instruction,
> which guarantees this will cause potential issues. However, for some other
> targets, we may need to use an additional MIN_EXPR to guarantee that the
> length never exceeds VF. I think that can be addressed in the future when
> it is needed.

But we can't generate (vector) gimple that has undefined behaviour from
(scalar) gimple that had defined behaviour. So something needs to change.
Either we need to generate a different sequence, or we need to define what
the behaviour of len_load/store/etc. is when the length is out of range
(perhaps under a target hook?).

We also need to be consistent.
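Richard's concern can be made concrete by replaying the original Case 2 IVs (the quoted sequence without _47_2 and _47_3) and recording the largest value of _44 or _47 that would reach a .LEN_STORE. This is a sketch, assuming (hypothetically) that one byte vector holds 16 lanes; any result above that is a length the len_store optab currently leaves undefined. The names VEC_LANES, min_len, max_len and original_case2_max_len are illustrative only.

```c
#include <assert.h>

/* Hypothetical: one byte vector holds 16 lanes (assumption).  */
#define VEC_LANES 16UL

static unsigned long
min_len (unsigned long a, unsigned long b)
{
  return a < b ? a : b;
}

static unsigned long
max_len (unsigned long a, unsigned long b)
{
  return a > b ? a : b;
}

/* Replays the original Case 2 IVs for N >= 1 and returns the largest
   length that _44 or _47 would pass to a .LEN_STORE.  */
static unsigned long
original_case2_max_len (unsigned long n)
{
  unsigned long len_39 = n * 2;
  unsigned long len_41 = max_len (len_39, 16) - 16;   /* _40 - 16 */
  unsigned long ivtmp_42 = len_41, ivtmp_45 = len_39;
  unsigned long largest = 0;

  assert (n >= 1);
  do
    {
      unsigned long len_44 = min_len (ivtmp_42, 32);
      unsigned long len_47 = min_len (ivtmp_45, 32);

      largest = max_len (largest, max_len (len_44, len_47));
      ivtmp_42 -= len_44;                              /* ivtmp_43 */
      ivtmp_45 -= len_47;                              /* ivtmp_46 */
    }
  while (ivtmp_45 != 0);

  return largest;
}
```

With n = 12, for instance, _47 is 24 on the only iteration, which is larger than a 16-lane vector.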
If case 2 is allowed to use length parameters that are greater than the
vector length, then there's no reason for case 1 to use the result of the
MIN_EXPR as the length parameter. It could just use the loop IV directly.
(I realise the select_vl patch will change case 1 for RVV anyway. But the
principle still holds.)

What does the riscv backend's implementation of the len_load and len_store
guarantee? Is any length greater than the vector length capped to the
vector length? Or is it more complicated than that?

Thanks,
Richard
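One way the answer could be "more complicated than that": if (and this is an assumption, not something stated in the thread) the riscv backend passes the len_load/len_store length straight through as the AVL operand of vsetvli, then the RVV 1.0 rule for the resulting vl applies. That rule is a plain cap only when AVL >= 2 * VLMAX; for VLMAX < AVL < 2 * VLMAX the implementation may return any vl in [ceil(AVL / 2), VLMAX]. The sketch below encodes that rule as a checker; rvv_vl_permitted is a hypothetical name.

```c
#include <assert.h>
#include <stdbool.h>

/* Returns whether VL is a value the RVV 1.0 spec allows vsetvli to
   produce for the requested AVL, given VLMAX.  */
static bool
rvv_vl_permitted (unsigned long avl, unsigned long vlmax, unsigned long vl)
{
  assert (vlmax > 0);
  if (avl <= vlmax)
    return vl == avl;                   /* in range: exact */
  if (avl >= 2 * vlmax)
    return vl == vlmax;                 /* far out of range: plain cap */
  /* VLMAX < AVL < 2 * VLMAX: anywhere in [ceil(AVL / 2), VLMAX].  */
  return vl >= (avl + 1) / 2 && vl <= vlmax;
}
```

Under that assumption, a length of 20 against a 16-lane vector might process 16 elements on one implementation and as few as 10 on another, so a fixed length computed outside the vsetvli would not in general match what was actually stored.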