Tamar Christina <tamar.christ...@arm.com> writes:
>> -----Original Message-----
>> From: Richard Sandiford <richard.sandif...@arm.com>
>> Sent: Friday, September 20, 2024 3:48 PM
>> To: Tamar Christina <tamar.christ...@arm.com>
>> Cc: gcc-patches@gcc.gnu.org; nd <n...@arm.com>; Richard Earnshaw
>> <richard.earns...@arm.com>; Marcus Shawcroft
>> <marcus.shawcr...@arm.com>; ktkac...@gcc.gnu.org
>> Subject: Re: [PATCH]AArch64: Take into account when VF is higher than known
>> scalar iters
>> 
>> Tamar Christina <tamar.christ...@arm.com> writes:
>> >>
>> >> So my gut instinct is that we should instead tweak the condition for
>> >> using latency costs, but I'll need to think about it more when I get
>> >> back from holiday.
>> >>
>> >
>> > I think that's a separate problem. From first principles it should already
>> > be very wrong to compare the scalar loop to an iteration count it will
>> > *NEVER* reach.  So I don't understand why that would ever be valid.
>> 
>> But I don't think we're doing that, or at least, not as the final result.
>> Instead, we first calculate the minimum number of vector iterations for
>> which the vector loop is sometimes profitable.  If this is N, then we're
>> saying that the vector code is better than the scalar code for N*VF
>> iterations.  Like you say, this part ignores whether N*VF is actually
>> achievable.  But then:
>> 
>>        /* Now that we know the minimum number of vector iterations,
>>           find the minimum niters for which the scalar cost is larger:
>> 
>>           SIC * niters > VIC * vniters + VOC - SOC
>> 
>>           We know that the minimum niters is no more than
>>           vniters * VF + NPEEL, but it might be (and often is) less
>>           than that if a partial vector iteration is cheaper than the
>>           equivalent scalar code.  */
>>        int threshold = (vec_inside_cost * min_vec_niters
>>                         + vec_outside_cost
>>                         - scalar_outside_cost);
>>        if (threshold <= 0)
>>          min_profitable_iters = 1;
>>        else
>>          min_profitable_iters = threshold / scalar_single_iter_cost + 1;
>> 
>> calculates which number of iterations in the range [(N-1)*VF + 1, N*VF]
>> is the first to be profitable.  This is specifically taking partial
>> iterations into account and includes the N==1 case.  The lower niters is,
>> the easier it is for the scalar code to win.
>> 
>> This is what is printed as:
>> 
>>   Calculated minimum iters for profitability: 7
>> 
>> So we think that vectorisation should be rejected if the loop count
>> is <= 6, but accepted if it's >= 7.
>
> This 7 is the vector iteration count.
>
>   epilogue iterations: 0
>   Minimum number of vector iterations: 1
>   Calculated minimum iters for profitability: 7
> /app/example.c:4:21: note:    Runtime profitability threshold = 7
> /app/example.c:4:21: note:    Static estimate profitability threshold = 7
>
> Which says the vector code has to iterate at least 7 iterations for it to be
> profitable.

It doesn't though:

>   Minimum number of vector iterations: 1

This is in vector iterations but:

>   Calculated minimum iters for profitability: 7

This is in scalar iterations.  (Yes, it would be nice if the dump line
was more explicit. :))

This is why, if we change the loop count to 7 rather than 9:

  for (int i = 0; i < 7; i++)

we still get:

/tmp/foo.c:4:21: note:  Cost model analysis:
  Vector inside of loop cost: 20
  Vector prologue cost: 6
  Vector epilogue cost: 0
  Scalar iteration cost: 4
  Scalar outside cost: 0
  Vector outside cost: 6
  prologue iterations: 0
  epilogue iterations: 0
  Minimum number of vector iterations: 1
  Calculated minimum iters for profitability: 7
/tmp/foo.c:4:21: note:    Runtime profitability threshold = 7
/tmp/foo.c:4:21: note:    Static estimate profitability threshold = 7
/tmp/foo.c:4:21: note:  ***** Analysis succeeded with vector mode VNx4SI

But if we change it to 6:

  for (int i = 0; i < 6; i++)

we get:

/tmp/foo.c:4:21: note:  Cost model analysis:
  Vector inside of loop cost: 20
  Vector prologue cost: 6
  Vector epilogue cost: 0
  Scalar iteration cost: 4
  Scalar outside cost: 0
  Vector outside cost: 6
  prologue iterations: 0
  epilogue iterations: 0
  Minimum number of vector iterations: 1
  Calculated minimum iters for profitability: 7
/tmp/foo.c:4:21: note:    Runtime profitability threshold = 7
/tmp/foo.c:4:21: note:    Static estimate profitability threshold = 7
/tmp/foo.c:4:21: missed:  not vectorized: vectorization not profitable.
/tmp/foo.c:4:21: note:  not vectorized: iteration count smaller than user specified loop bound parameter or minimum profitable iterations (whichever is more conservative).
/tmp/foo.c:4:21: missed:  Loop costings not worthwhile.
/tmp/foo.c:4:21: note:  ***** Analysis failed with vector mode VNx4SI
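
To spell out the arithmetic behind the 7, here is a minimal sketch.  It is
not the actual vectoriser code: the helper name min_profitable_iters is
made up for illustration, and the body only mirrors the snippet quoted
above.  Plugging in the costs from these dumps gives a threshold of
20 * 1 + 6 - 0 = 26, and 26 / 4 + 1 = 7:

```c
/* Hypothetical standalone helper, for illustration only.  It mirrors the
   threshold calculation quoted earlier in this message; the parameter
   names follow that snippet.  */
static int
min_profitable_iters (int vec_inside_cost, int min_vec_niters,
                      int vec_outside_cost, int scalar_outside_cost,
                      int scalar_single_iter_cost)
{
  /* Vector cost that the scalar loop has to exceed:
     VIC * vniters + VOC - SOC.  */
  int threshold = (vec_inside_cost * min_vec_niters
                   + vec_outside_cost
                   - scalar_outside_cost);

  /* The vector code is never more expensive, so one scalar iteration
     is already enough.  */
  if (threshold <= 0)
    return 1;

  /* Smallest niters for which SIC * niters > threshold.  */
  return threshold / scalar_single_iter_cost + 1;
}
```

With the costs in the dumps, min_profitable_iters (20, 1, 6, 0, 4) returns
7, and that 7 is a count of scalar iterations.  Loop counts of 9 and 7
meet it and a loop count of 6 does not, which matches the two dumps.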

Thanks,
Richard
