Re: [RFC] S/390: Alignment peeling prolog generation

Richard Biener Tue, 09 May 2017 03:21:54 -0700

On Mon, May 8, 2017 at 6:11 PM, Robin Dapp <rd...@linux.vnet.ibm.com> wrote:
>> So the new part is the last point?  There's a lot of refactoring in
> 3/3 that
>> makes it hard to see what is actually changed ...  you need to resist
>> in doing this, it makes review very hard.
>
> The new part is actually spread across the three last "-"s.  Attached is
> a new version of [3/3] split up into two patches with hopefully less
> blending of refactoring and new functionality.
>
> [3/4] Computes full costs when peeling for unknown alignment, uses
> either read or write and compares the better one with the peeling costs
> for known alignment.  If the peeling for unknown alignment "aligns" more
> than twice the number of datarefs, it is preferred over the peeling for
> known alignment.
>
> [4/4] Computes the costs for no peeling and compares them with the costs
> of the best peeling so far.  If it is not more expensive, no peeling
> will be performed.
>
>> I think it's always best to align a ref with known alignment as that
> simplifies
>> conditions and allows followup optimizations (unrolling of the
>> prologue / epilogue).
>> I think for this it's better to also compute full costs rather than
> relying on
>> sth as simple as "number of same aligned refs".
>>
>> Does the code ever end up misaligning a previously known aligned ref?
>
> The following case used to get aligned via the known alignment of dd but
> would not anymore since peeling for unknown alignment aligns two
> accesses.  I guess the determining factor is still up for scrutiny and
> should probably > 2.  Still, on e.g. s390x no peeling is performed due
> to costs.


Ok, in principle this makes sense if we manage to correctly compute
the costs.  What exactly is profitable or not is of course subject to
the target costs.

Richard.

> void foo(int *restrict a, int *restrict b, int *restrict c, int
> *restrict d, unsigned int n)
> {
>   int *restrict dd = __builtin_assume_aligned (d, 8);
>   for (unsigned int i = 0; i < n; i++)
>     {
>       b[i] = b[i] + a[i];
>       c[i] = c[i] + b[i];
>       dd[i] = a[i];
>     }
> }
>
> Regards
>  Robin
>

Re: [RFC] S/390: Alignment peeling prolog generation

Reply via email to