Re: [PATCH 2/4][AArch64] Increase the loop peeling limit

Richard Earnshaw (lists) Wed, 16 Dec 2015 03:25:43 -0800

On 15/12/15 23:34, Evandro Menezes wrote:
> On 12/14/2015 05:26 AM, James Greenhalgh wrote:
>> On Thu, Dec 03, 2015 at 03:07:43PM -0600, Evandro Menezes wrote:
>>> On 11/20/2015 05:53 AM, James Greenhalgh wrote:
>>>> On Thu, Nov 19, 2015 at 04:04:41PM -0600, Evandro Menezes wrote:
>>>>> On 11/05/2015 02:51 PM, Evandro Menezes wrote:
>>>>>> 2015-11-05  Evandro Menezes <e.mene...@samsung.com>
>>>>>>
>>>>>>    gcc/
>>>>>>
>>>>>>        * config/aarch64/aarch64.c (aarch64_override_options_internal):
>>>>>>        Increase loop peeling limit.
>>>>>>
>>>>>> This patch increases the limit for the number of peeled insns.
>>>>>> With this change, I noticed no major regression in either
>>>>>> Geekbench v3 or SPEC CPU2000 while some benchmarks, typically FP
>>>>>> ones, improved significantly.
>>>>>>
>>>>>> I tested this tuning on Exynos M1 and on A57.  ThunderX seems to
>>>>>> benefit from this tuning too.  However, I'd appreciate comments
>>>>>> from other stakeholders.
>>>>>
>>>>> Ping.
>>>> I'd like to leave this for a call from the port maintainers. I can
>>>> see why this leads to more opportunities for vectorization, but I'm
>>>> concerned about the wider impact on code size. Certainly I wouldn't
>>>> expect this to be our default at -O2 and below.
>>>>
>>>> My gut feeling is that this doesn't really belong in the back-end
>>>> (there are presumably good reasons why the default for this parameter
>>>> across GCC has fluctuated from 400 to 100 to 200 over recent years),
>>>> but as I say, I'd like Marcus or Richard to make the call as to
>>>> whether or not we take this patch.
>>> Please correct me if I'm wrong, but loop peeling is enabled only
>>> with loop unrolling (and with PGO).  If so, then extra code size is
>>> not a concern, for this heuristic is only active when unrolling
>>> loops, when code size is already of secondary importance.
>> My understanding was that loop peeling is enabled from -O2 upwards, and
>> is also used to partially peel unaligned loops for vectorization
>> (allowing the vector code to be well aligned), or to completely peel
>> inner loops which may then become amenable to SLP vectorization.
>>
>> If I'm wrong then I take back these objections. But I was sure this
>> parameter was used in a number of situations outside of just
>> -funroll-loops/-funroll-all-loops. Certainly I remember seeing
>> performance sensitivities to this parameter at -O3 in some internal
>> workloads I was analysing.
> 
> Vectorization, including SLP, is only enabled at -O3, isn't it?  It
> seems to me that peeling is only used by optimizations which already
> lead to a potential increase in code size.
> 
> For instance, with "-Ofast -funroll-all-loops", the total text size for
> the SPEC CPU2000 suite is 26.9MB with this proposed change and 26.8MB
> without it; with just "-O2", it is the same at 23.1MB regardless of this
> setting.
> 
> So it seems to me that this proposal should be neutral for up to -O2.
> 
> Thank you,
>


My preference would be to not diverge from the global parameter
settings.  I haven't looked in detail at this parameter but it seems to
me there are two possible paths:

1) We could get agreement globally that the parameter should be increased.
2) We could agree that this specific use of the parameter is distinct
from some other uses and deserves a new param in its own right with a
higher value.

R.
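
For readers following the thread, the difference between a back-end override
and the two paths above can be sketched with a toy model. This is not GCC's
params machinery: every name below (struct param, param_target_override,
struct limits, limit_for) is hypothetical. The value 200 is the most recent
global default mentioned earlier in the thread, and 400 is only an assumed
"higher value" chosen for illustration.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model of a tunable limit; NOT GCC's actual implementation. */
struct param {
  int value;      /* current value of the limit */
  bool user_set;  /* true once the user has set it explicitly */
};

/* 200 is the global default cited in the thread; 400 is an assumed,
   illustrative higher value. */
enum { GLOBAL_PEEL_LIMIT = 200, HIGHER_PEEL_LIMIT = 400 };

/* Give a parameter its compiler-wide default. */
static void param_init(struct param *p, int global_default)
{
  assert(p != 0);
  p->value = global_default;
  p->user_set = false;
}

/* Model of an explicit user setting on the command line. */
static void param_set_by_user(struct param *p, int value)
{
  assert(p != 0);
  p->value = value;
  p->user_set = true;
}

/* Back-end override: the target changes the default, but only when the
   user has not chosen a value.  Returns true if the override applied.
   This is the divergence from the global setting discussed above. */
static bool param_target_override(struct param *p, int target_default)
{
  assert(p != 0);
  if (p->user_set)
    return false;
  p->value = target_default;
  return true;
}

/* Path 2: keep the shared limit untouched and give the distinct use
   its own parameter with a higher default. */
struct limits {
  struct param shared;    /* existing limit, used everywhere else */
  struct param distinct;  /* hypothetical new limit for the one use */
};

static void limits_init(struct limits *l)
{
  assert(l != 0);
  param_init(&l->shared, GLOBAL_PEEL_LIMIT);
  param_init(&l->distinct, HIGHER_PEEL_LIMIT);
}

/* Select the limit that applies to a given use. */
static int limit_for(const struct limits *l, bool distinct_use)
{
  assert(l != 0);
  return distinct_use ? l->distinct.value : l->shared.value;
}
```

In this model, path 1 amounts to changing GLOBAL_PEEL_LIMIT for every
target, while path 2 leaves the shared limit alone and raises only the
limit seen by the distinct use.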
