Re: [PATCH 2/4][AArch64] Increase the loop peeling limit

Richard Biener Wed, 16 Dec 2015 04:42:58 -0800

On Wed, Dec 16, 2015 at 12:24 PM, Richard Earnshaw (lists)
<richard.earns...@arm.com> wrote:
> On 15/12/15 23:34, Evandro Menezes wrote:
>> On 12/14/2015 05:26 AM, James Greenhalgh wrote:
>>> On Thu, Dec 03, 2015 at 03:07:43PM -0600, Evandro Menezes wrote:
>>>> On 11/20/2015 05:53 AM, James Greenhalgh wrote:
>>>>> On Thu, Nov 19, 2015 at 04:04:41PM -0600, Evandro Menezes wrote:
>>>>>> On 11/05/2015 02:51 PM, Evandro Menezes wrote:
>>>>>>> 2015-11-05  Evandro Menezes <e.mene...@samsung.com>
>>>>>>>
>>>>>>>    gcc/
>>>>>>>
>>>>>>>        * config/aarch64/aarch64.c (aarch64_override_options_internal):
>>>>>>>        Increase loop peeling limit.
>>>>>>>
>>>>>>> This patch increases the limit for the number of peeled insns.
>>>>>>> With this change, I noticed no major regression in either
>>>>>>> Geekbench v3 or SPEC CPU2000 while some benchmarks, typically FP
>>>>>>> ones, improved significantly.
>>>>>>>
>>>>>>> I tested this tuning on Exynos M1 and on A57.  ThunderX seems to
>>>>>>> benefit from this tuning too.  However, I'd appreciate comments
>>>>>>> from other stakeholders.
>>>>>>
>>>>>> Ping.
>>>>> I'd like to leave this for a call from the port maintainers. I can
>>>>> see why this leads to more opportunities for vectorization, but I'm
>>>>> concerned about the wider impact on code size. Certainly I wouldn't
>>>>> expect this to be our default at -O2 and below.
>>>>>
>>>>> My gut feeling is that this doesn't really belong in the back-end
>>>>> (there are presumably good reasons why the default for this parameter
>>>>> across GCC has fluctuated from 400 to 100 to 200 over recent years),
>>>>> but as I say, I'd like Marcus or Richard to make the call as to
>>>>> whether or not we take this patch.
>>>> Please, correct me if I'm wrong, but loop peeling is enabled only
>>>> with loop unrolling (and with PGO).  If so, then extra code size is
>>>> not a concern, for this heuristic is only active when unrolling
>>>> loops, when code size is already of secondary importance.
>>> My understanding was that loop peeling is enabled from -O2 upwards, and
>>> is also used to partially peel unaligned loops for vectorization
>>> (allowing the vector code to be well aligned), or to completely peel
>>> inner loops which may then become amenable to SLP vectorization.
>>>
>>> If I'm wrong then I take back these objections. But I was sure this
>>> parameter was used in a number of situations outside of just
>>> -funroll-loops/-funroll-all-loops. Certainly I remember seeing
>>> performance sensitivities to this parameter at -O3 in some internal
>>> workloads I was analysing.
>>
>> Vectorization, including SLP, is only enabled at -O3, isn't it?  It
>> seems to me that peeling is only used by optimizations which already
>> lead to a potential increase in code size.
>>
>> For instance, with "-Ofast -funroll-all-loops", the total text size for
>> the SPEC CPU2000 suite is 26.9MB with this proposed change and 26.8MB
>> without it; with just "-O2", it is the same at 23.1MB regardless of this
>> setting.
>>
>> So it seems to me that this proposal should be neutral for up to -O2.
>>
>> Thank you,
>>
>
> My preference would be to not diverge from the global parameter
> settings.  I haven't looked in detail at this parameter but it seems to
> me there are two possible paths:
>
> 1) We could get agreement globally that the parameter should be increased.
> 2) We could agree that this specific use of the parameter is distinct
> from some other uses and deserves a new param in its own right with a
> higher value.


I think the fix is to improve the unrolled size estimates by taking better
account of constant propagation and CSE opportunities.  I have some ideas
here, but I am not sure I will have enough free cycles to implement them
for GCC 7.
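
To make the idea concrete, here is a rough toy model in Python. It is only
a sketch under stated assumptions, not how GCC actually estimates unrolled
size: the function names, the per-copy "foldable" instruction count and the
limit of 200 used in the example are all hypothetical, chosen to show how
crediting constant propagation and CSE savings can change a peeling
decision without raising the limit itself.

```python
def estimate_peeled_size(body_insns, iterations, foldable_insns=0):
    """Toy size estimate for a completely peeled loop.

    body_insns     -- instructions in one copy of the loop body
    iterations     -- known constant trip count
    foldable_insns -- instructions per copy assumed (hypothetically) to
                      vanish after constant propagation or CSE once the
                      induction variable is a known constant
    """
    if not 0 <= foldable_insns <= body_insns:
        raise ValueError("foldable_insns must be between 0 and body_insns")
    return iterations * (body_insns - foldable_insns)


def should_peel(body_insns, iterations, limit, foldable_insns=0):
    """Peel only if the estimated peeled size fits under the limit."""
    return estimate_peeled_size(body_insns, iterations, foldable_insns) <= limit


# A 30-insn body with 8 iterations: the naive estimate (240) exceeds a
# hypothetical limit of 200, but crediting 6 foldable insns per copy
# gives 192, which fits.
naive_ok = should_peel(30, 8, limit=200)
refined_ok = should_peel(30, 8, limit=200, foldable_insns=6)
```

In this model the limit stays the same for every target; only the estimate
gets more accurate, so loops that really do shrink after peeling are
accepted while the rest are still rejected.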

Richard.

> R.
