On 15/12/15 23:34, Evandro Menezes wrote:
> On 12/14/2015 05:26 AM, James Greenhalgh wrote:
>> On Thu, Dec 03, 2015 at 03:07:43PM -0600, Evandro Menezes wrote:
>>> On 11/20/2015 05:53 AM, James Greenhalgh wrote:
>>>> On Thu, Nov 19, 2015 at 04:04:41PM -0600, Evandro Menezes wrote:
>>>>> On 11/05/2015 02:51 PM, Evandro Menezes wrote:
>>>>>> 2015-11-05 Evandro Menezes <e.mene...@samsung.com>
>>>>>>
>>>>>> gcc/
>>>>>>
>>>>>> * config/aarch64/aarch64.c
>>>>>> (aarch64_override_options_internal):
>>>>>> Increase loop peeling limit.
>>>>>>
>>>>>> This patch increases the limit for the number of peeled insns.
>>>>>> With this change, I noticed no major regression in either
>>>>>> Geekbench v3 or SPEC CPU2000 while some benchmarks, typically FP
>>>>>> ones, improved significantly.
>>>>>>
>>>>>> I tested this tuning on Exynos M1 and on A57. ThunderX seems to
>>>>>> benefit from this tuning too. However, I'd appreciate comments
>>>>>> from other stakeholders.
>>>>>
>>>>> Ping.
>>>> I'd like to leave this for a call from the port maintainers. I can
>>>> see why this leads to more opportunities for vectorization, but I'm
>>>> concerned about the wider impact on code size. Certainly I wouldn't
>>>> expect this to be our default at -O2 and below.
>>>>
>>>> My gut feeling is that this doesn't really belong in the back-end
>>>> (there are presumably good reasons why the default for this
>>>> parameter across GCC has fluctuated from 400 to 100 to 200 over
>>>> recent years), but as I say, I'd like Marcus or Richard to make the
>>>> call as to whether or not we take this patch.
>>> Please, correct me if I'm wrong, but loop peeling is enabled only
>>> with loop unrolling (and with PGO). If so, then extra code size is
>>> not a concern, for this heuristic is only active when unrolling
>>> loops, when code size is already of secondary importance.
>> My understanding was that loop peeling is enabled from -O2 upwards, and
>> is also used to partially peel unaligned loops for vectorization
>> (allowing the vector code to be well aligned), or to completely peel
>> inner loops which may then become amenable to SLP vectorization.
>>
>> If I'm wrong then I take back these objections. But I was sure this
>> parameter was used in a number of situations outside of just
>> -funroll-loops/-funroll-all-loops. Certainly I remember seeing
>> performance sensitivities to this parameter at -O3 in some internal
>> workloads I was analysing.
>
> Vectorization, including SLP, is only enabled at -O3, isn't it? It
> seems to me that peeling is only used by optimizations which already
> lead to potential increase in code size.
>
> For instance, with "-Ofast -funroll-all-loops", the total text size for
> the SPEC CPU2000 suite is 26.9MB with this proposed change and 26.8MB
> without it; with just "-O2", it is the same at 23.1MB regardless of this
> setting.
>
> So it seems to me that this proposal should be neutral for up to -O2.
>
> Thank you,
>
My preference would be to not diverge from the global parameter settings.
I haven't looked in detail at this parameter but it seems to me there
are two possible paths:

1) We could get agreement globally that the parameter should be
   increased.

2) We could agree that this specific use of the parameter is distinct
   from some other uses and deserves a new param in its own right with
   a higher value.

R.