Hi All,
I forgot to commit this patch when it was approved two years ago. It
still applies cleanly to current mainline, and I've retested it
(bootstrap + regtest) on aarch64-linux-gnu and arm-linux-gnueabihf with no
regressions.
I'll commit this shortly.
Regards,
On Tue, 3 Sept 2019 at 19
Hi Maxim,
> > The autoprefetching heuristic is enabled only for cores that support it,
> > and isn't active by default.
>
> It's enabled on most cores, including the default (generic). So we do have
> to be careful that this doesn't regress any other benchmarks or do worse on
> modern cores
On Thu, Aug 29, 2019 at 7:36 PM Alexander Monakov wrote:
>
> On Thu, 29 Aug 2019, Maxim Kuvyrkov wrote:
>
> > >> r1 = [rb + 0]
> > >> r2 = [rb + 8]
> > >> r3 = [rb + 16]
> > >>
> > >> which, apparently, cortex-a53 autoprefetcher doesn't recognize. This
> > >> schedule happ
Hi Alexander,
> So essentially the main issue is not a hardware peculiarity, but rather the
> bad schedule being totally wrong (it could only make sense if loads had
> 1-cycle latency, which they do not).
The scheduling is only bad because the specific intrinsics used are mapped
onto asm stat
On Thu, 29 Aug 2019, Maxim Kuvyrkov wrote:
> >> r1 = [rb + 0]
> >> r2 = [rb + 8]
> >> r3 = [rb + 16]
> >>
> >> which, apparently, cortex-a53 autoprefetcher doesn't recognize. This
> >> schedule happens because r2 = load gets lower priority than the
> >> "irrelevant" due to the
Hi Maxim,
> It appears that cores with autoprefetcher hardware prefer loads and stores
> bundled together, not interspersed with other instructions to occupy the
> rest of CPU units.
I don't believe it is as simple as that - modern cores have multiple
prefetchers but won't prefer bund
> On Aug 29, 2019, at 7:29 PM, Richard Biener wrote:
>
> On August 29, 2019 5:40:47 PM GMT+02:00, Maxim Kuvyrkov wrote:
>> Hi,
>>
>> This patch tweaks the autoprefetcher heuristic in the scheduler to better
>> group memory loads and stores together.
>>
>> From https://gcc.gnu.org/bugzilla/show_b
On August 29, 2019 5:40:47 PM GMT+02:00, Maxim Kuvyrkov wrote:
>Hi,
>
>This patch tweaks the autoprefetcher heuristic in the scheduler to better
>group memory loads and stores together.
>
>From https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91598:
>
>There are two separate changes, both related to instruct
Hi,
This patch tweaks the autoprefetcher heuristic in the scheduler to better group
memory loads and stores together.
From https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91598:
There are two separate changes, both related to the instruction scheduler, that
cause the regression. The first change in r253235