Re: [PR91598] Improve autoprefetcher heuristic in haifa-sched.c

2021-08-17, Maxim Kuvyrkov via Gcc-patches
Hi All, I forgot to commit this patch when it was approved 2 years ago. It still applies cleanly to the current mainline, and I've retested it (bootstrap+regtest) on aarch64-linux-gnu and arm-linux-gnueabihf with no regressions. I'll commit this shortly. Regards, On Tue, 3 Sept 2019 at 19

Re: [PR91598] Improve autoprefetcher heuristic in haifa-sched.c

2019-09-03, Wilco Dijkstra
Hi Maxim,
> Autoprefetching heuristic is enabled only for cores that support it, and
> isn't active for by default.
It's enabled on most cores, including the default (generic). So we do have to be careful that this doesn't regress any other benchmarks or do worse on modern cores

Re: [PR91598] Improve autoprefetcher heuristic in haifa-sched.c

2019-08-30, Richard Biener
On Thu, Aug 29, 2019 at 7:36 PM Alexander Monakov wrote:
>
> On Thu, 29 Aug 2019, Maxim Kuvyrkov wrote:
>
> > >> r1 = [rb + 0]
> > >>
> > >> r2 = [rb + 8]
> > >>
> > >> r3 = [rb + 16]
> > >>
> > >>
> > >> which, apparently, cortex-a53 autoprefetcher doesn't recognize. This
> > >> schedule happ

Re: [PR91598] Improve autoprefetcher heuristic in haifa-sched.c

2019-08-29, Wilco Dijkstra
Hi Alexander,
> So essentially the main issue is not a hardware peculiarity, but rather the
> bad schedule being totally wrong (it could only make sense if loads had
> 1-cycle latency, which they do not).
The scheduling is only bad because the specific intrinsics used are mapped onto asm stat
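Alexander's claim, quoted in this message, that the interleaved schedule could only make sense if loads had 1-cycle latency can be checked with a toy timing model. This is a hypothetical sketch of a single-issue, in-order pipeline, not GCC's scheduler and not the Cortex-A53 pipeline. It assumes each "irrelevant" instruction m_i consumes the result of load r_i; the names `insn_t`, `interleaved()` and `grouped()` and the latency values are illustrative assumptions.

```c
#include <stddef.h>

#define MAX_INSNS 16

/* Hypothetical insn in a fixed, already-chosen schedule. */
typedef struct {
  int producer;  /* schedule index of the insn whose result it uses, or -1 */
  int latency;   /* cycles until this insn's result is available */
} insn_t;

/* Toy single-issue, in-order pipeline: each insn issues at least one cycle
   after its predecessor and no earlier than its producer's result is ready.
   Producers must appear earlier in SCHED than their consumers.
   Returns the issue cycle of the last insn, or -1 if N is out of range. */
static int last_issue_cycle(const insn_t *sched, size_t n) {
  int issue[MAX_INSNS];
  if (n == 0 || n > MAX_INSNS)
    return -1;
  for (size_t i = 0; i < n; i++) {
    int t = (i == 0) ? 0 : issue[i - 1] + 1;
    int p = sched[i].producer;
    if (p >= 0 && issue[p] + sched[p].latency > t)
      t = issue[p] + sched[p].latency;  /* stall until the operand is ready */
    issue[i] = t;
  }
  return issue[n - 1];
}

/* Interleaved schedule: r1, m1, r2, m2, r3, m3 (m_i consumes r_i). */
int interleaved(int load_latency) {
  const insn_t s[6] = {
    {-1, load_latency}, {0, 1},
    {-1, load_latency}, {2, 1},
    {-1, load_latency}, {4, 1},
  };
  return last_issue_cycle(s, 6);
}

/* Grouped schedule: r1, r2, r3, m1, m2, m3. */
int grouped(int load_latency) {
  const insn_t s[6] = {
    {-1, load_latency}, {-1, load_latency}, {-1, load_latency},
    {0, 1}, {1, 1}, {2, 1},
  };
  return last_issue_cycle(s, 6);
}
```

With a load latency of 1, `interleaved(1)` and `grouped(1)` both return 5, so the two orders cost the same. With a latency of 3, `interleaved(3)` returns 11 while `grouped(3)` still returns 5, because in the interleaved order every m_i stalls waiting for its load.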

Re: [PR91598] Improve autoprefetcher heuristic in haifa-sched.c

2019-08-29, Alexander Monakov
On Thu, 29 Aug 2019, Maxim Kuvyrkov wrote:
> >> r1 = [rb + 0]
> >>
> >> r2 = [rb + 8]
> >>
> >> r3 = [rb + 16]
> >>
> >>
> >> which, apparently, cortex-a53 autoprefetcher doesn't recognize. This
> >> schedule happens because r2= load gets lower priority than the
> >> "irrelevant" due to the

Re: [PR91598] Improve autoprefetcher heuristic in haifa-sched.c

2019-08-29, Wilco Dijkstra
Hi Maxim,
> It appears that cores with autoprefetcher hardware prefer loads and stores
> bundled together, not interspersed with other instructions to occupy the
> rest of CPU units.
I don't believe it is as simple as that - modern cores have multiple prefetchers but won't prefer bund

Re: [PR91598] Improve autoprefetcher heuristic in haifa-sched.c

2019-08-29, Maxim Kuvyrkov
> On Aug 29, 2019, at 7:29 PM, Richard Biener wrote:
>
> On August 29, 2019 5:40:47 PM GMT+02:00, Maxim Kuvyrkov wrote:
>> Hi,
>>
>> This patch tweaks autoprefetcher heuristic in scheduler to better group
>> memory loads and stores together.
>>
>> From https://gcc.gnu.org/bugzilla/show_b

Re: [PR91598] Improve autoprefetcher heuristic in haifa-sched.c

2019-08-29, Richard Biener
On August 29, 2019 5:40:47 PM GMT+02:00, Maxim Kuvyrkov wrote:
>Hi,
>
>This patch tweaks autoprefetcher heuristic in scheduler to better group
>memory loads and stores together.
>
>From https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91598:
>
>There are two separate changes, both related to instruct

[PR91598] Improve autoprefetcher heuristic in haifa-sched.c

2019-08-29, Maxim Kuvyrkov
Hi,
This patch tweaks autoprefetcher heuristic in scheduler to better group memory loads and stores together.
From https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91598:
There are two separate changes, both related to instruction scheduler, that cause the regression. The first change in r253235
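The goal stated in this patch, keeping memory accesses grouped instead of letting a higher-priority unrelated instruction split them, can be illustrated with a toy greedy list scheduler. This is a hypothetical sketch, not the patch itself and not haifa-sched.c's autoprefetcher code. The `rinsn_t` type, `schedule()`, the `group_mem` flag and the priority values are assumptions, with priorities chosen so that, as described in the thread, the r2 load ranks below the unrelated instruction before it. When `group_mem` is set, a memory access with the same base register and a larger offset than the previously scheduled access is preferred over any other ready insn.

```c
#include <stdbool.h>
#include <stddef.h>

#define MAX_READY 16

/* Hypothetical, simplified ready-list entry (not GCC's data structures). */
typedef struct {
  int priority;  /* critical-path priority: higher is more urgent */
  bool is_mem;   /* true for a load or store */
  int base;      /* base register number, meaningful only if is_mem */
  long offset;   /* constant offset from the base register */
} rinsn_t;

/* True if CAND continues LAST's access stream: both are memory accesses,
   they share a base register, and CAND's offset is strictly larger. */
static bool extends_stream(const rinsn_t *last, const rinsn_t *cand) {
  return last != NULL && last->is_mem && cand->is_mem
         && last->base == cand->base && cand->offset > last->offset;
}

/* True if A should be picked before B, given the last scheduled insn. */
static bool better(const rinsn_t *a, const rinsn_t *b, const rinsn_t *last,
                   bool group_mem) {
  if (group_mem) {
    bool ea = extends_stream(last, a);
    bool eb = extends_stream(last, b);
    if (ea != eb)
      return ea;                     /* stream continuation wins */
    if (ea && a->offset != b->offset)
      return a->offset < b->offset;  /* nearest next offset first */
  }
  return a->priority > b->priority;  /* otherwise plain priority */
}

/* Greedily schedules N independent insns (N <= MAX_READY); no data
   dependences are modeled. Writes the chosen order, as indices into
   INSNS, to ORDER. */
void schedule(const rinsn_t *insns, size_t n, bool group_mem, size_t *order) {
  bool done[MAX_READY] = {false};
  const rinsn_t *last = NULL;
  if (n > MAX_READY)
    return;
  for (size_t k = 0; k < n; k++) {
    size_t best = n;  /* n means "nothing chosen yet" */
    for (size_t i = 0; i < n; i++) {
      if (done[i])
        continue;
      if (best == n || better(&insns[i], &insns[best], last, group_mem))
        best = i;
    }
    done[best] = true;
    order[k] = best;
    last = &insns[best];
  }
}
```

For example, with r1, m1, r2, m2, r3, m3 given priorities 10 down to 5 and the loads at offsets 0, 8 and 16 from the same base, plain priority order yields r1, m1, r2, m2, r3, m3, while `group_mem` yields r1, r2, r3, m1, m2, m3. Always preferring the stream is deliberately crude; a real heuristic has to weigh grouping against critical-path priority, which is where the risk of regressing other benchmarks comes from.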