https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91598
Wilco <wilco at gcc dot gnu.org> changed:
           What    |Removed                     |Added
----------------------------------------------------------------------------
             Target|arm                         |aarch64
             Status|UNCONFIRMED                 |NEW
   Last reconfirmed|                            |2019-08-30
                 CC|                            |wilco at gcc dot gnu.org
     Ever confirmed|0                           |1
--- Comment #3 from Wilco <wilco at gcc dot gnu.org> ---
(In reply to Maxim Kuvyrkov from comment #2)
> Created attachment 46784 [details]
> Patch for 70% of the regression
Confirmed. Note this is not about auto prefetching but basic scheduling for
load latency.
The key issue is the use of inline asm in arm_neon.h - fixing those uses will
improve scheduling. It may also be a good idea to fix the scheduler so that it
schedules asm instructions. For example, always use the latencies of the input
registers and assign a fixed latency to outputs depending on the mode (e.g.
integer = 1, FP = 4, int SIMD = 2).
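
To make the suggestion concrete, here is a rough sketch of such a latency
model in plain C. It is not GCC code: the enum and all function names are
hypothetical and only illustrate the idea that an asm statement issues once
its inputs are ready and that its output becomes available after a fixed,
mode-dependent delay. The latency values are the ones proposed above.

```c
#include <stddef.h>

/* Hypothetical mode classes for illustration; this is not GCC's
   machine_mode enum.  */
enum mode_class { MODE_INTEGER, MODE_FP, MODE_INT_SIMD };

/* Fixed latency assumed for an asm output, chosen by its mode class.
   The values are the ones suggested in the comment above.  */
int asm_output_latency(enum mode_class mc)
{
    switch (mc) {
    case MODE_INTEGER:  return 1;
    case MODE_FP:       return 4;
    case MODE_INT_SIMD: return 2;
    }
    return 1; /* assumed default for anything unclassified */
}

/* Earliest cycle at which the asm can issue: the latest cycle at which
   any of its input registers becomes ready.  */
int asm_issue_cycle(const int *input_ready, size_t n)
{
    int cycle = 0;
    for (size_t i = 0; i < n; i++)
        if (input_ready[i] > cycle)
            cycle = input_ready[i];
    return cycle;
}

/* Cycle at which the asm output is ready for its consumers.  */
int asm_result_ready(const int *input_ready, size_t n, enum mode_class out)
{
    return asm_issue_cycle(input_ready, n) + asm_output_latency(out);
}
```

With inputs ready at cycles 3, 7 and 5, an asm with an FP output would issue
at cycle 7 and its result would be treated as ready at cycle 11, which gives
the scheduler enough information to place independent instructions in between.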
It's not clear what the point of the "auto prefetch" scheduling is - while it
may be a good idea to order loads/stores by increasing address, grouping all
loads or all stores together is counterproductive.