https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91598
Wilco <wilco at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Target|arm                         |aarch64
             Status|UNCONFIRMED                 |NEW
   Last reconfirmed|                            |2019-08-30
                 CC|                            |wilco at gcc dot gnu.org
     Ever confirmed|0                           |1

--- Comment #3 from Wilco <wilco at gcc dot gnu.org> ---
(In reply to Maxim Kuvyrkov from comment #2)
> Created attachment 46784 [details]
> Patch for 70% of the regression

Confirmed. Note this is not about auto prefetching but basic scheduling for
load latency. The key issue is the use of asm in arm_neon.h - fixing those
will improve scheduling.

It may also be a good idea to fix the scheduler so that it schedules asm
instructions. For example always use the latencies of input registers and
assign a fixed latency to outputs depending on the mode (eg. integer = 1,
FP = 4, int simd = 2).

It's not clear what the point is of the "auto prefetch" scheduling - while it
may be a good idea to order loads/stores on increasing addresses, grouping all
loads or stores together is counterproductive.
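
The asm latency suggestion above could be sketched roughly as follows. This is
only an illustration of the heuristic, not GCC scheduler code: the enum and the
function names asm_output_latency, asm_issue_cycle and asm_result_ready are
hypothetical, and the cycle counts are just the example values from the
comment (integer = 1, FP = 4, int simd = 2).

```c
#include <assert.h>

/* Hypothetical output kinds; a real implementation would inspect the
   machine mode of each asm output operand instead. */
enum out_kind { KIND_INT, KIND_FP, KIND_INT_SIMD };

/* Fixed latency assigned to an asm output, by kind.  The numbers are the
   example values from the comment, not measured latencies. */
static int asm_output_latency(enum out_kind kind)
{
    switch (kind) {
    case KIND_INT:      return 1;
    case KIND_FP:       return 4;
    case KIND_INT_SIMD: return 2;
    }
    return 1; /* conservative default for unknown kinds */
}

/* Earliest cycle at which the asm can issue: it must wait until every
   input register is ready, so take the maximum of the input ready cycles. */
static int asm_issue_cycle(const int *input_ready, int n)
{
    assert(n >= 0);
    int cycle = 0;
    for (int i = 0; i < n; i++)
        if (input_ready[i] > cycle)
            cycle = input_ready[i];
    return cycle;
}

/* Cycle at which the asm output becomes available to its consumers. */
static int asm_result_ready(const int *input_ready, int n, enum out_kind out)
{
    return asm_issue_cycle(input_ready, n) + asm_output_latency(out);
}
```

With inputs ready at cycles 3, 7 and 5, the asm would issue at cycle 7, and an
FP output would then be treated as available at cycle 11, so the scheduler
could place dependent instructions accordingly rather than treating the asm as
opaque.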