Siddhesh Poyarekar wrote: On Wednesday 17 January 2018 08:31 PM, Wilco Dijkstra wrote: > Why is that a bad thing? With the patch as is, the testcase generates: > > .L4: > ldr q0, [x2, x3] > add x5, x1, x3 > add x3, x3, 16 > cmp x3, x4 > str q0, [x5] > bne .L4 > > With a change in address cost (for loads and stores) we would get: > > .L4: > ldr q0, [x3], 16 > str q0, [x4], 16 > cmp x3, x5 > bne .L4
> This is great for the load because of the way the falkor prefetcher > works, but it is terrible for the store because of the way the pipeline > works. The only performant store for falkor is an indirect load with a > constant or zero offset. Everything else has hidden costs. Are you saying the same issue exists for all stores with writeback? If so then your patch would need to address that too. > I briefly looked at the possibility of splitting the register_offset > cost into load and store, but I realized that I'd have to modify the > target hook for it to be useful, which is way too much work for this > single quirk. It seems way more fundamental if it affects anything that isn't a simple immediate offset. Again I suggest using the existing cost infrastructure to find a setting that improves performance. If discouraging pre/post increment helps Falkor then that's better than doing nothing. >> I think a special case for Falkor in aarch64_address_cost would be acceptable >> in GCC8 - that would be much smaller and cleaner than the current patch. >> If required we could improve upon this in GCC9 and add a way to differentiate >> between loads and stores. > > I can't do this in address_cost since I can't determine whether the > address is a load or a store location. The most minimal way seems to be > using the patterns in the md file. Well I don't think the approach of blocking specific patterns is a good solution to this problem and may not be accepted by AArch64 maintainers. Try your example with -funroll-loops and compare with my suggestion (with or without extra code to increase cost of writeback too). It seems to me adjusting costs is always going to result in better overall code quality, even if it also applies to loads for the time being. Wilco