> On Feb 3, 2017, at 8:12 PM, Kyrill Tkachov <kyrylo.tkac...@foss.arm.com> wrote:
>
> Hi all,
>
> While evaluating Maxim's SW prefetch patches [1] I noticed that the aarch64
> prefetch pattern is overly restrictive in its address operand. It only
> accepts simple register addressing modes.
> In fact, the PRFM instruction accepts almost all modes that a normal 64-bit
> LDR supports.
> The restriction in the pattern leads to explicit address calculation code to
> be emitted which we could avoid.

Thanks for this fix, I'll test it on my hardware. I've reviewed your patch and it looks OK to me.

> This patch relaxes the restrictions on the prefetch define_insn. It creates a
> predicate and constraint that allow the full addressing modes that PRFM
> allows. Thus for the testcase in the patch (adapted from one of the existing
> __builtin_prefetch tests in the testsuite) we can generate a:
> prfm PLDL1STRM, [x1, 8]
>
> instead of the current
> prfm PLDL1STRM, [x1]
> with an explicit increment of x1 by 8 in a separate instruction.
>
> I've removed the %a output modifier in the output template and wrapped the
> address operand into a DImode MEM before passing it down to
> aarch64_print_operand.
>
> This is because operand 0 is an address operand rather than a memory operand
> and thus doesn't have a mode associated with it. When processing the 'a'
> output modifier the code in final.c will call TARGET_PRINT_OPERAND_ADDRESS
> with a VOIDmode argument. This will ICE on aarch64 because we need a mode for
> the memory in order for aarch64_classify_address to work correctly. Rather
> than overriding the VOIDmode in aarch64_print_operand_address I decided to
> instead create the DImode MEM in the "prefetch" output template and treat it
> as a normal 64-bit memory address, which at the point of assembly output
> is what it is anyway.

I agree that it is cleaner to convert the operand of prefetch to DImode just before printing it out to assembly. There is little to be gained from relaxing the asserts in aarch64_print_operand_address.

> With this patch I see a reduction in instruction count in the SPEC2006
> benchmarks when SW prefetching is enabled on top of Maxim's patchset because
> fewer address calculation instructions are emitted due to the use of the more
> expressive addressing modes. It also fixes a performance regression that I
> observed in 410.bwaves from Maxim's patches on Cortex-A72.
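
For illustration, here is a minimal sketch of the kind of source that is affected. It is a hypothetical example, not the testcase from the patch, and the function name load_and_prefetch_next is made up. It prefetches the 8 bytes following the pointed-to element, so the prefetch address is a base register plus an immediate offset of 8. With the relaxed pattern one would expect the offset to be folded into the PRFM itself rather than computed by a separate add, though the exact assembly depends on the compiler version and options.

```c
#include <stddef.h>

/* Hypothetical illustration only, not the testcase from the patch.
   Returns the element at P and prefetches the next one.  */
long
load_and_prefetch_next (const long *p)
{
  /* Second argument 0 requests a prefetch for reading; third argument 0
     requests the lowest temporal locality.  The address is P plus
     sizeof (long), which is 8 bytes on aarch64 LP64.  */
  __builtin_prefetch (p + 1, 0, 0);
  return p[0];
}
```

The prefetch is only a hint, so the function returns the same value whether or not the target actually emits a prefetch instruction.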

> I'll be running a full set of benchmarks to evaluate this further, but I
> think this is the right thing to do.
>
> Bootstrapped and tested on aarch64-none-linux-gnu.
>
> Maxim, do you want to try this on top of your patches on your hardware to see
> if it helps with the regressions you mentioned?

Sure.

--
Maxim Kuvyrkov
www.linaro.org