On Fri, Sep 22, 2017 at 8:59 AM, Jim Wilson <jim.wil...@linaro.org> wrote:
> On Fri, Sep 22, 2017 at 8:49 AM, Jim Wilson <jim.wil...@linaro.org> wrote:
>> On Falkor, because of an idiosyncrasy of how the pipelines are designed, a
>> quad-word store using a reg+reg addressing mode is almost twice as slow as an
>> add followed by a quad-word store with a single reg addressing mode.  So we
>> get better performance if we disallow addressing modes using register offsets
>> with quad-word stores.
>
> This was tested with a bootstrap and make check of course.  Also,
> gcc.c-torture/compile/20021212-1.c compiled with -O3 -mcpu=falkor
> makes a nice testcase to demonstrate that the patch works.
>
> OK?

Two overall comments:
* What about splitting register_offset into two different elements,
one for non-128-bit modes and one for 128-bit (and larger; OI, etc.)
modes, so you get better address generation right away for the SIMD
load cases rather than having LRA/reload reload the address
into a register?
* Maybe add a testcase to the testsuite to show this change.

One extra comment:
* Should we change the generic tuning to avoid reg+reg for 128-bit modes?

Thanks,
Andrew

>
> Jim
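
On the testcase point, here is a minimal sketch of what a reduced test
could look like. It assumes GCC vector extensions; the names v4si and
store_q are made up for illustration and are not from the patch. The
assembly in the comment is only what one might expect to see when
compiling with -O3 -mcpu=falkor, not verified output.

```c
/* Hypothetical reduced testcase; names are illustrative only.  */
typedef int v4si __attribute__ ((vector_size (16)));

/* Store a 128-bit vector at base + idx * 16.
   Without the restriction, the compiler may choose a register-offset
   store such as "str q0, [x0, x1, lsl 4]".  With reg+reg disallowed
   for quad-word stores, an add followed by "str q0, [x0]" would be
   the expected shape instead (assumption, not checked output).  */
void
store_q (v4si *base, long idx, v4si v)
{
  base[idx] = v;
}
```

A real testsuite entry would presumably also need a scan of the
generated assembly; the exact pattern is left open here.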