http://gcc.gnu.org/bugzilla/show_bug.cgi?id=52897

Mikael Pettersson <mikpe at it dot uu.se> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |mikpe at it dot uu.se

--- Comment #2 from Mikael Pettersson <mikpe at it dot uu.se> 2012-04-07 10:33:05 UTC ---
I see two obvious performance problems in the gcc-4.7.0 code for the second
loop:

- Instead of using memory-to-memory moves, it emits load;store sequences.

- GCC apparently tried to avoid moves with 16-bit immediate offsets by setting
up a bunch of address registers at spread-out positions in the source array and
then loading through them with auto-increment addressing modes. That could have
been a win, but gcc had to spill one of those address registers, so the
resulting code is rather awful.

Neither version of gcc managed to hoist the constant destination address into
an address register, so that 32-bit immediate appears many times in the loops.
