http://gcc.gnu.org/bugzilla/show_bug.cgi?id=52897
Mikael Pettersson <mikpe at it dot uu.se> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |mikpe at it dot uu.se

--- Comment #2 from Mikael Pettersson <mikpe at it dot uu.se> 2012-04-07 10:33:05 UTC ---
I see two obvious performance problems in the gcc-4.7.0 code for the second
loop:
- Instead of doing memory-to-memory moves it does load;store sequences.
- GCC apparently attempted to avoid moves with 16-bit immediate offsets by
  setting up a bunch of address registers sparsely in the source array and
  then using auto-increment addressing modes when loading from them; that
  could have been a win, but gcc had to spill one of those address registers
  so the code becomes rather awful.

Neither version of gcc managed to hoist the constant destination address into
an address register, so that 32-bit immediate appears many times in the loops.
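
As a rough illustration of the hoisting issue mentioned above, here is a minimal C sketch. It is not the test case from the bug report: the loop shape, the unroll factor, and the names `fake_port` and `write_block` are assumptions made purely for illustration. The idea is that every store goes to one fixed address, so a compiler could in principle load that address into a register once instead of repeating it as an immediate in each store.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a fixed destination. On real hardware this would
   be a constant absolute address; the report does not give one, so a static
   object is used here to keep the sketch runnable. */
static volatile uint16_t fake_port;

/* Stores n words from src to the fixed destination, four per iteration
   (the unroll factor is an assumption). The destination is held in a local
   const pointer, which expresses the hoisting by hand; whether the compiler
   keeps it in a register is still up to the register allocator. */
static size_t write_block(const uint16_t *src, size_t n)
{
    volatile uint16_t *const dst = &fake_port;  /* loop-invariant address */
    size_t i = 0;

    assert(src != NULL);

    for (; i + 4 <= n; i += 4) {
        *dst = src[i];
        *dst = src[i + 1];
        *dst = src[i + 2];
        *dst = src[i + 3];
    }
    for (; i < n; i++)      /* remaining 0..3 words */
        *dst = src[i];

    return i;               /* number of words written */
}
```

Compiling a function like this with `-O2 -S` for the target in question and checking whether the destination address is loaded once or repeated in every store is one way to observe the behaviour the comment describes.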