------- Comment #8 from rguenth at gcc dot gnu dot org  2009-07-07 15:47 -------
The issue is likely the sequence

  load upper half of cache line 1
  load lower half of cache line 2
  store upper half of cache line 1
  store lower half of cache line 2   <---
  load upper half of cache line 2    <---
  load lower half of cache line 3
   ...

where the marked lines, a store to cache line 2 immediately followed by
a load from the same cache line, probably cause internal delays.

Either not using unaligned stores for this kind of data dependence, or
peeling for alignment, will probably help here.
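
To make "peeling for alignment" concrete, here is a rough hand-written
sketch of the transformation. The function scale_in_place and its
in-place multiply are hypothetical and are not the test case from this
PR; the 16-byte boundary assumes SSE-sized vectors. The code is plain
scalar C that only shows the loop structure a vectorizer would produce.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical in-place kernel, used only to illustrate peeling. */
void scale_in_place(float *a, size_t n, float k)
{
    size_t i = 0;

    assert(a != NULL || n == 0);

    /* Prologue (the peeled iterations): scalar steps until &a[i]
       sits on a 16-byte boundary. */
    while (i < n && ((uintptr_t)(a + i) % 16) != 0) {
        a[i] *= k;
        i++;
    }

    /* Main loop: every group of 4 floats now lies inside a single
       16-byte block, so aligned vector loads and stores could be
       used and no access straddles two cache lines. */
    for (; i + 4 <= n; i += 4) {
        a[i + 0] *= k;
        a[i + 1] *= k;
        a[i + 2] *= k;
        a[i + 3] *= k;
    }

    /* Epilogue: leftover elements that do not fill a whole group. */
    for (; i < n; i++)
        a[i] *= k;
}
```

The prologue costs at most three scalar iterations for a float array
that has its natural 4-byte alignment.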


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=40648
