------- Comment #3 from rguenth at gcc dot gnu dot org 2010-06-05 10:56 -------
OK. The fact is that no pass can hoist invariant store/load pairs out of
a loop. But that limitation is pre-existing; the main issue is that the new
SRA implementation ends up rematerializing the stores inside the loop!
Diff of pre-esra vs. esra:
<bb 2>:
D.4339_3 = a_2(D)->r;
- va.f[0] = D.4339_3;
+ va$f$0_33 = D.4339_3;
D.4340_4 = a_2(D)->g;
- va.f[1] = D.4340_4;
+ va$f$1_32 = D.4340_4;
D.4341_5 = a_2(D)->b;
- va.f[2] = D.4341_5;
- va.f[3] = 0.0;
+ va$f$2_31 = D.4341_5;
+ va$f$3_30 = 0.0;
y_6 = 0;
goto <bb 4>;
@@ -504,6 +203,10 @@
tmpatt_37 = {D.4375_36, D.4375_36, D.4375_36, D.4375_36};
tmpatt_40 = tmpatt_37;
tmpatt_15 = tmpatt_40;
+ va.f[0] = va$f$0_33;
+ va.f[1] = va$f$1_32;
+ va.f[2] = va$f$2_31;
+ va.f[3] = va$f$3_30;
D.4347_16 = va.v;
tmpatt_38 = __builtin_ia32_mulps (tmpatt_15, D.4347_16);
tmpatt_41 = tmpatt_38;
That's of course bad (and the scalarization in this particular case looks
useless, too: the only use is an aggregate one, covering all stores).
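
For illustration, here is a minimal C sketch of the two shapes seen in the
dumps. It is not the reporter's testcase. The names va, f, v, a, r, g, b and y
follow the dump; v4sf (a GCC vector-extension type standing in for the SSE
vector type), union vec4, struct color, both function names and the loop body
are assumptions made up for this example.

```c
/* GCC/Clang vector extension; an assumed stand-in for the SSE vector type. */
typedef float v4sf __attribute__((vector_size(16)));

/* Hypothetical union: four scalar lanes overlaid with one vector. */
union vec4 {
    float f[4];
    v4sf v;
};

/* Hypothetical input type with the r/g/b fields seen in the dump. */
struct color { float r, g, b; };

/* Pre-esra shape: the four scalar stores happen once, before the loop,
   and the loop only does the aggregate read va.v. */
void scale_hoisted(const struct color *a, v4sf *out, int n)
{
    union vec4 va;
    va.f[0] = a->r;
    va.f[1] = a->g;
    va.f[2] = a->b;
    va.f[3] = 0.0f;
    for (int y = 0; y < n; y++) {
        v4sf att = { (float)y, (float)y, (float)y, (float)y };
        out[y] = att * va.v;
    }
}

/* Hand-written equivalent of the esra shape: the values live in scalar
   replacements and the stores are redone on every iteration, right
   before the aggregate read. */
void scale_rematerialized(const struct color *a, v4sf *out, int n)
{
    union vec4 va;
    float f0 = a->r, f1 = a->g, f2 = a->b, f3 = 0.0f;
    for (int y = 0; y < n; y++) {
        v4sf att = { (float)y, (float)y, (float)y, (float)y };
        va.f[0] = f0;
        va.f[1] = f1;
        va.f[2] = f2;
        va.f[3] = f3;
        out[y] = att * va.v;
    }
}
```

Both functions compute the same values; the second one just pays for four
extra stores (and the store-to-load forwarding stall on va.v) per iteration.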
--
rguenth at gcc dot gnu dot org changed:
           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |jamborm at gcc dot gnu dot
                   |                            |org
          Component|regression                  |tree-optimization
            Summary|[4.5/4.6] Massive           |[4.5/4.6 Regression] Massive
                   |performance regression in   |performance regression in
                   |SSE code                    |SSE code due to SRA
   Target Milestone|---                         |4.5.1
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=44423