http://gcc.gnu.org/bugzilla/show_bug.cgi?id=45810
Jan Hubicka <hubicka at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
   Last reconfirmed|2011-01-23 15:59:30         |
                 CC|                            |rguenther at suse dot de

--- Comment #18 from Jan Hubicka <hubicka at gcc dot gnu.org> 2011-01-23 20:00:23 UTC ---
We produce very lousy code for the out-of-line copy of
__perdida_m_MOD_generalized_hookes_law. This seems to be the reason why we
inline it. The code is a bit better with early FRE, but we still get, in
vect_pgeneralized_constitutive_tensor (optimized dump):

    generalized_constitutive_tensor = {};
    D.4502_45 = *lambda_44(D);
    D.4503_47 = *mu_46(D);
    D.4504_48 = D.4503_47 * 2.0e+0;
    D.4505_49 = D.4504_48 + D.4502_45;
    generalized_constitutive_tensor[0] = D.4505_49;
    generalized_constitutive_tensor[6] = D.4502_45;
    generalized_constitutive_tensor[12] = D.4502_45;
    generalized_constitutive_tensor[1] = D.4502_45;
    generalized_constitutive_tensor[7] = D.4505_49;
    generalized_constitutive_tensor[13] = D.4502_45;
    generalized_constitutive_tensor[2] = D.4502_45;
    generalized_constitutive_tensor[8] = D.4502_45;
    generalized_constitutive_tensor[14] = D.4505_49;
    generalized_constitutive_tensor[21] = D.4503_47;
    generalized_constitutive_tensor[28] = D.4503_47;
    generalized_constitutive_tensor[35] = D.4503_47;

This initializes the array with mostly zeros, and then we use it in the
vectorized loop:

    vect_cst_.855_301 = {D.4508_69, D.4508_69};
    vect_cst_.862_295 = {D.4511_73, D.4511_73};
    vect_cst_.870_288 = {D.4514_77, D.4514_77};
    vect_cst_.878_323 = {D.4519_82, D.4519_82};
    vect_cst_.886_330 = {D.4522_86, D.4522_86};
    vect_cst_.894_337 = {D.4526_90, D.4526_90};
    vect_var_.853_205 = MEM[(real(kind=8)[36] *)&generalized_constitutive_tensor];
    vect_var_.854_210 = vect_var_.853_205 * vect_cst_.855_301;
    vect_var_.860_211 = MEM[(real(kind=8)[36] *)&generalized_constitutive_tensor + 48B];
    vect_var_.861_214 = vect_var_.860_211 * vect_cst_.862_295;
    vect_var_.863_215 = vect_var_.861_214 + vect_var_.854_210;
    vect_var_.868_220 = MEM[(real(kind=8)[36] *)&generalized_constitutive_tensor + 96B];
    vect_var_.869_221 = vect_var_.868_220 * vect_cst_.870_288;
    vect_var_.871_224 = vect_var_.863_215 + vect_var_.869_221;
    vect_var_.876_225 = MEM[(real(kind=8)[36] *)&generalized_constitutive_tensor + 144B];

We would do better to unroll this and optimize away the zero terms.
Without -ftree-vectorize, however, we still don't do this transform. We end
up with:

    generalized_constitutive_tensor = {};
    D.4502_45 = *lambda_44(D);
    D.4503_47 = *mu_46(D);
    D.4504_48 = D.4503_47 * 2.0e+0;
    D.4505_49 = D.4504_48 + D.4502_45;
    generalized_constitutive_tensor[1] = D.4502_45;
    generalized_constitutive_tensor[7] = D.4505_49;
    generalized_constitutive_tensor[13] = D.4502_45;
    generalized_constitutive_tensor[2] = D.4502_45;
    generalized_constitutive_tensor[8] = D.4502_45;
    generalized_constitutive_tensor[14] = D.4505_49;
    generalized_constitutive_tensor[21] = D.4503_47;
    generalized_constitutive_tensor[28] = D.4503_47;
    generalized_constitutive_tensor[35] = D.4503_47;
    ....
    pretmp.827_334 = generalized_constitutive_tensor[1];
    pretmp.830_336 = generalized_constitutive_tensor[7];
    pretmp.832_338 = generalized_constitutive_tensor[13];
    pretmp.834_340 = generalized_constitutive_tensor[19];
    pretmp.836_342 = generalized_constitutive_tensor[25];
    pretmp.838_344 = generalized_constitutive_tensor[31];

So copy propagation and SRA are missing. Moreover, we can't figure out that
generalized_constitutive_tensor[31] is 0. So it is quite a good testcase for
optimization queue ordering.

Honza
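
For illustration, the transform discussed above can be sketched in C. This is
a hand-reduced model, not the testcase source: the function names
hooke_generic and hooke_unrolled, the strain/stress arguments, and the reading
of the loop as a 6x6 matrix-vector product over a column-major array are all
assumptions inferred from the dumps. The first function mirrors what the dumps
show (zero-fill, twelve stores, dense loop); the second is roughly what one
would hope to see once the loop is unrolled and the zero terms are folded
away.

```c
#include <string.h>

enum { N = 6 };  /* 6x6 tensor stored as 36 doubles, column-major (assumed) */

/* Shape of the code in the dumps: zero the array, store the twelve
   nonzero entries, then run a dense product over all 36 elements. */
static void hooke_generic(double lambda, double mu,
                          const double strain[N], double stress[N])
{
    double c[N * N];
    double d = 2.0 * mu + lambda;   /* D.4505 in the dump */

    memset(c, 0, sizeof c);         /* generalized_constitutive_tensor = {} */
    c[0] = d;       c[6] = lambda;  c[12] = lambda;
    c[1] = lambda;  c[7] = d;       c[13] = lambda;
    c[2] = lambda;  c[8] = lambda;  c[14] = d;
    c[21] = mu;     c[28] = mu;     c[35] = mu;

    for (int i = 0; i < N; i++) {
        double s = 0.0;
        for (int j = 0; j < N; j++)
            s += c[i + N * j] * strain[j];  /* most c[] reads are 0 */
        stress[i] = s;
    }
}

/* Hand-written target: the array is gone (SRA), the stored values are
   propagated into their uses, and products with known zeros are dropped. */
static void hooke_unrolled(double lambda, double mu,
                           const double strain[N], double stress[N])
{
    double d = 2.0 * mu + lambda;

    stress[0] = d * strain[0] + lambda * strain[1] + lambda * strain[2];
    stress[1] = lambda * strain[0] + d * strain[1] + lambda * strain[2];
    stress[2] = lambda * strain[0] + lambda * strain[1] + d * strain[2];
    stress[3] = mu * strain[3];
    stress[4] = mu * strain[4];
    stress[5] = mu * strain[5];
}
```

Note that dropping the 0 * x terms is only exact when x is finite, so for
floating point the compiler may need -ffast-math or similar to do this on
its own.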