http://gcc.gnu.org/bugzilla/show_bug.cgi?id=27855
Richard Guenther <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
      Known to work|                            |4.6.2
            Summary|[4.4/4.5/4.6/4.7            |[4.4/4.5/4.7 regression]
                   |regression] reassociation   |reassociation causes the RA
                   |causes the RA to be         |to be confused
                   |confused                    |

--- Comment #37 from Richard Guenther <rguenth at gcc dot gnu.org> 2012-01-12 12:00:26 UTC ---
Hmm, on the 4.6 branch this seems fixed; on trunk it is no longer
tree-reassoc that makes us slow, but instead with tree-reassoc things are
faster ...

4.5 -O3 -ffast-math                                        1583.48
4.5 -O3 -ffast-math -fno-tree-reassoc                      2307.55
4.6 -O3 -ffast-math                                        2287.99
4.6 -O3 -ffast-math -fno-tree-reassoc                      2268.77
4.7 -O3 -ffast-math                                        1529.65
4.7 -O3 -ffast-math -fno-tree-reassoc                      1144.00
4.7 -O3 -ffast-math -fno-tree-reassoc -fno-tree-vectorize  1992.50

Huh. On the 4.6 branch we somehow avoided scheduling everything down by
tree reassoc, but on the 4.7 branch we are back to doing so. It would be
interesting to bisect what fixed it for 4.6 and what broke it again for
4.7 (4.7 is slower even without reassociation). With -fschedule-insns
-fsched-pressure we get back to 2212.98 for 4.7.
With -fno-tree-reassoc we now vectorize the loop(!):

<bb 4>:
  # pC0_1 = PHI <pC0_361(3), C_25(D)(2)>
  # pA0_2 = PHI <pA0_34(3), A_12(D)(2)>
  # pB0_4 = PHI <pB0_369(3), B_14(D)(2)>
  rC0_0_29 = *pC0_1;
  rC1_0_30 = MEM[(double *)pC0_1 + 8B];
  rC2_0_31 = MEM[(double *)pC0_1 + 16B];
  rC3_0_32 = MEM[(double *)pC0_1 + 24B];
  rC4_0_33 = MEM[(double *)pC0_1 + 32B];
  vect_var_.1256_2404 = {rC4_0_33, 0.0};
  vect_var_.1298_2510 = {rC3_0_32, 0.0};
  vect_var_.1340_2616 = {rC2_0_31, 0.0};
  vect_var_.1382_2722 = {rC1_0_30, 0.0};
  vect_var_.1424_2828 = {rC0_0_29, 0.0};
  ivtmp.1443_2312 = (long unsigned int) pB0_4;
  ivtmp.1451_379 = (long unsigned int) pA0_2;
  D.4447_128 = ivtmp.1443_2312 + 480;

<bb 5>:
  # vect_var_.1256_2375 = PHI <vect_var_.1256_2385(5), vect_var_.1256_2404(4)>
  # vect_var_.1256_2376 = PHI <vect_var_.1256_2386(5), {0.0, 0.0}(4)>
  # vect_var_.1256_2377 = PHI <vect_var_.1256_2387(5), {0.0, 0.0}(4)>
  # vect_var_.1256_2378 = PHI <vect_var_.1256_2388(5), {0.0, 0.0}(4)>
  # vect_var_.1256_2379 = PHI <vect_var_.1256_2389(5), {0.0, 0.0}(4)>
  # vect_var_.1256_2380 = PHI <vect_var_.1256_2390(5), {0.0, 0.0}(4)>
  # vect_var_.1256_2381 = PHI <vect_var_.1256_2391(5), {0.0, 0.0}(4)>
  # vect_var_.1256_2382 = PHI <vect_var_.1256_2392(5), {0.0, 0.0}(4)>
  # vect_var_.1256_2383 = PHI <vect_var_.1256_2393(5), {0.0, 0.0}(4)>
  # vect_var_.1256_2384 = PHI <vect_var_.1256_2394(5), {0.0, 0.0}(4)>
  # vect_var_.1298_2481 = PHI <vect_var_.1298_2491(5), vect_var_.1298_2510(4)>
  # vect_var_.1298_2482 = PHI <vect_var_.1298_2492(5), {0.0, 0.0}(4)>
  # vect_var_.1298_2483 = PHI <vect_var_.1298_2493(5), {0.0, 0.0}(4)>
  # vect_var_.1298_2484 = PHI <vect_var_.1298_2494(5), {0.0, 0.0}(4)>
  # vect_var_.1298_2485 = PHI <vect_var_.1298_2495(5), {0.0, 0.0}(4)>
  # vect_var_.1298_2486 = PHI <vect_var_.1298_2496(5), {0.0, 0.0}(4)>
  # vect_var_.1298_2487 = PHI <vect_var_.1298_2497(5), {0.0, 0.0}(4)>
  # vect_var_.1298_2488 = PHI <vect_var_.1298_2498(5), {0.0, 0.0}(4)>
  # vect_var_.1298_2489 = PHI <vect_var_.1298_2499(5), {0.0, 0.0}(4)>
  # vect_var_.1298_2490 = PHI <vect_var_.1298_2500(5), {0.0, 0.0}(4)>
  # vect_var_.1340_2587 = PHI <vect_var_.1340_2597(5), vect_var_.1340_2616(4)>
  # vect_var_.1340_2588 = PHI <vect_var_.1340_2598(5), {0.0, 0.0}(4)>
  ...
  D.4346_400 = (void *) ivtmp.1451_2315;
  vect_var_.1399_2748 = MEM[base: D.4346_400, offset: 0B];
  vect_var_.1400_2750 = MEM[base: D.4346_400, offset: 16B];
  vect_var_.1401_2752 = MEM[base: D.4346_400, offset: 32B];
  vect_var_.1402_2754 = MEM[base: D.4346_400, offset: 48B];
  vect_var_.1403_2756 = MEM[base: D.4346_400, offset: 64B];
  ...
  vect_var_.1423_2789 = vect_var_.1399_2748 * vect_var_.1413_2770;
  vect_var_.1423_2790 = vect_var_.1400_2750 * vect_var_.1414_2772;
  vect_var_.1423_2791 = vect_var_.1401_2752 * vect_var_.1415_2774;
  ...
  ivtmp.1443_2318 = ivtmp.1443_2316 + 160;
  ivtmp.1451_2317 = ivtmp.1451_2315 + 160;
  if (ivtmp.1443_2318 != D.4447_128)
    goto <bb 5>;
  else
    goto <bb 6>;

<bb 6>:
  # vect_var_.1256_2405 = PHI <vect_var_.1256_2385(5)>
  # vect_var_.1256_2406 = PHI <vect_var_.1256_2386(5)>
  ...
  vect_var_.1436_2839 = vect_var_.1424_2829 + vect_var_.1424_2830;
  vect_var_.1436_2840 = vect_var_.1436_2839 + vect_var_.1424_2831;
  vect_var_.1436_2841 = vect_var_.1436_2840 + vect_var_.1424_2832;
  vect_var_.1436_2842 = vect_var_.1436_2841 + vect_var_.1424_2833;
  vect_var_.1436_2843 = vect_var_.1436_2842 + vect_var_.1424_2834;
  vect_var_.1436_2844 = vect_var_.1436_2843 + vect_var_.1424_2835;
  vect_var_.1436_2845 = vect_var_.1436_2844 + vect_var_.1424_2836;
  vect_var_.1436_2846 = vect_var_.1436_2845 + vect_var_.1424_2837;
  vect_var_.1436_2847 = vect_var_.1436_2846 + vect_var_.1424_2838;
  stmp_var_.1435_2848 = BIT_FIELD_REF <vect_var_.1436_2847, 64, 0>;
  stmp_var_.1435_2849 = BIT_FIELD_REF <vect_var_.1436_2847, 64, 64>;
  ...
  stmp_var_.1267_2425 = BIT_FIELD_REF <vect_var_.1268_2423, 64, 64>;
  stmp_var_.1267_2426 = stmp_var_.1267_2425 + stmp_var_.1267_2424;
  pB0_2322 = pB0_4 + 480;
  *pC0_1 = stmp_var_.1435_2850;
  MEM[(double *)pC0_1 + 8B] = stmp_var_.1393_2744;
  MEM[(double *)pC0_1 + 16B] = stmp_var_.1351_2638;
  MEM[(double *)pC0_1 + 24B] = stmp_var_.1309_2532;
  MEM[(double *)pC0_1 + 32B] = stmp_var_.1267_2426;
  pC0_362 = pC0_1 + 40;
  pA0_363 = &MEM[(void *)pA0_2 + 2400B];
  pB0_364 = pB0_4;
  if (pA0_363 != stM_13)
    goto <bb 3>;
  else
    goto <bb 7>;