http://gcc.gnu.org/bugzilla/show_bug.cgi?id=27855

Richard Guenther <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
      Known to work|                            |4.6.2
            Summary|[4.4/4.5/4.6/4.7            |[4.4/4.5/4.7 regression]
                   |regression] reassociation   |reassociation causes the RA
                   |causes the RA to be         |to be confused
                   |confused                    |

--- Comment #37 from Richard Guenther <rguenth at gcc dot gnu.org> 2012-01-12 12:00:26 UTC ---
Hmm, on the 4.6 branch this seems fixed. On trunk it is no longer
tree-reassoc that makes us slow; instead, with tree-reassoc things are
faster (higher numbers below are better):

4.5 -O3 -ffast-math                     1583.48
4.5 -O3 -ffast-math -fno-tree-reassoc   2307.55
4.6 -O3 -ffast-math                     2287.99
4.6 -O3 -ffast-math -fno-tree-reassoc   2268.77
4.7 -O3 -ffast-math                     1529.65
4.7 -O3 -ffast-math -fno-tree-reassoc   1144.00
4.7 -O3 -ffast-math -fno-tree-reassoc -fno-tree-vectorize  1992.50

Huh.

On the 4.6 branch we somehow avoided having tree reassoc schedule
everything down, but on the 4.7 branch we are back to doing so.
It would be interesting to bisect what fixed this for 4.6 and what broke
it again for 4.7, even without reassociation.

With -fschedule-insns -fsched-pressure we get back to 2212.98 for 4.7.
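
For reference, the loop at issue has the shape of a blocked multiply-accumulate
kernel with several independent accumulators. The sketch below is hypothetical
(names, indexing and the column count are made up; it is not the PR's actual
ATLAS testcase), but it shows the pattern: if reassociation sinks all the
accumulator updates to the bottom of the loop, the partial products stay live
across the whole body and register pressure explodes.

void
mm_kernel (const double *A, const double *B, double *C, int K)
{
  /* Five accumulators, loaded from and stored back to C, like the
     rC0_0 ... rC4_0 loads and stores in the dump below.  */
  double c0 = C[0], c1 = C[1], c2 = C[2], c3 = C[3], c4 = C[4];
  for (int k = 0; k < K; k++)
    {
      double b = B[k];
      c0 += A[5 * k + 0] * b;
      c1 += A[5 * k + 1] * b;
      c2 += A[5 * k + 2] * b;
      c3 += A[5 * k + 3] * b;
      c4 += A[5 * k + 4] * b;
    }
  C[0] = c0; C[1] = c1; C[2] = c2; C[3] = c3; C[4] = c4;
}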

With -fno-tree-reassoc we now vectorize the loop(!):

<bb 4>:
  # pC0_1 = PHI <pC0_361(3), C_25(D)(2)>
  # pA0_2 = PHI <pA0_34(3), A_12(D)(2)>
  # pB0_4 = PHI <pB0_369(3), B_14(D)(2)>
  rC0_0_29 = *pC0_1;
  rC1_0_30 = MEM[(double *)pC0_1 + 8B];
  rC2_0_31 = MEM[(double *)pC0_1 + 16B];
  rC3_0_32 = MEM[(double *)pC0_1 + 24B];
  rC4_0_33 = MEM[(double *)pC0_1 + 32B];
  vect_var_.1256_2404 = {rC4_0_33, 0.0};
  vect_var_.1298_2510 = {rC3_0_32, 0.0};
  vect_var_.1340_2616 = {rC2_0_31, 0.0};
  vect_var_.1382_2722 = {rC1_0_30, 0.0};
  vect_var_.1424_2828 = {rC0_0_29, 0.0};
  ivtmp.1443_2312 = (long unsigned int) pB0_4;
  ivtmp.1451_379 = (long unsigned int) pA0_2;
  D.4447_128 = ivtmp.1443_2312 + 480;

<bb 5>:
  # vect_var_.1256_2375 = PHI <vect_var_.1256_2385(5), vect_var_.1256_2404(4)>
  # vect_var_.1256_2376 = PHI <vect_var_.1256_2386(5), {0.0, 0.0}(4)>
  # vect_var_.1256_2377 = PHI <vect_var_.1256_2387(5), {0.0, 0.0}(4)>
  # vect_var_.1256_2378 = PHI <vect_var_.1256_2388(5), {0.0, 0.0}(4)>
  # vect_var_.1256_2379 = PHI <vect_var_.1256_2389(5), {0.0, 0.0}(4)>
  # vect_var_.1256_2380 = PHI <vect_var_.1256_2390(5), {0.0, 0.0}(4)>
  # vect_var_.1256_2381 = PHI <vect_var_.1256_2391(5), {0.0, 0.0}(4)>
  # vect_var_.1256_2382 = PHI <vect_var_.1256_2392(5), {0.0, 0.0}(4)>
  # vect_var_.1256_2383 = PHI <vect_var_.1256_2393(5), {0.0, 0.0}(4)>
  # vect_var_.1256_2384 = PHI <vect_var_.1256_2394(5), {0.0, 0.0}(4)>
  # vect_var_.1298_2481 = PHI <vect_var_.1298_2491(5), vect_var_.1298_2510(4)>
  # vect_var_.1298_2482 = PHI <vect_var_.1298_2492(5), {0.0, 0.0}(4)>
  # vect_var_.1298_2483 = PHI <vect_var_.1298_2493(5), {0.0, 0.0}(4)>
  # vect_var_.1298_2484 = PHI <vect_var_.1298_2494(5), {0.0, 0.0}(4)>
  # vect_var_.1298_2485 = PHI <vect_var_.1298_2495(5), {0.0, 0.0}(4)>
  # vect_var_.1298_2486 = PHI <vect_var_.1298_2496(5), {0.0, 0.0}(4)>
  # vect_var_.1298_2487 = PHI <vect_var_.1298_2497(5), {0.0, 0.0}(4)>
  # vect_var_.1298_2488 = PHI <vect_var_.1298_2498(5), {0.0, 0.0}(4)>
  # vect_var_.1298_2489 = PHI <vect_var_.1298_2499(5), {0.0, 0.0}(4)>
  # vect_var_.1298_2490 = PHI <vect_var_.1298_2500(5), {0.0, 0.0}(4)>
  # vect_var_.1340_2587 = PHI <vect_var_.1340_2597(5), vect_var_.1340_2616(4)>
  # vect_var_.1340_2588 = PHI <vect_var_.1340_2598(5), {0.0, 0.0}(4)>
...
  D.4346_400 = (void *) ivtmp.1451_2315;
  vect_var_.1399_2748 = MEM[base: D.4346_400, offset: 0B];
  vect_var_.1400_2750 = MEM[base: D.4346_400, offset: 16B];
  vect_var_.1401_2752 = MEM[base: D.4346_400, offset: 32B];
  vect_var_.1402_2754 = MEM[base: D.4346_400, offset: 48B];
  vect_var_.1403_2756 = MEM[base: D.4346_400, offset: 64B];
...
  vect_var_.1423_2789 = vect_var_.1399_2748 * vect_var_.1413_2770;
  vect_var_.1423_2790 = vect_var_.1400_2750 * vect_var_.1414_2772;
  vect_var_.1423_2791 = vect_var_.1401_2752 * vect_var_.1415_2774;
...
  ivtmp.1443_2318 = ivtmp.1443_2316 + 160;
  ivtmp.1451_2317 = ivtmp.1451_2315 + 160;
  if (ivtmp.1443_2318 != D.4447_128)
    goto <bb 5>;
  else
    goto <bb 6>;

<bb 6>:
  # vect_var_.1256_2405 = PHI <vect_var_.1256_2385(5)>
  # vect_var_.1256_2406 = PHI <vect_var_.1256_2386(5)>
...
  vect_var_.1436_2839 = vect_var_.1424_2829 + vect_var_.1424_2830;
  vect_var_.1436_2840 = vect_var_.1436_2839 + vect_var_.1424_2831;
  vect_var_.1436_2841 = vect_var_.1436_2840 + vect_var_.1424_2832;
  vect_var_.1436_2842 = vect_var_.1436_2841 + vect_var_.1424_2833;
  vect_var_.1436_2843 = vect_var_.1436_2842 + vect_var_.1424_2834;
  vect_var_.1436_2844 = vect_var_.1436_2843 + vect_var_.1424_2835;
  vect_var_.1436_2845 = vect_var_.1436_2844 + vect_var_.1424_2836;
  vect_var_.1436_2846 = vect_var_.1436_2845 + vect_var_.1424_2837;
  vect_var_.1436_2847 = vect_var_.1436_2846 + vect_var_.1424_2838;
  stmp_var_.1435_2848 = BIT_FIELD_REF <vect_var_.1436_2847, 64, 0>;
  stmp_var_.1435_2849 = BIT_FIELD_REF <vect_var_.1436_2847, 64, 64>;
...
  stmp_var_.1267_2425 = BIT_FIELD_REF <vect_var_.1268_2423, 64, 64>;
  stmp_var_.1267_2426 = stmp_var_.1267_2425 + stmp_var_.1267_2424;
  pB0_2322 = pB0_4 + 480;
  *pC0_1 = stmp_var_.1435_2850;
  MEM[(double *)pC0_1 + 8B] = stmp_var_.1393_2744;
  MEM[(double *)pC0_1 + 16B] = stmp_var_.1351_2638;
  MEM[(double *)pC0_1 + 24B] = stmp_var_.1309_2532;
  MEM[(double *)pC0_1 + 32B] = stmp_var_.1267_2426;
  pC0_362 = pC0_1 + 40;
  pA0_363 = &MEM[(void *)pA0_2 + 2400B];
  pB0_364 = pB0_4;
  if (pA0_363 != stM_13)
    goto <bb 3>;
  else
    goto <bb 7>;
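
For comparison, here is a rough hand-written sketch of what the vectorized
loop above computes (SSE2 intrinsics; hypothetical names, and only a single
accumulator where the dump keeps many V2DF partial sums per rC value): a
vector partial sum is carried through the loop and only reduced horizontally
afterwards, which corresponds to the {rC4_0_33, 0.0}-style start values and
the BIT_FIELD_REF extracts in bb 6.

#include <emmintrin.h>

/* Hypothetical sketch, not the vectorizer's exact output.  */
double
dot2 (const double *a, const double *b, int n)  /* n assumed to be even */
{
  __m128d acc = _mm_setzero_pd ();
  for (int i = 0; i < n; i += 2)
    {
      __m128d va = _mm_loadu_pd (a + i);
      __m128d vb = _mm_loadu_pd (b + i);
      /* Two multiply-accumulates per step, as in the vect_var_.14xx
         multiplies of bb 5.  */
      acc = _mm_add_pd (acc, _mm_mul_pd (va, vb));
    }
  /* Horizontal reduction after the loop, like the stmp_var_ /
     BIT_FIELD_REF sums in bb 6.  */
  __m128d hi = _mm_unpackhi_pd (acc, acc);
  return _mm_cvtsd_f64 (_mm_add_sd (acc, hi));
}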
