[Bug rtl-optimization/50190] New: linkpk bench of polyhedron fails during validation with gcc trunk when it is compiled with -Ofast on amd64.
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50190

Bug #: 50190
Summary: linkpk bench of polyhedron fails during validation with gcc trunk when it is compiled with -Ofast on amd64.
Classification: Unclassified
Product: gcc
Version: 4.7.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: rtl-optimization
AssignedTo: unassig...@gcc.gnu.org
ReportedBy: venkataramanan.kumar@gmail.com

The Polyhedron benchmark linpk fails during validation when compiled with the -Ofast flag.

(Snip)
> Value= 25.114499300 Target= 23.1 Tolerance= 2.00 FAIL
> Value=0.27880142600E-10 Target=0.27858826400E-10 Tolerance=0.100E-09
> Value=0.22204460500E-15 Target=0.22204460500E-15 Tolerance=0.100E-14
> Value= 1.00 Target= 1.00 Tolerance=0.100E-07
> Value= 1.00 Target= 1.00 Tolerance=0.100E-07
linpk FAILED 1 fails and 4 passes
(Snip)

Exact version of GCC: gcc trunk 1780539
System type: amd64
Options given when GCC was configured/built: --disable-bootstrap --without-ppl --without-cloog --enable-languages=c,c++,fortran
Complete command line that triggers the bug: -Ofast

This issue does not occur when -fprotect-parens is specified together with -Ofast. Current trunk enables -fno-protect-parens at -Ofast, so this appears to be a precision issue: GCC may reorder floating-point computations. I am reporting it to bring it to the notice of the community.
[Bug rtl-optimization/50904] New: Induct benchmark of polyhedron slows down when -fno-protect-parens is enabled by -Ofast.
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50904

Bug #: 50904
Summary: Induct benchmark of polyhedron slows down when -fno-protect-parens is enabled by -Ofast.
Classification: Unclassified
Product: gcc
Version: 4.7.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: rtl-optimization
AssignedTo: unassig...@gcc.gnu.org
ReportedBy: venkataramanan.kumar@gmail.com

Configurations:
GCC 4.7 trunk revision: 180364
Machine: AMD64
Command line: gfortran -Ofast induct2.f90

Description:
We observed a slowdown in the induct benchmark at -Ofast after -fprotect-parens was disabled in -Ofast (in gcc trunk rev 173385 on 2011-05-04 for http://gcc.gnu.org/bugzilla/show_bug.cgi?id=48864). When the enabled ISA is avx, we observed a slowdown of ~2%.

While analyzing the slowdown, we found a difference in the code generated for one of induct's hot loop nests between -Ofast (with protect-parens) and -Ofast (without protect-parens), irrespective of ISA (avx, fma4) and tuning. Our observations suggest this is due to an interaction between the reassociation of the expression in GIMPLE and the code hoisting done by PRE and other loop optimizations on the RTL generated for that expression.

Details:
The following snippet shows the hot loop in subroutine "mutual_ind_quad_cir_coil".

(-Snip-)
do i = 1, 2*m
    theta = pi*real(i,longreal)/real(m,longreal)
    c_vector(1) = r_coil * cos(theta)
    c_vector(2) = r_coil * sin(theta)
!
!   compute current vector for the coil in the global coordinate system
!
    coil_tmp_vector(1) = -sin(theta)
    coil_tmp_vector(2) = cos(theta)
    coil_tmp_vector(3) = 0.0_longreal
    coil_current_vec(1) = dot_product(rotate_coil(1,:),coil_tmp_vector(:))
    coil_current_vec(2) = dot_product(rotate_coil(2,:),coil_tmp_vector(:))
    coil_current_vec(3) = dot_product(rotate_coil(3,:),coil_tmp_vector(:))
!
    do j = 1, 9
        c_vector(3) = 0.5 * h_coil * z1gauss(j)
!
!       rotate coil vector into the global coordinate system and translate it
!
        rot_c_vector(1) = dot_product(rotate_coil(1,:),c_vector(:)) + dx
        rot_c_vector(2) = dot_product(rotate_coil(2,:),c_vector(:)) + dy
        rot_c_vector(3) = dot_product(rotate_coil(3,:),c_vector(:)) + dz
!
        do k = 1, 9
            q_vector(1) = 0.5_longreal * a * (x2gauss(k) + 1.0_longreal)
            q_vector(2) = 0.5_longreal * b1 * (y2gauss(k) - 1.0_longreal)
            q_vector(3) = 0.0_longreal
!
!           rotate quad vector into the global coordinate system
!
            rot_q_vector(1) = dot_product(rotate_quad(1,:),q_vector(:))
            rot_q_vector(2) = dot_product(rotate_quad(2,:),q_vector(:))
            rot_q_vector(3) = dot_product(rotate_quad(3,:),q_vector(:))
!
!           compute and add in quadrature term
!
            numerator = w1gauss(j) * w2gauss(k) * &
                        dot_product(coil_current_vec,current_vector)
            denominator = sqrt(dot_product(rot_c_vector-rot_q_vector, &
                                           rot_c_vector-rot_q_vector))
            l12_lower = l12_lower + numerator/denominator
        end do
    end do
end do
(-Snip-)

At -Ofast, the k loop is unrolled and vectorized.

When -fprotect-parens is enabled at -Ofast, "q_vector(2) = 0.5_longreal * b1 * (y2gauss(k) - 1.0_longreal)" and part of the expression "rot_q_vector(1) = dot_product(rotate_quad(1,:),q_vector(:))" are hoisted out of the j loop. When -fprotect-parens is disabled, these expressions are not hoisted out of the loop.

Observations:

1) In gimple, when -fprotect-parens is disabled, the expression (y2gauss(k) - 1.0_longreal) is reassociated as shown below.

induct2.f90.080t.dse1
(-Snip-)
D.8701_385 = y2gauss[D.8696_378];
D.8702_386 = D.8701_385 - 1.0e+0;
D.8703_387 = b1_148 * D.8702_386;
D.8704_388 = D.8703_387 * 5.0e-1;
(-Snip-)

induct2.f90.081.reassoc1
(-Snip-)
D.8701_385 = y2gauss[D.8696_378];
D.8702_386 = D.8701_385 + -1.0e+0;
D.8703_387 = b1_148 * 5.0e-1;
D.8704_388 = D.8703_387 * D.8702_386;
(-Snip-)

However, when -fprotect-parens is enabled:

induct2.f90.081.reassoc1
(-Snip-)
D.8814_395 = y2gauss[D.8808_387];
D.8815_396 = D.8814_395 - 1.0e+0;
D.8816_397 = ((D.8815_396));
D.8817_398 = b1_154 * 5.0e-1;
D.8818_399 = D.8817_398 * D.8816_397
(-Snip-)

2) Due to the reassociation that happens when -fprotect-parens is disabled, the RTL generated for the expression "0.5_longreal * b1 * (y2gauss(k) - 1.0_longreal)" also changes. For example first 2 elements in y2gauss array, the RTL
[Bug rtl-optimization/50904] Induct benchmark of polyhedron slows down when -fno-protect-parens is enabled by -Ofast.
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50904

--- Comment #7 from Venkataramanan Kumar 2011-11-02 05:50:44 UTC ---
(In reply to comment #6)
> > I don't see why RTL invariant motion should move the one variant but not
> > the other. Of course this also shows that we should, after loop unrolling
> > on the tree level, also perform loop invariant motion again ...
> The problem seems to be in RTL PRE, which hoists simple loads but not loads
> that are wrapped up in a PLUS or a MINUS. Even with -fprotect-parens, load
> hoisting opportunities are lost because of this.

You mean to say PRE hoists the load when the expression is of this pattern:

(Snip)
(insn 536 533 537 14 (set (reg:V2DF 1172 [ MEM[(real(kind=8)[9] *)&y2gauss] ])
        (mem/c:V2DF (symbol_ref:DI ("y2gauss.2335") [flags 0x2] ) [8 MEM[(real(kind=8)[9] *)&y2gauss]+0 S16 A256])) induct2.f90:1662 1102 {*movv2df_internal}
(Snip)

But not when it is of this pattern:

(Snip)
(insn 526 525 527 14 (set (reg:V2DF 1123)
        (plus:V2DF (reg:V2DF 1124)
            (mem/c:V2DF (symbol_ref:DI ("y2gauss.2335") [flags 0x2] ) [8 MEM[(real(kind=8)[9] *)&y2gauss]+0 S16 A256]))) ../induct2.f90:1662 1130 {*addv2df3}
(Snip)
[Bug rtl-optimization/50904] [4.7 regression] pessimization when -fno-protect-parens is enabled by -Ofast
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50904

--- Comment #13 from Venkataramanan Kumar 2011-11-09 10:22:39 UTC ---
(In reply to comment #12)
> Created attachment 25764 [details]
> Tentative fix (3)
> Final version. Can someone try it on his favorite Fortran benchmark?

Ok I will check and let you know the results.
[Bug rtl-optimization/50904] [4.7 regression] pessimization when -fno-protect-parens is enabled by -Ofast
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50904

--- Comment #14 from Venkataramanan Kumar 2011-11-11 22:58:01 UTC ---
I ran the polyhedron benchmarks with -march=bdver1 and -Ofast. Induct run time was brought down to 53.45 sec from 70.93 sec. Other benchmarks are not affected much.

I am planning to test on an older machine.
[Bug tree-optimization/50904] [4.7 regression] pessimization when -fno-protect-parens is enabled by -Ofast
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50904

--- Comment #16 from Venkataramanan Kumar 2011-11-19 04:25:27 UTC ---
(In reply to comment #13)
> (In reply to comment #12)
> > Created attachment 25764 [details]
> > Tentative fix (3)
> > Final version. Can someone try it on his favorite Fortran benchmark?
> Ok I will check and let you know the results.

Hi Eric,

I tested on a machine with -march=amdfam10. With your patch, induct run time improves by ~5%. Other benchmarks are not affected much.

With your patch, the CPU2006 benchmark "416.gamess" fails during validation. Minimal flags: -O3 -fno-tree-pre -mfma4. I am trying to reduce the test case.

Please let me know if you have any other suggestions.
[Bug gcov-profile/51361] New: [4.7 Regression] 471.omnetpp of SPEC2006 fails to build with -fprofile-generate
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=51361

Bug #: 51361
Summary: [4.7 Regression] 471.omnetpp of SPEC2006 fails to build with -fprofile-generate
Classification: Unclassified
Product: gcc
Version: 4.7.0
Status: UNCONFIRMED
Severity: major
Priority: P3
Component: gcov-profile
AssignedTo: unassig...@gcc.gnu.org
ReportedBy: venkataramanan.kumar@gmail.com

When we build the SPEC2006 benchmark omnetpp with -fprofile-generate, it fails with a lot of undefined references to instrumented symbols.

(Snip)
1322659317.41: EtherAppSrv.o:(.data+0x2a0): undefined reference to `__gcov___ZNK7cModule13numInitStagesEv'
1322659317.41: EtherAppSrv.o:(.data+0x2a4): undefined reference to `__gcov___ZN7cModule10initializeEi'
1322659317.41: EtherApp_m.o:(.data+0xdcc): undefined reference to `__gcov___ZNK12EtherAppResp3dupEv'
1322659317.41: EtherApp_m.o:(.data+0xdd0): undefined reference to `__gcov___ZNK11EtherAppReq3dupEv'
1322659317.41: EtherBus.o:(.data+0x260): undefined reference to `__gcov___ZNK7cModule13numInitStagesEv'
1322659317.41: EtherBus.o:(.data+0x264): undefined reference to `__gcov___ZN7cModule10initializeEi'
1322659317.41: EtherBus.o:(.data+0x26c): undefined reference to `__gcov___ZNK8cMessage3dupEv'
(Snip ends)

I am trying to make a reduced test case.

The build error started in revision 181105.
It is also reproduced in revision 181838.
[Bug tree-optimization/50904] [4.7 regression] pessimization when -fno-protect-parens is enabled by -Ofast
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50904

--- Comment #49 from Venkataramanan Kumar 2011-12-06 09:59:39 UTC ---
I am planning to test the patch on the polyhedron benchmarks. Then I will test it on SPEC CPU2006.
[Bug tree-optimization/50904] [4.7 regression] pessimization when -fno-protect-parens is enabled by -Ofast
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50904

--- Comment #50 from Venkataramanan Kumar 2011-12-07 13:18:57 UTC ---
On the machine I used, induct run time improves from 68.9 seconds to 55.94 seconds for -Ofast.

I will update on the other benchmarks and SPEC2006 once I complete testing.
[Bug gcov-profile/51361] [4.7 Regression] 471.omnetpp of SPEC2006 fails to build with -fprofile-generate
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=51361

--- Comment #5 from Venkataramanan Kumar 2011-12-11 16:58:21 UTC ---
The defect does not occur now. I checked in revision 182206.