[Bug rtl-optimization/50190] New: linpk bench of polyhedron fails during validation with gcc trunk when it is compiled with -Ofast on amd64.

2011-08-25 Thread venkataramanan.kumar.gnu at gmail dot com
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50190

 Bug #: 50190
   Summary: linpk bench of polyhedron fails during validation
with gcc trunk when it is compiled with -Ofast on
amd64.
Classification: Unclassified
   Product: gcc
   Version: 4.7.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: rtl-optimization
AssignedTo: unassig...@gcc.gnu.org
ReportedBy: venkataramanan.kumar@gmail.com


Polyhedron benchmark linpk fails during validation when compiled with -Ofast
flag.

(Snip)
> Value= 25.114499300 Target= 23.1 Tolerance= 2.00
FAIL 
> Value=0.27880142600E-10 Target=0.27858826400E-10 Tolerance=0.100E-09
> Value=0.22204460500E-15 Target=0.22204460500E-15 Tolerance=0.100E-14
> Value= 1.00 Target= 1.00 Tolerance=0.100E-07
> Value= 1.00 Target= 1.00 Tolerance=0.100E-07
linpk FAILED 1 fails and 4 passes
(Snip)

exact version of GCC: gcc trunk 1780539
system type: amd64
options given when GCC was configured/built: --disable-bootstrap --without-ppl
--without-cloog --enable-languages=c,c++,fortran 
complete command line that triggers the bug: -Ofast


This issue does not occur when -fprotect-parens is specified together with
-Ofast. Current trunk enables -fno-protect-parens under -Ofast, so this looks
like a precision issue: GCC may reorder floating-point computations. I am
reporting it anyway to bring it to the notice of the community.
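
As a rough illustration of why reordering floating-point computations can move a result past a validation tolerance, here is a minimal Python sketch. It is not taken from linpk; the operand values are hypothetical and chosen only to make the effect visible with IEEE-754 doubles.

```python
# Illustrative only: IEEE-754 double addition is not associative, so
# regrouping the same operands can change the rounded result.
# The values below are hypothetical, not taken from the benchmark.
a, b, c = 1.0e17, -1.0e17, 1.0

left_to_right = (a + b) + c   # (0.0) + 1.0
regrouped = a + (b + c)       # b + c rounds back to -1.0e17, so the 1.0 is lost

print("as written :", left_to_right)
print("regrouped  :", regrouped)
```

Both expressions are the same sum in exact arithmetic, which is why a compiler that is allowed to ignore parentheses may legally pick either grouping.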


[Bug rtl-optimization/50904] New: Induct benchmark of polyhedron slows down when -fno-protect-parens is enabled by -Ofast.

2011-10-28 Thread venkataramanan.kumar.gnu at gmail dot com
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50904

 Bug #: 50904
   Summary: Induct benchmark of polyhedron slows down when
-fno-protect-parens is enabled by -Ofast.
Classification: Unclassified
   Product: gcc
   Version: 4.7.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: rtl-optimization
AssignedTo: unassig...@gcc.gnu.org
ReportedBy: venkataramanan.kumar@gmail.com


Configurations:
GCC 4.7 trunk revision: 180364
Machine: AMD64 

Commandline:
gfortran -Ofast induct2.f90

Description:
We observed a slowdown in the induct benchmark at -Ofast after -fprotect-parens
was disabled in -Ofast (in gcc trunk rev 173385 on 2011-05-04 for
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=48864).

When the enabled ISA is avx, we observed a slowdown of ~2%.

While analyzing the slowdown, we found a difference in the code generated for
one of induct's hot loop nests between -Ofast (with protect-parens) and -Ofast
(without protect-parens), irrespective of ISA (avx, fma4) and tuning.
Observations suggest this is due to an interaction between the reassociation
of the expression in gimple and the code hoisting done by PRE and other loop
optimizations on the RTL generated for that expression.

Details:

The following snippet shows the hot loop in subroutine
"mutual_ind_quad_cir_coil".

(-Snip-)
  do i = 1, 2*m
    theta = pi*real(i,longreal)/real(m,longreal)
    c_vector(1) = r_coil * cos(theta)
    c_vector(2) = r_coil * sin(theta)
!
!   compute current vector for the coil in the global coordinate system
!
    coil_tmp_vector(1) = -sin(theta)
    coil_tmp_vector(2) = cos(theta)
    coil_tmp_vector(3) = 0.0_longreal
    coil_current_vec(1) = dot_product(rotate_coil(1,:),coil_tmp_vector(:))
    coil_current_vec(2) = dot_product(rotate_coil(2,:),coil_tmp_vector(:))
    coil_current_vec(3) = dot_product(rotate_coil(3,:),coil_tmp_vector(:))
!
    do j = 1, 9
      c_vector(3) = 0.5 * h_coil * z1gauss(j)
!
!   rotate coil vector into the global coordinate system and translate it
!
      rot_c_vector(1) = dot_product(rotate_coil(1,:),c_vector(:)) + dx
      rot_c_vector(2) = dot_product(rotate_coil(2,:),c_vector(:)) + dy
      rot_c_vector(3) = dot_product(rotate_coil(3,:),c_vector(:)) + dz
!
      do k = 1, 9
        q_vector(1) = 0.5_longreal * a * (x2gauss(k) + 1.0_longreal)
        q_vector(2) = 0.5_longreal * b1 * (y2gauss(k) - 1.0_longreal)
        q_vector(3) = 0.0_longreal
!
!   rotate quad vector into the global coordinate system
!
        rot_q_vector(1) = dot_product(rotate_quad(1,:),q_vector(:))
        rot_q_vector(2) = dot_product(rotate_quad(2,:),q_vector(:))
        rot_q_vector(3) = dot_product(rotate_quad(3,:),q_vector(:))
!
!   compute and add in quadrature term
!
        numerator = w1gauss(j) * w2gauss(k) * &
                    dot_product(coil_current_vec,current_vector)
        denominator = sqrt(dot_product(rot_c_vector-rot_q_vector, &
                                       rot_c_vector-rot_q_vector))
        l12_lower = l12_lower + numerator/denominator
      end do
    end do
  end do
(-Snip-)

At -Ofast, the k loop is unrolled and vectorized.

When -fprotect-parens is enabled at -Ofast, "q_vector(2) = 0.5_longreal * b1 *
(y2gauss(k) - 1.0_longreal)" and part of the expression "rot_q_vector(1) =
dot_product(rotate_quad(1,:),q_vector(:))" are hoisted out of the j loop.

When -fprotect-parens is disabled, however, these expressions are not hoisted
out of the loop.
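
To make the hoisting concrete, here is a small Python sketch of the same loop shape. It is only an illustration of loop-invariant motion in general, not of GCC's PRE or loop passes; the array contents, the values of a, b1 and m, and the helper names are hypothetical stand-ins. The point is that q_vector depends only on k, so it does not need to be recomputed for every j.

```python
# Hypothetical stand-ins for the benchmark's data; only the loop shape matters.
x2gauss = [0.1 * k for k in range(9)]
y2gauss = [0.2 * k for k in range(9)]
a, b1, m = 2.0, 3.0, 4


def q_vector(k):
    # Depends only on k, a and b1, so it is invariant with respect to j.
    return (0.5 * a * (x2gauss[k] + 1.0),
            0.5 * b1 * (y2gauss[k] - 1.0),
            0.0)


def not_hoisted():
    evaluations, total = 0, 0.0
    for i in range(2 * m):
        for j in range(9):
            for k in range(9):
                q = q_vector(k)      # recomputed for every (i, j, k)
                evaluations += 1
                total += q[0] + q[1]
    return evaluations, total


def hoisted_out_of_j():
    evaluations, total = 0, 0.0
    for i in range(2 * m):
        # Invariant work moved above the j loop: 9 evaluations per i.
        q_table = [q_vector(k) for k in range(9)]
        evaluations += len(q_table)
        for j in range(9):
            for k in range(9):
                q = q_table[k]
                total += q[0] + q[1]
    return evaluations, total


print(not_hoisted()[0], hoisted_out_of_j()[0])
```

Both versions accumulate the same values in the same order; only the number of times the invariant expression is evaluated differs.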

Observations:

1) In gimple, when -fprotect-parens is disabled, the expression (y2gauss(k) -
1.0_longreal) is reassociated as shown below.

   induct2.f90.080t.dse1
   (-Snip-)
   D.8701_385 = y2gauss[D.8696_378];
   D.8702_386 = D.8701_385 - 1.0e+0;
   D.8703_387 = b1_148 * D.8702_386;
   D.8704_388 = D.8703_387 * 5.0e-1;
   (-Snip-)

   induct2.f90.081.reassoc1
   (-Snip-)
   D.8701_385 = y2gauss[D.8696_378];
   D.8702_386 = D.8701_385 + -1.0e+0;
   D.8703_387 = b1_148 * 5.0e-1;
   D.8704_388 = D.8703_387 * D.8702_386;
  (-Snip-)

However, when -fprotect-parens is enabled:

  induct2.f90.081.reassoc1
  (-Snip-)
  D.8814_395 = y2gauss[D.8808_387];
  D.8815_396 = D.8814_395 - 1.0e+0;
  D.8816_397 = ((D.8815_396));
  D.8817_398 = b1_154 * 5.0e-1;
  D.8818_399 = D.8817_398 * D.8816_397
  (-Snip-)

2) Due to the reassociation that happens when -fprotect-parens is disabled, the
RTL generated for the expression "0.5_longreal * b1 * (y2gauss(k) -
1.0_longreal)" also changes.

For example, for the first 2 elements in the y2gauss array, the RTL

[Bug rtl-optimization/50904] Induct benchmark of polyhedron slows down when -fno-protect-parens is enabled by -Ofast.

2011-11-01 Thread venkataramanan.kumar.gnu at gmail dot com
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50904

--- Comment #7 from Venkataramanan Kumar  2011-11-02 05:50:44 UTC ---
(In reply to comment #6)
> > I don't see why RTL invariant motion should move the one variant but not
> > the other.  Of course this also shows that we should, after loop unrolling
> > on the tree level, also perform loop invariant motion again ...
> The problem seems to be in RTL PRE, which hoists simple loads but not loads
> that are wrapped up in a PLUS or a MINUS.  Even with -fprotect-parens, load
> hoisting opportunities are lost because of this.

Do you mean that PRE hoists the load when the expression is of this pattern?

(Snip)
(insn 536 533 537 14 (set (reg:V2DF 1172 [ MEM[(real(kind=8)[9] *)&y2gauss] ])
(mem/c:V2DF (symbol_ref:DI ("y2gauss.2335") [flags 0x2]  ) [8 MEM[(real(kind=8)[9] *)&y2gauss]+0 S16 A256]))
induct2.f90:1662 1102 {*movv2df_internal}
(Snip)


But not of this pattern.

(Snip)
(insn 526 525 527 14 (set (reg:V2DF 1123)
(plus:V2DF (reg:V2DF 1124)
(mem/c:V2DF (symbol_ref:DI ("y2gauss.2335") [flags 0x2]  ) [8 MEM[(real(kind=8)[9] *)&y2gauss]+0 S16 A256])))
../induct2.f90:1662 1130 {*addv2df3}
(Snip)


[Bug rtl-optimization/50904] [4.7 regression] pessimization when -fno-protect-parens is enabled by -Ofast

2011-11-09 Thread venkataramanan.kumar.gnu at gmail dot com
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50904

--- Comment #13 from Venkataramanan Kumar  2011-11-09 10:22:39 UTC ---
(In reply to comment #12)
> Created attachment 25764 [details]
> Tentative fix (3)
> Final version.  Can someone try it on his favorite Fortran benchmark?

Ok I will check and let you know the results.


[Bug rtl-optimization/50904] [4.7 regression] pessimization when -fno-protect-parens is enabled by -Ofast

2011-11-11 Thread venkataramanan.kumar.gnu at gmail dot com
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50904

--- Comment #14 from Venkataramanan Kumar  2011-11-11 22:58:01 UTC ---
I ran polyhedron benchmarks with -march=bdver1 and -Ofast. Induct run time was
brought down to 53.45 sec from 70.93 sec. Other benchmarks are not affected
much.

I am planning to test on an older machine.
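
For reference, the relative improvement implied by the two run times above can be worked out as follows. This is just arithmetic on the reported numbers; the helper name is made up for the example.

```python
def improvement_percent(before_s, after_s):
    """Relative run-time reduction, as a percentage of the original time."""
    return 100.0 * (before_s - after_s) / before_s


# Induct run times reported above for -march=bdver1 -Ofast (seconds).
induct_gain = improvement_percent(70.93, 53.45)
print(f"induct run time reduced by about {induct_gain:.1f}%")
```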


[Bug tree-optimization/50904] [4.7 regression] pessimization when -fno-protect-parens is enabled by -Ofast

2011-11-18 Thread venkataramanan.kumar.gnu at gmail dot com
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50904

--- Comment #16 from Venkataramanan Kumar  2011-11-19 04:25:27 UTC ---
(In reply to comment #13)
> (In reply to comment #12)
> > Created attachment 25764 [details]
> > Tentative fix (3)
> > Final version.  Can someone try it on his favorite Fortran benchmark?
> Ok I will check and let you know the results.

Hi Eric, 

I tested on a machine with -march=amdfam10; with your patch, induct run time
improves by ~5%. Other benchmarks are not affected much.

With your patch, the CPU2006 benchmark "416.gamess" fails during validation.
Minimal flags: -O3 -fno-tree-pre -mfma4. I am trying to reduce the test case.
Please let me know if you have any other suggestions.


[Bug gcov-profile/51361] New: [4.7 Regression] 471.omnetpp of SPEC2006 fails to build with -fprofile-generate

2011-11-30 Thread venkataramanan.kumar.gnu at gmail dot com
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=51361

 Bug #: 51361
   Summary: [4.7 Regression] 471.omnetpp of SPEC2006 fails to
build with -fprofile-generate
Classification: Unclassified
   Product: gcc
   Version: 4.7.0
Status: UNCONFIRMED
  Severity: major
  Priority: P3
 Component: gcov-profile
AssignedTo: unassig...@gcc.gnu.org
ReportedBy: venkataramanan.kumar@gmail.com


When we build the SPEC2006 benchmark omnetpp with -fprofile-generate, it fails
with a lot of undefined references to instrumented symbols.

(Snip)
1322659317.41: EtherAppSrv.o:(.data+0x2a0): undefined reference to
`__gcov___ZNK7cModule13numInitStagesEv'
1322659317.41: EtherAppSrv.o:(.data+0x2a4): undefined reference to
`__gcov___ZN7cModule10initializeEi'
1322659317.41: EtherApp_m.o:(.data+0xdcc): undefined reference to
`__gcov___ZNK12EtherAppResp3dupEv'
1322659317.41: EtherApp_m.o:(.data+0xdd0): undefined reference to
`__gcov___ZNK11EtherAppReq3dupEv'
1322659317.41: EtherBus.o:(.data+0x260): undefined reference to
`__gcov___ZNK7cModule13numInitStagesEv'
1322659317.41: EtherBus.o:(.data+0x264): undefined reference to
`__gcov___ZN7cModule10initializeEi'
1322659317.41: EtherBus.o:(.data+0x26c): undefined reference to
`__gcov___ZNK8cMessage3dupEv'
(Snip ends) 

I am trying to make a reduced test case.

The build error started in revision 181105.

Also reproduced in revision 181838.


[Bug tree-optimization/50904] [4.7 regression] pessimization when -fno-protect-parens is enabled by -Ofast

2011-12-06 Thread venkataramanan.kumar.gnu at gmail dot com
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50904

--- Comment #49 from Venkataramanan Kumar  2011-12-06 09:59:39 UTC ---
I am planning to test the patch on the polyhedron benchmarks.
Then I will test it on SPEC CPU2006.


[Bug tree-optimization/50904] [4.7 regression] pessimization when -fno-protect-parens is enabled by -Ofast

2011-12-07 Thread venkataramanan.kumar.gnu at gmail dot com
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50904

--- Comment #50 from Venkataramanan Kumar  2011-12-07 13:18:57 UTC ---
On the machine I used, induct run time improves from 68.9 seconds to 55.94
seconds for -Ofast. I will update on other benchmarks and SPEC2006 once I
complete testing.


[Bug gcov-profile/51361] [4.7 Regression] 471.omnetpp of SPEC2006 fails to build with -fprofile-generate

2011-12-11 Thread venkataramanan.kumar.gnu at gmail dot com
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=51361

--- Comment #5 from Venkataramanan Kumar  2011-12-11 16:58:21 UTC ---
The defect does not occur now. I checked in revision 182206.