[Bug middle-end/45017] New: miscompile with bitfield and optimization

2010-07-21 Thread borntraeger at de dot ibm dot com
The following testcase has a different result with 4.6 and -O1/2/3

gcc4.5: all optimizations:
r1: 15 r2: 2

gcc4.6: -O0
r1: 15 r2: 2

gcc4.6: -O1,-O2..
r1: 47 r2: 2

#include <stdio.h>

int tester(char *bytes)
{
    union {
        struct {
            unsigned int r1:4;
            unsigned int r2:4;
        } fmt;
        char value[1];
    } ovl;

    ovl.value[0] = bytes[0];
    printf("r1: %d r2: %d\n", ovl.fmt.r1, ovl.fmt.r2);
    return 0;
}

int main()
{
    char buff = 0x2f;
    return tester(&buff);
}


-- 
           Summary: miscompile with bitfield and optimization
           Product: gcc
           Version: 4.6.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: middle-end
        AssignedTo: unassigned at gcc dot gnu dot org
        ReportedBy: borntraeger at de dot ibm dot com
 GCC build triplet: i686-pc-linux-gnu and s390x-ibm-linux-gnu
  GCC host triplet: i686-pc-linux-gnu and s390x-ibm-linux-gnu
GCC target triplet: i686-pc-linux-gnu and s390x-ibm-linux-gnu


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=45017



[Bug middle-end/45017] miscompile with bitfield and optimization

2010-07-22 Thread borntraeger at de dot ibm dot com


--- Comment #4 from borntraeger at de dot ibm dot com  2010-07-22 12:41 
---
The testcase will fail on big-endian machines,
since r2=f and r1=2 instead of r2=2 and r1=f.
Can you adapt the testcase to check the endianness?


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=45017



[Bug middle-end/45017] miscompile with bitfield and optimization

2010-07-25 Thread borntraeger at de dot ibm dot com


--- Comment #11 from borntraeger at de dot ibm dot com  2010-07-25 16:54 
---
Something like the following should do the trick.
Is endian.h available on all supported platforms?


*** gcc.c-torture/execute/pr45017.c.orig
--- gcc.c-torture/execute/pr45017.c
***************
*** 1,9 ****
--- 1,17 ----
+ #include <endian.h>
+ 
  int tester(char *bytes)
  {
union {
struct {
+ #  if __BYTE_ORDER == __BIG_ENDIAN
+ unsigned int r2:4;
+ unsigned int r1:4;
+ #  endif
+ #  if __BYTE_ORDER == __LITTLE_ENDIAN
  unsigned int r1:4;
  unsigned int r2:4;
+ #  endif
} fmt;
char value[1];
} ovl;
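
A layout-independent variant might sidestep the endian.h question altogether.
The following is only a sketch, not the committed testcase: it builds the input
byte through the bitfields themselves, so the expected values hold whichever
nibble the compiler assigns to r1. The names `union ovl`, `make_byte` and
`tester_ok` are made up for illustration, and the sketch assumes that both
fields land in the first byte of the union, as they appear to on the i686 and
s390x targets discussed here.

```c
#include <string.h>

/* Same overlay as in the testcase; the type name is illustrative. */
union ovl {
    struct {
        unsigned int r1:4;
        unsigned int r2:4;
    } fmt;
    char value[1];
};

/* Build the input byte through the bitfields, so that no endian check
   is needed.  Assumes both fields live in value[0]. */
char make_byte(unsigned int r1, unsigned int r2)
{
    union ovl o;
    memset(&o, 0, sizeof o);
    o.fmt.r1 = r1;
    o.fmt.r2 = r2;
    return o.value[0];
}

/* Mirror of tester(): store through the char member, then read the
   bitfields back and compare them with the expected values. */
int tester_ok(const char *bytes, unsigned int r1, unsigned int r2)
{
    union ovl o;
    o.value[0] = bytes[0];
    return o.fmt.r1 == r1 && o.fmt.r2 == r2;
}
```

A caller would obtain a byte from make_byte (15, 2), pass its address to
tester_ok together with the same two values, and abort if the result is zero.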


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=45017



[Bug middle-end/45017] miscompile with bitfield and optimization

2010-07-26 Thread borntraeger at de dot ibm dot com


--- Comment #14 from borntraeger at de dot ibm dot com  2010-07-26 10:13 
---
I have seen the original problem only with bitfields.


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=45017



[Bug middle-end/44078] [4.6 regression] FAIL: gcc.dg/tree-ssa/prefetch-7.c

2010-05-11 Thread borntraeger at de dot ibm dot com


--- Comment #1 from borntraeger at de dot ibm dot com  2010-05-11 13:43 
---
At first glance, it looks as if the test case scans for
"nontemporal store", which is also emitted by the new debug messages:

-return false;
+{
+  if (dump_file && (dump_flags & TDF_DETAILS))
+fprintf (dump_file, "Ignoring nontemporal store %p\n", (void *) ref);
+  return false;
+}

I will check if that is the problem. If yes I will probably fix the testcase.

Christian


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=44078



[Bug middle-end/44078] [4.6 regression] FAIL: gcc.dg/tree-ssa/prefetch-7.c

2010-05-11 Thread borntraeger at de dot ibm dot com


--- Comment #2 from borntraeger at de dot ibm dot com  2010-05-11 13:57 
---
Created an attachment (id=20629)
 --> (http://gcc.gnu.org/bugzilla/attachment.cgi?id=20629&action=view)
Testfix for the prefetch-7.c testcase

There always was
fprintf (dump_file, "Marked reference %p as a nontemporal store.\n",
in the old code.

r159256 now adds
fprintf (dump_file, "Ignoring nontemporal store %p\n", (void *) ref);

The testcase matches both messages, but it should only match the first one.
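
To illustrate the double match, here is a small sketch that uses a plain
substring search in place of the testsuite's dump scan. The pointer value
0x1234 and the helper name `count_matches` are made up, and the narrower
pattern "a nontemporal store" is only an assumption about how the scan could
be tightened; the attachment may do it differently.

```c
#include <string.h>

/* The two dump messages quoted above, with %p already expanded to a
   made-up pointer value. */
static const char marked_msg[] =
    "Marked reference 0x1234 as a nontemporal store.\n";
static const char ignoring_msg[] =
    "Ignoring nontemporal store 0x1234\n";

/* Count how many of the two messages contain the given pattern. */
int count_matches(const char *pattern)
{
    int n = 0;

    if (strstr(marked_msg, pattern) != NULL)
        n++;
    if (strstr(ignoring_msg, pattern) != NULL)
        n++;
    return n;
}
```

With "nontemporal store" both messages match; with "a nontemporal store" only
the "Marked reference" message does.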

So a potential fix might look like the attachment. Can you confirm that
this patch fixes the testcase on your system?

Christian


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=44078



[Bug middle-end/44203] [4.6 regression] New prefetch test failures

2010-05-20 Thread borntraeger at de dot ibm dot com


--- Comment #1 from borntraeger at de dot ibm dot com  2010-05-20 08:29 
---

Indeed. I think I found a typo when handling array prefetches.
A potential fix might be:

--- gcc/tree-ssa-loop-prefetch.c (revision 159557)
+++ gcc/tree-ssa-loop-prefetch.c (working copy)
@@ -440,7 +440,7 @@

   *ar_data->step = fold_build2 (MULT_EXPR, sizetype,
fold_convert (sizetype, *ar_data->step),
-   fold_convert (sizetype, step));
+   fold_convert (sizetype, stepsize));
   idelta *= imult;
 }



-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=44203



[Bug middle-end/44203] [4.6 regression] New prefetch test failures

2010-05-20 Thread borntraeger at de dot ibm dot com


--- Comment #2 from borntraeger at de dot ibm dot com  2010-05-20 12:28 
---
Created an attachment (id=20709)
 --> (http://gcc.gnu.org/bugzilla/attachment.cgi?id=20709&action=view)
new version of the fix. 

There is actually a second bug :-(
Not only do we have to replace step with stepsize (a=a+b*b vs. a=a+b*c),
we also have to fix the fact that we changed a=a+b*c to a=(a+b)*c.
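
A tiny worked example of the three shapes, with made-up function names and
plain integers standing in for the tree expressions (a is the accumulated
step, b the new step, c the step size):

```c
/* a = a + b*b: the typo, the step is multiplied by itself. */
long with_typo(long a, long b)
{
    return a + b * b;
}

/* a = a + b*c: the intended update. */
long intended(long a, long b, long c)
{
    return a + b * c;
}

/* a = (a + b)*c: the second bug, the multiplication covers the sum. */
long wrong_grouping(long a, long b, long c)
{
    return (a + b) * c;
}
```

For a=2, b=3, c=4 these give 11, 14 and 20 respectively, so each of the two
bugs changes the computed step.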


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=44203



[Bug middle-end/44297] Big spec cpu2006 prefetch regressions on gcc 4.6 on x86

2010-05-28 Thread borntraeger at de dot ibm dot com


--- Comment #4 from borntraeger at de dot ibm dot com  2010-05-28 07:24 
---
Created an attachment (id=20767)
 --> (http://gcc.gnu.org/bugzilla/attachment.cgi?id=20767&action=view)
Patch that makes loop invariant prefetches backend specific

Three observations:

1. The patch had a bug that led to wrong calculations in some cases.
This commit should be applied to improve some other testcases:
http://gcc.gnu.org/viewcvs?view=revision&revision=159816

2. The first patch was written in a way to allow the backend to enable this
feature or not. Feedback was given that this should not be backend specific.
Here is a patch that makes this feature backend specific again. We enable this
on s390 since useless prefetches are pretty cheap.

3. As a long term goal we could enhance profile directed feedback to make this
decision better.


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=44297



[Bug middle-end/44297] Big spec cpu2006 prefetch regressions on gcc 4.6 on x86

2010-05-28 Thread borntraeger at de dot ibm dot com


--- Comment #5 from borntraeger at de dot ibm dot com  2010-05-28 07:41 
---
An alternative approach might be to have different values for 
prefetch-min-insn-to-mem-ratio and min-insn-to-prefetch-ratio
depending on constant/non-constant step size.
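
Roughly, the idea could look like the sketch below. The threshold values and
all names are hypothetical and chosen only to show the shape of the decision;
this is not the code in tree-ssa-loop-prefetch.c.

```c
#include <stdbool.h>

/* Hypothetical thresholds: a non-constant step has to clear a higher
   bar before prefetching is considered worthwhile. */
enum {
    MIN_RATIO_CONST_STEP = 3,
    MIN_RATIO_VAR_STEP = 10
};

/* Pick the minimum insn-to-mem ratio according to the kind of step. */
unsigned int min_insn_to_mem_ratio(bool step_is_constant)
{
    return step_is_constant ? MIN_RATIO_CONST_STEP : MIN_RATIO_VAR_STEP;
}

/* Return true if the loop has enough instructions per memory reference
   for prefetching to be considered. */
bool prefetch_is_worthwhile(unsigned int ninsns, unsigned int mem_ref_count,
                            bool step_is_constant)
{
    if (mem_ref_count == 0)
        return false;
    return ninsns / mem_ref_count >= min_insn_to_mem_ratio(step_is_constant);
}
```

With these made-up numbers, a loop with 50 instructions and 10 memory
references would be prefetched for a constant step but not for a non-constant
one.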


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=44297



[Bug middle-end/44297] Big spec cpu2006 prefetch regressions on gcc 4.6 on x86

2010-05-31 Thread borntraeger at de dot ibm dot com


--- Comment #10 from borntraeger at de dot ibm dot com  2010-05-31 08:58 
---
Created an attachment (id=20783)
 --> (http://gcc.gnu.org/bugzilla/attachment.cgi?id=20783&action=view)
experimental patch to have separate values for min_insn_to_prefetch_ratio

Changpeng,

thank you for the feedback.
Can you confirm that the regression was introduced by a prefetch with an
unknown step, or is there still a bug in the calculation of the "normal"
prefetches (e.g. by applying the first patch that disables non-constant steps)?

Anyway, here is a patch that increases min_insn_to_prefetch_ratio for
non-constant steps. Does that make a difference for tonto? Do you prefer other
initial values?
Thanks

Christian


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=44297



[Bug middle-end/44297] Big spec cpu2006 prefetch regressions on gcc 4.6 on x86

2010-06-01 Thread borntraeger at de dot ibm dot com


--- Comment #12 from borntraeger at de dot ibm dot com  2010-06-01 19:30 
---
Ok. So I will let you continue to look into that and wait for your results?

Do you have any feedback on separate.patch and its influence on performance?


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=44297



[Bug middle-end/44297] Big spec cpu2006 prefetch regressions on gcc 4.6 on x86

2010-06-07 Thread borntraeger at de dot ibm dot com


--- Comment #20 from borntraeger at de dot ibm dot com  2010-06-08 05:51 
---
Both patches look sane. I will test both.
Thank you for your work.


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=44297



[Bug middle-end/44297] Big spec cpu2006 prefetch regressions on gcc 4.6 on x86

2010-06-08 Thread borntraeger at de dot ibm dot com


--- Comment #22 from borntraeger at de dot ibm dot com  2010-06-08 19:42 
---
I bootstrapped with patches 0002 and 0003.
The results are also good.


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=44297



[Bug middle-end/44576] New: testsuite/gfortran.dg/zero_sized_1.f90 with huge compile time on prefetching+peeling

2010-06-18 Thread borntraeger at de dot ibm dot com
testsuite/gfortran.dg/zero_sized_1.f90 takes almost 11 minutes to compile on my
notebook when combining aggressive loop prefetching with loop peeling:

$ time gfortran-4.5 -O3 -march=core2  zero_sized_1.f90 -S
-fprefetch-loop-arrays  -funroll-loops --param max-completely-peeled-insns=2000

real    10m54.594s
user    10m48.638s
sys     0m0.393s

The compiler is almost always in compute_miss_rate introduced by
http://gcc.gnu.org/ml/gcc-patches/2009-08/msg00641.html

The problem is triggered by several things:
- loop peeling creates thousands of references (with only a small delta)  and
each reference is compared with every other reference
- for each comparison we iterate over all alignments
- for each alignment we iterate over all distinct iterators

As you can see, this causes exploding complexity.
Furthermore, since the cache line size is passed in via an external variable,
the compiler cannot optimize the integer division into shifts.
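
To get a feeling for the growth, the nesting described above can be counted
with a small sketch. This is not the code of compute_miss_rate; the function
names are made up, and the 64-byte cache line in the second half is only an
assumed example value.

```c
/* Count the innermost steps of a nesting like the one described above:
   every reference against every other reference, and for each such
   comparison every alignment and every iteration. */
unsigned long long miss_rate_steps(unsigned long long n_refs,
                                   unsigned long long n_align,
                                   unsigned long long n_iters)
{
    unsigned long long steps = 0;
    unsigned long long i, j;

    for (i = 0; i < n_refs; i++)
        for (j = 0; j < n_refs; j++) {
            if (i == j)
                continue;       /* a reference is not compared with itself */
            steps += n_align * n_iters;
        }
    return steps;
}

/* With a line size known at compile time (64 assumed here) the division
   is a shift by 6 ... */
unsigned long line_index_shift(unsigned long addr)
{
    return addr >> 6;
}

/* ... while a line size that is only known at run time needs a real
   division. */
unsigned long line_index_div(unsigned long addr, unsigned long line_size)
{
    return addr / line_size;
}
```

The step count grows with the square of the number of references, which is
why a few thousand peeled references are enough to dominate compile time.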

I think the right solution would be to reduce the complexity of
compute_miss_rate, but I have not found a good solution yet.


-- 
         Summary: testsuite/gfortran.dg/zero_sized_1.f90 with huge compile
                  time on prefetching+peeling
         Product: gcc
         Version: 4.5.1
          Status: UNCONFIRMED
        Severity: normal
        Priority: P3
       Component: middle-end
      AssignedTo: unassigned at gcc dot gnu dot org
      ReportedBy: borntraeger at de dot ibm dot com
GCC host triplet: several, i486, s390..


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=44576



[Bug middle-end/44576] testsuite/gfortran.dg/zero_sized_1.f90 with huge compile time on prefetching + peeling

2010-06-18 Thread borntraeger at de dot ibm dot com


--- Comment #1 from borntraeger at de dot ibm dot com  2010-06-18 07:59 
---
4.6 (trunk) is also affected


-- 

borntraeger at de dot ibm dot com changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
            Summary|testsuite/gfortran.dg/zero_s|testsuite/gfortran.dg/zero_s
                   |ized_1.f90 with huge compile|ized_1.f90 with huge compile
                   |time on prefetching+peeling |time on prefetching +
                   |                            |peeling


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=44576



[Bug middle-end/44203] [4.6 regression] New prefetch test failures

2010-06-24 Thread borntraeger at de dot ibm dot com


--- Comment #6 from borntraeger at de dot ibm dot com  2010-06-24 12:35 
---
HJ confirmed that the patch worked and Andreas applied the patch. So from my
point of view, the problem is fixed.


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=44203



[Bug middle-end/44576] [4.5/4.6 Regression] testsuite/gfortran.dg/zero_sized_1.f90 with huge compile time on prefetching + peeling

2010-06-25 Thread borntraeger at de dot ibm dot com


--- Comment #3 from borntraeger at de dot ibm dot com  2010-06-25 09:02 
---
Created an attachment (id=21001)
 --> (http://gcc.gnu.org/bugzilla/attachment.cgi?id=21001&action=view)
Potential fix for compile time regression

Here is a potential fix. We just limit prefetching to loops with a "low" number
of memory references and bail out if the number of references is too large.
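
In outline, such a guard could be as simple as the sketch below. The macro
name, the limit of 200 and the function name are hypothetical; the attached
patch may use a different name and value.

```c
/* Hypothetical limit; the value in the attached patch may differ. */
#define MAX_MEM_REFS_PER_LOOP 200

/* Return 1 if the loop is small enough to run the prefetch analysis,
   0 to bail out before the pairwise comparison of references starts. */
int should_prefetch_loop(unsigned int n_mem_refs)
{
    return n_mem_refs <= MAX_MEM_REFS_PER_LOOP;
}
```

The check runs once per loop, so the quadratic comparison is never entered
for loops that peeling has blown up to thousands of references.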


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=44576



[Bug middle-end/44576] [4.5/4.6 Regression] testsuite/gfortran.dg/zero_sized_1.f90 with huge compile time on prefetching + peeling

2010-06-26 Thread borntraeger at de dot ibm dot com


--- Comment #6 from borntraeger at de dot ibm dot com  2010-06-26 20:30 
---
Richard,

Can you check if compute_miss_rate is the problem?
Does the attached patch help?

thanks

Christian


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=44576



[Bug middle-end/44576] [4.5/4.6 Regression] testsuite/gfortran.dg/zero_sized_1.f90 with huge compile time on prefetching + peeling

2010-06-27 Thread borntraeger at de dot ibm dot com


--- Comment #9 from borntraeger at de dot ibm dot com  2010-06-27 11:09 
---
So there seem to be at least two problems:
1. exploding complexity in compute_miss_rate (the start for this bugzilla)
2. effects due to prefetching seen in other passes

I think the attached patch cures 1. What shall we do about 2? Opening another
bugzilla or tracking the problems here? 


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=44576