[Bug middle-end/45017] New: miscompile with bitfield and optimization
The following testcase produces a different result with 4.6 at -O1/2/3:

gcc4.5: all optimizations:
r1: 15 r2: 2
gcc4.6: -O0
r1: 15 r2: 2
gcc4.6: -O1,-O2..
r1: 47 r2: 2

#include <stdio.h>

int tester(char *bytes)
{
        union {
                struct {
                        unsigned int r1:4;
                        unsigned int r2:4;
                } fmt;
                char value[1];
        } ovl;

        ovl.value[0] = bytes[0];
        printf("r1: %d r2: %d\n", ovl.fmt.r1, ovl.fmt.r2);
        return 0;
}

int main()
{
        char buff = 0x2f;
        return tester(&buff);
}

-- 
           Summary: miscompile with bitfield and optimization
           Product: gcc
           Version: 4.6.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: middle-end
        AssignedTo: unassigned at gcc dot gnu dot org
        ReportedBy: borntraeger at de dot ibm dot com
 GCC build triplet: i686-pc-linux-gnu and s390x-ibm-linux-gnu
  GCC host triplet: i686-pc-linux-gnu and s390x-ibm-linux-gnu
GCC target triplet: i686-pc-linux-gnu and s390x-ibm-linux-gnu

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=45017
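The expected values can be cross-checked without bitfields. The sketch below assumes the little-endian layout, in which r1 occupies the low four bits of the byte; low_nibble and high_nibble are illustrative helpers, not part of the testcase. It also suggests how to read the wrong output: 47 is 0x2f, the whole byte, as if the 4-bit mask for r1 had been dropped.

```c
/* Hypothetical helpers, not part of the PR testcase: extract the two
   4-bit fields by hand, assuming r1 lives in the low nibble. */
static unsigned int low_nibble(unsigned char byte)
{
    return byte & 0x0fu;            /* bits 0..3, corresponds to r1 */
}

static unsigned int high_nibble(unsigned char byte)
{
    return (byte >> 4) & 0x0fu;     /* bits 4..7, corresponds to r2 */
}
```

For the input 0x2f this gives 15 for the low nibble and 2 for the high nibble, matching the -O0 output.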
[Bug middle-end/45017] miscompile with bitfield and optimization
--- Comment #4 from borntraeger at de dot ibm dot com 2010-07-22 12:41 --- The testcase will fail on big-endian machines, since there r2=f and r1=2 instead of r2=2 and r1=f. Can you adapt the testcase to check the endianness? -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=45017
[Bug middle-end/45017] miscompile with bitfield and optimization
--- Comment #11 from borntraeger at de dot ibm dot com 2010-07-25 16:54 ---
Something like the following should do the trick. Is endian.h available on all
supported platforms?

*** gcc.c-torture/execute/pr45017.c.orig
--- gcc.c-torture/execute/pr45017.c
***************
*** 1,9 ****
--- 1,17 ----
+ #include <endian.h>
+ 
  int tester(char *bytes)
  {
    union {
      struct {
+ # if __BYTE_ORDER == __BIG_ENDIAN
+       unsigned int r2:4;
+       unsigned int r1:4;
+ # endif
+ # if __BYTE_ORDER == __LITTLE_ENDIAN
        unsigned int r1:4;
        unsigned int r2:4;
+ # endif
      } fmt;
      char value[1];
    } ovl;

-- 
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=45017
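If endian.h turns out not to be available everywhere, one possible alternative is the set of compiler-predefined macros __BYTE_ORDER__, __ORDER_BIG_ENDIAN__ and __ORDER_LITTLE_ENDIAN__. The sketch below assumes a compiler that defines them, and the usual ABI convention that bitfields are allocated starting from the most significant bits on big-endian targets. The struct fields and the function decode are hypothetical stand-ins for the printf in tester.

```c
#include <string.h>

/* Hypothetical result type; the PR testcase prints the values instead. */
struct fields {
    unsigned int r1;
    unsigned int r2;
};

static struct fields decode(const char *bytes)
{
    union {
        struct {
/* Assumption: the compiler predefines __BYTE_ORDER__ and friends. */
#if __BYTE_ORDER__ == __ORDER_BIG_ENDIAN__
            unsigned int r2:4;
            unsigned int r1:4;
#else
            unsigned int r1:4;
            unsigned int r2:4;
#endif
        } fmt;
        char value[1];
    } ovl;
    struct fields out;

    /* Clear the whole union so the unused bytes are well defined. */
    memset(&ovl, 0, sizeof ovl);
    ovl.value[0] = bytes[0];

    out.r1 = ovl.fmt.r1;
    out.r2 = ovl.fmt.r2;
    return out;
}
```

With this layout switch, a buffer holding 0x2f should decode to r1 = 15 and r2 = 2 on either byte order, which is what the testcase needs in order to be target independent.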
[Bug middle-end/45017] miscompile with bitfield and optimization
--- Comment #14 from borntraeger at de dot ibm dot com 2010-07-26 10:13 --- I have seen the original problem only with bitfields. -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=45017
[Bug middle-end/44078] [4.6 regression] FAIL: gcc.dg/tree-ssa/prefetch-7.c
--- Comment #1 from borntraeger at de dot ibm dot com 2010-05-11 13:43 ---
At first glance, it looks like the test case scans for "nontemporal store",
which is also emitted by the new debug messages:

-return false;
+{
+  if (dump_file && (dump_flags & TDF_DETAILS))
+fprintf (dump_file, "Ignoring nontemporal store %p\n", (void *) ref);
+  return false;
+}

I will check if that is the problem. If yes, I will probably fix the testcase.

Christian

-- 
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=44078
[Bug middle-end/44078] [4.6 regression] FAIL: gcc.dg/tree-ssa/prefetch-7.c
--- Comment #2 from borntraeger at de dot ibm dot com 2010-05-11 13:57 ---
Created an attachment (id=20629)
 --> (http://gcc.gnu.org/bugzilla/attachment.cgi?id=20629&action=view)
Testfix for the prefetch-7.c testcase

The old code always had

  fprintf (dump_file, "Marked reference %p as a nontemporal store.\n",

r159256 now adds

  fprintf (dump_file, "Ignoring nontemporal store %p\n", (void *) ref);

The testcase matches both messages, but it should only match the first one.
So a potential fix might look like the attachment.
Can you confirm that this patch fixes the testcase on your system?

Christian

-- 
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=44078
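To illustrate why the scan matches twice, here is a small sketch in which a plain substring search stands in for the testsuite's dump scan. The helper matches, the sample pointer value 0x1234 and the tighter pattern "a nontemporal store" are assumptions made for illustration; the attached patch may use a different pattern.

```c
#include <string.h>

/* Returns nonzero when `pattern` occurs in `line`. A substring search
   is only a stand-in for the real dump scan in the testsuite. */
static int matches(const char *line, const char *pattern)
{
    return strstr(line, pattern) != NULL;
}

/* Sample dump lines; the pointer value is made up. */
static const char marked[]  = "Marked reference 0x1234 as a nontemporal store.";
static const char ignored[] = "Ignoring nontemporal store 0x1234";
```

The pattern "nontemporal store" occurs in both sample lines, whereas "a nontemporal store" occurs only in the "Marked reference" line, so a scan anchored on more of the original message would count only the stores that were actually marked.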
[Bug middle-end/44203] [4.6 regression] New prefetch test failures
--- Comment #1 from borntraeger at de dot ibm dot com 2010-05-20 08:29 ---
Indeed. I think I found a typo when handling array prefetches.

A potential fix might be:

--- gcc/tree-ssa-loop-prefetch.c        (Revision 159557)
+++ gcc/tree-ssa-loop-prefetch.c        (working copy)
@@ -440,7 +440,7 @@
       *ar_data->step = fold_build2 (MULT_EXPR, sizetype,
                                     fold_convert (sizetype, *ar_data->step),
-                                    fold_convert (sizetype, step));
+                                    fold_convert (sizetype, stepsize));
       idelta *= imult;
     }

-- 
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=44203
[Bug middle-end/44203] [4.6 regression] New prefetch test failures
--- Comment #2 from borntraeger at de dot ibm dot com 2010-05-20 12:28 --- Created an attachment (id=20709) --> (http://gcc.gnu.org/bugzilla/attachment.cgi?id=20709&action=view) new version of the fix. There is actually a second bug :-( We not only have to replace step with stepsize (a=a+b*b vs. a=a+b*c), we also have to fix the fact that the code computed a=(a+b)*c where a=a+b*c was intended. -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=44203
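The two mistakes are easy to tell apart with concrete numbers. The sketch below only mirrors the a, b, c shorthand from the comment; the function names are illustrative and nothing here is taken from tree-ssa-loop-prefetch.c.

```c
/* Intended computation: a = a + b*c. */
static long intended(long a, long b, long c)
{
    return a + b * c;
}

/* First bug: the wrong operand is used, giving a = a + b*b. */
static long wrong_operand(long a, long b)
{
    return a + b * b;
}

/* Second bug: the multiplication is applied to the sum, a = (a+b)*c. */
static long wrong_grouping(long a, long b, long c)
{
    return (a + b) * c;
}
```

With a = 2, b = 3 and c = 4 the intended result is 14, the wrong operand gives 11 and the wrong grouping gives 20, so fixing only one of the two bugs still leaves a wrong step.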
[Bug middle-end/44297] Big spec cpu2006 prefetch regressions on gcc 4.6 on x86
--- Comment #4 from borntraeger at de dot ibm dot com 2010-05-28 07:24 ---
Created an attachment (id=20767)
 --> (http://gcc.gnu.org/bugzilla/attachment.cgi?id=20767&action=view)
Patch that makes loop invariant prefetches backend specific

Three observations:

1. The patch had a bug which led to wrong calculations in some cases.
   This commit should be applied to improve some other testcases:
   http://gcc.gnu.org/viewcvs?view=revision&revision=159816

2. The first patch was written in a way that allowed the backend to enable
   this feature or not. Feedback was given that this should not be backend
   specific. Here is a patch that makes this feature backend specific again.
   We enable this on s390 since useless prefetches are pretty cheap.

3. As a long term goal we could enhance profile directed feedback to make
   this decision better.

-- 
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=44297
[Bug middle-end/44297] Big spec cpu2006 prefetch regressions on gcc 4.6 on x86
--- Comment #5 from borntraeger at de dot ibm dot com 2010-05-28 07:41 --- An alternative approach might be to have different values for prefetch-min-insn-to-mem-ratio and min-insn-to-prefetch-ratio depending on whether the step size is constant or non-constant. -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=44297
[Bug middle-end/44297] Big spec cpu2006 prefetch regressions on gcc 4.6 on x86
--- Comment #10 from borntraeger at de dot ibm dot com 2010-05-31 08:58 ---
Created an attachment (id=20783)
 --> (http://gcc.gnu.org/bugzilla/attachment.cgi?id=20783&action=view)
experimental patch to have separate values for min_insn_to_prefetch_ratio

Changpeng, thank you for the feedback.

Can you confirm that the regression was introduced by a prefetch with an
unknown step, or is there still a bug in the calculation of the "normal"
prefetches (e.g. by applying the first patch that disables non-constant
steps)?

Anyway, here is a patch that increases min_insn_to_prefetch_ratio for
non-constant steps. Does that make a difference for tonto? Do you prefer
other initial values?

Thanks

Christian

-- 
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=44297
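The idea of separate thresholds can be sketched as follows. This is not the attached patch: the two constants, their values and the function prefetch_is_worthwhile are hypothetical, and the check only models the documented meaning of min-insn-to-prefetch-ratio (the minimum ratio of instructions to prefetches in a loop).

```c
#include <stdbool.h>

/* Hypothetical thresholds chosen for illustration only; the real
   defaults are GCC params and may differ. */
enum {
    MIN_RATIO_CONSTANT_STEP = 10,
    MIN_RATIO_UNKNOWN_STEP  = 50
};

/* Decide whether prefetching pays off for a loop, demanding a higher
   instruction-to-prefetch ratio when the step is not a constant. */
static bool prefetch_is_worthwhile(unsigned insn_count,
                                   unsigned prefetch_count,
                                   bool step_is_constant)
{
    unsigned min_ratio = step_is_constant ? MIN_RATIO_CONSTANT_STEP
                                          : MIN_RATIO_UNKNOWN_STEP;

    if (prefetch_count == 0)
        return false;               /* nothing to prefetch */

    return insn_count / prefetch_count >= min_ratio;
}
```

Under these made-up numbers, a loop with 100 instructions and 5 prefetches (ratio 20) would still be prefetched for a constant step but not for an unknown one.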
[Bug middle-end/44297] Big spec cpu2006 prefetch regressions on gcc 4.6 on x86
--- Comment #12 from borntraeger at de dot ibm dot com 2010-06-01 19:30 --- Ok. So I will let you continue to look into that and wait for your results? Do you have any feedback on separate.patch and its influence on performance? -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=44297
[Bug middle-end/44297] Big spec cpu2006 prefetch regressions on gcc 4.6 on x86
--- Comment #20 from borntraeger at de dot ibm dot com 2010-06-08 05:51 --- Both patches look sane. I will test both. Thank you for your work. -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=44297
[Bug middle-end/44297] Big spec cpu2006 prefetch regressions on gcc 4.6 on x86
--- Comment #22 from borntraeger at de dot ibm dot com 2010-06-08 19:42 --- I bootstrapped with patches 0002 and 0003. The results are also good. -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=44297
[Bug middle-end/44576] New: testsuite/gfortran.dg/zero_sized_1.f90 with huge compile time on prefetching+peeling
testsuite/gfortran.dg/zero_sized_1.f90 takes almost 11 minutes to compile on my
notebook when combining aggressive loop prefetching with loop peeling:

$ time gfortran-4.5 -O3 -march=core2 zero_sized_1.f90 -S -fprefetch-loop-arrays -funroll-loops --param max-completely-peeled-insns=2000

real    10m54.594s
user    10m48.638s
sys     0m0.393s

The compiler spends almost all of its time in compute_miss_rate, introduced by
http://gcc.gnu.org/ml/gcc-patches/2009-08/msg00641.html

The problem is triggered by several things:
- loop peeling creates thousands of references (with only a small delta) and
  each reference is compared with every other reference
- for each comparison we iterate over all alignments
- for each alignment we iterate over all distinct iterators

As you can see, this causes the complexity to explode. Furthermore, since the
cache line size is passed in via an external variable, the compiler cannot
optimize the integer division into shifts.

I think the right solution would be to reduce the complexity of
compute_miss_rate, but I have not found a good solution yet.

-- 
           Summary: testsuite/gfortran.dg/zero_sized_1.f90 with huge compile
                    time on prefetching+peeling
           Product: gcc
           Version: 4.5.1
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: middle-end
        AssignedTo: unassigned at gcc dot gnu dot org
        ReportedBy: borntraeger at de dot ibm dot com
  GCC host triplet: several, i486, s390..

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=44576
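A rough model makes the growth visible. The sketch below is not the code of compute_miss_rate; it only counts inner-loop steps under the assumption that every unordered pair of references is examined once for every alignment and every distinct iteration. All names and the sample numbers are illustrative. The last two helpers show the division versus shift point for a power-of-two cache line size.

```c
/* Illustrative cost model: pairs of references, times alignments,
   times distinct iterations examined per alignment. */
static unsigned long long miss_rate_work(unsigned long long refs,
                                         unsigned long long alignments,
                                         unsigned long long iterations)
{
    unsigned long long pairs = refs * (refs - 1) / 2;

    return pairs * alignments * iterations;
}

/* With a run-time line size the compiler has to emit a real division. */
static unsigned long line_index_div(unsigned long addr,
                                    unsigned long line_size)
{
    return addr / line_size;
}

/* With a known power-of-two line size the same result is a shift. */
static unsigned long line_index_shift(unsigned long addr,
                                      unsigned log2_line_size)
{
    return addr >> log2_line_size;
}
```

Under this model, 2000 references with 64 alignments and 4 iterations already amount to roughly half a billion inner steps, each of which involves divisions by the line size.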
[Bug middle-end/44576] testsuite/gfortran.dg/zero_sized_1.f90 with huge compile time on prefetching + peeling
--- Comment #1 from borntraeger at de dot ibm dot com 2010-06-18 07:59 ---
4.6 (trunk) is also affected

-- 

borntraeger at de dot ibm dot com changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
            Summary|testsuite/gfortran.dg/zero_s|testsuite/gfortran.dg/zero_s
                   |ized_1.f90 with huge compile|ized_1.f90 with huge compile
                   |time on prefetching+peeling |time on prefetching +
                   |                            |peeling

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=44576
[Bug middle-end/44203] [4.6 regression] New prefetch test failures
--- Comment #6 from borntraeger at de dot ibm dot com 2010-06-24 12:35 --- HJ confirmed that the patch worked and Andreas applied the patch. So from my point of view, the problem is fixed. -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=44203
[Bug middle-end/44576] [4.5/4.6 Regression] testsuite/gfortran.dg/zero_sized_1.f90 with huge compile time on prefetching + peeling
--- Comment #3 from borntraeger at de dot ibm dot com 2010-06-25 09:02 --- Created an attachment (id=21001) --> (http://gcc.gnu.org/bugzilla/attachment.cgi?id=21001&action=view) Potential fix for compile time regression Here is a potential fix. We just limit prefetching to loops with a "low" number of memory references and bail out if the number of references is too large. -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=44576
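The shape of such a guard might look like the sketch below. The limit of 200 and both names are hypothetical and not taken from the attachment; the point is only that the check runs before any pairwise analysis starts.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical cap on memory references per loop. */
#define MAX_MEM_REFS_FOR_PREFETCH 200

/* Returns true when the loop is small enough to be analyzed; callers
   would skip prefetch analysis for the loop otherwise. */
static bool should_analyze_loop(size_t mem_ref_count)
{
    return mem_ref_count <= MAX_MEM_REFS_FOR_PREFETCH;
}
```

A loop that peeling has blown up to thousands of references is then rejected in constant time, at the price of not prefetching in very large loop bodies.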
[Bug middle-end/44576] [4.5/4.6 Regression] testsuite/gfortran.dg/zero_sized_1.f90 with huge compile time on prefetching + peeling
--- Comment #6 from borntraeger at de dot ibm dot com 2010-06-26 20:30 --- Richard, can you check if compute_miss_rate is the problem? Does the attached patch help? Thanks Christian -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=44576
[Bug middle-end/44576] [4.5/4.6 Regression] testsuite/gfortran.dg/zero_sized_1.f90 with huge compile time on prefetching + peeling
--- Comment #9 from borntraeger at de dot ibm dot com 2010-06-27 11:09 ---
So there seem to be at least two problems:
1. exploding complexity in compute_miss_rate (the start for this bugzilla)
2. effects due to prefetching seen in other passes

I think the attached patch cures 1.
What shall we do about 2? Opening another bugzilla or tracking the problems
here?

-- 
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=44576