Hello! The "[patch, vectorizer] misaligned store support" patch [1] increased the execution time of the Polyhedron test_fpu test on Core2 by more than 10%.
The test is compiled with "-march=x86-64 -ffast-math -funroll-loops -O3", results are:

    time ./a.out
    Benchmark running, hopefully as only ACTIVE task
    Test1 - Gauss 2000 (101x101) inverts 2.5 sec Err= 0.000000000000006
    Test2 - Crout 2000 (101x101) inverts 2.5 sec Err= 0.000000000000015
    Test3 - Crout 2 (1001x1001) inverts 2.3 sec Err= 0.000000000000065
    Test4 - Lapack 2 (1001x1001) inverts 2.4 sec Err= 0.000000000000250
    total = 9.6 sec

    real 0m9.864s
    user 0m9.778s
    sys 0m0.074s

with patch [1] included, vs.:

    time ./a.out
    Benchmark running, hopefully as only ACTIVE task
    Test1 - Gauss 2000 (101x101) inverts 1.9 sec Err= 0.000000000000006
    Test2 - Crout 2000 (101x101) inverts 2.5 sec Err= 0.000000000000015
    Test3 - Crout 2 (1001x1001) inverts 2.3 sec Err= 0.000000000000065
    Test4 - Lapack 2 (1001x1001) inverts 2.0 sec Err= 0.000000000000250
    total = 8.6 sec

    real 0m8.869s
    user 0m8.788s
    sys 0m0.068s

when patch [1] is reverted.

The compiler is from today's SVN, "xgcc (GCC) 4.5.0 20090704 (experimental) [trunk revision 149223]".

The effect of this patch can also be seen on [2], see test_fpu chart between 2009-06-05 and 2009-06-06.

[1] http://gcc.gnu.org/ml/gcc-patches/2009-06/msg00492.html
[2] http://gcc.opensuse.org/c++bench/polyhedron/polyhedron-summary.txt-2-0.html

--
           Summary: misaligned store vectorizer patch introduced 10% runtime
                    regression on Polyhedron test_fpu
           Product: gcc
           Version: 4.5.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: target
        AssignedTo: unassigned at gcc dot gnu dot org
        ReportedBy: ubizjak at gmail dot com
 GCC build triplet: x86_64-pc-linux-gnu
  GCC host triplet: x86_64-pc-linux-gnu
GCC target triplet: x86_64-pc-linux-gnu

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=40648