Usage of sizeof in testsuite/g++.dg/cpp0x/rv[1..8]p.C
Hello!

A problem arises with the code in testsuite/g++.dg/cpp0x/rv[1..8]p.C.
These tests use "sizeof(..character array...) == ", but the sizeof of a
char array depends heavily on the value of #define
STRUCTURE_SIZE_BOUNDARY. Targets that define this value to, e.g., 32
(for performance reasons, instead of the default BITS_PER_UNIT) will
fail all these checks.

Would it be acceptable to change all these checks from "sa t1;" to
"sa t1;" ?

Uros.
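To make the concern concrete, here is a minimal sketch. It is not taken from the rv[1..8]p.C tests: the tag types `one` and `two`, the overload set `pick`, and the two helper functions are hypothetical names chosen for illustration. It assumes that the target's structure size boundary does not exceed sizeof(long), so that structs wrapping `long` arrays of different lengths keep different sizes, and it compares against multiples of sizeof(long) instead of literal byte counts.

```cpp
// Hypothetical tag types. Wrapping `long` instead of `char` keeps the two
// sizes distinct even when the target rounds struct sizes up to 32 bits
// (assumption: the structure size boundary is not larger than sizeof(long)).
struct one { long x[1]; };
struct two { long x[2]; };

// Hypothetical overload set; the return type records which overload won.
// Only used inside sizeof, so no definitions are needed.
one pick(int&);
two pick(const int&);

// Size checks expressed relative to sizeof(long), not as literal byte counts.
inline bool lvalue_picks_one() {
    int a = 0;
    return sizeof(pick(a)) == 1 * sizeof(long);
}

inline bool const_lvalue_picks_two() {
    const int ca = 0;
    return sizeof(pick(ca)) == 2 * sizeof(long);
}
```

With `char x[1]` and `char x[2]` both structs could be padded to the same size on a target with a 32-bit structure size boundary, which is why the sketch does not simply compare against sizeof(one) of a char-based struct.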
Re: Turn on -fomit-frame-pointer by default for 32bit Linux/x86
On Sun, Aug 8, 2010 at 7:56 AM, Uros Bizjak wrote:
> Hello!
>
> After recent discussions, I would like to propose a transition to
> -fomit-frame-pointer for x86_32.
>
> The transition should be smooth as much as possible, should have
> option to revert to old behaviour and still providing path for the
> improvement. And we have learned something from cld issues, too
> (cough, cough...).
>
> I support the idea to change x86_32 defaults w.r.t. frame pointer (and
> unwind tables) to the same defaults as x86_64 has.
>
> The patch should also introduce --enable-frame-pointer configure
> option (off by default) that would revert back to old x86_32
> behaviour. So, if there are codes that depend on FP, their users (or
> distributions) should either (re-)configure the compiler with
> --enable-frame-pointer or they should use older compiler - 4.5.x will
> still be supported for many years. OTOH, it looks that users don't
> care that much whether backtraces on x86_64 are totally accurate, so
> IMO the sky won't fall down if x86_32 misses some backtraces in the
> same way. And as I have learned from the discussion, the problem is
> fixable with some effort on the user's side, thus fixing both targets
> in one shot.
>
> Of course, this change and the option to revert to the previous
> behaviour should be announced and documented in GCC release notes for
> 4.6.0.
>
> IMO, we have to bite the bullet from time to time in order to improve
> the generated code. We should not claim that gcc is
> "no-code-left-behind compiler" - from my experience, introducing new
> compiler always means that some parts of the code have to be fixed (as
> in case of the change to -fno-strict-aliasing).
>
> Uros.
>

I tested this patch on Linux/ia32 and Linux/x86-64. There are no
regressions. I don't have good wording for document:

--
For 32-bit x86 targets, it is not enabled at @option{-Os} by default.
This option also can be disabled by default on 32-bit x86 targets by
configuring GCC with the @option{--enable-frame-pointer} configure option.
--

isn't very accurate. Any suggestions?

Thanks.

--
H.J.
---
2010-08-09  H.J. Lu

    * config.gcc: Handle --enable-frame-pointer.

    * configure.ac: Add --enable-frame-pointer.
    * configure: Regenerated.

    * config/i386/i386.c (override_options): If not optimize for
    size, use -fomit-frame-pointer and -fasynchronous-unwind-tables
    by default for 32-bit code unless configured with
    --enable-frame-pointer.

diff --git a/gcc/config.gcc b/gcc/config.gcc
index 9170fc8..62dd9f6 100644
--- a/gcc/config.gcc
+++ b/gcc/config.gcc
@@ -406,6 +406,9 @@ i[34567]86-*-*)
     if test "x$enable_cld" = xyes; then
         tm_defines="${tm_defines} USE_IX86_CLD=1"
     fi
+    if test "x$enable_frame_pointer" = xyes; then
+        tm_defines="${tm_defines} USE_IX86_FRAME_POINTER=1"
+    fi
     tm_file="vxworks-dummy.h ${tm_file}"
     ;;
 x86_64-*-*)
@@ -413,6 +416,9 @@ x86_64-*-*)
     if test "x$enable_cld" = xyes; then
         tm_defines="${tm_defines} USE_IX86_CLD=1"
     fi
+    if test "x$enable_frame_pointer" = xyes; then
+        tm_defines="${tm_defines} USE_IX86_FRAME_POINTER=1"
+    fi
     tm_file="vxworks-dummy.h ${tm_file}"
     ;;
 esac
diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
index 1877730..c0b657b 100644
--- a/gcc/config/i386/i386.c
+++ b/gcc/config/i386/i386.c
@@ -2979,32 +2979,6 @@ override_options (bool main_args_p)
   if (TARGET_MACHO && TARGET_64BIT)
     flag_pic = 2;
 
-  /* Set the default values for switches whose default depends on TARGET_64BIT
-     in case they weren't overwritten by command line options.  */
-  if (TARGET_64BIT)
-    {
-      if (flag_zee == 2)
-        flag_zee = 1;
-      /* Mach-O doesn't support omitting the frame pointer for now.  */
-      if (flag_omit_frame_pointer == 2)
-        flag_omit_frame_pointer = (TARGET_MACHO ? 0 : 1);
-      if (flag_asynchronous_unwind_tables == 2)
-        flag_asynchronous_unwind_tables = 1;
-      if (flag_pcc_struct_return == 2)
-        flag_pcc_struct_return = 0;
-    }
-  else
-    {
-      if (flag_zee == 2)
-        flag_zee = 0;
-      if (flag_omit_frame_pointer == 2)
-        flag_omit_frame_pointer = 0;
-      if (flag_asynchronous_unwind_tables == 2)
-        flag_asynchronous_unwind_tables = 0;
-      if (flag_pcc_struct_return == 2)
-        flag_pcc_struct_return = DEFAULT_PCC_STRUCT_RETURN;
-    }
-
   /* Need to check -mtune=generic first.  */
   if (ix86_tune_string)
     {
@@ -3292,6 +3266,49 @@ o
Question about tree-switch-conversion.c
I am in the process of fixing PR44328
(http://gcc.gnu.org/bugzilla/show_bug.cgi?id=44328).

The problem is that gen_inbound_check in tree-switch-conversion.c
subtracts info.range_min from info.index_expr, which can cause the MIN
and MAX values for info.index_expr to become invalid.

For example:

  typedef enum {
    FIRST = 0,
    SECOND,
    THIRD,
    FOURTH
  } ExampleEnum;

  int dummy (const ExampleEnum e)
  {
    int mode = 0;
    switch (e)
      {
      case SECOND: mode = 20; break;
      case THIRD: mode = 30; break;
      case FOURTH: mode = 40; break;
      }
    return mode;
  }

tree-switch-conversion would like to create a lookup table for this, so
that SECOND maps to entry 0, THIRD maps to entry 1 and FOURTH maps to
entry 2. It achieves this by subtracting SECOND from index_expr. The
problem is that after the subtraction, the result can have a value
outside the range 0-3 that its type claims.

Later, when tree-vrp.c sees the inbound check as being <= 2 with a
possible range for the type of 0-3, it converts the <= 2 into a != 3,
which is totally wrong. If e==FIRST, then we can end up looking for
entry 255 in the lookup table!

I think the solution is to update the type of the result of the
subtraction to show that it is no longer in the range 0-3, but I have
had trouble implementing this. The attached patch (based off the 4.5
branch) shows my current approach, but I ran into LTO issues:

  lto1: internal compiler error: in lto_get_pickled_tree, at lto-streamer-in.c

I am guessing this is because the debug info for the type does not match
the new range I have set for it.

Is there a *right* way to update the range such that LTO doesn't get
unhappy? (Maybe a cast with fold_convert_loc would be right?)

[Attachment: pr44328.gcc4.5.fix.patch]
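As a rough source-level illustration of the transformation described in this message, the lookup can be written by hand. This is a sketch only: it is neither the code the pass emits nor the attached patch, and the names `dummy_switch`, `dummy_table` and `table` are made up here. It assumes the subtraction is carried out in `unsigned int`, so that e == FIRST wraps to a very large index that the bounds check rejects instead of being treated as a value in 0-3.

```cpp
typedef enum { FIRST = 0, SECOND, THIRD, FOURTH } ExampleEnum;

// Reference version: the switch from the example in the message.
int dummy_switch(const ExampleEnum e) {
    int mode = 0;
    switch (e) {
        case SECOND: mode = 20; break;
        case THIRD:  mode = 30; break;
        case FOURTH: mode = 40; break;
        default: break;  // FIRST (or anything else) leaves mode at 0
    }
    return mode;
}

// Hand-written table lookup (hypothetical helper, not GCC output).
int dummy_table(const ExampleEnum e) {
    static const int table[] = { 20, 30, 40 };
    // Subtract range_min (SECOND) in unsigned int: FIRST - SECOND wraps
    // around to UINT_MAX rather than staying inside the enum's 0-3 range.
    const unsigned int idx =
        static_cast<unsigned int>(e) - static_cast<unsigned int>(SECOND);
    // The inbound check must stay "idx <= 2"; rewriting it as "idx != 3"
    // is only valid if idx really were confined to 0-3, which it is not.
    if (idx <= 2u)
        return table[idx];
    return 0;  // default value of mode
}
```

Calling both functions with each of the four enumerators should give the same results, including 0 for FIRST.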
Re: Remove "assertions" support from libcpp
> "Steven" == Steven Bosscher writes:

Steven> Assertions in libcpp have been deprecated since r135264:
Steven> 2008-05-13  Tom Tromey
Steven>     PR preprocessor/22168:
Steven>     * expr.c (eval_token): Warn for use of assertions.

Steven> Can this feature be removed for GCC 4.6?

It would be fine by me, but I would rather have someone more actively
involved in GCC make the decision.

Tom
gcc-4.4-20100810 is now available
Snapshot gcc-4.4-20100810 is now available on
  ftp://gcc.gnu.org/pub/gcc/snapshots/4.4-20100810/
and on various mirrors, see http://gcc.gnu.org/mirrors.html for details.

This snapshot has been generated from the GCC 4.4 SVN branch
with the following options:
  svn://gcc.gnu.org/svn/gcc/branches/gcc-4_4-branch revision 163080

You'll find:

gcc-4.4-20100810.tar.bz2            Complete GCC (includes all of below)
gcc-core-4.4-20100810.tar.bz2       C front end and core compiler
gcc-ada-4.4-20100810.tar.bz2        Ada front end and runtime
gcc-fortran-4.4-20100810.tar.bz2    Fortran front end and runtime
gcc-g++-4.4-20100810.tar.bz2        C++ front end and runtime
gcc-java-4.4-20100810.tar.bz2       Java front end and runtime
gcc-objc-4.4-20100810.tar.bz2       Objective-C front end and runtime
gcc-testsuite-4.4-20100810.tar.bz2  The GCC testsuite

Diffs from 4.4-20100803 are available in the diffs/ subdirectory.

When a particular snapshot is ready for public consumption the LATEST-4.4
link is updated and a message is sent to the gcc list. Please do not use
a snapshot before it has been announced that way.
food for optimizer developers
I wrote a Fortran to C++ conversion program that I used to convert
selected LAPACK sources. Comparing runtimes with different compilers
I get:

                   absolute  relative
  ifort 11.1.072    1.790s    1.00
  gfortran 4.4.4    2.470s    1.38
  g++ 4.4.4         2.922s    1.63

This is under Fedora 13, 64-bit, 12-core Opteron 2.2GHz.

All files needed to easily reproduce the results are here:

  http://cci.lbl.gov/lapack_fem/

See the README file or the example commands below.

Questions:

- Is there a way to make the g++ version as fast as ifort?

- Is there anything I could do in the C++ code generation or in the
  "fem" Fortran EMulation library to help runtime performance (without
  making the generated C++ code less readable)?

- Is there an interest in similar speed comparisons for other LAPACK
  functions (times above are for DSYEV with matrix size 800x800)?

Note: yesterday I sent similar questions to a clang++ list, with the
same subject line. This led to an LLVM bug report that may contain
useful information:

  http://llvm.org/bugs/show_bug.cgi?id=7868

Ralf

  wget http://cci.lbl.gov/lapack_fem/lapack_fem_001.tgz
  tar zxf lapack_fem_001.tgz
  cd lapack_fem_001
  g++ -o dsyev_test_g++ -I. -O3 -ffast-math dsyev_test.cpp
  time ./dsyev_test_g++
Re: food for optimizer developers
On Tue, Aug 10, 2010 at 6:51 PM, Ralf W. Grosse-Kunstleve wrote:
> I wrote a Fortran to C++ conversion program that I used to convert selected
> LAPACK sources. Comparing runtimes with different compilers I get:
>
>                  absolute  relative
> ifort 11.1.072    1.790s    1.00
> gfortran 4.4.4    2.470s    1.38
> g++ 4.4.4         2.922s    1.63

I wonder if adding __restrict to some of the arguments of the
functions will help. Fortran aliasing is so different from C
aliasing.

-- Pinski
Re: food for optimizer developers
Most of the time is spent in this function...

  void
  dlasr(
    str_cref side,
    str_cref pivot,
    str_cref direct,
    int const& m,
    int const& n,
    arr_cref c,
    arr_cref s,
    arr_ref a,
    int const& lda)

in this loop:

  FEM_DOSTEP(j, n - 1, 1, -1) {
    ctemp = c(j);
    stemp = s(j);
    if ((ctemp != one) || (stemp != zero)) {
      FEM_DO(i, 1, m) {
        temp = a(i, j + 1);
        a(i, j + 1) = ctemp * temp - stemp * a(i, j);
        a(i, j) = stemp * temp + ctemp * a(i, j);
      }
    }
  }

a(i, j) is implemented as

  T* elems_; // member

  T const&
  operator()(
    ssize_t i1,
    ssize_t i2) const
  {
    return elems_[dims_.index_1d(i1, i2)];
  }

with

  ssize_t all[Ndims]; // member
  ssize_t origin[Ndims]; // member

  size_t
  index_1d(
    ssize_t i1,
    ssize_t i2) const
  {
    return
        (i2 - origin[1]) * all[0]
      + (i1 - origin[0]);
  }

The array pointer is buried as elems_ member in the arr_ref<> class
template. How can I apply __restrict in this case?

Ralf

----- Original Message -----
From: Andrew Pinski
To: Ralf W. Grosse-Kunstleve
Cc: gcc@gcc.gnu.org
Sent: Tue, August 10, 2010 8:47:18 PM
Subject: Re: food for optimizer developers

On Tue, Aug 10, 2010 at 6:51 PM, Ralf W. Grosse-Kunstleve wrote:
> I wrote a Fortran to C++ conversion program that I used to convert selected
> LAPACK sources. Comparing runtimes with different compilers I get:
>
>                  absolute  relative
> ifort 11.1.072    1.790s    1.00
> gfortran 4.4.4    2.470s    1.38
> g++ 4.4.4         2.922s    1.63

I wonder if adding __restrict to some of the arguments of the
functions will help. Fortran aliasing is so different from C
aliasing.

-- Pinski
Re: food for optimizer developers
On 8/10/2010 9:21 PM, Ralf W. Grosse-Kunstleve wrote:
> Most of the time is spent in this function...
>
>   void
>   dlasr(
>     str_cref side,
>     str_cref pivot,
>     str_cref direct,
>     int const& m,
>     int const& n,
>     arr_cref c,
>     arr_cref s,
>     arr_ref a,
>     int const& lda)
>
> in this loop:
>
>   FEM_DOSTEP(j, n - 1, 1, -1) {
>     ctemp = c(j);
>     stemp = s(j);
>     if ((ctemp != one) || (stemp != zero)) {
>       FEM_DO(i, 1, m) {
>         temp = a(i, j + 1);
>         a(i, j + 1) = ctemp * temp - stemp * a(i, j);
>         a(i, j) = stemp * temp + ctemp * a(i, j);
>       }
>     }
>   }
>
> a(i, j) is implemented as
>
>   T* elems_; // member
>
>   T const&
>   operator()(
>     ssize_t i1,
>     ssize_t i2) const
>   {
>     return elems_[dims_.index_1d(i1, i2)];
>   }
>
> with
>
>   ssize_t all[Ndims]; // member
>   ssize_t origin[Ndims]; // member
>
>   size_t
>   index_1d(
>     ssize_t i1,
>     ssize_t i2) const
>   {
>     return
>         (i2 - origin[1]) * all[0]
>       + (i1 - origin[0]);
>   }
>
> The array pointer is buried as elems_ member in the arr_ref<> class
> template. How can I apply __restrict in this case?

Do you mean you are adding an additional level of functions and hoping
for efficient in-lining? Your programming style is elusive, and your
insistence on top posting will make this thread difficult to deal with.

The conditional inside the loop is likely even more difficult for C++ to
optimize than for Fortran. As already discussed, if you don't optimize
otherwise, you will need __restrict to overcome aliasing concerns among
a, c, and s. If you want efficient C++, you will need a lot of hand
optimization, and verification of the effect of each level of obscurity
which you add.

How is this topic appropriate to the gcc mailing list?

--
Tim Prince
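One possible way to act on the __restrict suggestion made in this thread is to hoist raw column pointers out of the array wrapper before entering the inner loop. The following is a minimal sketch under stated assumptions, not the fem library's API: `rotate_columns` and its parameter names are hypothetical, and it assumes column-major storage (as the index_1d formula quoted above suggests) so that columns j and j+1 are contiguous and do not overlap. `__restrict` is a compiler extension accepted by GCC in C++.

```cpp
#include <cstddef>

// Hypothetical free function (not part of the fem library): applies the
// plane rotation from the inner loop of dlasr to two columns of length m.
// The __restrict qualifiers are a promise by the caller that the two
// columns do not overlap; breaking that promise is undefined behaviour.
void rotate_columns(double* __restrict col_j,   // stands in for a(1..m, j)
                    double* __restrict col_j1,  // stands in for a(1..m, j + 1)
                    std::size_t m,
                    double ctemp,
                    double stemp) {
    for (std::size_t i = 0; i < m; ++i) {
        const double temp = col_j1[i];
        col_j1[i] = ctemp * temp - stemp * col_j[i];
        col_j[i]  = stemp * temp + ctemp * col_j[i];
    }
}
```

In the generated code, the outer FEM_DOSTEP loop would obtain the two column pointers once per value of j (for example through an accessor on arr_ref<>, which would itself be an addition) and pass them in together with c(j) and s(j). Whether this actually closes the gap to ifort would have to be measured.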