Usage of sizeof in testsuite/g++.dg/cpp0x/rv[1..8]p.C

2010-08-10 Thread Uros Bizjak
Hello!

A problem arises with the code in testsuite/g++.dg/cpp0x/rv[1..8]p.C.
These tests use "sizeof(..character array...) == ", but sizeof char
array depends heavily on the value of #define STRUCTURE_SIZE_BOUNDARY.
Targets that define this value to e.g. 32 (for performance reasons,
instead of the default BITS_PER_UNIT) will fail all these checks.

Would it be acceptable to change all these checks from

"sa t1;"

to

"sa t1;" ?

Uros.
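
For readers following along, here is a minimal sketch of the sizeof-based overload check and of one way to make it less dependent on the target's structure padding. The names `one`, `two`, `sink` and `sa` are placeholders for illustration, not copied from the testsuite. Using `long` as the element type is an assumption: it only helps when `sizeof(long)` is a multiple of the structure size boundary.

```cpp
#include <utility>  // std::declval

// Tag types: each overload returns a differently sized struct, so the
// sizeof() of a call expression reveals which overload was selected.
// With char elements, a 32-bit STRUCTURE_SIZE_BOUNDARY would pad both
// structs to 4 bytes and make them indistinguishable.
struct one { long x[1]; };
struct two { long x[2]; };

one sink(const int&);  // selected for lvalues (declaration only, never called)
two sink(int&&);       // selected for rvalues

// Compile-time assertion helper: only sa<true> is a complete type.
template <bool> struct sa;
template <> struct sa<true> {};

// Sizes are expressed in units of sizeof(long) instead of bare byte counts.
sa<sizeof(sink(std::declval<int&>())) == 1 * sizeof(long)> t1;
sa<sizeof(sink(std::declval<int>()))  == 2 * sizeof(long)> t2;
```

If either comparison were false, the corresponding declaration would name the incomplete type `sa<false>` and fail to compile, which is how such a test reports a wrong overload choice.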


Re: Turn on -fomit-frame-pointer by default for 32bit Linux/x86

2010-08-10 Thread H.J. Lu
On Sun, Aug 8, 2010 at 7:56 AM, Uros Bizjak  wrote:
> Hello!
>
> After recent discussions, I would like to propose a transition to
> -fomit-frame-pointer for x86_32.
>
> The transition should be as smooth as possible, should offer an
> option to revert to the old behaviour, and should still provide a path
> for improvement. And we have learned something from the cld issues, too
> (cough, cough...).
>
> I support the idea to change x86_32 defaults w.r.t. frame pointer (and
> unwind tables) to the same defaults as x86_64 has.
>
> The patch should also introduce --enable-frame-pointer configure
> option (off by default) that would revert back to old x86_32
> behaviour. So, if there are codes that depend on FP, their users (or
> distributions) should either (re-)configure the compiler with
> --enable-frame-pointer or they should use older compiler - 4.5.x will
> still be supported for many years. OTOH, it looks like users don't
> care that much whether backtraces on x86_64 are totally accurate, so
> IMO the sky won't fall down if x86_32 misses some backtraces in the
> same way. And as I have learned from the discussion, the problem is
> fixable with some effort on the user's side, thus fixing both targets
> in one shot.
>
> Of course, this change and the option to revert to the previous
> behaviour should be announced and documented in GCC release notes for
> 4.6.0.
>
> IMO, we have to bite the bullet from time to time in order to improve
> the generated code. We should not claim that gcc is a
> "no-code-left-behind" compiler - from my experience, introducing a new
> compiler always means that some parts of the code have to be fixed (as
> in the case of the change to -fno-strict-aliasing).
>
> Uros.
>

I tested this patch on Linux/ia32 and Linux/x86-64. There are no regressions.

I don't have good wording for the documentation. The following draft:

--
For 32-bit x86 targets, it is not enabled at @option{-Os} by default.
This option also can be disabled by default on 32-bit x86 targets by
configuring GCC with the @option{--enable-frame-pointer} configure
option.
--

isn't very accurate.  Any suggestions?

Thanks.


-- 
H.J.
---
2010-08-09  H.J. Lu  

* config.gcc: Handle --enable-frame-pointer.

* configure.ac: Add --enable-frame-pointer.
* configure: Regenerated.

* config/i386/i386.c (override_options): If not optimize for
size, use -fomit-frame-pointer and -fasynchronous-unwind-tables
by default for 32-bit code unless configured with
--enable-frame-pointer.

diff --git a/gcc/config.gcc b/gcc/config.gcc
index 9170fc8..62dd9f6 100644
--- a/gcc/config.gcc
+++ b/gcc/config.gcc
@@ -406,6 +406,9 @@ i[34567]86-*-*)
if test "x$enable_cld" = xyes; then
tm_defines="${tm_defines} USE_IX86_CLD=1"
fi
+   if test "x$enable_frame_pointer" = xyes; then
+   tm_defines="${tm_defines} USE_IX86_FRAME_POINTER=1"
+   fi
tm_file="vxworks-dummy.h ${tm_file}"
;;
 x86_64-*-*)
@@ -413,6 +416,9 @@ x86_64-*-*)
if test "x$enable_cld" = xyes; then
tm_defines="${tm_defines} USE_IX86_CLD=1"
fi
+   if test "x$enable_frame_pointer" = xyes; then
+   tm_defines="${tm_defines} USE_IX86_FRAME_POINTER=1"
+   fi
tm_file="vxworks-dummy.h ${tm_file}"
;;
 esac
diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
index 1877730..c0b657b 100644
--- a/gcc/config/i386/i386.c
+++ b/gcc/config/i386/i386.c
@@ -2979,32 +2979,6 @@ override_options (bool main_args_p)
   if (TARGET_MACHO && TARGET_64BIT)
 flag_pic = 2;
 
-  /* Set the default values for switches whose default depends on TARGET_64BIT
- in case they weren't overwritten by command line options.  */
-  if (TARGET_64BIT)
-{
-  if (flag_zee == 2)
-flag_zee = 1;
-  /* Mach-O doesn't support omitting the frame pointer for now.  */
-  if (flag_omit_frame_pointer == 2)
-   flag_omit_frame_pointer = (TARGET_MACHO ? 0 : 1);
-  if (flag_asynchronous_unwind_tables == 2)
-   flag_asynchronous_unwind_tables = 1;
-  if (flag_pcc_struct_return == 2)
-   flag_pcc_struct_return = 0;
-}
-  else
-{
-  if (flag_zee == 2)
-flag_zee = 0;
-  if (flag_omit_frame_pointer == 2)
-   flag_omit_frame_pointer = 0;
-  if (flag_asynchronous_unwind_tables == 2)
-   flag_asynchronous_unwind_tables = 0;
-  if (flag_pcc_struct_return == 2)
-   flag_pcc_struct_return = DEFAULT_PCC_STRUCT_RETURN;
-}
-
   /* Need to check -mtune=generic first.  */
   if (ix86_tune_string)
 {
@@ -3292,6 +3266,49 @@ o
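
The added hunk is cut off in this archive. The following is only a rough sketch of the defaulting rule that the ChangeLog entry describes, not code from the patch. The helper names are hypothetical. The value 2 meaning "not set on the command line" is taken from the removed hunk. The 64-bit Mach-O exception from that hunk is left out, and leaving 64-bit defaults untouched by `--enable-frame-pointer` is an assumption.

```cpp
// Tri-state convention visible in the removed hunk: 2 means "not set on
// the command line"; 0 and 1 are explicit user choices.
constexpr int kUnset = 2;

// Hypothetical helper: default for -fomit-frame-pointer and
// -fasynchronous-unwind-tables. 64-bit code keeps them on (assumption);
// 32-bit code gets them unless optimizing for size or unless GCC was
// configured with --enable-frame-pointer.
constexpr bool default_on(bool target_64bit, bool optimize_size,
                          bool configured_with_frame_pointer) {
  return target_64bit || (!optimize_size && !configured_with_frame_pointer);
}

// An explicit command-line choice always wins over the computed default.
constexpr int resolve_flag(int flag, bool default_value) {
  return flag == kUnset ? (default_value ? 1 : 0) : flag;
}
```

For example, `resolve_flag(kUnset, default_on(false, true, false))` models a 32-bit `-Os` compile with no explicit option and yields 0, matching the draft wording that the option is not enabled at `-Os`.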

Question about tree-switch-conversion.c

2010-08-10 Thread Ian Bolton
I am in the process of fixing PR44328
(http://gcc.gnu.org/bugzilla/show_bug.cgi?id=44328)

The problem is that gen_inbound_check in tree-switch-conversion.c subtracts
info.range_min from info.index_expr, which can cause the MIN and MAX values
for info.index_expr to become invalid.

For example:


typedef enum {
  FIRST = 0,
  SECOND,
  THIRD,
  FOURTH
} ExampleEnum;

int dummy (const ExampleEnum e)
{
  int mode = 0;
  switch (e)
  {
case SECOND: mode = 20; break;
case THIRD: mode = 30; break;
case FOURTH: mode = 40; break;
  }
  return mode;
}


tree-switch-conversion would like to create a lookup table for this, so
that SECOND maps to entry 0, THIRD maps to entry 1 and FOURTH maps to
entry 2.  It achieves this by subtracting SECOND from index_expr.  The
problem is that after the subtraction, the result can hold a value
outside the 0-3 range that its type still advertises.

Later, when tree-vrp.c sees the inbound check as being <= 2 with a possible
range for the type as 0-3, it converts the <=2 into a != 3, which is
totally wrong.  If e==FIRST, then we can end up looking for entry 255 in
the lookup table!
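
The failure mode can be reproduced in isolation. The sketch below assumes the enum is stored in an unsigned 8-bit type, which is what "entry 255" suggests; the function names are hypothetical and model the transformation by hand rather than using GCC internals.

```cpp
#include <cstdint>

// Lookup table for SECOND, THIRD, FOURTH after rebasing to 0, 1, 2.
constexpr int kTable[3] = {20, 30, 40};

// index_expr - range_min, with range_min == SECOND == 1.
// For e == FIRST (0) the result wraps around to 255.
constexpr std::uint8_t rebased(std::uint8_t e) {
  return static_cast<std::uint8_t>(e - 1);
}

// The bounds check generated by switch conversion.
constexpr bool in_bounds_correct(std::uint8_t idx) { return idx <= 2; }

// What the check turns into if the value is wrongly assumed to be in 0-3.
constexpr bool in_bounds_wrong(std::uint8_t idx) { return idx != 3; }

// Table-based version of dummy() using the correct check.
constexpr int dummy_lookup(std::uint8_t e) {
  return in_bounds_correct(rebased(e)) ? kTable[rebased(e)] : 0;
}
```

With `e == 0`, `rebased(e)` is 255: the correct check rejects it, while the `!= 3` form accepts it and would index far past the end of the three-entry table.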

I think the solution is to update the type of the result of the subtraction
to show that it is no longer in the range 0-3, but I have had trouble
implementing this.  The attached patch (based off 4.5 branch) shows my
current approach, but I ran into LTO issues:

lto1: internal compiler error: in lto_get_pickled_tree, at lto-streamer-in.c

I am guessing this is because the debug info for the type does not match
the new range I have set for it.

Is there a *right* way to update the range such that LTO doesn't get
unhappy?  (Maybe a cast with fold_convert_loc would be right?)


pr44328.gcc4.5.fix.patch
Description: Binary data


Re: Remove "assertions" support from libcpp

2010-08-10 Thread Tom Tromey
> "Steven" == Steven Bosscher  writes:

Steven> Assertions in libcpp have been deprecated since r135264:
Steven> 2008-05-13  Tom Tromey  
Steven> PR preprocessor/22168:
Steven> * expr.c (eval_token): Warn for use of assertions.
Steven> Can this feature be removed for GCC 4.6?

It would be fine by me, but I would rather have someone more actively
involved in GCC make the decision.

Tom


gcc-4.4-20100810 is now available

2010-08-10 Thread gccadmin
Snapshot gcc-4.4-20100810 is now available on
  ftp://gcc.gnu.org/pub/gcc/snapshots/4.4-20100810/
and on various mirrors, see http://gcc.gnu.org/mirrors.html for details.

This snapshot has been generated from the GCC 4.4 SVN branch
with the following options: svn://gcc.gnu.org/svn/gcc/branches/gcc-4_4-branch 
revision 163080

You'll find:

gcc-4.4-20100810.tar.bz2            Complete GCC (includes all of below)
gcc-core-4.4-20100810.tar.bz2       C front end and core compiler
gcc-ada-4.4-20100810.tar.bz2        Ada front end and runtime
gcc-fortran-4.4-20100810.tar.bz2    Fortran front end and runtime
gcc-g++-4.4-20100810.tar.bz2        C++ front end and runtime
gcc-java-4.4-20100810.tar.bz2       Java front end and runtime
gcc-objc-4.4-20100810.tar.bz2       Objective-C front end and runtime
gcc-testsuite-4.4-20100810.tar.bz2  The GCC testsuite

Diffs from 4.4-20100803 are available in the diffs/ subdirectory.

When a particular snapshot is ready for public consumption the LATEST-4.4
link is updated and a message is sent to the gcc list.  Please do not use
a snapshot before it has been announced that way.


food for optimizer developers

2010-08-10 Thread Ralf W. Grosse-Kunstleve
I wrote a Fortran to C++ conversion program that I used to convert selected
LAPACK sources. Comparing runtimes with different compilers I get:

                        absolute  relative
ifort 11.1.072            1.790s    1.00
gfortran 4.4.4            2.470s    1.38
g++ 4.4.4                 2.922s    1.63

This is under Fedora 13, 64-bit, 12-core Opteron 2.2GHz

All files needed to easily reproduce the results are here:

  http://cci.lbl.gov/lapack_fem/

See the README file or the example commands below.

Questions:

- Is there a way to make the g++ version as fast as ifort?

- Is there anything I could do in the C++ code generation or in the
  "fem" Fortran EMulation library to help runtime performance
  (without making the generated C++ code less readable)?

- Is there an interest in similar speed comparisons for other
  LAPACK functions (times above are for DSYEV with matrix size 800x800)?

Note: yesterday I sent similar questions to a clang++ list, with the
same subject line. This led to an LLVM bug report that may contain
useful information:

  http://llvm.org/bugs/show_bug.cgi?id=7868

Ralf


wget http://cci.lbl.gov/lapack_fem/lapack_fem_001.tgz
tar zxf lapack_fem_001.tgz
cd lapack_fem_001
g++ -o dsyev_test_g++ -I. -O3 -ffast-math dsyev_test.cpp
time ./dsyev_test_g++


Re: food for optimizer developers

2010-08-10 Thread Andrew Pinski
On Tue, Aug 10, 2010 at 6:51 PM, Ralf W. Grosse-Kunstleve
 wrote:
> I wrote a Fortran to C++ conversion program that I used to convert selected
> LAPACK sources. Comparing runtimes with different compilers I get:
>
>                         absolute  relative
> ifort 11.1.072            1.790s    1.00
> gfortran 4.4.4            2.470s    1.38
> g++ 4.4.4                 2.922s    1.63

I wonder if adding __restrict to some of the arguments of the
functions will help.  Fortran aliasing is so different from C
aliasing.

-- Pinski
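
As an illustration of the suggestion, here is a minimal sketch of a plane-rotation kernel over two columns with `__restrict`-qualified pointer parameters. `__restrict` is a GCC extension in C++. The function name and signature are hypothetical and do not come from the generated LAPACK code, and whether this closes the gap to ifort is not established in this thread.

```cpp
#include <cassert>
#include <cstddef>

// col_j and col_j1 stand for two adjacent, non-overlapping matrix columns.
// __restrict promises the compiler that stores through one pointer never
// change what the other pointer reads, which is what Fortran assumes for
// distinct dummy arguments.
inline void rotate_columns(double* __restrict col_j,
                           double* __restrict col_j1,
                           std::size_t m, double ctemp, double stemp) {
  for (std::size_t i = 0; i < m; ++i) {
    const double temp = col_j1[i];
    col_j1[i] = ctemp * temp - stemp * col_j[i];
    col_j[i] = stemp * temp + ctemp * col_j[i];
  }
}

// Self-check with ctemp = 0 and stemp = 1: the columns swap and the
// values moved into col_j1 change sign.
inline bool rotate_columns_demo() {
  double col_j[2] = {1.0, 2.0};
  double col_j1[2] = {3.0, 4.0};
  rotate_columns(col_j, col_j1, 2, 0.0, 1.0);
  const bool ok = col_j[0] == 3.0 && col_j[1] == 4.0 &&
                  col_j1[0] == -1.0 && col_j1[1] == -2.0;
  assert(ok);
  return ok;
}
```

Passing overlapping ranges to such a function is undefined behaviour, so the qualifier is only appropriate where the caller can guarantee distinct columns.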


Re: food for optimizer developers

2010-08-10 Thread Ralf W. Grosse-Kunstleve
Most of the time is spent in this function...

void
dlasr(
  str_cref side,
  str_cref pivot,
  str_cref direct,
  int const& m,
  int const& n,
  arr_cref c,
  arr_cref s,
  arr_ref a,
  int const& lda)

in this loop:

FEM_DOSTEP(j, n - 1, 1, -1) {
  ctemp = c(j);
  stemp = s(j);
  if ((ctemp != one) || (stemp != zero)) {
    FEM_DO(i, 1, m) {
      temp = a(i, j + 1);
      a(i, j + 1) = ctemp * temp - stemp * a(i, j);
      a(i, j) = stemp * temp + ctemp * a(i, j);
    }
  }
}

a(i, j) is implemented as

  T* elems_; // member

T const&
operator()(
  ssize_t i1,
  ssize_t i2) const
{
  return elems_[dims_.index_1d(i1, i2)];
}

with
  
  ssize_t all[Ndims]; // member
  ssize_t origin[Ndims]; // member

size_t
index_1d(
  ssize_t i1,
  ssize_t i2) const
{
  return
      (i2 - origin[1]) * all[0]
    + (i1 - origin[0]);
}

The array pointer is buried as elems_ member in the arr_ref<> class template.
How can I apply __restrict in this case?

Ralf
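
One possible answer to the question above, offered as a sketch only: since the index computation is column-major, the base pointer of each column can be taken once outside the inner loop and bound to a `__restrict`-qualified local. The types `dims2_sketch` and `arr_ref_sketch` and the `column()` helper are hypothetical stand-ins for the fem library classes, `std::ptrdiff_t` replaces `ssize_t` for portability, and how well a given GCC version exploits `__restrict` on local pointers (as opposed to parameters) would need to be measured.

```cpp
#include <cassert>
#include <cstddef>

// Column-major index computation, mirroring index_1d above.
struct dims2_sketch {
  std::ptrdiff_t all[2];     // extent of each dimension
  std::ptrdiff_t origin[2];  // first valid index of each dimension
  constexpr std::size_t index_1d(std::ptrdiff_t i1, std::ptrdiff_t i2) const {
    return static_cast<std::size_t>((i2 - origin[1]) * all[0] + (i1 - origin[0]));
  }
};

template <typename T>
struct arr_ref_sketch {
  T* elems_;
  dims2_sketch dims_;
  T& operator()(std::ptrdiff_t i1, std::ptrdiff_t i2) const {
    return elems_[dims_.index_1d(i1, i2)];
  }
  // Hypothetical helper: pointer to the first element of column j.
  T* column(std::ptrdiff_t j) const {
    return elems_ + dims_.index_1d(dims_.origin[0], j);
  }
};

// Inner loop of the rotation with the column pointers hoisted out.
inline void rotate_adjacent_columns(const arr_ref_sketch<double>& a,
                                    std::ptrdiff_t j, double ctemp, double stemp) {
  double* __restrict col_j = a.column(j);
  double* __restrict col_j1 = a.column(j + 1);
  const std::ptrdiff_t m = a.dims_.all[0];
  for (std::ptrdiff_t i = 0; i < m; ++i) {
    const double temp = col_j1[i];
    col_j1[i] = ctemp * temp - stemp * col_j[i];
    col_j[i] = stemp * temp + ctemp * col_j[i];
  }
}

// Self-check on a 2x2 matrix with 1-based indices, ctemp = 0, stemp = 1.
inline bool column_rotation_demo() {
  double data[4] = {1.0, 2.0, 3.0, 4.0};  // columns {1,2} and {3,4}
  arr_ref_sketch<double> a{data, {{2, 2}, {1, 1}}};
  rotate_adjacent_columns(a, 1, 0.0, 1.0);
  const bool ok = a(1, 1) == 3.0 && a(2, 1) == 4.0 &&
                  a(1, 2) == -1.0 && a(2, 2) == -2.0;
  assert(ok);
  return ok;
}
```

Hoisting the pointers also removes the repeated `index_1d` multiplication from the inner loop, which may matter independently of aliasing.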




- Original Message 
From: Andrew Pinski 
To: Ralf W. Grosse-Kunstleve 
Cc: gcc@gcc.gnu.org
Sent: Tue, August 10, 2010 8:47:18 PM
Subject: Re: food for optimizer developers

On Tue, Aug 10, 2010 at 6:51 PM, Ralf W. Grosse-Kunstleve
 wrote:
> I wrote a Fortran to C++ conversion program that I used to convert selected
> LAPACK sources. Comparing runtimes with different compilers I get:
>
>                         absolute  relative
> ifort 11.1.072            1.790s    1.00
> gfortran 4.4.4            2.470s    1.38
> g++ 4.4.4                 2.922s    1.63

I wonder if adding __restrict to some of the arguments of the
functions will help.  Fortran aliasing is so different from C
aliasing.

-- Pinski



Re: food for optimizer developers

2010-08-10 Thread Tim Prince

On 8/10/2010 9:21 PM, Ralf W. Grosse-Kunstleve wrote:

> Most of the time is spent in this function...
>
> void
> dlasr(
>   str_cref side,
>   str_cref pivot,
>   str_cref direct,
>   int const& m,
>   int const& n,
>   arr_cref c,
>   arr_cref s,
>   arr_ref a,
>   int const& lda)
>
> in this loop:
>
> FEM_DOSTEP(j, n - 1, 1, -1) {
>   ctemp = c(j);
>   stemp = s(j);
>   if ((ctemp != one) || (stemp != zero)) {
>     FEM_DO(i, 1, m) {
>       temp = a(i, j + 1);
>       a(i, j + 1) = ctemp * temp - stemp * a(i, j);
>       a(i, j) = stemp * temp + ctemp * a(i, j);
>     }
>   }
> }
>
> a(i, j) is implemented as
>
>   T* elems_; // member
>
> T const&
> operator()(
>   ssize_t i1,
>   ssize_t i2) const
> {
>   return elems_[dims_.index_1d(i1, i2)];
> }
>
> with
>
>   ssize_t all[Ndims]; // member
>   ssize_t origin[Ndims]; // member
>
> size_t
> index_1d(
>   ssize_t i1,
>   ssize_t i2) const
> {
>   return
>       (i2 - origin[1]) * all[0]
>     + (i1 - origin[0]);
> }
>
> The array pointer is buried as elems_ member in the arr_ref<> class template.
> How can I apply __restrict in this case?

   
Do you mean you are adding an additional level of functions and hoping
for efficient in-lining?  Your programming style is elusive, and your
insistence on top posting will make this thread difficult to deal with.
The conditional inside the loop is likely even more difficult for C++ to
optimize than for Fortran.  As already discussed, if you don't optimize
otherwise, you will need __restrict to overcome aliasing concerns among
a, c, and s.  If you want efficient C++, you will need a lot of hand
optimization, and verification of the effect of each level of obscurity
which you add.  How is this topic appropriate to the gcc mailing list?


--
Tim Prince