Re: Compile time of Expression Templates in C++

2006-10-13 Thread Richard Guenther

On 10/10/06, Jochen Haerdtlein
<[EMAIL PROTECTED]> wrote:

Hello,

I am a PhD student working on the extended use of
expression templates for solving partial differential equations.
Thus, I have studied a lot of expression template papers,
articles and implementations. Actually,
I have some ideas for improving them for high-performance
platforms.

   Since the compile time of huge expression template programs is
a severe problem, I am interested in the mechanisms within g++
that are necessary to resolve the nested template constructs.
Unfortunately, it is pretty hard (for me) to find any hints on that. Is
there any documentation concerning the policies within gcc for how
templates are resolved, instantiated, etc.?
Where is the time spent during compilation? I have already tried
the -ftime-report option and got some information. According to that,
about one third of the time is taken by the parser. But I would like
to understand it in more detail, if possible.

First of all, I would be happy about any reaction, even one just telling me
whether my interests are complete nonsense or unsolvable.
Maybe you can give me some hints on where I can find some
answers, or somebody I can annoy.


You may want to look at PR29433.

Richard.


Re: Getting type used in THROW_EXPR

2006-10-14 Thread Richard Guenther

On 10/14/06, Brendon Costa <[EMAIL PROTECTED]> wrote:

Hi all,

I have yet another question that has arisen as I have started testing my
code. Basically, I am trying to get the type that is being used in
throwing an exception.


Is there a simple macro I can use to get the type of an exception from a
THROW_EXPR? I think this is a matter of getting the TREE_TYPE of the
value passed into the function except.c:build_throw (tree exp).


If you look at cp/cp-tree.def you will see

/* A throw expression.  operand 0 is the expression, if there was one,
  else it is NULL_TREE.  */
DEFTREECODE (THROW_EXPR, "throw_expr", tcc_expression, 1)

which means that TREE_OPERAND (t, 0) is the expression thrown.  Depending
on whether that is already a reference or not, you need to create a
reference on your own using build1 (ADDR_EXPR, ...) with a properly
constructed reference type (I guess there's some helper for that in the
C++ frontend).
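
A minimal sketch of that first lookup, assuming a GCC-internal context where
the C++ frontend tree macros are available (the helper name is made up):

/* Hypothetical helper: return the static type of the thrown value,
   or NULL_TREE for a bare "throw;" rethrow.  */
static tree
thrown_expr_type (tree t)
{
  tree expr;
  if (TREE_CODE (t) != THROW_EXPR)
    return NULL_TREE;
  expr = TREE_OPERAND (t, 0);  /* NULL_TREE if there was no operand */
  return expr ? TREE_TYPE (expr) : NULL_TREE;
}

Whether the result then needs the extra reference wrapping described above
depends on what you want to compare it against.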

Richard.


Re: is this a good time to commit a patch on builtin_function?

2006-10-23 Thread Richard Guenther

On 10/23/06, Rafael Espíndola <[EMAIL PROTECTED]> wrote:

I have an approved  patch that factors code that is common to all
builtin_function implementations
(http://gcc.gnu.org/ml/gcc-patches/2006-03/msg00195.html,
http://gcc.gnu.org/ml/gcc-patches/2006-06/msg01499.html).

I have just updated and tested it. Is this a good time to commit?


Usually such changes disturb branches and are more suitable for late
stage2.  But in this case I don't see why that should be a problem.

Richard.


Re: memory benchmark of tuples branch

2006-10-27 Thread Richard Guenther

On 10/27/06, Aldy Hernandez <[EMAIL PROTECTED]> wrote:

> My vote is to merge into mainline sooner rather than later.  However, it
> is a big patch and affects just about every module in the compiler, so I
> wouldn't want to barge in without getting some consensus first.

I agree with you and Mark.

What I'd like to do next is:

1. Merge mainline into the branch to make sure there are no snafus.
2. Post the patch so folks can look at it while I do 3 & 4:

3. Attack the other front ends.  So far I have only dealt with C/C++,
   but the FE-only changes are usually minimal.
4. Adjust the 10 or so back ends that make MODIFY_EXPR nodes.
5. Merge into mainline.

And then... start working on the other tuples.

How does this sound to y'all?


Does the tuples branch include the CALL_EXPR reworking from the LTO branch?

Richard.


Re: compiling very large functions.

2006-11-04 Thread Richard Guenther

On 11/4/06, Kenneth Zadeck <[EMAIL PROTECTED]> wrote:

I think that it is time that we in the GCC community took some time to
address the problem of compiling very large functions in a somewhat
systematic manner.

GCC has two competing interests here:  it needs to be able to provide
state of the art optimization for modest sized functions and it needs to
be able to properly process very large machine generated functions using
reasonable resources.

I believe that the default behavior for the compiler should be that
certain non essential passes be skipped if a very large function is
encountered.

There are two problems here:

1) defining the set of optimizations that need to be skipped.
2) defining the set of functions that trigger the special processing.


For (1) I would propose that three measures be made of each function.
These measures should be made before inlining occurs. The three measures
are the number of variables, the number of statements, and the number of
basic blocks.


Why before inlining?  These three numbers can change quite significantly
as a function passes through the pass pipeline.  So we should try to keep
them up-to-date to have an accurate measurement.

Otherwise the proposal sounds reasonable, but we should make sure the
limits we impose allow reproducible compilations for N x M cross
configurations and native compilation on differently sized machines.

Richard.


Re: compiling very large functions.

2006-11-04 Thread Richard Guenther

On 11/4/06, Kenneth Zadeck <[EMAIL PROTECTED]> wrote:

Richard Guenther wrote:
> On 11/4/06, Kenneth Zadeck <[EMAIL PROTECTED]> wrote:
>> I think that it is time that we in the GCC community took some time to
>> address the problem of compiling very large functions in a somewhat
>> systematic manner.
>>
>> GCC has two competing interests here:  it needs to be able to provide
>> state of the art optimization for modest sized functions and it needs to
>> be able to properly process very large machine generated functions using
>> reasonable resources.
>>
>> I believe that the default behavior for the compiler should be that
>> certain non essential passes be skipped if a very large function is
>> encountered.
>>
>> There are two problems here:
>>
>> 1) defining the set of optimizations that need to be skipped.
>> 2) defining the set of functions that trigger the special processing.
>>
>>
>> For (1) I would propose that three measures be made of each function.
>> These measures should be made before inlining occurs. The three measures
>> are the number of variables, the number of statements, and the number of
>> basic blocks.
>
> Why before inlining?  These three numbers can change quite significantly
> as a function passes through the pass pipeline.  So we should try to keep
> them up-to-date to have an accurate measurement.
>
I am flexible here. We may want inlining to be able to update the
numbers.  However, I think that we should drive the inlining aggressiveness
based on these numbers.


Well, for example jump threading and tail duplication can cause these
numbers to change significantly.  Also CFG instrumentation and PRE
can increase the BB count.  So we need to deal with cases where an
optimization produces an overly large number of basic blocks or instructions
(for example by throttling those passes properly).

Richard.


Re: compiling very large functions.

2006-11-05 Thread Richard Guenther

On 11/4/06, Kenneth Zadeck <[EMAIL PROTECTED]> wrote:

Richard Guenther wrote:
> On 11/4/06, Kenneth Zadeck <[EMAIL PROTECTED]> wrote:
>> Richard Guenther wrote:
>> > On 11/4/06, Kenneth Zadeck <[EMAIL PROTECTED]> wrote:
>> >> I think that it is time that we in the GCC community took some
>> time to
>> >> address the problem of compiling very large functions in a somewhat
>> >> systematic manner.
>> >>
>> >> GCC has two competing interests here:  it needs to be able to provide
>> >> state of the art optimization for modest sized functions and it
>> needs to
>> >> be able to properly process very large machine generated functions
>> using
>> >> reasonable resources.
>> >>
>> >> I believe that the default behavior for the compiler should be that
>> >> certain non essential passes be skipped if a very large function is
>> >> encountered.
>> >>
>> >> There are two problems here:
>> >>
>> >> 1) defining the set of optimizations that need to be skipped.
>> >> 2) defining the set of functions that trigger the special processing.
>> >>
>> >>
>> >> For (1) I would propose that three measures be made of each function.
>> >> These measures should be made before inlining occurs. The three
>> measures
>> >> are the number of variables, the number of statements, and the
>> number of
>> >> basic blocks.
>> >
>> > Why before inlining?  These three numbers can change quite
>> significantly
>> > as a function passes through the pass pipeline.  So we should try
>> to keep
>> > them up-to-date to have an accurate measurement.
>> >
>> I am flexible here. We may want inlining to be able to update the
>> numbers.  However, I think that we should drive the inlining aggressiveness
>> based on these numbers.
>
> Well, for example jump threading and tail duplication can cause these
> numbers to change significantly.  Also CFG instrumentation and PRE
> can increase the BB count.  So we need to deal with cases where an
> optimization produces an overly large number of basic blocks or
> instructions
> (for example by throttling those passes properly).
>
I lean toward leaving the numbers static even if they do increase as time goes
by.  Otherwise you get two effects: the first optimizations get to be
run more, and you get weird non-linear step functions where small
changes in some upstream function affect the downstream ones.


Ok, I guess we can easily flag each function as having
- many BBs
- big BBs
- complex CFG (many edges)
and set these flags at CFG construction time during the lowering phase
(which is after the early inlining pass I believe).

The number of basic blocks is kept up-to-date during optimization, the
other numbers would need to be re-generated if we want to keep them
up-to-date.

But with just using three (or even only one?) flag, we can easily fit this
information in struct function.
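
A rough sketch of what such flags and a gate could look like (these names are
made up for illustration, not existing GCC code):

/* Hypothetical size flags, set once when the CFG is built.  */
struct function_size_flags
{
  unsigned many_blocks : 1;  /* basic block count above some --param */
  unsigned big_blocks : 1;   /* unusually long basic blocks */
  unsigned complex_cfg : 1;  /* many edges relative to blocks */
};

/* Hypothetical gate: an expensive pass simply declines to run.  */
static int
gate_expensive_pass (struct function_size_flags flags)
{
  return !flags.many_blocks && !flags.complex_cfg;
}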

I also like the idea of a "hot" function flag to be able to dynamically
switch between optimize_size and !optimize_size in the tree optimizers.
Profile based inlining already tries to follow that route.

Richard.


Re: strict aliasing question

2006-11-10 Thread Richard Guenther

On 11/10/06, Howard Chu <[EMAIL PROTECTED]> wrote:

I see a lot of APIs (e.g. Cyrus SASL) that have accessor functions
returning values through a void ** argument. As far as I can tell, this
doesn't actually cause any problems, but gcc 4.1 with -Wstrict-aliasing
will complain. For example, take these two separate source files:

alias1.c


#include <stdio.h>

extern void getit( void **arg );

int main() {
    int *foo;

    getit( (void **)&foo);
    printf("foo: %x\n", *foo);
    return 0;
}



alias2.c

static short x[] = {16,16};

void getit( void **arg ) {
    *arg = x;
}


gcc -O3 -fstrict-aliasing -Wstrict-aliasing *.c -o alias

The program prints the expected result with both strict-aliasing and
no-strict-aliasing on my x86_64 box.  As such, when/why would I need to
worry about  this warning?


If you compile with -O3 -combine *.c -o alias it will break.
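
For what it's worth, one common way to keep such an interface free of the
warning is sketched below (reusing the getit() interface from the example):
pass an actual void * object and convert afterwards, instead of casting
&foo, which is what -Wstrict-aliasing complains about.

#include <stdio.h>

extern void getit( void **arg );

int main() {
    void *tmp;
    int *foo;

    getit( &tmp );   /* a real void * object, no type-punned cast */
    foo = tmp;       /* implicit conversion from void * is fine */
    printf("foo: %x\n", *foo);
    return 0;
}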

Richard.


Re: Polyhedron performance regression

2006-11-11 Thread Richard Guenther

On 11/11/06, FX Coudert <[EMAIL PROTECTED]> wrote:

> Just wanted to note to the list that Tobias spotted a performance
> regression on Polyhedron ac.
>
> http://www.suse.de/~gcctest/c++bench/polyhedron/polyhedron-summary.txt-2-0.html

Hum, the performance change on ac is significant. Is there any way we can get
the revision numbers before and after the jump (and before the last
jump to zero)? What patches have been committed in that time frame that
could affect this? I can't see anything among the fortran patches...


If I had to guess I would say it was the forwprop merge.  But I didn't
investigate.

Richard.


Re: Polyhedron performance regression

2006-11-11 Thread Richard Guenther
On Sat, 11 Nov 2006, FX Coudert wrote:

> >Just wanted to note to the list that Tobias spotted a performance regression
> >on Polyhedron ac.
> >
> >http://www.suse.de/~gcctest/c++bench/polyhedron/polyhedron-summary.txt-2-0.html
> 
> Hum, the performance change on ac is significant. Is there any way we can get the
> revision numbers before and after the jump (and before the last jump to
> zero)? What patches have been committed in that time frame that could affect this? I
> can't see anything among the fortran patches...

Must have been between r118372 and r118615.

Richard.


Re: -funsafe-math-optimizations and -fno-rounding-math

2006-11-11 Thread Richard Guenther

On 11/11/06, Revital1 Eres <[EMAIL PROTECTED]> wrote:


Hello,

-fno-rounding-math enables the transformation of (-(X - Y)) -> (Y - X)
in simplify-rtx.c, which seems to be the same transformation
that is enabled by -funsafe-math-optimizations in fold-const.c.

If I understand correctly, -frounding-math means that the rounding mode is
important.
In that case, should there be a correlation between
-funsafe-math-optimizations
and -fno-rounding-math (which currently does not exist)?


I think the simplify-rtx.c code is partly wrong, as it changes behavior
with signed zeros.  I don't know off-hand whether -(X - Y) and Y - X behave
the same in rounding if the rounding mode is round-to-nearest, but
certainly for round-to-+Inf they will differ.  So HONOR_SIGNED_ZEROS (mode)
&& !flag_rounding_math might be the correct predicate here (and in
the fold-const.c case).

But floating point rounding scares me ;)
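
A minimal illustration of the signed-zero half of that concern, assuming IEEE
semantics and the default round-to-nearest mode: with X == Y the two forms
produce zeros of opposite sign.

#include <stdio.h>

int main() {
    volatile double x = 1.0, y = 1.0;
    double a = -(x - y);   /* -(+0.0) == -0.0 */
    double b = y - x;      /*           +0.0 */

    printf("%g %g\n", a, b);              /* -0 0 */
    printf("%g %g\n", 1.0 / a, 1.0 / b);  /* -inf inf: the zeros differ */
    return 0;
}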

Richard.


Re: Polyhedron performance regression

2006-11-11 Thread Richard Guenther
On Sat, 11 Nov 2006, Paul Thomas wrote:

> Steven and Jerry,
> >
> >
> > If someone can confirm that this patch causes the drop, I can help
> > trying to find a fix.
> >
> amd64/Cygwin_NT
> 
> $ /irun/bin/gfortran -O3 -funroll-loops -ffast-math -march=opteron ac.f90
> 
> 118372 20.2s
> 118475 20.4s  Bonzini's patch
> 118704 16.2s
> 
> I believe that the improvement is FX's and my patch for MOD.

Note that the suse x86_64 tester shows 18.6s for the last run before
Bonzini's patch and 30.8s for the first run after it, so it regressed quite
badly.  It also uses -ftree-vectorize as an additional option here.

> 
> Notice that this is a single core machine and that there is a PR out on the
> vectorizer.  Could this be the problem, since the suse tests are done on a two
> core machine, if I understood correctly?

Yes, this is a dual-socket machine.  But I don't see how that could be
an issue here.

Richard.

--
Richard Guenther <[EMAIL PROTECTED]>
Novell / SUSE Labs


Re: Volatile operations and PRE

2006-11-15 Thread Richard Guenther

On 08 Nov 2006 08:07:50 -0800, Ian Lance Taylor <[EMAIL PROTECTED]> wrote:

Andrew Haley <[EMAIL PROTECTED]> writes:

>  > 2006-11-07  Paolo Bonzini  <[EMAIL PROTECTED]>
>  >
>  >  * gimplify.c (fold_indirect_ref_rhs): Use
>  >  STRIP_USELESS_TYPE_CONVERSION rather than STRIP_NOPS.
>
> Regtested x86-64-gnu-linux.  The only interesting failure was
> mayalias-2.c, but that also fails before the patch.

This is OK for active branches.


I have committed this to mainline and will continue with the branches
after testing.

Richard.


Re: Zdenek Dvorak and Daniel Berlin appointed loop optimizer maintainers

2006-11-16 Thread Richard Guenther

On 11/16/06, Andrew Haley <[EMAIL PROTECTED]> wrote:

Zdenek Dvorak writes:
 > Hello,
 >
 > >I am pleased to announce that the GCC Steering Committee has
 > > appointed Zdenek Dvorak and Daniel Berlin as non-algorithmic maintainers
 > > of the RTL and Tree loop optimizer infrastructure in GCC.
 > >
 > >Please join me in congratulating Zdenek and Daniel on their new
 > > role.  Zdenek and Daniel, please update your listings in the MAINTAINERS
 > > file.
 >
 > done.  I am not sure whether it would be useful to indicate
 > non-algorithmicness somehow in the MAINTAINERS file?

"non-algorithmicity", I suspect.  :-)


I would rather open a new section if this idiom is supposed to spread more ;-)

Richard.


Re: Volatile operations and PRE

2006-11-19 Thread Richard Guenther

On 11/15/06, Richard Guenther <[EMAIL PROTECTED]> wrote:

On 08 Nov 2006 08:07:50 -0800, Ian Lance Taylor <[EMAIL PROTECTED]> wrote:
> Andrew Haley <[EMAIL PROTECTED]> writes:
>
> >  > 2006-11-07  Paolo Bonzini  <[EMAIL PROTECTED]>
> >  >
> >  >  * gimplify.c (fold_indirect_ref_rhs): Use
> >  >  STRIP_USELESS_TYPE_CONVERSION rather than STRIP_NOPS.
> >
> > Regtested x86-64-gnu-linux.  The only interesting failure was
> > mayalias-2.c, but that also fails before the patch.
>
> This is OK for active branches.

I have committed this to mainline and will continue with the branches
after testing.


Done.

Richard.


Re: SPEC CFP2000 and polyhedron runtime scores dropped from 13. november onwards

2006-12-01 Thread Richard Guenther

On 12/1/06, Uros Bizjak <[EMAIL PROTECTED]> wrote:

Hello!

At least on x86_64 and i686, the SPEC scores [1] and polyhedron scores [2]
dropped noticeably. For the SPEC benchmarks, the mgrid, galgel, ammp and
sixtrack tests are affected, and for polyhedron, ac (second regression
in the peak) and protein (?) regressed in that time frame.

[1] http://www.suse.de/~aj/SPEC/amd64/CFP/summary-britten/recent.html
[2] http://www.suse.de/~gcctest/c++bench/polyhedron/polyhedron-summary.txt-2-0.html

Does anybody have any idea what is going on there?


It correlates with the PPRE introduction (enabled at -O3 only) which might
increase register pressure, but also improves Polyhedron rnflow a lot.

Richard.


Re: [PATCH]: Require MPFR 2.2.1

2006-12-04 Thread Richard Guenther

On 12/3/06, Kaveh R. GHAZI <[EMAIL PROTECTED]> wrote:

This patch updates configure to require MPFR 2.2.1 as promised here:
http://gcc.gnu.org/ml/gcc/2006-12/msg00054.html

Tested on sparc-sun-solaris2.10 using mpfr-2.2.1, mpfr-2.2.0 and an older
mpfr included with gmp-4.1.4.  Only 2.2.1 passed (as expected).

I'd like to give everyone enough time to update their personal
installations and regression testers before installing this.  Does one
week sound okay?  If there are no objections, that's what I'd like to do.


Please don't.  It'll be a hassle for us again and will cause automatic testers
to again miss some days or weeks during stage1 (given the Christmas
holiday season is near).
Please defer this to the start of stage3 instead.

Thanks,
Richard.


Re: Gfortran and using C99 cbrt for X ** (1./3.)

2006-12-04 Thread Richard Guenther

On 12/3/06, Toon Moene <[EMAIL PROTECTED]> wrote:

Richard,

Somewhere, in a mail lost in my large e-mail clash with my ISP
(verizon), you said that gfortran couldn't profit from the pow(x, 1./3.)
-> cbrt(x) conversion because gfortran didn't "know" of cbrt.

Could you be more explicit about this - I'd like to repair this
deficiency, if at all possible.

Thanks in advance !


It's a matter of making the cbrt builtin available - I have a patch for this,
but wondered whether the Fortran frontend can rely on the cbrt library call
being available, or at least available in a fast variant rather than a fallback
implementation in libgfortran that does pow (x, 1./3.), which would then of
course pessimize the
pow (x, 2./3.) -> tmp = cbrt(x); tmp * tmp  expansion.
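
As a concrete sketch of that intended expansion (assuming a C99 cbrt is
available; for negative x the two forms are not equivalent, see the
follow-ups below):

#include <math.h>

double two_thirds_power (double x)
{
    double t = cbrt (x);  /* x**(1/3) */
    return t * t;         /* (x**(1/3))**2 == x**(2/3) for x >= 0 */
}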

Richard.


Re: Gfortran and using C99 cbrt for X ** (1./3.)

2006-12-04 Thread Richard Guenther

On 12/4/06, Howard Hinnant <[EMAIL PROTECTED]> wrote:

On Dec 4, 2006, at 11:27 AM, Richard Guenther wrote:

> On 12/3/06, Toon Moene <[EMAIL PROTECTED]> wrote:
>> Richard,
>>
>> Somewhere, in a mail lost in my large e-mail clash with my ISP
>> (verizon), you said that gfortran couldn't profit from the pow(x,
>> 1./3.)
>> -> cbrt(x) conversion because gfortran didn't "know" of cbrt.
>>
>> Could you be more explicit about this - I'd like to repair this
>> deficiency, if at all possible.
>>
>> Thanks in advance !
>
> It's a matter of making the cbrt builtin available - I have a patch
> for this,
> but wondered if the fortran frontend can rely on the cbrt library
> call being
> available?  Or available in a fast variant, not a fallback
> implementation in
> libgfortran which does pow (x, 1./3.) which will then of course
> pessimize
> pow (x, 2./3.) -> tmp = cbrt(x); tmp * tmp  expansion.

Is pow(x, 1./3.) == cbrt(x) ?

My handheld calculator says (imagining a 3 decimal digit machine):

pow(64.0, .333) == 3.99

In other words, can pow assume that if it sees .333, the client
actually meant the non-representable 1/3?  Or must pow assume that
.333 means .333?


It certainly will _not_ recognize 0.333 as 1/3 (or 0.33 as used in
Polyhedron aermod).  Instead it will require the exponent to be
exactly equal to the result of 1./3. in the precision of the exponent,
with correct rounding.


My inclination is that if pow(x, 1./3.) (computed
correctly to the last bit) ever differs from cbrt(x) (computed
correctly to the last bit) then this substitution should not be done.


C99 7.12.7.1 says "The cbrt functions compute the real cube root
of x." and "The cbrt functions return x**1/3".  So it looks to me
that cbrt is required to return the same as pow(x, 1/3).

Richard.


Re: Richard Guenther appointed middle-end maintainer

2006-12-04 Thread Richard Guenther

On 12/4/06, David Edelsohn <[EMAIL PROTECTED]> wrote:

I am pleased to announce that the GCC Steering Committee has
appointed Richard Guenther as non-algorithmic middle-end maintainer.

Please join me in congratulating Richi on his new role.  Richi,
please update your listings in the MAINTAINERS file.


Thanks!  Any objections to add a new section in the MAINTAINERS file
like below?

Richard.

2006-12-04  Richard Guenther  <[EMAIL PROTECTED]>

   * MAINTAINERS (Non-Algorithmic Maintainers): New section.
   (Non-Algorithmic Maintainers): Move over non-algorithmic
   loop optimizer maintainers, add myself as a non-algorithmic
   middle-end maintainer.




Re: Richard Guenther appointed middle-end maintainer

2006-12-04 Thread Richard Guenther

On 12/4/06, Richard Guenther <[EMAIL PROTECTED]> wrote:

On 12/4/06, David Edelsohn <[EMAIL PROTECTED]> wrote:
> I am pleased to announce that the GCC Steering Committee has
> appointed Richard Guenther as non-algorithmic middle-end maintainer.
>
> Please join me in congratulating Richi on his new role.  Richi,
> please update your listings in the MAINTAINERS file.

Thanks!  Any objections to add a new section in the MAINTAINERS file
like below?


Committed after "ok" on IRC.

Richard.


2006-12-04  Richard Guenther  <[EMAIL PROTECTED]>

* MAINTAINERS (Non-Algorithmic Maintainers): New section.
(Non-Algorithmic Maintainers): Move over non-algorithmic
loop optimizer maintainers, add myself as a non-algorithmic
middle-end maintainer.





Re: Gfortran and using C99 cbrt for X ** (1./3.)

2006-12-04 Thread Richard Guenther

On 12/4/06, Howard Hinnant <[EMAIL PROTECTED]> wrote:

On Dec 4, 2006, at 4:57 PM, Richard Guenther wrote:

>
>> My inclination is that if pow(x, 1./3.) (computed
>> correctly to the last bit) ever differs from cbrt(x) (computed
>> correctly to the last bit) then this substitution should not be done.
>
> C99 7.12.7.1 says "The cbrt functions compute the real cube root
> of x." and "The cbrt functions return x**1/3".  So it looks to me
> that cbrt is required to return the same as pow(x, 1/3).

 For me, this:

#include <stdio.h>
#include <math.h>

int main()
{
    printf("pow(1000000., 1./3.) = %a\ncbrt(1000000.)       = %a\n",
           pow(1000000., 1./3.), cbrt(1000000.));
}

prints out:

pow(1000000., 1./3.) = 0x1.8fffep+6
cbrt(1000000.)       = 0x1.9p+6

I suspect that both are correct, rounded to the nearest least
significant bit.  Admittedly I haven't checked the computation by
hand for pow(100., 1./3.).  But I did the computation using gcc
4.0 on both PPC and Intel Mac hardware, and on PPC using CodeWarrior
math libs, and got the same results on all three platforms.

The pow function is not raising 1,000,000 to the power of 1/3.  It is
raising 1,000,000 to some power which is very close to, but not equal
to 1/3.

Perhaps I've misunderstood and you have some way of exactly
representing the fraction 1/3 in Gfortran.  In C and C++ we have no
way to exactly represent that fraction except implicitly using cbrt.
(or with a user-defined rational type)


1./3. is represented (round-to-nearest) as 0x1.5555555555555p-2,
and pow (x, 1./3.) is of course not the same as the cube root of x with
exact arithmetic.  cbrt as defined by C99 suggests that an
approximation by pow (x, 1./3.) fulfils the requirements.  The question
is whether a correctly rounded "exact" cbrt differs from the pow
replacement by more than 1 ulp - it looks like this is not the case.

For cbrt (1000000.) I get the same result as from pow (1000000., nextafter (1./3., 1)),
for example.  So, instead of only recognizing 1./3. rounded to nearest,
we should recognize both representable numbers that are nearest to
1./3.
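
A sketch of that recognition test, assuming C99 nextafter is available (the
helper name is made up):

#include <math.h>
#include <stdbool.h>

/* Return true if the exponent c is one of the two doubles closest
   to the real number 1/3.  */
static bool is_one_third (double c)
{
    return c == 1.0 / 3.0 || c == nextafter (1.0 / 3.0, 1.0);
}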

Richard.


Re: Gfortran and using C99 cbrt for X ** (1./3.)

2006-12-04 Thread Richard Guenther

On 12/5/06, Joseph S. Myers <[EMAIL PROTECTED]> wrote:

On Tue, 5 Dec 2006, Richard Guenther wrote:

> is whether a correctly rounded "exact" cbrt differs from the pow
> replacement by more than 1ulp - it looks like this is not the case.

They fairly obviously differ for negative arguments, which are valid for
cbrt but not for pow (raising to a fraction with even denominator).  (The
optimization from pow to cbrt is valid if you don't care about no longer
getting a NaN from a negative argument.  Converting the other way (cbrt to
pow) is only OK if you don't care about negative arguments to cbrt at
all.)


True, F.9.4.4 says "pow (x, y) returns a NaN and raises the invalid
floating-point exception for finite x < 0 and finite non-integer y."
I'll adjust the expander to cover these cases by conditionalizing on
tree_nonnegative_p or HONOR_NANS.  I will probably also require
flag_unsafe_math_optimizations, as we would otherwise optimize
cbrt (x) - pow (x, 1./3.) to zero, or even
pow (x, 1./3.) - pow (x, nextafter (1./3, 1)).
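
A small standalone illustration of that negative-argument difference, relying
only on the C99 semantics of cbrt and pow:

#include <stdio.h>
#include <math.h>

int main()
{
    /* cbrt is defined for negative arguments; pow with a non-integer
       exponent is not.  */
    printf("cbrt(-8.0)       = %g\n", cbrt(-8.0));            /* -2 */
    printf("pow(-8.0, 1./3.) = %g\n", pow(-8.0, 1.0 / 3.0));  /* nan */
    return 0;
}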

Richard.


Re: Announce: MPFR 2.2.1 is released

2006-12-05 Thread Richard Guenther

On 05 Dec 2006 07:16:04 -0800, Ian Lance Taylor <[EMAIL PROTECTED]> wrote:

Paul Brook <[EMAIL PROTECTED]> writes:

> > This all may just be a shakedown problem with MPFR, and maybe it will
> > stabilize shortly.  But it's disturbing that after one undistributed
> > version became a requirement, we then very shortly stepped up to a new
> > undistributed version.  I think it should be obvious that if we
> > require an external library which is not in the tree, we should not be
> > at the bleeding edge of that library.
>
> I thought we were going from an unreleased version to the subsequent release.

As far as I know both versions are released.  What I said was
"undistributed," by which I mean: the required version of MPFR is not
on my relatively up to date Fedora system.


It also missed the openSUSE 10.2 schedule (which has the old version
with all patches).  So I don't like rejecting the old version at any point.

Richard.


Re: Gfortran and using C99 cbrt for X ** (1./3.)

2006-12-05 Thread Richard Guenther

On 12/5/06, Toon Moene <[EMAIL PROTECTED]> wrote:

Richard Guenther wrote:

> It's a matter of making the cbrt builtin available - I have a patch for
> this, but wondered if the fortran frontend can rely on the cbrt library call
> being available?  Or available in a fast variant, not a fallback
> implementation in libgfortran which does pow (x, 1./3.) which will then of
> course pessimize
> pow (x, 2./3.) -> tmp = cbrt(x); tmp * tmp  expansion.

Couldn't libgfortran just simply borrow, errr, include the glibc version
?  That one seems simple and fast enough.


Does libgfortran have a cbrt now?  The frontend could make the
builtin available conditional on TARGET_C99_FUNCTIONS, this
way we won't need a fallback.


OTOH, I somehow assumed this expansion was protected by
-funsafe-math-optimizations, but further testing showed me that it is not.


Yes, that was a mistake.  I posted a patch for this:
http://gcc.gnu.org/ml/gcc-patches/2006-12/msg00320.html


This wouldn't scare me one bit (I'm used to the phrase "a processor
dependent approximation to the value of", which is used *a lot* in the
Fortran Standard), but some people might think this to be too valiant.


Yes, I guess so ;)

Richard.


Re: Gfortran and using C99 cbrt for X ** (1./3.)

2006-12-05 Thread Richard Guenther

On 12/5/06, Toon Moene <[EMAIL PROTECTED]> wrote:

Toon Moene wrote:

> Toon Moene wrote:
>
>> Richard Guenther wrote:
>>
>>> It's a matter of making the cbrt builtin available - I have a patch
>>> for this,
>
> Oh, BTW, my own version of this patch (plus your work in the area of
> sqrt) had the following effect on a profile breakdown

The speed up is around 5 %.


Is this because of cbrt or a combined effect?  Can you measure the cbrt
effect in isolation?

Thanks,
Richard.


Re: compile time testsuite [Re: alias slowdown?]

2006-12-07 Thread Richard Guenther

On 12/3/06, Gerald Pfeifer <[EMAIL PROTECTED]> wrote:

Hi Richie,

On Tue, 21 Nov 2006, Richard Guenther wrote:
> Public monitoring would be more useful.  If you have working single-file
> testcases that you want be monitored for compile-time and memory-usage
> just contact me and I can add them to the daily tester
> (http://www.suse.de/~gcctest/c++bench/).

that's a neat one.  Would you mind adding this to the list of performance
testers at

  http://gcc.gnu.org/benchmarks/

?  That would be nice.


Done with the attached patch.

Richard.




Re: Serious SPEC CPU 2006 FP performance regressions on IA32

2006-12-13 Thread Richard Guenther

On 12/13/06, Grigory Zagorodnev <[EMAIL PROTECTED]> wrote:

Meissner, Michael wrote:
>>> 437.leslie3d  -26%
> it was felt that the PPRE patches that were added on November 13th were
> the cause of the slowdown:
> http://gcc.gnu.org/ml/gcc/2006-12/msg00023.html
>
> Has anybody tried doing a run with just ppre disabled?
>

Right. PPRE appears to be the reason for the slowdown.

-fno-tree-pre gets the performance of cpu2006/437.leslie3d back to normal.
This is the worst case. It will take much longer to verify the whole
set of cpu2006 benchmarks.


It would be sooo nice to have a (small) testcase that shows why we are
regressing.

Thanks!
Richard.


Re: Serious SPEC CPU 2006 FP performance regressions on IA32

2006-12-13 Thread Richard Guenther

On 12/13/06, Menezes, Evandro <[EMAIL PROTECTED]> wrote:

> Meissner, Michael wrote:
> >>> 437.leslie3d  -26%
> > it was felt that the PPRE patches that were added on
> November 13th were
> > the cause of the slowdown:
> > http://gcc.gnu.org/ml/gcc/2006-12/msg00023.html
> >
> > Has anybody tried doing a run with just ppre disabled?
>
> Right. PPRE appears to be the reason of slowdown.
>
> -fno-tree-pre gets performance of cpu2006/437.leslie3d back to normal.
> This is the worst case. And that will take much longer to
> verify whole
> set of cpu2006 benchmarks.

If -fno-tree-pre disables PPRE, then it doesn't change much (4.3 relative to 
4.2 with -O2):

CPU2006         -O2     -O2 -fno-tree-pre
410.bwaves      -6%     -8%
416.gamess
433.milc
434.zeusmp
435.gromacs
436.cactusADM
437.leslie3d    -26%    -27%
444.namd
447.dealII
450.soplex
453.povray
454.calculix
459.GemsFDTD    -12%    -12%
465.tonto
470.lbm
481.wrf
482.sphinx3

Is PPRE enabled at -O2 at all?  I couldn't confirm that from the original 
patches, which enabled PPRE only at -O3.


PPRE is only enabled at -O3.

Richard.


Re: libjvm.la and problems with libtool relink mode

2006-12-14 Thread Richard Guenther

On 12/14/06, Mark Shinwell <[EMAIL PROTECTED]> wrote:

I am currently involved in building GCC toolchains by configuring them with
the prefix that is to be used on the target system (somewhere in /opt)
and then installing them in a separate "working installation" directory
(say somewhere in my scratch space).  The installation step into this
"working installation" directory is performed by invoking a command of
the form "make prefix=/path/to/working/dir/install install".


I think you should use DESTDIR instead of prefix here.
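
For reference, the staged-install form of that step would look roughly like
this (same working directory as in the example above):

make DESTDIR=/path/to/working/dir/install install

DESTDIR prepends the staging directory only at install time, so the configured
prefix (the /opt path) is what still ends up recorded in the installed files.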

Richard.


Re: changing "configure" to default to "gcc -g -O2 -fwrapv ..."

2006-12-29 Thread Richard Guenther

On 12/29/06, Robert Dewar <[EMAIL PROTECTED]> wrote:

Daniel Berlin wrote:

> I'm sure no matter what argument i come up with, you'll just explain it away.
> The reality is the majority of our users seem to care more about
> whether they have to write "typename" in front of certain declarations
> than they do about signed integer overflow.

I have no idea how you know this, to me ten reports seems a lot for
something like this.


Not compared to the number of type-based aliasing "bugs" reported.

Richard.


Re: changing "configure" to default to "gcc -g -O2 -fwrapv ..."

2006-12-29 Thread Richard Guenther

On 12/29/06, Paul Eggert <[EMAIL PROTECTED]> wrote:

Ian Lance Taylor <[EMAIL PROTECTED]> writes:
> Does anybody think that Paul's proposed patch to autoconf would be
> better than changing VRP?

I don't.

I haven't noticed anyone else addressing this question,
which I think is a good one.


I don't think doing either is a good idea.  Authors of the affected
programs should adjust their makefiles instead - after all, the much more
often reported problems are with -fstrict-aliasing, and that one also doesn't
get any special treatment by autoconf, even though -fno-strict-aliasing -fwrapv
would be a valid, more forgiving default.  Also, as ever, -O2 is what gets
the most testing, so you are more likely to run into compiler bugs
with -fwrapv.

Richard.


Re: changing "configure" to default to "gcc -g -O2 -fwrapv ..."

2006-12-29 Thread Richard Guenther

On 12/29/06, Daniel Jacobowitz <[EMAIL PROTECTED]> wrote:

On Fri, Dec 29, 2006 at 10:44:02PM +0100, Florian Weimer wrote:
> (BTW, I would be somewhat disappointed if this had to be pampered over
> on the autoconf side.  If the GNU project needs -fwrapv for its own
> software by default, this should be reflected in the compiler's
> defaults.

I absolutely agree.  My impression is that the current situation is a
disagreement between (some of, at least) the GCC developers, and
someone who can commit to autoconf; but I think it would be a very
bad choice for the GNU project to work around itself.  If we can't
come to an agreement on the list, please ask the Steering Committee.
This is a textbook example of what they're for.


But first, please produce some data.  I only remember one case where
the program was at fault and not gcc in the transition from 4.0 to 4.1,
compiling all of SUSE Linux 10.1 (barring the problems we didn't recognize,
of course).

Richard.


Re: changing "configure" to default to "gcc -g -O2 -fwrapv ..."

2006-12-30 Thread Richard Guenther

On 12/30/06, Paul Eggert <[EMAIL PROTECTED]> wrote:


For example, GCC itself assumes wrapv semantics internally,
but according to the -fwrapv opponents GCC is obviously "at
fault" here and should be fixed, so that shouldn't count,
right?  (If that's the way the data will be collected, I
think I know how things will turn out.  :-)


Where does GCC assume wrapv semantics?
GCC assumes two's complement arithmetic in the backend - but
that's different as it's not ISO C it is generating but target machine
assembly code.

Richard.


Re: changing "configure" to default to "gcc -g -O2 -fwrapv ..."

2006-12-30 Thread Richard Guenther

On 30 Dec 2006 23:55:46 +0100, Gabriel Dos Reis
<[EMAIL PROTECTED]> wrote:

[EMAIL PROTECTED] (Richard Kenner) writes:

| > Here's an example from the intprops module of gnulib.
|
| These are interesting case.
|
| Note that all the computations are constant-folded.
|
| And I think this points the fact that we can "have our cake and eat it too"
| in many cases.  Everything we're seeing points to the fact that the cases
| where assuming undefined behavior for overflow will help optimizations are
| cases where few if any sane programmers would write the code depending on
| wrap semantics (e.g., loop variables).  And nearly all the cases where wrap
| semantics are expected (e.g., the above) are such that there's no reasonable
| optimization benefit in assuming they're undefined.
|
| Take constant folding: if we were pedantic about it, we could say, "this
| folded expression overflows and so is undefined, so let's set its value to be
| whatever constant would yield the most efficient code in that case".
|
| Such behavior would be standard-compliant, but as unfriendly as possible
| because it wouldn't "optimize" any real code, just break common idioms.
| I doubt anybody would suggest not implementing wrapping semantics in
| constant folding.

As I'm looking into the VRP and CHREC codes to implement the
-Wundefined warning, I came across this:

   /* Wrapper around int_const_binop.  If the operation overflows and we
  are not using wrapping arithmetic, then adjust the result to be
  -INF or +INF depending on CODE, VAL1 and VAL2.  */

   static inline tree
   vrp_int_const_binop (enum tree_code code, tree val1, tree val2)
   {
 /* ... */
 else if (TREE_OVERFLOW (res)
  && !TREE_OVERFLOW (val1)
  && !TREE_OVERFLOW (val2))
   {
 /* If the operation overflowed but neither VAL1 nor VAL2 are
overflown, return -INF or +INF depending on the operation
and the combination of signs of the operands.  */
 int sgn1 = tree_int_cst_sgn (val1);
 int sgn2 = tree_int_cst_sgn (val2);

 /* Notice that we only need to handle the restricted set of
operations handled by extract_range_from_binary_expr.
Among them, only multiplication, addition and subtraction
can yield overflow without overflown operands because we
are working with integral types only... except in the
case VAL1 = -INF and VAL2 = -1 which overflows to +INF
for division too.  */

 /* For multiplication, the sign of the overflow is given
by the comparison of the signs of the operands.  */
 if ((code == MULT_EXPR && sgn1 == sgn2)
 /* For addition, the operands must be of the same sign
to yield an overflow.  Its sign is therefore that
of one of the operands, for example the first.  */
 || (code == PLUS_EXPR && sgn1 > 0)
 /* For subtraction, the operands must be of different
signs to yield an overflow.  Its sign is therefore
that of the first operand or the opposite of that
of the second operand.  A first operand of 0 counts
as positive here, for the corner case 0 - (-INF),
which overflows, but must yield +INF.  */
 || (code == MINUS_EXPR && sgn1 >= 0)
 /* For division, the only case is -INF / -1 = +INF.  */
 || code == TRUNC_DIV_EXPR
 || code == FLOOR_DIV_EXPR
 || code == CEIL_DIV_EXPR
 || code == EXACT_DIV_EXPR
 || code == ROUND_DIV_EXPR)
   return TYPE_MAX_VALUE (TREE_TYPE (res));
 else
   return TYPE_MIN_VALUE (TREE_TYPE (res));
   }
 /* ... */
   }

What would you suggest this function to do, based on your comments?


The function should be looked at in the context of the few callers - this
is really one of the more ugly and tricky parts of VRP.

Richard.


Re: changing "configure" to default to "gcc -g -O2 -fwrapv ..."

2006-12-30 Thread Richard Guenther

On 31 Dec 2006 00:10:23 +0100, Gabriel Dos Reis
<[EMAIL PROTECTED]> wrote:

"Richard Guenther" <[EMAIL PROTECTED]> writes:

| On 30 Dec 2006 23:55:46 +0100, Gabriel Dos Reis
| <[EMAIL PROTECTED]> wrote:
| >/* Wrapper around int_const_binop.  If the operation overflows and we
| >   are not using wrapping arithmetic, then adjust the result to be
| >   -INF or +INF depending on CODE, VAL1 and VAL2.  */
| >
| >static inline tree
| >vrp_int_const_binop (enum tree_code code, tree val1, tree val2)

[...]

| > What would you suggest this function to do, based on your comments?
|
| The function should be looked at in the context of the few callers - this
| is really one of the more ugly and tricky parts of VRP.

I've done that; I do not see an obvious way to make everybody happy --
except issuing a warning (which I've done).  That is why I was asking
since you raised that particular point.  Maybe VRP experts may have
opinions...

The heavy (and sole) user of vrp_int_const_binop() is
extract_range_from_binary_expr().


Yes.  I don't see a way to issue a warning there without 99% false
positives.  The only thing we can really do is avoid false
positives reliably if we have a + b and known ranges for a and b,
so we can see it will _not_ overflow.  But issuing a warning only
if we are sure it _will_ overflow will likely result in no warning at
all - the interesting cases would be those that have many
false positives.

Note that the interesting place in VRP where it assumes undefined
signed overflow is compare_values -- we use the undefinedness
to fold comparisons.
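
A classic source-level example of the kind of comparison that gets folded
this way (not tied to any particular VRP internals):

/* With signed overflow undefined, the compiler may fold this
   whole function to "return 1".  With -fwrapv it must not,
   because a == INT_MAX would make the comparison false.  */
int always_true (int a)
{
    return a + 1 > a;
}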

Richard.


Re: changing "configure" to default to "gcc -g -O2 -fwrapv ..."

2006-12-31 Thread Richard Guenther

On 31 Dec 2006 00:40:39 +0100, Gabriel Dos Reis
<[EMAIL PROTECTED]> wrote:

"Richard Guenther" <[EMAIL PROTECTED]> writes:

| On 31 Dec 2006 00:10:23 +0100, Gabriel Dos Reis
| <[EMAIL PROTECTED]> wrote:
| > "Richard Guenther" <[EMAIL PROTECTED]> writes:
| >
| > | On 30 Dec 2006 23:55:46 +0100, Gabriel Dos Reis
| > | <[EMAIL PROTECTED]> wrote:
| > | >/* Wrapper around int_const_binop.  If the operation overflows and we
| > | >   are not using wrapping arithmetic, then adjust the result to be
| > | >   -INF or +INF depending on CODE, VAL1 and VAL2.  */
| > | >
| > | >static inline tree
| > | >vrp_int_const_binop (enum tree_code code, tree val1, tree val2)
| >
| > [...]
| >
| > | > What would you suggest this function to do, based on your comments?
| > |
| > | The function should be looked at in the context of the few callers - this
| > | is really one of the more ugly and tricky parts of VRP.
| >
| > I've done that; I do not see an obvious way to make everybody happy --
| > except issueing a warning (which I've done).  That is why I was asking
| > since you raised that particular point.  Maybe VRP experts may have
| > opinions...
| >
| > The heavy (and sole) user of vrp_int_const_binop() is
| > extract_range_from_binary_expr().
|
| Yes.  I don't see a way to issue a warning there without 99% false
| positives there.  The only thing we can really do is avoid false
| positives reliably if we have a + b and known ranges for a and b
| so we can see it will _not_ overflow.  But issuing a warning only
| if we are sure it _will_ overflow will likely cause in no warning at
| all - the interesting cases would be those that will have many
| false positives.

for this specific function (vrp_int_const_binop), I'm issuing a
warning inside the else-if  branch that tests for the overflowed
result.  I'm unclear why that is a false positive since the result is
known to overflow.  Could you elaborate?


Well, we use that function to do arithmetic on value ranges, for
example the ranges involved in the expression a + b:

[50, INT_MAX] + [50, 100]

Now you will get a warning because we use vrp_int_const_binop to add
INT_MAX and 100 (to yield INT_MAX in the signed case).  Of
course a + b will not always overflow here (it might never overflow,
as the INT_MAX bound might just be due to VRP deficiencies);
for example 50 + 50 will not overflow.

So using vrp_int_const_binop to generate the warning will yield
very many false positives (also due to the fact that if we only know
the lower or upper bound we have lots of INT_MAX and INT_MIN
in value ranges).
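
The same false-positive scenario in source form (a sketch, not an actual
reported case):

/* VRP only knows a's range as [50, INT_MAX] and b's as [50, 100]
   here, so the range addition saturates at +INF even though most
   executions, e.g. a == b == 50, never overflow.  */
int sum_if_large_enough (int a, int b)
{
    if (a < 50 || b < 50 || b > 100)
        return 0;
    return a + b;
}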


| Note the interesting places in VRP where it assumes undefined
| signed overflow is in compare_values -- we use the undefinedness
| to fold comparisons.

I considered compare_values(), but I concluded that issuing a warning
from there will yield too many false positives, and probably many
duplicates.  Is that assumption correct?


That is correct - it's basically the same problem as if we were warning
from inside fold.


I have been looking into infer_loop_bounds_from_signedness(), called
from infer_loop_bounds_from_undefined().
In some places, nowrap_type_p() is used, but that function operates
only on types, so there will be too many false positives there; yet we
would miss warnings through that path.


I don't know that area too well, but I think we already issue a
warning if we use -funsafe-loop-optimizations, so it might be possible
to do the same when we use the undefinedness of signed wrapping.  Zdenek
should know more.

Richard.


Re: changing "configure" to default to "gcc -g -O2 -fwrapv ..."

2006-12-31 Thread Richard Guenther

On 12/31/06, Richard Kenner <[EMAIL PROTECTED]> wrote:

> What would you suggest this function to do, based on your comments?

I'm not familiar enough with VRP to answer at that level, but at a higher
level, what I'd advocate is that the *generator* of information would track
things both ways, assuming wrapping and non-wrapping semantics (of course, if
-fwrapv or -fno-wrapv was specified, it would only do that one).  Then the
*user* of the information would decide which one to use by applying
heuristics based both on the likelihood that the programmer would be relying
on wrapping and the importance from an optimization standpoint of not doing so.


For the VRP case I'd like to rework vrp_int_const_binop to behave like
int_const_binop (that is, wrap) and return whether the operation overflowed.  It's
much more readable to have the handling (or not handling) of overflow
at the caller's site, extract_range_from_binary_expr.  Using
TREE_OVERFLOW as at present is just wasting memory, as an extra return
value will do just as well.

So in the end it's probably time to rework the core implementation of
int_const_binop to be more like

tree int_const_binop (enum tree_code code, tree type, tree arg1, tree arg2,
                      int notrunc, bool *overflow);

as it has all the magic to detect overflow already.


So, for example, when making decisions involving loop optimizations, it would
assume that bivs and givs don't overflow.  But it if saw a comparison against
a result that might overflow, it would assume the more conservative choice on
the grounds that it's more likely that the test was intended to do something
than to always be true or false.

Here is where you also have a chance of issuing a warning.  In the former of
these cases, however, I don't think it would make sense to issue one since
it'd almost always be a false positive and in the latter there is nothing to
warn about since the conservative choice is being made. But in some "middle
of the road" cases, assuming undefined overflow and warning might be the most
appropriate: indeed, the ability to issue the warning could be a
justification towards making the less conservative choice.



Re: changing "configure" to default to "gcc -g -O2 -fwrapv ..."

2006-12-31 Thread Richard Guenther

On 12/31/06, Duncan Sands <[EMAIL PROTECTED]> wrote:

> > for this specific function (vrp_int_const_binop), I'm issuing a
> > warning inside the else-if  branch that tests for the overflowed
> > result.  I'm unclear why that is a false positive since the result is
> > known to overflow.  Could you elaborate?
>
> Well, we use that function to do arithmetic on value ranges like
> for example the ranges involving the expression a + b
>
>  [50, INT_MAX] + [50, 100]
>
> now you will get a warning as we use vrp_int_const_binop to add
> INT_MAX and 100 (to yield INT_MAX in the signed case).  Of
> course adding a + b will not always overflow here (it might never
> as the INT_MAX bound might be just due to VRP deficiencies),
> for example 50 + 50 will not overflow.
>
> So using vrp_int_const_binop to generate the warning will yield
> very many false positives (also due to the fact that if we only know
> the lower or upper bound we have lots of INT_MAX and INT_MIN
> in value ranges).

You could emit a warning if the entire range overflows (i.e. both lower
and upper bound calculations overflow), since that means that the calculation
of a+b necessarily overflows.


Yes, we can do that, but this won't detect the cases people care about.
In fact I doubt it will trigger on real code at all - you can just construct
artificial testcases that exercise this warning, like

 if (a > INT_MAX/2 && b > INT_MAX/2)
   return a + b;

I doubt this will be very useful.

Richard.


Re: changing "configure" to default to "gcc -g -O2 -fwrapv ..."

2006-12-31 Thread Richard Guenther

On 31 Dec 2006 12:42:57 +0100, Gabriel Dos Reis
<[EMAIL PROTECTED]> wrote:

"Richard Guenther" <[EMAIL PROTECTED]> writes:

| On 12/31/06, Richard Kenner <[EMAIL PROTECTED]> wrote:
| > > What would you suggest this function to do, based on your comments?
| >
| > I'm not familiar enough with VRP to answer at that level, but at a higher
| > level, what I'd advocate is that the *generator* of information would track
| > things both ways, assuming wrapping and non-wrapping semantics (of course, 
if
| > -fwrapv or -fno-wrapv was specified, it would only do that one).  Then the
| > *user* of the information would decide which one to use by applying
| > heuristics based both on the likelihood that the programmer would be relying
| > on wrapping and the importance from an optimization standpoint of not doing 
so.
|
| For the VRP case I'd like to rework vrp_int_const_binop to behave like
| int_const_binop (that wraps) and return if the operation overflowed.  It's
| much more readable to have the handling (or not handling) of overflow
| at the callers site extract_range_from_binary_expression.  Using
| TREE_OVERFLOW as present is just wasting memory as an extra return
| value will also do.
|
| So in the end it's probably time to re-work the core implementation of
| int_const_binop to be more like
|
| tree int_const_binop (enum tree_code code, tree type, tree arg1, tree arg2,
| int notrunc, bool *overflow);
|
| as it has all the magic to detect overflow already.

Can I interpret that as you volunteering to do it, or at least help?


Yes, I have some patches in the queue to clean this up (and add some
more stuff to VRP).

Richard.


Re: changing "configure" to default to "gcc -g -O2 -fwrapv ..."

2006-12-31 Thread Richard Guenther

On 12/31/06, Daniel Berlin <[EMAIL PROTECTED]> wrote:

On 12/31/06, Paul Eggert <[EMAIL PROTECTED]> wrote:
> "Steven Bosscher" <[EMAIL PROTECTED]> writes:
>
> > On 12/31/06, Paul Eggert <[EMAIL PROTECTED]> wrote:
> >> Also, as I understand it this change shouldn't affect gcc's
> >> SPEC benchmark scores, since they're typically done with -O3
> >> or better.
> >
> > It's not all about benchmark scores.
>
> But so far, benchmark scores are the only scores given by the people
> who oppose having -O2 imply -fwrapv.  If the benchmarks use -O3 they
> wouldn't be affected by such a change -- and if so, we have zero hard
> evidence of any real harm being caused by having -O2 imply -fwrapv.
>
> > I think most users compile at -O2
>
> Yes, which is why there's so much argument about what -O2 should do
>
> > You say you doubt it affects performance.  Based on what?  Facts
> > please, not guesses and hand-waiving...
>
> The burden of proof ought to be on the guys proposing -O2
> optimizations that break longstanding code, not on the skeptics.

The burden ought to be (and IMHO is) on those who propose we change
optimizer behavior in order to support something non-standard.

Why do you believe otherwise?
>
> That being said, I just compiled GNU coreutils CVS on a Debian stable
> x86 (2.4 GHz Pentium 4) using GCC 4.1.1.  With -O0, "sha512sum" on the
> coreutils tar.gz file took 0.94 user CPU seconds (measured by "time
> src/sha512sum coreutils-6.7-dirty.tar.gz").  With -O2 -fwrapv, 0.87
> seconds.  With plain -O2, 0.86 seconds.
>
> I also tried gzip 1.3.10, compressing its own tar file with a -9
> compression option.  With -O0, 0.30 user CPU seconds.  With -O2
> -fwrapv, 0.24 seconds.  With -O2, 0.24 seconds.
>
> In all these cases I've averaged several results.  The difference
> between -O2 and -O2 -fwrapv is pretty much in the noise here.
>
> Admittedly it's only two small tests, and it's with 4.1.1.  But that's
> two more tests than the -fwrapv naysayers have done, on
> bread-and-butter applications like coreutils or gzip or Emacs (or GCC
> itself, for that matter).

These are not performance needing applications.
I'll happily grant you that  adding -fwrapv will make no difference at
all on any application that does not demand performance in integer or
floating point calculations.


I added -fwrapv to the Dec30 run of SPEC at
http://www.suse.de/~gcctest/SPEC/CFP/sb-vangelis-head-64/recent.html
and
http://www.suse.de/~gcctest/SPEC/CINT/sb-vangelis-head-64/recent.html

Richard.


Re: changing "configure" to default to "gcc -g -O2 -fwrapv ..."

2006-12-31 Thread Richard Guenther

On 12/31/06, Robert Dewar <[EMAIL PROTECTED]> wrote:

Paul Eggert wrote:

> The question is not whether GCC should support wrapv
> semantics; it already does, if you specify -fwrapv.
> The question is merely whether wrapv should be the default
> with optimization levels -O0 through -O2.

That over simplifies, because it presents things as though
there are only two possibilities

1. Allow "undefined" optimizations without restriction

2. Forbid all such optimizations by specifying fwrapv

Quite a few of us (including certainly me, and I think
Richard) argue for a mid ground where

We don't insist on full fwrapv semantics (because we
think it will hurt code quality, particularly in the
loop invariant case).

We are judicious in applying the optimization in other
cases, in a pragmatic attempt to keep "traditional"
C code working in practice.


I think this is a fragile and not very practical approach.  How
do you define these "traditional" cases?  I guess you would
keep the other two as well, so the middle ground would be the default,
-fno-wrapv would do what we have now, and -fwrapv would disable all
the optimizations.

I'd rather enable -fwrapv at -O1 and keep -O2 as is.  This is
what we also do for strict aliasing: it's enabled at -O2 and beyond
only (for C, that is).  Of course with -O1 the VRP pass is not run, so
it might be that the reported problems do not show up with -O1 - and
as they are reportedly not in performance-critical code, they could
maybe just use -O1 as the autoconf default.
Richard.


Re: changing "configure" to default to "gcc -g -O2 -fwrapv ..."

2006-12-31 Thread Richard Guenther

On 12/31/06, Richard Kenner <[EMAIL PROTECTED]> wrote:

> I think this is a fragile and not very practical approach.  How do
> you define these "traditional" cases?

You don't need to define the "cases" in advance.  Rather, you look at
each place where you'd be making an optimization based on the non-existence
of overflow and use knowledge of the importance of that optimization and the
types of code likely to trigger it to choose the default for whether to make
it or not.  It seems quite practical to me.

It also doesn't seem fragile: if you guess wrong on one particular default,
it's easy to change it.


Are you volunteering to audit the present cases and argue whether they
fall in the "traditional" cases?  Note that -fwrapv also _enables_ some
transformations on signed integers that are disabled otherwise.  We for
example constant fold -CST for -fwrapv while we do not if signed overflow
is undefined.  Would you change those?

Richard.


Re: changing "configure" to default to "gcc -g -O2 -fwrapv ..."

2007-01-01 Thread Richard Guenther

On 12/31/06, Richard Kenner <[EMAIL PROTECTED]> wrote:

> Are you volunteering to audit the present cases and argue whether they
> fall in the "traditional" cases?

I'm certainly willing to *help*, but I'm sure there will be some cases
that will require discussion to get a consensus.

> Note that -fwrapv also _enables_ some transformations on signed
> integers that are disabled otherwise.  We for example constant fold
> -CST for -fwrapv while we do not if signed overflow is undefined.
> Would you change those?

I don't understand the rationale for not wrapping constant folding when
signed overflow is undefined: what's the harm in "defining" it as wrapping
for that purpose?  If it's undefined, then why does it matter what we
fold it to?  So we might as well fold it to what traditional code expects.


The reason is PR27116 (and others; see the difficulties in fixing PR27132).
We cannot both assume wrapping and assume undefined behavior during
folding at the same time - this leads to wrong code.  Citing from an
earlier message of mine:

"Other than that I'm a bit nervous if we both
introduce signed overflow because it is undefined and at the same time
pretend it doesn't happen because it is undefined.  Like given

  a - INT_MIN < a
-> a + INT_MIN < a
-> INT_MIN < 0

which is true, even for a == INT_MIN for which the original expression
didn't contain an overflow.  I.e. the following aborts

#include <limits.h>

extern void abort(void);

int foo(int a)
{
 return a - INT_MIN < a;
}

int main()
{
 if (foo(INT_MIN))
   abort ();
 return 0;
}

because we fold the comparison to 1."

This was while trying to implement a - -1 to a + 1 folding.  The problematic
folding that existed at that point was that negate_expr_p said it would
happily negate INT_MIN (to INT_MIN with overflow flag set), which is
wrong in this context.

Richard.


Re: changing "configure" to default to "gcc -g -O2 -fwrapv ..."

2007-01-01 Thread Richard Guenther

On 1/1/07, Richard Kenner <[EMAIL PROTECTED]> wrote:

> the seemingly prevalent attitude "but it is undefined; but it is not
> C" is the opinion of the majority of middle-end maintainers.

Does anybody DISAGREE with that "attitude"?  It isn't valid C to assume that
signed overflow wraps.  I've heard nobody argue that it is.  The question
is how far we go in supporting existing code that's broken in this way.


I don't disagree with that attitude, I even strongly agree.  We support broken
code by options like -fno-strict-aliasing and -fwrapv.  I see this discussion as
a way to prioritize work we need to do anyway: annotate operations with
their overflow behavior (like creating new tree codes WRAPPING_PLUS_EXPR),
clean up existing code to make it more obvious where we rely on which semantics,
add more testcases for corner cases and document existing
(standard-conformant) behavior more explicitly.

Note that we had/have a similar discussion (what's a valid/useful
optimization from the user's perspective) on the IEEE math front - see the
huge thread about -ffast-math, -funsafe-math-optimizations and the
proposal to split it into -fassociative-math and -freciprocal-math.  We also
have infrastructure work to do there, like laying the groundwork to implement
proper contraction support.

Richard.


Re: changing "configure" to default to "gcc -g -O2 -fwrapv ..."

2007-01-02 Thread Richard Guenther

On 1/1/07, Geert Bosch <[EMAIL PROTECTED]> wrote:


On Dec 31, 2006, at 19:13, Daniel Berlin wrote:
> Note the distinct drop in performance across almost all the benchmarks
> on Dec 30, including popular programs like bzip2 and gzip.
Not so.

To my eyes, the specint 2000 mean went UP by about 1% for the
base -O3 compilation. The peak enabled more unrolling, which
is helped by additional range information provided by the absence
of -fwrapv.

So, I'd say this run would suggest enabling -fwrapv for
at least -O1 and -O2. Also, note that we never have
focussed on performance with -fwrapv, and it is quite
likely there is quite some improvement possible.

I'd really like using -fwrapv by default for -O, -O[s12].
The benefit of many programs moving from "undefined semantics"
to "implementation-defined semantics, overflow wraps like in
old compilers" far outweighs even an average performance loss
of 2% as seen in specfp.


I would support the proposal to enable -fwrapv for -O[01], but
not for -O2 as that is supposed to be "optimize for speed" and
as -O3 is not widely used to optimize for speed (in fact it may
make code slower).  I'm undecided for -Os but care less about it.

Richard.


Re: changing "configure" to default to "gcc -g -O2 -fwrapv ..."

2007-01-02 Thread Richard Guenther

On 1/2/07, Mark Mitchell <[EMAIL PROTECTED]> wrote:

Daniel Berlin wrote:

>> Richard Guenther added -fwrapv to the December 30 run of SPEC at
>> <http://www.suse.de/~gcctest/SPEC/CFP/sb-vangelis-head-64/recent.html>
>> and
>> <http://www.suse.de/~gcctest/SPEC/CINT/sb-vangelis-head-64/recent.html>.
>> Daniel Berlin and Geert Bosch disagreed about how to interpret
>> these results; see <http://gcc.gnu.org/ml/gcc/2007-01/msg00034.html>.

Thank you for pointing that out.  I apologize for having missed it
previously.

As others have noted, one disturbing aspect of that data is that it
shows that there is sometimes an inverse correlation between the base
and peak flags.  On the FP benchmarks, the results are mostly negative
for both base and peak (with 168.wupwise the notable exception); on the
integer benchmarks it's more mixed.  It would be nice to have data for
some other architectures: anyone have data for ARM/Itanium/MIPS/PowerPC?

So, my feeling is similar to what Daniel expresses below, and what I
think Ian has also said: let's disable the assumption about signed
overflow not wrapping for VRP, but leave it in place for loop analysis.

Especially given:

>> We don't have an exhaustive survey, but of the few samples I've
>> sent in, most of the code is in explicit overflow tests.  However, this
>> could be an artifact of the way I searched for wrapv-dependence
>> (basically, I grep for "overflow" in the source code).  The
>> remaining code depended on -INT_MIN evaluating to INT_MIN.  The
>> troublesome case that started this thread was an explicit overflow
>> test that also acted as a loop bound (which is partly what caused
>> the problem).

it sounds like that would eliminate most of the problem.  Certainly,
making -INT_MIN evaluate to INT_MIN, when expressed like that, is an
easy thing to do; that's just a guarantee about constant folding.
There's no reason for us not to document that signed arithmetic wraps
when folding constants, since we're going to fold the constant to
*something*, and we may as well pick that answer.

I don't even necessarily think we need to change our user documentation.
 We can just choose to make the compiler not make this assumption for
VRP, and to implement folding as two's-complement arithmetic, and go on
with life.  In practice, we probably won't "miscompile" many
non-conforming programs, and we probably won't miss too many useful
optimization opportunities.

Perhaps Richard G. would be so kind as to turn this off in VRP, and
rerun SPEC with that change?


I can do this.  What I will also do is improve VRP to still fold comparisons
of the form a - 10 > 20 when it knows there is no overflow due to available
range information for a (it doesn't do that right now).  That might eliminate
most of the bad effects of turning on -fwrapv for VRP.  The question is whether
we then want to assume wrapping semantics for signed ints in VRP, or
whether we just don't want to assume anything if signed ints happen to wrap.

So, given 'a + 10' with a value range for a being [0, +INF], what should be
the resulting value range?  At the moment with -fno-wrapv we get [10, +INF],
with wrapping semantics we could get ~[-INF+10, 9] (which includes
the range we get with -fno-wrapv).
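
To make that concrete, here is a small made-up example (not from any PR)
of the kind of comparison folding at stake:

#include <limits.h>

int f (int a)
{
  if (a < 0)
    return 0;
  /* VRP knows a is in [0, +INF] here.  With signed overflow undefined,
     a + 10 gets the range [10, +INF] and the comparison folds to 1.
     With wrapping semantics the range is ~[-INF+10, 9], because a + 10
     wraps for a > INT_MAX - 10, and no folding is possible.  */
  return a + 10 > 5;
}

int main (void)
{
  /* 0 with -fwrapv (INT_MAX + 10 wraps to a negative value), typically 1
     when the overflow is assumed not to happen.  */
  return f (INT_MAX);
}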

(I'll do this when I return from yet another short vacation - maybe someone can
beat me to producing the SPEC numbers; a s/flag_wrapv/1/ in VRP should
do it for all places that do not simply call into fold.)

Richard.


Re: changing "configure" to default to "gcc -g -O2 -fwrapv ..."

2007-01-02 Thread Richard Guenther

On 1/2/07, Robert Dewar <[EMAIL PROTECTED]> wrote:

Richard Guenther wrote:
> On 1/1/07, Geert Bosch <[EMAIL PROTECTED]> wrote:
>
> I would support the proposal to enable -fwrapv for -O[01], but
> not for -O2 as that is supposed to be "optimize for speed" and
> as -O3 is not widely used to optimize for speed (in fact it may
> make code slower).  I'm undecided for -Os but care less about it.

I think it is a bad idea for the optimization levels to deal with
anything other than optimization. -fwrapv is not about optimization,
it is about changing the language semantics.

So this proposal would be tantamount to implementing a different
language at -O1 and -O2, and having -O2 change the formal
semantic interpretation of the program. That seems a very
bad idea to me.


We do that with -fstrict-aliasing, which also changes language semantics.
-fstrict-aliasing is disabled for -O0 and -O1 and enabled for -O[23s].


It is one thing to have different optimization levels do different
amounts of optimization that in practice may have more or less
effect on non-standard programs. It is quite another to guarantee
at a formal semantic level wrapping at -O1 and not -O2.

If we decide to avoid some optimizations at -O1 in this area,
that's fine, but it should not be done by enabling -fwrapv as
one of the (presumably documented) flags included in -O1.

Instead I would just do this silently without the guarantee.

And I continue to favor the compromise approach where loop
optimization can use undefinedness of overflow in dealing
with loop invariants, but we don't by default take advantage
of undefinedess elsewhere.


I fear this is not as easy as it sounds, as the loop optimizers (and basically
every optimizer) call back into fold to simplify, for example, the
predicates they try to prove.  Unless you change this by either duplicating
these parts of fold or putting logic into fold that checks
current_pass == loop_optimizer
you will not catch those cases.

Also, all of VRP, the loop optimizers and fold do comparison simplification, which
benefits from signed overflow undefinedness the most (note that this is another
area of cleanup I'm likely to touch in the not too distant future).


Then we have two switches:

-fstandard

which allows all optimizations (name can be changed, I
don't care about the name)

-fwrapv

which changes the semantics to require wrapping in
all cases (including loops)


How do these switches implement your proposal?  "All optimizations" sounds
like the -fno-wrapv case we have now.

Richard.


Re: changing "configure" to default to "gcc -g -O2 -fwrapv ..."

2007-01-02 Thread Richard Guenther

On 1/2/07, Richard Kenner <[EMAIL PROTECTED]> wrote:

> We do that with -fstrict-aliasing, which also changes language semantics.

Well, yes, but not quite in the same way.  Indeed it's rather hard to
describe in what way it changes the language semantics but easier to
describe the effect it has on optimization.  I think -fwrapv is the other
way around.


Well, while the effect of -fstrict-aliasing is hard to describe (TBAA _is_ a
complex part of the standard), the -fno-strict-aliasing rules are simple: all
loads and stores alias each other unless points-to analysis can prove that
they do not.
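
A small made-up example of what that means in practice (the classic
invalid type-punning case):

#include <stdio.h>

int type_punned (int *i, float *f)
{
  *i = 1;
  *f = 2.0f;   /* may be assumed not to modify *i under the C aliasing rules */
  return *i;   /* with -fstrict-aliasing this load may be folded to 1 */
}

int main (void)
{
  int x = 0;
  /* Accessing an int through a float * is invalid C; it is exactly the
     kind of code -fno-strict-aliasing is meant to keep working.  */
  printf ("%d\n", type_punned (&x, (float *) &x));
  return 0;
}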

Richard.


Re: changing "configure" to default to "gcc -g -O2 -fwrapv ..."

2007-01-04 Thread Richard Guenther

On 1/4/07, Richard Sandiford <[EMAIL PROTECTED]> wrote:

Paul Eggert <[EMAIL PROTECTED]> writes:
> Mark Mitchell <[EMAIL PROTECTED]> writes:
>> it sounds like that would eliminate most of the problem.  Certainly,
>> making -INT_MIN evaluate to INT_MIN, when expressed like that, is an
>> easy thing to do; that's just a guarantee about constant folding.
>
> Well, no, just to clarify: the GCC code in question actually computed
> "- x", and relied on the fact that the result was INT_MIN if x (an
> unknown integer) happened to be INT_MIN.  Also, now that I'm thinking
> about it, some the Unix v7 atoi() implementation relied on "x + 8"
> evaluating to INT_MIN when x happened to be (INT_MAX - 7).  These are
> the usual kind of assumptions in this area.

I don't know if you're implicitly only looking for certain types of
signed overflow, or if this has been mentioned elsewhere (I admit I had
to skim-read some of the thread) but the assumption that signed overflow
is defined is _very_ pervasive in gcc at the rtl level.  The operand to
a CONST_INT is a signed HOST_WIDE_INT, and its accessor macro -- INTVAL
-- returns a value of that type.  Most arithmetic related to CONST_INTs
is therefore done on signed HOST_WIDE_INTs.  This means that many parts
of gcc would produce wrong code if signed arithmetic saturated, for
example.  (FWIW, this is why I suggested adding a UINTVAL, which Stuart
has since done -- thanks.  However, most of gcc still uses INTVAL.)


I thought all ints are unsigned in the RTL world, as there is, I believe, no way
to express the "signedness" of a mode.  This would have to change, of course,
if we ever support non-two's-complement arithmetic.

Richard.


Re: changing "configure" to default to "gcc -g -O2 -fwrapv ..."

2007-01-04 Thread Richard Guenther

On 1/4/07, Richard Sandiford <[EMAIL PROTECTED]> wrote:

"Richard Guenther" <[EMAIL PROTECTED]> writes:
> On 1/4/07, Richard Sandiford <[EMAIL PROTECTED]> wrote:
>> Paul Eggert <[EMAIL PROTECTED]> writes:
>> > Mark Mitchell <[EMAIL PROTECTED]> writes:
>> >> it sounds like that would eliminate most of the problem.  Certainly,
>> >> making -INT_MIN evaluate to INT_MIN, when expressed like that, is an
>> >> easy thing to do; that's just a guarantee about constant folding.
>> >
>> > Well, no, just to clarify: the GCC code in question actually computed
>> > "- x", and relied on the fact that the result was INT_MIN if x (an
>> > unknown integer) happened to be INT_MIN.  Also, now that I'm thinking
>> > about it, some the Unix v7 atoi() implementation relied on "x + 8"
>> > evaluating to INT_MIN when x happened to be (INT_MAX - 7).  These are
>> > the usual kind of assumptions in this area.
>>
>> I don't know if you're implicitly only looking for certain types of
>> signed overflow, or if this has been mentioned elsewhere (I admit I had
>> to skim-read some of the thread) but the assumption that signed overflow
>> is defined is _very_ pervasive in gcc at the rtl level.  The operand to
>> a CONST_INT is a signed HOST_WIDE_INT, and its accessor macro -- INTVAL
>> -- returns a value of that type.  Most arithmetic related to CONST_INTs
>> is therefore done on signed HOST_WIDE_INTs.  This means that many parts
>> of gcc would produce wrong code if signed arithmetic saturated, for
>> example.  (FWIW, this is why I suggested adding a UINTVAL, which Stuart
>> has since done -- thanks.  However, most of gcc still uses INTVAL.)
>
> I thought all ints are unsigned in the RTL world as there is I believe no way
> to express "signedness" of a mode.  This would have to change of course
> if we ever support non two's-complement arithmetic.

I'm not sure what you mean.  Yes, "all ints are unsigned in the RTL world"
in the sense that we must use two's complement arithmetic for them --
we have no way of distinguishing what was originally signed from
what was originally unsigned.  But my point was that gcc _stores_ the
integers as _signed_ HOST_WIDE_INTs, and operates on them as such,
even though these signed HOST_WIDE_INTs may actually represent unsigned
integers.  Thus a lot of the arithmetic that gcc does at the rtl level
would be wrong for certain inputs if the _compiler used to build gcc_
assumed that signed overflow didn't wrap.

In other words, it sounds like you took my message to mean that gcc's
rtl code treated signed overflow _in the input files_ as undefined.
I didn't mean that.  I meant that gcc's own code relies on signed
overflow being defined.  I think it was instances of the latter
that Paul was trying to find.


Yes, I confused that.  It seems that RTL should store integer constants as
unsigned HOST_WIDE_INT instead - the current situation could lead
to problems, as you rightly said.
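
A simplified sketch of the concern (not actual GCC source; HOST_WIDE_INT
is just spelled out as a plain typedef here):

typedef long long HOST_WIDE_INT;
typedef unsigned long long unsigned_hwi;

/* Doing the arithmetic in the signed type can overflow, which is
   undefined behavior for the compiler that builds GCC.  */
HOST_WIDE_INT risky_add (HOST_WIDE_INT a, HOST_WIDE_INT b)
{
  return a + b;
}

/* Doing it unsigned wraps modulo 2^64; converting back is
   implementation-defined, and GCC defines it as modulo reduction,
   which is the two's-complement behavior the RTL code wants.  */
HOST_WIDE_INT wrapping_add (HOST_WIDE_INT a, HOST_WIDE_INT b)
{
  return (HOST_WIDE_INT) ((unsigned_hwi) a + (unsigned_hwi) b);
}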

Richard.


Re: mpfr issues when Installing gcc 3.4 on fedora core

2007-01-04 Thread Richard Guenther

On 1/4/07, drizzle drizzle <[EMAIL PROTECTED]> wrote:

Still no luck so far.  I got gcc 3.4 from the gcc archive.  Is there any way
I can make gcc 3.4 not use these libraries?


3.4 doesn't use gmp or mpfr; gfortran introduces this dependency, but it
appears only with 4.0 or newer.

Richard.


Re: mpfr issues when Installing gcc 3.4 on fedora core

2007-01-04 Thread Richard Guenther

On 1/4/07, drizzle drizzle <[EMAIL PROTECTED]> wrote:

I configure with --enable-languages=c,c++.  Shouldn't that disable gfortran?


You are not configuring gcc 3.4.

Richard.


thanks
dz

On 1/4/07, Richard Guenther <[EMAIL PROTECTED]> wrote:
> On 1/4/07, drizzle drizzle <[EMAIL PROTECTED]> wrote:
> > Still no luck so far .. I got the gcc3.4 from the gcc archive. Any way
> > I can make gcc 3.4 not use these libraries ?
>
> 3.4 doesn't use gmp or mpfr, gfortran introduces this dependency but it
> appears with 4.0 or newer only.
>
> Richard.
>



Re: Build snapshots according to a more regular schedule

2007-01-05 Thread Richard Guenther

On 1/5/07, Joe Buck <[EMAIL PROTECTED]> wrote:

On Fri, Jan 05, 2007 at 07:26:27AM -0800, Ian Lance Taylor wrote:
> David Edelsohn <[EMAIL PROTECTED]> writes:
>
> > Are 4.0 snapshots still necessary?  I suspect they should be
> > discontinued.
>
> 4.0 still seems to be regarded as an active branch.
>
> I don't mind closing it, myself.  Does anybody think we should have a
> 4.0.4 release?

I'd like to see it closed.  We have some bugs that are only open because
they are targeted for 4.0.4 (fixed on all branches but 4_0).

If Gaby's interested in doing one final release first, I'm fine with that.
But I'd like to see it go away.


I'd like to see it closed, too; all Linux/BSD vendors I know of are either
still using 3.x or have switched to 4.1 already.

Richard.


Re: Build snapshots according to a more regular schedule

2007-01-05 Thread Richard Guenther

On 1/5/07, David Fang <[EMAIL PROTECTED]> wrote:

> > > > Are 4.0 snapshots still necessary?  I suspect they should be
> > > > discontinued.
> > >
> > > 4.0 still seems to be regarded as an active branch.
> > >
> > > I don't mind closing it, myself.  Does anybody think we should have a
> > > 4.0.4 release?
> >
> > I'd like to see it closed.  We have some bugs that are only open because
> > they are targeted for 4.0.4 (fixed on all branches but 4_0).
>
> I'd like to see it closed, too, all Linux/BSD vendors I know of are either
> still using 3.x or have switched to 4.1 already.

Hi,

User chiming in: before retiring 4.0, one would be more easily convinced
to make a transition to 4.1+ if the regressions from 4.0 to 4.1 numbered
fewer.  In the database, I see only 79 (P3+) regressions in 4.1 that are
not in 4.0 (using only summary matching).  Will these get a bit more
attention for the upcoming 4.1.2 release?


Well, I certainly see more attention on 4.1 regressions than on 4.0 regressions,
but as 4.0 gets almost no attention, there is no hint that retiring
4.0 will magically free resources to tackle more 4.1 problems.

Richard.


http://gcc.gnu.org/bugzilla/query.cgi?query_format=advanced&short_desc_type=allwordssubstr&short_desc=4.1&known_to_fail_type=allwordssubstr&known_to_work_type=allwordssubstr&long_desc_type=allwordssubstr&long_desc=&bug_file_loc_type=allwordssubstr&bug_file_loc=&gcchost_type=allwordssubstr&gcchost=&gcctarget_type=allwordssubstr&gcctarget=&gccbuild_type=allwordssubstr&gccbuild=&keywords_type=allwords&keywords=&bug_status=UNCONFIRMED&bug_status=NEW&bug_status=ASSIGNED&bug_status=SUSPENDED&bug_status=WAITING&bug_status=REOPENED&priority=P1&priority=P2&priority=P3&emailtype1=substring&email1=&emailtype2=substring&email2=&bugidtype=include&bug_id=&votes=&chfieldfrom=&chfieldto=Now&chfieldvalue=&query_based_on=4.1%20%5C%204.0%20regressions&negate0=1&field0-0-0=short_desc&type0-0-0=substring&value0-0-0=4.0&field0-1-0=noop&type0-1-0=noop&value0-1-0=

Fang





Re: Build problem with gcc 4.3.0 20070108 (experimental)

2007-01-08 Thread Richard Guenther

On 1/8/07, George R Goffe <[EMAIL PROTECTED]> wrote:

/tools/gcc/obj-i686-pc-linux-gnu/./prev-gcc/xgcc
-B/tools/gcc/obj-i686-pc-linux-gnu/./prev-gcc/
-B/usr/lsd/Linux/x86_64-unknown-linux-gnu/bin/ -c   -g -O2 -DIN_GCC   -W -Wall
-Wwrite-strings -Wstrict-prototypes -Wmissing-prototypes -pedantic 
-Wno-long-long
-Wno-variadic-macros -Wno-overlength-strings -Wold-style-definition
-Wmissing-format-attribute -Werror -fno-common   -DHAVE_CONFIG_H -I. -I.
-I../../gcc/gcc -I../../gcc/gcc/. -I../../gcc/gcc/../include
-I../../gcc/gcc/../libcpp/include  -I../../gcc/gcc/../libdecnumber 
-I../libdecnumber
   ../../gcc/gcc/tree-vectorizer.c -o tree-vectorizer.o
cc1: warnings being treated as errors
../../gcc/gcc/tree-vectorizer.c:2267: warning: initialization from incompatible
pointer type
make[3]: *** [tree-vectorizer.o] Error 1
make[3]: *** Waiting for unfinished jobs
rm gcjh.pod gcj-dbtool.pod grmiregistry.pod fsf-funding.pod jcf-dump.pod
jv-convert.pod grmic.pod gcov.pod gcj.pod gfdl.pod jv-scan.pod cpp.pod gjnih.pod
gij.pod gpl.pod gfortran.pod gcc.pod
make[3]: Leaving directory `/rb.exphome/tools/gcc/obj-i686-pc-linux-gnu/gcc'
make[2]: *** [all-stage2-gcc] Error 2
make[2]: Leaving directory `/rb.exphome/tools/gcc/obj-i686-pc-linux-gnu'
make[1]: *** [stage2-bubble] Error 2
make[1]: Leaving directory `/rb.exphome/tools/gcc/obj-i686-pc-linux-gnu'
make: *** [bootstrap] Error 2


That's honza's patch - but bootstrap doesn't abort for me at that point.

Richard.


Re: CSE not combining equivalent expressions.

2007-01-15 Thread Richard Guenther

On 1/15/07, pranav bhandarkar <[EMAIL PROTECTED]> wrote:

Hello Everyone,
I have the following source code

static int i;
static char a;

char foo_gen(int);
void foo_assert(char);
void foo ()
{
   int *x = &i;
   a = foo_gen(0);
   a |= 1; /* -1- */
   if (*x) goto end;
   a |= 1; /* -2- */
   foo_assert(a);
end:
   return;
}

Now I expect the CSE pass to realise that 1 and 2 are equal and eliminate 2.
However, the RTL snippet before the first CSE pass is as follows:

(insn 11 9 12 0 (set (reg:SI 1 $c1)
   (const_int 0 [0x0])) 43 {*movsi} (nil)
   (nil))

(call_insn 12 11 13 0 (parallel [
   (set (reg:SI 1 $c1)
   (call (mem:SI (symbol_ref:SI ("gen_T") [flags 0x41]
) [0 S4 A32])
   (const_int 0 [0x0])))
   (use (const_int 0 [0x0]))
   (clobber (reg:SI 31 $link))
   ]) 39 {*call_value_direct} (nil)
   (nil)
   (expr_list:REG_DEP_TRUE (use (reg:SI 1 $c1))
   (nil)))

(insn 13 12 14 0 (set (reg:SI 137)
   (reg:SI 1 $c1)) 43 {*movsi} (nil)
   (nil))

(insn 14 13 16 0 (set (reg:SI 135 [ D.1217 ])
   (reg:SI 137)) 43 {*movsi} (nil)
   (nil))

(insn 16 14 17 0 (set (reg:SI 138)
   (ior:SI (reg:SI 135 [ D.1217 ])
   (const_int 1 [0x1]))) 63 {iorsi3} (nil)
   (nil))

(insn 17 16 18 0 (set (reg:SI 134 [ D.1219 ])
   (zero_extend:SI (subreg:QI (reg:SI 138) 0))) 84 {zero_extendqisi2} (nil)
   (nil))

(insn 18 17 19 0 (set (reg/f:SI 139)
   (symbol_ref:SI ("a") [flags 0x2] )) 43
{*movsi} (nil)
   (nil))

(insn 19 18 21 0 (set (mem/c/i:QI (reg/f:SI 139) [0 a+0 S1 A8])
   (subreg/s/u:QI (reg:SI 134 [ D.1219 ]) 0)) 56 {*movqi} (nil)
   (nil))

<--- expansion of the if condition --->

;; End of basic block 0, registers live:
(nil)

;; Start of basic block 1, registers live: (nil)
(note 25 23 27 1 [bb 1] NOTE_INSN_BASIC_BLOCK)

(insn 27 25 28 1 (set (reg:SI 142)
   (ior:SI (reg:SI 134 [ D.1219 ])
   (const_int 1 [0x1]))) 63 {iorsi3} (nil)
   (nil))

(insn 28 27 29 1 (set (reg:SI 133 [ temp.28 ])
   (zero_extend:SI (subreg:QI (reg:SI 142) 0))) 84 {zero_extendqisi2} (nil)
   (nil))

(insn 29 28 30 1 (set (reg/f:SI 143)
   (symbol_ref:SI ("a") [flags 0x2] )) 43
{*movsi} (nil)
   (nil))

(insn 30 29 32 1 (set (mem/c/i:QI (reg/f:SI 143) [0 a+0 S1 A8])
   (subreg/s/u:QI (reg:SI 133 [ temp.28 ]) 0)) 56 {*movqi} (nil)
   (nil))



Now the problem is that the CSE pass doesn't identify that the source
of the set in insn 27 is equivalent to the source of the set in insn
16.  This, it seems, happens because of the zero_extend in insn 17.  I am
using a 4.1 toolchain.  However, with a 3.4.6 toolchain no zero_extend
gets generated and the result of the ior operation is immediately
copied into memory.  I am compiling this case with -O3.  Can anybody
please tell me how this problem can be overcome?


CSE/FRE and VRP do not track bit operations or do CSE on individual bits.
To overcome this you would need to implement such tracking.

Richard.


Re: -Wconversion versus libstdc++

2007-01-17 Thread Richard Guenther

On 17 Jan 2007 16:36:04 -0600, Gabriel Dos Reis <[EMAIL PROTECTED]> wrote:

Paolo Carlini <[EMAIL PROTECTED]> writes:

| Joe Buck wrote:
|
| >In the case of the containers, we are asserting/relying on the fact that
| >the pointer difference is zero or positive.  But this has become a
| >widespread idiom: people write their own code in the STL style.  If STL
| >code now has to be fixed to silence warnings, so will a lot of user code.
| >
| Good point. About it, we should also take into account the recent
| messages from Martin, pointing out that many C++ front-ends do not
| warn for signed -> unsigned.

I just built firefox (CVS) with GCC mainline.  The compiler spat out
avalanches of nonsensical warnings that signed -> unsigned conversions
may alter values, when in fact the compiler knows that
such things cannot happen.

First, let's recall that GCC supports only 2s complement targets.

Second, a conversion from T to U may alter value if a round trip is
not the identity function.  That is, there exists a value t in T
such that the assertion

   assert (T(U(t)) == t)

fails.


I think it warns if U(t) != t in a mathematical sense (without promoting
to the same type for the comparison), so it warns because (unsigned)-1 is
not "-1".

I agree this warning is of questionable use and should not be enabled
by -Wall.
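
A tiny made-up example of the kind of code that triggers it:

unsigned int to_unsigned (int i)
{
  return i;   /* warned about: -1 becomes UINT_MAX, so the value "may be altered" */
}

int round_trips (int i)
{
  /* GCC defines the conversion back to a signed type as modulo reduction,
     so this is 1 for every i on a two's-complement target - which is why
     the warning often feels like noise.  */
  return (int) (unsigned int) i == i;
}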

Richard.


Re: gcc compile time support for assumptions

2007-01-18 Thread Richard Guenther

On 18 Jan 2007 07:51:51 -0800, Ian Lance Taylor <[EMAIL PROTECTED]> wrote:

Andrew Haley <[EMAIL PROTECTED]> writes:

> Ian Lance Taylor writes:
>  > Abramo Bagnara <[EMAIL PROTECTED]> writes:
>  >
>  > > I'd like to know if gcc has implemented some generic way to help
>  > > optimizer job by allowing programmers to specify assumptions (or
>  > > constraints).
>  >
>  > The answer is no, there is nothing quite like you describe.
>  >
>  > But I think it would be a good idea.
>
> Something like this would greatly improve the code generation quality
> of gcj.  There are a great many assertions that I could pass to VRP
> and the optimizers: this is invariant, this is less than that, and so
> on.

Well, internally, we do have ASSERT_EXPR.  It would probably take a
little work to permit the frontends to generate it, but the optimizers
should understand it.


Providing a __builtin_assert () function is still one item on my TODO list;
we can derive proper ASSERT_EXPRs from it in VRP even in the -DNDEBUG case.
Of course, if the asserts still end up in the source, VRP can already derive
information from the IL of the assert code.
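
A sketch of the idea (the __builtin_assume name is only the proposal in
this thread; no such builtin exists yet).  Today the equivalent range
information reaches VRP only if the checking code itself stays in the IL:

extern void abort (void);

int divide_by (int x, int n)
{
  if (n <= 0)
    abort ();      /* VRP derives n in [1, +INF] from this test ...       */
  return x / n;    /* ... but if the check is compiled out (NDEBUG-style),
                      that information is lost.  */
}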

Richard.


Re: gcc compile time support for assumptions

2007-01-18 Thread Richard Guenther

On 1/18/07, Robert Dewar <[EMAIL PROTECTED]> wrote:

Andrew Haley wrote:
> Ian Lance Taylor writes:
>  > Abramo Bagnara <[EMAIL PROTECTED]> writes:
>  >
>  > > I'd like to know if gcc has implemented some generic way to help
>  > > optimizer job by allowing programmers to specify assumptions (or
>  > > constraints).
>  >
>  > The answer is no, there is nothing quite like you describe.
>  >
>  > But I think it would be a good idea.
>
> Something like this would greatly improve the code generation quality
> of gcj.  There are a great many assertions that I could pass to VRP
> and the optimizers: this is invariant, this is less than that, and so
> on.

Note that such assertions also can function as predicates to be
discharged in a proof engine. See work on SPARK (www.www.praxis-his.com).

One thing to consider here is whether to implement just a simple
assume as proposed, or a complete mechanism for pre and post
assertions (in particular, allowing you to talk about old values),
then it can serve as

a) a mechanism for programming by contract
b) a mechanism for interacting with proof tools
c) a mechanism to improve generated code as suggested in this thread

all at the same time

Eiffel should be examined for inspiration on such assertions.


Sure.  I was only thinking about the interface to the middle-end, where new
builtin functions are the easiest way to provide assumptions.  If you have
further ideas they are certainly appreciated (without me being required to
research Eiffel ...).

Thanks,
Richard.


Re: gcc compile time support for assumptions

2007-01-18 Thread Richard Guenther

On 18 Jan 2007 10:19:37 -0600, Gabriel Dos Reis <[EMAIL PROTECTED]> wrote:

Paolo Carlini <[EMAIL PROTECTED]> writes:

| Richard Guenther wrote:
|
| > Providing a __builtin_assert () function is still one thing on my
| > TODO, we can
| > derive proper ASSERT_EXPRs from it in VRP even in the -DNDEBUG case.
|
| Great! Certainly could be profitably used in libstdc++.

Indeed!

We would just have to make sure that people don't think there is a relation
between __builtin_assert() and assert(), as there usually is
between __builtin_xxx() and xxx().


Ok, so better name it __builtin_assume () then.

Richard.


Re: Miscompilation of remainder expressions

2007-01-19 Thread Richard Guenther

On 1/19/07, Joe Buck <[EMAIL PROTECTED]> wrote:

On Thu, Jan 18, 2007 at 05:36:23PM -0500, Robert Dewar wrote:
> Morten Welinder wrote:
> >>For sure a/b is undefined
> >
> >In C, it is.  In assembler it is perfectly well defined, i.e., it
> >traps.  But how is the
> >trap handler supposed to know the source of a given instruction?
> >
> >M.
>
> Sure it traps, but what happens when that trap occurs is of course
> O/S dependent, there is no guarantee of the effect.

We're going around in circles.  Everything has been said. What if we put
together a Wiki page with the problem and possible solutions, as in

a) do nothing, no big deal.  This is what happens if no one contributes
   code in any case.

b) add a flag to generate code to compute something like
divisor == -1 ? 0 : rem(dividend,divisor)
   or
divisor == rem(dividend,abs(divisor))
   in an efficient way (cmov to avoid busting the pipeline)


Note that cmov is not the right thing to use here (at least on
ia32 and x86_64) as it is more expensive than compare-and-jump
unless the probabilities of taking either way are about the same (which I
would not expect here, as divisor == -1 should almost never be
the case).

That said, a transformation on the tree level would make it possible
for VRP to optimize away the special casing.
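
A source-level sketch of option (b) with the guard written as a
compare-and-branch (my example; it assumes the problematic case is the
trapping x % -1 on these targets):

int safe_rem (int dividend, int divisor)
{
  if (divisor == -1)
    return 0;                   /* x % -1 is 0 for every x */
  return dividend % divisor;    /* cannot trap for divisor != -1 (and != 0) */
}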

Richard.


Re: [RFC] Our release cycles are getting longer

2007-01-24 Thread Richard Guenther

On 1/24/07, David Carlton <[EMAIL PROTECTED]> wrote:

On Tue, 23 Jan 2007 17:54:10 -0500, Diego Novillo <[EMAIL PROTECTED]> said:

> So, I was doing some archeology on past releases and we seem to be
> getting into longer release cycles.

Interesting.

I'm a GCC observer, not a participant, but here are some thoughts:

As far as I can tell, it looks to me like there's a vicious cycle
going on.  Picking an arbitrary starting point:

1) Because lots of bugs are introduced during stage 1 (and stage 2),
   stage 3 takes a long time.

2) Because stage 3 takes a long time, development branches are
   long-lived.  (After all, development branches are the only way to
   do work during stage 3.)

3) Because development branches are long-lived, the stage 1 merges
   involve a lot of code.

4) Because the stage 1 merges involve a lot of code, lots of bugs are
   introduced during stage 1.  (After all, code changes come with
   bugs, and large code changes come with lots of bugs.)

1) Because lots of bugs are introduced during stage 1, stage 3 takes a
   long time.


Now, the good news is that this cycle can be a virtuous cycle rather
than a vicious cycle: if you can lower one of these measurements
(length of stage 3, size of branches, size of patches, number of
bugs), then the other measurements will start going down.  "All" you
have to do is find a way to mute one of the links somehow, focus on
the measurement at the end of that link, and then things will start
getting better.


Indeed this is a good observation.  If you look at how the linux-kernel
development changed to face and fix this problem, and map this onto gcc
development, we would have something like:

1. a two-week stage1 for merging
2. six weeks for bugfixing
3. a release
1. ...

Note that we would not have maintained FSF release branches, but if the
time is right vendors would pick up a release and keep it maintained
(semi-officially).  This would allow focusing on "interesting" releases and
avoid piling up too much development on branches; at the same time we could
enforce stricter quality rules on merges, because missing one merge window
would not delay the merge for one year (as now), but only two months.

Note that the above model would basically force all development of
non-minor stuff to happen on branches; "working" on the mainline is not
possible in this scheme.  (This of course fits the distributed development
model of the linux kernel better.)

Of course the two-week and six-week numbers are not fixed, but keeping both
numbers low encourages good quality work (and keeping the first number low
is a requirement anyway).

We'd have a lot of releases of course - but that's only a naming problem.

Richard.


Re: [RFC] Our release cycles are getting longer

2007-01-25 Thread Richard Guenther

On 1/25/07, Steven Bosscher <[EMAIL PROTECTED]> wrote:

On 1/25/07, H. J. Lu <[EMAIL PROTECTED]> wrote:
> > >Gcc 4.2 has a serious FP performace issue:
> > >
> > >http://gcc.gnu.org/ml/gcc/2007-01/msg00408.html
> > >
> > >on both ia32 and x86-64. If there will be a 4.2.0 release, I hope it
> > >will be addressed.
> >
> > As always, the best way to ensure that it is addressed if it is
> > important to you is to address it yourself, or pay someone to do so :-)
>
> The fix is in mainline. The question is if it should be backported to
> 4.2.

ISTR Dan already made it clear more than once that the answer to that
question is a loud NO.


I thought it was more like "if you really want it I can do it".  And I think
without it 4.2 sucks.

Richard.


Re: GCC-4.0.4 release status

2007-01-25 Thread Richard Guenther

On 25 Jan 2007 10:29:27 -0600, Gabriel Dos Reis <[EMAIL PROTECTED]> wrote:


Hi,

  There were over 250 PRs open against GCC-4.0.4.  Almost all of
them are "benign" in the sense that we can leave without fixing them
in GCC-4.0.4 -- many are already fixed in more recent versions.
I'm now giving attention only to those PRs marked as blocker or
critical.  I've identified three:

  tree-optimization/29605

 * Wrong code generation when optimization is one.
   Andrew Pinski believes it is "Out of SSA" doing it.

  This touches middle-end, and will be left unfixed -- unless the
  fix is really really trivial.

  tree-optimization/28778
 * not to be fixed in GCC-4.0.x

  middle-end/28683
 * has a "simple" patch applied to GCC-4.2.x and GCC-4.1.x.
   I'm considering applying to GCC-4.0.4.


You might want to consider middle-end/28651 given the recent integer overflow
discussions.  I can do the backport work if you like.

Richard.


Re: GCC-4.0.4 release status

2007-01-25 Thread Richard Guenther

On 1/25/07, Gabriel Dos Reis <[EMAIL PROTECTED]> wrote:

On Thu, 25 Jan 2007, Richard Guenther wrote:

| You might want to consider middle-end/28651 given the recent integer overflow
| discussions.

Good suggestion!

| I can do the backport work if you like.

If you could work out a backport by tomorrow morning (CST), that would
be great.


Done.  Bootstrapped and tested on x86_64-unknown-linux-gnu.

Richard.


Re: GCC-4.0.4 release status

2007-01-25 Thread Richard Guenther

On 1/25/07, Volker Reichelt <[EMAIL PROTECTED]> wrote:

Hi,

> | Well, there's another serious (wrong-code) bug which should be fixed IMHO:
> |
> | PR c++/29106 is a C++ front-end issue.
> | It has a one-line fix (plus comment) on the 4.1 branch.
> | Well, actually one should also backport the fix for PR c++/28284 then,
> | which is a two-liner.

> I was primarily looking at the PRs that are marked in the bugzilla
> database blocker or critical.  As there were over 256 PRs open, and
> the idea is to get GCC-4.0.4 out of the door as soon as possible, I'm
> not trying to fix everything; just those that are critical or
> blockers.  This is based on the fact that most distros have moved to
> GCC-4.1.x or higher.  GCC-4.0.x has been known since GCC-4.0.0 to contain
> major shortcomings.

Well, the severity status of the bugs is not very well maintained.
Mark e.g. only sets the priority field (P1 - P5) of the bugs.
And PR 29106 is one of the 37 P1 bugs, and one of three
wrong-code P1 bugs.  So this is not just another simple error-recovery
problem.

In addition this is a regression from GCC 4.0.2, i.e. a regression
on the 4.0 branch, which makes this bug even worse, IMHO.
(This information seems to be missing in bugzilla, though.)

Considering how much dispute there is on the mailing list about how
to handle undefined behaviour correctly ;-), it bothers me more that
we ignore one-line fixes for wrong-code bugs.


I think regressions on the branch are worth a fix.  Though I agree
that the primary goal should be to get rid of the 4.0 branch ;)

Richard.


Re: Which optimization levels affect gimple?

2007-01-26 Thread Richard Guenther

On 1/26/07, Diego Novillo <[EMAIL PROTECTED]> wrote:

Paulo J. Matos wrote on 01/26/07 06:52:

> Is the output of -fdump-tree-optimized a subset of GIMPLE?
>
Yes.  The output is an incomplete textual representation of the GIMPLE
form of the program.


It's after doing TER, so the statements are no longer valid GIMPLE statements.

Richard.


Re: Signed int overflow behavior in the security context

2007-01-27 Thread Richard Guenther

On 1/27/07, Paul Schlie <[EMAIL PROTECTED]> wrote:

>> On Fri, Jan 26, 2007 at 06:57:43PM -0500, Paul Schlie wrote:
> Likewise, if the program has an uninitialized variable, the behavior
> will differ depending on details of optimization and how variables are
> assigned to memory.  Heap allocated for the first time might always be
> zero (as the OS delivers it that way), turning on optimization might then
> result in a nonzero initial value because of reuse of a register.

- I would argue that in this circumstance although the resulting value may
differ, the results are actually equivalent; as in both circumstances the
value returned is the value associated with its storage location; and as
the values of all storage locations are arbitrary unless otherwise well
specified; the result may change from run to run regardless of any applied
optimizations.


If you read from an uninitialized variable twice you may well get
a different result each time.  This is exactly the same issue as with
signed overflow and observable behavior - though, as somebody noted
later, the uninitialized variable case doesn't stir up too many people's
minds.
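
A tiny made-up illustration of the analogy:

int reads_agree (void)
{
  int u;            /* deliberately uninitialized */
  int a = u;
  int b = u;
  return a == b;    /* not guaranteed to be 1 once the optimizers are involved */
}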

Richard.


Re: G++ OpenMP implementation uses TREE_COMPLEXITY?!?!

2007-01-27 Thread Richard Guenther

On 1/28/07, Steven Bosscher <[EMAIL PROTECTED]> wrote:

Hello rth,

Can you explain what went through your mind when you picked the
tree_exp.complexity field for something implemented new...  :-(


Not much...

http://gcc.gnu.org/ml/gcc-patches/2005-10/msg01117.html

"... but my brain isn't working anymore."

;)

Richard.


Re: remarks about g++ 4.3 and some comparison to msvc & icc on ia32

2007-01-28 Thread Richard Guenther

On 1/28/07, tbp <[EMAIL PROTECTED]> wrote:

Let it be clear from the start this is a potshot, and while those
trends aren't exactly new or specific to my code, I haven't tried to
provide anything but specific data from one of my apps, on
win32/cygwin.

Primo, gcc getting much better wrt inlining exacerbates the fact that
it's not as good as other compilers at shrinking the stack frame size,
and perhaps, as was suggested by Uros when discussing that point, a pass
to address that would make sense.
As I'm too lazy to properly measure cruft across multiple compilers,
I'll use my rtrt app where I mostly control large-scale inlining by
hand.
objdump -wdrfC --no-show-raw-insn $1|perl -pe 's/^\s+\w+:\s+//'|perl
-ne 'printf "%4d\n", hex($1) if /sub\s+\$(0x\w+),%esp/'|sort -r| head
-n 10

msvc:2196 2100 1772 1692 1688 1444 1428 1312 1308 1160
icc: 2412 2280 2172 2044 1928 1848 1820 1588 1428 1396
gcc: 2604 2596 2412 2076 2028 1932 1900 1756 1720 1132


It would have been nice to tell us what the particular columns in
this table mean - now we have to decrypt objdump params and
perl postprocessing ourselves.

(If you are interested in stack size related to inlining you may want
to tune --param large-stack-frame and --param large-stack-frame-growth).

Richard.


Re: gcc 4.1.1: char *p = "str" puts "str" into rodata

2007-01-28 Thread Richard Guenther

On 1/28/07, Denis Vlasenko <[EMAIL PROTECTED]> wrote:

char *p;
int main() {
p = "";
return 0;
}

Don't you think that "" should end up in rw data?


Why?  It's not writable after all.
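
Spelling the point out (same code, with comments added; this is just my
annotation):

char *p;

int main (void)
{
  p = "";          /* fine: p itself is writable, it lives in .bss        */
  /* p[0] = 'x';      would be undefined behavior - the "" literal may be
                      read-only, which is why it can go to .rodata        */
  return 0;
}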

Richard.


Re: remarks about g++ 4.3 and some comparison to msvc & icc on ia32

2007-01-28 Thread Richard Guenther

On 1/28/07, Jan Hubicka <[EMAIL PROTECTED]> wrote:

> tbp wrote:
>
> > Secundo, while I very much appreciate the brand new string ops, it
> > seems that on ia32 some array initialization cases were left out,
> > hence I still see oodles of 'movl $0x0' when generating code for k8.
> > Also those zeroings get coalesced at the top of functions on ia32, and
> > I have a function where there are 3 pages of those right after the prologue.
> > See the attached 'grep 'movl   $0x0' dump.
>
> It looks like Jan and Richard have answered some of your questions about
> inlining (or are in the process of doing so), but I haven't seen a
> response to this point.
>
> Certainly, if we're generating zillions of zero-initializations to
> contiguous memory, rather than using memset, or an inline loop, that
> seems unfortunate.  Would you please file a bug report?

I thought the comment was more referring to the fact that we will happily
generate
movl $0x0,  place1
movl $0x0,  place2
...
movl $0x0,  placeMillion

rather than shorter
xor %eax, %eax
movl %eax, ...
but indeed both of those issues should be addressed (and it would be
interesting to know where we fail to synthesize memset in real
scenarios).

With the repeated mov issue, unfortunately, I don't know what would be the
best place: we obviously don't want to constrain register allocation too
much, and after regalloc I guess only a machine-dependent pass is the hope,
which is pretty ugly (but not that difficult to code, at least at the local
level).


One source of these patterns is SRA decomposing structures and
initialization.  But the structure size we do that for is limited (I also
believe we already have bug reports about this, but cannot find them
right now).
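
A made-up example of the kind of code that can expand to such a run of
zero stores (whether it is decomposed depends on the size limits
mentioned above):

struct many_fields { int a[16]; };

void init (struct many_fields *p)
{
  struct many_fields tmp = { { 0 } };   /* zero-initializes every element */
  *p = tmp;                             /* may be expanded as one store per
                                           word instead of a memset/cleared
                                           block                          */
}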

Richard.


Re: G++ OpenMP implementation uses TREE_COMPLEXITY?!?!

2007-01-29 Thread Richard Guenther

On 1/29/07, Mark Mitchell <[EMAIL PROTECTED]> wrote:

Steven Bosscher wrote:
> On 1/29/07, Mark Mitchell <[EMAIL PROTECTED]> wrote:
>> Email is a tricky thing.  I've learned -- the hard way -- that it's best
>> to put a smiley on jokes, because otherwise people can't always tell
>> that they're jokes.
>
> I did use a smiley.
>
> Maybe I should put the smiley smiling then, instead of a sad looking
> smiley.

To me, they do mean very different things.  The sad smiley didn't say
"joke"; it said "boo!".  I think that a happy smiley would help.


Well, a "joke" smiley on the first sentence

Can you explain what went through your mind when you picked the
tree_exp.complexity field for something implemented new...  :-(

would have been inappropriate.  There is no joke in this sentence, there
is disappointment in it.


Like I say, I've had exactly the same problem with my own humor-impaired
recipients.  So, I think it's best just to live in constant fear that
people don't think things are funny. :-)


There's a later ;) smiley in the mail and maybe you missed one after
the second paragraph.  (certainly you did)

I don't read that mail as insulting or inappropriate anywhere - maybe too
"informal" for this list (!?).

I appreciate Steven's contributions and willingness to fix things in GCC
that apparently nobody else wants to tackle.  (I and others owe you
beer for that!)

Thanks,
Richard.


Re: G++ OpenMP implementation uses TREE_COMPLEXITY?!?!

2007-01-29 Thread Richard Guenther

On 1/29/07, Eric Botcazou <[EMAIL PROTECTED]> wrote:

> There's a later ;) smiley in the mail and maybe you missed one after
> the second paragraph.  (certainly you did)

Then I guess the question is: what is the scope of a smiley?  Does it
retroactively cover all the preceding sentences, including the subject?


Good point.  Personally I tend to annotate sentences with (or without) a
smiley ;)  If I want a smiley to apply to the whole mail it would be
on a separate paragraph.  How to annotate a complete paragraph I don't
know.

;)  (whole-mail-smiley)

Richard.


Re: Does anyone recognize this --enable-checking=all bootstrap failure?

2007-01-30 Thread Richard Guenther

On 1/30/07, Brooks Moses <[EMAIL PROTECTED]> wrote:

I've been trying to track down an build failure that I was pretty sure
came about from some patches I've been trying to merge, but I've now
reduced things to a bare unmodified tree and it's still failing.  I
could have sworn that it wasn't doing that until I started adding
things, though, so I'm posting about it here before I make a bug report
so y'all can tell me if I did something dumb.

Essentially, what's happening is that I've got a build tree configured
with --enable-checking=all, and the first time it tries to use xgcc to
compile something, it dies with an ICE.  Here are the gory details:


This is fold-checking tripping over bugs - this is also well-known, just
don't use --enable-checking=all.  (but =yes)

Richard.


Re: Interesting build failure on trunk

2007-02-01 Thread Richard Guenther

On 2/1/07, Ismail Dönmez <[EMAIL PROTECTED]> wrote:

On Wednesday 31 January 2007 11:26:38 Ismail Dönmez wrote:
> On Tuesday 30 January 2007 18:43:52 Ian Lance Taylor wrote:
> > Ismail Dönmez <[EMAIL PROTECTED]> writes:
> > > I am getting this when I try to compile gcc trunk:
> > >
> > > ../../libcpp/../include -I../../libcpp/include  -march=i686 -O2 -pipe
> > > -fomit-frame-pointer -U_FORTIFY_SOURCE -fprofile-use -W -Wall
> > > -Wwrite-strings -Wstrict-prototypes -Wmissing-prototypes
> > > -Wold-style-definition -Wmissing-format-attribute -pedantic
> > > -Wno-long-long -Werror -I../../libcpp -I. -I../../libcpp/../include
> > > -I../../libcpp/include  -c -o files.o -MT files.o -MMD -MP -MF
> > > .deps/files.Po ../../libcpp/files.c ../../libcpp/files.c: In function
> > > 'read_name_map':
> > > ../../libcpp/files.c:1238: internal compiler error: Floating point
> > > exception Please submit a full bug report,
> > > with preprocessed source if appropriate.
> > > See <http://gcc.gnu.org/bugs.html> for instructions.
> >
> > libcpp/files.c:1238 seems to be a call to memcpy.  I don't see
> > anyplace a floating point exception might come from.  I've certainly
> > never seen anything like that.
>
> I think this is a hardware error

Ok, it's not: I tried to build on an AMD64 3500+ with 1GB RAM (unlike my Centrino
laptop, totally different hardware) and it crashes in exactly the same way.
Now my guess is that the host compiler is somehow hosed - bad news for gcc 4.2 I
guess.


This is probably PR30650 (just don't use profiledbootstrap).

Richard.


Re: build failure? (libgfortran)

2007-02-05 Thread Richard Guenther

On 2/5/07, Dorit Nuzman <[EMAIL PROTECTED]> wrote:

Grigory Zagorodnev <[EMAIL PROTECTED]> wrote on 05/02/2007
08:18:34:

> Dorit Nuzman wrote:
> > I'm seeing this bootstrap failure on i686-pc-linux-gnu (revision
121579) -
> > something I'm doing wrong, or is anyone else seeing this?
>
> Yes. I see the same at x86_64-redhat-linux.
>

Thanks.

Turns out I see the same problem on ppc64-yellowdog-linux


This is because we now fixinclude sysmacros.h and libgfortran is built with
-std=gnu99.

Caused by:

2007-02-03  Bruce Korb <[EMAIL PROTECTED]>

   * inclhack.def (glibc_c99_inline_4): replace "extern" only if
   surrounded by space characters.

Richard.


Re: GCC 4.1.2 Status Report

2007-02-05 Thread Richard Guenther
On Sun, 4 Feb 2007, Mark Mitchell wrote:

> [Danny, Richard G., please see below.]
> 
> Thanks to all who have helped tested GCC 4.1.2 RC1 over the last week.
> 
> I've reviewed the list traffic and Bugzilla.  Sadly, there are a fair
> number of bugs.  Fortunately, most seem not to be new in 4.1.2, and
> therefore I don't consider them showstoppers.
> 
> The following issues seem to be the 4.1.1 regressions:
> 
>   http://gcc.gnu.org/wiki/GCC_4.1.2_Status
> 
> PR 28743 is only an ICE-on-invalid, so I'm not terribly concerned.
> 
> Daniel, 30088 is another aliasing problem.  IIRC, you've in the past
> said that these were (a) hard to fix, and (b) uncommon.  Is this the
> same problem?  If so, do you still feel that (b) is true?  I'm
> suspicious, and I am afraid that we need to look for a conservative hack.

PR30708 popped up as well and may be related (and is easier to analyze as
it's C-only).  Neither seems to be a regression on the branch, though.

> Richard, 30370 has a patch, but David is concerned that we test it on
> older GNU/Linux distributions, and suggested SLES9.  Would you be able
> to test that?

The patch bootstrapped and tested ok on SLES9-ppc.

> Richard, 29487 is an issue raised on HP-UX 10.10, but I'm concerned that
> it may reflect a bad decision about optimization of C++ functions that
> don't throw exceptions.  Would you please comment?

I believe reverting this patch on the branch is the only option (as opposed
to fixing the HP-UX assembler).  I don't know of any real-life issue
it fixes.

Richard.

-- 
Richard Guenther <[EMAIL PROTECTED]>
Novell / SUSE Labs


Scheduling an early complete loop unrolling pass?

2007-02-05 Thread Richard Guenther

Hi,

currently with -ftree-vectorize we generate for

  for (i=0; i<3; ++i)
  # SFT.4346_507 = VDEF 
  # SFT.4347_508 = VDEF 
  # SFT.4348_509 = VDEF 
d[i] = 0.0;

  for (j=0; j:;
  vect_cst_.4501_723 = { 0.0, 0.0 };
  vect_p.4506_724 = (vector double *) &D.76822;
  vect_p.4502_725 = vect_p.4506_724;

  # ivtmp.4508_728 = PHI <0(6), ivtmp.4508_729(11)>
  # ivtmp.4507_726 = PHI 
  # ivtmp.4461_601 = PHI <3(6), ivtmp.4461_485(11)>
  # SFT.4348_612 = PHI 
  # SFT.4347_611 = PHI 
  # SFT.4346_610 = PHI 
  # i_582 = PHI <0(6), i_118(11)>
:;
  # SFT.4346_507 = VDEF 
  # SFT.4347_508 = VDEF 
  # SFT.4348_509 = VDEF 
  *ivtmp.4507_726 = vect_cst_.4501_723;
  i_118 = i_582 + 1;
  ivtmp.4461_485 = ivtmp.4461_601 - 1;
  ivtmp.4507_727 = ivtmp.4507_726 + 16B;
  ivtmp.4508_729 = ivtmp.4508_728 + 1;
  if (ivtmp.4508_729 < 1) goto ; else goto ;

  # i_722 = PHI 
  # ivtmp.4461_717 = PHI 
:;

  # ivtmp.4461_706 = PHI 
  # SFT.4348_707 = PHI 
  # SFT.4347_708 = PHI 
  # SFT.4346_709 = PHI 
  # i_710 = PHI 
:;
  # SFT.4346_711 = VDEF 
  # SFT.4347_712 = VDEF 
  # SFT.4348_713 = VDEF 
  D.76822.D.44378.values[i_710] = 0.0;
  i_714 = i_710 + 1;
  ivtmp.4461_715 = ivtmp.4461_706 - 1;
  if (ivtmp.4461_715 != 0) goto ; else goto ;

...

and we are later not able to do constant propagation into the second loop,
which we could do if we first unrolled such small loops.
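
In source form the situation is roughly the following (a sketch; the use
in the second loop is made up so that there is something to propagate
into):

void f (double *out, int n)
{
  double d[3];
  int i, j;

  for (i = 0; i < 3; ++i)
    d[i] = 0.0;           /* vectorized into a V2DF store plus a scalar tail */

  for (j = 0; j < n; ++j)
    out[j] += d[j % 3];   /* with the first loop completely unrolled the 0.0
                             can be propagated here; after vectorization it
                             no longer is */
}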

As we also only vectorize innermost loops I believe doing a
complete unrolling pass early will help in general (I pushed
for this some time ago).

Thoughts?

Thanks,
Richard.

-- 
Richard Guenther <[EMAIL PROTECTED]>
Novell / SUSE Labs


Re: Scheduling an early complete loop unrolling pass?

2007-02-05 Thread Richard Guenther
On Mon, 5 Feb 2007, Paolo Bonzini wrote:

> 
> > As we also only vectorize innermost loops I believe doing a
> > complete unrolling pass early will help in general (I pushed
> > for this some time ago).
> > 
> > Thoughts?
> 
> It might also hurt, though, since we don't have a basic block vectorizer.
> IIUC the vectorizer is able to turn
> 
>   for (i = 0; i < 4; i++)
> v[i] = 0.0;
> 
> into
> 
>   *(vector double *)v = (vector double){0.0, 0.0, 0.0, 0.0};

That's true.  But we cannot do constant propagation out of this
(and the vectorizer leaves us with a lot of cruft which is only
removed much later).

The above case would also ask for an early vectorization pass if the
loop was wrapped into another.

Finding a good heuristic for which loops to completely unroll early
is not easy, though for odd small numbers of iterations it is
probably always profitable.

Richard.

-- 
Richard Guenther <[EMAIL PROTECTED]>
Novell / SUSE Labs


Re: Scheduling an early complete loop unrolling pass?

2007-02-05 Thread Richard Guenther
On Mon, 5 Feb 2007, Jan Hubicka wrote:

> > 
> > Hi,
> > 
> > currently with -ftree-vectorize we generate for
> > 
> >   for (i=0; i<3; ++i)
> >   # SFT.4346_507 = VDEF 
> >   # SFT.4347_508 = VDEF 
> >   # SFT.4348_509 = VDEF 
> > d[i] = 0.0;
> 
> Also Tomas' patch is supposed to catch this special case and convert it
> into a memset that should subsequently be optimized into an assignment,
> which should be good enough (which reminds me that I forgot to merge the
> memset part of the stringop optimizations).
> 
> Perhaps this can be made a bit more generic and construct INIT_EXPRs for
> small arrays directly from Tomas's pass (going from memset to assignment
> works just in special cases).  Tomas, what is the status of your patch?

It would certainly be interesting to make constant propagation work in
this case (though after vectorization aliasing is in an "interesting" 
state).

> Did you run some benchmarks?

Not yet - I'm looking at the C++ SPEC 2006 benchmarks at the moment
and using vectorization there seems to do a lot of collateral damage
(maybe not measurable though).

Richard.

-- 
Richard Guenther <[EMAIL PROTECTED]>
Novell / SUSE Labs


Re: GCC 4.1.2 Status Report

2007-02-05 Thread Richard Guenther

On 2/5/07, Mark Mitchell <[EMAIL PROTECTED]> wrote:

Daniel Berlin wrote:

>> Daniel, 30088 is another aliasing problem.  IIRC, you've in the past
>> said that these were (a) hard to fix, and (b) uncommon.  Is this the
>> same problem?  If so, do you still feel that (b) is true?  I'm
>> suspicious, and I am afraid that we need to look for a conservative hack.
>
> It's certainly true that people will discover more and more aliasing
> bugs the harder they work 4.1 :)

Do you think that PR 30088 is another instance of the same problem, and
that therefore turning off the pruning will fix it?


Disabling pruning will also increase memory-usage and compile-time.


> There is always the possibility of turning off the pruning, which will
> drop our performance, but will hide most of the latent bugs we later
> fixed through rewrites well enough that they can't be triggered (the
> 4.1 optimizers aren't aggressive enough).

Is it convenient for you (or Richard?) to measure that on SPEC?
(Richard, thank you very much for stepping up to help with the various
issues that I've raised for 4.1.2!)  Or, have we already done so, and
I've just forgotten?  I'm very mindful of the import of performance, but
if we think that these aliasing bugs are going to affect reasonably
large amounts of code (which I'm starting to think), then shipping the
compiler as is seems like a bad idea.

(Yes, there's a slippery slope argument whereby we turn off all
optimization, since all optimization passes may have bugs.  But, if I
understand correctly, the aliasing algorithm in 4.1 has relatively
fundamental problems, which is rather different.)


I don't think we need to go this way - there is a workaround available
(use -fno-strict-aliasing) and there are not enough problems to warrant
this.

Richard.


Re: Scheduling an early complete loop unrolling pass?

2007-02-06 Thread Richard Guenther
On Tue, 6 Feb 2007, Dorit Nuzman wrote:

> > Hi Richard,
> >
> >
> ...
> > However...,
> >
> > I have seen cases in which complete unrolling before vectorization enabled
> > constant propagation, which in turn enabled significant simplification of
> > the code, thereby, in fact making a previously unvectorizable loop (at
> > least on some targets, due to the presence of divisions, unsupported in the
> > vector unit), into a loop (in which the divisions were replaced with
> > constants), that can be vectorized.
> >
> > Also, given that we are working on "SLP" kind of technology (straight line
> > code vectorization), which would make vectorization less sensitive to
> > unrolling, I think maybe it's not such a bad idea after all... One option
> > is to increase the default value of --param min-vect-loop-bound for now,
> > and when SLP is incorporated, go ahead and schedule early complete
> > unrolling. However, since SLP implementation may take some time (hopefully
> > within the time frame of 4.3 though) - we could just go ahead and schedule
> > early complete unrolling right now. (I can't believe I'm in favor of this
> > idea, but that loop I was talking about before - improved by a factor over
> > 20x when early complete unrolling + subsequent vectorization were
> > applied...)
> >
> 
> After sleeping on it, it actually makes a lot of sense to me to schedule
> complete loop unrolling before vectorization - I think it would either
> simplify loops (sometimes creating more opportunities for vectorization),
> or prevent vectorization of loops we probably don't want to vectorize
> anyhow, and even that - only temporarily - until we have straight-line-code
> vectorization in place. So I'm all for it.

Ok, I'll dig out the patches I had for this and will do some SPEC
runs.  As soon as I have time for this (no hurry necessary here I think).

Thanks!
Richard.

-- 
Richard Guenther <[EMAIL PROTECTED]>
Novell / SUSE Labs


Re: Scheduling an early complete loop unrolling pass?

2007-02-06 Thread Richard Guenther
On Tue, 6 Feb 2007, Dorit Nuzman wrote:

> Ira Rosen/Haifa/IBM wrote on 06/02/2007 11:49:17:
> 
> > Dorit Nuzman/Haifa/IBM wrote on 05/02/2007 21:13:40:
> >
> > > Richard Guenther <[EMAIL PROTECTED]> wrote on 05/02/2007 17:59:00:
> > >
> ...
> > >
> > > That's going to change once this project goes in: "(3.2) Straight-line
> > > code vectorization" from http://gcc.gnu.org/wiki/AutovectBranchOptimizations.
> > > In fact, I think in autovect-
> > > branch, if you unroll the above loop it should get vectorized
> > > already. Ira - is that really the case?
> >
> > The completely unrolled loop will not get vectorized because the
> > code will not be inside any loop (and our SLP implementation will
> > focus, at least as a first step, on loops).
> 
> Ah, right... I wonder if we can keep the loop structure in place, even
> after completely unrolling the loop  - I mean the 'struct loop' in
> 'current_loops' (not the actual CFG), so that the "SLP in loops" would have
> a chance to at least consider vectorizing this "loop". Zdenek - what do you
> say?

Well, usually if it's not inside another loop it can't be performance
critical ;)  At least if there is a setup cost for the vectorized variant.

I don't think we need to worry about this case until a real testcase
for this comes along.

Richard.

-- 
Richard Guenther <[EMAIL PROTECTED]>
Novell / SUSE Labs


Re: 40% performance regression SPEC2006/leslie3d on gcc-4_2-branch

2007-02-18 Thread Richard Guenther

On 2/18/07, Daniel Berlin <[EMAIL PROTECTED]> wrote:

On 2/17/07, H. J. Lu <[EMAIL PROTECTED]> wrote:
> On Sat, Feb 17, 2007 at 01:35:28PM +0300, Vladimir Sysoev wrote:
> > Hello, Daniel
> >
> > It looks like your changeset listed bellow makes performance
> > regression ~40% on SPEC2006/leslie3d. I will try to create minimal
> > test for this issue this week and update you in any case.
> >
>
> That is a known issue:
>
> http://gcc.gnu.org/ml/gcc/2007-01/msg00408.html

Yes, it is something we sadly cannot do anything about without doing a
very large number of backports.

There were some seriously broken things in 4.2's aliasing that got
fixed properly in 4.3.
The price of fixing them in 4.2 was a serious performance drop.


There's the option of un-fixing them to get back to the state of 4.1,
declaring them fixed in 4.3 at the earliest.

Richard.


Re: GCC 4.2.0 Status Report (2007-02-19)

2007-02-20 Thread Richard Guenther

On 2/20/07, Mark Mitchell <[EMAIL PROTECTED]> wrote:

So, my feeling is that the best course of action is to set a relatively
low threshold for GCC 4.2.0 and target 4.2.0 RC1 soon: say, March 10th.
  Then, we'll have a 4.2.0 release by (worst case, and allowing for
lameness on my part) March 31.

Feedback and alternative suggestions are welcome, of course.


I'd vote for reverting the aliasing fixes on the 4.2 branch to get back the
speed and the 4.1 bugs, and release that.  I expect that 4.2 will be made
available in some form to openSUSE users because of gfortran and OpenMP
improvements.

Richard.


Re: GCC 4.2.0 Status Report (2007-02-19)

2007-02-20 Thread Richard Guenther

On 2/20/07, Vladimir Makarov <[EMAIL PROTECTED]> wrote:


As for the proposal to revert the aliasing fixes on 4.2, IMHO aliasing
bugs are pretty nasty; it is hard to find an option to work around them
because alias info is used in many optimizations.


All bugs we are talking about can be worked around by using
-fno-strict-aliasing.

Richard.


Re: GCC 4.2.0 Status Report (2007-02-19)

2007-02-20 Thread Richard Guenther

On 2/20/07, Vladimir Makarov <[EMAIL PROTECTED]> wrote:

Richard Guenther wrote:

> On 2/20/07, Vladimir Makarov <[EMAIL PROTECTED]> wrote:
>
>> As for the proposal to revert the aliasing fixes on 4.2, IMHO aliasing
>> bugs are pretty nasty; it is hard to find an option to work around them
>> because alias info is used in many optimizations.
>
>
> All bugs we are talking about can be worked around by using
> -fno-strict-aliasing.
>
It is too conservative and I think it seriously decreases performance
too.  Although it would be interesting to see what degradation we would
get with the option.  Maybe it is not so bad.  In that case, your
proposal would be reasonable because it is much less work.


Well, if you need to decide between wrong code and decreased performance,
I'd choose decreased performance ;)  I only wanted to make the point that
reverting the patches in question will not introduce wrong-code bugs that
cannot be worked around.  Whether the workaround is convenient, or whether
the wrong-code bugs are easy to spot, is another question, but in theory we
have all these issues with 4.1 as well.  So to clarify, I didn't suggest
that we or the casual user should use -fno-strict-aliasing by default.

Richard.


Re: 40% performance regression SPEC2006/leslie3d on gcc-4_2-branch

2007-02-20 Thread Richard Guenther

On 2/20/07, Grigory Zagorodnev <[EMAIL PROTECTED]> wrote:

Mark Mitchell wrote:
>> FP performance regressions of the recent GCC 4.2 (revision 120817)
>> compiler against September GCC 4.2 (revision 116799)
> What does that translate to in terms of overall score?
>
Hi,
This is 4.7% drop of SPECfp_base2006 ratio (geomean of individual FP
ratios).

Here is the full set of changes in cpu2k6/fp performance of GCC 4.2
compiler between r116799 and r120817, measured on Intel Core2 Duo at -O2
optimization level.


Do you happen to have a 4.1.x baseline to compare those against?

Richard.


410.bwaves        -6.3%
416.gamess        -0.7%
433.milc          -7.0%
434.zeusmp         0.0%
435.gromacs       -0.4%
436.cactusADM     -1.1%
437.leslie3d     -25.4%
444.namd           0.0%
447.dealII        -1.4%
450.soplex        -3.9%
453.povray         0.0%
454.calculix      -0.3%
459.GemsFDTD     -18.3%
465.tonto         -2.5%
470.lbm           -4.1%
481.wrf           -2.7%
482.sphinx3       -1.2%
SPECfp_base2006   -4.7%

- Grigory



Re: GCC 4.2.0 Status Report (2007-02-19)

2007-02-21 Thread Richard Guenther

On 2/21/07, Benjamin Kosnik <[EMAIL PROTECTED]> wrote:


> 4.0 branched with critical flaws that were not noticed until 4.2.0 which
> is why we end up with the missed optimization regression in the first place.
>
> So the question is do we want to correct the regressions or not, because
> right now we sound like we don't.  Which regression is more important?
> Wrong code or missed optimization, I still say wrong code.  I know other
> people will disagree with me but guess what they can disagree with me all
> they want, they will never win.  This is the same thing with rejecting
> legal code and compile time regressions (I remember a patch Apple complained
> about because it caused a compile time regression but it fixed a reject legal,
> even recently with some ICE and -g and a compile time regression though it
> does not affect C++).

Sure. This is one argument: we want correctness. Yipeee!!! There's no
real arguing this point. It looks like 4.2.x and 4.1.x as it stands now
can be made to be about the same when it comes to both correctness and
(perhaps) codegen.

However, you keep dropping the compile-time regression issue: is this on
purpose? It's more than just missed optimizations here. If 4.2.0 is 10%
slower than 4.1 and 4.3, then what? (In fact, I'm seeing 4.2.x as 22%
slower for C++ vs. mainline when running the libstdc++ testsuite. Fixing
this without merging Diego/Andrew patches for mem-ssa seem impossible.
Vlad has also posted compile-time results that are quite poor.) GCC
release procedures have historically been toothless when it comes to
compile-time issues, which rank less than even SPEC scores. However,
this is one of the things that GCC users evaluate in production
settings, and the current evaluation of 4.2.x in this area is going to
be pretty poor. (IMHO unusably poor.)


It's not only compile-time, it's also memory-usage unfortunately.


I think that another issue, which Vlad and Paolo pick up and I now echo,
is setting a good break point for major new infrastructure.
Historically, GCC development has prioritized time-based schedules
(which then slip), instead of feature-based schedules (which might
actually motivate the principals to do the heavy lifting required to get
to release shape: see C++ABI in 3.0.x-3.2.x, 3.4.x, and fortran and SSA
in 4.0, say java in 4.1.x). It seems to me that df (and LTO?) could be
this for what-might-be 4.3.0. It's worth considering this, and
experimenting with "release theory and practice."

As Kaveh said, all compilers ship with a mix of bugs and features. There
are some compelling new features in 4.2.0, and some issues. After
evaluating all the above, I think re-branching may be the best bet for
an overall higher-quality 4.2.x release series than 4.1.x.


I believe re-branching will not make 4.2.x better but will only use up
scarce resources.  We have piled up so much new stuff for 4.3 already that
it will be hard to stabilize it.  So either go for 4.2.0 as it is now, with
all its regressions but possibly critically better correctness, or get back
the non-correctness of 4.1.x and fix some of the regressions.

Re-branching is the worst thing we can do I believe.


Obviously, others disagree.


Indeed ;)

Richard.


Re: GCC 4.2.0 Status Report (2007-02-19)

2007-02-21 Thread Richard Guenther

On 2/21/07, Mark Mitchell <[EMAIL PROTECTED]> wrote:


To be honest, my instinct for the FSF is to take the 4% hit and get rid
of this nasty class of bugs.  Users measure compiler quality by more
than just floating-point benchmarks; FP code is a relatively small
(albeit important, and substantial) set of all code.  That's why I asked
Danny for the patches in the first place.


Of course the speed of a compiler is measured on testcases where speed
matters - and this is usually FP code.  Now, based on this reasoning, we
could (as CodeSourcery probably did) enable -fno-strict-aliasing by
default, which fixes the class of 4.1.x bugs we are talking about, while
leaving the user the possibility to specify -fstrict-aliasing and get back
the speed for FP code, at the risk of wrong code.

Now, the realistic choices for 4.2.0 as I see them are, in order of my
personal preference:

1) Ship 4.2.0 as is
2) Ship 4.2.0 with the aliasing fixes reverted
3) no more reasonable option.  Maybe don't ship 4.2.0 at all.

So I don't see backporting more patches or even re-branching as a real
option.

Richard.


Re: "Error: suffix or operands invalid for `push'" on 64bit boxes

2007-02-22 Thread Richard Guenther

On 2/22/07, Alok kataria <[EMAIL PROTECTED]> wrote:

Hi,

I was trying some inline assembly with gcc.

Running the assembly program on a 64-bit system (with a 64-bit gcc)
gives errors of the form

/tmp/cc28kO9v.s: Assembler messages:
/tmp/cc28kO9v.s:57: Error: suffix or operands invalid for `push'


This is off topic here; please use [EMAIL PROTECTED] for this kind of
question.  Your assembly statements are not valid 64-bit asm; they only
work in 32-bit mode.
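
A reduced illustration (assumed, since the actual statements are not
quoted): in 64-bit mode there is no push of a 32-bit register, which is
exactly what makes gas print "suffix or operands invalid for `push'";
pushing the full-width register assembles, although pushing inside inline
asm remains questionable because of the red zone.

  void asm_32bit_only (void)
  {
    /* Assembles with -m32 only; with -m64 gas rejects the push.  */
    asm volatile ("push %%eax\n\tpop %%eax" ::: "memory");
  }

  void asm_64bit_only (void)
  {
    /* Assembles with -m64; shown only to illustrate the assembler
       behaviour - writing below %rsp can clobber the red zone.  */
    asm volatile ("push %%rax\n\tpop %%rax" ::: "memory");
  }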

Richard.


Fold and integer types with sub-ranges

2007-02-23 Thread Richard Guenther

Currently, for example in fold_sign_changed_comparison, we produce
integer constants that are not inside the range of their type, as
denoted by [TYPE_MIN_VALUE (t), TYPE_MAX_VALUE (t)].  For example
consider a type with range [10, 20] and the comparison created by
the Ada frontend:

 if ((signed char)t == -128)

t being of that type [10, 20] with TYPE_PRECISION 8, like the constant
-128.  So fold_sign_changed_comparison comes along and decides to strip
the conversion and convert the constant to type T which looks like

 
unit size 
user align 8 symtab 0 alias set 4 canonical type 0x2b8156099c00 
precision 8 min  max  RM size >
readonly sizes-gimplified public unsigned QI size  unit size 
user align 8 symtab 0 alias set 4 canonical type 0x2b8156099f00 
precision 8 min  max 
RM size  constant invariant 5>>

(note it's unsigned!)

So the new constant gets produced using force_fit_type_double with
the above (unsigned) type and the comparison now prints as

  if (t == 128)

and the new constant 128 now being out of range of its type:

  <integer_cst ... constant invariant 128>

(see the min/max values of that type above).

What do we want to do about that?  Do we want to do anything about it?
If we don't want to do anything about it, why care about an exact
TREE_TYPE of integer constants if the only thing that matters is
signedness and type precision?

Thanks for any hints,
Richard.


Re: Fold and integer types with sub-ranges

2007-02-23 Thread Richard Guenther
On Fri, 23 Feb 2007, Duncan Sands wrote:

> > Currently for example in fold_sign_changed_comparison we produce
> > integer constants that are not inside the range of its type values
> > denoted by [TYPE_MIN_VALUE (t), TYPE_MAX_VALUE (t)].  For example
> > consider a type with range [10, 20] and the comparison created by
> > the Ada frontend:
> > 
> >  if ((signed char)t == -128)
> > 
> > t being of that type [10, 20] with TYPE_PRECISION 8, like the constant
> > -128.  So fold_sign_changed_comparison comes along and decides to strip
> > the conversion and convert the constant to type T which looks like
> ...
> > What do we want to do about that?  Do we want to do anything about it?
> > If we don't want to do anything about it, why care about an exact
> > TREE_TYPE of integer constants if the only thing that matters is
> > signedness and type precision?
> 
> I don't think gcc should be converting anything to a type like t's unless
> it can prove that the thing it's converting is in the range of t's type.  So
> it presumably should try to prove: (1) that -128 is not in the range of
> t's type; if it's not, then fold the comparison to false; otherwise (2) try
> to prove that -128 is in the range of t's type; if so, convert it.  Otherwise
> do nothing.
> 
> That said, this whole thing is a can of worms.  Suppose the compiler wants to
> calculate t+1.  Of course you do something like this:
> 
> int_const_binop (PLUS_EXPR, t, build_int_cst (TREE_TYPE (t), 1), 0);
> 
> But if 1 is not in the type of t, you just created an invalid value!

Eeek ;)  True.  Another reason not to force creating 1s and 0s in
such cases but to simply allow

int_const_binop_int (PLUS_EXPR, t, 1)

Of course, while int_const_binop will truncate the result to its
TYPE_PRECISION, one still needs to make sure the result fits in t.
That is, all the fancy TREE_OVERFLOW stuff only cares about
TYPE_PRECISION and never about the type's 'true' range via its
min/max values.
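
A possible shape for such a helper - just a sketch under the assumption
that the arithmetic is carried out in the base type (nothing like this
exists in the tree; the name is made up):

  static tree
  int_const_binop_int (enum tree_code code, tree arg, HOST_WIDE_INT cst)
  {
    tree type = TREE_TYPE (arg);
    /* If ARG has a subrange type, do the arithmetic in its base type so
       the literal CST is never materialized in a type whose
       TYPE_MIN_VALUE/TYPE_MAX_VALUE it might violate.  */
    if (TREE_TYPE (type))
      type = TREE_TYPE (type);
    return int_const_binop (code, fold_convert (type, arg),
                            build_int_cst (type, cst), 0);
  }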

> Personally I think the right thing to do is to eliminate these types
> altogether somewhere early on, replacing them with their base types
> (which don't have funky ranges), inserting appropriate ASSERT_EXPRs
> instead.  Probably types like t should never be seen outside the Ada
> f-e at all.

Heh - nice long-term goal ;)  Might raise the usual
won't-work-for-proper-debug-info complaint though.

Richard.


Re: Fold and integer types with sub-ranges

2007-02-24 Thread Richard Guenther
On Fri, 23 Feb 2007, Richard Kenner wrote:

> > That said, this whole thing is a can of worms.  Suppose the compiler wants 
> > to
> > calculate t+1.  Of course you do something like this:
> > 
> > int_const_binop (PLUS_EXPR, t, build_int_cst (TREE_TYPE (t), 1), 0);
> > 
> > But if 1 is not in the type of t, you just created an invalid value!
> 
> Yes, but why not do all arithmetic in the base type, just like Ada itself
> does?  Then you use 1 in that base type.

Sure - I wonder if there is a reliable way of testing whether we face
a non-base type in the middle-end.  I suppose TREE_TYPE (type) != NULL
won't work in all cases... (?)
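
A sketch of such a predicate, with the caveat just raised that TREE_TYPE
alone may not be a reliable marker (the helper name is made up):

  static bool
  subrange_type_p (tree type)
  {
    /* The Ada front end records the base type in TREE_TYPE of the
       subtype, so a non-NULL, distinct TREE_TYPE is the obvious - if
       possibly incomplete - test.  */
    return TREE_CODE (type) == INTEGER_TYPE
           && TREE_TYPE (type) != NULL_TREE
           && TREE_TYPE (type) != type;
  }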

> > Personally I think the right thing to do is to eliminate these types
> > altogether somewhere early on, replacing them with their base types
> > (which don't have funky ranges), inserting appropriate ASSERT_EXPRs
> > instead.  Probably types like t should never be seen outside the Ada
> > f-e at all.
> 
> Not clear.  There is the debug issue plus VRP can use the range information
> for some things (we have to be very precise as to what).  If we keep the model
> of doing all arithmetic in the base type, there should be no problem.

I agree.  But apparently fold does not care about base vs. non-base
types and happily strips conversions between them, doing arithmetic
in the non-base type.  Now the question was whether we want to (try to)
fix that for example by forcing gcc_assert (!TREE_TYPE (type)) on
integer types in int_cst_binop and fold_binary and other places?

Richard.


Re: spec2k comparison of gcc 4.1 and 4.2 on AMD K8

2007-02-27 Thread Richard Guenther

On 2/27/07, Menezes, Evandro <[EMAIL PROTECTED]> wrote:

Honza,

> Well, rather than unstable, they seems to be more memory layout
> sensitive I would say. (the differences are more or less reproducible,
> not completely random, but independent on the binary itself. I can't
> think of much else than memory layout to cause it).  I always wondered
> if things like page coloring have chance to reduce this noise, but I
> never actually got around trying it.

You didn't mention the processors in your systems, but I wonder if they
are dual-core.  If so, perhaps it's got to do with the fact that each K8
core has its own L2, whereas C2 chips have a shared L2.  Then, try
preceding "runspec" with "taskset 0x02" to keep the process from hopping
between cores and finding cold caches (though the kernel strives to stick
a process to a single core, it's not perfect).


Well, both britten and haydn are single-core, two-processor systems.  For
SPEC2k6 runs the problem is that the 2GB of RAM of the machine is
distributed over both NUMA nodes, so with the memory requirements of
SPEC2k6 we always get inter-node memory traffic.  Vangelis is a
single-processor, single-core system (and the most stable one).  Any idea
how to force a process to use local memory only?
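
One sketch of how a run could be pinned (assuming libnuma is installed and
the process can be modified or wrapped; strict binding is also possible
from the outside with numactl):

  #include <numa.h>

  /* Call this early in the benchmark process; link with -lnuma.  */
  static void
  pin_to_node0 (void)
  {
    if (numa_available () < 0)
      return;                   /* kernel or library without NUMA support */
    numa_run_on_node (0);       /* restrict execution to node 0's CPUs */
    numa_set_localalloc ();     /* allocate memory on the node we run on */
  }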

Richard.


Re: Massive SPEC failures on trunk

2007-03-03 Thread Richard Guenther

On 03 Mar 2007 11:00:11 -0800, Ian Lance Taylor <[EMAIL PROTECTED]> wrote:

Grigory Zagorodnev <[EMAIL PROTECTED]> writes:

> Trunk GCC shows massive (2 compile-time and 6 run-time) failures on
> SPEC CPU2000 and CPU2006 at i386 and x86_64 on -O2 optimization
> level. Regression introduced somewhere between revision 122487 and
> 122478.
>
> There are three checkins, candidates for the root of regression:
>   http://gcc.gnu.org/viewcvs?view=rev&revision=122487
>   http://gcc.gnu.org/viewcvs?view=rev&revision=122484
>   http://gcc.gnu.org/viewcvs?view=rev&revision=122479

Can you boil one of these problems down into a test case which you can
share?


There are segfaults on Polyhedron rnflow, linpk and test_fpu
as well.

Richard.


Re: Hi all, just an info about how to use gcc tree

2007-03-03 Thread Richard Guenther

On 3/3/07, Fabio Giovagnini <[EMAIL PROTECTED]> wrote:

Thanks for the answer.
Let me explain my problem in more depth, so that maybe you can give me more
information about how I should investigate gcc/gdb.

We develop embedded software for boards designed and built by us;
generally we use 16-bit CISC and 32-bit RISC CPUs (Renesas H8/H8S; SH-2/SH-3).
We mainly write applications for the automotive environment.
We write the code in C/C++ and compile it with gcc.
Because we do not have much memory (RAM and flash), we develop in the
following way: every program we write speaks the same communication protocol
over RS232 or TCP/IP, and we have a simple monitoring program that can show
the contents of memory addresses in various formats (char, int, long,
signed/unsigned, strings, etc.).
Generally we declare a global variable, write a debug value into it, and at
run time read the content of that variable at the right moment.
Good.
To avoid reading the .map file produced by ld by hand, I developed a simple
flex/bison analyzer that extracts the address of a symbol from the .map file.
So I load the .map file into my tool, type the name of a variable, and I can
read its content.
This way of working becomes very hard when I use structs and unions in C and
classes in C++; I would need to know the offset of each field of the struct,
so that, adding it to the base address known from the .map file, I could
write "mystruct.myfield" into my tool for each instance of such a struct
and, computing the right address, my protocol could ask the target to
read/write the content at that address.

I prefer to avoid the -g option because my memory is never enough; but a
good compromise could be to compile the debug info into sections that I
could remove after gdb has used them to give me the information about the
offset of each field. Is that possible?


Just use the dwarf info produced by the compiler with -g and strip the
binary.  Of course I would debug embedded stuff using a simulator...
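
For the struct/class part of the question there is also a debug-info-free
option (an aside, not implied by the answer above): generate the field
offsets at build time with offsetof and feed them to the monitoring tool
next to the addresses taken from the .map file.  A sketch with a made-up
structure - it must be compiled with the same cross compiler and flags as
the firmware so the layout matches, and run on the target or a simulator:

  #include <cstddef>
  #include <cstdio>

  struct telemetry            // hypothetical example structure
  {
    char  state;
    long  counter;
    short flags;
  };

  int main ()
  {
    // Each line pairs a field name with its byte offset inside the struct.
    std::printf ("state   %u\n", (unsigned) offsetof (telemetry, state));
    std::printf ("counter %u\n", (unsigned) offsetof (telemetry, counter));
    std::printf ("flags   %u\n", (unsigned) offsetof (telemetry, flags));
    return 0;
  }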

Richard.


Re: Massive SPEC failures on trunk

2007-03-12 Thread Richard Guenther

On 3/3/07, Grigory Zagorodnev <[EMAIL PROTECTED]> wrote:

Hi,
Trunk GCC shows massive (2 compile-time and 6 run-time) failures on SPEC
CPU2000 and CPU2006 at i386 and x86_64 on -O2 optimization level.
Regression introduced somewhere between revision 122487 and 122478.

There are three checkins, candidates for the root of regression:
http://gcc.gnu.org/viewcvs?view=rev&revision=122487
http://gcc.gnu.org/viewcvs?view=rev&revision=122484
http://gcc.gnu.org/viewcvs?view=rev&revision=122479

Here is the list of failures:

CPU2006/464.h264ref: GCC ICE
image.c: In function 'UnifiedOneForthPix':
image.c:1407: internal compiler error: in set_value_range, at
tree-vrp.c:267

CPU2006/447.dealII: GCC ICE
fe_tools.cc: In static member function 'static void
FETools::get_projection_matrix(const FiniteElement<dim>&,
const FiniteElement<dim>&, FullMatrix<number>&)
[with int dim = 3, number = double]':
fe_tools.cc:322: error: definition in block 47 does not
dominate use in block 49 for SSA_NAME: NMT.4199_484 in
statement:
NMT.4199_645 = PHI <...>
PHI argument
NMT.4199_484
for PHI node
NMT.4199_645 = PHI <...>
fe_tools.cc:322: internal compiler error: verify_ssa failed


Most of the problems are fixed, dealII remains with:

/gcc/spec/sb-balakirew-head-64-2006/x86_64/install-hack/bin/g++ -c -o
quadrature.o -DSPEC_CPU -DNDEBUG  -Iinclude -DBOOST_DISABLE_THREADS
-Ddeal_II_dimension=3  -O2   -DSPEC_CPU_LP64   quadrature.cc
quadrature.cc: In constructor 'Quadrature<dim>::Quadrature(const
std::vector<Point<dim>, std::allocator<Point<dim> > >&)':
quadrature.cc:64: error: 'atof' is not a member of 'std'

I guess that's fallout from Paolo's include streamlining in libstdc++.
Looks like the dealII sources need a fix for that.
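
The dealII-side fix is presumably just the missing include - a one-line
sketch, assuming the usual rule that <cstdlib> is what declares std::atof
and is simply no longer pulled in transitively:

  // at the top of quadrature.cc
  #include <cstdlib>   // declares std::atof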

bzip2 fails with:

compress.c: In function 'BZ2_compressBlock':
compress.c:711: internal compiler error: in cse_find_path, at cse.c:5930
Please submit a full bug report,
with preprocessed source if appropriate.
See <http://gcc.gnu.org/bugs.html> for instructions.
specmake: *** [compress.o] Error 1

a known problem.  (x86_64,  -O3 -funroll-loops -fpeel-loops -ftree-vectorize)

Richard.


  1   2   3   4   5   6   7   8   9   10   >