Re: FW: How does GCC implement dynamic binding?

2006-10-13 Thread Daniel Berlin

Given all this, I posed this question to the gcc mailing list and
received a reply that directed me to the C++ ABI
(http://codesourcery.com/cxx-abi/), which is more detailed and has the
information I'm looking for.  However, I need to confirm, in the case of
an FAA audit, that GCC 3.3.1 implements dynamic binding in this fashion.
Can anyone on the steering committee "officially" confirm that GCC uses
static v-tables as described in the ABI?


The steering committee doesn't do this type of thing.
GCC follows the ABI, and uses static lookup tables.
If you want confirmation, you are free to look at the source code.
There is no legal entity that is going to give you some "official"
confirmation of some sort.

Sorry,
Dan


Re: GCC 4.2 branch created; mainline open in Stage 1

2006-10-23 Thread Daniel Berlin

>> As I understand it, it involves editing the mysql database by hand (well
>> by a script) instead of doing it inside bugzilla.  Daniel Berlin has
>> done that the last couple of releases.
>
> I have checked in the attached patch to add this step to the branching
> checklist.  I will now ask Daniel to help with the SQL bits.

Sorry, here's the patch.


So the way i actually do this is with a crappy perl script.
in /home/dberlin on sourceware.org, there is a script called 42changer.pl.

I just copy it to the next version number (43changer.pl in this case),
and edit the script so that it changes 4.2 regression to 4.2/4.3
regression instead of changing (in this case) 4.1 regression to
4.1/4.2 regression.

I'm just saying this for posterity's sake, so if someone needs to look up
how this actually gets done, they know :)

If i actually knew perl, i'm sure it would be trivial to just make
this stuff a command line parameter to the script.
But i'm afraid of perl, and afraid that if i tried, it would end up
only working on months that end in the letter r or something, since
that is perl for you.

Anyway, i made 43changer.pl and ran it, so the bug summaries have been updated.


Re: Re: LOOP_HEADER tree code?

2006-10-27 Thread Daniel Berlin

On 10/26/06, Steven Bosscher <[EMAIL PROTECTED]> wrote:


It is not a note, it's a statement. The problem with RTL loop notes
was that they were not statements, but rather markers, e.g. "a loop
starts/ends here".  The LOOP_HEADER node, on the other hand, is more
like a placeholder for the result of the number of iterations
computation. Basically it is a statement that does not produce a
result, but does have uses.

I don't see why a code motion pass would have to worry about the
LOOP_HEADER node.


As a general statement, they do, see below:

The LOOP_HEADER node is before the loop, IIUC, so
any code moved out of the loop would not affect the value of the use
operand for the LOOP_HEADER (by definition, because we're in SSA form
so DEFs inside the loop can't reach the LOOP_HEADER node).


However, if you want to hoist the definition of one of the values used
by the LOOP_HEADER node even further, you have to either know you can
*always* hoist the LOOP_HEADER node (IE that its uses mean nothing),
or make sure to choose a block to hoist to that also dominates the
LOOP_HEADER node, or you'll violate the SSA property.

This is rarely going to make any difference (i'm actually not sure
it's even possible for it to matter except for irreducible loops), but
it is still something to consider.

Also, if we are talking about a node that looks like

new_value = LOOP_HEADER_NODE 

and not

LOOP_HEADER_NODE 


then it can cause annoyance because you have to teach things like
value numbering that all uses of old_value are the same as
new_value (IE it is a copy).



Gr.
Steven



Re: build failure, GMP not available

2006-10-30 Thread Daniel Berlin

4. Are you aware that the GMP home page says

[2006-05-04] GMP does not build on MacInteltosh machines. No fix
planned for GMP 4.x.

and indeed it does not appear to build correctly when configured on
my MacBook Pro?


Errr, well,
I have installed the version from macports on my macbook pro, and it works fine.
No new regressions in the gcc testsuite since the GMP patches.


Because of the severe nature of this problem (everything doesn't
build, multiple hosts affected), I'd like you to consider backing out
this patch until the problems are fixed.  I'll work on a patch which
just disables the check for Darwin.

Uh, what?

In this case, I have no idea what problems you are experiencing, as
the version of GMP/MPFR required works fine on macbook pro, AFAICT.

It's actually about time we got to using GMP in the middle end, and I
don't believe reverting a patch because some non-primary platforms
have a few pains here and there initially is the best course of
action.

Also, although I experience no regressions, i'll point out that there
is no automated tester for macintel darwin that posts to
gcc-testresults, which does not bode well for something you would like
to be a primary platform.

--Dan


Re: build failure, GMP not available

2006-10-30 Thread Daniel Berlin

On 10/30/06, Geoffrey Keating <[EMAIL PROTECTED]> wrote:


On 30/10/2006, at 10:34 AM, Daniel Berlin wrote:

>> 4. Are you aware that the GMP home page says
>>
>> [2006-05-04] GMP does not build on MacInteltosh machines. No fix
>> planned for GMP 4.x.
>>
>> and indeed it does not appear to build correctly when configured on
>> my MacBook Pro?
>>
> Errr, well,
> I have installed the version from macports on my macbook pro, and
> it works fine.
> No new regressions in the gcc testsuite since the GMP patches.

I don't know what "macports" is.

It (http://www.macports.org) was formerly known as darwinports, and is
a FreeBSD ports-style system for Darwin.

There is also "fink" (fink.sf.net), which i believe would provide a
new enough gmp, but i am not positive.

Honestly, I don't know any mac people who *don't* use either fink or
macports to install unix software when possible, because pretty much
everything has required some small patch or another.


When I download gmp-4.2.1 from
ftp.gnu.org, the official GNU site, and run 'configure && make &&
make check', I get:





Googling for 'macports gmp' leads me to <http://gmp.darwinports.com/>
which suggests to me that maybe I should say 'configure --host=none-
apple-darwin'.  I'll try that.

Yup, that should be what macports (the port system formerly known as
darwinports) does.




>> Because of the severe nature of this problem (everything doesn't
>> build, multiple hosts affected), I'd like you to consider backing out
>> this patch until the problems are fixed.  I'll work on a patch which
>> just disables the check for Darwin.
> Uh, what?
>
> In this case, I have no idea what problems you are experiencing, as
> the version of GMP/MPFR required works fine on macbook pro, AFAICT.

Do you see this error when you run 'make check'?


I just ran port install gmp using macports, which presumably just ran
make and make install. The resulting version has worked fine for me,
so i could not tell you.




> It's actually about time we got to using GMP in the middle end, and I
> don't believe reverting a patch because some non-primary platforms
> have a few pains here and there initially is the best course of
> action.

I agree that it's nice to use GMP in the middle-end, but it raises
significant portability issues.  It looks like there are workarounds
for the x86-darwin problem, but surely there will be other problems
on other platforms.  My greatest concern is bootstrapping.


While i'm sure you are correct that we will have gmp problems on other
platforms, I also believe this is true of any dependency we add.
There are always problems with some library on some platform.  At some
point we have to just say "look, we'll fix the bugs we find and submit
the patches upstream" or we'll never be able to add any
*dependencies*, just a bunch of "optional" stuff that increases the
size of the support matrix (by having "with" and "without" versions).

Thus, if we believe that the benefits provided by GMP are worth it, we
should do it anyway and just fix the bugs.  The consensus of people
was that GMP was worth it (and i agree, from my understanding of the
issues), so here we are, fixing the bugs!



> Also, although I experience no regressions, i'll point out that there
> is no automated tester for macintel darwin that posts to
> gcc-testresults, which does not bode well for something you would like
> to be a primary platform.

You are not seeing any posts because there has never been a
successful build in the tester's environment.  Guess what the current
problem is.


Uh, you proposed it as a primary platform weeks before the switch to
turn on GMP occurred, so while i can understand it being the current
problem, that doesn't explain why it wasn't happening before then.

Honestly, I believe it should be a requirement of any primary platform
that prior to becoming a primary platform, regular testresults are
posted to gcc-testresults (regardless of whether they are done by
machines or humans)


Re: build failure, GMP not available

2006-10-30 Thread Daniel Berlin

On 10/30/06, Marcin Dalecki <[EMAIL PROTECTED]> wrote:


On 2006-10-30, at 21:37, Daniel Berlin wrote:
> Honestly, I don't know any mac people who *don't* use either fink or
> macports to install unix software when possible, because pretty much
> everything has required some small patch or another.

I guess you are joking?


I guess you think the vast majority of mac users who install unix
software do it manually?

do you also think more than 1% of people still use lynx? (This is a
trick question, I know the actual stats for the top 10 websites :P)



Marcin Dalecki





Re: compiling very large functions.

2006-11-05 Thread Daniel Berlin

On 11/5/06, Eric Botcazou <[EMAIL PROTECTED]> wrote:

> AFAIK not one of the tree optimizers disables itself, but perhaps we
> should. The obvious candidates would be the ones that require
> recomputation of alias analysis, and the ones that don't update SSA
> info on the fly (i.e. require update_ssa, which is a horrible compile
> time hog).

Tree alias analysis can partially disable itself though:


No, it can't.  Tree alias representation can :)

it is also not really partially disabling. It's really fully disabling
in 99% of



  /* If the program has too many call-clobbered variables and/or function
 calls, create .GLOBAL_VAR and use it to model call-clobbering
 semantics at call sites.  This reduces the number of virtual operands
 considerably, improving compile times at the expense of lost
 aliasing precision.  */
  maybe_create_global_var (ai);

We have found this to be quite helpful on gigantic elaboration procedures
generated for Ada packages instantiating gazillions of generics.  We have
actually lowered the threshold locally.


As i alluded to above, this is disabling representation of accurate
call clobbering. It still performs the same analysis, it's just not
representing the results the same way.

(This is also another one of those things that makes other places
pretty hairy as a result)

This is in fact, a side-effect of the fact that we currently try to
represent aliasing information in terms of "variables things access"
instead of just saying "we know these things can touch the same part
of the heap".

IE we should be making memory equivalence classes and using those as
the symbols, instead of variables.  Otherwise, we end up saying "these
things can touch the same 30 variables" by listing all 30 variables in
vops, instead of just creating a single symbol that represents 30
variables and using this.

(Finding a good set of equivalence classes to use is, of course, a
Hard Problem(TM) , which is why we started by doing it this way).


Re: compiling very large functions.

2006-11-05 Thread Daniel Berlin

On 11/5/06, Eric Botcazou <[EMAIL PROTECTED]> wrote:

> > Tree alias analysis can partially disable itself though:
>
> No, it can't.  Tree alias representation can :)

I presume you're thinking of the pass that performs the analysis, while I was
more thinking of the global machinery; my understanding is that the machinery
will not be able to disambiguate memory accesses it would have been able to,
if the limit were not reached.


Depends on what you mean by "the machinery".  There is no standard API
to using the results of the analysis without using the representation
we provide, but some passes do it anyway through hackery, so yes and
no :)



> it is also not really partially disabling. It's really fully disabling
> in 99% of

Partially because it only affects call-clobbered variables IIUC.


It affects all variables that escape or are non-local, which is
roughly all variables that have interesting aliasing properties (IE
those that cannot be ascertained trivially).

Anyway, I would certainly not hold up what we do in alias
representation as a good example of proper throttling in the case of
large functions.


Re: Volatile operations and PRE

2006-11-06 Thread Daniel Berlin

On 11/6/06, Ricardo FERNANDEZ PASCUAL <[EMAIL PROTECTED]> wrote:

Hello,

I have discovered that volatile expressions can cause the tree-ssa
pre pass to loop forever in "compute_antic". The problem seems to be
that the expression is assigned a different value number at each
iteration, hence the fixed point required to exit the loop is never reached.


This should not be possible, it would imply that you have SSA names
marked as volatile, or a statement whose operands are marked volatile
but ann->has_volatile_ops is false.



This can be fixed with the attached patch, which modifies
"can_value_number_operation" to return false for volatile expresions. I
think this makes sense, because you cannot value number volatile
expresions (in the same sense that you cannot value number non pure or
const function calls).


This is wrong.

The only place that can_value_number_operation is used is inside an if
block that says:

else if (TREE_CODE (stmt) == MODIFY_EXPR
 && !ann->has_volatile_ops
 && TREE_CODE (TREE_OPERAND (stmt, 0)) == SSA_NAME
 && !SSA_NAME_OCCURS_IN_ABNORMAL_PHI (TREE_OPERAND (stmt, 0)))
{
...
  if (can_value_number_operation (rhs))
  {
  }
}

Any statement which contains volatile operands should have
ann->has_volatile_ops set.
That is your real bug.
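
For reference, a minimal sketch (my example, not your frontend's output) of
the kind of statement involved: every access to "flag" below is a volatile
load, so the statement annotation must have has_volatile_ops set, which is
what keeps PRE from trying to value number it.

volatile int flag;

int
spin_until_set (void)
{
  int iters = 0;
  while (!flag)        /* volatile load: must not be treated as loop invariant */
    iters++;
  return iters;
}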







I cannot easily provide a testcase because this problem appears only
with a gcc frontend that I am writing. With this fix, volatile accesses
work correctly (without it they work correctly only if this pass is
disabled).

Do you think this patch is correct?


Nope, for the reasons above.





Re: compiling very large functions.

2006-11-06 Thread Daniel Berlin

The problem with trying to solve this problem on a per pass basis rather
than coming up with an integrated solution is that we are completely
leaving the user out of the thought process.

There are some users who have big machines or a lot of time on their
hands and want "damn the torpedoes, full speed ahead", and there are
some users that want reasonable decisions made even at high
optimization.  We need to give them an easy to turn knob.


This is why we have O2 vs O3 and why we have -fexpensive-optimizations.



I am not saying that my original proposal was the best of all possible
worlds, but hacking things on a pass by pass or pr by pr basis
is not really solving the problem.

Sure it is.
The problem with your approach is that most of the algorithms in GCC
that sometimes have very bad times with large functions do not have
such simple qualities as "they take a long time when n_basic_blocks is
large".

First,  almost none of the algorithms are super-linear in any
easy-to-calculate-on-a-global-basis-way.   The only easy ones are
regalloc and non-modulo scheduling.  Everything else is just not
stupid enough to be N^2 in the number of instructions or basic blocks.

Take points-to analysis, one of our N^3 worst case algorithms.
You can throttle PTA by losing precision for speed very easily.
However, the  time bound is very complex.
It takes N iterations where N is the length of the largest
uncollapseable cycle in the constraint graph.  Each iteration takes
V*S time where V is the number of non-unifiable variables, and S is
the size of the pointed-to sets.

Uncollapseable cycles only occur when you have address taking of
pointed to fields, or pointer arithmetic on pointers to structures (IE
a = a->next).
Variables fail to be unified or collapsed only in some odd cases.
In almost *all* cases, it acts linear.
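
To illustrate (a sketch of my own, not from any bug report), these are the
two patterns I mean:

struct node
{
  struct node *next;
  int val;
};

int
sum_list (struct node *a)
{
  int sum = 0;
  while (a)
    {
      sum += a->val;
      a = a->next;        /* the "a = a->next" pattern: pointer arithmetic
                             on a pointer to a structure */
    }
  return sum;
}

int *
field_address (struct node *n)
{
  return &n->val;         /* address taking of a pointed-to field */
}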

I can tell you whether points-to is going to take an hour or two
*after* we've constructed and simplified the constraint graph (IE
before we spend time solving it).

I certainly can't tell you based on the number of basic blocks or instructions.

Want another example?
Take GVN-PRE, which is potentially N^2.
The time bound is related to the largest number of distinct values
that occur in a function.  Even on very large functions, this may not
be a lot.

We have plenty of incredibly large functions that just don't take a
lot of time in PRE.

The only way to make reasonable decisions about time is on a per-pass
basis, because our passes just don't have bad cases that correspond to
global heuristics.


Re: strict aliasing question

2006-11-10 Thread Daniel Berlin

> It will load the value from memory, true, but who says that the store to
> memory will happen before that?  The compiler is allowed to reorder the
> statements since it "knows" that foo and *arg cannot alias.
>

If the compiler is smart enough to know how to reorder the statements,
then it should be smart enough to know that reordering will still leave
foo uninitialized, which is obviously an error.


It's also undefined, so we can *and will* reorder things involving
uninitialized variables.

 Any time an
optimization/reordering visibly changes the results, that reordering is
broken.

Not in this case.
Also, note that gcc *guarantees* the union trick will work, even though
the standard does not.
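
For anyone unfamiliar with it, the union trick looks roughly like this (a
sketch of my own, assuming float and unsigned int have the same size):

union pun
{
  float f;
  unsigned int u;
};

unsigned int
float_bits (float f)
{
  union pun p;
  p.f = f;
  return p.u;   /* type punning through the union: gcc guarantees this works */
}

The cast version, *(unsigned int *) &f, is exactly what the strict aliasing
rules forbid.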


And we already know that gcc is smart enough to recognize
attempts to use uninitialized variables, so there's no reason for it to
go there.

We already do, particularly when it comes to constant propagation

Relying on the idea that "oh, well, this is uninitialized, so the
compiler can't touch it" is going to get you hurt one of these days :)


Re: Threading the compiler

2006-11-10 Thread Daniel Berlin

On 11/10/06, Mike Stump <[EMAIL PROTECTED]> wrote:

On Nov 10, 2006, at 12:46 PM, H. J. Lu wrote:
> Will use C++ help or hurt compiler parallelism? Does it really matter?

I'm not an expert, but, in the simple world I want, I want it to not
matter in the least.  For the people writing most code in the
compiler, I want clear simple rules for them to follow.

For example, google uses mapreduce
(http://labs.google.com/papers/mapreduce.html) as a primitive, and there
are a few experts that manage that code, and everyone else just mindlessly
uses it.  The rules are explained to them, and they just follow the rules
and it just works.  No locking, no atomic, no volatile, no clever lock free
code, no algorithmic changes (other than decomposing into isolated
composable parts).  I'd like something similar for us.


I think the part that makes me giggle the most is that we assume
that the actual mapper code is not threadsafe by default, and won't
run multiple threads of the mapper.


Re: strict aliasing question

2006-11-10 Thread Daniel Berlin

Hm. If you're going to reorder these things, then I would expect either
an error or a warning at that point, because you really do know that a
reference to an uninitialized variable is happening.


We do warn when we see an uninitialized value if -Wuninitialized is on.

We don't warn at every point we make an optimization based on it, nor
do i think we should :)



> also Note that gcc *guarantees* the union trick will work, even though
> the standard does not.

That's good to know, thanks. But frankly that's braindead to require
someone to add all these new union declarations all over their code,
when a simple cast used to suffice, and ultimately the generated code is
the same. And since we have to write code for compilers other than just
gcc, we can't even really rely on the union trick. In this respect, the
standard is broken.

This example is worse, it gives no warning and gives the wrong result
with -O3 -Wstrict-aliasing :

#include <stdio.h>

main() {
int i = 0x123456;
int *p = &i;

*(short *)p = 2;

printf("%x\n", i);
}


In this case, it's not two different pointers pointing to the same
memory, it's the same pointer. The compiler doesn't even have to guess
whether two different pointers access the same memory - it knows it's
the same pointer. I can understand strange results occurring when
there's ambiguity, but there is no ambiguity here.

You are right, there isn't.

We ask the TBAA analyzer "can a store to a short * touch i?"
In this case, it says "no", because it's not legal.
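
If you want the effect of that example without lying to the compiler, one
well-defined variant (my sketch) goes through memcpy, since character-wise
accesses may alias anything:

#include <stdio.h>
#include <string.h>

int
main (void)
{
  int i = 0x123456;
  short two = 2;

  memcpy (&i, &two, sizeof two);   /* defined: behaves as a char-wise copy */
  printf ("%x\n", i);
  return 0;
}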


Re: Threading the compiler

2006-11-11 Thread Daniel Berlin


> > whole-program optimisation and SMP machines have been around for a
> > fair while now, so I'm guessing not.
>
> I don't know of anything that is particularly hard about it, but, if
> you know of bits that are hard, or have pointer to such, I'd be
> interested in it.

You imply you're considering backporting this to 4.2. I'd be amazed if that
was worthwhile. I'd expect changes to be required in pretty much the whole
compiler.

Your strategy is built around the assumption that the majority of the work can
be split into multiple independent chunks of work. There are several fairly
obvious places where that is hard. eg. the frontend probably needs to process
the whole file in series because previous declarations affect later code. And
inter-procedural optimisations (eg. inlining) don't lend themselves to
splitting on function boundaries.


Actually, most IPA optimizations parallelize very well.
Pointer analysis, inlining, can all be partitioned in ways that work
can be split into threads.

Mike is actually not saying anything that most people around here
truly disagree with.  We all want to eventually parallelize and
distribute GCC optimizations.  I just don't think we are at the point
where it makes sense to start doing that yet.  Personally, I believe
the time to start thinking about parallelizing about this stuff is
when the problems that make LTO hard (getting rid of all the little
niggles like front ends generating RTL, and doing the hard stuff like
a middle end type system) are solved.  Why?  Without solving the
problems that make LTO hard, you are going to hit them in trying to
make the IPA optimizations (or anything else) parallel, because they
are exactly the shared state between functions and global state
problems that GCC has.

In fact, various people (including me) have been discussing how
to parallelize and distribute our optimizations for a few months now.

So if you really want to help parallelizing along, the thing to do is
help LTO right now.

I'm happy to commit to parallelizing IPA pointer analysis (which is a
ridiculously parallel problem) once the hard LTO problems are solved.

Before then, I just think we are going to end up with a bunch of hacks
to try to work around our shared state.

--Dan


make clean no longer works properly?

2006-11-13 Thread Daniel Berlin

If i ctrl-c a gcc bootstrap in the middle of building a stage (IE when
it's compiling, not when it's configuring), make clean no longer works
properly.
It used to work a few months ago.

Now I get:

make[1]: *** No rule to make target `clean'.  Stop.
make: *** [clean-stage4-gcc] Error 2

(with the error target sometimes being clean-stage2, etc, depending on
how far it got)


Re: vectorizer data dependency graph

2006-11-14 Thread Daniel Berlin

On 11/14/06, Sashan Govender <[EMAIL PROTECTED]> wrote:

Hi

I was looking at the vectorizer
(http://gcc.gnu.org/projects/tree-ssa/vectorization.html) and noticed
that in section 6 it says that there is no data dependence graph
implemented. Also had a search throught the mailing list archives and
noticed that although ddg.c exists its not used much?
(http://gcc.gnu.org/ml/gcc/2005-09/msg00661.html). A grep for
creat_ddg shows it's used in modulo-sched.c. So is this fertile ground
for something to be implemented? Is it worth implementing a ddg graph
for the vectorizer?


What exactly do you hope to gain by building a ddg?
If you have some algorithm that can take advantage of a ddg, sure, build one.



Thanks



Re: Testsuite for GlobalGCC: QMTest or DejaGNU?

2006-11-16 Thread Daniel Berlin

On 11/16/06, Alvaro Vega Garcia <[EMAIL PROTECTED]> wrote:

Hi all,
I'm beginning to work on the GGCC project (1) and I proposed to continue with
the DejaGNU testsuite for this project when I was asked about a better
testing framework. Then I read the "QMTest and the G++ testsuite"
thread (2) from 2002, but now I think it is more appropriate to ask
here.


Err, your best bet is to file assignment papers with the FSF, and do
your work on a branch in gcc's svn, and just use whatever gcc is using
for its testsuite (IE dejagnu right now).


What do gcc developers think? Has QMTest improved from the state described
in (2), and could it be adopted by GCC in the future?

Thanks,
Álvaro


(1) http://gcc.gnu.org/ml/gcc/2006-10/msg00676.html
(2) http://gcc.gnu.org/ml/gcc/2002-05/msg01978.html




Re: alias slowdown?

2006-11-17 Thread Daniel Berlin

On 11/17/06, Andrew MacLeod <[EMAIL PROTECTED]> wrote:

On Fri, 2006-11-17 at 12:22 -0500, Andrew MacLeod wrote:
> I just tried compiling cplusplus_grammer.ii with mainline, checking
disabled, and had to stop it after 30 minutes (used to be <50 seconds on
my x86-linux box).  A quick check with GDB seems to show that it's
spending an inordinate amount of time in may_alias:


We used to basically say all of these cases pointed to everything and
anything, and collapse them very quickly.
Now we compute a correct set of aliases and escaped variables.
It takes a while in huge cases like this.
I'm working on fixing it.


Re: GIMPLE issues and status of gimple-tuples

2006-11-18 Thread Daniel Berlin

a. Conditional jumps in GIMPLE are not true three-address-code since they
specify two (2) branch targets (in their general form). E.g.:

if (cond) then
  goto target1;
else
  goto target2;

IMHO, this should be split (or at least made splittable) into:

if (cond) then
  goto target1;
if (!cond) then
  goto target2;

It seems redundant but it's really clean.


Branches are already special in a lot of ways, so whether they have
one argument or two doesn't seem to make any real difference.




b. Are LOOP_EXPRs decomposable to combinations of if-then-else and gotos? It
would help for VM (virtual machine) execution of GIMPLE IR. Ideally, a GIMPLE
representation of a program should be possible without use of LOOP_EXPRs.


Errr, LOOP_EXPR doesn't actually exist anymore :)

Loops are always lowered to if-then-else and gotos.
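
Roughly (an illustrative sketch of my own, not an actual GIMPLE dump), the
lowered form corresponds to source written directly with a conditional jump
and gotos:

int
count_down (int n)
{
  goto test;
 loop:
  n--;
 test:
  if (n > 0)
    goto loop;
  return n;
}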



c. It would be interesting to support simple if-then statements in GIMPLE. I.e.
that do not contain blocks of statements with size larger than one.


Now *this* would be ugly, IMHO, because you'd have to special case
these everywhere.






Re: Why does flow_loops_find modify the CFG, again?

2006-11-18 Thread Daniel Berlin

On 11/18/06, Steven Bosscher <[EMAIL PROTECTED]> wrote:

Hi Zdenek, all,

Is this something that could be easily fixed?  E.g. can we make it
that flow_loops_find only performs transformations if asked to (by
adding a function argument for that)?



Why not have a flow_canonicalize_loops that does that canonicalization?
:)


Gr.
Steven




Re: alias slowdown?

2006-11-19 Thread Daniel Berlin


In the meantime, is there a simple way to disable this "more correct"
mechanism so I can get my timings?


You'll get testsuite failures if you disable it because it fixes a
bunch of bugs.

You can always disable all of PTA, but i would not recommend it.

With the attached patch, it should take less than 60 seconds per PTA
run for all.i (i have no copy of cplusplus_grammar.i, and it's not
clear where to get it).


includes.diff
Description: Binary data


Re: why gengtype not a filter for GTY?

2006-11-28 Thread Daniel Berlin

On 11/28/06, Basile STARYNKEVITCH <[EMAIL PROTECTED]> wrote:


Dear All (and especially those involved in the GCC internal garbage
collector).

I read (and contributed a bit to) http://gcc.gnu.org/wiki/Memory_management
and also read http://gcc.gnu.org/onlinedocs/gccint/Type-Information.html

However, there is still a question which puzzles me a lot: why is gengtype
not a sort of filter or generator (like yacc is), taking a (list of) files on
input and producing a file on output?

More precisely, why does gengtype not take the names of the files to be parsed
for GTY(()) through its program arguments (ie the argc,argv of its main - they
are explicitly marked as unused in $GCCTOP/gcc/gengtype.c)?

I would believe it should make things simpler (in a lot of places), but I
cannot understand why it is not so. Why is the set of files to be parsed
for GTY(()) so hard-coded in gengtype?



Because nobody has had the urge to change it yet?


BTW, it is strange that the gengtype* files do not mention any author or
contributor (with an email to contact them)?



If you wrote gengtype, the way it is now, would you want to be listed
as the author?
:)
I kid, I kid.


Re: strict aliasing benefit examples

2006-11-28 Thread Daniel Berlin

> I think there are 3 aliasing possibilities here:
> 1. known to alias
> 2. known to not alias
> 3. may alias

Actually there are only 2: it may alias or not.


Actually, he's right (and both you and Richard are wrong).

The standard taxonomy of classifications for two memory accesses is:

Must-alias
May-Alias
Don't alias (or some variant name).

This is what you will find in literature, not "conflicts" or anything else.

Proving two accesses must-alias can be as useful as proving they don't alias.
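
As a quick illustration (examples of my own):

void
classify (int *p, int *q)
{
  int a = 0, b = 0;
  int *r = &a;

  *r = 1;      /* *r and a must-alias: the store through r is a store to a,
                  so the following "a = 2" makes it dead code.  */
  a = 2;

  *p = 3;      /* *p and *q may-alias: the load below cannot be moved above
                  this store or assumed to be unaffected by it.  */
  b = *q;

  a = b;       /* a and b don't alias: they are distinct locals, so stores
                  to one never change the other.  */
}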

In languages like C, type based aliasing information is generally less
*useful* to an optimization than pointer analysis  based aliasing
information.

There have been studies on what type of analysis is helpful.

See, e.g. http://portal.acm.org/citation.cfm?id=378806&coll=portal&dl=ACM

(Search google for pdfs of this paper)

The most important disambiguation technique generally ends up being
things you can tell directly from looking at objects, and not from
anything else.

Everything else is just wildly variant depending on the test case
(though some are consistently more effective than others).


Re: CEA (France) has signed assignment of copyright to FSF on GCC

2006-12-01 Thread Daniel Berlin

BTW, I am surprised that it is not easy to know which organizations exactly
have signed such legal papers. It could happen (in big organizations) that
such an assignment has been signed, and a putative minor contributor to GCC
does not know about it yet.


There is a copyright list on gnu.org machines that people with
accounts there have access to.

It lists every person and organization with a copyright assignment.

Personally, I think the list should be somewhere that *all* gcc
maintainers have access to (not all of us have gnu.org accounts).


Re: SPEC CFP2000 and polyhedron runtime scores dropped from 13. november onwards

2006-12-01 Thread Daniel Berlin

On 12/1/06, Richard Guenther <[EMAIL PROTECTED]> wrote:

On 12/1/06, Uros Bizjak <[EMAIL PROTECTED]> wrote:
> Hello!
>
> At least on x86_64 and i686 SPEC score [1] and polyhedron [2] scores
> dropped noticeably. For SPEC benchmarks, mgrid, galgel, ammp and
sixtrack tests are affected and for polyhedron, ac (second regression
> in the peak) and protein (?) regressed in that time frame.
>
> [1] http://www.suse.de/~aj/SPEC/amd64/CFP/summary-britten/recent.html
> [2] 
http://www.suse.de/~gcctest/c++bench/polyhedron/polyhedron-summary.txt-2-0.html
>
> Does anybody have any idea what is going on there?

It correlates with the PPRE introduction (enabled at -O3 only) which might
increase register pressure, but also improves Polyhedron rnflow a lot.


Feel free to disable it and let me know if it helps.

If it's really affecting scores that badly, i'm happy to turn it off
until we can deal with the register pressure (though i thought we had
out-of-ssa changes to help with this now).


Re: mainline slowdown

2006-12-01 Thread Daniel Berlin

On 12/1/06, Andrew MacLeod <[EMAIL PROTECTED]> wrote:

My bootstrap/make check cycle took about 10 hours with yesterday's
checkout (way longer than expected).  A quick investigation shows C++
compilation times are through the roof.


10 hours?



Using quick (in theory) and trusty cpgram.ii, I get:

tree PTA  : 1135.48 (88%) usr   5.47 (55%) sys  1168.23 (85%) wall    4045 kB ( 1%) ggc
TOTAL     : 1283.62             9.97            1381.98              451745 kB


This is uh, like 20 minutes wall time.
So where is 10 hours coming from?




Is this new code, or is this the old issue we had a few weeks ago?  I
lost track.


Same issue.

Still working on it.
The patch i posted to you ended up having a bunch of underlying issues
that needed solving, so it's taking longer than expected.

I'm down to 1 testsuite failure
I'm happy to commit what i have now and fix that one testsuite failure
in a followup if that is what we want.


Re: mainline slowdown

2006-12-01 Thread Daniel Berlin

On 12/1/06, Andrew MacLeod <[EMAIL PROTECTED]> wrote:

On Fri, 2006-12-01 at 13:49 -0500, Daniel Berlin wrote:
> On 12/1/06, Andrew MacLeod <[EMAIL PROTECTED]> wrote:
> > My bootstrap/make check cycle took about 10 hours with yesterdays
> > checkout (way longer than expected).  A quick investigation shows C++
> > compilation timed are through the roof.
>
> 10 hours?

read carefully. "bootstrap/make check"


Yes, so, i've never seen a bootstrap make check take 10 hours.
:)


Re: mainline slowdown

2006-12-01 Thread Daniel Berlin

On 12/1/06, Andrew MacLeod <[EMAIL PROTECTED]> wrote:

On Fri, 2006-12-01 at 13:49 -0500, Daniel Berlin wrote:
> On 12/1/06, Andrew MacLeod <[EMAIL PROTECTED]> wrote:
> > My bootstrap/make check cycle took about 10 hours with yesterdays
> > checkout (way longer than expected).  A quick investigation shows C++
> > compilation timed are through the roof.
>
> 10 hours?

read carefully. "bootstrap/make check"



>
> >
> > Using quick (in theory) and trusty cpgram.ii, I get:
> >
> > tree PTA  : 1135.48 (88%) usr   5.47 (55%) sys  1168.23 (85%) wall    4045 kB ( 1%) ggc
> > TOTAL     : 1283.62             9.97            1381.98              451745 kB
>
> This is uh, like 20 minutes wall time.
> So where is 10 hours coming from?

this says cpgram.ii, not bootstrap/make check cycle. Big difference.


BTW, what do you think these have to do with each other?

One is a pathological testcase with about 1-5  initializers,
the other is a whole bunch of relatively normal code.
So why would you attempt to draw conclusions about bootstrap/regtest
from cpgram.ii?

*particularly* when the other issue you keep harping on has in fact,
been shown *not* to increase GCC compile time by the regression
testers.


Re: [RFC] timers, pointers to functions and type safety

2006-12-01 Thread Daniel Berlin

On 12/1/06, Al Viro <[EMAIL PROTECTED]> wrote:

There's a bunch of related issues, some kernel, some gcc,
thus the Cc from hell on that one.

First of all, in theory the timers in kernel are done that way:
* they have callback of type void (*)(unsigned long)
* they have data to be passed to it - of type unsigned long
* callback is called by the code that even in theory has no
chance whatsoever of inlining the call.
* one of the constraints on the targets we can port the kernel
on is that unsigned long must be uintptr_t.

The last one means that we can pass any pointers to these suckers; just
cast to unsigned long and cast back in the callback.

While that is safe (modulo the portability constraint that affects much
more code than just timers), it ends up very inconvenient and leads to
lousy type safety.


Understandable.  I assume you are trying to get more type safety for
error checking than for optimization, being that the kernel still
defaults to -fno-strict-aliasing.


The thing is, absolute majority of callbacks really want a pointer to
some object.  There is a handful of cases where we really want a genuine
number - not a pointer cast to unsigned long, not an index in array, etc.
They certainly can be dealt with.  Nearly a thousand of other instances
definitely want pointers.

Another observation is that quite a few places are playing fast and
loose with function pointers.  Some are just too lazy and cast
void (*)(void) to void (*)(unsigned long).



These, IMO, should stop
wanking and grow an unused argument.  Not worth the ugliness...
However, there are other cases, including very interesting
timer->function = (void (*)(unsigned long))func;
timer->data = (unsigned long)p;
with func actually being void (void *) and p being void *.

Now, _that_ is definitely not a valid C.  Worse, it doesn't take much
to come up with realistic architecture that would have different calling
conventions for those.  Just assume that
* there are two groups of registers (A and D)
* base address for memory access must be in some A register
* both A and D registers can be used for arithmetics
* ABI is such that functions with few arguments have them passed
via A and D registers, with pointers going via A and numbers via D.
Realistic enough?  I wouldn't be surprised if such beasts actually existed -
embedded processors influenced by m68k are not particulary rare and picking
such ABI would make sense for them.

Note that this kind of casts is not just in some obscure code; e.g.
rpc_init_task() does just that.


And that's where it gets interesting.  It would be very nice to get to
the following situation:
* callbacks are void (*)(void *)
* data is void *
* instances can take void * or pointer to object type
* a macro SETUP_TIMER(timer, func, data) sets callback and data
and checks if func(data) would be valid.

It would be remove a lot of cruft and definitely improve the type safety
of the entire thing.  It's not hard to do; all it takes [warning: non
portable C ahead] is
typeof(*data) *p = data;
timer->function = (void (*)(void *))func;
timer->data = (void *)p;
(void)(0 && (func(p),0));

Again, that's not a portable C, even leaving aside the use of typeof.
Casts between the incompatible function types are undefined behaviour;
rationale is that we might have different calling conventions for those.
However, here we are at much safer ground; calling conventions are not
likely to change if you replace a pointer to object with void *.


Is this true of the ports you guys support even if the object is a
function pointer or a function?
(Though the first case may be insane.  I can't think of a *good*
reason you'd pass a pointer to a function pointer to a timer
callback.)


 It's
still possible in theory, but let's face it, we would have far worse
problems if it ever came to porting to such targets.

Note that here we have absolutely no possibility of eventual call
ever being inlined, no matter what kind of analysis compiler might
be doing.


Ah, well, here is where you are kinda wrong, but not for the reason
you are probably thinking of.

Call happens when kernel/timer.c gets to structure while
trawling the lists and it simply has no way to tell which callback
might be there (as the matter of fact, callback can and often does
come from a module).


Right, it doesn't know what it will *always* be, but it may add if's
and inline *possible* target sites based on profile results.

Particularly since the profile will tell us which are *actually* called.
This shouldn't matter however, we still shouldn't ICE if we inline it :)


IOW, "gcc would ICE if it ever inlined it" kind of arguments (i.e. what
had triggered gcc refusing to do direct call via such cast) doesn't apply
here.  Question to gcc folks: can we expect no problems with the approach
above, provided that calling conventions

Re: Bootstrap broken on i686-darwin

2006-12-05 Thread Daniel Berlin

Cancel that, it's a local change of mine causing the breakage :)

On 12/5/06, Daniel Berlin <[EMAIL PROTECTED]> wrote:

Aldy, your tuples change broke the build on i686-darwin.

I've attached a file that fails, it should fail with a cross compiler.





Re: destruction of GTY() data

2006-12-05 Thread Daniel Berlin

On 12/5/06, Basile STARYNKEVITCH <[EMAIL PROTECTED]> wrote:


Hello

I am not sure I understand what if_marked or deletable mean in a GTY context
http://gcc.gnu.org/onlinedocs/gccint/GTY-Options.html
http://gcc.gnu.org/wiki/Memory_management

I want to have a GTY() garbage collected structure such that, when it
is destroyed, some specific routine is called (this should indeed be
possible, since GGC is a mark & sweep garbage collector, which deletes
each dead datum individually).

More precisely, I want to have a GTY data which contains either an MPFR
(http://www.mpfr.org/) datum, or a PPL (Parma Polyhedra Library,
http://www.cs.unipr.it/ppl/) datum. Hence the underlying MPFR
(or PPL) destruction routine (like mpfr_clear or
ppl_delete_Coefficient) should be called.


We don't have support for user-specifiable destruction routines, one
reason being that some day, in a galaxy far far away, we will have
something better than the current garbage collector, or will not use
garbage collection at all :)


Re: destruction of GTY() data

2006-12-08 Thread Daniel Berlin

I'm not sure I understand what Daniel suggests. If he dreams of better
memory handling than the current GGC, I certainly agree; I actually dream
of a future GCC compiler where every datum is garbage collected in a
copying generational scheme (see my Qish experiment). This would require
some preprocessor or even perhaps some language support. So I realize that
it is currently impractical. (I won't discuss details now, but suggest
diving into the Jones & Lins book on garbage collection),


I've read the book before.

Also, it would require neither a preprocessor nor more language support.
It has, in fact, been done twice before, but neither was ever more
than a few percent faster.

This is without generational support, since generational support
required barriers that nobody wanted to implement for a prototype :)



but I still call such futuristic memory
handling garbage collection. If Daniel means that the very idea of
garbage collection in a compiler is bad, and that every object should
be manually allocated & explicitly freed (à la malloc & free, or like
C++ new/delete), I respectfully disagree with him. (BTW I must admit
here that I have some Ocaml experience).


Uh, well, you see, there are points in between these two extremes.
Most commercial compilers are not garbage collected; they rely on
allocation pools (ie multiple heaps) to get sane performance and
lifetime management.

You see, we currently waste a lot of memory to avoid the fact that our
GC is very slow.
We still take it on the chin when it comes to locality.  Previous
things such as moving basic blocks from alloc_pools (which are
contiguous) to gc'd space cost us 2-3% compilation time alone, because
of how bad our GC places objects.




Zack Weinberg wrote in http://gcc.gnu.org/ml/gcc/2006-12/msg00159.html

> We definitely don't have the feature you want now, and I would be
> very hesitant to try to add it - the existing sweep phase is quite
> lazy, and I'd really prefer not to do anything that made it harder
> to switch to a more efficient collector algorithm.

> On the other hand, I sympathize with your goal; I've been idly
> thinking a little myself about the sensibility of using MPFR
> throughout, instead of our bespoke REAL_VALUE_TYPE thing.  [I don't
> know if this is actually a good idea yet.]

I presume that Zack refers to some comment in gcc/fold-const.c (rev
119546 of trunk) where I read

/*@@ This file should be rewritten to use an arbitrary precision @@
representation for "struct tree_int_cst" and "struct tree_real_cst".

My understanding is that constant folding is currently done in ad-hoc
(two-words) arithmetic, and that the trend is to go to arbitrary
precision arithmetic using MPFR & GMP (which seems to be needed not
only for Fortran). Since the constants are inside Gimple-like trees
(even if you represent them by tuples), I am expecting that they are
garbage collected, so need to be freed.

> So my question to you is, what do those destruction routines do, and
> is are they actually a necessary thing if the memory has been
> allocated by GGC rather than library-internal calls to malloc()?

If the libraries we are using (today MPFR & GMP, and tomorrow, on my
side, probably PPL -using only its C API interface- -- I am interested
in time-consuming static analysis) do not offer internal memory hooks
but offer only allocate & delete (or clear) routines, then I still
believe that many of us will take advantage of GTY-structure which
have a destructor.


This just isn't that big a problem.  If you want to associate these
things with trees, put them in annotations, mark them GTY ((skip)) and
explicitly manage their lifetime.  It is highly unlikely that they are
going to live for some indeterminate amount of time.
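
As a sketch of what that explicit management looks like for an MPFR value
(my example; inside GCC the pointer would simply sit in an annotation field
marked GTY ((skip)) rather than in a local variable):

#include <stdio.h>
#include <mpfr.h>

int
main (void)
{
  mpfr_t x;

  mpfr_init2 (x, 128);                        /* explicit allocation ...  */
  mpfr_set_d (x, 3.5, GMP_RNDN);
  mpfr_out_str (stdout, 10, 0, x, GMP_RNDN);
  putchar ('\n');
  mpfr_clear (x);                             /* ... and explicit destruction */
  return 0;
}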

This is what is going to happen on the graphite branch, which will use
PPL to do polyhedral data dependence testing and high level loop
optimizations.

Maybe if you described what use pattern you think will cause you to
need to put these in GC'd objects, but not know when they are all
going to be destructed?


Re: destruction of GTY() data

2006-12-09 Thread Daniel Berlin

On 12/9/06, Basile STARYNKEVITCH <[EMAIL PROTECTED]> wrote:

Le Fri, Dec 08, 2006 at 07:09:23PM -0500, Daniel Berlin écrivait/wrote:

> You see, we currently waste a lot of memory to avoid the fact that our
> GC is very slow.
> We still take it on the chin when it comes to locality.  Previous
> things such as moving basic blocks from alloc_pools (which are
> contiguous) to gc'd space cost us 2-3% compilation time alone, because
> of how bad our GC places objects.

Even 25% of current GCC compilation time is at the noise level to me. If I
achieve 1000% of current GCC -O3 compilation time, I will be very proud of
myself. So I really do not care about 3%, and I thought that my proposal
wouldn't cost a lot if it is not used (because if there are no finalized
objects, GCC won't run much slower...).

>

That's great. If you want to make a compiler useless to almost
everyone, go for it.  Do it on a branch, go wild.  I'm sure there are
6 or 7 people in the world who will use it because it matters to them
that much.

However, you seem to be trying to propose a mechanism for the *mainline of gcc*.
If you want to get something into the *mainline of gcc*, you need to
be in touch with the concerns that people have about slowing down the
compiler 3%, because that is what our *mainline gcc* customers care
about.


> This just isn't that big a problem.  If you want to associate these
> things with trees, put them in annotations, mark them GTY ((skip)) and
> explicitly manage their lifetime.  It is highly unlikely that they are
> going to live for some indeterminate amount of time.

So basically you are suggesting that I add some kind of specific
garbage collection machinery within my pass. Could be ok, but painful.


This is what the entire rest of the compiler does. Seriously.

That's the whole point: *We don't keep things in GC if they have
determinate lifetimes, because our GC is too slow*.

If you want to implement finalizers on your branch, go for it.  You
should just be aware you are going to run into a lot of resistance if
you ever try to submit these patches for mainline, because of speed
issues.

This may or may not matter for your project.  From my perspective, and
probably the perspective of most people around here, if your code
isn't going to *eventually*  (even years down the road) end up in
mainline, it's generally a waste of time and it won't garner community
support (because nobody will use it in production).  Research for the
sake of research is great, don't get me wrong, but  given the limited
amount of time most GCC developers have to spend, it means we each
pick and choose the projects we work on and try to help contribute to,
and most people contribute to projects that they see being
productionized in some short number of years.  That said, it's your
time and money, you are free to do as you wish with it.


Re: Version of gcc , for which patch submitted?

2006-12-09 Thread Daniel Berlin

On 12/9/06, Andrew Haley <[EMAIL PROTECTED]> wrote:

[EMAIL PROTECTED] writes:
 > Hi,
 > I want to know which version of gcc the patch at
 > "http://gcc.gnu.org/ml/gcc-patches/2004-01/msg00211.html" was submitted for.
 > How can we know, for any submitted patch, which version it was for?
 > Kindly help me to figure it out soon.

I'm guessing tree-ssa branch.



Yes, it was submitted for tree-ssa branch, but never committed for
various reasons (IE it is not in mainline of gcc).


Re: "Fix alias slowdown" patch miscompiles 464.h264ref in SPEC CPU

2006-12-10 Thread Daniel Berlin

On 12/10/06, H. J. Lu <[EMAIL PROTECTED]> wrote:


Hi Daniel,

Do you have access to SPEC CPU 2006?


No, i don't, only SPEC CPU 2000.


Your patch

http://gcc.gnu.org/ml/gcc-patches/2006-12/msg00225.html

causes gcc 4.3 to miscompile 464.h264ref in SPEC CPU 2006 with
-O2 -ffast-math on Linux/x86-64. 464.h264ref compiled with gcc 4.3
generates incorrect output. We are working on a small testcase.


There was a typo fix to this patch in a later revision that was
necessary for correctness. I have no idea what rev "gcc 4.3" refers
to, so i can't tell you if what you are testing is new enough or not
:)

If you can get me a testcase, i can fix it.


Re: "Fix alias slowdown" patch miscompiles 464.h264ref in SPEC CPU

2006-12-10 Thread Daniel Berlin

Hey, by chance does the attached fix it?


On 12/10/06, Daniel Berlin <[EMAIL PROTECTED]> wrote:

On 12/10/06, H. J. Lu <[EMAIL PROTECTED]> wrote:
>
> Hi Daniel,
>
> Do you have access to SPEC CPU 2006?

No, i don't, only SPEC CPU 2000.

> Your patch
>
> http://gcc.gnu.org/ml/gcc-patches/2006-12/msg00225.html
>
> causes gcc 4.3 to miscompile 464.h264ref in SPEC CPU 2006 with
> -O2 -ffast-math on Linux/x86-64. 464.h264ref compiled with gcc 4.3
> generates incorrect output. We are working on a small testcase.

There was a typo fix to this patch in a later revision that was
necessary for correctness. I have no idea what rev "gcc 4.3" refers
to, so i can't tell you if what you are testing is new enough or not
:)

If you can get me a testcase, i can fix it.

--- gcc/tree-ssa-structalias.c  (/mirror/gcc-trunk) (revision 639)
+++ gcc/tree-ssa-structalias.c  (/local/gcc-clean)  (revision 639)
@@ -2885,6 +2885,8 @@ handle_ptr_arith (VEC (ce_s, heap) *lhsc
 {
   rhsoffset = TREE_INT_CST_LOW (op1) * BITS_PER_UNIT;
 }
+  else
+return false;
 
 
   for (i = 0; VEC_iterate (ce_s, lhsc, i, c); i++)




Re: "Fix alias slowdown" patch miscompiles 464.h264ref in SPEC CPU

2006-12-11 Thread Daniel Berlin

On 12/11/06, H. J. Lu <[EMAIL PROTECTED]> wrote:

On Sun, Dec 10, 2006 at 09:42:35PM -0800, H. J. Lu wrote:
> On Mon, Dec 11, 2006 at 12:27:07AM -0500, Daniel Berlin wrote:
> > Hey, by chance does the attached fix it?
> >
>
> Yes, it fixes 464.h264ref with the test input. I am running the real
> input now.
>

Do you need a testcase for your fix? We can try to extract one from
464.h264ref.



No, i've got a short but valid testcase that will fail without this change.

I'll finish the bootstrap and regtest and commit this.
Thanks


Re: SSA_NAMES: should there be an unused, un-free limbo?

2006-12-21 Thread Daniel Berlin

On 12/21/06, Diego Novillo <[EMAIL PROTECTED]> wrote:

Robert Kennedy wrote on 12/21/06 11:37:

> The situation is that some SSA_NAMEs are disused (removed from the
> code) without being released onto the free list by
> release_ssa_name().
>
Yes, it happens if a name is put into the set of names to be updated by
update_ssa.

After update_ssa, it should be true that every SSA name with no
SSA_NAME_DEF_STMT is in the free list.


In this case this isn't true, because we have code that orphans ssa
names without releasing them.
I'm sure Robert will explain further details in a few moments :)


However, if we have SSA names with no defining statement that are still
considered active, I would hardly consider it a serious bug.  It's a
waste of memory, which you are more than welcome to fix, but it should
not cause correctness issues.


It will not cause code correctness issues, but it will make life very,
very annoying.  If you walk the ssa names one by one, and you have some
pointing to statements that don't exist anymore, because the names
were not released (and so their defining statements were never nulled
out by release_ssa_name), you are going to run into segfaults trying to
walk them.

This is exactly what happens in the case we have.

BTW, the reason we walk the list of ssa names is to DFS them.

IE the code is something like:

for (i = 0; i < num_ssa_names; i++)
{
  tree name = ssa_name (i);
  if (name && !SSA_NAME_IN_FREELIST (name))
    DFS (name);
}

IIRC, DFS will crash trying to touch the def statement because it's
been GC'd if the ssa name is orphaned.

Anyway, it seems you still agree this is a bug in any case, even if
you don't think it's a serious bug.


Re: SSA_NAMES: should there be an unused, un-free limbo?

2006-12-21 Thread Daniel Berlin


I may be missing something, but I don't think that is the interesting
issue here.


I agree.



I think the issue is whether we want to have a way to see all
currently valid SSA_NAMEs.  Right now we can have SSA_NAMEs in the
list which are no longer used, and we have no way to tell whether they
are used or not.  Thus the only way to see all valid SSA_NAMEs is to
walk the code.

If that is acceptable, then there is no issue here.  If that is not
acceptable, then we need to fix the code to correctly mark SSA_NAMEs
which are no longer used.  Whether we recycle the memory in the unused
SSA_NAMEs is a separate (and less interesting) discussion.


IMHO, it's not acceptable, because it leaves no way to do DFS walks
that don't involve kludges and hacks.

Ian



Re: SSA_NAMES: should there be an unused, un-free limbo?

2006-12-21 Thread Daniel Berlin

On 12/21/06, Diego Novillo <[EMAIL PROTECTED]> wrote:

Daniel Berlin wrote on 12/21/06 12:21:

> for (i = 0; i < num_ssa_names; i++)
> {
>   tree name = ssa_name (i);
>   if (name && !SSA_NAME_IN_FREELIST (name))
>     DFS (name);
> }
I see that you are not checking for IS_EMPTY_STMT.  Does DFS need to
access things like bb_for_stmt?


I avoided including that part, but yes, we check for it.



In any case, that is not important.  I agree that every SSA name in the
SSA table needs to have a DEF_STMT that is either (a) an empty
statement, or, (b) a valid statement still present in the IL.


Then we've got a bug here :)



Note that this is orthogonal to the problem of whether we free up unused
names from this list.  Every time a statement S disappears, we should
make sure that the names defined by S get their SSA_NAME_DEF_STMT set to
  NOP.

Frankly, I'm a bit surprised that we are running into this. I'd like to
see a test case, if you have one.

Robert, can you attach the testcase you've been working with?


I'm not surprised, but only because I hit it before.   It's pretty rare.

IIRC, what happens is this:

1. We replace all uses of a phi node with something else
2. We then call remove_phi_node with false as the last parameter (only
3 places in the compiler), which ends up destroying the phi node but
not releasing the LHS name (since this is what the last parameter says
whether to do).


void
remove_phi_node (tree phi, tree prev, bool release_lhs_p)
{
  ...
  release_phi_node (phi);
  if (release_lhs_p)
    release_ssa_name (PHI_RESULT (phi));
}

3. PHI_RESULT (phi) is now in the ssa name list, but SSA_NAME_DEF_STMT
points to a released phi node.

4. We try to walk this at some point later, and crash.

You can see this happens in tree_merge_blocks:

  /* Remove all single-valued PHI nodes from block B of the form
     V_i = PHI <V_j> by propagating V_j to all the uses of V_i.  */
  bsi = bsi_last (a);
  for (phi = phi_nodes (b); phi; phi = phi_nodes (b))
    {
      tree def = PHI_RESULT (phi), use = PHI_ARG_DEF (phi, 0);
      ...
      else
        replace_uses_by (def, use);

      remove_phi_node (phi, NULL, false);
    }

Whenever we hit the else block we end up with a phi node result that
points to a released phi node.  It won't appear in the IR (since the
phi node has been removed and all the result uses replaced), but will
appear in the ssa_names list.

There are only two other places that call remove_phi_node with false
as the last parameter.
One is moving a phi node, the other appears to be a bug just like the above.


Re: SSA_NAMES: should there be an unused, un-free limbo?

2006-12-21 Thread Daniel Berlin

On 12/21/06, Robert Kennedy <[EMAIL PROTECTED]> wrote:

> Robert, can you attach the testcase you've been working with?

One testcase is libstdc++-v3/libsupc++/vec.cc from mainline.

But it compiles without trouble unless you add verification or a walk
over the SSA_NAMEs at the right time.

> 1. We replace all uses of a phi node with something else
> 2. We then call remove_phi_node with false as the last parameter (only
> 3 places in the compiler), which ends up destroying the phi node but
> not releasing the LHS name (since this is what the last parameter says
> whether to do).

That's right. Zdenek recommended that I change one of those places
(the one in tree_merge_blocks that you quoted) to "true", but doing
that causes some other code of his to break. I haven't analyzed why he
needs those things to stay around, but I suspect it's because -- for a
while, anyway -- he needs the DEF_STMT for something that's been
removed.


You can't change the last parameter to true in the if's true branch,
but you can in the else branch.
He's reusing the phi result's ssa_name on the true branch for a
statement copy, which will set the def_stmt to something valid.

I.E. this should work:

  if (!may_replace_uses)
    {
      gcc_assert (is_gimple_reg (def));

      /* Note that just emitting the copies is fine -- there is no problem
         with ordering of phi nodes.  This is because A is the single
         predecessor of B, therefore results of the phi nodes cannot
         appear as arguments of the phi nodes.  */
      copy = build2_gimple (GIMPLE_MODIFY_STMT, def, use);
      bsi_insert_after (&bsi, copy, BSI_NEW_STMT);
      SSA_NAME_DEF_STMT (def) = copy;
      remove_phi_node (phi, NULL, false);
    }
  else
    {
      replace_uses_by (def, use);
      remove_phi_node (phi, NULL, true);
    }



-- Robert



Re: Compiler loop optimizations

2006-12-28 Thread Daniel Berlin

On 12/28/06, Christian Sturz <[EMAIL PROTECTED]> wrote:

Hi,

I was curious if there are any gcc compiler optimizations that can
improve this code:

void foo10( )
{
  for ( int i = 0; i < 10; ++i )
  {
[...]
if( i == 15 ) { [BLOCK1] }
  }
}

void foo100( )
{
  for ( int i = 0; i < 100; ++i )
  {
[...]
if( i == 15 ) { [BLOCK2] }
  }
}

int main( void )
{
  foo10(  );
  foo100( );
  return 0;
}

1) For the function foo10:
The if-block following "if( i == 15 )" will never be executed since
'i' never becomes 15 here. So, this entire block could be removed
without changing the semantics. This would improve the program execution
since the if-condition would not need to be evaluated in each loop
iteration. Can this code transformation be automatically performed by a
compiler?

Yes

If so, which techniques/analyses and optimizations must be applied?

There are a number of ways to do it, but the easiest is probably value
range propagation.


Would gcc simplify this loop?


yes
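
(A sketch of what that simplification amounts to, assuming the elided
[...] body does not modify 'i': inside the loop, value range propagation
knows i is in the range [0, 9], so "i == 15" folds to false and BLOCK1
is removed as unreachable code.)

void foo10( )
{
  for ( int i = 0; i < 10; ++i )
  {
    [...]
    /* "if( i == 15 ) { [BLOCK1] }" deleted: i's range is [0, 9].  */
  }
}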

2) For the function foo100:
This code is not optimal either. The if-condition is met just once but
has to be evaluated in all of the 100 loop iterations. An idea
I had in mind was to split the loop into two parts:
for ( int i = 0; i <= 15; ++i ) {...}
BLOCK2
for ( int i = 16; i < 100; ++i ) {...}
So, here the evaluation of "i==15" is not required any more and we save
100 comparisons. Is this a good idea and always applicable? Or are there
better compiler optimizations?

This is a variant of loop distribution, and it is sometimes a good
idea, and sometimes not ;)
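
For reference, the split form being sketched, cleaned up (illustration
only; [...] stands for the elided loop body, and whether the split pays
off depends on the extra loop overhead versus the removed compare):

void foo100( )
{
  for ( int i = 0; i < 15; ++i )
  {
    [...]
  }
  /* the i == 15 iteration, with the comparison folded away: */
  [...]
  [BLOCK2]
  for ( int i = 16; i < 100; ++i )
  {
    [...]
  }
}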



Thank you very much for your help.

Regards,
Chris



Re: changing "configure" to default to "gcc -g -O2 -fwrapv ..."

2006-12-29 Thread Daniel Berlin

On 29 Dec 2006 07:55:59 -0800, Ian Lance Taylor <[EMAIL PROTECTED]> wrote:

Paul Eggert <[EMAIL PROTECTED]> writes:

>   * NEWS: AC_PROG_CC, AC_PROG_CXX, and AC_PROG_OBJC now take an
>   optional second argument specifying the default optimization
>   options for GCC.  These optimizations now default to "-O2 -fwrapv"
>   instead of to "-O2".  This partly attacks the problem reported by
>   Ralf Wildenhues in
>   
>   and in .

I fully appreciate that there is a real problem here which needs to be
addressed, but this does not seem like the best solution to me.  A
great number of C programs are built using autoconf.  If we make this
change, then they will all be built with -fwrapv.  That will disable
useful loop optimizations, optimizations which are enabled by default
by gcc's competitors.  The result will be to make gcc look worse than
it is.

You will recall that the problem with the original code was not in the
loop optimizers; it was in VRP.  I think we would be better served by
changing VRP to not rely on undefined signed overflow.  Or, at least,
to not rely on it without some additional option.



Actually, I seriously disagree with both patches.

Nobody has yet shown that any significant number of programs actually
rely on this undefined behavior.  All they have shown is that we have
one program that does, and that some people can come up with loops
that break if you make signed overflow undefined.

OTOH, people who rely on signed overflow being wraparound generally
*know* they are relying on it.
Given this seems to be some  small number of people and some small
amount of code (since nobody has produced any examples showing this
problem is rampant, in which case i'm happy to be proven wrong), why
don't they just compile *their* code with -fwrapv?

I posted numbers the last time this discussion came up, from both GCC
and XLC, that showed that making signed overflow wraparound can cause
up to a 50% performance regression in *real world* mathematical
fortran and C codes  due to not being able to perform loop
optimizations.
Note that these were not just *my* numbers, this is what the XLC guys
found as well.

In fact, what they told me was that since they made their change  in
1991, they have had *1* person who  reported a program that didn't
work.
This is just the way the world goes.  It completely ruins dependence
analysis, interchange, fusion, distribution, and just about everything
else.  Hell, you can't even do a good job of unrolling because you
can't estimate loop bounds anymore.

I'll also point out that *none* of these codes that rely on signed
overflow wrapping will work on any *other* compiler as well, as they
all optimize it.

Most even optimize *unsigned* overflow to be undefined in loops at
high opt levels (XLC does it at -O3+), and warn about it being done,
because this gives them an additional 20-30% performance benefit (in
particular on 32 bit fortran codes that are now run on 64 bit
computers, as the  induction variables are usually still 32 bit, but
they have to cast to 64 bit to index into arrays.  Without them
assuming unsigned integer overflow is undefined for ivs, they can't do
*any* iv related optimization here because the wraparound point would
change).  Since XLC made this change in 1993, they have had 2 bug
reports out of hundreds of thousands that were attributable to doing
this.
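
A sketch of the 32-bit-iv / 64-bit-index pattern being described
(hypothetical code, not from the mail):

void
axpy (float *a, const float *b, unsigned int lo, unsigned int hi)
{
  unsigned int i;                 /* 32-bit induction variable           */
  for (i = lo; i != hi; i++)
    a[i] += b[i];                 /* 64-bit address: a + (long) i * 4    */
}

If i is allowed to wrap at 2^32, the accesses can jump from
a[0xffffffff] back to a[0], so the compiler cannot simply replace i with
a 64-bit counter or an incremented pointer; treating iv overflow as
undefined removes that restriction (at the cost of breaking the rare
code that really does wrap).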

I believe what we have here is a very vocal minority.  I will continue
to believe so until someone provides real world counter evidence that
people do, and *need to*, rely on signed overflow being wraparound to
a degree that we should disable the optimization.

--Dan


Re: changing "configure" to default to "gcc -g -O2 -fwrapv ..."

2006-12-29 Thread Daniel Berlin

On 29 Dec 2006 19:33:29 +0100, Gabriel Dos Reis
<[EMAIL PROTECTED]> wrote:

"Daniel Berlin" <[EMAIL PROTECTED]> writes:

[...]

| In fact, what they told me was that since they made their change  in
| 1991, they have had *1* person who  reported a program that didn't
| work.

And GCC made the change recently and got yy reports.  That might say
something about both compilers user base.  Or not.


Right, because the way we should figure out what the majority of our
users want is to listen to 3 people on a developer list instead of
looking through the means we give users to give feedback, which is
through bug reports.
We've gotten a total of about 10 reports at last count, in the many
years we've been optimizing this.


Please, feel free to ignore those that don't find the transformations
appropriate, they are just free software written by vocal minority.


Wow Gaby, this sure is useful evidence, thanks for providing it.

I'm sure no matter what argument i come up with, you'll just explain it away.
The reality is the majority of our users seem to care more about
whether they have to write "typename" in front of certain declarations
than they do about signed integer overflow.


Re: changing "configure" to default to "gcc -g -O2 -fwrapv ..."

2006-12-29 Thread Daniel Berlin

On 29 Dec 2006 20:15:01 +0100, Gabriel Dos Reis
<[EMAIL PROTECTED]> wrote:

"Daniel Berlin" <[EMAIL PROTECTED]> writes:

| On 29 Dec 2006 19:33:29 +0100, Gabriel Dos Reis
| <[EMAIL PROTECTED]> wrote:
| > "Daniel Berlin" <[EMAIL PROTECTED]> writes:
| >
| > [...]
| >
| > | In fact, what they told me was that since they made their change  in
| > | 1991, they have had *1* person who  reported a program that didn't
| > | work.
| >
| > And GCC made the change recently and got yy reports.  That might say
| > something about both compilers user base.  Or not.
| >
| Right, because the way we should figure out what the majority our
| users want is to listen to 3 people on a developer list instead of
| looking through the means we give users to give feedback, which is
| through bug reports.

And surely, this specific issue did not come from users through a bug
report.

| We've gotten a total of about 10 reports at last count, in the many
| years we've been optimizing this.
|
| > Please, feel free to ignore those that don't find the transformations
| > appropriate, they are just free software written by vocal minority.
|
| Wow Gaby, this sure is useful evidence, thanks for providing it.
|
| I'm sure no matter what argument i come up with, you'll just explain it away.

Not really.  I've come to *agree with you* that we should just ignore
those that don't find the transformation useful for real code: they
are vocal minority.


You can have all the sarcasm you want, but maybe instead of sarcasm,
you should produce real data to contradict our bug reports, and the
experiences of other compilers in the field (note that Seongbae Park
tells me sun had the same level of complaint, i.e., one report, about
their compiler doing this as well).

Basically, your argument boils down to "all supporting data is wrong,
the three people on the mailing list are right, and there are millions
more behind them that just couldn't make it and have never complained
in the past"

Don't buy it.
Put up or shut up for once.


Re: changing "configure" to default to "gcc -g -O2 -fwrapv ..."

2006-12-29 Thread Daniel Berlin

On 29 Dec 2006 21:04:08 +0100, Gabriel Dos Reis
<[EMAIL PROTECTED]> wrote:

"Daniel Berlin" <[EMAIL PROTECTED]> writes:

[...]

| Basically, your argument boils down to "all supporting data is wrong,

Really?

Or were you just

 # You can have all the sarcasm you want, but maybe instead of sarcasm,


Otherwise, you have a serious problem hearing anything contrary to
your firm belief.


This is so funny coming from you it's ridiculous.
Anyway, i'm out of this thread until you decide to put up.
I am confident everyone else will  just ignore you too.


Re: changing "configure" to default to "gcc -g -O2 -fwrapv ..."

2006-12-29 Thread Daniel Berlin

On 12/29/06, Richard Kenner <[EMAIL PROTECTED]> wrote:

> I'm not sure what data you're asking for.

Here's the data *I'd* like to see:

(1) What is the maximum performance loss that can be shown using a real
program (e.g,. one in SPEC) and some compiler (not necessarily GCC) when
one assumes wrapping semantics?


The XLC numbers i was given about a year ago (i assume it was version 8)

SpecINT with undefined signed overflow at -O5 on a P5 2100mhz running
linux: 1634
SpecFP with undefined signed overflow at -O5 on a P5 2100mhz running linux: 3010

SpecINT with wrapping signed overflow at -O5 on a P5 2100mhz running
linux: 1319
SpecFP with wrapping signed overflow at -O5 on a P5 2100mhz running linux: 1624



(2) In the current SPEC, how many programs benefit from undefined overflow
semantics and how much does each benefit?


All of the fortran programs (IE SpecFP) benefit from undefined
*unsigned* overflow semantics due to 32 bit iv vs 64 bit array index
issues.
The same is true of the SpecFP C programs.

All of the fortran and C programs benefit from undefined *signed*
overflow semantics because it makes dependency and loop counting
analysis possible.

Nobody has analyzed it further than that, afaik, mainly because they
don't have discussions about whether it makes sense to lose 50% of
their FP performance to do something none of *their* users ask them
for (note our users may, of course, be different).  So they generally
won't waste their cycles trying to figure out why something they
aren't going to do would hurt them.


Re: changing "configure" to default to "gcc -g -O2 -fwrapv ..."

2006-12-29 Thread Daniel Berlin

On 12/29/06, Daniel Berlin <[EMAIL PROTECTED]> wrote:

On 12/29/06, Richard Kenner <[EMAIL PROTECTED]> wrote:
> > I'm not sure what data you're asking for.
>
> Here's the data *I'd* like to see:
>
> (1) What is the maximum performance loss that can be shown using a real
> program (e.g,. one in SPEC) and some compiler (not necessarily GCC) when
> one assumes wrapping semantics?

The XLC numbers i was given about a year ago (i assume it was version 8)

SpecINT with undefined signed overflow at -O5 on a P5 2100mhz running
linux: 1634
SpecFP with undefined signed overflow at -O5 on a P5 2100mhz running linux: 3010

SpecINT with wrapping signed overflow at -O5 on a P5 2100mhz running
linux: 1319
SpecFP with wrapping signed overflow at -O5 on a P5 2100mhz running linux: 1624

>
> (2) In the current SPEC, how many programs benefit from undefined overflow
> semantics and how much does each benefit?

All of the fortran programs (IE SpecFP) benefit from undefined
*unsigned* overflow semantics due to 32 bit iv vs 64 bit array index
issues.
The same is true of the SpecFP C programs.




Just to be clear, the above behavior is not standards conformant, and
they do give a warning that they are doing it.
It is however, the default at -O3 for XLC, and AFAIK, at all opt levels for icc.


Re: changing "configure" to default to "gcc -g -O2 -fwrapv ..."

2006-12-29 Thread Daniel Berlin

Just to address the other compiler issue


No, they will work on other compilers, since 'configure'
won't use -O2 with those other compilers.


icc defaults to -O2 without any options, so unless you are passing
-O0, it will enable this.



Unless you know of some real-world C compiler that breaks
wrapv semantics even compiling without optimization?  If so,
I'd like to hear the details.


Sure. All of them, AFAIK, because they make the assumptions during
constant folding, and they all still constant fold at -O0.
It just so happens that it tends to affect a lot smaller number of
programs, because it essentially ends up being a very very local
optimization.
But it's still going to break some programs, even at -O0.
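
The classic illustration of such a fold (a hypothetical example; whether
a particular compiler performs this exact fold at -O0 varies, which is
exactly the point):

int
next_is_larger (int x)
{
  /* Folded to "return 1;" by constant folding that assumes signed
     overflow cannot happen -- no optimization pass required.  */
  return x + 1 > x;
}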


Re: changing "configure" to default to "gcc -g -O2 -fwrapv ..."

2006-12-31 Thread Daniel Berlin

On 12/31/06, Paul Eggert <[EMAIL PROTECTED]> wrote:

"Steven Bosscher" <[EMAIL PROTECTED]> writes:

> On 12/31/06, Paul Eggert <[EMAIL PROTECTED]> wrote:
>> Also, as I understand it this change shouldn't affect gcc's
>> SPEC benchmark scores, since they're typically done with -O3
>> or better.
>
> It's not all about benchmark scores.

But so far, benchmark scores are the only scores given by the people
who oppose having -O2 imply -fwrapv.  If the benchmarks use -O3 they
wouldn't be affected by such a change -- and if so, we have zero hard
evidence of any real harm being caused by having -O2 imply -fwrapv.

> I think most users compile at -O2

Yes, which is why there's so much argument about what -O2 should do

> You say you doubt it affects performance.  Based on what?  Facts
> please, not guesses and hand-waiving...

The burden of proof ought to be on the guys proposing -O2
optimizations that break longstanding code, not on the skeptics.


The burden ought to be (and IMHO is) on those who propose we change
optimizer behavior in order to support something non-standard.

Why do you believe otherwise?


That being said, I just compiled GNU coreutils CVS on a Debian stable
x86 (2.4 GHz Pentium 4) using GCC 4.1.1.  With -O0, "sha512sum" on the
coreutils tar.gz file took 0.94 user CPU seconds (measured by "time
src/sha512sum coreutils-6.7-dirty.tar.gz").  With -O2 -fwrapv, 0.87
seconds.  With plain -O2, 0.86 seconds.

I also tried gzip 1.3.10, compressing its own tar file with a -9
compression option.  With -O0, 0.30 user CPU seconds.  With -O2
-fwrapv, 0.24 seconds.  With -O2, 0.24 seconds.

In all these cases I've averaged several results.  The difference
between -O2 and -O2 -fwrapv is pretty much in the noise here.

Admittedly it's only two small tests, and it's with 4.1.1.  But that's
two more tests than the -fwrapv naysayers have done, on
bread-and-butter applications like coreutils or gzip or Emacs (or GCC
itself, for that matter).


These are not performance needing applications.
I'll happily grant you that  adding -fwrapv will make no difference at
all on any application that does not demand performance in integer or
floating point calculations.


Re: changing "configure" to default to "gcc -g -O2 -fwrapv ..."

2006-12-31 Thread Daniel Berlin

On 12/31/06, Richard Guenther <[EMAIL PROTECTED]> wrote:

On 12/31/06, Daniel Berlin <[EMAIL PROTECTED]> wrote:
> On 12/31/06, Paul Eggert <[EMAIL PROTECTED]> wrote:
> > "Steven Bosscher" <[EMAIL PROTECTED]> writes:
> >
> > > On 12/31/06, Paul Eggert <[EMAIL PROTECTED]> wrote:
> > >> Also, as I understand it this change shouldn't affect gcc's
> > >> SPEC benchmark scores, since they're typically done with -O3
> > >> or better.
> > >
> > > It's not all about benchmark scores.
> >
> > But so far, benchmark scores are the only scores given by the people
> > who oppose having -O2 imply -fwrapv.  If the benchmarks use -O3 they
> > wouldn't be affected by such a change -- and if so, we have zero hard
> > evidence of any real harm being caused by having -O2 imply -fwrapv.
> >
> > > I think most users compile at -O2
> >
> > Yes, which is why there's so much argument about what -O2 should do
> >
> > > You say you doubt it affects performance.  Based on what?  Facts
> > > please, not guesses and hand-waiving...
> >
> > The burden of proof ought to be on the guys proposing -O2
> > optimizations that break longstanding code, not on the skeptics.
>
> The burden ought to be (and IMHO is) on those who propose we change
> optimizer behavior in order to support something non-standard.
>
> Why do you believe otherwise?
> >
> > That being said, I just compiled GNU coreutils CVS on a Debian stable
> > x86 (2.4 GHz Pentium 4) using GCC 4.1.1.  With -O0, "sha512sum" on the
> > coreutils tar.gz file took 0.94 user CPU seconds (measured by "time
> > src/sha512sum coreutils-6.7-dirty.tar.gz").  With -O2 -fwrapv, 0.87
> > seconds.  With plain -O2, 0.86 seconds.
> >
> > I also tried gzip 1.3.10, compressing its own tar file with a -9
> > compression option.  With -O0, 0.30 user CPU seconds.  With -O2
> > -fwrapv, 0.24 seconds.  With -O2, 0.24 seconds.
> >
> > In all these cases I've averaged several results.  The difference
> > between -O2 and -O2 -fwrapv is pretty much in the noise here.
> >
> > Admittedly it's only two small tests, and it's with 4.1.1.  But that's
> > two more tests than the -fwrapv naysayers have done, on
> > bread-and-butter applications like coreutils or gzip or Emacs (or GCC
> > itself, for that matter).
>
> These are not performance needing applications.
> I'll happily grant you that  adding -fwrapv will make no difference at
> all on any application that does not demand performance in integer or
> floating point calculations.

I added -fwrapv to the Dec30 run of SPEC at
http://www.suse.de/~gcctest/SPEC/CFP/sb-vangelis-head-64/recent.html
and
http://www.suse.de/~gcctest/SPEC/CINT/sb-vangelis-head-64/recent.html



Note the distinct drop in performance across almost all the benchmarks
on Dec 30, including popular programs like bzip2 and gzip.


Re: changing "configure" to default to "gcc -g -O2 -fwrapv ..."

2006-12-31 Thread Daniel Berlin

On 12/31/06, Bruce Korb <[EMAIL PROTECTED]> wrote:

Daniel Berlin wrote:
>> Admittedly it's only two small tests, and it's with 4.1.1.  But that's
>> two more tests than the -fwrapv naysayers have done, on
>> bread-and-butter applications like coreutils or gzip or Emacs (or GCC
>> itself, for that matter).
>
> These are not performance needing applications.
> I'll happily grant you that  adding -fwrapv will make no difference at
> all on any application that does not demand performance in integer or
> floating point calculations.

It seems then that this pretty-much ought to settle it:
If the only folks that would really care are those that do performance
critical work, then 99.9% of folks not doing that kind of work should
not bear the risk of having their code break.  The long standing
presumption, standardized or not, is that of wrapv semantics.
Changing that presumption without multiple years of -Wall warnings
is a Really, Really, Really Bad Idea.



I generally have no problem with turning on -fwrapv at O2, but i'm
curious where this ends.
After all, strict aliasing makes it hard to write a bunch of styles of
code people really want to write, and breaks real world programs and
GNU software.

Yet we decided to keep it on at O2, and off at O1.

We assume array accesses outside the defined length of an array are
invalid, but hey, maybe you really know something we don't there too.

Nothing but apps that actually do real work (and not just large
amounts of I/O) will notice these things.

Anyway, if you include "gzip" and "bzip2" in the applications that
demand performance in integer calculations, then you don't want -fwrapv
at O2. The SPEC scores show it makes both about 10% slower (at least).

You can distinguish the above cases all you want, but the idea that we
should trade performance for doing things we've told people for
*years* not to do, and finally made good on it, doesn't sit well with
me at all.


Re: changing "configure" to default to "gcc -g -O2 -fwrapv ..."

2007-01-01 Thread Daniel Berlin

On 1/1/07, Paul Eggert <[EMAIL PROTECTED]> wrote:

Mark Mitchell <[EMAIL PROTECTED]> writes:

> * Dan Berlin says that xlc assumes signed overflow never occurs, gets
> much better performance as a result, and that nobody has complained.

Most likely xlc and icc have been used to compile the gnulib
mktime-checking code many times without incident (though I can't
prove this, as I don't use xlc and icc myself).  If so, icc and
xlc do not optimize away the overflow-checking test in question
even though C99 entitles them to do so; this might help explain
why they get fewer complaints about this sort of thing.

> I haven't yet seen that anyone has actually tried the obvious: run SPEC
> with and without -fwrapv.

Richard Guenther added -fwrapv to the December 30 run of SPEC at
<http://www.suse.de/~gcctest/SPEC/CFP/sb-vangelis-head-64/recent.html>
and
<http://www.suse.de/~gcctest/SPEC/CINT/sb-vangelis-head-64/recent.html>.
Daniel Berlin and Geert Bosch disagreed about how to interpret
these results; see <http://gcc.gnu.org/ml/gcc/2007-01/msg00034.html>.
Also, the benchmarks results use -O3 and so aren't directly
applicable to the original proposal, which was to enable -fwrapv
for -O2 and less.


No offense, but all enabling wrapv at O2 or less would do is cause
more bug reports about
1. Getting different program behavior between O2 and O3
2. Missed optimizations at O2
It also doesn't fit with what we have chosen to differentiate
optimization levels based on.

IMHO, it's just not the right solution to this problem.



> Also, of the free software that's assuming signed overflow wraps, can we
> qualify how/where it's doing that?  Is it in explicit overflow tests?
> In loop bounds?

We don't have an exhaustive survey, but of the few samples I've
sent in most of code is in explicit overflow tests.  However, this
could be an artifact of the way I searched for wrapv-dependence
(basically, I grep for "overflow" in the source code).  The
remaining code depended on -INT_MIN evaluating to INT_MIN.  The
troublesome case that started this thread was an explicit overflow
test that also acted as a loop bound (which is partly what caused
the problem).


If your real goal is to be able to just write explicit bounds
checking, and you don't want wrapping semantics for signed integers in
general (which i don't think most people do, but as with every single
person on this discussion, it seems we all believe we are in the 99%
of programmers who want something), then we should just disable this
newly added ability for VRP to optimize signed overflow and call it a
day.
VRP's optimizations are not generally useful in determining loop
bounds (we have other code that does all the bound determination) or
doing data dependence, so you would essentially lose no performance
except in very weird cases.

Of course, you will still be able to come up with cases where signed
overflow fails to wrap.  But IMHO, we have to draw the line somewhere,
and i'm fine with "if you want to test overflow, do it like this and
we will guarantee it will work".

We do the same thing with type punning through unions (guarantee that
reading a different member than you write will work), even though the
standard says we don't have to.
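
For reference, the union guarantee being referred to, as a minimal
sketch (assumes int and float have the same size on the target):

union pun { int i; float f; };

float
int_bits_as_float (int x)
{
  union pun u;
  u.i = x;       /* write one member ...                          */
  return u.f;    /* ... read another: type-punned, but guaranteed
                    to work when done through a union.            */
}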

All the arguments about what most people are going to want are
generally flawed on all sides.  Where there are reasonable positions
on both sides, nobody ever accurately predicts what the majority of a
hugely diverse population of language users is going to want, and
almost everyone believes they are in that majority.

--Dan


Fwd: Bugzilla internal error

2007-01-04 Thread Daniel Berlin

Guys, i changed the cookie to prevent this error, and to stop it from
continually asking for logins.

Please clear your current gcc.gnu.org bugzilla cookie from your
browser, or both this error, and getting asked for logins on every
page, will continue.

-- Forwarded message --
From: Paolo Carlini <[EMAIL PROTECTED]>
Date: Jan 4, 2007 2:25 PM
Subject: Bugzilla internal error
To: [EMAIL PROTECTED]


Hi Danny,

for some reason, Bugzilla undergoes an internal error when I try to add
a small patch to PR libstdc++/30365:

undef error - Undefined subroutine Fh::slice at
data/template/template/en/default/global/hidden-fields.html.tmpl line 58

Can you look into it?

Thanks,
Paolo.


Re: Fwd: Bugzilla internal error

2007-01-04 Thread Daniel Berlin

On 1/4/07, Paolo Carlini <[EMAIL PROTECTED]> wrote:

Daniel Berlin wrote:

> Guys, i changed the cookie to prevent this error, and to stop it from
> continually asking for logins.

I'm not sure to understand, I never had problems before...


Others have :)


> Please clear your current gcc.gnu.org bugzilla cookie from your
> browser, or both this error, and getting asked for logins on every
> page, will continue.

Anyway, ok, done, now it works.

Thanks,
Paolo.



Re: gcc 3.4 > mainline performance regression

2007-01-05 Thread Daniel Berlin

On 05 Jan 2007 07:18:47 -0800, Ian Lance Taylor <[EMAIL PROTECTED]> wrote:

Andrew Haley <[EMAIL PROTECTED]> writes:

> It appears that memory references to arrays aren't being hoisted out
> of loops: in this test case, gcc 3.4 doesn't touch memory at all in
> the loop, but 4.3pre (and 4.2, etc) does.
>
> Here's the test case:
>
> void foo(int *a)
> {
>   int i;
>   for (i = 0; i < 100; i++)
>     a[0] += a[1];
> }

At the tree level, the problem is that the assignment to a[0] is seen
as aliasing a[1].  This causes the use of a[1] to look like a USE of
an SMT, and the assignment to a[0] to look like a DEF of the same
SMT.  So in tree-ssa-loop-im.c the statements look like they are not
loop invariant.

I don't know we can do better with our current aliasing
representation.  Unless we decide to do some sort of array SRA.


Well, we don't treat it like an array because it doesn't look like one.
You have a couple options here:

If you change create_overlap_variables in tree-ssa-alias.c to do SFT's
for small arrays, the testcase should just work.

Alternatively, you could teach tree PRE's load motion to go past using
just the VUSE/VDEF's to determine where loads are killed, and it will
do this for you.

That code is being reworked right now as part of some work i'm doing
on a branch (gcc-pre-vn branch, it's just a dump of my VN rewrite
tree, which is why it's not in svn.html), so if you wanted to take
this approach, let me know.



Or perhaps we could make the loop invariant motion pass more
complicated: when it sees a use or assignment of a memory tag, it
could explicitly check all the other uses/assignments in the loop and
see if they conflict.  I don't really know how often this would pay
off, though.

Array accesses are one of the only places we get seriously confused very easily.
There are places where load motion at the tree level is currently more
conservative than it needs to be, but the amount of loads we fail to
move as a result is nowhere near as high (percentage wise) as those
due to array accesses.
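
For comparison, a source-level sketch of the effect Andrew describes for
3.4 on this testcase (loads hoisted, store sunk; legal because a[0] and
a[1] are distinct elements and nothing else writes through 'a' inside
the loop):

void foo (int *a)
{
  int t0 = a[0], t1 = a[1];   /* loads hoisted out of the loop    */
  int i;
  for (i = 0; i < 100; i++)
    t0 += t1;                 /* no memory references in the loop */
  a[0] = t0;                  /* store sunk after the loop        */
}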


Re: Tricky(?) aliasing question.

2007-01-10 Thread Daniel Berlin


It is possible that somebody else will disagree with me.


FWIW, our current aliasing set implementation agrees with you on
both counts :)


Re: Serious SPEC CPU 2006 FP performance regressions on IA32

2007-01-11 Thread Daniel Berlin

On 1/11/07, Grigory Zagorodnev <[EMAIL PROTECTED]> wrote:

Menezes, Evandro wrote:
> Though not as pronounced, definitely significant.
>

Using binary search I've detected that 30% performance regression of
cpu2006/437.leslie3d benchmark is caused by revision 117891.

http://gcc.gnu.org/viewcvs?view=rev&revision=117891

I assume the same commit causes the regression of all other benchmarks
from the cpu2k6 suite (running others to confirm).


This only affects 4.2, and the only solution would be to try to
backport some of the 4.3 aliasing stuff to 4.2, which i'm not sure is
a great idea.



- Grigory



Re: Serious SPEC CPU 2006 FP performance regressions on IA32

2007-01-12 Thread Daniel Berlin

On 1/12/07, H. J. Lu <[EMAIL PROTECTED]> wrote:

On Thu, Jan 11, 2007 at 08:06:31PM -0500, Daniel Berlin wrote:
> On 1/11/07, Grigory Zagorodnev <[EMAIL PROTECTED]> wrote:
> >Menezes, Evandro wrote:
> >> Though not as pronounced, definitely significant.
> >>
> >
> >Using binary search I've detected that 30% performance regression of
> >cpu2006/437.leslie3d benchmark is caused by revision 117891.
> >
> >http://gcc.gnu.org/viewcvs?view=rev&revision=117891
> >
> >I assume same commit causes regression of all other benchmarks from
> >cpu2k6 suite (running others to confirm).
>
> This only affects 4.2, and the only solution would be to try to
> backport some of the 4.3 aliasing stuff to 4.2, which i'm not sure is
> a great idea.
>

If this serious performance regression in gcc 4.2 isn't addressed, it may
make gcc 4.2 less attractive, since it may generate much slower
executables.


I'm happy to backport it, but it's going to introduce other possible
problems in 4.2.




H.J.



Re: Signed int overflow behaviour in the security context

2007-01-23 Thread Daniel Berlin


> This is a typical example of removing an if branch because signed
> overflow is undefined.  This kind of code is common enough.

I could not have made my point any better myself.


And you think that somehow defining it (which the definition people
seem to favor would be to make it wrapping) ameliorates any of these
concerns?

User parameters can't be trusted no matter whether signed overflow is
defined  or not.
Making it defined and wrapping doesn't help at all. It just means you
write different checks, not less of them.


Re: Signed int overflow behaviour in the security context

2007-01-26 Thread Daniel Berlin


> Every leading C compiler has for years done things like this to boost
> performance on scientific codes.

The Sun cc is a counter-example.  And even then, authors of scientific
code usually do read the compiler manual, and will discover any
additional optimizer flags.


Errr, actually, Seongbae, who worked for Sun on Sun CC until very
recently, says otherwise, unless i'm mistaken.

Seongbae, didn't you say that Sun's compiler uses the fact that signed
overflow is undefined when performing optimizations?


Re: Trunk GCC fails to compile cpu2k6/dealII at -O2

2007-01-29 Thread Daniel Berlin

On 1/29/07, Grigory Zagorodnev <[EMAIL PROTECTED]> wrote:

Hi!
GCC 4.3 compiler revision 121206 gets ICE while compiling
cpu2006/447.dealII source file data_out_base.cc at -O2 optimization
level on x86_64-redhat-linux.

Similar to the previously reported cpu2k6/perlbench failure, this regression
is caused by "Rewrite of portions of points-to solver" patch
http://gcc.gnu.org/ml/gcc-patches/2007-01/msg01541.html (revision 120931).


No, actually, it's not.
It's caused by one of Jan's patches to remove unreferenced vars.
In fact, i'm pretty sure he already fixed this bug.


[PATCH]: Fix hang while compiling cpu2k6/perlbench at -O2

2007-01-29 Thread Daniel Berlin

On 1/29/07, Grigory Zagorodnev <[EMAIL PROTECTED]> wrote:

Grigory Zagorodnev wrote:
> GCC 4.3 compiler revision 121206 goes into an infinite loop while
> compiling cpu2k6/perlbench source file regcomp.c at -O2 optimization
> level on x86_64-redhat-linux.

This regression is caused by "Rewrite of portions of points-to solver"
patch http://gcc.gnu.org/ml/gcc-patches/2007-01/msg01541.html
revision 120931 http://gcc.gnu.org/viewcvs?view=rev&revision=120931



Patch attached and committed after bootstrap and regtest on i686-darwin.

2007-01-29  Daniel Berlin  <[EMAIL PROTECTED]>

* tree-ssa-structalias.c (do_complex_constraint): Mark correct
variable as changed.



- Grigory

Index: testsuite/gcc.c-torture/compile/20070129.c
===
--- testsuite/gcc.c-torture/compile/20070129.c	(revision 0)
+++ testsuite/gcc.c-torture/compile/20070129.c	(revision 0)
@@ -0,0 +1,94 @@
+/* This testcase would cause a hang in PTA solving due to a complex copy
+   constraint and marking the wrong variable as changed.  */
+
+typedef struct RExC_state_t
+{
+ char *end;
+ char *parse;
+} RExC_state_t;
+
+struct regnode_string
+{
+ unsigned char str_len;
+ char string[1];
+};
+
+static void *regatom (RExC_state_t * pRExC_state, int *flagp);
+
+static void *
+regpiece (RExC_state_t * pRExC_state, int *flagp)
+{
+ return regatom (0, 0);
+}
+
+static void *
+regbranch (RExC_state_t * pRExC_state, int *flagp, int first)
+{
+ return regpiece (0, 0);
+}
+
+static void *
+reg (RExC_state_t * pRExC_state, int paren, int *flagp)
+{
+ return regbranch (0, 0, 1);
+}
+
+void *
+Perl_pregcomp (char *exp, char *xend, void *pm)
+{
+ return reg (0, 0, 0);
+}
+
+static void *
+regatom (RExC_state_t * pRExC_state, int *flagp)
+{
+ register void *ret = 0;
+ int flags;
+
+tryagain:
+ switch (*(pRExC_state->parse))
+   {
+   case '(':
+ ret = reg (pRExC_state, 1, &flags);
+ if (flags & 0x8)
+   {
+ goto tryagain;
+   }
+ break;
+   default:
+ {
+   register unsigned long len;
+   register unsigned ender;
+   register char *p;
+   char *oldp, *s;
+   unsigned long numlen;
+   unsigned long foldlen;
+   unsigned char tmpbuf[6 + 1], *foldbuf;
+
+ defchar:
+   s = (((struct regnode_string *) ret)->string);
+   for (len = 0, p = (pRExC_state->parse) - 1;
+len < 127 && p < (pRExC_state->end); len++)
+ {
+   if (((*p) == '*' || (*p) == '+' || (*p) == '?'
+       || ((*p) == '{' && regcurly (p))))
+ {
+   unsigned long unilen;
+   for (foldbuf = tmpbuf; foldlen; foldlen -= numlen)
+ {
+   reguni (pRExC_state, ender, s, &unilen);
+   s += unilen;
+ }
+   break;
+ }
+   unsigned long unilen;
+
+   reguni (pRExC_state, ender, s, &unilen);
+   s += unilen;
+ }
+
+ };
+ break;
+   }
+ return (ret);
+}
Index: tree-ssa-structalias.c
===
--- tree-ssa-structalias.c	(revision 121279)
+++ tree-ssa-structalias.c	(working copy)
@@ -1538,9 +1538,9 @@ do_complex_constraint (constraint_graph_
   if (flag)
 	{
 	  get_varinfo (t)->solution = tmp;
-	  if (!TEST_BIT (changed, c->lhs.var))
+	  if (!TEST_BIT (changed, t))
 	{
-	  SET_BIT (changed, c->lhs.var);
+	  SET_BIT (changed, t);
 	  changed_count++;
 	}
 	}
@@ -2065,6 +2065,7 @@ solve_graph (constraint_graph_t graph)
 	  bitmap solution;
 	  VEC(constraint_t,heap) *complex = graph->complex[i];
 	  bool solution_empty;
+
 	  RESET_BIT (changed, i);
 	  changed_count--;
 


Re: Which optimization levels affect gimple?

2007-01-29 Thread Daniel Berlin

On 1/29/07, Diego Novillo <[EMAIL PROTECTED]> wrote:

Paulo J. Matos wrote on 01/29/07 06:35:
> On 1/29/07, Diego Novillo <[EMAIL PROTECTED]> wrote:
>> -fdump-tree-all gives you all the dumps by the high-level optimizers.
>> -fdump-all-all gives you all the dumps by both GIMPLE and RTL optimizers.
>>
>
> Is this -fdump-all-all version specific? Doesn't work on 4.1.1:
> $ g++ -fdump-all-all allocation.cpp
> cc1plus: error: unrecognized command line option "-fdump-all-all"
>
No, I goofed.  I must've dreamed the -all-all switch.  You have to use
-fdump-tree- for GIMPLE dumps and -fdump-rtl- for RTL dumps.
It's also possible that -fdump-rtl doesn't work on the 4.1 series (I
don't recall when -fdump-rtl was introduced, sorry).


-fdump-tree-all-all will work
as will -fdump-rtl-all-all

I never added support for -fdump-all-all-all :)


Re: G++ OpenMP implementation uses TREE_COMPLEXITY?!?!

2007-01-29 Thread Daniel Berlin

On 1/29/07, David Edelsohn <[EMAIL PROTECTED]> wrote:

> Joe Buck writes:

Joe> There you go again.  Mark did not support or oppose rth's change, he just
Joe> said that rth probably thought he had a good reason.  He was merely
Joe> opposing your personal attack.  We're all human, we make mistakes, there
Joe> can be better solutions.

Joe> If you think that there's a problem with a patch, there are ways to say so
Joe> without questioning the competence or good intentions of the person who
Joe> made it.

Have any of you considered that Steven was using hyperbole as a
joke?  Are some people so overly-sensitized to Steven that they assume the
worst and have a knee-jerk reaction criticizing him?

The issue began as a light-hearted discussion on IRC.  Steven's
tone came across as inappropriate in email without context.  However,
Mark's reply defending RTH was not qualified with "probably", which was an
unfortunate omission, IMHO.

Encouraging a more collegial tone on the GCC mailinglists is a
good goal, but I hope that we don't over-react and create a larger
problem.


I hope so too.
Steven is also somewhat frustrated by what he (and I, for that matter)
see as often over-politicized processes of GCC.
I believe this is perfectly understandable given the amount of
politics it seems to take to get a significant design change pushed
forward in GCC.


Re: Interprocedural optimization question

2007-01-29 Thread Daniel Berlin

On 1/29/07, Razya Ladelsky <[EMAIL PROTECTED]> wrote:

Razya Ladelsky/Haifa/IBM wrote on 29/01/2007 13:46:33:

> Hi,
>
> Does gcc apply inter-procedural optimizations across functions called
> using a function pointer? I guess that gcc performs conservatively,
> assuming that the pointer could point anywhere because the pointer is
> declared as a global variable and could be changed in any place. Is it
> true?

Yes.
The callgraph does not include these calls.


Well, not quite.

While the callgraph doesn't include them, ipa-pta will come up with a
conservatively correct set for them.

In the presence of static function pointers, we will come up with a
completely accurate set for them unless they escape.
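
A sketch of the static-function-pointer case (hypothetical example):
'handler' never escapes the translation unit, so points-to analysis can
prove its only possible targets are f and g, even though the callgraph
has no edge for the indirect call.

static void f (void) { /* ... */ }
static void g (void) { /* ... */ }
static void (*handler) (void) = f;

void set_alt (void) { handler = g; }
void run (void)     { handler (); }   /* possible callees: {f, g} */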


Re: bugzilla error

2007-02-05 Thread Daniel Berlin

Clear your cookie, try again, and it should fix it.

(Sorry, i'm working on the cookie issues. There is something very odd going on)

On 2/5/07, Matthias Klose <[EMAIL PROTECTED]> wrote:

Got this page, trying to add an attachment to #30706.

  Matthias


This is GCC Bugzilla

This is GCC Bugzilla Version 2.20+
Internal Error

GCC Bugzilla has suffered an internal error. Please save this page and
send it to [EMAIL PROTECTED] with details of what you were doing at
the time this message appeared.

URL: http://gcc.gnu.org/bugzilla/attachment.cgi
undef error - Undefined subroutine Fh::slice at
data/template/template/en/default/global/hidden-fields.html.tmpl line 58
Actions:
Home | New | Search | bug # | Reports | Requests | New Account | Log In



Re: GCC 4.1.2 Status Report

2007-02-05 Thread Daniel Berlin

On 2/4/07, Mark Mitchell <[EMAIL PROTECTED]> wrote:

[Danny, Richard G., please see below.]

Thanks to all who have helped tested GCC 4.1.2 RC1 over the last week.

I've reviewed the list traffic and Bugzilla.  Sadly, there are a fair
number of bugs.  Fortunately, most seem not to be new in 4.1.2, and
therefore I don't consider them showstoppers.

The following issues seem to be the 4.1.1 regressions:

  http://gcc.gnu.org/wiki/GCC_4.1.2_Status

PR 28743 is only an ICE-on-invalid, so I'm not terribly concerned.

Daniel, 30088 is another aliasing problem.  IIRC, you've in the past
said that these were (a) hard to fix, and (b) uncommon.  Is this the
same problem?  If so, do you still feel that (b) is true?  I'm
suspicious, and I am afraid that we need to look for a conservative hack.


It's certainly true that people will discover more and more aliasing
bugs the harder they work 4.1 :)
There is always the possibility of turning off the pruning, which will
drop our performance, but will hide most of the latent bugs we later
fixed through rewrites well enough that they can't be triggered (the
4.1 optimizers aren't aggressive enough).


Re: [Autovect]dependencies of virtual defs/uses

2007-02-12 Thread Daniel Berlin

On 2/12/07, Jiahua He <[EMAIL PROTECTED]> wrote:

Hi,

I am reading the code of autovect branch and curious about how to deal
with the dependencies of virtual defs/uses. In the function
vect_analyze_scalar_cycles( ), I found the statement "Skip virtual
phi's. The data dependences that are associated with virtual defs/uses
( i.e., memory accesses) are analyzed elsewhere." But where is the
code? I tried to search  for "vect_induction_def" and
"vect_reduction_def" and found that they are not used to assign
elsewhere. Is the analysis not implemented yet? Thanks in advance!


They show up as data references because of tree-data-reference.c marking them.
At least, that's how the other linear loop transforms handle it.
Not sure how the vectorizer deals with it specifically.



Jiahua



Re: Some thoughts and quetsions about the data flow infrastracture

2007-02-12 Thread Daniel Berlin

On 2/12/07, Vladimir Makarov <[EMAIL PROTECTED]> wrote:

  On Sunday I accidentally had a chat about the df infrastructure on
IRC.  I've got some thoughts which I'd like to share.

  I have liked the df infrastructure code from day one for its clarity.
Unfortunately users don't see it and probably don't care about it.
From my point of view the df infrastructure has a design flaw.  It
extracts a lot of information about RTL and keeps it on the side.  It
does not make the code fast.  It would be ok if we got better code
quality.  Danny told me that they have 1.5% better code using df.


I also said we hadn't run numbers in about a year and a half, because
it wasn't our main goal anymore.

...

especially if we have no alternative faster path (even if the old live
analysis is a mess).


I also pointed out that df's merge criteria are no more than 5%
compile time degradation.
So what are you worried about here?

Life analysis isn't just a mess, it's inaccurate,  and intertwined
with dead store elimination and dead code elimination.


  Even rewriting the current optimizations on the new data flow
infrastructure makes the situation worse, because it will not be easy
to get rid of the data flow infrastructure later, since part of the
flaw is probably in the df interface.


What?
The flaw we have now is that every pass creates its own
datastructures and dataflow, and it is completely impossible to make
dataflow faster without rewriting every single pass.

With DF, you could make every single pass faster simply by improving ... DF!

If the datastructures it has don't work well enough for any pass of
course, you can add your own as df problems and results.


So it might create problems in the future.

  I especially did not like David Edelsohn's phrase "and no new
private dataflow schemes will be allowed in gcc passes".  It was not
his first such statement.  Such phrases kill competition, which
is bad for gcc.  What if a new specialized scheme is faster?  What
if somebody decides to write another, better df infrastructure from
scratch to solve the coming df infrastructure problems?


If you want to rewrite DF, please do.
But honestly, GCC has enough separate solvers that simply are not
faster than the df branch's solver anymore.  We know. We replaced a lot
of them.

And that's the thing. We had to go and replace every single one of
these, when if they had just used df's solver in the first place (and
taken the 1-2% slowdown they probably faced), they would all just have
been sped up.
Worse, some of these solvers were buggy or inaccurate, so now that we
give it better information, faster, we have to go fix bugs that never
would have existed had they reused the infrastructure we provided.
This is in fact, a lot of what has taken up df branch time.  Fixing
bugs that fixing the dataflow exposed.


  I am not in opposition to merge if it satisfies the merge criteria.



People have done a lot of work.  It is too late.  I should have opposed
the criteria when they were discussed.  Sorry I missed the discussion,
if there was such a discussion.  I am just raising some questions and
saying that more work will be needed on the df infrastructure even after
the merge.


There is always more work to be done.

BTW, I'll happily remove DF when all that is left of RTL is the
scheduler, RA, and instruction selector.
Hell, i'll throw a party.

But i wouldn't hold your breath for this to happen. :)
--Dan


Re: maybe vectorizer-bug regarding unhandled data-ref

2007-02-15 Thread Daniel Berlin

On 2/15/07, Dorit Nuzman <[EMAIL PROTECTED]> wrote:

> Hi,
>
> while playing with gcc-4.3 rev. 121994, i encountered a problem with
> autovectorisation.
>
> In the following simple code, the inner loop of c1() becomes vectorized
as
> expected, but the inner loop of c2() not because of
>
>test2.c:15: note: = analyze_loop_nest =
>test2.c:15: note: === vect_analyze_loop_form ===
>test2.c:15: note: === get_loop_niters ===
>test2.c:15: note: ==> get_loop_niters:(unsigned int) n_6(D)
>test2.c:15: note: Symbolic number of iterations is (unsigned int)
n_6(D)
>test2.c:15: note: === vect_analyze_data_refs ===
>
>test2.c:15: note: get vectype with 4 units of type float
>test2.c:15: note: vectype: vector float
>test2.c:15: note: not vectorized: unhandled data-ref
>test2.c:15: note: bad data references.
>
> (even with -ftree-vectorizer-verbose=99 there is no more info than that)
>
> The only difference between the two functions is that in c1() static
> arrays are used and in c2() pointer to arrays.. Is this a problem with
> aliasing/alignment of pointer parameters or a vectorizer bug? And is
there
> a work-around?
>

The first problem is that a[i] is invariant in the inner-loop, and the
vectorizer wants to work only with data-references that have a nice
evolution in the loop (i.e. advance between iterations of the loop). In
other words - it assumes that invariant accesses had been moved out of the
loop before vectorization:

"
ptr is loop invariant.

create_data_ref: failed to create a dr for *pretmp.27_46
"

The work around for that is to manually move the invariant a[i] out of the
inner-loop, put it into a temporary, and use that temporary in the
inner-loop.

The second problem is aliasing - the vectorizer can't tell that the write
through pointer o doesn't overlap with the read through pointer b.

The work around for that is to add the "__restrict" qualifier to the
declaration of the pointers.
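
A sketch of both work-arounds combined (a hypothetical shape; the
original c2 is not quoted in the thread): the invariant a[i] is hoisted
by hand, and the pointer parameters are __restrict-qualified so the
write through o is known not to overlap the reads through a and b.

void
c2 (int n, float *__restrict o, const float *__restrict a,
    const float *__restrict b)
{
  int i, j;
  for (i = 0; i < n; i++)
    {
      float ai = a[i];              /* invariant load moved out by hand */
      for (j = 0; j < n; j++)
        o[i * n + j] = ai * b[j];   /* inner loop now vectorizable      */
    }
}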

To fix the first problem in the compiler, we can teach the vectorizer to
work with invariant datarefs. This is easy to do, but I think the right
solution is to enhance loop-invariant-motion pass to use an aliasing oracle
that would tell it that the invariant load can be safely moved out of the
loop (given that the pointers are __restrict qualified). I think such a
solution is in the works?


It is.


Do people think it's worth while to work around this invariant-motion issue
in the vectorizer?


Probably not, it's just going to make your code more complex for no real gain.


Re: 40% performance regression SPEC2006/leslie3d on gcc-4_2-branch

2007-02-18 Thread Daniel Berlin

On 2/17/07, H. J. Lu <[EMAIL PROTECTED]> wrote:

On Sat, Feb 17, 2007 at 01:35:28PM +0300, Vladimir Sysoev wrote:
> Hello, Daniel
>
> It looks like your changeset listed below makes performance
> regression ~40% on SPEC2006/leslie3d. I will try to create minimal
> test for this issue this week and update you in any case.
>

That is a known issue:

http://gcc.gnu.org/ml/gcc/2007-01/msg00408.html


Yes, it is something we sadly cannot do anything about without doing a
very large number of backports.

There were some seriously broken things in 4.2's aliasing that got
fixed properly in 4.3.
The price of fixing them in 4.2 was a serious performance drop.



H.J.



Re: 40% performance regression SPEC2006/leslie3d on gcc-4_2-branch

2007-02-18 Thread Daniel Berlin

On 2/18/07, Richard Guenther <[EMAIL PROTECTED]> wrote:

On 2/18/07, Daniel Berlin <[EMAIL PROTECTED]> wrote:
> On 2/17/07, H. J. Lu <[EMAIL PROTECTED]> wrote:
> > On Sat, Feb 17, 2007 at 01:35:28PM +0300, Vladimir Sysoev wrote:
> > > Hello, Daniel
> > >
> > > It looks like your changeset listed below makes performance
> > > regression ~40% on SPEC2006/leslie3d. I will try to create minimal
> > > test for this issue this week and update you in any case.
> > >
> >
> > That is a known issue:
> >
> > http://gcc.gnu.org/ml/gcc/2007-01/msg00408.html
>
> Yes, it is something we sadly cannot do anything about without doing a
> very large number of backports.
>
> There were some seriously broken things in 4.2's aliasing that got
> fixed properly in 4.3.
> The price of fixing them in 4.2 was a serious performance drop.

There's the option of un-fixing them to get back to the state of 4.1,
declaring them fixed in 4.3 at the earliest.


This is fine by me, but they were rated as blockers.



Richard.



Re: Preserving alias analysis information

2007-02-19 Thread Daniel Berlin

On 2/19/07, Roberto COSTA <[EMAIL PROTECTED]> wrote:

Hello,
I've got a question for experts of alias analysis in GCC.

In the CLI back-end of GCC, there is a CLI-specific pass responsible of
some modifications on GIMPLE that simplify the emission of CIL bytecode
(see http://gcc.gnu.org/projects/cli.html#internals for more details).
One of such modifications makes sure that all ARRAY_REF nodes have zero
indexes. ARRAY_REFs with non-zero indexes are transformed into
zero-index equivalent plus some pointer arithmetics.
After noticing a failure of the CLI back-end in gcc.dg/struct-alias-1.c
test from the testsuite, I'm concerned that doing so may decrease the
effectiveness of GCC alias analysis.

For instance, let's consider the following struct definition (taken from
gcc.dg/struct-alias-1.c):

struct S {
int a[3];
int x;
};

This is the original code in GIMPLE pseudo-C dump representation:

   s.x = 0;
   i.0 = i;
   s.a[i.0] = 1;
   D.1416 = s.x;
   if (D.1416 != 0) goto ; else goto ;
:;
   link_error ();

This is the code after the CLI-specific array simplification:

   s.x = 0;
   i.0 = i;
   cilsimp.1 = &s.a[0];
   D.1423 = i.0 * 4;
   D.1424 = D.1423 + cilsimp.1;
   *D.1424 = 1;
   D.1416 = s.x;
   if (D.1416 != 0) goto ; else goto ;
:;
   link_error ();

In the original code, GCC alias analysis understands that accesses to
s.x and s.a do not alias; therefore, it understands that the "then"
branch of the if statement is never taken.
In the modified code, the alias analysis concludes that accesses to s.x
and pointer D.1424 may alias.




My question is: is this unavoidable because of the memory access pattern
in the modified code, or was there additional information the
transformation pass could have attached to D.1424 (or to something else)
that would have excluded such a memory alias?


No. Sadly, this happens because we allow both positive and negative
offsetting of structure fields to get to other structure fields.  Thus,
we can't prove that s.a and s.x do not alias here.

You could invent an attribute for field accesses that specifies that
even though they look variable, they never stray outside their own
field.
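
Concretely, with the struct S above (and assuming the usual layout with
no padding), the rewritten store can reach s.x; an illustration:

int *p = &s.a[0];
/* p + 3 is &s.x under that layout.  After the rewrite the store is
   *(p + i) with i only known to be "some variable", so the analysis
   must assume it may touch s.x as well as s.a[0..2].  */
*(p + i) = 1;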



Cheers,
Roberto



Re: Preserving alias analysis information

2007-02-19 Thread Daniel Berlin

On 2/19/07, Zdenek Dvorak <[EMAIL PROTECTED]> wrote:

Hello,





you might try turning the references into TARGET_MEM_REFs, and copying the
alias information to them using copy_ref_info.  I am not sure how that
would interact with the transformations you want to do, but we do a lot
of magic to keep the virtual operands for TARGET_MEM_REFs the same
as before the transformation (unless that got broken in the last few months,
which unfortunately is pretty likely).


It would be better to annotate things with better alias information
than transform into target specific trees, which none of the other
transformations actually know how to deal with.


Re: 40% performance regression SPEC2006/leslie3d on gcc-4_2-branch

2007-02-19 Thread Daniel Berlin

On 2/19/07, Mark Mitchell <[EMAIL PROTECTED]> wrote:

Daniel Berlin wrote:

>> > > > It looks like your changeset listed below makes performance
>> > > > regression ~40% on SPEC2006/leslie3d. I will try to create minimal
>> > > > test for this issue this week and update you in any case.

>> > The price of fixing them in 4.2 was a serious performance drop.
>>
>> There's the option of un-fixing them to get back to the state of 4.1
>> declaring
>> them fixed in 4.3 earliest.

I would like to understand a few things:

1. What is the overall drop in SPEC scores as a result of this patch?  I
understand the impact on leslie3d, but what is the overall impact?
Hopefully, this is an easy question to answer: run SPEC, revert the
patch, run SPEC again.


I'll let others answer this (i don't have spec2006), but there have
been reports filed about other spec2006 benchmarks as well on the 4.2
branch already.



2. What is the effort required to backport the necessary infrastructure
from 4.3?  I'm not looking for "a lot" or "is hard", but rather, "two
weeks" or "six months".  What needs to be backported, and what are the
challenges?


Including bug fixes, i'd guess 2 months to be conservative.  It may be
faster, of course.
The main problem is that the patches are now intermingled with other
changes that happened at the same time.  Other than that, they should
be pretty directly applicable.
There are quite a few of them though (5 or 6 large patches).
The other challenge is that some of the patches were written after
mem-ssa merged, and that changed a lot of little infrastructure
pieces.
These patches will not directly apply to 4.2 anymore, and it's going
to just take time to convert them. How much? A week per patch or less,
i'd guess.
There are no real challenges other than applying the patches and doing
a lot of testing, since most of these patches were written pretty
early on during the 4.3 cycle.



3. Is there any conceivable way to fix the alias analysis for 4.2 so
that it is robust, but not overly conservative, even if that means a
special algorithm just for 4.2?  Yes, that would be a bizarre thing to
do on a release branch, but humor me: is it possible, and what would it
take?

We had one in 4.2 prior to these fixes, but it used a huge amount of
memory on large testcases.
So it's possible, of course.


Re: 40% performance regression SPEC2006/leslie3d on gcc-4_2-branch

2007-02-19 Thread Daniel Berlin

On 2/19/07, Mark Mitchell <[EMAIL PROTECTED]> wrote:

Daniel Berlin wrote:

>> 2. What is the effort required to backport the necessary infrastructure
>> from 4.3?  I'm not looking for "a lot" or "is hard", but rather, "two
>> weeks" or "six months".  What needs to be backported, and what are the
>> challenges?
>
> Including bug fixes, i'd guess 2 months to be conservative.

OK, thanks.  That seems like a lot of effort; certainly more than I
could try to browbeat you into doing. :-)


A lot of this is also that we are still shaking out performance
regressions that are a combination of fixes and changes for mem-ssa.
We'd still have to do that if we backported the 4.3 changes.
There are fewer of them, for sure, than there are in 4.2, but they are
still there.



> There are no real challenges other than applying the patches and doing
> a lot of testing, since most of these patches were written pretty
> early on during the 4.3 cycle.

Is there anyone brave enough to volunteer to try?  If this were to turn
out to be substantially easier than Danny thinks, then it sounds like it
might be a good solution.  However, there's no reason to expect that,
and I think two person-months is rather much.


If things go perfectly, it would probably take 2 weeks.
We could start by reverting the patch and applying the fixes until you
are happy, if that would be better?

If you stopped before the end, you would end up with the freefem3d
memory regression, but nothing more.


Therefore, it sounds like the practical choices are revert the patch and
accept the bugs, or vice versa.  Is there any reason to expect the bugs
to be particularly more prevalent in 4.2 than they were in 4.1?


Nope.


Re: GCC 4.2.0 Status Report (2007-02-19)

2007-02-20 Thread Daniel Berlin

On 2/19/07, Joe Buck <[EMAIL PROTECTED]> wrote:

On Tue, Feb 20, 2007 at 12:27:42AM +, Joseph S. Myers wrote:
>...  *All* releases seem to have the
> predictions that they are useless, should be skipped because the next
> release will be so much better in way X or Y, etc.; I think the question
> of how widely used a release series turned out to be in practice may be
> relevant when deciding after how many releases the branch is closed, but
> simply dropping a release series after the branch is created is pretty
> much always a mistake.  (When we rebranded 3.1 as 3.2 in the hopes of
> getting a stable C++ ABI, I think that also with hindsight was a mistake,
> given that the aim was that the stable ABI would also be the correct
> documented ABI but more ABI bugs have continued to turn up since then.)

I agree.  To me, the only issue with 4.2 is the performance drop due to
aliasing issues; whether to address that by reverting patches to have 4.1
performance + 4.1 bugs, or by backporting fixes from 4.3, I would leave
for the experts to decide (except that I don't think it's shippable
without some solution to the performance drop).

Fixing bugs is always harder than reverting patches.
You can get 4.1 performance with 4.1 bugs simply by reverting the
patches and turning off pruning (in particular, making
access_can_touch_variable always return true).
This hides pretty much all of the aliasing bugs that were reported,
except one (which was due to TBAA pruning of sets in a case we
shouldn't have).
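
For posterity, a rough sketch of what "turning off pruning" amounts to,
assuming the 4.2-era helper in tree-ssa-operands.c; the exact parameter
list here is from memory, so treat this as illustrative rather than as a
patch:

/* Illustrative only: make the operand-pruning helper answer "yes"
   unconditionally, so no virtual operands are ever pruned.  */
static bool
access_can_touch_variable (tree ref ATTRIBUTE_UNUSED,
                           tree alias ATTRIBUTE_UNUSED,
                           HOST_WIDE_INT offset ATTRIBUTE_UNUSED,
                           HOST_WIDE_INT size ATTRIBUTE_UNUSED)
{
  /* Never prune: conservative, 4.1-style behaviour.  */
  return true;
}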

BTW, I hope nobody was surprised by the performance drop.  I did warn
it was going to happen. I only fixed those bugs because they were
listed as blockers.  I'm still of the opinion that even though you can
write relatively simple testcases for them, they are actually pretty
rare.  In most of the bugs it is, in fact, the absence of any real
code (or local variables in one case) that triggers the bad result.
Anything more complex and we get the right answer.  Strange but true.

--Dan


Re: reassociation pass and built-in functions

2007-02-22 Thread Daniel Berlin

On 2/20/07, Revital1 Eres <[EMAIL PROTECTED]> wrote:


Hello,

We saw that the reassociation pass does not operate on built-in functions,
for example:

vp3 = vec_madd (vp1, vp2, vp3);

At the RTL level the function is expanded to a regular insn:

(insn 87 91 88 9 (set (reg/v:V4SF 217 [ vp3 ])
(plus:V4SF (mult:V4SF (reg/v:V4SF 219 [ vp1 ])
(reg/v:V4SF 218 [ vp2 ]))
(reg/v:V4SF 217 [ vp3 ]))) -1 (nil)
(nil))


Reassociation could open an opportunity for the variable expansion
optimization to be applied when vec_madd is in a loop
(this is what the reassociation pass does for similar accumulation
instructions that are not in built-in functions).
Currently MVE fails as it expects the pattern:

x = x + something

while it finds:

x = something + x

I could fix this in the MVE code, but I was wondering about the
relations of reassociation pass and built-in functions in general.


If you find builtin functions you want to reassociate, this is
generally a good sign they should have been expanded to trees rather
than kept as builtins.
If you expanded them, they would have been reassociated (or at least,
they should be. I think we reassociate vector calculations, but it's
been a while since i looked).
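
To make the pattern-matching issue above concrete, here is a small
standalone illustration (plain C, not GCC internals; all names are made
up) of the canonicalization MVE would need: for a commutative
accumulation, put the accumulator first so that "x = x + something"
matches:

#include <stdio.h>
#include <string.h>

struct stmt { const char *lhs, *op0, *op1; };   /* lhs = op0 + op1 */

/* If the accumulator shows up as the second operand of a commutative
   addition, swap the operands so the "x = x + something" pattern matches.  */
static void
canonicalize_accumulation (struct stmt *s)
{
  if (strcmp (s->lhs, s->op0) != 0 && strcmp (s->lhs, s->op1) == 0)
    {
      const char *tmp = s->op0;
      s->op0 = s->op1;
      s->op1 = tmp;
    }
}

int
main (void)
{
  struct stmt s = { "vp3", "tmp", "vp3" };        /* vp3 = tmp + vp3 */
  canonicalize_accumulation (&s);
  printf ("%s = %s + %s\n", s.lhs, s.op0, s.op1); /* vp3 = vp3 + tmp */
  return 0;
}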


Re: spec2k comparison of gcc 4.1 and 4.2 on AMD K8

2007-02-25 Thread Daniel Berlin

On 2/24/07, Serge Belyshev <[EMAIL PROTECTED]> wrote:

I have compared the 4.1.2 release (r121943) with three revisions of 4.2 on spec2k
on a 2GHz AMD Athlon64 box (in 64bit mode); detailed results are below.

In short, current 4.2 performs just as well as 4.1 on this target,
with the exception of a huge 80% win on 178.galgel. All other differences
lie almost in the noise.

results:

first number in each column is a runtime difference in %
between corresponding 4.2 revision and 4.1.2 (+ is better, - is worse).

second number is a +- confidence interval, i.e. according to my results,
current 4.2 does (82.0+-1.7)% better than 4.1.2 on 178.galgel.

(note some results are clearly noisy, but I've tried hard to avoid this --
I did three runs on a completely idle machine, wasting 14 hours of machine time 
in total).

r117890 -- 4.2 just before DannyB's aliasing fixes
r117891 -- 4.2 with aliasing fixes.
r122236 -- 4.2 current.



Uh, these are the wrong revisions.
117890 is correct, but 117891 is superseded by 117922, which will make
things worse than 117891 will.

This is why the current numbers are worse than the second column, my guess.

In particular, 117922 is goin


Re: spec2k comparison of gcc 4.1 and 4.2 on AMD K8

2007-02-25 Thread Daniel Berlin

On 2/25/07, Daniel Berlin <[EMAIL PROTECTED]> wrote:

On 2/24/07, Serge Belyshev <[EMAIL PROTECTED]> wrote:
> I have compared 4.1.2 release (r121943) with three revisions of 4.2 on spec2k
> on an 2GHz AMD Athlon64 box (in 64bit mode), detailed results are below.
>
> In short, current 4.2 performs just as good as 4.1 on this target
> with the exception of huge 80% win on 178.galgel. All other difference
> lies almost in the noise.
>
> results:
>
> first number in each column is a runtime difference in %
> between corresponding 4.2 revision and 4.1.2 (+ is better, - is worse).
>
> second number is a +- confidence interval, i.e. according to my results,
> current 4.2 does (82.0+-1.7)% better than 4.1.2 on 178.galgel.
>
> (note some results are clearly noisy, but I've tried hard to avoid this --
> I did three runs on a completely idle machine, wasting 14 hours of machine 
time in total).
>
> r117890 -- 4.2 just before DannyB's aliasing fixes
> r117891 -- 4.2 with aliasing fixes.
> r122236 -- 4.2 current.


Uh, these are the wrong revisions.
117890 is correct, but 117891 is superseded by 117922, which will make
things worse than 117891 will.

This is why the current numbers are worse than the second column, my guess.

In particular, 117922 is goin


grrr.
117922 is going to make all nonlocal loads and stores link together,
and 117891 will not.




Re: Re; Maintaining, was: Re: Reduce Dwarf Debug Size

2007-03-01 Thread Daniel Berlin

On 01 Mar 2007 18:05:50 -0800, Ian Lance Taylor <[EMAIL PROTECTED]> wrote:

Olivier Galibert <[EMAIL PROTECTED]> writes:

> On Thu, Mar 01, 2007 at 04:51:24PM -0800, Andrew Pinski wrote:
> > If someone wants a patch committed they will ping it
> > a couple of times and if they lost interest because they now decide it
> > is not a good thing or they no longer care about it, it will just fall
> > down the way side.
>
> If it's a new contributor, you will just have lost him forever at that
> point.  The barrier of entry is high enough with the copyright
> assignment issue, you don't want to raise it through disorganization.

One answer to that is to have patch advocates to help push patches in.
They would need some experience with the community, but would not need
deep technical knowledge.  This would be a volunteer position, along
the lines of the bugmasters.

Another answer is to have Danny's patch queue do automatic pings.  Of
course that would only work for patches which were added to the patch
queue.

Ian


It actually has code to be able to do this.
It can even guess who the relevant maintainers are based on regexp
matching of the listed maintenance area.

The reason it is turned off is because when i asked a few maintainers
privately whether it would be helpful, they said it would annoy the
hell out of them, and not be useful.

The first i expected, the second is why it is not on :)

It also has code to send summaries to the list of patches that are >30
days outstanding, along with their ML urls, and people who could
possibly review them, etc.
I turned it on for a week or two, but again, after talking with people,
nobody even noticed it :)

--Dan


Re: Improvements of the haifa scheduler

2007-03-06 Thread Daniel Berlin

On 3/5/07, Maxim Kuvyrkov <[EMAIL PROTECTED]> wrote:

Diego Novillo wrote:
> Maxim Kuvyrkov wrote on 03/05/07 02:14:
>
>>o Fix passes that invalidate tree-ssa alias export.
>
> Yes, this should be good and shouldn't need a lot of work.
>
>>o { Fast but unsafe Gupta's aliasing patch, Unsafe tree-ssa alias
>> export } in scheduler's data speculation.
>
> "unsafe" alias export?  I would definitely like to see the tree->rtl
> alias information transfer fixed once and for all.  Finishing RAS's
> tree->rtl work would probably make a good SoC project.

"Unsafe" doesn't mean not fixed.  My thought is that it would be nice to
have a switch in aliasing that will turn such operations as

join (pt_anything, points_to) -> pt_anything

into

join (pt_anything, points_to) -> points_to

This transformation will sacrifice correctness for sake of additional
information.


In 4.3, doesn't exist outside of the confines of tree-ssa-structalias.c
We transform it into a conservatively correct set of variables, rather
than giving up like we used to.

This alone should significantly improve your export results.


Re: Improvements of the haifa scheduler

2007-03-06 Thread Daniel Berlin

On 3/6/07, Daniel Berlin <[EMAIL PROTECTED]> wrote:

On 3/5/07, Maxim Kuvyrkov <[EMAIL PROTECTED]> wrote:
> Diego Novillo wrote:
> > Maxim Kuvyrkov wrote on 03/05/07 02:14:
> >
> >>o Fix passes that invalidate tree-ssa alias export.
> >
> > Yes, this should be good and shouldn't need a lot of work.
> >
> >>o { Fast but unsafe Gupta's aliasing patch, Unsafe tree-ssa alias
> >> export } in scheduler's data speculation.
> >
> > "unsafe" alias export?  I would definitely like to see the tree->rtl
> > alias information transfer fixed once and for all.  Finishing RAS's
> > tree->rtl work would probably make a good SoC project.
>
> "Unsafe" doesn't mean not fixed.  My thought is that it would be nice to
> have a switch in aliasing that will turn such operations as
>
> join (pt_anything, points_to) -> pt_anything
>
> into
>
> join (pt_anything, points_to) -> points_to
>
> This transformation will sacrifice correctness for sake of additional
> information.

In 4.3, doesn't exist outside of the confines of tree-ssa-structalias.c

^
pt_anything doesn't exist


We transform it into a conservatively correct set of variables, rather
than giving up like we used to.

This alone should significantly improve your export results.
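
As a toy illustration of the join Maxim describes (standalone C, not GCC
code; the representation is invented for the example), here is the
difference between the conservative join and the proposed "unsafe"
variant that prefers the concrete points-to set:

#include <stdio.h>
#include <stdbool.h>

/* Toy points-to value: either "points to anything" or a small bitmask
   of known variables.  */
struct pt_set { bool anything; unsigned vars; };

/* Conservative join: "anything" absorbs everything.  */
static struct pt_set
join_safe (struct pt_set a, struct pt_set b)
{
  struct pt_set r;
  r.anything = a.anything || b.anything;
  r.vars = r.anything ? 0 : (a.vars | b.vars);
  return r;
}

/* The "unsafe" switch: prefer concrete points-to information and drop
   the "anything" component -- more precise, no longer conservative.  */
static struct pt_set
join_unsafe (struct pt_set a, struct pt_set b)
{
  struct pt_set r;
  if (a.anything && !b.anything) return b;
  if (b.anything && !a.anything) return a;
  r.anything = a.anything && b.anything;
  r.vars = a.vars | b.vars;
  return r;
}

int
main (void)
{
  struct pt_set anything = { true, 0 };
  struct pt_set concrete = { false, 0x5 };  /* points to vars 0 and 2 */
  struct pt_set s = join_safe (anything, concrete);
  struct pt_set u = join_unsafe (anything, concrete);
  printf ("safe:   anything=%d vars=%#x\n", s.anything, s.vars);
  printf ("unsafe: anything=%d vars=%#x\n", u.anything, u.vars);
  return 0;
}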



Re: reload.c as a bugzilla quip

2007-03-06 Thread Daniel Berlin

On 3/5/07, Joe Buck <[EMAIL PROTECTED]> wrote:

On Sun, Mar 04, 2007 at 09:45:13AM +0100, FX Coudert wrote:
> One of the bugzilla quips (the headlines appearing at random for each
> bug list) is actually the head of gcc/reload.c (full text below).

That is really obnoxious and should be removed.



Gone.
Along with the 10 huge quotes of somebody quoting themselves.


Re: Manipulating the tree Structure

2007-03-12 Thread Daniel Berlin

On 3/12/07, Andrea Callia D'Iddio <[EMAIL PROTECTED]> wrote:

Great! thank you! I tested with your code and it works... but I'm
still a bit confused.
Could you help me with this simple example?
Suppose that I obtained a tree structure with the following command:

tree stmt = bsi_stmt (si);

and suppose I want to execute the following task:

For each tree statement t:
  IF t is an assignment, then output variable name


In 4.3 (for earlier releases, replace GIMPLE_MODIFY_STMT with MODIFY_EXPR):

if (TREE_CODE (t) == GIMPLE_MODIFY_STMT)
  {
    tree lhs = TREE_OPERAND (t, 0);
    tree rhs = TREE_OPERAND (t, 1);

    print_generic_expr (stderr, lhs, 0);
    print_generic_expr (stderr, rhs, 0);
  }

  ELSE IF t is an IF statement then output the IF condition

else if (TREE_CODE (t) == COND_EXPR)
  {
    tree condition = COND_EXPR_COND (t);
    print_generic_expr (stderr, condition, 0);
  }


  ELSE ...

how can I do this task?


Really, this isn't that hard

Can you give me some reference?


Look at any existing tree-ssa-*.c pass

They all do this in a myriad of ways.
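
For completeness, a rough sketch of the surrounding loop, using the
pre-tuples statement iterators (bsi_stmt was already used above); the
TREE_CODE checks from earlier in this mail would go in the loop body:

basic_block bb;
block_stmt_iterator si;

FOR_EACH_BB (bb)
  for (si = bsi_start (bb); !bsi_end_p (si); bsi_next (&si))
    {
      tree t = bsi_stmt (si);
      /* ... the GIMPLE_MODIFY_STMT / COND_EXPR checks go here ...  */
    }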


Re: Import GCC 4.2.0 PRs

2007-03-12 Thread Daniel Berlin

On 3/12/07, Mark Mitchell <[EMAIL PROTECTED]> wrote:

Here are some GCC 4.2.0 P1s which I think it would be good for GCC to
have resolved before the release, together with names of people I'd like
to volunteer to help.  (Naturally, I have no command authority, and I'd
encourage anyone else who wants to help to pitch in, but I'm trying to
tap a few likely suspects.)




* PR 29585 (Berlin) -- This is a crash in alias analysis with anonymous
namespaces.  I can't imagine that anonymous namespaces have any deep
implications for aliasing, so I would hope this is an easy fix.


The bug report says that part is fixed, and only a ccp issue remains.
"The testcase in comment #2 works for me on the 4.2 branch now, but the one in
comment #7 fails with ... in tree-ssa-ccp.c"






Re: We're out of tree codes; now what?

2007-03-12 Thread Daniel Berlin

On 3/12/07, Andrew Pinski <[EMAIL PROTECTED]> wrote:

On 3/12/07, Steven Bosscher <[EMAIL PROTECTED]> wrote:
> On 3/12/07, Andrew Pinski <[EMAIL PROTECTED]> wrote:
> > Can I recommend something just crazy, rewrite the C and C++ front-ends
> > so they don't use the tree structure at all except when lowering until
> > gimple like the rest of the GCC front-ends?
>
> The C front end already emits generic, so there's almost no win in
> rewriting it (one lame tree code in c-common.def -- not worth the
> effort ;-).

I was thinking that to rewrite the C++ front-end not to use extra tree
codes, you would also need to rewrite the C front-end.  But thinking
about it now, you could use one tree code for all of C++ and then use
subcodes, or at least one tree code for types, one for expressions, one
for decls and one for all others, and then use subcodes.


This was part of the point of making decl nodes a hierarchy: to enable
exactly this (even though they have not been split into subcodes yet).
tree_contains_struct then allows you to check exactly which things are still
safe to access.


Re: We're out of tree codes; now what?

2007-03-12 Thread Daniel Berlin

On 3/12/07, Mike Stump <[EMAIL PROTECTED]> wrote:

On Mar 12, 2007, at 2:14 PM, Paolo Carlini wrote:
> When I said, let's support Doug, I meant let's support Doug from a
> *practical* point of view. Either we suggest something doable with
> a realistically sized effort or a little larger and at the same
> time we volunteer to actually do it. In my opinion, "visions" for a
> better future do not help here.

I'd disagree.  It is nice to have a stated idea of where we want to
go, even if we can't get there today.  We can measure patch goodness
by how closely each patch moves us in that direction.  That said, we
all realize we are _not_ asking Doug to please re-implement the C++
frontend to our design to fix this issue.  I'd be against that.
Making C++ use a few tree codes (a la the back end) and burying the
`extended' code in the C++ frontend sounds enticing, sounds like less
work than redoing the FE to not use trees entirely and a step in the
right direction.  I checked the Objective-C frontend; it seems
possible to do it as well.  The tree_contains_struct stuff seems the
`hardest' to get right, though, for Objective-C I think we could get
by with 2 tree codes, maybe.  Those that have TS_DECL_NON_COMMON,
TS_DECL_WITH_VIS, TS_DECL_WRTL, TS_DECL_MINIMAL and TS_DECL_COMMON
and those without?


Feel free to make TS_ structs that do what you need, and use them.
They correspond to the structs that are contained in each decl node in tree.h.
How the decl hierarchy is organized is explained in the docs, as well
as how to add a new "subclass".

For non-DECL nodes (or anything else), you'd want to make up your own TS_ stuff.

(tree_contains_struct not only made the checking easy, it bought us
some bits that were stored on every DECL tree, like whether it could
contain RTL or not)
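
A minimal sketch of the kind of check this enables, assuming the
CODE_CONTAINS_STRUCT macro and TS_ enumerators as I recall them from
that era's tree.h (the helper name here is made up):

/* Only touch the RTL-related decl fields when the tree code actually
   contains that substructure.  */
static bool
decl_rtl_is_set_p (tree decl)
{
  return CODE_CONTAINS_STRUCT (TREE_CODE (decl), TS_DECL_WRTL)
         && DECL_RTL_SET_P (decl);
}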


Re: Referenced Vars in IPA pass

2007-03-13 Thread Daniel Berlin

Uh, since when did 4.1 support IPA GIMPLE?


On 3/13/07, Paulo J. Matos <[EMAIL PROTECTED]> wrote:

On 3/13/07, Paolo Bonzini <[EMAIL PROTECTED]> wrote:
>
> > int x;
> >  {
> >  int y;
> >  {
> >  int z;
> >  ...
> >  }
> >  ...
> > }
> >
> > just happens to have three statements, all VAR_DECLs: x, y, z, without
> > any reference to the starting and ending blocks. As a side question,
> > how can I get hold of where the blocks start and finish? I don't really
> > know if it's useful, but if I need it later, better that I know how.
>
> This is not available anymore after lowering to GIMPLE.  BIND_EXPRs
> (representing lexical scope) are removed in gimple-low.c.
>

Ah, by the way, I'm not using trunk, I'm using stable 4.1 code.

> Paolo
>


--
Paulo Jorge Matos - pocm at soton.ac.uk
http://www.personal.soton.ac.uk/pocm
PhD Student @ ECS
University of Southampton, UK



Re: Referenced Vars in IPA pass

2007-03-13 Thread Daniel Berlin

On 3/13/07, Paulo J. Matos <[EMAIL PROTECTED]> wrote:

On 3/13/07, Daniel Berlin <[EMAIL PROTECTED]> wrote:
> Uh, since when did 4.1 support IPA GIMPLE?
>
>

What do you mean by that?


I'm pretty sure there were a number of cgraph and other related
changes necessary to make IPA work completely that were first in 4.2.

I may be misremembering though, Jan?


Re: Listing file-scope variables inside a pass

2007-03-20 Thread Daniel Berlin

On 3/20/07, Dave Korn <[EMAIL PROTECTED]> wrote:

On 19 March 2007 22:16, Karthikeyan M wrote:

> What should I do if I want a list of all file-scope variables inside
> my own pass ?
>
> The file_scope variable is local to c-decl.c . Is there a reason why
> the scope holding variables are local to c-decl.c ?

  Because we want to keep the front-, mid- and back-ends of the compiler as
modular and non-interdependent as possible, perhaps?

  If you need a routine to dump that data, why not write it in c-decl.c and
just expose the prototype in a suitable header file (c-tree.h)?


He already can get the file-scope variables by going through the
cgraph variable nodes.


Re: Listing file-scope variables inside a pass

2007-03-20 Thread Daniel Berlin

On 3/20/07, Karthikeyan M <[EMAIL PROTECTED]> wrote:

Thanks.
Where exactly should I be looking?

cgraph.c, cgraphunit.c, cgraph.h.
See cgraph_varpool_nodes and FOR_EACH_STATIC_VARIABLE ("static" here means
having scope greater than a single function; it does not mean "all
variables declared static in C").


Will the cgraph nodes also have global declarations that are never
used inside any function?

If you ask for all of them, it will give you all of them
If you ask for only the needed ones, it will give you all of the
needed ones (see FOR_EACH_STATIC_VARIABLE)
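
A rough sketch of what that loop can look like inside a pass, based on
the macros named above; the node type and field names are from memory
and may differ between releases, so check cgraph.h:

struct cgraph_varpool_node *vnode;

FOR_EACH_STATIC_VARIABLE (vnode)
  {
    tree decl = vnode->decl;
    if (DECL_NAME (decl))
      fprintf (stderr, "file-scope variable: %s\n",
               IDENTIFIER_POINTER (DECL_NAME (decl)));
  }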


On 3/20/07, Daniel Berlin <[EMAIL PROTECTED]> wrote:
> On 3/20/07, Dave Korn <[EMAIL PROTECTED]> wrote:
> > On 19 March 2007 22:16, Karthikeyan M wrote:
> >
> > > What should I do if I want a list of all file-scope variables inside
> > > my own pass ?
> > >
> > > The file_scope variable is local to c-decl.c . Is there a reason why
> > > the scope holding variables are local to c-decl.c ?
> >
> >   Because we want to keep front-, mid- and back- ends of the compiler as
> > modular and non-interdependent as possible, perhaps?
> >
> >   If you need a routine to dump that data, why not write it in c-decl.c and
> > just expose the prototype in a suitable header file (c-tree.h)?
> >
> He already can get the file-scope variables by going through the
> cgraph variable nodes.
>


--

Karthik


To laugh often and love much; to win the respect of intelligent
persons and the affection of children; to earn the approbation of
honest critics; to appreciate beauty; to give of one's self; to leave
the world a bit better, whether by a healthy child, a garden patch or
a redeemed social condition; to have played and laughed with
enthusiasm and sung with exultation; to know even one life has
breathed easier because you have lived--that is to have succeeded.
--Ralph Waldo Emerson




Re: Listing file-scope variables inside a pass

2007-03-21 Thread Daniel Berlin

On 3/20/07, Karthikeyan M <[EMAIL PROTECTED]> wrote:

Are these macros not a part of 4.1.2 ?
I just picked up the tarball of the 4.1.2-core source.

Which release has this code ?

4.2 or 4.3

You should never try to do real development work on GCC against
anything but the development trunk (or a branch of the development
trunk), if for no other reason than that we only fix regressions on
release branches.



Thanks a lot.


On 3/20/07, Daniel Berlin <[EMAIL PROTECTED]> wrote:
> On 3/20/07, Karthikeyan M <[EMAIL PROTECTED]> wrote:
> > Thanks.
> > Where exactly should I be looking?
> cgraph.c, cgraphunit.c, cgraph.h
> see cgraph_varpool_nodes, FOR_EACH_STATIC_VARIABLE (static here means
> having scope greater than a single function, it does not mean "all
> variables declared static in C")
>
> > Will the cgraph nodes also have global declarations that are never
> > used inside any
> > function .
> If you ask for all of them, it will give you all of them
> If you ask for only the needed ones, it will give you all of the
> needed ones (see FOR_EACH_STATIC_VARIABLE)
> >
> > On 3/20/07, Daniel Berlin <[EMAIL PROTECTED]> wrote:
> > > On 3/20/07, Dave Korn <[EMAIL PROTECTED]> wrote:
> > > > On 19 March 2007 22:16, Karthikeyan M wrote:
> > > >
> > > > > What should I do if I want a list of all file-scope variables inside
> > > > > my own pass ?
> > > > >
> > > > > The file_scope variable is local to c-decl.c . Is there a reason why
> > > > > the scope holding variables are local to c-decl.c ?
> > > >
> > > >   Because we want to keep front-, mid- and back- ends of the compiler as
> > > > modular and non-interdependent as possible, perhaps?
> > > >
> > > >   If you need a routine to dump that data, why not write it in c-decl.c 
and
> > > > just expose the prototype in a suitable header file (c-tree.h)?
> > > >
> > > He already can get the file-scope variables by going through the
> > > cgraph variable nodes.
> > >
> >
> >
> > --
> >
> > Karthik
> >
> > 
> > To laugh often and love much; to win the respect of intelligent
> > persons and the affection of children; to earn the approbation of
> > honest critics; to appreciate beauty; to give of one's self; to leave
> > the world a bit better, whether by a healthy child, a garden patch or
> > a redeemed social condition; to have played and laughed with
> > enthusiasm and sung with exultation; to know even one life has
> > breathed easier because you have lived--that is to have succeeded.
> > --Ralph Waldo Emerson
> > 
> >
>


--

Karthik


To laugh often and love much; to win the respect of intelligent
persons and the affection of children; to earn the approbation of
honest critics; to appreciate beauty; to give of one's self; to leave
the world a bit better, whether by a healthy child, a garden patch or
a redeemed social condition; to have played and laughed with
enthusiasm and sung with exultation; to know even one life has
breathed easier because you have lived--that is to have succeeded.
--Ralph Waldo Emerson




Re: GCC priorities [Was Re: We're out of tree codes; now what?]

2007-03-22 Thread Daniel Berlin

On 3/21/07, Nicholas Nethercote <[EMAIL PROTECTED]> wrote:

On Wed, 21 Mar 2007, Paul Brook wrote:

> The problem is that I don't think writing a detailed "mission statement" is
> actually going to help anything. It's either going to be gcc contributors
> writing down what they're doing anyway, or something invented by the SC or
> FSF. I the latter case nothing's going to change because neither the SC nor
> the FSF have any practical means of compelling contributors to work on a
> particular feature.
>
> It's been said before that Mark (the GCC release manager) has no real power to
> make anything actually happen. All he can do is delay the release and hope
> things get better.

Then it will continue to be interesting, if painful, to watch.


It's not clear what you think would happen and be fixed if he did.

Realistically, compile time will not be solved until someone with an
interest in solving it does the hard work (and before starting any
huge projects, posits a reasonable way to do whatever major surgery
they need to, so they can get community buy-in. Note that this is not
actually hard, but it does require not just submitting huge patches
that do something you've never discussed before on the ML).

This won't change no matter what you do.
You simply can't brow-beat people into fixing huge things, and i'm not
aware of a well-functioning open-source project that does.

If you want to help fix compile time, then get started, instead of
commenting from the sidelines.
As they say, "Patches welcome".

--Dan


Re: Using SSA

2007-03-22 Thread Daniel Berlin

On 3/22/07, Alexander Lamaison <[EMAIL PROTECTED]> wrote:

> > The tree_opt_pass for my pass has PROP_ssa set in the
> properties_required
> > field.  Is this all I need to do?
>
> You need to put your pass after pass_build_ssa.  Setting PROP_ssa does
> not build SSA itself, but it will cause an assertion failure if the
> pass is run while SSA is (not yet) available.
>
> Paolo

I think (if I'm correctly interpreting the list in passes.c) it is.  It's
right after pass_warn_function_noreturn, just before pass_mudflap_2.  Is
this right?  I don't get any assertion about SSA not being available.



This is a bug then, btw.
You should file it.
It should have asserted, because PROP_ssa was not available: it had
been destroyed by del_ssa.
In particular, this code:

#ifdef ENABLE_CHECKING
  do_per_function (verify_curr_properties,
                   (void *)(size_t)pass->properties_required);
#endif

should have triggered.

(I have long believed that our properties mechanism should be used as
a mechanism to decide what analysis needs to be run, not a static
assertion mechanism, and should recompute what is necessary on demand,
but that is not currently the case).
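
For reference, a sketch of a pass descriptor that requires SSA in that
era's tree-pass.h; the field order is from memory and "example" /
execute_example are made-up names, so compare against an existing pass
before copying:

static struct tree_opt_pass pass_example =
{
  "example",               /* name */
  NULL,                    /* gate */
  execute_example,         /* execute */
  NULL,                    /* sub */
  NULL,                    /* next */
  0,                       /* static_pass_number */
  0,                       /* tv_id */
  PROP_cfg | PROP_ssa,     /* properties_required */
  0,                       /* properties_provided */
  0,                       /* properties_destroyed */
  0,                       /* todo_flags_start */
  TODO_dump_func,          /* todo_flags_finish */
  0                        /* letter */
};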


Re: SoC Project: Propagating array data dependencies from Tree-SSA to RTL

2007-03-24 Thread Daniel Berlin

On 3/23/07, Alexander Monakov <[EMAIL PROTECTED]> wrote:

Hello,


I would be pleased to see Ayal Zaks as my mentor, because the proposed
improvement is primarily targeted at modulo scheduling. In
case this is not possible, I will seek guidance from Maxim Kuvyrkov.


Ayal has not signed up to be a mentor (as of yet). If he doesn't, i'd
be happy to mentor you here, since i wrote part of tree-data-ref.c


Re: We're out of tree codes; now what?

2007-03-24 Thread Daniel Berlin

On 3/23/07, Marc Espie <[EMAIL PROTECTED]> wrote:

In article <[EMAIL PROTECTED]> you write:
>On 19 Mar 2007 19:12:35 -0500, Gabriel Dos Reis <[EMAIL PROTECTED]> wrote:
>> similar justifications for yet another small% of slowdown have been
>> given routinely for over 5 years now.  small% build up; and when they
>> build up, they tend not to be convincing ;-)
>
>But what is the solution? We can complain about performance all we
>want (and we all love to do this), but without a plan to fix it we're
>just wasting effort. Shall we reject every patch that causes a slow
>down? Hold up releases if they are slower than their predecessors?
>Stop work on extensions, optimizations, and bug fixes until we get our
>compile-time performance back to some predetermined level?

Simple sociology.

Working on new optimizations = sexy.
Trimming down excess weight = unsexy.

GCC being vastly a volunteer project, it's much easier to find people
who want to work on their pet project, and implement a recent
optimization they found in a nice paper (that will gain 0.5% in some
awkward case) than to try to track down speed-downs and to reverse them.

I'm not sure I buy this.
Most of the new algorithm implementations I see are generally
replacing something slower with something faster.
Examples:
GVN-PRE is about 10x faster than SSAPRE in all cases, while doing
about 30% better on every testcase that SSAPRE sucked at.
The new points-to implementation is about 100x faster than the old one
(on smaller cases, it actually gets faster as the size of the problem
to be solved grows).
New ivopts algorithm replaced old ivopts algorithm for a large speedup
Newer propagation algorithm replaced older CCP implementation for a speedup

Most of these have not had *any* real effect on the time it takes to
run GCC in the common case.  Why?
Because except for the same edge cases you complain we are spending
time speeding up, they aren't a significant amount of time!
Most of the time is in building and manipulating trees and RTL.


And then disappointment, as the ssa stuff just got added on top of the
RTL stuff, and the RTL stuff that was supposed to vanish takes forever
to go away...


Mainly because people want it to produce *exactly* the same code it
used to, instead of being willing to take a small generated code
performance hit for a while.  Since backend code generation is a
moving target with very complex dependencies, this is a hard target to
hit.



At some point, it's going to be really attractive to start again from
scratch, without all the backends/frontend complexities and interactions
that make cleaning up stuff harder and harder...

This i agree with.  I'd much rather stop trying to do everything we
can to support more than the top 5 architectures (though i have no
problem with all their OS variants).


Also, I have the feeling that quite a few of gcc sponsors are in it for
the publicity mostly (oh look, we're nice people giving money to gcc),
and new optimization passes that get 0.02% out of SPEC are better bang
for their money.

And some people just like to sit on the sidelines and complain instead
of submitting patches to do anything.


Kudos go to the people who actually manage to reverse some of the
excesses of the new passes.


Most of these people are the same people who implemented the passes in
the first place!


Re: Creating parameters for functions calls

2007-03-28 Thread Daniel Berlin

On 3/27/07, Antoine Eiche <[EMAIL PROTECTED]> wrote:

Dear all,

I want to insert function calls during a new pass.


Which version of GCC?
The problem is to

create parameters. At this time, I successfully create a function call
with two constants as parameters and insert it (I can see that in the
generated assembly). But I want to give the address of an array and a constant
to this function.
I think the problem is the creation of the node which contains the
address of the array.

For example:
I get from the code (in tree "rhs"):
a[i]
I want to build a node like this:
&a[i]
and build a function call like this:
foo(constant, &a[i])

This is the error message when I try to compile a program with my pass:

tab.c: In function 'main':
tab.c:7: internal compiler error: in lookup_subvars_for_var, at
tree-flow-inline.h:1629


Can you send a backtrace?

This means nothing has created the variable annotation for a.


Re: tuples: data structure separation from trees

2007-03-30 Thread Daniel Berlin

On 3/29/07, Daniel Jacobowitz <[EMAIL PROTECTED]> wrote:

On Thu, Mar 29, 2007 at 06:40:30PM -0700, Andrew Pinski wrote:
> On 29 Mar 2007 18:24:56 -0700, Ian Lance Taylor <[EMAIL PROTECTED]> wrote:
> >Why will expressions have location?  It seems to me preferable to save
> >the memory.  After a few optimization passes many of the expressions
> >have no location anyhow.

> And I know from past experiences, that this is really a bug that they
> don't produce expressions with locations.  I remember Daniel Berlin
> was talking about how SRA does the correct thing with respect of
> locations and other passes should really follow that.  We can see how
> out of SSA can produce cases where PHIs would create expressions
> without locations but that is a bug (I cannot find the PR right now
> but Daniel J. filed it).

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=26475

But I'm not convinced that adding locations on more things is a
workable solution to the problem.  I wish someone had sufficient
incentive to sit down and design a proper solution to our degenerating
debug info.



The problem is that we have conflicting goals. We want good debug
info, but don't want -g to affect the optimizations performed.
However, the easiest (and probably best) way to keep the debug info up
to date through optimizations is to do what LLVM does, and make it
part of the IR.
This, sadly, can affect the optimizations performed.
I'm not sure we will ever keep every pass from mucking up debug info
unless they have to do a *lot* less work than they would now. I don't
see how to make that happen without making it either completely
automatic using magic pixie dust, or part of the IR.

