Re: return void from void function is allowed.

2006-10-31 Thread Dale Johannesen


On Oct 31, 2006, at 12:49 PM, Igor Bukanov wrote:


-- Forwarded message --
From: Igor Bukanov <[EMAIL PROTECTED]>
Date: Oct 31, 2006 9:48 PM
Subject: Re: return void from void function is allowed.
To: Mike Stump <[EMAIL PROTECTED]>


On 10/31/06, Mike Stump <[EMAIL PROTECTED]> wrote:


This is valid in C++.


My copy of the 1997 C++ public draft contains:

6.6.3  The return statement
...
2 A return statement without an expression can be used only in
functions that do not return a value, that is, a function with the
return value type void, a constructor (_class.ctor_), or a destructor
(_class.dtor_).  A return statement with an expression can be used
only in functions returning a value; the value of the expression is
returned to the caller of the function.  If required, the expression
is implicitly converted to the return type of the function in which
it appears.  A return statement can involve the construction and copy
of a temporary object (_class.temporary_).  Flowing off the end of a
function is equivalent to a return with no value; this results in
undefined behavior in a value-returning function.

My reading of that is that C++ does not allow returning a void
expression from a void function.  Was it changed later?


Yes, it was:

[stmt.return] 6.6.3 The return statement
...
2 A return statement without an expression can be used only in
functions that do not return a value, that is, a function with the
return type void, a constructor (12.1), or a destructor (12.4).  A
return statement with an expression of non-void type can be used only
in functions returning a value; the value of the expression is
returned to the caller of the function.  The expression is implicitly
converted to the return type of the function in which it appears.  A
return statement can involve the construction and copy of a temporary
object (12.2).  Flowing off the end of a function is equivalent to a
return with no value; this results in undefined behavior in a
value-returning function.

3 A return statement with an expression of type "cv void" can be used
only in functions with a return type of cv void; the expression is
evaluated just before the function returns to its caller.



And a final thought: wrong mailing list...  gcc-help would have been
better.

I thought bugs in GCC could be discussed here.  Sorry if that is a wrong
assumption.


Regards, Igor




Re: Canonical type nodes, or, comptypes considered harmful

2006-11-07 Thread Dale Johannesen


On Nov 7, 2006, at 11:47 AM, Douglas Gregor wrote:

I just read Nathan's discussion [1] on changing GCC's type system  
to use canonical type nodes, where the comparison between two types  
requires only a pointer comparison. Right now, we use "comptypes",  
which typically needs to do deep structural checks to determine if  
two types are equivalent, because we often clone _TYPE nodes.


One difficulty is that compatibility of types in C is not transitive,  
especially when you're compiling more than one translation unit at a  
time.
See the thread "IMA vs tree-ssa" in Feb-Mar 2004.  Geoff Keating and  
Joseph Myers give good examples.




Re: Canonical type nodes, or, comptypes considered harmful

2006-11-07 Thread Dale Johannesen


On Nov 7, 2006, at 3:05 PM, Gabriel Dos Reis wrote:


"Joseph S. Myers" <[EMAIL PROTECTED]> writes:

[...]

| In C, there are several other sources of intransitivity, such as
| incomplete array types (int[10] and int[5] both compatible with int[] but


I'm sure we can (and probably must) special-case that.

| not with each other), unprototyped functions and enums (two enums
| compatible with the same integer type are not compatible with each other).


Well, we can just have the unified type nodes for the C++ front-end if
the C system is determined to be resilient :-)


I do understand the advantages of sharing them more.  Perhaps some  
90% solution could be made to work, with most type nodes being  
unified and the problem cases (there would not be any in C++,  
apparently) using the existing inefficient mechanisms.







Re: Canonical type nodes, or, comptypes considered harmful

2006-11-09 Thread Dale Johannesen


On Nov 9, 2006, at 4:54 PM, Mike Stump wrote:


On Nov 8, 2006, at 7:14 AM, Ian Lance Taylor wrote:

The way to canonicalize them is to have all equivalent types point to
a single canonical type for the equivalence set.  The comparison is
one memory dereference and one pointer comparison, not the current
procedure of checking for structural equivalence.


Once unequal addresses might mean equal types, you have to do a
structure walk to compare types, and you're right back where we
started.  The only way to save yourself is to be able to say:
different addresses _must_ be different types.


I had in mind something like

   if (p1 == p2)
      equal
   else if (p1->ptr_equality_suffices_for_this_type
            || p2->ptr_equality_suffices_for_this_type)
      not equal
   else
      tree walk

Don't know how workable that is.



Re: strict aliasing question

2006-11-11 Thread Dale Johannesen


On Nov 11, 2006, at 10:45 PM, Howard Chu wrote:


Andrew Pinski wrote:

On Sat, 2006-11-11 at 22:18 -0800, Ian Lance Taylor wrote:


Your code will be safe on all counts if you change buf from int[] to
char[].  The language standard grants a special exemption to char*
pointers.  Without that exemption, it would be impossible to write
malloc in C.



As I recall, we chose int[] for alignment reasons, figuring we'd  
have no guarantees on the alignment of a char[].


True, but add __attribute__((aligned(4))) and all is well.






Re: 32bit Calling conventions on linux/ppc.

2006-12-12 Thread Dale Johannesen


On Dec 12, 2006, at 11:42 AM, David Edelsohn wrote:


Joslwah  writes:


Joslwah> Looking at the Linux 32bit PowerPC ABI spec, it appears to me that
Joslwah> floats in excess of those that are passed in registers are supposed
Joslwah> to be promoted to doubles and passed on the stack.  Examining the
Joslwah> resulting stack from a gcc-generated C call, it appears they are
Joslwah> passed as floats.

Joslwah> Can someone confirm/refute this, or else point me to an ABI that
Joslwah> says that they should be passed as floats.

I have not been able to find any motivation for promoting floats
passed on the stack.  Does this provide some form of compatibility with
SPARC?


It may have been intended to allow the callee to be a K&R-style or
varargs function, where all float args get promoted to double.
In particular, printf was often called without being declared in
K&R-era code.  This is one way to make that code work in a C90 environment.




Re: 32bit Calling conventions on linux/ppc.

2006-12-12 Thread Dale Johannesen


On Dec 12, 2006, at 12:07 PM, David Edelsohn wrote:


Dale Johannesen writes:


Dale> It may have been intended to allow the callee to be a K&R- 
style or

Dale> varargs function, where all float args get promoted to double.
Dale> In particular, printf was often called without being declared  
in K&R-
Dale> era code.  This is one way to make that code work in a C90  
environment.


Except that arguments in registers are not promoted and arguments
in registers spilled to the stack for varargs are not promoted.  In
fact it makes varargs more complicated.  And it does not really match
K&R promotion rules.


On ppc, floating point regs always contain values in double format,  
so passing a single value and reading it as double Just Works.


To clarify, I am not defending this, just offering a possible  
explanation.  If I'm right, the whole issue is obsolete and there is  
currently no good reason to do the promotion.





Re: REG_ALLOC_ORDER and Altivec registers

2007-03-01 Thread Dale Johannesen


On Mar 1, 2007, at 12:57 AM, Tehila Meyzels wrote:

Revital Eres  wrote on 01/03/2007 10:37:36:

Hello,

I wonder why this order (non-consecutive, decreasing) of Altivec

registers

was chosen when specifying the allocation order in REG_ALLOC_ORDER.

(taken from rs6000.h)

   /* AltiVec registers.  */\
   77, 78,  \
   90, 89, 88, 87, 86, 85, 84, 83, 82, 81, 80,  \
   79,  \
   96, 95, 94, 93, 92, 91,  \
   108, 107, 106, 105, 104, 103, 102, 101, 100, 99, 98, 97, \
   109, 110,\
   111, 112, 113\



I think part of the answer can be found here:
http://gcc.gnu.org/ml/gcc/2003-06/msg00902.html
"We have found that re-arranging the REG_ALLOC_ORDER in rs6000.h so  
that all

the FP registers come after the integer registers greatly reduces the
tendency of the compiler to generate code that moves 8-byte quantites
through the FP registers."


I don't think so; the ordering above is in the original Altivec patch here:
http://gcc.gnu.org/ml/gcc-patches/2001-11/msg00453.html
which precedes that discussion by over a year.

Obviously you want to use caller-saved registers before callee-saved  
ones.


The consecutive reverse ordering of the callee-saved registers matches
the ordering in the save/restore routines in the Altivec PIM (also found
in darwin-vecsave.asm), which is desirable for that mechanism to work
well.  (The common ordering doesn't logically have to be reversed; I'd
guess that was chosen to be analogous to the integer stmw/lmw
instructions.)

The ordering of the caller-saved regs looks odd to me.  V13..V2 should
be used in that order to minimize conflict with parameters and return
values.  V0 and V1 are preferred to those, and I'd expect V14..V19 to
be preferred also, but they aren't.  Perhaps to minimize the code that
sets up VRsave?



Re: GCC -On optimization passes: flag and doc issues

2007-04-17 Thread Dale Johannesen


On Apr 17, 2007, at 4:20 PM, Eric Christopher wrote:



increase code size?  I feel I must be missing something really
obvious... is it just that the other optimisations that become
possible on inline code usually compensate?


That, or the savings from not having to save/restore registers, set
up the frame, etc., as well.


Don't forget the call and its setup.  Trivially, inlining an empty  
function is always a size win.

There actually were a couple in Spec95.



Re: Extension compatibility policy

2005-02-27 Thread Dale Johannesen
On Feb 27, 2005, at 12:56 PM, Mike Hearn wrote:
Are these compatibility patches available in discrete diff form
anywhere?
No.
The branch's name is apple-ppc-branch, and changes are marked
as APPLE LOCAL.


Re: Hot and Cold Partitioning (Was: GCC 4.1 Projects)

2005-02-28 Thread Dale Johannesen
On Feb 28, 2005, at 4:43 AM, Joern RENNECKE wrote:
Dale Johannesen wrote:
   Well, no, what is supposed to happen (I haven't tried it for a 
while, so I don't promise
this still works) is code like this:

.hotsection:
loop:
  conditional branch (i==1000) to L2
L1:
  /* do stuff */
end loop:
/* still in hot section  */
L2:  jmp L3
.coldsection:
L3:
  i = 0;
  jmp L1
Well, even then, use of the cold section can increase the hot
section size, depending on the target and, for some targets, on the
maximum supported distance to the cold section.
Certainly.  In general it will make the total size bigger, as does 
inlining.  If you have good
information about what's hot and cold, it should reduce the number of 
pages that actually
get swapped in.  The information has to be good, though, as a branch 
from
hot<->cold section becomes more expensive.  I'd recommend it only if 
you have
profiling data (this is a known winner on Spec in that situation).

Should I do custom basic block reordering in machine_dependent_reorg 
to clean up
the turds of hot and cold partitioning?
No, you should not turn on partitioning in situations where code size 
is important to you.



Re: Hot and Cold Partitioning (Was: GCC 4.1 Projects)

2005-02-28 Thread Dale Johannesen
On Feb 28, 2005, at 10:19 AM, Joern RENNECKE wrote:
Dale Johannesen wrote:
   Certainly.  In general it will make the total size bigger, as does 
inlining.  If you have good
information about what's hot and cold, it should reduce the number of 
pages that actually
get swapped in.  The information has to be good, though, as a branch 
from
hot<->cold section becomes more expensive.  I'd recommend it only if 
you have
profiling data (this is a known winner on Spec in that situation).

Should I do custom basic block reordering in machine_dependent_reorg 
to clean up
the turds of hot and cold partitioning?
No, you should not turn on partitioning in situations where code size 
is important to you.
You are missing the point.  In my example, with perfect profiling 
data, you still end up with
more code in the hot section,
Yes.
i.e. more pages are actually swapped in.
Unless the cross-section branch is actually executed, there's no reason 
the unconditional
jumps should get paged in, so this doesn't follow.

A block should not
be put in the cold section unless it is larger than a jump into the
cold section.
Worth trying, certainly.  My guess is it won't matter much either way.


Re: Merging calls to `abort'

2005-03-14 Thread Dale Johannesen
On Mar 14, 2005, at 10:30 AM, Joe Buck wrote:
Steven Bosscher <[EMAIL PROTECTED]> wrote:
 system.h:#define abort() fancy_abort (__FILE__, __LINE__, __FUNCTION__)

I agree that this is the best technical solution, even if cross-jumping
were not an issue.
This invokes undefined behavior in a program that includes <stdlib.h>,
which some would consider a good reason not to prefer it.
I believe the cross-jumping should definitely be done with -Os; the 
optimization
makes a useful contribution to reducing code size, which the user has 
told us
is important to him.  Other than that, I don't care much.  (I have 
debugged
problems where the debugger was showing me the wrong abort call, and
this was annoying, but not something I couldn't deal with.  Typically 
you
just have to stop on the right call to the function that's calling 
abort.)



RFC: always-inline vs unit-at-a-time

2005-03-15 Thread Dale Johannesen
Consider the following:
static inline int a() __attribute((always_inline));
static inline int b() __attribute((always_inline));
static inline int b() { a(); }
static inline int a() { }
int c() { b(); }
This compiles fine at -O2.  At -O0 we get the baffling error
  sorry, unimplemented: inlining failed in call to 'a': function not considered for inlining

It seems undesirable for -O options to affect which programs will 
compile.

The obvious thing to do about it is turn on -funit-at-a-time always, 
but I'm
concerned about the effect on compile speed; has anybody measured it?



Re: RFC: always-inline vs unit-at-a-time

2005-03-15 Thread Dale Johannesen
On Mar 15, 2005, at 10:32 AM, Zack Weinberg wrote:
Dale Johannesen <[EMAIL PROTECTED]> writes:
Consider the following:
static inline int a() __attribute((always_inline));
static inline int b() __attribute((always_inline));
static inline int b() { a(); }
static inline int a() { }
int c() { b(); }
This compiles fine at -O2.  At -O0 we get the baffling error
   sorry, unimplemented: inlining failed in call to 'a': function not
   considered for inlining
It seems undesirable for -O options to affect which programs will
compile.
Agreed.  Perhaps we should run the inliner at -O0 if we see
always_inline attributes, just for those functions?
We do; the problem is that it makes only 1 pass, so tries to inline
"a" before it has seen the body of "a".  If you interchange the 
definitions
of "a" and "b" the inlining is done at all optimization levels.

I think this
could be done without turning on -funit-at-a-time, even (the inliner
does work in -O2 -fno-unit-at-a-time mode, after all).
That gets the same failure on this example.
The problem is not the effect on compile speed (IIRC Honza had it down
to negligible) but the way it breaks assembly hacks such as crtstuff.c.
(I would love to see a solution to that.)
I wasn't aware of this problem, can you give me a pointer?


Re: Do we still need get_callee_fndecl?

2005-03-22 Thread Dale Johannesen
On Mar 22, 2005, at 8:14 AM, Kazu Hirata wrote:
After all, all we need in get_callee_fndecl seems to be
  addr = TREE_OPERAND (call_expr, 0);
  return ((TREE_CODE (addr) == ADDR_EXPR
   && TREE_CODE (TREE_OPERAND (addr, 0)) == FUNCTION_DECL)
  ? TREE_OPERAND (addr, 0) : NULL_TREE);
Thoughts?
In Objective C (and ObjC++) it's also a good idea to look under 
OBJ_TYPE_REF.
See this patch, which was deferred to 4.1 and I'm going to resubmit RSN:
http://gcc.gnu.org/ml/gcc-patches/2004-12/txt00122.txt



Re: Do we still need get_callee_fndecl?

2005-03-22 Thread Dale Johannesen
On Mar 22, 2005, at 10:21 AM, Kazu Hirata wrote:
Hi Dale,
After all, all we need in get_callee_fndecl seems to be
  addr = TREE_OPERAND (call_expr, 0);
  return ((TREE_CODE (addr) == ADDR_EXPR
   && TREE_CODE (TREE_OPERAND (addr, 0)) == FUNCTION_DECL)
  ? TREE_OPERAND (addr, 0) : NULL_TREE);
Thoughts?
In Objective C (and ObjC++) it's also a good idea to look under
OBJ_TYPE_REF.
See this patch, which was deferred to 4.1 and I'm going to resubmit 
RSN:
http://gcc.gnu.org/ml/gcc-patches/2004-12/txt00122.txt
Thanks for the information.  Does OBJ_TYPE_REF_EXPR only apply to a
CALL_EXPR?  In other words, are there other forms of constants that
are exposed by looking into OBJ_TYPE_REF_EXPR?
I believe the usage here is the only one relevant to ObjC.
It is used for other things in C++, but I don't know how.


RFA: PR 19225

2005-03-22 Thread Dale Johannesen
I'm interested in fixing this, but could use some help from somebody
knowledgeable about how x86 EH is supposed to work.  In particular,
what's the expected relationship between SP at the point of a throwing
call, and when it gets back to the landing pad?


Re: RFA: PR 19225

2005-03-24 Thread Dale Johannesen
On Mar 24, 2005, at 12:35 PM, James E Wilson wrote:
Dale Johannesen wrote:
I'm interested in fixing this, but could use some help from somebody
knowledgeable about how x86 EH is supposed to work.  In particular,
what's the expected relationship between SP at the point of a throwing
call, and when it gets back to the landing pad?
There is no direct relationship between the two SP values.  If they are
different, then there should be unwind info indicating the difference,
and the unwinder should be applying those differences while unwinding.
There is a statement to this effect in comment #3 from Andrew.
Actually I wrote that comment.  While I see that it could be done that 
way
in the unwinder, I found no code that was actually trying to do it.  So
I was unclear about the intent.

However, looking at this, I am tempted to call it a bug in the defer 
pop
optimization.  ...It is probably much easier to fix the defer pop 
optimization
than to fix the unwinder to handle this.
I had tentatively reached this conclusion also, more slowly I'm sure.
Actually, looking at this, I am surprised how many NO_DEFER_POP calls we
have without corresponding OK_DEFER_POP calls.  I wonder if this
optimization is already broken, in the sense that is it being
accidentally disabled when it shouldn't be.  Or maybe the code is just
more obtuse than it needs to be.
No, I think you are right, I'll see if I can clean things up without 
breaking it.
Thanks for your comments.



Re: GCC 4.0 Status Report (2005-03-24)

2005-03-24 Thread Dale Johannesen
On Mar 24, 2005, at 3:08 PM, James E Wilson wrote:
Richard Henderson wrote:
19255 EH bug on IA32 when using heavy optimization
Typo in pr number?
I think that is supposed to be 19225, for which I have already 
suggested a solution though not a patch (disable deferred argument 
popping when a call can throw).  It isn't marked critical though, so I 
don't know why it is on the list, unless perhaps Mark just changed the 
status to be not critical.
I'm testing a fix for this.  Will assign to myself.


Re: bootstrap fails for apple-ppc-darwin

2005-03-31 Thread Dale Johannesen
On Mar 31, 2005, at 12:18 PM, Mike Stump wrote:
On Mar 31, 2005, at 10:54 AM, Fariborz Jahanian wrote:
Today, I tried bootstrapping gcc mainline on/for apple-ppc-darwin. It 
fails in stage1.
I can see the problem also...  :-(
I doubt if the person that broke it knows about it.  It was working 
just a short time ago (beginning of the week?).
My March 26 checkout works fine.


Re: bootstrap fails for apple-ppc-darwin

2005-03-31 Thread Dale Johannesen
On Mar 31, 2005, at 12:23 PM, Dale Johannesen wrote:
On Mar 31, 2005, at 12:18 PM, Mike Stump wrote:
On Mar 31, 2005, at 10:54 AM, Fariborz Jahanian wrote:
Today, I tried bootstrapping gcc mainline on/for apple-ppc-darwin. 
It fails in stage1.
I can see the problem also...  :-(
I doubt if the person that broke it knows about it.  It was working 
just a short time ago (beginning of the week?).
My March 26 checkout works fine.
...but it occurs to me I didn't install 8A428 until the 28th or 29th 
and haven't rebuilt since,
so if it's a recently introduced cctools problem,  I might not be 
seeing it.  Anybody seeing
this on an old cctools?



RFC: #pragma optimization_level

2005-03-31 Thread Dale Johannesen
I've currently got the job of implementing pragma(s) to change
optimization level in the middle of a file.  This has come up a few 
times before,

http://gcc.gnu.org/ml/gcc/2001-06/msg01275.html
http://gcc.gnu.org/ml/gcc/2002-09/msg01171.html
http://gcc.gnu.org/ml/gcc/2003-01/msg00557.html
and so far nothing has been done, but the users who want
this feature have not gone away, so I will be doing it now.
The only real opposition to the idea was from Mark Mitchell, in
the earliest of these threads,
http://gcc.gnu.org/ml/gcc/2001-06/msg01395.html
So I guess question 1 is, Mark, do you feel negatively enough about this
feature to block its acceptance in mainline?  If so, I'll go do this as 
a local
patch, Geoff will complain a lot, and it will be done 4 times as fast :)

Let's assume for the sake of argument that Mark is OK with it.
Mark's message also raises some good questions about semantics.
My answers are:
- Flags that logically refer to a whole file at once cannot be changed.
In this category I know of -funit-at-a-time and -fmerge-constants; there
may be others I haven't found.
- When function A is inlined into B, the inlined copy is now part of B, 
and
whatever flags were in effect at the beginning of B apply to it.  (The 
decision
whether to inline is also based on the flags in effect at the beginning 
of B.)
- As a first cut I intend to allow only -O[0123s] to be specified in 
the pragma,
as suggested by Geert Bosch.  I don't think there's any reason this 
couldn't
be extended to single flags.

Implementation:   the general idea is
- at the beginning of parsing each function, record the flags currently 
in effect
in (or pointed to from) the FUNC_DECL node.
- before optimizing/generating code for each function, reset the flags 
from stored values.
As a first step I think I'll unify the various flags into a struct; 
that seems like a good
janitor patch anyway.

(I should add that a requirement for me is CodeWarrior compatibility.  
Their syntax is
#pragma optimization_level [01234]
#pragma optimize_for_size
Functionality doesn't have to match exactly, but ought to be more or 
less the same.
CW's treatment of the interaction with inlining is as described above, 
and I'd be averse
to changing that; their way is reasonable, and there's existing code 
that depends on it.
For mainline I assume we'll need "GCC" to the syntax; that local change 
is small compared
to making it work though.)

Comments?


Re: RFC: #pragma optimization_level

2005-04-01 Thread Dale Johannesen
On Apr 1, 2005, at 11:24 AM, Mark Mitchell wrote:
Dale Johannesen wrote:
So I guess question 1 is, Mark, do you feel negatively enough about 
this
feature to block its acceptance in mainline?
I'm not sure that I *could* block it, but, no, I don't feel that 
negatively.
Well, in theory nobody can block anything (although some people's posts 
suggest
they don't understand this).  In practice if you or another GWM objects 
to something,
nobody else is going to override you and approve it.

I tried to address your other questions in my previous message, but:
I think that a #pragma (or attribute) that affects only optimization 
options is less problematic than generic option processing (e.g., 
flag_writable_strings, as in the email you reference).

I do think that you need to clearly document how inlining plays with 
this.  In particular, if you inline a -O2 function into a -O0 
function, what happens?  (My suggestion would be that the caller's 
optimization pragmas win.)
Agree.  (And documentation will be written.)
Also, you should document what set of optimization options can be 
specified.  I think it should only be ones that do not change the ABI; 
things like -O2, or turning off particular passes are OK, while 
options that change calling conventions, etc., should be disallowed.
Agree.
Also, you need to say what happens in the situation where the user has 
done "-O2 -fno-gcse" and the #pragma now says "-O2".  Does that 
reenable GCSE?  (I think it should.)
Yes.
what's the
granularity of this #pragma?  Function-level, I hope?
That's what I assumed.  Anything finer than that is insane. :-)
Actually there are cases where it makes sense:  you could ask that a 
particular call be
inlined, or a particular loop be unrolled N times.  However, I'm not 
planning to do anything
finer-grained than a function at the moment.  Certainly for 
optimizations that treat a whole
function at once, which is most of them, it doesn't make sense.



Re: RFC: #pragma optimization_level

2005-04-04 Thread Dale Johannesen
On Apr 3, 2005, at 5:31 PM, Geert Bosch wrote:
On Apr 1, 2005, at 16:36, Mark Mitchell wrote:
In fact, I've long said that GCC had too many knobs.
(For example, I just had a discussion with a customer where I 
explained that the various optimization passes, while theoretically 
orthogonal, are not entirely orthogonal in practice, and that turning 
on another pass (GCSE, in this case) avoided other bugs.  For that 
reason, I'm not actually convinced that all the -f options for 
turning on and off passes are useful for end-users, although they are 
clearly useful for debugging the compiler itself.  I think we might 
have more satisfied users if we simply had -Os, -O0, ..., -O3.  
However, many people in the GCC community itself, and in certain 
other vocal areas of the user base, do not agree.)
Pragmas have even more potential for causing problems than 
command-line options.
People are generally persuaded more easily to change optimization 
options, than
to go through hundreds of source files fixing pragmas.
I would hope so.  But the reason I'm doing this is that we've got a lot 
of customer
requests for pragma-level control of optimization.

As the average life of a piece of source code is far longer than the 
life-span
of a specific GCC release, users expect to compile unchanged source 
code with
many different compilers. For this reason, I think it is big mistake to
allow pragmas to turn on or off individual passes. The internal 
structure
of the compiler changes all the time, and pragmas written for one 
version
may not make sense for another version.

The effect will be that over time, user pragmas are wrong more often 
than
right, and the compiler will often do better when just ignoring them 
altogether.
(This is when people will ask for a 
-fignore-source-optimization-pragmas flag.)
Pressure on GCC developers to maintain compatibility with old flags 
will increase
as well. This is a recipe for disaster.
Certainly problems can arise, but I think you're seriously overstating 
them.
The C and C++ standards require that unrecognized pragmas be ignored,
and the pragmas we're talking about don't affect correctness.  So the 
worst effect
you should see is that your code is less efficient than expected.  (The 
changes
to disallow nonconforming code which go in with every release, some of 
which
change behavior that's been stable for years, are a much bigger problem 
for users.)

But doing anything much more elaborate than optimization
(off, size, some, all, inlining) corresponding to (-O0, Os, O1, O2, O3)
on a per-function basis seems a bad idea.
Personally I have no strong opinion about this either way.


Re: bootstrap compare failure in ada/targparm.o on i686-pc-linux-gnu?

2005-04-04 Thread Dale Johannesen
On Apr 4, 2005, at 2:32 PM, Alexandre Oliva wrote:
On Mar 26, 2005, Graham Stott <[EMAIL PROTECTED]> wrote:
I do regular bootstraps of mainline all languages on FC3
i686-pc-linux-gnu and haven't seen any problems up to Friday.  I'm
using --enable-checking=tree,misc,rtl,rtlflag which might make a
difference.
I'm still observing this problem every now and then.  It's not
consistent or easily reproducible, unfortunately.  I suspect we're
using pointers somewhere, and that stack/mmap/whatever address
randomization is causing different results.  I'm looking into it.
I've found 2 bugs over the last 6 months where the problem is exposed
only if two pointers happen to hash to the same bucket.  It's occurred
to me that doing a bootstrap with all hashtable sizes set to 1 might be
a good idea.


Re: bootstrap compare failure in ada/targparm.o on i686-pc-linux-gnu?

2005-04-04 Thread Dale Johannesen
On Apr 4, 2005, at 3:21 PM, Alexandre Oliva wrote:
On Apr  4, 2005, Dale Johannesen <[EMAIL PROTECTED]> wrote:
On Apr 4, 2005, at 2:32 PM, Alexandre Oliva wrote:
On Mar 26, 2005, Graham Stott <[EMAIL PROTECTED]> wrote:
I do regular bootstraps of mainline all languages on FC3
i686-pc-linuux-gnu and haven't seen any problemss upto Friday. I'm
using --enable-checking=tree,misc,rtl,rtlflag which might make a
difference.

I'm still observing this problem every now and then.  It's not
consistent or easily reproducible, unfortunately.  I suspect we're
using pointers somewhere, and that stack/mmap/whatever address
randomization is causing different results.  I'm looking into it.

I've found 2 bugs over the last 6 months where the problem is exposed
only if two pointers happen to hash to the same bucket.  It's occurred
to me that doing a bootstrap with all hashtable sizes set to 1 might 
be
a good idea.
Perhaps.  But the fundamental problem is that we shouldn't be hashing
on pointers, and tree-eh.c does just that for finally_tree and
throw_stmt_table.
Hmm.  Of the earlier bugs, in
http://gcc.gnu.org/ml/gcc-patches/2004-12/msg01760.html
the hash table in question is built by DOM, and in
http://gcc.gnu.org/ml/gcc-patches/2005-03/msg01810.html
it's built by PRE (VN).  I don't think there's general agreement
that "we shouldn't be hashing on pointers"


Re: ERROR : pls help

2005-04-07 Thread Dale Johannesen
On Apr 7, 2005, at 11:58 AM, Virender Kashyap wrote:
hi
 I made some changes in the GCC code.  When I try to compile it using
make, I get the following error (last few lines from the output).
Please help me remove this error.
The command line you show is the built compiler trying to build gcc's 
library.
It doesn't work, which means there is a bug in your changes.



Re: Q: C++ FE emitting assignments to global read-only symbols?

2005-04-08 Thread Dale Johannesen
On Apr 8, 2005, at 4:40 PM, Mark Mitchell wrote:
Daniel Berlin wrote:
Your transform is correct.
The FE is not. The variable is not read only.
It is write once, then read-only.
Diego, your analysis is exactly correct about what is happening.
I agree, in principle.  The C++ FE should not set TREE_READONLY on 
variables that require dynamic initialization.  Until now, that's not 
been a problem, and it does result in better code.  But, it's now 
becoming a problem, and we have other ways to get good code coming 
down the pipe.

I do think the C++ FE needs fixing before Diego's change gets merged, 
though.  I can make the change, but not instantly.  If someone files a 
PR, and assigns to me, I'll get to it at some not-too-distant point.
It would be good to have a way to mark things as "write once, then
read-only", IMO.
It's very common, and you can do some of the same optimizations on such
things that you can do on truly read-only objects.



Re: unreducable cp_tree_equal ICE in gcc-4.0.0-20050410

2005-04-14 Thread Dale Johannesen
On Apr 14, 2005, at 7:14 AM, Andrew Pinski wrote:
Does this bug look familiar?  20629 is ICEing in the same spot, but
it looks like theirs was reproducible after preprocessing.  Is there
any more information that I provide that would be helpful?  I've
attached the command line, specs and a stacktrace from cc1plus.
I think this was fixed on the mainline by:
2005-03-18  Dale Johannesen  <[EMAIL PROTECTED]>
* cp/tree.c (cp_tree_equal):  Handle SSA_NAME.
Yep, and I didn't put it in the release branch.  Bad Dale.  OK to do 
that?

If this is the same problem, changing the VN hashtable size to 1
should make it show up reproducibly.


Re: struct __attribute((packed));

2005-04-15 Thread Dale Johannesen
On Apr 15, 2005, at 8:27 AM, E. Weddington wrote:
Ralf Corsepius wrote:
Hi,
I just tripped over this snipped below in a piece of code, I didn't
write and which I don't understand:
...
struct somestruct {
 struct entrystruct *e1 __attribute__ ((packed));
 struct entrystruct *e2 __attribute__ ((packed));
};
...
Is this meaningful?
I guess the author wanted e1 and e2 to point to a "packed struct
entrystruct", but this doesn't seem to be how GCC interprets this code.
There is no reason a definition of "struct entrystruct" should even be 
visible at
this point, so that doesn't seem like a very reasonable interpretation.



Re: how small can gcc get?

2005-04-24 Thread Dale Johannesen
On Apr 24, 2005, at 6:43 AM, Mike Stump wrote:
On Saturday, April 23, 2005, at 05:05  PM, Philip George wrote:
What's the smallest size I can squeeze gcc down to and how would I go 
about compiling it in such a way?
My take:
#define optimize 0
"optimize" is a variable and "int 0" won't parse, so that won't come 
close.
What did you really mean?

Turning off optimization is not going to get you the smallest code 
size, since many
optimizations reduce it...the option intended to produce smallest code
is -Os.  Configuring with --disable-checking is also important.

and then rebuild with dead code stripping.  :-)  You'd be the first to 
do this that I know of, so, won't necessarily be easy, but, might be a 
bit smaller than you'd get otherwise.



Re: volatile semantics

2005-05-03 Thread Dale Johannesen
On May 3, 2005, at 7:41 AM, Nathan Sidwell wrote:
Mike Stump wrote:
int avail;
int main() {
  while (*(volatile int *)&avail == 0)
continue;
  return 0;
}
Ok, so, the question is, should gcc produce code that infinitely  
loops, or should it be obligated to actually fetch from memory?   
Hint, 3.3 fetched.
I believe the compiler is so licensed. [5.1.2.3/2] talks about
accessing
a volatile object.  If the compiler can determine the actual object
being accessed through a series of pointer and volatile cast
conversions,
then I see nothing in the std saying it must behave as-if the object
were volatile when it is not.
This is correct; the standard consistently talks about the type of the 
object,
not the type of the lvalue, when describing volatile.

However, as a QOI issue, I believe the compiler should treat the 
reference as
volatile if either the object or the lvalue is volatile.  That is 
obviously the
user's intent.




Re: volatile semantics

2005-05-03 Thread Dale Johannesen
On May 3, 2005, at 11:03 AM, Nathan Sidwell wrote:
Dale Johannesen wrote:
However, as a QOI issue, I believe the compiler should treat the 
reference as
volatile if either the object or the lvalue is volatile.  That is 
obviously the
user's intent.
I'm not disagreeing with you, but I wonder at gcc's ability to make
good on such a promise.  A cast introducing a volatile qualifier
will be a NOP_EXPR, and gcc tends to strip those at every opportunity.
You may well be right, I haven't tried to implement it (and am not 
planning to).

Also, I wonder about the following example
int const avail = 
int main() {
  while (*(int *)&avail == Foo ())
do_something();
  return 0;
}
Seeing through the const-stripping cast is a useful optimization.
It is?  Why would somebody write that?
A further pathological case would be,
int main() {
  while (*(int *)(volatile int *)&avail)
do_something ();
  return 0;
}
What should this do, treat the volatile qualifier as sticky?
IMO, no, but surely we don't have to worry about this one.  Either way
is standard conformant and the user's intent is far from clear, so 
whatever
we do should be OK.



Re: volatile semantics

2005-05-03 Thread Dale Johannesen
On May 3, 2005, at 11:21 AM, Paul Koning wrote:
This change bothers me a lot.  It seems likely that this will break
existing code possibly in subtle ways.
It did, that is why Mike is asking about it. :)


Re: volatile semantics

2005-05-03 Thread Dale Johannesen
On May 3, 2005, at 11:52 AM, Nathan Sidwell wrote:
Dale Johannesen wrote:
On May 3, 2005, at 11:03 AM, Nathan Sidwell wrote:

Seeing through the const-stripping cast is a useful optimization.
It is?  Why would somebody write that?
perhaps a function, which returned a non-const reference that
happened to be bound to a constant, has been inlined.
OK, I agree.
IMO, no, but surely we don't have to worry about this one.  Either way
is standard conformant and the user's intent is far from clear, so 
whatever
we do should be OK.
If we guarantee one to work and not the other, we need to have a clear
specification of how they differ.  What if intermediate variables -- 
either
explicit in the program, or implicitly during the optimization -- get
introduced?

My guess is that the wording of the standard might be the best that
could be achieved in this regard.  It would be nice to have some clear
wording indicating that Mike's example will work, but some other, 
possibly
closely related, example will not.
It's not that bad; the type of an lvalue is already well defined (it is 
"int" in
your last example, and "volatile int" in Mike's).  We just take this 
type into
account in determining whether a reference is to be treated as volatile.
(Which means we need to keep track of, or at least be able to find, both
the type of the lvalue and the type of the underlying object.  As you 
say,
gcc may have some implementation issues with this.)

And we don't have to document the behavior at all; it is not documented 
now.



Re: volatile semantics

2005-05-04 Thread Dale Johannesen
On May 4, 2005, at 5:06 AM, Gabriel Dos Reis wrote:
Andrew Haley <[EMAIL PROTECTED]> writes:
| Nathan Sidwell writes:
|  > Dale Johannesen wrote:
|  >
|  > > And we don't have to document the behavior at all; it is not 
documented
|  > > now.
|  > I disagree.  It's not documented explicitly in gcc now, because 
it is doing
|  > what the std permits, and so documented there. We should document 
either
|  >
|  > a) that current gcc is not breaking the std, and Mike's example 
is invalid
|  > code, if one expects a volatile read.  This would be a FAQ like 
thing.
Both behaviors are standard-compliant.  Treating a reference as 
volatile when
you don't have to just means strictly following the rules of the 
abstract machine;
it can never break anything.

I vote for (a).
[...]
| This is a bad extension to gcc and will cause much trouble, just like
| the old guarantee to preserve empty loops.
I see a difference between a documented extension, and quietly choosing
from among standard-compliant behaviors the one which is most 
convenient for
users.



Re: volatile semantics

2005-05-05 Thread Dale Johannesen
On May 5, 2005, at 5:23 AM, Kai Henningsen wrote:
[EMAIL PROTECTED] (Nathan Sidwell)  wrote on 03.05.05 in 
<[EMAIL PROTECTED]>:
Mike Stump wrote:
int avail;
int main() {
  while (*(volatile int *)&avail == 0)
continue;
  return 0;
}
Ok, so, the question is, should gcc produce code that infinitely  
loops,
or should it be obligated to actually fetch from memory?   Hint, 3.3
fetched.
I believe the compiler is so licensed. [5.1.2.3/2] talks about
accessing
a volatile object.  If the compiler can determine the actual object
being accessed through a series of pointer and volatile cast
conversions,
then I see nothing in the std saying it must behave as-if the object
were volatile when it is not.

This, of course, might not be useful to users :)
As a QOI issue, it would be nice if such a situation caused a warning
("ignoring volatile cast ..." or something like that).
It's rather dangerous to have the user believe that this worked as
intended when it didn't.
If we aren't going to make this work as obviously intended, and the 
sentiment
seems to be against it, then this is certainly a good idea.



Re: Proposed resolution to aliasing issue.

2005-05-11 Thread Dale Johannesen
On May 11, 2005, at 11:42 AM, Mark Mitchell wrote:
Kenny and I had a long conversation about the aliasing issue, and 
reached the following proposed solution.

In short, the issue is, when given the following code:
  struct A {...};
  struct B { ...; struct A a; ...; };
  void f() {
B b;
g(&b.a);
  }
does the compiler have to assume that "g" may access the parts of "b" 
outside of "a".  If the compiler can see the body of "g" than it may 
be able to figure out that it can't access any other parts, or figure 
out which parts it can access, and in that case it can of course use 
that information.  The interesting case, therefore, is when the body 
of "g" is not available, or is insufficient to make a conclusive 
determination.

Our proposed approach is to -- by default -- assume that "g" may 
access all of "b".  However, in the event that the corresponding 
parameter to "g" has an attribute (name TBD, possibly the same as the 
one that appears in Danny's recent patch), then we may assume that "g" 
(and its callees) do not use the pointer to obtain access to any 
fields of "b".

For example:
  void g(A *p __attribute__((X)));
  void f() {
B b;
g(&b.a); /* Compiler may assume the rest of b is not accessed
in "g".  */
  }
This approach allows users to annotate code to get better optimization
while still preserving the behavior of current, possibly conforming,
programs.
I assume the type of the field is irrelevant (although you chose a 
struct for your example)?

I assume the attribute has both positive and negative forms?
I assume the semantics have nothing to do with B per se, but apply to 
all possible containing structs?
(Mail.app *will* screw this up, please be tolerant):

struct  B { ...; struct A a1; struct A a2; ... };
struct C  { ...; struct A a; ... };
struct D  { ... ; struct B b;   };
  void g(A *p __attribute__((X)), int field_addressed);
  void f() {
B b;  C c;  D d;
g(&b.a1, 1); /* Compiler may assume the rest of b is not accessed 
in "g".  */
g(&b.a2, 2);   /* How about now?  */
g(&c.a, 3);  /* What b?  */
g(&d.b.a1, 4);   /* cannot alter rest of d.b, how about rest of d? 
*/

If, in future, the committee reaches the conclusion that all functions
should be treated as if they had the attribute, i.e., that you cannot
perform the kinds of operations shown above in the example for "g",
then we will modify the compiler so that, by default, the compiler
treats all parameters as if they had this attribute.  We would then
also add a switch to disable the optimization for people who have
legacy code, just as we have -fno-strict-aliasing.

[ I did not discuss this with Kenny, but another option is to have a 
-fassume-X switch, off by default, which treats your code as if you 
had the magic attribute everywhere. ]
I'm not so sure an attribute is a good idea.  That's definitely a 
language extension, one way
or another; I'm thinking more along the lines of trying to follow the 
standard, with the problem
being that we can't figure out what it says.  The flag seems cleaner to 
me.  Also certain
optimizations are slightly easier that way, e.g. figuring out whether a 
field can be kept in
a register when another field's address is taken just requires looking 
at the global flag
rather than every call in the function (not a big deal).

The attribute might well be unnecessary, and once it's in it's in 
forever.  And I suspect supporting
different semantics for different calls will create problems down the 
line, somehow or other
(although I confess I don't think of any offhand).



Re: Compiling GCC with g++: a report

2005-05-24 Thread Dale Johannesen

On May 24, 2005, at 9:43 AM, Joe Buck wrote:

On Tue, May 24, 2005 at 05:03:27PM +0200, Andreas Schwab wrote:

Paul Koning <[EMAIL PROTECTED]> writes:

I hope that doesn't require (void *) casts for pointer arguments
passed to the likes of memcpy...


Only the (void*) -> (any*) direction requires a cast in C++, the other
direction is still converted implicitly.


For this reason, I always cast the result of malloc to the proper type;
it just feels wrong otherwise.


Yes, if the cast looks odd to you, you probably don't go back far 
enough.
I've certainly used compilers that warned when you didn't have a cast 
there.




Re: More front end type mismatch problems

2005-05-27 Thread Dale Johannesen


On May 27, 2005, at 11:05 AM, Diego Novillo wrote:


This is happening in gcc.dg/tree-ssa/20040121-1.c.  The test
specifically tests that (p!=0) + (q!=0) should be computed as
int:

char *foo(char *p, char *q) {
int x = (p !=0) + (q != 0);
...
}



Is this program legal C?


!= is defined to produce an int result in C.  This is valid, and
may produce a result of 0, 1, or 2.



Re: Will Apple still support GCC development?

2005-06-08 Thread Dale Johannesen

On Jun 6, 2005, at 12:17 PM, Samuel Smythe wrote:

It is well-known that Apple has been a significant provider of GCC 
enhancements. But it is also probably now well-known that they have 
opted to drop the PPC architecture in favor of an x86-based 
architecture. Will Apple continue to contribute to the PPC-related 
componentry of GCC, or will such contributions be phased out as the 
transition is made to the x86-based systems? In turn, will Apple be 
providing more x86-related contributions to GCC?


Nobody from Apple has yet responded to this because Apple does not 
generally like
its employees to make public statements about future plans.  I have 
been authorized
to say this, however:   Apple will be using gcc as its development 
compiler for producing
Mac OS X Universal Binaries which target both PowerPC and Intel 
architectures.

We will continue to contribute patches to both efforts.



Re: Can't bootstrap mainline on powerpc64-linux

2005-06-09 Thread Dale Johannesen


On Jun 9, 2005, at 12:43 PM, Pat Haugen wrote:

cc1: warnings being treated as errors
/home/pthaugen/work/src/mainline/gcc/gcc/config/rs6000/rs6000.c:12538:
warning: ‘rs6000_invalid_within_doloop’ defined but not used


Problem is Adrian changed TARGET_INSN_VALID_WITHIN_DOLOOP to
TARGET_INVALID_WITHIN_DOLOOP most places, but not in rs6000.c.
I'll commit the following as obvious after bootstrap succeeds.

Index: rs6000.c
===
RCS file: /cvs/gcc/gcc/gcc/config/rs6000/rs6000.c,v
retrieving revision 1.838
diff -u -b -r1.838 rs6000.c
--- rs6000.c9 Jun 2005 14:23:28 -   1.838
+++ rs6000.c9 Jun 2005 22:46:02 -
@@ -906,8 +906,8 @@
 #undef TARGET_FUNCTION_OK_FOR_SIBCALL
 #define TARGET_FUNCTION_OK_FOR_SIBCALL rs6000_function_ok_for_sibcall
 
-#undef TARGET_INSN_VALID_WITHIN_DOLOOP
-#define TARGET_INSN_VALID_WITHIN_DOLOOP rs6000_invalid_within_doloop
+#undef TARGET_INVALID_WITHIN_DOLOOP
+#define TARGET_INVALID_WITHIN_DOLOOP rs6000_invalid_within_doloop
 
 #undef TARGET_RTX_COSTS
 #define TARGET_RTX_COSTS rs6000_rtx_costs


Re: basic VRP min/max range overflow question

2005-06-17 Thread Dale Johannesen


On Jun 17, 2005, at 5:59 PM, Paul Schlie wrote:


From: Andrew Pinski <[EMAIL PROTECTED]>
On Jun 17, 2005, at 8:20 PM, Paul Schlie wrote:


  ["undefined" only provides liberties within the constraints of what
  is specifically specified as being undefined, but none beyond
that.]


That is not true.  Undefined means it can run "rm /" if you ever 
invoke

the undefined code.


- If the semantics of an operation are "undefined", I'd agree; but if
  control is returned to the program, the program's remaining specified
  semantics must be correspondingly obeyed, including those which
  may utilize the resulting value of the "undefined" operation.

- If the result value is "undefined", just the value is undefined.

(Unless one advocates that any undefined result implies undefined 
semantics,

which enables anything to occur, including the arbitrary corruption of
the remaining program's otherwise well defined semantics; in which 
case any
invocation of implementation specific behavior may then validly result 
in

arbitrary remaining program behavior.)

Which I'd hope isn't advocated.


You are wrong, and this really isn't a matter of opinion.  The standard 
defines exactly

what it means by "undefined behavior":

3.4.3 1 undefined behavior
behavior, upon use of a nonportable or erroneous program construct or 
of erroneous data,

 for which this International Standard imposes no requirements
2 NOTE  Possible undefined behavior ranges from ignoring the situation 
completely with
unpredictable results, to behaving during translation or program 
execution in a documented
manner characteristic of the environment (with or without the issuance 
of a diagnostic message),
to terminating a translation or execution (with the issuance of a 
diagnostic message).




Re: Scheduler questions (related to PR17808)

2005-06-29 Thread Dale Johannesen

On Jun 29, 2005, at 3:46 PM, Steven Bosscher wrote:

I have a question about the scheduler.  Forgive me if I'm totally
missing the point here, this scheduling business is not my thing ;-)

Consider the following snippet that I've derived from PR17808 with a
few hacks in the compiler to renumber insns and dump RTL with all the
dependencies before scheduling.  There is a predicate register that
gets set, then a few cond_exec insns, then a jump, and finally a set
using some of the registers that may be set by the cond_exec insns.
This is the RTL before scheduling:





Notice how the conditional sets of r14 and r17 in insns 9 and 10 have
been moved past insn 14, which uses these registers.  Shouldn't there
be true dependencies on insns 9 and 10 for insn 14?


I think so.  This is figured out in sched_analyze_insn in sched-deps.c, 
I'd

suggest stepping through there.



RFA: -mfpmath=sse -fpic vs double constants

2005-07-07 Thread Dale Johannesen

Compiling a simple function like

double foo(double x)  {   return x+1.0;  }

on x86 with -O2 -march=pentium4 -mtune=prescott -mfpmath=sse -fpic, the 
load of 1.0 is done as


cvtss2sd[EMAIL PROTECTED](%ecx), %xmm0

(this is Linux, the same happens on Darwin).
This is not really a good idea, as movsd of a double-precision 1.0 is 
faster.
The change from double to single precision is done in 
compress_float_constant,
and there's no cost computation there; presumably the RTL optimizers 
are expected

to change it back if that's beneficial.

Without -fpic, this does happen in cse_insn.  (mem/u/i:SF 
(symbol_ref/u:SI ("*.LC0")
gets run through fold_rtx, which recognizes this as a pool constant.  
This causes the

known equivalent CONST_DOUBLE 1.0 to be run through force_const_mem,
producing (mem/u/i:DF (symbol_ref/u:SI ("*.LC1").  Which is then tried 
in place
of the FLOAT_EXTEND, and selected as valid and cheaper.  This all seems 
to

be working as expected.

With -fpic, first, fold_rtx doesn't recognize the PIC form as 
representing a constant,
so cse_insn never tries forcing the CONST_DOUBLE into memory.   Hacking 
around
that doesn't help, because force_const_mem doesn't produce the PIC form 
of

constant reference, even though we're in PIC mode; we get the same
(mem/u/i:DF (symbol_ref/u:SI ("*.LC1"), which doesn't test as valid in 
PIC mode (correctly).


At this point I'm wondering if this is the right place to be attacking 
the problem at all.

Advice?  Thanks.



Re: isinf

2005-07-13 Thread Dale Johannesen

On Jul 13, 2005, at 4:29 PM, Joe Buck wrote:

On Thu, Jul 14, 2005 at 08:16:07AM +0900, Hiroshi Fujishima wrote:

Eric Botcazou <[EMAIL PROTECTED]> writes:


The configure script which is included in rrdtool[1] checks whether
the system has isinf() as below.

#include <math.h>
int
main ()
{
float f = 0.0; isinf(f)
  ;
  return 0;
}


The test is clearly fragile.  Assigning the return value of isinf to 
a

variable should be sufficient for 4.0.x at -O0.


Yes, I contact rrdtool maintainer.  Thank you.


Best to make it a global variable, to guard against dead code 
elimination.


Volatile would be even better.  It's valid to eliminate stores into 
globals

if you can determine the value isn't used thereafter, which we can here,
at least theoretically.



gcc vs Darwin memcmp

2005-07-15 Thread Dale Johannesen
Darwin's memcmp has semantics that are an extension of the language 
standard:


     The memcmp() function returns zero if the two strings are identical,
     otherwise returns the difference between the first two differing bytes
 (treated as unsigned char values, so that `\200' is greater than 
`\0',

 for example).

gcc's x86 inline expansion of memcmp doesn't do this, so I need to fix 
it.  Is there
interest in having this in mainline, and if so how would you like it 
controlled?




Re: volatile semantics

2005-07-16 Thread Dale Johannesen


On Jul 16, 2005, at 10:34 AM, Andrew Haley wrote:

6.3.2.1:  when an object is said to have a particular type, the type 
is

specified by the lvalue used to designate the object.


I don't have a standard here, but I will point out that IF this 
sentence is

interpreted to mean


 the type of an object
changes depending on how it is accessed.


this also makes nonsense of gcc's implementation of type-based aliasing
rules.

  *((int *)&x) = 3

would then be valid whatever the type of x.



-malign-double vs __alignof__(double)

2005-07-20 Thread Dale Johannesen
While fighting with the x86-darwin alignment rules, I noticed that 
-malign-double
doesn't seem to affect __alignof__(double).  This seems like a bug, but 
the
alignof doc has so many qualifications I'm not sure exactly what it's 
supposed to

do.  Is this broken?  Thanks.



Re: splitting load immediates using high and lo_sum

2005-07-21 Thread Dale Johannesen


On Jul 21, 2005, at 4:36 PM, Tabony, Charles wrote:


Hi,

I am working on a port for a processor that has 32 bit registers but 
can

only load 16 bit immediates.
  ""
  "%0.h = #HI(%1)")


What are the semantics of this?  Low bits zeroed, or untouched?
If the former, your semantics are identical to Sparc; look at that.



RFA: Darwin x86 alignment

2005-07-21 Thread Dale Johannesen

On x86 currently the alignments of double and long long are linked:
they are either 4 or 8 depending on whether -malign-double is set.
This follows the documentation of -malign-double.  But it's wrong for
what we want the Darwin ABI to be:  the default should be that double
is 4 bytes and long long is 8 bytes.

So I can do that, but what should -malign-double do?
- Control double but not long long; add -malign-long-long (at least if
   somebody asks for it; probably it wouldn't be used)
- Have flags work as now:  -malign-double makes both 8, 
-mno-align-double
  makes both 4.  Problem with that is the default is neither of these, 
and
  this doesn't fit neatly into gcc's model of two-valued flags; it's 
also a bit

  tricky to implement for the same reason.
- something else?

thanks.



Re: splitting load immediates using high and lo_sum

2005-07-21 Thread Dale Johannesen


On Jul 21, 2005, at 5:04 PM, Tabony, Charles wrote:


From: Dale Johannesen [mailto:[EMAIL PROTECTED]

On Jul 21, 2005, at 4:36 PM, Tabony, Charles wrote:


Hi,

I am working on a port for a processor that has 32 bit registers but
can
only load 16 bit immediates.
  ""
  "%0.h = #HI(%1)")


What are the semantics of this?  Low bits zeroed, or untouched?
If the former, your semantics are identical to Sparc; look at that.


The low bits are untouched.  However, I would expect the compiler to
always follow setting the high bits with setting the low bits.


OK, if you're willing to accept that limitation (your architecture could
handle putting the LO first, which Sparc can't) then Sparc is still a
good model to look at.  What it does should work for you.



Re: RFA: Darwin x86 alignment

2005-07-21 Thread Dale Johannesen


On Jul 21, 2005, at 5:00 PM, Richard Henderson wrote:


On Thu, Jul 21, 2005 at 04:56:01PM -0700, Dale Johannesen wrote:

- Have flags work as now:  -malign-double makes both 8,
-mno-align-double
  makes both 4.  Problem with that is the default is neither of these,
and
  this doesn't fit neatly into gcc's model of two-valued flags; it's
also a bit
  tricky to implement for the same reason.


Nah, you just remove it from target_flags, and control the two
new variables from ix86_handle_option.


OK.  Think that's the better approach?


Why do you want to make these sort of arbitrary changes to your
ABI?  I can't see what you win...


The compiler people are not driving this.

Of course, 4-byte alignment subjects you to a penalty for misaligned
loads and stores, and 8-byte alignment subjects you to a size penalty
for extra holes.   People have been making measurements about the
issue and this is what they've come up with; I don't know details.
What I wrote isn't necessarily the final change, either.



Re: Minimum target alignment for a datatype

2005-07-22 Thread Dale Johannesen


On Jul 22, 2005, at 11:07 AM, Chris Lattner wrote:



Hi All,

I'm trying to determine (in target-independent code) what the 
*minimum* target alignment of a type is.  For example, on darwin, 
double's are normally 4-byte aligned, but are 8-byte aligned in some 
cases (e.g. when they are the first element of a struct).  TYPE_ALIGN 
on a double returns 8 bytes, is there any way to find out that they 
may end up being aligned to a 4-byte boundary?


#pragma pack or attribute((__aligned__)) can result in arbitrary 
misalignments for any type.




Re: RFA: Darwin x86 alignment

2005-07-23 Thread Dale Johannesen


On Jul 23, 2005, at 6:40 AM, Tobias Schlüter wrote:

I have a strong suspicion there is a reason why the two are linked,
and that that reason is FORTRAN.  A lot of FORTRAN code assumes
EQUIVALENCE of floating-point and integer types of equal size.  Such
code will in all likelihood break if those types have different
alignment.  For x86 this means that int/float and long long/double
will have to have the same alignment.


This might indeed be a problem, as the alignments not only have to be 
the same

if they appear in an equivalence, but also in arrays or when using the
TRANSFER intrinsic.  Out of the types discussed, the standard only 
specifies
this for default INTEGERs (=int in C) and default REALs (=float in C), 
but
users do expect this to consistently extend to bigger types, otherwise 
they

consider the compiler buggy instead of their code.


Thanks for bringing this up.  It's probably true that nobody has 
thought about Fortran,
but so far I'm not convinced it would actually be a problem.  Can 
somebody provide

an example that would break?

More precisely, the standard says this: a scalar variable of a certain 
type
occupies a certain number of "storage units".  Default INTEGERs, and 
REALs
take one storage unit, default COMPLEX and DOUBLE PRECISION (= REAL*8 
= double
in C) take two storage units.  Finally, arrays of these types take a 
sequence

of contiguous storage units.


I know.  These rules aren't affected by target alignments, though, and 
I would
not expect the Fortran FE to be affected by alignments when doing 
layout.   If it is, why?

The compiler already has to deal with misaligned data in Fortran:

  INTEGER I(3)
  DOUBLE PRECISION A,B
  EQUIVALENCE (A,I(1)), (B,I(2))

not to mention the user-specified alignment extensions in C, so I 
wouldn't expect

the optimizers to break or anything like that.



Re: [BUG] gcc-3.4.5-20050531 (i386): __FUNCTION__ as a part of the printf's format argument

2005-07-25 Thread Dale Johannesen


On Jul 25, 2005, at 1:58 AM, Paolo Carlini wrote:


Richard Guenther wrote:


Btw, this list is for the development _of_ gcc, not with gcc.
Use gcc-help for that.



By the way, since we have to point out that *so often*, maybe there is
something wrong on our part: I wonder whether changing the names of
those lists would help!?!? I don't know: gcc-development, gcc-users, 
...


Perhaps adding something similar to the above to the description
of the gcc list on the web page would help.  What's there seems clear
enough to me, but perhaps a bigger hammer would help other people.



Re: gcc 3.3.6 - stack corruption questions

2005-07-25 Thread Dale Johannesen

On Jul 25, 2005, at 3:50 PM, Robert Dewar wrote:

The unoptimized version completed a 401,900 transaction test with no
problem.  All day, I've been playing with different things,


there are many bugs, most notably uninitialized vars, that show
up only when you turn on optimization.


Also violations of strict aliasing rules are common.  -Wuninitialized
-fno-strict-aliasing [after the -O] will exercise those two.   Also,
mixed builds with some -O0 and some -O3 files should
narrow it down.



rfa (x86): 387<=>sse moves

2005-07-25 Thread Dale Johannesen
With -march=pentium4 -mfpmath=sse -O2, we get an extra move for code 
like


double d = atof(foo);
int i = d;

call    atof
fstpl   -8(%ebp)
movsd   -8(%ebp), %xmm0
cvttsd2si   %xmm0, %eax

(This is Linux, Darwin is similar.)  I think the difficulty is that for

(set (reg/v:DF 58 [ d ]) (reg:DF 8 st)) 64 {*movdf_nointeger}

regclass decides SSE_REGS is a zero-cost choice for 58.  Which looks
wrong, as that requires a store and load from memory.  In fact, memory 
is
the cheapest overall choice for 58 (taking its use into account also), 
and
gcc will figure that out correctly if a more reasonable assessment is 
given

to SSE_REGS.  The immediate cause is the #Y's in the constraint:

"=f#Y,m  ,f#Y,*r  ,o  ,Y*x#f,Y*x#f,Y*x#f  ,m"


and there's probably a simple fix, but it eludes me.  Advice?  Thanks.



Re: rfa (x86): 387<=>sse moves

2005-07-26 Thread Dale Johannesen

On Jul 26, 2005, at 12:51 AM, Paolo Bonzini wrote:

Dale Johannesen wrote:
With -march=pentium4 -mfpmath=sse -O2, we get an extra move for code 
like

double d = atof(foo);
int i = d;
call    atof
fstpl   -8(%ebp)
movsd   -8(%ebp), %xmm0
cvttsd2si   %xmm0, %eax
(This is Linux, Darwin is similar.)  I think the difficulty is that 
for



(set (reg/v:DF 58 [ d ]) (reg:DF 8 st)) 64 {*movdf_nointeger}


Try the attached patch.  It gave a 3% speedup on -mfpmath=sse for 
tramp3d.  Richard Henderson asked for SPEC testing, then it may go in.


Thanks.  That's progress; the cost computation in regclass now figures 
out that memory

is the fastest place to put R58:

  Register 58 costs: AD_REGS:87000 Q_REGS:87000 NON_Q_REGS:87000
INDEX_REGS:87000 LEGACY_REGS:87000 GENERAL_REGS:87000 FP_TOP_REG:49000
FP_SECOND_REG:5 FLOAT_REGS:5 SSE_REGS:5 
FP_TOP_SSE_REGS:75000

FP_SECOND_SSE_REGS:75000 FLOAT_SSE_REGS:75000 FLOAT_INT_REGS:87000
INT_SSE_REGS:91000 FLOAT_INT_SSE_REGS:91000
ALL_REGS:91000 MEM:4

Unfortunately local-alloc insists on putting it in a register anyway 
(ST(0) instead of an XMM,

but the end codegen is unchanged):

;; Register 58 in 8.

I think the RA may be missing the concept that memory might be faster 
than any possible register; will dig further.



Re: rfa (x86): 387<=>sse moves

2005-07-26 Thread Dale Johannesen

On Jul 26, 2005, at 3:34 PM, Dale Johannesen wrote:


I think the RA may be missing the concept that memory might be faster 
than any possible register; will dig further.


Yes, it is.  The following fixes my problem, and causes a couple of
3DNow-specific regressions in the testsuite which I need to look at,
but nothing serious; I think it's gotten far enough to post for
opinions.  This is intended to go on top of Paolo's patch
http://gcc.gnu.org/ml/gcc-patches/2005-07/msg01044.html
It may, of course, run afoul of inaccuracies in the patterns on various
targets; I haven't tried any performance testing yet.

Index: regclass.c
===
RCS file: /cvs/gcc/gcc/gcc/regclass.c,v
retrieving revision 1.206
diff -u -b -r1.206 regclass.c
--- regclass.c  25 Jun 2005 02:00:52 -  1.206
+++ regclass.c  27 Jul 2005 06:04:40 -
@@ -838,7 +838,8 @@
 /* Structure used to record preferences of given pseudo.  */
 struct reg_pref
 {
-  /* (enum reg_class) prefclass is the preferred class.  */
+  /* (enum reg_class) prefclass is the preferred class.  May be
+ NO_REGS if no class is better than memory.  */
   char prefclass;
 
   /* altclass is a register class that we should use for allocating
@@ -1321,6 +1322,10 @@
best = reg_class_subunion[(int) best][class];
}
 
+ /* If no register class is better than memory, use memory. */
+ if (p->mem_cost < best_cost)
+   best = NO_REGS;
+
  /* Record the alternate register class; i.e., a class for which
 every register in it is better than using memory.  If adding a
 class would make a smaller class (i.e., no union of just those
@@ -1528,7 +1533,7 @@
 to what we would add if this register were not in the
 appropriate class.  */
 
- if (reg_pref)
+ if (reg_pref && reg_pref[REGNO (op)].prefclass != NO_REGS)
alt_cost
  += (may_move_in_cost[mode]
  [(unsigned char) reg_pref[REGNO (op)].prefclass]
@@ -1754,7 +1759,7 @@
 to what we would add if this register were not in the
 appropriate class.  */
 
- if (reg_pref)
+ if (reg_pref && reg_pref[REGNO (op)].prefclass != NO_REGS)
alt_cost
  += (may_move_in_cost[mode]
  [(unsigned char) reg_pref[REGNO (op)].prefclass]
@@ -1840,7 +1845,8 @@
  int class;
  unsigned int nr;
 
- if (regno >= FIRST_PSEUDO_REGISTER && reg_pref != 0)
+ if (regno >= FIRST_PSEUDO_REGISTER && reg_pref != 0
+ && reg_pref[regno].prefclass != NO_REGS)
{
  enum reg_class pref = reg_pref[regno].prefclass;
 




Re: rfa (x86): 387<=>sse moves

2005-07-27 Thread Dale Johannesen


On Jul 27, 2005, at 2:18 PM, Richard Henderson wrote:


On Tue, Jul 26, 2005 at 11:10:56PM -0700, Dale Johannesen wrote:

Yes, it is.  The following fixes my problem, and causes a couple of
3DNow-specific regressions
in the testsuite which I need to look at, but nothing serious; I think
it's gotten far enough to post
for opinions.  This is intended to go on top of Paolo's patch
http://gcc.gnu.org/ml/gcc-patches/2005-07/msg01044.html
It may, of course, run afoul of inaccuracies in the patterns on 
various

targets, haven't tried any performance testing yet.


Looks plausible.  Let us know what you wind up with wrt those
regressions and testing.


With the latest version of Paolo's patch (in PR 19653) the regressions
are gone.  Spec is going to take a bit longer; I haven't gotten GMP to
build yet on x86 Darwin.  Since the FP benchmarks are the interesting
ones for this, I should work through it.





Re: rfa (x86): 387<=>sse moves

2005-07-29 Thread Dale Johannesen

On Jul 27, 2005, at 2:18 PM, Richard Henderson wrote:

On Tue, Jul 26, 2005 at 11:10:56PM -0700, Dale Johannesen wrote:

Yes, it is.  The following fixes my problem, and causes a couple of
3DNow-specific regressions
in the testsuite which I need to look at, but nothing serious; I think
it's gotten far enough to post
for opinions.  This is intended to go on top of Paolo's patch
http://gcc.gnu.org/ml/gcc-patches/2005-07/msg01044.html
It may, of course, run afoul of inaccuracies in the patterns on 
various

targets, haven't tried any performance testing yet.


Looks plausible.  Let us know what you wind up with wrt those
regressions and testing.


OK, I've tested this on darwin x86 (both patches together).  No
regressions.  I don't think I ought to publish absolute Spec numbers
for this machine, but I get +1% on FP and +1/2% on Int.  Wins:  applu
+3%, lucas +10%, eon +3%.  Losses:  apsi -9%.  All other changes under
2%.  This looks OK to me, though I'll be investigating apsi.
(Paolo and Richard Guenther are doing this for Linux.)



Re: rfa (x86): 387<=>sse moves

2005-08-01 Thread Dale Johannesen


On Jul 31, 2005, at 9:51 AM, Uros Bizjak wrote:


Hello!

With -march=pentium4 -mfpmath=sse -O2, we get an extra move for code like


   double d = atof(foo);
   int i = d;


   callatof
   fstpl   -8(%ebp)
   movsd   -8(%ebp), %xmm0
   cvttsd2si   %xmm0, %eax


(This is Linux, Darwin is similar.) I think the difficulty is that for


This problem is similar to the problem described in PR target/19398.
There is another testcase and a small analysis in the PR that might
help with this problem.


Thanks, that does seem relevant.  The patches so far don't fix this
case; I've commented the PR explaining why.



Re: [RFC] - Regression exposed by recent change to compress_float_constant

2005-08-10 Thread Dale Johannesen

On Aug 10, 2005, at 12:43 PM, Fariborz Jahanian wrote:

Following patch has exposed an optimization shortcoming:

2005-07-12  Dale Johannesen  <[EMAIL PROTECTED]>

* expr.c (compress_float_constant):  Add cost check.
* config/rs6000.c (rs6000_rtx_cost):  Adjust FLOAT_EXTEND cost.

This patch results in generating worse code for the following test case:

1) Test case:

struct S {
float d1, d2, d3;

I believe you mean double, not float; the RTL snippets you give indicate this.

(insn 12 7 13 0 (set (reg:SF 59)
(mem/u/i:SF (symbol_ref/u:SI ("*LC0") [flags 0x2]) [0 S4 A32])) -1 (nil)
(nil))

(insn 13 12 14 0 (set (mem/s/j:DF (reg/f:SI 58 [ D.1929 ]) [0 .d1+0 S8 A32])
(float_extend:DF (reg:SF 59))) -1 (nil)
(nil))

However, if you try your example with float as given, you see it does not do a 
direct store of constant 0 with or without the compress_float patch.  IMO the
compress_float patch does not really have anything to do with this problem;
before this patch the double case was working well by accident, my patch
exposed a problem farther downstream, which was always there for the float
case.

When I put that patch in, rth remarked:

While I certainly wouldn't expect fold_rtx to find out about this
all by itself, I'd have thought that there would have been a
REG_EQUIV or REG_EQUAL note that indicates that the end result is
the constant (const_double:DF 1.0), and use that in any simplification.

Indeed there is no such note, and I suspect adding it somewhere (expand?) would fix this.


Re: [RFC] - Regression exposed by recent change to compress_float_constant

2005-08-11 Thread Dale Johannesen
Fariborz is having trouble with his mailer and has asked me to forward his response.

On Aug 10, 2005, at 2:35 PM, Dale Johannesen wrote:
On Aug 10, 2005, at 12:43 PM, Fariborz Jahanian wrote:
Following patch has exposed an optimization shortcoming:

2005-07-12  Dale Johannesen  <[EMAIL PROTECTED]>

        * expr.c (compress_float_constant):  Add cost check.
        * config/rs6000.c (rs6000_rtx_cost):  Adjust FLOAT_EXTEND cost.

This patch results in generating worse code for the following test case:

1) Test case:

struct S {
        float d1, d2, d3;


I believe you mean double, not float; the RTL snippets you give indicate this.

Yes, it is double. Copied the wrong test.

(insn 12 7 13 0 (set (reg:SF 59)
        (mem/u/i:SF (symbol_ref/u:SI ("*LC0") [flags 0x2]) [0 S4 A32])) -1 (nil)
    (nil))

(insn 13 12 14 0 (set (mem/s/j:DF (reg/f:SI 58 [ D.1929 ]) [0 .d1+0 S8 A32])
        (float_extend:DF (reg:SF 59))) -1 (nil)
    (nil))


However, if you try your example with float as given, you see it does not do a 
direct store of constant 0 with or without the compress_float patch.  IMO the
compress_float patch does not really have anything to do with this problem;

Yes. The title says Regression 'exposed' by...  But as my email pointed out, float_extend is substituted in cse, so this is another case where a change in the RTL pattern breaks an optimization down the road. I don't know if this is a regression or the exposure of a lurking bug.

before this patch the double case was working well by accident, my patch
exposed a problem farther downstream, which was always there for the float
case.

Yes. I mentioned that in my email.

- fariborz

When I put that patch in, rth remarked:

While I certainly wouldn't expect fold_rtx to find out about this
all by itself, I'd have thought that there would have been a
REG_EQUIV or REG_EQUAL note that indicates that the end result is
the constant (const_double:DF 1.0), and use that in any simplification.

Indeed there is no such note, and I suspect adding it somewhere (expand?) would fix this.


Inlining vs the stack

2005-08-12 Thread Dale Johannesen
We had a situation come up here where things are like this (simplified, 
obviously):


c() { char x[100]; }
a() { b(); c(); }
b() { a(); c(); }

c() is a leaf.  Without inlining, no problem.  With c() inlined into
a() and/or b(), a few mutually recursive calls to a() and b() blow out
the stack.  It's not clear the inliner should try to do anything about
this, but I think it's worth discussing.  The inliner can't detect the
recursive loop in the general case, since it might be split across
files, so the thing to do would be to put some (target-OS-dependent)
limit on the local stack usage of the inlinee.  Right now there's no
such check.




Re: Inlining vs the stack

2005-08-12 Thread Dale Johannesen


On Aug 12, 2005, at 12:25 PM, Paul Koning wrote:


"Mike" == Mike Stump <[EMAIL PROTECTED]> writes:


 Mike> On Aug 12, 2005, at 10:39 AM, Dale Johannesen wrote:

We had a situation come up here where things are like this
(simplified, obviously):

c() { char x[100]; }


 Mike> I think we should turn off inlining for functions > 100k stack
 Mike> size.  (Or maybe 500k, if you want).

Why should stack size be a consideration?  Code size I understand, but
stack size doesn't seem to matter.


Sometimes it matters, as in the original example:

c() { char x[100]; }
a() { b(); c(); }
b() { a(); c(); }



Fwd: [RFC] - Regression exposed by recent change to compress_float_constant

2005-08-19 Thread Dale Johannesen
Fariborz is still having problems with his mailer and has asked me to forward this.

On Aug 10, 2005, at 2:35 PM, Dale Johannesen wrote:

On Aug 10, 2005, at 12:43 PM, Fariborz Jahanian wrote:


Following patch has exposed an optimization shortcoming:

2005-07-12 Dale Johannesen <[EMAIL PROTECTED]>

  * expr.c (compress_float_constant): Add cost check.
  * config/rs6000.c (rs6000_rtx_cost): Adjust FLOAT_EXTEND cost.

This patch results in generating worse code for the following test case:

1) Test case:

struct S {
  float d1, d2, d3;


I believe you mean double, not float; the RTL snippets you give indicate this.


(insn 12 7 13 0 (set (reg:SF 59)
  (mem/u/i:SF (symbol_ref/u:SI ("*LC0") [flags 0x2]) [0 S4 A32])) -1 (nil)
  (nil))

(insn 13 12 14 0 (set (mem/s/j:DF (reg/f:SI 58 [ D.1929 ]) [0 .d1+0 S8 A32])
  (float_extend:DF (reg:SF 59))) -1 (nil)
  (nil))


However, if you try your example with float as given, you see it does not do a 
direct store of constant 0 with or without the compress_float patch. IMO the
compress_float patch does not really have anything to do with this problem;
before this patch the double case was working well by accident, my patch
exposed a problem farther downstream, which was always there for the float
case.

When I put that patch in, rth remarked:

While I certainly wouldn't expect fold_rtx to find out about this
all by itself, I'd have thought that there would have been a
REG_EQUIV or REG_EQUAL note that indicates that the end result is
the constant (const_double:DF 1.0), and use that in any simplification.

Indeed there is no such note, and I suspect adding it somewhere (expand?) would fix this.
It turned out that cse does put REG_EQUIV on the insn which loads "LC0" into the register, so there is no need to do this. It also tells me that cse is expected to use this information to do the constant propagation (which in the example test case happens in the next few insns). The attached patch accomplishes this task. It is against the Apple local branch. It has been bootstrapped and DejaGnu tested on x86-darwin and ppc-darwin. Note that the patch is similar to the code right before it (which is also shown in this patch), so there is precedent for this type of fix. If this looks reasonable, I will prepare an FSF patch.

ChangeLog:

2005-08-19  Fariborz Jahanian <[EMAIL PROTECTED]>

        * cse.c (cse_insn): Use the constant to propagate
        into the rhs of a set insn which is a register.
        This is cheaper.



Index: cse.c
===
RCS file: /cvs/gcc/gcc/gcc/cse.c,v
retrieving revision 1.342.4.3
diff -c -p -r1.342.4.3 cse.c
*** cse.c   5 Jul 2005 23:21:50 -   1.342.4.3
--- cse.c   19 Aug 2005 18:21:56 -
*** cse_insn (rtx insn, rtx libcall_insn)
*** 5455,5460 
--- 5455,5469 
if (dest == pc_rtx && src_const && GET_CODE (src_const) == LABEL_REF)
src_folded = src_const, src_folded_cost = src_folded_regcost = -1;
  
+   /* APPLE LOCAL begin radar 4153339 */
+   if (n_sets == 1 && GET_CODE (sets[i].src) == REG 
+ && src_const && GET_CODE (src_const) == CONST_DOUBLE)
+   {
+ src_folded = src_const;
+ src_folded_cost = src_folded_regcost = -1;
+   }
+   /* APPLE LOCAL end radar 4153339 */
+ 
/* Terminate loop when replacement made.  This must terminate since
   the current contents will be tested and will always be valid.  */
while (1)
Index: testsuite/ChangeLog.apple-ppc
===
RCS file: /cvs/gcc/gcc/gcc/testsuite/Attic/ChangeLog.apple-ppc,v
retrieving revision 1.1.4.88
diff -c -p -r1.1.4.88 ChangeLog.apple-ppc
*** testsuite/ChangeLog.apple-ppc   15 Aug 2005 21:02:26 -  1.1.4.88
--- testsuite/ChangeLog.apple-ppc   19 Aug 2005 18:21:59 -
***
*** 1,3 
--- 1,8 
+ 2005-08-18  Fariborz Jahanian <[EMAIL PROTECTED]>
+ 
+   Radar 4153339
+   * gcc.dg/i386-movl-float.c: New.
+ 
  2005-08-15  Devang Patel  <[EMAIL PROTECTED]>
  
Radar 4209318
Index: testsuite/gcc.dg/i386-movl-float.c
===
RCS file: testsuite/gcc.dg/i386-movl-float.c
diff -N testsuite/gcc.dg/i386-movl-float.c
*** /dev/null   1 Jan 1970 00:00:00 -
--- testsuite/gcc.dg/i386-movl-float.c  19 Aug 2005 18:22:03 -
***
*** 0 
--- 1,15 
+ /* APPLE LOCAL begin radar 4153339 */
+ /* { dg-do compile { target i?86-*-* x86_64-*-* } } */
+ /* { dg-options "-O1 -mdynamic-no-pic -march=pentium4 -mtune=prescott" } */
+ /* { dg-final { scan-assembler-times "movl\[^\\n\]*" 8} } */
+ 
+ struct S {
+   double d1, d2, d3;
+ };
+ 
+ struct S ms()
+ {
+   struct S s = {0,0,0};
+   return s;
+ }
+ /* APPLE LOCAL end radar 4153339 */



Bug in builtin_floor optimization

2005-08-22 Thread Dale Johannesen

There is some clever code in convert_to_real that converts

double d;
(float)floor(d)

to

floorf((float)d)

(on targets where floor and floorf are considered builtins.)
This is wrong, because the (float)d conversion normally uses
round-to-nearest and can round up to the next integer.  For example:

double d = 1024.0 - 1.0 / 32768.0;
extern double floor(double);
extern float floorf(float);
extern int printf(const char*, ...);

int main() {

double df = floor(d);
float f1 = (float)floor(d);

printf("floor(%f) = %f\n", d, df);
printf("(float)floor(%f) = %f\n", d, f1);

return 0;
}



with -O2.
The transformation is also done for ceil, round, rint, trunc and
nearbyint.  I'm not a math guru, but it looks like ceil, rint, trunc
and nearbyint are also unsafe for this transformation; round may be
salvageable.  Comments?  Should I preserve the buggy behavior with
-ffast-math?


Re: Bug in builtin_floor optimization

2005-08-23 Thread Dale Johannesen


On Aug 23, 2005, at 9:53 AM, Richard Henderson wrote:


On Tue, Aug 23, 2005 at 09:28:50AM -0600, Roger Sayle wrote:

Good catch.  This is indeed a -ffast-math (or more precisely a
flag_unsafe_math_optimizations) transformation.  I'd prefer to
keep these transformations with -ffast-math, as Jan described them
as significantly helping SPEC's mesa when they were added.


Are you sure it was "(float)floor(d)"->"floorf((float)d)" that
helped mesa and not "(float)floor((double)f)"->"floorf(f)" ?


All the floor calls in mesa seem to be of the form (int)floor((double)f)
or (f - floor((double)f)).  (The casts to double are implicit, actually.)



It wouldn't bother me if the first transformation went away
even for -ffast-math.  It seems egregiously wrong.


I think I'd prefer this, given that it is not useful in mesa.  Will put
together a patch.



RFC: bug in combine

2005-08-25 Thread Dale Johannesen

The following demonstrates a bug in combine
(x86 -mtune=pentiumpro -O2):

struct Flags {
 int filler[18];
 unsigned int a:14;
 unsigned int b:14;
 unsigned int c:1;
 unsigned int d:1;
 unsigned int e:1;
 unsigned int f:1;
};
extern int bar(int), baz();
int foo (struct Flags *f) {
  if (f->b > 0)
return bar(f->d);
  return baz();
}



The test of f->b comes out as

  testl  $1048512, 73(%eax)

This is wrong, because a 4-byte access starting at offset 73 goes
outside the original object and can cause a page fault.  The change
from referencing a word at offset 72 to offset 73 happens in
make_extraction in combine, and I propose to fix it thus:

Index: combine.c
===
RCS file: /cvs/gcc/gcc/gcc/combine.c,v
retrieving revision 1.502
diff -u -b -c -3 -p -r1.502 combine.c
cvs diff: conflicting specifications of output style
*** combine.c   8 Aug 2005 18:30:09 -   1.502
--- combine.c   25 Aug 2005 17:57:21 -
*** make_extraction (enum machine_mode mode,
*** 6484,6491 
  && GET_MODE_SIZE (inner_mode) < GET_MODE_SIZE (is_mode))
offset -= GET_MODE_SIZE (is_mode) - GET_MODE_SIZE (inner_mode);
  
!   /* If this is a constant position, we can move to the desired byte.  */
!   if (pos_rtx == 0)
{
  offset += pos / BITS_PER_UNIT;
  pos %= GET_MODE_BITSIZE (wanted_inner_mode);
--- 6484,6493 
  && GET_MODE_SIZE (inner_mode) < GET_MODE_SIZE (is_mode))
offset -= GET_MODE_SIZE (is_mode) - GET_MODE_SIZE (inner_mode);
  
!   /* If this is a constant position, we can move to the desired byte.
!This is unsafe for memory objects; it might result in accesses
!outside the original object.  */
!   if (pos_rtx == 0 && !MEM_P (inner))
{
  offset += pos / BITS_PER_UNIT;
  pos %= GET_MODE_BITSIZE (wanted_inner_mode);



Still testing, but I'm a bit concerned this is overkill.  Are there
targets/situations where this transformation is useful or even
necessary?  Comments?


doloop-opt deficiency

2005-08-29 Thread Dale Johannesen

We noticed that the simple loop here
extern int a[];
int foo(int w) {
  int n = w;
  while (n >= 512)
{
a[n] = 42;
n -= 256;
}
  }



was being treated as ineligible for the doloop modification.  I think
this is a simple pasto; this code was evidently copied from the
previous block:

Index: loop-iv.c
===
RCS file: /cvs/gcc/gcc/gcc/loop-iv.c,v
retrieving revision 2.35
diff -u -b -c -p -r2.35 loop-iv.c
cvs diff: conflicting specifications of output style
*** loop-iv.c   21 Jul 2005 07:24:07 -  2.35
--- loop-iv.c   29 Aug 2005 23:34:12 -
*** iv_number_of_iterations (struct loop *lo
*** 2417,2423 
  tmp0 = lowpart_subreg (mode, iv0.base, comp_mode);
  tmp1 = lowpart_subreg (mode, iv1.base, comp_mode);
  
! bound = simplify_gen_binary (MINUS, mode, mode_mmin,
   lowpart_subreg (mode, step, comp_mode));
  if (step_is_pow2)
{
--- 2417,2423 
  tmp0 = lowpart_subreg (mode, iv0.base, comp_mode);
  tmp1 = lowpart_subreg (mode, iv1.base, comp_mode);
  
! bound = simplify_gen_binary (PLUS, mode, mode_mmin,
   lowpart_subreg (mode, step, comp_mode));
  if (step_is_pow2)
{



The code as it was computed -2147483648 - 256, which overflows.
Still testing, but is there anything obvious wrong with this?


Re: doloop-opt deficiency

2005-08-30 Thread Dale Johannesen
extern int a[];
int foo(int w) {
int n = w;
while (n >= 512)
{
a[n] = 42;
n -= 256;
}
}

On Aug 30, 2005, at 9:25 AM, Sebastian Pop wrote:

Thanks for looking at this.  But...

Dale Johannesen wrote:
I think this is a simple pasto; this code was evidently copied from
the previous block:

I don't think that this was a simple pasto.  The code looks correct.
We have the same code in tree-ssa-loop-niter.c around line 436, since
we inherited this code from the rtl-level.

No, look closer.  The version in loop-iv.c does a NEG of 'step'  just before what's
shown here.  The version in tree-ssa-loop-niter.c doesn't.   Reversing the
operator does make them do the same thing.  As a sanity check, try the same
loop going the other direction:

extern int a[];
int foo(int w) {
int n = w;
while (n <= 512)
{
a[n] = 42;
n += 256;
}
}

and you'll see it does do the doloop transformation.


Re: rtl line no

2005-09-11 Thread Dale Johannesen


On Sep 11, 2005, at 8:09 AM, shreyas krishnan wrote:


Hi,
  Can anyone tell me if there is a way to find out roughly the
source line number of a particular rtl instruction (if there is)?  I
believe the tree has a link to the source line number; in which case,
how do I


See INSN_LOCATOR and locator_line().



RFA: pervasive SSE codegen inefficiency

2005-09-14 Thread Dale Johannesen

Consider the following SSE code
(-march=pentium4 -mtune=prescott -O2 -mfpmath=sse -msse2)
#include 
__m128i foo3(__m128i z, __m128i a, int N) {
int i;
for (i=0; i


The first inner loop compiles to

paddq   %xmm0, %xmm1

Good.  The second compiles to

movdqa  %xmm2, %xmm0
paddw   %xmm1, %xmm0
movdqa  %xmm0, %xmm1

when it could be using a single paddw.  The basic problem is that
our approach defines __m128i to be V2DI even though all the operations
on the object are V4SI, so there are a lot of subregs that don't need
to generate code.  I'd like to fix this, but am not sure how to go
about it.

The pattern-matching and RTL optimizers seem quite hostile to mismatched
mode operations.  If I were starting from scratch I'd define a single
V128I mode and distinguish paddw and paddq by operation codes, or
possibly by using subreg:SSEMODEI throughout the patterns.  Any less
intrusive ideas?  Thanks.


(ISTR some earlier discussion about this but can't find it; apologies if
I'm reopening something that shouldn't be:)


Re: RFA: pervasive SSE codegen inefficiency

2005-09-15 Thread Dale Johannesen


On Sep 14, 2005, at 9:50 PM, Andrew Pinski wrote:

On Sep 14, 2005, at 9:21 PM, Dale Johannesen wrote:

Consider the following SSE code
(-march=pentium4 -mtune=prescott -O2 -mfpmath=sse -msse2)
<4256776a.c>

The first inner loop compiles to

paddq   %xmm0, %xmm1

Good.  The second compiles to

movdqa  %xmm2, %xmm0
paddw   %xmm1, %xmm0
movdqa  %xmm0, %xmm1

when it could be using a single paddw.  The basic problem is that
our approach defines __m128i to be V2DI even though all the operations
on the object are V4SI, so there are a lot of subreg's that don't need
to generate code.  I'd like to fix this, but am not sure how to go 
about it.


From the real looks of this, it looks more like a register allocation
issue and nothing to do with subregs at all, except the subregs being there.


That's kind of an overstatement; obviously getting rid of the subregs
would solve the problem, as you can see from the first function.  I
think you're right that


If we allocated 64 and 63 as the same register, it would have worked 
correctly.


(you mean 64 and 66) would fix this example; I'll look at that.  Having
a more uniform representation for operations on __m128i objects would
simplify things all over the place, though.
all over the place, though.



Re: RFA: pervasive SSE codegen inefficiency

2005-09-19 Thread Dale Johannesen

Just to review, the second function here was the problem:
(-march=pentium4 -mtune=prescott -O2 -mfpmath=sse -msse2)
#include 
__m128i foo3(__m128i z, __m128i a, int N) {
int i;
for (i=0; i


where the inner loop compiles to

movdqa  %xmm2, %xmm0
paddw   %xmm1, %xmm0
movdqa  %xmm0, %xmm1

instead of a single paddw.  The response was that I should look at the
register allocator.

OK.  Rtl coming in looks like:

R70:v8hi  <-  R59:v8hi + subreg:v8hi (R66:v2di)
R66:v2di <- subreg:v2di(R70:v8hi)

where R70 is used only in these 2 insns, and R66 is live on entry and
exit to the loop.  First, local-alloc picks a hard reg (R21) for R70.
Global has some code that tries to assign R66 to the same hard regs as
things that R66 is copied to (copy_preference); that code doesn't look
under subregs, so isn't triggered in this rtl.  It's straightforward to
extend this code to look under subregs, and that works for this example.
(Although just which subregs are safe to look under will require more
attention than I've given it, if we want this in.)


However, that's not the whole problem.  When we have two accumulators 
in the loop:


#include 

__m128i foo1(__m128i z, __m128i a, __m128i b, int N) {
int i;
for (i=0; i


R70:v8hi  <-  R59:v8hi + subreg:v8hi (R66:v2di)
R66:v2di <- subreg:v2di(R70:v8hi)
R72:v8hi  <-  R61:v8hi + subreg:v8hi (R68:v2di)
R68:v2di <- subreg:v2di(R72:v8hi)

local-alloc assigns the same reg (R21) to R70 and R72.  This means R21
conflicts with both R66 and R68, so is not considered for either of
them, and the copy_preference optimization isn't invoked.  I don't see
a way to fix that in global.  Doing round-robin allocation in
local-alloc would alleviate that...for a while, until the block gets
big enough that registers are reused; that's not a complete solution.

Really I don't think this is an RA problem at all.  We ought to be able
to combine these patterns no matter what the RA does.  The following
pattern makes combine do it:


(define_insn "*addmixed3"
  [(set (match_operand:V2DI 0 "register_operand" "=x")
(subreg:V2DI (plus:SSEMODE124
  (match_operand:SSEMODE124 2 "nonimmediate_operand" "xm")
  (subreg:SSEMODE124 (match_operand:V2DI 1 "nonimmediate_operand" "%0") 
0)) 0))]
  "TARGET_SSE2 && ix86_binary_operator_ok (PLUS, mode, operands)"
  "padd\t{%2, %0|%0, %2}"
  [(set_attr "type" "sseiadd")
   (set_attr "mode" "TI")])



I'm not very happy about this because it's really not an x86 problem
either, at least in theory, but flushing the problem down to the RA
doesn't look profitable.  Comments?


Re: RFA: pervasive SSE codegen inefficiency

2005-09-19 Thread Dale Johannesen


On Sep 19, 2005, at 5:30 PM, Richard Henderson wrote:

(define_insn "*addmixed3"
  [(set (match_operand:V2DI 0 "register_operand" "=x")
(subreg:V2DI (plus:SSEMODE124
  (match_operand:SSEMODE124 2 "nonimmediate_operand" "xm")
	  (subreg:SSEMODE124 (match_operand:V2DI 1 "nonimmediate_operand" 
"%0") 0)) 0))]


I absolutely will not allow you do add 5000 of these patterns.
Which is what you'll need if you think you'll be able to solve
the problem this way.


Do you have any constructive suggestions for how the RA might be fixed, 
then?




Re: RFA: pervasive SSE codegen inefficiency

2005-09-20 Thread Dale Johannesen

On Sep 19, 2005, at 9:15 PM, Richard Henderson wrote:

On Mon, Sep 19, 2005 at 05:33:54PM -0700, Dale Johannesen wrote:
Do you have any constructive suggestions for how the RA might be 
fixed,

then?


Short term?  No.  But I don't see this as a short term problem.


OK.  Unfortunately, it is a short term problem for Apple.  I don't know
how to fix it in the RA and it looks like nobody else does either, so
I'll have to do something local, I guess.  (Thanks Daniel and Giovanni;
suggestions for incremental updates that don't address this problem are
not really what I was looking for here.)



x86 SSE constants

2005-09-30 Thread Dale Johannesen

The C constraint on x86 is defined, in both the doc and the comments, as
"constant that can be easily constructed in SSE register without loading
from memory".  Currently the only one handled is 0, but there is at
least one more, all 1 bits, which is constructed by
   pcmpeqd  %xmm, %xmm
Unfortunately there are quite a few places in the patterns that assume C
means zero, and generate pxor or something like that.  What would be
the preferred way to fix this, a new constraint or changing the existing
patterns?




Re: x86 SSE constants

2005-09-30 Thread Dale Johannesen


On Sep 30, 2005, at 4:17 PM, Jan Hubicka wrote:

The C constraint on x86 is defined, in both the doc and the comments, 
as
"constant that can be easily constructed in SSE register without 
loading

from memory".   Currently the only one handled is 0, but there is at
least
one more, all 1 bits, which is constructed by
   pcmpeqd  %xmm, %xmm
 Unfortunately there are quite a few places in the patterns that 
assume

C
means zero, and generate pxor or something like that.  What would be
the preferred way to fix this, new constraint or change the existing
patterns?

My original plan was to add pcmpeqd by extending the 'C' constraint and
the patterns where pxor/xorp? is currently generated unconditionally.
This is pretty similar to what we do for i387 constants as well.  I
never actually got to realizing this (for the scalar FP work I was
mostly interested in at that time it was not all that interesting), but
I think there is nothing in the md file preventing it (or I just missed
it when it was added :)...


No, there isn't, but it might be a smaller change to add a new
constraint; having constraints tied to specific constants is pretty
ugly, but so is having (if (constant value==0)) in a lot of patterns...




RFC: redundant stores in C++

2005-10-01 Thread Dale Johannesen

In C++, when we have an automatic array with variable initializers:

void bar(char[4]);
void foo(char a, char b, char c, char d) {
  char x[4] = { a, b, c, d };
  bar(x);
}

the C++ FE generates 32-bit store(s) of 0 for the entire array, followed
by stores of the individual elements.  In the case above, where the
elements are not 32 bits, the optimizers do not figure out they can
eliminate the redundant store(s) of 0.  The C FE does not generate that
to begin with, and the C++ FE should not either.  This is not my native
habitat, but I think this is the right general idea:
*** typeck2.c   Thu Aug  4 17:52:43 2005
--- /Network/Servers/harris/Volumes/haus/johannes/temp/typeck2.cSat Oct 
 1 14:44:46 2005
*** split_nonconstant_init (tree dest, tree 
*** 534,540 
code = push_stmt_list ();
split_nonconstant_init_1 (dest, init);
code = pop_stmt_list (code);
!   DECL_INITIAL (dest) = init;
TREE_READONLY (dest) = 0;
  }
else
--- 534,551 
code = push_stmt_list ();
split_nonconstant_init_1 (dest, init);
code = pop_stmt_list (code);
!   /* APPLE LOCAL begin */
!   /* If the constructor now doesn't construct anything, that
!means constant 0 for the entire object.  We don't need
!to do this for non-statically-allocated objects.
!Functionally it is harmless, but leads to inferior code
!in cases where the optimizers don't get rid of the
!redundant stores of 0.  */
!   if (TREE_CODE (dest) != VAR_DECL 
! || TREE_STATIC (dest)
! || CONSTRUCTOR_ELTS (init) != 0)
!   DECL_INITIAL (dest) = init;
!   /* APPLE LOCAL end */
TREE_READONLY (dest) = 0;
  }
else


Testsuite passes with this, but I can believe improvements are possible;
comments?




Re: RFC: redundant stores in C++

2005-10-01 Thread Dale Johannesen


On Oct 1, 2005, at 7:29 PM, Andrew Pinski wrote:

I don't think this will work for the following code:

void foo(char a, char b) {
  char x[4] = { a, b } ;
  if (x[3] != 0)
   abort ();
}


Duh.  I thought that was too easy.


But a better fix would be to not call split_nonconstant_init_1 for
local decls and have the front-end produce a CONSTRUCTOR which is
just like what the C front-end produces.


I'll try it.



Re: Should -msse3 enable fisttp

2005-10-04 Thread Dale Johannesen

On Oct 3, 2005, at 3:49 PM, Andrew Pinski wrote:

On Oct 3, 2005, at 6:41 PM, Evan Cheng wrote:
But according to the manual -msse3 does not turn on generation of 
SSE3 instructions:


The manual is semi-confusing; I had forgotten about that.
There is a bug about the issue recorded as PR 23809:
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=23809


I'm a little disappointed the behavior of the compiler was changed
without addressing this.  Maybe somebody could review the patch in that
radar?



Re: RFC: redundant stores in C++

2005-10-04 Thread Dale Johannesen


On Oct 1, 2005, at 8:41 PM, Andrew Pinski wrote:

On Oct 1, 2005, at 11:10 PM, Dale Johannesen wrote:

But a better fix would be to not call split_nonconstant_init_1 for
local decls and have the front-end produce a CONSTRUCTOR which is
just like what the C front-end produces.


I'll try it.


This patch should fix the problem and also fixes FSF PR 8045 at
the same time.  FSF PR 8045 is about a missing unused-variable
warning caused by this code.

This patch makes us more similar to the C front-end.  It should
also save us some compile time and memory when gimplifying.

Note I have not tested this yet by either looking at the code gen
or even compiling it.

I will be doing a bootstrap/test of this right now.

-- Pinski
Index: typeck2.c
===
RCS file: /cvs/gcc/gcc/gcc/cp/typeck2.c,v
retrieving revision 1.192
diff -u -p -r1.192 typeck2.c
--- typeck2.c   1 Aug 2005 04:02:26 -   1.192
+++ typeck2.c   2 Oct 2005 03:36:41 -
@@ -613,10 +613,13 @@ store_init_value (tree decl, tree init)
   value = digest_init (type, init);
   /* If the initializer is not a constant, fill in DECL_INITIAL with
  the bits that are constant, and then return an expression that
- will perform the dynamic initialization.  */
+ will perform the dynamic initialization.  We don't have to do this
+ for local variables either.  */
   if (value != error_mark_node
   && (TREE_SIDE_EFFECTS (value)
-  || ! initializer_constant_valid_p (value, TREE_TYPE (value
+  || ! initializer_constant_valid_p (value, TREE_TYPE (value)))
+  && (TREE_CODE (decl) != VAR_DECL
+  || TREE_STATIC (dest)))
 return split_nonconstant_init (decl, value);
   /* If the value is a constant, just put it in DECL_INITIAL.  If DECL
  is an automatic variable, the middle end will turn this into a



Thanks.  The last line of this patch should use "decl", not "dest".
With that obvious change it tests OK in Apple's branch.  (I cannot
build mainline on darwin x86 at the moment for unrelated reasons -
nothing that means there is a problem in mainline, I don't think.)


Re: Need advice: x86 redundant compare to zero

2005-10-13 Thread Dale Johannesen


My question is: where and how would you suggest we do this 
optimization. With peephole2? Or in combine? In i386.md, I see pattern 
*subsi_2 looks like what I'd like to combine these two insn into:


(define_insn "*subsi_2"
  [(set (reg FLAGS_REG)
(compare
  (minus:SI (match_operand:SI 1 "nonimmediate_operand" "0,0")
(match_operand:SI 2 "general_operand" "ri,rm"))
  (const_int 0)))
   (set (match_operand:SI 0 "nonimmediate_operand" "=rm,r")
(minus:SI (match_dup 1) (match_dup 2)))]
  "ix86_match_ccmode (insn, CCGOCmode)
   && ix86_binary_operator_ok (MINUS, SImode, operands)"
  "sub{l}\t{%2, %0|%0, %2}"
  [(set_attr "type" "alu")
   (set_attr "mode" "SI")])


That's quite similar to several PPC patterns for andi., and they work.
If you've got two other insns that look like the set's I'd expect
combine to merge them, and would look to see why it doesn't.



Re: backslash whitespace newline

2005-10-26 Thread Dale Johannesen


On Oct 25, 2005, at 5:40 PM, Joe Buck wrote:

The problem, I think, is that the behavior of both GCC *and* the
other compilers does not serve the users.

The reason is that there simply isn't any reason why a user would
use a backslash to continue a C++ comment on purpose, and plenty of
reason why she might do it by accident.

...users think they can put anything in a comment.  A backslash at the 
end is likely to be an accident,

since just starting the next line with a // is easy enough.


Yes.  From the user's point of view, the best thing appears to be
treating backslashes in C++ comments as part of the comment,
regardless of what follows them; that seems to follow the principle
of least surprise. That's not standard conforming, and therefore I'm
not advocating it for gcc, but it probably wouldn't break anything
outside compiler testsuites.  Maybe this treatment should be made
standard conforming...?



Re: Link-time optimzation

2005-11-17 Thread Dale Johannesen

On Nov 17, 2005, at 3:09 PM, Robert Dewar wrote:

Richard Earnshaw wrote:


We spend a lot of time printing out the results of compilation as
assembly language, only to have to parse it all again in the assembler.



I never like arguments which have loaded words like "lot" without
quantification. Just how long *is* spent in this step, is it really
significant?


When I arrived at Apple around 5 years ago, I was told of some recent
measurements that showed the assembler took around 5% of the time.
Don't know if that's still accurate.  Of course the speed of the
assembler is also relevant, and our stubs and lazy pointers probably
mean Apple's .s files are bigger than other people's.



Re: identifying c++ aliasing violations

2005-12-05 Thread Dale Johannesen

On Dec 5, 2005, at 12:03 AM, Giovanni Bajo wrote:

Jack Howarth <[EMAIL PROTECTED]> wrote:

What exactly is the implication of having a hundred or more of these in
an application being built with gcc/g++ 4.x at -O3?  Does it only risk
random crashes in the generated code, or does it also impact the
quality of the generated code in terms of execution speed?


The main problem is wrong-code generation.  Assuming the warning is
right and does not mark false positives, you should have those fixed.
I don't think quality of the generated code would be better with this
change.

However, it's pretty strange that C++ code generation is worse with
GCC 4: I saw many C++ programs which actually got much faster due to
higher-level optimizations (such as SRA).  You should really try to
identify inner loops which might have been slowed down and submit
those as bug reports in our Bugzilla.


Could also be inlining differences, and you might check out whether
-fno-threadsafe-statics is applicable; that can make a big difference.
Bottom line, you're going to have to do some analysis to figure out
why it got slower.  (It sounds like you're on a MacOSX system, in
which case Shark is a good tool for this.)



Re: Performance comparison of gcc releases

2005-12-16 Thread Dale Johannesen


On Dec 16, 2005, at 10:31 AM, Dan Kegel wrote:


Ronny Peine wrote:
-ftree-loop-linear is removed from the testing flags in gcc-4.0.2
because it leads to an endless loop in the neural net test in nbench.


Could you file a bug report for this one?


Done.


This is probably the same as 20256.


Your PR is a bit short on details.  For instance, it'd be nice to
include a link to the source for nbench, so people don't have
to guess what version you're using.  Was it
  http://www.tux.org/~mayer/linux/nbench-byte-2.2.2.tar.gz
?

It'd be even more helpful if you included a recipe a sleepy person
could use to reproduce the problem.  In this case,
something like

wget http://www.tux.org/~mayer/linux/nbench-byte-2.2.2.tar.gz
tar -xzvf nbench-byte-2.2.2.tar.gz
cd nbench-byte-2.2.2
make CC=gcc-4.0.1  CFLAGS="-ftree-loop-linear"

Unfortunately, I couldn't reproduce your problem with that command.
Can you give me any tips?

Finally, it's helpful when replying to the list about filing a PR
to include the PR number or a link to the PR.
The shortest link is just gcc.gnu.org/PR%d, e.g.
   http://gcc.gnu.org/PR25449
- Dan

--
Wine for Windows ISVs: http://kegel.com/wine/isv




Re: Corrupted Profile Information

2006-01-27 Thread Dale Johannesen


On Jan 26, 2006, at 4:05 PM, [EMAIL PROTECTED] wrote:

 I really need correct profile information before PRE. By moving
rest_of_handle_branch_prob() just before rest_of_handle_gcse() have I
violated some critical assumptions which is causing the profile
information to be occasionally corrupted ?


Yes; various CFG transformations before the profiling phase don't 
maintain the profiling info, because there isn't any.  In gcc-4 the 
profiling phase has been moved much earlier and this information is 
maintained by the later transformations.  Backporting all that logic to 
3.4 might be possible, but is not easy.  You're better off using gcc-4.




Re: x86-64, I definitely can't make sense out of that

2006-02-04 Thread Dale Johannesen


On Feb 4, 2006, at 7:06 AM, Andrew Pinski wrote:

signs_all[4] = { !(sx > 0), !(sy > 0), !(sz > 0),  0 },



C++ front-end produces:
<<< Unknown tree: expr_stmt  signs_all[0] = (int) sx <= 0 >>>;
<<< Unknown tree: expr_stmt  signs_all[1] = (int) sy <= 0 >>>;
<<< Unknown tree: expr_stmt  signs_all[2] = (int) sz <= 0 >>>;

While the C front-end is producing:
const int signs_all[4] = {(int) sx <= 0, (int) sy <= 0, (int) sz <= 0, 0};


Dale Johannesen and I came up with a patch to the C++ front-end
for this except it did not work with some C++ cases.


Yes, we had it in Apple's branch for a while and had to back it out.  
The place to look is split_nonconstant_init in cp/typeck2.c if you want 
to try.  The tricky part is making sure the entire object is 
initialized in all cases when only a partial initializer is specified.




Re: x86 -ffast-math problem on SPEC CPU 2K

2006-02-23 Thread Dale Johannesen


On Feb 23, 2006, at 8:54 AM, H. J. Lu wrote:


When I use -O2 -mtune=pentium4 -ffast-math on SPEC CPU 2K on Linux/x86
with gcc 4.2, I get

*** Miscompare of 200.s, see
/export/spec/src/2000/spec/benchspec/CINT2000/176.gcc/run/0004/200.s.mis

*** Miscompare of scilab.s, see
/export/spec/src/2000/spec/benchspec/CINT2000/176.gcc/run/0004/scilab.s.mis


Is that a known issue?


This is what you get if the benchmark source thinks the host is of the  
wrong endianness.

Do you have -DHOST_WORDS_BIG_ENDIAN in your config file perhaps?



Re: documentation on inlining model needed

2006-03-07 Thread Dale Johannesen


On Mar 7, 2006, at 12:28 AM, Yang Yang wrote:


Recently, I've become very interested in the inlining model of gcc.  I
need a detailed document describing how inlining is implemented in
gcc 4.0.  Anybody who has been or is working on it, please send me
documentation.  I'd really appreciate your help.


There is no such documentation; you're going to have to look at the  
source.  The mechanism of  actually duplicating a function body and  
substituting it for a call is in tree-inline.c.  The decision about  
which calls to expand inline is made in cgraph.c and cgraphunit.c.




Re: "Experimental" features in releases

2006-04-17 Thread Dale Johannesen

On Apr 17, 2006, at 11:52 AM, Mark Mitchell wrote:

Dan Berlin and I exchanged some email about PR 26435, which concerns a
bug in -ftree-loop-linear, and we now think it would make sense to
have a broader discussion.

The PR in question is about an ice-on-valid regression in 4.1, when
using -O1 -ftree-loop-linear.  Dan notes that this optimization option
is "experimental", but I didn't see that reflected in the
documentation, which says:


@item -ftree-loop-linear
Perform linear loop transformations on tree.  This flag can improve
cache performance and allow further loop optimizations to take place.


I wasn't aware that it was supposed to be experimental either, and it  
wasn't explained that way when it went in (Sep 2004).  (Incomplete or  
buggy would not be surprising, but it sounds now like we're talking  
about fatally flawed design, which is different.)



In any case, the broader question is: to what extent should we have
experimental options in releases, and how should we warn users of
their experimental nature?


In general I would agree in principle with Diego that such features  
don't belong in releases, but this isn't the first time features have  
been found to be buggy after they've gone in.  -frename-registers  
comes to mind; in that case, the bugginess was documented for several  
releases, and that warning has recently been removed as the bugs are  
believed to be fixed.


This optimization is worth about a 5x speedup on one of the SPECmarks  
(see discussion in archives), so IMO we should consider carefully  
before removing it.  It was in 4.0 and 4.1 releases.


My suggestion is that features that are clearly experimental (like
this one) should be (a) documented as such, and (b) should generate a
warning, like:

  warning: -ftree-loop-linear is an experimental feature and is not
recommended for production use


Looks good to me.



Re: "Experimental" features in releases

2006-04-17 Thread Dale Johannesen


On Apr 17, 2006, at 2:31 PM, Richard Guenther wrote:


On 4/18/06, Ivan Novick <[EMAIL PROTECTED]> wrote:
I am a gcc user at a financial institution and IMHO it would not be a
good idea to have non-production-ready functionality in gcc.  We are
trying to use gcc for mission-critical functionality.


It has always been the case that additional options not enabled at
any regular -O level get less testing and more likely have bugs.  So
for mission-critical functionality I would strongly suggest staying
with -O2 and not trying to rely on not-thoroughly-tested combinations
of optimization options.


I'd go further:  you should not be trusting a compiler (gcc or any  
other) to be correct in "mission critical" situations.  Finding a  
compiler without bugs is not a realistic expectation.  Every compiler  
release I'm familiar with has had bugs.



So from my point of view, the situation with -ftree-loop-linear is
fine - it's ICEing after all, not producing silently wrong code.  For
experimental options (where I would include all options not enabled
by -O[123s]) known wrong-code bugs should be fixed.


The case of this in 20256 did produce silent bad code when it was  
reported, but that seems to have changed.




Re: "Experimental" features in releases

2006-04-19 Thread Dale Johannesen

On Apr 19, 2006, at 12:04 AM, Kai Henningsen wrote:
[EMAIL PROTECTED] (Daniel Berlin)  wrote on 18.04.06 in  
<[EMAIL PROTECTED]>:


This is, in fact, not terribly surprising, since the algorithm used
was the result of Sebastian and I sitting at my whiteboard for 30
minutes trying to figure out what we'd need to do to make swim happy :).


This would leave -ftree-loop-linear in 4.2, but make it not useful
for increasing SPEC scores.


So is this an object lesson for why optimizing for benchmarks is a bad
idea?


If you're inclined to believe this, you could find a confirming  
instance here, but there are other lessons that could be drawn.  If  
you go back to the original thread, you'll see this from Toon Moene:

http://gcc.gnu.org/ml/gcc-patches/2004-09/msg00256.html
It didn't have to be a benchmark-only optimization.



Re: "Experimental" features in releases

2006-04-19 Thread Dale Johannesen

On Apr 19, 2006, at 11:52 AM, Daniel Berlin wrote:
So is this an object lesson for why optimizing for benchmarks is a bad
idea?


If you're inclined to believe this, you could find a confirming
instance here, but there are other lessons that could be drawn.  If
you go back to the original thread, you'll see this from Toon Moene:
http://gcc.gnu.org/ml/gcc-patches/2004-09/msg00256.html
It didn't have to be a benchmark-only optimization.


It isn't a benchmark only optimization.  Only the perfect nest
conversion was targeted for the benchmarks, because it was necessary.

The rest uses standard spatial optimality metrics to decide whether it
makes sense to interchange loops or not, and *that* works great on
fortran applications (except for a few other random bugs).


OK, I didn't get that.



Re: address order and BB numbering

2006-05-19 Thread Dale Johannesen


On May 19, 2006, at 12:48 PM, sean yang wrote:

Although "the BASIC_BLOCK array contains BBs in an unspecified order,"
as the GCC internals doc says, can I assume that the final virtual
address for an instruction in BB_m is always higher than the virtual
address for an instruction in BB_n, when m < n?  (Let's assume the
linker for the target machine produces code from low address to high
address.)


Definitely not.
Various phases that need to know the order of insns produce a CUID  
for that phase, but it is not maintained globally.



