Re: A question about DCE

2008-10-15 Thread Trevor_Smigiel
Revital,

* Revital1 Eres <[EMAIL PROTECTED]> [2008-10-15 02:20]:
> 
> r77 is defined as a 'fixed register', which is a register that the register
> allocator cannot use (triggered by the SPU option -mfixed-range).
> r77 is used to pass information to some other routine at run-time (the
> next instruction is a branch to this routine; the routine does not exist
> at the compile time of the function which contains this instruction).

You are using r77 as a function argument for this other routine.  I
believe there is a way to tell GCC that this function uses a special
ABI, i.e., that it uses different parameter registers.

Trevor




Request for acceptance of new port (Cell SPU)

2006-10-09 Thread trevor_smigiel
Dear Steering Committee,

We, Sony Computer Entertainment, would like to contribute a port for a
new target, the Cell SPU, and seek acceptance from the Steering
Committee to do so.

(David Edelsohn indicated that before submitting patches we should
request acceptance for the new port from the Steering Committee.)

Thank you,
Trevor Smigiel



aligned attribute and the new operator (pr/15795)

2006-10-09 Thread trevor_smigiel
Hi,

I would like to reopen the discussion for pr/15795, or at least get
clarification on the current resolution of WONTFIX.
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=15795

Let me state right at the beginning that I am also volunteering to do
the actual work to come up with an agreed solution and implement it.

Briefly, pr/15795 was submitted to address the issue of calling the new
operator for a class which has an alignment greater than that guaranteed
by the underlying malloc().  It specifically talks about alignment for
vector types, but it applies to any alignment.

For example, a 16 byte vector type typically requires 16 byte alignment
and malloc() might only guarantee 8 byte alignment.  When a class
contains that vector the class requires alignment of 16, but a new
operator that calls that malloc() will not always return properly
aligned memory.
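
To make the example concrete, here is a small C++ sketch.  `Vec4`, `Particle` and `is_aligned` are hypothetical names, and `alignas(16)` stands in for a target's 16 byte vector type.  Any class that contains such a member inherits its alignment requirement, and a malloc()-based operator new then has to satisfy that requirement.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical 16 byte vector type; alignas(16) models the target's
// vector alignment requirement.
struct alignas(16) Vec4 {
    float lanes[4];
};

// A class containing a Vec4 inherits the 16 byte alignment requirement,
// even though its other members only need 4 byte alignment.
struct Particle {
    int id;
    Vec4 position;
};

static_assert(alignof(Particle) == 16, "Particle must be 16 byte aligned");

// True if p satisfies the given (power of two) alignment.
bool is_aligned(const void* p, std::size_t align) {
    return reinterpret_cast<std::uintptr_t>(p) % align == 0;
}
```

If the malloc() underneath `new Particle` only guarantees 8 byte alignment, `is_aligned(p, alignof(Particle))` can be false for the pointer it returns.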

The pr proposes some solutions:

  solution:  operator new should always return 16 byte aligned memory
  response:  glibc doesn't do that, neither should libstdc++

I agree with this response.  A solution that handles any alignment
is better; simply changing the default isn't sufficient.

  solution:  create an additional signature for operator new which
             has a parameter that indicates the alignment.  The compiler
             should call this version of the operator when necessary and
             the standard library should provide an appropriate
             implementation.
  response:  "would interact badly with overriding the default operator
             new."

At first glance, this could break some existing code, but I think
this solution could be made to work.  I discuss how below.

  solution:  the library will provide an implementation of a new operator
             with an additional parameter, but the user is responsible for
             calling it.
  response:  doesn't work well with existing template class libraries,
             like STL or Qt.

I agree with this response.  (There was no agreement or disagreement
with this response in the pr.)
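
The second proposed solution is close to what C++17 later standardized as `operator new(std::size_t, std::align_val_t)`: the compiler passes the alignment automatically for types whose alignment exceeds `__STDCPP_DEFAULT_NEW_ALIGNMENT__`.  The following is a minimal sketch of a class-specific version.  It assumes a C++17 compiler and a C library that provides `aligned_alloc`.  `Block` and the call counter are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdlib>
#include <new>

// Counts calls so the example can observe which allocation function ran.
static int aligned_new_calls = 0;

// 64 bytes exceeds the default new alignment on common targets, so the
// compiler passes the alignment to operator new.
struct alignas(64) Block {
    float data[16];

    // Class-specific aligned allocation function (C++17 signature).
    static void* operator new(std::size_t size, std::align_val_t al) {
        ++aligned_new_calls;
        std::size_t align = static_cast<std::size_t>(al);
        // aligned_alloc expects size to be a multiple of the alignment.
        std::size_t rounded = (size + align - 1) / align * align;
        void* p = std::aligned_alloc(align, rounded);
        if (p == nullptr) throw std::bad_alloc();
        return p;
    }

    // Matching deallocation function.
    static void operator delete(void* p, std::align_val_t) noexcept {
        std::free(p);
    }
};

static_assert(alignof(Block) > __STDCPP_DEFAULT_NEW_ALIGNMENT__,
              "Block must be over-aligned for the aligned form to be used");
```

A plain `new Block` needs no special syntax at the call site, which addresses the objection to the third solution about template libraries.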

For the second proposed solution, let's define precisely the cases that
"interact badly".  I can think of only one case, defined by these
conditions:
  - a type has an alignment greater than what malloc() guarantees
  - operator new for that type does not call the default implementation
    of operator new
  - an object of that type is created using operator new
  - an appropriate implementation of operator new with the additional
    alignment parameter is not provided for this type.
 
Clearly, if the compiler does not call the user defined version of
operator new in this case, the code is likely to break.  Are there
other cases which "interact badly"?

I'm hoping a solution to this case is as simple as:
  - when the compiler would call the aligned version of operator new, it
    first checks which definition of operator new would have been called
    if the aligned version were not needed.  If there is an aligned
    version of operator new in the same place*, call it; otherwise call
    the non-aligned version and issue a warning.  (* for "place",
    substitute the appropriate C++ term, e.g., namespace, class, scope.)
  - the default version of operator new and the aligned version of
    operator new should be defined in the same section.  That way,
    when a user overrides the default operator new, they will get
    a link error (duplicate definitions of new) unless they also
    define the aligned version of operator new.
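
For comparison, the C++17 rules handle the class-scope part of this case much as the first bullet proposes, with the class playing the role of "place".  Lookup finds the class's own operator new, the call with an alignment argument fails to match, and the compiler falls back to the non-aligned version.  GCC documents `-Waligned-new=all` for warning about such class member allocation functions.  This sketch assumes C++17, and `Legacy` is a hypothetical name.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <new>

static int plain_new_calls = 0;

// An over-aligned class that, like much pre-existing code, overrides only
// the classic operator new (no alignment parameter).
struct alignas(64) Legacy {
    float data[16];

    static void* operator new(std::size_t size) {
        ++plain_new_calls;
        void* p = std::malloc(size);  // may be less than 64 byte aligned
        if (p == nullptr) throw std::bad_alloc();
        return p;
    }

    static void operator delete(void* p) noexcept {
        std::free(p);
    }
};
```

The user's function still runs, so existing code keeps working.  It may receive under-aligned memory, which matches the "user's problem" case described next.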

Can anyone identify situations where this wouldn't work?  In the
case where it generates a warning the code might not work because
of improper alignment, but at that point I would consider it the
user's problem.

While I'm here let me also point out that an object which is allocated
on the stack and has alignment greater than what the stack guarantees is
also an issue.  I have a patch which fixes this for any alignment,
though it doesn't take advantage of stack ordering.  

Thanks,
Trevor



Re: aligned attribute and the new operator (pr/15795)

2006-10-11 Thread trevor_smigiel
* Ian Lance Taylor <[EMAIL PROTECTED]> [2006-10-10 10:15]:
> [EMAIL PROTECTED] writes:
> 
> >   - the default versions of operator new and the aligned version of
> > operator new should be defined in the same section.  That way,
> > when a user overrides the default operator new, they will get
> > a link error (duplicate definitions of new) unless they also
> > define the aligned version of operator new.
> 
> That would be a bad idea since it would break standard conformant
> programs which only override the default operator new.

Ahh, right.  Regardless, I've realized this technique won't work anyway.
No error would be given when the implementation of operator new is
declared weak, which is common enough.

There are some other possibilities, none of them perfect, like a command
line option, or teaching the linker something about operator new.  

> You are implicitly assuming that we know the alignment of memory
> returned by new.  If we know that, then whenever we need aligned
> memory, we can ask for a bit more memory and force the alignment.
> That will always work and avoids the concerns of overriding new.  We
> can generate more optimal code by providing a command line option
> which specifies the alignment of memory returned by new.

I think I'm assuming the same thing GCC already assumes about new.  I
don't see anything in GCC that attempts to align the result of a call
to operator new.  For some targets it doesn't matter, for example, when
load and store instructions work on unaligned addresses, but pr/15795
points out a case where this doesn't work.  

Forcing the alignment of the result of new isn't going to work because
we have no way to reconstruct the correct address when calling delete.
We could save the correct address at the beginning of the memory
returned, but that is an ABI change.  
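
Here is a minimal C++ sketch of the "save the correct address at the beginning of the memory returned" idea.  The helper names are hypothetical.  The code over-allocates with malloc(), rounds up to the requested alignment, and stores the original pointer in the slot just before the aligned block.  The point is that the matching release function must know about that slot.  That is why doing this behind operator new and operator delete would change the ABI.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdlib>
#include <new>

// Hypothetical helper: return `size` bytes aligned to `align` (a power of
// two), built on malloc(), which may guarantee less alignment.
void* alloc_aligned_with_header(std::size_t size, std::size_t align) {
    assert(align != 0 && (align & (align - 1)) == 0);
    if (align < alignof(void*)) align = alignof(void*);

    // Room for the saved pointer plus the worst-case padding.
    std::size_t extra = sizeof(void*) + align - 1;
    if (size > SIZE_MAX - extra) throw std::bad_alloc();
    void* raw = std::malloc(size + extra);
    if (raw == nullptr) throw std::bad_alloc();

    // Round up past the header slot to the next multiple of align.
    std::uintptr_t start = reinterpret_cast<std::uintptr_t>(raw) + sizeof(void*);
    std::uintptr_t mask = static_cast<std::uintptr_t>(align) - 1;
    std::uintptr_t aligned = (start + mask) & ~mask;

    // Save the address malloc() returned just before the aligned block.
    reinterpret_cast<void**>(aligned)[-1] = raw;
    return reinterpret_cast<void*>(aligned);
}

// The matching release must know about the header; a plain free() or
// operator delete on the aligned pointer would be wrong.
void free_aligned_with_header(void* p) {
    if (p != nullptr) std::free(static_cast<void**>(p)[-1]);
}
```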

If we are willing to consider an ABI change, I think an approach that
allows new to call some form of memalign would be better than having the
compiler force alignment after calling new.  

Are we open to making an ABI change?

For now, I will still promote the original idea of having a second
version of new with an alignment argument, and require a command line
option to emit calls to the aligned version of the default new operator.
Is a command line option sufficient?

Trevor


Re: __builtin_expect for indirect function calls

2008-01-03 Thread trevor_smigiel
If possible, I agree it seems natural to extend __builtin_expect.  My
concern would be backwards compatibility.

Currently, the prototype for __builtin_expect is

long __builtin_expect (long expression, long constant);

Extending it to functions would change it to

T __builtin_expect (T expression, T expected);

With these additional semantics and restrictions:
- when the return value is being used as the function in a call expression:
  * T is the type of 'expression'
  * 'expected' is allowed to be a non-constant
- when the return value is not being used as the function in a call expression:
  * T is type 'long'
  * 'expected' must be a compile-time constant

Given the above definition, I don't think there are any backwards
compatibility issues, because we inspect the context in which
__builtin_expect is used.

Rather than the above definition, we could choose not to inspect the
context and just say:
* T is the type of 'expression'
* 'expected' is allowed to be a non-constant

In this case I think there would only be compatibility issues with
unusual uses of __builtin_expect, for example, if it was being used as a
function argument and its type affected overload resolution, or if the
argument was a float being implicitly converted to a long (with a
warning).  Also, some code that previously produced warnings would no
longer do so with the extended __builtin_expect.
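
Here is a small C++ illustration of the overload-resolution case, using hypothetical names `which` and `pick_overload`.  The current prototype fixes the result type at `long`, so the call below picks the `long` overload even though `x` is an `int`.  Under the context-free definition, `T` would be `int`, and the same source would silently select the `int` overload instead.

```cpp
#include <cassert>

// Two overloads whose selection depends only on the argument type.
static int which(int)  { return 1; }
static int which(long) { return 2; }

// With today's prototype, long __builtin_expect (long, long), the result
// is always long, so this calls which(long).
int pick_overload(int x) {
    return which(__builtin_expect(x, 0));
}
```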

I'm ok with either of these definitions, if extending __builtin_expect
is the preferred way to go.

Is either of these definitions OK?  Or are there other ideas for how to
define it?

Trevor

* Mark Mitchell <[EMAIL PROTECTED]> [2008-01-03 12:12]:
> Hans-Peter Nilsson wrote:
> > On Mon, 17 Dec 2007, [EMAIL PROTECTED] wrote:
> >> When we can't hint the real target, we want to hint the most common
> >> target.   There are potentially clever ways for the compiler to do this
> >> automatically, but I'm most interested in giving the user some way to do
> >> it explicitly.  One possibility is to have something similar to
> >> __builtin_expect, but for functions.  For example, I propose:
> >>
> >>   __builtin_expect_call (FP, PFP)
> > 
> > Is there a hidden benefit?  I mean, isn't this really
> > expressible using builtin_expect as-is, at least when it comes
> > to the syntax?  
> 
> That was my first thought as well.  Before we add __builtin_expect_call,
> I think there needs to be a justification of why this can't be done with
> __builtin_expect as-is.
> 
> -- 
> Mark Mitchell
> CodeSourcery
> [EMAIL PROTECTED]
> (650) 331-3385 x713



Re: __builtin_expect for indirect function calls

2008-01-03 Thread trevor_smigiel
> >
> > which returns the value of FP with the same type as FP, and tells the
> > compiler that PFP is the expected target of FP.  Trivial examples:
> >
> >   typedef void (*fptr_t)(void);
> >
> >   extern void foo(void);
> >
> >   void
> >   call_fp (fptr_t fp)
> >   {
> >     /* Call the function pointed to by fp, but predict it as if it is
> >        calling foo() */
> >     __builtin_expect_call (fp, foo)();
> 
> __builtin_expect (fp, foo);  /* alt __builtin_expect (fp == foo, 1); */
> fp ();

> >   }
> >
> >   void
> >   call_fp_predicted (fptr_t fp, fptr_t predicted)
> >   {
> >     /* same as above but the function we are calling doesn't have to be
> >        known at compile time */
> >     __builtin_expect_call (fp, predicted)();
> 
> __builtin_expect (fp, predicted);
> fp();
> 
> I guess the information just isn't readily available in the
> preferred form when needed and *that* part could more or less
> simply be fixed?


The main reason I didn't like this is that in some other context 'fp'
could be any expression, potentially with side effects.  This would
require either special handling by the compiler, or the user would have
to be sure to write it such that the side effects only happen once.
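
To illustrate the side-effect concern, here is a sketch in which the function-pointer expression has a side effect.  The names `lookup_handler`, `call_naive` and `call_once` are hypothetical.  Writing the hint and the call as two separate uses of the expression, as in the alternative above, evaluates it twice.  To get a single evaluation, the user has to introduce a temporary.

```cpp
#include <cassert>

typedef void (*fptr_t)(void);

static int foo_calls = 0;
static int lookups = 0;

void foo(void) { ++foo_calls; }

// Hypothetical expression with a side effect: each evaluation counts a lookup.
static fptr_t lookup_handler(void) {
    ++lookups;
    return foo;
}

// The alternative applied literally: the hint and the call each evaluate
// the expression, so the side effect happens twice.
void call_naive(void) {
    (void)__builtin_expect(lookup_handler() == foo, 1);
    lookup_handler()();
}

// Evaluate once into a temporary, then hint and call through it.
void call_once(void) {
    fptr_t fp = lookup_handler();
    (void)__builtin_expect(fp == foo, 1);
    fp();
}
```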

Trevor



Re: [RFC] Improve Tree-SSA if-conversion - convergence of efforts

2007-09-12 Thread trevor_smigiel
Tehila asked me a while ago to comment based on my experience with the
RTL if convert pass and the discussions some of us had at the GCC
summit.  Sorry it took me so long to respond.

The target I care about (Cell SPU) has some things that make an
aggressive if convert very useful and profitable.  It has conditional
moves for every mode (there is a single, unified register file), never
traps on illegal addresses (addresses always wrap to the 256KB address
space), and branches are expensive (there is no hardware cache).

The main limitation with the RTL if-convert pass is that it only
recognizes specific patterns.  It is easy to write a complicated if
statement (just using normal C with &&/||) that would never get
converted because it ends up with basic blocks that have many incoming
edges, which if-convert generally doesn't handle.  (In our internal tree we
modified the RTL pass to handle some cases of multiple incoming edges and
any number of insns in a basic block.)

I haven't looked at the tree-SSA if-convert code yet, but based on what
was described to me at the summit it seemed to be taking the same
approach as the RTL pass: recognize certain patterns and convert them.

I would like to see an approach that is able to take an arbitrary flow
graph without backedges and convert it to straight line code, limited
only by the cost model and impossible cases (e.g., inline asm).  
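
Here is a hand-written sketch of what such a transformation produces for a compound condition like the one described above.  `branchy` and `if_converted` are hypothetical names.  The straight-line version evaluates every sub-condition and combines them with bitwise operators instead of short-circuit branches.  It then selects the result with a mask, much as a target with a select or conditional-move instruction would.  This is only legal here because evaluating the extra conditions has no side effects and cannot trap.

```cpp
#include <cassert>
#include <cstdint>

// Branchy original: && and || produce blocks with several incoming edges.
std::int32_t branchy(std::int32_t a, std::int32_t b, std::int32_t c,
                     std::int32_t x, std::int32_t y) {
    if (a > 0 && (b > 0 || c > 0))
        return x;
    return y;
}

// Straight-line equivalent: no branches, just compares, bit ops and a select.
std::int32_t if_converted(std::int32_t a, std::int32_t b, std::int32_t c,
                          std::int32_t x, std::int32_t y) {
    std::uint32_t cond = static_cast<std::uint32_t>(a > 0) &
                         (static_cast<std::uint32_t>(b > 0) |
                          static_cast<std::uint32_t>(c > 0));
    std::uint32_t mask = 0u - cond;  // all ones if cond is 1, else zero
    std::uint32_t ux = static_cast<std::uint32_t>(x);
    std::uint32_t uy = static_cast<std::uint32_t>(y);
    return static_cast<std::int32_t>((ux & mask) | (uy & ~mask));
}
```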

I'm not sure how that would be achieved in a target neutral way.

Trevor


* Tehila Meyzels <[EMAIL PROTECTED]> [2007-07-31 06:50]:
> 
> Hi,
> 
> I'd like to bring up on the list a discussion that a bunch of people (most
> of those CC-ed above) started at the GCC Summit:
> 
> Lately, there have been a few efforts that are not necessarily related to
> each other, but are all relevant to if-conversion.
> Each of them has its own restriction, like a specific control-flow, target
> dependent information, permission to transform speculative loads, etc.
> 
> A few patches that I'm aware of are:
> 1.  Conditional store sinking, by Michael Matz:
> http://gcc.gnu.org/ml/gcc-patches/2007-04/msg00724.html
> 
> 2. If-conversion for multiple IF_THEN_ELSE clauses, by Victor Kaplansky:
> http://gcc.gnu.org/ml/gcc-patches/2006-05/msg00265.html
> Also mentioned here:  http://gcc.gnu.org/wiki/AutovectBranchOptimizations
> (2.3.3)
> 
> 3.  (unconditional) Store sinking (4.1.1 based), by Revital Eres and Victor
> Kaplansky:
> http://gcc.gnu.org/ml/gcc-patches/2006-05/msg00265.html (same patch as
> previous)
> Also mentioned here:  http://gcc.gnu.org/wiki/AutovectBranchOptimizations
> (2.3.2)
> 
> 4. Conditional load hoisting (4.1.1 based), by myself:
> http://gcc.gnu.org/ml/gcc-patches/2007-07/msg02168.html
> 
> 5. Maybe more?
> 
> You're welcome to share your/others related works here...
> 
> 
> I'd like to suggest to converge all these efforts into a single improved
> tree-level if-conversion pass (i.e., on the top of tree-if-conv.c).
> Currently, the tree-level if-conversion pass is quite limited in several
> ways, and mostly with respect to handling of loads/stores (it basically
> doesn't handle them), but not only.
> 
> There are several reasons why store-sinking and load-hoisting should be
> combined with the if-conversion pass:
> 1. Store-sinking and load-hoisting affect one another, and they both can
> create new opportunities for if-conversion (not only in vectorizable loops,
> for example).
> Currently, the load-store motion pass happens too late and thus doesn't
> help the (tree-ssa) if-converter.
> 2. Store-sinking/load hoisting may have an overhead and may degrade
> performance unless the relevant conditional branch gets if-converted.
> 
> Issues/Questions to be considered and discussed:
> 1. Cost model and machine dependency issues:
> - When is it profitable to perform these motions? What is the algorithm
> to decide whether there is a good chance for if-conversion?
> - Target dependency - What to check?
>   A. Are there scalar select/cmove/predicated instructions  (like in
> SPU)?
>   B. Are there vector select/cmove/predicated instructions (like in
> PowerPC)? + will the loop be vectorized?
>   C. Are speculative loads allowed? Do memory accesses trap?
>   D. More?
> 
> 2. Which transformations do we want to take care of in this pass?
>A. Conditional/unconditional loads/stores.
>B. PHI nodes with operands that are neither constants nor SSA NAMES
> (Currently, this is not supported in tree-if-conv.c).
>C. PHIOPT transformations (i.e., merge the PHIOPT pass into this pass
> maybe)?
>D More?
> 3. New control-flow graphs we want to support (besides the regular
> IF_THEN_ELSE, diamond-based):
> A. Nested diamonds.
> B. Sequential diamonds.
> C. More?
> 4. After we complete this pass, will the RTL-level ifcvt be needed?
> I guess the answer is yes, but I would like to hear more opinions.
> 
> Any comments/ideas/thoughts are really appreciated.
> 
> Thanks,
> Tehila.
> 
> 

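A simple hand-written illustration of the two motions discussed in the quoted message follows, with hypothetical function names.  Store sinking merges the stores on both arms into one unconditional store of a selected value.  Load hoisting performs a conditional load unconditionally and then selects.  Both leave a select in place of the branch.  That is why they pay off only if the branch is then if-converted.  The hoisted load is valid only where the access cannot trap (as on SPU) or the pointer is known to be valid.

```cpp
#include <cassert>

// Store sinking: both arms store to a[i], so one store of a selected
// value replaces the two conditional stores.
void before_sink(int* a, int i, bool c, int x, int y) {
    if (c) a[i] = x;
    else   a[i] = y;
}
void after_sink(int* a, int i, bool c, int x, int y) {
    a[i] = c ? x : y;
}

// Load hoisting: the load is performed unconditionally and the result
// selected.  In portable C++ this requires p to be valid even when c is
// false; on SPU the access cannot trap.
int before_hoist(const int* p, bool c, int dflt) {
    int t = dflt;
    if (c) t = *p;
    return t;
}
int after_hoist(const int* p, bool c, int dflt) {
    int loaded = *p;  // speculative load
    return c ? loaded : dflt;
}
```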

__builtin_expect for indirect function calls

2007-12-17 Thread trevor_smigiel
Hi,

I'm looking for comments on a possible GCC extension described below.

For the target I'm interested in, Cell SPU, taken branches are only
predicted correctly by explicitly inserting a specific instruction (a
hint instruction) that says "the branch at address A is branching to
address B".  This allows the processor to prefetch the instructions at
B, potentially with no penalty.

For indirect function calls, the ideal case is we know the target soon
enough at run-time that the hint instruction simply specifies the real
target.  Soon enough means about 18 cycles before the execution of the
branch.  I don't have any numbers as to how often this happens, but
there are enough cases where it doesn't.

When we can't hint the real target, we want to hint the most common
target.   There are potentially clever ways for the compiler to do this
automatically, but I'm most interested in giving the user some way to do
it explicitly.  One possibility is to have something similar to
__builtin_expect, but for functions.  For example, I propose:

  __builtin_expect_call (FP, PFP)

which returns the value of FP with the same type as FP, and tells the
compiler that PFP is the expected target of FP.  Trivial examples:

  typedef void (*fptr_t)(void);

  extern void foo(void);

  void
  call_fp (fptr_t fp)
  {
    /* Call the function pointed to by fp, but predict it as if it is
       calling foo() */
    __builtin_expect_call (fp, foo)();
  }

  void
  call_fp_predicted (fptr_t fp, fptr_t predicted)
  {
    /* same as above but the function we are calling doesn't have to be
       known at compile time */
    __builtin_expect_call (fp, predicted)();
  }

I believe I can add this just for the SPU target without affecting
anything else, but it could be useful for other targets.
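
`__builtin_expect_call` is only a proposal here.  For comparison, the sketch below approximates the same intent with the existing `__builtin_expect`.  It compares against the predicted target and calls that target on the likely path, which is a manual form of indirect-call promotion.  `call_expecting`, `bar` and the bodies given to `foo` and `bar` are hypothetical.  A compiler is not guaranteed to turn this into an SPU branch hint for the predicted target; the proposed builtin would make that explicit.

```cpp
#include <cassert>

typedef void (*fptr_t)(void);

// Hypothetical example targets with call counters.
static int foo_count = 0;
static int bar_count = 0;
void foo(void) { ++foo_count; }
void bar(void) { ++bar_count; }

// Hypothetical stand-in for the proposed __builtin_expect_call (FP, PFP):
// fp is evaluated once (it is a parameter), the match is marked likely,
// and the likely path calls `predicted`, which can become a direct call
// when this is inlined with a constant such as foo.
static inline void call_expecting(fptr_t fp, fptr_t predicted) {
    if (__builtin_expect(fp == predicted, 1))
        predicted();
    else
        fp();
}
```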

Are there any comments about the name, semantics, or usefulness of this
extension?

Thanks,
Trevor