Re: Re: Maintaining, was: Re: Reduce Dwarf Debug Size
Hi Ian,

> I don't see why:
> http://gcc.gnu.org/ml/gcc-patches/2006-02/msg02031.html
> was a bad thing.  I think gcc would have been better if it had just
> been committed.

Not fair. Zack himself says the patch is not recommended for commit, and it is just a baseline for further work. For this kind of target-specific change, I think it is best to let the target port maintainers make the call. I've CC'ed the maintainers listed for h8. Perhaps they will take a look at it.

I am aware of the define_constraints patch. I am thinking about working on the H8 and M68K ports at least. How urgent is this?

FWIW, I've reproduced the ICE that Zack mentioned, but I haven't investigated it.

Kazu Hirata
XFAILing gcc.c-torture/execute/mayalias-2.c -O3 -g (PR 28834)
Hi Janis,

While PR 28834 stays open, I'm thinking about XFAILing gcc.c-torture/execute/mayalias-2.c when it is run with -O3 -g. However, I am not having any luck with writing mayalias-2.x. I am wondering if you could help me with XFAIL.

When I try mayalias-2.x like so:

  set torture_eval_before_execute {
    global compiler_conditional_xfail_data
    set compiler_conditional_xfail_data {
      "PR 28834" \
      { "*-*-*" } \
      { "-O3" } \
      { "" }
    }
  }
  return 0

I get

  XPASS: gcc.c-torture/execute/mayalias-2.c execution,  -O3 -fomit-frame-pointer
  FAIL: gcc.c-torture/execute/mayalias-2.c compilation,  -O3 -g  (internal compiler error)

That is, I am getting an unintended XPASS for -O3 -fomit-frame-pointer. Also, the "-O3 -g" one doesn't show XFAIL even though the options do contain -O3.

How do I make gcc.c-torture/execute/mayalias-2.c XFAIL on -O3 -g?

Thanks,

Kazu Hirata
A question about DOM
Hi,

Consider the following code from tree-ssa-dom.c:tree_ssa_dominator_optimize.

  /* Thread jumps, creating duplicate blocks as needed.  */
  cfg_altered = thread_through_all_blocks ();

  /* Removal of statements may make some EH edges dead.  Purge
     such edges from the CFG as needed.  */
  if (!bitmap_empty_p (need_eh_cleanup))
    {
      cfg_altered |= tree_purge_all_dead_eh_edges (need_eh_cleanup);
      bitmap_zero (need_eh_cleanup);
    }

  free_dominance_info (CDI_DOMINATORS);
  cfg_altered = cleanup_tree_cfg ();
  calculate_dominance_info (CDI_DOMINATORS);

Notice that we have three assignments to cfg_altered, but the last one kills the two previous assignments. Should the last one be |=?

Kazu Hirata
Re: RFC: Plan for cleaning up the "Addressing Modes" macros
Hi Zack,

> So, the plan: Step 1 introduces - one at a time - target hooks
> corresponding to each of the above macros. All callers are changed to
> use the hooks. The default definitions of the hooks invoke the
> existing macros. The hardest part of this is working out exactly what
> their sole callers expect of LEGITIMIZE_ADDRESS and
> LEGITIMIZE_RELOAD_ADDRESS. The manual, unfortunately, is not very
> clear. I think that in both cases it amounts to "either replace X
> with a new value and jump to WIN, or leave X unmodified and drop
> through", which can be translated to "return a new value for X, or
> NULL", but I'm not sure, particularly about LEGITIMIZE_RELOAD_ADDRESS
> and its interaction with push_reload.
>
> Step 2 is to convert each architecture, one at a time, to make
> target-hook definitions. I think it makes most sense to do each
> architecture in one go, because the definitions of these macros tend
> to be intertwined. In particular, REG_OK_FOR_BASE_P etc are often
> used in the definition of GO_IF_LEGITIMATE_ADDRESS. It will require
> some care to be sure that the hooks retain the same semantics wrt
> REG_OK_STRICT. Some of the macros have sensible defaults; for
> instance, it is allowed to define a LEGITIMIZE_ADDRESS that does
> nothing. In this stage we should identify what those defaults are,
> and add appropriate functions to hooks.c; but we cannot change the
> default settings in target-def.h until all architectures are
> converted.

I would swap these two steps. Last time I looked at this stuff, I was surprised by how many things depended on REG_OK_STRICT. Even if macro A does not mention REG_OK_STRICT, macro B used in A might use REG_OK_STRICT, so it's difficult to tell in advance which macro depends on REG_OK_STRICT, either directly or indirectly. So I would first change each macro to an architecture-specific function in each architecture.
For example, GO_IF_LEGITIMATE_ADDRESS should look like this (taken from i386.h):

  #ifdef REG_OK_STRICT
  #define GO_IF_LEGITIMATE_ADDRESS(MODE, X, ADDR)	\
  do {							\
    if (legitimate_address_p ((MODE), (X), 1))		\
      goto ADDR;					\
  } while (0)
  #else
  #define GO_IF_LEGITIMATE_ADDRESS(MODE, X, ADDR)	\
  do {							\
    if (legitimate_address_p ((MODE), (X), 0))		\
      goto ADDR;					\
  } while (0)
  #endif

Note that REG_OK_STRICT is essentially used as an argument to legitimate_address_p, and that the use of a function argument decouples legitimate_address_p from REG_OK_STRICT. Once REG_OK_STRICT appears only in this way, we are ready to introduce new target hooks. The new target hooks would take REG_OK_STRICT as an argument. So in the above example, we would simply pass that argument as the last argument of legitimate_address_p.

Kazu Hirata
Re: RFC: Plan for cleaning up the "Addressing Modes" macros
Hi Dave,

> I'm basically in agreement with you here, and just want to suggest you can
> avoid an awful lot of code duplication by doing something like
>
> #ifdef REG_OK_STRICT
> #define ${CPU}_IS_STRICT 1
> #else
> #define ${CPU}_IS_STRICT 0
> #endif

Sure. In fact, the FRV port and possibly others have already started doing this, like so:

  #ifdef REG_OK_STRICT
  #define REG_OK_STRICT_P 1
  #else
  #define REG_OK_STRICT_P 0
  #endif

Kazu Hirata
A headache with fold_ternary and CALL_EXPR.
Hi,

These days, I am reorganizing fold. One of my goals is to provide a function like

  fold_ternary (code, type, op0, op1, op2)

without taking a tree that would be obtained by

  build3 (code, type, op0, op1, op2)

So we need to eliminate references to the original tree, that is, the result of the build3 above.

It turns out that the CALL_EXPR case of fold_ternary needs the original tree like so. (Notice the reference to t.)

      /* Check for a built-in function.  */
      if (TREE_CODE (op0) == ADDR_EXPR
	  && TREE_CODE (TREE_OPERAND (op0, 0)) == FUNCTION_DECL
	  && DECL_BUILT_IN (TREE_OPERAND (op0, 0)))
	{
	  tree tmp = fold_builtin (t, false);
	  if (tmp)
	    return tmp;
	}

If we want to change fold_builtin to take operands like op0, op1, and op2, I would have to change so many things that depend on fold_builtin. (Note that the subroutines of fold_builtin also depend on fold_builtin in the sense that they need the original tree, the one whose TREE_CODE is CALL_EXPR, to be available.)

So the question is: Is it worth changing all these? Or should I stop folding builtins in fold_ternary? IMHO, the latter isn't a totally crazy idea because CCP folds builtins by directly calling fold_builtin, and Diego is planning to run CCP at several places, even at an early stage of the tree SSA optimizations. (Currently, we depend on DOM to propagate constants, but it doesn't fold builtins IIRC.)

Thoughts?

Kazu Hirata
Things blocking fold_ARITY and fold_buildN
Hi,

The uses of the whole expression in the current fold_ARITY are blocking the transition to fold_ARITY (code, type, op0[, op1[, op2]]). Below I am showing the places where we need to make changes, in the form of a patch. Wherever 't' is used, we need to make changes. The first hunk comes from fold_unary, and the last one comes from fold_ternary. The rest come from fold_binary.

I've already got suggestions from Zack and Roger for fold_ternary. I think I'll replace the call to fold_builtin with fold_builtin_1 and change fold_builtin_1 to take fn, arglist, and possibly a static chain if needed (as suggested by Roger on IRC). COMPARISON_CLASS_P (t) is easy because it's equivalent to TREE_CODE_CLASS (code) == tcc_comparison. I might need help on the others.

I'll try to go through these "KAZU" markers one by one, but if you could volunteer to fix one, you are more than welcome. Suggestions like "Here is what I would do" would be fine, too.

Kazu Hirata

diff -u fold-const.c fold-const.c
--- fold-const.c	6 Mar 2005 00:44:01 -
+++ fold-const.c	6 Mar 2005 00:45:38 -
@@ -6802,6 +6802,7 @@
     {
       /* Don't leave an assignment inside a conversion
	 unless assigning a bitfield.  */
+      /* KAZU.  */
       tem = copy_node (t);
       TREE_OPERAND (tem, 0) = TREE_OPERAND (op0, 1);
       /* First do the assignment, then return converted constant.  */
@@ -7152,6 +7153,7 @@
       if (TREE_CODE (arg0) == COND_EXPR
	  || COMPARISON_CLASS_P (arg0))
	{
+	  /* KAZU.  */
	  tem = fold_binary_op_with_conditional_arg (t, code, arg0, arg1,
						     /*cond_first_p=*/1);
	  if (tem != NULL_TREE)
@@ -7160,6 +7162,7 @@
       if (TREE_CODE (arg1) == COND_EXPR
	  || COMPARISON_CLASS_P (arg1))
	{
+	  /* KAZU.  */
	  tem = fold_binary_op_with_conditional_arg (t, code, arg1, arg0,
						     /*cond_first_p=*/0);
	  if (tem != NULL_TREE)
@@ -7170,6 +7173,7 @@
   switch (code)
     {
     case RANGE_EXPR:
+      /* KAZU.  */
       if (TREE_CONSTANT (t) != wins)
	{
	  tem = copy_node (t);
@@ -8611,6 +8615,7 @@
	}

       /* See if we can build a range comparison.  */
+      /* KAZU.  */
       if (0 != (tem = fold_range_test (t)))
	return tem;
@@ -8727,6 +8732,7 @@
       /* If this is a comparison of two exprs that look like an
	 ARRAY_REF of the same object, then we can fold this to a
	 comparison of the two offsets.  */
+      /* KAZU.  */
       if (COMPARISON_CLASS_P (t))
	{
	  tree base0, offset0, base1, offset1;
@@ -9142,6 +9148,7 @@
	  && (TREE_CODE (arg0) == MIN_EXPR
	      || TREE_CODE (arg0) == MAX_EXPR)
	  && TREE_CODE (TREE_OPERAND (arg0, 1)) == INTEGER_CST)
+	/* KAZU.  */
	return optimize_minmax_comparison (t);

       /* If we are comparing an ABS_EXPR with a constant, we can
@@ -9845,6 +9852,7 @@
	  && TREE_CODE (TREE_OPERAND (op0, 0)) == FUNCTION_DECL
	  && DECL_BUILT_IN (TREE_OPERAND (op0, 0)))
	{
+	  /* KAZU.  */
	  tree tmp = fold_builtin (t, false);
	  if (tmp)
	    return tmp;
Re: Things blocking fold_ARITY and fold_buildN
Hi, > I might need help on others. I'll try to go through these "KAZU" one > by one, but if you could volunteer to fix one, you are more than > welcome. Suggestions like "Here is what I would do" would be fine, > too. Roger kindly suggested what to do for each of these on IRC. Kazu Hirata
A preliminary result of fold_buildN (memory usage included)
Hi,

Here is a preliminary result for fold_buildN from my personal tree. I compiled cc1-i files with ./cc1 -quiet -O2 -fomit-frame-pointer -fmem-report -o /dev/null. I used --enable-gather-detailed-mem-stats to see how many nodes and bytes we are allocating for trees. (Thanks Honza!) I then summed up the memory usage of trees.

               original        patched       diff%
  #nodes      24,234,589     23,577,245    -2.712%
  #bytes   1,228,797,315  1,205,528,427    -1.893%

As far as compile time goes, I consistently see improvements of 0.1% or a bit more.

In case you are wondering what fold_buildN is: fold_build1 (code, type, op0) is equivalent to fold (build (code, type, op0)), except that fold_build1 does not build a scratch node when folding is successful. fold_build2 operates in a similar way. Note that ternary expressions did not take advantage of this fold_build3 mechanism because of the handling of CALL_EXPR in fold_ternary. Since ternary expressions are not as common as binary expressions, I don't expect much difference even after we start using fold_build3.

Kazu Hirata
Shall we take a short break from my fold changes?
Hi, Roger Sayle and I are wondering if we should take a short break (maybe a few days!?) from my fold changes. The reason is that I've already broken fold once. (Thanks goes to David Edelsohn for fixing the problem!) If we wait for a few days, it's likely that people bootstrap and test GCC on their favorite platforms. Some people might even try to build their favorite applications with mainline GCC. Plus, as Roger Sayle puts it, applying parts 16 and 17 http://gcc.gnu.org/ml/gcc-patches/2005-03/msg00629.html http://gcc.gnu.org/ml/gcc-patches/2005-03/msg00630.html would make it harder to revert any of parts 1 through 15. On the other hand, after applying parts 16 and 17, it's trivial to make fold_build[12] available although fold_build3 will take some time, and it would be a bit awkward to provide fold_build[12] but not fold_build3. Thoughts? Kazu Hirata
For those who want to automatically generate predicates.md...
Hi,

I created a set of scripts that generates predicates.md based on PREDICATE_CODES in tm.h. The generated file looks like this:

  ;; Predicate definitions for FIXME FIXME.
  ;; Copyright (C) 2005 Free Software Foundation, Inc.
  ;;
  ;; This file is part of GCC.
  ;;
  ;; :
  ;; : Usual copyright notice
  ;; :

  ;; Return true if OP is a valid source operand for an integer move
  ;; instruction.

  (define_predicate "general_operand_src"
    (match_code "const_int,const_double,const,symbol_ref,label_ref,subreg,reg,mem")
  {
    if (GET_MODE (op) == mode
        && GET_CODE (op) == MEM
        && GET_CODE (XEXP (op, 0)) == POST_INC)
      return 1;
    return general_operand (op, mode);
  })

  :
  : More predicates follow.
  :

1. A copyright notice is automatically inserted, except for the port name.
2. A comment for each function is taken from tm.c.
3. The name of a predicate, along with the codes it accepts, is automatically taken from PREDICATE_CODES.
4. The C code for a predicate is automatically taken from tm.c.

My scripts will only generate predicates.md. They do not remove PREDICATE_CODES from tm.h, predicates from tm.c, or prototypes from tm-protos.h. All these are left for your code cleanup pleasure. :-)

Another thing that my scripts won't do is convert a C-style predicate to a Lisp-style predicate. My scripts are only meant to alleviate the mechanical part of the conversion.

Anyway, untar the attachment and run predicatecodes.sh h8300 under config/h8300 to generate predicates.md. Of course, you can replace h8300 with any port with PREDICATE_CODES. My scripts are not robust, so don't blame me if they eat your files.

I might actually start posting patches to convert ports to predicates.md.

Kazu Hirata

Attachment: conv_predicate_codes.tar.gz (binary data)
Obsoleting more ports for 4.0.
Hi,

First off, Mark, if you think this stuff is too late for 4.0, I'll postpone it to 4.1. Please note that, as far as printing the "obsolete" message is concerned, all we have to do is add a few lines to config.gcc.

Below, I propose to obsolete the following five architectures for GCC 4.0 and remove them for 4.1 unless somebody steps up and does *real work*. If you are working on these ports, please send us real patches. If you would like to work on these ports privately, please refrain from telling us that port xxx should be kept. The more ports we have, the more work we have to do to clean up the infrastructure connecting the middle end and the back ends.

arc
---
No maintainer. PR 8972 hasn't been fixed. GCC miscompiles a function as simple as

  int f (int x, int i) { return x << i; }

There was some recent interest, like
http://gcc.gnu.org/ml/gcc/2004-10/msg00408.html
Obsoleting a port on the grounds of a single bug may seem a bit strange. However, PR 8972 implies that nobody is working on the FSF's mainline at least. PR 8973 hasn't been fixed either.

fr30
----
The same justification as
http://gcc.gnu.org/ml/gcc/2004-06/msg01113.html
Nobody showed an interest in keeping this port.

i860
----
A hardware implementation is not currently being manufactured. Jason Eckhardt, the maintainer of i860, has told us that it would be OK for it to go.
http://gcc.gnu.org/ml/gcc-patches/2005-03/msg02033.html

ip2k
----
PR 20582: the libgcc build fails despite some interest, such as
http://gcc.gnu.org/ml/gcc/2004-06/msg01128.html

ns32k
-----
The same justification as
http://gcc.gnu.org/ml/gcc/2004-06/msg01113.html
Nobody showed an interest in keeping this port.

Kazu Hirata
A question about java/lang.c:java_get_callee_fndecl.
Hi,

I see that the implementation of LANG_HOOKS_GET_CALLEE_FNDECL in Java always returns NULL (at least for the time being):

  static tree
  java_get_callee_fndecl (tree call_expr)
  {
    tree method, table, element, atable_methods;
    HOST_WIDE_INT index;

    /* FIXME: This is disabled because we end up passing calls through
       the PLT, and we do NOT want to do that.  */
    return NULL;
    :
    :

Is anybody planning to fix this? If not, I'm thinking about removing this language hook. The reason is not just cleanup. Rather, it is because I need to change the prototype of get_callee_fndecl and LANG_HOOKS_GET_CALLEE_FNDECL. Currently, fold_ternary has the following call tree:

  fold_ternary
    get_callee_fndecl
      java_get_callee_fndecl

If I change fold_ternary to take the components of a CALL_EXPR, like the address expression of the CALL_EXPR and the argument list, instead of the CALL_EXPR itself, I would have to change java_get_callee_fndecl to take the first operand of a CALL_EXPR instead of a CALL_EXPR. It's not that the change is so involved, but it doesn't make much sense to keep something dead up to date.

In other words, when I posted the following patch
http://gcc.gnu.org/ml/gcc-patches/2005-03/msg02038.html
Roger Sayle requested that I keep the call to get_callee_fndecl so that we can "fold" the first operand of a CALL_EXPR to a FUNCTION_DECL.

FYI, the above FIXME comes from
http://gcc.gnu.org/ml/java-patches/2004-q2/msg00083.html

Kazu Hirata
Re: Obsoleting more ports for 4.0.
Hi Ramana,

> PR8972 has been fixed in our tree and we should be pushing
> the patch upstream soon. We had got the FSF tree building
> with patches to fix PR17317 / PR11476 / PR17240. We have
> also been making fixes with binutils which can be found in
> the corresponding archives.

If that's the case, you might want to assign those bugs to yourself and leave some comments on each of them, saying that you've got a patch and will be posting it soon. Doing so will prevent several people from independently working on the same problem (although it may be unlikely in this particular case). Also, note that our bugmasters go through open bugs from time to time. If you merge your fixes into mainline in a timely manner, they won't have to spend extra time checking whether the bugs still exist.

> We would like to merge our changes with mainline, the 4.0 branch
> and the 3.4 branch and continue work in the FSF domain. What
> would be the best way to go about this?

Send patches to [EMAIL PROTECTED]. If you are familiar with ARC and willing to continue to work on the port, you may want to become a maintainer. That will speed up the process of merging your work into the FSF tree.

Kazu Hirata
Do we still need get_callee_fndecl?
Hi,

I am wondering if we still need get_callee_fndecl in the presence of the tree optimizers. I think this function performs a form of constant propagation in a rather strange place.

Currently, given the address expression of a CALL_EXPR, get_callee_fndecl tries to get to its DECL_INITIAL like so:

  STRIP_NOPS (addr);

  /* If this is a readonly function pointer, extract its initial value.  */
  if (DECL_P (addr) && TREE_CODE (addr) != FUNCTION_DECL
      && TREE_READONLY (addr) && ! TREE_THIS_VOLATILE (addr)
      && DECL_INITIAL (addr))
    addr = DECL_INITIAL (addr);

This is constant propagation in that we are propagating the value of DECL_INITIAL (addr) into a call site. In fact, even if I remove the block of code above from get_callee_fndecl, the tree optimizers can still turn an indirect call into a direct one, "bar ()", in the following test case:

  extern int bar (void);
  typedef int (*bar_type) (void);
  const bar_type g = &bar;

  int foo_1 (void)
  {
    /* Call through a function pointer stored in a const global
       variable.  */
    return (*g)();
  }

A call through a function pointer stored in a local variable would be fine, too:

  int foo_2 (void)
  {
    /* Call through a function pointer stored in a local variable.  */
    bar_type h = &bar;
    return (*h)();
  }

But a call through a function pointer stored in a constant global array is not optimized. Currently, we don't dig that deep:

  const bar_type array[3] = { bar, bar, bar };

  int foo_3 (void)
  {
    /* Call through a function pointer stored in a constant global
       array.  */
    bar_type h = array[2];
    return (*h)();
  }

All of the above should illustrate my point. If we improve our constant/copy propagator to dig into constant arrays and structures, we would get foo_3 optimized for free. After all, a function pointer is just a variable, and if we have an assignment of a constant, we can propagate the constant into its uses. (The Java front end used to look into an array, but it doesn't nowadays.)
I wonder if there is any data structure that a front end does not expose to the middle end for the purpose of constant propagation. After all, all we need in get_callee_fndecl seems to be

  addr = TREE_OPERAND (call_expr, 0);
  return ((TREE_CODE (addr) == ADDR_EXPR
	   && TREE_CODE (TREE_OPERAND (addr, 0)) == FUNCTION_DECL)
	  ? TREE_OPERAND (addr, 0) : NULL_TREE);

Thoughts?

Kazu Hirata
Re: Do we still need get_callee_fndecl?
Hi Andrew, > I am wondering if we still need get_callee_fndecl in the presence of > tree optimizers. I think this function performs a form of constant > propagation at a rather strange place. Sorry for omitting you in the CC. Once you fix java_get_callee_fndecl, perhaps we only need to call it during inlining? Kazu Hirata
Conversion from fold (buildN (...)) to fold_buildN.
Hi,

I would like to announce that fold_buildN is now ready.

What's this? Put shortly, this is a tree equivalent of simplify_build_{unary,binary,ternary}. For example, we could replace

  fold (build2 (MULT_EXPR, type, op0, op1));

with

  fold_build2 (MULT_EXPR, type, op0, op1);

while avoiding the scratch node that build2 would create when folding is possible.

What's so good about this? -- Reduced memory usage. Here is some quantitative justification for this change:
http://gcc.gnu.org/ml/gcc/2005-03/msg00277.html

It's my turn to ask you a question - When and how do we want to do these conversions? Currently, I am thinking about converting all of "fold (buildN (...))" and "fold (build (...))" to fold_buildN as soon as stage 2 starts, which is about one month away, so that I won't annoy people as much. I don't mind people starting to use fold_buildN in new code or doing conversions on what they maintain, but I don't think now is the right time to do the wholesale conversion.

Would you like fold_{unary,binary,ternary} exported? These are equivalent to simplify_{unary,binary,ternary}. They return a folded tree if folding is possible. Otherwise, they return NULL_TREE. If the need arises, I am happy to export them for you.

Kazu Hirata
Re: Do we still need get_callee_fndecl?
Hi Dale, > > After all, all we need in get_callee_fndecl seems to be > > > > addr = TREE_OPERAND (call_expr, 0); > > return ((TREE_CODE (addr) == ADDR_EXPR > >&& TREE_CODE (TREE_OPERAND (addr, 0)) == FUNCTION_DECL) > > ? TREE_OPERAND (addr, 0) : NULL_TREE; > > > > Thoughts? > > In Objective C (and ObjC++) it's also a good idea to look under > OBJ_TYPE_REF. > See this patch, which was deferred to 4.1 and I'm going to resubmit RSN: > http://gcc.gnu.org/ml/gcc-patches/2004-12/txt00122.txt Thanks for the information. Does OBJ_TYPE_REF_EXPR only apply to a CALL_EXPR? In other words, are there other forms of constants that are exposed by looking into OBJ_TYPE_REF_EXPR? Kazu Hirata
Merging CCP and VRP?
Hi Diego,

Have you considered merging CCP and VRP (as suggested by Kenny last year at the summit)? Originally I was thinking that ASSERT_EXPRs, or rather the ranges gathered by VRP, were very useful for jump threading, but they seem to be useful for constant propagation, too. Consider

  void bar (int);

  void foo (int a)
  {
    if (a == 0)
      bar (a);
  }

At the end of VRP we still have

  :
  if (a_1 == 0) goto ; else goto ;

  :;
  a_2 = 0;
  bar (a_2);  <-- a_2 isn't replaced with 0 yet!

Note that we don't have bar (0) at the end. This is because VRP currently does not have the equivalent of substitute_and_fold. We just use the range information to fold COND_EXPRs; we don't fold each statement using the constants and ranges gathered by VRP. We could have VRP call substitute_and_fold, but once we do that, the distinction between CCP and VRP becomes less clear.

Plus, propagating conditional equivalences may have a chain effect on simplification, especially with inlining. That is, an equivalence like "a = 0" above may end up massively simplifying the code. For the obvious reason, we don't want to do

  do
    {
      Run CCP;
      Insert ASSERT_EXPRs;
      Run VRP;
      Remove ASSERT_EXPRs;
    }
  while (something changes);

So I am thinking about inserting ASSERT_EXPRs up front, *before* going into SSA, without much pruning, and then running an enhanced version of CCP, which includes ranges and anti-ranges in the lattice, all of which was suggested by Kenny last year. I'm thinking about keeping ASSERT_EXPRs around until it becomes difficult to keep them. I don't know much about the loop optimizers, so if I were to write code to keep ASSERT_EXPRs, I might give up by turning ASSERT_EXPRs into copy statements just before hitting the loop optimizers. :-) I have not figured out how to deal with ASSERT_EXPRs in FRE, but Daniel Berlin says I just have to teach tree-vn.c how to deal with them.

One note about the order of optimizations.
I think it's a good idea to run an enhanced version of CCP before copy-prop because the propagation engine can deal with the presence of copies very well, whether they are copy statements or PHI nodes. If we run copy-prop after an enhanced version of CCP, we would still have useful information in SSA_NAME_VALUE_RANGE at the end. Copy-prop only kills newer copies; it doesn't even touch the SSA_NAME_VALUE_RANGE stored in older copies.

Last but not least, I'm willing to do the work, but I'd like to be more or less on the same page with the other people hacking on these scalar optimizations before I do any significant work.

Thoughts?

p.s. By the way, looking at Kenny's slides from last year, one thing we have not implemented in our propagation engine is processing the CFG worklist in the DFS order of the dominator tree. I haven't looked closely at this, but if the speed of propagation is a concern, we should come back to it.

Kazu Hirata
Re: Merging CCP and VRP?
Hi Diego,

> By merging, do you mean *replacing* CCP with VRP? Yes, it's
> doable. No, it's not a good idea.

Understood.

Also, if we are inserting ASSERT_EXPRs, it seems to be a good idea to run copy-prop before VRP. Otherwise, we would end up with lots of

  D.18001_101 = D.18001_198;
  D.18011_102 = D.18001_101->head_;
  D.18001_12 = ASSERT_EXPR ;

Note that the ASSERT_EXPR is on D.18001_101, not on D.18001_198, which is older. As a result, VRP does not notice that D.18001_198 != 0B. Currently, we still have these even after copy-prop because we don't allow copy propagation between const and non-const pointers, which I think is a bit too restrictive.

> At one point, all the passes had to deal with ASSERT_EXPRs.
> Mostly by ignoring them. Which is additional, unwanted work
> because some of them had to actively know about them being
> nothing but fancy copy operations. That gets in the way of their
> work. I think that ASSERT_EXPRs should only survive as long as
> they're useful.

Yes, playing with VRP, I've realized that I am not interested in ASSERT_EXPRs per se; I am more interested in the SSA_NAME_VALUE_RANGE that VRP leaves us with.

> Sure. Go ahead. My short term plan is to start merging the various
> components into mainline. I'll start with the incremental SSA
> patches, followed by VRP, the CCP and copy-prop improvements.
> Perhaps we could leave the changes to the threader in TCB for a
> little while longer, but first I need to read your proposal in
> detail.

Thank you. It turns out that if I blindly insert ASSERT_EXPRs for everything that allows us to infer a range, my jump threading pass finds more opportunities than what the current DOM can find. There are certain classes of opportunities that my pass misses but DOM catches, but I'll mention them in a separate message.

> ISTR either stevenb or dberlin implementing a dom-order
> propagation. I think they got a minor speedup that could be
> worth having.
Huh, when I talked to them on IRC they didn't seem to have implemented this. I'll try to get to this issue one of these days.

Kazu Hirata
Re: Merging CCP and VRP?
Hi Diego,

> There is a copy-propagation pass before VRP. Or do you mean
> right before? Sure, the ordering of these passes is in eternal
> flux anyway.

"Before", but it doesn't have to be "right before". The current ordering is reasonable.

> > Currently, we still have these even after copy prop because we don't
> > allow copy propagation between const and non-const pointers, which I
> > think is a bit too restrictive.
>
> I assume these are blocked by may_propagate_copy. What's
> blocking these? Do they have different alias set numbers or do
> we need a conversion from the const to the non-const version?

Yes, they are blocked by may_propagate_copy. It seems to be a const vs. non-const pointer issue. I haven't come up with a breakdown of the reasons why copies are blocked.

Kazu Hirata
Re: Merging CCP and VRP?
Hi Jeff,

> We'd have to go back and find the PR. I don't remember all the
> details, but the problem was big enough to make ASSERT_EXPRs a
> far inferior solution to the others I'd looked at for recording
> context sensitive equivalences.

Yes, inserting a bunch of ASSERT_EXPRs, updating SSA, running VRP, etc., all take a lot of time, although using the information gathered by VRP for jump threading purposes is pretty straightforward and fast. So I am sandwiched between two things.

1) DOM is fairly fast when its iteration is limited, which shouldn't be
   too restrictive if we are running CCP/copy-prop before DOM. But DOM
   rebuilds information built by other passes, like keeping track of
   nonzero-ness, the const/copy table, the available expression table,
   etc.

2) VRP is slow, and even slower with PHI node insertion. But if we use
   its result for jump threading, we are not duplicating much code.
   Plus, the context-sensitive equivalences are globally visible and
   fine grained, so we don't have to mix the identification of jump
   threading opportunities into a dom walk. (ASSERT_EXPRs and PHI nodes
   essentially split a single variable into multiple variables, making
   a bunch of smaller live ranges so that fine-grained value ranges can
   be stored.)

Having said all this, I can't think of uses of these fine-grained context-sensitive equivalences other than jump threading, so I can't justify constructing this much information just for jump threading.

> > > And, yes, it is imperative that we const/copy propagate into
> > > ASSERT_EXPRs to the fullest extent possible. Otherwise we lose a lot
> > > of threading opportunities.
> >
> > ASSERT_EXPRs are regular expressions, so this is not a problem.
> It is if you don't run const/copy propagation immediately after
> insertion of the ASSERT_EXPRs :-)

It may also be a problem not to run const/copy prop before inserting ASSERT_EXPRs, as noted in the problem of

  D.18001_101 = D.18001_198;
  D.18011_102 = D.18001_101->head_;
  D.18001_12 = ASSERT_EXPR ;

where we are asserting a copy of D.18001_198, and D.18001_198 itself may also be used later.

Kazu Hirata
Obsoleting c4x last minute for 4.0
Hi,

I would like to propose that the c4x port be obsoleted for 4.0:

  c4x-*
  tic4x-*

The primary reason is that the port doesn't build. Richard Sandiford's recent patch allows us to go further during the build process, but the port still does not build.
http://gcc.gnu.org/ml/gcc-patches/2005-03/msg01840.html
I've filed the current build problem as PR 20765.

There are two other outstanding PRs against c4x, namely PR 14436 and PR 17824. Note that the maintainer hasn't commented on either one of these PRs, which worries me a bit.

If the port doesn't build, we cannot clean up the interface between the middle end and the back ends because we cannot even do the minimal testing of "the port builds". The PREDICATE_CODES clean-up is almost over, but c4x is blocking the last one. Given that stage 2 hasn't even started, we may be able to clean up a few other things for 4.1 (if every port builds at least).

Mark, it's still April 5, so we could wait for a week and add a couple of lines to config.gcc in time for 4.0, but if you think this is too pushy or could distract from the 4.0 effort, I am happy to postpone this until 4.0.1 or 4.1.

Kazu Hirata
Status of conversions to predicates.md
Hi,

c4x
  The port doesn't build.

cris
  A patch has been posted.
  http://gcc.gnu.org/ml/gcc-patches/2005-03/msg01705.html
  Hans-Peter Nilsson says he wants to modify the port, so the
  predicate conversion patch may need to be adjusted.

i860
  Obsoleted.

ip2k
  Obsoleted.

mmix
  A patch has been posted.
  http://gcc.gnu.org/ml/gcc-patches/2005-04/msg00272.html
  Hans-Peter Nilsson says he will pick up my work and finish up the
  conversion.

sh
  A patch has been posted.
  http://gcc.gnu.org/ml/gcc-patches/2005-04/msg00346.html
  Joern Rennecke requested me to hold off.

sparc
  Eric Botcazou says he is working on predicates.md himself.
  http://gcc.gnu.org/ml/gcc-patches/2005-03/msg01694.html

Kazu Hirata
Bootstrap failure on i686-pc-linux-gnu since 2005-04-09 20:41UTC
Hi,

I have been getting bootstrap failures on i686-pc-linux-gnu since Caroline's patch:

http://gcc.gnu.org/ml/gcc-cvs/2005-04/msg00541.html

Errors go like so:

org/xml/sax/helpers/ParserAdapter.java: In class 'org.xml.sax.helpers.ParserAdapter':
org/xml/sax/helpers/ParserAdapter.java: In constructor '()':
org/xml/sax/helpers/ParserAdapter.java:84: internal compiler error: Segmentation fault

I have confirmed that the first patch attached to PR 20934 fixes this problem.  Looking at the audit trail in PR 20934, I see that Caroline has been working on this problem.  What is the status of the patch?  Is there more work or refinement needed?

Kazu Hirata
About the number of DOM's iterations.
Hi,

Note that with -O1, we limit the number of iterations of DOM to 1 so that we can get reasonable compile-time performance.  I was wondering what happens if we do the same with -O2, now that we have passes like copy-prop, VRP, and improved FRE as well as good old CCP.

Numbers go first.  All of the following numbers are obtained from compiling cc1-i files with -O2.

version  #iters  #blocks  %cov   #threaded
v0       9097    514501   94.5%  9438
v1       8602    553685   95.5%  20868
v2       8609    510618   96.3%  23236
v3       8490    510734   97.2%  23262
v4       7193    188793   N/A    18450

Legend
------
v0 : Vanilla mainline with the second and third runs of DOM disabled.
v1 : v0 with Jeff's patch in PR 19794
v2 : v1 with the following TCB-like tweak to the pass ordering:

       NEXT_PASS (pass_ccp);
       NEXT_PASS (pass_copy_prop);
       NEXT_PASS (pass_fre);
       NEXT_PASS (pass_dce);
       NEXT_PASS (pass_vrp);
       NEXT_PASS (pass_merge_phi);
       NEXT_PASS (pass_dominator);

v3 : v2 with my queued VRP improvements
v4 : v3 with the number of DOM's iterations limited to 1

#iters : the number of times we iterate in the do-while loop in
  tree-ssa-dom.c:tree_ssa_dominator_optimize.
#blocks : the number of blocks scanned by DOM.  This is counted by
  adding one line at the entrance of the aforementioned do-while loop:
    num_cfg_blocks += n_basic_blocks;
%cov : the percentage of invocations of DOM that finished their work
  within 2 iterations.
#threaded : the number of edges threaded.

Disclaimers
-----------
o I have not measured compile-time performance.
o I have not benchmarked generated code.
o I have not tried any other pass ordering.
o I have not considered other testcases or languages.

Justifications for v0, ..., v4 and columns
------------------------------------------
v0 is because I wanted to stay simple and focus on the initial basic propagation.  v1 is because Jeff will soon remove various restrictions on the jump threader that have been limiting jump threading.  For v2, note that ccp, copy-prop, FRE, and VRP are all something that DOM does to some extent.
I put my merge_phi right before DOM so that DOM can find more jump threading opportunities in its first iteration.

#iters is mostly out of curiosity.  Functions that we compile come in different sizes, so the number isn't that important.  For the same reason, #blocks is more interesting.  It's a rough measure of how much time DOM takes.

Now %cov.  Why 2 iterations?  Well, DOM iterates to a fixed point, meaning that the last iteration does not do anything.  More specifically, the last iteration does not perform any jump threading.  Whenever jump threading opportunities are taken, cleanup_tree_cfg called from tree_ssa_dominator_optimize always returns 1, triggering the next iteration.

Observations
------------
Even with v0, the vast majority of functions require at most two iterations, so we are in diminishing-returns territory to begin with.  With v3, we get an impressive 97.2%, meaning that 97.2% of the time DOM iterates at most two times, which in turn means that 97.2% of the time we don't take any jump threading opportunity in DOM's second iteration.

Note that with v4, the number of blocks that DOM looks at goes down significantly.  At the same time, the number of edges threaded also goes down but is still close to the level of v1.  It's interesting that even if we let DOM iterate as many times as it likes, there is a sizable difference between v1 and v2 as far as the number of edges threaded goes.  Certain things cannot be propagated using DOM.

Future work
-----------
o Address the things listed in Disclaimers.
o Submit my queued VRP patches as soon as mainline gets a bit more stable.
o Speed up things like the incremental SSA update and the propagator.
o Try out some ideas to get more out of the first iteration of DOM.
o Take advantage of value ranges in thread_across_edge.

Kazu Hirata
What's the fate of VARRAY_*?
Hi Nathan,

Now that you've checked in your new VEC API, which I think is strictly more powerful than VARRAY, should we start converting uses of VARRAY to VEC?  I'd be happy to help out here.

Kazu Hirata
Re: Regression on mainline in tree-vrp.c
Hi Rainer and Steve,

> | 4.1.0 20050419 (experimental) (i386-pc-solaris2.10) GCC error: |
> | in set_value_range, at tree-vrp.c:124                          |
> | Error detected at sem_intr.adb:437:1                           |

There seem to be several ways to get to this ICE.  set_value_range is used in many places in tree-vrp.c.  I'll try to take a look at these in the next few days.

Kazu Hirata
Merging stmt_ann_d into tree_statement_list_node
Hi,

I am thinking about merging stmt_ann_d into tree_statement_list_node.

Background
----------
tree_statement_list_node is a linked list node defined like so:

struct tree_statement_list_node
{
  struct tree_statement_list_node *prev;
  struct tree_statement_list_node *next;
  tree stmt;
};

stmt_ann_d is an annotation for a statement ("stmt" above) that includes the operand cache, the basic block that the statement belongs to, and various flags, among other things.

Justifications
--------------
o We have a 1:1 correspondence between tree_statement_list_node and
  stmt_ann_d.  Why have separate data structures?

o Cache win.  Calling bsi_stmt loads tree_statement_list_node into
  memory, which might in turn bring members of stmt_ann_d into cache.
  Note that it's very common to call bsi_stmt and then do something
  with the operand cache while walking statements in a basic block.

o bsi_for_stmt becomes an O(1) operation.  Steven Bosscher says that
  the CFG inliner will hit bsi_for_stmt hard.

o Less memory management overhead.

o One step closer to a flat statement structure (or a tuple form if
  you like).  It's a bit silly that we have GIMPLE, but we still do
  not have a flat structure.

o We can probably remove get_stmt_ann and replace all uses of
  stmt_ann.  No more lazy stmt_ann allocation.

Thoughts?

Kazu Hirata
Re: Merging stmt_ann_d into tree_statement_list_node
Hi Zdenek,

> guessing from (*) below, you probably just forgot to mention that it also
> involves changing SSA_NAME_DEF_STMT to point to the
> tree_statement_list_node rather than the statement itself,
> otherwise there is no way how to get the
> annotation for SSA_NAME_DEF_STMT (name)?

I am not planning to change SSA_NAME_DEF_STMT yet.  If stmt_ann is embedded into tree_statement_list_node, we can compute the address of tree_statement_list_node from that of stmt_ann.  I am planning to keep a pointer from a statement proper to a stmt_ann for now.

> It is definitely a good idea, IMHO.

Thanks!  I might need your help later as a loop optimizer expert.  I just found yesterday that tree-ssa-loop-im.c:schedule_sm is using stmt_ann before a statement proper is linked to a basic block.  Specifically, it creates a statement, attaches a stmt_ann, and puts it on an edge.  We probably have to do one of the following:

1) delay the creation of stmt_ann until after a statement is linked,

2) create a stmt_ann just like we do now, and then copy its contents
   to the stmt_ann embedded in tree_statement_list_node, or

3) create a tree_statement_list_node in schedule_sm, which contains
   stmt_ann, and link it to a basic block.  (This option would require
   bigger changes, as we are currently supposed to put a statement in
   PENDING_STMT, not a tree_statement_list_node.  Plus, the
   bsi_insert_* and tsi_link_* stuff all expect a statement.)

Right now I am just instrumenting stmt_ann to see where people are using stmt_ann while a statement is not linked to a basic block.

Kazu Hirata
Re: Merging stmt_ann_d into tree_statement_list_node
Hi Andrew,

> It will also involve figuring out what to do with DOM, which has some
> hacks via a call to create_ssa_artficial_load_stmt() which causes a
> non-stmt to have a stmt annotation and operands created for it.  I've
> been bugging Diego to replace the parts of DOM which use this so we can
> remove the hideous aberration, but it still exists at this point.
>
> I believe DOM uses the routine to create a dummy expression for a store
> which looks like a load.  This load is then entered into the equivalency
> tables such that the store of the expression also shows up as a load,
> allowing subsequent loads to be subsumed.  Or something like that
> anyway.  Bottom line is it creates a stmt annotation and operands for an
> expression which is not in any stmt list.

Does this standalone stmt_ann end up in the instruction stream?  That is, are we going to have a statement as a part of a basic block whose stmt_ann is the one created in create_ssa_artficial_load_stmt?  If not, I guess we can let DOM do what it wants to do, because I am currently thinking about simply embedding stmt_ann into tree_statement_list_node like so:

struct tree_statement_list_node
  GTY ((chain_next ("%h.next"), chain_prev ("%h.prev")))
{
  struct tree_statement_list_node *prev;
  struct tree_statement_list_node *next;
  tree stmt;
  struct stmt_ann_d ann;
};

Though leaving this standalone stmt_ann as is probably isn't the cleanest thing to do.  Eventually, I want to completely hide creation/destruction of stmt_ann from applications (or optimizers, if you like) of the infrastructure.  If this standalone stmt_ann does end up in the instruction stream, we could structure-copy stmt_ann for the time being (yuck!).

Kazu Hirata
Re: Merging stmt_ann_d into tree_statement_list_node
Hi Andrew,

> > I might need your help later as a loop optimizer expert.  I just found
> > yesterday that tree-ssa-loop-im.c:schedule_sm is using stmt_ann before
> > a statement proper is linked to a basic block.  Specifically, it
> > creates a statement, attaches a stmt_ann, and puts it on an edge.  We
> > probably have to do one of the following.
> >
> > 1) delay the creation of stmt_ann until after a statement is linked,
> >
> > 2) create a stmt_ann just like we do now, and then copy its contents
> >    to the stmt_ann embedded in tree_statement_list_node, or
> >
> > 3) create a tree_statement_list_node in schedule_sm, which contains
> >    stmt_ann, and link it to a basic block.  (This option would require
> >    bigger changes as we are currently supposed to put a statement in
> >    PENDING_STMT, not a tree_statement_list_node.  Plus, bsi_insert_*
> >    and tsi_link_* stuff all expect a statement.)
> >
> > Right now I am just instrumenting stmt_ann to see where people are
> > using stmt_ann while a statement is not linked to a basic block.
>
> These are in fact issues with immediate_uses too.  I fixed the ones I
> found that were causing problems, but the stmt annotation and operands
> should *NOT* be created until a stmt is actually linked into the program
> via a bsi_ routine or some such thing.

Ah, so we should go with 1) above wherever possible.  Once these uses of stmt_ann are fixed, we can arrange things so that we *CANNOT POSSIBLY* create stmt_ann and operands (except maybe in create_ssa_artficial_load_stmt).  Then the actual work of embedding stmt_ann into tree_statement_list_node shouldn't be difficult.

Kazu Hirata
Re: Merging stmt_ann_d into tree_statement_list_node
Hi Andrew,

> well, there isn't a tree_statement_list_node right...  DOM simply has an
> expression, and we end up calling create_stmt_ann.  I guess you would
> have to create a completely detached stmt_list node for this case.  what
> are you planning to have create_stmt_ann do?

I am thinking about leaving create_stmt_ann as is just for create_ssa_artficial_load_stmt.  Other than that, I don't want to call create_stmt_ann.  I am planning to allocate stmt_ann as a part of tree_statement_list_node.  That is, if I have this:

struct tree_statement_list_node
  GTY ((chain_next ("%h.next"), chain_prev ("%h.prev")))
{
  struct tree_statement_list_node *prev;
  struct tree_statement_list_node *next;
  tree stmt;
  struct stmt_ann_d ann;
};

then tsi_link_after and tsi_link_before would allocate the right amount of memory, because they have

  head = ggc_alloc (sizeof (*head));

where head is a pointer to tree_statement_list_node.  I still have to initialize the members of stmt_ann_d.  I believe all I have to do is

  memset (&ann, 0, sizeof (ann));
  ann.common.type = STMT_ANN;
  ann.modified = true;

> I eventually want to see this entire routine go away.  The only reason
> DOM wants it is so that it can scan the operands like it does for a real
> stmt.  no one else needs to do such a thing..  Presumably the overhead of
> actually inserting these loads immediately after the store and deleting
> them later is too high...  I've never looked at the details of what DOM
> does with the info.  if it only needs the operands for a short time, a
> different interface could be provided which returns the operands in a
> VEC or something.  I just never felt like looking at DOM too heavily.

I guess we can come back to this once stmt_ann is merged into tree_statement_list_node.  I don't feel like mixing infrastructure work and optimizer work.

By the way, I just realized you were planning to do "New SSA Operand Cache Implementation" in stage 2, which is starting tomorrow.  Would I be stepping on your toes?
I am not planning to change the members of stmt_ann, so hopefully my work won't interfere with yours, but if it does, please let me know. Kazu Hirata
Re: Merging stmt_ann_d into tree_statement_list_node
Hi Zdenek, > Once this is done, I would like to start working on moving gimple out of > trees to the "flat" structure. Very cool! I am glad you'll be working on this! Kazu Hirata
Re: Merging stmt_ann_d into tree_statement_list_node
Hi Andrew,

> How is the get_stmt_ann (stmt) going to work from the caller of
> create_ssa_artificial_load_stmt() when it goes to look at the operands
> for its new "stmt"?

I am going to replace that particular call to get_stmt_ann with a call to create_stmt_ann.  The beginning of create_ssa_artficial_load_stmt would look like:

  /* Make sure we don't have a stmt_ann already.  */
  gcc_assert (stmt_ann (stmt) == NULL);

  /* Create a standalone stmt_ann that is not a part of
     tree_statement_list_node.  */
  create_stmt_ann (stmt);

> The annotation was simply attached to the stmt before, so it was
> easy to locate.  There is no bsi_for_stmt() since it isn't a real
> stmt, so you can't get to a stmt_list node that way either.  I'm not
> sure how you would locate the annotation for a fake stmt like this.

I am going to keep stmt_ann() so that you can get to the stmt_ann from a stmt even for a fake statement.  Eventually, we might want to replace stmt_ann() with stmt_ann_from_bsi() so that you can get to the stmt_ann from a bsi directly without going through a statement proper.  I think this is a good follow-up project and should be a good preparation for Zdenek's flat statement tree project.

> And DOM treats it like a real stmt by calling the same routines, so I
> don't know how you could change this without first making DOM do
> something different.

Again, I'll let DOM create its own instance of stmt_ann_d for stores.  Since it's only used for look-ups and never entered into the instruction stream, I don't think that would cause a problem.

> I have the patch done, I'm evaluating whether it's worth using or not.
>
> I doubt you would be stepping on my toes any more than the annoyance of
> diffs not applying all over the place like they did with
> get_stmt_operands() being removed.

OK.  Meanwhile I'll sit here quietly, collecting information like which places do weird things with stmt_ann and thinking about how to go about the transition.

Kazu Hirata
Re: Merging stmt_ann_d into tree_statement_list_node
Hi Andrew,

> OK, let me word it a different way.  What is create_stmt_ann (stmt)
> going to do with the annotation it creates?  You are moving the
> stmt_ann_t pointer from the stmt node to the tree_statement_list_node...
> so there will be nothing in the stmt node to attach the annotation to
> like there is now.

Ah, OK.  I am going to keep

  union tree_ann_d *ann;

in tree_common.  So create_stmt_ann will store a pointer to stmt_ann_d in ann in tree_common just like before.  In other words, for normal cases, we would have

  tree_statement_list_node+stmt_ann_d
    |                ^
    | via stmt       | via ann in tree_common
    v                |
  MODIFY_EXPR -------+

For DOM's case, we would have

  stmt_ann_d
    ^
    | via ann in tree_common
    |
  MODIFY_EXPR -+

So stmt_ann_d won't be hanging in space.

> I thought the whole point of this exercise was to remove the
> stmt_ann_t pointer from the stmt node, and put it directly in the
> tree_statement_list_node.  I'm saying that will not work until you
> either change DOM to not use these fake stmts, or figure out some
> way of associating a stmt_ann to these fake stmts.

No, I would like to remove the stmt_ann_t pointer from the stmt node, but not yet.  All I want to do in this project is to put tree_statement_list_node and stmt_ann_d next to each other in memory while doing the necessary adjustments in the compiler.  Nothing more than that.

Yes, it is nice to get rid of the stmt_ann_t pointer from the stmt node, and I want to eventually get there, but not in this project.  I think there are so many places that use "stmt" instead of "bsi".  They may either pass it to subroutines or put it in some internal data structures, so I would like to do that separately.

So if you like, the roadmap might be something like:

1. Put tree_statement_list_node and stmt_ann_d next to each other in
   memory while doing the necessary adjustments.

2. Provide stmt_ann_from_bsi(), a function that gives you a stmt_ann
   given a bsi.

3. Let everybody use stmt_ann_from_bsi(), replace stmt_ann(), and
   remove the stmt_ann_t pointer from the stmt node (if these are
   possible at all).

4. Hand off to Zdenek for his flat statement project.

I hope this message clarifies things.

Kazu Hirata
Re: Merging stmt_ann_d into tree_statement_list_node
Hi Andrew,

> > No, I would like to remove the stmt_ann_t pointer from the stmt node,
> > but not yet.  All I want to do in this project is to put
> > tree_statement_list_node and stmt_ann_d next to each other in memory
> > while doing necessary adjustments in the compiler.  Nothing more than
> > that.
>
> does that buy us much?

I have to try, but it seems to be the lowest hanging fruit as far as the instruction chain infrastructure goes.

> Then why not simply shorten this to:
>
> 1) put the stmt annotation directly in the tree_statement_list_node
>
> 2) replace the stmt_ann_t pointer in the stmt with a pointer to its
>    BSI, or the tree_statement_list_node.  This makes bsi_from_stmt
>    O(1) immediately.
>
> 3) all stmts now have access to the stmt_ann directly in the
>    stmt_list_node, nothing changes except get_stmt_ann() returns the
>    address of the stmt ann within the tree_stmt_list_node.
>
> 4) For fake stmts you will have to allocate a fake tree_stmt_list_node
>    which isn't linked with anything, and set the bsi pointer to
>    that...  then you have the annotation and access to it from the stmt.
>
> that seems like a quicker, more direct approach than phasing it in
> slowly, with less impact to anyone, and gives you immediate benefits.  I
> would think the patch to do this would be fairly small and
> non-intrusive.

This looks like a better approach.  How would we do your step 1?  We have var_ann and tree_ann in addition to stmt_ann.  Shall we put a type field at the beginning of tree_statement_list_node+stmt_ann_d so that an annotation node can identify itself?  (Since all these tree annotations already have a field for the annotation type, it's more like appending tree_statement_list_node to stmt_ann_d.)

Kazu Hirata
Re: Merging stmt_ann_d into tree_statement_list_node
Hi Zdenek,

> > This looks like a better approach.  How would we do your step 1?  We
> > have var_ann and tree_ann in addition to stmt_ann.  Shall we put a
> > type field at the beginning of tree_statement_list_node+stmt_ann_d so
> > that an annotation node can identify itself?  (Since all these tree
> > annotations already have a field for annotation type, it's more like
> > appending tree_statement_list_node to stmt_ann_d.)
>
> I would go just for having
>
> union
> {
>   struct stmt_list_node *container;  /* For gimple statements.  */
>   tree_ann_t ann;                    /* For everything else.  */
> }

Err, I don't see how to tell the garbage collector about this without a type field.  We cannot rely on TREE_CODE (stmt), because a CALL_EXPR may appear by itself or as the rhs of a MODIFY_EXPR.

Kazu Hirata
Re: Struggle with FOR_EACH_EDGE
t;vec[0];
D.15284_47 = e_57->dest;
if (D.15284_47 == 0B) goto ; else goto ;
:;
ivtmp.198_31 = vec__44;
goto ();

For normal mode of operation, we have

  bb0 -> L17 -> L18 -> L19 -> L30 -> L3 -> L9
                                     ^     |
                                     +-----+

This is similar to the previous case, but notice that we have two induction variables, namely ivtmp.195_53 and ivtmp.198_7.  My guess is that a bunch of * and & are confusing the optimizers, but that's just my guess.

Can we somehow have the best of both worlds?  I want to use the container field (as in the last illustration), but I also want to use VEC_iterate.  Note that the second case above speeds up GCC by 0.5% or so.

Kazu Hirata
Re: Struggle with FOR_EACH_EDGE
Hi Nathan,

> Kazu, I hope I have time to look in detail at your investigation, however
> one thing that might be causing a problem is that FOR_EACH_EDGE is an
> iterator that works for a non-constant VEC.  This means there's an extra
> level of indirection that the optimizer cannot remove, unless it
> completely inlines the body of the loop (there might be some cases it
> can do it without inlining, but I suspect not).  I wonder if it is
> worthwhile implementing FOR_EACH_EDGE and FOR_EACH_EDGE_VOLATILE (I want
> the default, shorter name, to be the constant iterator, so that you have
> to choose to use the non-constant one).

Yes, the iterator that works for a non-constant VEC adds some overhead.  I am wondering if it's OK to tweak FOR_EACH_EDGE like so:

#define FOR_EACH_EDGE (...)					\
  if (vec == NULL)						\
    /* The vector is empty.  We don't have anything to do.  */	\
    e = NULL;							\
  else								\
    for (/* usual iterator stuff */)

In other words, I am wondering if it's safe to assume that nobody calls VEC_free during edge vector iteration.  (I know calling VEC_free sounds crazy, but I want to hear second opinions.)  If I am allowed to put an "if" like this, the optimizers (or slightly reordered ones) can remove the null check in the loop.

Anyway, there seem to be several things going on.

1. FOR_EACH_EDGE itself could be improved (like above).

2. VRP is not really effective in this case because it is run before
   SRA, meaning it cannot get at as many scalar variables.  (Andrew
   Pinski pointed this out on IRC.)

3. VRP does not know that &x->a is nonnull if x is nonnull.  (PR 21294)

I'll start with a low hanging fruit.

Kazu Hirata
Questions about a constant array reference in the C++ front end
Hi,

I have two questions about the C++ front end.  Consider a C++ program:

static const int array[] = { 0, 0, 0, 1, 1, 1, 2, 2, 2 };

int foo (int a)
{
  return array[7];
}

I am trying to fold array[7] into 2.  It turns out that if T is an ARRAY_REF, TREE_READONLY (TREE_OPERAND (T, 0)) is 0.  Why?  This would be 1 if the program were fed into the C front end, and it is needed to safely fold a constant array reference.

Another question.  How is a RANGE_EXPR used in a C++ array constructor?  The CONSTRUCTOR section of tree.def says:

  The TREE_PURPOSE of each node is the corresponding index.  If the
  TREE_PURPOSE is a RANGE_EXPR, it is a short-hand for many nodes,
  one for each index in the range.  (If the corresponding TREE_VALUE
  has side-effects, they are evaluated once for each element.  Wrap
  the value in a SAVE_EXPR if you want to evaluate side effects only
  once.)

I created an array with more than one thousand elements, but I still did not see a RANGE_EXPR in the array's CONSTRUCTOR.  How do I get a RANGE_EXPR in a CONSTRUCTOR?

Kazu Hirata
Re: Questions about a constant array reference in the C++ front end
Hi Nathan,

> this is untrue.  the elements hold the qualification.

Then how do I know that an array is declared with const (or static const)?  When I looked at the CONSTRUCTOR stored in the DECL_INITIAL of an array, I saw it with the TREE_STATIC flag set regardless of whether the array is declared with const, so that's not useful for determining whether the array is declared with const.  All I need is a sufficient condition, one that works with C, C++, and any other language that GCC supports, for determining whether it's safe to fold array[CST] into a constant.

> Incorrect.  RANGE_EXPRs get generated during processing of an array's
> initializer -- very early on in the C++ FE.

Do you know how to trigger a RANGE_EXPR?  I see one in build_zero_init, so I tried a big array with only zeros in it, but that didn't help.

Kazu Hirata
Re: Questions about a constant array reference in the C++ front end
Hi Mark,

> That's a bug, or, rather, a place where the C++ front end is failing
> to give full information to the optimizer.  It should be
> TREE_READONLY.  There are some tricky bits, in that if we're doing a
> dynamic initialization, we presently cannot mark it TREE_READONLY,
> but this is a static initialization, so it should be fine.  Isn't
> TREE_OPERAND (T, 0) the VAR_DECL for the array itself?

Yes, if T is an ARRAY_REF, TREE_OPERAND (T, 0) is the VAR_DECL for the array itself.

> If so, what's probably going wrong is that it's not being marked
> TREE_READONLY because we're afraid of the dynamic initialization
> case.  We're missing a call to c_apply_quals_to_decl (sp?)
> somewhere.

Thank you for the function name.  I might dig into the C++ front end.  Anyway, I filed this "bug" as PR 21454, saying that the contents of a "completely static" array end up in the .data section, not .rodata.  It turned out to be a regression from 3.x.

Kazu Hirata
A question about tree-if-conv.c:tree_if_convert_stmt.
Hi,

According to the CVS annotate of tree-if-conv.c, Devang Patel checked in the following code in tree_if_convert_stmt:

    case GOTO_EXPR:
      /* Unconditional goto.  */
      add_to_predicate_list (bb_for_stmt (TREE_OPERAND (t, 1)), cond);

Note that TREE_OPERAND (t, 1) seems wrong.  A GOTO_EXPR takes only one operand.  I am pretty sure this part of the code has never been run, because if it had been, the compiler would have ICEd by now.

Now, what would be the right fix here?  Something like this?

    case GOTO_EXPR:
      break;

As far as I know, all jumps except computed gotos are implicitly represented by the CFG.  I don't think if-conv is trying to handle computed gotos here.

By the way, this problem has been filed as PR 18472.

Kazu Hirata
Re: A question about tree-if-conv.c:tree_if_convert_stmt.
Hi Devang, > Just remove handling of GOTO_EXPR here and in if_convertible_stmt_p(). OK. Will submit a patch. Kazu Hirata
No tail call optimization in Thumb mode?
Hi,

Why is tail call optimization for Thumb disabled in GCC?  I am wondering if this is a TODO item or something that we intrinsically cannot do.  "The ARM-THUMB Procedure Call Standard" says "No tail continuation in Thumb-state" several times in its figures and measurements, but the document doesn't explicitly forbid tail call optimization for Thumb.

Thanks in advance,

Kazu Hirata
A trouble with libssp in one-tree builds
Hi Jakub,

I am having trouble with libssp in one-tree builds.  That is, if I try to build binutils and gcc at the same time, libssp/configure complains while compiling (and linking) the following program, and the build process stops.

/* confdefs.h.  */
#define PACKAGE_NAME "libssp"
#define PACKAGE_TARNAME "libssp"
#define PACKAGE_VERSION "1.0"
#define PACKAGE_STRING "libssp 1.0"
#define PACKAGE_BUGREPORT ""
#define PACKAGE "libssp"
#define VERSION "1.0"
/* end confdefs.h.  */

int
main ()
{
  ;
  return 0;
}

The problem is that libssp/configure contains a link test, but we may not have crt0.o built yet (assuming targets like h8300-elf, arm-none-eabi, etc.).  Is there any way we could stop configuring libssp in this kind of case?  Or somehow work around the link test?

FWIW, I am sitting on the attached patch to disable libssp for now.

Kazu Hirata

Index: configure.in
===================================================================
RCS file: /home/gcc/repos/gcc/gcc/configure.in,v
retrieving revision 1.355
diff -u -d -p -r1.355 configure.in
--- configure.in	2 Jul 2005 08:50:56 -0000	1.355
+++ configure.in	4 Jul 2005 04:42:35 -0000
@@ -307,6 +307,14 @@ if test "${ENABLE_LIBADA}" != "yes" ; th
   noconfigdirs="$noconfigdirs gnattools"
 fi
 
+AC_ARG_ENABLE(libssp,
+[  --enable-libssp        Builds libssp directory],
+ENABLE_LIBSSP=$enableval,
+ENABLE_LIBSSP=yes)
+if test "${ENABLE_LIBSSP}" != "yes" ; then
+  noconfigdirs="$noconfigdirs configure-target-libssp target-libssp"
+fi
+
 # Save it here so that, even in case of --enable-libgcj, if the Java
 # front-end isn't enabled, we still get libgcj disabled.
 libgcj_saved=$libgcj
A question about having multiple insns for one operation
Hi,

I have a question about having multiple insns for one operation.  Take the m68k port for example.  It has many SImode move insns, like so:

(define_insn "pushexthisi_const"
  [(set (match_operand:SI 0 "push_operand" "=m")
        (match_operand:SI 1 "const_int_operand" "J"))]

(define_insn ""
  [(set (match_operand:SI 0 "nonimmediate_operand" "=a")
        (match_operand:QI 1 "address_operand" "p"))]

(define_insn "movsi_const0"
  [(set (match_operand:SI 0 "nonimmediate_operand" "=g")
        (const_int 0))]

;; Notes: make sure no alternative allows g vs g.
;; We don't allow f-regs since fixed point cannot go in them.
(define_insn ""
  [(set (match_operand:SI 0 "nonimmediate_operand" "=g,d,a<")
        (match_operand:SI 1 "general_src_operand" "daymSKT,n,i"))]
  "!TARGET_COLDFIRE"

Now, is it OK to have this many?  IIRC, reload makes certain changes to insns without re-recognizing the modified ones.  (Well, it used to at least.)  A few years ago, I ran into this kind of trouble in the H8 port, which had multiple add insns; they were cleaned up soon afterwards.

Aside from reload issues, are there other potential problems with having multiple insns for one operation?  Maybe maintenance reasons?  Having multiple move insns may make it difficult for people to figure out which pattern a given insn matches, because we would have to go through the machine description to find the first matching insn.  (By the way, I do understand that it's OK to have multiple patterns for different targets, because the values of TARGET_xxx don't change within one run of cc1.)

FWIW, the second define_insn above could have "plus" in operands[1], in which case it overlaps with other insns with plus.

Thanks in advance,

Kazu Hirata
A questionable predicate in sh/predicates.md
Hi,

In sh/predicates.md, I see

(define_predicate "sh_rep_vec"
  (match_code "const_vector")
{
  int i;
  rtx x, y;

  if ((GET_CODE (op) != CONST_VECTOR && GET_CODE (op) != PARALLEL)
      || (GET_MODE (op) != mode && mode != VOIDmode))
    return 0;

Notice that the match_code at the beginning does not mention PARALLEL, but we have GET_CODE (op) != PARALLEL later.  Is this predicate intended to accept PARALLEL as well?  If so, should we change the match_code at the beginning?

When I did the conversion from PREDICATE_CODES to predicates.md, I did it in a mechanical way, so loose ends like this might have gone unnoticed.

Kazu Hirata
Re: bootstrap broken
Hi Andreas,

/cvs/gcc-svn/trunk/gcc/df-core.c: In function ‘df_compact_blocks’:
/cvs/gcc-svn/trunk/gcc/df-core.c:795: error: invalid lvalue in assignment
/cvs/gcc-svn/trunk/gcc/df-core.c:803: error: invalid lvalue in assignment
/cvs/gcc-svn/trunk/gcc/df-core.c: In function ‘df_bb_replace’:
/cvs/gcc-svn/trunk/gcc/df-core.c:833: error: invalid lvalue in assignment

I just fixed these.

Kazu Hirata
What shall we do with gcc.dg/vect/vect-11.c?
Hi Dorit,

What shall we do with gcc.dg/vect/vect-11.c?  After I fixed PR
tree-optimization/25125, gcc.dg/vect/vect-11.c started failing on some
targets but not on x86_64-pc-linux-gnu.  If we XFAIL it, we would get
an XPASS on x86_64-pc-linux-gnu.  I am pretty sure that adding -fwrapv
will fix the problem, but that means we are changing the test case.
Shall we add something like gcc.dg/vect/fwrapv-vect-11.c, remove
vect-11.c, and file a missed-optimization PR for vect-11.c?  Any
thoughts?

Kazu Hirata
Re: What shall we do with gcc.dg/vect/vect-11.c?
Hi Eric,

> > What shall we do with gcc.dg/vect/vect-11.c?  After I fixed PR
> > tree-optimization/25125, gcc.dg/vect/vect-11.c started failing on
> > some targets but not on x86_64-pc-linux-gnu.
>
> It's gcc.dg/tree-ssa/gen-vect-11.c and it presumably fails everywhere.
> But it is not run on x86-64 and ia64 so you don't see the failure
> there.

Oops, I now see what's going on.  In this case, I think we can just
XFAIL gcc.dg/tree-ssa/gen-vect-11.c.

Kazu Hirata
Use of VARRAY in alias analysis
Hi Daniel and Diego, Several months ago, you mentioned that the alias analysis in gcc would be overhauled. Has this happened yet? If not, I am thinking about working on alias.c and tree-ssa-alias.c as there are only a handful of VARRAY uses left. Thanks, Kazu Hirata
Re: Use of VARRAY in alias analysis
Hi Diego,

> > Several months ago, you mentioned that the alias analysis in gcc
> > would be overhauled.  Has this happened yet?  If not, I am thinking
> > about working on alias.c and tree-ssa-alias.c as there are only a
> > handful of VARRAY uses left.
>
> It's happening in the mem-ssa branch.  But switching from VARRAY to
> VEC will probably not affect my work, so go ahead.  I guess that's
> what you have in mind?  Switching VARRAY to VEC?

Yes, thanks!

Kazu Hirata
A question about tree-ssa-structalias.h.
Hi Daniel and Diego,

I see that tree-ssa-structalias.h contains

  varray_type num_references;

around line 58 as a member of alias_info, but it doesn't seem to be
used anywhere.  Is this going to be used in the near future?  VARRAY
is going away pretty soon, so we have to do something -- either remove
the line or convert it to VEC somehow.

Thanks,

Kazu Hirata
A question about the VEC API
Hi Nathan,

Does the VEC API support the following?

typedef unsigned char vec_uchar;
DEF_VEC_I(vec_uchar);
DEF_VEC_ALLOC_I(vec_uchar,gc);

Note that this is GC'ed.  When I use this construct in except.c, I get

libbackend.a(except.o): In function `gt_ggc_mx_eh_status':
./gt-except.h:104: undefined reference to `gt_ggc_mx_VEC_vec_uchar_gc'
./gt-except.h:106: undefined reference to `gt_ggc_mx_VEC_vec_uchar_gc'
libbackend.a(except.o): In function `gt_pch_nx_eh_status':
./gt-except.h:221: undefined reference to `gt_pch_nx_VEC_vec_uchar_gc'
./gt-except.h:223: undefined reference to `gt_pch_nx_VEC_vec_uchar_gc'

The reason I'd like to use GC memory for these integer arrays is that
the structure, namely eh_status, that contains them is GC'ed.  I have
a workaround like so:

typedef struct vec_uchar_struct GTY(())
{
  unsigned char uc;
} vec_uchar;
DEF_VEC_O(vec_uchar);
DEF_VEC_ALLOC_O(vec_uchar,gc);

But this is more complicated than it should be.  If the VEC API does
not support an integer array in GC, what should I do to add that?
Some sketch would be greatly appreciated.

Thanks,

Kazu Hirata
CC0 questions
Hi,

Can a jump_insn use cc0 but not actually jump?  The following
instruction is found in the H8 port.

(jump_insn 18 17 20 (set (reg:HI 3 r3)
        (eq:HI (cc0)
            (const_int 0

Is there a requirement that every cc0 user must be a jump_insn?

Can there be two consecutive insns that use cc0 after cc0 is set?  (I
don't think so.  I think there should be a one-to-one correspondence
between cc0 setters and cc0 users, but I've never seen that rule
written down anywhere.)

Thanks,

Kazu Hirata
A question about your patch for PR c++/26534
Hi Mark,

I have a question about your patch for PR c++/26534.  In particular,
you added to standard_conversion in call.c the following code:

  if (expr)
    {
      tree bitfield_type;

      bitfield_type = is_bitfield_expr_with_lowered_type (expr);
      if (bitfield_type)
        from = bitfield_type;
    }

Now consider the following testcase derived from PR c++/27505 and my
analysis of why it fails.

struct s
{
  bool field:8;
};

void
foo (struct s *p)
{
  if (!p->field)
    ;
}

When build_unary_op builds TRUTH_NOT_EXPR, it calls
perform_implicit_conversion to convert p->field to the boolean type.
The call to perform_implicit_conversion eventually gets to
standard_conversion.  The code fragment of standard_conversion shown
above changes FROM to the boolean type.  Since TO is also the boolean
type, no conversion happens, causing perform_implicit_conversion to
return the original expression, which is of INTEGER_TYPE.
invert_truthvalue, called from cp/typeck.c:3969, expects nothing but
expressions of BOOLEAN_TYPE, so it ICEs at fold-const.c:3165.

Now, is the code in standard_conversion meant to be an optimization,
or is it something that's required for language conformance?  If the
latter is the case, can we somehow force a conversion?

Thanks,

Kazu Hirata
A question about TYPE_ARG_TYPES
Hi,

I keep finding places in the GCC sources that check whether a member
of TYPE_ARG_TYPES is 0.  For example,

  for (link = TYPE_ARG_TYPES (function_or_method_type);
       link && TREE_VALUE (link);
       link = TREE_CHAIN (link))
    gen_type_die (TREE_VALUE (link), context_die);

Notice that TREE_VALUE (link) is part of the loop condition.  Now, do
we ever allow a NULL in TYPE_ARG_TYPES?  If so, what does that mean?
My guess is that someone was trying to be cautious about encountering
a NULL in TYPE_ARG_TYPES.  (If that's the case, we should be using
gcc_assert instead.)

Thanks,

Kazu Hirata
Re: A question about TYPE_ARG_TYPES
Hi Ian,

> > I keep finding places in the GCC sources that check whether a
> > member of TYPE_ARG_TYPES is 0.  For example,
> >
> >   for (link = TYPE_ARG_TYPES (function_or_method_type);
> >        link && TREE_VALUE (link);
> >        link = TREE_CHAIN (link))
> >     gen_type_die (TREE_VALUE (link), context_die);
> >
> > Notice that TREE_VALUE (link) is part of the loop condition.  Now,
> > do we ever allow a NULL in TYPE_ARG_TYPES?  If so, what does that
> > mean?  My guess is that someone was trying to be cautious about
> > encountering a NULL in TYPE_ARG_TYPES.  (If that's the case, we
> > should be using gcc_assert instead.)
>
> Just guessing here, but what happens with an old-style function
> definition in C?
>
> void f();

AFAIK, that gets TYPE_ARG_TYPES (...) == NULL, so we don't even get to
evaluate TREE_VALUE (TYPE_ARG_TYPES (...)).  On IRC, Daniel Berlin
claims that there are some weird cases where
TREE_VALUE (TYPE_ARG_TYPES (...)) is NULL.  I'll keep putting in
gcc_assert to see what happens.

Kazu Hirata
A question about ARG_FINAL_P in the Java frontend.
Hi, I'm planning to change TYPE_ARG_TYPES to use TREE_VEC instead of TREE_LIST for compact representation. I just noticed that the Java frontend has ARG_FINAL_P, which uses a bit in the TREE_LIST node that is pointed to from TYPE_ARG_TYPES. I am wondering if there is any way we could move this bit elsewhere. The TREE_VEC node doesn't really have space for per-parameter data other than the vector proper. I could use TREE_TYPE of TREE_VEC to keep an array of boolean values to represent ARG_FINAL_P for each parameter, but I am not sure if that is a clean solution. Thoughts? Kazu Hirata
Re: A question about ARG_FINAL_P in the Java frontend.
Hi Tom, Kazu> I just noticed that the Java frontend has ARG_FINAL_P, which uses a Kazu> bit in the TREE_LIST node that is pointed to from TYPE_ARG_TYPES. Kazu> I am wondering if there is any way we could move this bit elsewhere. On the gcj-eclipse branch the code that uses ARG_FINAL_P is actually no longer used. It hasn't been deleted yet but it won't ever run. I'm hoping to merge this to trunk after 4.2 branches ... will that help? Yes, that definitely helps. Thanks, Kazu Hirata
Re: A question about ARG_FINAL_P in the Java frontend.
Hi Mark, Yes. Kazu, I'd suggest you just ignore Java; you can still get proof-of-concept for tree-trimming without Java. The ECJ changes are going to be massive, and they're going to go in before we get our stuff ready to go in, so dealing with Java now is probably a waste of time; we'll have to regroup after ECJ goes in. OK. Kazu Hirata
A question about cp/pt.c:type_unification_real
Hi,

I have a question about cp/pt.c:type_unification_real.  What does
"args" represent?  The comment above type_unification_real says "most
parms like fn_type_unification."  According to fn_type_unification,
"args" is supposed to point to the function arguments.  In fact, when
I backtraced a call to type_unification_real, I traced "args" to the
second argument of build_new_function_call, which indicates that
"args" holds function arguments.

However, "args" appears to be a parameter list, too.  For example,
type_unification_real contains code like:

  parms = xparms;
  args = xargs;

  while (parms && parms != void_list_node
         && args && args != void_list_node)

Note that "args" is compared against void_list_node.  As another
example, also in type_unification_real, we have

  if (args && args != void_list_node && parms == void_list_node)
    return 1;

Note again that "args" is compared to void_list_node.  Even more
confusing is:

  arg = TREE_VALUE (args);
  ...
  if (!TYPE_P (arg))

Can a function argument be a type?  Clarification would be greatly
appreciated.

Thanks,

Kazu Hirata
Re: GCC 4.3 project to merge representation changes
Hi Mark,

> I don't believe there is a GCC 4.3 project page to merge the work
> that you folks did on CALL_EXPRs and TYPE_ARG_TYPES.  Would one of
> you please create a Wiki page for that?

What would you suggest I do about the missing bits?  Specifically,
most backends, with the exception of x86, are broken because I haven't
converted the uses of TYPE_ARG_TYPES in those backends.  The ARM port
was broken at the time of branch creation.  The Java frontend uses a
flag within the TREE_LIST object that makes up TYPE_ARG_TYPES, so it
is blocking the proposed merge.  (The Java maintainers are planning to
fix this in the future.)  So, Sandra's CALL_EXPR stuff may be ready
for merge, but my TYPE_ARG_TYPES stuff has a somewhat long way to go.

Thanks,

Kazu Hirata
[RFC] fold Reorganization Plan
Hi,

I am planning to reorganize fold as suggested by Roger Sayle on IRC.
The shortest way to describe this mini project would be to develop the
tree equivalent of simplify_gen_ARITY and simplify_ARITY in the RTL
world.  Doing so should reduce the number of scratch tree nodes
created when idioms like fold (buildN (...)) are used.  Hopefully, we
should be able to pick up some compile-time improvement along the way.

Step 1 -- Split fold into fold_unary, fold_binary, etc.

Make fold a simple dispatch function into fold_unary, fold_binary,
etc.  The interfaces are kept exactly the same; we would pass one tree
as the only argument.  Each function returns the folded tree or the
original tree.  No change to the external interface (outside of
fold-const.c) is made.

Step 2 -- Eliminate the direct use of the original tree.

For example, fold currently has things like

  fold_binary_op_with_conditional_arg (t, ...)

and

  fold_range_test (t)

These functions, as they stand, would not work without the original
tree "t".  We need to change their interfaces so that they will work
with decomposed arguments like code, type, op0, and op1.  Again, no
change to the external interface (outside of fold-const.c) is made.

Step 3 -- Change fold_unary, fold_binary, etc., so that they will
return NULL_TREE instead of the original tree when no folding is
performed.

Change fold accordingly so that it will still return either the
original tree or the folded tree (but not NULL_TREE).  Again, no
change to the external interface (outside of fold-const.c) is made.

Step 4 -- Change fold_unary, fold_binary, etc., so that they will take
decomposed arguments like code, type, op0, op1.

At this point, the fold_ARITY functions should be just like their RTL
equivalents, simplify_ARITY.  Change fold accordingly.  Again, no
change to the external interface (outside of fold-const.c) is made.

Step 5 -- Provide fold_buildN as extern functions.

Step 6 -- Convert fold (buildN (...)) to fold_buildN.

This is very mechanical but very disturbing at the same time.  We need
to coordinate things first with various people, especially those
maintaining branches.  One thing I can say is that converting little
by little would be even more disturbing than a one-shot conversion, as
people with patches spanning several files may have to adjust their
patches several times.

Step 7 -- Export fold_ARITY.

Step 8 -- Look for places where we can use fold_ARITY and convert
them.

Step 9 -- Enjoy the result and continue to hack GCC. :-)

Summary

The point is to do as much cleanup and reorganization as possible
without changing the external interface before making the big
conversion.

By the way, the past proposals from Roger Sayle are found at:

http://gcc.gnu.org/ml/gcc-patches/2003-10/msg01514.html
http://gcc.gnu.org/ml/gcc/2004-01/msg00560.html

Both of these are along the same lines as above.

Any comments?

Kazu Hirata
Re: [RFC] fold Reorganization Plan
Hi Nathan,

> I question if it is better to fold early.  As I've said before, I
> think the optimizations that fold performs should be turned into a
> proper SSA optimization phase -- that can be repeatedly applied as
> necessary.  In the front end, folding should not generally be done.
> I see two needs for early folding,

I may not be quite answering your question, but I have a comment.
Maybe we can have an early fold and a general fold.  The former would
handle constant expressions for front ends.  The latter is a
full-fledged version, but optimized to handle GIMPLE statements.  The
reasons to optimize the full-fledged version to handle GIMPLE
statements include:

1) We can remove the parts of fold handling those cases that never
   occur in the GIMPLE world.

2) Currently, fold has many transformations that look into a heavily
   nested tree, but all of those are useless in the GIMPLE world
   unless one builds a scratch node and passes it to fold.  An example
   of such a transformation would be:

  (A * C) + (B * C) -> (A + B) * C

We would express this in GIMPLE as

  D = A * C;
  E = B * C;
  F = D + E;

Given D + E, we can instead have fold chase the SSA_NAME_DEF_STMT of D
and E so that the above transformation is performed.

Whether we want to always chase SSA_NAME_DEF_STMT is another question.
Richard Henderson once suggested putting in a hook for chasing.  In a
tree combiner, we may want to limit SSA_NAME_DEF_STMT chasing to the
case where the SSA_NAME is used only once.  In other situations, we
might want a more relaxed hook, although I cannot come up with a
specific example.

Kazu Hirata
Re: [RFC] fold Reorganization Plan
Hi Nathan,

> hm, we may be in violent agreement :)  It depends what you mean
> by 'early fold'.  You say it would handle constant expressions for FEs
> -- isn't that the same as what I described as a constant expression
> evaluator?

Yes.

> After all, if it is just for constant exprs, it is required
> to 'fold' to a _CST node -- or give an error.  If I've misunderstood,
> could you clarify?

Can a compile-time constant appearing in an initializer be as wild as
the following?

  0 ? (foo () + 9) : (3 + 5)

Here (foo () + 9) does not fold to a constant, but the whole
expression does fold to 8.

> > The reasons to optimize the full fledged version to handle GIMPLE
> > statements include
> >
> > 1) We can remove parts of fold handling those cases that never occur
> >    in the GIMPLE world.
>
> Do you have examples of what this would be?

Certainly, a GIMPLE-specific fold wouldn't need to handle
TRUTH_ANDIF_EXPR.  I cannot come up with a better example right now.
TRUTH_ANDIF_EXPR wouldn't be a good example because we are just
removing one case of the big switch statement in fold, which is
unlikely to give any measurable speed-up.

> I have no feeling for what might be foldable in GIMPLE.  I'm curious
> (I don't think it'll affect the discussion though).

x & x -> x ? :-)

> hm, there's really two kinds of thing that need to happen,
> 1) the kind of reassociation & strength reduction that you describe
> 2) folding of constant expressions, for when we discover that some of
> A, B and C are constants.
>
> I don't know whether these operations should be part of the same SSA
> optimization or not.  #2 is more of a constant propagation kind of
> thing I guess.  #1 is the kind of thing that has made const-fold so
> complicated.  #1 is the important thing to add to the SSA optimizers,
> isn't it?

Yes.  Some transformations that happen in fold would involve CFG
manipulation in the GIMPLE world.  Those transformations include
TRUTH_{AND,OR}IF_EXPR, a lot of COND_EXPR manipulations, etc.  If
these are good transformations, we need to move them to the SSA
optimizers, as you mentioned.

Kazu Hirata
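[Editorial sketch.] The SSA_NAME_DEF_STMT-chasing idea discussed in this thread can be illustrated with a toy model: each SSA-style temporary records its defining operation, so a simplifier looking at D + E can peek at how D and E were computed and recognize the (A + B) * C form. Everything here (struct val, match_factored_sum) is invented for illustration; it is not GCC code.

```c
#include <assert.h>
#include <stddef.h>

/* Toy model of def-chasing: a value knows the statement that defined
   it, much as an SSA_NAME points at its SSA_NAME_DEF_STMT.  */
enum op { OP_VAR, OP_MULT, OP_PLUS };

struct val {
  enum op def;            /* how this value was defined */
  struct val *lhs, *rhs;  /* operands of the defining statement */
};

/* Given X + Y, chase the definitions of X and Y; if both are
   multiplications by the same value C, report the (A + B) * C form by
   filling in *a, *b, *c and returning 1.  Otherwise return 0.  */
static int match_factored_sum (struct val *x, struct val *y,
                               struct val **a, struct val **b,
                               struct val **c)
{
  if (x->def != OP_MULT || y->def != OP_MULT)
    return 0;
  if (x->rhs == y->rhs)  /* common factor on the right */
    {
      *a = x->lhs;
      *b = y->lhs;
      *c = x->rhs;
      return 1;
    }
  return 0;
}
```

In GCC proper, this peek-through would go through a hook, so a tree combiner could restrict chasing to single-use names, as suggested above.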
Re: RFC -- CSE compile-time stupidity
Hi Jeff,

> Fixing cse.c to not use the accessor macros for REG_IN_TABLE, REG_TICK
> and SUBREG_TICKED saves about 1% compilation time for the components
> of cc1.  Yes, that's a net 1% improvement by dropping the abstraction
> layer.

Yes, I've noticed the problem.  In my defense, the code in question
was even worse before I touched it. :-)  With the old code, every time
we accessed a cse_reg_info entry that was different from the last
access, we were generating a function call.  Nowadays, we avoid calls
to get_cse_reg_info_1 95% of the time.

Of course, it's tough to beat the performance of your explicit
initialization approach, but here are a couple of things that I have
thought about while keeping some abstraction layer.

The first thought is to expose the timestamp update to the users of
those macros that you mentioned.

/* Find a cse_reg_info entry for REGNO.  */
static inline struct cse_reg_info *
get_cse_reg_info (unsigned int regno)
{
  struct cse_reg_info *p = &cse_reg_info_table[regno];

  /* If this entry has not been initialized, go ahead and
     initialize it.  */
  if (p->timestamp != cse_reg_info_timestamp)
    {
      get_cse_reg_info_1 (regno);
      p->timestamp = cse_reg_info_timestamp; /* <- Look! */
    }

  return p;
}

This way, DOM may be able to do jump threading to some extent and
remove a lot of the timestamp checks.  Of course, jump threading
opportunities are blocked when we have a non-pure/const function call
like so:

  for (i = regno; i < endregno; i++)
    {
      if (REG_IN_TABLE (i) >= 0
          && REG_IN_TABLE (i) != REG_TICK (i))
        remove_invalid_refs (i); /* <- Look! */
      REG_IN_TABLE (i) = REG_TICK (i);
      SUBREG_TICKED (i) = -1;
    }

The second thought is to initialize all of the cse_reg_info entries at
the beginning of cse_main.  Set aside a bitmap with as many bits as
max_regs.  Whenever we use one of these accessor macros for register
k, set bit k, saying "cse_reg_info_table[k] is in use."  This way,
when we are done with a basic block, we can walk the bitmap and
reinitialize those entries that were used.  Again, a good optimizer
should be able to eliminate most of these bit sets, but a
non-pure/const function call will block the cleanup opportunities.  Of
course, this bitmap walk is far more expensive than
cse_reg_info_timestamp++.

Kazu Hirata
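[Editorial sketch.] The timestamp trick discussed in this exchange can be shown in a self-contained form: each entry carries the timestamp of its last initialization, and bumping one global counter invalidates every entry at once, so "resetting" the whole table costs O(1) instead of a walk. The names here (struct entry, get_entry, reset_all_entries) are illustrative, not cse.c's.

```c
#include <assert.h>

#define N_ENTRIES 16

/* Each entry remembers the timestamp at which it was last
   (re)initialized; an entry is valid only if that matches the
   current global timestamp.  */
struct entry {
  unsigned timestamp;
  int value;
};

static struct entry table[N_ENTRIES];
static unsigned current_timestamp = 1;  /* 0 marks never-initialized */

/* Validate-on-access: lazily (re)initialize a stale entry.  */
static struct entry *get_entry (unsigned i)
{
  struct entry *p = &table[i];
  if (p->timestamp != current_timestamp)
    {
      p->value = -1;                   /* the "fresh" state */
      p->timestamp = current_timestamp;
    }
  return p;
}

/* The analogue of starting a new basic block in the CSE code:
   one increment stales every entry at once.  */
static void reset_all_entries (void)
{
  current_timestamp++;
}
```

This is why the bitmap-walk alternative measured slightly slower: reset_all_entries does constant work no matter how many entries were touched, while a bitmap walk pays per touched entry.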
Re: RFC -- CSE compile-time stupidity
Hi Jeff,

> The second thought is to initialize all of cse_reg_info entries at the
> beginning of cse_main.  Set aside a bitmap with as many bits as
> max_regs.  Whenever we use one of these accessor macros for register
> k, set a bit k saying "cse_reg_info_table[k] is in use."  This way,
> when we are done with a basic block, we can walk the bitmap and
> reinitialize those that are used.  Again, a good optimizer should be
> able to eliminate most of these bit sets, but a non-pure/const
> function call will block the cleanup opportunities.  Of course, this
> bitmap walk is far more expensive than cse_reg_info_timestamp++.

I just tried this.  It comes very close to the current timestamp
approach but loses by 0.3% or so in compile time.  I'm attaching a
patch for completeness.

Kazu Hirata

Index: cse.c
===================================================================
RCS file: /cvs/gcc/gcc/gcc/cse.c,v
retrieving revision 1.343
diff -c -d -p -r1.343 cse.c
*** cse.c	21 Feb 2005 02:02:19 -0000	1.343
--- cse.c	21 Feb 2005 19:41:04 -0000
*************** struct cse_reg_info
*** 328,346 ****
  /* A table of cse_reg_info indexed by register numbers.  */
  struct cse_reg_info *cse_reg_info_table;

! /* The size of the above table.  */
! static unsigned int cse_reg_info_table_size;
!
! /* The index of the first entry that has not been initialized.  */
! static unsigned int cse_reg_info_table_first_uninitialized;
!
! /* The timestamp at the beginning of the current run of
!    cse_basic_block.  We increment this variable at the beginning of
!    the current run of cse_basic_block.  The timestamp field of a
!    cse_reg_info entry matches the value of this variable if and only
!    if the entry has been initialized during the current run of
!    cse_basic_block.  */
! static unsigned int cse_reg_info_timestamp;

  /* A HARD_REG_SET containing all the hard registers for which there is
     currently a REG expression in the hash table.  Note the difference
--- 328,334 ----
  /* A table of cse_reg_info indexed by register numbers.  */
  struct cse_reg_info *cse_reg_info_table;

! static sbitmap cse_reg_info_init_bitmap;

  /* A HARD_REG_SET containing all the hard registers for which there is
     currently a REG expression in the hash table.  Note the difference
*************** notreg_cost (rtx x, enum rtx_code outer)
*** 841,908 ****
  static void
  init_cse_reg_info (unsigned int nregs)
  {
!   /* Do we need to grow the table?  */
!   if (nregs > cse_reg_info_table_size)
!     {
!       unsigned int new_size;
!
!       if (cse_reg_info_table_size < 2048)
!	{
!	  /* Compute a new size that is a power of 2 and no smaller
!	     than the larger of NREGS and 64.  */
!	  new_size = (cse_reg_info_table_size
!		      ? cse_reg_info_table_size : 64);
!
!	  while (new_size < nregs)
!	    new_size *= 2;
!	}
!       else
!	{
!	  /* If we need a big table, allocate just enough to hold
!	     NREGS registers.  */
!	  new_size = nregs;
!	}
!       /* Reallocate the table with NEW_SIZE entries.  */
!       if (cse_reg_info_table)
!	free (cse_reg_info_table);
!       cse_reg_info_table = xmalloc (sizeof (struct cse_reg_info)
!				    * new_size);
!       cse_reg_info_table_size = new_size;
!       cse_reg_info_table_first_uninitialized = 0;
!     }
!   /* Do we have all of the first NREGS entries initialized?  */
!   if (cse_reg_info_table_first_uninitialized < nregs)
      {
!       unsigned int old_timestamp = cse_reg_info_timestamp - 1;
!       unsigned int i;
!
!       /* Put the old timestamp on newly allocated entries so that they
!	  will all be considered out of date.  We do not touch those
!	  entries beyond the first NREGS entries to be nice to the
!	  virtual memory.  */
!       for (i = cse_reg_info_table_first_uninitialized; i < nregs; i++)
!	 cse_reg_info_table[i].timestamp = old_timestamp;
!
!       cse_reg_info_table_first_uninitialized = nregs;
      }
  }

! /* Given REGNO, initialize the cse_reg_info entry for REGNO.  */
  static void
! get_cse_reg_info_1 (unsigned int regno)
  {
!   /* Set TIMESTAMP field to CSE_REG_INFO_TIMESTAMP so that this
!      entry will be considered to have been initialized.  */
!   cse_reg_info_table[regno].timestamp = cse_reg_info_timestamp;
!
!   /* Initialize the rest of the entry.  */
!   cse_reg_info_table[regno].reg_tick = 1;
!   cse_reg_info_table[regno].reg_in_table = -1;
!   cse_reg_info_table[regno].subreg_ticked = -1;
!   cse_reg_info_table[regno].reg_qty = -regno - 1;
  }

  /* Find a cse_reg_info entry for REGNO.  */
--- 829,857 ----
  static void
  init_cse_reg_info (unsigned int nregs)
  {
!   unsigned int i;
!
!   cse_reg_info_table = xmalloc (sizeof (struct cse_reg_info) * nregs);
!   for (i = 0; i < nr