Re: [PATCH v3] Re: avoid useless if-before-free tests
On Thu, Mar 24, 2011 at 18:51, Jim Meyering wrote: > Janne Blomqvist wrote: >> On Tue, Mar 8, 2011 at 19:53, Jim Meyering wrote: >>> Relative to v2, I've added libgo/ to the list of exempt directories and >>> added >>> this recently discussed gfc_free patch, at the request of Tobias Burnus. >>> Also, I corrected an error in fortran's ChangeLog and removed all >>> whitespace changes from all ChangeLog files. >> >> The libgfortran changes are Ok for 4.7. >> >> For the gfortran frontend (gcc/fortran/*) I'd prefer if you'd >> >> - Replace all calls to "gfc_free (x)" with "free (x)". >> - Remove the gfc_free() function and prototype. >> - Remove the free() macro which currently prevents calling free() directly. > > Following up, I've refreshed the series but hit a minor snag > while converting new uses of gfc_free, removing new tests-before-free > and merging/reordering changes. > > Applying this fix first makes my problem go away: [snip] So, what's the plan here? Do you plan to get a GCC account, do you already have one, or what? Now that 4.7 is open for development, it's perhaps the right time to poke the maintainers to get this patch in. If you don't have a GCC account, as one of the Fortran maintainers I can commit the Fortran and libgfortran parts, but someone else will have to do the rest (were they ever approved, BTW?) as I have only write after approval privileges for the rest of GCC. -- Janne Blomqvist
[PATCH] Improve combining of conditionals
This patch fixes a problem with substituting expressions into IF_THEN_ELSE during combining. Without it, combining of conditionals inside IF_THEN_ELSE is seriously inhibited. The problem is that combine_simplify_rtx() prefers to return an expression even when a comparison is preferred. Expressions are not recognized as valid conditions of if_then_else for most targets, so the combiner misses a potential optimization. This patch makes combine_simplify_rtx() aware of the context it was invoked in and, when appropriate, does not discourage it from returning a conditional.

The motivating example for this fix was the crcu8() routine from CoreMark. Compiling this routine for MIPS32R2 at -O3 produces several instances of the sequence

	andi	$2,$2,0x1
	xori	$2,$2,0x1
	movn	$3,$5,$2	; $2 dies here

which can be optimized into

	andi	$2,$2,0x1
	movz	$3,$5,$2	; $2 dies here

The patch was successfully tested on {i686, arm, mips}-linux, with both GCC testsuite and SPEC2000 runs. For all targets there was no observable code difference in the SPEC2000 benchmarks, so the example does not trigger very often. Still, it speeds up CoreMark by about 1%.

OK for trunk?

-- Maxim Kuvyrkov Mentor Graphics / CodeSourcery

gcc-combine-if_then_else.ChangeLog Description: Binary data
gcc-combine-if_then_else.patch Description: Binary data
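[Editor's note: the attached patch is not inlined above, so here is a hedged illustration of the source-level shape involved. This is not the actual CoreMark crcu8() code, just a minimal C function with the same bit-test-feeding-a-conditional pattern that the andi/xori/movn sequence comes from.]

  /* Minimal illustration, not the real crcu8(): a bit test selecting
     between two values.  With the patch, combine can keep the
     (x & 1) == 0 comparison as the IF_THEN_ELSE condition, so the MIPS
     backend can emit andi+movz instead of andi+xori+movn.  */
  unsigned short
  select_on_low_bit (unsigned char x, unsigned short a, unsigned short b)
  {
    return (x & 1) ? a : b;
  }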
Re: [PATCH v3] Re: avoid useless if-before-free tests
Janne Blomqvist wrote: > On Thu, Mar 24, 2011 at 18:51, Jim Meyering wrote: >> Janne Blomqvist wrote: >>> On Tue, Mar 8, 2011 at 19:53, Jim Meyering wrote: Relative to v2, I've added libgo/ to the list of exempt directories and added this recently discussed gfc_free patch, at the request of Tobias Burnus. Also, I corrected an error in fortran's ChangeLog and removed all whitespace changes from all ChangeLog files. >>> >>> The libgfortran changes are Ok for 4.7. >>> >>> For the gfortran frontend (gcc/fortran/*) I'd prefer if you'd >>> >>> - Replace all calls to "gfc_free (x)" with "free (x)". >>> - Remove the gfc_free() function and prototype. >>> - Remove the free() macro which currently prevents calling free() directly. >> >> Following up, I've refreshed the series but hit a minor snag >> while converting new uses of gfc_free, removing new tests-before-free >> and merging/reordering changes. >> >> Applying this fix first makes my problem go away: > > [snip] > > So, what's the plan here? Do you plan to get a GCC account, do you > already have one, or what? Now that 4.7 is open for development, it's > perhaps the right time to poke the maintainers to get this patch in. > > If you don't have a GCC account, as one of the Fortran maintainers I > can commit the Fortran and libgfortran parts, but someone else will > have to do the rest (were they ever approved, BTW?) as I have only > write after approval privileges for the rest of GCC. Can someone add me to the gcc group? That would help. I already have ssh access to sourceware.org. I've rebased the series a few times, but it's been a week or so. More convertible uses are being added regularly. Plus I have to reorder/split things a little to avoid a new conflict.
Re: [PATCH][RFC] Make called function type explicit, make function pointer type conversions useless
> 2011-04-14 Eric Botcazou > > * cfgexpand.c (expand_call_stmt): Rematerialize the original function > type if this is not a builtin function. Hum, using fold_convert seems to be more appropriate here. Bootstrapped/regtested on i586-suse-linux, applied as obvious. 2011-04-15 Eric Botcazou * cfgexpand.c (expand_call_stmt): Simply convert the function type. -- Eric Botcazou Index: cfgexpand.c === --- cfgexpand.c (revision 172469) +++ cfgexpand.c (working copy) @@ -1851,8 +1851,8 @@ expand_call_stmt (gimple stmt) call is made may be different from the type of the function. */ if (!builtin_p) CALL_EXPR_FN (exp) - = fold_build1 (NOP_EXPR, build_pointer_type (gimple_call_fntype (stmt)), - CALL_EXPR_FN (exp)); + = fold_convert (build_pointer_type (gimple_call_fntype (stmt)), + CALL_EXPR_FN (exp)); TREE_TYPE (exp) = gimple_call_return_type (stmt); CALL_EXPR_STATIC_CHAIN (exp) = gimple_call_chain (stmt);
[PATCH] doubled words
Signed-off-by: Jim Meyering --- While most of these are in comments, the corrections in gcc/tree-cfg.c and gcc/config/sh/constraints.md are in strings. The former at least is marked for translation, and hence appears in every .po file. gcc/config/alpha/vms-unwind.h |4 ++-- gcc/config/arm/unwind-arm.h|4 ++-- gcc/config/microblaze/microblaze.c |2 +- gcc/config/sh/constraints.md |4 ++-- gcc/cp/pt.c|2 +- gcc/java/jcf-parse.c |4 ++-- gcc/tree-cfg.c |4 ++-- 7 files changed, 12 insertions(+), 12 deletions(-) diff --git a/gcc/config/alpha/vms-unwind.h b/gcc/config/alpha/vms-unwind.h index ea2c3a3..71cb7b8 100644 --- a/gcc/config/alpha/vms-unwind.h +++ b/gcc/config/alpha/vms-unwind.h @@ -1,5 +1,5 @@ /* Fallback frame unwinding for Alpha/VMS. - Copyright (C) 1996, 1997, 1998, 2000, 2001, 2002, 2003, 2009, 2010 + Copyright (C) 1996, 1997, 1998, 2000, 2001, 2002, 2003, 2009, 2010, 2011 Free Software Foundation, Inc. This file is part of GCC. @@ -229,7 +229,7 @@ alpha_vms_fallback_frame_state (struct _Unwind_Context *context, /* If PV designates an exception dispatcher, we have to adjust the return address column to get at the signal occurrence point, and account for - for what the CHF context contains. */ + what the CHF context contains. */ if (DENOTES_EXC_DISPATCHER (pv)) { diff --git a/gcc/config/arm/unwind-arm.h b/gcc/config/arm/unwind-arm.h index a9ba126..1a51d8d 100644 --- a/gcc/config/arm/unwind-arm.h +++ b/gcc/config/arm/unwind-arm.h @@ -1,5 +1,5 @@ /* Header file for the ARM EABI unwinder - Copyright (C) 2003, 2004, 2005, 2006, 2007, 2008, 2009 + Copyright (C) 2003, 2004, 2005, 2006, 2007, 2008, 2009, 2011 Free Software Foundation, Inc. Contributed by Paul Brook @@ -65,7 +65,7 @@ extern "C" { } _Unwind_State; - /* Provided only for for compatibility with existing code. */ + /* Provided only for compatibility with existing code. */ typedef int _Unwind_Action; #define _UA_SEARCH_PHASE 1 #define _UA_CLEANUP_PHASE 2 diff --git a/gcc/config/microblaze/microblaze.c b/gcc/config/microblaze/microblaze.c index 6ea5fa2..5796bc7 100644 --- a/gcc/config/microblaze/microblaze.c +++ b/gcc/config/microblaze/microblaze.c @@ -919,7 +919,7 @@ microblaze_rtx_costs (rtx x, int code, int outer_code ATTRIBUTE_UNUSED, int *tot { /* Add 1 to make shift slightly more expensive than add. */ *total = COSTS_N_INSNS (INTVAL (XEXP (x, 1))) + 1; - /* Reduce shift costs for for special circumstances. */ + /* Reduce shift costs for special circumstances. */ if (optimize_size && INTVAL (XEXP (x, 1)) > 5) *total -= 2; if (!optimize_size && INTVAL (XEXP (x, 1)) > 17) diff --git a/gcc/config/sh/constraints.md b/gcc/config/sh/constraints.md index 6b0e5d2..40d0d0b 100644 --- a/gcc/config/sh/constraints.md +++ b/gcc/config/sh/constraints.md @@ -1,5 +1,5 @@ ;; Constraint definitions for Renesas / SuperH SH. -;; Copyright (C) 2007, 2008 Free Software Foundation, Inc. +;; Copyright (C) 2007, 2008, 2011 Free Software Foundation, Inc. ;; ;; This file is part of GCC. ;; @@ -99,7 +99,7 @@ (match_test "ival >= -128 && ival <= 127"))) (define_constraint "I10" - "A signed 10-bit constant, as used in in SHmedia andi, ori." + "A signed 10-bit constant, as used in SHmedia andi, ori." (and (match_code "const_int") (match_test "ival >= -512 && ival <= 511"))) diff --git a/gcc/cp/pt.c b/gcc/cp/pt.c index 3356e75..fc69a0c 100644 --- a/gcc/cp/pt.c +++ b/gcc/cp/pt.c @@ -13980,7 +13980,7 @@ type_unification_real (tree tparms, gcc_assert (ntparms > 0); /* Reset the number of non-defaulted template arguments contained - in in TARGS. */ + in TARGS. 
*/ NON_DEFAULT_TEMPLATE_ARGS_COUNT (targs) = NULL_TREE; switch (strict) diff --git a/gcc/java/jcf-parse.c b/gcc/java/jcf-parse.c index feeddad..a56c1b7 100644 --- a/gcc/java/jcf-parse.c +++ b/gcc/java/jcf-parse.c @@ -1,6 +1,6 @@ /* Parser for Java(TM) .class files. Copyright (C) 1996, 1998, 1999, 2000, 2001, 2002, 2003, 2004, - 2005, 2006, 2007, 2008, 2009, 2010 Free Software Foundation, Inc. + 2005, 2006, 2007, 2008, 2009, 2010, 2011 Free Software Foundation, Inc. This file is part of GCC. @@ -368,7 +368,7 @@ set_source_filename (JCF *jcf, int index) from the input class file into the output file. We don't decode the data at all, merely rewriting constant indexes whenever we come across them: this is necessary because the constant pool in the - output file isn't the same as the constant pool in in the input. + output file isn't the same as the constant pool in the input. The main advantage of this technique is that the resulting annotation data is pointer-free, so it doesn't have to be relocated diff --git a/gcc/tree-cfg.c b/gcc/tree-cfg.c ind
Re: [PATCH v3] Re: avoid useless if-before-free tests
On Fri, Apr 15, 2011 at 10:54, Jim Meyering wrote: > Janne Blomqvist wrote: > >> On Thu, Mar 24, 2011 at 18:51, Jim Meyering wrote: >>> Janne Blomqvist wrote: On Tue, Mar 8, 2011 at 19:53, Jim Meyering wrote: > Relative to v2, I've added libgo/ to the list of exempt directories and > added > this recently discussed gfc_free patch, at the request of Tobias Burnus. > Also, I corrected an error in fortran's ChangeLog and removed all > whitespace changes from all ChangeLog files. The libgfortran changes are Ok for 4.7. For the gfortran frontend (gcc/fortran/*) I'd prefer if you'd - Replace all calls to "gfc_free (x)" with "free (x)". - Remove the gfc_free() function and prototype. - Remove the free() macro which currently prevents calling free() directly. >>> >>> Following up, I've refreshed the series but hit a minor snag >>> while converting new uses of gfc_free, removing new tests-before-free >>> and merging/reordering changes. >>> >>> Applying this fix first makes my problem go away: >> >> [snip] >> >> So, what's the plan here? Do you plan to get a GCC account, do you >> already have one, or what? Now that 4.7 is open for development, it's >> perhaps the right time to poke the maintainers to get this patch in. >> >> If you don't have a GCC account, as one of the Fortran maintainers I >> can commit the Fortran and libgfortran parts, but someone else will >> have to do the rest (were they ever approved, BTW?) as I have only >> write after approval privileges for the rest of GCC. > > Can someone add me to the gcc group? That would help. > I already have ssh access to sourceware.org. According to http://gcc.gnu.org/svnwrite.html , you should send an email to overseers(at)gcc.gnu.org . It says that you need a sponsor and "The steering committee or a well-established GCC maintainer (including reviewers) can approve for write access any person with GNU copyright assignment papers in place and known to submit good patches.". I'm not sure if I'm considered to be well-established enough, so could someone help Jim out here, please? Or, if nobody pipes up within a few days, just mention my name as your sponsor and we'll see if the overseers consider me well-established or not.. ;) -- Janne Blomqvist
Re: [PATCH v3] Re: avoid useless if-before-free tests
Janne Blomqvist wrote: > On Fri, Apr 15, 2011 at 10:54, Jim Meyering wrote: >> Janne Blomqvist wrote: >> >>> On Thu, Mar 24, 2011 at 18:51, Jim Meyering wrote: Janne Blomqvist wrote: > On Tue, Mar 8, 2011 at 19:53, Jim Meyering wrote: >> Relative to v2, I've added libgo/ to the list of exempt >> directories and added >> this recently discussed gfc_free patch, at the request of Tobias Burnus. >> Also, I corrected an error in fortran's ChangeLog and removed all >> whitespace changes from all ChangeLog files. > > The libgfortran changes are Ok for 4.7. > > For the gfortran frontend (gcc/fortran/*) I'd prefer if you'd > > - Replace all calls to "gfc_free (x)" with "free (x)". > - Remove the gfc_free() function and prototype. > - Remove the free() macro which currently prevents calling free() > directly. Following up, I've refreshed the series but hit a minor snag while converting new uses of gfc_free, removing new tests-before-free and merging/reordering changes. Applying this fix first makes my problem go away: >>> >>> [snip] >>> >>> So, what's the plan here? Do you plan to get a GCC account, do you >>> already have one, or what? Now that 4.7 is open for development, it's >>> perhaps the right time to poke the maintainers to get this patch in. >>> >>> If you don't have a GCC account, as one of the Fortran maintainers I >>> can commit the Fortran and libgfortran parts, but someone else will >>> have to do the rest (were they ever approved, BTW?) as I have only >>> write after approval privileges for the rest of GCC. >> >> Can someone add me to the gcc group? That would help. >> I already have ssh access to sourceware.org. > > According to http://gcc.gnu.org/svnwrite.html , you should send an > email to overseers(at)gcc.gnu.org . It says that you need a sponsor > and "The steering committee or a well-established GCC maintainer > (including reviewers) can approve for write access any person with GNU > copyright assignment papers in place and known to submit good > patches.". I'm not sure if I'm considered to be well-established > enough, so could someone help Jim out here, please? > > Or, if nobody pipes up within a few days, just mention my name as your > sponsor and we'll see if the overseers consider me well-established or > not.. ;) Thanks. FYI, I have an "ANY" assignment on file, so no need to wait for paperwork.
Fix minor issues in gimplify.c
Long lines, superfluous 's' and missing head comment. No functional changes. There is one function left without head comment: gimplify_adjust_omp_clauses. Jakub, when you have a few minutes, would you mind adding one? TIA. Tested on i586-suse-linux, applied on the mainline as obvious. 2011-04-15 Eric Botcazou * gimplify.c: Fix issues in comments throughout. (voidify_wrapper_expr): Fix long line. (build_stack_save_restore): Likewise. (gimplify_loop_expr): Likewise. (gimplify_compound_lval): Likewise. (gimplify_init_ctor_eval): Likewise. (gimplify_modify_expr_rhs): Likewise. (omp_notice_threadprivate_variable): Likewise. -- Eric Botcazou Index: gimplify.c === --- gimplify.c (revision 172469) +++ gimplify.c (working copy) @@ -91,7 +91,7 @@ static struct gimplify_ctx *gimplify_ctx static struct gimplify_omp_ctx *gimplify_omp_ctxp; -/* Formal (expression) temporary table handling: Multiple occurrences of +/* Formal (expression) temporary table handling: multiple occurrences of the same scalar expression are evaluated into the same temporary. */ typedef struct gimple_temp_hash_elt @@ -100,11 +100,12 @@ typedef struct gimple_temp_hash_elt tree temp; /* Value */ } elt_t; -/* Forward declarations. */ +/* Forward declaration. */ static enum gimplify_status gimplify_compound_expr (tree *, gimple_seq *, bool); /* Mark X addressable. Unlike the langhook we expect X to be in gimple form and we don't do any syntax checking. */ + void mark_addressable (tree x) { @@ -232,6 +233,8 @@ pop_gimplify_context (gimple body) htab_delete (c->temp_htab); } +/* Push a GIMPLE_BIND tuple onto the stack of bindings. */ + static void gimple_push_bind_expr (gimple gimple_bind) { @@ -240,19 +243,23 @@ gimple_push_bind_expr (gimple gimple_bin VEC_safe_push (gimple, heap, gimplify_ctxp->bind_expr_stack, gimple_bind); } +/* Pop the first element off the stack of bindings. */ + static void gimple_pop_bind_expr (void) { VEC_pop (gimple, gimplify_ctxp->bind_expr_stack); } +/* Return the first element of the stack of bindings. */ + gimple gimple_current_bind_expr (void) { return VEC_last (gimple, gimplify_ctxp->bind_expr_stack); } -/* Return the stack GIMPLE_BINDs created during gimplification. */ +/* Return the stack of bindings created during gimplification. */ VEC(gimple, heap) * gimple_bind_expr_stack (void) @@ -260,7 +267,7 @@ gimple_bind_expr_stack (void) return gimplify_ctxp->bind_expr_stack; } -/* Returns true iff there is a COND_EXPR between us and the innermost +/* Return true iff there is a COND_EXPR between us and the innermost CLEANUP_POINT_EXPR. This info is used by gimple_push_cleanup. */ static bool @@ -392,7 +399,7 @@ remove_suffix (char *name, int len) } } -/* Create a new temporary name with PREFIX. Returns an identifier. */ +/* Create a new temporary name with PREFIX. Return an identifier. */ static GTY(()) unsigned int tmp_var_id_num; @@ -413,9 +420,8 @@ create_tmp_var_name (const char *prefix) return get_identifier (tmp_name); } - /* Create a new temporary variable declaration of type TYPE. - Does NOT push it into the current binding. */ + Do NOT push it into the current binding. */ tree create_tmp_var_raw (tree type, const char *prefix) @@ -446,7 +452,7 @@ create_tmp_var_raw (tree type, const cha return tmp_var; } -/* Create a new temporary variable declaration of type TYPE. DOES push the +/* Create a new temporary variable declaration of type TYPE. DO push the variable into the current binding. 
Further, assume that this is called only from gimplification or optimization, at which point the creation of certain types are bugs. */ @@ -537,7 +543,6 @@ lookup_tmp_var (tree val, bool is_formal return ret; } - /* Return true if T is a CALL_EXPR or an expression that can be assigned to a temporary. Note that this predicate should only be used during gimplification. See the rationale for this in @@ -605,7 +610,7 @@ internal_get_tmp_var (tree val, gimple_s return t; } -/* Returns a formal temporary variable initialized with VAL. PRE_P is as +/* Return a formal temporary variable initialized with VAL. PRE_P is as in gimplify_expr. Only use this function if: 1) The value of the unfactored expression represented by VAL will not @@ -623,7 +628,7 @@ get_formal_tmp_var (tree val, gimple_seq return internal_get_tmp_var (val, pre_p, NULL, true); } -/* Returns a temporary variable initialized with VAL. PRE_P and POST_P +/* Return a temporary variable initialized with VAL. PRE_P and POST_P are as in gimplify_expr. */ tree @@ -632,8 +637,8 @@ get_initialized_tmp_var (tree val, gimpl return internal_get_tmp_var (val, pre_p, post_p, false); } -/* Declares all the variables in VARS in SCOPE. If DEBUG_IN
Re: [lto/pph] Do not pack more bits than requested (issue4415044)
On Thu, 14 Apr 2011, Diego Novillo wrote: > On Thu, Apr 14, 2011 at 15:28, Jakub Jelinek wrote: > > > If bitpack_word_t has BITS_PER_BITPACK_WORD bits, then for > > nbits = BITS_PER_BITPACK_WORD this will be undefined. > > Use say > > mask = ((bitpack_word_t) 2 << (nbits - 1)) - 1; > > or something similar (assertion ensures that nbits isn't 0). > > Quite right, thanks. In the meantime, I've changed my mind with this. > I think it's safer if we just assert that the value we are about to > pack fit in the number of bits the caller specified. > > The only problematic user is pack_ts_type_value_fields when it tries > to pack a -1 for the type's alias set. I think we should just stream > that as an integer and not go through the bitpacking overhead. > > For now, I'm applying this to the pph branch. Tested on x86_64. No > LTO failures. See below > > Diego. > > * lto-streamer-out.c (pack_ts_type_value_fields): Pack all bits > of -1 value. > * lto-streamer.h (bitpack_create): Assert that the value to > pack does not overflow NBITS. > * lto-streamer-in.c (unpack_ts_type_value_fields): Unpack > BITS_PER_BITPACK_WORD bits for TYPE_ALIAS_SET. > > diff --git a/gcc/lto-streamer-in.c b/gcc/lto-streamer-in.c > index 97b86ce..f04e031 100644 > --- a/gcc/lto-streamer-in.c > +++ b/gcc/lto-streamer-in.c > @@ -1734,7 +1734,7 @@ unpack_ts_type_value_fields (struct bitpack_d > *bp, tree expr) >TYPE_USER_ALIGN (expr) = (unsigned) bp_unpack_value (bp, 1); >TYPE_READONLY (expr) = (unsigned) bp_unpack_value (bp, 1); >TYPE_ALIGN (expr) = (unsigned) bp_unpack_value (bp, HOST_BITS_PER_INT); > - TYPE_ALIAS_SET (expr) = bp_unpack_value (bp, HOST_BITS_PER_INT); > + TYPE_ALIAS_SET (expr) = bp_unpack_value (bp, BITS_PER_BITPACK_WORD); > } > > > diff --git a/gcc/lto-streamer-out.c b/gcc/lto-streamer-out.c > index 3ccad8b..89ad9c5 100644 > --- a/gcc/lto-streamer-out.c > +++ b/gcc/lto-streamer-out.c > @@ -518,7 +518,8 @@ pack_ts_type_value_fields (struct bitpack_d *bp, tree > expr) >bp_pack_value (bp, TYPE_USER_ALIGN (expr), 1); >bp_pack_value (bp, TYPE_READONLY (expr), 1); >bp_pack_value (bp, TYPE_ALIGN (expr), HOST_BITS_PER_INT); > - bp_pack_value (bp, TYPE_ALIAS_SET (expr) == 0 ? 0 : -1, HOST_BITS_PER_INT); > + bp_pack_value (bp, TYPE_ALIAS_SET (expr) == 0 ? 0 : -1, > +BITS_PER_BITPACK_WORD); As we only want to stream alias-set zeros just change it to a single bit, like bp_pack_value (bp, TYPE_ALIAS_SET (expr) == 0, 1); and on the reader side restore either a zero or -1. > } > > > diff --git a/gcc/lto-streamer.h b/gcc/lto-streamer.h > index 0d49430..73afd46 100644 > --- a/gcc/lto-streamer.h > +++ b/gcc/lto-streamer.h > @@ -1190,18 +1190,15 @@ bitpack_create (struct lto_output_stream *s) > static inline void > bp_pack_value (struct bitpack_d *bp, bitpack_word_t val, unsigned nbits) > { > - bitpack_word_t mask, word; > + bitpack_word_t word = bp->word; >int pos = bp->pos; > > - word = bp->word; > - > + /* We shouldn't try to pack more bits than can fit in a bitpack word. */ >gcc_assert (nbits > 0 && nbits <= BITS_PER_BITPACK_WORD); Asserts will break merging the operations and make them slow again. Please no asserts in these core routines. Look at the .optimized dump of a series of bp_pack_value, they should be basically optimized to a series of ORs. As for the -1 case, it's simply broken use of the interface. Richard. > - /* Make sure that VAL only has the lower NBITS set. Generate a > - mask with the lower NBITS set and use it to filter the upper > - bits from VAL. 
*/ > - mask = ((bitpack_word_t) 1 << nbits) - 1; > - val = val & mask; > + /* The value to pack should not overflow NBITS. */ > + gcc_assert (nbits == BITS_PER_BITPACK_WORD > + || val <= ((bitpack_word_t) 1 << nbits)); > >/* If val does not fit into the current bitpack word switch to the > next one. */
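[Editor's note: the overflow-safe mask Jakub suggests in the quoted hunk can be exercised in isolation. Below is a minimal standalone C sketch; the 64-bit unsigned bitpack_word_t and the BITS_PER_BITPACK_WORD value are assumptions for illustration only — the real definitions live in lto-streamer.h.]

  #include <stdint.h>
  #include <assert.h>

  typedef uint64_t bitpack_word_t;
  #define BITS_PER_BITPACK_WORD 64

  /* Build an NBITS-wide all-ones mask.  ((bitpack_word_t) 1 << nbits) - 1
     would be undefined for nbits == BITS_PER_BITPACK_WORD, because shifting
     by the full word width is undefined behavior; shifting 2 by nbits - 1
     never exceeds the valid shift range.  */
  static bitpack_word_t
  nbits_mask (unsigned nbits)
  {
    assert (nbits > 0 && nbits <= BITS_PER_BITPACK_WORD);
    return ((bitpack_word_t) 2 << (nbits - 1)) - 1;
  }

  int
  main (void)
  {
    assert (nbits_mask (1) == 1);
    assert (nbits_mask (64) == ~(bitpack_word_t) 0);
    return 0;
  }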
Re: [PATCH] Improve combining of conditionals
> The patch was successfully tested on {i686, arm, mips}-linux, both GCC > testsuites and SPEC2000 runs. For all targets there was no observable code > difference in SPEC2000 benchmarks, so the example does not trigger very > often. Still, it speeds up CoreMark by about 1%. > > OK for trunk? Yes, modulo the following nits: @@ -4938,11 +4938,13 @@ find_split_point (rtx *loc, rtx insn, bool set_src) IN_DEST is nonzero if we are processing the SET_DEST of a SET. + IN_COND is nonzero if we are on top level of the condition. "...we are at the top level of a condition." @@ -5221,10 +5225,12 @@ subst (rtx x, rtx from, rtx to, int in_dest, int unique_copy) expression. OP0_MODE is the original mode of XEXP (x, 0). IN_DEST is nonzero - if we are inside a SET_DEST. */ + if we are inside a SET_DEST. IN_COND is nonzero if we are on the top level + of a condition. */ Likewise. @@ -5717,7 +5723,16 @@ combine_simplify_rtx (rtx x, enum machine_mode op0_mode, int in_dest) ZERO_EXTRACT is indeed appropriate, it will be placed back by the call to make_compound_operation in the SET case. */ - if (STORE_FLAG_VALUE == 1 + if (in_cond) + /* Don't apply below optimizations if the caller would + prefer a comparison rather than a value. + E.g., for the condition in an IF_THEN_ELSE most targets need + an explicit comparison. */ + { + ; + } Remove the superfluous parentheses and move the comment to a new paragraph of the big comment just above. No need to retest, just make sure this still compiles, thanks in advance. -- Eric Botcazou
Re: [PATCH][RFC] Make called function type explicit, make function pointer type conversions useless
On Thu, 14 Apr 2011, Eric Botcazou wrote: > > I have the following, the gimple_call_chain check is to prevent > > Ada bootstrap from failing. > > On x86? Yes, though maybe because I had get_callee_fndecl patched to not strip conversions. > > I suppose your patch is ok, maybe with simply not wrapping the NOP_EXPR > > around builtins. > > I can try that indeed... Thanks for fixing this! Richard.
Re: [Patch,testsuite,avr]: add -finline-limit=0 to pr41885.c options
On Thu, Apr 14, 2011 at 8:14 PM, Denis Chertykov wrote: > 2011/4/14 Georg-Johann Lay : >> This patchlet adds -finline-limit=0 to dg-options in >> >> testsuite/gcc.target/avr/torture/pr41885.c >> >> because otherwise optimizers will fold all tests and actually no test >> function is called when optimization is on. The test case still passes >> all tests. >> >> >> testsuite/ >> 2011-04-14 Georg-Johann Lay >> >> * gcc.target/avr/torture/pr41885.c (dg-options): Add >> -finline-limit=0 Please use -fno-inline instead. Richard. >> Index: testsuite/gcc.target/avr/torture/pr41885.c >> === >> --- testsuite/gcc.target/avr/torture/pr41885.c (Revision 172431) >> +++ testsuite/gcc.target/avr/torture/pr41885.c (Arbeitskopie) >> @@ -1,4 +1,4 @@ >> -/* { dg-options "-w -std=c99" } */ >> +/* { dg-options "-w -std=c99 -finline-limit=0" } */ >> /* { dg-do run } */ >> >> #include >> > > Approved. > > Denis. >
Re: More of ipa-inline housekeeping
> > > > I fixed this on the and added sanity check that the fields are initialized. > > This has shown problem with early inliner iteration fixed thusly and fact > > that > > early inliner is attempting to compute overall growth at a time the inline > > parameters are not computed for functions not visited by early optimizations > > yet. We previously agreed that early inliner should not try to do that (as > > this > > leads to early inliner inlining functions called once that should be > > deferred > > for later consieration). I just hope it won't cause benchmarks to > > regress too much ;) > > Yeah, we agreed to that. And I forgot about it as it wasn't part of the > early inliner reorg (which was supposed to be a 1:1 transform). Today C++ results shows some regressions, but nothing earthshaking. So I think it is good idea to drop this feature of early inliner since it is not really systematic. There is also great improvement on LTO SPEC2000, but I tend to hope it is unrelated change. Perhaps your aliasing? Honza
Re: More of ipa-inline housekeeping
2011/4/15 Jan Hubicka : >> > >> > I fixed this on the and added sanity check that the fields are initialized. >> > This has shown problem with early inliner iteration fixed thusly and fact >> > that >> > early inliner is attempting to compute overall growth at a time the inline >> > parameters are not computed for functions not visited by early >> > optimizations >> > yet. We previously agreed that early inliner should not try to do that (as >> > this >> > leads to early inliner inlining functions called once that should be >> > deferred >> > for later consieration). I just hope it won't cause benchmarks to >> > regress too much ;) >> >> Yeah, we agreed to that. And I forgot about it as it wasn't part of the >> early inliner reorg (which was supposed to be a 1:1 transform). > > Today C++ results shows some regressions, but nothing earthshaking. So I > think it is good > idea to drop this feature of early inliner since it is not really systematic. > There is also great improvement on LTO SPEC2000, but I tend to hope it is > unrelated change. > Perhaps your aliasing? I doubt SPEC2k uses VLAs or alloca, does it? Might be the DSE improvements, but I'm not sure. Richard. > Honza >
[committed] Fix pr46084.c testcase (PR target/48614)
Hi! This testcase doesn't use avx-check.h and is dg-do run testcase, as it is compiled with -mavx whenever it emits any AVX instructions, it won't be able to run on any older CPUs. Fixed thusly, committed to trunk/4.6. 2011-04-15 Jakub Jelinek PR target/48614 * gcc.target/i386/pr46084.c: Require avx_runtime instead of just avx. --- gcc/testsuite/gcc.target/i386/pr46084.c.jj 2011-04-15 08:11:00.493464611 +0200 +++ gcc/testsuite/gcc.target/i386/pr46084.c 2011-04-15 08:08:50.282146043 +0200 @@ -1,7 +1,7 @@ /* This test needs to use setrlimit to set the stack size, so it can only run on Unix. */ /* { dg-do run { target *-*-linux* *-*-solaris* *-*-darwin* } } */ -/* { dg-require-effective-target avx } */ +/* { dg-require-effective-target avx_runtime } */ /* { dg-require-effective-target split_stack } */ /* { dg-options "-fsplit-stack -O2 -mavx" } */ Jakub
Re: [PATCH] Improve combining of conditionals
On Fri, Apr 15, 2011 at 11:04 AM, Eric Botcazou wrote: > @@ -4938,11 +4938,13 @@ find_split_point (rtx *loc, rtx insn, bool set_src) > > IN_DEST is nonzero if we are processing the SET_DEST of a SET. > > + IN_COND is nonzero if we are on top level of the condition. > > "...we are at the top level of a condition." And make IN_COND a bool instead of an int? Ciao! Steven
Re: [PATCH] Improve combining of conditionals
On Apr 15, 2011, at 2:38 PM, Steven Bosscher wrote:
> On Fri, Apr 15, 2011 at 11:04 AM, Eric Botcazou wrote:
>> @@ -4938,11 +4938,13 @@ find_split_point (rtx *loc, rtx insn, bool set_src)
>>
>>    IN_DEST is nonzero if we are processing the SET_DEST of a SET.
>>
>> +  IN_COND is nonzero if we are on top level of the condition.
>>
>> "...we are at the top level of a condition."
>
> And make IN_COND a bool instead of an int?

I think it's better to follow the existing format of the function and make IN_COND the same type as IN_DEST, i.e., an 'int'. I agree that 'bool' should be preferred when adding a new function or significantly rewriting an existing one.

-- Maxim Kuvyrkov Mentor Graphics / CodeSourcery
Re: [PATCH] Improve combining of conditionals
> And make IN_COND a bool instead of an int? I had the same initial reaction but then came to the same conclusion as Maxim. -- Eric Botcazou
Re: [patch][ARM] Fix 16-bit -> 64-bit multiply and accumulate
Ping.

On 25/03/11 16:19, Andrew Stubbs wrote:

On 28/01/11 15:20, Richard Earnshaw wrote:

On Fri, 2011-01-28 at 15:13, Andrew Stubbs wrote:

On 28/01/11 14:12, Richard Earnshaw wrote:

So what happens to a variation of your testcase:

  long long foolong (long long x, short *a, short *b)
  {
    return x + (long long)*a * (long long)*b;
  }

With your patch? This should generate identical code to your original test-case.

The patch has no effect on that testcase - it's broken in some other way, I think, and the same with and without my patch:

  ldrsh r3, [r3, #0]
  ldrsh r2, [r2, #0]
  push {r4, r5}
  asrs r4, r3, #31
  asrs r5, r2, #31
  mul r4, r2, r4
  mla r4, r3, r5, r4
  umull r2, r3, r2, r3
  adds r3, r4, r3
  adds r0, r0, r2
  adc r1, r1, r3
  pop {r4, r5}
  bx lr

Hmmm, that probably doesn't add anything useful to the discussion. :( I'll add that one to the todo list ...

Andrew

Ouch! I thought that used to work :-(

I looked at this one again, but on a second inspection I'm not sure there's much wrong with it? When I wrote the above I thought that there was a 64-bit multiply instruction, but now I look more closely I see there isn't, hence the above. It does two 16-bit loads, sign-extends the inputs to 64-bit, does a 64-bit -> 64-bit multiply, and then adds 'x'.

Can the umull/add/add/adc be optimized using umlal? It's too late on a Friday to work out what's going on with the carries. Also, I don't fully understand why the compiler can't just disregard the casts and use maddhidi4? Isn't that mathematically equivalent in this case?

When you say it used to work, what did it use to look like?

Thanks

Andrew
[ping] 3 unreviewed patches
Fix annoying gcov filename handling: http://gcc.gnu.org/ml/gcc-patches/2011-03/msg01380.html (rs6000) Fix thinko in output_profile_hook: http://gcc.gnu.org/ml/gcc-patches/2011-03/msg01624.html Introduce -Wstack-usage: http://gcc.gnu.org/ml/gcc-patches/2011-03/msg01992.html Thanks in advance. -- Eric Botcazou
Re: More of ipa-inline housekeeping
> 2011/4/15 Jan Hubicka : > >> > > >> > I fixed this on the and added sanity check that the fields are > >> > initialized. > >> > This has shown problem with early inliner iteration fixed thusly and > >> > fact that > >> > early inliner is attempting to compute overall growth at a time the > >> > inline > >> > parameters are not computed for functions not visited by early > >> > optimizations > >> > yet. We previously agreed that early inliner should not try to do that > >> > (as this > >> > leads to early inliner inlining functions called once that should be > >> > deferred > >> > for later consieration). I just hope it won't cause benchmarks to > >> > regress too much ;) > >> > >> Yeah, we agreed to that. And I forgot about it as it wasn't part of the > >> early inliner reorg (which was supposed to be a 1:1 transform). > > > > Today C++ results shows some regressions, but nothing earthshaking. So I > > think it is good > > idea to drop this feature of early inliner since it is not really > > systematic. > > There is also great improvement on LTO SPEC2000, but I tend to hope it is > > unrelated change. > > Perhaps your aliasing? > > I doubt SPEC2k uses VLAs or alloca, does it? Might be the DSE > improvements, but I'm not sure. It seems to happen only with LTO, so it might be inlining & fixed call cost estimates. It does not seem so likely to me however - I know that gzip is touchy about inlining, but vortex seems easy. Honza
Re: [Patch,testsuite,avr]: add -finline-limit=0 to pr41885.c options
Richard Guenther schrieb: > On Thu, Apr 14, 2011 at 8:14 PM, Denis Chertykov wrote: >> 2011/4/14 Georg-Johann Lay : >>> This patchlet adds -finline-limit=0 to dg-options in >>> >>> testsuite/gcc.target/avr/torture/pr41885.c >>> >>> because otherwise optimizers will fold all tests and actually no test >>> function is called when optimization is on. The test case still passes >>> all tests. >>> >>> >>> testsuite/ >>> 2011-04-14 Georg-Johann Lay >>> >>>* gcc.target/avr/torture/pr41885.c (dg-options): Add >>>-finline-limit=0 > > Please use -fno-inline instead. > > Richard. Ok, changed it: http://gcc.gnu.org/viewcvs?view=revision&revision=172487 Johann
Re: [PATCH] Improve combining of conditionals
> The problem this patch fixes is that combine_simplify_rtx() prefers to
> return an expression even when a comparison is preferred. Expressions
> are not recognized as valid conditions of if_then_else for most targets,
> so the combiner misses a potential optimization. This patch makes
> combine_simplify_rtx() aware of the context it was invoked in, and, when
> appropriate, does not discourage it from returning a conditional.

Btw, this is very likely also valid for targets with STORE_FLAG_VALUE == -1, so the same IN_COND short-circuit would need to be added a few lines below in combine_simplify_rtx. But this would need to be tested. Do you happen to have access to such a target, e.g. m68k?

-- Eric Botcazou
Re: PATCH: PR middle-end/48608: Alignment adjust of local variables is lost
Hi,

On Thu, 14 Apr 2011, H.J. Lu wrote:

> >> > +  if (align > DECL_ALIGN (decl))
> >> > +    DECL_ALIGN (decl) = align;
> >>
> >> Shouldn't this unconditionally set DECL_ALIGN in case
> >> LOCAL_DECL_ALIGNMENT returns something smaller?
> >
> > Decreasing alignment of DECLs points to a problem elsewhere, so perhaps an
> > assert that this doesn't happen is better. Decreasing is a problem
> > because it's not conservative: there might have been code generated
> > already assuming the once larger alignment that then possibly breaks if it
> > turns out the alignment is actually smaller.
>
> ia32 may decrease local variable alignment:
>
>   /* Don't do dynamic stack realignment for long long objects with
>      -mpreferred-stack-boundary=2.  */
>   if (!TARGET_64BIT
>       && align == 64
>       && ix86_preferred_stack_boundary < 64
>       && (mode == DImode || (type && TYPE_MODE (type) == DImode))
>       && (!type || !TYPE_USER_ALIGN (type))
>       && (!decl || !DECL_USER_ALIGN (decl)))
>     align = 32;

But it hopefully does so before gimplification? I.e. before real code is generated that could possibly make use of the large alignment?

Ciao,
Michael.
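[Editor's note: a minimal sketch of the assert-instead-of-silent-keep variant Michael floats in the quoted exchange — whether it can be unconditional is exactly what the ia32 case above calls into question. Placement and surrounding code are assumptions; DECL_ALIGN, LOCAL_DECL_ALIGNMENT and gcc_assert are the existing macros referenced in the thread.]

  /* Sketch only: take the target's answer, but make any decrease of the
     already-recorded alignment loud instead of quietly ignoring it.  */
  unsigned int align = LOCAL_DECL_ALIGNMENT (decl);
  gcc_assert (align >= DECL_ALIGN (decl));
  DECL_ALIGN (decl) = align;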
Re: Implement stack arrays even for unknown sizes
Hi, On Thu, 14 Apr 2011, Dominique Dhumieres wrote: > I have forgotten to mentionned that I have a variant of fatigue > in which I have done the inlining manually along with few other > optimizations and the timing for it is > > [macbook] lin/test% gfc -Ofast fatigue_v8.f90 > [macbook] lin/test% time a.out > /dev/null > 2.793u 0.002s 0:02.79 100.0% 0+0k 0+1io 0pf+0w > [macbook] lin/test% gfc -Ofast -fwhole-program fatigue_v8.f90 > [macbook] lin/test% time a.out > /dev/null > 2.680u 0.003s 0:02.68 100.0% 0+0k 0+2io 0pf+0w > [macbook] lin/test% gfc -Ofast -fwhole-program -flto fatigue_v8.f90 > [macbook] lin/test% time a.out > /dev/null > 2.671u 0.002s 0:02.67 100.0% 0+0k 0+2io 0pf+0w > [macbook] lin/test% gfc -Ofast -fwhole-program -fstack-arrays fatigue_v8.f90 > [macbook] lin/test% time a.out > /dev/null > 2.680u 0.003s 0:02.68 100.0% 0+0k 0+2io 0pf+0w > [macbook] lin/test% gfc -Ofast -fwhole-program -flto -fstack-arrays > fatigue_v8.f90 > [macbook] lin/test% time a.out > /dev/null > 2.677u 0.003s 0:02.68 99.6% 0+0k 0+0io 0pf+0w > > So the timing of the original code with > -Ofast -finline-limit=600 -fwhole-program -flto -fstack-arrays > is quite close to this lower bound. > > I have also looked at the failure for gfortran.dg/elemental_dependency_1.f90 > and it seems due to a spurious integer(kind=4) A.37[4]; (and friends) in Yes, this is due to the DECL_EXPR statement which is rendered by the dumper just the same as a normal decl. The testcase looks for exactly one such decl, but with -fstack-arrays there are exactly two for each such array. > integer(kind=4) A.37[4]; So, this is the normal decl. > atmp.36.dim[0].ubound = 3; > integer(kind=4) A.37[4]; And this is the DECL_EXPR statement, which then actually is transformed into the stack_save/alloca/stack_restore sequence. Ciao, Michael.
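[Editor's note: a C-level analogy for what Michael describes, under the assumption that gfortran's -fstack-arrays temporaries go through the same DECL_EXPR path that C uses for variable-length arrays.]

  /* The declaration of a variable-sized local is gimplified from its
     DECL_EXPR into a stack save, an alloca-style allocation and a stack
     restore at scope exit -- which is why the dump appears to "declare"
     the array a second time.  */
  void
  use_temp (int n)
  {
    int a[n];
    for (int i = 0; i < n; i++)
      a[i] = i;
  }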
Re: [Patch, libfortran] Replace sprintf() with snprintf()
On 04/14/2011 11:53 PM, Janne Blomqvist wrote:

  Hi, as is well known, sprintf() is prone to buffer overflow, hence
  snprintf(). libgfortran uses snprintf() in some places, but not
  everywhere. Rather than analyzing every sprintf() call for a potential
  overflow, the attached patch takes the dogmatic but simple approach of
  replacing all the remaining sprintf() usage with snprintf().

  For targets without snprintf(), io/list_read.c contained a fallback
  macro that uses sprintf(); this is moved to libgfortran.h so that it's
  available everywhere. readelf -s libgfortran.so|grep sprintf confirms
  that there is no remaining usage of sprintf().

  Regtested on x86_64-unknown-linux-gnu, Ok for trunk?

OK, thanks.

Jerry
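[Editor's note: the fallback macro itself is not quoted in the message. A sketch of the usual shape of such a fallback (the real libgfortran.h definition may differ in detail): it restores buildability on targets lacking snprintf(), though of course not the overflow protection.]

  #ifndef HAVE_SNPRINTF
  /* Discard the size argument and fall back to plain sprintf().  */
  #define snprintf(str, size, ...) sprintf (str, __VA_ARGS__)
  #endif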
Re: [PATCH] Improve combining of conditionals
On Apr 15, 2011, at 3:34 PM, Eric Botcazou wrote:
>> The problem this patch fixes is that combine_simplify_rtx() prefers to
>> return an expression even when a comparison is preferred. Expressions
>> are not recognized as valid conditions of if_then_else for most targets,
>> so the combiner misses a potential optimization. This patch makes
>> combine_simplify_rtx() aware of the context it was invoked in, and, when
>> appropriate, does not discourage it from returning a conditional.
>
> Btw, this is very likely also valid for targets with STORE_FLAG_VALUE == -1,
> so the same IN_COND short-circuit would need to be added a few lines below
> in combine_simplify_rtx. But this would need to be tested. Do you happen to
> have access to such a target, e.g. m68k?

Hm, I didn't notice that one, thanks! I have access to m68k (ColdFire, tbp) and will test this change there before committing.

-- Maxim Kuvyrkov Mentor Graphics / CodeSourcery
Re: [7/9] Testsuite: remove vect_{extract_even_odd,strided}_wide
On Tue, Apr 12, 2011 at 4:14 PM, Richard Sandiford wrote: > We have separate vect_extract_even_odd and vect_extract_even_odd_wide > target selectors, and separate vect_strided and vect_strided_wide > selectors. The comment suggests that "wide" is for 32+ bits, > but we often use the non-wide forms for 32-bit tests. We also have > tests that combine 16-bit and 32-bit strided accesses without checking > for both widths. > > I'm about to split vect_strided into vect_stridedN (for each stride > factor N). One option was to preserve the wide distinction and have > vect_stridedN_wide as well. However, given the current usage, > and given that the two selectors are the same, I think it makes sense > to combine them until we know what distinction we need to make. > > Tested on x86_64-linux-gnu and arm-linux-gnueabi. OK to install? Ok. Thanks, Richard. > Richard > > > gcc/testsuite/ > * lib/target-supports.exp > (check_effective_target_vect_extract_even_odd_wide): Delete. > (check_effective_target_vect_strided_wide): Likewise. > * gcc.dg/vect/O3-pr39675-2.c: Use the non-wide versions instead. > * gcc.dg/vect/fast-math-pr35982.c: Likewise. > * gcc.dg/vect/fast-math-vect-complex-3.c: Likewise. > * gcc.dg/vect/pr37539.c: Likewise. > * gcc.dg/vect/slp-11.c: Likewise. > * gcc.dg/vect/slp-12a.c: Likewise. > * gcc.dg/vect/slp-12b.c: Likewise. > * gcc.dg/vect/slp-19.c: Likewise. > * gcc.dg/vect/slp-23.c: Likewise. > * gcc.dg/vect/vect-1.c: Likewise. > * gcc.dg/vect/vect-98.c: Likewise. > * gcc.dg/vect/vect-107.c: Likewise. > * gcc.dg/vect/vect-strided-float.c: Likewise. > > Index: gcc/testsuite/lib/target-supports.exp > === > --- gcc/testsuite/lib/target-supports.exp 2011-04-12 11:53:54.0 > +0100 > +++ gcc/testsuite/lib/target-supports.exp 2011-04-12 11:55:11.0 > +0100 > @@ -3121,29 +3121,6 @@ proc check_effective_target_vect_extract > return $et_vect_extract_even_odd_saved > } > > -# Return 1 if the target supports vector even/odd elements extraction of > -# vectors with SImode elements or larger, 0 otherwise. > - > -proc check_effective_target_vect_extract_even_odd_wide { } { > - global et_vect_extract_even_odd_wide_saved > - > - if [info exists et_vect_extract_even_odd_wide_saved] { > - verbose "check_effective_target_vect_extract_even_odd_wide: using > cached result" 2 > - } else { > - set et_vect_extract_even_odd_wide_saved 0 > - if { [istarget powerpc*-*-*] > - || [istarget i?86-*-*] > - || [istarget x86_64-*-*] > - || [istarget ia64-*-*] > - || [istarget spu-*-*] } { > - set et_vect_extract_even_odd_wide_saved 1 > - } > - } > - > - verbose "check_effective_target_vect_extract_even_wide_odd: returning > $et_vect_extract_even_odd_wide_saved" 2 > - return $et_vect_extract_even_odd_wide_saved > -} > - > # Return 1 if the target supports vector interleaving, 0 otherwise. > > proc check_effective_target_vect_interleave { } { > @@ -3184,25 +3161,6 @@ proc check_effective_target_vect_strided > return $et_vect_strided_saved > } > > -# Return 1 if the target supports vector interleaving and extract even/odd > -# for wide element types, 0 otherwise. 
> -proc check_effective_target_vect_strided_wide { } { > - global et_vect_strided_wide_saved > - > - if [info exists et_vect_strided_wide_saved] { > - verbose "check_effective_target_vect_strided_wide: using cached > result" 2 > - } else { > - set et_vect_strided_wide_saved 0 > - if { [check_effective_target_vect_interleave] > - && [check_effective_target_vect_extract_even_odd_wide] } { > - set et_vect_strided_wide_saved 1 > - } > - } > - > - verbose "check_effective_target_vect_strided_wide: returning > $et_vect_strided_wide_saved" 2 > - return $et_vect_strided_wide_saved > -} > - > # Return 1 if the target supports section-anchors > > proc check_effective_target_section_anchors { } { > Index: gcc/testsuite/gcc.dg/vect/O3-pr39675-2.c > === > --- gcc/testsuite/gcc.dg/vect/O3-pr39675-2.c 2011-04-12 11:53:54.0 > +0100 > +++ gcc/testsuite/gcc.dg/vect/O3-pr39675-2.c 2011-04-12 11:55:11.0 > +0100 > @@ -26,7 +26,7 @@ foo () > } > } > > -/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" { target > vect_strided_wide } } } */ > -/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 1 "vect" > { target vect_strided_wide } } } */ > +/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" { target > vect_strided } } } */ > +/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 1 "vect" > { target vect_strided } } } */ > /* { dg-final { cleanup-t
Re: [8/9] Testsuite: split tests for strided accesses
On Tue, Apr 12, 2011 at 4:19 PM, Richard Sandiford wrote: > The next patch introduces separate vect_stridedN target selectors > for each tested stride factor N. At the moment, some tests contain > several independent loops that have different stride factors. > It's easier to make the next change if we put these loops into > separate tests. > > Tested on x86_64-linux-gnu and arm-linux-gnueabi. OK to install? Ok. Thanks, Richard. > Richard > > > gcc/testsuite/ > * gcc.dg/vect/slp-11.c: Split into... > * gcc.dg/vect/slp-11a.c, gcc.dg/vect/slp-11b.c, > gcc.dg/vect/slp-11c.c: ...these tests. > * gcc.dg/vect/slp-12a.c: Split 4-stride loop into... > * gcc.dg/vect/slp-12c.c: ...this new test. > * gcc.dg/vect/slp-19.c: Split into... > * gcc.dg/vect/slp-19a.c, gcc.dg/vect/slp-19b.c, > gcc.dg/vect/slp-19c.c: ...these new tests. > > Index: gcc/testsuite/gcc.dg/vect/slp-11.c > === > --- gcc/testsuite/gcc.dg/vect/slp-11.c 2011-04-12 15:18:24.0 +0100 > +++ /dev/null 2011-03-23 08:42:11.268792848 + > @@ -1,113 +0,0 @@ > -/* { dg-require-effective-target vect_int } */ > - > -#include > -#include "tree-vect.h" > - > -#define N 8 > - > -int > -main1 () > -{ > - int i; > - unsigned int out[N*8], a0, a1, a2, a3, a4, a5, a6, a7, b1, b0, b2, b3, b4, > b5, b6, b7; > - unsigned int in[N*8] = > {0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50,51,52,53,54,55,56,57,58,59,60,61,62,63}; > - float out2[N*8]; > - > - /* Different operations - not SLPable. */ > - for (i = 0; i < N; i++) > - { > - a0 = in[i*8] + 5; > - a1 = in[i*8 + 1] * 6; > - a2 = in[i*8 + 2] + 7; > - a3 = in[i*8 + 3] + 8; > - a4 = in[i*8 + 4] + 9; > - a5 = in[i*8 + 5] + 10; > - a6 = in[i*8 + 6] + 11; > - a7 = in[i*8 + 7] + 12; > - > - b0 = a0 * 3; > - b1 = a1 * 2; > - b2 = a2 * 12; > - b3 = a3 * 5; > - b4 = a4 * 8; > - b5 = a5 * 4; > - b6 = a6 * 3; > - b7 = a7 * 2; > - > - out[i*8] = b0 - 2; > - out[i*8 + 1] = b1 - 3; > - out[i*8 + 2] = b2 - 2; > - out[i*8 + 3] = b3 - 1; > - out[i*8 + 4] = b4 - 8; > - out[i*8 + 5] = b5 - 7; > - out[i*8 + 6] = b6 - 3; > - out[i*8 + 7] = b7 - 7; > - } > - > - /* check results: */ > - for (i = 0; i < N; i++) > - { > - if (out[i*8] != (in[i*8] + 5) * 3 - 2 > - || out[i*8 + 1] != (in[i*8 + 1] * 6) * 2 - 3 > - || out[i*8 + 2] != (in[i*8 + 2] + 7) * 12 - 2 > - || out[i*8 + 3] != (in[i*8 + 3] + 8) * 5 - 1 > - || out[i*8 + 4] != (in[i*8 + 4] + 9) * 8 - 8 > - || out[i*8 + 5] != (in[i*8 + 5] + 10) * 4 - 7 > - || out[i*8 + 6] != (in[i*8 + 6] + 11) * 3 - 3 > - || out[i*8 + 7] != (in[i*8 + 7] + 12) * 2 - 7) > - abort (); > - } > - > - /* Requires permutation - not SLPable. */ > - for (i = 0; i < N*2; i++) > - { > - out[i*4] = (in[i*4] + 2) * 3; > - out[i*4 + 1] = (in[i*4 + 2] + 2) * 7; > - out[i*4 + 2] = (in[i*4 + 1] + 7) * 3; > - out[i*4 + 3] = (in[i*4 + 3] + 3) * 4; > - } > - > - /* check results: */ > - for (i = 0; i < N*2; i++) > - { > - if (out[i*4] != (in[i*4] + 2) * 3 > - || out[i*4 + 1] != (in[i*4 + 2] + 2) * 7 > - || out[i*4 + 2] != (in[i*4 + 1] + 7) * 3 > - || out[i*4 + 3] != (in[i*4 + 3] + 3) * 4) > - abort (); > - } > - > - /* Different operations - not SLPable. 
*/ > - for (i = 0; i < N*4; i++) > - { > - out2[i*2] = ((float) in[i*2] * 2 + 6) ; > - out2[i*2 + 1] = (float) (in[i*2 + 1] * 3 + 7); > - } > - > - /* check results: */ > - for (i = 0; i < N*4; i++) > - { > - if (out2[i*2] != ((float) in[i*2] * 2 + 6) > - || out2[i*2 + 1] != (float) (in[i*2 + 1] * 3 + 7)) > - abort (); > - } > - > - > - return 0; > -} > - > -int main (void) > -{ > - check_vect (); > - > - main1 (); > - > - return 0; > -} > - > -/* { dg-final { scan-tree-dump-times "vectorized 3 loops" 1 "vect" { target > { { vect_uintfloat_cvt && vect_strided } && vect_int_mult } } } } */ > -/* { dg-final { scan-tree-dump-times "vectorized 2 loops" 1 "vect" { target > { { { ! vect_uintfloat_cvt } && vect_strided } && vect_int_mult } } } } */ > -/* { dg-final { scan-tree-dump-times "vectorized 0 loops" 1 "vect" {target > { ! { vect_int_mult && vect_strided } } } } } */ > -/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 0 "vect" > } } */ > -/* { dg-final { cleanup-tree-dump "vect" } } */ > - > Index: gcc/testsuite/gcc.dg/vect/slp-11a.c > === > --- /dev/null 2011-03-23 08:42:11.268792848 + > +++ gcc/testsuite/gcc.dg/vect/slp-11a.c 2011-04-12 15:18:25.0 +0100 > @@ -0,0 +1,75 @@ > +/* { dg-require-effective-targ
Re: [10/9] Add tests for stride-3 accesses
On Tue, Apr 12, 2011 at 4:34 PM, Richard Sandiford wrote: > This patch adds a test for stride-3 accesses. I didn't add any > particularly complicated cases because I think the testsuite already > covers the interaction between the strided loads & stores and other > operations pretty well. Let me know if there's something I should > add though. > > Tested on x86_64-linux-gnu and arm-linux-gnueabi. OK to install? Ok. Thanks, Richard. > Richard > > > gcc/testsuite/ > * gcc.dg/vect/vect-strided-u16-i3.c: New test. > > Index: gcc/testsuite/gcc.dg/vect/vect-strided-u16-i3.c > === > --- /dev/null 2011-03-23 08:42:11.268792848 + > +++ gcc/testsuite/gcc.dg/vect/vect-strided-u16-i3.c 2011-04-12 > 11:55:17.0 +0100 > @@ -0,0 +1,112 @@ > +#include > +#include "tree-vect.h" > + > +#define N 128 > + > +typedef struct { > + unsigned short a; > + unsigned short b; > + unsigned short c; > +} s; > + > +#define A(I) (I) > +#define B(I) ((I) * 2) > +#define C(I) ((unsigned short) ~((I) ^ 0x18)) > + > +void __attribute__ ((noinline)) > +check1 (s *res) > +{ > + int i; > + > + for (i = 0; i < N; i++) > + if (res[i].a != C (i) > + || res[i].b != A (i) > + || res[i].c != B (i)) > + abort (); > +} > + > +void __attribute__ ((noinline)) > +check2 (unsigned short *res) > +{ > + int i; > + > + for (i = 0; i < N; i++) > + if (res[i] != (unsigned short) (A (i) + B (i) + C (i))) > + abort (); > +} > + > +void __attribute__ ((noinline)) > +check3 (s *res) > +{ > + int i; > + > + for (i = 0; i < N; i++) > + if (res[i].a != i > + || res[i].b != i > + || res[i].c != i) > + abort (); > +} > + > +void __attribute__ ((noinline)) > +check4 (unsigned short *res) > +{ > + int i; > + > + for (i = 0; i < N; i++) > + if (res[i] != (unsigned short) (A (i) + B (i))) > + abort (); > +} > + > +void __attribute__ ((noinline)) > +main1 (s *arr) > +{ > + int i; > + s *ptr = arr; > + s res1[N]; > + unsigned short res2[N]; > + > + for (i = 0; i < N; i++) > + { > + res1[i].a = arr[i].c; > + res1[i].b = arr[i].a; > + res1[i].c = arr[i].b; > + } > + check1 (res1); > + > + for (i = 0; i < N; i++) > + res2[i] = arr[i].a + arr[i].b + arr[i].c; > + check2 (res2); > + > + for (i = 0; i < N; i++) > + { > + res1[i].a = i; > + res1[i].b = i; > + res1[i].c = i; > + } > + check3 (res1); > + > + for (i = 0; i < N; i++) > + res2[i] = arr[i].a + arr[i].b; > + check4 (res2); > +} > + > +int main (void) > +{ > + int i; > + s arr[N]; > + > + check_vect (); > + > + for (i = 0; i < N; i++) > + { > + arr[i].a = A (i); > + arr[i].b = B (i); > + arr[i].c = C (i); > + } > + main1 (arr); > + > + return 0; > +} > + > +/* { dg-final { scan-tree-dump-times "vectorized 4 loops" 1 "vect" { target > vect_strided3 } } } */ > +/* { dg-final { cleanup-tree-dump "vect" } } */ >
[Committed] S/390: Fix popcount expanders
Hi, the attached patch fixes a problem with the population count expanders on s390. For operand 2 a scratch register is allocated in the preparation code of the expander. Nevertheless there has been a match_operand for operand 2. It should be just a match_dup instead. Fixed with the attached patch. Bootstrapped and regression tested on s390x. Committed to 4.6 and mainline. Bye, -Andreas- 2011-04-15 Andreas Krebbel * config/s390/s390.md (popcountdi2, popcountsi2, popcounthi2): Replace match_operand with match_dup for the third operand in these expanders. Index: gcc/config/s390/s390.md === *** gcc/config/s390/s390.md.orig --- gcc/config/s390/s390.md *** *** 9330,9336 (parallel [(set (match_dup 0) (plus:DI (match_dup 0) (match_dup 2))) (clobber (reg:CC CC_REGNUM))]) ; sllg op2, op0, 16 !(set (match_operand:DI 2 "register_operand" "") (ashift:DI (match_dup 0) (const_int 16))) ; agr op0, op2 (parallel [(set (match_dup 0) (plus:DI (match_dup 0) (match_dup 2))) --- 9330,9336 (parallel [(set (match_dup 0) (plus:DI (match_dup 0) (match_dup 2))) (clobber (reg:CC CC_REGNUM))]) ; sllg op2, op0, 16 !(set (match_dup 2) (ashift:DI (match_dup 0) (const_int 16))) ; agr op0, op2 (parallel [(set (match_dup 0) (plus:DI (match_dup 0) (match_dup 2))) *** *** 9352,9358 UNSPEC_POPCNT)) (clobber (reg:CC CC_REGNUM))]) ; sllk op2, op0, 16 !(set (match_operand:SI 2 "register_operand" "") (ashift:SI (match_dup 0) (const_int 16))) ; ar op0, op2 (parallel [(set (match_dup 0) (plus:SI (match_dup 0) (match_dup 2))) --- 9352,9358 UNSPEC_POPCNT)) (clobber (reg:CC CC_REGNUM))]) ; sllk op2, op0, 16 !(set (match_dup 2) (ashift:SI (match_dup 0) (const_int 16))) ; ar op0, op2 (parallel [(set (match_dup 0) (plus:SI (match_dup 0) (match_dup 2))) *** *** 9374,9380 UNSPEC_POPCNT)) (clobber (reg:CC CC_REGNUM))]) ; sllk op2, op0, 8 !(set (match_operand:SI 2 "register_operand" "") (ashift:SI (match_dup 0) (const_int 8))) ; ar op0, op2 (parallel [(set (match_dup 0) (plus:SI (match_dup 0) (match_dup 2))) --- 9374,9380 UNSPEC_POPCNT)) (clobber (reg:CC CC_REGNUM))]) ; sllk op2, op0, 8 !(set (match_dup 2) (ashift:SI (match_dup 0) (const_int 8))) ; ar op0, op2 (parallel [(set (match_dup 0) (plus:SI (match_dup 0) (match_dup 2)))
Re: [lto/pph] Do not pack more bits than requested (issue4415044)
On Fri, Apr 15, 2011 at 04:56, Richard Guenther wrote: >> @@ -518,7 +518,8 @@ pack_ts_type_value_fields (struct bitpack_d *bp, tree >> expr) >> bp_pack_value (bp, TYPE_USER_ALIGN (expr), 1); >> bp_pack_value (bp, TYPE_READONLY (expr), 1); >> bp_pack_value (bp, TYPE_ALIGN (expr), HOST_BITS_PER_INT); >> - bp_pack_value (bp, TYPE_ALIAS_SET (expr) == 0 ? 0 : -1, >> HOST_BITS_PER_INT); >> + bp_pack_value (bp, TYPE_ALIAS_SET (expr) == 0 ? 0 : -1, >> + BITS_PER_BITPACK_WORD); > > As we only want to stream alias-set zeros just change it to a single bit, > like > > bp_pack_value (bp, TYPE_ALIAS_SET (expr) == 0, 1); > > and on the reader side restore either a zero or -1. Ah, yes. Much better. > As for the -1 case, it's simply broken use of the interface. Which would've been caught by the assertion. How about this, we keep the asserts with #ifdef ENABLE_CHECKING. This would've saved me some ugly debugging. Diego.
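[Editor's note: a sketch of the scheme Richard suggests and Diego accepts above; the final committed hunks may look slightly different.]

  /* Writer (lto-streamer-out.c): stream only an "alias set is zero" bit.  */
  bp_pack_value (bp, TYPE_ALIAS_SET (expr) == 0, 1);

  /* Reader (lto-streamer-in.c): restore either zero or -1.  */
  TYPE_ALIAS_SET (expr) = bp_unpack_value (bp, 1) ? 0 : -1;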
Re: [lto/pph] Do not pack more bits than requested (issue4415044)
On Fri, 15 Apr 2011, Diego Novillo wrote: > On Fri, Apr 15, 2011 at 04:56, Richard Guenther wrote: > > >> @@ -518,7 +518,8 @@ pack_ts_type_value_fields (struct bitpack_d *bp, tree > >> expr) > >> bp_pack_value (bp, TYPE_USER_ALIGN (expr), 1); > >> bp_pack_value (bp, TYPE_READONLY (expr), 1); > >> bp_pack_value (bp, TYPE_ALIGN (expr), HOST_BITS_PER_INT); > >> - bp_pack_value (bp, TYPE_ALIAS_SET (expr) == 0 ? 0 : -1, > >> HOST_BITS_PER_INT); > >> + bp_pack_value (bp, TYPE_ALIAS_SET (expr) == 0 ? 0 : -1, > >> + BITS_PER_BITPACK_WORD); > > > > As we only want to stream alias-set zeros just change it to a single bit, > > like > > > > bp_pack_value (bp, TYPE_ALIAS_SET (expr) == 0, 1); > > > > and on the reader side restore either a zero or -1. > > Ah, yes. Much better. > > > As for the -1 case, it's simply broken use of the interface. > > Which would've been caught by the assertion. How about this, we keep > the asserts with #ifdef ENABLE_CHECKING. This would've saved me some > ugly debugging. I think we should rather add the masking. The assert would only trigger for particular values, not for bogus use of the interface (you get sign-extension for signed arguments). Richard.
PATCH: PR target/48612: [4.7 Regression] vminps instruction is generated with -ftree-vectorize
Hi, This patch is approved in PR. I checked it in. H.J. --- Index: ChangeLog === --- ChangeLog (revision 172490) +++ ChangeLog (working copy) @@ -1,3 +1,9 @@ +2011-04-15 H.J. Lu + + PR target/48612 + * config/i386/sse.md (*ieee_smin3): Switch mnemonics. + (*ieee_smax3): Likewise. + 2011-04-15 Andreas Krebbel * config/s390/s390.md (popcountdi2, popcountsi2, popcounthi2): Index: config/i386/sse.md === --- config/i386/sse.md (revision 172490) +++ config/i386/sse.md (working copy) @@ -962,8 +962,8 @@ UNSPEC_IEEE_MIN))] "" "@ - vmin\t{%2, %1, %0|%0, %1, %2} - min\t{%2, %0|%0, %2}" + min\t{%2, %0|%0, %2} + vmin\t{%2, %1, %0|%0, %1, %2}" [(set_attr "isa" "noavx,avx") (set_attr "type" "sseadd") (set_attr "prefix" "orig,vex") @@ -977,8 +977,8 @@ UNSPEC_IEEE_MAX))] "" "@ - vmax\t{%2, %1, %0|%0, %1, %2} - max\t{%2, %0|%0, %2}" + max\t{%2, %0|%0, %2} + vmax\t{%2, %1, %0|%0, %1, %2}" [(set_attr "isa" "noavx,avx") (set_attr "type" "sseadd") (set_attr "prefix" "orig,vex")
[PATCH 0/4] Devirtualization fix and improvements
Hi, the following set of patches is a bunch of early fixups and improvements to the current devirtualization mechanism which are semi-related but are meant to be applied on top of each other. One nice thing about them is that they improve SPEC 2006 473.astar by over 3% with LTO. More comments above the individual patches. Thanks, Martin
[PATCH 1/4] Remove useless and wrong code from ipa_analyze_virtual_call_uses
Hi, ipa_analyze_virtual_call_uses contains code that was meant to deal with the situation where OBJ_TYPE_REF_OBJECT is a (number of) COMPONENT_REFs on top of a dereferenced default definition SSA_NAME of a parameter. The code is useless because that never happens in the IL: if an ancestor object of a parameter is being used for a virtual call, the object in the expression is always an SSA_NAME which is assigned the proper value in a previous statement. Moreover, if it ever triggered, it might lead to wrong code, because in cases like this it is necessary also to store the offset within the parameter into the indirect call graph edge information (like we do in indirect inlining). The above is done in the next patch in the series. I've split this part from it because I would like to commit it also to the 4.6 branch. I have bootstrapped and tested this on x86-64-linux without any problems. OK for trunk and the 4.6 branch? Thanks, Martin 2011-04-08 Martin Jambor * ipa-prop.c (ipa_analyze_virtual_call_uses): Remove handling of ADR_EXPRs. Index: src/gcc/ipa-prop.c === --- src.orig/gcc/ipa-prop.c +++ src/gcc/ipa-prop.c @@ -1383,18 +1383,6 @@ ipa_analyze_virtual_call_uses (struct cg if (!flag_devirtualize) return; - if (TREE_CODE (obj) == ADDR_EXPR) -{ - do - { - obj = TREE_OPERAND (obj, 0); - } - while (TREE_CODE (obj) == COMPONENT_REF); - if (TREE_CODE (obj) != MEM_REF) - return; - obj = TREE_OPERAND (obj, 0); -} - if (TREE_CODE (obj) != SSA_NAME || !SSA_NAME_IS_DEFAULT_DEF (obj)) return;
[PATCH 3/4] Simple relaxation of dynamic type change detection routine
Hi, in order to speed up astar, I had to persuade the function that decides whether a statement potentially modifies the dynamic type of an object by storing a new value to the VMT pointer to consider the following statement harmless (all types are integers of some sort): MEM[(i32 *)b2arp_3(D) + 8B] = 0; I'd like to experiment with this routine a bit more once I have some other IPA-CP infrastructure in place, but at the moment I opted for a simple solution: All scalar non-pointer stores are deemed safe. The VMT pointer is a compiler-generated field which is a pointer, so legal user code is not able to store stuff there through some fancy type casts, and compiler-generated code should have no reason whatsoever to do that either. Therefore I believe this change is safe and useful. I have bootstrapped and tested the patch on x86_64-linux. OK for trunk? Thanks, Martin 2011-04-14 Martin Jambor * ipa-prop.c (stmt_may_be_vtbl_ptr_store): Return false for scalar non-pointer assignments. Index: src/gcc/ipa-prop.c === --- src.orig/gcc/ipa-prop.c +++ src/gcc/ipa-prop.c @@ -405,13 +405,18 @@ stmt_may_be_vtbl_ptr_store (gimple stmt) { tree lhs = gimple_assign_lhs (stmt); - if (TREE_CODE (lhs) == COMPONENT_REF - && !DECL_VIRTUAL_P (TREE_OPERAND (lhs, 1)) - && !AGGREGATE_TYPE_P (TREE_TYPE (lhs))) + if (!AGGREGATE_TYPE_P (TREE_TYPE (lhs))) + { + if (!POINTER_TYPE_P (TREE_TYPE (lhs))) return false; - /* In the future we might want to use get_base_ref_and_offset to find -if there is a field corresponding to the offset and if so, proceed -almost like if it was a component ref. */ + + if (TREE_CODE (lhs) == COMPONENT_REF + && !DECL_VIRTUAL_P (TREE_OPERAND (lhs, 1))) + return false; + /* In the future we might want to use get_base_ref_and_offset to find +if there is a field corresponding to the offset and if so, proceed +almost like if it was a component ref. */ + } } return true; }
[PATCH 4/4] Devirtualization based on global objects
Hi, this is the patch that actually speeds up astar (together with the previous one). It implements devirtualization based on global objects. In the past we have refrained from doing this because in general it is difficult to say whether the object is currently being constructed and so it might have a dynamic type of one of its ancestors. However, if the object's class does not have any ancestors that obviously cannot happen. Devirtualizing in such conditions is enough to change a virtual call to regwayobj::isaddtobound in 473.astar to a direct one which can and should be inlined. That seemed a good justification to implement this and so the patch below does so and brings about 3.1% speedup for that benchmark with LTO. I acknowledge that instead of discarding all classes with ancestors it would be better to check that the called virtual method has the same implementation in all ancestors instead. That is perhaps something for later. It took me surprisingly long to realize that this technique can be used for folding virtual calls based on local automatically allocated objexts too and so can be used to un-XFAIL g++.dg/opt/devirt1.c that regressed in 4.6. Bootstrapped and tested on x86_64-linux. OK for trunk? Thanks, Martin 2011-04-15 Martin Jambor * ipa-cp.c (ipcp_process_devirtualization_opportunities): Devirtualize also according to actual contants. * gimple-fold.c (gimple_extract_devirt_binfo_from_cst): New function. (gimple_fold_obj_type_ref_call): New function. (gimple_fold_call): Call gimple_fold_obj_type_ref_call on OBJ_TYPE_REFs. * gimple.h (gimple_extract_devirt_binfo_from_cst): Declare. * testsuite/g++.dg/opt/devirt1.C: Bump to -O2, remove XFAIL. * testsuite/g++.dg/opt/devirt2.C: New test. * testsuite/g++.dg/ipa/devirt-g-1.C: Likewise. Index: src/gcc/ipa-cp.c === --- src.orig/gcc/ipa-cp.c +++ src/gcc/ipa-cp.c @@ -1245,51 +1245,71 @@ ipcp_process_devirtualization_opportunit for (ie = node->indirect_calls; ie; ie = next_ie) { - int param_index, types_count, j; + int param_index; HOST_WIDE_INT token, anc_offset; tree target, delta, otr_type; + struct ipcp_lattice *lat; next_ie = ie->next_callee; if (!ie->indirect_info->polymorphic) continue; param_index = ie->indirect_info->param_index; - if (param_index == -1 - || ipa_param_cannot_devirtualize_p (info, param_index) - || ipa_param_types_vec_empty (info, param_index)) + if (param_index == -1) continue; + lat = ipcp_get_lattice (info, param_index); token = ie->indirect_info->otr_token; anc_offset = ie->indirect_info->anc_offset; otr_type = ie->indirect_info->otr_type; target = NULL_TREE; - types_count = VEC_length (tree, info->params[param_index].types); - for (j = 0; j < types_count; j++) + if (lat->type == IPA_CONST_VALUE) { - tree binfo = VEC_index (tree, info->params[param_index].types, j); - tree d, t; - + tree binfo = gimple_extract_devirt_binfo_from_cst (lat->constant); + if (!binfo) + continue; binfo = get_binfo_at_offset (binfo, anc_offset, otr_type); if (!binfo) - { - target = NULL_TREE; - break; - } + continue; + target = gimple_get_virt_mehtod_for_binfo (token, binfo, &delta, +false); + } + else + { + int types_count, j; - t = gimple_get_virt_mehtod_for_binfo (token, binfo, &d, true); - if (!t) - { - target = NULL_TREE; - break; - } - else if (!target) - { - target = t; - delta = d; - } - else if (target != t || !tree_int_cst_equal (delta, d)) + if (ipa_param_cannot_devirtualize_p (info, param_index) + || ipa_param_types_vec_empty (info, param_index)) + continue; + + types_count = VEC_length (tree, 
info->params[param_index].types); + for (j = 0; j < types_count; j++) { - target = NULL_TREE; - break; + tree binfo = VEC_index (tree, info->params[param_index].types, j); + tree d, t; + + binfo = get_binfo_at_offset (binfo, anc_offset, otr_type); + if (!binfo) + { + target = NULL_TREE; + break; + } + + t = gimple_get_virt_mehtod_for_binfo (token, binfo, &d, true); + if (!t) + { + target = NULL_TREE; + break; + } + else if (!target) + { + target = t; + delta = d; + } +
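As a rough C-level analogy of what the new folding buys (the real case is a C++ virtual call through the vtable pointer; this only mimics it with a function pointer stored in a global object, and all names are invented):

#include <stdio.h>

struct way_ops
{
  int (*isaddtobound) (int);    /* Stands in for a virtual function slot.  */
};

static int
isaddtobound_impl (int x)
{
  return x > 0;
}

/* A global object whose initializer is visible to the compiler and whose
   "class" has no ancestors, so its dynamic type cannot be anything else.  */
static const struct way_ops wayobj = { isaddtobound_impl };

static int
use (const struct way_ops *w, int v)
{
  return w->isaddtobound (v);
}

int
main (void)
{
  /* The caller passes the known global object, so constant propagation
     can resolve the indirect call in USE to isaddtobound_impl and let it
     be inlined, roughly what happens to the astar virtual call.  */
  printf ("%d\n", use (&wayobj, 3));
  return 0;
}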
[PATCH 2/4] Handle calls to ancestor objects in IPA-CP devirtualization
Hi, early inlining can create virtual calls based on the part of an object that represents an ancestor. This patch makes ipa-prop analysis able to recognize such calls and store the required offset along with such calls (the field is already there for similar purposes of indirect inlining). The constant propagation is then made aware of the offset field and takes it into account when looking up the proper BINFO. Bootstrapped and tested on x86_64-linux. OK for trunk? Thanks, Martin 2011-04-13 Martin Jambor * ipa-cp.c (ipcp_process_devirtualization_opportunities): Take into account anc_offset and otr_type from the indirect edge info. * ipa-prop.c (get_ancestor_addr_info): New function. (compute_complex_ancestor_jump_func): Assignment analysis moved to get_ancestor_addr_info, call it. (ipa_note_param_call): Do not initialize information about polymorphic calls, return the indirect call graph edge. Remove the last parameter, adjust all callers. (ipa_analyze_virtual_call_uses): Process also calls to ancestors of parameters. Initialize polymorphic information in the indirect edge. * testsuite/g++.dg/ipa/devirt-7.C: New test. Index: src/gcc/ipa-cp.c === --- src.orig/gcc/ipa-cp.c +++ src/gcc/ipa-cp.c @@ -1246,8 +1246,8 @@ ipcp_process_devirtualization_opportunit for (ie = node->indirect_calls; ie; ie = next_ie) { int param_index, types_count, j; - HOST_WIDE_INT token; - tree target, delta; + HOST_WIDE_INT token, anc_offset; + tree target, delta, otr_type; next_ie = ie->next_callee; if (!ie->indirect_info->polymorphic) @@ -1259,14 +1259,23 @@ ipcp_process_devirtualization_opportunit continue; token = ie->indirect_info->otr_token; + anc_offset = ie->indirect_info->anc_offset; + otr_type = ie->indirect_info->otr_type; target = NULL_TREE; types_count = VEC_length (tree, info->params[param_index].types); for (j = 0; j < types_count; j++) { tree binfo = VEC_index (tree, info->params[param_index].types, j); - tree d; - tree t = gimple_get_virt_mehtod_for_binfo (token, binfo, &d, true); + tree d, t; + binfo = get_binfo_at_offset (binfo, anc_offset, otr_type); + if (!binfo) + { + target = NULL_TREE; + break; + } + + t = gimple_get_virt_mehtod_for_binfo (token, binfo, &d, true); if (!t) { target = NULL_TREE; Index: src/gcc/ipa-prop.c === --- src.orig/gcc/ipa-prop.c +++ src/gcc/ipa-prop.c @@ -576,6 +576,49 @@ compute_complex_assign_jump_func (struct } } +/* Extract the base, offset and MEM_REF expression from a statement ASSIGN if + it looks like: + + iftmp.1_3 = &obj_2(D)->D.1762; + + The base of the MEM_REF must be a default definition SSA NAME of a + parameter. Return NULL_TREE if it looks otherwise. If case of success, the + whole MEM_REF expression is returned and the offset calculated from any + handled components and the MEM_REF itself is stored into *OFFSET. The whole + RHS stripped off the ADDR_EXPR is stored into *OBJ_P. */ + +static tree +get_ancestor_addr_info (gimple assign, tree *obj_p, HOST_WIDE_INT *offset) +{ + HOST_WIDE_INT size, max_size; + tree expr, parm, obj; + + if (!gimple_assign_single_p (assign)) +return NULL_TREE; + expr = gimple_assign_rhs1 (assign); + + if (TREE_CODE (expr) != ADDR_EXPR) +return NULL_TREE; + expr = TREE_OPERAND (expr, 0); + obj = expr; + expr = get_ref_base_and_extent (expr, offset, &size, &max_size); + + if (TREE_CODE (expr) != MEM_REF + /* If this is a varying address, punt. 
*/ + || max_size == -1 + || max_size != size) +return NULL_TREE; + parm = TREE_OPERAND (expr, 0); + if (TREE_CODE (parm) != SSA_NAME + || !SSA_NAME_IS_DEFAULT_DEF (parm) + || *offset < 0) +return NULL_TREE; + + *offset += mem_ref_offset (expr).low * BITS_PER_UNIT; + *obj_p = obj; + return expr; +} + /* Given that an actual argument is an SSA_NAME that is a result of a phi statement PHI, try to find out whether NAME is in fact a @@ -603,7 +646,7 @@ compute_complex_ancestor_jump_func (stru struct ipa_jump_func *jfunc, gimple call, gimple phi) { - HOST_WIDE_INT offset, size, max_size; + HOST_WIDE_INT offset; gimple assign, cond; basic_block phi_bb, assign_bb, cond_bb; tree tmp, parm, expr, obj; @@ -626,29 +669,12 @@ compute_complex_ancestor_jump_func (stru assign = SSA_NAME_DEF_STMT (tmp); assign_bb = gimple_bb (assign); - if (!single_pred_p (assign_bb) - || !gimple_assign_single_p (assign)) + if (!single_pred_p (assign_bb)) return; - expr = gimple
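The shape of code the new analysis recognizes can be sketched in plain C as passing the address of a subobject that lives at a constant offset inside a parameter (purely illustrative; the real case is a C++ virtual call made through such an ancestor pointer):

#include <stdio.h>

struct base { int id; };
struct derived { int pad; struct base b; };   /* B sits at a fixed offset.  */

static void
consume (struct base *p)
{
  printf ("%d\n", p->id);
}

void
f (struct derived *d)
{
  /* &d->b is just the parameter D plus a constant offset.  When the
     address of such an ancestor part of a parameter feeds a virtual
     call, the analysis now records that offset on the indirect edge so
     the right BINFO can be looked up during propagation.  */
  consume (&d->b);
}

int
main (void)
{
  struct derived x = { 0, { 42 } };
  f (&x);
  return 0;
}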
Re: [lto/pph] Do not pack more bits than requested (issue4415044)
On Fri, Apr 15, 2011 at 08:49, Richard Guenther wrote: > On Fri, 15 Apr 2011, Diego Novillo wrote: > >> On Fri, Apr 15, 2011 at 04:56, Richard Guenther wrote: >> >> >> @@ -518,7 +518,8 @@ pack_ts_type_value_fields (struct bitpack_d *bp, tree >> >> expr) >> >> bp_pack_value (bp, TYPE_USER_ALIGN (expr), 1); >> >> bp_pack_value (bp, TYPE_READONLY (expr), 1); >> >> bp_pack_value (bp, TYPE_ALIGN (expr), HOST_BITS_PER_INT); >> >> - bp_pack_value (bp, TYPE_ALIAS_SET (expr) == 0 ? 0 : -1, >> >> HOST_BITS_PER_INT); >> >> + bp_pack_value (bp, TYPE_ALIAS_SET (expr) == 0 ? 0 : -1, >> >> + BITS_PER_BITPACK_WORD); >> > >> > As we only want to stream alias-set zeros just change it to a single bit, >> > like >> > >> > bp_pack_value (bp, TYPE_ALIAS_SET (expr) == 0, 1); >> > >> > and on the reader side restore either a zero or -1. >> >> Ah, yes. Much better. >> >> > As for the -1 case, it's simply broken use of the interface. >> >> Which would've been caught by the assertion. How about this, we keep >> the asserts with #ifdef ENABLE_CHECKING. This would've saved me some >> ugly debugging. > > I think we should rather add the masking. The assert would only > trigger for particular values, not for bogus use of the interface > (you get sign-extension for signed arguments). OK, that works too. I'll prepare a patch. Diego.
Re: [6/9] NEON vec_load_lanes and vec_store_lanes patterns
On Tue, 2011-04-12 at 15:01 +0100, Richard Sandiford wrote: > This patch adds vec_load_lanes and vec_store_lanes patterns for NEON. > They feed directly into the corresponding intrinsic patterns. > > Tested on x86_64-linux-gnu and arm-linux-gnueabi. OK to install? > > Richard > > > gcc/ > * config/arm/neon.md (vec_load_lanes): New expanders, > (vec_store_lanes): Likewise. > OK. R.
[PATCH] factor asm op chaining out from stmt.c:expand_asm_stmt
There are several cut-and-pasted loops in expand_asm_stmt that could be parameterized by the functions used to access the particular operands. The patch below does that. Tested on x86_64-unknown-linux-gnu. OK to commit? -Nathan * stmt.c (chain_asm_ops): New function. (expand_asm_stmt): Call it. diff --git a/gcc/stmt.c b/gcc/stmt.c index 1a9f9e5..1fc09e9 100644 --- a/gcc/stmt.c +++ b/gcc/stmt.c @@ -1114,13 +1114,34 @@ expand_asm_operands (tree string, tree outputs, tree inputs, free_temp_slots (); } +/* Return the operands of STMT, a GIMPLE_ASM, as described by OP_FN and + N_OPS connected via TREE_CHAIN. */ + +static tree +chain_asm_ops (const_gimple stmt, unsigned (*n_ops) (const_gimple), + tree (*op_fn) (const_gimple, unsigned)) +{ + unsigned i, n; + tree ret = NULL_TREE, t; + + n = (*n_ops) (stmt); + if (n > 0) +{ + t = ret = (*op_fn) (stmt, 0); + for (i = 1; i < n; i++) + t = TREE_CHAIN (t) = (*op_fn) (stmt, i); +} + + return ret; +} + void expand_asm_stmt (gimple stmt) { int noutputs; - tree outputs, tail, t; + tree outputs, tail; tree *o; - size_t i, n; + size_t i; const char *s; tree str, out, in, cl, labels; location_t locus = gimple_location (stmt); @@ -1128,41 +1149,10 @@ expand_asm_stmt (gimple stmt) /* Meh... convert the gimple asm operands into real tree lists. Eventually we should make all routines work on the vectors instead of relying on TREE_CHAIN. */ - out = NULL_TREE; - n = gimple_asm_noutputs (stmt); - if (n > 0) -{ - t = out = gimple_asm_output_op (stmt, 0); - for (i = 1; i < n; i++) - t = TREE_CHAIN (t) = gimple_asm_output_op (stmt, i); -} - - in = NULL_TREE; - n = gimple_asm_ninputs (stmt); - if (n > 0) -{ - t = in = gimple_asm_input_op (stmt, 0); - for (i = 1; i < n; i++) - t = TREE_CHAIN (t) = gimple_asm_input_op (stmt, i); -} - - cl = NULL_TREE; - n = gimple_asm_nclobbers (stmt); - if (n > 0) -{ - t = cl = gimple_asm_clobber_op (stmt, 0); - for (i = 1; i < n; i++) - t = TREE_CHAIN (t) = gimple_asm_clobber_op (stmt, i); -} - - labels = NULL_TREE; - n = gimple_asm_nlabels (stmt); - if (n > 0) -{ - t = labels = gimple_asm_label_op (stmt, 0); - for (i = 1; i < n; i++) - t = TREE_CHAIN (t) = gimple_asm_label_op (stmt, i); -} + out = chain_asm_ops (stmt, gimple_asm_noutputs, gimple_asm_output_op); + in = chain_asm_ops (stmt, gimple_asm_ninputs, gimple_asm_input_op); + cl = chain_asm_ops (stmt, gimple_asm_nclobbers, gimple_asm_clobber_op); + labels = chain_asm_ops (stmt, gimple_asm_nlabels, gimple_asm_label_op); s = gimple_asm_string (stmt); str = build_string (strlen (s), s);
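The pattern itself, stripped of the GIMPLE details, looks like this (a toy stand-in, not GCC code: the real helper chains trees via TREE_CHAIN, here a small malloc'ed list is used instead):

#include <stdio.h>
#include <stdlib.h>

/* Toy statement with two operand vectors, standing in for the output
   and input operand vectors of a GIMPLE_ASM.  */
struct stmt
{
  const char *outputs[2];
  const char *inputs[3];
};

static unsigned n_outputs (const struct stmt *s) { (void) s; return 2; }
static const char *output_op (const struct stmt *s, unsigned i) { return s->outputs[i]; }
static unsigned n_inputs (const struct stmt *s) { (void) s; return 3; }
static const char *input_op (const struct stmt *s, unsigned i) { return s->inputs[i]; }

struct op_node { const char *text; struct op_node *next; };

/* One helper, parameterized by the count and accessor callbacks,
   replaces the nearly identical cut-and-pasted loops.  */
static struct op_node *
chain_ops (const struct stmt *s,
           unsigned (*n_ops) (const struct stmt *),
           const char *(*op_fn) (const struct stmt *, unsigned))
{
  struct op_node *head = NULL, *tail = NULL;
  unsigned i, n = n_ops (s);

  for (i = 0; i < n; i++)
    {
      struct op_node *node = malloc (sizeof *node);
      node->text = op_fn (s, i);
      node->next = NULL;
      if (tail)
        tail->next = node;
      else
        head = node;
      tail = node;
    }
  return head;                  /* Nodes are deliberately not freed here.  */
}

int
main (void)
{
  struct stmt s = { { "=r", "=m" }, { "r", "m", "i" } };
  struct op_node *n;

  for (n = chain_ops (&s, n_outputs, output_op); n; n = n->next)
    printf ("output %s\n", n->text);
  for (n = chain_ops (&s, n_inputs, input_op); n; n = n->next)
    printf ("input %s\n", n->text);
  return 0;
}

Whether the callback indirection reads better than four explicit loops is a matter of taste (see the reply further down in this thread); it does at least keep the chaining logic in one place.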
Re: [PATCH v3] Re: avoid useless if-before-free tests
> "Janne" == Janne Blomqvist writes: Jim> Can someone add me to the gcc group? That would help. Jim> I already have ssh access to sourceware.org. Janne> I'm not sure if I'm considered to be well-established Janne> enough, so could someone help Jim out here, please? I added Jim to the gcc group. Tom
[PATCH] refactor gimple asm memory clobber checking
There are a couple places that check GIMPLE_ASMs for clobbering memory; this patch centralizes the logic in gimple.c. Tested on x86_64-unknown-linux-gnu. OK to commit? -Nathan * gimple.h (gimple_asm_clobbers_memory_p): Declare. * gimple.c (gimple_asm_clobbers_memory_p): Define. * ipa-pure-const.c (check_stmt): Call it. * tree-ssa-operands.c (get_asm_expr_operands): Likewise. diff --git a/gcc/gimple.c b/gcc/gimple.c index 090fc94..5dc62ea 100644 --- a/gcc/gimple.c +++ b/gcc/gimple.c @@ -5142,4 +5142,21 @@ gimple_call_builtin_p (gimple stmt, enum built_in_function code) && DECL_FUNCTION_CODE (fndecl) == code); } +/* Return true if STMT clobbers memory. STMT is required to be a + GIMPLE_ASM. */ + +bool +gimple_asm_clobbers_memory_p (const_gimple stmt) +{ + unsigned i; + + for (i = 0; i < gimple_asm_nclobbers (stmt); i++) +{ + tree op = gimple_asm_clobber_op (stmt, i); + if (strcmp (TREE_STRING_POINTER (TREE_VALUE (op)), "memory") == 0) + return true; +} + + return false; +} #include "gt-gimple.h" diff --git a/gcc/gimple.h b/gcc/gimple.h index 572cabc..840e149 100644 --- a/gcc/gimple.h +++ b/gcc/gimple.h @@ -973,6 +973,7 @@ extern bool walk_stmt_load_store_ops (gimple, void *, bool (*)(gimple, tree, void *)); extern bool gimple_ior_addresses_taken (bitmap, gimple); extern bool gimple_call_builtin_p (gimple, enum built_in_function); +extern bool gimple_asm_clobbers_memory_p (const_gimple); /* In gimplify.c */ extern tree create_tmp_var_raw (tree, const char *); diff --git a/gcc/ipa-pure-const.c b/gcc/ipa-pure-const.c index b7deb57..eb5b0f6 100644 --- a/gcc/ipa-pure-const.c +++ b/gcc/ipa-pure-const.c @@ -639,7 +639,6 @@ static void check_stmt (gimple_stmt_iterator *gsip, funct_state local, bool ipa) { gimple stmt = gsi_stmt (*gsip); - unsigned int i = 0; if (is_gimple_debug (stmt)) return; @@ -693,16 +692,12 @@ check_stmt (gimple_stmt_iterator *gsip, funct_state local, bool ipa) } break; case GIMPLE_ASM: - for (i = 0; i < gimple_asm_nclobbers (stmt); i++) + if (gimple_asm_clobbers_memory_p (stmt)) { - tree op = gimple_asm_clobber_op (stmt, i); - if (strcmp (TREE_STRING_POINTER (TREE_VALUE (op)), "memory") == 0) - { - if (dump_file) -fprintf (dump_file, "memory asm clobber is not const/pure"); - /* Abandon all hope, ye who enter here. */ - local->pure_const_state = IPA_NEITHER; - } + if (dump_file) + fprintf (dump_file, "memory asm clobber is not const/pure"); + /* Abandon all hope, ye who enter here. */ + local->pure_const_state = IPA_NEITHER; } if (gimple_asm_volatile_p (stmt)) { diff --git a/gcc/tree-ssa-operands.c b/gcc/tree-ssa-operands.c index 57f443f..7f76cbf 100644 --- a/gcc/tree-ssa-operands.c +++ b/gcc/tree-ssa-operands.c @@ -832,15 +832,8 @@ get_asm_expr_operands (gimple stmt) } /* Clobber all memory and addressable symbols for asm ("" : : : "memory"); */ - for (i = 0; i < gimple_asm_nclobbers (stmt); i++) -{ - tree link = gimple_asm_clobber_op (stmt, i); - if (strcmp (TREE_STRING_POINTER (TREE_VALUE (link)), "memory") == 0) - { - add_virtual_operand (stmt, opf_def); - break; - } -} + if (gimple_asm_clobbers_memory_p (stmt)) +add_virtual_operand (stmt, opf_def); }
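For reference, the kind of statement the new predicate identifies is the classic compiler memory barrier written with a "memory" clobber (plain GNU extended asm, nothing GCC-internal):

static int shared;

void
publish (int v)
{
  shared = v;
  /* The "memory" clobber makes gimple_asm_clobbers_memory_p return true
     for the resulting GIMPLE_ASM, so ipa-pure-const gives up on
     pure/const and the operand scanner adds a virtual definition.  */
  __asm__ __volatile__ ("" : : : "memory");
}

void
no_barrier (void)
{
  /* No "memory" clobber here, so the predicate returns false.  */
  __asm__ __volatile__ ("");
}

int
main (void)
{
  publish (1);
  no_barrier ();
  return shared - 1;
}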
Re: [patch, ARM] Fix PR48325, support POST_INC/PRE_DEC for NEON struct modes
On Thu, 2011-03-31 at 22:57 +0800, Chung-Lin Tang wrote: > This PR doesn't exactly trigger currently on trunk; a REG_DEAD note that > occurs in trunk, but not in the 4.5-based compilers which this bug was > reported for, currently blocks auto-inc-dec from doing its job, and just > happens to avoid this ICE. > > The problem is that vldmia/db, which do exist, are currently not enabled for > NEON struct modes (OI, XI, etc.) This only needs a small patch to > neon_struct_mem_operand() to work. Okay for trunk? > > Thanks, > Chung-Lin > > 2011-03-31 Chung-Lin Tang > > * config/arm/arm.c (neon_struct_mem_operand): > Support POST_INC/PRE_DEC memory operands. OK. R.
Re: [PATCH, ARM] PR47855 Compute attr "length" for some thumb2 insns, 2/3
On Thu, 2011-04-14 at 21:19 +0800, Carrot Wei wrote: > On Fri, Apr 8, 2011 at 6:51 PM, Ramana Radhakrishnan > wrote: > > On 08/04/11 10:57, Carrot Wei wrote: > >> > >> Hi > >> > >> This is the second part of the fixing for > >> > >> http://gcc.gnu.org/bugzilla/show_bug.cgi?id=47855 > >> > >> This patch contains the length computation for insn patterns > >> "*arm_movqi_insn" > >> and "*arm_addsi3". Since the alternatives and encodings are much more > >> complex, > >> the attribute length is computed in separate C functions. > > > Sorry, no. This is potentially a maintenance pain. It hardcodes alternatives > > from a pattern elsewhere in the C file. I don't like doing this unless we > > have to with the sync primitives or with push_multi. In this case I'm not > > convinced we need such functions in the .c file. > > > > Why can't we use the "enabled" attribute here with appropriate constraints > > for everything other than the memory cases (even there we might be able to > > invent some new constraints) ? > > > > Also a note about programming style. There are the helper macros like REG_P, > > CONST_INT_P and MEM_P which remove the necessity for checks like > > > > GET_CODE (x) == y where y E { REG, CONST_INT, MEM} > > Hi Ramana > > As you suggested I created several new constraints, and use the > "enabled" attribute to split the current alternatives in this new > patch. It has been tested on arm qemu without regression. > > thanks > Carrot Sorry, I don't think this approach can work. Certainly not with the way the compiler currently works, and especially for mov and add insns. These instructions are only 2 bytes long if either: 1) They clobber the condition code register or 2) They occur inside an IT block. We can't tell either of these from the pattern, so you're underestimating the length of the instruction in some circumstances by claiming that they are only 2 bytes long. That /will/ lead to broken code someday. We can't add potential clobbers to mov and add patterns because that will break reload which relies on these patterns being simple-set insns with no added baggage. It *might* be possible to add clobbers to other operations, but that will then most-likely upset instruction scheduling (I think the scheduler treats two insns that clobber the same hard reg as being strongly ordered). Putting in the clobber too early will certainly affect cond-exec generation. In short, I'm not aware of a simple way to address this problem so that we get accurate length information, but minimal impact on other passes in the compiler. R.
Re: Implement stack arrays even for unknown sizes
Michael, > Yes, this is due to the DECL_EXPR statement which is rendered by the > dumper just the same as a normal decl. The testcase looks for exactly one > such decl, but with -fstack-arrays there are exactly two for each such > array. The testsuite is run without -fstack-arrays, so I don't understand why the "DECL_EXPR statement" appears. Dominique
[PATCH] Fix PR48290
This fixes the remaining part of PR48290, copyprop not properly propagating constant copies across PHI nodes. On the way this patch needs to fix some present issues with the code inhibiting various kinds of propagation (but not removing those two that look completely bogus). Bootstrapped and tested on x86_64-unknown-linux-gnu, applied to trunk. Richard. 2011-04-15 Richard Guenther PR tree-optimization/48290 * tree-ssa-copy.c (copy_prop_visit_phi_node): Propagate constants. Properly decide inhibiting propagation based on the valueized operand. Do loop-closed SSA form preserving here ... (init_copy_prop): ... not here. Index: gcc/tree-ssa-copy.c === *** gcc/tree-ssa-copy.c (revision 172485) --- gcc/tree-ssa-copy.c (working copy) *** copy_prop_visit_phi_node (gimple phi) *** 567,572 --- 567,573 for (i = 0; i < gimple_phi_num_args (phi); i++) { prop_value_t *arg_val; + tree arg_value; tree arg = gimple_phi_arg_def (phi, i); edge e = gimple_phi_arg_edge (phi, i); *** copy_prop_visit_phi_node (gimple phi) *** 575,598 if (!(e->flags & EDGE_EXECUTABLE)) continue; ! /* Constants in the argument list never generate a useful copy. !Similarly, names that flow through abnormal edges cannot be !used to derive copies. */ ! if (TREE_CODE (arg) != SSA_NAME || SSA_NAME_OCCURS_IN_ABNORMAL_PHI (arg)) ! { ! phi_val.value = lhs; ! break; ! } ! ! /* Avoid copy propagation from an inner into an outer loop. !Otherwise, this may move loop variant variables outside of !their loops and prevent coalescing opportunities. If the !value was loop invariant, it will be hoisted by LICM and !exposed for copy propagation. Not a problem for virtual !operands though. !??? The value will be always loop invariant. */ ! if (is_gimple_reg (lhs) ! && loop_depth_of_name (arg) > loop_depth_of_name (lhs)) { phi_val.value = lhs; break; --- 576,584 if (!(e->flags & EDGE_EXECUTABLE)) continue; ! /* Names that flow through abnormal edges cannot be used to !derive copies. */ ! if (TREE_CODE (arg) == SSA_NAME && SSA_NAME_OCCURS_IN_ABNORMAL_PHI (arg)) { phi_val.value = lhs; break; *** copy_prop_visit_phi_node (gimple phi) *** 605,630 fprintf (dump_file, "\n"); } ! arg_val = get_copy_of_val (arg); ! /* If we didn't visit the definition of arg yet treat it as ! UNDEFINED. This also handles PHI arguments that are the !same as lhs. We'll come here again. */ ! if (!arg_val->value) ! continue; /* If the LHS didn't have a value yet, make it a copy of the first argument we find. */ if (phi_val.value == NULL_TREE) { ! phi_val.value = arg_val->value; continue; } /* If PHI_VAL and ARG don't have a common copy-of chain, then this PHI node cannot be a copy operation. */ ! if (phi_val.value != arg_val->value ! && !operand_equal_p (phi_val.value, arg_val->value, 0)) { phi_val.value = lhs; break; --- 591,641 fprintf (dump_file, "\n"); } ! if (TREE_CODE (arg) == SSA_NAME) ! { ! arg_val = get_copy_of_val (arg); ! ! /* If we didn't visit the definition of arg yet treat it as !UNDEFINED. This also handles PHI arguments that are the !same as lhs. We'll come here again. */ ! if (!arg_val->value) ! continue; ! arg_value = arg_val->value; ! } ! else ! arg_value = valueize_val (arg); ! ! /* Avoid copy propagation from an inner into an outer loop. !Otherwise, this may move loop variant variables outside of !their loops and prevent coalescing opportunities. If the !value was loop invariant, it will be hoisted by LICM and !exposed for copy propagation. !??? The value will be always loop invariant. 
!In loop-closed SSA form do not copy-propagate through !PHI nodes in blocks with a loop exit edge predecessor. */ ! if (current_loops ! && TREE_CODE (arg_value) == SSA_NAME ! && (loop_depth_of_name (arg_value) > loop_depth_of_name (lhs) ! || (loops_state_satisfies_p (LOOP_CLOSED_SSA) ! && loop_exit_edge_p (e->src->loop_father, e ! { ! phi_val.value = lhs; ! break; ! } /* If the LHS didn't have a value yet, make it a copy of the first argument we find. */ if (phi_val.value == NULL_TREE) { ! phi_val.value = arg_value; co
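A minimal C illustration of the situation the fix is about (made up, not the PR testcase): both PHI arguments are the same constant, and after the change copy propagation itself can treat the PHI result as that constant.

#include <stdio.h>

static int result;

static void use (int v) { result = v; }

void
f (int cond)
{
  int x;

  if (cond)
    x = 42;
  else
    x = 42;

  /* In SSA form X becomes a PHI node whose two arguments are both the
     constant 42; propagating the constant through the PHI lets later
     passes fold the use.  */
  use (x);
}

int
main (void)
{
  f (1);
  printf ("%d\n", result);
  return 0;
}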
[PATCH] Fix PR48286
This makes us avoid PR 323. Tested on x86_64-unknown-linux-gnu/-m32, committed. Richard. 2011-04-15 Richard Guenther PR testsuite/48286 * gfortran.dg/cray_pointers_8.f90: Use -ffloat-store. gcc/testsuite/gfortran.dg/cray_pointers_8.f90 Index: gcc/testsuite/gfortran.dg/cray_pointers_8.f90 === --- gcc/testsuite/gfortran.dg/cray_pointers_8.f90 (revision 172493) +++ gcc/testsuite/gfortran.dg/cray_pointers_8.f90 (working copy) @@ -1,5 +1,5 @@ ! { dg-do run } -! { dg-options "-fcray-pointer" } +! { dg-options "-fcray-pointer -ffloat-store" } ! ! Test the fix for PR36528 in which the Cray pointer was not passed ! correctly to 'euler' so that an undefined reference to fcn was
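For readers who have not run into PR 323: it is the long-standing x87 excess-precision issue. A rough standalone C illustration of why -ffloat-store helps (this is not the Fortran testcase; the behaviour depends on using the x87 FPU, e.g. 32-bit x86 without SSE math):

#include <stdio.h>

void
test (double x, double y)
{
  const double y2 = x + 1.0;
  /* With x87 math, Y2 may still live in an 80-bit register here while Y
     was already rounded to 64 bits, so the comparison can unexpectedly
     fail; -ffloat-store forces Y2 through memory and restores
     consistent double rounding.  */
  if (y != y2)
    printf ("surprise: %.20g != %.20g\n", y, y2);
}

int
main (void)
{
  const double x = .012;
  test (x, x + 1.0);
  return 0;
}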
Re: [PATCH] factor asm op chaining out from stmt.c:expand_asm_stmt
On Fri, Apr 15, 2011 at 3:19 PM, Nathan Froyd wrote: > There are several cut-and-pasted loops in expand_asm_stmt that could be > parameterized by the functions used to access the particular operands. > The patch below does that. > > Tested on x86_64-unknown-linux-gnu. OK to commit? Hmm, it doesn't make it easier to follow though. Just my 2cents, Richard. > -Nathan > > * stmt.c (chain_asm_ops): New function. > (expand_asm_stmt): Call it. > > diff --git a/gcc/stmt.c b/gcc/stmt.c > index 1a9f9e5..1fc09e9 100644 > --- a/gcc/stmt.c > +++ b/gcc/stmt.c > @@ -1114,13 +1114,34 @@ expand_asm_operands (tree string, tree outputs, tree > inputs, > free_temp_slots (); > } > > +/* Return the operands of STMT, a GIMPLE_ASM, as described by OP_FN and > + N_OPS connected via TREE_CHAIN. */ > + > +static tree > +chain_asm_ops (const_gimple stmt, unsigned (*n_ops) (const_gimple), > + tree (*op_fn) (const_gimple, unsigned)) > +{ > + unsigned i, n; > + tree ret = NULL_TREE, t; > + > + n = (*n_ops) (stmt); > + if (n > 0) > + { > + t = ret = (*op_fn) (stmt, 0); > + for (i = 1; i < n; i++) > + t = TREE_CHAIN (t) = (*op_fn) (stmt, i); > + } > + > + return ret; > +} > + > void > expand_asm_stmt (gimple stmt) > { > int noutputs; > - tree outputs, tail, t; > + tree outputs, tail; > tree *o; > - size_t i, n; > + size_t i; > const char *s; > tree str, out, in, cl, labels; > location_t locus = gimple_location (stmt); > @@ -1128,41 +1149,10 @@ expand_asm_stmt (gimple stmt) > /* Meh... convert the gimple asm operands into real tree lists. > Eventually we should make all routines work on the vectors instead > of relying on TREE_CHAIN. */ > - out = NULL_TREE; > - n = gimple_asm_noutputs (stmt); > - if (n > 0) > - { > - t = out = gimple_asm_output_op (stmt, 0); > - for (i = 1; i < n; i++) > - t = TREE_CHAIN (t) = gimple_asm_output_op (stmt, i); > - } > - > - in = NULL_TREE; > - n = gimple_asm_ninputs (stmt); > - if (n > 0) > - { > - t = in = gimple_asm_input_op (stmt, 0); > - for (i = 1; i < n; i++) > - t = TREE_CHAIN (t) = gimple_asm_input_op (stmt, i); > - } > - > - cl = NULL_TREE; > - n = gimple_asm_nclobbers (stmt); > - if (n > 0) > - { > - t = cl = gimple_asm_clobber_op (stmt, 0); > - for (i = 1; i < n; i++) > - t = TREE_CHAIN (t) = gimple_asm_clobber_op (stmt, i); > - } > - > - labels = NULL_TREE; > - n = gimple_asm_nlabels (stmt); > - if (n > 0) > - { > - t = labels = gimple_asm_label_op (stmt, 0); > - for (i = 1; i < n; i++) > - t = TREE_CHAIN (t) = gimple_asm_label_op (stmt, i); > - } > + out = chain_asm_ops (stmt, gimple_asm_noutputs, gimple_asm_output_op); > + in = chain_asm_ops (stmt, gimple_asm_ninputs, gimple_asm_input_op); > + cl = chain_asm_ops (stmt, gimple_asm_nclobbers, gimple_asm_clobber_op); > + labels = chain_asm_ops (stmt, gimple_asm_nlabels, gimple_asm_label_op); > > s = gimple_asm_string (stmt); > str = build_string (strlen (s), s); >
Re: [PATCH] refactor gimple asm memory clobber checking
On Fri, Apr 15, 2011 at 3:20 PM, Nathan Froyd wrote: > There are a couple places that check GIMPLE_ASMs for clobbering memory; > this patch centralizes the logic in gimple.c. > > Tested on x86_64-unknown-linux-gnu. OK to commit? Ok. Thanks, Richard. > -Nathan > > * gimple.h (gimple_asm_clobbers_memory_p): Declare. > * gimple.c (gimple_asm_clobbers_memory_p): Define. > * ipa-pure-const.c (check_stmt): Call it. > * tree-ssa-operands.c (get_asm_expr_operands): Likewise. > > diff --git a/gcc/gimple.c b/gcc/gimple.c > index 090fc94..5dc62ea 100644 > --- a/gcc/gimple.c > +++ b/gcc/gimple.c > @@ -5142,4 +5142,21 @@ gimple_call_builtin_p (gimple stmt, enum > built_in_function code) > && DECL_FUNCTION_CODE (fndecl) == code); > } > > +/* Return true if STMT clobbers memory. STMT is required to be a > + GIMPLE_ASM. */ > + > +bool > +gimple_asm_clobbers_memory_p (const_gimple stmt) > +{ > + unsigned i; > + > + for (i = 0; i < gimple_asm_nclobbers (stmt); i++) > + { > + tree op = gimple_asm_clobber_op (stmt, i); > + if (strcmp (TREE_STRING_POINTER (TREE_VALUE (op)), "memory") == 0) > + return true; > + } > + > + return false; > +} > #include "gt-gimple.h" > diff --git a/gcc/gimple.h b/gcc/gimple.h > index 572cabc..840e149 100644 > --- a/gcc/gimple.h > +++ b/gcc/gimple.h > @@ -973,6 +973,7 @@ extern bool walk_stmt_load_store_ops (gimple, void *, > bool (*)(gimple, tree, void *)); > extern bool gimple_ior_addresses_taken (bitmap, gimple); > extern bool gimple_call_builtin_p (gimple, enum built_in_function); > +extern bool gimple_asm_clobbers_memory_p (const_gimple); > > /* In gimplify.c */ > extern tree create_tmp_var_raw (tree, const char *); > diff --git a/gcc/ipa-pure-const.c b/gcc/ipa-pure-const.c > index b7deb57..eb5b0f6 100644 > --- a/gcc/ipa-pure-const.c > +++ b/gcc/ipa-pure-const.c > @@ -639,7 +639,6 @@ static void > check_stmt (gimple_stmt_iterator *gsip, funct_state local, bool ipa) > { > gimple stmt = gsi_stmt (*gsip); > - unsigned int i = 0; > > if (is_gimple_debug (stmt)) > return; > @@ -693,16 +692,12 @@ check_stmt (gimple_stmt_iterator *gsip, funct_state > local, bool ipa) > } > break; > case GIMPLE_ASM: > - for (i = 0; i < gimple_asm_nclobbers (stmt); i++) > + if (gimple_asm_clobbers_memory_p (stmt)) > { > - tree op = gimple_asm_clobber_op (stmt, i); > - if (strcmp (TREE_STRING_POINTER (TREE_VALUE (op)), "memory") == 0) > - { > - if (dump_file) > - fprintf (dump_file, " memory asm clobber is not > const/pure"); > - /* Abandon all hope, ye who enter here. */ > - local->pure_const_state = IPA_NEITHER; > - } > + if (dump_file) > + fprintf (dump_file, " memory asm clobber is not const/pure"); > + /* Abandon all hope, ye who enter here. */ > + local->pure_const_state = IPA_NEITHER; > } > if (gimple_asm_volatile_p (stmt)) > { > diff --git a/gcc/tree-ssa-operands.c b/gcc/tree-ssa-operands.c > index 57f443f..7f76cbf 100644 > --- a/gcc/tree-ssa-operands.c > +++ b/gcc/tree-ssa-operands.c > @@ -832,15 +832,8 @@ get_asm_expr_operands (gimple stmt) > } > > /* Clobber all memory and addressable symbols for asm ("" : : : "memory"); > */ > - for (i = 0; i < gimple_asm_nclobbers (stmt); i++) > - { > - tree link = gimple_asm_clobber_op (stmt, i); > - if (strcmp (TREE_STRING_POINTER (TREE_VALUE (link)), "memory") == 0) > - { > - add_virtual_operand (stmt, opf_def); > - break; > - } > - } > + if (gimple_asm_clobbers_memory_p (stmt)) > + add_virtual_operand (stmt, opf_def); > } > > >
patch pings
http://gcc.gnu.org/ml/gcc-patches/2011-03/msg01060.html http://gcc.gnu.org/ml/gcc-patches/2011-03/msg02247.html
Re: Implement stack arrays even for unknown sizes
Hi, On Fri, 15 Apr 2011, Dominique Dhumieres wrote: > Michael, > > > Yes, this is due to the DECL_EXPR statement which is rendered by the > > dumper just the same as a normal decl. The testcase looks for exactly one > > such decl, but with -fstack-arrays there are exactly two for each such > > array. > > The testsuite is run without -fstack-arrays, so I dont' understand why > the "DECL_EXPR statement" appears. Bummer, you're right. I unconditionally emit a DECL_EXPR for arrays even when they don't have a variable length. It's harmless, but makes the testcase fail (I wasn't seeing the fail because I've changed the testcase already to make it not fail with -fstack-arrays). I'll make the DECL_EXPR conditional on the size being variable. As Tobias already okayed the patch I'm planning to check in the slightly modified variant as below, after a new round of testing. Ciao, Michael. * trans-array.c (toplevel): Include gimple.h. (gfc_trans_allocate_array_storage): Check flag_stack_arrays, properly expand variable length arrays. (gfc_trans_auto_array_allocation): If flag_stack_arrays create variable length decls and associate them with their scope. * gfortran.h (gfc_option_t): Add flag_stack_arrays member. * options.c (gfc_init_options): Handle -fstack_arrays option. * lang.opt (fstack-arrays): Add option. * invoke.texi (Code Gen Options): Document it. * Make-lang.in (trans-array.o): Depend on GIMPLE_H. Index: fortran/trans-array.c === *** fortran/trans-array.c (revision 172431) --- fortran/trans-array.c (working copy) *** along with GCC; see the file COPYING3. *** 81,86 --- 81,87 #include "system.h" #include "coretypes.h" #include "tree.h" + #include "gimple.h" #include "diagnostic-core.h" /* For internal_error/fatal_error. */ #include "flags.h" #include "gfortran.h" *** gfc_trans_allocate_array_storage (stmtbl *** 630,647 { /* Allocate the temporary. */ onstack = !dynamic && initial == NULL_TREE !&& gfc_can_put_var_on_stack (size); if (onstack) { /* Make a temporary variable to hold the data. */ tmp = fold_build2_loc (input_location, MINUS_EXPR, TREE_TYPE (nelem), nelem, gfc_index_one_node); tmp = build_range_type (gfc_array_index_type, gfc_index_zero_node, tmp); tmp = build_array_type (gfc_get_element_type (TREE_TYPE (desc)), tmp); tmp = gfc_create_var (tmp, "A"); tmp = gfc_build_addr_expr (NULL_TREE, tmp); gfc_conv_descriptor_data_set (pre, desc, tmp); } --- 631,657 { /* Allocate the temporary. */ onstack = !dynamic && initial == NULL_TREE !&& (gfc_option.flag_stack_arrays !|| gfc_can_put_var_on_stack (size)); if (onstack) { /* Make a temporary variable to hold the data. */ tmp = fold_build2_loc (input_location, MINUS_EXPR, TREE_TYPE (nelem), nelem, gfc_index_one_node); + tmp = gfc_evaluate_now (tmp, pre); tmp = build_range_type (gfc_array_index_type, gfc_index_zero_node, tmp); tmp = build_array_type (gfc_get_element_type (TREE_TYPE (desc)), tmp); tmp = gfc_create_var (tmp, "A"); + /* If we're here only because of -fstack-arrays we have to +emit a DECL_EXPR to make the gimplifier emit alloca calls. */ + if (!gfc_can_put_var_on_stack (size)) + gfc_add_expr_to_block (pre, + fold_build1_loc (input_location, + DECL_EXPR, TREE_TYPE (tmp), + tmp)); tmp = gfc_build_addr_expr (NULL_TREE, tmp); gfc_conv_descriptor_data_set (pre, desc, tmp); } *** gfc_trans_auto_array_allocation (tree de *** 4759,4767 { stmtblock_t init; tree type; ! tree tmp; tree size; tree offset; bool onstack; gcc_assert (!(sym->attr.pointer || sym->attr.allocatable)); --- 4769,4779 { stmtblock_t init; tree type; ! 
tree tmp = NULL_TREE; tree size; tree offset; + tree space; + tree inittree; bool onstack; gcc_assert (!(sym->attr.pointer || sym->attr.allocatable)); *** gfc_trans_auto_array_allocation (tree de *** 4818,4832 return; } ! /* The size is the number of elements in the array, so multiply by the ! size of an element to get the total size. */ ! tmp = TYPE_SIZE_UNIT (gfc_get_element_type (type)); !
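The DECL_EXPR trick is the same mechanism the C front end uses for variable-length arrays; a plain C analogy of what the gimplifier turns such a declaration into (not Fortran, just to show the effect of the extra DECL_EXPR):

#include <stdio.h>

long
sum_squares (int n)
{
  /* A variable-length array: the gimplifier sees a DECL_EXPR for it and
     lowers the declaration to a stack allocation (an alloca of
     n * sizeof (int) bytes), which is roughly what -fstack-arrays wants
     for Fortran array temporaries of unknown size as well.  */
  int a[n];
  long s = 0;
  int i;

  for (i = 0; i < n; i++)
    a[i] = i;
  for (i = 0; i < n; i++)
    s += (long) a[i] * a[i];
  return s;
}

int
main (void)
{
  printf ("%ld\n", sum_squares (10));   /* Prints 285.  */
  return 0;
}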
Re: [PATCH 1/4] Remove useless and wrong code from ipa_analyze_virtual_call_uses
On Fri, 15 Apr 2011, Martin Jambor wrote: > Hi, > > ipa_analyze_virtual_call_uses contains code that was meant to deal > with situation where OBJ_TYPE_REF_OBJECT is a (number of) > COMPONENT_REFs on top of a dereferenced default definition SSA_NAME of > a parameter. > > The code is useless because that never happens in the IL, if an > ancestor object of a parameter is being used for a virtual call, the > object in the expression is always an SSA_NAME which is assigned the > proper value in a previous statement. > > Moreover, if it ever triggered, it might lead to wrong code because in > cases like this it is necessary also to store the offset within the > parameter into the indirect call graph edge information (like we do in > indirect inlining). > > The above is done in the next patch in the series. I've split this > part from it because I would like to commit it also to the 4.6 branch. > I have bootstrapped and tested this on x86-64-linux without any > problems. OK for trunk and the 4.6 branch? Ok for both. Thanks, Richard. > Thanks, > > Martin > > > 2011-04-08 Martin Jambor > > * ipa-prop.c (ipa_analyze_virtual_call_uses): Remove handling > of ADR_EXPRs. > > > Index: src/gcc/ipa-prop.c > === > --- src.orig/gcc/ipa-prop.c > +++ src/gcc/ipa-prop.c > @@ -1383,18 +1383,6 @@ ipa_analyze_virtual_call_uses (struct cg >if (!flag_devirtualize) > return; > > - if (TREE_CODE (obj) == ADDR_EXPR) > -{ > - do > - { > - obj = TREE_OPERAND (obj, 0); > - } > - while (TREE_CODE (obj) == COMPONENT_REF); > - if (TREE_CODE (obj) != MEM_REF) > - return; > - obj = TREE_OPERAND (obj, 0); > -} > - >if (TREE_CODE (obj) != SSA_NAME >|| !SSA_NAME_IS_DEFAULT_DEF (obj)) > return; > > -- Richard Guenther Novell / SUSE Labs SUSE LINUX Products GmbH - Nuernberg - AG Nuernberg - HRB 16746 - GF: Markus Rex
Re: More of ipa-inline housekeeping
AFAICT revision 172430 fixed the original problem in pr45810: gfc -Ofast -fwhole-program fatigue.f90 : 6.301u 0.003s 0:06.30 gfc -Ofast -fwhole-program -flto fatigue.f90 : 6.263u 0.003s 0:06.26 However if I play with --param max-inline-insns-auto=*, I get gfc -Ofast -fwhole-program --param max-inline-insns-auto=124 -fstack-arrays fatigue.f90 : 4.870u 0.002s 0:04.87 gfc -Ofast -fwhole-program --param max-inline-insns-auto=125 -fstack-arrays fatigue.f90 : 2.872u 0.002s 0:02.87 and gfc -Ofast -fwhole-program -flto --param max-inline-insns-auto=515 -fstack-arrays fatigue.f90 : 4.965u 0.003s 0:04.97 gfc -Ofast -fwhole-program -flto --param max-inline-insns-auto=516 -fstack-arrays fatigue.f90 : 2.732u 0.002s 0:02.73 while I get the same threshold=125 with/without -flto at revision 172429. Note that I get the same thresholds without -fstack-arrays, the run times are only larger. Dominique
Re: Implement stack arrays even for unknown sizes
On 04/15/2011 07:28 AM, Michael Matz wrote: --- snip --- I'll make the DECL_EXPR conditional on the size being variable. As Tobias already okayed the patch I'm planning to check in the slightly modified variant as below, after a new round of testing. That's A-OK Thanks, Jerry
Re: More of ipa-inline housekeeping
I have forgotten to say that the 125 and 516 thresholds are for x86_64-apple-darwin10. On powerpc-apple-darwin9 they are respectively 172 and 638. Dominique
[PATCH, SMS] Avoid unfreed memory when SMS fails
Hello, This patch fixes the scenario where SMS fails to schedule a loop and continues to the next one without freeing the data structures allocated while scheduling the first loop. Bootstrap and regtested on ppc64-redhat-linux. OK for mainline? Thanks, Revital Changelog: * modulo-sched.c (sms_schedule): Avoid unfreed memory when SMS fails. Index: modulo-sched.c === --- modulo-sched.c (revision 170464) +++ modulo-sched.c (working copy) @@ -1177,7 +1177,6 @@ sms_schedule (void) fprintf (dump_file, HOST_WIDEST_INT_PRINT_DEC, trip_count); fprintf (dump_file, ")\n"); } - continue; } else {
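Schematically, the bug class is the usual leak-on-early-continue pattern (generic C, not the modulo-sched code itself): bailing out of a loop iteration before the shared cleanup at the bottom leaks whatever that iteration already allocated.

#include <stdio.h>
#include <stdlib.h>

static int
try_schedule (int i)
{
  return i % 2 == 0;    /* Pretend every other loop fails to schedule.  */
}

void
process_loops (int n)
{
  int i;

  for (i = 0; i < n; i++)
    {
      char *scratch = malloc (1024);    /* Per-loop data structures.  */
      if (!scratch)
        abort ();

      if (!try_schedule (i))
        {
          /* A 'continue' here would skip the free below and leak
             SCRATCH; falling through to the shared cleanup instead is
             what the patch does for the SMS failure path.  */
        }
      else
        {
          /* ... use the schedule ...  */
        }

      free (scratch);                   /* Shared cleanup.  */
    }
}

int
main (void)
{
  process_loops (4);
  puts ("done");
  return 0;
}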
[PATCH, SMS] Free sccs field
Hello, The attached patch adds a missing free operation for storage allocated while calculating SCCs. Bootstrap and regtested on ppc64-redhat-linux. OK for mainline? Thanks, Revital Changelog: * ddg.c (free_ddg_all_sccs): Free sccs field in struct ddg_all_sccs. Index: ddg.c === --- ddg.c (revision 171573) +++ ddg.c (working copy) @@ -1011,6 +1082,8 @@ free_ddg_all_sccs (ddg_all_sccs_ptr all_ for (i = 0; i < all_sccs->num_sccs; i++) free_scc (all_sccs->sccs[i]); + if (all_sccs->sccs) +free (all_sccs->sccs); free (all_sccs); }
Re: [PATCH 2/4] Handle calls to ancestor objects in IPA-CP devirtualization
On Fri, 15 Apr 2011, Martin Jambor wrote: > Hi, > > early inlining can create virtual calls based on the part of an object > that represents an ancestor. This patch makes ipa-prop analysis able > to recognize such calls and store the required offset along with such > calls (the field is already there for similar purposes of indirect > inlining). The constant propagation is then made aware of the offset > field and takes it into account when looking up the proper BINFO. > > Bootstrapped and tested on x86_64-linux. OK for trunk? > > Thanks, > > Martin > > > > 2011-04-13 Martin Jambor > > * ipa-cp.c (ipcp_process_devirtualization_opportunities): Take into > account anc_offset and otr_type from the indirect edge info. > * ipa-prop.c (get_ancestor_addr_info): New function. > (compute_complex_ancestor_jump_func): Assignment analysis moved to > get_ancestor_addr_info, call it. > (ipa_note_param_call): Do not initialize information about polymorphic > calls, return the indirect call graph edge. Remove the last > parameter, adjust all callers. > (ipa_analyze_virtual_call_uses): Process also calls to ancestors of > parameters. Initialize polymorphic information in the indirect edge. > > * testsuite/g++.dg/ipa/devirt-7.C: New test. > > > Index: src/gcc/ipa-cp.c > === > --- src.orig/gcc/ipa-cp.c > +++ src/gcc/ipa-cp.c > @@ -1246,8 +1246,8 @@ ipcp_process_devirtualization_opportunit >for (ie = node->indirect_calls; ie; ie = next_ie) > { >int param_index, types_count, j; > - HOST_WIDE_INT token; > - tree target, delta; > + HOST_WIDE_INT token, anc_offset; > + tree target, delta, otr_type; > >next_ie = ie->next_callee; >if (!ie->indirect_info->polymorphic) > @@ -1259,14 +1259,23 @@ ipcp_process_devirtualization_opportunit > continue; > >token = ie->indirect_info->otr_token; > + anc_offset = ie->indirect_info->anc_offset; > + otr_type = ie->indirect_info->otr_type; >target = NULL_TREE; >types_count = VEC_length (tree, info->params[param_index].types); >for (j = 0; j < types_count; j++) > { > tree binfo = VEC_index (tree, info->params[param_index].types, j); > - tree d; > - tree t = gimple_get_virt_mehtod_for_binfo (token, binfo, &d, true); > + tree d, t; > > + binfo = get_binfo_at_offset (binfo, anc_offset, otr_type); > + if (!binfo) > + { > + target = NULL_TREE; > + break; > + } > + > + t = gimple_get_virt_mehtod_for_binfo (token, binfo, &d, true); > if (!t) > { > target = NULL_TREE; > Index: src/gcc/ipa-prop.c > === > --- src.orig/gcc/ipa-prop.c > +++ src/gcc/ipa-prop.c > @@ -576,6 +576,49 @@ compute_complex_assign_jump_func (struct > } > } > > +/* Extract the base, offset and MEM_REF expression from a statement ASSIGN if > + it looks like: > + > + iftmp.1_3 = &obj_2(D)->D.1762; > + > + The base of the MEM_REF must be a default definition SSA NAME of a > + parameter. Return NULL_TREE if it looks otherwise. If case of success, > the > + whole MEM_REF expression is returned and the offset calculated from any > + handled components and the MEM_REF itself is stored into *OFFSET. The > whole > + RHS stripped off the ADDR_EXPR is stored into *OBJ_P. 
*/ > + > +static tree > +get_ancestor_addr_info (gimple assign, tree *obj_p, HOST_WIDE_INT *offset) > +{ > + HOST_WIDE_INT size, max_size; > + tree expr, parm, obj; > + > + if (!gimple_assign_single_p (assign)) > +return NULL_TREE; > + expr = gimple_assign_rhs1 (assign); > + > + if (TREE_CODE (expr) != ADDR_EXPR) > +return NULL_TREE; > + expr = TREE_OPERAND (expr, 0); > + obj = expr; > + expr = get_ref_base_and_extent (expr, offset, &size, &max_size); > + > + if (TREE_CODE (expr) != MEM_REF > + /* If this is a varying address, punt. */ > + || max_size == -1 > + || max_size != size) > +return NULL_TREE; > + parm = TREE_OPERAND (expr, 0); > + if (TREE_CODE (parm) != SSA_NAME > + || !SSA_NAME_IS_DEFAULT_DEF (parm) Might be an uninitialized variable, so also check TREE_CODE (SSA_NAME_VAR (parm)) == PARM_DECL? > + || *offset < 0) Check this above where you check max_size/size. > +return NULL_TREE; > + > + *offset += mem_ref_offset (expr).low * BITS_PER_UNIT; At some point it might be worth switching to get_addr_base_and_unit_offsets and not use bit but unit offsets throughout the code. > + *obj_p = obj; > + return expr; > +} > + > > /* Given that an actual argument is an SSA_NAME that is a result of a phi > statement PHI, try to find out whether NAME is in fact a > @@ -603,7 +646,7 @@ compute_complex_ancestor_jump_func (stru > st
[PATCH, SMS] New flag to apply SMS when SC equals 1
Hello, The attached patch introduces a new flag to allow applying SMS when stage count (SC) also equals 1. Currently, SMS is applied only when SC is greater than 1, as a stage count of 1 means that there is no interleaving between iterations and the scheduling passes do the job in this case. The new flag is introduced for debugging purposes to apply SMS on more loops. Bootstrap and regtest on ppc64-redhat-linux. OK for mainline? Thanks, Revital Changelog: * common.opt (fmodulo-sched-allow-sc-one): New flag. * modulo-sched.c (sms_schedule): Allow SMS when stage count equals one and -fmodulo-sched-allow-sc-one flag is set. === modified file 'gcc/common.opt' --- gcc/common.opt 2011-03-06 00:38:13 + +++ gcc/common.opt 2011-04-10 11:46:08 + @@ -1395,6 +1395,10 @@ fmodulo-sched-allow-regmoves Common Report Var(flag_modulo_sched_allow_regmoves) Perform SMS based modulo scheduling with register moves allowed +fmodulo-sched-allow-sc-one +Common Report Var(flag_modulo_sched_allow_sc_one) +Perform SMS based modulo scheduling also when stage count equals one + fmove-loop-invariants Common Report Var(flag_move_loop_invariants) Init(1) Optimization Move loop invariant computations out of loops === modified file 'gcc/modulo-sched.c' --- gcc/modulo-sched.c 2011-03-27 07:11:08 + +++ gcc/modulo-sched.c 2011-04-10 11:45:17 + @@ -1223,8 +1223,10 @@ sms_schedule (void) } /* Stage count of 1 means that there is no interleaving between - iterations, let the scheduling passes do the job. */ - if (stage_count <= 1 + iterations, let the scheduling passes do the job unless +-fmodulo-sched-allow-sc-one flag is set. */ + if ((!flag_modulo_sched_allow_sc_one && (stage_count == 1)) + || (stage_count < 1) || (count_init && (loop_count <= stage_count)) || (flag_branch_probabilities && (trip_count <= stage_count))) {
Re: [PATCH 3/4] Simple relaxation of dynamic type change detection routine
On Fri, 15 Apr 2011, Martin Jambor wrote: > Hi, > > in order to speed up astar, I had to persuade the function that > decides whether a statement potentially modifies the dynamic type of > an object by storing a new value to the VMT pointer to consider the > following statement harmless (all types are integers of some sort): > > MEM[(i32 *)b2arp_3(D) + 8B] = 0; > > I'd like to experiment with this routine a bit more once I have some > other IPA-CP infrastructure in place but at the moment I opted for a > simple solution: All scalar non-pointer stores are deemed safe. > > VMT pointer is a compiler generated field which is a pointer so legal > user code is not able to store stuff there through some fancy type > casts and compiler generated code should have no reason whatsoever to > that either. Therefore I believe this change is safe and useful. > > I have bootstrapped and tested the patch on x886_64-linux. OK for > trunk? I think this should be only done for -fstrict-aliasing. Ok with that change. Richard. > Thanks, > > Martin > > > > 2011-04-14 Martin Jambor > > * ipa-prop.c (stmt_may_be_vtbl_ptr_store): Return false for scalar > non-pointer assignments. > > Index: src/gcc/ipa-prop.c > === > --- src.orig/gcc/ipa-prop.c > +++ src/gcc/ipa-prop.c > @@ -405,13 +405,18 @@ stmt_may_be_vtbl_ptr_store (gimple stmt) > { >tree lhs = gimple_assign_lhs (stmt); > > - if (TREE_CODE (lhs) == COMPONENT_REF > - && !DECL_VIRTUAL_P (TREE_OPERAND (lhs, 1)) > - && !AGGREGATE_TYPE_P (TREE_TYPE (lhs))) > + if (!AGGREGATE_TYPE_P (TREE_TYPE (lhs))) > + { > + if (!POINTER_TYPE_P (TREE_TYPE (lhs))) > return false; > - /* In the future we might want to use get_base_ref_and_offset to find > - if there is a field corresponding to the offset and if so, proceed > - almost like if it was a component ref. */ > + > + if (TREE_CODE (lhs) == COMPONENT_REF > + && !DECL_VIRTUAL_P (TREE_OPERAND (lhs, 1))) > + return false; > + /* In the future we might want to use get_base_ref_and_offset to find > + if there is a field corresponding to the offset and if so, proceed > + almost like if it was a component ref. */ > + } > } >return true; > } > > -- Richard Guenther Novell / SUSE Labs SUSE LINUX Products GmbH - Nuernberg - AG Nuernberg - HRB 16746 - GF: Markus Rex
Re: [PATCH 4/4] Devirtualization based on global objects
On Fri, 15 Apr 2011, Martin Jambor wrote: > Hi, > > this is the patch that actually speeds up astar (together with the > previous one). It implements devirtualization based on global > objects. In the past we have refrained from doing this because in > general it is difficult to say whether the object is currently being > constructed and so it might have a dynamic type of one of its > ancestors. However, if the object's class does not have any ancestors > that obviously cannot happen. > > Devirtualizing in such conditions is enough to change a virtual call > to regwayobj::isaddtobound in 473.astar to a direct one which can and > should be inlined. That seemed a good justification to implement this > and so the patch below does so and brings about 3.1% speedup for that > benchmark with LTO. > > I acknowledge that instead of discarding all classes with ancestors it > would be better to check that the called virtual method has the same > implementation in all ancestors instead. That is perhaps something > for later. > > It took me surprisingly long to realize that this technique can be > used for folding virtual calls based on local automatically allocated > objexts too and so can be used to un-XFAIL g++.dg/opt/devirt1.c that > regressed in 4.6. > > Bootstrapped and tested on x86_64-linux. OK for trunk? > > Thanks, > > Martin > > > 2011-04-15 Martin Jambor > > * ipa-cp.c (ipcp_process_devirtualization_opportunities): Devirtualize > also according to actual contants. > * gimple-fold.c (gimple_extract_devirt_binfo_from_cst): New function. > (gimple_fold_obj_type_ref_call): New function. > (gimple_fold_call): Call gimple_fold_obj_type_ref_call on > OBJ_TYPE_REFs. > * gimple.h (gimple_extract_devirt_binfo_from_cst): Declare. > > * testsuite/g++.dg/opt/devirt1.C: Bump to -O2, remove XFAIL. > * testsuite/g++.dg/opt/devirt2.C: New test. > * testsuite/g++.dg/ipa/devirt-g-1.C: Likewise. 
> > > Index: src/gcc/ipa-cp.c > === > --- src.orig/gcc/ipa-cp.c > +++ src/gcc/ipa-cp.c > @@ -1245,51 +1245,71 @@ ipcp_process_devirtualization_opportunit > >for (ie = node->indirect_calls; ie; ie = next_ie) > { > - int param_index, types_count, j; > + int param_index; >HOST_WIDE_INT token, anc_offset; >tree target, delta, otr_type; > + struct ipcp_lattice *lat; > >next_ie = ie->next_callee; >if (!ie->indirect_info->polymorphic) > continue; >param_index = ie->indirect_info->param_index; > - if (param_index == -1 > - || ipa_param_cannot_devirtualize_p (info, param_index) > - || ipa_param_types_vec_empty (info, param_index)) > + if (param_index == -1) > continue; > > + lat = ipcp_get_lattice (info, param_index); >token = ie->indirect_info->otr_token; >anc_offset = ie->indirect_info->anc_offset; >otr_type = ie->indirect_info->otr_type; >target = NULL_TREE; > - types_count = VEC_length (tree, info->params[param_index].types); > - for (j = 0; j < types_count; j++) > + if (lat->type == IPA_CONST_VALUE) > { > - tree binfo = VEC_index (tree, info->params[param_index].types, j); > - tree d, t; > - > + tree binfo = gimple_extract_devirt_binfo_from_cst (lat->constant); > + if (!binfo) > + continue; > binfo = get_binfo_at_offset (binfo, anc_offset, otr_type); > if (!binfo) > - { > - target = NULL_TREE; > - break; > - } > + continue; > + target = gimple_get_virt_mehtod_for_binfo (token, binfo, &delta, > + false); > + } > + else > + { > + int types_count, j; > > - t = gimple_get_virt_mehtod_for_binfo (token, binfo, &d, true); > - if (!t) > - { > - target = NULL_TREE; > - break; > - } > - else if (!target) > - { > - target = t; > - delta = d; > - } > - else if (target != t || !tree_int_cst_equal (delta, d)) > + if (ipa_param_cannot_devirtualize_p (info, param_index) > + || ipa_param_types_vec_empty (info, param_index)) > + continue; > + > + types_count = VEC_length (tree, info->params[param_index].types); > + for (j = 0; j < types_count; j++) > { > - target = NULL_TREE; > - break; > + tree binfo = VEC_index (tree, info->params[param_index].types, j); > + tree d, t; > + > + binfo = get_binfo_at_offset (binfo, anc_offset, otr_type); > + if (!binfo) > + { > + target = NULL_TREE; > + break; > + } > + > + t = gimple_get_virt_mehtod_for_binfo (token, binfo, &d, true); > + if (!t) > + { > + targ
Re: [PATCH, SMS] Avoid unfreed memory when SMS fails
On Fri, Apr 15, 2011 at 5:26 PM, Revital Eres wrote: > Hello, > > This patch fixes the scenario where SMS fails to > schedule a loop and continues to the next one without > freeing data structures allocated while scheduling > the first loop. > > Bootstrapped and regtested on ppc64-redhat-linux. > > OK for mainline? Ok. Thanks, Richard. > Thanks, > Revital > > Changelog: > > * modulo-sched.c (sms_schedule): Avoid unfreed memory when SMS fails. > > Index: modulo-sched.c > === > --- modulo-sched.c (revision 170464) > +++ modulo-sched.c (working copy) > @@ -1177,7 +1177,6 @@ sms_schedule (void) > fprintf (dump_file, HOST_WIDEST_INT_PRINT_DEC, trip_count); > fprintf (dump_file, ")\n"); > } > - continue; > } > else > { >
Re: [PATCH, SMS] Free sccs field
On Fri, Apr 15, 2011 at 5:27 PM, Revital Eres wrote: > Hello, > > The attached patch adds missing free operation for storage > allocated while calculating SCCs. > > Bootstrap and regtested on ppc64-redhat-linux. > > OK for mainline? Ok. Thanks, Richard. > Thanks, > Revital > > Changelog: > > * ddg.c (free_ddg_all_sccs): Free sccs field in struct ddg_all_sccs. > > Index: ddg.c > === > --- ddg.c (revision 171573) > +++ ddg.c (working copy) > @@ -1011,6 +1082,8 @@ free_ddg_all_sccs (ddg_all_sccs_ptr all_ > for (i = 0; i < all_sccs->num_sccs; i++) > free_scc (all_sccs->sccs[i]); > > + if (all_sccs->sccs) > + free (all_sccs->sccs); > free (all_sccs); > } >
Re: [PATCH, SMS] New flag to apply SMS when SC equals 1
On Fri, Apr 15, 2011 at 5:28 PM, Revital Eres wrote: > Hello, > > The attached patch introduces a new flag to allow applying SMS when > stage count (SC) also equals 1. > Currently, SMS is applied only when SC is greater than 1 as stage count > of 1 means that there is no interleaving between iterations and the > scheduling passes do the job in this case. > The new flag is introduced for debugging purposes to apply SMS > on more loops. > > Bootstrapped and regtested on ppc64-redhat-linux. > > OK for mainline? If it's for debugging, can you use a --param instead (like modulo-sched-min-sc or similar)? Thanks, Richard. > Thanks, > Revital > > Changelog: > > * common.opt (fmodulo-sched-allow-sc-one): New flag. > * modulo-sched.c (sms_schedule): Allow SMS when stage count > equals one and -fmodulo-sched-allow-sc-one flag is set. >
Re: [PATCH, SMS] Free sccs field
On Fri, Apr 15, 2011 at 06:27:05PM +0300, Revital Eres wrote: > + if (all_sccs->sccs) > +free (all_sccs->sccs); No need to check for non-NULL prior to free'ing. -Nathan
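For reference, the guarantee being relied on here (and throughout the if-before-free series) is that free (NULL) is defined to be a no-op in both C and C++, so the guard buys nothing. A tiny standalone sketch with made-up helper names:

#include <cstdlib>

/* Guarded form: the NULL test is redundant.  */
static void
drop_guarded (char *p)
{
  if (p)
    std::free (p);
}

/* Equivalent: free (NULL) does nothing.  */
static void
drop (char *p)
{
  std::free (p);
}

int
main ()
{
  drop_guarded (NULL);          /* safe */
  drop (NULL);                  /* equally safe */
  drop (static_cast<char *> (std::malloc (16)));
  return 0;
}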
Re: [Patch, Fortran] PR 18918: Build + install libcaf_single.a
On Wed, Apr 13, 2011 at 17:08, Tobias Burnus wrote: > Hello all, hi Ralf and Jorge, > > The attached patch causes libgfortran/ also to build > libgfortran/caf/single.c, which will be installed as > $PREFIX/.../target-triplet/gcc-version/libcaf_single.a. > > Built and tested on x86-64-linux. > OK for the trunk? Ok. -- Janne Blomqvist
Re: [PATCH 3/6] Allow jumps in epilogues
On 04/13/2011 02:38 PM, Bernd Schmidt wrote: > This still requires the i386 output_set_got which I think I can cope > with [...] Patch below, to be applied on top of all the others. Only lightly tested so far beyond standard (fairly useless) regression tests, by comparing generated assembly before/after, for -fpic -march=pentium and core2, for i686-linux and i686-apple-darwin10 (for TARGET_MACHO). I've not found or managed to create a testcase for making MI thunks generate a call to output_set_got. I think it should work, but it's not tested. Bernd * config/i386/i386.c (output_set_got): Don't call dwarf2out_flush_queued_reg_saves. (ix86_reorg): Split set_got patterns. (i386_dwarf_handle_frame_unspec, i386_dwarf_flush_queued_register_saves): New static functions. (TARGET_DWARF_HANDLE_FRAME_UNSPEC, TARGET_DWARF_FLUSH_QUEUED_REGISTER_SAVES): New macros. * config/i386/i386.md (set_got_call, set_got_pop, set_got_add): New patterns. * dwarf2out.c (scan_until_barrier): Use the target's dwarf_flush_queued_register_saves hook. * target.def (dwarf_flush_queued_register_saves): New hook. * doc/tm.texi: Regenerate. Index: gcc/config/i386/i386.c === --- gcc.orig/config/i386/i386.c +++ gcc/config/i386/i386.c @@ -8953,6 +8953,8 @@ output_set_got (rtx dest, rtx label ATTR output_asm_insn ("mov%z0\t{%2, %0|%0, %2}", xops); else { + /* For normal functions, this pattern is split, but we can still +get here for thunks. */ output_asm_insn ("call\t%a2", xops); #ifdef DWARF2_UNWIND_INFO /* The call to next label acts as a push. */ @@ -9010,12 +9012,6 @@ output_set_got (rtx dest, rtx label ATTR get_pc_thunk_name (name, REGNO (dest)); pic_labels_used |= 1 << REGNO (dest); -#ifdef DWARF2_UNWIND_INFO - /* Ensure all queued register saves are flushed before the -call. */ - if (dwarf2out_do_frame ()) - dwarf2out_flush_queued_reg_saves (); -#endif xops[2] = gen_rtx_SYMBOL_REF (Pmode, ggc_strdup (name)); xops[2] = gen_rtx_MEM (QImode, xops[2]); output_asm_insn ("call\t%X2", xops); @@ -30458,6 +30454,56 @@ ix86_reorg (void) with old MDEP_REORGS that are not CFG based. Recompute it now. */ compute_bb_for_insn (); + /* Split any set_got patterns so that we interact correctly with + dwarf2out. 
*/ + if (!TARGET_64BIT && !TARGET_VXWORKS_RTP && !TARGET_DEEP_BRANCH_PREDICTION + && flag_pic) +{ + rtx insn, next; + for (insn = get_insns (); insn; insn = next) + { + rtx pat, label, dest, cst, gotsym, new_insn; + int icode; + + next = NEXT_INSN (insn); + if (!NONDEBUG_INSN_P (insn)) + continue; + + icode = recog_memoized (insn); + if (icode != CODE_FOR_set_got && icode != CODE_FOR_set_got_labelled) + continue; + + extract_insn (insn); + if (icode == CODE_FOR_set_got) + { + label = gen_label_rtx (); + cst = const0_rtx; + } + else + { + label = recog_data.operand[1]; + cst = const1_rtx; + } + + dest = recog_data.operand[0]; + pat = gen_set_got_call (label, cst); + new_insn = emit_insn_before (pat, insn); + RTX_FRAME_RELATED_P (new_insn) = 1; + RTX_FRAME_RELATED_P (XVECEXP (PATTERN (new_insn), 0, 1)) = 1; + gotsym = gen_rtx_SYMBOL_REF (Pmode, GOT_SYMBOL_NAME); + pat = gen_set_got_pop (dest, label); + new_insn = emit_insn_before (pat, insn); + RTX_FRAME_RELATED_P (new_insn) = 1; + RTX_FRAME_RELATED_P (XVECEXP (PATTERN (new_insn), 0, 1)) = 1; + if (!TARGET_MACHO) + { + pat = gen_set_got_add (dest, gotsym, label); + new_insn = emit_insn_before (pat, insn); + } + delete_insn (insn); + } +} + if (optimize && optimize_function_for_speed_p (cfun)) { if (TARGET_PAD_SHORT_FUNCTION) @@ -30475,6 +30521,30 @@ ix86_reorg (void) move_or_delete_vzeroupper (); } +/* Handle the TARGET_DWARF_HANDLE_FRAME_UNSPEC hook. + This is called from dwarf2out.c to emit call frame instructions + for frame-related insns containing UNSPECs and UNSPEC_VOLATILEs. */ +static void +i386_dwarf_handle_frame_unspec (rtx pattern ATTRIBUTE_UNUSED, + int index ATTRIBUTE_UNUSED) +{ + gcc_assert (index == UNSPEC_SET_GOT); +} + +/* Handle the TARGET_DWARF_FLUSH_QUEUED_REGISTER_SAVES hook. + This is called from dwarf2out.c to decide whether all queued + register saves should be emitted before INSN. */ +static bool +i386_dwarf_flush_queued_register_saves (rtx insn) +{ + if (!TARGET_VXWORKS_RTP || !flag_pic) +{ + int icode = recog_memoized (insn); + return (
[patch, libgfortran] PR48589 Invalid G0/G0.d editing for NaN/infinity
Hi, I plan to commit the following simple and obvious patch. Regression tested on x86-64-linux-gnu. I will apply the test case in the PR to the testsuite. Regards, Jerry 2011-04-15 Jerry DeLisle PR libgfortran/48589 * io/write_float.def (write_infnan): Set width properly for G0. Index: write_float.def === --- write_float.def (revision 172467) +++ write_float.def (working copy) @@ -654,11 +654,11 @@ write_infnan (st_parameter_dt *dtp, const fnode *f mark = (sign == S_PLUS || sign == S_MINUS) ? 8 : 7; nb = f->u.real.w; - + /* If the field width is zero, the processor must select a width not zero. 4 is chosen to allow output of '-Inf' or '+Inf' */ - if (nb == 0) + if ((nb == 0) || dtp->u.p.g0_no_blanks) { if (isnan_flag) nb = 3;
Re: patch pings
On 04/15/2011 04:18 PM, Jeff Law wrote: > http://gcc.gnu.org/ml/gcc-patches/2011-03/msg02247.html I don't know. I sympathize with the goal, but I'm not too happy about the structure of this patch. Doesn't this do the scan once for every reload in an insn? It seems to me like the loop (or rather, a function call to it) should at least be placed at the top of choose_reload_regs rather than in allocate_reload_reg. In my tests (i686-linux, compiling my standard set of testcases) the patch appears to have, in general, very little effect on code quality. It might on average be slightly better, but I've also seen several cases where we do worse. If we go to so much effort to do scanning of subsequent insns it ought to be possible to do better. AFAICT the patch ignores whether the pseudo that's being reloaded will be reloaded again in the current ebb - if not, it should get a "bad" spill register, where "bad" in this case only includes hard regs that don't currently hold a spilled pseudo. Likewise, once we've reloaded the last occurrence of a pseudo in an ebb, we should forget the reg_reloaded_contents to make sure we consider the spill reg good again. However, I'm not sure whether changes such as these alone will reduce the number of cases where the patch regresses. Bernd
Re: FDO usability patch -- cfg and lineno checksum
Resent in plain text mode .. David On Fri, Apr 15, 2011 at 9:28 AM, Xinliang David Li wrote: > > Honza, do you have a chance to take a look at it? > Thanks, > > David > > On Wed, Apr 13, 2011 at 3:25 PM, Xinliang David Li wrote: >> >> Hi, in current FDO implementation, the source file version used in >> profile-generate needs to strictly match the version used in >> profile-use -- simple formating changes will invalidate the profile >> data (use of it will lead to compiler error or ICE). There are two >> main problems that lead to the weakness: >> >> 1) the function checksum is computed based on line number and number >> of basic blocks -- this means any line number change will invalidate >> the profile data. In fact, line number should play very minimal role >> in profile matching decision. Another problem is that number of basic >> blocks is also not a good indicator whether CFG matches or not. >> 2) cgraph's pid is used as the key to find the matching profile data >> for a function. If there is any new function added, or if the order of >> the functions changes, the profile data is invalidated. >> >> The attached is a patch from google that improves this. >> 1) function checksum is split into two parts: lineno checksum and cfg >> checksum >> 2) function assembler name is used in profile hashing. >> >> Bootstrapped and regression tested on x86_64/linux. >> >> Ok for trunk? >> >> Thanks, >> >> David >> >> 2011-04-13 Xinliang David Li >> >> * tree.c (crc32_byte): New function. >> * tree.h (crc32_byte): New function. >> * gcov.c (function_info): Add new fields. >> (read_graph_file): Handle new fields. >> (read_count_file): Handle name. >> * gcov-io.c (gcov_string_length): New function. >> (gcov_write_words): Bug fix. >> (gcov_read_words): Bug fix. >> * gcov-io.h: New interfaces. >> * profile.c (get_exec_counts): New parameter. >> (compute_branch_probablilities): New parameter. >> (compute_value_histogram): New parameter. >> (branch_prob): Pass cfg_checksum. >> * coverage.c (get_const_string_type): New function. >> (hash_counts_entry_hash): Use string hash. >> (hash_counts_entry_eq): Use string compare. >> (htab_counts_entry_del): Delete name. >> (read_count_file): Add handling of cfg checksum. >> (get_coverage_counts): New parameter. >> (xstrdup_mask_random): New function. >> (coverage_compute_lineno_checksum): New function. >> (coverage_compute_cfg_checksum): New function. >> (coverage_begin_output): New parameter. >> (coverage_end_function): New parameter. >> (build_fn_info_type): Build new fields. >> (build_fn_info_value): Build new field values. >> * coverage.h: Interface changes. >> * libgcov.c (gcov_exit): Dump new fields. >> * gcov_dump.c (tag_function): Print new fields. >
Re: [PATCH] Improve combining of conditionals
On Apr 15, 2011, at 3:44 AM, Eric Botcazou wrote: >> And make IN_COND a bool instead of an int? > > I had the same initial reaction but then came to the same conclusion as Maxim. If it can be bool, it should be bool.
Re: [PATCH, SMS] Free sccs field
Hello, On 15 April 2011 18:53, Nathan Froyd wrote: > On Fri, Apr 15, 2011 at 06:27:05PM +0300, Revital Eres wrote: >> + if (all_sccs->sccs) >> + free (all_sccs->sccs); > > No need to check for non-NULL prior to free'ing. OK, I'll commit the patch without the check then. (after re-testing) Thanks, Revital > > -Nathan > >
Re: [build] Allow building libobjc_gc on Tru64 UNIX, Darwin
Nicola, >> Ok for mainline if both pass? > > Yes. > > [and by the way, I think you're fixing PR libobjc/32037 in the process. :-)] indeed, thanks for checking. Testing revealed that I'd been lazy with quoting. I'm including the patch I've installed, which encloses OBJC_BOEHM_GC in configure.ac in single quotes. >> Btw., it would be considerably easier if --enable-libobjc-gc could be >> enabled automatically if boehm-gc is configured. > > Yes. I've filed libobjc/48626 --enable-objc-gc should be automatic >> Besides, it seems that libobjc_gc isn't tested anywhere. > > Yes. and libobjc/48627 libobjc_gc should be tested just in case. Rainer 2011-04-13 Rainer Orth PR libobjc/32037 * Makefile.in (OBJC_GCFLAGS): Move ... * configure.ac (enable_objc_gc): ... here. Add $(libsuffix) to OBJC_BOEHM_GC. * configure: Regenerate. diff --git a/libobjc/Makefile.in b/libobjc/Makefile.in --- a/libobjc/Makefile.in +++ b/libobjc/Makefile.in @@ -93,7 +93,7 @@ LIBTOOL_INSTALL = $(LIBTOOL) --mode=inst LIBTOOL_CLEAN = $(LIBTOOL) --mode=clean #LIBTOOL_UNINSTALL = $(LIBTOOL) --mode=uninstall -OBJC_GCFLAGS=-DOBJC_WITH_GC=1 +OBJC_GCFLAGS=@OBJC_GCFLAGS@ OBJC_BOEHM_GC=@OBJC_BOEHM_GC@ OBJC_BOEHM_GC_INCLUDES=@OBJC_BOEHM_GC_INCLUDES@ OBJC_BOEHM_GC_LIBS=../boehm-gc/libgcjgc_convenience.la $(thread_libs_and_flags) diff --git a/libobjc/configure.ac b/libobjc/configure.ac --- a/libobjc/configure.ac +++ b/libobjc/configure.ac @@ -1,6 +1,6 @@ # Process this file with autoconf to produce a configure script. # Copyright 1994, 1995, 1996, 1997, 1998, 1999, 2001, 2002, 2003, 2004 -# 2005, 2006, 2009 Free Software Foundation, Inc. +# 2005, 2006, 2009, 2011 Free Software Foundation, Inc. # Originally contributed by Dave Love (d.l...@dl.ac.uk). # #This file is part of GCC. @@ -63,15 +63,25 @@ AC_ARG_ENABLE(objc-gc, the GNU Objective-C runtime.], [case $enable_objc_gc in no) +OBJC_GCFLAGS='' OBJC_BOEHM_GC='' OBJC_BOEHM_GC_INCLUDES='' ;; *) -OBJC_BOEHM_GC=libobjc_gc.la +OBJC_GCFLAGS='-DOBJC_WITH_GC=1' +OBJC_BOEHM_GC='libobjc_gc$(libsuffix).la' OBJC_BOEHM_GC_INCLUDES='-I$(top_srcdir)/../boehm-gc/include -I../boehm-gc/include' +case "${host}" in + alpha*-dec-osf*) +# boehm-gc headers include , which needs to be compiled + # with -pthread on Tru64 UNIX. +OBJC_GCFLAGS="${OBJC_GCFLAGS} -pthread" + ;; +esac ;; esac], -[OBJC_BOEHM_GC=''; OBJC_BOEHM_GC_INCLUDES='']) +[OBJC_GCFLAGS=''; OBJC_BOEHM_GC=''; OBJC_BOEHM_GC_INCLUDES='']) +AC_SUBST(OBJC_GCFLAGS) AC_SUBST(OBJC_BOEHM_GC) AC_SUBST(OBJC_BOEHM_GC_INCLUDES) -- - Rainer Orth, Center for Biotechnology, Bielefeld University
[v3] Handle NOTY in extract_symvers.pl
While testing the close-to-final version of my COMDAT-group-with-Sun as patch [build, c++, lto] Support COMDAT group with Sun as (PR target/40483) http://gcc.gnu.org/ml/gcc-patches/2010-05/msg01365.html http://gcc.gnu.org/ml/gcc-patches/2010-06/msg00600.html which only awaits the release of a linker with the required fixes, I noticed that extract_symvers.pl doesn't handly NOTY/NOTYPE entries, but chokes instead, whereas the readelf based version handles them just fine. Although the final patch won't create them anymore, I'm installing the following patch that makes the script more robust. It was tested while putting the finishing touches on the as-comdat patch. Installed on mainline. Rainer 2011-04-04 Rainer Orth * scripts/extract_symvers.pl: Handle NOTY. diff --git a/libstdc++-v3/scripts/extract_symvers.pl b/libstdc++-v3/scripts/extract_symvers.pl --- a/libstdc++-v3/scripts/extract_symvers.pl +++ b/libstdc++-v3/scripts/extract_symvers.pl @@ -1,6 +1,6 @@ #!/usr/bin/perl -w -# Copyright (C) 2010 Free Software Foundation, Inc. +# Copyright (C) 2010, 2011 Free Software Foundation, Inc. # # This file is part of the GNU ISO C++ Library. This library is free # software; you can redistribute it and/or modify it under the @@ -108,6 +108,7 @@ while () { die "unhandled symbol:\n$_" if ($bind !~ /^(GLOB|WEAK)/ or $oth !~ /[DP]/); # Adapt to readelf type naming convention. +$type = "NOTYPE" if ($type eq "NOTY"); $type = "OBJECT" if ($type eq "OBJT"); # Use correct symbol type. @@ -116,7 +117,7 @@ while () { close ELFDUMP or die "elfdump error"; foreach $symbol (keys %type) { -if ($type{$symbol} eq "FUNC") { +if ($type{$symbol} eq "FUNC" || $type{$symbol} eq "NOTYPE") { push @lines, "$type{$symbol}:$symbol\@\@$version{$symbol}\n"; } elsif ($type{$symbol} eq "OBJECT" and $size{$symbol} == 0) { push @lines, "$type{$symbol}:$size{$symbol}:$version{$symbol}\n"; -- - Rainer Orth, Center for Biotechnology, Bielefeld University
Re: [PATCH] doubled words
On Apr 15, 2011, at 1:18 AM, Jim Meyering wrote: > While most of these are in comments, the corrections > in gcc/tree-cfg.c and gcc/config/sh/constraints.md are in strings. > The former at least is marked for translation, and hence appears > in every .po file. I think these are obvious. >
[libjava, testsuite] Link jni tests with -liconv on Tru64 UNIX
While checking Tru64 UNIX libjava testsuite results, I noticed that the libjava.jni/invocation/PR16923.c execution test was failing like this: 277710:./PR16923: /sbin/loader: Error: libgcj.so.12: symbol "iconv" unresolved 277710:./PR16923: /sbin/loader: Fatal Error: Load of "./PR16923" failed: Unresolved symbol name FAIL: PR16923 run UNTESTED: PR16923 output This happens because the test isn't linked with gcj (which handles this in libgcj.spec), but with gcc, which does not. Perhaps this could (and should) be done, or rather libgcj.so always linked with -liconv on targets that require that, but I chose the easy way out for now, following the Darwin lead. Tested with the appropriate runtest invocation (which unfortunately means running all of jni.exp, not just the single testcase, installed on mainline, 4.5 and 4.6 branches. Rainer 2011-04-14 Rainer Orth * testsuite/libjava.jni/jni.exp (gcj_jni_get_cxxflags_invocation): Add $libiconv to cxxflags for alpha*-dec-osf*. diff --git a/libjava/testsuite/libjava.jni/jni.exp b/libjava/testsuite/libjava.jni/jni.exp --- a/libjava/testsuite/libjava.jni/jni.exp +++ b/libjava/testsuite/libjava.jni/jni.exp @@ -280,6 +280,11 @@ proc gcj_jni_get_cxxflags_invocation {} lappend cxxflags "-shared-libgcc" } + # Tru64 UNIX needs -liconv linked explicitly since gcc does the linking. + if { [istarget "alpha*-dec-osf*"] } { +lappend cxxflags $libiconv + } + return $cxxflags } -- - Rainer Orth, Center for Biotechnology, Bielefeld University
Re: [PATCH] Fix PR46399 - missing mode promotion for libcall args - updated
On 03/03/2011 05:32 PM, Andreas Krebbel wrote: > [PATCH] Fix PR46399 - missing mode promotion for libcall args > http://gcc.gnu.org/ml/gcc-patches/2011-02/msg01595.html > > after removing the FUNCTION_VALUE target macro with: > > [Committed] S/390: Remove deprecated target macro FUNCTION_VALUE > http://gcc.gnu.org/ml/gcc-patches/2011-03/msg00127.html > > And I've moved the documentation of the new target hook to target.def > as requested by Nathan. > > Bootstrapped on x86_64, s390 and s390x. No regressions. > > Two testcases fixed on s390x: > > < FAIL: gcc.dg/dfp/pr41049.c execution test > < FAIL: decimal/comparison.cc execution test The problem is quite real, it's just bitten us with some ARM changes. The fact that we don't have types for libcall function args is very unfortunate. Maybe that's something we need to change? It would be a much better fix, and it might not even be _that_ much work to add an optional type field to each optab entry, which could then be passed to emit_library_call - if it's nonnull, we'd use the normal promote_function_mode hook otherwise ignore the problem as before. This would allow us to solve the s390 problem reasonably quickly and lets us add more libcall types later. As to the existing patch, isn't fntype NULL throughout emit_library_call_value_1? It might make sense to lose that argument from the hook. I also doubt that the unsignedp bit is very useful without an argument type, but I'm at a loss to think of something sensible to do about it. Bernd
ObjC: rewritten checks for duplicate instance variables (for improved speed and improved error messages)
This patch rewrites the check for duplicate instance variables done by the Objective-C (and Objective-C++) compiler. The new code has many advantages: * it avoids copying all the instance variables into a temporary tree chain in the way that the all code was doing (according to some notes from benchmarks and profiling I did a few months ago, this should save about 5k allocations per compilation when compiling a typical ObjC GNUstep file, giving an approx 0.25% order of magnitude speedup with -fsyntax-only); * it provides better error messages by providing the informative message "previous declaration of ..." when a duplicate declaration is found; * it avoids producing duplicate errors if a class with duplicate instance variables is subclassed. The existing code always built a copy of the full list of instance variables for the class (including superclass ones) and then would look for duplicates. This means if you have a class with a duplicate, and subclass it 10 times, you'll get the same error message 11 times! The new code avoids this mistake, which also reduces the number of checks required (for the standard case without a hashtable, I'll explain in a minute). * it removes the special, ad-hoc field duplicate check implementation for ObjC++ and has it use the same code as the ObjC codebase. The new code does the checks for duplicates directly, by just iterating and comparing, unless the number of checks to do is very large, in which case it falls back to a hashtable. This is similar to what the C code does for structs, but here the check is more specific as we are only checking the instance variables in the current class against the other instance variables in the class and the superclass, so it's all more specific and optimized for the situation. Updated existing testcases (to check for the new "previous declaration of ..." messages), and added more testcases. Ok to commit ? Thanks Index: c-family/c-objc.h === --- c-family/c-objc.h (revision 172444) +++ c-family/c-objc.h (working copy) @@ -62,7 +62,7 @@ extern tree objc_build_string_object (tree); extern tree objc_get_protocol_qualified_type (tree, tree); extern tree objc_get_class_reference (tree); extern tree objc_get_class_ivars (tree); -extern tree objc_get_interface_ivars (tree); +extern bool objc_detect_field_duplicates (bool); extern void objc_start_class_interface (tree, tree, tree, tree); extern void objc_start_category_interface (tree, tree, tree, tree); extern void objc_start_protocol (tree, tree, tree); Index: c-family/ChangeLog === --- c-family/ChangeLog (revision 172444) +++ c-family/ChangeLog (working copy) @@ -1,3 +1,9 @@ +2011-04-15 Nicola Pero + + * c-objc.h (objc_get_interface_ivars): Removed. + (objc_detect_field_duplicates): New. + * stub-objc.c: Likewise. + 2011-04-14 Nicola Pero * stub-objc.c (objc_declare_protocols): Renamed to Index: c-family/stub-objc.c === --- c-family/stub-objc.c(revision 172444) +++ c-family/stub-objc.c(working copy) @@ -275,10 +275,10 @@ objc_get_class_reference (tree ARG_UNUSED (name)) return 0; } -tree -objc_get_interface_ivars (tree ARG_UNUSED (fieldlist)) +bool +objc_detect_field_duplicates (bool ARG_UNUSED (check_superclasses_only)) { - return 0; + return false; } tree Index: objc/ChangeLog === --- objc/ChangeLog (revision 172444) +++ objc/ChangeLog (working copy) @@ -1,3 +1,10 @@ +2011-04-15 Nicola Pero + + * objc-act.c (objc_get_interface_ivars): Removed. + (objc_detect_field_duplicates): New. + (hash_instance_variable): New. + (eq_instance_variable): New. 
+ 2011-04-14 Nicola Pero * objc-act.c (objc_declare_protocols): Renamed to Index: objc/objc-act.c === --- objc/objc-act.c (revision 172444) +++ objc/objc-act.c (working copy) @@ -3813,6 +3813,8 @@ lookup_interface (tree ident) } } + + /* Implement @defs () within struct bodies. */ tree @@ -3829,19 +3831,242 @@ objc_get_class_ivars (tree class_name) return error_mark_node; } + +/* Functions used by the hashtable for field duplicates in + objc_detect_field_duplicates(). Ideally, we'd use a standard + key-value dictionary hashtable , and store as keys the field names, + and as values the actual declarations (used to print nice error + messages with the locations). But, the hashtable we are using only + allows us to store keys in the hashtable, without values (it looks + more like a set). So, we store the DECLs, but define equality as + DECLs having the same name, and hash as the hash of the n
[PATCH, PR 48601] cgraph_get_create_node in tree-emutls.c
Hi, lower_emutls_function_body in tree-emutls.c should use cgraph_get_create_node to create call graph nodes if need be because it is introducing TLS builtin decls into the IL. I have bootstrapped and tested this on x86_64-linux (where the bug does not happen) and in bugzilla it has been confirmed it fixes the issue. It is rather straightforward and was pre-approved on IRC by Honza so I am about to commit it now. Testcases are already in the testsuite. Thanks, Martin 2011-04-15 Martin Jambor PR middle-end/48601 * tree-emutls.c (lower_emutls_function_body): Call cgraph_get_create_node instead of cgraph_get_node. Do not assert the result is non-NULL. Index: src/gcc/tree-emutls.c === --- src.orig/gcc/tree-emutls.c +++ src/gcc/tree-emutls.c @@ -619,8 +619,9 @@ lower_emutls_function_body (struct cgrap d.cfun_node = node; d.builtin_decl = built_in_decls[BUILT_IN_EMUTLS_GET_ADDRESS]; - d.builtin_node = cgraph_get_node (d.builtin_decl); - gcc_checking_assert (d.builtin_node); + /* This is where we introduce the declaration to the IL and so we have to + create a node for it. */ + d.builtin_node = cgraph_get_create_node (d.builtin_decl); FOR_EACH_BB (d.bb) {
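For context, the lowering in question fires for any access to a thread-local variable on targets without native TLS support, where GCC falls back to emulated TLS; such accesses are rewritten into calls into the emutls runtime (__emutls_get_address in libgcc), which is why a call graph node for the builtin must exist. A minimal C++ sketch with hypothetical names:

/* Under emulated TLS, reads and writes of counter are lowered by
   lower_emutls_function_body into calls through the emutls machinery.  */
static __thread int counter;    /* __thread is the GNU TLS extension */

int
bump (void)
{
  return ++counter;
}

int
main ()
{
  return bump () == 1 ? 0 : 1;
}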
Re: [PATCH] Improve combining of conditionals
> If it can be bool, it should be bool. Rather basic principle IMO. Consistency is of much greater value. -- Eric Botcazou
Re: [Patch,AVR]: FIX ICE in optabs due to bad rotate expander.
Denis Chertykov schrieb: > 2011/4/14 Georg-Johann Lay : >> Denis Chertykov schrieb: >>> 2011/4/14 Georg-Johann Lay : The "rotl3" expanders (mode \in {HI,SI,DI}) violates synopsis by using 4 operands instead of 3. This runs in ICE in top of optabs.c:maybe_gen_insn. The right way to do this is to use match_dup, not match_operand. So the fix is obvious. Regenerated avr-gcc and run against avr testsuite: gcc.target/avr/torture/pr41885.c generates these patterns Johann 2011-04-14 Georg-Johann Lay * config/avr/avr.md ("rotlhi3", "rotlsi3", "rotldi3"): Fix expanders operand 3 from match_operand to match_dup. >>> May be better to use (clobber (match_scratch ...)) here and in >>> `*rotw' and `*rotb'. >> These are splitters that might split before reload, producing strange, >> non-matching insns like >> (set (scratch:QI) ... >> or crashing already in avr_rotate_bytes becase the split condition reads >> "&& (reload_completed || mode == DImode)" > > Generally I'm agree with you, change match_operand to match_dup is easy. I already submitted the easy patch to avoid the ICE. > But IMHO the right way is: > - the splitters and avr_rotate_bytes are wrong and must be fixed too. > - operands[3] is a scratch register and right way is to use match_scratch. > > I can approve the patch. > > Denis. Ok, here is the right-way patch. The expander generates a SCRATCH instead of a pseudo, and in case of a pre-reload split the SCRATCH is replaced with a pseudo because it is not known if or if not the scratch will actually be needed. avr_rotate_bytes now skips operations on non-REG 'scratch'. Furthermore, I added an assertion that ensures that 'scratch' is actually a REG when it is needed. Besides that, I fixed indentation. I know it is not optimal to fix indentation alongside with functionality... did it anyway :-) Finally, I exposed alternative #3 of the insns to the register allocator, because it is not possible to distinguish between overlapping or non-overlapping regs, and #3 does not need a scratch. Ran C-testsuite with no regressions. Johann 2011-04-15 Georg-Johann Lay * config/avr/avr.md ("rotl3"): Use SCRATCH instead of REG in operand 3. Fix indentation. Unquote C snippet. ("*rotw","*rotbw"): Ditto. Replace scratch with pseudo for pre-reload splits. Use const_int_operand for operand 2. Expose alternative 3 to register allocator. * config/avr/avr.c (avr_rotate_bytes): Handle SCRATCH in operands[3]. Use GNU indentation style. Index: config/avr/avr.md === --- config/avr/avr.md (Revision 172493) +++ config/avr/avr.md (Arbeitskopie) @@ -1519,22 +1519,22 @@ (define_mode_attr rotsmode [(DI "QI") (S (define_expand "rotl3" [(parallel [(set (match_operand:HIDI 0 "register_operand" "") - (rotate:HIDI (match_operand:HIDI 1 "register_operand" "") -(match_operand:VOID 2 "const_int_operand" ""))) - (clobber (match_dup 3))])] + (rotate:HIDI (match_operand:HIDI 1 "register_operand" "") +(match_operand:VOID 2 "const_int_operand" ""))) + (clobber (match_dup 3))])] "" - " -{ - if (CONST_INT_P (operands[2]) && 0 == (INTVAL (operands[2]) % 8)) { - if (AVR_HAVE_MOVW && 0 == INTVAL (operands[2]) % 16) -operands[3] = gen_reg_rtx (mode); - else -operands[3] = gen_reg_rtx (QImode); - } - else -FAIL; -}") +if (CONST_INT_P (operands[2]) +&& 0 == INTVAL (operands[2]) % 8) + { +if (AVR_HAVE_MOVW && 0 == INTVAL (operands[2]) % 16) + operands[3] = gen_rtx_SCRATCH (mode); +else + operands[3] = gen_rtx_SCRATCH (QImode); + } +else + FAIL; + }) ;; Overlapping non-HImode registers often (but not always) need a scratch. 
@@ -1545,35 +1545,49 @@ (define_expand "rotl3" ; Split word aligned rotates using scratch that is mode dependent. (define_insn_and_split "*rotw" - [(set (match_operand:HIDI 0 "register_operand" "=r,r,#&r") - (rotate:HIDI (match_operand:HIDI 1 "register_operand" "0,r,r") - (match_operand 2 "immediate_operand" "n,n,n"))) - (clobber (match_operand: 3 "register_operand" "=" ))] - "(CONST_INT_P (operands[2]) && - (0 == (INTVAL (operands[2]) % 16) && AVR_HAVE_MOVW))" + [(set (match_operand:HIDI 0 "register_operand" "=r,r,&r") +(rotate:HIDI (match_operand:HIDI 1 "register_operand" "0,r,r") + (match_operand 2 "const_int_operand" "n,n,n"))) + (clobber (match_scratch: 3 "="))] + "AVR_HAVE_MOVW + && 0 == INTVAL (operands[2]) % 16" "#" "&& (reload_completed || mode == DImode)" [(const_int 0)] - "avr_rotate_bytes (operands); - DONE;" -) + { +/* Split happens in .split1: Cook up a pseudo */ +if (SCRATCH == GET_CODE (operands[3]) +&& !re
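For reference, the kind of source-level rotate that reaches these expanders looks roughly like the following (this is only a sketch, not the pr41885.c testcase itself); on AVR, a 16-bit rotation goes through the HImode rotate pattern and, depending on register allocation, may need the scratch register these changes turn into a match_scratch:

#include <stdint.h>

static uint16_t
rot8 (uint16_t x)
{
  /* Typically recognized by GCC as a 16-bit rotate by 8.  */
  return static_cast<uint16_t> ((x << 8) | (x >> 8));
}

int
main ()
{
  return rot8 (0x1234) == 0x3412 ? 0 : 1;
}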
[wwwdocs] Shuffle entries in the footer and remove margin at the bottom
This concludes this mini project. :-) Committed and the pages on gcc.gnu.org regenerated; www.gnu.org will follow over night. Gerald Index: style.mhtml === RCS file: /cvs/gcc/wwwdocs/htdocs/style.mhtml,v retrieving revision 1.115 diff -u -r1.115 style.mhtml --- style.mhtml 12 Apr 2011 21:11:23 - 1.115 +++ style.mhtml 15 Apr 2011 17:22:30 - @@ -232,14 +232,10 @@ -These pages are -http://gcc.gnu.org/about.html";>maintained by the GCC team. -Last modified http://validator.w3.org/check/referer";>. - -For questions related to the use of GCC, please consult these web -pages and the http://gcc.gnu.org/onlinedocs/";>GCC manuals. If -that fails, the mailto:gcc-h...@gcc.gnu.org";>gcc-h...@gcc.gnu.org +For questions related to the use of GCC, +please consult these web pages and the +http://gcc.gnu.org/onlinedocs/";>GCC manuals. If that fails, +the mailto:gcc-h...@gcc.gnu.org";>gcc-h...@gcc.gnu.org mailing list might help. Comments on these web pages and the development of GCC are welcome on our developer list at mailto:g...@gcc.gnu.org";>g...@gcc.gnu.org. @@ -252,6 +248,11 @@ Verbatim copying and distribution of this entire article is permitted in any medium, provided this notice is preserved. +These pages are +http://gcc.gnu.org/about.html";>maintained by the GCC team. +Last modified http://validator.w3.org/check/referer";>. +
Re: [PATCH v3] Re: avoid useless if-before-free tests
> I added Jim to the gcc group. Thanks, Tom.
Re: [PATCH] Improve combining of conditionals
On Apr 15, 2011, at 10:22 AM, Eric Botcazou wrote: >> If it can be bool, it should be bool. > > Rather basic principle IMO. Consistency is of much greater value. Yes, and consistency is had by upgrading existing int uses that should be bool to bool as they are spotted :-) I prefer doing this, even if done one at a time. If one discourages it from being done, they run the risk of it being done later. I'd rather have it sooner in this case.
Re: ObjC: rewritten checks for duplicate instance variables (for improved speed and improved error messages)
On Apr 15, 2011, at 10:19 AM, Nicola Pero wrote: > This patch rewrites the check for duplicate instance variables > done by the Objective-C (and Objective-C++) compiler. > Ok to commit ? Ok.
Use ASM_COMMENT_START
The following is a trivial patch to remove hard coded '#' usage. Ok for trunk? Thanks, David 2011-04-15 Xinliang David Li * final.c (dump_basic_block_info): Use ASM_COMMENT_START. Index: final.c === --- final.c (revision 172212) +++ final.c (working copy) @@ -1691,14 +1691,14 @@ dump_basic_block_info (FILE *file, rtx i edge e; edge_iterator ei; - fprintf (file, "# BLOCK %d", bb->index); + fprintf (file, "%s BLOCK %d", ASM_COMMENT_START, bb->index); if (bb->frequency) fprintf (file, " freq:%d", bb->frequency); if (bb->count) fprintf (file, " count:" HOST_WIDEST_INT_PRINT_DEC, bb->count); fprintf (file, " seq:%d", (*bb_seqn)++); - fprintf (file, "\n# PRED:"); + fprintf (file, "\n%s PRED:", ASM_COMMENT_START); FOR_EACH_EDGE (e, ei, bb->preds) { dump_edge_info (file, e, 0); @@ -1711,7 +1711,7 @@ dump_basic_block_info (FILE *file, rtx i edge e; edge_iterator ei; - fprintf (asm_out_file, "# SUCC:"); + fprintf (asm_out_file, "%s SUCC:", ASM_COMMENT_START); FOR_EACH_EDGE (e, ei, bb->succs) { dump_edge_info (asm_out_file, e, 1);
Re: Use ASM_COMMENT_START
On Fri, Apr 15, 2011 at 14:11, Xinliang David Li wrote: > 2011-04-15 Xinliang David Li > > * final.c (dump_basic_block_info): Use ASM_COMMENT_START. OK. Diego.
[pph] Clean positive tests. (issue4423044)
This patch cleans up positive tests. First, stop on first failure so as to avoid littering the logfile. Second, change the extensions of the generated assembly files to .s-pph and .s+pph to be clear on the provenence of each file. Index: gcc/testsuite/ChangeLog.pph 2011-04-15 Lawrence Crowl * lib/dg-pph.exp (dg-pph-pos): Stop on first failure. Change names of assembly files to be clear on which is with and without PPH. Index: gcc/testsuite/lib/dg-pph.exp === --- gcc/testsuite/lib/dg-pph.exp(revision 172457) +++ gcc/testsuite/lib/dg-pph.exp(working copy) @@ -74,13 +74,13 @@ proc dg-pph-pos { subdir test options ma if { $have_errs } { verbose -log "regular compilation failed" fail "$nshort $options, regular compilation failed" -return + return } if { ! [ file_on_host exists "$bname.s" ] } { - verbose -log "assembly file '$bname.s' missing" - fail "$nshort $options, assembly comparison" -return + verbose -log "regular assembly file '$bname.s' missing" + fail "$nshort $options, regular assembly missing" + return } # Rename the .s file into .s-pph to compare it after the second build. @@ -89,19 +89,33 @@ proc dg-pph-pos { subdir test options ma # Compile a second time using the pph files. dg-test -keep-output $test "$options $mapflag -I." "" -remote_upload host "$bname.s" + +if { $have_errs } { + verbose -log "PPH compilation failed" + fail "$nshort $options, PPH compilation failed" + return +} + +if { ! [ file_on_host exists "$bname.s" ] } { + verbose -log "PPH assembly file '$bname.s' missing" + fail "$nshort $options, PPH assembly missing" + return +} + +# Rename the .s file into .s+pph to compare it. +remote_upload host "$bname.s" "$bname.s+pph" +remote_download host "$bname.s+pph" # Compare the two assembly files. They should be identical. -set tmp [ diff "$bname.s" "$bname.s-pph" ] +set tmp [ diff "$bname.s-pph" "$bname.s+pph" ] if { $tmp == 0 } { -verbose -log "assembly file '$bname.s', '$bname.s-pph' comparison error" -fail "$nshort $options assembly comparison" + verbose -log "assembly file '$bname.s-pph', '$bname.s+pph' comparison error" + fail "$nshort $options assembly comparison" } elseif { $tmp == 1 } { -pass "$nshort $options assembly comparison" + pass "$nshort $options assembly comparison" + file_on_host delete "$bname.s-pph" + file_on_host delete "$bname.s+pph" } else { -fail "$nshort $options assembly comparison" + fail "$nshort $options assembly comparison" } -file_on_host delete "$bname.s" -file_on_host delete "$bname.s-pph" - } -- This patch is available for review at http://codereview.appspot.com/4423044
Re: [PATCH] sel-sched: Fix erroneous re-use of pointer to last insn in a BB (PR 48235)
Vladimir and Dmitry, The new test you added, gcc.dg/pr48235.c, is failing on IA64 because it gets this excess message: cc1: note: -freorder-blocks-and-partition does not work on this architecture Could you modify the test to either filter out this message, not use the -freorder-blocks-and-partition option on IA64, or not run the test at all on IA64? I notice that gcc.dg/tree-prof/pr45354.c looks like this test but also has: /* { dg-require-effective-target freorder } */ That test does not fail, or run, on IA64 due to this dg option. Steve Ellcey s...@cup.hp.com
ObjC: get rid of another unnecessary, temporary copy of instance variables
This patch for the Objective-C compiler gets rid of another case where we'd create a temporary tree chain with a copy of all the instance variables of a class just to do a simple check on them. The check is the one to check the scope of the instance variable that is being accessed; without this patch, the check requires copying all the instance variables for the class on a temporary tree chain. With this patch, the check is done (as you'd expect) on the instance variables themselves, without copying them. This is obviously faster; on the other hand, the typical GNUstep-based ObjC file has classes with few instance variables (maybe 10 or 20) and only accesses instance variables up to a few hundred times per file, so the speedup is only of the order of 0.1% to 0.2% even with -fsyntax-only. Of course, in unusual cases (ie, classes with many instance variables, and a big file accessing instance variables lots of times) the speedup may be bigger. Ok to commit ? Thanks Index: ChangeLog === --- ChangeLog (revision 172511) +++ ChangeLog (working copy) @@ -1,5 +1,10 @@ 2011-04-15 Nicola Pero + * objc-act.c (ivar_of_class): New. + (objc_is_public): Use ivar_of_class. + +2011-04-15 Nicola Pero + * objc-act.c (objc_get_interface_ivars): Removed. (objc_detect_field_duplicates): New. (hash_instance_variable): New. Index: objc-act.c === --- objc-act.c (revision 172511) +++ objc-act.c (working copy) @@ -6367,6 +6367,35 @@ is_private (tree decl) DECL_NAME (decl))); } +/* Searches all the instance variables of 'klass' and of its + superclasses for an instance variable whose name (identifier) is + 'ivar_name_ident'. Return the declaration (DECL) of the instance + variable, if found, or NULL_TREE, if not found. */ +static inline tree +ivar_of_class (tree klass, tree ivar_name_ident) +{ + /* First, look up the ivar in CLASS_RAW_IVARS. */ + tree decl_chain = CLASS_RAW_IVARS (klass); + + for ( ; decl_chain; decl_chain = DECL_CHAIN (decl_chain)) +if (DECL_NAME (decl_chain) == ivar_name_ident) + return decl_chain; + + /* If not found, search up the class hierarchy. */ + while (CLASS_SUPER_NAME (klass)) +{ + klass = lookup_interface (CLASS_SUPER_NAME (klass)); + + decl_chain = CLASS_RAW_IVARS (klass); + + for ( ; decl_chain; decl_chain = DECL_CHAIN (decl_chain)) + if (DECL_NAME (decl_chain) == ivar_name_ident) + return decl_chain; +} + + return NULL_TREE; +} + /* We have an instance variable reference;, check to see if it is public. */ int @@ -6397,7 +6426,7 @@ objc_is_public (tree expr, tree identifier) return 0; } - if ((decl = is_ivar (get_class_ivars (klass, true), identifier))) + if ((decl = ivar_of_class (klass, identifier))) { if (TREE_PUBLIC (decl)) return 1;
FDO usability trivial patch to fix div by zero error with insane profile
This is a trivial patch to deal with bad profile data (from multi-threaded programs) that leads to a divide-by-zero error. Ok for trunk? Thanks, David 2011-04-15 Xinliang David Li * ipa-inline.c (cgraph_decide_recursive_inlining): Fix div by zero error with insane profile. Index: ipa-inline.c === --- ipa-inline.c (revision 172510) +++ ipa-inline.c (working copy) @@ -779,7 +779,7 @@ cgraph_decide_recursive_inlining (struct fprintf (dump_file, " Not inlining cold call\n"); continue; } - if (curr->count * 100 / node->count < probability) + if (node->count == 0 || curr->count * 100 / node->count < probability) { if (dump_file) fprintf (dump_file,
Re: FDO usability trivial patch to fix div by zero error with insane profile
On Fri, Apr 15, 2011 at 15:20, Xinliang David Li wrote: > This is a trivial patch to deal with bad profile data (from > multi-threaded programs) that leads to divide by zero error. > > Ok for trunk? > > Thanks, > > David > > > > 2011-04-15 Xinliang David Li > > * ipa-inline.c (cgraph_decide_recursive_inlining): Fix > div by zero error with insane profile. Looks fine to me, but you really want Honza to review this. Any chance we can get a test case for this? Diego.
Re: Remember/restore ALLOCA_FOR_VAR_P over tuples
Hi, On Thu, 14 Apr 2011, Michael Matz wrote: > > Btw, I don't remember why I chose ALLOCA_FOR_VAR_P over > > CALL_ALLOCA_FOR_VAR_P but, given the name of the GIMPLE flag and > > predicate, it's probably time to change it. > > Good idea, I'll rename it before checking in. r172516, for reference also below. I later saw that my patch causes cxg2001 to fail. I've analyzed it enough to be sure that it's only exposed by this patch (due to inlining now happening), in fact it's a problem in IRA that is reproducible with a C++ testcase even without the patch. I've filed PR48633 for this. Ciao, Michael. * tree.h (ALLOCA_FOR_VAR_P): Rename to CALL_ALLOCA_FOR_VAR_P. * builtins.c (expand_builtin): Use CALL_ALLOCA_FOR_VAR_P. * function.c (gimplify_parameters): Ditto. * gimplify.c (gimplify_vla_decl): Ditto. * gimple.h (enum gf_mask): Add GF_CALL_ALLOCA_FOR_VAR. (gimple_call_set_alloca_for_var): New inline function. (gimple_call_alloca_for_var_p): Ditto. * gimple.c (gimple_build_call_from_tree): Remember CALL_ALLOCA_FOR_VAR_P state. * cfgexpand.c (expand_call_stmt): Restore CALL_ALLOCA_FOR_VAR_P state. * tree-inline.c (inline_forbidden_p_stmt): Don't reject alloca calls if they were for VLA objects. Index: tree.h === *** tree.h (revision 172431) --- tree.h (working copy) *** struct GTY(()) tree_common { *** 574,580 all decls CALL_FROM_THUNK_P and !ALLOCA_FOR_VAR_P in CALL_EXPR side_effects_flag: --- 574,580 all decls CALL_FROM_THUNK_P and !CALL_ALLOCA_FOR_VAR_P in CALL_EXPR side_effects_flag: *** extern void omp_clause_range_check_faile *** 1388,1394 /* In a CALL_EXPR, if the function being called is BUILT_IN_ALLOCA, means that it has been built for the declaration of a variable-sized object. */ ! #define ALLOCA_FOR_VAR_P(NODE) (CALL_EXPR_CHECK (NODE)->base.protected_flag) /* In a type, nonzero means that all objects of the type are guaranteed by the language or front-end to be properly aligned, so we can indicate that a MEM --- 1388,1395 /* In a CALL_EXPR, if the function being called is BUILT_IN_ALLOCA, means that it has been built for the declaration of a variable-sized object. */ ! #define CALL_ALLOCA_FOR_VAR_P(NODE) \ ! (CALL_EXPR_CHECK (NODE)->base.protected_flag) /* In a type, nonzero means that all objects of the type are guaranteed by the language or front-end to be properly aligned, so we can indicate that a MEM Index: builtins.c === *** builtins.c (revision 172431) --- builtins.c (working copy) *** expand_builtin (tree exp, rtx target, rt *** 6025,6031 case BUILT_IN_ALLOCA: /* If the allocation stems from the declaration of a variable-sized object, it cannot accumulate. */ ! target = expand_builtin_alloca (exp, ALLOCA_FOR_VAR_P (exp)); if (target) return target; break; --- 6025,6031 case BUILT_IN_ALLOCA: /* If the allocation stems from the declaration of a variable-sized object, it cannot accumulate. */ ! target = expand_builtin_alloca (exp, CALL_ALLOCA_FOR_VAR_P (exp)); if (target) return target; break; Index: function.c === *** function.c (revision 172431) --- function.c (working copy) *** gimplify_parameters (void) *** 3652,3658 t = built_in_decls[BUILT_IN_ALLOCA]; t = build_call_expr (t, 1, DECL_SIZE_UNIT (parm)); /* The call has been built for a variable-sized object. */ ! ALLOCA_FOR_VAR_P (t) = 1; t = fold_convert (ptr_type, t); t = build2 (MODIFY_EXPR, TREE_TYPE (addr), addr, t); gimplify_and_add (t, &stmts); --- 3652,3658 t = built_in_decls[BUILT_IN_ALLOCA]; t = build_call_expr (t, 1, DECL_SIZE_UNIT (parm)); /* The call has been built for a variable-sized object. */ ! 
CALL_ALLOCA_FOR_VAR_P (t) = 1; t = fold_convert (ptr_type, t); t = build2 (MODIFY_EXPR, TREE_TYPE (addr), addr, t); gimplify_and_add (t, &stmts); Index: gimplify.c === *** gimplify.c (revision 172431) --- gimplify.c (working copy) *** gimplify_vla_decl (tree decl, gimple_seq *** 1329,1335 t = built_in_decls[BUILT_IN_ALLOCA]; t = build_call_expr (t, 1, DECL_SIZE_UNIT (decl)); /* The call has been built for a variable-sized object. */ ! ALLOCA
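For illustration, a sketch of the user-visible difference as I read the change (compiled as C++, using the GNU VLA extension; function names are made up): the allocation the gimplifier emits for the VLA is the alloca call that now carries CALL_ALLOCA_FOR_VAR_P, so the first function below is no longer inline-forbidden, while a function using an explicit alloca still is.

/* The gimplifier-built alloca for buf is marked CALL_ALLOCA_FOR_VAR_P,
   so this function may now be inlined into its callers.  */
static int
sum_vla (int n)
{
  char buf[n];                  /* VLA; a GNU extension in C++ */
  int s = 0;
  for (int i = 0; i < n; i++)
    buf[i] = 1;
  for (int i = 0; i < n; i++)
    s += buf[i];
  return s;
}

/* An explicit alloca, by contrast, still keeps this function from
   being inlined.  */
static int
sum_alloca (int n)
{
  char *buf = static_cast<char *> (__builtin_alloca (n));
  int s = 0;
  for (int i = 0; i < n; i++)
    buf[i] = 1;
  for (int i = 0; i < n; i++)
    s += buf[i];
  return s;
}

int
main ()
{
  return sum_vla (8) + sum_alloca (8) == 16 ? 0 : 1;
}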
[PATCH] Prefer simplify_gen_* over gen_* in expand_debug_expr
Hi! On IRC richi mentioned yesterday that when gcc.dg/guality/asm-1.c testcase is compiled with g++ instead of gcc, it fails. Apparently that is because of slightly different ARRAY_REF in the code, which for g++ leads into (sign_extend:DI (zero_extend:SI (something:HI))) in DEBUG_INSN (as opposed to (zero_extend:DI (something:HI)) with C FE) and var-tracking doesn't happen to match it in some places. The above is something simplify-rtx.c can fix up though. This patch fixes it by using simplify_gen_* instead of gen_* in many more places than it did until now in expand_debug_expr. Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk? 2011-04-15 Jakub Jelinek * cfgexpand.c (expand_debug_expr): Use simplify_gen_{unary,binary,ternary} instead of gen_rtx_*. --- gcc/cfgexpand.c.jj 2011-04-09 19:29:19.081421040 +0200 +++ gcc/cfgexpand.c 2011-04-14 18:52:51.354402038 +0200 @@ -2364,6 +2364,7 @@ expand_debug_expr (tree exp) { rtx op0 = NULL_RTX, op1 = NULL_RTX, op2 = NULL_RTX; enum machine_mode mode = TYPE_MODE (TREE_TYPE (exp)); + enum machine_mode inner_mode = VOIDmode; int unsignedp = TYPE_UNSIGNED (TREE_TYPE (exp)); addr_space_t as; @@ -2410,6 +2411,7 @@ expand_debug_expr (tree exp) unary: case tcc_unary: + inner_mode = TYPE_MODE (TREE_TYPE (TREE_OPERAND (exp, 0))); op0 = expand_debug_expr (TREE_OPERAND (exp, 0)); if (!op0) return NULL_RTX; @@ -2513,7 +2515,7 @@ expand_debug_expr (tree exp) case NOP_EXPR: case CONVERT_EXPR: { - enum machine_mode inner_mode = GET_MODE (op0); + inner_mode = GET_MODE (op0); if (mode == inner_mode) return op0; @@ -2560,9 +2562,9 @@ expand_debug_expr (tree exp) else if (TREE_CODE_CLASS (TREE_CODE (exp)) == tcc_unary ? TYPE_UNSIGNED (TREE_TYPE (TREE_OPERAND (exp, 0))) : unsignedp) - op0 = gen_rtx_ZERO_EXTEND (mode, op0); + op0 = simplify_gen_unary (ZERO_EXTEND, mode, op0, inner_mode); else - op0 = gen_rtx_SIGN_EXTEND (mode, op0); + op0 = simplify_gen_unary (SIGN_EXTEND, mode, op0, inner_mode); return op0; } @@ -2697,7 +2699,8 @@ expand_debug_expr (tree exp) /* Don't use offset_address here, we don't need a recognizable address, and we don't want to generate code. */ - op0 = gen_rtx_MEM (mode, gen_rtx_PLUS (addrmode, op0, op1)); + op0 = gen_rtx_MEM (mode, simplify_gen_binary (PLUS, addrmode, + op0, op1)); } if (MEM_P (op0)) @@ -2770,25 +2773,23 @@ expand_debug_expr (tree exp) } case ABS_EXPR: - return gen_rtx_ABS (mode, op0); + return simplify_gen_unary (ABS, mode, op0, mode); case NEGATE_EXPR: - return gen_rtx_NEG (mode, op0); + return simplify_gen_unary (NEG, mode, op0, mode); case BIT_NOT_EXPR: - return gen_rtx_NOT (mode, op0); + return simplify_gen_unary (NOT, mode, op0, mode); case FLOAT_EXPR: - if (unsignedp) - return gen_rtx_UNSIGNED_FLOAT (mode, op0); - else - return gen_rtx_FLOAT (mode, op0); + return simplify_gen_unary (TYPE_UNSIGNED (TREE_TYPE (TREE_OPERAND (exp, +0))) +? UNSIGNED_FLOAT : FLOAT, mode, op0, +inner_mode); case FIX_TRUNC_EXPR: - if (unsignedp) - return gen_rtx_UNSIGNED_FIX (mode, op0); - else - return gen_rtx_FIX (mode, op0); + return simplify_gen_unary (unsignedp ? 
UNSIGNED_FIX : FIX, mode, op0, +inner_mode); case POINTER_PLUS_EXPR: /* For the rare target where pointers are not the same size as @@ -2799,161 +2800,164 @@ expand_debug_expr (tree exp) && GET_MODE (op0) != GET_MODE (op1)) { if (GET_MODE_BITSIZE (GET_MODE (op0)) < GET_MODE_BITSIZE (GET_MODE (op1))) - op1 = gen_rtx_TRUNCATE (GET_MODE (op0), op1); + op1 = simplify_gen_unary (TRUNCATE, GET_MODE (op0), op1, + GET_MODE (op1)); else /* We always sign-extend, regardless of the signedness of the operand, because the operand is always unsigned here even if the original C expression is signed. */ - op1 = gen_rtx_SIGN_EXTEND (GET_MODE (op0), op1); + op1 = simplify_gen_unary (SIGN_EXTEND, GET_MODE (op0), op1, + GET_MODE (op1)); } /* Fall through. */ case PLUS_EXPR: - return gen_rtx_PLUS (mode, op0, op1); + return simplify_gen_binary (PLUS, mode, op0, op1); case MINUS_EXPR: - return gen_rtx_MINUS (mode, op0, op1); + return simplify_gen_binary (MINUS, mode, op0, op1); case MULT_EXPR: -
Re: FDO usability trivial patch to fix div by zero error with insane profile
There is no way to get a test case that can produce the problem in a deterministic way. There is really not much for Honza to review though :) David On Fri, Apr 15, 2011 at 12:52 PM, Diego Novillo wrote: > On Fri, Apr 15, 2011 at 15:20, Xinliang David Li wrote: >> This is a trivial patch to deal with bad profile data (from >> multi-threaded programs) that leads to divide by zero error. >> >> Ok for trunk? >> >> Thanks, >> >> David >> >> >> >> 2011-04-15 Xinliang David Li >> >> * ipa-inline.c (cgraph_decide_recursive_inlining): Fix >> div by zero error with insane profile. > > Looks fine to me, but you really want Honza to review this. Any > chance we can get a test case for this? > > > Diego. >
Re: ObjC: get rid of another unnecessary, temporary copy of instance variables
On Apr 15, 2011, at 11:52 AM, Nicola Pero wrote: > This patch for the Objective-C compiler gets rid of another case where > we'd create a temporary tree chain with a copy of all the instance variables > of a class just to do a simple check on them. > Ok to commit ? Ok.
Re: [PATCH] Prefer simplify_gen_* over gen_* in expand_debug_expr
On Fri, Apr 15, 2011 at 10:21 PM, Jakub Jelinek wrote: > Hi! > > On IRC richi mentioned yesterday that when gcc.dg/guality/asm-1.c > testcase is compiled with g++ instead of gcc, it fails. > Apparently that is because of slightly different ARRAY_REF > in the code, which for g++ leads into > (sign_extend:DI (zero_extend:SI (something:HI))) > in DEBUG_INSN (as opposed to > (zero_extend:DI (something:HI)) > with C FE) and var-tracking doesn't happen to match it in some > places. The above is something simplify-rtx.c can fix up though. > > This patch fixes it by using simplify_gen_* instead of gen_* in many > more places than it did until now in expand_debug_expr. > > Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk? Ok. Thanks, Richard. > 2011-04-15 Jakub Jelinek > > * cfgexpand.c (expand_debug_expr): Use > simplify_gen_{unary,binary,ternary} instead of gen_rtx_*. > > --- gcc/cfgexpand.c.jj 2011-04-09 19:29:19.081421040 +0200 > +++ gcc/cfgexpand.c 2011-04-14 18:52:51.354402038 +0200 > @@ -2364,6 +2364,7 @@ expand_debug_expr (tree exp) > { > rtx op0 = NULL_RTX, op1 = NULL_RTX, op2 = NULL_RTX; > enum machine_mode mode = TYPE_MODE (TREE_TYPE (exp)); > + enum machine_mode inner_mode = VOIDmode; > int unsignedp = TYPE_UNSIGNED (TREE_TYPE (exp)); > addr_space_t as; > > @@ -2410,6 +2411,7 @@ expand_debug_expr (tree exp) > > unary: > case tcc_unary: > + inner_mode = TYPE_MODE (TREE_TYPE (TREE_OPERAND (exp, 0))); > op0 = expand_debug_expr (TREE_OPERAND (exp, 0)); > if (!op0) > return NULL_RTX; > @@ -2513,7 +2515,7 @@ expand_debug_expr (tree exp) > case NOP_EXPR: > case CONVERT_EXPR: > { > - enum machine_mode inner_mode = GET_MODE (op0); > + inner_mode = GET_MODE (op0); > > if (mode == inner_mode) > return op0; > @@ -2560,9 +2562,9 @@ expand_debug_expr (tree exp) > else if (TREE_CODE_CLASS (TREE_CODE (exp)) == tcc_unary > ? TYPE_UNSIGNED (TREE_TYPE (TREE_OPERAND (exp, 0))) > : unsignedp) > - op0 = gen_rtx_ZERO_EXTEND (mode, op0); > + op0 = simplify_gen_unary (ZERO_EXTEND, mode, op0, inner_mode); > else > - op0 = gen_rtx_SIGN_EXTEND (mode, op0); > + op0 = simplify_gen_unary (SIGN_EXTEND, mode, op0, inner_mode); > > return op0; > } > @@ -2697,7 +2699,8 @@ expand_debug_expr (tree exp) > /* Don't use offset_address here, we don't need a > recognizable address, and we don't want to generate > code. */ > - op0 = gen_rtx_MEM (mode, gen_rtx_PLUS (addrmode, op0, op1)); > + op0 = gen_rtx_MEM (mode, simplify_gen_binary (PLUS, addrmode, > + op0, op1)); > } > > if (MEM_P (op0)) > @@ -2770,25 +2773,23 @@ expand_debug_expr (tree exp) > } > > case ABS_EXPR: > - return gen_rtx_ABS (mode, op0); > + return simplify_gen_unary (ABS, mode, op0, mode); > > case NEGATE_EXPR: > - return gen_rtx_NEG (mode, op0); > + return simplify_gen_unary (NEG, mode, op0, mode); > > case BIT_NOT_EXPR: > - return gen_rtx_NOT (mode, op0); > + return simplify_gen_unary (NOT, mode, op0, mode); > > case FLOAT_EXPR: > - if (unsignedp) > - return gen_rtx_UNSIGNED_FLOAT (mode, op0); > - else > - return gen_rtx_FLOAT (mode, op0); > + return simplify_gen_unary (TYPE_UNSIGNED (TREE_TYPE (TREE_OPERAND (exp, > + 0))) > + ? UNSIGNED_FLOAT : FLOAT, mode, op0, > + inner_mode); > > case FIX_TRUNC_EXPR: > - if (unsignedp) > - return gen_rtx_UNSIGNED_FIX (mode, op0); > - else > - return gen_rtx_FIX (mode, op0); > + return simplify_gen_unary (unsignedp ? 
UNSIGNED_FIX : FIX, mode, op0, > + inner_mode); > > case POINTER_PLUS_EXPR: > /* For the rare target where pointers are not the same size as > @@ -2799,161 +2800,164 @@ expand_debug_expr (tree exp) > && GET_MODE (op0) != GET_MODE (op1)) > { > if (GET_MODE_BITSIZE (GET_MODE (op0)) < GET_MODE_BITSIZE (GET_MODE > (op1))) > - op1 = gen_rtx_TRUNCATE (GET_MODE (op0), op1); > + op1 = simplify_gen_unary (TRUNCATE, GET_MODE (op0), op1, > + GET_MODE (op1)); > else > /* We always sign-extend, regardless of the signedness of > the operand, because the operand is always unsigned > here even if the original C expression is signed. */ > - op1 = gen_rtx_SIGN_EXTEND (GET_MODE (op0), op1); > + op1 = simplify_gen_unary (SIGN_EXTEND, GET_MODE (op0), op1, > + GET_MODE (op1)); > } > /* Fall through. */ > case PLUS_EXPR: > -
Re: FDO usability trivial patch to fix div by zero error with insane profile
On Fri, Apr 15, 2011 at 16:21, Xinliang David Li wrote: > There is no way to get a test case that can produce the problem in a > deterministic way. There is really not much for Honza to review > though :) The check may be needed somewhere else; you may be papering over the real issue. I don't think you are, but I don't know the code well enough to say for sure. Diego.