Re: [PATCH] Fix Ada bootstrap failure
On Fri, Sep 2, 2011 at 6:55 PM, Eric Botcazou wrote: >> And this is the real fix. Richard, do you want me to apply (part of it)? A varaint of the patch is ok, > In fact I'd even propose to revert the fold_builtin_alloca_for_var part of > your patch and apply mine entirely, as eliminating alloca (0) early looks a > interesting simplification. What do you think? I'm not sure we want to create a the replacement decl with DECL_SIZE zero though, so I suppose instead of making sure align is BITS_PER_UNIT shouldn't we make sure that size is at least 1? I fear we might run into other odd problems with zero-size autos. Thus, a variant of the patch that re-enables handling of size zero allocas but adjusts the size to be at least 1 is ok. Other opinions? Thanks, Richard. > -- > Eric Botcazou >
Re: Propagate BB predicates in ipa-inline-analysis
On Fri, Sep 2, 2011 at 7:42 PM, Ulrich Weigand wrote: > Jan Hubicka wrote: > >> (edge_execution_predicate): Rewrite as... >> (set_cond_stmt_execution_predicate): ... this function; handle >> __builtin_constant_p. > > This causes ICEs when building recent Linux kernels with the > CONFIG_TRACE_BRANCH_PROFLING option. This reduced test case: > > static inline __attribute__((always_inline)) int f (unsigned int n, unsigned > int size) > { > return (__builtin_constant_p (size != 0 && n > ~0 / size) > ? !!(size != 0 && n > ~0 / size) > : ({ static unsigned int count[2] = { 0, 0 }; > int r = !!(size != 0 && n > ~0 / size); > count[r]++; > r; })); > } > > int g (unsigned int size) > { > return f (size / 4096, 4); > } > > built with -O2 (on i386) on current mainline results in: > > /home/uweigand/test.i:15:1: internal compiler error: tree check: expected > ssa_name, have integer_cst in set_cond_stmt_execution_predicate, at > ipa-inline-analysis.c:1190 Can you open a bugreport? Thanks, Richard. > Bye, > Ulrich > > -- > Dr. Ulrich Weigand > GNU Toolchain for Linux on System z and Cell BE > ulrich.weig...@de.ibm.com >
Re: Fix PR50260
On Fri, Sep 2, 2011 at 8:40 PM, Michael Matz wrote: > Hi, > > On Fri, 2 Sep 2011, Michael Matz wrote: > >> Hi, >> >> On Fri, 2 Sep 2011, Richard Guenther wrote: >> >> > > Currently regstrapping on x86_64-linux (without Ada). Okay for trunk? >> > >> > Ok. Time to make get_var_ann private? >> >> Yes. Should the regstrap succeed on x86_64-linux I'll commit this >> variant of the patch (hunks in tree-flow.h, tree-flow-inline.h and >> tree-dfa.c are new, otherwise the same patch). > > As I feared the call to get_var_ann in set_is_used right now really is > still needed, privatizing it hence isn't that straight forward. For now > I've committed the PR50260 fix without that cleanup as r178489 to fix > bootstrap for some platforms. That's odd though - looking at the single caller it should only ever be called for referenced vars - which means we miss a referenced var somewhere. Richard. > > Ciao, > Michael.
Re: rfa: remove get_var_ann (was: Fix PR50260)
On Sat, Sep 3, 2011 at 3:51 AM, Michael Matz wrote: > Hi, > > On Fri, 2 Sep 2011, Michael Matz wrote: > >> > > Ok. Time to make get_var_ann private? >> >> As I feared the call to get_var_ann in set_is_used right now really is >> still needed, privatizing it hence isn't that straight forward. > > After pondering I concluded that it's not necessary to call set_is_used > for variables without var annotation. Those won't be in referenced_vars > (or local_decls) and therefore also won't be removed from those lists no > matter how hard we try. > > Regstrapped on x86_64-linux (without Ada). Okay for trunk? No. We call mark_all_vars_used on trees contained in GIMPLE non-debug statements. All vars we can reach that way _have_ to be in the list of referenced vars. That they are not is the bug. So - can you investigate which var doesn't have an annotation an why it isn't in referenced vars? Richard. > > Ciao, > Michael. > -- > * tree-flow.h (get_var_ann): Don't declare. > * tree-flow-inline.h (get_var_ann): Remove. > (set_is_used): Use var_ann, not get_var_ann. > * tree-dfa.c (add_referenced_var): Inline body of get_var_ann. > * tree-ssa-live.c (mark_all_vars_used_1): Check var_ann before > calling set_is_used. > > Index: tree-flow-inline.h > === > --- tree-flow-inline.h (Revision 178488) > +++ tree-flow-inline.h (Arbeitskopie) > @@ -145,16 +145,6 @@ var_ann (const_tree t) > return p ? *p : NULL; > } > > -/* Return the variable annotation for T, which must be a _DECL node. > - Create the variable annotation if it doesn't exist. */ > -static inline var_ann_t > -get_var_ann (tree var) > -{ > - var_ann_t *p = DECL_VAR_ANN_PTR (var); > - gcc_checking_assert (p); > - return *p ? *p : create_var_ann (var); > -} > - > /* Get the number of the next statement uid to be allocated. */ > static inline unsigned int > gimple_stmt_max_uid (struct function *fn) > @@ -568,7 +558,7 @@ phi_arg_index_from_use (use_operand_p us > static inline void > set_is_used (tree var) > { > - var_ann_t ann = get_var_ann (var); > + var_ann_t ann = var_ann (var); > ann->used = true; > } > > Index: tree-flow.h > === > --- tree-flow.h (Revision 178488) > +++ tree-flow.h (Arbeitskopie) > @@ -278,7 +278,6 @@ typedef struct immediate_use_iterator_d > typedef struct var_ann_d *var_ann_t; > > static inline var_ann_t var_ann (const_tree); > -static inline var_ann_t get_var_ann (tree); > static inline void update_stmt (gimple); > static inline int get_lineno (const_gimple); > > Index: tree-dfa.c > === > --- tree-dfa.c (Revision 178488) > +++ tree-dfa.c (Arbeitskopie) > @@ -580,8 +580,9 @@ set_default_def (tree var, tree def) > bool > add_referenced_var (tree var) > { > - get_var_ann (var); > gcc_assert (DECL_P (var)); > + if (!*DECL_VAR_ANN_PTR (var)) > + create_var_ann (var); > > /* Insert VAR into the referenced_vars hash table if it isn't present. */ > if (referenced_var_check_and_insert (var)) > Index: tree-ssa-live.c > === > --- tree-ssa-live.c (Revision 178408) > +++ tree-ssa-live.c (Arbeitskopie) > @@ -376,7 +376,10 @@ mark_all_vars_used_1 (tree *tp, int *wal > { > if (data != NULL && bitmap_clear_bit ((bitmap) data, DECL_UID (t))) > mark_all_vars_used (&DECL_INITIAL (t), data); > - set_is_used (t); > + /* If something has no annotation it's neither in referenced_vars, > + nor in local_decls. No use in marking it as used. */ > + if (var_ann (t)) > + set_is_used (t); > } > /* remove_unused_scope_block_p requires information about labels > which are not DECL_IGNORED_P to tell if they might be used in the IL. */ >
Re: [PATCH] Fix Ada bootstrap failure
> I'm not sure we want to create a the replacement decl with DECL_SIZE zero > though, so I suppose instead of making sure align is BITS_PER_UNIT > shouldn't we make sure that size is at least 1? I fear we might run into > other odd problems with zero-size autos. The cfgexpand.c code already does so, see the top of the hunk. > Thus, a variant of the patch that re-enables handling of size zero allocas > but adjusts the size to be at least 1 is ok. I think that the zero size is fine, we very likely generate this kind of things directly in Ada. -- Eric Botcazou
Re: [PATCH] Remove bogus TYPE_IS_SIZETYPE special-casing in extract_muldiv_1
> Well, even when sign-extended there is a constant you can't negate > without overflow. I would start digging for a testcase with > such case - but as said, testcases involving TYPE_IS_SIZETYPE are > very hard to generate for me. We run thousands of Ada tests every night on many platforms and never detected a problem here, so finding a testcase with the unpatched compiler is probably very hard - if there is really something to be found, which I doubt. > Well, but I'd expect you can have a set of Ada types, a function that > returns just its size and some scan-tree-dumps that check those sizes > are folded to a constant. Or to just N statements. We probably should, but we don't have them (yet). Like for the vast majority of the transformations made by the folder I guess, so... > If you insist I can revert the patch and apply it together with the > sign-extension change. Yes, IMHO it should be part of the final patch. -- Eric Botcazou
Re: [PATCH] Fix Ada bootstrap failure
On Sat, Sep 3, 2011 at 11:09 AM, Eric Botcazou wrote: >> I'm not sure we want to create a the replacement decl with DECL_SIZE zero >> though, so I suppose instead of making sure align is BITS_PER_UNIT >> shouldn't we make sure that size is at least 1? I fear we might run into >> other odd problems with zero-size autos. > > The cfgexpand.c code already does so, see the top of the hunk. > >> Thus, a variant of the patch that re-enables handling of size zero allocas >> but adjusts the size to be at least 1 is ok. > > I think that the zero size is fine, we very likely generate this kind of > things directly in Ada. Ok. Then your original patch is ok with removing the size == 0 check I added. Thanks, Richard. > -- > Eric Botcazou >
Re: [PATCH] Remove bogus TYPE_IS_SIZETYPE special-casing in extract_muldiv_1
On Sat, Sep 3, 2011 at 11:24 AM, Eric Botcazou wrote: >> Well, even when sign-extended there is a constant you can't negate >> without overflow. I would start digging for a testcase with >> such case - but as said, testcases involving TYPE_IS_SIZETYPE are >> very hard to generate for me. > > We run thousands of Ada tests every night on many platforms and never detected > a problem here, so finding a testcase with the unpatched compiler is probably > very hard - if there is really something to be found, which I doubt. Well, for real-world code I believe that. But see all the recent testcases for corner-cases of our signed-overflow stuff, they all require hand-crafted testcases involving INT_MIN, no inlining and even -ftrapv. What I meant to say is, given Ada can construct arbitrary layouted types it should be possible to have testcases for all the corner-cases - after all you cannot have both, undefined overflow and wrapping overflow, at the same time. That extract_muldiv_1 code is wrong for TYPE_OVERFLOW_UNDEFINED types as well, btw. >> Well, but I'd expect you can have a set of Ada types, a function that >> returns just its size and some scan-tree-dumps that check those sizes >> are folded to a constant. Or to just N statements. > > We probably should, but we don't have them (yet). Like for the vast majority > of the transformations made by the folder I guess, so... > >> If you insist I can revert the patch and apply it together with the >> sign-extension change. > > Yes, IMHO it should be part of the final patch. Ok, I'll revert it on monday. Richard. > -- > Eric Botcazou >
Re: [RFC] Split -mrecip
On Wed, Aug 31, 2011 at 6:16 PM, Michael Matz wrote: > I'd like to have tighter control over the individual situations that > -mrecip handles, and I think the user might appreciate this too. Hence > I've introduced four new target options -mrecip-div, -mrecip-sqrt, > -mrecip-vec-div and -mrecip-vec-sqrt. I've redefined -mrecip to be > equivalent to using those four options together. In addition one can > selectively disable some part via -mrecip -mno-recip-vec for instance. > > I was split mind about the approach, I could also have done like rs6000 > (-mrecip=) with the disadvantage of having to write an own > parser as our opt framework can't deal with comma separated lists of > masks. With the approach I chose our opt framework gets most of the work > done. > > I've decided to not use four new bits from target_flags, and instead > created a new mask (recip_mask). Four bits would have fit in target bits > right now, but in the future we might want to add more specialization, > like modes for which the reciprocals are active. > > What do you think? These new flags looks like a nice addition, but I wonder, why we need separate options to handle vector recip. A vector rsqrt or rdiv is generated automatically in the same way as scalar rsqrt or rdiv is generated, so IMO, -mrecip-sqrt and -mrecip-div should be enough. For the future - could rs6000 and x86 use the same compile options to handle reciprocals? Uros.
[Patch, Fortran] PR44646 - Add parser support for DO CONCURRENT
Recently, I came across PR 44646, where I wrote a bit more than a year ago: "I have an embryonic patch for it". Well, I found it again, did some polishing - and here it is. This patch implements the parsing/diagnostic for "DO[,] CONCURRENT for-all-header", e.g. do concurrent (i = 1:5) A(i) = B(i) end do DO CONCURRENT allows the compiler to go through the loop in any order and also in parallel; some constraints help to avoid dependencies, but it is the duty of the programmer to ensure that the order does not matter. That's one difference to the FORALL construct; another is that one is allow to do much more in the DO CONCURRENT block than FORALL allows: Other loops, conditional jumps, ... TODO as follow up: - Replace "Sorry" by a real implementation in trans-stmt.c - Add documentation to gfc-internal.texi, update gfortran.texi (F2003 status) and the wiki - Try to make use of the constraints for the middle end (optimization, autoparallelization), if possible. - Optionally (-fdo-concurrent=...?), parallelize the loop (e.g. by adding OpenMP pragmas). Build and regtested on x86-64-linux. OK for the trunk? Tobias 2011-09-03 Tobias Burnus PR fortran/44646 * decl.c (gfc_match_entry, gfc_match_end): Handle COMP_DO_CONCURRENT. * dump-parse-tree.c (show_code_node): Handle EXEC_DO_CONCURRENT. * gfortran.h (gfc_exec_op): Add EXEC_DO_CONCURRENT. * match.c (gfc_match_critical, match_exit_cycle, gfc_match_stopcode, lock_unlock_statement, sync_statement, gfc_match_allocate, gfc_match_deallocate, gfc_match_return): Add DO CONCURRENT diagnostic. (gfc_match_do): Match DO CONCURRENT. (match_derived_type_spec, match_type_spec, gfc_free_forall_iterator, match_forall_iterator, match_forall_header, match_simple_forall, gfc_match_forall): Move up in the file. * parse.c (check_do_closure, parse_do_block): Handle do concurrent. * parse.h (gfc_compile_state): Add COMP_DO_CONCURRENT. * resolve.c (do_concurrent_flag): New global variable. (resolve_function, pure_subroutine, resolve_branch, gfc_resolve_blocks, resolve_code, resolve_types): Add do concurrent diagnostic. * st.c (gfc_free_statement): Handle EXEC_DO_CONCURRENT. * trans-stmt.c (gfc_trans_do_concurrent): New function. * trans-stmt.h (gfc_trans_do_concurrent): Ditto. * trans.c (trans_code): Call it. 2011-09-03 Tobias Burnus PR fortran/44646 * gfortran.dg/do_concurrent_1.f90: New. diff --git a/gcc/fortran/decl.c b/gcc/fortran/decl.c index 18e2651..0ee2575 100644 --- a/gcc/fortran/decl.c +++ b/gcc/fortran/decl.c @@ -5248,6 +5248,7 @@ gfc_match_entry (void) "an IF-THEN block"); break; case COMP_DO: + case COMP_DO_CONCURRENT: gfc_error ("ENTRY statement at %C cannot appear within " "a DO block"); break; @@ -5853,6 +5854,7 @@ gfc_match_end (gfc_statement *st) break; case COMP_DO: +case COMP_DO_CONCURRENT: *st = ST_ENDDO; target = " do"; eos_ok = 0; diff --git a/gcc/fortran/dump-parse-tree.c b/gcc/fortran/dump-parse-tree.c index ad8b554..af2cd85 100644 --- a/gcc/fortran/dump-parse-tree.c +++ b/gcc/fortran/dump-parse-tree.c @@ -1611,6 +1611,28 @@ show_code_node (int level, gfc_code *c) fputs ("END DO", dumpfile); break; +case EXEC_DO_CONCURRENT: + fputs ("DO CONCURRENT ", dumpfile); + for (fa = c->ext.forall_iterator; fa; fa = fa->next) +{ + show_expr (fa->var); + fputc (' ', dumpfile); + show_expr (fa->start); + fputc (':', dumpfile); + show_expr (fa->end); + fputc (':', dumpfile); + show_expr (fa->stride); + + if (fa->next != NULL) +fputc (',', dumpfile); +} + show_expr (c->expr1); + + show_code (level + 1, c->block->next); + code_indent (level, c->label1); + fputs ("END DO", dumpfile); + break; + case EXEC_DO_WHILE: fputs ("DO WHILE ", dumpfile); show_expr (c->expr1); diff --git a/gcc/fortran/gfortran.h b/gcc/fortran/gfortran.h index ac36d24..54e0b20 100644 --- a/gcc/fortran/gfortran.h +++ b/gcc/fortran/gfortran.h @@ -2052,10 +2052,10 @@ typedef enum EXEC_LABEL_ASSIGN, EXEC_POINTER_ASSIGN, EXEC_CRITICAL, EXEC_ERROR_STOP, EXEC_GOTO, EXEC_CALL, EXEC_COMPCALL, EXEC_ASSIGN_CALL, EXEC_RETURN, EXEC_ENTRY, EXEC_PAUSE, EXEC_STOP, EXEC_CONTINUE, EXEC_INIT_ASSIGN, - EXEC_IF, EXEC_ARITHMETIC_IF, EXEC_DO, EXEC_DO_WHILE, EXEC_SELECT, EXEC_BLOCK, - EXEC_FORALL, EXEC_WHERE, EXEC_CYCLE, EXEC_EXIT, EXEC_CALL_PPC, - EXEC_ALLOCATE, EXEC_DEALLOCATE, EXEC_END_PROCEDURE, EXEC_SELECT_TYPE, - EXEC_SYNC_ALL, EXEC_SYNC_MEMORY, EXEC_SYNC_IMAGES, + EXEC_IF, EXEC_ARITHMETIC_IF, EXEC_DO, EXEC_DO_CONCURRENT, EXEC_DO_WHILE, + EXEC_SELECT, EXEC_BLOCK, EXEC_FORALL, EXEC_WHERE, EXEC_CYCLE, EXEC_EXIT, + EXEC_CALL_PPC, EXEC_ALLOCATE, EXEC_DEALLOCATE, EXEC_END_PROCEDURE, + EXEC_SELECT_TYPE, EXEC_SYNC_ALL, EXEC_SYNC_MEMORY, EXEC_SYNC_IMAGES, EXEC_OPEN, EXEC_CLOSE, EXEC_WAIT, EXEC_READ,
Re:
Hello! > Here is a patch which adds few more splits for AGU stalls avoidance on > Atom. It also fixes cost model and detects AGU stalls more > efficiently. > > Bootstrapped and checked on x86_64-linux. > > 2011-09-02 Enkovich Ilya > > * config/i386/i386-protos.h (ix86_lea_outperforms): New. > (ix86_avoid_lea_for_add): Likewise. > (ix86_avoid_lea_for_addr): Likewise. > (ix86_split_lea_for_addr): Likewise. > > * config/i386/i386.c (LEA_MAX_STALL): New. > (increase_distance): Likewise. > (insn_defines_reg): Likewise. > (insn_uses_reg_mem): Likewise. > (distance_non_agu_define_in_bb): Likewise. > (distance_agu_use_in_bb): Likewise. > (ix86_lea_outperforms): Likewise. > (ix86_ok_to_clobber_flags): Likewise. > (ix86_avoid_lea_for_add): Likewise. > (ix86_avoid_lea_for_addr): Likewise. > (ix86_split_lea_for_addr): Likewise. > (distance_non_agu_define): Search in pred BBs added. > (distance_agu_use): Search in succ BBs added. > (IX86_LEA_PRIORITY): Value changed from 2 to 0. > (LEA_SEARCH_THRESHOLD): Now depends on LEA_MAX_STALL. > (ix86_lea_for_add_ok): Use ix86_lea_outperforms to make decision. > > * config/i386/i386.md: Splits added to transform lea into a > sequence of instructions. Did you also test on x32 ? H.J.'s x32 page [1] currently says that Atom LEA optimization is disabled on x32 for some reason. The patch looks OK to me, with a few nits below. [1] https://sites.google.com/site/x32abi/ --- a/gcc/config/i386/i386-protos.h +++ b/gcc/config/i386/i386-protos.h +bool +ix86_avoid_lea_for_addr (rtx insn, rtx operands[]) +{ + unsigned int regno0 = true_regnum (operands[0]) ; + unsigned int regno1 = -1; + unsigned int regno2 = -1; Use INVALID_REGNUM here. +extern void +ix86_split_lea_for_addr (rtx operands[], enum machine_mode mode) +{ Missing comment. --- a/gcc/config/i386/i386.md +++ b/gcc/config/i386/i386.md @@ -5777,6 +5777,41 @@ (const_string "none"))) (set_attr "mode" "QI")]) +;; Split non destructive adds if we cannot use lea. +(define_split + [(set (match_operand:SWI48 0 "register_operand" "") + (plus:SWI48 (match_operand:SWI48 1 "register_operand" "") + (match_operand:SWI48 2 "nonmemory_operand" ""))) + (clobber (reg:CC FLAGS_REG))] + "reload_completed && ix86_avoid_lea_for_add (insn, operands)" + [(set (match_dup 0) (match_dup 1)) + (parallel [(set (match_dup 0) (plus: (match_dup 0) (match_dup 2))) + (clobber (reg:CC FLAGS_REG))]) + ] +) Put all closing braces on one line: (clobber (reg:CC FLAGS_REG))])]) +;; Split lea into one or more ALU instructions if profitable. +(define_split + [(set (match_operand:SI 0 "register_operand" "") + (subreg:SI (match_operand:DI 1 "lea_address_operand" "") 0))] + "reload_completed && ix86_avoid_lea_for_addr (insn, operands)" + [(const_int 0)] +{ + ix86_split_lea_for_addr (operands, SImode); + DONE; +}) This is valid only for TARGET_64BIT. Please note that x32 adds quite some different LEA patterns (see i386.md, line 5466+). I suggest you merge your splitters with these define_insn patterns into define_insn_and_split, adding "&& reload_completed && ix86_avoid_lea_for_addr (insn, operands)" as a split condition. Uros.
Re: [PATCH, Atom] Improve AGU stalls avoidance optimization
... Sent again, with correct Cc and subject line ... Hello! > Here is a patch which adds few more splits for AGU stalls avoidance on > Atom. It also fixes cost model and detects AGU stalls more > efficiently. > > Bootstrapped and checked on x86_64-linux. > > 2011-09-02 Enkovich Ilya > > * config/i386/i386-protos.h (ix86_lea_outperforms): New. > (ix86_avoid_lea_for_add): Likewise. > (ix86_avoid_lea_for_addr): Likewise. > (ix86_split_lea_for_addr): Likewise. > > * config/i386/i386.c (LEA_MAX_STALL): New. > (increase_distance): Likewise. > (insn_defines_reg): Likewise. > (insn_uses_reg_mem): Likewise. > (distance_non_agu_define_in_bb): Likewise. > (distance_agu_use_in_bb): Likewise. > (ix86_lea_outperforms): Likewise. > (ix86_ok_to_clobber_flags): Likewise. > (ix86_avoid_lea_for_add): Likewise. > (ix86_avoid_lea_for_addr): Likewise. > (ix86_split_lea_for_addr): Likewise. > (distance_non_agu_define): Search in pred BBs added. > (distance_agu_use): Search in succ BBs added. > (IX86_LEA_PRIORITY): Value changed from 2 to 0. > (LEA_SEARCH_THRESHOLD): Now depends on LEA_MAX_STALL. > (ix86_lea_for_add_ok): Use ix86_lea_outperforms to make decision. > > * config/i386/i386.md: Splits added to transform lea into a > sequence of instructions. Did you also test on x32 ? H.J.'s x32 page [1] currently says that Atom LEA optimization is disabled on x32 for some reason. The patch looks OK to me, with a few nits below. [1] https://sites.google.com/site/x32abi/ --- a/gcc/config/i386/i386-protos.h +++ b/gcc/config/i386/i386-protos.h +bool +ix86_avoid_lea_for_addr (rtx insn, rtx operands[]) +{ + unsigned int regno0 = true_regnum (operands[0]) ; + unsigned int regno1 = -1; + unsigned int regno2 = -1; Use INVALID_REGNUM here. +extern void +ix86_split_lea_for_addr (rtx operands[], enum machine_mode mode) +{ Missing comment. --- a/gcc/config/i386/i386.md +++ b/gcc/config/i386/i386.md @@ -5777,6 +5777,41 @@ (const_string "none"))) (set_attr "mode" "QI")]) +;; Split non destructive adds if we cannot use lea. +(define_split + [(set (match_operand:SWI48 0 "register_operand" "") + (plus:SWI48 (match_operand:SWI48 1 "register_operand" "") + (match_operand:SWI48 2 "nonmemory_operand" ""))) + (clobber (reg:CC FLAGS_REG))] + "reload_completed && ix86_avoid_lea_for_add (insn, operands)" + [(set (match_dup 0) (match_dup 1)) + (parallel [(set (match_dup 0) (plus: (match_dup 0) (match_dup 2))) + (clobber (reg:CC FLAGS_REG))]) + ] +) Put all closing braces on one line: (clobber (reg:CC FLAGS_REG))])]) +;; Split lea into one or more ALU instructions if profitable. +(define_split + [(set (match_operand:SI 0 "register_operand" "") + (subreg:SI (match_operand:DI 1 "lea_address_operand" "") 0))] + "reload_completed && ix86_avoid_lea_for_addr (insn, operands)" + [(const_int 0)] +{ + ix86_split_lea_for_addr (operands, SImode); + DONE; +}) This is valid only for TARGET_64BIT. Please note that x32 adds quite some different LEA patterns (see i386.md, line 5466+). I suggest you merge your splitters with these define_insn patterns into define_insn_and_split, adding "&& reload_completed && ix86_avoid_lea_for_addr (insn, operands)" as a split condition. Uros.
Re: [PATCH] Fix Ada bootstrap failure
> Then your original patch is ok with removing the size == 0 check I added. Bootstrapped/regtested on x86_64-suse-linux and applied, thanks. -- Eric Botcazou
Re: [PATCH] Remove bogus TYPE_IS_SIZETYPE special-casing in extract_muldiv_1
> Well, for real-world code I believe that. But see all the recent testcases > for corner-cases of our signed-overflow stuff, they all require > hand-crafted testcases involving INT_MIN, no inlining and even -ftrapv. > What I meant to say is, given Ada can construct arbitrary layouted types it > should be possible to have testcases for all the corner-cases - after all > you cannot have both, undefined overflow and wrapping overflow, at the same > time. Don't forget that we pretend that sizetypes don't overflow; in other words, we don't support arbitrarily-sized types, so no INT_MAX or something like that. > Ok, I'll revert it on monday. Thanks. I'll give the complete patch a try on our internal testsuite. -- Eric Botcazou
Re: rfa: remove get_var_ann (was: Fix PR50260)
Hi, On Sat, 3 Sep 2011, Richard Guenther wrote: > >> As I feared the call to get_var_ann in set_is_used right now really > >> is still needed, privatizing it hence isn't that straight forward. > > > > After pondering I concluded that it's not necessary to call > > set_is_used for variables without var annotation. Those won't be in > > referenced_vars (or local_decls) and therefore also won't be removed > > from those lists no matter how hard we try. > > > > Regstrapped on x86_64-linux (without Ada). Okay for trunk? > > No. We call mark_all_vars_used on trees contained in GIMPLE non-debug > statements. All vars we can reach that way _have_ to be in the list of > referenced vars. Sometimes global variables (non-auto vars with context != cfun) can be missing from referenced_vars. In the case where we hit bugs with this it's the non-local variables generated for profile counters. I can add the respective add_referenced_var calls for that, but I'm not sure I see the point. That of course comes back to our old discussion for what purposes referenced_vars actually is used. I looked at all users and I think that the global counters missing from referenced_vars is harmless. OTOH it's a nice invariant that can actually be checked for (that all reachable vars whatsoever have to be in referenced_vars), so I'm going to do that. > That they are not is the bug. So - can you investigate > which var doesn't have an annotation an why it isn't in referenced vars? Ciao, Michael.
Re: [RFC] Split -mrecip
Hi, On Sat, 3 Sep 2011, Uros Bizjak wrote: > > I've decided to not use four new bits from target_flags, and instead > > created a new mask (recip_mask). Four bits would have fit in target > > bits right now, but in the future we might want to add more > > specialization, like modes for which the reciprocals are active. > > > > What do you think? > > These new flags looks like a nice addition, but I wonder, why we need > separate options to handle vector recip. A vector rsqrt or rdiv is > generated automatically in the same way as scalar rsqrt or rdiv is > generated, so IMO, -mrecip-sqrt and -mrecip-div should be enough. No, the difference does matter. Using reciprocal estimates for scalar divs often results in errors in benchmarks because those sometimes are used to feed integer conversions for either index calculations or printouts. The small rounding errors with the reciprocals lead to incorrect outputs then. Context where the div can be vectorized often don't have this problem (they're then used purely for calculations over arrays of float data). For instance spec2006 and polyhedron break with -mrecip purely because of the scalar reciprocals, but work with only vectorized ones. I.e. users really want to differ between both. Also, when this patch goes in I plan to submit another one that activates vectorized rcp/rsqrt under -ffast-math already (that's what ICC happens to do too). > For the future - could rs6000 and x86 use the same compile options to > handle reciprocals? I'd guess so. rs6000 uses a hand-written comma-splitter, which we could reuse. Ciao, Michael.
[committed] Define "return" pattern on pa
This change defines a "return" insn pattern on pa to improve return optimization when no epilogue is needed. Tested on hppa2.0w-hp-hpux11.11 and hppa-unknown-linux-gnu. Committed to trunk. Defining a "return" pattern fixes PR middle-end/50232 on PA. Dave -- J. David Anglin dave.ang...@nrc-cnrc.gc.ca National Research Council of Canada (613) 990-0752 (FAX: 952-6602) 2011-09-03 John David Anglin PR middle-end/50232 * config/pa/pa.md (return): Define "return" insn pattern. (epilogue): Use it when no epilogue is needed. * config/pa/pa.c (pa_can_use_return_insn): New function. * config/pa/pa-protos.h (pa_can_use_return_insn): Declare. Index: config/pa/pa.md === --- config/pa/pa.md (revision 178315) +++ config/pa/pa.md (working copy) @@ -6671,6 +6671,20 @@ ;; Unconditional and other jump instructions. +;; Trivial return used when no epilogue is needed. +(define_insn "return" + [(return) + (use (reg:SI 2))] + "pa_can_use_return_insn ()" + "* +{ + if (TARGET_PA_20) +return \"bve%* (%%r2)\"; + return \"bv%* %%r0(%%r2)\"; +}" + [(set_attr "type" "branch") + (set_attr "length" "4")]) + ;; This is used for most returns. (define_insn "return_internal" [(return) @@ -6719,11 +6733,8 @@ rtx x; /* Try to use the trivial return first. Else use the full epilogue. */ - if (reload_completed - && !frame_pointer_needed - && !df_regs_ever_live_p (2) - && (compute_frame_size (get_frame_size (), 0) ? 0 : 1)) -x = gen_return_internal (); + if (pa_can_use_return_insn ()) +x = gen_return (); else { hppa_expand_epilogue (); Index: config/pa/pa-protos.h === --- config/pa/pa-protos.h (revision 178315) +++ config/pa/pa-protos.h (working copy) @@ -93,6 +93,7 @@ extern int cint_ok_for_move (HOST_WIDE_INT); extern void hppa_expand_prologue (void); extern void hppa_expand_epilogue (void); +extern bool pa_can_use_return_insn (void); extern int ior_mask_p (unsigned HOST_WIDE_INT); extern void compute_zdepdi_operands (unsigned HOST_WIDE_INT, unsigned *); Index: config/pa/pa.c === --- config/pa/pa.c (revision 178315) +++ config/pa/pa.c (working copy) @@ -4329,6 +4329,24 @@ } } +bool +pa_can_use_return_insn (void) +{ + if (!reload_completed) +return false; + + if (frame_pointer_needed) +return false; + + if (df_regs_ever_live_p (2)) +return false; + + if (crtl->profile) +return false; + + return compute_frame_size (get_frame_size (), 0) == 0; +} + rtx hppa_pic_save_rtx (void) {
Re: Propagate BB predicates in ipa-inline-analysis
> Jan Hubicka wrote: > > > (edge_execution_predicate): Rewrite as... > > (set_cond_stmt_execution_predicate): ... this function; handle > > __builtin_constant_p. > > This causes ICEs when building recent Linux kernels with the > CONFIG_TRACE_BRANCH_PROFLING option. This reduced test case: > > static inline __attribute__((always_inline)) int f (unsigned int n, unsigned > int size) > { > return (__builtin_constant_p (size != 0 && n > ~0 / size) > ? !!(size != 0 && n > ~0 / size) > : ({ static unsigned int count[2] = { 0, 0 }; > int r = !!(size != 0 && n > ~0 / size); > count[r]++; > r; })); > } > > int g (unsigned int size) > { > return f (size / 4096, 4); > } > > built with -O2 (on i386) on current mainline results in: > > /home/uweigand/test.i:15:1: internal compiler error: tree check: expected > ssa_name, have integer_cst in set_cond_stmt_execution_predicate, at > ipa-inline-analysis.c:1190 the code dies on: D.2739_10 = __builtin_constant_p (0); it assumes that argument of __builtin_constant_p is always SSA_NAME. I am testing the obvious fix for that. However it is missed optimization to leave such unfolded statement. It happens at fwprop converting: : iftmp.1_9 = 0; : # iftmp.1_2 = PHI <1(4), 0(5)> D.2739_10 = __builtin_constant_p (iftmp.1_2); Honza > > Bye, > Ulrich > > -- > Dr. Ulrich Weigand > GNU Toolchain for Linux on System z and Cell BE > ulrich.weig...@de.ibm.com
PATCH: PR bootstrap/50237: [4.7 regression] bootstrap comparison failure for libcpp/lex.o
Hi, .init_array is enabled only if linker can take input .ctors sections with input .init_array sections to generate the single output .init_array section in the right order so that GCC is compatible with existing .o files. Not all linkers support it even if they support .init_array section. Since I couldn't find an uniform link-time test for this feature which works for all ELF targets, run-time test is used. However, when --with-ld= is used to configure GCC, the wrong linker is used for testing. This patch checks DEFAULT_LINKER and works for me. OK for trunk? Thanks. H.J. --- 2011-09-03 H.J. Lu PR bootstrap/50237 * acinclude.m4 (gcc_AC_INITFINI_ARRAY): Check DEFAULT_LINKER. * configure: Regenerated. diff --git a/gcc/acinclude.m4 b/gcc/acinclude.m4 index d8defea..9cf0e9a 100644 --- a/gcc/acinclude.m4 +++ b/gcc/acinclude.m4 @@ -376,6 +376,15 @@ AC_DEFUN([gcc_AC_INITFINI_ARRAY], AC_CACHE_CHECK(for .preinit_array/.init_array/.fini_array support, gcc_cv_initfini_array, [dnl if test "x${build}" = "x${target}" && test "x${build}" = "x${host}"; then +saved_CC="$CC" +if test x"${DEFAULT_LINKER+set}" = x"set"; then + linker_dir="`dirname ${DEFAULT_LINKER}`" + if test x"${linker_dir+set}" = x"set"; then + if test -d ${linker_dir}; then + CC="$CC -B${linker_dir}/" + fi + fi +fi AC_RUN_IFELSE([AC_LANG_SOURCE([ #ifndef __ELF__ #error Not an ELF OS @@ -485,10 +494,11 @@ main () ])], [gcc_cv_initfini_array=yes], [gcc_cv_initfini_array=no], [gcc_cv_initfini_array=no]) - else - AC_MSG_CHECKING(cross compile... guessing) - gcc_cv_initfini_array=no - fi]) +CC="$saved_CC" + else +AC_MSG_CHECKING(cross compile... guessing) +gcc_cv_initfini_array=no + fi]) enable_initfini_array=$gcc_cv_initfini_array ]) if test $enable_initfini_array = yes; then diff --git a/gcc/configure b/gcc/configure index b1dd57b..d57a7f7 100755 --- a/gcc/configure +++ b/gcc/configure @@ -10782,6 +10782,15 @@ if test "${gcc_cv_initfini_array+set}" = set; then : $as_echo_n "(cached) " >&6 else if test "x${build}" = "x${target}" && test "x${build}" = "x${host}"; then +saved_CC="$CC" +if test x"${DEFAULT_LINKER+set}" = x"set"; then + linker_dir="`dirname ${DEFAULT_LINKER}`" + if test x"${linker_dir+set}" = x"set"; then + if test -d ${linker_dir}; then + CC="$CC -B${linker_dir}/" + fi + fi +fi if test "$cross_compiling" = yes; then : gcc_cv_initfini_array=no else @@ -10904,11 +10913,12 @@ rm -f core *.core core.conftest.* gmon.out bb.out conftest$ac_exeext \ conftest.$ac_objext conftest.beam conftest.$ac_ext fi - else - { $as_echo "$as_me:${as_lineno-$LINENO}: checking cross compile... guessing" >&5 +CC="$saved_CC" + else +{ $as_echo "$as_me:${as_lineno-$LINENO}: checking cross compile... guessing" >&5 $as_echo_n "checking cross compile... guessing... " >&6; } - gcc_cv_initfini_array=no - fi +gcc_cv_initfini_array=no + fi fi { $as_echo "$as_me:${as_lineno-$LINENO}: result: $gcc_cv_initfini_array" >&5 $as_echo "$gcc_cv_initfini_array" >&6; } @@ -17915,7 +17925,7 @@ else lt_dlunknown=0; lt_dlno_uscore=1; lt_dlneed_uscore=2 lt_status=$lt_dlunknown cat > conftest.$ac_ext <<_LT_EOF -#line 17918 "configure" +#line 17928 "configure" #include "confdefs.h" #if HAVE_DLFCN_H @@ -18021,7 +18031,7 @@ else lt_dlunknown=0; lt_dlno_uscore=1; lt_dlneed_uscore=2 lt_status=$lt_dlunknown cat > conftest.$ac_ext <<_LT_EOF -#line 18024 "configure" +#line 18034 "configure" #include "confdefs.h" #if HAVE_DLFCN_H
Re: [PATCH] Remove bogus TYPE_IS_SIZETYPE special-casing in extract_muldiv_1
On Sat, Sep 3, 2011 at 5:08 PM, Eric Botcazou wrote: >> Well, for real-world code I believe that. But see all the recent testcases >> for corner-cases of our signed-overflow stuff, they all require >> hand-crafted testcases involving INT_MIN, no inlining and even -ftrapv. >> What I meant to say is, given Ada can construct arbitrary layouted types it >> should be possible to have testcases for all the corner-cases - after all >> you cannot have both, undefined overflow and wrapping overflow, at the same >> time. > > Don't forget that we pretend that sizetypes don't overflow; in other words, we > don't support arbitrarily-sized types, so no INT_MAX or something like that. I know what we "pretend", but "pretending" is far from a rigorous specification of behavior. What's the range of valid sizes we support? Are all sizetype (sub-)expressions always of value in that range? What do we do about the fact that sizetype is unsigned, so -x always overflows for x != 0? Thus, do we need to disable all a - b -> a + -b kind of foldings for sizetypes? (we don't) What I see we pretend is that "sizetype" is supposed to be of infinite precision (well, infinite "enugh" to handle all (sub-)expressions of sizetype that may occur). An unsigned type isn't well-suited for that, of course. A type that is of the same precision as pointers possibly neither, considering sub-expressions. Given the restriction we impose in the C fronted (objects can be max convering half of the address-space) making all sizetypes signed would probably make sense (but that isn't easy, I've tried that already - keeping them unsigned but no longer sign-extending was way easier ;)) >> Ok, I'll revert it on monday. > > Thanks. I'll give the complete patch a try on our internal testsuite. Thanks. I'll expect some fallout. Richard. > -- > Eric Botcazou >
Re: rfa: remove get_var_ann (was: Fix PR50260)
On Sat, Sep 3, 2011 at 5:31 PM, Michael Matz wrote: > Hi, > > On Sat, 3 Sep 2011, Richard Guenther wrote: > >> >> As I feared the call to get_var_ann in set_is_used right now really >> >> is still needed, privatizing it hence isn't that straight forward. >> > >> > After pondering I concluded that it's not necessary to call >> > set_is_used for variables without var annotation. Those won't be in >> > referenced_vars (or local_decls) and therefore also won't be removed >> > from those lists no matter how hard we try. >> > >> > Regstrapped on x86_64-linux (without Ada). Okay for trunk? >> >> No. We call mark_all_vars_used on trees contained in GIMPLE non-debug >> statements. All vars we can reach that way _have_ to be in the list of >> referenced vars. > > Sometimes global variables (non-auto vars with context != cfun) can be > missing from referenced_vars. In the case where we hit bugs with this > it's the non-local variables generated for profile counters. I can add > the respective add_referenced_var calls for that, but I'm not sure I see > the point. That of course comes back to our old discussion for what > purposes referenced_vars actually is used. I looked at all users and I > think that the global counters missing from referenced_vars is harmless. > > OTOH it's a nice invariant that can actually be checked for (that all > reachable vars whatsoever have to be in referenced_vars), so I'm going to > do that. Yes, until we get rid of referenced_vars (which we still should do at some point...) that's the best. IIRC we have some verification code even, and wonder why it doesn't trigger. Richard. >> That they are not is the bug. So - can you investigate >> which var doesn't have an annotation an why it isn't in referenced vars? > > > Ciao, > Michael.
Re: Propagate BB predicates in ipa-inline-analysis
On Sat, Sep 3, 2011 at 6:41 PM, Jan Hubicka wrote: >> Jan Hubicka wrote: >> >> > (edge_execution_predicate): Rewrite as... >> > (set_cond_stmt_execution_predicate): ... this function; handle >> > __builtin_constant_p. >> >> This causes ICEs when building recent Linux kernels with the >> CONFIG_TRACE_BRANCH_PROFLING option. This reduced test case: >> >> static inline __attribute__((always_inline)) int f (unsigned int n, unsigned >> int size) >> { >> return (__builtin_constant_p (size != 0 && n > ~0 / size) >> ? !!(size != 0 && n > ~0 / size) >> : ({ static unsigned int count[2] = { 0, 0 }; >> int r = !!(size != 0 && n > ~0 / size); >> count[r]++; >> r; })); >> } >> >> int g (unsigned int size) >> { >> return f (size / 4096, 4); >> } >> >> built with -O2 (on i386) on current mainline results in: >> >> /home/uweigand/test.i:15:1: internal compiler error: tree check: expected >> ssa_name, have integer_cst in set_cond_stmt_execution_predicate, at >> ipa-inline-analysis.c:1190 > > the code dies on: > > D.2739_10 = __builtin_constant_p (0); > > it assumes that argument of __builtin_constant_p is always SSA_NAME. I am > testing the obvious fix for that. > However it is missed optimization to leave such unfolded statement. It > happens at fwprop converting: tree-ssa-forwprop.c I presume? That one misses calling fold_stmt when it propagates things. > : > iftmp.1_9 = 0; > > : > # iftmp.1_2 = PHI <1(4), 0(5)> > D.2739_10 = __builtin_constant_p (iftmp.1_2); But I wonder how it manages to do something to the above ... (are you sure it's forwprop?) Richard. > Honza >> >> Bye, >> Ulrich >> >> -- >> Dr. Ulrich Weigand >> GNU Toolchain for Linux on System z and Cell BE >> ulrich.weig...@de.ibm.com >
Re: Propagate BB predicates in ipa-inline-analysis
> tree-ssa-forwprop.c I presume? That one misses calling fold_stmt when > it propagates things. Yep. > > > : > > iftmp.1_9 = 0; > > > > : > > # iftmp.1_2 = PHI <1(4), 0(5)> > > D.2739_10 = __builtin_constant_p (iftmp.1_2); > > But I wonder how it manages to do something to the above ... (are you > sure it's forwprop?) It is fwprop, but I cut&pasted wrong function (it is the offline copy instead of inline). we transfrom: g (unsigned int size) { int iftmp.1; int D.2765; unsigned int D.2764; unsigned int D.2763; static unsigned int count[2] = {0, 0}; int r; static unsigned int count[2] = {0, 0}; unsigned int D.2729; : D.2729_2 = size_1(D) / 4096; if (D.2729_2 > 1073741823) goto ; else goto ; : : # iftmp.1_7 = PHI <1(2), 0(3)> D.2765_8 = __builtin_constant_p (iftmp.1_7); into: g (unsigned int size) { int D.2765; unsigned int D.2764; unsigned int D.2763; static unsigned int count[2] = {0, 0}; int r; static unsigned int count[2] = {0, 0}; : D.2765_8 = __builtin_constant_p (0); I guess forwprop should fold then... The extra call and control flow survives all the way down to ccp2 (that is after IPA) that eventually folds it away. Honza
Re: [PATCH] Remove bogus TYPE_IS_SIZETYPE special-casing in extract_muldiv_1
Let me jump in on this a little bit, since much of the code in this area was originally written by me. > Are all sizetype (sub-)expressions always of value in that range? > What do we do about the fact that sizetype is unsigned, so -x always > overflows for x != 0? Thus, do we need to disable all a - b -> a + > -b kind of foldings for sizetypes? (we don't) The basic idea is that an overflow of sizetype is either: (1) Detected at a higher level (e.g., testing for maximum sizes of objects) or; (2) Is an undetected error and hence the result of such overflow is undefined. What this means from a practical point of view (but indeed a bit hard to define from a formal point of view) is that the normal addition and subtraction operations (on a 2's complement machines, which all are now) will "do the right thing" in all cases so we can perform all those sorts of folding operations.
Re: [PATCH] Remove bogus TYPE_IS_SIZETYPE special-casing in extract_muldiv_1
On Sat, Sep 3, 2011 at 9:47 PM, Richard Kenner wrote: > Let me jump in on this a little bit, since much of the code in this area > was originally written by me. > >> Are all sizetype (sub-)expressions always of value in that range? >> What do we do about the fact that sizetype is unsigned, so -x always >> overflows for x != 0? Thus, do we need to disable all a - b -> a + >> -b kind of foldings for sizetypes? (we don't) > > The basic idea is that an overflow of sizetype is either: > > (1) Detected at a higher level (e.g., testing for maximum sizes of objects) > or; > (2) Is an undetected error and hence the result of such overflow is undefined. > > What this means from a practical point of view (but indeed a bit hard > to define from a formal point of view) is that the normal addition and > subtraction operations (on a 2's complement machines, which all are > now) will "do the right thing" in all cases so we can perform all > those sorts of folding operations. So what's your opinion on the "bug" that triggered the patch in question?\ Namely extract_muldiv_1 folding (((10240 - (sizetype) first) + 1) * 8) /[cl] 8 to ((sizetype) first * 0x0fff8 + 81928) /[cl] 8 to ((sizetype) first * 2305843009213693951 + 10241) thus, folding A - B to -B + A, which is valid for unsigned types only if overflow wraps. But the 2nd folding is only valid if overflow is undefined. Both foldings happen in extract_muldiv_1, the latter is especially enabled for TYPE_IS_SIZETYPE. The reasoning why both transforms are valid is bogus IMHO. Can you give a formal definition of sizetype behavior that can be used to prove that both transforms are valid? Richard.
Re: [PATCH] Remove bogus TYPE_IS_SIZETYPE special-casing in extract_muldiv_1
On Sat, Sep 3, 2011 at 10:37 PM, Richard Guenther wrote: > On Sat, Sep 3, 2011 at 9:47 PM, Richard Kenner > wrote: >> Let me jump in on this a little bit, since much of the code in this area >> was originally written by me. >> >>> Are all sizetype (sub-)expressions always of value in that range? >>> What do we do about the fact that sizetype is unsigned, so -x always >>> overflows for x != 0? Thus, do we need to disable all a - b -> a + >>> -b kind of foldings for sizetypes? (we don't) >> >> The basic idea is that an overflow of sizetype is either: >> >> (1) Detected at a higher level (e.g., testing for maximum sizes of objects) >> or; >> (2) Is an undetected error and hence the result of such overflow is >> undefined. >> >> What this means from a practical point of view (but indeed a bit hard >> to define from a formal point of view) is that the normal addition and >> subtraction operations (on a 2's complement machines, which all are >> now) will "do the right thing" in all cases so we can perform all >> those sorts of folding operations. > > So what's your opinion on the "bug" that triggered the patch in question?\ > Namely extract_muldiv_1 folding > > (((10240 - (sizetype) first) + 1) * 8) /[cl] 8 > > to > > ((sizetype) first * 0x0fff8 + 81928) /[cl] 8 > > to > > ((sizetype) first * 2305843009213693951 + 10241) > > thus, folding A - B to -B + A, which is valid for unsigned types only > if overflow wraps. But the 2nd folding is only valid if overflow is > undefined. > Both foldings happen in extract_muldiv_1, the latter is especially > enabled for TYPE_IS_SIZETYPE. > > The reasoning why both transforms are valid is bogus IMHO. > Can you give a formal definition of sizetype behavior that can be > used to prove that both transforms are valid? And as you are here now, can you shed some light on the weird decision to make sizetype TYPE_UNSIGNED but sign-extended at the same time? ;) Richard. > Richard. >
demangle C++ extern "C" functions
Hello, this patch is obviously related to PR c++/2316 ("g++ fails to overload on language linkage") but seems fairly independent. Currently, the demangler recognizes 'Y' for extern "C" functions and ignores it. The patch makes it print extern "C" after the function type: _Z1aIFYviEEvPT_ void a(void (*)(int) extern "C") Writing it before the return type seems more natural, but it is ambiguous. I guess it could also be printed in the middle (next to the star that indicates a function pointer), but placing it like the cv-qualifiers of member functions seemed good (plus, that's where Oracle puts it in its implementation of c++filt). Since g++ doesn't generate such mangling, the effect should be hard to notice ;-) (Even if the patch was ok, I am not a committer) 2011-09-03 Marc Glisse * include/demangle.h (demangle_component_type) [DEMANGLE_COMPONENT_EXTERN_C]: Handle extern "C". * libiberty/cp-demangle.c (d_dump): Likewise. (d_make_comp): Likewise. (d_function_type): Likewise. (d_print_comp): Likewise. (d_print_mod_list): Likewise. (d_print_mod): Likewise. (d_print_function_type): Likewise. * libiberty/testsuite/demangle-expected: Test it. -- Marc GlisseIndex: include/demangle.h === --- include/demangle.h (revision 178498) +++ include/demangle.h (working copy) @@ -288,6 +288,9 @@ /* The const qualifier. The one subtree is the type which is being qualified. */ DEMANGLE_COMPONENT_CONST, + /* extern "C" linkage. The one subtree is the function type which + is being qualified. */ + DEMANGLE_COMPONENT_EXTERN_C, /* The restrict qualifier modifying a member function. The one subtree is the type which is being qualified. */ DEMANGLE_COMPONENT_RESTRICT_THIS, Index: libiberty/testsuite/demangle-expected === --- libiberty/testsuite/demangle-expected (revision 178498) +++ libiberty/testsuite/demangle-expected (working copy) @@ -4151,3 +4151,8 @@ --format=auto _ZN3Psi7VariantIIcPKcEE5visitIIRZN11VariantTest9TestVisit11test_methodEvEUlS2_E0_RZNS6_11test_methodEvEUlcE1_RZNS6_11test_methodEvEUlNS_4NoneEE_EEENS_13VariantDetail19SelectVisitorResultIIDpT_EE4typeEDpOSG_ Psi::VariantDetail::SelectVisitorResult::type Psi::Variant::visit((VariantTest::TestVisit::test_method()::{lambda(Psi::None)#1}&)...) +# +# extern "C" linkage for function types. +--format=gnu-v3 +_Z1aIFYviEEvPT_ +void a(void (*)(int) extern "C") Index: libiberty/cp-demangle.c === --- libiberty/cp-demangle.c (revision 178498) +++ libiberty/cp-demangle.c (working copy) @@ -591,6 +591,9 @@ case DEMANGLE_COMPONENT_CONST: printf ("const\n"); break; +case DEMANGLE_COMPONENT_EXTERN_C: + printf ("extern \"C\"\n"); + break; case DEMANGLE_COMPONENT_RESTRICT_THIS: printf ("restrict this\n"); break; @@ -807,6 +810,7 @@ break; /* These types only require one parameter. */ +case DEMANGLE_COMPONENT_EXTERN_C: case DEMANGLE_COMPONENT_VTABLE: case DEMANGLE_COMPONENT_VTT: case DEMANGLE_COMPONENT_TYPEINFO: @@ -2324,18 +2328,22 @@ d_function_type (struct d_info *di) { struct demangle_component *ret; + int is_extern_c = 0; if (! d_check_char (di, 'F')) return NULL; if (d_peek_char (di) == 'Y') { - /* Function has C linkage. We don't print this information. -FIXME: We should print it in verbose mode. */ + /* Function has C linkage. */ + is_extern_c = 1; d_advance (di, 1); + di->expansion += sizeof "extern \"C\""; } ret = d_bare_function_type (di, 1); if (! d_check_char (di, 'E')) return NULL; + if (is_extern_c) +ret = d_make_comp (di, DEMANGLE_COMPONENT_EXTERN_C, ret, NULL); return ret; } @@ -3925,6 +3933,7 @@ case DEMANGLE_COMPONENT_RESTRICT_THIS: case DEMANGLE_COMPONENT_VOLATILE_THIS: case DEMANGLE_COMPONENT_CONST_THIS: +case DEMANGLE_COMPONENT_EXTERN_C: case DEMANGLE_COMPONENT_VENDOR_TYPE_QUAL: case DEMANGLE_COMPONENT_POINTER: case DEMANGLE_COMPONENT_COMPLEX: @@ -4537,7 +4546,8 @@ || (! suffix && (mods->mod->type == DEMANGLE_COMPONENT_RESTRICT_THIS || mods->mod->type == DEMANGLE_COMPONENT_VOLATILE_THIS - || mods->mod->type == DEMANGLE_COMPONENT_CONST_THIS))) + || mods->mod->type == DEMANGLE_COMPONENT_CONST_THIS + || mods->mod->type == DEMANGLE_COMPONENT_EXTERN_C))) { d_print_mod_list (dpi, options, mods->next, suffix); return; @@ -4628,6 +4638,9 @@ case DEMANGLE_COMPONENT_CONST_THIS: d_append_string (dpi, " const"); return; +case DEMANGLE_COMPONENT_EXTERN_C: + d_append_string (dpi, " extern \"C\""); + return
Re: [RFC] Split -mrecip
On Sat, Sep 3, 2011 at 5:42 PM, Michael Matz wrote: >> > I've decided to not use four new bits from target_flags, and instead >> > created a new mask (recip_mask). Four bits would have fit in target >> > bits right now, but in the future we might want to add more >> > specialization, like modes for which the reciprocals are active. >> > >> > What do you think? >> >> These new flags looks like a nice addition, but I wonder, why we need >> separate options to handle vector recip. A vector rsqrt or rdiv is >> generated automatically in the same way as scalar rsqrt or rdiv is >> generated, so IMO, -mrecip-sqrt and -mrecip-div should be enough. > > No, the difference does matter. Using reciprocal estimates for scalar > divs often results in errors in benchmarks because those sometimes are > used to feed integer conversions for either index calculations or > printouts. The small rounding errors with the reciprocals lead to > incorrect outputs then. Context where the div can be vectorized often > don't have this problem (they're then used purely for calculations over > arrays of float data). For instance spec2006 and polyhedron break with > -mrecip purely because of the scalar reciprocals, but work with only > vectorized ones. I.e. users really want to differ between both. I agree with your analysis. > Also, when this patch goes in I plan to submit another one that activates > vectorized rcp/rsqrt under -ffast-math already (that's what ICC happens to > do too). Great! In the past, we tried to use -mrecip with -ffast-math. IIRC, polyhedron broke on scalar rdiv and spec2006 broke on rsqrt. Taking into account your analysis above, using separate options and activating vectorized ones for -ffast-math makes much sense. >> For the future - could rs6000 and x86 use the same compile options to >> handle reciprocals? > > I'd guess so. rs6000 uses a hand-written comma-splitter, which we could > reuse. Perhaps rs6000 could adopt our approach in addition to its comma-splitter? OTOH, whatever is more convenient, I don't care that much. I have CC'd rs6000 maintainer for his opinion. Uros.
Re: [PATCH] Remove bogus TYPE_IS_SIZETYPE special-casing in extract_muldiv_1
> So what's your opinion on the "bug" that triggered the patch in question? > Namely extract_muldiv_1 folding > > (((10240 - (sizetype) first) + 1) * 8) /[cl] 8 > > to > > ((sizetype) first * 0x0fff8 + 81928) /[cl] 8 > > to > > ((sizetype) first * 2305843009213693951 + 10241) > > thus, folding A - B to -B + A, which is valid for unsigned types only > if overflow wraps. I think the tricky part is the optimization of (-X) * 8 to (-8 * X), especially if you then try to compute the -8 as unsigned. I don't think that's valid no matter what overflow does, but I still have a hard time reasoning about these things. > Can you give a formal definition of sizetype behavior that can be > used to prove that both transforms are valid? No, that's what I said before: I don't now have and never did have a formal definition. Which was not good.
Re: [PATCH] Remove bogus TYPE_IS_SIZETYPE special-casing in extract_muldiv_1
> And as you are here now, can you shed some light on the weird > decision to make sizetype TYPE_UNSIGNED but sign-extended > at the same time? ;) Basically, to make division work properly given that the type is unsigned.
Re: [PATCH] Remove bogus TYPE_IS_SIZETYPE special-casing in extract_muldiv_1
> So what's your opinion on the "bug" that triggered the patch in question?\ > Namely extract_muldiv_1 folding > > (((10240 - (sizetype) first) + 1) * 8) /[cl] 8 > > to > > ((sizetype) first * 0x0fff8 + 81928) /[cl] 8 > > to > > ((sizetype) first * 2305843009213693951 + 10241) I think this optimization "works" if you sign-extend the constant when you do division by 8.
Re: [RFC] Split -mrecip
On Sat, Sep 3, 2011 at 11:11 PM, Uros Bizjak wrote: > On Sat, Sep 3, 2011 at 5:42 PM, Michael Matz wrote: > >>> > I've decided to not use four new bits from target_flags, and instead >>> > created a new mask (recip_mask). Four bits would have fit in target >>> > bits right now, but in the future we might want to add more >>> > specialization, like modes for which the reciprocals are active. >>> > >>> > What do you think? >>> >>> These new flags looks like a nice addition, but I wonder, why we need >>> separate options to handle vector recip. A vector rsqrt or rdiv is >>> generated automatically in the same way as scalar rsqrt or rdiv is >>> generated, so IMO, -mrecip-sqrt and -mrecip-div should be enough. >> >> No, the difference does matter. Using reciprocal estimates for scalar >> divs often results in errors in benchmarks because those sometimes are >> used to feed integer conversions for either index calculations or >> printouts. The small rounding errors with the reciprocals lead to >> incorrect outputs then. Context where the div can be vectorized often >> don't have this problem (they're then used purely for calculations over >> arrays of float data). For instance spec2006 and polyhedron break with >> -mrecip purely because of the scalar reciprocals, but work with only >> vectorized ones. I.e. users really want to differ between both. > > I agree with your analysis. > >> Also, when this patch goes in I plan to submit another one that activates >> vectorized rcp/rsqrt under -ffast-math already (that's what ICC happens to >> do too). > > Great! In the past, we tried to use -mrecip with -ffast-math. IIRC, > polyhedron broke on scalar rdiv and spec2006 broke on rsqrt. Taking > into account your analysis above, using separate options and > activating vectorized ones for -ffast-math makes much sense. > >>> For the future - could rs6000 and x86 use the same compile options to >>> handle reciprocals? >> >> I'd guess so. rs6000 uses a hand-written comma-splitter, which we could >> reuse. > > Perhaps rs6000 could adopt our approach in addition to its > comma-splitter? OTOH, whatever is more convenient, I don't care that > much. I have CC'd rs6000 maintainer for his opinion. Some relevant PRs: http://gcc.gnu.org/bugzilla/show_bug.cgi?id=47989 http://gcc.gnu.org/bugzilla/show_bug.cgi?id=32352 http://gcc.gnu.org/bugzilla/show_bug.cgi?id=34702 Uros.
[PATCH] Better comparison of BINFOs in IPA-CP
Hi, the patch below improves the comparisons of BINFOs in IPA-CP. The problem is that we can read different BINFOs for the same type (or a base type component) from the LTO summaries because BINFOs coming from different compilation units are not unified. Because we now have the BINFO_VTABLE available, we can compare those instead since the VMT variables are unified. Bootstrapped and tested on x86_64-linux, also tested by LTO-building Firefox and 483.xalancbmk on the same platform (I lost the actual numbers but the new test returned true hundreds of times in both these cases). OK for trunk? Thanks, Martin 2011-09-02 Martin Jambor * ipa-cp.c (values_equal_for_ipcp_p): When comparing BINFOs, compare their BINFO_VTABLE, Index: src/gcc/ipa-cp.c === --- src.orig/gcc/ipa-cp.c +++ src/gcc/ipa-cp.c @@ -800,6 +800,33 @@ values_equal_for_ipcp_p (tree x, tree y) if (x == y) return true; + if (TREE_CODE (x) == TREE_BINFO && TREE_CODE (y) == TREE_BINFO) +{ + unsigned HOST_WIDE_INT ox, oy; + tree vx = BINFO_VTABLE (x); + tree vy = BINFO_VTABLE (y); + + if (!vx || !vy + || TREE_CODE (vx) != POINTER_PLUS_EXPR + || TREE_CODE (vy) != POINTER_PLUS_EXPR) + return false; + + ox = tree_low_cst (TREE_OPERAND (vx, 1), 1); + oy = tree_low_cst (TREE_OPERAND (vx, 1), 1); + + if (ox != oy) + return false; + + vx = TREE_OPERAND (vx, 0); + vy = TREE_OPERAND (vy, 0); + if (TREE_CODE (vx) != ADDR_EXPR + || TREE_CODE (vy) != ADDR_EXPR) + return false; + + if (TREE_OPERAND (vx, 0) == TREE_OPERAND (vy, 0)) + return true; +} + if (TREE_CODE (x) == TREE_BINFO || TREE_CODE (y) == TREE_BINFO) return false;
[PATCH] Move versionable flag from inline summary to cgraph_node.local
Hi, we have agreed on this list recently that the versionable flag should be moved from the inline summary to the local part of struct cgraph_node because it has nothing to do with inlining and is computed by ipa-prop and used only by ipa-cp. This patch does exactly that. Bootstrapped and tested on x86_64-linux. I have LTO-built Firefox with it too. OK for trunk? Thanks, Martin 2011-09-03 Martin Jambor * ipa-inline.h (struct inline_summary): Move versionable flag... * cgraph.h (struct cgraph_local_info): ...here * ipa-cp.c (determine_versionability): Use the new versionable flag. (determine_versionability): Likewise. (ipcp_versionable_function_p): Likewise. (ipcp_generate_summary): Likewise. * ipa-inline-analysis.c (dump_inline_summary): Do not dump the versionable flag. (compute_inline_parameters): Do not clear the versionable flag. (inline_read_section): Do not stream the versionable flag. (inline_write_summary): Likewise. * lto-cgraph.c (lto_output_node): Stream the versionable flag. (input_overwrite_node): Likewise. Index: src/gcc/cgraph.h === --- src.orig/gcc/cgraph.h +++ src/gcc/cgraph.h @@ -84,10 +84,13 @@ struct GTY(()) cgraph_local_info { /* Set when function is visible by other units. */ unsigned externally_visible : 1; - + /* Set once it has been finalized so we consider it to be output. */ unsigned finalized : 1; + /* False when there is something makes versioning impossible. */ + unsigned versionable : 1; + /* False when function calling convention and signature can not be changed. This is the case when __builtin_apply_args is used. */ unsigned can_change_signature : 1; Index: src/gcc/ipa-cp.c === --- src.orig/gcc/ipa-cp.c +++ src/gcc/ipa-cp.c @@ -367,7 +367,7 @@ determine_versionability (struct cgraph_ present. */ if (node->alias || node->thunk.thunk_p) reason = "alias or thunk"; - else if (!inline_summary (node)->versionable) + else if (!node->local.versionable) reason = "not a tree_versionable_function"; else if (cgraph_function_body_availability (node) <= AVAIL_OVERWRITABLE) reason = "insufficient body availability"; @@ -376,7 +376,7 @@ determine_versionability (struct cgraph_ fprintf (dump_file, "Function %s/%i is not versionable, reason: %s.\n", cgraph_node_name (node), node->uid, reason); - inline_summary (node)->versionable = (reason == NULL); + node->local.versionable = (reason == NULL); } /* Return true if it is at all technically possible to create clones of a @@ -385,7 +385,7 @@ determine_versionability (struct cgraph_ static bool ipcp_versionable_function_p (struct cgraph_node *node) { - return inline_summary (node)->versionable; + return node->local.versionable; } /* Structure holding accumulated information about callers of a node. */ @@ -2476,14 +2476,11 @@ ipcp_generate_summary (void) fprintf (dump_file, "\nIPA constant propagation start:\n"); ipa_register_cgraph_hooks (); - /* FIXME: We could propagate through thunks happily and we could be - even able to clone them, if needed. Do that later. */ FOR_EACH_FUNCTION_WITH_GIMPLE_BODY (node) { /* Unreachable nodes should have been eliminated before ipcp. */ gcc_assert (node->needed || node->reachable); - - inline_summary (node)->versionable = tree_versionable_function_p (node->decl); + node->local.versionable = tree_versionable_function_p (node->decl); ipa_analyze_node (node); } } Index: src/gcc/ipa-inline-analysis.c === --- src.orig/gcc/ipa-inline-analysis.c +++ src/gcc/ipa-inline-analysis.c @@ -986,8 +986,6 @@ dump_inline_summary (FILE * f, struct cg fprintf (f, " always_inline"); if (s->inlinable) fprintf (f, " inlinable"); - if (s->versionable) - fprintf (f, " versionable"); fprintf (f, "\n self time: %i\n", s->self_time); fprintf (f, " global time: %i\n", s->time); @@ -1642,7 +1640,7 @@ compute_inline_parameters (struct cgraph struct inline_edge_summary *es = inline_edge_summary (node->callees); struct predicate t = true_predicate (); - info->inlinable = info->versionable = 0; + info->inlinable = 0; node->callees->call_stmt_cannot_inline_p = true; node->local.can_change_signature = false; es->call_stmt_time = 1; @@ -2408,7 +2406,6 @@ inline_read_section (struct lto_file_dec bp = streamer_read_bitpack (&ib); info->inlinable = bp_unpack_value (&bp, 1); - info->versionable = bp_unpack_value (&bp, 1); count2 = streamer_read_uhwi (&ib); gcc_assert (!info->conds); @@ -2539,7 +2536,6 @@ inline_write_