Re: [ARM] fix for PR49423
On Fri, Oct 12, 2012 at 11:13 PM, Ramana Radhakrishnan wrote: > On Tue, Sep 25, 2012 at 5:32 PM, Dinar Temirbulatov > wrote: >> Hi Ramana, >> Here is obvious fix for PR49423, I just added pool range for > > Sorry for the late response - I've been on vacation. > > No it's not ok. These were removed deliberately and subsequent > efforts to put these back on have been rejected as being incomplete. > Please investigate why reload wants to reload the load of a constant 0 > which is a valid constant for HImode values into the constant pool . > In addition it might be that movhi_insn needs to grow support for movw > at the right architecture level. > > With your patch and for example 920928-2.c you'll start seeing 0 going > into the constant pool and an ldrh out of that ! In my case (scal-to-vec2.c at -O1 -msoft-float), the register pressure is huge and ira (not reload as far as I can tell) was trying to reduce it slightly by generating the constant (0x3002) which was pulled out of the loop. The register allocator (IRA or reload) got so confused it changed: (insn 357 355 358 7 (set (reg:SI 372 [ D.4448 ]) (zero_extend:SI (subreg:HI (reg:SI 745 [ D.4141+4 ]) 0))) src/gcc/testsuite/gcc.c-torture/execute/scal-to-vec2.c:49 161 {*arm_zero_extendhisi2_v6} (nil)) Into: (insn 357 1072 1075 7 (set (reg:SI 8 r8) (zero_extend:SI (mem/u/c:HI (symbol_ref/u:SI ("*.LC0") [flags 0x2]) [0 S2 A16]))) src/gcc/testsuite/gcc.c-torture/execute/scal-to-vec2.c:49 161 {*arm_zero_extendhisi2_v6} (nil)) (insn 1075 357 1077 7 (set (mem/c:SI (plus:SI (reg/f:SI 11 fp) (const_int -60 [0xffc4])) [0 %sfp+-24 S4 A32]) (reg:SI 8 r8)) src/gcc/testsuite/gcc.c-torture/execute/scal-to-vec2.c:49 181 {*arm_movsi_insn} (nil)) Thanks, Andrew Pinski > > regards > Ramana
Re: RFC: LRA for x86/x86-64 [7/9] -- continuation
I'm having to correct my own comments again, sorry. Richard Sandiford writes: >> + /* If this is post-increment, first copy the location to the reload reg. >> */ >> + if (post && real_in != result) >> +emit_insn (gen_move_insn (result, real_in)); > > Nit, but real_in != result can never be true AIUI, and I was confused how > the code could be correct in that case. Maybe just remove it, or make > it an assert? Probably obvious, but I meant "real_in == result" can never be true. "real_in != result" could be removed or turned into an assert. >> +if (GET_CODE (op) == PLUS) >> + { >> +plus = op; >> +op = XEXP (op, 1); >> + } > > Sorry, I'm complaining about old reload code again, but: does this > actually happen in LRA? In reload, a register operand could become a > PLUS because of elimination, but I thought LRA did things differently. > Besides, this is only needed for: > >> +if (CONST_POOL_OK_P (mode, op) >> +&& ((targetm.preferred_reload_class >> + (op, (enum reg_class) goal_alt[i]) == NO_REGS) >> +|| no_input_reloads_p) >> +&& mode != VOIDmode) >> + { >> +rtx tem = force_const_mem (mode, op); >> + >> +change_p = true; >> +/* If we stripped a SUBREG or a PLUS above add it back. */ >> +if (plus != NULL_RTX) >> + tem = gen_rtx_PLUS (mode, XEXP (plus, 0), tem); > > and we shouldn't have (plus (constant ...) ...) after elimination > (or at all outside of a CONST). I don't understand why the code is > needed even in reload. Scratch the thing about needing it for reload. It's obviously the second second operand we're reloading, not the first, and it could well be that an elimination displacement needs to be reloaded via the constant pool. The question for LRA still stands though. Richard
[i386] scalar ops that preserve the high part of a vector
Hello, this patch provides an alternate pattern to let combine recognize scalar operations that preserve the high part of a vector. If the strategy is all right, I could do the same for more operations (mul, div, ...). Something similar is also possible for V4SF (different pattern though), but probably not as useful. bootstrap+testsuite ok. 2012-10-13 Marc Glisse PR target/54855 gcc/ * config/i386/sse.md (*sse2_vmv2df3): New define_insn. gcc/testsuite/ * gcc.target/i386/pr54855.c: New testcase. -- Marc GlisseIndex: config/i386/sse.md === --- config/i386/sse.md (revision 192420) +++ config/i386/sse.md (working copy) @@ -812,20 +812,38 @@ (const_int 1)))] "TARGET_SSE" "@ \t{%2, %0|%0, %2} v\t{%2, %1, %0|%0, %1, %2}" [(set_attr "isa" "noavx,avx") (set_attr "type" "sseadd") (set_attr "prefix" "orig,vex") (set_attr "mode" "")]) +(define_insn "*sse2_vmv2df3" + [(set (match_operand:V2DF 0 "register_operand" "=x,x") + (vec_concat:V2DF + (plusminus:DF + (vec_select:DF + (match_operand:V2DF 1 "register_operand" "0,x") + (parallel [(const_int 0)])) + (match_operand:DF 2 "nonimmediate_operand" "xm,xm")) + (vec_select:DF (match_dup 1) (parallel [(const_int 1)]] + "TARGET_SSE2" + "@ + sd\t{%2, %0|%0, %2} + vsd\t{%2, %1, %0|%0, %1, %2}" + [(set_attr "isa" "noavx,avx") + (set_attr "type" "sseadd") + (set_attr "prefix" "orig,vex") + (set_attr "mode" "DF")]) + (define_expand "mul3" [(set (match_operand:VF 0 "register_operand") (mult:VF (match_operand:VF 1 "nonimmediate_operand") (match_operand:VF 2 "nonimmediate_operand")))] "TARGET_SSE" "ix86_fixup_binary_operands_no_copy (MULT, mode, operands);") (define_insn "*mul3" [(set (match_operand:VF 0 "register_operand" "=x,x") Index: testsuite/gcc.target/i386/pr54855.c === --- testsuite/gcc.target/i386/pr54855.c (revision 0) +++ testsuite/gcc.target/i386/pr54855.c (revision 0) @@ -0,0 +1,18 @@ +/* { dg-do compile } */ +/* { dg-options "-O -msse2" } */ + +typedef double vec __attribute__((vector_size(16))); + +vec f (vec x) +{ + x[0] += 2; + return x; +} + +vec g (vec x) +{ + x[0] -= 1; + return x; +} + +/* { dg-final { scan-assembler-not "mov" } } */ Property changes on: testsuite/gcc.target/i386/pr54855.c ___ Added: svn:keywords + Author Date Id Revision URL Added: svn:eol-style + native
Re: PR54915 (ssa-forwprop, vec_perm_expr)
On Fri, 12 Oct 2012, Marc Glisse wrote: Hello, apparently, in the optimization that recognizes that {v[1],v[0]} is a VEC_PERM_EXPR, I forgot to check that v is a 2-element vector... (not that there aren't things that could be done if v has a different size, just not directly a VEC_PERM_EXPR, and not right now, priority is to fix the bug) Checking that v has the same type as the result seemed like the easiest way, but there are many variations that could be slightly better or worse. bootstrap+testsuite ok. 2012-10-02 Marc Glisse PR tree-optimization/54915 gcc/ * tree-ssa-forwprop.c (simplify_vector_constructor): Check argument's type. gcc/testsuite/ * gcc.dg/tree-ssa/pr54915.c: New testcase. This new version, with a slightly relaxed test, seems preferable and also passes testing. -- Marc GlisseIndex: testsuite/gcc.dg/tree-ssa/pr54915.c === --- testsuite/gcc.dg/tree-ssa/pr54915.c (revision 0) +++ testsuite/gcc.dg/tree-ssa/pr54915.c (revision 0) @@ -0,0 +1,12 @@ +/* { dg-do compile } */ +/* { dg-options "-O2" } */ + +typedef double v2df __attribute__ ((__vector_size__ (16))); +typedef double v4df __attribute__ ((__vector_size__ (32))); + +void f (v2df *ret, v4df* xp) +{ + v4df x = *xp; + v2df xx = { x[2], x[3] }; + *ret = xx; +} Property changes on: testsuite/gcc.dg/tree-ssa/pr54915.c ___ Added: svn:eol-style + native Added: svn:keywords + Author Date Id Revision URL Index: tree-ssa-forwprop.c === --- tree-ssa-forwprop.c (revision 192420) +++ tree-ssa-forwprop.c (working copy) @@ -2833,20 +2833,22 @@ simplify_vector_constructor (gimple_stmt ref = TREE_OPERAND (op1, 0); if (orig) { if (ref != orig) return false; } else { if (TREE_CODE (ref) != SSA_NAME) return false; + if (!useless_type_conversion_p (type, TREE_TYPE (ref))) + return false; orig = ref; } if (TREE_INT_CST_LOW (TREE_OPERAND (op1, 1)) != elem_size) return false; sel[i] = TREE_INT_CST_LOW (TREE_OPERAND (op1, 2)) / elem_size; if (sel[i] != i) maybe_ident = false; } if (i < nelts) return false;
Re: [PATCH 0/6] Thread pointer built-in functions / [SH] PR 54760
On 2012/10/12 06:55 AM, Oleg Endo wrote: > This broke the recently added thread pointer built-ins on SH, but I was > prepared for that, so no problem here. The attached patch is a straight > forward fix. > > However, with the patch applied I get an ICE on one of the SH thread > pointer tests: gcc/testsuite/gcc.target/sh/pr54760-3.c, function > test04: > > internal compiler error: in expand_insn, at optabs.c:8208 > __builtin_set_thread_pointer (xx[i]); Looks like I was supposed to use create_input_operand() there instead. I've committed the attached patch as obvious. This should be fixed now. Thanks, Chung-Lin * builtins.c (expand_builtin_set_thread_pointer): Use create_input_operand() instead of create_fixed_operand(). Index: builtins.c === --- builtins.c (revision 192421) +++ builtins.c (revision 192422) @@ -5776,7 +5776,7 @@ struct expand_operand op; rtx val = expand_expr (CALL_EXPR_ARG (exp, 0), NULL_RTX, Pmode, EXPAND_NORMAL); - create_fixed_operand (&op, val); + create_input_operand (&op, val, Pmode); expand_insn (icode, 1, &op); return; }
Re: [PATCH 0/6] Thread pointer built-in functions / [SH] PR 54760
On Sat, 2012-10-13 at 17:33 +0800, Chung-Lin Tang wrote: > On 2012/10/12 06:55 AM, Oleg Endo wrote: > > This broke the recently added thread pointer built-ins on SH, but I was > > prepared for that, so no problem here. The attached patch is a straight > > forward fix. > > > > However, with the patch applied I get an ICE on one of the SH thread > > pointer tests: gcc/testsuite/gcc.target/sh/pr54760-3.c, function > > test04: > > > > internal compiler error: in expand_insn, at optabs.c:8208 > > __builtin_set_thread_pointer (xx[i]); > > Looks like I was supposed to use create_input_operand() there instead. > I've committed the attached patch as obvious. This should be fixed now. Yep, confirmed. Thanks! Cheers, Oleg
[patch] PR54885
Hello, I hadn't realized that dead registers can still reach EQ-notes. Fixed with the attached patch. Bootstrapped&tested with normal and release checking on x86_64-unknown-linux-gnu and powerpc64-unknown-linux-gnu. OK for trunk? Ciao! Steven df_rd_lr_eq.diff Description: Binary data
PR fortran/51727: make module files reproducible, question on C++ in gcc
Hi, first a question also to non-gfortraners: if I want to use std::map, where do I "#include "? In system.h? Now to the patch-specific part: in this PR, module files are produced with random changes because the order in which symbols are written can depend on the memory layout. This patch fixes this by recording which symbols need to be written and then processing them in order. The patch doesn't make the more involved effort of putting all symbols into the module in an easily predicted order, instead it only makes sure that the order remains fixed across the compiler invocations. The reason why the former is difficult is that during the process of writing a symbol, it can turn out that other symbols will have to be written as well (say, because they appear in array specifications). Since the module-writing code determines which symbols to output while actually writing the file, recording all the symbols that need to be written before writing to the file would mean a lot of surgery. I'm putting forward two patches. One uses a C++ map to very concisely build up and handle the ordered list of symbols. This has three problems: 1) gfortran maintainers may not want C++isms (even though in this case it's very localized, and in my opinion very transparent), and 2) it can't be backported to old release branches which are still compiled as C. Joost expressed interested in a backport. 3) I don't know where to #include (see above) Therefore I also propose a patch where I added the necessary ~50 lines of boilerplate code and added the necessary traversal function to use gfortran's GFC_BBT to maintain the ordered tree of symbols. Both patches pass the testsuite and Joost confirms that they fix the problem with CP2K. I also verified with a few examples that they both produce identical .mod files as they should. Is the C++ patch, modified to do the #include correctly, ok for the trunk? If not, the C-only patch? Can I put the C-only patch on the release branches? And which? Cheers, - Tobi 2012-10-13 Tobias Schlüter PR fortran/51727 * module.c (sorted_pointer_info): New. (gfc_get_sorted_pointer_info): New. (free_sorted_pointer_info_tree): New. (compare_sorted_pointer_info): New. (find_symbols_to_write): New. (write_symbol1_recursion): New. (write_symbol1): Collect symbols that need writing, output in order. (write_generic): Traverse tree in order. diff --git a/gcc/fortran/module.c b/gcc/fortran/module.c index 5cfc335..4cfcae4 100644 --- a/gcc/fortran/module.c +++ b/gcc/fortran/module.c @@ -5150,32 +5150,122 @@ write_symbol0 (gfc_symtree *st) } -/* Recursive traversal function to write the secondary set of symbols - to the module file. These are symbols that were not public yet are - needed by the public symbols or another dependent symbol. The act - of writing a symbol can modify the pointer_info tree, so we cease - traversal if we find a symbol to write. We return nonzero if a - symbol was written and pass that information upwards. */ +/* Type for the temporary tree used when writing secondary symbols. */ + +struct sorted_pointer_info +{ + BBT_HEADER (sorted_pointer_info); + + pointer_info *p; +}; + +#define gfc_get_sorted_pointer_info() XCNEW (sorted_pointer_info) + +/* Recursively traverse the temporary tree, free its contents. */ + +static void +free_sorted_pointer_info_tree (sorted_pointer_info *p) +{ + if (!p) +return; + + free_sorted_pointer_info_tree (p->left); + free_sorted_pointer_info_tree (p->right); + + free (p); +} + +/* Comparison function for the temporary tree. */ static int -write_symbol1 (pointer_info *p) +compare_sorted_pointer_info (void *_spi1, void *_spi2) { - int result; + sorted_pointer_info *spi1, *spi2; + spi1 = (sorted_pointer_info *)_spi1; + spi2 = (sorted_pointer_info *)_spi2; + if (spi1->p->integer < spi2->p->integer) +return -1; + if (spi1->p->integer > spi2->p->integer) +return 1; + return 0; +} + + +/* Finds the symbols that need to be written and collects them in the + sorted_pi tree so that they can be traversed in an order + independent of memory addresses. */ + +static void +find_symbols_to_write(sorted_pointer_info **tree, pointer_info *p) +{ + if (!p) +return; + + if (p->type == P_SYMBOL && p->u.wsym.state == NEEDS_WRITE) +{ + sorted_pointer_info *sp = gfc_get_sorted_pointer_info(); + sp->p = p; + + gfc_insert_bbt (tree, sp, compare_sorted_pointer_info); + } + + find_symbols_to_write (tree, p->left); + find_symbols_to_write (tree, p->right); +} + + +/* Recursive function that traverses the tree of symbols that need to be + written and writes them in order. */ + +static void +write_symbol1_recursion (sorted_pointer_info *sp) +{ + if (!sp) +return; + + write_symbol1_recursion (sp->left); + + pointer_info *p1 = sp->p; + gcc_assert (p1->type == P_SYMBOL && p1->
Re: PR fortran/51727: make module files reproducible, question on C++ in gcc
ps I forgot to mention that I also changed write_generic to traverse the tree in defined order, this is the same in the C++ and the C-only patch. Cheers, - Tobi On 2012-10-13 15:26, Tobias Schlüter wrote: Hi, first a question also to non-gfortraners: if I want to use std::map, where do I "#include "? In system.h? Now to the patch-specific part: in this PR, module files are produced with random changes because the order in which symbols are written can depend on the memory layout. This patch fixes this by recording which symbols need to be written and then processing them in order. The patch doesn't make the more involved effort of putting all symbols into the module in an easily predicted order, instead it only makes sure that the order remains fixed across the compiler invocations. The reason why the former is difficult is that during the process of writing a symbol, it can turn out that other symbols will have to be written as well (say, because they appear in array specifications). Since the module-writing code determines which symbols to output while actually writing the file, recording all the symbols that need to be written before writing to the file would mean a lot of surgery. I'm putting forward two patches. One uses a C++ map to very concisely build up and handle the ordered list of symbols. This has three problems: 1) gfortran maintainers may not want C++isms (even though in this case it's very localized, and in my opinion very transparent), and 2) it can't be backported to old release branches which are still compiled as C. Joost expressed interested in a backport. 3) I don't know where to #include (see above) Therefore I also propose a patch where I added the necessary ~50 lines of boilerplate code and added the necessary traversal function to use gfortran's GFC_BBT to maintain the ordered tree of symbols. Both patches pass the testsuite and Joost confirms that they fix the problem with CP2K. I also verified with a few examples that they both produce identical .mod files as they should. Is the C++ patch, modified to do the #include correctly, ok for the trunk? If not, the C-only patch? Can I put the C-only patch on the release branches? And which? Cheers, - Tobi
[PR38711] Use DF_LIVE in IRA if it available (for -O2 and higher)
Hello, This fixes the long-standing enhancement request to use DF_LIVE in IRA. To quote the first comment in the PR: IRA should be using the DF-LIVE sets, which are smaller than the DF-LR sets when they are available (typically at O2 and above). The proper sets can be conveniently accessed using the df_get_live_[in,out] functions which use DF-LIVE if it is available and fall back to DF-LR if it is not. I thought that DF_LIVE wasn't available at -O1 in IRA, but interestingly ira.c:ira() adds it to the DF-problems list in that case already: $ grep -C3 -n df_live_add ira.c 4160- 4161- if (optimize == 1) 4162-{ 4163: df_live_add_problem (); 4164- df_live_set_all_dirty (); 4165-} 4166-#ifdef ENABLE_CHECKING So DF_LIVE is already being computed for IRA. It's just not being used because the DF_LR_{IN,OUT} macros are used instead of the df_get_live_{in,out} functions. With my patch, the DF_LIVE sets are used, which results in marginally faster compile times for my set of cc1-i files on power64 and on x86_64, as well as smaller code. There's also a good speedup for the PR54146 test case. Only reload.c still uses the DF_LR_OUT. The comment in find_dummy_reload() suggests this is intentional. I also fond a bug in IRA: 4392- touch the artificial uses and defs. */ 4393- df_finish_pass (true); 4394- if (optimize > 1) 4395:df_live_add_problem (); 4396- df_scan_alloc (NULL); 4397- df_scan_blocks (); 4398- At optimize>1 the DF_LIVE problem is already always computed. I think the idea here was to *remove* the DF_LIVE problem at -O1, which is also part of the attached patch. (Perhaps we should even simply not add the DF_LIVE problem at -O1, but that can be done in a follow-up patch if necessary.) Bootstrapped and tested (with default checking and with release checking) on powerpc64-unknown-linux-gnu and x86_64-unknown-linux-gnu. OK for trunk? Ciao! Steven PR rtl-optimization/38711 * df.h (df_get_live_out, df_get_live_in): Make static inline functions. * df-problems.c (df_get_live_out, df_get_live_in): Moved to df.h. * ira-lives.c (process_bb_node_lives): Use df_get_live_out instead of DF_LR_OUT. * ira-build.c (create_bb_allocnos): Likewise. (create_loop_allocnos): Likewise, and use df_get_live_in instead of DF_LR_IN. * ira-emit.c (generate_edge_moves): Likewise. (add_ranges_and_copies): Likewise. * ira.c (mark_elimination): Update DF_LR and DF_LIVE. (build_insn_chain): Use df_get_live_out instead of DF_LR_OUT. (do_reload): Remove the DF_LIVE problem for -O1. * ira-color.c (ira_loop_edge_freq): Use df_get_live_out instead of DF_LR_OUT, and df_get_live_in instead of DF_LR_IN. ira-speedup-3.diff Description: Binary data
Re: [RFC PATCH] Add support for sparc compare-and-branch.
> The trouble area, and where I need help from either Rainer or Eric, > is the Solaris2 bits. > > I think we need to move the Solaris assembler stuff over to a model > where it passes: > > -m{32,64} -xarch=sparcFOO > > instead of using the v8plusX stuff to indicate 32bit. And that's > the direction I tried to move in here. The only versions of the Solaris assembler I have access to only support v8plusX according to the man page. Has that changed recently? > Could one of you help me get the solaris side correct? I made sure > that binutils accepts the same options for this stuff, that's why > I can unconditionally use '-xarch=sparc4' in the configure test. sparc4 sounds very 1990-ish... > * config/sparc/sparc.opt (mvis4): New option. > * config/sparc/sparc-c.c (sparc_target_macros): When TARGET_VIS4, > define __VIS__ to 0x400. What's the relationship between VIS4 and SPARC-T4 exactly? The above manual only speaks of VIS3 as far as I can see. And the CBcond instructions are not marked as belonging to VIS (3 or 4), so using -mvis4 for them seems strange. Why not make them depend on -mcpu=niagara4 instead? -- Eric Botcazou
Re: [PR38711] Use DF_LIVE in IRA if it available (for -O2 and higher)
On 12-10-13 9:40 AM, Steven Bosscher wrote: Hello, This fixes the long-standing enhancement request to use DF_LIVE in IRA. To quote the first comment in the PR: IRA should be using the DF-LIVE sets, which are smaller than the DF-LR sets when they are available (typically at O2 and above). The proper sets can be conveniently accessed using the df_get_live_[in,out] functions which use DF-LIVE if it is available and fall back to DF-LR if it is not. I thought that DF_LIVE wasn't available at -O1 in IRA, but interestingly ira.c:ira() adds it to the DF-problems list in that case already: $ grep -C3 -n df_live_add ira.c 4160- 4161- if (optimize == 1) 4162-{ 4163: df_live_add_problem (); 4164- df_live_set_all_dirty (); 4165-} 4166-#ifdef ENABLE_CHECKING So DF_LIVE is already being computed for IRA. It's just not being used because the DF_LR_{IN,OUT} macros are used instead of the df_get_live_{in,out} functions. With my patch, the DF_LIVE sets are used, which results in marginally faster compile times for my set of cc1-i files on power64 and on x86_64, as well as smaller code. There's also a good speedup for the PR54146 test case. Only reload.c still uses the DF_LR_OUT. The comment in find_dummy_reload() suggests this is intentional. I also fond a bug in IRA: 4392- touch the artificial uses and defs. */ 4393- df_finish_pass (true); 4394- if (optimize > 1) 4395:df_live_add_problem (); 4396- df_scan_alloc (NULL); 4397- df_scan_blocks (); 4398- At optimize>1 the DF_LIVE problem is already always computed. I think the idea here was to *remove* the DF_LIVE problem at -O1, which is also part of the attached patch. (Perhaps we should even simply not add the DF_LIVE problem at -O1, but that can be done in a follow-up patch if necessary.) Bootstrapped and tested (with default checking and with release checking) on powerpc64-unknown-linux-gnu and x86_64-unknown-linux-gnu. OK for trunk? LIVE is not used on purpose. I added LIVE usage to the old RA about 10 years ago (it was before DF-infrastructure). Everything was ok until, somebody reported compiler crash on semantically wrong program (something with usage of uninitialized variable). After that I removed this code. Probably, that is also a reason why reload uses LR not LIVE. So I'd like to wait a few weeks. When I have more time, I am going to reproduce this test suite and look how IRA behaves on it. Thanks, Steven.
Re: [PR38711] Use DF_LIVE in IRA if it available (for -O2 and higher)
On Sat, Oct 13, 2012 at 5:12 PM, Vladimir Makarov wrote: > On 12-10-13 9:40 AM, Steven Bosscher wrote: >> >> Hello, >> >> This fixes the long-standing enhancement request to use DF_LIVE in >> IRA. To quote the first comment in the PR: >> >> IRA should be using the DF-LIVE sets, which are smaller than the DF-LR >> sets when they are available (typically at O2 and above). The proper >> sets can be conveniently accessed using the df_get_live_[in,out] >> functions which use DF-LIVE if it is available and fall back to DF-LR >> if it is not. >> >> I thought that DF_LIVE wasn't available at -O1 in IRA, but >> interestingly ira.c:ira() adds it to the DF-problems list in that case >> already: >> >> $ grep -C3 -n df_live_add ira.c >> 4160- >> 4161- if (optimize == 1) >> 4162-{ >> 4163: df_live_add_problem (); >> 4164- df_live_set_all_dirty (); >> 4165-} >> 4166-#ifdef ENABLE_CHECKING >> >> So DF_LIVE is already being computed for IRA. It's just not being used >> because the DF_LR_{IN,OUT} macros are used instead of the >> df_get_live_{in,out} functions. With my patch, the DF_LIVE sets are >> used, which results in marginally faster compile times for my set of >> cc1-i files on power64 and on x86_64, as well as smaller code. There's >> also a good speedup for the PR54146 test case. >> >> Only reload.c still uses the DF_LR_OUT. The comment in >> find_dummy_reload() suggests this is intentional. >> >> I also fond a bug in IRA: >> >> 4392- touch the artificial uses and defs. */ >> 4393- df_finish_pass (true); >> 4394- if (optimize > 1) >> 4395:df_live_add_problem (); >> 4396- df_scan_alloc (NULL); >> 4397- df_scan_blocks (); >> 4398- >> >> At optimize>1 the DF_LIVE problem is already always computed. I think >> the idea here was to *remove* the DF_LIVE problem at -O1, which is >> also part of the attached patch. >> (Perhaps we should even simply not add the DF_LIVE problem at -O1, but >> that can be done in a follow-up patch if necessary.) >> >> Bootstrapped and tested (with default checking and with release >> checking) on powerpc64-unknown-linux-gnu and x86_64-unknown-linux-gnu. >> OK for trunk? >> >> > LIVE is not used on purpose. I added LIVE usage to the old RA about 10 > years ago (it was before DF-infrastructure). Everything was ok until, > somebody reported compiler crash on semantically wrong program (something > with usage of uninitialized variable). After that I removed this code. GCC was a completely different compiler 10 years ago. Handling of uninitialized variables is radically different since tree-ssa, and also the resource allocation side of the compiler has essentially been rewritten (by you :-). Whatever broke in the compiler 10 years ago may not be relevant anymore today. I can't find your work, or the reported problem, in the gcc mailing list archives. Did you add a test case for the problem, and if so, where can I find it? > Probably, that is also a reason why reload uses LR not LIVE. The reason why reload uses DF_LR is PR20973, as documented in the code. (And FWIW, that bug doesn't reproduce even with GCC 4.0 if you add the transformations of pass_initialize_regs to it, that pass exists to avoid uninitialized regs hazards by making them initialized.) > So I'd like to > wait a few weeks. When I have more time, I am going to reproduce this test > suite and look how IRA behaves on it. I don't think that's giving this patch a fair chance. In a few week's time, stage1 will be over. You may not have time, but that doesn't mean other developers should wait. If you have a test case that fails with my patch, I'll gladly spend my own time on it to see if it can be fixed. My patch was properly tested on two primary targets and passes all tests in the test suite, it improves the compiler now, and it addresses a PR that you've basically ignored for 4 years. I kindly ask you to reconsider your position :-) Ciao! Steven
Re: [lra] patch to fix GCC crash on a SPEC2006 test
On Thu, 2012-10-11 at 23:53 -0400, Vladimir Makarov wrote: > My biggest problem is ESL. I should use simpler phrases. Heh, I'm not sure compiler speak qualifies as English. :) > Is the following comment better? > > Presence of any pseudo in CALL_INSN_FUNCTION_USAGE does not affect value > of insn_bitmap of the corresponding lra_reg_info. That is because we > don't need to reload pseudos in CALL_INSN_FUNCTION_USAGEs. So if we > process only insns in the insn_bitmap of given pseudo here, we can miss > the pseudo in some CALL_INSN_FUNCTION_USAGEs. Sure, that's better. Thanks. Peter
[PATCH] Fix gcov handling directories with periods
PR gcov-profile/44728 * gcov.c (create_file_names): When stripping extension only look at base name. --- gcc/gcov.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/gcc/gcov.c b/gcc/gcov.c index cf26ce1..09831c2 100644 --- a/gcc/gcov.c +++ b/gcc/gcov.c @@ -842,7 +842,7 @@ create_file_names (const char *file_name) } /* Remove the extension. */ - cptr = strrchr (name, '.'); + cptr = strrchr (CONST_CAST (char *, lbasename (name)), '.'); if (cptr) *cptr = 0; -- 1.7.12.3 -- Andreas Schwab, sch...@linux-m68k.org GPG Key fingerprint = 58CA 54C7 6D53 942B 1756 01D3 44D5 214B 8276 4ED5 "And now for something completely different."
Re: [patch] PR54885
> Hello, > > I hadn't realized that dead registers can still reach EQ-notes. Fixed > with the attached patch. Hi, thanks for working on this! I already comitted the testcase as gcc.dg/webizer.c so there is no need to commit it again. Honza > > Bootstrapped&tested with normal and release checking on > x86_64-unknown-linux-gnu and powerpc64-unknown-linux-gnu. OK for > trunk? > > Ciao! > Steven
Two obvious loop-iv fixes
Hi, while proofreading loop-iv for possible cause of the miscompilation that turned out to be webizer bug I noticed two minor problems. 1) determine_max_iter has path dealing with AND that is bogus, because constant argument is always canonized to be second. Enabling the path however somethimes leads to worse results, so I improved it a bit by combining both estimates. 2) bounds are recorded as signed values intead of unsigned. This means that values with upper bit set gets promoted to almost infinity for double int. Bootstrapped®tested x86_64-linux, comitted as obvious. Honza * loop-iv.c (determine_max_iter): Fix handling of AND. (iv_number_of_iterations): Record upper bounds as unsigned values. Index: loop-iv.c === --- loop-iv.c (revision 192409) +++ loop-iv.c (working copy) @@ -2224,13 +2224,18 @@ determine_max_iter (struct loop *loop, s rtx niter = desc->niter_expr; rtx mmin, mmax, cmp; unsigned HOST_WIDEST_INT nmax, inc; + unsigned HOST_WIDEST_INT andmax = 0; + + /* We used to look for constant operand 0 of AND, + but canonicalization should always make this impossible. */ + gcc_checking_assert (GET_CODE (niter) != AND + || !CONST_INT_P (XEXP (niter, 0))); if (GET_CODE (niter) == AND - && CONST_INT_P (XEXP (niter, 0))) + && CONST_INT_P (XEXP (niter, 1))) { - nmax = INTVAL (XEXP (niter, 0)); - if (!(nmax & (nmax + 1))) - return nmax; + andmax = UINTVAL (XEXP (niter, 1)); + niter = XEXP (niter, 0); } get_mode_bounds (desc->mode, desc->signed_p, desc->mode, &mmin, &mmax); @@ -2258,7 +2263,13 @@ determine_max_iter (struct loop *loop, s if (dump_file) fprintf (dump_file, ";; improved upper bound by one.\n"); } - return nmax / inc; + nmax /= inc; + if (andmax) +nmax = MIN (nmax, andmax); + if (dump_file) +fprintf (dump_file, ";; Determined upper bound "HOST_WIDEST_INT_PRINT_DEC".\n", +nmax); + return nmax; } /* Computes number of iterations of the CONDITION in INSN in LOOP and stores @@ -2563,7 +2574,7 @@ iv_number_of_iterations (struct loop *lo ? iv0.base : mode_mmin); max = (up - down) / inc + 1; - record_niter_bound (loop, double_int::from_shwi (max), + record_niter_bound (loop, double_int::from_uhwi (max), false, true); if (iv0.step == const0_rtx) @@ -2776,14 +2787,14 @@ iv_number_of_iterations (struct loop *lo desc->const_iter = true; desc->niter = val & GET_MODE_MASK (desc->mode); - record_niter_bound (loop, double_int::from_shwi (desc->niter), + record_niter_bound (loop, double_int::from_uhwi (desc->niter), false, true); } else { max = determine_max_iter (loop, desc, old_niter); gcc_assert (max); - record_niter_bound (loop, double_int::from_shwi (max), + record_niter_bound (loop, double_int::from_uhwi (max), false, true); /* simplify_using_initial_values does a copy propagation on the registers
Re: Two obvious loop-iv fixes
On Sat, Oct 13, 2012 at 6:54 PM, Jan Hubicka wrote: > 2) bounds are recorded as signed values intead of unsigned. This means >that values with upper bit set gets promoted to almost infinity for >double int. Could you check and see if this change fixes PR54919? Ciao! Steven
Re: Use conditional casting with symtab_node
On 10/12/12, Richard Biener wrote: > On Oct 11, 2012 Xinliang David Li wrote: >> On Oct 11, 2012 Lawrence Crowl wrote: >>> On 10/10/12, Xinliang David Li wrote: In a different thread, I proposed the following alternative to 'try_xxx': template T* symbol::cast_to(symbol* p) { if (p->is()) return static_cast(p); return 0; } cast: template T& symbol:as(symbol* p) { assert(p->is()) return static_cast(*p); } >>> >>> My concern here was never the technical feasibility, nor the >>> desirability of having the facility for generic code, but the amount >>> of the amount of typing in non-generic contexts. That is >>> >>> if (cgraph_node *q = p->try_function ()) >>> >>> versus >>> >>> if (cgraph_node *q = p->cast_to ()) >>> >>> I thought the latter would generate objections. Does anyone object >>> to the extra typing? >>> >>> If not, I can make that change. >> >> Think about a more complex class hierarchy and interface consistency. >> I believe you don't want this: >> >> tree::try_decl () { .. } >> tree::try_ssa_name () { ..} >> tree::try_type() {...} >> tree::try_int_const () {..} >> tree::try_exp () { ...} >> .. >> >> Rather one (or two with the const_cast version). >> >> template T *tree::cast_to () >> { >>if (kind_ == kind_traits::value) >> return static_cast (this); >> >> return 0; >> } > > I also think that instead of > >>> if (cgraph_node *q = p->cast_to ()) > > we want > > if ((q = cast_to (p)) > > I see absolutely no good reason to make cast_to a member, given > that the language has static_cast, const_cast and stuff. cast_to > would simply be our equivalent to dynamic_cast within our OO model. Sorry, that was a think-o. Agreed there is no point in it being a member. > Then I'd call it *_cast instead of cast_*, so, why not gcc_cast < >? > Or dyn_cast <> (). That way > > if ((q = dyn_cast (p)) > > would be how to use it. Which function? The point with my original proposal is that the kind of function was derivable from context, and thus can be shorter. > Small enough, compared to > > if ((q = p->try_function ())) > > and a lot more familiar to folks knowing C++ (and probably doesn't > make a difference to others). > > Thus, please re-use or follow existing concepts. Both are existing concepts. What I proposed is a common technique for avoiding the cost of dynamic_cast when down casting in a class hierarchy. Both concepts will work. I proposed what I thought would be most convenient to programmers. However, I will change to the other form unless someone objects. > As for the example with tree we'd then have > > if ((q = dyn_cast (p))) > > if we can overload on the template parameter kind (can we? > typename vs. enum?) There are two template parameters, one for the argument type and one for the return type. We can overload on the argument type but not on the return type. We can, however, (indirectly) partially specialize on the return type. We need to do that anyway to represent the fact that not all type conversions are legal. However, I recommend against specializing on the enum, as it would become somewhat obscure when the hierarchy is discriminated on more than one enum. > or use tree_cast <> (no I don't want dyn_cast > all around the code). Once we have proper class hierarchies, derived to base conversions are implicit. In the meantime, we need something else. -- Lawrence Crowl
Re: [PATCH] Fix gcov handling directories with periods
On Sat, Oct 13, 2012 at 8:51 AM, Andreas Schwab wrote: > PR gcov-profile/44728 > * gcov.c (create_file_names): When stripping extension only look > at base name. > diff --git a/gcc/gcov.c b/gcc/gcov.c > index cf26ce1..09831c2 100644 > --- a/gcc/gcov.c > +++ b/gcc/gcov.c > @@ -842,7 +842,7 @@ create_file_names (const char *file_name) > } > >/* Remove the extension. */ > - cptr = strrchr (name, '.'); > + cptr = strrchr (CONST_CAST (char *, lbasename (name)), '.'); >if (cptr) > *cptr = 0; Why do you need the CONST_CAST? strrchr is a standard function and it takes const char * as the first argument. There is other code in gcc that calls strrchr with a const char * argument. This patch is OK without the CONST_CAST. Thanks. Ian
Re: Use conditional casting with symtab_node
On Sat, Oct 13, 2012 at 12:44 PM, Lawrence Crowl wrote: >> Thus, please re-use or follow existing concepts. > > Both are existing concepts. What I proposed is a common technique > for avoiding the cost of dynamic_cast when down casting in a class > hierarchy. Both concepts will work. I proposed what I thought > would be most convenient to programmers. However, I will change > to the other form unless someone objects. Let me elaborate on your point. The concept you are trying to implement is that of: (a) check whether a tree is of a certain kind; (b) if so return a pointer to the (sub-)object of that kind; otherwise a null pointer. This is a standard idiom -- at least when using C++. It can be implemented in various ways. Your earlier attempt try_xxx is one example. Another common example, built into the language is dynamic_cast. The latter requires that classes involved in this test be polymorphic (because the builtin implementation needs to consult metadata that are already present with virtual functions.) Yet, another implementation is what is currently in GCC more-or-less. I think we should name the operation based on the abstract concept we are implementing as opposed the builtin language implementation. That is why I recommend is<> over dyn_cast<> or variations of it. We should be able to understand its uses without having to know the implementation details. > However, I recommend against specializing on the enum, as it would > become somewhat obscure when the hierarchy is discriminated on more > than one enum. 100% agreed. -- Gaby
Re: PR fortran/51727: make module files reproducible, question on C++ in gcc
On 2012-10-13 09:26 , Tobias Schlüter wrote: Hi, first a question also to non-gfortraners: if I want to use std::map, where do I "#include "? In system.h? No. Ada includes system.h in pure C code. Why not include it exactly where you need it? Diego.
Re: PR fortran/51727: make module files reproducible, question on C++ in gcc
On 2012-10-13 20:00, Diego Novillo wrote: On 2012-10-13 09:26 , Tobias Schlüter wrote: first a question also to non-gfortraners: if I want to use std::map, where do I "#include "? In system.h? No. Ada includes system.h in pure C code. Why not include it exactly where you need it? Ok, I was just wondering if there's a rule because a quick grep revealed no non-header source file that includes system headers. Thanks, - Tobi
Re: PR fortran/51727: make module files reproducible, question on C++ in gcc
On 2012-10-13 14:04 , Tobias Schlüter wrote: On 2012-10-13 20:00, Diego Novillo wrote: On 2012-10-13 09:26 , Tobias Schlüter wrote: first a question also to non-gfortraners: if I want to use std::map, where do I "#include "? In system.h? No. Ada includes system.h in pure C code. Why not include it exactly where you need it? Ok, I was just wondering if there's a rule because a quick grep revealed no non-header source file that includes system headers. Joseph or others may have reason to create a system.hxx file or some such, but in principle I don't see why we shouldn't include std library headers as we need them. Joseph? Diego.
Re: [PATCH] Fix gcov handling directories with periods
Ian Lance Taylor writes: > Why do you need the CONST_CAST? strrchr is a standard function and it > takes const char * as the first argument. There is other code in gcc > that calls strrchr with a const char * argument. strrchr is overloaded as const and non-const in C++. We need the non-const version. Andreas. -- Andreas Schwab, sch...@linux-m68k.org GPG Key fingerprint = 58CA 54C7 6D53 942B 1756 01D3 44D5 214B 8276 4ED5 "And now for something completely different."
Re: [RFC] find_reloads_subreg_address rework triggers i386 back-end issue
On Fri, Oct 12, 2012 at 7:57 PM, Ulrich Weigand wrote: > Hello, > > I was running a couple of tests on various platforms in preparation > of getting the find_reload_subreg_address patch needed by aarch64 upstream: > http://gcc.gnu.org/ml/gcc-patches/2012-07/msg01421.html > > This unfortunately uncovered a regression in vect-98-big-array.c on i386. > It seems to me that this is due to a pre-existing problem in the back-end. [...] > But in principle the problem is latent even without the patch. The back-end > should not permit reload to make changes to the pattern that fundamentally > change the semantics of the resulting instruction ... > > I was wondering if the i386 port maintainers could have a look at this > pattern. Shouldn't we really have two patterns, one to *load* an unaligned > value and one to *store* and unaligned value, and not permit that memory > access to get reloaded? Please find attached a fairly mechanical patch that splits move_unaligned pattern into load_unaligned and store_unaligned patterns. We've had some problems with this pattern, and finally we found the reason to make unaligned moves more robust. I will wait for the confirmation that attached patch avoids the failure you are seeing with your reload patch. > [I guess a more fundamental change might also be to not have an unspec > in the first place, but only plain moves, and check MEM_ALIGN in the > move insn emitter to see which variant of the instruction is required ...] We already do this for AVX moves, but SSE aligned moves should ICE on unaligned access. This is handy to find errors in user programs (or ... well ... in the compiler itself). Uros. Index: config/i386/i386.c === --- config/i386/i386.c (revision 192420) +++ config/i386/i386.c (working copy) @@ -16059,7 +16059,8 @@ ix86_avx256_split_vector_move_misalign (rtx op0, r { rtx m; rtx (*extract) (rtx, rtx, rtx); - rtx (*move_unaligned) (rtx, rtx); + rtx (*load_unaligned) (rtx, rtx); + rtx (*store_unaligned) (rtx, rtx); enum machine_mode mode; switch (GET_MODE (op0)) @@ -16068,39 +16069,52 @@ ix86_avx256_split_vector_move_misalign (rtx op0, r gcc_unreachable (); case V32QImode: extract = gen_avx_vextractf128v32qi; - move_unaligned = gen_avx_movdqu256; + load_unaligned = gen_avx_loaddqu256; + store_unaligned = gen_avx_storedqu256; mode = V16QImode; break; case V8SFmode: extract = gen_avx_vextractf128v8sf; - move_unaligned = gen_avx_movups256; + load_unaligned = gen_avx_loadups256; + store_unaligned = gen_avx_storeups256; mode = V4SFmode; break; case V4DFmode: extract = gen_avx_vextractf128v4df; - move_unaligned = gen_avx_movupd256; + load_unaligned = gen_avx_loadupd256; + store_unaligned = gen_avx_storeupd256; mode = V2DFmode; break; } - if (MEM_P (op1) && TARGET_AVX256_SPLIT_UNALIGNED_LOAD) + if (MEM_P (op1)) { - rtx r = gen_reg_rtx (mode); - m = adjust_address (op1, mode, 0); - emit_move_insn (r, m); - m = adjust_address (op1, mode, 16); - r = gen_rtx_VEC_CONCAT (GET_MODE (op0), r, m); - emit_move_insn (op0, r); + if (TARGET_AVX256_SPLIT_UNALIGNED_LOAD) + { + rtx r = gen_reg_rtx (mode); + m = adjust_address (op1, mode, 0); + emit_move_insn (r, m); + m = adjust_address (op1, mode, 16); + r = gen_rtx_VEC_CONCAT (GET_MODE (op0), r, m); + emit_move_insn (op0, r); + } + else + emit_insn (load_unaligned (op0, op1)); } - else if (MEM_P (op0) && TARGET_AVX256_SPLIT_UNALIGNED_STORE) + else if (MEM_P (op0)) { - m = adjust_address (op0, mode, 0); - emit_insn (extract (m, op1, const0_rtx)); - m = adjust_address (op0, mode, 16); - emit_insn (extract (m, op1, const1_rtx)); + if (TARGET_AVX256_SPLIT_UNALIGNED_STORE) + { + m = adjust_address (op0, mode, 0); + emit_insn (extract (m, op1, const0_rtx)); + m = adjust_address (op0, mode, 16); + emit_insn (extract (m, op1, const1_rtx)); + } + else + emit_insn (store_unaligned (op0, op1)); } else -emit_insn (move_unaligned (op0, op1)); +gcc_unreachable (); } /* Implement the movmisalign patterns for SSE. Non-SSE modes go @@ -16195,7 +16209,7 @@ ix86_expand_vector_move_misalign (enum machine_mod op0 = gen_lowpart (V16QImode, op0); op1 = gen_lowpart (V16QImode, op1); /* We will eventually emit movups based on insn attributes. */ - emit_insn (gen_sse2_movdqu (op0, op1)); + emit_insn (gen_sse2_loaddqu (op0, op1)); } else if (TARGET_SSE2 && mode == V2DFmode) { @@ -16207,7 +16221,7 @@ ix86_expand_vector_move_misalign (enum machine_mod || optimize_function_for_size_p (cfun)) { /* We will event
Re: encoding all aliases options in .opt files
On 12 October 2012 17:18, Joseph S. Myers wrote: > On Fri, 12 Oct 2012, Manuel López-Ibáñez wrote: > >> I am trying to encode the relationship between Wstrict-aliasing and >> Wstrict-aliasing= in the .opt files, and the same for Wstrict-overflow >> and Wstrict-overflow=. However, the parameters of Alias() are taken as >> strings, so we get "3" and "WARN_STRICT_OVERFLOW_CONDITIONAL". Do you >> have a suggestion how to handle this? > > I don't see what the problem is supposed to be. Yes, the arguments are > strings - they describe how one option that the user might pass on the > command line is exactly equivalent to another that they might pass (up to > and including the latter form of the option being the one that is used > when matching specs and multilibs, because the driver does the > translation, including working out the canonical textual form of the > option, before processing specs against the command line). > > If -Wstrict-overflow on the command line is equivalent to > -Wstrict-overflow=WARN_STRICT_OVERFLOW_CONDITIONAL on the command line, > then specify WARN_STRICT_OVERFLOW_CONDITIONAL in the parameters. If it's > equivalent to something else, specify something else. > >> My next step would be to handle aliases also when internally >> generating options, so I can add EnabledBy(Wall) to Wstrict-aliasing. >> Does this sound ok to you? > > That seems reasonable. OK. The attached patch implements this. Does the approach look ok? I will write changelog and more comments if it seems reasonable. One thing I don't like is that now we print [-Werror=strict-aliasing=] in diagnostics. Two questions: 1) Are references allowed now that C++ is the default? 2) While fixing this, I was thinking: Why the difference between Joined and Separate? Why not make every function that takes an argument work with and without '=' and with separate argument. It seems we could remove a lot of options this way. What I am proposing is, instead of: aux-info Common Separate Var(aux_info_file_name) -aux-info Emit declaration information into aux-info= Common Joined Alias(aux-info) simply aux-info Common RequiredArgument Var(aux_info_file_name) -aux-info Emit declaration information into and all of these would be accepted: aux-infoFILE /* we could accept this to be compatible with some options like -B */ aux-info=FILE aux-info FILE We have dozens of options like these. Moreover, we could replace: Wstrict-aliasing Common Alias(Wstrict-aliasing=, 3, 0) Warning Warn about code which might break strict aliasing rules Wstrict-aliasing= Common Joined RejectNegative UInteger Var(warn_strict_aliasing) Warning Warn about code which might break strict aliasing rules with: Wstrict-aliasing Common MissingArgument(3) UInteger Var(warn_strict_aliasing) Warning Warn about code which might break strict aliasing rules and all of these will be accepted: Wstrict-aliasing Wno-strict-aliasing Wstrict-aliasing2 Wstrict-aliasing=2 Wstrict-aliasing 2 Werror=strict-aliasing Wno-error=strict-aliasing Cheers, Manuel. lang-enabled-by-with-args.diff Description: Binary data
Re: Add usage documentation for hash_table
On 10/12/12, Diego Novillo wrote: > Add usage documentation for hash_table. > > Andrew, does this help? > > Lawrence, I think I've gotten the details right, but please confirm. The patch merges the descriptor class with the element class, which we do not currently do and which I don't think we should do. If that class is used in a tree, it would be illegal. I'll work with Diego to address the issue. > > Tested by re-building stage 1. > > > * hash-table.h: Add usage documentation. > Tidy formatting. > > diff --git a/gcc/hash-table.h b/gcc/hash-table.h > index 3aa66a7..0c51be6 100644 > --- a/gcc/hash-table.h > +++ b/gcc/hash-table.h > @@ -21,8 +21,121 @@ along with GCC; see the file COPYING3. If not see > > > /* This file implements a typed hash table. > - The implementation borrows from libiberty's hashtab. */ > + The implementation borrows from libiberty's htab_t. > > + Hash tables are instantiated with two type arguments: > + > + 1- Element: A type describing the elements in the table. > + This type must provide 3 declarations: > + > + - A typedef to create the type Element::T. This is the type > + of the elements stored in the table. So, if you want the > + hash table of MyType elements, declare 'typedef MyType T;'. > + > + - A function named 'hash' that takes a pointer to Element::T > + and returns a hashval_t value. This is the hashing > + function. > + > + - A function named 'equal' that takes two pointers to > + Element::T and returns an int. This is the comparison > + function. It should return nonzero if the elements are > + equal, 0 otherwise. > + > + - A static function named 'remove' that takes an Element::T > + pointer and frees the memory allocated by it. This is used > + when individual elements of the table need to be disposed of > + (e.g., when deleting a hash table, removing elements from > + the table, etc). > + > + 2- Allocator: A template type implementing allocation and deallocation > for > + the table itself. > + > + This type takes as argument a type passed on by the hash table > + allocation and deallocation functions. It must provide four > + static functions: > + > + - Allocator::control_alloc: This allocates the control data > + blocks for the table. > + > +- Allocator::control_free: This frees the control data blocks > + for the table. > + > +- Allocator::data_alloc: This allocates the data elements in > + the table. > + > + - Allocator::data_free: This deallocates the data elements in > + the table. > + > + In general, you will not need to provide your own Allocator type. > + By default, hash tables will use the class xcallocator, which uses > + malloc/free for allocation. > + > + Additionally, this file provides two common types for implementing > + element removal: > + > + - typed_free_remove: it implements the method 'remove' to call > + free(). > + > + - typed_noop_remove: it implements the method 'remove' to do > + nothing. > + > + To use either of these removal strategies, simply make your type a > + derived class of one of these two. > + > + Example usage: > + > + 1- A hash table for MyType: > + > + // Derive from typed_noop_remove so element removal does nothing. > + class MyType : typed_noop_remove > + { > + int f1; > + OtherType f2; > + > + // Hash table support. Need a typedef and 2 static functions. > + > + // 'T' is the type used in all the hash table functions. > + typedef MyType T; > + > + // The hashing function. 'T' and 'MyType' are equivalent here. > + static inline hashval_t hash (const MyType *); > + > + // The equality function. 'T' and 'MyType' are equivalent here. > + static inline int equal (const MyType *, const MyType *); > + }; > + > + inline hashval_t > + MyType::hash (const MyType *e) > + { ... compute and return a hash value for E ... } > + > + inline int > + MyType::equal (const MyType *p1, const MyType *p2) > + { ... compare P1 vs P2. Return 1 if they are the same ... } > + > + > + Note that since MyType derives from typed_noop_remove, it does > not > + need to provide a 'remove' function. It inherits it from > + typed_noop_remove. > + > + To instantiate a hash table for MyType: > + > + hash_table mytype_hash; > + > + You can then used any of the functions in hash_table's public interface. > + See hash_table for details. The interface is very similar to > libiberty's > + htab_t. > + > + > + 2- A hash table of pointers. > + > + This file provides the template type 'pointer_hash' which can be > + used to create hash tables of pointers to any type. This class uses > + the same hashing function used by libiberty's hash_pointer. > + > + To create a hash table of pointers to MyType: > + > + hash_table > ptr_htab; > +*/ > >
Re: [PATCH, libstdc++] Fix missing gthr-default.h issue on libstdc++ configure
On Fri, Oct 12, 2012 at 10:59 AM, Paolo Carlini wrote: > On 10/12/2012 04:20 PM, Pavel Chupin wrote: >> >> Please see attached patch (applicable after revert). >> I've moved libgcc libstdc++ common configure thread header chunk into >> separate gthr.m4. >> Could you please try it on AIX? >> >> Is it OK for trunk? > > Looks Ok. If David can test is successfully on AIX I can approve it. I was able to bootstrap successfully with the patch. Thanks, David
Re: [PATCH] Fix gcov handling directories with periods
On Sat, Oct 13, 2012 at 11:23 AM, Andreas Schwab wrote: > Ian Lance Taylor writes: > >> Why do you need the CONST_CAST? strrchr is a standard function and it >> takes const char * as the first argument. There is other code in gcc >> that calls strrchr with a const char * argument. > > strrchr is overloaded as const and non-const in C++. We need the > non-const version. Oh yeah. Really we should overload lbasename the same way. Suppose you drop this into include/libiberty.h: #ifdef __cplusplus inline char *lbasename(char *s) { return const_cast(lbasename (s)); } #endif I'll preapprove that if it works. Ian
Re: encoding all aliases options in .opt files
On Sat, Oct 13, 2012 at 11:33 AM, Manuel López-Ibáñez wrote: > > 1) Are references allowed now that C++ is the default? I'm not sure we addressed those in the coding conventions. I would like to say: 1) const references are always fine; 2) non-const references as local variables are always fine; 3) non-const references as function parameters should never be used without a really excellent reason. The reason is simply that using non-const references as function parameters means that arguments passed to a function can change without any indication, and that is potentially quite confusing. Adding an & before an output argument is easy enough, and we're all used to it. Ian
Re: Add usage documentation for hash_table
On 2012-10-13 14:40 , Lawrence Crowl wrote: On 10/12/12, Diego Novillo wrote: Add usage documentation for hash_table. Andrew, does this help? Lawrence, I think I've gotten the details right, but please confirm. The patch merges the descriptor class with the element class, which we do not currently do and which I don't think we should do. If that class is used in a tree, it would be illegal. You mean I should s/Element/Descriptor/? Done. I included the formatting changes in the doc patch, but I can easily split the patch if you prefer. OK with those changes? Diego.
Re: [PATCH] Fix gcov handling directories with periods
Ian Lance Taylor writes: > Suppose you drop this into include/libiberty.h: > > #ifdef __cplusplus > inline char *lbasename(char *s) { return const_cast(lbasename (s)); } > #endif That doesn't work: ../../gcc/libcpp/../include/libiberty.h: In function ‘char* lbasename(char*)’: ../../gcc/libcpp/../include/libiberty.h:123:31: error: declaration of C function ‘char* lbasename(char*)’ conflicts with ../../gcc/libcpp/../include/libiberty.h:121:20: error: previous declaration ‘const char* lbasename(const char*)’ here Andreas. -- Andreas Schwab, sch...@linux-m68k.org GPG Key fingerprint = 58CA 54C7 6D53 942B 1756 01D3 44D5 214B 8276 4ED5 "And now for something completely different."
Fix PR rtl-optimization/54871
This is the execution failure of gfortran.dg/vector_subscript_1.f90 on SPARC/Solaris at -O3 -funroll-loops. The loop2_unroll dump of the reduced testcase reads: Loop 3 is simple: simple exit 10 -> 12 number of iterations: (const_int -1 [0x]) upper bound: 1073741823 realistic bound: -1 The number of iterations is of course bogus. The problem is in simplify_using_initial_values: it uses the condition in: (insn 105 104 106 6 (set (reg:CC 100 %icc) (compare:CC (reg:SI 153 [ D.1086 ]) (reg:SI 229 [ D.1086 ]))) vector_subscript_1.f90:21 1 {*cmpsi_insn} (expr_list:REG_DEAD (reg:SI 229 [ D.1086 ]) (nil))) to record an equivalence between (reg:SI 153) and (reg:SI 229) and then: (insn 179 101 103 6 (set (reg:SI 229 [ D.1086 ]) (reg:SI 245 [ D.1086 ])) vector_subscript_1.f90:21 -1 (nil)) plus the equivalence to simplify the niter expression. But there is: (insn 104 103 105 6 (set (reg:SI 229 [ D.1086 ]) (if_then_else:SI (ge:CC (reg:CC 100 %icc) (const_int 0 [0])) (reg:SI 132 [ D.1086 ]) (reg:SI 245 [ D.1086 ]))) vector_subscript_1.f90:21 105 {*movsi_cc_v9} (expr_list:REG_EQUAL (if_then_else:SI (ge:CC (reg:CC 100 %icc) (const_int 0 [0])) (reg:SI 132 [ D.1086 ]) (const_int 0 [0])) (expr_list:REG_DEAD (reg:SI 132 [ D.1086 ]) (expr_list:REG_DEAD (reg:CC 100 %icc) (nil) between the two above instructions, which invalidates the equivalence. So the equivalence should have been pruned when insn 104 was processed. The patch also does a bit of cosmetic cleanup in loop-unroll.c, for example in the dump file. We have in the loop2_unroll dump of the original testcase: ;; *** Considering loop 3 *** ;; Considering unrolling loop with constant number of iterations ;; max_unroll 5 (6 copies, initial 5). ;; Decided to unroll the constant times rolling loop, 4 times. Question for the reader: how many times is the loop unrolled? 4, 5 or 6? :-) In fact, the "max_unroll" is always equal to the "times" + 1 and "initial" is an internal initial value for "max_unroll". Morever, this "max_unroll" is _not_ equal to the max_unroll in unroll_loop_constant_iterations: unsigned max_unroll = loop->lpt_decision.times; which is equal to the "times" instead. And we also have comments like: /* Unroll LOOP with constant number of iterations LOOP->LPT_DECISION.TIMES + 1 times. so LOOP->LPT_DECISION.TIMES (the "times" above) isn't really what one might think it is. In the end, this is mightily confusing. Tested on x86_64-suse-linux and SPARC/Solaris, applied on the mainline. 2012-10-13 Eric Botcazou PR rtl-optimization/54871 * loop-iv.c (simplify_using_initial_values): When scanning previous basic blocks, prune the recorded conditions if the current insn was not used to make a replacement. * loop-unroll.c (decide_unroll_constant_iterations): Clean up message. (unroll_loop_constant_iterations): Clarify head comment. (decide_unroll_runtime_iterations): Clean up message. (unroll_loop_runtime_iterations): Clarify head comment. (decide_peel_simple): Clean up message. (peel_loop_simple): Clarify head comment. (decide_unroll_stupid): Clean up message. (unroll_loop_stupid): Clarify head comment. -- Eric BotcazouIndex: loop-iv.c === --- loop-iv.c (revision 192353) +++ loop-iv.c (working copy) @@ -2004,11 +2004,30 @@ simplify_using_initial_values (struct lo } } else - /* If we did not use this insn to make a replacement, any overlap - between stores in this insn and our expression will cause the - expression to become invalid. */ - if (for_each_rtx (expr, altered_reg_used, this_altered)) - goto out; + { + rtx *pnote, *pnote_next; + + /* If we did not use this insn to make a replacement, any overlap + between stores in this insn and our expression will cause the + expression to become invalid. */ + if (for_each_rtx (expr, altered_reg_used, this_altered)) + goto out; + + /* Likewise for the conditions. */ + for (pnote = &cond_list; *pnote; pnote = pnote_next) + { + rtx note = *pnote; + rtx old_cond = XEXP (note, 0); + + pnote_next = &XEXP (note, 1); + if (for_each_rtx (&old_cond, altered_reg_used, this_altered)) + { + *pnote = *pnote_next; + pnote_next = pnote; + free_EXPR_LIST_node (note); + } + } + } if (CONSTANT_P (*expr)) goto out; Index: loop-unroll.c === --- loop-unroll.c (revision 192353) +++ loop-unroll.c (working copy) @@ -602,26 +602,21 @@ decide_unroll_constant_iterations (struc } } - if (dump_file) -fprintf (dump_file, ";; max_unroll %d (%d c
Re: [PR38711] Use DF_LIVE in IRA if it available (for -O2 and higher)
> IRA should be using the DF-LIVE sets, which are smaller than the DF-LR > sets when they are available (typically at O2 and above). The proper > sets can be conveniently accessed using the df_get_live_[in,out] > functions which use DF-LIVE if it is available and fall back to DF-LR > if it is not. > [...] > At optimize>1 the DF_LIVE problem is already always computed. I think > the idea here was to *remove* the DF_LIVE problem at -O1, which is > also part of the attached patch. > (Perhaps we should even simply not add the DF_LIVE problem at -O1, but > that can be done in a follow-up patch if necessary.) So, in the end, does the patch enable DF_LIVE at -O1 or not? There seems to be a contradiction between the subject and the body of the message. If yes, perhaps an acceptable compromise would be to keep things unchanged at -O1. -- Eric Botcazou
Re: [PR38711] Use DF_LIVE in IRA if it available (for -O2 and higher)
On Sat, Oct 13, 2012 at 10:38 PM, Eric Botcazou wrote: > So, in the end, does the patch enable DF_LIVE at -O1 or not? There seems to > be a contradiction between the subject and the body of the message. If yes, > perhaps an acceptable compromise would be to keep things unchanged at -O1. Eh, right. When I wrote the e-mail, I was still under the impression that IRA only has DF_LIVE available at -O2 and higher, and I only discovered that DF_LIVE is added if optimize==1 just before submitting the patch. So my patch as submitted should have said "... (for -O1 and higher)". I think it would be a good idea to keep things unchanged at -O1. For that, the patch needs a few minor modifications (remove calls to df_live_add_problem and make some code to update DF_LIVE_{IN,OUT} conditional). I can prepare an updated patch for that, if you think that's best. Ciao! Steven
Re: Make try_unroll_loop_completely to use loop bounds recorded
> On Fri, 12 Oct 2012, Jan Hubicka wrote: > > > > > * f95-lang.c (gfc_init_builtin_functions): Build > > > > __builtin_unreachable. > > > > > > I wonder if other languages need similar adjustment? > > > > I also wondered ;) Only Fortran triggered, I will take a look. > > > > > > + /* Now destroy the loop. First try to do so by cancelling the > > > + patch from exit conditional if we identified good candidate. > > > + > > > > > > you mean 'path' here, right? Similar in other places. > > > > Yeah. > > > > > > + /* Unloop destroys the latch edge. */ > > > + unloop (loop, &irred_invalidated); > > > + if (irred_invalidated) > > > + mark_irreducible_loops (); > > > > > > this makes the whole thing quadratic in the number of loops. > > > Please pass down a flag and handle this in the place we > > > update SSA form (thus, once per function / unroll iteration). > > > > This was in my first version of the patch. Then I noticed that remove_path > > also relies on irreducible loops to be computed and does the exactly same > > logic as I do. > > So we would need to extend the &irred_invalidated to remove_path as well. > > Well, then do that. Or at least constrain the number of loops > that are updated in mark_irreducible_loops (though that would > still be quadratic in the loop depth then?). We solved scalability > issues of cunroll for 4.8, so adding new ones isn't going to fly. Well, limiting number of mark_irreducible_loops updates is probably quite easy to do. unloop is returning true just in quite corner cases and actually we do have this quadratic complexity issue in the RTL unroller forever. > > > > > > > Or provide a more optimial implementation that only considers > > > parent loops of 'loop' (even that is excessive though, quadratic > > > in the loop depth). > > > > > > - FOR_EACH_LOOP (li, loop, 0) > > > + for (fel_init (&li, &next, LI_ONLY_INNERMOST); next;) > > > > > > FOR_EACH_LOOP (li, loop, LI_ONLY_INNERMOST) > > > > > > but I'm sure that IV canonicalization should happen for all loops. > > > So, why do you need this change? Esp. we should get rid of > > > single-iteration outer loops here, too. > > > > FOR_EACH_LOOP seems to assume that loops are not cancelled under its hand. > > I > > was running into case where we cancelled inner loop but did not update SSA > > and > > tried to estimate number of iterations in the outer loop (now also > > innermost) > > that went into infinite loop. > > Well, that's not true (it uses fel_* stuff), FOR_EACH_LOOP constructs > a vector of loops in the desired order it walks over. As for estimation I go for next loop before possibly cancelling it, while FOR_EACH_LOOP will go from possibly cancelled loop to next. > you probably need to invalidate the scev cache? The current logic I do not think the _by_eval function uses much of scev cache. Here it was simply walking PHI from infinite loop that was artefact of earlier update. > explicitely walks all innermost loops and then updates, you cannot > walk an outer loop of an inner loop that changed without updating SSA > form for example. I think the fel terators will do it when you are cancelling the loops on the go. I will dig out into this more. Honza > > Richard. > > > Honza > > > > > > - FOR_EACH_LOOP (li, loop, LI_ONLY_INNERMOST) > > > + for (fel_init (&li, &next, 0); next;) > > > > > > see above - FOR_EACH_LOOP (li, loop, 0) > > > > > > Otherwise looks ok. Please update and re-post. > > > > > > Thanks, > > > Richard. > > > > > > > > > > Index: testsuite/gcc.dg/tree-ssa/cunroll-1.c > > > > === > > > > --- testsuite/gcc.dg/tree-ssa/cunroll-1.c (revision 0) > > > > +++ testsuite/gcc.dg/tree-ssa/cunroll-1.c (revision 0) > > > > @@ -0,0 +1,12 @@ > > > > +/* { dg-do compile } */ > > > > +/* { dg-options "-O3 -fdump-tree-cunroll-details" } */ > > > > +int a[2]; > > > > +test(int c) > > > > +{ > > > > + int i; > > > > + for (i=0;i > > > +a[i]=5; > > > > +} > > > > +/* Array bounds says the loop will not roll much. */ > > > > +/* { dg-final { scan-tree-dump-not "Unrolled loop 1 completely > > > > (duplicated 1 times). Last iteration exit edge was proved true." > > > > "cunroll"} } */ > > > > +/* { dg-final { cleanup-tree-dump "cunroll" } } */ > > > > Index: testsuite/gcc.dg/tree-ssa/cunroll-2.c > > > > === > > > > --- testsuite/gcc.dg/tree-ssa/cunroll-2.c (revision 0) > > > > +++ testsuite/gcc.dg/tree-ssa/cunroll-2.c (revision 0) > > > > @@ -0,0 +1,16 @@ > > > > +/* { dg-do compile } */ > > > > +/* { dg-options "-O3 -fdump-tree-cunroll-details" } */ > > > > +int a[2]; > > > > +test(int c) > > > > +{ > > > > + int i; > > > > + for (i=0;i > > > +{ > > > > + a[i]=5; > > > > + if (test2()) > > > > + return; > > > > +} > > > > +} > > > > +/* We are not ab
Re: [PR38711] Use DF_LIVE in IRA if it available (for -O2 and higher)
> I think it would be a good idea to keep things unchanged at -O1. For > that, the patch needs a few minor modifications (remove calls to > df_live_add_problem and make some code to update DF_LIVE_{IN,OUT} > conditional). I can prepare an updated patch for that, if you think > that's best. That was just a proposal to be put on the table. :-) Others, especially Vlad of course, have the final say. And I somewhat sympathize with the cautious approach here, because we were forced to downgrade from DF_LIVE to DF_LR for resource.c at some point (but of course it's resource.c so...). -- Eric Botcazou
Re: [PR38711] Use DF_LIVE in IRA if it available (for -O2 and higher)
On 12-10-13 11:37 AM, Steven Bosscher wrote: On Sat, Oct 13, 2012 at 5:12 PM, Vladimir Makarov wrote: On 12-10-13 9:40 AM, Steven Bosscher wrote: Hello, This fixes the long-standing enhancement request to use DF_LIVE in IRA. To quote the first comment in the PR: IRA should be using the DF-LIVE sets, which are smaller than the DF-LR sets when they are available (typically at O2 and above). The proper sets can be conveniently accessed using the df_get_live_[in,out] functions which use DF-LIVE if it is available and fall back to DF-LR if it is not. I thought that DF_LIVE wasn't available at -O1 in IRA, but interestingly ira.c:ira() adds it to the DF-problems list in that case already: $ grep -C3 -n df_live_add ira.c 4160- 4161- if (optimize == 1) 4162-{ 4163: df_live_add_problem (); 4164- df_live_set_all_dirty (); 4165-} 4166-#ifdef ENABLE_CHECKING So DF_LIVE is already being computed for IRA. It's just not being used because the DF_LR_{IN,OUT} macros are used instead of the df_get_live_{in,out} functions. With my patch, the DF_LIVE sets are used, which results in marginally faster compile times for my set of cc1-i files on power64 and on x86_64, as well as smaller code. There's also a good speedup for the PR54146 test case. Only reload.c still uses the DF_LR_OUT. The comment in find_dummy_reload() suggests this is intentional. I also fond a bug in IRA: 4392- touch the artificial uses and defs. */ 4393- df_finish_pass (true); 4394- if (optimize > 1) 4395:df_live_add_problem (); 4396- df_scan_alloc (NULL); 4397- df_scan_blocks (); 4398- At optimize>1 the DF_LIVE problem is already always computed. I think the idea here was to *remove* the DF_LIVE problem at -O1, which is also part of the attached patch. (Perhaps we should even simply not add the DF_LIVE problem at -O1, but that can be done in a follow-up patch if necessary.) Bootstrapped and tested (with default checking and with release checking) on powerpc64-unknown-linux-gnu and x86_64-unknown-linux-gnu. OK for trunk? LIVE is not used on purpose. I added LIVE usage to the old RA about 10 years ago (it was before DF-infrastructure). Everything was ok until, somebody reported compiler crash on semantically wrong program (something with usage of uninitialized variable). After that I removed this code. GCC was a completely different compiler 10 years ago. Handling of uninitialized variables is radically different since tree-ssa, and also the resource allocation side of the compiler has essentially been rewritten (by you :-). Whatever broke in the compiler 10 years ago may not be relevant anymore today. I can't find your work, or the reported problem, in the gcc mailing list archives. Did you add a test case for the problem, and if so, where can I find it? Probably, that is also a reason why reload uses LR not LIVE. The reason why reload uses DF_LR is PR20973, as documented in the code. (And FWIW, that bug doesn't reproduce even with GCC 4.0 if you add the transformations of pass_initialize_regs to it, that pass exists to avoid uninitialized regs hazards by making them initialized.) So I'd like to wait a few weeks. When I have more time, I am going to reproduce this test suite and look how IRA behaves on it. I don't think that's giving this patch a fair chance. In a few week's time, stage1 will be over. You may not have time, but that doesn't mean other developers should wait. If you have a test case that fails with my patch, I'll gladly spend my own time on it to see if it can be fixed. My patch was properly tested on two primary targets and passes all tests in the test suite, it improves the compiler now, and it addresses a PR that you've basically ignored for 4 years. I kindly ask you to reconsider your position :-) I guess you are right. A lot of things are changed. DF is also different from what I used. Other passes also uses LIVE sets. Ok for the idea. If we have a problem later, we could fix it. I'll look at the next version of the patch when you send it to give your the final approval. Thanks for your attention to IRA. By the way, I found the reported problem: http://gcc.gnu.org/ml/gcc-patches/2004-06/msg00533.html But it occurred on s390.
Re: [PR38711] Use DF_LIVE in IRA if it available (for -O2 and higher)
On Sat, Oct 13, 2012 at 11:05 PM, Eric Botcazou wrote: >> I think it would be a good idea to keep things unchanged at -O1. For >> that, the patch needs a few minor modifications (remove calls to >> df_live_add_problem and make some code to update DF_LIVE_{IN,OUT} >> conditional). I can prepare an updated patch for that, if you think >> that's best. > > That was just a proposal to be put on the table. :-) Others, especially Vlad > of course, have the final say. And I somewhat sympathize with the cautious > approach here, because we were forced to downgrade from DF_LIVE to DF_LR for > resource.c at some point (but of course it's resource.c so...). Right, that was for PR40710 (http://gcc.gnu.org/ml/gcc/2009-07/msg00241.html) because resource.c computes its own idea of liveness and it uses a similar definition as the DF_LR problem. Using incompatible views of liveness is not a good idea... For IRA it doesn't matter whether you use DF_LR or DF_LIVE, as long as you use one or the other consistently. Ciao! Steven
[PATCH, alpha]: Trivial alpha.md macroizations, part 5
Hello! 2012-10-13 Uros Bizjak * config/alpha/alpha.md (I24MODE): New mode iterator. (any_divmod): New code iterator. (si3): Macroize expander from {div,mod,udiv,umod}si3 using any_divmod code iterator. (si3): Macroize expander from {div,mod,udiv,umod}di3 using any_divmod code iterator. (extendqi2): Macroize insn from extendqi{hi,si}2 using I24MODE mode iterator. (unaligned_store): Macroize expander from unaligned_store{qi,hi} using I12MODE mode iterator. (mov): Macroize expander from mov{qi,hi} using I12MODE mode iterator. Tested on alphaev68-linux-gnu, committed to mainline SVN. Uros. Index: alpha.md === --- alpha.md(revision 192421) +++ alpha.md(working copy) @@ -93,6 +93,7 @@ (define_mode_iterator IMODE [QI HI SI DI]) (define_mode_iterator I12MODE [QI HI]) (define_mode_iterator I124MODE [QI HI SI]) +(define_mode_iterator I24MODE [HI SI]) (define_mode_iterator I248MODE [HI SI DI]) (define_mode_iterator I48MODE [SI DI]) @@ -734,31 +735,16 @@ ;; problem. Is it worth the complication here to eliminate the sign ;; extension? -(define_expand "divsi3" - [(set (match_dup 3) - (sign_extend:DI (match_operand:SI 1 "nonimmediate_operand" ""))) - (set (match_dup 4) - (sign_extend:DI (match_operand:SI 2 "nonimmediate_operand" ""))) - (parallel [(set (match_dup 5) - (sign_extend:DI (div:SI (match_dup 3) (match_dup 4 - (clobber (reg:DI 23)) - (clobber (reg:DI 28))]) - (set (match_operand:SI 0 "nonimmediate_operand" "") - (subreg:SI (match_dup 5) 0))] - "TARGET_ABI_OSF" -{ - operands[3] = gen_reg_rtx (DImode); - operands[4] = gen_reg_rtx (DImode); - operands[5] = gen_reg_rtx (DImode); -}) +(define_code_iterator any_divmod [div mod udiv umod]) -(define_expand "udivsi3" +(define_expand "si3" [(set (match_dup 3) (sign_extend:DI (match_operand:SI 1 "nonimmediate_operand" ""))) (set (match_dup 4) (sign_extend:DI (match_operand:SI 2 "nonimmediate_operand" ""))) (parallel [(set (match_dup 5) - (sign_extend:DI (udiv:SI (match_dup 3) (match_dup 4 + (sign_extend:DI + (any_divmod:SI (match_dup 3) (match_dup 4 (clobber (reg:DI 23)) (clobber (reg:DI 28))]) (set (match_operand:SI 0 "nonimmediate_operand" "") @@ -770,78 +756,16 @@ operands[5] = gen_reg_rtx (DImode); }) -(define_expand "modsi3" - [(set (match_dup 3) - (sign_extend:DI (match_operand:SI 1 "nonimmediate_operand" ""))) - (set (match_dup 4) - (sign_extend:DI (match_operand:SI 2 "nonimmediate_operand" ""))) - (parallel [(set (match_dup 5) - (sign_extend:DI (mod:SI (match_dup 3) (match_dup 4 - (clobber (reg:DI 23)) - (clobber (reg:DI 28))]) - (set (match_operand:SI 0 "nonimmediate_operand" "") - (subreg:SI (match_dup 5) 0))] - "TARGET_ABI_OSF" -{ - operands[3] = gen_reg_rtx (DImode); - operands[4] = gen_reg_rtx (DImode); - operands[5] = gen_reg_rtx (DImode); -}) - -(define_expand "umodsi3" - [(set (match_dup 3) - (sign_extend:DI (match_operand:SI 1 "nonimmediate_operand" ""))) - (set (match_dup 4) - (sign_extend:DI (match_operand:SI 2 "nonimmediate_operand" ""))) - (parallel [(set (match_dup 5) - (sign_extend:DI (umod:SI (match_dup 3) (match_dup 4 - (clobber (reg:DI 23)) - (clobber (reg:DI 28))]) - (set (match_operand:SI 0 "nonimmediate_operand" "") - (subreg:SI (match_dup 5) 0))] - "TARGET_ABI_OSF" -{ - operands[3] = gen_reg_rtx (DImode); - operands[4] = gen_reg_rtx (DImode); - operands[5] = gen_reg_rtx (DImode); -}) - -(define_expand "divdi3" +(define_expand "di3" [(parallel [(set (match_operand:DI 0 "register_operand" "") - (div:DI (match_operand:DI 1 "register_operand" "") - (match_operand:DI 2 "register_operand" ""))) + (any_divmod:DI + (match_operand:DI 1 "register_operand" "") + (match_operand:DI 2 "register_operand" ""))) (clobber (reg:DI 23)) (clobber (reg:DI 28))])] "TARGET_ABI_OSF" "") -(define_expand "udivdi3" - [(parallel [(set (match_operand:DI 0 "register_operand" "") - (udiv:DI (match_operand:DI 1 "register_operand" "") - (match_operand:DI 2 "register_operand" ""))) - (clobber (reg:DI 23)) - (clobber (reg:DI 28))])] - "TARGET_ABI_OSF" - "") - -(define_expand "moddi3" - [(parallel [(set (match_operand:DI 0 "register_operand" "") - (mod:DI (match_operand:DI 1 "register_operand" "") - (match_operand:DI 2 "register_operand" ""))) - (clobber (reg:DI 23)) - (clobber (reg:DI
Re: [RFC PATCH] Add support for sparc compare-and-branch.
From: Eric Botcazou Date: Sat, 13 Oct 2012 15:47:11 +0200 >> The trouble area, and where I need help from either Rainer or Eric, >> is the Solaris2 bits. >> >> I think we need to move the Solaris assembler stuff over to a model >> where it passes: >> >> -m{32,64} -xarch=sparcFOO >> >> instead of using the v8plusX stuff to indicate 32bit. And that's >> the direction I tried to move in here. > > The only versions of the Solaris assembler I have access to only support > v8plusX according to the man page. Has that changed recently? For the older stuff I mean doing something like "-m32 -xarch=v9X" > >> * config/sparc/sparc.opt (mvis4): New option. >> * config/sparc/sparc-c.c (sparc_target_macros): When TARGET_VIS4, >> define __VIS__ to 0x400. > > What's the relationship between VIS4 and SPARC-T4 exactly? The above manual > only speaks of VIS3 as far as I can see. And the CBcond instructions are not > marked as belonging to VIS (3 or 4), so using -mvis4 for them seems strange. > Why not make them depend on -mcpu=niagara4 instead? The current assembler in Solaris Studio (called 'fbe') calls this stuff "sparc4" which I guess means "SPARC-T4 and later". I'm just calling it VIS4 in GCC so that we can export intrinsics of, for example, the cryptographic instructions at some point using the __VIS__ version CPP tests.
[patch] PR54919 - fix variable expansion in RTL loop unrolling
Hello, Today appears to be RTL loop optimizer patch day, because here's another patch... The problem here is that variable expansion does not update REG_EQUAL notes when it performs replacement of the renamed register. I fixed this by using validate_replace_rtx_group(). There is already code in analyze_insn_to_expand_var() to make sure that the to-be-replaced register is only used to accumulate into, so I think that using validate_replace_rtx_group is safe. Could use a 2nd pair of eyes to make sure, though. Tested with a bootstrapped compiler. Test coverage isn't great, because variable expansion is not enabled by default. OK for trunk? Ciao! Steven pr54919.diff Description: Binary data
Re: [RFC PATCH] Add support for sparc compare-and-branch.
From: David Miller Date: Sat, 13 Oct 2012 17:52:42 -0400 (EDT) > The current assembler in Solaris Studio (called 'fbe') calls this > stuff "sparc4" which I guess means "SPARC-T4 and later". BTW, for reference, see: http://docs.oracle.com/cd/E24457_01/html/E22003/fbe.1.html
Re: PR fortran/51727: make module files reproducible, question on C++ in gcc
On Sat, 13 Oct 2012, Diego Novillo wrote: > On 2012-10-13 14:04 , Tobias Schlüter wrote: > > On 2012-10-13 20:00, Diego Novillo wrote: > > > On 2012-10-13 09:26 , Tobias Schlüter wrote: > > > > first a question also to non-gfortraners: if I want to use std::map, > > > > where do I "#include "? In system.h? > > > > > > No. Ada includes system.h in pure C code. Why not include it exactly > > > where you need it? > > > > Ok, I was just wondering if there's a rule because a quick grep revealed > > no non-header source file that includes system headers. > > Joseph or others may have reason to create a system.hxx file or some such, but > in principle I don't see why we shouldn't include std library headers as we > need them. > > Joseph? The poisoning of various standard library functions that should not be used directly in GCC can be problematic if system headers, using those functions, are included after that poisoning. Thus in general it's better for system header includes to go in system.h, before the poisoning. In addition, where there is some autoconf configuration about whether to include a header, it's generally preferred to keep down the number of places with such a conditional on host features (encapsulating whatever features the header provides in some way so that other code can use them unconditionally). -- Joseph S. Myers jos...@codesourcery.com
Re: encoding all aliases options in .opt files
On Sat, 13 Oct 2012, Manuel L?pez-Ib??ez wrote: > OK. The attached patch implements this. Does the approach look ok? I > will write changelog and more comments if it seems reasonable. One Without the comments explaining the semantics of the new functions and their parameters, I'm not going to attempt to reverse-engineer what the approach in question is intended to be and so whether it's reasonable or not. > 2) While fixing this, I was thinking: Why the difference between > Joined and Separate? Why not make every function that takes an > argument work with and without '=' and with separate argument. It > seems we could remove a lot of options this way. What I am proposing > is, instead of: I see no significant advantage to users in having all these different ways to pass options, and multiple disadvantages: * If you add yet more ways of passing options, then any future overhaul of option handling needs to preserve those in addition to all the others. The fewer ways there are, the fewer constraints on the implementation. * Options like this > aux-infoFILE /* we could accept this to be compatible with some > options like -B */ are liable to be a pain for anyone, seeing such an option, to work out what is the option name that they might search for to get more information, and what is the argument. * In general, if all users are using an option in the same form it's easier for them to tell that another person really is using the same option rather than something that might be different from the option they are used to. * This also may not work so well with JoinedOrMissing options. > Wstrict-aliasing2 Shades of the old -gdwarf2 that once meant "DWARF 1, debug level 2", not to be confused with -gdwarf-2. I don't think such cryptic forms should be encouraged. > Wstrict-aliasing 2 Ambiguous with the existing -Wstrict-aliasing option followed by a linker input file called 2. -- Joseph S. Myers jos...@codesourcery.com
[patch][wwwdocs] gcc 4.8 changes - mention scalability improvements
Hello, This patch adds a short notice about some speed-ups in GCC 4.8 for extremely large functions (coming from the work done on PR54146 by several people). OK for the wwwdocs? Ciao! Steven Index: htdocs/gcc-4.8/changes.html === RCS file: /cvs/gcc/wwwdocs/htdocs/gcc-4.8/changes.html,v retrieving revision 1.44 diff -u -r1.44 changes.html --- htdocs/gcc-4.8/changes.html 9 Oct 2012 18:44:55 - 1.44 +++ htdocs/gcc-4.8/changes.html 13 Oct 2012 22:45:59 - @@ -65,10 +65,17 @@ level, and it makes PRE more aggressive. The struct reorg and matrix reorg optimizations (command-line -options -fipa-struct-reorg and --fipa-matrix-reorg) have been removed. They did not -work correctly nor with link-time optimization (LTO), hence were only -applicable to programs consisting of a single translation unit. + options -fipa-struct-reorg and + -fipa-matrix-reorg) have been removed. They did not + work correctly nor with link-time optimization (LTO), hence were only + applicable to programs consisting of a single translation unit. + +Several scalability bottle-necks have been removed from GCC's + optimization passes. Compilation of extremely large functions, + e.g. due to the use of the flatten attribute in the + "Eigen" C++ linear algebra templates library, is significantly + faster than previous releases of GCC. +
Re: Propagate profile counts during switch expansion
Ping. On Mon, Oct 8, 2012 at 5:46 PM, Easwaran Raman wrote: > I have attached a revised patch. The updated ChangeLog is given below > and I have responded to your comments inline: > > 2012-10-08 Easwaran Raman > * optabs.c (emit_cmp_and_jump_insn_1): Add a new parameter to > specificy the probability of taking the jump. > (emit_cmp_and_jump_insns): Likewise. > (expand_compare_and_swap_loop): Make the jump predicted not taken. > * dojump.c (do_compare_rtx_and_jump): Remove the code attaching > REG_BR_PROB note and pass probability to emit_cmp_and_jump_insns. > * cfgbuild.c (compute_outgoing_frequencies): Do not guess outgoing > probabilities for branches with more than two successors. > * expr.c (emit_block_move_via_loop): Predict the loop backedge loop > to be highly taken. > (try_casesi): Pass the probability of jumping to the default label. > (try_tablejump): Likewise. > (do_tablejump): Likewise. > * expr.h (try_tablejump): Add a new parameter. > (try_casesi): Likewise. > (emit_cmp_and_jump_insns): Add probability as default parameter with a > default value of -1. > * except.c (sjlj_emit_function_enter): Pass probability to > emit_cmp_and_jump_insns. > * stmt.c (case_node): Add new fields PROB and SUBTREE_PROB. > (do_jump_if_equal): Pass probability for REG_BR_PROB note. > (add_case_node): Pass estimated probability of jumping to the case > label. > (emit_case_decision_tree): Pass default_prob to emit_case_nodes. > (get_outgoing_edge_probs): New function. > (conditional_probability): Likewise. > (reset_out_edges_aux): Likewise. > (compute_cases_per_edge): Likewise. > (emit_case_dispatch_table): Update probabilities of edges coming out > of the switch statement. > (expand_case): Compute and propagate default edge probability to > emit_case_dispatch_table. > (expand_sjlj_dispatch_table): Update calls to add_case_node and > emit_case_dispatch_table. > (balance_case_nodes): Update subtree_prob values. > (emit_case_nodes): Compute edge probabilities and add pass them to > emit_cmp_and_jump_insns. > > gcc/testsuite/ChangeLog: > 2012-10-02 Easwaran Raman > * gcc.dg/tree-prof/switch-case-1.c: New test case. > * gcc.dg/tree-prof/switch-case-2.c: New test case. > > > > > On Thu, Oct 4, 2012 at 6:19 AM, Jan Hubicka wrote: >>> Hi, >>> This patch propagates the profile counts during RTL expansion. In >>> many cases, there is no way to determine the exact count of an edge >>> generated during the expansion. So this patch uses some simple >>> heuristics to estimate the edge counts but ensures that the counts of >>> the basic blocks corresponding to the cases are (nearly the) same as >>> at the gimple level. >>> >>> Bootstrapped and profile-bootstrapped on an x86_64/linux machine. OK for >>> trunk? >>> Index: gcc/expr.c >>> === >>> --- gcc/expr.c (revision 191879) >>> +++ gcc/expr.c (working copy) >>> @@ -154,7 +154,7 @@ static rtx do_store_flag (sepops, rtx, enum machin >>> #ifdef PUSH_ROUNDING >>> static void emit_single_push_insn (enum machine_mode, rtx, tree); >>> #endif >>> -static void do_tablejump (rtx, enum machine_mode, rtx, rtx, rtx); >>> +static void do_tablejump (rtx, enum machine_mode, rtx, rtx, rtx, int); >>> static rtx const_vector_from_tree (tree); >>> static void write_complex_part (rtx, rtx, bool); >>> >>> @@ -10894,7 +10894,7 @@ try_casesi (tree index_type, tree index_expr, tree >>> >>> static void >>> do_tablejump (rtx index, enum machine_mode mode, rtx range, rtx >>> table_label, >>> - rtx default_label) >>> + rtx default_label, int default_probability) >> >> Please document default_probability. > Done. > >>> { >>>rtx temp, vector; >>> >>> @@ -10910,9 +10910,17 @@ do_tablejump (rtx index, enum machine_mode mode, r >>> the maximum value of the range. */ >>> >>>if (default_label) >>> -emit_cmp_and_jump_insns (index, range, GTU, NULL_RTX, mode, 1, >>> - default_label); >>> +{ >>> + emit_cmp_and_jump_insns (index, range, GTU, NULL_RTX, mode, 1, >>> + default_label); >>> + if (default_probability != -1) >>> +{ >>> + rtx jump_insn = get_last_insn(); >>> + add_reg_note (jump_insn, REG_BR_PROB, GEN_INT >>> (default_probability)); >>> +} >>> +} >> >> dojump already does this kind of logic, but it is bit more cureful: >> emit_cmp_and_jump_insns (op0, op1, code, size, mode, unsignedp, >>if_true_label); >> if (prob != -1 && profile_status != PROFILE_ABSENT) >> { >> for (last = NEXT_INSN (last); >>last && NEXT_INSN (last); >>last = NEXT_INSN (last)) >> if (JUMP_P (last)) >> break; >> if (last >> && JUMP_P (last) >> && ! NEXT_INSN (last) >> && any_condjump_p (last)) >> { >> gcc_assert (!find_reg_note (last, REG_BR_PROB, 0)); >> a
[C++ Patch] PR 17805
Hi, back in 2005 Alexandre worked on this issue up to the point that Mark approved a patch conditional to a couple of straightforward changes (see audit trail). Then nothing happened ;) Today I resurrected the patch, implemented the requests and in the process noticed that it wasn't really handling the single argument case, thus I also extended a bit the testcase. Tested x86_64-linux. (Finally) Ok? Thanks, Paolo. /// /cp 2012-10-14 Alexandre Oliva Paolo Carlini PR c++/17805 * call.c (build_new_op): Filter out operator functions that don't satisfy enum-conversion match requirements. /testsuite 2012-10-14 Alexandre Oliva Paolo Carlini PR c++/17805 * g++.dg/overload/operator6.C: New. Index: cp/call.c === --- cp/call.c (revision 192425) +++ cp/call.c (working copy) @@ -5043,6 +5043,11 @@ build_new_op_1 (location_t loc, enum tree_code cod NULL_TREE, arglist, NULL_TREE, NULL_TREE, false, NULL_TREE, NULL_TREE, flags, &candidates, complain); + + args[0] = arg1; + args[1] = arg2; + args[2] = NULL_TREE; + /* Add class-member operators to the candidate set. */ if (CLASS_TYPE_P (TREE_TYPE (arg1))) { @@ -5062,11 +5067,50 @@ build_new_op_1 (location_t loc, enum tree_code cod BASELINK_ACCESS_BINFO (fns), flags, &candidates, complain); } + /* Per 13.3.1.2/3, 2nd bullet, if no operand has a class type, then + only non-member functions that have type T1 or reference to + cv-qualified-opt T1 for the first argument, if the first argument + has an enumeration type, or T2 or reference to cv-qualified-opt + T2 for the second argument, if the the second argument has an + enumeration type. Filter out those that don't match. */ + else if (! arg2 || ! CLASS_TYPE_P (TREE_TYPE (arg2))) +{ + struct z_candidate **candp, **next; - args[0] = arg1; - args[1] = arg2; - args[2] = NULL_TREE; + for (candp = &candidates; *candp; candp = next) + { + tree parmlist, parmtype; + int i, nargs = (arg2 ? 2 : 1); + cand = *candp; + next = &cand->next; + + parmlist = TYPE_ARG_TYPES (TREE_TYPE (cand->fn)); + + for (i = 0; i < nargs; ++i) + { + parmtype = TREE_VALUE (parmlist); + + if (TREE_CODE (parmtype) == REFERENCE_TYPE) + parmtype = TREE_TYPE (parmtype); + if (TREE_CODE (TREE_TYPE (args[i])) == ENUMERAL_TYPE + && (same_type_ignoring_top_level_qualifiers_p + (TREE_TYPE (args[i]), parmtype))) + break; + + parmlist = TREE_CHAIN (parmlist); + } + + /* No argument has an appropriate type, so remove this +candidate function from the list. */ + if (i == nargs) + { + *candp = cand->next; + next = candp; + } + } +} + add_builtin_candidates (&candidates, code, code2, fnname, args, flags, complain); Index: testsuite/g++.dg/overload/operator6.C === --- testsuite/g++.dg/overload/operator6.C (revision 0) +++ testsuite/g++.dg/overload/operator6.C (working copy) @@ -0,0 +1,27 @@ +// PR c++/17805 + +// Per 13.3.1.2/3 bullet 2, an operator function is not a candidate +// for overload resolution if neither argument is of class type and +// neither enumerator-typed argument gets an exact match, with or +// without reference binding, for the corresponding parameter. + +struct A +{ + A(int); + A(const char*); +}; + +bool operator==(const A&, const A&); +const A& operator*(const A&); + +enum E { e }; + +bool b1 = (e == ""); // { dg-error "no match" } + +bool b2 = (A(1) == ""); + +bool b3 = (e == A(1)); + +const A& a1 = *e;// { dg-error "no match" } + +const A& a2 = *A(1);