Re: [PATCH] Atom: Scheduler improvements for better imul placement
On 12.04.2012 18:22, Richard Guenther wrote: 2012/4/12 Andrey Belevantsev: On 12.04.2012 17:54, Richard Guenther wrote: 2012/4/12 Andrey Belevantsev: On 12.04.2012 16:38, Richard Guenther wrote: On Thu, Apr 12, 2012 at 2:36 PM, Igor Zamyatin wrote: On Thu, Apr 12, 2012 at 4:24 PM, Richard Guenther wrote: On Thu, Apr 12, 2012 at 2:00 PM, Alexander Monakov wrote: Can atom execute two IMUL in parallel? Or what exactly is the pipeline behavior? As I understand from Intel's optimization reference manual, the behavior is as follows: if the instruction immediately following IMUL has shorter latency, execution is stalled for 4 cycles (which is IMUL's latency); otherwise, a 4-or-more cycles latency instruction can be issued after IMUL without a stall. In other words, IMUL is pipelined with respect to other long-latency instructions, but not to short-latency instructions. It seems to be modeled in the pipeline description though: ;;; imul insn has 5 cycles latency (define_reservation "atom-imul-32" "atom-imul-1, atom-imul-2, atom-imul-3, atom-imul-4, atom-port-0") ;;; imul instruction excludes other non-FP instructions. (exclusion_set "atom-eu-0, atom-eu-1" "atom-imul-1, atom-imul-2, atom-imul-3, atom-imul-4") The main idea is quite simple: If we are going to schedule IMUL instruction (it is on the top of ready list) we try to find out producer of other (independent) IMUL instruction that is in ready list too. The goal is try to schedule such a producer to get another IMUL in ready list and get scheduling of 2 successive IMUL instructions. Why does that not happen without your patch? Does it never happen without your patch or does it merely not happen for one EEMBC benchmark (can you provide a testcase?)? It does not happen because the scheduler by itself does not do such specific reordering. That said, it is easy to imagine the cases where this patch will make things worse rather than better. That surprises me. What is so specific about this reordering? I mean that the scheduler does things like "sort the ready list according to a number of heuristics and to the insn priority, then choose the insn that would allow the maximum of ready insns to be issued on the current cycle". The heuristics are rather general. The scheduler does not do things like "if some random insn is ready, then choose some other random insn from the ready list and schedule it" (which is what the patch does). This is what scheduler hooks are for, to account for some target-specific heuristic. The problem is that this particular implementation looks somewhat like an overkill and also motivated by a single benchmark. Testing on a wider set of benchmarks and checking compile-time hit would make the motivation more convincing. Yeah, and especially looking _why_ the generic heuristics are not working and if they could be improved. After all the issue seems to be properly modeled in the DFA. It is, but the DFA only models the actual stalls that you get when an imul is scheduled. What you need here is to increase priority for ready insns that have imuls in their critical paths, and those imuls should be close enough to fill in the gap (actual consumers in the case of the patch). The lookahead done in max_issue can help with this "dynamic" priority change in general, but it considers just the ready insns, not their immediate consumers. You need to make the deeper lookahead if you want to do this in a target independent way -- no-no from compile time POV. A number of already existing hooks can help: - sched_reorder used in this patch just sorts the ready list in any way the target wishes; - adjust_priority/adjust_cost helps to boost or to lower priority in such subtle cases; - is_costly_dependence (now rs6000 only) can guide the early queue insn removal and its addition to the ready list; - dfa_lookahead_guard (now ia64 only) can ban some insns from being issued on the current try. Using sched_reorder is existing practice, like in s390, spu, mips, etc. Usually though an implementation again only considers the actual ready insns and not the successors. I'd say that trying more tests would help, or other hooks can be tried, too, but after all this is Atom specific, so it's the decision of i386 maintainers. Andrey Richard. Andrey Igor, why not try different subtler mechanisms like adjust_priority, which is get called when an insn is added to the ready list? E.g. increase the producer's priority. The patch as is misses checks for NONDEBUG_INSN_P. Also, why bail out when you have more than one imul in the ready list? Don't you want to bump the priority of the other imul found? Andrey And MD allows us only prefer scheduling of successive IMUL instructions, i.e. If IMUL was just scheduled and ready list contains another IMUL instruction then it will be chosen as the best candidat
Re: [PATCH] Fix PR 52203 and 52715
Hello, On 07.03.2012 15:46, Alexander Monakov wrote: On Wed, 7 Mar 2012, Andrey Belevantsev wrote: Hello, This PR is again about insns that are recog'ed as>=0 but do not change the processor state. As explained in http://gcc.gnu.org/bugzilla/show_bug.cgi?id=52203#c8, I've tried experimenting with an attribute marking those "empty" insns in MD files and asserting that all other insns do have reservations. As this doesn't seem to be interesting, I give up with the idea, and the below patch makes sel-sched do exactly what the Haifa scheduler does, i.e. do not count such insns against issue_rate when modelling clock cycles. But of course I wrongly microoptimized the decision of whether an insn is "empty" as shown in PR 52715, so the right fix is to check the emptiness right before issuing the insn. Thus, the following patch is really needed (tested on ia64 and x86 again), OK? The new hunk is the last one in the patch. Andrey 2012-04-13 Andrey Belevantsev PR rtl-optimization/52203 PR rtl-optimization/52715 Revert the 2012-03-07 fix for PR 52203. * sel-sched.c (reset_sched_cycles_in_current_ebb): Check that the insn does not modify DFA right before issuing, adjust issue_rate accordingly. Tested on ia64 and x86-64, OK for trunk? No testcase again because of the amount of flags needed. Andrey 2012-03-07 Andrey Belevantsev PR rtl-optimization/52203 * sel-sched.c (estimate_insn_cost): New parameter pempty. Adjust all callers to pass NULL except ... (reset_sched_cycles_in_current_ebb): ... here, save the value in new variable 'empty'. Increase issue_rate only for non-empty insns. This is OK. Thanks. diff --git a/gcc/sel-sched.c b/gcc/sel-sched.c index 4e13230..ce38fa0 100644 --- a/gcc/sel-sched.c +++ b/gcc/sel-sched.c @@ -4265,10 +4265,9 @@ invoke_aftermath_hooks (fence_t fence, rtx best_insn, int issue_more) return issue_more; } -/* Estimate the cost of issuing INSN on DFA state STATE. Write to PEMPTY - true when INSN does not change the processor state. */ +/* Estimate the cost of issuing INSN on DFA state STATE. */ static int -estimate_insn_cost (rtx insn, state_t state, bool *pempty) +estimate_insn_cost (rtx insn, state_t state) { static state_t temp = NULL; int cost; @@ -4278,8 +4277,6 @@ estimate_insn_cost (rtx insn, state_t state, bool *pempty) memcpy (temp, state, dfa_state_size); cost = state_transition (temp, insn); - if (pempty) -*pempty = (memcmp (temp, state, dfa_state_size) == 0); if (cost < 0) return 0; @@ -4310,7 +4307,7 @@ get_expr_cost (expr_t expr, fence_t fence) return 0; } else -return estimate_insn_cost (insn, FENCE_STATE (fence), NULL); +return estimate_insn_cost (insn, FENCE_STATE (fence)); } /* Find the best insn for scheduling, either via max_issue or just take @@ -7023,7 +7020,7 @@ reset_sched_cycles_in_current_ebb (void) { int cost, haifa_cost; int sort_p; - bool asm_p, real_insn, after_stall, all_issued, empty; + bool asm_p, real_insn, after_stall, all_issued; int clock; if (!INSN_P (insn)) @@ -7050,7 +7047,7 @@ reset_sched_cycles_in_current_ebb (void) haifa_cost = 0; } else -haifa_cost = estimate_insn_cost (insn, curr_state, &empty); +haifa_cost = estimate_insn_cost (insn, curr_state); /* Stall for whatever cycles we've stalled before. */ after_stall = 0; @@ -7084,7 +7081,7 @@ reset_sched_cycles_in_current_ebb (void) if (!after_stall && real_insn && haifa_cost > 0 - && estimate_insn_cost (insn, curr_state, NULL) == 0) + && estimate_insn_cost (insn, curr_state) == 0) break; /* When the data dependency stall is longer than the DFA stall, @@ -7096,7 +7093,7 @@ reset_sched_cycles_in_current_ebb (void) if ((after_stall || all_issued) && real_insn && haifa_cost == 0) -haifa_cost = estimate_insn_cost (insn, curr_state, NULL); +haifa_cost = estimate_insn_cost (insn, curr_state); } haifa_clock += i; @@ -7127,8 +7124,14 @@ reset_sched_cycles_in_current_ebb (void) if (real_insn) { + static state_t temp = NULL; + + if (!temp) + temp = xmalloc (dfa_state_size); + memcpy (temp, curr_state, dfa_state_size); + cost = state_transition (curr_state, insn); - if (!empty) + if (memcmp (temp, curr_state, dfa_state_size)) issued_insns++; if (sched_verbose >= 2)
use longcall to abort from !pic trampoline setup on ppc-vxworks
Hello, For some versions and execution modes, VxWorks features facilities to let users download object modules and link them with the kernel at run-time. Relocation troubles (24bit reloc overflows) might show up when module instructions contain short references to kernel symbols and the module happens to be loaded very far away from the kernel in memory. As of today, the powerpc "__trampoline_setup" function embeds such a potentially problematic reference, in the short call the "abort" at the end. The attached patch is a suggestion to prevent the potential troubles by simply issuing a longcall sequence in the relevant case (vxworks, !pic). We have been using it in our internal trees for years now. Tested on mainline by checking that a linux hosted build for powerpc-wrs-vxworks succeeds and that the expected sequence is found in the tramp.o of interest. OK to commit ? Thanks in advance, With Kind Regards, Olivier 2012-04-12 Olivier Hainque libgcc/ * config/rs6000/vxworks/tramp.S (trampoline_setup): Use a longcall sequence in the non pic case on VxWorks. vxtramp.dif Description: video/dv
[PATCH, i386] Move common definitions of gnu-user.h and gnu-user64.h to separate file
Hello, Here is a patch which creates new gnu-user-common.h file and moves all common gnu-user.h and gnu-user64.h definitions to this new file. New file is required to avoid duplication of Android specific changes in gnu-user.h and gnu-user64.h. This patch is actually a non Android specific part of previously submitted patch to support Android in x86 target (http://gcc.gnu.org/ml/gcc-patches/2012-04/msg00127.html). Bootstrapped and tested on linux-x86_64. Is it OK for mainline and 4.7? Thanks Ilya, --- 2012-04-13 Enkovich Ilya * config/i386/gnu-user.h: Include gnu-user-common.h. (CPP_SPEC): Removed. (CC1_SPEC): Removed. (ENDFILE_SPEC): Removed. (DEFAULT_PCC_STRUCT_RETURN): Removed. (TARGET_TLS_DIRECT_SEG_REFS_DEFAULT): Removed. (TARGET_OS_CPP_BUILTINS): Removed. (LIBGCC2_HAS_TF_MODE): Removed. (LIBGCC2_TF_CEXT): Removed. (TF_SIZE): Removed. (TARGET_ASM_FILE_END): Removed. (STACK_CHECK_MOVING_SP): Removed. (STACK_CHECK_STATIC_BUILTIN): Removed. (TARGET_THREAD_SSP_OFFSET): Removed. (TARGET_CAN_SPLIT_STACK): Removed. (TARGET_THREAD_SPLIT_STACK_OFFSET): Removed. * config/i386/gnu-user64.h: Likewise. * config/i386/gnu-user-common.h: New. mandroid.patch Description: Binary data
[PATCH] Fix for PR52734 (-ftree-tail-merge)
Richard, this patch fixes PR52743. The problem is as follows: blocks 3 and 5, with successor 6 are considered equal and merged. ... # BLOCK 3 freq:6102 # PRED: 2 [61.0%] (true,exec) # VUSE <.MEMD.1734_10> dddD.1710_3 = bbbD.1703; goto ; # SUCC: 6 [100.0%] (fallthru,exec) # BLOCK 5 freq:2378 # PRED: 4 [61.0%] (false,exec) # SUCC: 6 [100.0%] (fallthru,exec) # BLOCK 6 freq:1 # PRED: 3 [100.0%] (fallthru,exec) 7 [100.0%] (fallthru) 5 [100.0%] (fallthru,exec) # dddD.1710_1 = PHI # .MEMD.1734_8 = PHI <.MEMD.1734_10(3), .MEMD.1734_11(7), .MEMD.1734_11(5)> # VUSE <.MEMD.1734_8> return dddD.1710_1; # SUCC: EXIT [100.0%] ... Tail merge considers 2 blocks equal if the effect at the tail is equal, meaning: - the sequence of side effects produced by each block is equal - the value phis are equal There are no side effects in block 3 and 5, and the phi alternatives of dddD.1710_1 for 3 (dddD.1710_3) and 5 (dddD.1710_4) are proven equal by gvn. The problem is that changing the (4->5) edge into a (4->3) edge changes the value of dddD.1710_3, because block 4 contains a store that affects the load in block 3. ... # BLOCK 4 freq:3898 # PRED: 2 [39.0%] (false,exec) # VUSE <.MEMD.1734_10> dddD.1710_4 = bbbD.1703; # .MEMD.1734_11 = VDEF <.MEMD.1734_10> # USE = nonlocal null # CLB = nonlocal null D.1724_5 = aaaD.1705 (); if (D.1724_5 != 0) goto ; else goto ; # SUCC: 7 [39.0%] (true,exec) 5 [61.0%] (false,exec) ... Or, put differently, the incoming vuse of block 3 affects a value phi alternative for that block (dddD.1710_3), so the 2 blocks are equal only under the condition that the incoming vuses are equal. We could build an analysis that addresses that precisely, but for now I implemented a more coarse-grained fix: if the incoming vuses are not equal, and at least one of the vuses influenced a non-virtual result, we don't consider the blocks equal. Bootstrapped and reg-tested on x86_64. ok for trunk, 4.7.1? Thanks, - Tom 2012-04-13 Tom de Vries * tree-ssa-tail-merge.c (gsi_advance_bw_nondebug_nonlocal): Add parameters vuse and vuse_escaped. (find_duplicate): Init vuse1, vuse2 and vuse_escaped. Pass to gsi_advance_bw_nondebug_nonlocal. Return if vuse_escaped and vuse1 != vuse2. * gcc.dg/pr52734.c: New test. Index: gcc/tree-ssa-tail-merge.c === --- gcc/tree-ssa-tail-merge.c (revision 185028) +++ gcc/tree-ssa-tail-merge.c (working copy) @@ -1123,18 +1123,31 @@ gimple_equal_p (same_succ same_succ, gim } } -/* Let GSI skip backwards over local defs. */ +/* Let GSI skip backwards over local defs. Return the earliest vuse in VUSE. + Return true in VUSE_ESCAPED if the vuse influenced a SSA_OP_DEF of one of the + processed statements. */ static void -gsi_advance_bw_nondebug_nonlocal (gimple_stmt_iterator *gsi) +gsi_advance_bw_nondebug_nonlocal (gimple_stmt_iterator *gsi, tree *vuse, + bool *vuse_escaped) { gimple stmt; + tree lvuse; while (true) { if (gsi_end_p (*gsi)) return; stmt = gsi_stmt (*gsi); + + lvuse = gimple_vuse (stmt); + if (lvuse != NULL_TREE) + { + *vuse = lvuse; + if (!ZERO_SSA_OPERANDS (stmt, SSA_OP_DEF)) + *vuse_escaped = true; + } + if (!(is_gimple_assign (stmt) && local_def (gimple_get_lhs (stmt)) && !gimple_has_side_effects (stmt))) return; @@ -1150,9 +1163,11 @@ find_duplicate (same_succ same_succ, bas { gimple_stmt_iterator gsi1 = gsi_last_nondebug_bb (bb1); gimple_stmt_iterator gsi2 = gsi_last_nondebug_bb (bb2); + tree vuse1 = NULL_TREE, vuse2 = NULL_TREE; + bool vuse_escaped = false; - gsi_advance_bw_nondebug_nonlocal (&gsi1); - gsi_advance_bw_nondebug_nonlocal (&gsi2); + gsi_advance_bw_nondebug_nonlocal (&gsi1, &vuse1, &vuse_escaped); + gsi_advance_bw_nondebug_nonlocal (&gsi2, &vuse2, &vuse_escaped); while (!gsi_end_p (gsi1) && !gsi_end_p (gsi2)) { @@ -1161,13 +1176,20 @@ find_duplicate (same_succ same_succ, bas gsi_prev_nondebug (&gsi1); gsi_prev_nondebug (&gsi2); - gsi_advance_bw_nondebug_nonlocal (&gsi1); - gsi_advance_bw_nondebug_nonlocal (&gsi2); + gsi_advance_bw_nondebug_nonlocal (&gsi1, &vuse1, &vuse_escaped); + gsi_advance_bw_nondebug_nonlocal (&gsi2, &vuse2, &vuse_escaped); } if (!(gsi_end_p (gsi1) && gsi_end_p (gsi2))) return; + /* If the incoming vuses are not the same, and the vuse escaped into an + SSA_OP_DEF, then merging the 2 blocks will change the value of the def, + which potentially means the semantics of one of the blocks will be changed. + TODO: make this check more precise. */ + if (vuse_escaped && vuse1 != vuse2) +return; + if (dump_file) fprintf (dump_file, "find_duplicates: duplicate of \n", bb1->index, bb2->index); Index: gcc/testsuite/gcc.dg/pr52734.c ==
Re: [PR tree-optimization/52558]: RFC: questions on store data race
On Fri, Apr 13, 2012 at 12:11 AM, Aldy Hernandez wrote: > Here we have a testcase that affects both the C++ memory model and > transactional memory. > > [Hans, this is caused by the same problem that is causing the speculative > register promotion issue you and Torvald pointed me at]. > > In the following testcase (adapted from the PR), the loop invariant motion > pass caches a pre-existing value for g_2, and then performs a store to g_2 > on every path, causing a store data race: > > int g_1 = 1; > int g_2 = 0; > > int func_1(void) > { > int l; > for (l = 0; l < 1234; l++) > { > if (g_1) > return l; > else > g_2 = 0; <-- Store to g_2 should only happen if !g_1 > } > return 999; > } > > This gets transformed into something like: > > g_2_lsm = g_2; > if (g_1) { > g_2 = g_2_lsm; // boo! hiss! > return 0; > } else { > g_2_lsm = 0; > g_2 = g_2_lsm; > } > > The spurious write to g_2 could cause a data race. > > Andrew has suggested verifying that the store to g_2 was actually different > than on entry, and letting PHI copy propagation optimize the redundant > comparisons. Like this: > > g_2_lsm = g_2; > if (g_1) { > if (g_2_lsm != g_2) // hopefully optimized away > g_2 = g_2_lsm; // hopefully optimized away > return 0; > } else { > g_2_lsm = 0; > if (g_2_lsm != g_2) // hopefully optimized away > g_2 = g_2_lsm; > } > > ...which PHI copy propagation and/or friends should be able to optimize. > > For that matter, regardless of the memory model, the proposed solution would > improve the existing code by removing the extra store (in this contrived > test case anyhow). > > Anyone see a problem with this approach (see attached patch)? Can we _remove_ a store we percieve as redundant (with a single-threaded view) with the memory model? If so, then this sounds plausible (minus the optimization that if the variable is written to unconditionally we avoid this comparison -- see the places where we check whether we can apply store motion to possibly trapping locations). A similar effect could be obtained by keeping a flag whether we entered the path that stored the value before the transform. Thus, do lsm = g2; // technically not needed, if you also handle loads flag = false; for (...) { if (g1) { if (flag) g2 = lsm; } else { lsm = 0; flag = true; } } if (flag) g2 = lsm; that would allow to extend this to cover the cases where the access may eventually trap (if you omit the load before the loop). Not sure which is more efficient (you can eventually combine flag variables for different store locations, combining the if-changed compare is not so easy I guess). > Assuming the unlikely scenario that everyone likes this :), I have two > problems that I would like answered. But feel free to ignore if the > approach is a no go. > > 1. No pass can figure out the equality (or inequality) of g_2_lsm and g_2. > In the PR, Richard mentions that both FRE/PRE can figure this out, but they > are not run after store motion. DOM should also be able to, but apparently > does not :(. Tips? Well. Schedule FRE after loop opts ... DOM is not prepared to handle loads/stores with differing VOPs - it was never updated really after the alias-improvements merge. > 2. The GIMPLE_CONDs I create in this patch are currently causing problems > with complex floats/doubles. do_jump is complaining that it can't compare > them, so I assume it is too late (in tree-ssa-loop-im.c) to compare complex > values since complex lowering has already happened (??). Is there a more > canonical way of creating a GIMPLE_COND that may contain complex floats at > this stage? As you are handling redundant stores you want a bitwise comparison anyway, but yes, complex compares need to be lowered. But as said, you want a bitwise comparison, not a value-comparison. You can try using if (VIEW_CONVERT_EXPR (lsm_tmp_reg) != VIEW_CONVERT_EXPR < > (orig_value)) ... that can of course result in really awful codegen (think of fp - gp register moves, or even stack spills). Which maybe hints at that the flag approach would be better in some cases? (you need to cache the original value in a register anyway, the only thing you save is updating it when you would update the flag value) Maybe you can combine both, eventually dependent on a target hook can_compare_bitwise (mode) that would tell you if the target can efficiently compare two registers in mode for bit equality. Few comments on the patch itself. + new_bb = split_edge (ex); + then_bb = create_empty_bb (new_bb); + if (current_loops && new_bb->loop_father) + add_bb_to_loop (then_bb, new_bb->lo
Re: [PATCH] Fix PR 52203 and 52715
On Fri, 13 Apr 2012, Andrey Belevantsev wrote: > But of course I wrongly microoptimized the decision of whether an insn is > "empty" as shown in PR 52715, so the right fix is to check the emptiness right > before issuing the insn. Thus, the following patch is really needed (tested > on ia64 and x86 again), OK? The new hunk is the last one in the patch. > > Andrey > > 2012-04-13 Andrey Belevantsev > > PR rtl-optimization/52203 > PR rtl-optimization/52715 > > Revert the 2012-03-07 fix for PR 52203. > * sel-sched.c (reset_sched_cycles_in_current_ebb): Check that > the insn does not modify DFA right before issuing, adjust > issue_rate accordingly. OK. Thanks. Alexander
Re: [PATCH] Fix for PR52734 (-ftree-tail-merge)
On Fri, Apr 13, 2012 at 10:32 AM, Tom de Vries wrote: > Richard, > > this patch fixes PR52743. > > The problem is as follows: blocks 3 and 5, with successor 6 are considered > equal > and merged. > ... > # BLOCK 3 freq:6102 > # PRED: 2 [61.0%] (true,exec) > # VUSE <.MEMD.1734_10> > dddD.1710_3 = bbbD.1703; > goto ; > # SUCC: 6 [100.0%] (fallthru,exec) > > # BLOCK 5 freq:2378 > # PRED: 4 [61.0%] (false,exec) > # SUCC: 6 [100.0%] (fallthru,exec) > > # BLOCK 6 freq:1 > # PRED: 3 [100.0%] (fallthru,exec) 7 [100.0%] (fallthru) 5 [100.0%] > (fallthru,exec) > # dddD.1710_1 = PHI > # .MEMD.1734_8 = PHI <.MEMD.1734_10(3), .MEMD.1734_11(7), .MEMD.1734_11(5)> > # VUSE <.MEMD.1734_8> > return dddD.1710_1; > # SUCC: EXIT [100.0%] > ... > > Tail merge considers 2 blocks equal if the effect at the tail is equal, > meaning: > - the sequence of side effects produced by each block is equal > - the value phis are equal > > There are no side effects in block 3 and 5, and the phi alternatives of > dddD.1710_1 for 3 (dddD.1710_3) and 5 (dddD.1710_4) are proven equal by gvn. > > The problem is that changing the (4->5) edge into a (4->3) edge changes the > value of dddD.1710_3, because block 4 contains a store that affects the load > in > block 3. > ... > # BLOCK 4 freq:3898 > # PRED: 2 [39.0%] (false,exec) > # VUSE <.MEMD.1734_10> > dddD.1710_4 = bbbD.1703; > # .MEMD.1734_11 = VDEF <.MEMD.1734_10> > # USE = nonlocal null > # CLB = nonlocal null > D.1724_5 = aaaD.1705 (); > if (D.1724_5 != 0) > goto ; > else > goto ; > # SUCC: 7 [39.0%] (true,exec) 5 [61.0%] (false,exec) > ... > > Or, put differently, the incoming vuse of block 3 affects a value phi > alternative for that block (dddD.1710_3), so the 2 blocks are equal only under > the condition that the incoming vuses are equal. > > We could build an analysis that addresses that precisely, but for now I > implemented a more coarse-grained fix: if the incoming vuses are not equal, > and > at least one of the vuses influenced a non-virtual result, we don't consider > the > blocks equal. > > Bootstrapped and reg-tested on x86_64. > > ok for trunk, 4.7.1? Hmm, I wonder if the proper observation would not be that while GVN considers the PHI arguments in > # dddD.1710_1 = PHI to be equal, their definitions are not in the blocks we try to merge, so their value can be dependent on the status as it was on their predecessors. GVN, after all, considers flow-sensitivity. Richard. > Thanks, > - Tom > > 2012-04-13 Tom de Vries > > * tree-ssa-tail-merge.c (gsi_advance_bw_nondebug_nonlocal): Add > parameters vuse and vuse_escaped. > (find_duplicate): Init vuse1, vuse2 and vuse_escaped. Pass to > gsi_advance_bw_nondebug_nonlocal. Return if vuse_escaped and > vuse1 != vuse2. > > * gcc.dg/pr52734.c: New test. >
Re: [PATCH][C] Fix PR52549
On Thu, 12 Apr 2012, Richard Guenther wrote: > > This fixes PR52549 - we are running into an overzealous assert > that wants to make sure we don't have PLUS_EXPR on pointers. > But that code does not really check this and falls foul of > the conversion removal code right before it that transforms > (void *)(a + b) to a + b. > > Fixed as follows. > > Bootstrap and regtest pending on x86_64-unknown-linux-gnu. Done. > Ok? Actually I changed my mind and consider it obvious. Richard. > 2012-04-12 Richard Guenther > > PR c/52549 > * c-typeck.c (pointer_diff): Remove bogus assert. > > * gcc.dg/pr52549.c: New testcase. > > Index: gcc/c-typeck.c > === > --- gcc/c-typeck.c(revision 186373) > +++ gcc/c-typeck.c(working copy) > @@ -3446,8 +3446,6 @@ pointer_diff (location_t loc, tree op0, >else > con1 = op1; > > - gcc_assert (TREE_CODE (con0) != PLUS_EXPR > - && TREE_CODE (con1) != PLUS_EXPR); >if (TREE_CODE (con0) == POINTER_PLUS_EXPR) > { >lit0 = TREE_OPERAND (con0, 1); > Index: gcc/testsuite/gcc.dg/pr52549.c > === > --- gcc/testsuite/gcc.dg/pr52549.c(revision 0) > +++ gcc/testsuite/gcc.dg/pr52549.c(revision 0) > @@ -0,0 +1,6 @@ > +/* { dg-do compile } */ > + > +_mark (long obj, int i, char *a) > +{ > + (char *)&(((long *)(obj)) [i]) - a; > +} >
Re: [PATCH, i386] Move common definitions of gnu-user.h and gnu-user64.h to separate file
Hello! > Here is a patch which creates new gnu-user-common.h file and moves all > common gnu-user.h and gnu-user64.h definitions to this new file. New > file is required to avoid duplication of Android specific changes in > gnu-user.h and gnu-user64.h. This patch is actually a non Android > specific part of previously submitted patch to support Android in x86 > target (http://gcc.gnu.org/ml/gcc-patches/2012-04/msg00127.html). > Bootstrapped and tested on linux-x86_64. Is it OK for mainline and > 4.7? > 2012-04-13 Enkovich Ilya > > * config/i386/gnu-user.h: Include gnu-user-common.h. > (CPP_SPEC): Removed. > (CC1_SPEC): Removed. > (ENDFILE_SPEC): Removed. > (DEFAULT_PCC_STRUCT_RETURN): Removed. > (TARGET_TLS_DIRECT_SEG_REFS_DEFAULT): Removed. > (TARGET_OS_CPP_BUILTINS): Removed. > (LIBGCC2_HAS_TF_MODE): Removed. > (LIBGCC2_TF_CEXT): Removed. > (TF_SIZE): Removed. > (TARGET_ASM_FILE_END): Removed. > (STACK_CHECK_MOVING_SP): Removed. > (STACK_CHECK_STATIC_BUILTIN): Removed. > (TARGET_THREAD_SSP_OFFSET): Removed. > (TARGET_CAN_SPLIT_STACK): Removed. > (TARGET_THREAD_SPLIT_STACK_OFFSET): Removed. > > * config/i386/gnu-user64.h: Likewise. > > * config/i386/gnu-user-common.h: New. This ChangeLog is wrong, you didn't remove these files but move them to new file. +#ifdef TARGET_LIBC_PROVIDES_SSP +/* i386 glibc provides __stack_chk_guard in %gs:0x14, + x32 glibc provides it in %fs:0x18. + x86_64 glibc provides it in %fs:0x28. */ +#define TARGET_THREAD_SSP_OFFSET \ + (TARGET_64BIT ? (TARGET_X32 ? 0x18 : 0x28) : 0x14) + +/* We steal the last transactional memory word. */ +#define TARGET_CAN_SPLIT_STACK +#define TARGET_THREAD_SPLIT_STACK_OFFSET \ + (TARGET_64BIT ? (TARGET_X32 ? 0x40 : 0x70) : 0x30) +#endif Shouldn't TARGET_64BIT part remain in gnu_user64.h and !TARGET_64BIT in gnu-user.h? I don't see the reason to put these conditinal defines in shared file. However, this may be due to biarch handling that I'm not familiar with in all details. +#include "config/i386/gnu-user-common.h" You shouldn't include new file here, list it in config.gcc before i386/gnu-user.h or i386/gnu-user64.c Uros.
Re: [PATCH] Option to build bare-metal ARM cross-compiler for arm-none-eabi target without libunwind
Hello, I think it is good to disable the exceptions for the division by zero. Maybe this should be made target specific and not based on a configure option. For example in "libgcc/config.host": arm*-*-linux*) [...] tmake_file="${tmake_file} arm/t-div-by-zero-exc" [...] With "libgcc/config/arm/t-div-by-zero-exc": LIB2_DIVMOD_CFLAGS := -fexceptions -fnon-call-exception -- Sebastian Huber, embedded brains GmbH Address : Obere Lagerstr. 30, D-82178 Puchheim, Germany Phone : +49 89 18 90 80 79-6 Fax : +49 89 18 90 80 79-9 E-Mail : sebastian.hu...@embedded-brains.de PGP : Public key available on request. Diese Nachricht ist keine geschäftliche Mitteilung im Sinne des EHUG.
Re: [PATCH] Atom: Scheduler improvements for better imul placement
On Thu, Apr 12, 2012 at 5:01 PM, Andrey Belevantsev wrote: > On 12.04.2012 16:38, Richard Guenther wrote: >> >> On Thu, Apr 12, 2012 at 2:36 PM, Igor Zamyatin >> wrote: >>> >>> On Thu, Apr 12, 2012 at 4:24 PM, Richard Guenther >>> wrote: On Thu, Apr 12, 2012 at 2:00 PM, Alexander Monakov wrote: > > >> Can atom execute two IMUL in parallel? Or what exactly is the >> pipeline >> behavior? > > > As I understand from Intel's optimization reference manual, the > behavior is as > follows: if the instruction immediately following IMUL has shorter > latency, > execution is stalled for 4 cycles (which is IMUL's latency); otherwise, > a > 4-or-more cycles latency instruction can be issued after IMUL without a > stall. > In other words, IMUL is pipelined with respect to other long-latency > instructions, but not to short-latency instructions. It seems to be modeled in the pipeline description though: ;;; imul insn has 5 cycles latency (define_reservation "atom-imul-32" "atom-imul-1, atom-imul-2, atom-imul-3, atom-imul-4, atom-port-0") ;;; imul instruction excludes other non-FP instructions. (exclusion_set "atom-eu-0, atom-eu-1" "atom-imul-1, atom-imul-2, atom-imul-3, atom-imul-4") >>> >>> The main idea is quite simple: >>> >>> If we are going to schedule IMUL instruction (it is on the top of >>> ready list) we try to find out producer of other (independent) IMUL >>> instruction that is in ready list too. The goal is try to schedule >>> such a producer to get another IMUL in ready list and get scheduling >>> of 2 successive IMUL instructions. >> >> >> Why does that not happen without your patch? Does it never happen without >> your patch or does it merely not happen for one EEMBC benchmark (can >> you provide a testcase?)? > > > It does not happen because the scheduler by itself does not do such specific > reordering. That said, it is easy to imagine the cases where this patch > will make things worse rather than better. > > Igor, why not try different subtler mechanisms like adjust_priority, which > is get called when an insn is added to the ready list? E.g. increase the > producer's priority. > > The patch as is misses checks for NONDEBUG_INSN_P. Also, why bail out when > you have more than one imul in the ready list? Don't you want to bump the > priority of the other imul found? Could you provide some examples when this patch would harm the performance? Sched_reorder was chosen since it is used in other ports and looks most suitable for such case, e.g. it provides access to the whole ready list. BTW, just increasing producer's priority seems to be more risky in performance sense - we can incorrectly start delaying some instructions. Thought ready list doesn't contain DEBUG_INSN... Is it so? If it contains them - this could be added easily The case with more than one imul in the ready list wasn't considered just because the goal was to catch the particular case when there is a risk to get the following picture: "imul-producer-imul" which is less effective than "producer-imul-imul" for Atom > > Andrey > > >> >>> And MD allows us only prefer scheduling of successive IMUL instructions, >>> i.e. >>> If IMUL was just scheduled and ready list contains another IMUL >>> instruction then it will be chosen as the best candidate for >>> scheduling. >>> >>> at least from my very limited guessing of what the above does. So, did you analyze why the scheduler does not properly schedule things for you? Richard. > > From reading the patch, I could not understand the link between > pipeline > behavior and what the patch appears to do. > > Hope that helps. > > Alexander > > Thanks, Igor
Re: [PATCH, i386] Move common definitions of gnu-user.h and gnu-user64.h to separate file
> Hello! > >> Here is a patch which creates new gnu-user-common.h file and moves all >> common gnu-user.h and gnu-user64.h definitions to this new file. New >> file is required to avoid duplication of Android specific changes in >> gnu-user.h and gnu-user64.h. This patch is actually a non Android >> specific part of previously submitted patch to support Android in x86 >> target (http://gcc.gnu.org/ml/gcc-patches/2012-04/msg00127.html). >> Bootstrapped and tested on linux-x86_64. Is it OK for mainline and >> 4.7? > >> 2012-04-13 Enkovich Ilya >> >> * config/i386/gnu-user.h: Include gnu-user-common.h. >> (CPP_SPEC): Removed. >> (CC1_SPEC): Removed. >> (ENDFILE_SPEC): Removed. >> (DEFAULT_PCC_STRUCT_RETURN): Removed. >> (TARGET_TLS_DIRECT_SEG_REFS_DEFAULT): Removed. >> (TARGET_OS_CPP_BUILTINS): Removed. >> (LIBGCC2_HAS_TF_MODE): Removed. >> (LIBGCC2_TF_CEXT): Removed. >> (TF_SIZE): Removed. >> (TARGET_ASM_FILE_END): Removed. >> (STACK_CHECK_MOVING_SP): Removed. >> (STACK_CHECK_STATIC_BUILTIN): Removed. >> (TARGET_THREAD_SSP_OFFSET): Removed. >> (TARGET_CAN_SPLIT_STACK): Removed. >> (TARGET_THREAD_SPLIT_STACK_OFFSET): Removed. >> >> * config/i386/gnu-user64.h: Likewise. >> >> * config/i386/gnu-user-common.h: New. > > This ChangeLog is wrong, you didn't remove these files but move them > to new file. > > +#ifdef TARGET_LIBC_PROVIDES_SSP > +/* i386 glibc provides __stack_chk_guard in %gs:0x14, > + x32 glibc provides it in %fs:0x18. > + x86_64 glibc provides it in %fs:0x28. */ > +#define TARGET_THREAD_SSP_OFFSET \ > + (TARGET_64BIT ? (TARGET_X32 ? 0x18 : 0x28) : 0x14) > + > +/* We steal the last transactional memory word. */ > +#define TARGET_CAN_SPLIT_STACK > +#define TARGET_THREAD_SPLIT_STACK_OFFSET \ > + (TARGET_64BIT ? (TARGET_X32 ? 0x40 : 0x70) : 0x30) > +#endif > > Shouldn't TARGET_64BIT part remain in gnu_user64.h and !TARGET_64BIT > in gnu-user.h? I don't see the reason to put these conditinal defines > in shared file. However, this may be due to biarch handling that I'm > not familiar with in all details. > > +#include "config/i386/gnu-user-common.h" > > You shouldn't include new file here, list it in config.gcc before > i386/gnu-user.h or i386/gnu-user64.c > > Uros. Hello, Thanks for review! Here is a new version with all issues fixed. Bootstrapped on linux-x86_64 and check is in progress. Will it be OK for trunk and 4.7 after successfull check? Thanks, Ilya --- 2012-04-13 Enkovich Ilya * config/i386/gnu-user-common.h: New. * config/i386/gnu-user.h (CPP_SPEC): Moved to gnu-user-common.h. (CC1_SPEC): Likewise. (ENDFILE_SPEC): Likewise. (DEFAULT_PCC_STRUCT_RETURN): Likewise. (TARGET_TLS_DIRECT_SEG_REFS_DEFAULT): Likewise. (TARGET_OS_CPP_BUILTINS): Likewise. (LIBGCC2_HAS_TF_MODE): Likewise. (LIBGCC2_TF_CEXT): Likewise. (TF_SIZE): Likewise. (TARGET_ASM_FILE_END): Likewise. (STACK_CHECK_MOVING_SP): Likewise. (STACK_CHECK_STATIC_BUILTIN): Likewise. * config/i386/gnu-user64.h: Likewise. mandroid.patch Description: Binary data
Re: Phone call (was Re: Armhf dynamic linker path)
On 12/04/12 19:29, Dennis Gilmore wrote: > > off topic but i find aarch64 weird and too generic is it arm alpha amd > atom. > That's only 'cos it's new. It's no different from names like ia64. R.
Re: [PATCH, i386] Move common definitions of gnu-user.h and gnu-user64.h to separate file
On Fri, Apr 13, 2012 at 12:37 PM, Ilya Enkovich wrote: >> Hello! >> >>> Here is a patch which creates new gnu-user-common.h file and moves all >>> common gnu-user.h and gnu-user64.h definitions to this new file. New >>> file is required to avoid duplication of Android specific changes in >>> gnu-user.h and gnu-user64.h. This patch is actually a non Android >>> specific part of previously submitted patch to support Android in x86 >>> target (http://gcc.gnu.org/ml/gcc-patches/2012-04/msg00127.html). >>> Bootstrapped and tested on linux-x86_64. Is it OK for mainline and >>> 4.7? >> >>> 2012-04-13 Enkovich Ilya >>> >>> * config/i386/gnu-user.h: Include gnu-user-common.h. >>> (CPP_SPEC): Removed. >>> (CC1_SPEC): Removed. >>> (ENDFILE_SPEC): Removed. >>> (DEFAULT_PCC_STRUCT_RETURN): Removed. >>> (TARGET_TLS_DIRECT_SEG_REFS_DEFAULT): Removed. >>> (TARGET_OS_CPP_BUILTINS): Removed. >>> (LIBGCC2_HAS_TF_MODE): Removed. >>> (LIBGCC2_TF_CEXT): Removed. >>> (TF_SIZE): Removed. >>> (TARGET_ASM_FILE_END): Removed. >>> (STACK_CHECK_MOVING_SP): Removed. >>> (STACK_CHECK_STATIC_BUILTIN): Removed. >>> (TARGET_THREAD_SSP_OFFSET): Removed. >>> (TARGET_CAN_SPLIT_STACK): Removed. >>> (TARGET_THREAD_SPLIT_STACK_OFFSET): Removed. >>> >>> * config/i386/gnu-user64.h: Likewise. >>> >>> * config/i386/gnu-user-common.h: New. >> >> This ChangeLog is wrong, you didn't remove these files but move them >> to new file. >> >> +#ifdef TARGET_LIBC_PROVIDES_SSP >> +/* i386 glibc provides __stack_chk_guard in %gs:0x14, >> + x32 glibc provides it in %fs:0x18. >> + x86_64 glibc provides it in %fs:0x28. */ >> +#define TARGET_THREAD_SSP_OFFSET \ >> + (TARGET_64BIT ? (TARGET_X32 ? 0x18 : 0x28) : 0x14) >> + >> +/* We steal the last transactional memory word. */ >> +#define TARGET_CAN_SPLIT_STACK >> +#define TARGET_THREAD_SPLIT_STACK_OFFSET \ >> + (TARGET_64BIT ? (TARGET_X32 ? 0x40 : 0x70) : 0x30) >> +#endif >> >> Shouldn't TARGET_64BIT part remain in gnu_user64.h and !TARGET_64BIT >> in gnu-user.h? I don't see the reason to put these conditinal defines >> in shared file. However, this may be due to biarch handling that I'm >> not familiar with in all details. >> >> +#include "config/i386/gnu-user-common.h" >> >> You shouldn't include new file here, list it in config.gcc before >> i386/gnu-user.h or i386/gnu-user64.c >> >> Uros. > > Hello, > > Thanks for review! Here is a new version with all issues fixed. > > Bootstrapped on linux-x86_64 and check is in progress. Will it be OK > for trunk and 4.7 after successfull check? For sure such changes are not appropriate for a release branch. Richard. > Thanks, > Ilya > --- > 2012-04-13 Enkovich Ilya > > * config/i386/gnu-user-common.h: New. > > * config/i386/gnu-user.h (CPP_SPEC): Moved to > gnu-user-common.h. > (CC1_SPEC): Likewise. > (ENDFILE_SPEC): Likewise. > (DEFAULT_PCC_STRUCT_RETURN): Likewise. > (TARGET_TLS_DIRECT_SEG_REFS_DEFAULT): Likewise. > (TARGET_OS_CPP_BUILTINS): Likewise. > (LIBGCC2_HAS_TF_MODE): Likewise. > (LIBGCC2_TF_CEXT): Likewise. > (TF_SIZE): Likewise. > (TARGET_ASM_FILE_END): Likewise. > (STACK_CHECK_MOVING_SP): Likewise. > (STACK_CHECK_STATIC_BUILTIN): Likewise. > > * config/i386/gnu-user64.h: Likewise.
Re: [PATCH] Fix for PR52734 (-ftree-tail-merge)
On 13/04/12 11:13, Richard Guenther wrote: > On Fri, Apr 13, 2012 at 10:32 AM, Tom de Vries wrote: >> Richard, >> >> this patch fixes PR52743. >> >> The problem is as follows: blocks 3 and 5, with successor 6 are considered >> equal >> and merged. >> ... >> # BLOCK 3 freq:6102 >> # PRED: 2 [61.0%] (true,exec) >> # VUSE <.MEMD.1734_10> >> dddD.1710_3 = bbbD.1703; >> goto ; >> # SUCC: 6 [100.0%] (fallthru,exec) >> >> # BLOCK 5 freq:2378 >> # PRED: 4 [61.0%] (false,exec) >> # SUCC: 6 [100.0%] (fallthru,exec) >> >> # BLOCK 6 freq:1 >> # PRED: 3 [100.0%] (fallthru,exec) 7 [100.0%] (fallthru) 5 [100.0%] >> (fallthru,exec) >> # dddD.1710_1 = PHI >> # .MEMD.1734_8 = PHI <.MEMD.1734_10(3), .MEMD.1734_11(7), .MEMD.1734_11(5)> >> # VUSE <.MEMD.1734_8> >> return dddD.1710_1; >> # SUCC: EXIT [100.0%] >> ... >> >> Tail merge considers 2 blocks equal if the effect at the tail is equal, >> meaning: >> - the sequence of side effects produced by each block is equal >> - the value phis are equal >> >> There are no side effects in block 3 and 5, and the phi alternatives of >> dddD.1710_1 for 3 (dddD.1710_3) and 5 (dddD.1710_4) are proven equal by >> gvn. >> >> The problem is that changing the (4->5) edge into a (4->3) edge changes the >> value of dddD.1710_3, because block 4 contains a store that affects the load >> in >> block 3. >> ... >> # BLOCK 4 freq:3898 >> # PRED: 2 [39.0%] (false,exec) >> # VUSE <.MEMD.1734_10> >> dddD.1710_4 = bbbD.1703; >> # .MEMD.1734_11 = VDEF <.MEMD.1734_10> >> # USE = nonlocal null >> # CLB = nonlocal null >> D.1724_5 = aaaD.1705 (); >> if (D.1724_5 != 0) >>goto ; >> else >>goto ; >> # SUCC: 7 [39.0%] (true,exec) 5 [61.0%] (false,exec) >> ... >> >> Or, put differently, the incoming vuse of block 3 affects a value phi >> alternative for that block (dddD.1710_3), so the 2 blocks are equal only >> under >> the condition that the incoming vuses are equal. >> >> We could build an analysis that addresses that precisely, but for now I >> implemented a more coarse-grained fix: if the incoming vuses are not equal, >> and >> at least one of the vuses influenced a non-virtual result, we don't consider >> the >> blocks equal. >> >> Bootstrapped and reg-tested on x86_64. >> >> ok for trunk, 4.7.1? > > Hmm, I wonder if the proper observation would not be that while GVN considers > the PHI arguments in > >> # dddD.1710_1 = PHI > > to be equal, their definitions are not in the blocks we try to merge, so their > value can be dependent on the status as it was on their predecessors. > GVN, after all, considers flow-sensitivity. > we are replacing block 5 by block 3. The phi alternative for block 3 is dddD.1710_3(3), which is defined in block 3. The problem is related to the vuse dependency of the def of dddD.1710_3(3). I'll try to reformulate. In tail_merge_optimize, we analyze: 1. if the blocks are equal (for which gvn might be used) 2. if the blocks can be merged. The blocks cannot be merged if the dependencies of the replacement block are not satisfied in the predecessors of the other blocks. We handle the dependencies for virtual and normal values differently. For normal values, we calculate BB_DEP_BB (The bb that either contains or is dominated by the dependencies of the bb). If BB_DEP_BB of the replacement block dominates the blocks that are replaced, the blocks can be merged. For virtual values, we don't check for dependencies. If we would check for virtual dependencies like normal dependencies, the original example pr43864.c would not work anymore: there are 2 blocks with identical result-less function calls, but with different incoming vuse. It's safe to merge the blocks, so we don't treat the vuses as dependencies. This test-case shows a case that the vuse is a dependency of the replacement block, which is not satisfied by the predecessor of the replaced block. We need an analysis of when a vuse needs to be considered a dependency. This patch implement such an analysis. Conservative, but simple. OK for trunk, 4.7.1? Thanks, - Tom > Richard. > > >> Thanks, >> - Tom >> >> 2012-04-13 Tom de Vries >> >>* tree-ssa-tail-merge.c (gsi_advance_bw_nondebug_nonlocal): Add >>parameters vuse and vuse_escaped. >>(find_duplicate): Init vuse1, vuse2 and vuse_escaped. Pass to >>gsi_advance_bw_nondebug_nonlocal. Return if vuse_escaped and >>vuse1 != vuse2. >> >>* gcc.dg/pr52734.c: New test. >>
Commit: RL78: Remove use of TODO_dump_func
Hi DJ, The optimization pass flag "TODO_dump_flag" has been removed (see patch committed 2012-04-11) which was causing the RL78 backend to fail to build. I am applying the following patch as an obvious fix. Cheers Nick gcc/ChangeLog 2012-04-13 Nick Clifton * config/rl78/rl78.c (rl78_devirt_pass): Remove use of TODO_dump_func flag. Index: gcc/config/rl78/rl78.c === --- gcc/config/rl78/rl78.c (revision 186405) +++ gcc/config/rl78/rl78.c (working copy) @@ -140,7 +140,7 @@ TV_MACH_DEP, 0, 0, 0, 0, - TODO_dump_func + 0 }; static struct register_pass_info rl78_devirt_info =
Re: [PATCH] Atom: Scheduler improvements for better imul placement
On Fri, Apr 13, 2012 at 2:18 PM, Igor Zamyatin wrote: > On Thu, Apr 12, 2012 at 5:01 PM, Andrey Belevantsev wrote: >> On 12.04.2012 16:38, Richard Guenther wrote: >>> >>> On Thu, Apr 12, 2012 at 2:36 PM, Igor Zamyatin >>> wrote: On Thu, Apr 12, 2012 at 4:24 PM, Richard Guenther wrote: > > On Thu, Apr 12, 2012 at 2:00 PM, Alexander Monakov > wrote: >> >> >>> Can atom execute two IMUL in parallel? Or what exactly is the >>> pipeline >>> behavior? >> >> >> As I understand from Intel's optimization reference manual, the >> behavior is as >> follows: if the instruction immediately following IMUL has shorter >> latency, >> execution is stalled for 4 cycles (which is IMUL's latency); otherwise, >> a >> 4-or-more cycles latency instruction can be issued after IMUL without a >> stall. >> In other words, IMUL is pipelined with respect to other long-latency >> instructions, but not to short-latency instructions. > > > It seems to be modeled in the pipeline description though: > > ;;; imul insn has 5 cycles latency > (define_reservation "atom-imul-32" > "atom-imul-1, atom-imul-2, atom-imul-3, atom-imul-4, > atom-port-0") > > ;;; imul instruction excludes other non-FP instructions. > (exclusion_set "atom-eu-0, atom-eu-1" > "atom-imul-1, atom-imul-2, atom-imul-3, atom-imul-4") > The main idea is quite simple: If we are going to schedule IMUL instruction (it is on the top of ready list) we try to find out producer of other (independent) IMUL instruction that is in ready list too. The goal is try to schedule such a producer to get another IMUL in ready list and get scheduling of 2 successive IMUL instructions. >>> >>> >>> Why does that not happen without your patch? Does it never happen without >>> your patch or does it merely not happen for one EEMBC benchmark (can >>> you provide a testcase?)? >> >> >> It does not happen because the scheduler by itself does not do such specific >> reordering. That said, it is easy to imagine the cases where this patch >> will make things worse rather than better. >> >> Igor, why not try different subtler mechanisms like adjust_priority, which >> is get called when an insn is added to the ready list? E.g. increase the >> producer's priority. >> >> The patch as is misses checks for NONDEBUG_INSN_P. Also, why bail out when >> you have more than one imul in the ready list? Don't you want to bump the >> priority of the other imul found? > > Could you provide some examples when this patch would harm the performance? > > Sched_reorder was chosen since it is used in other ports and looks > most suitable for such case, e.g. it provides access to the whole > ready list. > BTW, just increasing producer's priority seems to be more risky in > performance sense - we can incorrectly start delaying some > instructions. > > Thought ready list doesn't contain DEBUG_INSN... Is it so? If it > contains them - this could be added easily > > The case with more than one imul in the ready list wasn't considered > just because the goal was to catch the particular case when there is a > risk to get the following picture: "imul-producer-imul" which is less > effective than "producer-imul-imul" for Atom Actually, this constraint of one imul could be omitted. I'm going to try this change > >> >> Andrey >> >> >>> And MD allows us only prefer scheduling of successive IMUL instructions, i.e. If IMUL was just scheduled and ready list contains another IMUL instruction then it will be chosen as the best candidate for scheduling. > at least from my very limited guessing of what the above does. So, did > you > analyze why the scheduler does not properly schedule things for you? > > Richard. > >> >> From reading the patch, I could not understand the link between >> pipeline >> behavior and what the patch appears to do. >> >> Hope that helps. >> >> Alexander >> >> > > Thanks, > Igor
Re: [PATCH] Fix for PR52734 (-ftree-tail-merge)
On Fri, Apr 13, 2012 at 12:56 PM, Tom de Vries wrote: > On 13/04/12 11:13, Richard Guenther wrote: >> On Fri, Apr 13, 2012 at 10:32 AM, Tom de Vries >> wrote: >>> Richard, >>> >>> this patch fixes PR52743. >>> >>> The problem is as follows: blocks 3 and 5, with successor 6 are considered >>> equal >>> and merged. >>> ... >>> # BLOCK 3 freq:6102 >>> # PRED: 2 [61.0%] (true,exec) >>> # VUSE <.MEMD.1734_10> >>> dddD.1710_3 = bbbD.1703; >>> goto ; >>> # SUCC: 6 [100.0%] (fallthru,exec) >>> >>> # BLOCK 5 freq:2378 >>> # PRED: 4 [61.0%] (false,exec) >>> # SUCC: 6 [100.0%] (fallthru,exec) >>> >>> # BLOCK 6 freq:1 >>> # PRED: 3 [100.0%] (fallthru,exec) 7 [100.0%] (fallthru) 5 [100.0%] >>> (fallthru,exec) >>> # dddD.1710_1 = PHI >>> # .MEMD.1734_8 = PHI <.MEMD.1734_10(3), .MEMD.1734_11(7), .MEMD.1734_11(5)> >>> # VUSE <.MEMD.1734_8> >>> return dddD.1710_1; >>> # SUCC: EXIT [100.0%] >>> ... >>> >>> Tail merge considers 2 blocks equal if the effect at the tail is equal, >>> meaning: >>> - the sequence of side effects produced by each block is equal >>> - the value phis are equal >>> >>> There are no side effects in block 3 and 5, and the phi alternatives of >>> dddD.1710_1 for 3 (dddD.1710_3) and 5 (dddD.1710_4) are proven equal by >>> gvn. >>> >>> The problem is that changing the (4->5) edge into a (4->3) edge changes the >>> value of dddD.1710_3, because block 4 contains a store that affects the >>> load in >>> block 3. >>> ... >>> # BLOCK 4 freq:3898 >>> # PRED: 2 [39.0%] (false,exec) >>> # VUSE <.MEMD.1734_10> >>> dddD.1710_4 = bbbD.1703; >>> # .MEMD.1734_11 = VDEF <.MEMD.1734_10> >>> # USE = nonlocal null >>> # CLB = nonlocal null >>> D.1724_5 = aaaD.1705 (); >>> if (D.1724_5 != 0) >>> goto ; >>> else >>> goto ; >>> # SUCC: 7 [39.0%] (true,exec) 5 [61.0%] (false,exec) >>> ... >>> >>> Or, put differently, the incoming vuse of block 3 affects a value phi >>> alternative for that block (dddD.1710_3), so the 2 blocks are equal only >>> under >>> the condition that the incoming vuses are equal. >>> >>> We could build an analysis that addresses that precisely, but for now I >>> implemented a more coarse-grained fix: if the incoming vuses are not equal, >>> and >>> at least one of the vuses influenced a non-virtual result, we don't >>> consider the >>> blocks equal. >>> >>> Bootstrapped and reg-tested on x86_64. >>> >>> ok for trunk, 4.7.1? >> >> Hmm, I wonder if the proper observation would not be that while GVN considers >> the PHI arguments in >> >>> # dddD.1710_1 = PHI >> >> to be equal, their definitions are not in the blocks we try to merge, so >> their >> value can be dependent on the status as it was on their predecessors. >> GVN, after all, considers flow-sensitivity. >> > > we are replacing block 5 by block 3. The phi alternative for block 3 is > dddD.1710_3(3), which is defined in block 3. The problem is related to the > vuse > dependency of the def of dddD.1710_3(3). > > I'll try to reformulate. In tail_merge_optimize, we analyze: > 1. if the blocks are equal (for which gvn might be used) > 2. if the blocks can be merged. The blocks cannot be merged if the > dependencies > of the replacement block are not satisfied in the predecessors of the other > blocks. > > We handle the dependencies for virtual and normal values differently. > > For normal values, we calculate BB_DEP_BB (The bb that either contains or is > dominated by the dependencies of the bb). If BB_DEP_BB of the replacement > block > dominates the blocks that are replaced, the blocks can be merged. > > For virtual values, we don't check for dependencies. > > If we would check for virtual dependencies like normal dependencies, the > original example pr43864.c would not work anymore: there are 2 blocks with > identical result-less function calls, but with different incoming vuse. > It's safe to merge the blocks, so we don't treat the vuses as dependencies. > > This test-case shows a case that the vuse is a dependency of the replacement > block, which is not satisfied by the predecessor of the replaced block. > > We need an analysis of when a vuse needs to be considered a dependency. Ah, ok. That makes sense then. > This patch implement such an analysis. Conservative, but simple. OK for trunk, > 4.7.1? Yes. Thanks, Richard. > Thanks, > - Tom > >> Richard. >> >> >>> Thanks, >>> - Tom >>> >>> 2012-04-13 Tom de Vries >>> >>> * tree-ssa-tail-merge.c (gsi_advance_bw_nondebug_nonlocal): Add >>> parameters vuse and vuse_escaped. >>> (find_duplicate): Init vuse1, vuse2 and vuse_escaped. Pass to >>> gsi_advance_bw_nondebug_nonlocal. Return if vuse_escaped and >>> vuse1 != vuse2. >>> >>> * gcc.dg/pr52734.c: New test. >>> >
Re: [i386, patch, RFC] HLE support in GCC
> No, just the bits; programmers would need to do > __atomic_...(..., __ATOMIC_RELEASE | HLE_RELEASE); > I believe this is what you had in one of your versions of the patch. My > suggestions was not about doing something new but instead a > suggestions/poll for a resolution of the discussion. Oh, okay, got it. So, seems it all covered by my recent patch (hle-rfc-5.gcc.patch). Any other inputs? Thanks, K
Re: [RFC] Should SRA stop producing COMPONENT_REF for non-bit-fields (again)?
On Thu, 12 Apr 2012, Martin Jambor wrote: > Hi, > > On Wed, Apr 04, 2012 at 04:42:05PM +0200, Richard Guenther wrote: > > On Wed, 4 Apr 2012, Martin Jambor wrote: > > > > > Hi everyone, especially Richi and Eric, > > > > > > I'd like to know what is your attitude to changing SRA's > > > build_ref_for_model to what it once looked like, so that it produces > > > COMPONENT_REFs only for bit-fields. The non-bit field handling was > > > added in order to work-around problems when expanding non-aligned > > > MEM_REFs on strict-alignment targets but that should not be a problem > > > now and my experiments confirm that. Last week I successfully > > > bootstrapped and tested this patch on sparc64-linux (with the > > > temporary MEM_EXPR patch, not including Java), ia64-linux (without > > > Ada), x86_64-linux, i686-linux and tested it on hppa-linux (only C and > > > C++). > > ... > > > > > > > 2012-03-20 Martin Jambor > > > > > > * tree-sra.c (build_ref_for_model): Create COMPONENT_REFs only for > > > bit-fields. > > > > > > Index: src/gcc/tree-sra.c > > > === > > > *** src.orig/gcc/tree-sra.c > > > --- src/gcc/tree-sra.c > > > *** build_ref_for_offset (location_t loc, tr > > > *** 1489,1558 > > > return fold_build2_loc (loc, MEM_REF, exp_type, base, off); > > > } > > > > > > - DEF_VEC_ALLOC_P_STACK (tree); > > > - #define VEC_tree_stack_alloc(alloc) VEC_stack_alloc (tree, alloc) > > > - > > > /* Construct a memory reference to a part of an aggregate BASE at the > > > given > > > !OFFSET and of the type of MODEL. In case this is a chain of > > > references > > > !to component, the function will replicate the chain of > > > COMPONENT_REFs of > > > !the expression of MODEL to access it. GSI and INSERT_AFTER have the > > > same > > > !meaning as in build_ref_for_offset. */ > > > > > > static tree > > > build_ref_for_model (location_t loc, tree base, HOST_WIDE_INT offset, > > >struct access *model, gimple_stmt_iterator *gsi, > > >bool insert_after) > > > { > > > ! tree type = model->type, t; > > > ! VEC(tree,stack) *cr_stack = NULL; > > > ! > > > ! if (TREE_CODE (model->expr) == COMPONENT_REF) > > > { > > > ! tree expr = model->expr; > > > ! > > > ! /* Create a stack of the COMPONENT_REFs so later we can walk them > > > in > > > ! order from inner to outer. */ > > > ! cr_stack = VEC_alloc (tree, stack, 6); > > > ! > > > ! do { > > > ! tree field = TREE_OPERAND (expr, 1); > > > ! tree cr_offset = component_ref_field_offset (expr); > > > ! HOST_WIDE_INT bit_pos > > > ! = tree_low_cst (cr_offset, 1) * BITS_PER_UNIT > > > ! + TREE_INT_CST_LOW (DECL_FIELD_BIT_OFFSET (field)); > > > > > > ! /* We can be called with a model different from the one > > > associated > > > !with BASE so we need to avoid going up the chain too far. */ > > > ! if (offset - bit_pos < 0) > > > ! break; > > > ! > > > ! offset -= bit_pos; > > > ! VEC_safe_push (tree, stack, cr_stack, expr); > > > ! > > > ! expr = TREE_OPERAND (expr, 0); > > > ! type = TREE_TYPE (expr); > > > ! } while (TREE_CODE (expr) == COMPONENT_REF); > > > } > > > ! > > > ! t = build_ref_for_offset (loc, base, offset, type, gsi, insert_after); > > > ! > > > ! if (TREE_CODE (model->expr) == COMPONENT_REF) > > > ! { > > > ! unsigned i; > > > ! tree expr; > > > ! > > > ! /* Now replicate the chain of COMPONENT_REFs from inner to outer. > > > */ > > > ! FOR_EACH_VEC_ELT_REVERSE (tree, cr_stack, i, expr) > > > ! { > > > ! tree field = TREE_OPERAND (expr, 1); > > > ! t = fold_build3_loc (loc, COMPONENT_REF, TREE_TYPE (field), > > > t, field, > > > !TREE_OPERAND (expr, 2)); > > > ! } > > > ! > > > ! VEC_free (tree, stack, cr_stack); > > > ! } > > > ! > > > ! return t; > > > } > > > > > > /* Construct a memory reference consisting of component_refs and > > > array_refs to > > > --- 1489,1520 > > > return fold_build2_loc (loc, MEM_REF, exp_type, base, off); > > > } > > > > > > /* Construct a memory reference to a part of an aggregate BASE at the > > > given > > > !OFFSET and of the same type as MODEL. In case this is a reference > > > to a > > > !bit-field, the function will replicate the last component_ref of > > > model's > > > !expr to access it. GSI and INSERT_AFTER have the same meaning as in > > > !build_ref_for_offset. */ > > > > > > static tree > > > build_ref_for_model (location_t loc, tree base, HOST_WIDE_INT offset, > > >struct access *model, gimple_stmt_iterator *gsi, > > >bool insert_after) >
Re: [RFC] Should SRA stop producing COMPONENT_REF for non-bit-fields (again)?
Richard Guenther writes: >> Anyway, the patch I posted previously would risk re-introducing PR >> 50386 and PR 50326, even though they are very unlikely with just >> bit-fields. So my current working version is the following, but it >> causes failure of libmudflap.c++/pass55-frag.cxx execution test so I'm >> not actually proposing it yet (sigh). > > I would not worry about mudflap tests. The patch looks good to my > eyes. Are you sure the failure is new? At least for 64-bit at -O, libmudflap.c++/pass55-frag.cxx already fails right now (cf. PR libmudflap/49843). Rainer -- - Rainer Orth, Center for Biotechnology, Bielefeld University
[PATCH] Fix PR52969
The following patch fixes the missed handling of TRUTH_NOT_EXPR predicates in predicate_mem_writes and general combined predicates which need gimplification. Bootstrapped on x86_64-unknown-linux-gnu, testing in progress. Richard. 2012-04-13 Richard Guenther PR tree-optimization/52969 * tree-if-conv.c (predicate_mem_writes): Properly gimplify the condition for the COND_EXPR and handle predicate negation by swapping the COND_EXPR arms. * gcc.dg/torture/pr52969.c: New testcase. Index: gcc/tree-if-conv.c === --- gcc/tree-if-conv.c (revision 186406) +++ gcc/tree-if-conv.c (working copy) @@ -1543,11 +1522,19 @@ predicate_mem_writes (loop_p loop) gimple_stmt_iterator gsi; basic_block bb = ifc_bbs[i]; tree cond = bb_predicate (bb); + bool swap; gimple stmt; if (is_true_predicate (cond)) continue; + swap = false; + if (TREE_CODE (cond) == TRUTH_NOT_EXPR) + { + swap = true; + cond = TREE_OPERAND (cond, 0); + } + for (gsi = gsi_start_bb (bb); !gsi_end_p (gsi); gsi_next (&gsi)) if ((stmt = gsi_stmt (gsi)) && gimple_assign_single_p (stmt) @@ -1559,6 +1546,15 @@ predicate_mem_writes (loop_p loop) lhs = ifc_temp_var (type, unshare_expr (lhs), &gsi); rhs = ifc_temp_var (type, unshare_expr (rhs), &gsi); + if (swap) + { + tree tem = lhs; + lhs = rhs; + rhs = tem; + } + cond = force_gimple_operand_gsi_1 (&gsi, unshare_expr (cond), + is_gimple_condexpr, NULL_TREE, + true, GSI_SAME_STMT); rhs = build3 (COND_EXPR, type, unshare_expr (cond), rhs, lhs); gimple_assign_set_rhs1 (stmt, ifc_temp_var (type, rhs, &gsi)); update_stmt (stmt); Index: gcc/testsuite/gcc.dg/torture/pr52969.c === --- gcc/testsuite/gcc.dg/torture/pr52969.c (revision 0) +++ gcc/testsuite/gcc.dg/torture/pr52969.c (revision 0) @@ -0,0 +1,16 @@ +/* { dg-do compile } */ +/* { dg-options "-ftree-loop-if-convert-stores" } */ + +int a, b; +float xsum[100]; +void foo (float *cluster) +{ + int j; + for (; a ; ++j) { + xsum[j] = cluster[j]; + if (xsum[j] > 0) + xsum[j] = 0; + } + if (xsum[0]) +b = 0; +}
Re: [PATCH] Atom: Scheduler improvements for better imul placement
On 13.04.2012 14:18, Igor Zamyatin wrote: On Thu, Apr 12, 2012 at 5:01 PM, Andrey Belevantsev wrote: On 12.04.2012 16:38, Richard Guenther wrote: On Thu, Apr 12, 2012 at 2:36 PM, Igor Zamyatin wrote: On Thu, Apr 12, 2012 at 4:24 PM, Richard Guenther wrote: On Thu, Apr 12, 2012 at 2:00 PM, Alexander Monakov wrote: Can atom execute two IMUL in parallel? Or what exactly is the pipeline behavior? As I understand from Intel's optimization reference manual, the behavior is as follows: if the instruction immediately following IMUL has shorter latency, execution is stalled for 4 cycles (which is IMUL's latency); otherwise, a 4-or-more cycles latency instruction can be issued after IMUL without a stall. In other words, IMUL is pipelined with respect to other long-latency instructions, but not to short-latency instructions. It seems to be modeled in the pipeline description though: ;;; imul insn has 5 cycles latency (define_reservation "atom-imul-32" "atom-imul-1, atom-imul-2, atom-imul-3, atom-imul-4, atom-port-0") ;;; imul instruction excludes other non-FP instructions. (exclusion_set "atom-eu-0, atom-eu-1" "atom-imul-1, atom-imul-2, atom-imul-3, atom-imul-4") The main idea is quite simple: If we are going to schedule IMUL instruction (it is on the top of ready list) we try to find out producer of other (independent) IMUL instruction that is in ready list too. The goal is try to schedule such a producer to get another IMUL in ready list and get scheduling of 2 successive IMUL instructions. Why does that not happen without your patch? Does it never happen without your patch or does it merely not happen for one EEMBC benchmark (can you provide a testcase?)? It does not happen because the scheduler by itself does not do such specific reordering. That said, it is easy to imagine the cases where this patch will make things worse rather than better. Igor, why not try different subtler mechanisms like adjust_priority, which is get called when an insn is added to the ready list? E.g. increase the producer's priority. The patch as is misses checks for NONDEBUG_INSN_P. Also, why bail out when you have more than one imul in the ready list? Don't you want to bump the priority of the other imul found? Could you provide some examples when this patch would harm the performance? I thought of the cases when the other ready insns can fill up the hole and that would be more beneficial because e.g. they would be on more critical paths than the producer of your second imul. I don't know enough of Atom to give an example -- maybe some long divisions? Sched_reorder was chosen since it is used in other ports and looks most suitable for such case, e.g. it provides access to the whole ready list. BTW, just increasing producer's priority seems to be more risky in performance sense - we can incorrectly start delaying some instructions. Yes, but exactly because of the above example you can start incorrectly delaying other insns, too, as you force the insn to be the first in the list. While bumping priority still leaves the scheduler sorting heuristics in place and actually lowers that risk. Thought ready list doesn't contain DEBUG_INSN... Is it so? If it contains them - this could be added easily It does, but I'm not sure the sched_reorder hook gets them or they are immediately removed -- I saw similar checks in one of the targets' hooks. Anyways, my main thought was that it is better to test on more benchmarks to alleviate the above concerns, so as long as the i386 maintainers are happy, I don't see major problems here. A good idea could be to generalize the patch to handle other long latency insns as second consumers, not only imuls, if this is relevant for Atom. Andrey The case with more than one imul in the ready list wasn't considered just because the goal was to catch the particular case when there is a risk to get the following picture: "imul-producer-imul" which is less effective than "producer-imul-imul" for Atom Andrey And MD allows us only prefer scheduling of successive IMUL instructions, i.e. If IMUL was just scheduled and ready list contains another IMUL instruction then it will be chosen as the best candidate for scheduling. at least from my very limited guessing of what the above does. So, did you analyze why the scheduler does not properly schedule things for you? Richard. From reading the patch, I could not understand the link between pipeline behavior and what the patch appears to do. Hope that helps. Alexander Thanks, Igor
default to strict dwarf-2 on powerpc-vxworks
Hello, On typical VxWorks environments, WindRiver integrated tools are used as much if not more than gdb for debugging purposes. These evolve at an industrial pace, traditionally not as fast as GCC regarding the support of latest dwarf standards. As of today, in our experience, the best compromise to let WRS debugging tools work fine and have negligible effect on gdb's experience for this target is to enforce strict dwarf-2 debugging info by default. We have been doing this in our production compilers for a while and find it really useful, so we're happy to contribute here, in case this is deemed more generally appropriate. The attached patch achieves this in two mini steps: - Allow to distinguish between a dwarf version picked by default by the compiler and one provided on the command line that happens to match the default, which is useful to make sure we don't override a request for an explicit level on the command line. This is achieved by initializing dwarf_version to -1 instead of 2 from common.opt, then pick 2 as a default from toplev.c:process_options, similar to the scheme in place for dwarf_strict already. - Default to dwarf_strict = 1 && dwarf_version = 2 for VxWorks via vxworks_override_options. Sanity checked on mainline for powerpc-wrs-vxworks and bootstrapped on x86_64-pc-linux-gnu. Thanks in advance for your feedback, With Kind Regards, Olivier 2012-04-13 Olivier Hainque * common.opt (gdwarf-): Initialize dwarf_version to -1 instead of 2. * toplev.c (process_options): Default to dwarf_version 2. * config/vxworks.c (vxworks_override_options): Default to strict-dwarf and dwarf_version 2. vxdwarf.dif Description: video/dv
Re: [Patch] tree-parloops.c (eliminate_local_variables): Add braces to suppress warnings
On Wed, Apr 11, 2012 at 4:46 AM, Richard Guenther wrote: > On Tue, Apr 10, 2012 at 9:08 PM, NightStrike wrote: >> On Tue, Apr 10, 2012 at 10:38 AM, Richard Guenther >> wrote: >>> On Tue, Apr 10, 2012 at 3:06 PM, JonY wrote: On 4/10/2012 20:37, Richard Guenther wrote: > On Tue, Apr 10, 2012 at 2:15 PM, JonY wrote: >> Hi, >> >> Patch OK? > > What kind of warning? > Oops, I forgot to mention gcc was complaining about missing braces. >>> >>> Not for me. And I fail to see why it should. >> >> This is the warning: >> >> ../../../../build/gcc/src/gcc/tree-parloops.c:724: warning: suggest >> explicit braces to avoid ambiguous `else' > >> ./xgcc -B. -c -g -DIN_GCC -W -Wall -Wwrite-strings -Wcast-qual >> -Wstrict-prototypes -Wmissing-prototypes -Wmissing-format-attribute >> -pedantic -Wno-long-long -Wno-variadic-macros -Wno-overlength-strings >> -Wold-style-definition -Wc++-compat -fno-common -DHAVE_CONFIG_H -I. -I. >> -I/space/rguenther/src/svn/trunk/gcc -I/space/rguenther/src/svn/trunk/gcc/. >> -I/space/rguenther/src/svn/trunk/gcc/../include >> -I/space/rguenther/src/svn/trunk/gcc/../libcpp/include >> -I/space/rguenther/src/svn/trunk/gcc/../libdecnumber >> -I/space/rguenther/src/svn/trunk/gcc/../libdecnumber/bid -I../libdecnumber >> /space/rguenther/src/svn/trunk/gcc/tree-parloops.c -o tree-parloops.o > > no warning from trunk. Which GCC version emits this warning? Note that > we do not try to keep stage1 warning-free but only later stages (which is > why we use -Werror there). > > Richard. Looks like cygwin gcc 3.4.4
recognize .persistent.bss sections as bss
Hello, For several years now, Ada has support for a "Persistent_BSS" pragma that let users place data in a ".persistent.bss" section. This section: - needs to be treated as a bss section by the compiler (in particular, to set flags that will prevent use of space in executable files) - can be treated specially by run-time loaders, not resetting to all zeroes on application warm-restarts. This patch simply adds ".persistent.bss" to the list of section names that GCC identifies a bss sections. Bootstrapped and regtested on x86_84-pc-linux-gnu. OK to commit ? Thanks in advance, Olivier 2012-04-13 Olivier Hainque * varasm.c (default_section_type_flags): Flag .persistent.bss sections as SECTION_BSS. pbss.dif Description: video/dv
Re: [patch, fortran] Trim spaces on list-directed reads
Am 10.04.2012 14:32, schrieb Thomas Koenig: Hello world, this patch effectively trims the spaces from the string on list-directed reads. This avoids the large overhead on processing these spaces when reading from long lines. Ping ** 0.4285714?
[PATCH, PR 52939] Gracefully deal with fold_ctor_reference returning NULL during devirtualization
Hi, currently we ICE when attempting to devirtualize a call to a virtual method introduced in a descendant but with a base which is an ancestor which does not have it. This is because fold_ctor_reference returns constant zero when it cannot find the particular value in the provided constructor which is something gimple_get_virt_method_for_binfo cannot cope with. The patch below avoids it by simply testing for that value and bailing out. Bootstrapped and tested on x86_64-linux, OK for trunk? Thanks, Martin 2012-04-12 Martin Jambor PR middle-end/52939 * gimple-fold.c (gimple_get_virt_method_for_binfo): Bail out if fold_ctor_reference returns a zero constant. * testsuite/g++.dg/ipa/pr52939.C: New test. Index: src/gcc/gimple-fold.c === --- src.orig/gcc/gimple-fold.c +++ src/gcc/gimple-fold.c @@ -3087,7 +3087,7 @@ gimple_get_virt_method_for_binfo (HOST_W offset += token * size; fn = fold_ctor_reference (TREE_TYPE (TREE_TYPE (v)), DECL_INITIAL (v), offset, size); - if (!fn) + if (!fn || integer_zerop (fn)) return NULL_TREE; gcc_assert (TREE_CODE (fn) == ADDR_EXPR || TREE_CODE (fn) == FDESC_EXPR); Index: src/gcc/testsuite/g++.dg/ipa/pr52939.C === --- /dev/null +++ src/gcc/testsuite/g++.dg/ipa/pr52939.C @@ -0,0 +1,58 @@ +/* Verify that we do not ICE on invalid devirtualizations (which might + be OK at run-time because never executed). */ +/* { dg-do run } */ +/* { dg-options "-O3 -fno-early-inlining -fno-inline" } */ + +extern "C" void abort (void); + +class A +{ +public: + int data; + virtual int foo (int i); +}; + +class B : public A +{ +public: + virtual int foo (int i); + virtual int bar (int i); +}; + +int A::foo (int i) +{ + return i + 1; +} + +int B::foo (int i) +{ + return i + 2; +} + +int B::bar (int i) +{ + return i + 3; +} + +static int middleman (class A *obj, int i) +{ + class B *b = (class B *) obj; + + if (i != 1) +return b->bar (i); + else +return i; +} + +int __attribute__ ((noinline,noclone)) get_input(void) +{ + return 1; +} + +int main (int argc, char *argv[]) +{ + class A o; + if (middleman (&o, get_input ()) != 1) +abort (); + return 0; +}
Re: PING Re: [PATCH] gfortran testsuite: implicitly cleanup-modules, part 2
On Apr 13, 2012, at 3:51 AM, Bernhard Reutner-Fischer wrote: > Ping. Before advancing, has the problem that Rainer pointed out on March 19th with your earlier patch been fixed?
[patch, fortran] PR fortran/52537 Optimize string comparisons against empty strings
Hello world, this patch replaces a != '' with len_trim(a) != 0, to speed up the comparison. It also introduces a bit of cleanup in frontend-passes.c. Regression-tested. OK for trunk? Thomas 2012-04-13 Thomas Koenig PR fortran/52537 * frontend-passes.c (optimize_op): Change old-style comparison operators to new-style, simplify switch as a result. (empty_string): New function. (get_len_trim_call): New function. (optimize_comparison): If comparing to an empty string, use comparison of len_trim to zero. Use new-style comparison operators only. (optimize_trim): Use get_len_trim_call. 2012-04-13 Thomas Koenig PR fortran/52537 * gfortran.dg/string_compare_4.f90: New test. Index: frontend-passes.c === --- frontend-passes.c (Revision 186213) +++ frontend-passes.c (Arbeitskopie) @@ -796,20 +796,45 @@ optimize_op (gfc_expr *e) { gfc_intrinsic_op op = e->value.op.op; + /* Only use new-style comparisions. */ + switch(op) +{ +case INTRINSIC_EQ_OS: + op = INTRINSIC_EQ; + break; + +case INTRINSIC_GE_OS: + op = INTRINSIC_GE; + break; + +case INTRINSIC_LE_OS: + op = INTRINSIC_LE; + break; + +case INTRINSIC_NE_OS: + op = INTRINSIC_NE; + break; + +case INTRINSIC_GT_OS: + op = INTRINSIC_GT; + break; + +case INTRINSIC_LT_OS: + op = INTRINSIC_LT; + break; + +default: + break; +} + switch (op) { case INTRINSIC_EQ: -case INTRINSIC_EQ_OS: case INTRINSIC_GE: -case INTRINSIC_GE_OS: case INTRINSIC_LE: -case INTRINSIC_LE_OS: case INTRINSIC_NE: -case INTRINSIC_NE_OS: case INTRINSIC_GT: -case INTRINSIC_GT_OS: case INTRINSIC_LT: -case INTRINSIC_LT_OS: return optimize_comparison (e, op); default: @@ -819,6 +844,61 @@ optimize_op (gfc_expr *e) return false; } +/* Return true if a constant string contains spaces only. */ + +static bool +empty_string (gfc_expr *e) +{ + int i; + + if (e->ts.type != BT_CHARACTER || e->expr_type != EXPR_CONSTANT) +return false; + + for (i=0; ivalue.character.length; i++) +{ + if (e->value.character.string[i] != ' ') + return false; +} + + return true; +} + +/* Insert a call to the intrinsic len_trim. Use a different name for + the symbol tree so we don't run into trouble when the user has + renamed len_trim for some reason. */ + +static gfc_expr* +get_len_trim_call (gfc_expr *str, int kind) +{ + gfc_expr *fcn; + gfc_actual_arglist *actual_arglist, *next; + + fcn = gfc_get_expr (); + fcn->expr_type = EXPR_FUNCTION; + fcn->value.function.isym = gfc_intrinsic_function_by_id (GFC_ISYM_LEN_TRIM); + actual_arglist = gfc_get_actual_arglist (); + actual_arglist->expr = str; + next = gfc_get_actual_arglist (); + next->expr = gfc_get_int_expr (gfc_default_integer_kind, NULL, kind); + actual_arglist->next = next; + + fcn->value.function.actual = actual_arglist; + fcn->where = str->where; + fcn->ts.type = BT_INTEGER; + fcn->ts.kind = gfc_charlen_int_kind; + + gfc_get_sym_tree("__internal_len_trim", current_ns, &fcn->symtree, false); + fcn->symtree->n.sym->ts = fcn->ts; + fcn->symtree->n.sym->attr.flavor = FL_PROCEDURE; + fcn->symtree->n.sym->attr.function = 1; + fcn->symtree->n.sym->attr.elemental = 1; + fcn->symtree->n.sym->attr.referenced = 1; + fcn->symtree->n.sym->attr.access = ACCESS_PRIVATE; + gfc_commit_symbol (fcn->symtree->n.sym); + + return fcn; +} + /* Optimize expressions for equality. */ static bool @@ -862,6 +942,46 @@ optimize_comparison (gfc_expr *e, gfc_intrinsic_op if (e->rank > 0) return change; + /* Replace a == '' with len_trim(a) == 0 and a /= '' with + len_trim(a) != 0 */ + if (op1->ts.type == BT_CHARACTER && op2->ts.type == BT_CHARACTER + && (op == INTRINSIC_EQ || op == INTRINSIC_NE)) +{ + + bool empty_op1, empty_op2; + empty_op1 = empty_string(op1); + empty_op2 = empty_string(op2); + + if (empty_op1 || empty_op2) + { + gfc_expr *fcn; + gfc_expr *zero; + gfc_expr *str; + + /* This can only happen when an error for comparing + characters of different kinds has already been issued. */ + if (empty_op1 && empty_op2) + return false; + + zero = gfc_get_int_expr (gfc_charlen_int_kind, &e->where, 0); + str = empty_op1 ? op2 : op1; + + fcn = get_len_trim_call (str, gfc_charlen_int_kind); + + + if (empty_op1) + gfc_free_expr (op1); + else + gfc_free_expr (op2); + + op1 = fcn; + op2 = zero; + e->value.op.op1 = fcn; + e->value.op.op2 = zero; + } +} + + /* Don't compare REAL or COMPLEX expressions when honoring NaNs. */ if (flag_finite_math_only @@ -935,32 +1055,26 @@ optimize_comparison (gfc_expr *e, gfc_intrinsic_op switch (op) { case INTRINSIC_EQ: - case INTRI
Re: [PATCH, i386] Move common definitions of gnu-user.h and gnu-user64.h to separate file
On Fri, Apr 13, 2012 at 12:37 PM, Ilya Enkovich wrote: > 2012-04-13 Enkovich Ilya > > * config/i386/gnu-user-common.h: New. > > * config/i386/gnu-user.h (CPP_SPEC): Moved to > gnu-user-common.h. > (CC1_SPEC): Likewise. > (ENDFILE_SPEC): Likewise. > (DEFAULT_PCC_STRUCT_RETURN): Likewise. > (TARGET_TLS_DIRECT_SEG_REFS_DEFAULT): Likewise. > (TARGET_OS_CPP_BUILTINS): Likewise. > (LIBGCC2_HAS_TF_MODE): Likewise. > (LIBGCC2_TF_CEXT): Likewise. > (TF_SIZE): Likewise. > (TARGET_ASM_FILE_END): Likewise. > (STACK_CHECK_MOVING_SP): Likewise. > (STACK_CHECK_STATIC_BUILTIN): Likewise. > > * config/i386/gnu-user64.h: Likewise. OK for mainline SVN, but as Richard said, this is not appropriate for release branches. Thanks, Uros.
Re: [PATCH, PR 52939] Gracefully deal with fold_ctor_reference returning NULL during devirtualization
On Fri, 13 Apr 2012, Martin Jambor wrote: > Hi, > > currently we ICE when attempting to devirtualize a call to a virtual > method introduced in a descendant but with a base which is an ancestor > which does not have it. This is because fold_ctor_reference returns > constant zero when it cannot find the particular value in the provided > constructor which is something gimple_get_virt_method_for_binfo cannot > cope with. The patch below avoids it by simply testing for that > value and bailing out. > > Bootstrapped and tested on x86_64-linux, OK for trunk? Ok. Thanks, Richard. > Thanks, > > Martin > > > 2012-04-12 Martin Jambor > > PR middle-end/52939 > * gimple-fold.c (gimple_get_virt_method_for_binfo): Bail out if > fold_ctor_reference returns a zero constant. > > * testsuite/g++.dg/ipa/pr52939.C: New test. > > Index: src/gcc/gimple-fold.c > === > --- src.orig/gcc/gimple-fold.c > +++ src/gcc/gimple-fold.c > @@ -3087,7 +3087,7 @@ gimple_get_virt_method_for_binfo (HOST_W >offset += token * size; >fn = fold_ctor_reference (TREE_TYPE (TREE_TYPE (v)), DECL_INITIAL (v), > offset, size); > - if (!fn) > + if (!fn || integer_zerop (fn)) > return NULL_TREE; >gcc_assert (TREE_CODE (fn) == ADDR_EXPR > || TREE_CODE (fn) == FDESC_EXPR); > Index: src/gcc/testsuite/g++.dg/ipa/pr52939.C > === > --- /dev/null > +++ src/gcc/testsuite/g++.dg/ipa/pr52939.C > @@ -0,0 +1,58 @@ > +/* Verify that we do not ICE on invalid devirtualizations (which might > + be OK at run-time because never executed). */ > +/* { dg-do run } */ > +/* { dg-options "-O3 -fno-early-inlining -fno-inline" } */ > + > +extern "C" void abort (void); > + > +class A > +{ > +public: > + int data; > + virtual int foo (int i); > +}; > + > +class B : public A > +{ > +public: > + virtual int foo (int i); > + virtual int bar (int i); > +}; > + > +int A::foo (int i) > +{ > + return i + 1; > +} > + > +int B::foo (int i) > +{ > + return i + 2; > +} > + > +int B::bar (int i) > +{ > + return i + 3; > +} > + > +static int middleman (class A *obj, int i) > +{ > + class B *b = (class B *) obj; > + > + if (i != 1) > +return b->bar (i); > + else > +return i; > +} > + > +int __attribute__ ((noinline,noclone)) get_input(void) > +{ > + return 1; > +} > + > +int main (int argc, char *argv[]) > +{ > + class A o; > + if (middleman (&o, get_input ()) != 1) > +abort (); > + return 0; > +} > > -- Richard Guenther SUSE / SUSE Labs SUSE LINUX Products GmbH - Nuernberg - AG Nuernberg - HRB 16746 GF: Jeff Hawn, Jennifer Guild, Felix Imendörffer
Re: [PATCH] gfortran testsuite: implicitly cleanup-modules, part 2
On Apr 3, 2012, at 5:16 AM, Bernhard Reutner-Fischer wrote: > The second part of implicitly doing cleanup-modules is to remove the now > superfluous dg-final directives. Ok once the issue Rainer pointed out is addressed. As for the ChangeLog, I'd be tempted to list them as: * gfortran.dg/*.f90: Remove now redundant manual cleanup-modules directive. I think that strikes a reasonable balance.
Re: [Patch] tree-parloops.c (eliminate_local_variables): Add braces to suppress warnings
On Apr 13, 2012, at 5:30 AM, NightStrike wrote: >> no warning from trunk. Which GCC version emits this warning? > Looks like cygwin gcc 3.4.4 3.4.4 is a little old now.. We'd encourage an upgrade to a fine new compiler... :-)
Re: [PATCH] More code refactoring
> > Eventually I'll succeed in making tree-optimize.c empty. At least > the pass stuff I'm interested in get's better now. Decompozing tree-optimize was on my wishlist, too. > > Bootstrapped on x86_64-unknown-linux-gnu, testing in progress. > > Richard. > > 2012-04-12 Richard Guenther > > * Makefile.in (cgraphunit.o): Add $(EXCEPT_H) dependency. > * cgraph.h (tree_rest_of_compilation): Remove. > * cgraph.c (cgraph_add_new_function): Move ... > * cgraphunit.c (cgraph_add_new_function): ... here. > (tree_rest_of_compilation): Make static. > (cgraph_expand_function): Do not set cgraph_function_flags_ready. We could try keep in mind that cgraphunit.c is a historical mess and should be also dismantled ;) Majority of logic should go into symbol table (I have symtab.c for that), cgraphbuild or pass management. Honza
Re: [PATCH, i386] Move common definitions of gnu-user.h and gnu-user64.h to separate file
On Fri, Apr 13, 2012 at 7:02 AM, Uros Bizjak wrote: > On Fri, Apr 13, 2012 at 12:37 PM, Ilya Enkovich > wrote: > >> 2012-04-13 Enkovich Ilya >> >> * config/i386/gnu-user-common.h: New. >> >> * config/i386/gnu-user.h (CPP_SPEC): Moved to >> gnu-user-common.h. >> (CC1_SPEC): Likewise. >> (ENDFILE_SPEC): Likewise. >> (DEFAULT_PCC_STRUCT_RETURN): Likewise. >> (TARGET_TLS_DIRECT_SEG_REFS_DEFAULT): Likewise. >> (TARGET_OS_CPP_BUILTINS): Likewise. >> (LIBGCC2_HAS_TF_MODE): Likewise. >> (LIBGCC2_TF_CEXT): Likewise. >> (TF_SIZE): Likewise. >> (TARGET_ASM_FILE_END): Likewise. >> (STACK_CHECK_MOVING_SP): Likewise. >> (STACK_CHECK_STATIC_BUILTIN): Likewise. >> >> * config/i386/gnu-user64.h: Likewise. > > OK for mainline SVN, but as Richard said, this is not appropriate for > release branches. > Ilya, I will check it for you. There is no need for release branches. Thanks. -- H.J.
Re: [Patch] tree-parloops.c (eliminate_local_variables): Add braces to suppress warnings
On 04/13/2012 04:20 PM, Mike Stump wrote: > On Apr 13, 2012, at 5:30 AM, NightStrike wrote: >>> no warning from trunk. Which GCC version emits this warning? > >> Looks like cygwin gcc 3.4.4 > > 3.4.4 is a little old now.. We'd encourage an upgrade to a fine new > compiler... :-) The thing is, the else really is ambiguous, so it looks like we have a warning regression. Bernd
Re: PING Re: [PATCH] gfortran testsuite: implicitly cleanup-modules, part 2
On Fri, Apr 13, 2012 at 06:57:44AM -0700, Mike Stump wrote: >On Apr 13, 2012, at 3:51 AM, Bernhard Reutner-Fischer wrote: >> Ping. > >Before advancing, has the problem that Rainer pointed out on March 19th with >your earlier patch been fixed? I believe that it is fixed, yes. See r185688 and my follow up to him ( http://gcc.gnu.org/ml/gcc-patches/2012-03/msg01498.html )
Re: [PATCH, i386] Move common definitions of gnu-user.h and gnu-user64.h to separate file
On Fri, Apr 13, 2012 at 7:34 AM, H.J. Lu wrote: > On Fri, Apr 13, 2012 at 7:02 AM, Uros Bizjak wrote: >> On Fri, Apr 13, 2012 at 12:37 PM, Ilya Enkovich >> wrote: >> >>> 2012-04-13 Enkovich Ilya >>> >>> * config/i386/gnu-user-common.h: New. >>> >>> * config/i386/gnu-user.h (CPP_SPEC): Moved to >>> gnu-user-common.h. >>> (CC1_SPEC): Likewise. >>> (ENDFILE_SPEC): Likewise. >>> (DEFAULT_PCC_STRUCT_RETURN): Likewise. >>> (TARGET_TLS_DIRECT_SEG_REFS_DEFAULT): Likewise. >>> (TARGET_OS_CPP_BUILTINS): Likewise. >>> (LIBGCC2_HAS_TF_MODE): Likewise. >>> (LIBGCC2_TF_CEXT): Likewise. >>> (TF_SIZE): Likewise. >>> (TARGET_ASM_FILE_END): Likewise. >>> (STACK_CHECK_MOVING_SP): Likewise. >>> (STACK_CHECK_STATIC_BUILTIN): Likewise. >>> >>> * config/i386/gnu-user64.h: Likewise. >> >> OK for mainline SVN, but as Richard said, this is not appropriate for >> release branches. >> > > Ilya, I will check it for you. There is no need for release branches. > ChangeLog for config.gcc is missing. -- H.J.
Re: [Patch] tree-parloops.c (eliminate_local_variables): Add braces to suppress warnings
On Fri, Apr 13, 2012 at 04:33:17PM +0200, Bernd Schmidt wrote: > On 04/13/2012 04:20 PM, Mike Stump wrote: > > On Apr 13, 2012, at 5:30 AM, NightStrike wrote: > >>> no warning from trunk. Which GCC version emits this warning? > > > >> Looks like cygwin gcc 3.4.4 > > > > 3.4.4 is a little old now.. We'd encourage an upgrade to a fine new > > compiler... :-) > > The thing is, the else really is ambiguous, so it looks like we have a > warning regression. To the compiler? There is for in between, no need to put extra braces IMHO. I believe it has been intentionally fixed for 4.0, but my bisect tree only goes back to r9. Jakub
Re: [Patch] tree-parloops.c (eliminate_local_variables): Add braces to suppress warnings
On 04/13/2012 04:44 PM, Jakub Jelinek wrote: > On Fri, Apr 13, 2012 at 04:33:17PM +0200, Bernd Schmidt wrote: >> On 04/13/2012 04:20 PM, Mike Stump wrote: >>> On Apr 13, 2012, at 5:30 AM, NightStrike wrote: > no warning from trunk. Which GCC version emits this warning? >>> Looks like cygwin gcc 3.4.4 >>> >>> 3.4.4 is a little old now.. We'd encourage an upgrade to a fine new >>> compiler... :-) >> >> The thing is, the else really is ambiguous, so it looks like we have a >> warning regression. > > To the compiler? There is for in between, no need to put extra braces IMHO. > I believe it has been intentionally fixed for 4.0, but my bisect tree only > goes back to r9. Well, to the compiler even if () if () x else y has an unambiguous meaning. The problem is that a human might be confused, for example due to bad indentation. Whether there's a for in between doesn't matter for this purpose, the following is most likely a bug: if () for (..) if () x else y Bernd
Re: [Patch] tree-parloops.c (eliminate_local_variables): Add braces to suppress warnings
On Fri, Apr 13, 2012 at 10:20 AM, Mike Stump wrote: > On Apr 13, 2012, at 5:30 AM, NightStrike wrote: >>> no warning from trunk. Which GCC version emits this warning? > >> Looks like cygwin gcc 3.4.4 > > 3.4.4 is a little old now.. We'd encourage an upgrade to a fine new > compiler... :-) Yeah, not sure why cygwin doesn't make the gcc4 packages the default.
Re: [Patch] tree-parloops.c (eliminate_local_variables): Add braces to suppress warnings
On Fri, Apr 13, 2012 at 10:33 AM, Bernd Schmidt wrote: > On 04/13/2012 04:20 PM, Mike Stump wrote: >> On Apr 13, 2012, at 5:30 AM, NightStrike wrote: no warning from trunk. Which GCC version emits this warning? >> >>> Looks like cygwin gcc 3.4.4 >> >> 3.4.4 is a little old now.. We'd encourage an upgrade to a fine new >> compiler... :-) > > The thing is, the else really is ambiguous, so it looks like we have a > warning regression. I'd agree here. It's why I didn't think to try with a new compiler.
Re: fix left-over debug insns in DCE
On Apr 9, 2012, Eric Botcazou wrote: > Could you add a comment for each value? Done > Missing "extern" for all declarations. Thanks, added. > I don't understand the "or _WITH_VALUE otherwise" part of the comment. Sorry, my bad. It didn't make sense. Fixed. > Please add a comment explaining what this is doing. How's this? I've just installed the patch, but if you find the need for any further improvement, let me know and I'll do it right away. Thanks, for gcc/ChangeLog from Alexandre Oliva PR debug/48866 * df.h (enum debug_temp_where): New. (dead_debug_init, dead_debug_finish) Declare. (dead_debug_add, dead_debug_insert_temp): Declare. (struct dead_debug_use, struct dead_debug): Moved from... * df-problems.c: ... here. (df_set_unused_notes_for_mw): Bind debug uses of unused regno to a debug temp. (df_create_unused_note): Likewise. (df_set_dead_notes_for_mw): Move comment where it belongs. (dead_debug_init): Export. (dead_debug_reset_uses): New, factored out of... (dead_debug_finish): ...this. Export. (dead_debug_reset): Remove. (dead_debug_add): Export. (dead_debug_insert_before): Rename to... (dead_debug_insert_temp): ... this. Add where argument. Export. Locate stored value for BEFORE_WITH_VALUE. Avoid repeat inserts. Return insertion count. (df_note_bb_compute): Adjust. * dce.c (word_dce_process_block): Adjust dead debug uses. (dce_process_block): Likewise. Index: gcc/df.h === --- gcc/df.h.orig 2012-04-13 05:18:29.140781640 -0300 +++ gcc/df.h 2012-04-13 07:13:18.0 -0300 @@ -1101,4 +1101,46 @@ extern void union_defs (df_ref, struct w unsigned int *used, struct web_entry *, bool (*fun) (struct web_entry *, struct web_entry *)); +/* Debug uses of dead regs. */ + +/* Node of a linked list of uses of dead REGs in debug insns. */ +struct dead_debug_use +{ + df_ref use; + struct dead_debug_use *next; +}; + +/* Linked list of the above, with a bitmap of the REGs in the + list. */ +struct dead_debug +{ + struct dead_debug_use *head; + bitmap used; + bitmap to_rescan; +}; + +/* This type controls the behavior of dead_debug_insert_temp WRT + UREGNO and INSN. */ +enum debug_temp_where + { +/* Bind a newly-created debug temporary to a REG for UREGNO, and + insert the debug insn before INSN. REG is expected to die at + INSN. */ +DEBUG_TEMP_BEFORE_WITH_REG = -1, +/* Bind a newly-created debug temporary to the value INSN stores + in REG, and insert the debug insn before INSN. */ +DEBUG_TEMP_BEFORE_WITH_VALUE = 0, +/* Bind a newly-created debug temporary to a REG for UREGNO, and + insert the debug insn after INSN. REG is expected to be set at + INSN. */ +DEBUG_TEMP_AFTER_WITH_REG = 1 + }; + +extern void dead_debug_init (struct dead_debug *, bitmap); +extern void dead_debug_finish (struct dead_debug *, bitmap); +extern void dead_debug_add (struct dead_debug *, df_ref, unsigned int); +extern int dead_debug_insert_temp (struct dead_debug *, + unsigned int uregno, rtx insn, + enum debug_temp_where); + #endif /* GCC_DF_H */ Index: gcc/df-problems.c === --- gcc/df-problems.c.orig 2012-04-13 06:58:42.053258184 -0300 +++ gcc/df-problems.c 2012-04-13 07:29:33.0 -0300 @@ -2886,25 +2886,6 @@ df_whole_mw_reg_unused_p (struct df_mw_h } -/* Node of a linked list of uses of dead REGs in debug insns. */ -struct dead_debug_use -{ - df_ref use; - struct dead_debug_use *next; -}; - -/* Linked list of the above, with a bitmap of the REGs in the - list. */ -struct dead_debug -{ - struct dead_debug_use *head; - bitmap used; - bitmap to_rescan; -}; - -static void dead_debug_reset (struct dead_debug *, unsigned int); - - /* Set the REG_UNUSED notes for the multiword hardreg defs in INSN based on the bits in LIVE. Do not generate notes for registers in artificial uses. DO_NOT_GEN is updated so that REG_DEAD notes are @@ -2930,7 +2911,7 @@ df_set_unused_notes_for_mw (rtx insn, st { unsigned int regno = mws->start_regno; df_set_note (REG_UNUSED, insn, mws->mw_reg); - dead_debug_reset (debug, regno); + dead_debug_insert_temp (debug, regno, insn, DEBUG_TEMP_AFTER_WITH_REG); #ifdef REG_DEAD_DEBUGGING df_print_note ("adding 1: ", insn, REG_NOTES (insn)); @@ -2945,7 +2926,7 @@ df_set_unused_notes_for_mw (rtx insn, st && !bitmap_bit_p (artificial_uses, r)) { df_set_note (REG_UNUSED, insn, regno_reg_rtx[r]); - dead_debug_reset (debug, r); + dead_debug_insert_temp (debug, r, insn, DEBUG_TEMP_AFTER_WITH_REG); #ifdef REG_DEAD_DEBUGGING df_print_note ("adding 2: ", insn, REG_NOTES (insn)); #endif @@ -3013,12 +2994,12 @@ df_set_dead_notes_for_mw (rtx insn, stru if (df_whole_mw_reg_dead_p (mws, live, artificial_uses, do_not_gen)) { - /* Add a dead note for the entire m
Re: fix incorrect debug temp added by df-problems
On Apr 9, 2012, Jakub Jelinek wrote, in response to my posting to the wrong thread (now fixed): > On Mon, Apr 09, 2012 at 03:29:05AM -0300, Alexandre Oliva wrote: >> + && (!df_ignore_stack_reg (uregno))) > Please remove the extra () around this line, > && !df_ignore_stack_reg (uregno)) > Ok for trunk with that change, thanks. Thanks, here's what I've just installed. This fixes a bug in pr43165.c with -Os -g. for gcc/ChangeLog from Alexandre Oliva * df-problems.c (df_note_bb_compute): Do not take note of debug uses for whose REGs we won't emit DEAD or UNUSED notes. Index: gcc/df-problems.c === --- gcc/df-problems.c.orig 2012-04-13 05:18:28.604788011 -0300 +++ gcc/df-problems.c 2012-04-13 06:58:42.053258184 -0300 @@ -3453,7 +3453,12 @@ df_note_bb_compute (unsigned int bb_inde { if (debug_insn > 0) { - dead_debug_add (&debug, use, uregno); + /* We won't add REG_UNUSED or REG_DEAD notes for + these, so we don't have to mess with them in + debug insns either. */ + if (!bitmap_bit_p (artificial_uses, uregno) + && !df_ignore_stack_reg (uregno)) + dead_debug_add (&debug, use, uregno); continue; } break; -- Alexandre Oliva, freedom fighterhttp://FSFLA.org/~lxoliva/ You must be the change you wish to see in the world. -- Gandhi Be Free! -- http://FSFLA.org/ FSF Latin America board member Free Software Evangelist Red Hat Brazil Compiler Engineer
Re: [PATCH, i386] Move common definitions of gnu-user.h and gnu-user64.h to separate file
> On Fri, Apr 13, 2012 at 7:34 AM, H.J. Lu wrote: >> On Fri, Apr 13, 2012 at 7:02 AM, Uros Bizjak wrote: >>> On Fri, Apr 13, 2012 at 12:37 PM, Ilya Enkovich >>> wrote: >>> 2012-04-13 Enkovich Ilya * config/i386/gnu-user-common.h: New. * config/i386/gnu-user.h (CPP_SPEC): Moved to gnu-user-common.h. (CC1_SPEC): Likewise. (ENDFILE_SPEC): Likewise. (DEFAULT_PCC_STRUCT_RETURN): Likewise. (TARGET_TLS_DIRECT_SEG_REFS_DEFAULT): Likewise. (TARGET_OS_CPP_BUILTINS): Likewise. (LIBGCC2_HAS_TF_MODE): Likewise. (LIBGCC2_TF_CEXT): Likewise. (TF_SIZE): Likewise. (TARGET_ASM_FILE_END): Likewise. (STACK_CHECK_MOVING_SP): Likewise. (STACK_CHECK_STATIC_BUILTIN): Likewise. * config/i386/gnu-user64.h: Likewise. >>> >>> OK for mainline SVN, but as Richard said, this is not appropriate for >>> release branches. >>> >> >> Ilya, I will check it for you. There is no need for release branches. >> > > ChangeLog for config.gcc is missing. > > > -- > H.J. Hi H.J., Here is fixed ChangeLog Thanks, Ilya --- 2012-04-13 Enkovich Ilya * config/i386/gnu-user-common.h: New. * config/i386/gnu-user.h (CPP_SPEC): Moved to gnu-user-common.h. (CC1_SPEC): Likewise. (ENDFILE_SPEC): Likewise. (DEFAULT_PCC_STRUCT_RETURN): Likewise. (TARGET_TLS_DIRECT_SEG_REFS_DEFAULT): Likewise. (TARGET_OS_CPP_BUILTINS): Likewise. (LIBGCC2_HAS_TF_MODE): Likewise. (LIBGCC2_TF_CEXT): Likewise. (TF_SIZE): Likewise. (TARGET_ASM_FILE_END): Likewise. (STACK_CHECK_MOVING_SP): Likewise. (STACK_CHECK_STATIC_BUILTIN): Likewise. * config/i386/gnu-user64.h: Likewise. * config.gcc: Add i386/gnu-user-common.h before all i386/gnu-user.h and i386/gnu-user64.h usages.
Re: fix left-over debug insns in DCE
Il 13/04/2012 17:58, Alexandre Oliva ha scritto: > > I've just installed the patch, but if you find the need for any further > improvement, let me know and I'll do it right away. I wonder if it makes any sense to move the dead_debug_* stuff to its own file... Paolo
Re: [RFC] Should SRA stop producing COMPONENT_REF for non-bit-fields (again)?
On Fri, Apr 13, 2012 at 01:57:33PM +0200, Rainer Orth wrote: > Richard Guenther writes: > > >> Anyway, the patch I posted previously would risk re-introducing PR > >> 50386 and PR 50326, even though they are very unlikely with just > >> bit-fields. So my current working version is the following, but it > >> causes failure of libmudflap.c++/pass55-frag.cxx execution test so I'm > >> not actually proposing it yet (sigh). > > > > I would not worry about mudflap tests. The patch looks good to my > > eyes. > > Are you sure the failure is new? At least for 64-bit at -O, > libmudflap.c++/pass55-frag.cxx already fails right now (cf. PR > libmudflap/49843). > Well, my patch makes it fail on all other optimization levels too (the supposedly no-optimization variant is actually compiled with -O2). But thanks for pointing me towards the bug, I'll use it as an excuse to commit the patch nevertheless (mainly because I really cannot see anything wrong going on there, at least with respect to SRA and this change). Thanks, Martin
PATCH: Define _ILP32 and __ILP32__ for x32
Hi, This patch defines _ILP32 and __ILP32__ for x32 as specified by x32 psABI. OK for trunk and 4.7 branch? Thanks. H.J. --- 2012-04-13 H.J. Lu * config/i386/i386-c.c (ix86_target_macros): Define _ILP32 and __ILP32__ for x32. diff --git a/gcc/config/i386/i386-c.c b/gcc/config/i386/i386-c.c index 8adb3b4..49fd4d9 100644 --- a/gcc/config/i386/i386-c.c +++ b/gcc/config/i386/i386-c.c @@ -383,6 +383,11 @@ ix86_target_macros (void) cpp_define (parse_in, "__amd64__"); cpp_define (parse_in, "__x86_64"); cpp_define (parse_in, "__x86_64__"); + if (TARGET_X32) + { + cpp_define (parse_in, "_ILP32"); + cpp_define (parse_in, "__ILP32__"); + } } else {
Re: [patch, fortran-dev] Use fixed variable sizes for stride calculations
On Wed, Apr 11, 2012 at 10:51, Thomas Koenig wrote: > Hi, > > >> this patch uses division by known sizes (which can usually be replaced >> by a simple shift because intrinsics have sizes of power of two) instead >> of division by the size extracted from the array descriptor itself. >> >> This should save about 20 cycles for a single calculation. >> >> I'll go through the rest of the library to identify other possibilities >> for this. >> >> Regression-tested, no new failures. >> >> OK for the branch? > > > Full patch at http://gcc.gnu.org/ml/fortran/2012-03/msg00120.html > > Ping? > > Thomas I thought I already approved this a few weeks ago (http://gcc.gnu.org/ml/fortran/2012-03/msg00144.html)? Or if it has been decided that I'm no longer a GFortran reviewer, it would be nice if I'd be informed of this as well. -- Janne Blomqvist
Re: PATCH: Define _ILP32 and __ILP32__ for x32
On Fri, Apr 13, 2012 at 7:14 PM, H.J. Lu wrote: > This patch defines _ILP32 and __ILP32__ for x32 as specified by x32 psABI. > OK for trunk and 4.7 branch? > * config/i386/i386-c.c (ix86_target_macros): Define _ILP32 > and __ILP32__ for x32. OK. Thanks, Uros.
Re: [PATCH, PR 52939] Gracefully deal with fold_ctor_reference returning NULL during devirtualization
Hi, On Fri, Apr 13, 2012 at 04:13:13PM +0200, Richard Guenther wrote: > On Fri, 13 Apr 2012, Martin Jambor wrote: > > > Hi, > > > > currently we ICE when attempting to devirtualize a call to a virtual > > method introduced in a descendant but with a base which is an ancestor > > which does not have it. This is because fold_ctor_reference returns > > constant zero when it cannot find the particular value in the provided > > constructor which is something gimple_get_virt_method_for_binfo cannot > > cope with. The patch below avoids it by simply testing for that > > value and bailing out. > > > > Bootstrapped and tested on x86_64-linux, OK for trunk? > > Ok. I forgot to ask permission also to commit it to the 4.7 branch but I suppose that I can do that too and will go ahead with the same patch soon (after bootstrap and testing on the branch). Thanks, Martin > > Thanks, > Richard. > > > Thanks, > > > > Martin > > > > > > 2012-04-12 Martin Jambor > > > > PR middle-end/52939 > > * gimple-fold.c (gimple_get_virt_method_for_binfo): Bail out if > > fold_ctor_reference returns a zero constant. > > > > * testsuite/g++.dg/ipa/pr52939.C: New test. > > > > Index: src/gcc/gimple-fold.c > > === > > --- src.orig/gcc/gimple-fold.c > > +++ src/gcc/gimple-fold.c > > @@ -3087,7 +3087,7 @@ gimple_get_virt_method_for_binfo (HOST_W > >offset += token * size; > >fn = fold_ctor_reference (TREE_TYPE (TREE_TYPE (v)), DECL_INITIAL (v), > > offset, size); > > - if (!fn) > > + if (!fn || integer_zerop (fn)) > > return NULL_TREE; > >gcc_assert (TREE_CODE (fn) == ADDR_EXPR > > || TREE_CODE (fn) == FDESC_EXPR); > > Index: src/gcc/testsuite/g++.dg/ipa/pr52939.C > > === > > --- /dev/null > > +++ src/gcc/testsuite/g++.dg/ipa/pr52939.C > > @@ -0,0 +1,58 @@ > > +/* Verify that we do not ICE on invalid devirtualizations (which might > > + be OK at run-time because never executed). */ > > +/* { dg-do run } */ > > +/* { dg-options "-O3 -fno-early-inlining -fno-inline" } */ > > + > > +extern "C" void abort (void); > > + > > +class A > > +{ > > +public: > > + int data; > > + virtual int foo (int i); > > +}; > > + > > +class B : public A > > +{ > > +public: > > + virtual int foo (int i); > > + virtual int bar (int i); > > +}; > > + > > +int A::foo (int i) > > +{ > > + return i + 1; > > +} > > + > > +int B::foo (int i) > > +{ > > + return i + 2; > > +} > > + > > +int B::bar (int i) > > +{ > > + return i + 3; > > +} > > + > > +static int middleman (class A *obj, int i) > > +{ > > + class B *b = (class B *) obj; > > + > > + if (i != 1) > > +return b->bar (i); > > + else > > +return i; > > +} > > + > > +int __attribute__ ((noinline,noclone)) get_input(void) > > +{ > > + return 1; > > +} > > + > > +int main (int argc, char *argv[]) > > +{ > > + class A o; > > + if (middleman (&o, get_input ()) != 1) > > +abort (); > > + return 0; > > +} > > > > > > -- > Richard Guenther > SUSE / SUSE Labs > SUSE LINUX Products GmbH - Nuernberg - AG Nuernberg - HRB 16746 > GF: Jeff Hawn, Jennifer Guild, Felix Imendörffer
Re: Commit: RL78: Remove use of TODO_dump_func
> The optimization pass flag "TODO_dump_flag" has been removed (see > patch committed 2012-04-11) which was causing the RL78 backend to fail > to build. I am applying the following patch as an obvious fix. Ok, thanks!
C++ PATCH for c++/52915 (accepts-invalid anonymous union in C++11 mode)
C++11 extends unions so that a member can have a non-trivial default constructor, but the union then has a deleted constructor unless the user defines one. As a result, we can't assume that an anonymous union has a trivial default constructor anymore. Tested x86_64-pc-linux-gnu, applying to trunk. commit 761b558950409d024fd509228b1a3e04fcefb38a Author: Jason Merrill Date: Thu Apr 12 18:10:48 2012 -0400 PR c++/52915 * decl2.c (finish_anon_union): Use cp_finish_decl. * error.c (dump_function_name): Avoid showing anonymous "name". diff --git a/gcc/cp/decl2.c b/gcc/cp/decl2.c index b048ac7..212feea 100644 --- a/gcc/cp/decl2.c +++ b/gcc/cp/decl2.c @@ -1456,12 +1456,7 @@ finish_anon_union (tree anon_union_decl) } pushdecl (anon_union_decl); - if (building_stmt_list_p () - && at_function_scope_p ()) -add_decl_expr (anon_union_decl); - else if (!processing_template_decl) -rest_of_decl_compilation (anon_union_decl, - toplevel_bindings_p (), at_eof); + cp_finish_decl (anon_union_decl, NULL_TREE, false, NULL_TREE, 0); } /* Auxiliary functions to make type signatures for diff --git a/gcc/cp/error.c b/gcc/cp/error.c index ee8f0e0..77eb306 100644 --- a/gcc/cp/error.c +++ b/gcc/cp/error.c @@ -1556,6 +1556,8 @@ dump_function_name (tree t, int flags) { if (LAMBDA_TYPE_P (DECL_CONTEXT (t))) name = get_identifier (""); + else if (TYPE_ANONYMOUS_P (DECL_CONTEXT (t))) + name = get_identifier (""); else name = constructor_name (DECL_CONTEXT (t)); } diff --git a/gcc/testsuite/g++.dg/other/anon-union2.C b/gcc/testsuite/g++.dg/other/anon-union2.C new file mode 100644 index 000..31bb74f --- /dev/null +++ b/gcc/testsuite/g++.dg/other/anon-union2.C @@ -0,0 +1,10 @@ +// PR c++/52915 + +struct S { + int val; + S(int v) : val(v) {} +}; + +void f() { + union { S a; }; // { dg-error "constructor|no match" } +}
C++ PATCH for c++/52905 (ICE with invalid list-initialization)
When a list-initialization doesn't quite match either a list constructor or a non-list constructor, we end up trying to compare them in joust and get confused because they have different numbers of parameters. So let's just treat them as unordered; we're going to talk about what's wrong with both of them anyway. Tested x86_64-pc-linux-gnu, applying to trunk. commit fad09b389b18d6f375b07bb680abc7e4590ecae2 Author: Jason Merrill Date: Fri Apr 13 13:01:59 2012 -0400 PR c++/52905 * call.c (joust): Handle comparing list and non-list ctors. diff --git a/gcc/cp/call.c b/gcc/cp/call.c index 3c3dabb..46ac55c 100644 --- a/gcc/cp/call.c +++ b/gcc/cp/call.c @@ -8011,6 +8011,12 @@ joust (struct z_candidate *cand1, struct z_candidate *cand2, bool warn) int static_1 = DECL_STATIC_FUNCTION_P (cand1->fn); int static_2 = DECL_STATIC_FUNCTION_P (cand2->fn); + if (DECL_CONSTRUCTOR_P (cand1->fn) + && is_list_ctor (cand1->fn) != is_list_ctor (cand2->fn)) + /* We're comparing a near-match list constructor and a near-match + non-list constructor. Just treat them as unordered. */ + return 0; + gcc_assert (static_1 != static_2); if (static_1) diff --git a/gcc/testsuite/g++.dg/cpp0x/initlist-ctor1.C b/gcc/testsuite/g++.dg/cpp0x/initlist-ctor1.C new file mode 100644 index 000..82031cb --- /dev/null +++ b/gcc/testsuite/g++.dg/cpp0x/initlist-ctor1.C @@ -0,0 +1,13 @@ +// PR c++/52905 +// { dg-options -std=c++11 } + +#include + +enum E { e1, e2 }; +struct A +{ + A(std::initializer_list); // { dg-message "A::A" } + A(int, E); // { dg-message "A::A" } +}; + +A a{e1,2}; // { dg-error "" }
C++ PATCH for c++/52824 (pack expansion and fixed template parameters)
One case that I missed in my patch for PR 35722. Tested x86_64-pc-linux-gnu, applying to trunk and 4.7. commit 9fa7eea3608b19b53cc2f3c9a8195cf811b02d84 Author: Jason Merrill Date: Fri Apr 13 13:37:26 2012 -0400 PR c++/52824 * pt.c (any_pack_expanson_args_p): New. (coerce_template_parms): Use it. diff --git a/gcc/cp/pt.c b/gcc/cp/pt.c index ee38254..07a2cc0 100644 --- a/gcc/cp/pt.c +++ b/gcc/cp/pt.c @@ -6725,6 +6725,20 @@ coerce_template_parameter_pack (tree parms, return argument_pack; } +/* Returns true if the template argument vector ARGS contains + any pack expansions, false otherwise. */ + +static bool +any_pack_expanson_args_p (tree args) +{ + int i; + if (args) +for (i = 0; i < TREE_VEC_LENGTH (args); ++i) + if (PACK_EXPANSION_P (TREE_VEC_ELT (args, i))) + return true; + return false; +} + /* Convert all template arguments to their appropriate types, and return a vector containing the innermost resulting template arguments. If any error occurs, return error_mark_node. Error and @@ -6790,6 +6804,7 @@ coerce_template_parms (tree parms, if ((nargs > nparms && !variadic_p) || (nargs < nparms - variadic_p && require_all_args + && !any_pack_expanson_args_p (inner_args) && (!use_default_args || (TREE_VEC_ELT (parms, nargs) != error_mark_node && !TREE_PURPOSE (TREE_VEC_ELT (parms, nargs)) diff --git a/gcc/testsuite/g++.dg/cpp0x/alias-decl-15.C b/gcc/testsuite/g++.dg/cpp0x/alias-decl-15.C index 2bc9b11..b23e402 100644 --- a/gcc/testsuite/g++.dg/cpp0x/alias-decl-15.C +++ b/gcc/testsuite/g++.dg/cpp0x/alias-decl-15.C @@ -2,7 +2,7 @@ // { dg-options "-std=c++0x" } template //#1 -struct foo {}; // { dg-error "provided for|foo" } +struct foo {}; template struct P {}; @@ -10,8 +10,8 @@ struct P {}; template class... TT> struct bar { template -using mem = P...>;//#2 { dg-error "wrong number of|arguments" } +using mem = P...>;//#2 }; -bar::mem b;//#3 { dg-error "invalid type" } +bar::mem b;//#3 diff --git a/gcc/testsuite/g++.dg/cpp0x/variadic123.C b/gcc/testsuite/g++.dg/cpp0x/variadic123.C new file mode 100644 index 000..f0ab9fc --- /dev/null +++ b/gcc/testsuite/g++.dg/cpp0x/variadic123.C @@ -0,0 +1,14 @@ +// PR c++/52824 +// { dg-do compile { target c++11 } } + +template +struct foo +{}; + +template +struct bar : foo +{}; + +int main() { + bar f; +}
C++ PATCH for c++/52824 (pack expansion and fixed template parameters)
One case that I missed in my patch for PR 35722. Tested x86_64-pc-linux-gnu, applying to trunk and 4.7. commit 9fa7eea3608b19b53cc2f3c9a8195cf811b02d84 Author: Jason Merrill Date: Fri Apr 13 13:37:26 2012 -0400 PR c++/52824 * pt.c (any_pack_expanson_args_p): New. (coerce_template_parms): Use it. diff --git a/gcc/cp/pt.c b/gcc/cp/pt.c index ee38254..07a2cc0 100644 --- a/gcc/cp/pt.c +++ b/gcc/cp/pt.c @@ -6725,6 +6725,20 @@ coerce_template_parameter_pack (tree parms, return argument_pack; } +/* Returns true if the template argument vector ARGS contains + any pack expansions, false otherwise. */ + +static bool +any_pack_expanson_args_p (tree args) +{ + int i; + if (args) +for (i = 0; i < TREE_VEC_LENGTH (args); ++i) + if (PACK_EXPANSION_P (TREE_VEC_ELT (args, i))) + return true; + return false; +} + /* Convert all template arguments to their appropriate types, and return a vector containing the innermost resulting template arguments. If any error occurs, return error_mark_node. Error and @@ -6790,6 +6804,7 @@ coerce_template_parms (tree parms, if ((nargs > nparms && !variadic_p) || (nargs < nparms - variadic_p && require_all_args + && !any_pack_expanson_args_p (inner_args) && (!use_default_args || (TREE_VEC_ELT (parms, nargs) != error_mark_node && !TREE_PURPOSE (TREE_VEC_ELT (parms, nargs)) diff --git a/gcc/testsuite/g++.dg/cpp0x/alias-decl-15.C b/gcc/testsuite/g++.dg/cpp0x/alias-decl-15.C index 2bc9b11..b23e402 100644 --- a/gcc/testsuite/g++.dg/cpp0x/alias-decl-15.C +++ b/gcc/testsuite/g++.dg/cpp0x/alias-decl-15.C @@ -2,7 +2,7 @@ // { dg-options "-std=c++0x" } template //#1 -struct foo {}; // { dg-error "provided for|foo" } +struct foo {}; template struct P {}; @@ -10,8 +10,8 @@ struct P {}; template class... TT> struct bar { template -using mem = P...>;//#2 { dg-error "wrong number of|arguments" } +using mem = P...>;//#2 }; -bar::mem b;//#3 { dg-error "invalid type" } +bar::mem b;//#3 diff --git a/gcc/testsuite/g++.dg/cpp0x/variadic123.C b/gcc/testsuite/g++.dg/cpp0x/variadic123.C new file mode 100644 index 000..f0ab9fc --- /dev/null +++ b/gcc/testsuite/g++.dg/cpp0x/variadic123.C @@ -0,0 +1,14 @@ +// PR c++/52824 +// { dg-do compile { target c++11 } } + +template +struct foo +{}; + +template +struct bar : foo +{}; + +int main() { + bar f; +}
Re: RFA: consolidate DWARF strings into libiberty
Tom> Here is a new patch for gcc. Tom> I still haven't updated the src side, but there's little to do there Tom> that isn't already done in this patch. Tom> Ok? Tom> Ping. Ping. Tom
Re: PING Re: [PATCH] gfortran testsuite: implicitly cleanup-modules, part 2
On Apr 13, 2012, at 7:39 AM, Bernhard Reutner-Fischer wrote: > On Fri, Apr 13, 2012 at 06:57:44AM -0700, Mike Stump wrote: >> On Apr 13, 2012, at 3:51 AM, Bernhard Reutner-Fischer wrote: >>> Ping. >> >> Before advancing, has the problem that Rainer pointed out on March 19th with >> your earlier patch been fixed? > > I believe that it is fixed, yes. See r185688 and my follow up to him ( > http://gcc.gnu.org/ml/gcc-patches/2012-03/msg01498.html ) Hum, somehow I missed that... Thanks, I was hoping I just missed it.
Re: [Patch] tree-parloops.c (eliminate_local_variables): Add braces to suppress warnings
On Apr 13, 2012, at 7:50 AM, Bernd Schmidt wrote: > The problem is that a human might be > confused, for example due to bad indentation. Whether there's a for in > between doesn't matter for this purpose, the following is most likely a bug: > > if () > for (..) >if () > x > else > y I like the warning only when the indentation of the else does not match the indentation of the matching if. Most humans I think can follow the meaning correctly, if the code is indented correctly. The point of the warning would be to catch typos, and an else that isn't in the right column, is most likely wrong. Though, I'd argue that any else in the wrong column, is most likely wrong...
Re: [Patch, Fortran, F03] PR52909: Procedure pointers not private to modules
On Tue, Apr 10, 2012 at 11:21, Tobias Burnus wrote: > Regarding ABI breakage: [snip] In general I agree that ABI compatibility is something we should take seriously, but OTOH we should take care that the anointed ABI makes sense. Which IMHO would imply that known ABI bugs/misdesigns should be fixed, and e.g. the mod file format should be extensible. And even that may not be enough, considering there are rather big performance issues with mod loading at the moment, which may or may not require a complete redesign of the mod file format as well. So I worry that if we commit to supporting something more or less like the current mod format going forward, will that make solving these problems prohibitively tedious as one would need to at the same time support both the existing, possibly completely different module format, and the new one? Also, ideally we should have a document similar to what C++ has at http://sourcery.mentor.com/public/cxx-abi/ which describes the ABI. Also, while it would be nice to fix the ABI in one go, considering the rather limited manpower we have, maybe it's more feasible to spread it over several releases. Say if we update the array descriptor (and hence library .so version number) in one release, name mangling fixes and other "OOP ABI" issues in another, better .mod file format whenever it gets done, and so forth. And only when we're "done" we fix the ABI and implement the -fabi-version thingy (since inevitably ABI bugs will be found and fixed in the future as well, and also of course new language features may require updating the ABI). -- Janne Blomqvist
Re: [patch, fortran-dev] Use fixed variable sizes for stride calculations
Hi Janne, I thought I already approved this a few weeks ago sorry, I totally missed that. Comes from having a computer crash on you... Thanks for the review! Thomas
RE: [PR tree-optimization/52558]: RFC: questions on store data race
> -Original Message- > From: Aldy Hernandez [mailto:al...@redhat.com] > Sent: Thursday, April 12, 2012 3:12 PM > To: Richard Guenther > Cc: Andrew MacLeod; Boehm, Hans; gcc-patches; Torvald Riegel > Subject: [PR tree-optimization/52558]: RFC: questions on store data > race > > Here we have a testcase that affects both the C++ memory model and > transactional memory. > > [Hans, this is caused by the same problem that is causing the > speculative register promotion issue you and Torvald pointed me at]. > > In the following testcase (adapted from the PR), the loop invariant > motion pass caches a pre-existing value for g_2, and then performs a > store to g_2 on every path, causing a store data race: > > int g_1 = 1; > int g_2 = 0; > > int func_1(void) > { >int l; >for (l = 0; l < 1234; l++) >{ > if (g_1) >return l; > else >g_2 = 0; <-- Store to g_2 should only happen if !g_1 >} >return 999; > } > > This gets transformed into something like: > > g_2_lsm = g_2; > if (g_1) { > g_2 = g_2_lsm; // boo! hiss! > return 0; > } else { > g_2_lsm = 0; > g_2 = g_2_lsm; > } > > The spurious write to g_2 could cause a data race. Presumably the g_2_lsm = g_2 is actually outside the loop? Why does the second g_2 = g_2_lsm; get introduced? I would have expected it before the return. Was the example just over-abbreviated? Other than that, this sounds right to me. So does Richard's flag-based version, which is the approach I would have originally expected. But that clearly costs you a register. It would be interesting to see how they compare. Hans > > Andrew has suggested verifying that the store to g_2 was actually > different than on entry, and letting PHI copy propagation optimize the > redundant comparisons. Like this: > > g_2_lsm = g_2; > if (g_1) { > if (g_2_lsm != g_2) // hopefully optimized away > g_2 = g_2_lsm; // hopefully optimized away > return 0; > } else { > g_2_lsm = 0; > if (g_2_lsm != g_2) // hopefully optimized away > g_2 = g_2_lsm; > } > > ...which PHI copy propagation and/or friends should be able to > optimize. > > For that matter, regardless of the memory model, the proposed solution > would improve the existing code by removing the extra store (in this > contrived test case anyhow). > > Anyone see a problem with this approach (see attached patch)? > > Assuming the unlikely scenario that everyone likes this :), I have two > problems that I would like answered. But feel free to ignore if the > approach is a no go. > > 1. No pass can figure out the equality (or inequality) of g_2_lsm and > g_2. In the PR, Richard mentions that both FRE/PRE can figure this > out, but they are not run after store motion. DOM should also be able > to, but apparently does not :(. Tips? > > 2. The GIMPLE_CONDs I create in this patch are currently causing > problems with complex floats/doubles. do_jump is complaining that it > can't compare them, so I assume it is too late (in tree-ssa-loop-im.c) > to compare complex values since complex lowering has already happened > (??). Is there a more canonical way of creating a GIMPLE_COND that may > contain complex floats at this stage? > > Thanks folks.
RE: [PR tree-optimization/52558]: RFC: questions on store data race
> From: Richard Guenther [mailto:richard.guent...@gmail.com] > Can we _remove_ a store we percieve as redundant (with a single-threaded > view) with the memory model? Generally yes, so long as synchronization operations either conservatively treated as completely opaque, or are treated correctly in the "single-threaded view". If there is no synchronization between the original store, and the redundant one, then the redundant one changes things only if another thread writes to the same variable in-between. That constitutes a data race, which invokes undefined behavior. The general rule is that any sequentially correct transformation is OK between synchronization operations, so long as you don't store to anything you weren't supposed to modify, and the state at the next synchronization point is what would have been expected. You can sometimes treat synchronizations more aggressively, but that should be safe. Hans
Re: [Patch] tree-parloops.c (eliminate_local_variables): Add braces to suppress warnings
On Fri, 2012-04-13 at 16:50 +0200, Bernd Schmidt wrote: > On 04/13/2012 04:44 PM, Jakub Jelinek wrote: > > On Fri, Apr 13, 2012 at 04:33:17PM +0200, Bernd Schmidt wrote: > >> On 04/13/2012 04:20 PM, Mike Stump wrote: > >>> On Apr 13, 2012, at 5:30 AM, NightStrike wrote: > > no warning from trunk. Which GCC version emits this warning? > >>> > Looks like cygwin gcc 3.4.4 > >>> > >>> 3.4.4 is a little old now.. We'd encourage an upgrade to a fine new > >>> compiler... :-) > >> > >> The thing is, the else really is ambiguous, so it looks like we have a > >> warning regression. > > > > To the compiler? There is for in between, no need to put extra braces IMHO. > > I believe it has been intentionally fixed for 4.0, but my bisect tree only > > goes back to r9. > > Well, to the compiler even > if () > if () > x > else > y > > has an unambiguous meaning. The problem is that a human might be > confused, for example due to bad indentation. For me this warning is almost always a false positive since I am unable to turn it of selectively. I am programming in C++. My main use of if () ; else y is to get a temporary scope that doesn't have any braces, e.g. in this macro #define DEBUG if (!enable_debugging) ; else debug_object() where debug_object is an instance of std::ostream. I then use it like DEBUG << "fancy message goes here"; Now, writing if (condition) DEBUG << "ooops, condition is true"; is not that far-fetched and obviously triggers the warning but I have a hard time seeing how that is confusing to a casual reader of the code. (In order to avoid this warning I am currently writing the above code as #define DEBUG switch (!enable_debugging) case false: debug_object() but that makes the define harder to understand and so the net result of the warning is to reduce the clarity of the code) /MF
Re: [PATCH] Fix for PR52081 - Missed tail merging with pure calls
On 06/03/12 15:21, Richard Guenther wrote: > On Mon, Feb 13, 2012 at 1:36 PM, Tom de Vries wrote: >> On 13/02/12 12:54, Richard Guenther wrote: >>> On Thu, Feb 2, 2012 at 11:44 AM, Tom de Vries >>> wrote: Richard, this patch fixes PR52801. Consider test-case pr51879-12.c: ... __attribute__((pure)) int bar (int); __attribute__((pure)) int bar2 (int); void baz (int); int x, z; void foo (int y) { int a = 0; if (y == 6) { a += bar (7); a += bar2 (6); } else { a += bar2 (6); a += bar (7); } baz (a); } ... When compiling at -O2, pr51879-12.c.094t.pre looks like this: ... # BLOCK 3 freq:1991 # PRED: 2 [19.9%] (true,exec) # VUSE <.MEMD.1722_12(D)> # USE = nonlocal escaped D.1717_4 = barD.1703 (7); # VUSE <.MEMD.1722_12(D)> # USE = nonlocal escaped D.1718_6 = bar2D.1705 (6); aD.1713_7 = D.1717_4 + D.1718_6; goto ; # SUCC: 5 [100.0%] (fallthru,exec) # BLOCK 4 freq:8009 # PRED: 2 [80.1%] (false,exec) # VUSE <.MEMD.1722_12(D)> # USE = nonlocal escaped D.1720_8 = bar2D.1705 (6); # VUSE <.MEMD.1722_12(D)> # USE = nonlocal escaped D.1721_10 = barD.1703 (7); aD.1713_11 = D.1720_8 + D.1721_10; # SUCC: 5 [100.0%] (fallthru,exec) # BLOCK 5 freq:1 # PRED: 3 [100.0%] (fallthru,exec) 4 [100.0%] (fallthru,exec) # aD.1713_1 = PHI # .MEMD.1722_13 = VDEF <.MEMD.1722_12(D)> # USE = nonlocal # CLB = nonlocal bazD.1707 (aD.1713_1); # VUSE <.MEMD.1722_13> return; ... block 3 and 4 can be tail-merged. Value numbering numbers the two phi arguments a_7 and a_11 the same so the problem is not in value numbering: ... Setting value number of a_11 to a_7 (changed) ... There are 2 reasons that tail_merge_optimize doesn't optimize this: 1. The clause is_gimple_assign (stmt) && local_def (gimple_get_lhs (stmt)) && !gimple_has_side_effects (stmt) used in both same_succ_hash and gsi_advance_bw_nondebug_nonlocal evaluates to false for pure calls. This is fixed by replacing is_gimple_assign with gimple_has_lhs. 2. In same_succ_equal we check gimples from the 2 bbs side-by-side: ... gsi1 = gsi_start_nondebug_bb (bb1); gsi2 = gsi_start_nondebug_bb (bb2); while (!(gsi_end_p (gsi1) || gsi_end_p (gsi2))) { s1 = gsi_stmt (gsi1); s2 = gsi_stmt (gsi2); if (gimple_code (s1) != gimple_code (s2)) return 0; if (is_gimple_call (s1) && !gimple_call_same_target_p (s1, s2)) return 0; gsi_next_nondebug (&gsi1); gsi_next_nondebug (&gsi2); } ... and we'll be comparing 'bar (7)' and 'bar2 (6)', and gimple_call_same_target_p will return false. This is fixed by ignoring local defs in this comparison, by using gsi_advance_fw_nondebug_nonlocal on the iterators. bootstrapped and reg-tested on x86_64. ok for stage1? >>> >>> Sorry for responding so late ... >> >> no problem :) >> >>> I think these fixes hint at that we should >>> use "structural" equality as fallback if value-numbering doesn't equate >>> two stmt effects. Thus, treat two stmts with exactly the same operands >>> and flags as equal and using value-numbering to canonicalize operands >>> (when they are SSA names) for that comparison, or use VN entirely >>> if there are no side-effects on the stmt. >>> >>> Changing value-numbering of virtual operands, even if it looks correct in >>> the >>> simple cases you change, doesn't look like a general solution for the missed >>> tail merging opportunities. >>> >> >> Your comment is relevant for the other recent tail-merge related fixes I >> submitted, but I think not for this one. >> >> In this case, value-numbering manages to value number the 2 phi-alternatives >> equal. It's tail-merging that doesn't take advantage of this, by treating >> pure >> function calls the same as non-pure function calls. The fixes are therefore >> in >> tail-merging, not in value numbering. >> >> So, ok for stage1? > > I see. A few comments. > > +/* Returns whether VAL is used in the same bb as in which it is defined, or > + in the phi of a successor bb. */ > + > +static bool > +local_def (tree val) > +{ > + gimple stmt, def_stmt; > + basic_block bb, def_bb; > + imm_use_iterator iter; > + bool res; > + > + if (TREE_CODE (val) != SSA_NAME) > +return false; > > what about SSA_NAME_IS_DEFAULT_DEF names? They have a def-stmt > with a NULL basic-block. As you suggested below, I've inlined local_def into stmt_local_def, so val is now initialized with DEF_FROM_PTR (SINGLE_SSA_DEF_OPERAND (stmt, SSA_OP_DEF)