Re: Bogus gcc.c-torture/execute/20071018-1.c testcase?
On Sat, 31 Dec 2011, Mark Kettenis wrote:
> Execution of the test randomly fails for me on OpenBSD/amd64.  Looking
> at the code, it seems it is doing an out-of-bounds array access.  For
> reference I've copied the code of the testcase below.  As you can see
> there's a foo(0) call in main().  Therefore
>
>   struct foo **upper = &as->x[rank * 8 - 1];
>
> becomes
>
>   struct foo **upper = &as->x[-1];
>
> so upper points to an address before the malloc()'ed memory.  Then
> when the code does
>
>   *upper = 0;
>
> this generates a SIGSEGV, if the malloc()'ed memory happens to lie
> right at the start of a page.  I suppose that may never happen on some
> platforms (Linux?) since many malloc() implementations will use the
> start of a page for their own bookkeeping.
>
> I don't really understand what the testcase is testing.  Richard, can
> you perhaps shed some light on this?

The testcase was added when trying to fix PR32921; I do not remember
whether it was relevant that an out-of-bounds array access occurred
(but I suppose it was).  It was a miscompile, so a runtime testcase is
required.  I suppose changing the testcase to do &as->x[rank * 8 - 8]
and pass in rank == 1 would make it fail similarly on rev. 129484
(or better, make it &as->x[rank * 2 - 2]).

Richard.

> Thanks,
>
> Mark
>
> ---
>
> extern void abort(void);
>
> struct foo {
>   int rank;
>   char *name;
> };
>
> struct mem {
>   struct foo *x[4];
> };
>
> void __attribute__((noinline)) bar(struct foo **f)
> {
>   *f = __builtin_malloc(sizeof(struct foo));
> }
>
> struct foo * foo(int rank)
> {
>   void *x = __builtin_malloc(sizeof(struct mem));
>   struct mem *as = x;
>   struct foo **upper = &as->x[rank * 8 - 1];
>   *upper = 0;
>   bar(upper);
>   return *upper;
> }
>
> int main()
> {
>   if (foo(0) == 0)
>     abort ();
>   return 0;
> }

--
Richard Guenther
SUSE / SUSE Labs
SUSE LINUX Products GmbH - Nuernberg - AG Nuernberg - HRB 16746
GF: Jeff Hawn, Jennifer Guild, Felix Imendörffer
Re: Lingering tbaa in anti_dependence?
On Fri, Dec 30, 2011 at 6:53 PM, Jakub Jelinek wrote:
> On Thu, Dec 29, 2011 at 04:24:31PM +, Richard Sandiford wrote:
>> AIUI, the outcome of PR38964 was that we can't use TBAA for testing an
>> anti_dependence between a load X and store Y because Y might be defining
>> a new object in the same space as the object that was being read by X.
>> But it looks like we still use component-based disambiguation
>> (nonoverlapping_component_refs_p) in this case.  Is it true that
>> that's also a problem?  E.g. for:
>>
>>   struct s { int f; float g; };
>>   struct t { int header; struct s s; };
>>
>>   float foo (struct t *newt, struct s *olds, int x, int y)
>>   {
>>     float ret = olds[x * y].g;
>>     newt->header = 0;
>>     newt->s.f = 1;
>>     newt->s.g = 1.0;
>>     return ret;
>>   }
>>
>> we can (and on ARM Cortex A8, do) move the store to newt->s.f above
>> the load from olds[...].g.  If we view the assignment to newt
>> as defining a new object in the same space as the now-defunct olds,
>> and if x * y happens to be zero, then the accesses might well be
>> to the same address.
>
> You would need a placement new in between the read from olds[x * y].g
> and the newt->* stores; without that it is definitely valid to move it
> ahead of the store.
> IMHO we should keep placement new in the IL in some form, and decide how
> conservative we need to be based on whether the current function
> or current loop etc. contains (or could contain) a placement new or not.

I disagree.  Any representation of placement new would pessimize code
more than it is pessimized now.  You also will lose the advantage that
the current implementation of our TBAA semantics integrates nicely with
the type-punning-through-union stuff, which you'd otherwise have to make
explicit (thus, lower to the proposed placement-new style), too.  Also
consider placement new in a function (it would be pure/const); also
consider that placement new is simply an inline function returning its
argument.

> E.g.
> if a loop doesn't contain placement new anywhere, we could be less
> conservative in optimizing that loop, do more vectorization on it etc.

For loops we can just do more proper dependence analysis.

Richard.

> Jakub
RFC: Handle conditional expression in sccvn/fre/pre
Hi,

Since SCCVN operates on the SSA graph instead of the control flow graph
for the sake of efficiency, it does not handle or value-number the
conditional expression of a GIMPLE_COND statement.  As a result, FRE/PRE
does not simplify conditional expressions, as reported in bug 30997.

Since it would be complicated and difficult to process conditional
expressions in the current SCCVN algorithm, how about the following
method?

STEP 1: Before starting FRE/PRE, factor out the conditional expression,
changing code like:

  if (cond_expr)
    goto label_a
  else
    goto label_b

into:

  tmp = cond_expr
  if (tmp == 1)
    goto label_a
  else
    goto label_b

STEP 2: Let SCCVN/FRE/PRE do its job of value numbering cond_expr and
eliminating redundancy.

STEP 3: After FRE/PRE, for those "tmp = cond_expr" not used in any
redundancy elimination, forward the expression back into the
corresponding GIMPLE_COND statement, just as tree-ssa-forwprop.c does.

In this way, the conditional expression is handled like any other
expression and no redundant assignment is generated.  Most importantly,
this does not affect the current implementation of SCCVN/FRE/PRE.

The only problem is that the method cannot handle the reversed form of a
conditional expression.  For example:

  x = a > 2;
  if (a <= 2)
    goto label_a
  else
    goto label_b

could be optimized into:

  x = a > 2
  if (x == 0)
    goto label_a
  else
    goto label_b

I have worked out a draft patch to do this and would like to hear your
comments on it.  Thanks very much.

--
Best Regards.
Re: RFC: Handle conditional expression in sccvn/fre/pre
On Mon, Jan 2, 2012 at 12:37 PM, Amker.Cheng wrote:
> Hi,
> Since SCCVN operates on the SSA graph instead of the control flow graph
> for the sake of efficiency, it does not handle or value-number the
> conditional expression of a GIMPLE_COND statement.  As a result,
> FRE/PRE does not simplify conditional expressions, as reported in bug
> 30997.
> [...]
> I have worked out a draft patch to do this and would like to hear your
> comments on it.

I've previously worked on changing GIMPLE_COND to no longer embed the
comparison but carry a predicate SSA_NAME only (this is effectively what
you do as pre-processing before SCCVN).  It had some non-trivial fallout
(for example PRE gets quite confused and ends up separating conditionals
and jumps too far ...) so I didn't finish it.
A subset of all cases can be caught by simply looking up the N-ary at
eliminate () time and re-writing the GIMPLE_COND to use the predicate -
which might not actually be beneficial (but forwprop will hopefully undo
the non-beneficial cases).

In the end I'd rather go the way of changing the GIMPLE IL to not embed
the comparison in the GIMPLE_COND - that reduces the number of redundant
ways we can express the same thing.

Richard.

> Thanks very much.
> --
> Best Regards.
Re: A case exposing code sink issue
On Thu, Dec 29, 2011 at 9:50 AM, Jiangning Liu wrote:
>> -Original Message-
>> From: Jiangning Liu
>> Sent: Wednesday, December 28, 2011 5:38 PM
>> To: Jiangning Liu; 'Richard Guenther'
>> Cc: Michael Matz; gcc@gcc.gnu.org
>> Subject: RE: A case exposing code sink issue
>>
>> > -Original Message-
>> > From: gcc-ow...@gcc.gnu.org [mailto:gcc-ow...@gcc.gnu.org] On Behalf Of
>> > Jiangning Liu
>> > Sent: Tuesday, December 27, 2011 5:10 PM
>> > To: 'Richard Guenther'
>> > Cc: Michael Matz; gcc@gcc.gnu.org
>> > Subject: RE: A case exposing code sink issue
>> >
>> > > The job to do this is final value replacement, not sinking (we do not
>> > > sink non-invariant expressions - you'd have to translate them through
>> > > the loop-closed SSA exit PHI node, certainly doable, patches
>> > > welcome ;)).
>> >
>> > Richard,
>> >
>> > In final value replacement, the expression "&a + D." can be figured
>> > out, while "&a[i_xxx]" fails to be CHRECed, so I'm wondering if we
>> > should lower &a[i_xxx] to "&a + unitsize(a) * i_xxx" first?  It seems
>> > GCC intends to keep &a[i_xxx] until the cfgexpand pass.  Or do we have
>> > to directly modify the CHREC algorithm to get it calculated?
>> >
>> > Appreciate your kindly help in advance!
>>
>> Richard,
>>
>> Now I have a patch working for the case of step "i++", by directly
>> modifying the scalar evolution algorithm.  The following code would be
>> generated after SCCP:
>>
>>   # i_13 = PHI
>>   a_p.0_4 = &a[i_13];
>>   MEM[(int *)&a][i_13] = 100;
>>   i_6 = i_13 + 1;
>>   if (i_6 <= 999)
>>     goto ;
>>   else
>>     goto ;
>>
>>   :
>>   a_p_lsm.5_11 = &MEM[(void *)&a + 3996B];
>>   a_p = a_p_lsm.5_11;
>>   goto ;
>>
>> It looks good, but I still have a problem when the case has step "i+=k".
>>
>> For this case the value of variable i exiting the loop isn't invariant,
>> so the algorithm below in scalar evolution doesn't work on it:
>>
>> compute_overall_effect_of_inner_loop()
>> {
>> ...
>>   tree nb_iter = number_of_latch_executions (inner_loop);
>>
>>   if (nb_iter == chrec_dont_know)
>>     return chrec_dont_know;
>>   else
>>     {
>>       tree res;
>>
>>       /* evolution_fn is the evolution function in LOOP.  Get
>>          its value in the nb_iter-th iteration.  */
>>       res = chrec_apply (inner_loop->num, evolution_fn, nb_iter);
>>
>>       if (chrec_contains_symbols_defined_in_loop (res, loop->num))
>>         res = instantiate_parameters (loop, res);
>>
>>       /* Continue the computation until ending on a parent of LOOP.  */
>>       return compute_overall_effect_of_inner_loop (loop, res);
>>     }
>> }
>>
>> In theory, we can still have the transformation like below even if the
>> step is "i+=k":
>>
>>   # i_13 = PHI
>>   i_14 = i_13;
>>   a_p.0_4 = &a[i_13];
>>   MEM[(int *)&a][i_13] = 100;
>>   i_6 = i_13 + k_2(D);  // i+=k
>>   if (i_6 <= 999)
>>     goto ;
>>   else
>>     goto ;
>>
>>   :
>>   a_p_lsm.5_11 = &a[i_14];
>>   a_p = a_p_lsm.5_11;
>>   goto ;
>>
>> But I realize this is not a loop-closed SSA form at all, because i_14
>> is being used out of the loop.  Where could we extend the live range of
>> variable i in the GCC infrastructure and finally solve this problem?
>
> It seems many people are still enjoying the upcoming 2012 New Year. :-)
>
> Following my story, I find the following code in tree-ssa-copy.c:
>
>   /* Avoid copy propagation from an inner into an outer loop.
>      Otherwise, this may move loop variant variables outside of
>      their loops and prevent coalescing opportunities.  If the
>      value was loop invariant, it will be hoisted by LICM and
>      exposed for copy propagation.
>      ??? The value will be always loop invariant.
>      In loop-closed SSA form do not copy-propagate through
>      PHI nodes in blocks with a loop exit edge predecessor.
>      */
>   if (current_loops
>       && TREE_CODE (arg_value) == SSA_NAME
>       && (loop_depth_of_name (arg_value) > loop_depth_of_name (lhs)
>           || (loops_state_satisfies_p (LOOP_CLOSED_SSA)
>               && loop_exit_edge_p (e->src->loop_father, e))))
>     {
>       phi_val.value = lhs;
>       break;
>     }
>
> Here http://gcc.gnu.org/ml/gcc-patches/2006-12/msg00066.html, Dan said
>
> "The original check was not because of coalescing, but because we would
> copy prop in-loop variables outside the loop, causing *more*
> invariantness in nested loops."
>
> Can anybody give me a concrete example to explain this statement?

I can neither make sense of the code nor of Dan's comment.  But of
course the part about loop-closed SSA is still true.  I _suppose_ that
the situation is like

  tem = stuff;
  for ()
    x = tem;
  foo = x;

and Dan expects LIM to hoist x = tem (instead of "sinking" it by
copyprop).  Then we'd coalesce tem and x (but it would be still live
across
Re: RFC: Handle conditional expression in sccvn/fre/pre
Thanks Richard,

On Mon, Jan 2, 2012 at 8:33 PM, Richard Guenther wrote:
>
> I've previously worked on changing GIMPLE_COND to no longer embed
> the comparison but carry a predicate SSA_NAME only (this is effectively
> what you do as pre-processing before SCCVN).  It had some non-trivial
> fallout (for example PRE gets quite confused and ends up separating
> conditionals and jumps too far ...) so I didn't finish it.

Here, by changing GIMPLE_COND to no longer embed the comparison, do you
mean this only in the fre/pre passes or in general?  If only in fre/pre,
when and how are the changed GIMPLE_CONDs turned back into normal ones?
If in general, won't this affect passes working on GIMPLE_COND, like
tree-ssa-forwprop.c?

>
> A subset of all cases can be caught by simply looking up the
> N-ary at eliminate () time and re-writing the GIMPLE_COND to use
> the predicate - which might not actually be beneficial (but forwprop
> will hopefully undo the non-beneficial cases).
>
> In the end I'd rather go the way of changing the GIMPLE IL to not
> embed the comparison in the GIMPLE_COND - that reduces the number of
> redundant ways we can express the same thing.

Will you try to handle the reversed-comparison case mentioned in my
previous message?  I guess this needs work in both sccvn and fre/pre.
It would be great to hear your thoughts on this.

Thanks

--
Best Regards.
-fstack-protector, __stack_chk_fail_local and TARGET_LIBC_PROVIDES_SSP
Hi,

I ran into an issue with -fstack-protector on FreeBSD/i386.  GCC
generates calls to __stack_chk_fail_local that the linker complains are
undefined.  The following test case shows it:

  % cat test.c
  int main( void )
  {
          return( 0 );
  }
  % gcc46 -o test test.c -fstack-protector-all -fPIE
  /var/tmp//ccjYQxKu.o: In function `main':
  test.c:(.text+0x37): undefined reference to `__stack_chk_fail_local'
  /usr/local/bin/ld: test: hidden symbol `__stack_chk_fail_local' isn't defined
  /usr/local/bin/ld: final link failed: Bad value
  collect2: ld returned 1 exit status

I managed to fix the problem with the following one line change:

  --- gcc/gcc.c.orig
  +++ gcc/gcc.c
  @@ -601,7 +601,7 @@
   #ifndef LINK_SSP_SPEC
   #ifdef TARGET_LIBC_PROVIDES_SSP
  -#define LINK_SSP_SPEC "%{fstack-protector:}"
  +#define LINK_SSP_SPEC "%{fstack-protector|fstack-protector-all:-lssp_nonshared}"
   #else
   #define LINK_SSP_SPEC "%{fstack-protector|fstack-protector-all:-lssp_nonshared -lssp}"
   #endif

The configure script detects __stack_chk_fail in FreeBSD libc and
defines TARGET_LIBC_PROVIDES_SSP, but __stack_chk_fail_local should be
statically linked in, so (a dynamic) libc should not provide it.

My question is now whether the problem is with FreeBSD's SSP
implementation (where exactly does GCC expect __stack_chk_fail_local to
be defined?) or with GCC (should GCC just always link in ssp_nonshared
as in the patch above?).
Re: RFC: Handle conditional expression in sccvn/fre/pre
On Mon, Jan 2, 2012 at 2:11 PM, Amker.Cheng wrote:
> Thanks Richard,
>
> On Mon, Jan 2, 2012 at 8:33 PM, Richard Guenther wrote:
>>
>> I've previously worked on changing GIMPLE_COND to no longer embed
>> the comparison but carry a predicate SSA_NAME only (this is effectively
>> what you do as pre-processing before SCCVN).  It had some non-trivial
>> fallout (for example PRE gets quite confused and ends up separating
>> conditionals and jumps too far ...) so I didn't finish it.
>
> Here, by changing GIMPLE_COND to no longer embed the comparison, do you
> mean this only in the fre/pre passes or in general?  If only in
> fre/pre, when and how are the changed GIMPLE_CONDs turned back into
> normal ones?  If in general, won't this affect passes working on
> GIMPLE_COND, like tree-ssa-forwprop.c?

Everywhere.  I posted a patch early last year and changed COND_EXPRs on
the RHS of an assignment later.

>> A subset of all cases can be caught by simply looking up the
>> N-ary at eliminate () time and re-writing the GIMPLE_COND to use
>> the predicate - which might not actually be beneficial (but forwprop
>> will hopefully undo the non-beneficial cases).
>>
>> In the end I'd rather go the way of changing the GIMPLE IL to not
>> embed the comparison in the GIMPLE_COND - that reduces the number of
>> redundant ways we can express the same thing.
>
> Will you try to handle the reversed-comparison case mentioned in my
> previous message?  I guess this needs work in both sccvn and fre/pre.
> It would be great to hear your thoughts on this.

Well, with

Index: gcc/tree-ssa-pre.c
===================================================================
--- gcc/tree-ssa-pre.c  (revision 182784)
+++ gcc/tree-ssa-pre.c  (working copy)
@@ -4335,16 +4335,23 @@ eliminate (void)
              available value-numbers.  */
           else if (gimple_code (stmt) == GIMPLE_COND)
             {
-              tree op0 = gimple_cond_lhs (stmt);
-              tree op1 = gimple_cond_rhs (stmt);
+              tree op[2];
               tree result;
+              vn_nary_op_t nary;

-              if (TREE_CODE (op0) == SSA_NAME)
-                op0 = VN_INFO (op0)->valnum;
-              if (TREE_CODE (op1) == SSA_NAME)
-                op1 = VN_INFO (op1)->valnum;
+              op[0] = gimple_cond_lhs (stmt);
+              op[1] = gimple_cond_rhs (stmt);
+              if (TREE_CODE (op[0]) == SSA_NAME)
+                op[0] = VN_INFO (op[0])->valnum;
+              if (TREE_CODE (op[1]) == SSA_NAME)
+                op[1] = VN_INFO (op[1])->valnum;
               result = fold_binary (gimple_cond_code (stmt), boolean_type_node,
-                                    op0, op1);
+                                    op[0], op[1]);
+              if (!result)
+                result = vn_nary_op_lookup_pieces (2, gimple_cond_code (stmt),
+                                                   boolean_type_node,
+                                                   op, &nary);
+
               if (result && TREE_CODE (result) == INTEGER_CST)
                 {
                   if (integer_zerop (result))
@@ -4354,6 +4361,13 @@ eliminate (void)
                   update_stmt (stmt);
                   todo = TODO_cleanup_cfg;
                 }
+              else if (result && TREE_CODE (result) == SSA_NAME)
+                {
+                  gimple_cond_set_code (stmt, NE_EXPR);
+                  gimple_cond_set_lhs (stmt, result);
+                  gimple_cond_set_rhs (stmt, boolean_false_node);
+                  update_stmt (stmt);
+                }
             }
           /* Visit indirect calls and turn them into direct calls if
              possible.  */

you get the CSE (too-simple patch; you need to check leaders properly).
You can then add similar lookups for an inverted conditional.

Richard.

> Thanks
>
> --
> Best Regards.
Re: RFC: Handle conditional expression in sccvn/fre/pre
On Mon, Jan 2, 2012 at 9:37 PM, Richard Guenther wrote:
> Well, with
>
> Index: gcc/tree-ssa-pre.c
> ===================================================================
> --- gcc/tree-ssa-pre.c  (revision 182784)
> +++ gcc/tree-ssa-pre.c  (working copy)
> [...]
>
> you get the CSE (too-simple patch; you need to check leaders properly).
> You can then add similar lookups for an inverted conditional.

Thanks for your explanation.  One shortcoming of this method is that it
cannot find/take the cond_expr (and the implicitly defined variable) as
the leader in PRE.  I guess this is why you said it can handle a subset
of all cases in your previous message?

On the other hand, I like this method, given its simplicity especially. :)

--
Best Regards.
Re: fixed_scalar_and_varying_struct_p and varies_p
Thanks for both replies.

Richard Guenther writes:
> On Thu, Dec 29, 2011 at 8:48 PM, Eric Botcazou wrote:
>>> fixed_scalar_and_varying_struct_p passes an _address_ rather than a MEM.
>>> So in these cases fixed_scalar_and_varying_struct_p effectively becomes
>>> a no-op on targets that don't allow MEMs in addresses and takes on
>>> suspicious semantics for those that do.  In the former case, every
>>> address is treated as "unvarying" and f_s_a_v_s_p always returns null.
>>> In the latter case, things like REG addresses are (wrongly) treated as
>>> unvarying while a MEM address might correctly be treated as varying,
>>> leading to false positives.
>>>
>>> It looks like this goes back to when fixed_scalar_and_varying_struct_p
>>> was added in r24759 (1999).
>>
>> Does this mean that MEM_IN_STRUCT_P and MEM_SCALAR_P have also been
>> effectively disabled since then?

Some important callers (cse.c and sched-deps.c) do use the proper
varies_p routine, so it probably isn't quite that extreme.  But...

>>> AIUI, the true_dependence varies_p parameter exists for the benefit
>>> of CSE, so that it can use its local cse_rtx_varies_p function.
>>> All other callers should be using rtx_varies_p instead.  Question is,
>>> should I make that change, or is it time to get rid of
>>> fixed_scalar_and_varying_struct_p instead?
>>
>> I'd vote for the latter (and for eliminating MEM_IN_STRUCT_P and
>> MEM_SCALAR_P in the process, if the answer to the above question is
>> positive), there is no point in resurrecting this now IMO.
>
> I agree.  The tree level routines should be able to figure out most,
> if not all, cases on their own via rtx_refs_may_alias_p (similar to
> the nonoverlapping_component_refs case, which we could simply delete
> as well).

...that's 2 votes for and none so far against. :-)

I compiled the cc1 .ii files on x86_64-linux-gnu with and without
fixed_scalar_and_varying_struct_p.
There were 19 changes in total, all of them cases where sched2 was able
to reorder two memory accesses because of f_s_a_v_s_p.  I've attached
the diff below.

A good example is:

  if (bit_obstack)
    {
      element = bit_obstack->elements;

      if (element)
        /* Use up the inner list first before looking at the next
           element of the outer list.  */
        if (element->next)
          {
            bit_obstack->elements = element->next;
            bit_obstack->elements->prev = element->prev;
          }
        else
          /* Inner list was just a singleton.  */
          bit_obstack->elements = element->prev;
      else
        element = XOBNEW (&bit_obstack->obstack, bitmap_element);
    }
  else
    {
      element = bitmap_ggc_free;
      if (element)
        /* Use up the inner list first before looking at the next
           element of the outer list.  */
        if (element->next)
          {
            bitmap_ggc_free = element->next;
            bitmap_ggc_free->prev = element->prev;
          }
        else
          /* Inner list was just a singleton.  */
          bitmap_ggc_free = element->prev;
      else
        element = ggc_alloc_bitmap_element_def ();
    }

from bitmap.c, specifically:

  bitmap_ggc_free = element->next;
  bitmap_ggc_free->prev = element->prev;

Without f_s_a_v_s_p, sched2 couldn't tell that element->prev didn't
alias bitmap_ggc_free.  And this in turn is because cfgcleanup
considered trying to merge this block with:

  bit_obstack->elements = element->next;
  bit_obstack->elements->prev = element->prev;

It called merge_memattrs for each pair of instructions that it was
thinking of merging, and because the element->next and element->prev
MEMs were based on different SSA names, we lost the MEM_EXPR completely.

As it happens, we decided not to merge the blocks after all.  So an
obvious first observation is that query functions like
flow_find_cross_jump and flow_find_head_matching_sequence shouldn't
change the rtl.  We should only do that once we've decided which
instructions we're actually going to merge.

Of course, that's not a trivial change.  It's easy to make
try_head_merge_bb call merge_memattrs during merging, but less easy for
try_crossjump_to_edge and cond_exec_process_if_block.  (Note that the
latter, like try_head_merge_bb, can end up merging fewer instructions
than flow_find_* saw.)

But does the choice of SSA name actually count for anything this late
on?  Should we consider MEM_EXPRs node_X->prev and node_Y->prev to be
"similar enough", if node_X and node_Y have equal types?

I've attached a patch to remove fixed_scalar_and_varying_struct_p just
in case it's OK.  Tested on mips64-linux-gnu.

Also, as Eric says, this is really the only middle-end use of MEM_SCALAR
and MEM_IN_STRUCT_P.  It looks like the only other use is in
config/m32c/m32c.c:m32c_immd_dbl_mov.  TBH, I don't really understand
what that function is trying to test, so I can't tell whether it should
be using MEM_EXPR instead.  I've attached a patch
Re: RFC: Handle conditional expression in sccvn/fre/pre
On Mon, Jan 2, 2012 at 3:09 PM, Amker.Cheng wrote:
> On Mon, Jan 2, 2012 at 9:37 PM, Richard Guenther wrote:
>
>> Well, with
>>
>> Index: gcc/tree-ssa-pre.c
>> ===================================================================
>> --- gcc/tree-ssa-pre.c  (revision 182784)
>> +++ gcc/tree-ssa-pre.c  (working copy)
>> [...]
>>
>> you get the CSE (too-simple patch; you need to check leaders properly).
>> You can then add similar lookups for an inverted conditional.
>
> Thanks for your explanation.  One shortcoming of this method is that it
> cannot find/take the cond_expr (and the implicitly defined variable) as
> the leader in PRE.
> I guess this is why you said it can handle a subset of all cases in
> your previous message?

Yes.  It won't handle

  if (x > 1)
    ...
  tem = x > 1;

or

  if (x > 1)
    ...
  if (x > 1)

though maybe we could teach PRE to do the insertion by properly putting
x > 1 into EXP_GEN in compute_avail (but not into AVAIL_OUT).  Not sure
about this though.  Currently we don't do anything to GIMPLE_COND
operands (which seems odd anyway; we should at least add the operands to
EXP_GEN).

> on the other hand, I like this method, given its simplicity especially. :)

Yeah.

> --
> Best Regards.
Re: fixed_scalar_and_varying_struct_p and varies_p
On Mon, Jan 2, 2012 at 3:42 PM, Richard Sandiford wrote:
> Thanks for both replies.
> [...]
> It called merge_memattrs for each pair of instructions that it was
> thinking of merging, and because the element->next and element->prev
> MEMs were based on different SSA names, we lost the MEM_EXPR completely.
> [...]
> But does the choice of SSA name actually count for anything this
> late on?  Should we consider MEM_EXPRs node_X->prev and node_Y->prev
> to be "similar enough", if node_X and node_Y have equal types?

It's a conservative way of merging alias info.  A more precise variant
would union the points-to information of both bases (and stick it to a
new SSA name); conveniently, pt_solution_ior_into exists.  And no, we
can't just pick either
Re: LTO multiple definition failures
On 01/02/2012 12:22 AM, Andi Kleen wrote:
> Sandra Loosemore writes:
>> I'm still finding my way around LTO; can anyone who's more familiar
>> with this help narrow down where to look for the cause of this?  I
>> don't even know if this is a compiler or ld bug at this point.  I'm
>
> I would look into the interaction between the LTO plugin and your ld
> (and also try gold if you can)
>
> Generally there are still various issues in these areas which need
> workarounds in the LTOed programs; for some things (like ld -r and
> some ar) you also need the latest version of HJ Lu's binutils which
> implement http://gcc.gnu.org/ml/gcc/2010-12/msg00229.html
>
> A lot of older binutils lds also tended to mishandle mixed LTOed ar
> archives.

For avoidance of doubt, I was using mainline HEAD for both gcc and
binutils for my powerpc-none-eabi experiments last week.  The port for
our new target is based on GCC 4.6 and a binutils branch from a few
months ago, but it looked to me like the ld/gcc interaction was
basically the same -- the link error being triggered by a difference in
the startup code on the two targets, instead.

Anyway, the problem here isn't that I particularly care about coming up
with some workaround to make LTO work, but rather that tests from the
gcc testsuite are failing on this target because of what looks like
buggy LTO behavior instead of bugs in the target support, and I wanted
to be sure this was being tracked somewhere.  I didn't see a relevant
issue in either the gcc or binutils bugzillas, but if it's a known
consequence of the ld -r problem, I'll shut up and go away again. ;-)

-Sandra
Re: fixed_scalar_and_varying_struct_p and varies_p
> I'd say open a missed optimization bug with the testcase and go ahead
> with both patches.  Let's see if Eric has some comments first though.

None, but the m32c maintainer may have some.  DJ, do you happen to know
the rationale for the use of the MEM_SCALAR_P and MEM_IN_STRUCT_P flags
in m32c_immd_dbl_mov?  What condition do these tests try to model
exactly?

--
Eric Botcazou
Re: LTO multiple definition failures
> Anyway, the problem here isn't that I particularly care about coming up
> with some workaround to make LTO work, but rather that tests from the
> gcc testsuite are failing on this target because of what looks like
> buggy LTO behavior instead of bugs in the target support, and I wanted
> to be sure this was being tracked somewhere.  I didn't see a relevant
> issue in either the gcc or binutils bugzillas, but if it's a known
> consequence of the ld -r problem, I'll shut up and go away again. ;-)

AFAIK none of the test suite tests the ld -r problem, at least not on
x86-linux.  So it may be something else and still worth tracking down.

-Andi

--
a...@linux.intel.com -- Speaking for myself only.