RE: A problem about loop store motion
> The MEM form is more "canonical", so the loop SM machinery to detect > equality should be adjusted accordingly. Alternatively you can teach > PRE insertion to strip off the MEM if possible (though > fold_stmt_inplace should > arelady do this if possible). Richard, Thank you! You are right. I noticed on latest trunk the problem in PRE was already fixed by invoking fold_stmt_inplace. Unfortunately for this small case, the latest trunk code still can't do SM for variable pos, because refs_may_alias_p(*D.4074_10, pos) is true, that is, pos has alias with l[pos]. I think alias analysis should be able to know they don't have alias with each other, unless there is an assignment statement like "l=&pos;". Can alias analysis fix the problem? Thanks, -Jiangning > > Richard. >
Re: Inefficient end-of-loop value computation - missed optimization somewhere?
On Mon, Feb 20, 2012 at 11:19 PM, Ulrich Weigand wrote: > Hello, > > we've noticed that the loop optimizer sometimes inserts weirdly > inefficient code to compute the value of an induction variable > at the end of the loop. > > As a test case stripped down to the core of the problem, consider: > > int test (int start, int end) > { > int i; > > for (i = start; i < end; i++) > ; > > return i; > } > > That function really ought to be equivalent to > > int test2 (int start, int end) > { > return start < end ? end : start; > } > > And indeed on i386-linux-gnu, we get mostly identical code > (except for the branch direction prediction). > > However, on arm-linux-gnuabi (using -mthumb and -march=armv7-a), > we see for the first function: > > test: > cmp r0, r1 > bge .L2 > adds r3, r0, #1 > mvns r0, r0 > adds r1, r1, r0 > adds r0, r3, r1 > .L2: > bx lr > > instead of simply (as for the second function): > > test2: > cmp r1, r0 > it ge > movge r0, r1 > bx lr > > > > Where does that weird addition-and-negation sequence come from? > In 100t.dceloop we still have: > > : > # i_9 = PHI > i_5 = i_9 + 1; > if (end_4(D) > i_5) > goto ; > else > goto ; > > : > goto ; > > : > # i_1 = PHI > > > But 102t.sccp has: > > : > # i_9 = PHI > i_5 = i_9 + 1; > if (end_4(D) > i_5) > goto ; > else > goto ; > > : > goto ; > > : > D.4123_10 = start_2(D) + 1; > D.4124_11 = ~start_2(D); > D.4125_12 = (unsigned int) D.4124_11; > D.4126_13 = (unsigned int) end_4(D); > D.4127_14 = D.4125_12 + D.4126_13; > D.4128_15 = (int) D.4127_14; > i_1 = D.4123_10 + D.4128_15; > > This is the sequence that ends up in the final assembler code. > > Note that this computes: > > i_1 = (start_2(D) + 1) > + (int)((unsigned int) ~start_2(D) + (unsigned int) end_4(D)) > > which is (since those casts are no-ops on the assembler level): > > i_1 = start_2(D) + 1 + ~start_2(D) + end_4(D) > > which is due to two's-complement arithmetic: > > i_1 = start_2(D) - start_2(D) + end_4(D) > > that is simply: > > i_1 = end_4(D) > > > Unfortunately, no tree-based optimization pass after 102t.sccp is able to > clean up that mess. This is true even on *i386*: the only reason we get > nice assembler there is due to *RTX* optimization, notably in combine. > This hits on i386 because of an intermediate stage that adds two registers > and the constant 1 (which matches the lea pattern). On arm, there is no > instruction that would handle that intermediate stage, and combine is not > powerful enough to perform the whole simplification in a single step. > > > Now that sequence of gimple operations is generated in scev_const_prop > in tree-scalar-evolution.c. First, the phi node corresponding to the > loop exit value is evaluated: > > def = analyze_scalar_evolution_in_loop (ex_loop, loop, def, NULL); > > which returns a chrec of the form > > { 1, +, (start + 1) } > > This is then further simplified by > > def = compute_overall_effect_of_inner_loop (ex_loop, def); > > which first computes the number of loop iterations: > > tree nb_iter = number_of_latch_executions (loop); > > which returns a tree representing > > (unsigned int) ~start + (unsigned int) end > > (which at this point seems to be the validly most efficient way to > compute end - start - 1). > > The chrec is then evaluated via chrec_apply which basically computes > > (start + 1) + 1 * (int)((unsigned int) ~start + (unsigned int) end) > > which ends up being converted to the long gimple sequence seen above. > > > Now I guess my questions are: > > - In the computation shown above, should that tree have been folded > into just "end" to start with? The tree is constructed at its The issue is that (start + 1) + 1 * (int) ... is computed using signed integer arithmetic which may invoke undefined behavior when it wraps. Thus we cannot re-associate it. I suppose if you'd arrange the code to do the arithmetic in unsigned and only cast the result back to the desired type we might fold it ... > outermost level in chrec_fold_plus, which simply calls fold_build2. > While simplify-rtx.c has code to detect sequences of operations > that cancel out while folding a PLUS or MINUS RTX, there does > not appear to be corresponding code in fold-const.c to handle > the equivalent optimization at the tree level. There are. fold-const.c has some re-association logic to catch this and we have tree-ssa-reassoc.c. All of those only handle same-sign operands though I think. > - If not, should there be another tree pass later on that ought to > clean this up? I notice there is code to handle plus/minus > sequences in tree-ssa-reassoc.c. But this doesn't seem to be > able to handle this particular case, possibly because the presence > of the casts (nop_exprs) ... > > - Anywhere else I'm overlooking right now? > > > I'd appreciate any tips / suggestions how to fix
Re: A problem about loop store motion
On Tue, Feb 21, 2012 at 9:54 AM, Jiangning Liu wrote: >> The MEM form is more "canonical", so the loop SM machinery to detect >> equality should be adjusted accordingly. Alternatively you can teach >> PRE insertion to strip off the MEM if possible (though >> fold_stmt_inplace should >> arelady do this if possible). > > Richard, > > Thank you! You are right. I noticed on latest trunk the problem in PRE was > already fixed by invoking fold_stmt_inplace. > > Unfortunately for this small case, the latest trunk code still can't do SM > for variable pos, because refs_may_alias_p(*D.4074_10, pos) is true, that > is, pos has alias with l[pos]. > > I think alias analysis should be able to know they don't have alias with > each other, unless there is an assignment statement like "l=&pos;". > > Can alias analysis fix the problem? The problem is that 'pos' is marked TREE_ADDRESSABLE, so we think its address got taken. 'l' points to any global possibly address-taken variable. It get's the TREE_ADDRESSABLE flag via PRE insertion which re-gimplifies the tree it creates which marks the variable as addressable. I'll have a look. Richard. > Thanks, > -Jiangning > >> >> Richard. >> > > >
RE: A problem about loop store motion
> -Original Message- > From: Richard Guenther [mailto:richard.guent...@gmail.com] > Sent: Tuesday, February 21, 2012 5:40 PM > To: Jiangning Liu > Cc: gcc@gcc.gnu.org > Subject: Re: A problem about loop store motion > > On Tue, Feb 21, 2012 at 9:54 AM, Jiangning Liu > wrote: > >> The MEM form is more "canonical", so the loop SM machinery to detect > >> equality should be adjusted accordingly. Alternatively you can > teach > >> PRE insertion to strip off the MEM if possible (though > >> fold_stmt_inplace should > >> arelady do this if possible). > > > > Richard, > > > > Thank you! You are right. I noticed on latest trunk the problem in > PRE was > > already fixed by invoking fold_stmt_inplace. > > > > Unfortunately for this small case, the latest trunk code still can't > do SM > > for variable pos, because refs_may_alias_p(*D.4074_10, pos) is true, > that > > is, pos has alias with l[pos]. > > > > I think alias analysis should be able to know they don't have alias > with > > each other, unless there is an assignment statement like "l=&pos;". > > > > Can alias analysis fix the problem? > > The problem is that 'pos' is marked TREE_ADDRESSABLE, so we think > its address got taken. 'l' points to any global possibly address-taken > variable. It get's the TREE_ADDRESSABLE flag via PRE insertion which > re-gimplifies the tree it creates which marks the variable as > addressable. > > I'll have a look. Terrific! :-) Could you please let me know when you have a patch to fix this, because I want to double check the big case, and there might be other hidden problems? Thanks, -Jiangning > > Richard. > > > > > Thanks, > > -Jiangning > > > >> > >> Richard. > >> > > > > > >
Re: A problem about loop store motion
On Tue, Feb 21, 2012 at 10:57 AM, Jiangning Liu wrote: > > >> -Original Message- >> From: Richard Guenther [mailto:richard.guent...@gmail.com] >> Sent: Tuesday, February 21, 2012 5:40 PM >> To: Jiangning Liu >> Cc: gcc@gcc.gnu.org >> Subject: Re: A problem about loop store motion >> >> On Tue, Feb 21, 2012 at 9:54 AM, Jiangning Liu >> wrote: >> >> The MEM form is more "canonical", so the loop SM machinery to detect >> >> equality should be adjusted accordingly. Alternatively you can >> teach >> >> PRE insertion to strip off the MEM if possible (though >> >> fold_stmt_inplace should >> >> arelady do this if possible). >> > >> > Richard, >> > >> > Thank you! You are right. I noticed on latest trunk the problem in >> PRE was >> > already fixed by invoking fold_stmt_inplace. >> > >> > Unfortunately for this small case, the latest trunk code still can't >> do SM >> > for variable pos, because refs_may_alias_p(*D.4074_10, pos) is true, >> that >> > is, pos has alias with l[pos]. >> > >> > I think alias analysis should be able to know they don't have alias >> with >> > each other, unless there is an assignment statement like "l=&pos;". >> > >> > Can alias analysis fix the problem? >> >> The problem is that 'pos' is marked TREE_ADDRESSABLE, so we think >> its address got taken. 'l' points to any global possibly address-taken >> variable. It get's the TREE_ADDRESSABLE flag via PRE insertion which >> re-gimplifies the tree it creates which marks the variable as >> addressable. >> >> I'll have a look. > > Terrific! :-) Could you please let me know when you have a patch to fix > this, because I want to double check the big case, and there might be other > hidden problems? PR52324, I am testing the following patch (in reality the gimplifier should not mark things addressable unless it itself makes them so, but frontends are broken, so we do that ... ugh). Index: gcc/gimplify.c === --- gcc/gimplify.c (revision 184428) +++ gcc/gimplify.c (working copy) @@ -7061,15 +7061,20 @@ gimplify_expr (tree *expr_p, gimple_seq ret = GS_OK; break; } - ret = gimplify_expr (&TREE_OPERAND (*expr_p, 0), pre_p, post_p, - is_gimple_mem_ref_addr, fb_rvalue); - if (ret == GS_ERROR) - break; + /* Avoid re-gimplifying the address operand if it is already +in suitable form. */ + if (!is_gimple_mem_ref_addr (TREE_OPERAND (*expr_p, 0))) + { + ret = gimplify_expr (&TREE_OPERAND (*expr_p, 0), pre_p, post_p, + is_gimple_mem_ref_addr, fb_rvalue); + if (ret == GS_ERROR) + break; + } recalculate_side_effects (*expr_p); ret = GS_ALL_DONE; break; - /* Constants need not be gimplified. */ + /* Constants need not be gimplified. */ case INTEGER_CST: case REAL_CST: case FIXED_CST:
How to suppress frame pointer usage by default?
Hi, What one must use in custom backend to suppress frame pointer usage by default? Frame pointer is mentioned in ELIMINABLE_REGS: #define ELIMINABLE_REGS \ { \ {ARG_POINTER_REGNUM, STACK_POINTER_REGNUM}, \ {ARG_POINTER_REGNUM, FRAME_POINTER_REGNUM}, \ {FRAME_POINTER_REGNUM, STACK_POINTER_REGNUM}, \ } TARGET_CAN_ELIMINATE is defined to be always true, TARGET_FRAME_POINTER_REQUIRED is defined to be always false Instead of this, gcc generates code with frame pointer usage, and I need manually specify -fomit-frame-pointer to suppress this behavior. What else may influence this? I feel, I missed something in the GCC internals. Thanks in advance for any suggestions. --- With best regards, Konstantin
Re: How to suppress frame pointer usage by default?
Konstantin Vladimirov writes: > What one must use in custom backend to suppress frame pointer usage by > default? > > Frame pointer is mentioned in ELIMINABLE_REGS: > > #define ELIMINABLE_REGS \ > { \ > {ARG_POINTER_REGNUM, STACK_POINTER_REGNUM}, \ > {ARG_POINTER_REGNUM, FRAME_POINTER_REGNUM}, \ > {FRAME_POINTER_REGNUM, STACK_POINTER_REGNUM}, \ > } > > TARGET_CAN_ELIMINATE is defined to be always true, > > TARGET_FRAME_POINTER_REQUIRED is defined to be always false > > Instead of this, gcc generates code with frame pointer usage, and I > need manually specify -fomit-frame-pointer to suppress this behavior. > What else may influence this? You may need to set flag_omit_frame_pointer in your TARGET_OPTION_OVERRIDE or TARGET_OPTION_OPTIMIZATION_TABLE or TARGET_OPTION_OVERRIDE (note that these have different names in earlier versions of gcc). Ian
GCC and x86intrin.h header
Hi, I read in the manual of GCC the following line: // new intrinsic header file, it should be included before using any IA-32/x86-64 intrinsics. What does it mean? I have to explicitly include this library in my code if I want to use the intrinsic functions like ssex or mcrc32 etc. Thank you
Re: GCC and x86intrin.h header
erotavlas_tu...@libero.it writes: > I read in the manual of GCC the following line: > // new intrinsic header file, it should be included before > using any IA-32/x86-64 intrinsics. > What does it mean? I have to explicitly include this library in my code if I > want to use the intrinsic functions like ssex or mcrc32 etc. This question would be more appropriate for the mailing list gcc-h...@gcc.gnu.org than for gcc@. The gcc@ mailing list is for discussions related to the development of gcc itself. Please take any followups to gcc-help. Thanks. To answer your question: the header file provides definitions for all intrinsic functions which are relevant to the target processor. The intrinsic functions are the ones with names like _mm_crc32_u8. You are not required to #include . But it is a more reliable mechanism than the older one of remembering that _mm_crc32_u8 is defined in . Ian
Re: GCC: OpenMP posix pthread
Nobody can answer to my question? Thank you Salvatore Frandina
Some confuse about the pass of the arguments by gcc
I do a experiments to check how gcc pass the arguments. here is the code #include int main(int argc , char *argv[]){ int a=3; int b=3; int c=3; printf("%d %d\n",++a+c,a+c); printf("%d %d\n",++b,b); return 0; } the anwer is 8 7 4 4 the piece of assembly language: gcc 4.6.2 movl$3, 28(%esp) movl$3, 24(%esp) movl$3, 20(%esp) movl20(%esp), %eax movl28(%esp), %edx leal(%edx,%eax), %ecx addl$1, 28(%esp) movl20(%esp), %eax movl28(%esp), %edx addl%eax, %edx movl$.LC0, %eax movl%ecx, 8(%esp) movl%edx, 4(%esp) movl%eax, (%esp) callprintf addl$1, 24(%esp) movl$.LC0, %eax movl24(%esp), %edx movl%edx, 8(%esp) movl24(%esp), %edx movl%edx, 4(%esp) movl%eax, (%esp) callprintf In the first case , gcc first compute the a+c to %ecx ,and pass it stack , the compute ++a+c to %edx ,so the answer is 8 7 In the second case , why it didn't do the same thing like compute b=3 and pass it to stack ,then compute ++b and pass it to stack .to the result 4 3. However it first addl$1, 24(%esp) ==> b+1 I think it compute the expression b+1. the pass it to stack . the b which now is 4 and was passed to stack. I was wondering why gcc handle the same mode in two ways. Is there some errors I had made ? In my opinion ,for the reason of the concept of consistent ,it may be 8 8 4 4 thank you advance
Re: Some confuse about the pass of the arguments by gcc
On 02/21/2012 03:18 PM, 嘉谟 wrote: > I do a experiments to check how gcc pass the arguments. > here is the code > > #include > int main(int argc , char *argv[]){ > int a=3; > int b=3; > int c=3; > printf("%d %d\n",++a+c,a+c); > printf("%d %d\n",++b,b); > return 0; > } > > the anwer is > > 8 7 > 4 4 > > the piece of assembly language: gcc 4.6.2 > > movl$3, 28(%esp) > movl$3, 24(%esp) > movl$3, 20(%esp) > movl20(%esp), %eax > movl28(%esp), %edx > leal(%edx,%eax), %ecx > addl$1, 28(%esp) > movl20(%esp), %eax > movl28(%esp), %edx > addl%eax, %edx > movl$.LC0, %eax > movl%ecx, 8(%esp) > movl%edx, 4(%esp) > movl%eax, (%esp) > callprintf > addl$1, 24(%esp) > movl$.LC0, %eax > movl24(%esp), %edx > movl%edx, 8(%esp) > movl24(%esp), %edx > movl%edx, 4(%esp) > movl%eax, (%esp) > callprintf > > In the first case , gcc first compute the a+c to %ecx ,and pass it > stack , the compute ++a+c to %edx ,so the answer is 8 7 > > In the second case , why it didn't do the same thing like > compute b=3 and pass it to stack ,then compute ++b and pass it to > stack .to the result 4 3. However it first > addl$1, 24(%esp) ==> b+1 I think it compute the expression > b+1. the pass it to stack . the b which now is 4 and was passed to > stack. > > I was wondering why gcc handle the same mode in two ways. > > Is there some errors I had made ? http://c-faq.com/expr/evalorder2.html Andrew.
Re: GCC: OpenMP posix pthread
On Tue, Feb 21, 2012 at 04:50:58PM +0100, erotavlas_tu...@libero.it wrote: > Nobody can answer to my question? You can configure gcc not to use futexes in libgomp (--disable-linux-futex), but the default is to use them for performance reasons. Jakub
Re: Inefficient end-of-loop value computation - missed optimization somewhere?
On Tue, Feb 21, 2012 at 1:27 AM, Richard Guenther wrote: > On Mon, Feb 20, 2012 at 11:19 PM, Ulrich Weigand wrote: >> Hello, >> >> we've noticed that the loop optimizer sometimes inserts weirdly >> inefficient code to compute the value of an induction variable >> at the end of the loop. >> >> As a test case stripped down to the core of the problem, consider: >> >> int test (int start, int end) >> { >> int i; >> >> for (i = start; i < end; i++) >> ; >> >> return i; >> } >> >> That function really ought to be equivalent to >> >> int test2 (int start, int end) >> { >> return start < end ? end : start; >> } >> >> And indeed on i386-linux-gnu, we get mostly identical code >> (except for the branch direction prediction). >> >> However, on arm-linux-gnuabi (using -mthumb and -march=armv7-a), >> we see for the first function: >> >> test: >> cmp r0, r1 >> bge .L2 >> adds r3, r0, #1 >> mvns r0, r0 >> adds r1, r1, r0 >> adds r0, r3, r1 >> .L2: >> bx lr >> >> instead of simply (as for the second function): >> >> test2: >> cmp r1, r0 >> it ge >> movge r0, r1 >> bx lr >> >> >> >> Where does that weird addition-and-negation sequence come from? >> In 100t.dceloop we still have: >> >> : >> # i_9 = PHI >> i_5 = i_9 + 1; >> if (end_4(D) > i_5) >> goto ; >> else >> goto ; >> >> : >> goto ; >> >> : >> # i_1 = PHI >> >> >> But 102t.sccp has: >> >> : >> # i_9 = PHI >> i_5 = i_9 + 1; >> if (end_4(D) > i_5) >> goto ; >> else >> goto ; >> >> : >> goto ; >> >> : >> D.4123_10 = start_2(D) + 1; >> D.4124_11 = ~start_2(D); >> D.4125_12 = (unsigned int) D.4124_11; >> D.4126_13 = (unsigned int) end_4(D); >> D.4127_14 = D.4125_12 + D.4126_13; >> D.4128_15 = (int) D.4127_14; >> i_1 = D.4123_10 + D.4128_15; >> >> This is the sequence that ends up in the final assembler code. >> >> Note that this computes: >> >> i_1 = (start_2(D) + 1) >> + (int)((unsigned int) ~start_2(D) + (unsigned int) end_4(D)) >> >> which is (since those casts are no-ops on the assembler level): >> >> i_1 = start_2(D) + 1 + ~start_2(D) + end_4(D) >> >> which is due to two's-complement arithmetic: >> >> i_1 = start_2(D) - start_2(D) + end_4(D) >> >> that is simply: >> >> i_1 = end_4(D) >> >> >> Unfortunately, no tree-based optimization pass after 102t.sccp is able to >> clean up that mess. This is true even on *i386*: the only reason we get >> nice assembler there is due to *RTX* optimization, notably in combine. >> This hits on i386 because of an intermediate stage that adds two registers >> and the constant 1 (which matches the lea pattern). On arm, there is no >> instruction that would handle that intermediate stage, and combine is not >> powerful enough to perform the whole simplification in a single step. >> >> >> Now that sequence of gimple operations is generated in scev_const_prop >> in tree-scalar-evolution.c. First, the phi node corresponding to the >> loop exit value is evaluated: >> >> def = analyze_scalar_evolution_in_loop (ex_loop, loop, def, NULL); >> >> which returns a chrec of the form >> >> { 1, +, (start + 1) } >> >> This is then further simplified by >> >> def = compute_overall_effect_of_inner_loop (ex_loop, def); >> >> which first computes the number of loop iterations: >> >> tree nb_iter = number_of_latch_executions (loop); >> >> which returns a tree representing >> >> (unsigned int) ~start + (unsigned int) end >> >> (which at this point seems to be the validly most efficient way to >> compute end - start - 1). >> >> The chrec is then evaluated via chrec_apply which basically computes >> >> (start + 1) + 1 * (int)((unsigned int) ~start + (unsigned int) end) >> >> which ends up being converted to the long gimple sequence seen above. >> >> >> Now I guess my questions are: >> >> - In the computation shown above, should that tree have been folded >> into just "end" to start with? The tree is constructed at its > > The issue is that (start + 1) + 1 * (int) ... is computed using signed > integer arithmetic which may invoke undefined behavior when it wraps. > Thus we cannot re-associate it. I suppose if you'd arrange the code > to do the arithmetic in unsigned and only cast the result back to the > desired type we might fold it ... > >> outermost level in chrec_fold_plus, which simply calls fold_build2. >> While simplify-rtx.c has code to detect sequences of operations >> that cancel out while folding a PLUS or MINUS RTX, there does >> not appear to be corresponding code in fold-const.c to handle >> the equivalent optimization at the tree level. > > There are. fold-const.c has some re-association logic to catch this > and we have tree-ssa-reassoc.c. All of those only handle same-sign > operands though I think. > >> - If not, should there be another tree pass later on that ought to >> clean this up? I notice there is code to handle plus/minus >> sequences in tree-ssa
gcc-4.4-20120221 is now available
Snapshot gcc-4.4-20120221 is now available on ftp://gcc.gnu.org/pub/gcc/snapshots/4.4-20120221/ and on various mirrors, see http://gcc.gnu.org/mirrors.html for details. This snapshot has been generated from the GCC 4.4 SVN branch with the following options: svn://gcc.gnu.org/svn/gcc/branches/gcc-4_4-branch revision 184449 You'll find: gcc-4.4-20120221.tar.bz2 Complete GCC MD5=df3598802125f5a24e70e0039d7b077b SHA1=efcdf9a007fca4524663704c417f91e02a8f4a58 Diffs from 4.4-20120214 are available in the diffs/ subdirectory. When a particular snapshot is ready for public consumption the LATEST-4.4 link is updated and a message is sent to the gcc list. Please do not use a snapshot before it has been announced that way.
RE: A problem about loop store motion
> -Original Message- > From: gcc-ow...@gcc.gnu.org [mailto:gcc-ow...@gcc.gnu.org] On Behalf Of > Richard Guenther > Sent: Tuesday, February 21, 2012 6:19 PM > To: Jiangning Liu > Cc: gcc@gcc.gnu.org > Subject: Re: A problem about loop store motion > > On Tue, Feb 21, 2012 at 10:57 AM, Jiangning Liu > wrote: > > > > > >> -Original Message- > >> From: Richard Guenther [mailto:richard.guent...@gmail.com] > >> Sent: Tuesday, February 21, 2012 5:40 PM > >> To: Jiangning Liu > >> Cc: gcc@gcc.gnu.org > >> Subject: Re: A problem about loop store motion > >> > >> On Tue, Feb 21, 2012 at 9:54 AM, Jiangning Liu > > >> wrote: > >> >> The MEM form is more "canonical", so the loop SM machinery to > detect > >> >> equality should be adjusted accordingly. Alternatively you can > >> teach > >> >> PRE insertion to strip off the MEM if possible (though > >> >> fold_stmt_inplace should > >> >> arelady do this if possible). > >> > > >> > Richard, > >> > > >> > Thank you! You are right. I noticed on latest trunk the problem in > >> PRE was > >> > already fixed by invoking fold_stmt_inplace. > >> > > >> > Unfortunately for this small case, the latest trunk code still > can't > >> do SM > >> > for variable pos, because refs_may_alias_p(*D.4074_10, pos) is > true, > >> that > >> > is, pos has alias with l[pos]. > >> > > >> > I think alias analysis should be able to know they don't have > alias > >> with > >> > each other, unless there is an assignment statement like "l=&pos;". > >> > > >> > Can alias analysis fix the problem? > >> > >> The problem is that 'pos' is marked TREE_ADDRESSABLE, so we think > >> its address got taken. 'l' points to any global possibly address- > taken > >> variable. It get's the TREE_ADDRESSABLE flag via PRE insertion > which > >> re-gimplifies the tree it creates which marks the variable as > >> addressable. > >> > >> I'll have a look. > > > > Terrific! :-) Could you please let me know when you have a patch to > fix > > this, because I want to double check the big case, and there might be > other > > hidden problems? > > PR52324, I am testing the following patch (in reality the gimplifier > should not > mark things addressable unless it itself makes them so, but frontends > are > broken, so we do that ... ugh). > Richard, Now trunk works for the big case as well. Thanks a lot for your quick fix. BTW, why can't we fix front-end directly? Thanks, -Jiangning > Index: gcc/gimplify.c > === > --- gcc/gimplify.c (revision 184428) > +++ gcc/gimplify.c (working copy) > @@ -7061,15 +7061,20 @@ gimplify_expr (tree *expr_p, gimple_seq > ret = GS_OK; > break; > } > - ret = gimplify_expr (&TREE_OPERAND (*expr_p, 0), pre_p, > post_p, > - is_gimple_mem_ref_addr, fb_rvalue); > - if (ret == GS_ERROR) > - break; > + /* Avoid re-gimplifying the address operand if it is already > +in suitable form. */ > + if (!is_gimple_mem_ref_addr (TREE_OPERAND (*expr_p, 0))) > + { > + ret = gimplify_expr (&TREE_OPERAND (*expr_p, 0), pre_p, > post_p, > + is_gimple_mem_ref_addr, fb_rvalue); > + if (ret == GS_ERROR) > + break; > + } > recalculate_side_effects (*expr_p); > ret = GS_ALL_DONE; > break; > > - /* Constants need not be gimplified. */ > + /* Constants need not be gimplified. */ > case INTEGER_CST: > case REAL_CST: > case FIXED_CST: