About loop unrolling and optimize for size
Hi

I'm using an ARM Thumb cross compiler for embedded systems and always optimize for small size with -Os.

I have been experimenting with optimization flags and loop unrolling, though.

Normally loop unrolling is bad for size: code is duplicated and the size increases.

However, I discovered that in some special cases where the number of iterations is very small, e.g. a loop that runs 2-3 times, unrolling can make the code smaller, e.g. by freeing up the registers used for the loop index.

For example, when I use the flag "-fpeel-loops" together with -Os, I get smaller code in 99% of the cases for the ARM Thumb target.

So my question is how unrolling works with -Os: is it always totally disabled, or are there some cases where it could be tried, e.g. with a small number of iterations, so that the loop can be eliminated?

Could e.g. "-fpeel-loops" be enabled by default for -Os? Right now it is only enabled for -O2 and above, I think.

Thanks and Best Regards
Fredrik
RE: About loop unrolling and optimize for size
I think I found the explanation: -fpeel-loops triggers some extra flags.

From "toplev.c":

    /* web and rename-registers help when run after loop unrolling.  */
    if (flag_web == AUTODETECT_VALUE)
      flag_web = flag_unroll_loops || flag_peel_loops;

    if (flag_rename_registers == AUTODETECT_VALUE)
      flag_rename_registers = flag_unroll_loops || flag_peel_loops;

It is actually -frename-registers that causes the code size to decrease. This flag seems to be set when -fpeel-loops is enabled.

Maybe this flag could be enabled in -Os? It shouldn't have any downside, besides possibly making debugging harder.

Thanks/Fredrik

From: Richard Biener [richard.guent...@gmail.com]
Sent: Friday, August 14, 2015 09:28
To: sa...@hederstierna.com
Cc: gcc@gcc.gnu.org
Subject: Re: About loop unrolling and optimize for size

On Thu, Aug 13, 2015 at 6:26 PM, sa...@hederstierna.com wrote:
> Hi
> I'm using an ARM thumb cross compiler for embedded systems and always do
> optimize for small size with -Os.
>
> Though I've experimented with optimization flags, and loop unrolling.
>
> Normally loop unrolling is always bad for size, code is duplicated and size
> increases.
>
> Though I discovered that in some special cases where the number of iteration
> is very small, eg a loop of 2-3 times,
> in this case an unrolling could make code size smaller - eg. losen up
> registers used for index in loops etc.
>
> Example when I use the flag "-fpeel-loops" together with -Os I will 99% of
> the cases get smaller code size for ARM thumb target.
>
> Some my question is how unrolling works with -Os, is it always totally
> disabled,
> or are there some cases when it could be tested, eg. with small number
> iterations, so loop can be eliminated?
>
> Could eg. "-fpeel-loops" be enabled by default for -Os perhaps? Now its only
> enabled for -O2 and above I think.

Complete peeling is already enabled with -Os; it is just restricted to those cases where GCC's cost modeling of the unrolling operation determines that the code size shrinks.
If you enable -fpeel-loops then the cost model allows the code size to grow, something not (always) intended with -Os.

The solution is of course to improve the cost modeling and GCC's idea of follow-up optimization opportunities. I do have some incomplete patches to improve that and hope to get back to it for GCC 6.

If you have (small) testcases that show code size improvements with -Os -fpeel-loops over -Os, and you are confident they are caused by unrolling, please open a bugzilla containing them.

Thanks,
Richard.

> Thanks and Best Regards
> Fredrik
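To make the trade-off in this thread concrete, here is a minimal C sketch of a loop with a known trip count of three next to the straight-line form that complete peeling would produce. The function names `sum3_loop` and `sum3_peeled` are made up for illustration, and the second function is peeled by hand; whether GCC does the same, and whether the result is smaller, depends on the target and on the cost model discussed above.

```c
/* Sum a fixed three-element array with an explicit loop.
   The loop needs an index, a compare and a branch. */
int sum3_loop(const int v[3])
{
    int s = 0;
    for (int i = 0; i < 3; i++)
        s += v[i];
    return s;
}

/* The same computation after complete peeling, written by hand:
   no index, no compare, no branch. */
int sum3_peeled(const int v[3])
{
    return v[0] + v[1] + v[2];
}
```

Compiling both functions with -Os and with -Os -fpeel-loops and comparing the section sizes reported by binutils' `size` is one way to produce the kind of small testcase asked for above.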
Question about "instruction merge" pass when optimizing for size
When compiling ARM/Thumb with -Os for size, I've seen some cases where GCC generates unnecessary move instructions. It seems that there is sometimes a possibility to improve the use of 2-operand versus 3-operand instructions.

Some patterns I see are:

Generated code Case 1:

    mov Ry,Rx
    ...
    add Ry,Ry,Rz
    mov Rx,Ry

--> can be transformed to

    add Rx,Rz
    mov Ry,Rx

Generated code Case 2:

    mov Ry,Rx
    add Ry,Ry,Rx

--> can be transformed to

    add Ry,Rx,Rx

Generated code Case 3:

    mov Ry,Rx
    add Rz,Ry,Rx
    ...
    mov Rx,Ry

--> can be transformed to

    add Rz,Rx,Rx
    mov Ry,Rx

I'm sure there are a lot more similar patterns; I guess 'add' could be 'sub' or other instructions.

It seems like the optimizers sometimes prefer the additional move. Maybe for performance it is equal due to other instruction stalls etc., but when optimizing for size it is quite straightforward that you can gain bytes with these transformations, so where possible they should be preferred.

What I was thinking of is whether it would be possible to add a more generic GCC pass that could check for such "transformations", like a 'merge_multi_operator_insn' pass, that could do these transformations for any target, not only 2-op to 3-op transforms. Or maybe this is a peephole2 type of pass.

The pass could maybe be run only when optimizing for size, where the cost is obvious (bytes generated).

The pass could maybe be executed after reload, when all hard registers are set, but before the scheduling passes, like sched2. Proposed position: in between "pass_cprop_hardreg" and "pass_fast_rtl_dce".

I'm new to these topics, so maybe I'm all wrong, but please comment on my ideas if you have the time =)

Thanks and Kind Regards,
Fredrik
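As a sanity check on the three patterns, the sketch below models the registers as plain integers and runs each generated sequence next to its merged form. This is only a toy model, not GCC or ARM tooling: the `struct regs` type and all function names are invented for illustration, and it assumes the instructions hidden behind "..." do not touch Rx, Ry or Rz. Note that in Case 2 the merged instruction has to add Rx to itself, since Ry holds a copy of Rx only after the move.

```c
/* Toy register file holding just the three registers used in the patterns. */
struct regs { int rx, ry, rz; };

/* Case 1 as generated: mov Ry,Rx ; add Ry,Ry,Rz ; mov Rx,Ry */
struct regs case1_generated(struct regs r)
{
    r.ry = r.rx;
    r.ry = r.ry + r.rz;
    r.rx = r.ry;
    return r;
}

/* Case 1 merged: add Rx,Rz ; mov Ry,Rx */
struct regs case1_merged(struct regs r)
{
    r.rx = r.rx + r.rz;
    r.ry = r.rx;
    return r;
}

/* Case 2 as generated: mov Ry,Rx ; add Ry,Ry,Rx */
struct regs case2_generated(struct regs r)
{
    r.ry = r.rx;
    r.ry = r.ry + r.rx;
    return r;
}

/* Case 2 merged: add Ry,Rx,Rx */
struct regs case2_merged(struct regs r)
{
    r.ry = r.rx + r.rx;
    return r;
}

/* Case 3 as generated: mov Ry,Rx ; add Rz,Ry,Rx ; mov Rx,Ry */
struct regs case3_generated(struct regs r)
{
    r.ry = r.rx;
    r.rz = r.ry + r.rx;
    r.rx = r.ry;
    return r;
}

/* Case 3 merged: add Rz,Rx,Rx ; mov Ry,Rx */
struct regs case3_merged(struct regs r)
{
    r.rz = r.rx + r.rx;
    r.ry = r.rx;
    return r;
}

/* Returns 1 when both register files hold the same values. */
int same_regs(struct regs a, struct regs b)
{
    return a.rx == b.rx && a.ry == b.ry && a.rz == b.rz;
}
```

A real pass would of course also have to prove that the intermediate register values are dead at the points where they change, which this model does not attempt.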
Re: Question about "instruction merge" pass when optimizing for size
> From: Jeff Law
> More important is to determine *why* we're getting these patterns. In
> the IRA/LRA world, they should be a lot less common.

Yes, I agree, this phenomenon seems more common after introducing LRA. Though I was thinking that such a pass might still be relevant.

Thinking hypothetically of an architecture, let's call it cortex-X, assume this specific target has an opcode for ADD with 5 operands.

Optimal code for

    a = a + b + c + d

would be

    addx Ra,Ra,Rb,Rc,Rd

Where in the optimization process do we introduce the merging into this target-specific instruction? Can the more generic IRA/LRA handle this?

And maybe patterns can appear across different BBs, or somewhere that the normal optimizers have a hard time finding or figuring out?

Sorry if I'm ignorant, I don't know the internals of the different optimizers, but I'm trying to learn and understand how to move forward on this issue we currently have with code size. (I also tried to file some bugs on it: Bug 61578 and Bug 67213.)

Thanks and Kind Regards,
Fredrik
Possible typo in LRA
Hi

When reviewing some code from LRA, I just saw some lines that looked a bit strange. Could it be a typo?

The file "lra.c" from the GCC 5 master branch as of the current date, line 469:

    /* Try x = index_scale; x = x + disp; x = x + base.  */
    last = get_last_insn ();
    rtx_insn *move_insn = emit_move_insn (x, index_scale);
    ok_p = false;
    if (recog_memoized (move_insn) >= 0)
      {
        rtx insn = emit_add2_insn (x, disp);
        if (insn != NULL_RTX)
          {
            insn = emit_add2_insn (x, disp);
            if (insn != NULL_RTX)
              ok_p = true;
          }
      }

Shouldn't the code, as the comment suggests, use 'base' in the second call to emit_add2_insn?

    -    insn = emit_add2_insn (x, disp);
    +    insn = emit_add2_insn (x, base);

Maybe the code is right. I tried to mail vmakarov some months ago, but did not get any reply.

But it looks like a typo, so I just wanted to verify that it is not a bug.

Thanks and Kind Regards,
Fredrik
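The difference between the comment and the code can be illustrated with plain integers. The sketch below is not GCC code and does not use the RTL API: `addr_as_commented` and `addr_as_written` are hypothetical helpers that only mirror the three steps named in the comment and the three steps the quoted code appears to perform.

```c
/* Follows the comment: x = index_scale; x = x + disp; x = x + base. */
long addr_as_commented(long base, long index_scale, long disp)
{
    long x = index_scale;
    x += disp;
    x += base;
    return x;
}

/* Follows the quoted code: the second add repeats disp,
   so base never contributes to the result. */
long addr_as_written(long base, long index_scale, long disp)
{
    long x = index_scale;
    x += disp;
    x += disp;
    (void)base; /* unused on purpose, to mirror the suspected typo */
    return x;
}
```

With base = 1000, index_scale = 16 and disp = 4, the first function yields 1020 and the second 24, which is why the repeated `disp` looks suspicious.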
Could preprocessor warn for unsafe macros and side-effects?
Hi

Reading about macro pitfalls and e.g. duplication of side effects

https://gcc.gnu.org/onlinedocs/cpp/Macro-Pitfalls.html#Macro-Pitfalls

would it be possible to let the preprocessor generate warnings for any of these pitfalls?

Maybe not all language-specific parts are known at this early preprocessing stage, but possibly some info could be stored for use in a later pass?

I'm thinking of e.g. the following checks for "function-like macros" with arguments:

-Wmacro-side-effects
* IF a function-like macro expands/duplicates an argument more than once THEN
    WARN if a function() call is part of the argument
    WARN if unary ++ or -- is used on a variable as part of the argument
    WARN if the assignment operator = is part of the argument
    WARN if a volatile variable is part of the argument

-Wmacro-operator-precedence
* WARN if a macro argument contains an expression with operator(s), and a _higher_ precedence operator is used within the macro on this argument, without parentheses around it

I'm not sure it is even possible at the preprocessing stage, but it would be nice to have. I saw that some static code analysis tools like Coverity detect these:

https://www.securecoding.cert.org/confluence/display/c/PRE31-C.+Avoid+side+effects+in+arguments+to+unsafe+macros

Of course it might generate some false positives, so the warning might not be enabled by default, maybe just with -Wall or -Wextra. But perhaps it is hard to solve, and I'm not sure where and how to implement the checking algorithm.

Thanks for any feedback!

Kindly,
Fredrik
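For reference, the two pitfalls the proposed warnings target can be reproduced with a few lines of C. The macro and function names below (`MAX_UNSAFE`, `DOUBLE_BAD`, `DOUBLE_OK` and the small wrappers) are made up for this sketch; the warning options themselves are only proposals in this mail and do not exist in GCC as far as this thread shows.

```c
/* Expands each argument twice: once in the comparison, once in the result. */
#define MAX_UNSAFE(a, b) ((a) > (b) ? (a) : (b))

/* Missing parentheses: the argument is spliced next to a '*' operator. */
#define DOUBLE_BAD(x) x * 2

/* Fully parenthesized variant. */
#define DOUBLE_OK(x) ((x) * 2)

/* Side-effect pitfall: (*i)++ is evaluated twice when *i is greater than 3. */
int max_with_side_effect(int *i)
{
    return MAX_UNSAFE((*i)++, 3);
}

/* Precedence pitfall: expands to 1 + 2 * 2. */
int double_bad(void)
{
    return DOUBLE_BAD(1 + 2);
}

/* Expands to ((1 + 2) * 2). */
int double_ok(void)
{
    return DOUBLE_OK(1 + 2);
}
```

Starting from `*i == 5`, `max_with_side_effect` returns 6 and leaves `*i` at 7, because the increment runs once in the comparison and once more in the selected branch. `double_bad()` returns 5 while `double_ok()` returns 6.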
Question about static code analysis features in GCC
Hi

I would like to have some advice regarding static code analysis and GCC. I've just reviewed several tools like Klocwork, Coverity, CodeSonar and PolySpace. These tools offer a lot of features, and each tool seems to find different types of defects. The tool that found the most bugs in our code was Coverity, but it is also the most expensive tool.

But basically I would mostly like to find very "simple" basic errors like NULL dereferences and buffer overruns. I attach a small example file with some very obvious errors like NULL dereferences and buffer overruns.

This buggy file compiles fine without any warnings at all with GCC, as expected:

    gcc -o example example.c -W -Wall -Wextra

I tried to add checking with mudflap:

    gcc -fmudflap -o example example.c -W -Wall -Wextra -lmudflap

Then I found all defects at run time, but I had to run the program, so I could not find all potential errors at compile time. Valgrind could also be used to check for run-time bugs, but I'm not 100% sure I can cover all execution paths in my tests (I also tried gcov).

I tried to analyze my example file with Clang. It found the "uninitialized" issues and NULL pointers, but not the buffer overruns:

    clang --analyze example.c
    example.c:7:3: warning: Dereference of null pointer loaded from variable 'a'
    example.c:41:3: warning: Undefined or garbage value returned to caller

About NULL checks and buffer overruns: is there any possible path to get such checkers into standard GCC, maybe just at some very limited level? I've checked the "MyGCC" (http://mygcc.free.fr) patch on Graphite, but it has been rejected. Could it be rewritten somehow as a standard opt_pass to just find NULL derefs?

I've also checked TreeHydra in the Mozilla project (https://developer.mozilla.org/en/Treehydra), which gives a JavaScript interface to GIMPLE. Is the GCC 4.5.x plugin API something that is recommended for implementing such features? Is the performance OK when it is not a core opt-pass?
I'm willing to put some free time into this matter if it turns out that it is possible to add some tree SSA optimization pass that could find some limited set of errors. For example, if some value is constant NULL and dereferenced, or some element is accessed outside a constant-length buffer using a constant index.

What is your recommended path to move forward using GCC and basic static code analysis?

About uninitialized values I found some additional info; it seems to be hard to solve...

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=18501
http://lists.cs.uiuc.edu/pipermail/cfe-dev/2011-February/013170.html

Thanks and Best Regards
/Fredrik

--

#include <stdio.h>

// Example1: null pointer de-reference
int f1(void)
{
  int* a = NULL;
  *a = 1;
  return *a;
}

// Example2: buffer overrun global variable
char v2[1];
char* f2(void)
{
  v2[-1] = 1;
  v2[0] = 1;
  v2[1] = 1;
  return (char*)v2;
}

// Example3: buffer overrun local variable
int f3(void)
{
  char v3[1];
  v3[-1] = 1;
  v3[0] = 1;
  v3[1] = 1;
  return v3[-1] + v3[0] + v3[1] + v3[2];
}

// Example4: uninitialized memory access
int f4(void)
{
  char v4[1];
  return v4[0];
}

// Examples NULL dereference and buffer overruns
int main(void)
{
  int t1 = f1();
  printf("test1 %d\n", t1);
  void *t2 = f2();
  printf("test2 %08x\n", (unsigned int)t2);
  int t3 = f3();
  printf("test3 %d\n", t3);
  int t4 = f4();
  printf("test4 %d\n", t4);
  return 0;
}
RE: Question about static code analysis features in GCC
Hi

Thanks for your answer. I just discovered, though, that the array-bounds error could be caught by the "-Warray-bounds" warning. I guess this analysis is done in Value Range Propagation, "tree-vrp.c". The testcases I tried (plus my example code) did not warn, though. Is it a bug?

    testsuite/gcc.dg/Warray-bounds.c
    testsuite/gcc.dg/Warray-bounds-2.c
    testsuite/gcc.dg/Warray-bounds-3.c
    testsuite/gcc.dg/Warray-bounds-4.c   FAILED??
    testsuite/gcc.dg/Warray-bounds-5.c
    testsuite/gcc.dg/Warray-bounds-6.c
    testsuite/gcc.dg/Warray-bounds-7.c   FAILED??
    testsuite/gcc.dg/Warray-bounds-8.c

Couldn't NULL dereferences also be checked in tree-VRP to some extent?

And about adding an opt-pass, do you mean about here (in passes.c)?

       p = &all_regular_ipa_passes;
    +  NEXT_PASS (pass_ipa_static_analysis);
       NEXT_PASS (pass_ipa_whole_program_visibility);

Which passes do you think could have an additional mode without code generation: value numbering (tree-nrv? tree-ssa-sccvn, tree-ssa-pre?) or constant propagation (tree-cp)? Could these opt stages be called earlier in the pass pipeline?

Thanks and Best Regards
/Fredrik

From: Richard Guenther [richard.guent...@gmail.com]
Sent: Sunday, February 13, 2011 10:54
To: sa...@hederstierna.com
Cc: gcc@gcc.gnu.org
Subject: Re: Question about static code analysis features in GCC

On Sun, Feb 13, 2011 at 2:34 AM, sa...@hederstierna.com wrote:
> Hi
>
> I would like to have some advice regarding static code analysis and GCC.
> I've just reviewed several tools like Klocwork, Coverity, CodeSonar and
> PolySpace.
> These tools offer alot of features and all tools seems to find different
> types of defects.
> The tool that found most bugs on our code was Coverity, but it is also the
> most expensive tool.
>
> But basically I would most like just to find very "simple" basic errors like
> NULL-dereferences and buffer overruns.
> I attach a small example file with some very obvious errors like
> NULL-dereferences and buffer overruns.
>
> This buggy file compiles fine though without any warnings at all with GCC as
> expected
>
>    gcc -o example example.c -W -Wall -Wextra
>
> I tried to add checking with mudflap:
>
>    gcc -fmudflap -o example example.c -W -Wall -Wextra -lmudflap
>
> Then I found all defects in run-time, but I had to run the program so I could
> not find all potential errors in compile-time.
> Also Valgrind could be used to check run-time bugs, but I'm not 100% sure I
> can cover all execution paths in my tests (I also tried gcov).
>
> I tried to analyze my example file with CLANG, then I found "uninitialized"
> issues and NULL-pointers, but not buffer overruns:
>
>    clang --analyze example.c
>    example.c:7:3: warning: Dereference of null pointer loaded from variable
> 'a'
>    example.c:41:3: warning: Undefined or garbage value returned to caller
>
> About NULL-checks and buffer-overruns, is there any possible path to get such
> checkers into a standard GCC, maybe in just some very limited level?
> I've checked the "MyGCC" (http://mygcc.free.fr) patch on Graphite, but it has
> been rejected, could it be rewritten somehow as a standard opt_pass to just
> find NULL-derefs?
>
> I've also checked TreeHydra in Mozilla project
> (https://developer.mozilla.org/en/Treehydra) that gives JavaScript interface
> to GIMPEL.
> Is the GCC4.5.x plugin API something that is recommended to use to implement
> such features, is performance okey to not have it as a core opt-pass?
>
> I'm willing to put some free time into this matter if it turns out its
> possible to add some tree SSA optimization pass that could find some limited
> set of errors.
> Example given if some value is constant NULL and dereferenced, or some
> element if accessed outside a constant length buffer using a constant index.
> What is your recommended path to move forward using GCC and basic static code
> analysis?

It should be possible to fit static code analysis into GCC.
The most prominent issue is that GCC is a compiler mainly looking at optimization quality, and optimization can defeat static code analysis in some cases (such as aggressively using undefined behavior to do dead code elimination). On the other hand, optimization makes static analysis easier in some cases, and even more useful if issues in "really" dead code are removed.

As a way to start I would suggest restricting static analysis to -O0 (no optimization). A suitable place to do such analysis is the first entry in the IPA pass pipeline (then you have the whole program in SSA, a callgraph built and unused functions removed; you also have always_inline functions inlined).

Something that can be done quite easily is have a mode for t
RE: Question about static code analysis features in GCC
Hi Richard,

I've implemented a simple nop-pass as you described and am now investigating a path forward for static code analysis. I'm trying to modify e.g. the cp pass to be able to call these workers from my analysis pass.

I found some other work, though, done by Alexander Ivanov Sotirov, called "Vulncheck". A patch is available at "http://gcc.vulncheck.org/". It seems to contain some work that might be useful to continue on? Why was this patch not applied to GCC trunk? A question from Sotirov about additional features was left unanswered, or handled off-list?

http://gcc.gnu.org/ml/gcc/2007-09/msg00549.html

I guess the constant propagation etc. is done by other workers/passes in GCC today, so it is better to use the available workers. But having started to read his paper, it seems to me that some parts could be usable?

Also, Sotirov has an "ssa-tree" approach to analysis, in contrast to Volanschi (http://mygcc.free.fr), who uses a pretty-printer and pattern-matching approach. (Which, as I understand it, stopped that patch from being applied to official GCC.)

Or is it even better just to do it as a plugin pass using MELT or something similar?

Thanks and Best Regards
/Fredrik

From: Richard Guenther [richard.guent...@gmail.com]
Sent: Wednesday, February 16, 2011 11:17
To: sa...@hederstierna.com
Cc: gcc@gcc.gnu.org
Subject: Re: Question about static code analysis features in GCC

On Wed, Feb 16, 2011 at 8:54 AM, sa...@hederstierna.com wrote:
> Hi
>
> Thanks for you answer, I just discovered though that the array-bounds-error
> could be catched by "-Warray-bounds" warning.
> I guess this analysis is done in Range Value Propagation "tree-vrp.c"
> The testcases I tried (+mine example code) did not warn though, is it a bug?

The array-bounds warning only works when VRP is enabled, which by default is only the case at -O2; in simple testcases the accesses are usually optimized away.
> testsuite/gcc.dg/Warray-bounds.c
> testsuite/gcc.dg/Warray-bounds-2.c
> testsuite/gcc.dg/Warray-bounds-3.c
> testsuite/gcc.dg/Warray-bounds-4.c   FAILED??
> testsuite/gcc.dg/Warray-bounds-5.c
> testsuite/gcc.dg/Warray-bounds-6.c
> testsuite/gcc.dg/Warray-bounds-7.c   FAILED??
> testsuite/gcc.dg/Warray-bounds-8.c
>
> Couldn't NULL dereferences also be checked in tree-VRP to some extent?

Yes, but VRP assumes that once you dereference a pointer it will not be NULL; thus its optimistic analysis does defeat the intent to warn for NULL accesses ;)

> And about adding a opt-pass, do you mean about here (in passes.c)
>
>   p = &all_regular_ipa_passes;
> +NEXT_PASS (pass_ipa_static_analysis);
>   NEXT_PASS (pass_ipa_whole_program_visibility);

No, I was thinking about

Index: passes.c
===================================================================
--- passes.c    (revision 170176)
+++ passes.c    (working copy)
@@ -796,6 +796,7 @@ init_optimization_passes (void)
   *p = NULL;

   p = &all_regular_ipa_passes;
+  NEXT_PASS (pass_ipa_static_analysis);
   NEXT_PASS (pass_ipa_whole_program_visibility);
   NEXT_PASS (pass_ipa_profile);
   NEXT_PASS (pass_ipa_cp);

At the point you show we are not yet in SSA form. The above will only reliably work at -O0, as otherwise early optimizations will have taken place.

> What passes do you think have an additional mode for non-code generation,
> value-numbering (tree-nrv? tree-ssa-sccvn, tree-ssa-pre?) or
> constant-propagation (tree-cp)?

There are none at the moment, but at least the SSA propagators (tree-ssa-ccp.c, tree-ssa-copy.c) and the value-numberer (tree-ssa-sccvn.c/tree-ssa-pre.c) would be easy to modify.

> Could this opt-stages be called earlier in the passes pipeline?

I would rather arrange for the workers to be callable from the static analysis pass directly, instead of trying to make them "passes without code-gen".

Richard.
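The remark about VRP assuming a dereferenced pointer is non-NULL can be illustrated with a small C sketch. The function names are invented for this example, and whether a given GCC version actually drops the late check depends on the optimization level and on options such as -fdelete-null-pointer-checks; the sketch only shows the shape of code where that assumption matters.

```c
#include <stddef.h>

/* Dereferences first and checks afterwards.  Because *p is undefined
   for a NULL pointer, an optimizer may assume p != NULL after the load
   and treat the later test as always false. */
int read_then_check(const int *p)
{
    int v = *p;
    if (p == NULL)
        return -1;
    return v;
}

/* Checks before dereferencing, so the test cannot be assumed away. */
int check_then_read(const int *p)
{
    if (p == NULL)
        return -1;
    return *p;
}
```

A checker that runs after such an optimization no longer sees the suspicious `p == NULL` test in `read_then_check`, which is one reason for the suggestion to start with analysis at -O0.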
Static code analysis follow ups
Hi!

I'm currently looking into possibilities to improve GCC's static code analysis features.

Some weeks ago I proposed re-introducing -Wunreachable-code for finding dead code:

http://gcc.gnu.org/ml/gcc-patches/2011-12/msg00385.html

(The warning was removed in http://gcc.gnu.org/ml/gcc-patches/2009-11/msg00251.html)

I have not got any reply yet, though. The patch might be wrong, but possibly the remove_bb() call could take some kind of 'reason' parameter to avoid false positives?

Last year I also sent out ideas about static code analysis in:

http://gcc.gnu.org/ml/gcc/2011-02/msg00227.html

and got a positive response.

When I tried to look into e.g. null-dereference checking, I found that some work has already been done on this; the 'bug' is at

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=16351

I found a patch that adds an extra 'static code analysis' pass to check null dereferencing at:

http://gcc.gnu.org/ml/gcc-patches/2004-07/msg00423.html

It was concluded, though, that this should be done in the fold_stmt() function. Is this still valid? And what is the difference between adding an additional static-code-analysis pass and this null-checking pass? (Even though all optimization workers have been run before checking this.)

I would really like to see some more static code analysis features added to GCC during 2012, like null-dereference checking and dead-code checkers, possibly even better checkers for array overrun/underrun.

Any comments or ideas are most welcome!

Thanks and Best Regards,
Fredrik
Warn if making external references to local stack memory?
Hi

GCC does warn when a function returns a pointer to a local variable (stack memory). But there are a lot more cases where GCC could possibly warn, e.g. when references are made to local variables or stack memory.

See the attached example code. GCC warns for the first case, but not for the others. I think all cases can be considered program bugs and could trigger a compiler warning.

I've found out that the present warning is issued in "c-typeck.c". Is this the right place to put additional warnings of this kind too?

Thanks & Best Regards
Fredrik Hederstierna

The example code file, compiled with "-O2 -W -Wall -Wextra":

---

#include <stdio.h>
#include <stdlib.h>

int * test_ptr;

struct test {
  int *ptr;
};

int* test_return_ptr_to_stack_mem(void)
{
  int a[100];
  // CORRECT WARNING:
  // "warning: function returns address of local variable".
  // (Checking done in file gcc/c-typeck.c, function c_finish_return()).
  return a;
}

void test_set_ptr_to_stack_mem(void)
{
  int a[100];
  // GIVE WARNING?
  // "function returns with external reference to local variable?"
  test_ptr = a;
  return;
}

void* test_alloc_struct_ptr_to_stack_mem(void)
{
  int a[100];
  struct test* t = (struct test*)malloc(sizeof(struct test));
  // GIVE WARNING?
  // "function returns with reference to local variable?"
  t->ptr = a;
  return t;
}

void* test_alloc_struct_on_stack_mem(void)
{
  struct test* t = (struct test*)alloca(sizeof(struct test));
  t->ptr = NULL;
  // GIVE WARNING?
  // "function returns allocation from stack memory?"
  return t;
}

int main(void)
{
  // GIVES WARNING
  int* t1 = test_return_ptr_to_stack_mem();
  printf("Stack mem ref test 1: %p\n", t1);

  // NO WARNING?
  test_set_ptr_to_stack_mem();
  printf("Stack mem ref test 2: %d\n", test_ptr[0]);

  // NO WARNING?
  struct test * t3 = test_alloc_struct_ptr_to_stack_mem();
  printf("Stack mem ref test 3: %d\n", t3->ptr[0]);

  // NO WARNING?
  struct test * t4 = test_alloc_struct_on_stack_mem();
  printf("Stack mem ref test 4: %p\n", t4->ptr);

  return 0;
}
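For contrast with the cases above, here is a minimal sketch of a variant that a stack-escape warning should stay silent on: the storage is owned by the caller, so the stored pointer does not outlive the object it points to. The function name `test_fill_caller_buffer` is hypothetical and not part of the original test file; the `struct test` definition is repeated so that the fragment is self-contained.

```c
#include <stddef.h>

struct test {
    int *ptr;
};

/* The caller provides both the struct and the buffer, so the pointer
   stored in t->ptr stays valid after this function returns, for as
   long as the caller keeps buf alive. */
void test_fill_caller_buffer(struct test *t, int *buf, size_t n)
{
    for (size_t i = 0; i < n; i++)
        buf[i] = 0;
    t->ptr = buf;
}
```

Any checker along the lines proposed here would need to tell this pattern apart from `test_set_ptr_to_stack_mem()`, where the array is local to the function that stores its address.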