I think I found the explanation: -fpeel-loops triggers some extra flags. From "toplev.c":

    /* web and rename-registers help when run after loop unrolling.  */
    if (flag_web == AUTODETECT_VALUE)
      flag_web = flag_unroll_loops || flag_peel_loops;

    if (flag_rename_registers == AUTODETECT_VALUE)
      flag_rename_registers = flag_unroll_loops || flag_peel_loops;

It is actually -frename-registers that causes the code size to decrease. This flag seems to be set when -fpeel-loops is enabled.

Maybe this flag could be enabled in -Os? It shouldn't have any downside, besides possibly making debugging harder?

Thanks/Fredrik

________________________________________
From: Richard Biener [richard.guent...@gmail.com]
Sent: Friday, August 14, 2015 09:28
To: sa...@hederstierna.com
Cc: gcc@gcc.gnu.org
Subject: Re: About loop unrolling and optimize for size

On Thu, Aug 13, 2015 at 6:26 PM, sa...@hederstierna.com <fred...@hederstierna.com> wrote:
> Hi
> I'm using an ARM thumb cross compiler for embedded systems and always
> optimize for small size with -Os.
>
> I've experimented with optimization flags and loop unrolling.
>
> Normally loop unrolling is always bad for size: code is duplicated and size
> increases.
>
> However, I discovered that in some special cases where the number of iterations
> is very small, e.g. a loop of 2-3 iterations, unrolling can make the code
> smaller, e.g. by freeing up registers used for the loop index.
>
> For example, when I use the flag "-fpeel-loops" together with -Os, I get
> smaller code size in 99% of the cases for the ARM thumb target.
>
> So my question is how unrolling works with -Os: is it always totally
> disabled, or are there some cases where it could be tried, e.g. with a small
> number of iterations, so the loop can be eliminated?
>
> Could e.g. "-fpeel-loops" be enabled by default for -Os perhaps? Now it's only
> enabled for -O2 and above, I think.

Complete peeling is already enabled with -Os; it is just restricted to those cases where GCC's cost modeling of the unrolling operation determines that the code size shrinks.
If you enable -fpeel-loops then the cost model allows the code size to grow, something not (always) intended with -Os.

The solution is of course to improve the cost modeling and GCC's idea of follow-up optimization opportunities. I do have some incomplete patches to improve that and hope to get back to it for GCC 6.

If you have (small) testcases that show code size improvements with -Os -fpeel-loops over -Os, and you are confident they are caused by unrolling, please open a Bugzilla report containing them.

Thanks,
Richard.

> Thanks and Best Regards
> Fredrik
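
A small testcase of the kind asked for above might look roughly like the sketch below. It is not taken from the thread: the file name peel_test.c, both function names, and the arm-none-eabi- toolchain prefix are assumptions made for illustration. The first function has a fixed trip count of 3, and the second is the same computation unrolled by hand, which needs no index register, compare, or branch. Building the file with the flag combinations in the comment and comparing the object sizes should help separate the effect of peeling from the effect of -frename-registers; no particular outcome is claimed here.

```c
/* peel_test.c: hypothetical testcase, not from the original thread.
 *
 * Possible size comparison (toolchain prefix is an assumption):
 *   arm-none-eabi-gcc -mthumb -Os                    -c peel_test.c -o base.o
 *   arm-none-eabi-gcc -mthumb -Os -fpeel-loops       -c peel_test.c -o peel.o
 *   arm-none-eabi-gcc -mthumb -Os -frename-registers -c peel_test.c -o rename.o
 *   arm-none-eabi-size base.o peel.o rename.o
 */
#include <stdint.h>

/* Loop with a compile-time trip count of 3: a candidate for complete peeling. */
uint32_t checksum3_loop(const uint8_t *p)
{
    uint32_t sum = 0;
    for (int i = 0; i < 3; i++)
        sum = (sum << 1) ^ p[i];
    return sum;
}

/* Hand-unrolled equivalent of the loop above, for reference. */
uint32_t checksum3_unrolled(const uint8_t *p)
{
    uint32_t sum = p[0];          /* first iteration: (0 << 1) ^ p[0] */
    sum = (sum << 1) ^ p[1];      /* second iteration */
    sum = (sum << 1) ^ p[2];      /* third iteration */
    return sum;
}
```

If rename.o is already as small as peel.o, that would point at register renaming rather than at unrolling as the cause of the size difference.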