> We're not able to enable BB reordering with -Os. The behaviour is
> hard-coded via this if statement in rest_of_handle_reorder_blocks():
>
> if ((flag_reorder_blocks || flag_reorder_blocks_and_partition)
> /* Don't reorder blocks when optimizing for size because extra jump insns may
> be created; also barrier may create extra padding.
>
> More correctly we should have a block reordering mode that tried to
> minimize the combined size of all the jumps. This would more or less
> automatically remove extra jumps, but would also try to use more short
> jumps instead of long jumps. */
> && optimize_function_for_speed_p (cfun))
> {
> reorder_basic_blocks ();
>
> If you comment out the "&& optimize_function_for_speed_p (cfun)" then
> BB reordering takes place as desired (although this isn't a solution
> obviously).
>
> In a private message Ian indicated that this had a small impact for the
> ISA he's working with but a significant performance gain. I tried the
> same thing with the ISA I work on (Ubicom32) and this change typically
> increased code sizes by between 0.1% and 0.3% but improved performance
> by anything from 0.8% to 3%, so on balance this is definitely winning
> for most of our users (this is for a couple of benchmarks, the Linux
> kernel, busybox and smbd).
>
Note that commenting out the optimize_function_for_speed_p (cfun) check
turns BB reordering on for all functions, even cold ones. So I think the
gains from this hacky change could improve further if BB reordering were
enabled only for hot functions when compiling with -Os. (Certainly the
code size increases could be minimised, whilst hopefully retaining the
performance gains.)

I am in no way suggesting this should be the default behaviour for -Os,
only that it should be switchable via the flags, just like other
optimisations are. Once it is switchable, though, I would expect that
turning it on for -Os does not enable BB reordering for every function
(the mirror image of the current blanket disabling for every function),
but picks a sensible half-way point, based on heat, so that you get the
performance wins with minimal code size increases on selected functions.
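
To make the half-way point concrete, here is a minimal sketch of the
gating decision as a standalone C model. It is not GCC code: the function
names and the boolean parameters are hypothetical stand-ins for the
reorder flags, for the result of optimize_function_for_speed_p (cfun),
and for whatever hotness test the function's profile would provide.

```c
#include <stdbool.h>

/* Hypothetical model of the existing check quoted above: reorder only
   when a reorder flag is set AND the function is optimized for speed.
   With -Os the speed predicate is false, so nothing is reordered.  */
static bool
reorder_current (bool flag_reorder, bool speed_p)
{
  return flag_reorder && speed_p;
}

/* Hypothetical model of the suggested behaviour: the flag must still be
   requested explicitly, but a function that is optimized for size is
   reordered anyway if it is hot.  Cold and ordinary size-optimized
   functions are left alone, which limits the code size increase.  */
static bool
reorder_proposed (bool flag_reorder, bool speed_p, bool hot_p)
{
  return flag_reorder && (speed_p || hot_p);
}
```

In this model, an -Os build with the flag requested gives
reorder_proposed (true, false, true) == true for a hot function and
false for any other function, whereas reorder_current (true, false) is
always false. Without the flag, both versions leave every function
alone, so the default behaviour for -Os does not change.
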
Cheers,
Ian