https://gcc.gnu.org/bugzilla/show_bug.cgi?id=118356

--- Comment #3 from Javier Mora <cousteaulecommandant at gmail dot com> ---
(In reply to Palmer Dabbelt from comment #1)
> The GCC docs say
> 
>     -O2
>     Optimize even more. GCC performs nearly all supported optimizations that
> do not involve a space-speed tradeoff.
> 
> so this is something I'd expect to be off for "-O2" and on for "-O3".

I don't *think* this would count as the kind of space-speed tradeoff they refer
to in that paragraph.  I mean, yes, technically we're slightly increasing the
code size by a few bytes, so we're trading a very small space increase for a
significant speed increase, but it's not the huge space increase you'd get e.g.
with loop unrolling (which is the kind of thing enabled at -O3 levels).  Plus,
none of the extra options enabled at -O3 seems to provide any form of jump
alignment; only -O2 does.  So I would imagine that the option was left as a
placeholder in -O2, meaning "any implementations where this makes sense should
do that here".

Note that there's also -Os, which "enables all -O2 optimizations except those
that often increase code size: -falign-etc-etc..."; that option wouldn't exist
if none of the -O2 optimizations increased the code size even slightly.
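
To put a rough number on the "few bytes" mentioned above, here is a minimal
sketch of the padding needed to align a single jump target.  The helper name
pad_to_align and the example addresses are hypothetical, chosen only for
illustration; this is plain address arithmetic, not how GCC or binutils
actually emit the padding.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: number of padding bytes needed to move `addr` up to
   the next multiple of `align`.  `align` must be a nonzero power of two. */
static uint32_t pad_to_align(uint32_t addr, uint32_t align)
{
    assert(align != 0 && (align & (align - 1)) == 0);
    /* Distance to the next boundary; the final mask turns a full
       `align` into 0 when `addr` is already aligned. */
    return (align - (addr & (align - 1))) & (align - 1);
}
```

Assuming a target with 2-byte compressed instructions and a requested 4-byte
jump alignment, pad_to_align(0x1002, 4) gives 2 (one 2-byte NOP) and
pad_to_align(0x1004, 4) gives 0, so under those assumptions the cost per
target is at most 2 bytes.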

> This seems like a good candidate for a microarchitecture-specific tuning
> parameter.  The exact cost here is going to be very implementation-dependent
> and the mapping to GCC codegen is a little clunky, so it might be hard to
> get this all to be a net performance win on real code.

I could find very little info on RISC-V -mtune options, but I agree that some
of the most common microarchitectures should be analyzed to see how many of
them would benefit from aligned jumps; if most of them do, it might make sense
to make it the default for all.

> We might also need that alignment-based decompression stuff in binutils,
> depending on how implementations handle the extra NOPs (which IIRC we never
> merged because nobody cared enough to figure out if the code actually
> worked).

Yes; if possible, decompression to achieve alignment sounds better than adding
NOPs.
