https://gcc.gnu.org/bugzilla/show_bug.cgi?id=118356
--- Comment #3 from Javier Mora <cousteaulecommandant at gmail dot com> ---
(In reply to Palmer Dabbelt from comment #1)
> The GCC for docs say
>
>   -O2
>   Optimize even more. GCC performs nearly all supported optimizations that
>   do not involve a space-speed tradeoff.
>
> so this is something I'd expect to be off for "-O2" and on for "-O3".

I don't *think* this would count as the kind of space-speed tradeoff they refer to in that paragraph. I mean, yes, technically we're increasing the code size by a few bytes, so we're trading a very small space increase for a significant speed increase, but it's not the huge space increase you'd get e.g. with loop unrolling (which is the kind of thing enabled at -O3). Plus, none of the extra options enabled at -O3 seems to provide any form of jump alignment; only -O2 does. So I would imagine that the option was left as a placeholder in -O2, meaning "any implementations where this makes sense should do that here".

Note that there's also -Os, which "enables all -O2 optimizations except those that often increase code size: -falign-etc-etc..."; that option wouldn't exist if none of the -O2 optimizations increased the code size even slightly.

> This seems like a good candidate for a microarchitecture-specific tuning
> parameter. The exact cost here is going to be very implementation-dependent
> and the mapping to GCC codegen is a little clunky, so it might be hard to
> get this all to be a net performance win on real code.

I could find very little info on RISC-V -mtune options, but it's true that someone should analyze some of the most common microarchitectures to see how many of them would benefit from aligned jumps; if most of them do, it might make sense to make it the default for all.

> We might also need that alignment-based decompression stuff in binutils,
> depending on how implementations handle the extra NOPs (which IIRC we never
> merged because nobody cared enough to figure out if the code actually
> worked).

Yes; if possible, decompression to achieve alignment sounds better than adding NOPs.
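
To put "a few bytes" in perspective, here is a minimal sketch of the padding arithmetic involved in aligning a jump target. This is not GCC or binutils code; the function names `pad_bytes` and `nop_count` are made up for illustration. It assumes the alignment is a power of two and that padding is filled with NOPs of a single size (2 bytes for c.nop when the C extension is available, 4 bytes for a regular nop).

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: bytes of padding needed to move `addr` up to the
 * next multiple of `align`. Assumes `align` is a nonzero power of two. */
static uint64_t pad_bytes(uint64_t addr, uint64_t align)
{
    assert(align != 0 && (align & (align - 1)) == 0);
    /* Distance to the next boundary; the final mask turns a full
     * `align` into 0 when `addr` is already aligned. */
    return (align - (addr & (align - 1))) & (align - 1);
}

/* Hypothetical helper: number of NOPs of `nop_size` bytes that fill
 * `pad` bytes. Assumes `pad` is an exact multiple of `nop_size`. */
static uint64_t nop_count(uint64_t pad, uint64_t nop_size)
{
    assert(nop_size != 0 && pad % nop_size == 0);
    return pad / nop_size;
}
```

For example, a target at 0x1002 aligned to 4 bytes needs `pad_bytes(0x1002, 4)` = 2 bytes, i.e. one c.nop. The worst case for a given alignment is `align` minus the smallest instruction size, which is why the cost per jump target stays small compared with something like loop unrolling.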