https://gcc.gnu.org/bugzilla/show_bug.cgi?id=118356

Palmer Dabbelt <palmer at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
     Ever confirmed|0                           |1
   Last reconfirmed|                            |2025-01-08
           Keywords|                            |missed-optimization
                 CC|                            |palmer at gcc dot gnu.org
             Status|UNCONFIRMED                 |NEW

--- Comment #1 from Palmer Dabbelt <palmer at gcc dot gnu.org> ---
(In reply to Javier Mora from comment #0)
> Some RISC-V implementations, including the CORE-V CVE4 family [1], allow
> having instructions aligned to 2- or 4-byte boundaries, but introduce an
> extra clock cycle penalty if the target of a branch instruction is a 4-byte
> instruction that is not aligned to a 4-byte boundary (but not if the target
> instruction is aligned to a 4-byte boundary, or if it's a 2-byte
> instruction).
>
> In those cases, forcing alignment of branch targets to 4 bytes (which can be
> achieved by providing `-falign-labels=4`) can provide a great improvement on
> certain programs. For example, a tight `for` loop may take 9 clock cycles
> to run if the branch target is aligned but 10 if it's not, resulting in a
> 10% performance loss. (What's worse, this performance loss will only kick
> in arbitrarily, and can appear or disappear even if I change a completely
> different part of the code, which drove me crazy when I was trying to
> measure the performance of a function affected by this issue; enabling
> `-falign-labels=4` also has the advantage of removing this uncertainty.)
>
> Here, my expectation would be that enabling a certain optimization level
> (such as `-O2`) enabled this particular optimization. In fact, the
> documentation [2] states that `-O2` enables the `-falign-labels` flag, but
> without specifying an alignment. It later states that `-falign-labels`
> without a value or with `=0` will "use a machine-dependent default which is
> very likely to be ‘1’, meaning no alignment".
>
> Now, I don't quite understand why `-O2` would want to enable an optimization
> option whose default behavior is to do nothing, but my guess is that this is
> so that specific targets where setting `-falign-labels=X` can provide an
> advantage (as is the case with RISC-V) use `X` as the default value rather
> than 1.

The GCC docs say

    -O2
        Optimize even more. GCC performs nearly all supported optimizations
        that do not involve a space-speed tradeoff.

so this is something I'd expect to be off for "-O2" and on for "-O3".

> What do you think? Would it make sense for RISC-V targets to make
> `-falign-labels=4` the default alignment value when `-falign-labels` is used
> without providing an explicit value, so that this forced alignment will
> happen when `-O2` or `-O3` are used?

This seems like a good candidate for a microarchitecture-specific tuning
parameter. The exact cost here is going to be very implementation-dependent
and the mapping to GCC codegen is a little clunky, so it might be hard to get
this all to be a net performance win on real code.

We might also need that alignment-based decompression stuff in binutils,
depending on how implementations handle the extra NOPs (which IIRC we never
merged because nobody cared enough to figure out if the code actually worked).

> [1]: https://docs.openhwgroup.org/projects/cv32e40p-user-manual/en/latest/pipeline.html#cycle-counts-per-instruction-type
> [2]: https://gcc.gnu.org/onlinedocs/gcc/Optimize-Options.html
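
For illustration only, the penalty rule described in comment #0 can be sketched in C as below. This is not GCC or CV32E40P code; the function names (`is_4byte_insn`, `misaligned_target_penalty`, `loop_cycles`) and the one-cycle penalty model are assumptions made for this sketch. The only hard fact it relies on is the standard RISC-V encoding rule that a 16-bit parcel whose two low bits are 0b11 begins a 32-bit (or longer) instruction.

```c
#include <stdbool.h>
#include <stdint.h>

/* Standard RISC-V encoding: low two bits == 0b11 means the instruction is
 * not a 16-bit compressed one (treated here as a 4-byte instruction). */
bool is_4byte_insn(uint16_t first_parcel)
{
    return (first_parcel & 0x3u) == 0x3u;
}

/* Assumed model of the rule in comment #0: a branch target pays one extra
 * cycle only if it is a 4-byte instruction that is not 4-byte aligned. */
bool misaligned_target_penalty(uint32_t target_addr, uint16_t first_parcel)
{
    return is_4byte_insn(first_parcel) && (target_addr & 0x3u) != 0;
}

/* Hypothetical per-iteration cost: base cycles plus the assumed penalty. */
unsigned loop_cycles(unsigned base_cycles, uint32_t target_addr,
                     uint16_t first_parcel)
{
    return base_cycles +
           (misaligned_target_penalty(target_addr, first_parcel) ? 1u : 0u);
}
```

Under these assumptions, a 9-cycle loop whose target is a 4-byte instruction at an address such as 0x1002 costs 10 cycles per iteration, while the same loop with its target at 0x1000, or with a compressed instruction at the target, stays at 9.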