https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110015
--- Comment #1 from Jan Hubicka <hubicka at gcc dot gnu.org> ---
opj_t1_enc_refpass is not inlined due to large function growth, and some
others are not inlined due to max-inline-insns-auto. With inlining forced
I get this profile:

  87.35%  opj_t1_cblk_encode_processor
   6.22%  opj_dwt_encode_and_deinterleave_v.lto_priv.0
   1.80%  opj_mqc_byteout
   1.50%  opj_dwt_encode_and_deinterleave_h_one_row.lto_priv.0

So pretty much the same profile as for clang. However, the runtime is still
45573 with

  -O3 -flto -march=native -fno-semantic-interposition --param large-function-insns=1000000 --param max-inline-insns-auto=50000

So it does not seem to be a case of missing IPA optimizations. There are a
number of conditional moves in the clang code; -mbranch-cost helps a bit,
but not enough.