https://gcc.gnu.org/bugzilla/show_bug.cgi?id=89992
--- Comment #2 from Martin Liška <marxin at gcc dot gnu.org> ---
(In reply to Richard Biener from comment #1)
> It's simply that inlining makes the guessed profile not consider the loop
> worth optimizing for speed. Part of that is because the loop ends up in main()
> which we know is executed exactly once and bb->count is less than the entry
> block count so we hit
>
> maybe_hot_count_p (struct function *fun, profile_count count)
> {
> ...
>   if (node->frequency == NODE_FREQUENCY_EXECUTED_ONCE
>       && count < (ENTRY_BLOCK_PTR_FOR_FN (fun)->count.apply_scale (2, 3)))
>     return false;
>
> this is probably due to predictors saying that
>
>   if (__eax <= 6)
>     return 0; // return from main
>
> is likely (it gets 66% hit predicted). The foo() != 0 gets even probability
> and the following == 230 test gets only 11% probability to hit.
>
> The "fun" of static profile... (and doing benchmarking in main()).
>
> But it doesn't have anything to do with the vectorizer or calls.

As Richi says, static probability of calling 'do_test' in main is 3.8%.
You can use __builtin_expect{,_with_probability} if you want to make the
path more probable.
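A minimal sketch of how those hints might be applied, assuming a driver shaped like the one described in comment #1 (an early "return 0" when the selector is <= 6, otherwise a call to do_test()). The names run_benchmark and run_benchmark_likely and the body of do_test() are hypothetical; only __builtin_expect and __builtin_expect_with_probability (the latter available since GCC 9) are real GCC builtins. In the report this logic sits directly in main(); it is split into functions here only so it can be exercised separately.

```c
/* Hypothetical stand-in for the benchmark kernel; the real do_test()
   body is not shown in the bug comment. */
static int do_test(int n)
{
    int sum = 0;
    for (int i = 0; i < n; i++)
        sum += i;
    return sum;
}

/* Hypothetical driver using __builtin_expect_with_probability:
   states that "selector <= 6" evaluates to 1 only 10% of the time,
   so the static predictor treats the early return as unlikely and
   the do_test() path as the hot one. */
int run_benchmark(int selector, int n)
{
    if (__builtin_expect_with_probability(selector <= 6, 1, 0.1))
        return 0;
    return do_test(n);
}

/* Same hypothetical driver with plain __builtin_expect: the expected
   value of the condition is 0, i.e. the early return is unlikely. */
int run_benchmark_likely(int selector, int n)
{
    if (__builtin_expect(selector <= 6, 0))
        return 0;
    return do_test(n);
}
```

The hints only adjust the guessed profile (and thus decisions such as maybe_hot_count_p); they do not change what the code computes, so both drivers return the same values for the same inputs.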