https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113646

--- Comment #6 from Jan Hubicka <hubicka at gcc dot gnu.org> ---
The problem is that the inner loop executed in the hottest function differs
between the train and ref runs (the train run uses a different variant of the
loop). As a result, the loop the ref run actually spends its time in is
believed to be cold and is not vectorized, which makes the -fprofile-use binary
run slower.
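A minimal C sketch of the situation, assuming a hypothetical hot function
`scale` (not the code from this bug) whose inner loop has two variants selected
purely by the input data. If the training input only ever takes the strided
path, the profile records zero executions for the contiguous loop, so with
-fprofile-use that loop is treated as cold and is not vectorized, even though
the ref input spends all of its time there.

```c
#include <stddef.h>

/* Hypothetical hot function with two variants of its inner loop.
   Which variant runs depends only on the input (here: the stride). */
void scale(float *restrict dst, const float *restrict src,
           size_t n, size_t stride, float k)
{
    if (stride == 1) {
        /* Contiguous variant: a natural vectorization candidate.
           In this sketch, assume only the ref input reaches it, so the
           training profile records zero executions for this loop. */
        for (size_t i = 0; i < n; i++)
            dst[i] = src[i] * k;
    } else {
        /* Strided variant: assumed to be the only loop the training
           input exercises. */
        for (size_t i = 0; i < n; i++)
            dst[i] = src[i * stride] * k;
    }
}
```

With the usual workflow (build with -fprofile-generate, run the training
input, rebuild with -fprofile-use), the function as a whole looks hot, but the
`stride == 1` loop carries no counts, which matches the behaviour described
above.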

Training with the ref run solves the issue. -fprofile-partial-training does
not, since it works at function level: if a function is not trained at all, we
optimize it as if there were no profile, but here the function is trained and
we simply enter a different loop.
