https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90364
--- Comment #6 from Richard Biener <rguenth at gcc dot gnu.org> ---
6.22% 80774 wrf_r_peak.pgo __module_mp_wsm5_MOD_nislfv_rain_plm
5.50% 71494 wrf_r_peak.pgo __module_mp_wsm5_MOD_wsm52d
vs.
4.04% 49253 wrf_r_peak.std __module_mp_wsm5_MOD_wsm52d
3.93% 47888 wrf_r_peak.std __module_mp_wsm5_MOD_nislfv_rain_plm
shows the biggest differences: the PGO build spends more samples in both
functions. The reason must still lie in how GCC decides whether loops are
hot or cold.
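To put numbers on "biggest differences", here is a small sketch that computes the extra samples and the PGO/std ratio for the two functions, using only the counts from the profiles above. It assumes both runs were sampled with the same event and rate, so raw sample counts are directly comparable; the helper name `slowdown` is hypothetical.

```python
# Sample counts copied from the two profiles above (assumes the same
# perf event and sampling rate for both runs).
pgo = {
    "__module_mp_wsm5_MOD_nislfv_rain_plm": 80774,
    "__module_mp_wsm5_MOD_wsm52d": 71494,
}
std = {
    "__module_mp_wsm5_MOD_nislfv_rain_plm": 47888,
    "__module_mp_wsm5_MOD_wsm52d": 49253,
}


def slowdown(pgo_samples, std_samples):
    """Hypothetical helper: (extra samples, PGO/std ratio) per function."""
    return {
        fn: (pgo_samples[fn] - std_samples[fn], pgo_samples[fn] / std_samples[fn])
        for fn in pgo_samples
    }


for fn, (extra, ratio) in slowdown(pgo, std).items():
    print(f"{fn}: +{extra} samples, {ratio:.2f}x")
```

Under that assumption, nislfv_rain_plm costs roughly 1.69x the samples with PGO and wsm52d roughly 1.45x.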
I wonder whether the loop versioning done by if-conversion properly updates
the profile, or whether we consider the loops cold afterwards.
I notice the predicate ultimately reduces to !optimize_bb_for_size_p (loop->header).
I guess dumping the result of optimize_loop[_nest]_for_speed_p next to the
loop headers in the IL dumps might show where the two builds differ.