http://gcc.gnu.org/bugzilla/show_bug.cgi?id=60577
--- Comment #4 from Richard Biener <rguenth at gcc dot gnu.org> ---
(In reply to Richard Biener from comment #3)
> Which leaves the possibility of more cleverly instrumenting the program
> in the first place ...

Which shouldn't be so hard, as we are in SSA form and have loop structures
set up. But it's of course "old" code (profile.c:branch_prob), which was
written with "only" a CFG in mind.

Unfortunately the code chooses to instrument a non-latch edge in the
testcase (loop latches would have been easy to special-case in
gimple_gen_edge_profiler). The instrumented edge is the other edge out of
a block that contains a loop exit, so we can special-case single-exit loops
that have such an edge instrumented, where the edge source dominates the
loop latch.

Or find some even more clever way of placing the loads / stores (you can
always use a scalar as the counter and load/store in a dominating /
post-dominating block of course, but that may have detrimental effects on
register pressure ...). Maybe still do that, but only if the counter update
is in a loop.

Note that applying store motion here may introduce no-op load/store pairs,
which may have issues with thread safety.
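
To illustrate the scalar-counter idea at the source level, here is a minimal sketch. It is not what profile.c emits; `edge_counter`, `sum_naive` and `sum_hoisted` are hypothetical names standing in for a gcov-style counter and an instrumented single-exit loop.

```c
#include <stdint.h>

/* Hypothetical stand-in for an edge counter that lives in memory. */
static int64_t edge_counter;

/* Per-iteration instrumentation: the counter is loaded, incremented and
   stored in memory each time the non-exit edge out of the loop header
   is taken. */
int64_t sum_naive(const int *a, int n)
{
    int64_t sum = 0;
    for (int i = 0; i < n; i++) {
        edge_counter = edge_counter + 1;  /* load + store inside the loop */
        sum += a[i];
    }
    return sum;
}

/* Scalar counter: load once in a block that dominates the loop, count in
   a local, store once in a block that post-dominates the loop. */
int64_t sum_hoisted(const int *a, int n)
{
    int64_t sum = 0;
    int64_t local = edge_counter;         /* load before the loop */
    for (int i = 0; i < n; i++) {
        local++;                          /* candidate for a register */
        sum += a[i];
    }
    /* This store also executes when the loop body never ran (n <= 0),
       rewriting an unchanged value: the no-op load/store case that can
       race with another thread updating the same counter. */
    edge_counter = local;
    return sum;
}
```

Both functions leave the same count in `edge_counter` when run single-threaded; the second keeps one extra value live across the loop, which is the register-pressure cost mentioned above.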