https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110082
Richard Biener <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |sebastian.huber@embedded-brains.de

--- Comment #1 from Richard Biener <rguenth at gcc dot gnu.org> ---
Sebastian was also working in this area.  Note that when you do it as
proposed the code will appear as having no coverage (the counters will be
allocated at the host side but nothing will increment them).  I suppose the
very same issue exists for -fprofile-generate/use then where this will then
cause the offload code to be optimized for size because it's cold (unless you
use -fprofile-partial-training)?