https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103734
--- Comment #1 from hubicka at kam dot mff.cuni.cz ---
I think the ipa-cp heuristics still need some work. It is nice that we got it to do something, but I just checked, and with an LTO+PGO build of clang it produces about 30 clones that are not "for all known contexts", so it seems that it is still quite strict about what it considers for cloning, at least with FDO. On the other hand, we have PR103195, where tfft2 grows by 70% because a function is cloned with no great benefit.

I think one problem is that it is based on absolute time improvements. This is a problem because bigger functions run longer and are more likely to see a bigger absolute improvement, but we are more interested in relative improvement (i.e. duplicating a very large function to get a 0.001% speedup is less useful than duplicating a small function that gets a 10% speedup, even if the absolute number is the same).

With profile feedback, absolute numbers are OK since they should translate to actual speedups of the whole program. But then programs trained longer will see bigger numbers than programs trained for a shorter time, so perhaps this should be relative to the overall running time of the program?
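To make the absolute-versus-relative point concrete, here is a small sketch in Python. It is not the ipa-cp code: `Candidate`, `absolute_gain`, `relative_gain` and `gain_vs_program` are hypothetical names, and the time and size numbers are made up to match the 0.001% and 10% figures above.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    """Hypothetical cloning candidate; all numbers are illustrative estimates."""
    name: str
    time_before: float  # estimated time of the original function
    time_after: float   # estimated time of the specialized clone
    size: int           # estimated code size added by the clone

    def absolute_gain(self) -> float:
        # Time saved, in the same (arbitrary) units as the estimates.
        return self.time_before - self.time_after

    def relative_gain(self) -> float:
        # Fraction of the function's own time that is saved.
        return self.absolute_gain() / self.time_before


def gain_vs_program(candidate: Candidate, program_time: float) -> float:
    # Assumed normalization for the profile-feedback case: saved time as a
    # fraction of the whole program's running time in the training run.
    return candidate.absolute_gain() / program_time


# Same absolute gain, very different relative gain.
big = Candidate("big_fn", time_before=1_000_000.0, time_after=999_990.0, size=5000)
small = Candidate("small_fn", time_before=100.0, time_after=90.0, size=50)

# The same function seen in a short and in a 10x longer training run:
# absolute gain differs by 10x, the program-relative gain does not.
short_run = Candidate("hot_fn", time_before=50.0, time_after=40.0, size=50)
long_run = Candidate("hot_fn", time_before=500.0, time_after=400.0, size=50)
short_ratio = gain_vs_program(short_run, program_time=1_000.0)
long_ratio = gain_vs_program(long_run, program_time=10_000.0)
```

A heuristic that ranks by `absolute_gain()` cannot tell `big_fn` from `small_fn`, although cloning `big_fn` costs 100 times more size for a 0.001% improvement. Dividing by the program's total time removes the dependence on how long the training run was.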