On Mon, Oct 18 2021, Martin Jambor wrote:
> [...]
>
> IPA-CP does not do a reasonable job when it is updating profile counts
> after it has created clones of recursive functions.  This patch
> addresses that by:
>
> 1. Only updating counts for special-context clones.  When a clone is
>    created for all contexts, the original is going to be dead and the
>    cgraph machinery has copied counts to the new node, which is the
>    right thing to do.  Therefore updating counts has been moved from
>    create_specialized_node to decide_about_value and
>    decide_whether_version_node.
>
> 2. The current profile updating code artificially increased the assumed
>    old count when the sum of counts of incoming edges to both the
>    original and new node was bigger than the count of the original
>    node.  This always happened when a self-recursive edge from the clone
>    was also redirected to the clone, because both the original edge and
>    its clone had the original high counts.  This crutch was removed and
>    replaced by the next point.
>
> 3. When cloning also redirects a self-recursive edge to the clone
>    itself, new logic has been added to divide the counts brought by such
>    recursive edges between the original node and the clone.  This is
>    impossible to do well without special knowledge about the function and
>    which non-recursive entry calls are responsible for what portion of
>    recursion depth, so the approach taken is rather crude.
>
>    For local nodes, we detect the case when the original node is never
>    called (in the training run at least) with another value and if so,
>    steal all its counts as if it were dead.  If that is not the case, we
>    try to divide the count brought by recursive edges (or rather not
>    brought by direct edges) proportionally to the counts brought by
>    non-recursive edges - but with artificial limits in place so that we
>    do not take too many or too few, because that was happening with
>    detrimental effect in mcf_r.
>
> 4. When cloning creates extra clones for values brought by a formerly
>    self-recursive edge with an arithmetic pass-through jump function on
>    it, such as it does in exchange2_r, all such clones are processed at
>    once rather than one after another.  The counts of all such nodes are
>    distributed evenly (modulo even-formerly-non-recursive-edges) and the
>    whole situation is then fixed up so that the edge counts fit.  This is
>    what the new function update_counts_for_self_gen_clones does.
>
> 5. Values brought by a formerly self-recursive edge with an arithmetic
>    pass-through jump function on it are now evaluated by a heuristic
>    which assumes that the vast majority of node counts are the result of
>    recursive calls, and so we simply divide those by the number of
>    clones there would be if we created another one.
>
> 6. The mechanisms in init_caller_stats, gather_caller_stats and
>    get_info_about_necessary_edges were enhanced to gather data required
>    for the above, and a missing check not to count dead incoming edges
>    was also added.
>
> gcc/ChangeLog:
>
> 2021-10-15  Martin Jambor  <mjam...@suse.cz>
>
>         * ipa-cp.c (struct caller_statistics): New fields rec_count_sum,
>         n_nonrec_calls and itself, document all fields.
>         (init_caller_stats): Initialize the above new fields.
>         (gather_caller_stats): Gather self-recursive counts and calls number.
>         (get_info_about_necessary_edges): Gather counts of self-recursive and
>         other edges bringing in the requested value separately.
>         (dump_profile_updates): Rework to dump info about a single node only.
>         (lenient_count_portion_handling): New function.
>         (struct gather_other_count_struct): New type.
>         (gather_count_of_non_rec_edges): New function.
>         (struct desc_incoming_count_struct): New type.
>         (analyze_clone_icoming_counts): New function.
>         (adjust_clone_incoming_counts): Likewise.
>         (update_counts_for_self_gen_clones): Likewise.
>         (update_profiling_info): Rewritten.
>         (update_specialized_profile): Adjust call to dump_profile_updates.
>         (create_specialized_node): Do not update profiling info.
>         (decide_about_value): New parameter self_gen_clones, either push new
>         clones into it or update their profile counts.  For self-recursively
>         generated values, use a portion of the node count instead of count
>         from self-recursive edges to estimate goodness.
>         (decide_whether_version_node): Gather clones for self-generated values
>         in a new vector, update their profiles at once at the end.
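
For anyone reading this without the patch at hand, the division described in point 3 can be sketched roughly as below.  This is only an illustration under stated assumptions, not the code in ipa-cp.c: the type count_t, the struct split_result, the function split_recursive_count and the 1/8 and 7/8 limits are all hypothetical stand-ins (GCC uses profile_count, and the real limits are not given in this message).

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for a profile count.  Counts are assumed to fit
// in 32 bits so that products can be formed safely in 64 bits.
using count_t = std::uint32_t;

struct split_result
{
  count_t clone_count;  // count assigned to the new specialized clone
  count_t orig_count;   // count left on the original node
};

// Divide NODE_COUNT between the original node and its clone.
// NONREC_TO_CLONE and NONREC_TO_ORIG are the counts brought by
// non-recursive edges to the clone and to the original respectively.
split_result
split_recursive_count (count_t node_count, count_t nonrec_to_clone,
                       count_t nonrec_to_orig, bool local)
{
  // A local original that is never called with another value is
  // treated as if it were dead: the clone takes everything.
  if (local && nonrec_to_orig == 0)
    return { node_count, 0 };

  std::uint64_t direct = std::uint64_t (nonrec_to_clone) + nonrec_to_orig;
  // The part of the node count that is not brought by direct edges.
  std::uint64_t rec = node_count > direct ? node_count - direct : 0;

  // Proportional share of the recursive part; an even split if there
  // is no direct count to take proportions from.
  std::uint64_t share = direct == 0 ? rec / 2
                                    : rec * nonrec_to_clone / direct;

  // Assumed limits (not from the patch): take at least 1/8 and at most
  // 7/8 of the recursive part, so neither node is starved.
  std::uint64_t lo = rec / 8;
  std::uint64_t hi = rec - rec / 8;
  assert (lo <= hi);
  share = std::clamp (share, lo, hi);

  // Guard against inconsistent profiles where edges exceed the node.
  std::uint64_t clone = std::min<std::uint64_t> (nonrec_to_clone + share,
                                                 node_count);
  return { count_t (clone), count_t (node_count - clone) };
}
```

With a node count of 1000 and direct counts of 30 to the clone and 70 to the original, the recursive part of 900 is split 270 to 630, so the clone ends up with 300 and the original with 700.  When the direct counts are 1 and 99, the proportional share would be only 9, and the assumed lower limit raises it to 112.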

Honza approved the patch in a private conversation and I have pushed it
to master as commit d1e2e4f9ce4df50564f1244dcea9befc3066faa8.

Thanks,

Martin