On 1/9/20 10:51 AM, Jan Hubicka wrote:
On 1/8/20 3:05 PM, Jan Hubicka wrote:
I would still prefer invalidation before streaming (which is fully
deterministic) and possibly have an option
Do you mean __gcov_merge_topn?
I suggest we do the following:
- have non-deterministic and deterministic versions of the TOPN counter
  and a flag choosing between deterministic and non-deterministic
  instrumentation.
- in the non-deterministic version do merging by dropping the least frequent
  values as done in your patch.
Good.
- in the deterministic version do the following in __gcov_merge_topn:
1) prune out all values in the measured counts whose associated counter is
   less than total_count/N (where N is the number of values we track)
Can you please explain to me how precisely this pruning is useful?
The motivation is to keep everything deterministic. At the end of a run
we have a deterministic set of counters with deterministic values, and
after this step we again have a deterministic set, just smaller (since we
removed the useless ones).
Then the idea was to produce the union of all the sets from all the runs
(invalidating the counter on overflow), which is again deterministic
because union is associative and commutative, so it does not matter in
what order we perform it.
Yes.
2) prune out all values in the read counter the same way
3) merge the values, ignoring any with a resulting counter
   less than merged_total_count/N, and invalidating the whole counter if
   there are too many surviving ones
4) before using profiles in GCC, prune out all values with an associated
   counter less than total_count/N (they are not useful anyway).
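
To make the intent concrete, here is a minimal sketch of steps 1)-3),
assuming a hypothetical counter layout; struct topn_entry, topn_prune,
topn_merge and TOPN_N are made-up names, not the existing libgcov
interface:

#define TOPN_N 4                      /* number of tracked values */

typedef long long gcov_type;

struct topn_entry { gcov_type value; gcov_type count; };

struct topn_counter
{
  int invalid;                        /* whole counter was invalidated */
  struct topn_entry e[TOPN_N];        /* count == 0 means an empty slot */
};

/* Steps 1) and 2): drop every value whose counter is below total/N.  */
static void
topn_prune (struct topn_counter *c)
{
  gcov_type total = 0;
  for (int i = 0; i < TOPN_N; i++)
    total += c->e[i].count;
  for (int i = 0; i < TOPN_N; i++)
    if (c->e[i].count * TOPN_N < total)
      c->e[i].count = 0;
}

/* Step 3): union of the two pruned sets; invalidate on overflow.  */
static void
topn_merge (struct topn_counter *dst, struct topn_counter *src)
{
  if (dst->invalid || src->invalid)
    {
      dst->invalid = 1;
      return;
    }
  topn_prune (dst);
  topn_prune (src);
  for (int i = 0; i < TOPN_N; i++)
    {
      if (src->e[i].count == 0)
        continue;
      int slot = -1;
      for (int j = 0; j < TOPN_N; j++)          /* matching slot?  */
        if (dst->e[j].count && dst->e[j].value == src->e[i].value)
          { slot = j; break; }
      if (slot < 0)
        for (int j = 0; j < TOPN_N; j++)        /* otherwise a free slot */
          if (dst->e[j].count == 0)
            { slot = j; break; }
      if (slot < 0)
        {
          dst->invalid = 1;                     /* too many surviving values */
          return;
        }
      dst->e[slot].value = src->e[i].value;
      dst->e[slot].count += src->e[i].count;
    }
  topn_prune (dst);          /* drop values below merged_total_count/N */
}

Since the merge only ever takes unions and invalidates on overflow, the
result should not depend on the order in which runs are merged.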
However, it seems that I missed a problem here: with steps 2) and 4) the
merge is not distributive (the result depends on the merge order).
I added 2) and 4) trying to work around the fact that we have no
convenient place to do pruning if merging is not performed (since the merge
functions are not called at all). I think we need to invent such a place
- just do it at the beginning of dump_one_gcov.
Hm, anyway, stage3 closes today and it's definitely next stage1 material.
Here total_count can simply be the sum of all N counters; it would be
better if it was the total number of runs, but that is not available at
gcov_merge time unless we introduce an extra counter for misses.
The observation is that since we never kick out a value because the counter
is full (but rather invalidate the whole counter), this remains
deterministic. Hopefully pruning out useless small values will
prevent the Firefox-like examples from triggering too often.
Definitely this is stronger than simply invalidating all counters
where the number of runs != the sum of all counts, which is the other
deterministic variant.
2) and 4) are only needed since we currently have no convenient place
to prune counters prior to streaming if merging is not done at all.
If we invent one we could skip those steps.
Does it make sense to you?
I was also wondering: with TOPN it would be nice to have the property
that a target with probability greater than 1/(TOPN+1) gets into the counter,
but this is guaranteed only for N=1.
For N>1 one can populate N-1 counters with not too important values
and only then start putting in frequent values, so that the frequent
targets always fight among themselves for the last slot (exploiting the
"let's decrease the least counter" heuristic).
Like for N=3
X X Y Y Z W Z W Z W Z W ......
here the first two counters will get occupied by X and Y with counts of 2 and
the third counter will end up with W. Z will be missed even though its
frequency in the limit is 0.5 while both X and Y have frequency 0 in the limit.
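
A toy simulation of that stream, assuming a simple "decrement the smallest
counter on a miss and replace it when it reaches zero" policy (the exact
heuristic may differ in details), shows the effect:

#include <stdio.h>

#define N 3                /* number of tracked values */

static long vals[N];
static long cnts[N];

static void
topn_add (long v)
{
  for (int i = 0; i < N; i++)
    if (cnts[i] && vals[i] == v)
      { cnts[i]++; return; }                  /* hit */
  for (int i = 0; i < N; i++)
    if (cnts[i] == 0)
      { vals[i] = v; cnts[i] = 1; return; }   /* free slot */
  int min = 0;                                /* miss: decrease least counter */
  for (int i = 1; i < N; i++)
    if (cnts[i] < cnts[min])
      min = i;
  if (--cnts[min] == 0)
    { vals[min] = v; cnts[min] = 1; }
}

int
main (void)
{
  /* X X Y Y followed by alternating Z W, with X=1, Y=2, Z=3, W=4.  */
  topn_add (1); topn_add (1); topn_add (2); topn_add (2);
  for (int i = 0; i < 1000; i++)
    { topn_add (3); topn_add (4); }
  for (int i = 0; i < N; i++)
    printf ("value %ld: count %ld\n", vals[i], cnts[i]);
  return 0;
}

This prints counts 2 and 2 for X and Y and leaves only a count of 1 for
whichever of Z and W came last, even though Z and W dominate the stream.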
What about increasing the value of the matching counter by N on a hit and
decreasing all counters by 1 on a miss? It will make sure that values with
frequency less than 1/N will be dropped, making space for those with higher
frequencies.
I like this approach. It looks smart. So then we'll divide all counter values
by N, right?
Yes.
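
A minimal sketch of that update rule, reading "increase by N" as bumping the
matching entry (the names and slot layout are made up, not libgcov code):

#include <stdio.h>

#define N 3                /* number of tracked values */

struct entry { long value; long count; };
static struct entry slots[N];

static void
topn_update (long v)
{
  for (int i = 0; i < N; i++)
    if (slots[i].count > 0 && slots[i].value == v)
      { slots[i].count += N; return; }                      /* hit: +N */
  for (int i = 0; i < N; i++)
    if (slots[i].count <= 0)
      { slots[i].value = v; slots[i].count = N; return; }   /* reuse drained slot */
  for (int i = 0; i < N; i++)
    slots[i].count--;                                       /* miss: everything -1 */
}

int
main (void)
{
  /* Same stream as before: X X Y Y then alternating Z W.  */
  topn_update (1); topn_update (1); topn_update (2); topn_update (2);
  for (int i = 0; i < 1000; i++)
    { topn_update (3); topn_update (4); }
  for (int i = 0; i < N; i++)
    if (slots[i].count > 0)
      printf ("value %ld: approx count %ld\n",
              slots[i].value, slots[i].count / N);          /* scale back by N */
  return 0;
}

On the stream from the earlier example, X and Y now decay out of the table,
Z and W take their slots, and dividing the surviving counts by N gives
roughly their true frequencies (close to 1000 each).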
Note that we used to have Google's profiling code for TOP N:
https://github.com/gcc-mirror/gcc/blob/gcc-7-branch/libgcc/libgcov-profiler.c#L179
It seems to not implement the logic to decrease counts on a mismatch.
Instead, if the counter is full, the value is entered into the least frequently
used entry. This can clearly degrade. For N=3 one can do
X X Y Y Z W Z W Z W Z W ..
where Z and W will keep kicking each other out.
So they seem to wait until this grows to 3000 and then take low values out of
the histogram.
I do not see much advantage in this scheme except that the counts are
not decreased and thus they correspond more closely to the actual
frequencies of the targets (still not precisely, since a target might
have been kicked out earlier).
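
A rough sketch of that scheme as described above (replace the least
frequently used entry on a miss, count evictions, prune low counts once the
eviction count reaches a threshold); the constants, names and the notion of
"low" are assumptions, not taken from the linked file:

#define N 3
#define EVICTION_THRESHOLD 3000   /* "wait until this grows to 3000" */

struct entry { long value; long count; };
static struct entry slots[N];
static long evictions;

static void
topn_update_lfu (long v)
{
  for (int i = 0; i < N; i++)
    if (slots[i].count && slots[i].value == v)
      { slots[i].count++; return; }                         /* hit */
  for (int i = 0; i < N; i++)
    if (slots[i].count == 0)
      { slots[i].value = v; slots[i].count = 1; return; }   /* free slot */

  /* Full: overwrite the least frequently used entry.  */
  int min = 0;
  for (int i = 1; i < N; i++)
    if (slots[i].count < slots[min].count)
      min = i;
  slots[min].value = v;
  slots[min].count = 1;

  /* After many evictions, drop entries with small counts so that
     alternating cold values (Z W Z W ...) stop thrashing the table.
     The cutoff used here is a guess.  */
  if (++evictions >= EVICTION_THRESHOLD)
    {
      for (int i = 0; i < N; i++)
        if (slots[i].count < EVICTION_THRESHOLD / N)
          slots[i].count = 0;
      evictions = 0;
    }
}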
Yep, I just wanted to mention that approach ;) Not saying it's the best one.
Martin
Honza
It's somehow counting the number of evictions and if it reaches a threshold,
then low counter values are evicted.
Martin
This should also improve behaviour WRT the problem above.
-fno-deterministic-profile-merging (or something similar) for users who
do not care and for threaded programs where this is impossible
(which needs to trigger different merging in libgcov).
Fine, let's postpone that for GCC 11.
It is a regression, so I think we have the right to solve it for GCC 10.
Honza
Martin
Honza