> On 1/20/21 5:00 PM, Jan Hubicka wrote: > > There are two thinks that I would like to discuss first > > 1) I think the option is stil used for value profiling of divisors > > It's not. Right now the only usage lives in get_nth_most_common_value which > is an entry point being used for stringops, indirect call and divmod > histograms.
I see, you replaced the implementation here as well (not only for indirect calls). I am attaching testcase that demonstrates the reproducibility issue on divmod tracking. Script cmd compiles t.c and trains it twice with same inputs but different order. On GCC10 in tumbleweed I get: run 1 run 2 cmpl $257, %esi So the first build does not specialize mod function for 257 while second does. cmd11 script does the same for trunk: run 1 cmpl $257, %esi run 2 This also made me to notice that the following is currently broken: hubicka@lomikamen-jh:/aux/hubicka/trunk-git/build3/gcc$ ./xgcc -B ./ -O2 -fprofile-generate t.c hubicka@lomikamen-jh:/aux/hubicka/trunk-git/build3/gcc$ ./a.out 10 10 hubicka@lomikamen-jh:/aux/hubicka/trunk-git/build3/gcc$ ./xgcc -B ./ -O2 -fprofile-use t.c -S This is becuase we now call the dump file a-t.gcda instead of t.gcda that seems unintentional change that probably can break some user scripts. I think this testcase is not so unrealistic especially in the context of value profiling for divmod or stringops size. > > > 2) it is not clear to me how the new counter establishes > > reproducibility for indiect calls that have more than 32 targets during > > the train run. > > We cannot ensure, but I would say that it's very unlikely to happen. > In case of Firefox, there are definitely other reasons why the build is not > reproducible. > I would expect arc counters to differ in between multiple training runs. Even in the context of indirect call profiling, I do not think the situation is limited to firefox. With virtual function calls and large object hiearchy one can end up with many possible targets quite easily. The code pattern seems quite sane here. Basic datastructures in firefox has refcounting and have virutal call to decrement the count. Also there is get_id method associated with almost everything and it is what we get currently wrong (the get_id benchmarks). I will try to debug why. While with tracking 32 values instead of 4 we made the problem less frequent (at the cost of adding new issues with malloc recursion), reproducibility is still not guaranteed. > > If it's really a problem we can come up with other approaches: > - GCOV run-time control over # of tracked values (32 right now) > - similarly we can save more values in .gcda files For divmod profiling you can easilly have very large number of values, so I do not think we can count on fact that it is always possible to track them all. In general I see the point of reproducible builds, but it is also fragile feature. If we commit to it (which I think we did), we should support it carefully and not just patch around most frequent issues. I would expect that relatively soon we will see need for reproducible builds of multthreaded programs (such as when building clang, or if we add threads to gcc). In this case the 32bit targets are going to be even less useful. I think we can only accept the profile if no value got dropped. Honza > > I'm sending updated version of the patch. > Martin > From ffbd2fe90dd3bfceecfeefb679572ca2f6b5f333 Mon Sep 17 00:00:00 2001 > From: Martin Liska <mli...@suse.cz> > Date: Tue, 19 Jan 2021 15:43:31 +0100 > Subject: [PATCH] Drop profile reproducibility flag as it is not used. > > gcc/ChangeLog: > > PR gcov-profile/98739 > * common.opt: Drop -fprofile-reproducibility flag. > * coretypes.h (enum profile_reproducibility): Drop the enum. > * doc/invoke.texi: Remove documentation. > * value-prof.c (get_nth_most_common_value): Drop usage of > flag_profile_reproducible. > --- > gcc/common.opt | 16 ++-------------- > gcc/coretypes.h | 7 ------- > gcc/doc/invoke.texi | 30 ------------------------------ > gcc/value-prof.c | 15 +++------------ > 4 files changed, 5 insertions(+), 63 deletions(-) > > diff --git a/gcc/common.opt b/gcc/common.opt > index bde1711870d..863e9681aea 100644 > --- a/gcc/common.opt > +++ b/gcc/common.opt > @@ -2236,21 +2236,9 @@ fprofile-exclude-files= > Common Joined RejectNegative Var(flag_profile_exclude_files) > Instrument only functions from files whose name does not match any of the > regular expressions (separated by semi-colons). > > -Enum > -Name(profile_reproducibility) Type(enum profile_reproducibility) > UnknownError(unknown profile reproducibility method %qs) > - > -EnumValue > -Enum(profile_reproducibility) String(serial) > Value(PROFILE_REPRODUCIBILITY_SERIAL) > - > -EnumValue > -Enum(profile_reproducibility) String(parallel-runs) > Value(PROFILE_REPRODUCIBILITY_PARALLEL_RUNS) > - > -EnumValue > -Enum(profile_reproducibility) String(multithreaded) > Value(PROFILE_REPRODUCIBILITY_MULTITHREADED) > - > fprofile-reproducible > -Common Joined RejectNegative Var(flag_profile_reproducible) > Enum(profile_reproducibility) Init(PROFILE_REPRODUCIBILITY_SERIAL) > --fprofile-reproducible=[serial|parallel-runs|multithreaded] Control level > of reproducibility of profile gathered by -fprofile-generate. > +Common Ignore Joined RejectNegative > +Does nothing. Preserved for backward compatibility. > > Enum > Name(profile_update) Type(enum profile_update) UnknownError(unknown profile > update method %qs) > diff --git a/gcc/coretypes.h b/gcc/coretypes.h > index 406572e947d..3a67061a483 100644 > --- a/gcc/coretypes.h > +++ b/gcc/coretypes.h > @@ -212,13 +212,6 @@ enum profile_update { > PROFILE_UPDATE_PREFER_ATOMIC > }; > > -/* Type of profile reproducibility methods. */ > -enum profile_reproducibility { > - PROFILE_REPRODUCIBILITY_SERIAL, > - PROFILE_REPRODUCIBILITY_PARALLEL_RUNS, > - PROFILE_REPRODUCIBILITY_MULTITHREADED > -}; > - > /* Type of -fstack-protector-*. */ > enum stack_protector { > SPCT_FLAG_DEFAULT = 1, > diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi > index 5f4a06625eb..c66ba2a62ad 100644 > --- a/gcc/doc/invoke.texi > +++ b/gcc/doc/invoke.texi > @@ -582,7 +582,6 @@ Objective-C and Objective-C++ Dialects}. > -fprofile-note=@var{path} -fprofile-prefix-path=@var{path} @gol > -fprofile-update=@var{method} -fprofile-filter-files=@var{regex} @gol > -fprofile-exclude-files=@var{regex} @gol > --fprofile-reproducible=@r{[}multithreaded@r{|}parallel-runs@r{|}serial@r{]} > @gol > -fsanitize=@var{style} -fsanitize-recover -fsanitize-recover=@var{style} > @gol > -fasan-shadow-offset=@var{number} -fsanitize-sections=@var{s1},@var{s2},... > @gol > -fsanitize-undefined-trap-on-error -fbounds-check @gol > @@ -14640,35 +14639,6 @@ any of the regular expressions (separated by > semi-colons). > For example, @option{-fprofile-exclude-files=/usr/.*} will prevent > instrumentation > of all files that are located in the @file{/usr/} folder. > > -@item > -fprofile-reproducible=@r{[}multithreaded@r{|}parallel-runs@r{|}serial@r{]} > -@opindex fprofile-reproducible > -Control level of reproducibility of profile gathered by > -@code{-fprofile-generate}. This makes it possible to rebuild program > -with same outcome which is useful, for example, for distribution > -packages. > - > -With @option{-fprofile-reproducible=serial} the profile gathered by > -@option{-fprofile-generate} is reproducible provided the trained program > -behaves the same at each invocation of the train run, it is not > -multi-threaded and profile data streaming is always done in the same > -order. Note that profile streaming happens at the end of program run but > -also before @code{fork} function is invoked. > - > -Note that it is quite common that execution counts of some part of > -programs depends, for example, on length of temporary file names or > -memory space randomization (that may affect hash-table collision rate). > -Such non-reproducible part of programs may be annotated by > -@code{no_instrument_function} function attribute. @command{gcov-dump} with > -@option{-l} can be used to dump gathered data and verify that they are > -indeed reproducible. > - > -With @option{-fprofile-reproducible=parallel-runs} collected profile > -stays reproducible regardless the order of streaming of the data into > -gcda files. This setting makes it possible to run multiple instances of > -instrumented program in parallel (such as with @code{make -j}). This > -reduces quality of gathered data, in particular of indirect call > -profiling. > - > @item -fsanitize=address > @opindex fsanitize=address > Enable AddressSanitizer, a fast memory error detector. > diff --git a/gcc/value-prof.c b/gcc/value-prof.c > index 4c916f4994f..1b174006941 100644 > --- a/gcc/value-prof.c > +++ b/gcc/value-prof.c > @@ -745,13 +745,10 @@ gimple_divmod_fixed_value (gassign *stmt, tree value, > profile_probability prob, > > Counters have the following meaning. > > - abs (counters[0]) is the number of executions > + counters[0] is the number of executions > for i in 0 ... TOPN-1 > counters[2 * i + 1] is target > - abs (counters[2 * i + 2]) is corresponding hitrate counter. > - > - Value of counters[0] negative when counter became > - full during merging and some values are lost. */ > + counters[2 * i + 2] is corresponding hitrate counter. */ > > bool > get_nth_most_common_value (gimple *stmt, const char *counter_type, > @@ -765,17 +762,11 @@ get_nth_most_common_value (gimple *stmt, const char > *counter_type, > *count = 0; > *value = 0; > > - gcov_type read_all = abs_hwi (hist->hvalue.counters[0]); > + gcov_type read_all = hist->hvalue.counters[0]; > > gcov_type v = hist->hvalue.counters[2 * n + 2]; > gcov_type c = hist->hvalue.counters[2 * n + 3]; > > - if (hist->hvalue.counters[0] < 0 > - && (flag_profile_reproducible == PROFILE_REPRODUCIBILITY_PARALLEL_RUNS > - || (flag_profile_reproducible > - == PROFILE_REPRODUCIBILITY_MULTITHREADED))) > - return false; > - > /* Indirect calls can't be verified. */ > if (stmt > && check_counter (stmt, counter_type, &c, &read_all, > -- > 2.30.0 >
#include <stdlib.h> __attribute__ ((noipa)) int mod(int a,int b) { return a%b; } int main(int argc, char **argv) { int n = atoi (argv[1]); int siz = atoi (argv[2]); int sum = 1; for (int i=0; i < n; i++) sum += mod (sum, siz); return sum; }
echo run 1 rm t.gcda gcc -O2 -fprofile-generate t.c ./a.out 100 257 ./a.out 100 3 ./a.out 100 7 ./a.out 100 13 ./a.out 100 25 ./a.out 310 257 gcc -O2 -fprofile-use t.c -S grep cmpl.*257 t.s echo run 2 rm t.gcda gcc -O2 -fprofile-generate t.c ./a.out 100 3 ./a.out 100 7 ./a.out 100 13 ./a.out 100 25 ./a.out 100 257 ./a.out 310 257 gcc -O2 -fprofile-use t.c -S grep cmpl.*257 t.s
echo run 1 rm t.gcda ./xgcc -B ./ -O2 -fprofile-generate t.c -c ./xgcc -B ./ -O2 -fprofile-generate t.o ./a.out 50 257 ./a.out 50 1 ./a.out 50 2 ./a.out 50 3 ./a.out 50 4 ./a.out 50 5 ./a.out 50 6 ./a.out 50 7 ./a.out 50 8 ./a.out 50 9 ./a.out 50 10 ./a.out 50 11 ./a.out 50 12 ./a.out 50 13 ./a.out 50 14 ./a.out 50 15 ./a.out 50 16 ./a.out 50 17 ./a.out 50 18 ./a.out 50 19 ./a.out 50 20 ./a.out 50 21 ./a.out 50 22 ./a.out 50 23 ./a.out 50 24 ./a.out 50 25 ./a.out 50 26 ./a.out 50 27 ./a.out 50 28 ./a.out 50 29 ./a.out 50 30 ./a.out 50 31 ./a.out 50 32 ./a.out 1560 257 #./gcov-dump t.gcda -l ./xgcc -B ./ -O2 -fprofile-use t.c -S #-fprofile-reproducibleserial grep cmpl.*257 t.s echo run 2 rm t.gcda ./xgcc -B ./ -O2 -fprofile-generate t.c -c ./xgcc -B ./ -O2 -fprofile-generate t.o ./a.out 50 1 ./a.out 50 2 ./a.out 50 3 ./a.out 50 4 ./a.out 50 5 ./a.out 50 6 ./a.out 50 7 ./a.out 50 8 ./a.out 50 9 ./a.out 50 10 ./a.out 50 11 ./a.out 50 12 ./a.out 50 13 ./a.out 50 14 ./a.out 50 15 ./a.out 50 16 ./a.out 50 17 ./a.out 50 18 ./a.out 50 19 ./a.out 50 20 ./a.out 50 21 ./a.out 50 22 ./a.out 50 23 ./a.out 50 24 ./a.out 50 25 ./a.out 50 26 ./a.out 50 27 ./a.out 50 28 ./a.out 50 29 ./a.out 50 30 ./a.out 50 31 ./a.out 50 32 ./a.out 50 257 ./a.out 1560 257 #./gcov-dump t.gcda -l ./xgcc -B ./ -O2 -fprofile-use t.c -S #-fprofile-reproducibleserial grep cmpl.*257 t.s