> On 1/20/21 5:00 PM, Jan Hubicka wrote:
> > There are two thinks that I would like to discuss first
> >   1) I think the option is stil used for value profiling of divisors
> 
> It's not. Right now the only usage lives in get_nth_most_common_value which
> is an entry point being used for stringops, indirect call and divmod 
> histograms.

I see, you replaced the implementation here as well (not only for
indirect calls).  I am attaching testcase that demonstrates the
reproducibility issue on divmod tracking.

Script cmd compiles t.c and trains it twice with same inputs but
different order.  On GCC10 in tumbleweed I get:

run 1
run 2
        cmpl    $257, %esi

So the first build does not specialize mod function for 257 while second
does.

cmd11 script does the same for trunk:

run 1   
        cmpl    $257, %esi
run 2

This also made me to notice that the following is currently broken:

hubicka@lomikamen-jh:/aux/hubicka/trunk-git/build3/gcc$ ./xgcc -B ./ -O2 
-fprofile-generate t.c
hubicka@lomikamen-jh:/aux/hubicka/trunk-git/build3/gcc$ ./a.out 10 10
hubicka@lomikamen-jh:/aux/hubicka/trunk-git/build3/gcc$ ./xgcc -B ./ -O2 
-fprofile-use t.c -S

This is becuase we now call the dump file a-t.gcda instead of t.gcda
that seems unintentional change that probably can break some user
scripts.

I think this testcase is not so unrealistic especially in the context of
value profiling for divmod or stringops size.
> 
> >   2) it is not clear to me how the new counter establishes
> >   reproducibility for indiect calls that have more than 32 targets during
> >   the train run.
> 
> We cannot ensure, but I would say that it's very unlikely to happen.
> In case of Firefox, there are definitely other reasons why the build is not 
> reproducible.
> I would expect arc counters to differ in between multiple training runs.

Even in the context of indirect call profiling, I do not think the
situation is limited to firefox.   With virtual function calls and large
object hiearchy one can end up with many possible targets quite easily.
The code pattern seems quite sane here.  Basic datastructures in firefox
has refcounting and have virutal call to decrement the count.  Also
there is get_id method associated with almost everything and it is what
we get currently wrong (the get_id benchmarks).  I will try to debug
why.

While with tracking 32 values instead of 4 we made the problem less
frequent (at the cost of adding new issues with malloc recursion),
reproducibility is still not guaranteed.
> 
> If it's really a problem we can come up with other approaches:
> - GCOV run-time control over # of tracked values (32 right now)
> - similarly we can save more values in .gcda files

For divmod profiling you can easilly have very large number of values,
so I do not think we can count on fact that it is always possible to
track them all. 

In general I see the point of reproducible builds, but it is also fragile
feature. If we commit to it (which I think we did), we should support it
carefully and not just patch around most frequent issues.  

I would expect that relatively soon we will see need for reproducible
builds of multthreaded programs (such as when building clang, or if we
add threads to gcc). In this case the 32bit targets are going to be even
less useful.  I think we can only accept the profile if no value got
dropped.

Honza
> 
> I'm sending updated version of the patch.
> Martin

> From ffbd2fe90dd3bfceecfeefb679572ca2f6b5f333 Mon Sep 17 00:00:00 2001
> From: Martin Liska <mli...@suse.cz>
> Date: Tue, 19 Jan 2021 15:43:31 +0100
> Subject: [PATCH] Drop profile reproducibility flag as it is not used.
> 
> gcc/ChangeLog:
> 
>       PR gcov-profile/98739
>       * common.opt: Drop -fprofile-reproducibility flag.
>       * coretypes.h (enum profile_reproducibility): Drop the enum.
>       * doc/invoke.texi: Remove documentation.
>       * value-prof.c (get_nth_most_common_value): Drop usage of
>       flag_profile_reproducible.
> ---
>  gcc/common.opt      | 16 ++--------------
>  gcc/coretypes.h     |  7 -------
>  gcc/doc/invoke.texi | 30 ------------------------------
>  gcc/value-prof.c    | 15 +++------------
>  4 files changed, 5 insertions(+), 63 deletions(-)
> 
> diff --git a/gcc/common.opt b/gcc/common.opt
> index bde1711870d..863e9681aea 100644
> --- a/gcc/common.opt
> +++ b/gcc/common.opt
> @@ -2236,21 +2236,9 @@ fprofile-exclude-files=
>  Common Joined RejectNegative Var(flag_profile_exclude_files)
>  Instrument only functions from files whose name does not match any of the 
> regular expressions (separated by semi-colons).
>  
> -Enum
> -Name(profile_reproducibility) Type(enum profile_reproducibility) 
> UnknownError(unknown profile reproducibility method %qs)
> -
> -EnumValue
> -Enum(profile_reproducibility) String(serial) 
> Value(PROFILE_REPRODUCIBILITY_SERIAL)
> -
> -EnumValue
> -Enum(profile_reproducibility) String(parallel-runs) 
> Value(PROFILE_REPRODUCIBILITY_PARALLEL_RUNS)
> -
> -EnumValue
> -Enum(profile_reproducibility) String(multithreaded) 
> Value(PROFILE_REPRODUCIBILITY_MULTITHREADED)
> -
>  fprofile-reproducible
> -Common Joined RejectNegative Var(flag_profile_reproducible) 
> Enum(profile_reproducibility) Init(PROFILE_REPRODUCIBILITY_SERIAL)
> --fprofile-reproducible=[serial|parallel-runs|multithreaded]  Control level 
> of reproducibility of profile gathered by -fprofile-generate.
> +Common Ignore Joined RejectNegative
> +Does nothing. Preserved for backward compatibility.
>  
>  Enum
>  Name(profile_update) Type(enum profile_update) UnknownError(unknown profile 
> update method %qs)
> diff --git a/gcc/coretypes.h b/gcc/coretypes.h
> index 406572e947d..3a67061a483 100644
> --- a/gcc/coretypes.h
> +++ b/gcc/coretypes.h
> @@ -212,13 +212,6 @@ enum profile_update {
>    PROFILE_UPDATE_PREFER_ATOMIC
>  };
>  
> -/* Type of profile reproducibility methods.  */
> -enum profile_reproducibility {
> -    PROFILE_REPRODUCIBILITY_SERIAL,
> -    PROFILE_REPRODUCIBILITY_PARALLEL_RUNS,
> -    PROFILE_REPRODUCIBILITY_MULTITHREADED
> -};
> -
>  /* Type of -fstack-protector-*.  */
>  enum stack_protector {
>    SPCT_FLAG_DEFAULT = 1,
> diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
> index 5f4a06625eb..c66ba2a62ad 100644
> --- a/gcc/doc/invoke.texi
> +++ b/gcc/doc/invoke.texi
> @@ -582,7 +582,6 @@ Objective-C and Objective-C++ Dialects}.
>  -fprofile-note=@var{path} -fprofile-prefix-path=@var{path} @gol
>  -fprofile-update=@var{method} -fprofile-filter-files=@var{regex} @gol
>  -fprofile-exclude-files=@var{regex} @gol
> --fprofile-reproducible=@r{[}multithreaded@r{|}parallel-runs@r{|}serial@r{]} 
> @gol
>  -fsanitize=@var{style}  -fsanitize-recover  -fsanitize-recover=@var{style} 
> @gol
>  -fasan-shadow-offset=@var{number}  -fsanitize-sections=@var{s1},@var{s2},... 
> @gol
>  -fsanitize-undefined-trap-on-error  -fbounds-check @gol
> @@ -14640,35 +14639,6 @@ any of the regular expressions (separated by 
> semi-colons).
>  For example, @option{-fprofile-exclude-files=/usr/.*} will prevent 
> instrumentation
>  of all files that are located in the @file{/usr/} folder.
>  
> -@item 
> -fprofile-reproducible=@r{[}multithreaded@r{|}parallel-runs@r{|}serial@r{]}
> -@opindex fprofile-reproducible
> -Control level of reproducibility of profile gathered by
> -@code{-fprofile-generate}.  This makes it possible to rebuild program
> -with same outcome which is useful, for example, for distribution
> -packages.
> -
> -With @option{-fprofile-reproducible=serial} the profile gathered by
> -@option{-fprofile-generate} is reproducible provided the trained program
> -behaves the same at each invocation of the train run, it is not
> -multi-threaded and profile data streaming is always done in the same
> -order.  Note that profile streaming happens at the end of program run but
> -also before @code{fork} function is invoked.
> -
> -Note that it is quite common that execution counts of some part of
> -programs depends, for example, on length of temporary file names or
> -memory space randomization (that may affect hash-table collision rate).
> -Such non-reproducible part of programs may be annotated by
> -@code{no_instrument_function} function attribute. @command{gcov-dump} with
> -@option{-l} can be used to dump gathered data and verify that they are
> -indeed reproducible.
> -
> -With @option{-fprofile-reproducible=parallel-runs} collected profile
> -stays reproducible regardless the order of streaming of the data into
> -gcda files.  This setting makes it possible to run multiple instances of
> -instrumented program in parallel (such as with @code{make -j}). This
> -reduces quality of gathered data, in particular of indirect call
> -profiling.
> -
>  @item -fsanitize=address
>  @opindex fsanitize=address
>  Enable AddressSanitizer, a fast memory error detector.
> diff --git a/gcc/value-prof.c b/gcc/value-prof.c
> index 4c916f4994f..1b174006941 100644
> --- a/gcc/value-prof.c
> +++ b/gcc/value-prof.c
> @@ -745,13 +745,10 @@ gimple_divmod_fixed_value (gassign *stmt, tree value, 
> profile_probability prob,
>  
>     Counters have the following meaning.
>  
> -   abs (counters[0]) is the number of executions
> +   counters[0] is the number of executions
>     for i in 0 ... TOPN-1
>       counters[2 * i + 1] is target
> -     abs (counters[2 * i + 2]) is corresponding hitrate counter.
> -
> -   Value of counters[0] negative when counter became
> -   full during merging and some values are lost.  */
> +     counters[2 * i + 2] is corresponding hitrate counter.  */
>  
>  bool
>  get_nth_most_common_value (gimple *stmt, const char *counter_type,
> @@ -765,17 +762,11 @@ get_nth_most_common_value (gimple *stmt, const char 
> *counter_type,
>    *count = 0;
>    *value = 0;
>  
> -  gcov_type read_all = abs_hwi (hist->hvalue.counters[0]);
> +  gcov_type read_all = hist->hvalue.counters[0];
>  
>    gcov_type v = hist->hvalue.counters[2 * n + 2];
>    gcov_type c = hist->hvalue.counters[2 * n + 3];
>  
> -  if (hist->hvalue.counters[0] < 0
> -      && (flag_profile_reproducible == PROFILE_REPRODUCIBILITY_PARALLEL_RUNS
> -       || (flag_profile_reproducible
> -           == PROFILE_REPRODUCIBILITY_MULTITHREADED)))
> -    return false;
> -
>    /* Indirect calls can't be verified.  */
>    if (stmt
>        && check_counter (stmt, counter_type, &c, &read_all,
> -- 
> 2.30.0
> 

#include <stdlib.h>
__attribute__ ((noipa))
int mod(int a,int b)
{
  return a%b;
}
int main(int argc, char **argv)
{
  int n = atoi (argv[1]);
  int siz = atoi (argv[2]);
  int sum = 1;
  for (int i=0; i < n; i++)
    sum += mod (sum, siz);
  return sum;
}
echo run 1
rm t.gcda
gcc -O2 -fprofile-generate t.c
./a.out 100 257 
./a.out 100 3
./a.out 100 7
./a.out 100 13
./a.out 100 25
./a.out 310 257 
gcc -O2 -fprofile-use t.c -S
grep cmpl.*257 t.s

echo run 2
rm t.gcda
gcc -O2 -fprofile-generate t.c
./a.out 100 3
./a.out 100 7
./a.out 100 13
./a.out 100 25
./a.out 100 257 
./a.out 310 257 
gcc -O2 -fprofile-use t.c -S
grep cmpl.*257 t.s
echo run 1
rm t.gcda
./xgcc -B ./ -O2 -fprofile-generate t.c -c
./xgcc -B ./ -O2 -fprofile-generate t.o 
./a.out 50 257 
./a.out 50 1
./a.out 50 2
./a.out 50 3
./a.out 50 4
./a.out 50 5
./a.out 50 6
./a.out 50 7
./a.out 50 8
./a.out 50 9
./a.out 50 10
./a.out 50 11
./a.out 50 12
./a.out 50 13
./a.out 50 14
./a.out 50 15
./a.out 50 16
./a.out 50 17
./a.out 50 18
./a.out 50 19
./a.out 50 20
./a.out 50 21
./a.out 50 22
./a.out 50 23
./a.out 50 24
./a.out 50 25
./a.out 50 26
./a.out 50 27
./a.out 50 28
./a.out 50 29
./a.out 50 30
./a.out 50 31
./a.out 50 32
./a.out 1560 257 
#./gcov-dump t.gcda -l
./xgcc -B ./ -O2 -fprofile-use t.c -S #-fprofile-reproducibleserial
grep cmpl.*257 t.s

echo run 2
rm t.gcda
./xgcc -B ./ -O2 -fprofile-generate t.c -c
./xgcc -B ./ -O2 -fprofile-generate t.o 
./a.out 50 1
./a.out 50 2
./a.out 50 3
./a.out 50 4
./a.out 50 5
./a.out 50 6
./a.out 50 7
./a.out 50 8
./a.out 50 9
./a.out 50 10
./a.out 50 11
./a.out 50 12
./a.out 50 13
./a.out 50 14
./a.out 50 15
./a.out 50 16
./a.out 50 17
./a.out 50 18
./a.out 50 19
./a.out 50 20
./a.out 50 21
./a.out 50 22
./a.out 50 23
./a.out 50 24
./a.out 50 25
./a.out 50 26
./a.out 50 27
./a.out 50 28
./a.out 50 29
./a.out 50 30
./a.out 50 31
./a.out 50 32
./a.out 50 257 
./a.out 1560 257 
#./gcov-dump t.gcda -l
./xgcc -B ./ -O2 -fprofile-use t.c -S #-fprofile-reproducibleserial
grep cmpl.*257 t.s

Reply via email to