Re: Check that passes do not forget to define profile

2023-10-04 Thread Jan Hubicka
> Hi Honza,
> 
> My current patch set for AArch64 VLA omp codegen started failing on
> gcc.dg/gomp/pr87898.c after this. I traced it back to
> 'move_sese_region_to_fn' in tree-cfg.cc not setting count for the bb
> created.
> 
> I was able to 'fix' it locally by setting the count of the new bb to the
> accumulation of e->count () of all the entry_edges (if initialized). I'm
> however not even close to certain that's the right approach, attached patch
> for illustration.
> 
> Kind regards,
> Andre
> diff --git a/gcc/tree-cfg.cc b/gcc/tree-cfg.cc

> index ffab7518b1568b58e610e26feb9e3cab18ddb3c2..32fc47ae683164bf8fac477fbe6e2c998382e754 100644
> --- a/gcc/tree-cfg.cc
> +++ b/gcc/tree-cfg.cc
> @@ -8160,11 +8160,15 @@ move_sese_region_to_fn (struct function *dest_cfun, basic_block entry_bb,
>bb = create_empty_bb (entry_pred[0]);
>if (current_loops)
>  add_bb_to_loop (bb, loop);
> +  profile_count count = profile_count::zero ();
>for (i = 0; i < num_entry_edges; i++)
>  {
>e = make_edge (entry_pred[i], bb, entry_flag[i]);
>e->probability = entry_prob[i];
> +  if (e->count ().initialized_p ())
> + count += e->count ();
>  }
> +  bb->count = count;

This looks generally right - if you create a BB you need to set its
count, and unless it has a self-loop that is the sum of the counts of
incoming edges.

However the initialized_p check should be unnecessary: if one of the entry
edges to the BB is uninitialized, the + operation will make the bb count
uninitialized too, which is OK.

Honza
>  
>for (i = 0; i < num_exit_edges; i++)
>  {
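For readers unfamiliar with GCC's profile_count semantics, the propagation rule Honza describes can be modeled with a small standalone sketch (this is not the real class from profile-count.h; the names `toy_count` and `sum_entry_counts` are invented for illustration):

```cpp
#include <cassert>
#include <cstdint>
#include <optional>

// Toy stand-in for GCC's profile_count: an uninitialized operand
// poisons any sum it participates in.
struct toy_count
{
  std::optional<uint64_t> v;   // empty => uninitialized
  static toy_count zero () { return { 0 }; }
  static toy_count uninitialized () { return { std::nullopt }; }
  bool initialized_p () const { return v.has_value (); }
  toy_count operator+ (const toy_count &o) const
  {
    if (!v || !o.v)
      return uninitialized ();   // uninitialized propagates through +
    return { *v + *o.v };
  }
  toy_count &operator+= (const toy_count &o) { return *this = *this + o; }
};

// Sum of incoming edge counts, as in the patch but without the
// initialized_p guard -- redundant given the poisoning behavior above.
inline toy_count sum_entry_counts (const toy_count *edges, int n)
{
  toy_count count = toy_count::zero ();
  for (int i = 0; i < n; i++)
    count += edges[i];
  return count;
}
```

With this model, dropping the `initialized_p` check (as Honza suggests) still yields an uninitialized bb count whenever any entry edge is uninitialized.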



Re: [PATCH v2] ipa-utils: avoid uninitialized probabilities on ICF [PR111559]

2023-10-05 Thread Jan Hubicka
> From: Sergei Trofimovich 
> 
> r14-3459-g0c78240fd7d519 "Check that passes do not forget to define profile"
> exposed check failures in cases when gcc produces uninitialized profile
> probabilities. In case of PR/111559 uninitialized profile is generated
> by edges executed 0 times reported by IPA profile:
> 
> $ gcc -O2 -fprofile-generate pr111559.c -o b -fopt-info
> $ ./b
> $ gcc -O2 -fprofile-use -fprofile-correction pr111559.c -o b -fopt-info
> 
> pr111559.c: In function 'rule1':
> pr111559.c:6:13: error: probability of edge 3->4 not initialized
> 6 | static void rule1(void) { if (p) edge(); }
>   | ^
> during GIMPLE pass: fixup_cfg
> pr111559.c:6:13: internal compiler error: verify_flow_info failed
> 
> The change conservatively ignores updates with uninitialized values and
> uses initially assigned probabilities (`always` probability in case of
> the example).
> 
>   PR ipa/111283
>   PR gcov-profile/111559
> 
> gcc/
>   * ipa-utils.cc (ipa_merge_profiles): Avoid producing
>   uninitialized probabilities when merging counters with zero
>   denominators.
> 
> gcc/testsuite/
>   * gcc.dg/tree-prof/pr111559.c: New test.
> ---
>  gcc/ipa-utils.cc  |  6 +-
>  gcc/testsuite/gcc.dg/tree-prof/pr111559.c | 16 
>  2 files changed, 21 insertions(+), 1 deletion(-)
>  create mode 100644 gcc/testsuite/gcc.dg/tree-prof/pr111559.c
> 
> diff --git a/gcc/ipa-utils.cc b/gcc/ipa-utils.cc
> index 956c6294fd7..7c53ae9dd45 100644
> --- a/gcc/ipa-utils.cc
> +++ b/gcc/ipa-utils.cc
> @@ -651,13 +651,17 @@ ipa_merge_profiles (struct cgraph_node *dst,
>   {
> edge srce = EDGE_SUCC (srcbb, i);
> edge dste = EDGE_SUCC (dstbb, i);
> -   dste->probability = 
> +   profile_probability merged =
>   dste->probability * dstbb->count.ipa ().probability_in
>(dstbb->count.ipa ()
> + srccount.ipa ())
>   + srce->probability * srcbb->count.ipa ().probability_in
>(dstbb->count.ipa ()
> + srccount.ipa ());
> +   /* We produce uninitialized probabilities when
> +  denominator is zero: https://gcc.gnu.org/PR111559.  */
> +   if (merged.initialized_p ())
> + dste->probability = merged;

Thanks for the patch.
We usually avoid the uninitialized value here by simply checking that the
parameter of probability_in satisfies nonzero_p.  So I think it would be
more consistent to do it here too:

  profile_count sum = dstbb->count.ipa () + srccount.ipa ();
  if (sum.nonzero_p ())
  {
     dste->probability = ...;
  }

OK with this change.
Honza
>   }
> dstbb->count = dstbb->count.ipa () + srccount.ipa ();
>   }
> diff --git a/gcc/testsuite/gcc.dg/tree-prof/pr111559.c b/gcc/testsuite/gcc.dg/tree-prof/pr111559.c
> new file mode 100644
> index 000..43202c6c888
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/tree-prof/pr111559.c
> @@ -0,0 +1,16 @@
> +/* { dg-options "-O2" } */
> +
> +__attribute__((noipa)) static void edge(void) {}
> +
> +int p = 0;
> +
> +__attribute__((noinline))
> +static void rule1(void) { if (p) edge(); }
> +
> +__attribute__((noinline))
> +static void rule1_same(void) { if (p) edge(); }
> +
> +__attribute__((noipa)) int main(void) {
> +rule1();
> +rule1_same();
> +}
> -- 
> 2.42.0
> 
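The guarded merge being suggested can be sketched with plain doubles (an illustrative model only; GCC's profile_probability and profile_count are fixed-point types, and `merge_edge_probability` is an invented name):

```cpp
#include <cassert>
#include <cmath>

// Model of the merge: the new edge probability is the average of the
// two edges' probabilities weighted by their block counts; when the
// combined count is zero (the PR111559 case) we keep the previously
// assigned probability instead of producing an uninitialized 0/0.
inline double merge_edge_probability (double dst_prob, double dst_count,
                                      double src_prob, double src_count)
{
  double sum = dst_count + src_count;
  if (!(sum > 0))              // models sum.nonzero_p () being false
    return dst_prob;           // keep the initially assigned probability
  return dst_prob * (dst_count / sum) + src_prob * (src_count / sum);
}
```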


Re: [PATCH 01/22] Add condition coverage profiling

2023-10-05 Thread Jan Hubicka
> 
> Like Wahlen et al this implementation records coverage in fixed-size
> bitsets which gcov knows how to interpret. This is very fast, but
> introduces a limit on the number of terms in a single boolean
> expression, the number of bits in a gcov_unsigned_type (which is
> typedef'd to uint64_t), so for most practical purposes this would be
> acceptable. This limitation is in the implementation and not the
> algorithm, so support for more conditions can be added by also
> introducing arbitrary-sized bitsets.

This should not be too hard to do - if the conditional is more complex you
simply introduce more than one counter for it, right?
How many times does this trigger on the GCC sources?
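As a rough model of the fixed-size bitset recording described above (invented names; the masking of short-circuited terms and the actual CFG instrumentation are omitted), each condition keeps two 64-bit accumulators, which is where the 64-term limit comes from:

```cpp
#include <cassert>
#include <cstdint>

// Bit i of `truev' (`falsev') is set once term i of the condition has
// been observed evaluating to true (false).  gcov's "conditions covered
// N/M" ratio is then popcount over 2 * n_terms possible outcomes.
struct toy_condition_info
{
  uint64_t truev = 0, falsev = 0;
  unsigned n_terms = 0;

  void record (unsigned term, bool outcome)
  {
    assert (term < 64);        // the gcov_unsigned_type limit
    (outcome ? truev : falsev) |= uint64_t (1) << term;
  }

  int popcount () const
  { return __builtin_popcountll (truev) + __builtin_popcountll (falsev); }

  int total_outcomes () const { return 2 * n_terms; }
};
```

For `if ((a && (b || c)) && d)` with n_terms = 4, a run that only ever sees a few outcomes produces a partial ratio like the "conditions covered 3/8" shown below.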
> 
> For space overhead, the instrumentation needs two accumulators
> (gcov_unsigned_type) per condition in the program which will be written
> to the gcov file. In addition, every function gets a pair of local
> accumulators, but these accumulators are reused between conditions in the
> same function.
> 
> For time overhead, there is a zeroing of the local accumulators for
> every condition, and one or two bitwise operations on every edge taken in
> the expression.
> 
> In action it looks pretty similar to the branch coverage. The -g short
> opt carries no significance, but was chosen because it was an available
> option with the upper-case variant free too.
> 
> gcov --conditions:
> 
> 3:   17:void fn (int a, int b, int c, int d) {
> 3:   18:if ((a && (b || c)) && d)
> conditions covered 3/8
> condition  0 not covered (true)
> condition  0 not covered (false)
> condition  1 not covered (true)
> condition  2 not covered (true)
> condition  3 not covered (true)
It seems understandable, but for bigger conditionals I guess it will be a
bit hard to map the condition numbers back to the actual source
code.  We could probably also show the conditions as ranges in the
conditional?  I am adding David Malcolm to CC, he may have some ideas.

I wonder how much this information is confused by early optimizations
happening before coverage profiling?
> 
> Some expressions, mostly those without else-blocks, are effectively
> "rewritten" in the CFG construction making the algorithm unable to
> distinguish them:
> 
> and.c:
> 
> if (a && b && c)
> x = 1;
> 
> ifs.c:
> 
> if (a)
> if (b)
> if (c)
> x = 1;
> 
> gcc will build the same graph for both these programs, and gcov will
> report both as 3-term expressions. It is vital that it is not
> interpreted the other way around (which is consistent with the shape of
> the graph) because otherwise the masking would be wrong for the and.c
> program which is a more severe error. While surprising, users would
> probably expect some minor rewriting of semantically-identical
> expressions.
> 
> and.c.gcov:
> #:2:if (a && b && c)
> conditions covered 6/6
> #:3:x = 1;
> 
> ifs.c.gcov:
> #:2:if (a)
> #:3:if (b)
> #:4:if (c)
> #:5:x = 1;
> conditions covered 6/6

Maybe one can use location information to distinguish those cases?
Don't we store discriminator info about individual statements that is also
used for auto-FDO?
> 
> gcc/ChangeLog:
> 
>   * builtins.cc (expand_builtin_fork_or_exec): Check
>   profile_condition_flag.
>   * collect2.cc (main): Add -fno-profile-conditions to OBSTACK.
>   * common.opt: Add new options -fprofile-conditions and
>   * doc/gcov.texi: Add --conditions documentation.
>   * doc/invoke.texi: Add -fprofile-conditions documentation.
>   * gcc.cc: Link gcov on -fprofile-conditions.
>   * gcov-counter.def (GCOV_COUNTER_CONDS): New.
>   * gcov-dump.cc (tag_conditions): New.
>   * gcov-io.h (GCOV_TAG_CONDS): New.
>   (GCOV_TAG_CONDS_LENGTH): Likewise.
>   (GCOV_TAG_CONDS_NUM): Likewise.
>   * gcov.cc (class condition_info): New.
>   (condition_info::condition_info): New.
>   (condition_info::popcount): New.
>   (struct coverage_info): New.
>   (add_condition_counts): New.
>   (output_conditions): New.
>   (print_usage): Add -g, --conditions.
>   (process_args): Likewise.
>   (output_intermediate_json_line): Output conditions.
>   (read_graph_file): Read conditions counters.
>   (read_count_file): Read conditions counters.
>   (file_summary): Print conditions.
>   (accumulate_line_info): Accumulate conditions.
>   (output_line_details): Print conditions.
>   * ipa-inline.cc (can_early_inline_edge_p): Check
>   profile_condition_flag.
>   * ipa-split.cc (pass_split_functions::gate): Likewise.
>   * passes.cc (finish_optimization_passes): Likewise.
>   * profile.cc (find_conditions): New declaration.
>   (cov_length): Likewise.
>   (cov_blocks): Likewise.
>   (cov_masks): Likewise.
>   (cov_free): Likewise.
>   (instrument_decisions): New.
>   (read_thunk

Re: [PATCH 06/22] Use popcount_hwi rather than builtin

2023-10-05 Thread Jan Hubicka
Hi,
can you please also squash those changes which fix patch #1
so it is easier to review?
Honza
> From: Jørgen Kvalsvik 
> 
> ---
>  gcc/gcov.cc | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
> 
> diff --git a/gcc/gcov.cc b/gcc/gcov.cc
> index 274f2fc5d9f..35be97cf5ac 100644
> --- a/gcc/gcov.cc
> +++ b/gcc/gcov.cc
> @@ -46,6 +46,7 @@ along with Gcov; see the file COPYING3.  If not see
>  #include "color-macros.h"
>  #include "pretty-print.h"
>  #include "json.h"
> +#include "hwint.h"
>  
>  #include 
>  #include 
> @@ -159,7 +160,7 @@ condition_info::condition_info (): truev (0), falsev (0), n_terms (0)
>  
>  int condition_info::popcount () const
>  {
> -return __builtin_popcountll (truev) + __builtin_popcountll (falsev);
> +return popcount_hwi (truev) + popcount_hwi (falsev);
>  }
>  
>  /* Describes a basic block. Contains lists of arcs to successor and
> -- 
> 2.30.2
> 
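For context, popcount_hwi behaves like the compiler builtin it replaces here; a portable fallback of the kind such wrappers use can be sketched with Kernighan's bit-clearing loop (illustrative only, not GCC's actual hwint.h implementation):

```cpp
#include <cassert>
#include <cstdint>

// Portable popcount fallback: clear the lowest set bit until none
// remain; the loop runs once per set bit.
inline int popcount_fallback (uint64_t x)
{
  int n = 0;
  while (x)
    {
      x &= x - 1;   // clears the lowest set bit
      n++;
    }
  return n;
}
```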


Re: [PATCH v2] ipa-utils: avoid uninitialized probabilities on ICF [PR111559]

2023-10-05 Thread Jan Hubicka
> diff --git a/gcc/ipa-utils.cc b/gcc/ipa-utils.cc
> index 956c6294fd7..1355ccac6f0 100644
> --- a/gcc/ipa-utils.cc
> +++ b/gcc/ipa-utils.cc
> @@ -651,13 +651,16 @@ ipa_merge_profiles (struct cgraph_node *dst,
>   {
> edge srce = EDGE_SUCC (srcbb, i);
> edge dste = EDGE_SUCC (dstbb, i);
> -   dste->probability = 
> - dste->probability * dstbb->count.ipa ().probability_in
> -  (dstbb->count.ipa ()
> -   + srccount.ipa ())
> - + srce->probability * srcbb->count.ipa ().probability_in
> -  (dstbb->count.ipa ()
> -   + srccount.ipa ());
> +   profile_count sum =
> + dstbb->count.ipa () + srccount.ipa ();
> +   if (sum.nonzero_p ())
> + dste->probability =
> +   dste->probability * dstbb->count.ipa ().probability_in
> +(dstbb->count.ipa ()
> + + srccount.ipa ())
> +   + srce->probability * srcbb->count.ipa ().probability_in
> +(dstbb->count.ipa ()
> + + srccount.ipa ());

looks good.  You can use probability_in (sum)
in both places.

Honza


Re: [PATCH 1/3] ipa-cp: Templatize filtering of m_agg_values

2023-10-05 Thread Jan Hubicka
> PR 57 points to another place where IPA-CP collected aggregate
> compile-time constants need to be filtered, in addition to the one
> place that already does this in ipa-sra.  In order to re-use code,
> this patch turns the common bit into a template.
> 
> The functionality is still covered by testcase gcc.dg/ipa/pr108959.c.
> 
> gcc/ChangeLog:
> 
> 2023-09-13  Martin Jambor  
> 
>   PR ipa/57
>   * ipa-prop.h (ipcp_transformation): New member function template
>   remove_argaggs_if.
>   * ipa-sra.cc (zap_useless_ipcp_results): Use remove_argaggs_if to
>   filter aggregate constants.
OK,
Honza


Re: [PATCH 2/3] ipa: Prune any IPA-CP aggregate constants known by modref to be killed (111157)

2023-10-05 Thread Jan Hubicka
> gcc/ChangeLog:
> 
> 2023-09-19  Martin Jambor  
> 
>   PR ipa/57
>   * ipa-prop.h (struct ipa_argagg_value): New flag killed.
>   * ipa-modref.cc (ipcp_argagg_and_kill_overlap_p): New function.
>   (update_signature): Mark any IPA-CP aggregate constants at
>   positions known to be killed as killed.  Move check that there is
>   clone_info after this pruning.
>   * ipa-cp.cc (ipa_argagg_value_list::dump): Dump the killed flag.
>   (ipa_argagg_value_list::push_adjusted_values): Clear the new flag.
>   (push_agg_values_from_plats): Likewise.
>   (ipa_push_agg_values_from_jfunc): Likewise.
>   (estimate_local_effects): Likewise.
>   (push_agg_values_for_index_from_edge): Likewise.
>   * ipa-prop.cc (write_ipcp_transformation_info): Stream the killed
>   flag.
>   (read_ipcp_transformation_info): Likewise.
>   (ipcp_get_aggregate_const): Update comment, assert that encountered
>   record does not have killed flag set.
>   (ipcp_transform_function): Prune all aggregate constants with killed
>   set.
> 
> gcc/testsuite/ChangeLog:
> 
> 2023-09-18  Martin Jambor  
> 
>   PR ipa/57
>   * gcc.dg/lto/pr57_0.c: New test.
>   * gcc.dg/lto/pr57_1.c: Second file of the same new test.

> diff --git a/gcc/ipa-modref.cc b/gcc/ipa-modref.cc
> index c04f9f44c06..a8fcf159259 100644
> --- a/gcc/ipa-modref.cc
> +++ b/gcc/ipa-modref.cc
> @@ -4065,21 +4065,71 @@ remap_kills (vec  &kills, const vec  &map)
>i++;
>  }
>  
> +/* Return true if the V can overlap with KILL.  */
> +
> +static bool
> +ipcp_argagg_and_kill_overlap_p (const ipa_argagg_value &v,
> + const modref_access_node &kill)
> +{
> +  if (kill.parm_index == v.index)
> +{
> +  gcc_assert (kill.parm_offset_known);
> +  gcc_assert (known_eq (kill.max_size, kill.size));
> +  poly_int64 repl_size;
> +  bool ok = poly_int_tree_p (TYPE_SIZE (TREE_TYPE (v.value)),
> +  &repl_size);
> +  gcc_assert (ok);
> +  poly_int64 repl_offset (v.unit_offset);
> +  repl_offset <<= LOG2_BITS_PER_UNIT;
> +  poly_int64 combined_offset
> + = (kill.parm_offset << LOG2_BITS_PER_UNIT) + kill.offset;
parm_offset may be negative, which I think will confuse ranges_maybe_overlap_p.
I think you need to test for this and, if it is negative, adjust
repl_offset instead of kill.offset.
> +  if (ranges_maybe_overlap_p (repl_offset, repl_size,
> +   combined_offset, kill.size))
> + return true;
> +}
> +  return false;
> +}
> +
>  /* If signature changed, update the summary.  */
>  
>  static void
>  update_signature (struct cgraph_node *node)
>  {
> -  clone_info *info = clone_info::get (node);
> -  if (!info || !info->param_adjustments)
> -return;
> -
>modref_summary *r = optimization_summaries
> ? optimization_summaries->get (node) : NULL;
>modref_summary_lto *r_lto = summaries_lto
> ? summaries_lto->get (node) : NULL;
>if (!r && !r_lto)
>  return;
> +
> +  ipcp_transformation *ipcp_ts = ipcp_get_transformation_summary (node);
Please add comment on why this is necessary.
> +  if (ipcp_ts)
> +{
> +for (auto &v : ipcp_ts->m_agg_values)
> +  {
> + if (!v.by_ref)
> +   continue;
> + if (r)
> +   for (const modref_access_node &kill : r->kills)
> + if (ipcp_argagg_and_kill_overlap_p (v, kill))
> +   {
> + v.killed = true;
> + break;
> +   }
> + if (!v.killed && r_lto)
> +   for (const modref_access_node &kill : r_lto->kills)
> + if (ipcp_argagg_and_kill_overlap_p (v, kill))
> +   {
> + v.killed = 1;
 = true?
> + break;
> +   }
> +  }
> +}
> +
> +  clone_info *info = clone_info::get (node);
> +  if (!info || !info->param_adjustments)
> +return;
> +
OK.
Honza
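Honza's point about a negative parm_offset can be illustrated with plain integers (a simplification: GCC uses poly_int64, and the helper names below are invented): shifting both ranges up by the same amount keeps the overlap test correct while avoiding negative coordinates.

```cpp
#include <cassert>
#include <cstdint>

// Half-open interval overlap on bit offsets; simplified stand-in for
// ranges_maybe_overlap_p (the real code uses poly_int64 and handles
// unknown sizes).
inline bool bit_ranges_overlap_p (int64_t pos1, int64_t size1,
                                  int64_t pos2, int64_t size2)
{
  return pos1 < pos2 + size2 && pos2 < pos1 + size1;
}

// Models the suggested adjustment: instead of letting the kill's
// combined offset go negative, move the negative part onto the
// replacement's offset.  Shifting both ranges by the same amount
// preserves whether they overlap.
inline bool kill_overlaps_replacement_p (int64_t repl_offset, int64_t repl_size,
                                         int64_t parm_offset_bits,
                                         int64_t kill_offset, int64_t kill_size)
{
  int64_t combined = parm_offset_bits + kill_offset;
  if (combined < 0)
    {
      repl_offset -= combined;   // shift both ranges up by -combined
      combined = 0;
    }
  return bit_ranges_overlap_p (repl_offset, repl_size, combined, kill_size);
}
```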


Re: [PATCH v4] ipa-utils: avoid uninitialized probabilities on ICF [PR111559]

2023-10-05 Thread Jan Hubicka
> On Thu, Oct 05, 2023 at 03:04:55PM +0200, Jan Hubicka wrote:
> > > diff --git a/gcc/ipa-utils.cc b/gcc/ipa-utils.cc
> > > index 956c6294fd7..1355ccac6f0 100644
> > > --- a/gcc/ipa-utils.cc
> > > +++ b/gcc/ipa-utils.cc
> > > @@ -651,13 +651,16 @@ ipa_merge_profiles (struct cgraph_node *dst,
> > >   {
> > > edge srce = EDGE_SUCC (srcbb, i);
> > > edge dste = EDGE_SUCC (dstbb, i);
> > > -   dste->probability = 
> > > - dste->probability * dstbb->count.ipa ().probability_in
> > > -  (dstbb->count.ipa ()
> > > -   + srccount.ipa ())
> > > - + srce->probability * srcbb->count.ipa ().probability_in
> > > -  (dstbb->count.ipa ()
> > > -   + srccount.ipa ());
> > > +   profile_count sum =
> > > + dstbb->count.ipa () + srccount.ipa ();
> > > +   if (sum.nonzero_p ())
> > > + dste->probability =
> > > +   dste->probability * dstbb->count.ipa ().probability_in
> > > +(dstbb->count.ipa ()
> > > + + srccount.ipa ())
> > > +   + srce->probability * srcbb->count.ipa ().probability_in
> > > +(dstbb->count.ipa ()
> > > + + srccount.ipa ());
> > 
> > looks good.  You can use probability_in (sum) 
> > in both of the places.
> 
> Oh, great point! Completely forgot about it. Attached v4.
> 
> If it still looks reasonable I'll check again if `python` and
> `profiledbootstrap` still survives it and will push.
Looks good, thanks!
Honza


Re: [PATCH 01/22] Add condition coverage profiling

2023-10-05 Thread Jan Hubicka
> On 05/10/2023 22:39, Jørgen Kvalsvik wrote:
> > On 05/10/2023 21:59, Jan Hubicka wrote:
> > > > 
> > > > Like Wahlen et al this implementation records coverage in fixed-size
> > > > bitsets which gcov knows how to interpret. This is very fast, but
> > > > introduces a limit on the number of terms in a single boolean
> > > > expression, the number of bits in a gcov_unsigned_type (which is
> > > > typedef'd to uint64_t), so for most practical purposes this would be
> > > > acceptable. This limitation is in the implementation and not the
> > > > algorithm, so support for more conditions can be added by also
> > > > introducing arbitrary-sized bitsets.
> > > 
> > > This should not be too hard to do - if the conditional is more complex you
> > > simply introduce more than one counter for it, right?
> > > How many times does this trigger on the GCC sources?
> > 
> > It shouldn't be, no. But when dynamic bitsets are on the table it would
> > be much better to length-encode in smaller multiples than the 64-bit
> > counters. Most expressions are small (<4 terms), so the savings would be
> > substantial. I opted for the simpler fixed-size to start with because it
> > is much simpler and would not introduce any branching or decisions in
> > the instrumentation.
> 
> Oh, and I forgot - I have never seen a real world case that have been more
> than 64 conditions, but I suppose it may happen with generated code.

reload.cc has some long hand-written conditionals in it.  The first one
I counted had 38 conditions. Some of them are macros that may expand to
sub-conditions :)

But I agree that such code should not be common and probably the
conditional should be factored to multiple predicates.

Honza


Re: [PATCH] ipa: Remove ipa_bits

2023-10-05 Thread Jan Hubicka
> Hi!
> 
> The following patch removes ipa_bits struct pointer/vector from ipa
> jump functions and ipa cp transformations.
> 
> The reason is because the struct uses widest_int to represent
> mask/value pair, which in the RFC patches to allow larger precisions
> for wide_int/widest_int is GC unfriendly because those types become
> non-trivially default constructible/copyable/destructible.
> One option would be to use trailing_wide_int for that instead, but
> as pointed out by Aldy, irange_storage which we already use under
> the hood for ipa_vr when type of parameter is integral or pointer
> already stores the mask/value pair because VRP now does the bit cp
> as well.
> So, this patch just uses m_vr to store both the value range and
> the bitmask.  There is still separate propagation of the
> ipcp_bits_lattice from propagation of the ipcp_vr_lattice, but
> when storing we merge the two into the same container.
> 
> I've bootstrapped/regtested a slightly older version of this
> patch on x86_64-linux and i686-linux and that version regressed
> +FAIL: gcc.dg/ipa/propalign-3.c scan-ipa-dump-not cp "align:"
> +FAIL: gcc.dg/ipa/propalign-3.c scan-tree-dump optimized "fail_the_test"
> +FAIL: gcc.dg/ipa/propbits-1.c scan-ipa-dump cp "Adjusting mask for param 0 to 0x7"
> +FAIL: gcc.dg/ipa/propbits-2.c scan-ipa-dump cp "Adjusting mask for param 0 to 0xf"
> The last 2 were solely about the earlier patch not actually copying
> the if (dump_file) dumping of message that we set some mask for some
> parameter (since then added in the @@ -5985,6 +5741,77 @@ hunk).
> The first testcase is a test for -fno-ipa-bit-cp disabling bit cp
> for alignments.  For integral types I'm afraid it is a lost case
> when -fno-ipa-bit-cp -fipa-vrp is on when value ranges track bit cp
> as well, but for pointer alignments I've added
>   && opt_for_fn (cs->caller->decl, flag_ipa_bit_cp)
> and
>   && opt_for_fn (node->decl, flag_ipa_bit_cp)
> guards such that even just -fno-ipa-bit-cp disables it (alternatively
> we could just add -fno-ipa-vrp to propalign-3.c dg-options).
> 
> Ok for trunk if this passes another bootstrap/regtest?
> Or defer until it is really needed (when the wide_int/widest_int
> changes are about to be committed)?

It does look like a nice cleanup to me.
I wonder if you did some comparison of the bit information propagated by the
new code and the old code?  Theoretically they should be equivalent?

Honza
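For reference, the mask/value pair discussed here follows the usual known-bits convention (a mask bit of 1 means that bit is unknown); a minimal sketch of the meet used when merging two sources of bit information (invented helper names, not GCC's API):

```cpp
#include <cassert>
#include <cstdint>

// Known-bits pair: mask bit 1 => unknown; value holds the known bits
// (kept zero in unknown positions).
struct bits_pair
{
  uint64_t value, mask;
};

// Meet of two sources: a bit stays known only if both sources know it
// and agree on it.
inline bits_pair bits_meet (bits_pair a, bits_pair b)
{
  uint64_t mask = a.mask | b.mask | (a.value ^ b.value);
  return { a.value & ~mask, mask };
}
```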


Re: [PATCH] tree-optimization/111773 - avoid CD-DCE of noreturn special calls

2023-10-12 Thread Jan Hubicka
> The support to elide calls to allocation functions in DCE runs into
> the issue that when implementations are discovered noreturn we end
> up DCEing the calls anyway, leaving blocks without termination and
> without outgoing edges which is both invalid IL and wrong-code when
> as in the example the noreturn call would throw.  The following
> avoids taking advantage of both noreturn and the ability to elide
> allocation at the same time.
> 
> For the testcase it's valid to throw or return 10 by eliding the
> allocation.  But we have to do either where currently we'd run
> off the function.
> 
> Bootstrapped and tested on x86_64-unknown-linux-gnu.
> 
> Honza, any objections here?

Looks good to me.  Optimizing out a noreturn new seems like an odd
optimization anyway.

Honza
> 
> Thanks,
> Richard.
> 
>   PR tree-optimization/111773
>   * tree-ssa-dce.cc (mark_stmt_if_obviously_necessary): Do
>   not elide noreturn calls that are reflected to the IL.
> 
>   * g++.dg/torture/pr111773.C: New testcase.
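The situation can be reproduced in miniature with a class whose operator new always throws (a hypothetical type, just to model the noreturn allocation): both keeping the call (throw) and eliding it (return 10) are valid outcomes, but DCE must not produce a path that does neither.

```cpp
#include <cassert>
#include <cstddef>
#include <new>

// A type whose allocation never returns normally: its operator new is
// effectively noreturn because it always throws.
struct always_throws
{
  static void *operator new (std::size_t) { throw std::bad_alloc (); }
  static void operator delete (void *) noexcept {}
};

// Both outcomes are valid here: throwing (allocation kept) or
// returning 10 (allocation elided).  Running off the end of the
// function with neither is the wrong-code bug being avoided.
inline int allocate_or_ten ()
{
  always_throws *p = new always_throws ();
  delete p;
  return 10;
}

inline bool observed_valid_outcome ()
{
  try
    {
      return allocate_or_ten () == 10;   // elided-allocation outcome
    }
  catch (const std::bad_alloc &)
    {
      return true;                       // thrown-from-new outcome
    }
}
```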


Re: Check that passes do not forget to define profile

2023-10-17 Thread Jan Hubicka
> So OK to commit this?
> 
> This patch makes sure the profile_count information is initialized for the
> new bb created in move_sese_region_to_fn.
> 
> gcc/ChangeLog:
> 
>   * tree-cfg.cc (move_sese_region_to_fn): Initialize profile_count for
>   new basic block.
> 
> Bootstrapped and regression tested on aarch64-unknown-linux-gnu and
> x86_64-pc-linux-gnu.

This is OK,
thanks!
Honza
> 
> On 04/10/2023 12:02, Jan Hubicka wrote:
> > > Hi Honza,
> > > 
> > > My current patch set for AArch64 VLA omp codegen started failing on
> > > gcc.dg/gomp/pr87898.c after this. I traced it back to
> > > 'move_sese_region_to_fn' in tree-cfg.cc not setting count for the bb
> > > created.
> > > 
> > > I was able to 'fix' it locally by setting the count of the new bb to the
> > > accumulation of e->count () of all the entry_edges (if initialized). I'm
> > > however not even close to certain that's the right approach, attached 
> > > patch
> > > for illustration.
> > > 
> > > Kind regards,
> > > Andre
> > > diff --git a/gcc/tree-cfg.cc b/gcc/tree-cfg.cc
> > 
> > > index 
> > > ffab7518b1568b58e610e26feb9e3cab18ddb3c2..32fc47ae683164bf8fac477fbe6e2c998382e754
> > >  100644
> > > --- a/gcc/tree-cfg.cc
> > > +++ b/gcc/tree-cfg.cc
> > > @@ -8160,11 +8160,15 @@ move_sese_region_to_fn (struct function *dest_cfun, basic_block entry_bb,
> > > bb = create_empty_bb (entry_pred[0]);
> > > if (current_loops)
> > >   add_bb_to_loop (bb, loop);
> > > +  profile_count count = profile_count::zero ();
> > > for (i = 0; i < num_entry_edges; i++)
> > >   {
> > > e = make_edge (entry_pred[i], bb, entry_flag[i]);
> > > e->probability = entry_prob[i];
> > > +  if (e->count ().initialized_p ())
> > > + count += e->count ();
> > >   }
> > > +  bb->count = count;
> > 
> > This looks generally right - if you create a BB you need to set its
> > count, and unless it has a self-loop that is the sum of the counts of
> > incoming edges.
> > 
> > However the initialized_p check should be unnecessary: if one of the entry
> > edges to the BB is uninitialized, the + operation will make the bb count
> > uninitialized too, which is OK.
> > 
> > Honza
> > > for (i = 0; i < num_exit_edges; i++)
> > >   {
> > 

> diff --git a/gcc/tree-cfg.cc b/gcc/tree-cfg.cc
> index ffab7518b1568b58e610e26feb9e3cab18ddb3c2..ffeb20b717aead756844c5f48c2cc23f5e9f14a6 100644
> --- a/gcc/tree-cfg.cc
> +++ b/gcc/tree-cfg.cc
> @@ -8160,11 +8160,14 @@ move_sese_region_to_fn (struct function *dest_cfun, basic_block entry_bb,
>bb = create_empty_bb (entry_pred[0]);
>if (current_loops)
>  add_bb_to_loop (bb, loop);
> +  profile_count count = profile_count::zero ();
>for (i = 0; i < num_entry_edges; i++)
>  {
>e = make_edge (entry_pred[i], bb, entry_flag[i]);
>e->probability = entry_prob[i];
> +  count += e->count ();
>  }
> +  bb->count = count;
>  
>for (i = 0; i < num_exit_edges; i++)
>  {



Do not stream DECL_FCONTEXT

2018-07-12 Thread Jan Hubicka
Hi,
this is another field that I believe needs no streaming.  I however think we
are pretty much done with the low-hanging fruit.

lto-bootstrapped/regtested x86_64-linux, OK?

Honza

* lto-streamer-out.c (DFS::DFS_write_tree_body): Do not stream
DECL_FCONTEXT
(hash_tree): Do not hash DECL_FCONTEXT
* tree-streamer-in.c (lto_input_ts_field_decl_tree_pointers):
Do not stream DECL_FCONTEXT.
* tree-streamer-out.c (write_ts_field_decl_tree_pointers): Likewise.
* tree.c (free_lang_data_in_decl): Free DECL_FCONTEXT.

Index: lto-streamer-out.c
===
--- lto-streamer-out.c  (revision 262560)
+++ lto-streamer-out.c  (working copy)
@@ -832,7 +832,7 @@ DFS::DFS_write_tree_body (struct output_
   DFS_follow_tree_edge (DECL_BIT_FIELD_TYPE (expr));
   DFS_follow_tree_edge (DECL_BIT_FIELD_REPRESENTATIVE (expr));
   DFS_follow_tree_edge (DECL_FIELD_BIT_OFFSET (expr));
-  DFS_follow_tree_edge (DECL_FCONTEXT (expr));
+  gcc_checking_assert (!DECL_FCONTEXT (expr));
 }
 
   if (CODE_CONTAINS_STRUCT (code, TS_FUNCTION_DECL))
@@ -1249,7 +1249,6 @@ hash_tree (struct streamer_tree_cache_d
   visit (DECL_BIT_FIELD_TYPE (t));
   visit (DECL_BIT_FIELD_REPRESENTATIVE (t));
   visit (DECL_FIELD_BIT_OFFSET (t));
-  visit (DECL_FCONTEXT (t));
 }
 
   if (CODE_CONTAINS_STRUCT (code, TS_FUNCTION_DECL))
Index: tree-streamer-in.c
===
--- tree-streamer-in.c  (revision 262560)
+++ tree-streamer-in.c  (working copy)
@@ -758,7 +758,6 @@ lto_input_ts_field_decl_tree_pointers (s
   DECL_BIT_FIELD_TYPE (expr) = stream_read_tree (ib, data_in);
   DECL_BIT_FIELD_REPRESENTATIVE (expr) = stream_read_tree (ib, data_in);
   DECL_FIELD_BIT_OFFSET (expr) = stream_read_tree (ib, data_in);
-  DECL_FCONTEXT (expr) = stream_read_tree (ib, data_in);
 }
 
 
Index: tree-streamer-out.c
===
--- tree-streamer-out.c (revision 262560)
+++ tree-streamer-out.c (working copy)
@@ -646,7 +646,6 @@ write_ts_field_decl_tree_pointers (struc
   stream_write_tree (ob, DECL_BIT_FIELD_TYPE (expr), ref_p);
   stream_write_tree (ob, DECL_BIT_FIELD_REPRESENTATIVE (expr), ref_p);
   stream_write_tree (ob, DECL_FIELD_BIT_OFFSET (expr), ref_p);
-  stream_write_tree (ob, DECL_FCONTEXT (expr), ref_p);
 }
 
 
Index: tree.c
===
--- tree.c  (revision 262560)
+++ tree.c  (working copy)
@@ -5280,6 +5280,7 @@ free_lang_data_in_decl (tree decl)
   free_lang_data_in_one_sizepos (&DECL_SIZE_UNIT (decl));
   if (TREE_CODE (decl) == FIELD_DECL)
 {
+  DECL_FCONTEXT (decl) = NULL;
   free_lang_data_in_one_sizepos (&DECL_FIELD_OFFSET (decl));
   if (TREE_CODE (DECL_CONTEXT (decl)) == QUAL_UNION_TYPE)
DECL_QUALIFIER (decl) = NULL_TREE;


Add dump file for WPA streaming

2018-07-12 Thread Jan Hubicka
Hi,
this patch adds a dump file for the WPA streaming process.  It uses the new
dump file code for partitions.

Bootstrapped/regtested x86_64-linux, will commit it shortly.

Honza

* lto.c (do_stream_out): Add PART parameter; open dump file.
(stream_out): Add PART parameter; pass it to do_stream_out.
(lto_wpa_write_files): Update call of stream_out.

* lto-streamer-out.c (copy_function_or_variable): Dump info about
copying section.

Index: lto/lto.c
===
--- lto/lto.c   (revision 262591)
+++ lto/lto.c   (working copy)
@@ -2326,13 +2326,15 @@ static lto_file *current_lto_file;
 /* Actually stream out ENCODER into TEMP_FILENAME.  */
 
 static void
-do_stream_out (char *temp_filename, lto_symtab_encoder_t encoder)
+do_stream_out (char *temp_filename, lto_symtab_encoder_t encoder, int part)
 {
   lto_file *file = lto_obj_file_open (temp_filename, true);
   if (!file)
 fatal_error (input_location, "lto_obj_file_open() failed");
   lto_set_current_out_file (file);
 
+  gcc_assert (!dump_file);
+  streamer_dump_file = dump_begin (TDI_lto_stream_out, NULL, part);
   ipa_write_optimization_summaries (encoder);
 
   free (CONST_CAST (char *, file->filename));
@@ -2340,6 +2342,11 @@ do_stream_out (char *temp_filename, lto_
   lto_set_current_out_file (NULL);
   lto_obj_file_close (file);
   free (file);
+  if (streamer_dump_file)
+{
+  dump_end (TDI_lto_stream_out, streamer_dump_file);
+  streamer_dump_file = NULL;
+}
 }
 
 /* Wait for forked process and signal errors.  */
@@ -2372,14 +2379,14 @@ wait_for_child ()
 
 static void
 stream_out (char *temp_filename, lto_symtab_encoder_t encoder,
-   bool ARG_UNUSED (last))
+   bool ARG_UNUSED (last), int part)
 {
 #ifdef HAVE_WORKING_FORK
   static int nruns;
 
   if (lto_parallelism <= 1)
 {
-  do_stream_out (temp_filename, encoder);
+  do_stream_out (temp_filename, encoder, part);
   return;
 }
 
@@ -2399,12 +2406,12 @@ stream_out (char *temp_filename, lto_sym
   if (!cpid)
{
  setproctitle ("lto1-wpa-streaming");
- do_stream_out (temp_filename, encoder);
+ do_stream_out (temp_filename, encoder, part);
  exit (0);
}
   /* Fork failed; lets do the job ourseleves.  */
   else if (cpid == -1)
-do_stream_out (temp_filename, encoder);
+do_stream_out (temp_filename, encoder, part);
   else
nruns++;
 }
@@ -2412,13 +2419,13 @@ stream_out (char *temp_filename, lto_sym
   else
 {
   int i;
-  do_stream_out (temp_filename, encoder);
+  do_stream_out (temp_filename, encoder, part);
   for (i = 0; i < nruns; i++)
wait_for_child ();
 }
   asm_nodes_output = true;
 #else
-  do_stream_out (temp_filename, encoder);
+  do_stream_out (temp_filename, encoder, part);
 #endif
 }
 
@@ -2508,7 +2515,7 @@ lto_wpa_write_files (void)
}
   gcc_checking_assert (lto_symtab_encoder_size (part->encoder) || !i);
 
-  stream_out (temp_filename, part->encoder, i == n_sets - 1);
+  stream_out (temp_filename, part->encoder, i == n_sets - 1, i);
 
   part->encoder = NULL;
 
Index: lto-streamer-out.c
===
--- lto-streamer-out.c  (revision 262591)
+++ lto-streamer-out.c  (working copy)
@@ -2293,6 +2304,8 @@ copy_function_or_variable (struct symtab
   struct lto_in_decl_state *in_state;
   struct lto_out_decl_state *out_state = lto_get_out_decl_state ();
 
+  if (streamer_dump_file)
+fprintf (streamer_dump_file, "Copying section for %s\n", name);
   lto_begin_section (section_name, false);
   free (section_name);
 


Re: [PATCH] x86: Tune Skylake, Cannonlake and Icelake as Haswell

2018-07-13 Thread Jan Hubicka
> > > > diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
> > > > index 9e46b7b136f..762ab89fc9e 100644
> > > > --- a/gcc/config/i386/i386.c
> > > > +++ b/gcc/config/i386/i386.c
> > > > @@ -137,17 +137,22 @@ const struct processor_costs *ix86_cost = NULL;
> > > >  #define m_CORE2 (HOST_WIDE_INT_1U<<PROCESSOR_CORE2)
> > > >  #define m_NEHALEM (HOST_WIDE_INT_1U<<PROCESSOR_NEHALEM)
> > > >  #define m_SANDYBRIDGE (HOST_WIDE_INT_1U<<PROCESSOR_SANDYBRIDGE)
> > > > -#define m_HASWELL (HOST_WIDE_INT_1U<<PROCESSOR_HASWELL)
> > > > +#define m_HASWELL ((HOST_WIDE_INT_1U<<PROCESSOR_HASWELL) \
> > > > +  | (HOST_WIDE_INT_1U<<PROCESSOR_SKYLAKE) \
> > > > +  | (HOST_WIDE_INT_1U<<PROCESSOR_SKYLAKE_AVX512) \
> > > > +  | (HOST_WIDE_INT_1U<<PROCESSOR_CANNONLAKE) \
> > > > +  | (HOST_WIDE_INT_1U<<PROCESSOR_ICELAKE_CLIENT) \
> > > > +  | (HOST_WIDE_INT_1U<<PROCESSOR_ICELAKE_SERVER))
> > >
> > >
> > > Please introduce a new per-family define and group processors in this
> > > define. Something like m_BDVER, m_BTVER and m_AMD_MULTIPLE for AMD
> > targets.
> > > We should not redefine m_HASWELL to include unrelated families.
> > >
> >
> > Here is the updated patch.  OK for trunk if all tests pass?
> >
> >
> OK.

We have also noticed that benchmarks on Skylake are not good compared to
Haswell; this nicely explains it.  I think this is a -march=native regression
compared to GCC versions that did not support CPUs newer than Haswell.  So it
would be nice to backport it.

Honza


Re: [PATCH] x86: Tune Skylake, Cannonlake and Icelake as Haswell

2018-07-13 Thread Jan Hubicka
> > We have also noticed that benchmarks on Skylake are not good compared to
> > Haswell; this nicely explains it.  I think this is a -march=native regression
> > compared to GCC versions that did not support CPUs newer than Haswell.
> > So it would be nice to backport it.
> 
> Yes, we should.   Here is the patch to backport to GCC 8.  OK for GCC 8 after
> it has been checked into trunk?

OK,
Honza
> 
> Thanks.
> 
> -- 
> H.J.

> From 40a1050b330b421a1f445cb2a40b5a002da2e6d6 Mon Sep 17 00:00:00 2001
> From: "H.J. Lu" 
> Date: Mon, 4 Jun 2018 19:16:06 -0700
> Subject: [PATCH] x86: Tune Skylake, Cannonlake and Icelake as Haswell
> 
> r259399, which added PROCESSOR_SKYLAKE, disabled many x86 optimizations
> which are enabled by PROCESSOR_HASWELL.  As a result, -mtune=skylake
> generates slower code on Skylake than before.  The same also applies
> to Cannonlake and Icelake tuning.
> 
> This patch changes -mtune={skylake|cannonlake|icelake} to tune like
> -mtune=haswell until their tuning is properly adjusted.  It also
> enables -mprefer-vector-width=256 for -mtune=haswell, which has no
> impact on codegen when AVX512 isn't enabled.
> 
> Performance impacts on SPEC CPU 2017 rate with 1 copy using
> 
> -march=native -mfpmath=sse -O2 -m64
> 
> are
> 
> 1. On Broadwell server:
> 
> 500.perlbench_r   -0.56%
> 502.gcc_r -0.18%
> 505.mcf_r 0.24%
> 520.omnetpp_r 0.00%
> 523.xalancbmk_r   -0.32%
> 525.x264_r-0.17%
> 531.deepsjeng_r   0.00%
> 541.leela_r   0.00%
> 548.exchange2_r   0.12%
> 557.xz_r  0.00%
> Geomean   0.00%
> 
> 503.bwaves_r  0.00%
> 507.cactuBSSN_r   0.21%
> 508.namd_r0.00%
> 510.parest_r  0.19%
> 511.povray_r  -0.48%
> 519.lbm_r 0.00%
> 521.wrf_r 0.28%
> 526.blender_r 0.19%
> 527.cam4_r0.39%
> 538.imagick_r 0.00%
> 544.nab_r -0.36%
> 549.fotonik3d_r   0.51%
> 554.roms_r0.00%
> Geomean   0.17%
> 
> On Skylake client:
> 
> 500.perlbench_r   0.96%
> 502.gcc_r 0.13%
> 505.mcf_r -1.03%
> 520.omnetpp_r -1.11%
> 523.xalancbmk_r   1.02%
> 525.x264_r0.50%
> 531.deepsjeng_r   2.97%
> 541.leela_r   0.50%
> 548.exchange2_r   -0.95%
> 557.xz_r  2.41%
> Geomean   0.56%
> 
> 503.bwaves_r  0.49%
> 507.cactuBSSN_r   3.17%
> 508.namd_r4.05%
> 510.parest_r  0.15%
> 511.povray_r  0.80%
> 519.lbm_r 3.15%
> 521.wrf_r 10.56%
> 526.blender_r 2.97%
> 527.cam4_r2.36%
> 538.imagick_r 46.40%
> 544.nab_r 2.04%
> 549.fotonik3d_r   0.00%
> 554.roms_r1.27%
> Geomean   5.49%
> 
> On Skylake server:
> 
> 500.perlbench_r   0.71%
> 502.gcc_r -0.51%
> 505.mcf_r -1.06%
> 520.omnetpp_r -0.33%
> 523.xalancbmk_r   -0.22%
> 525.x264_r1.72%
> 531.deepsjeng_r   -0.26%
> 541.leela_r   0.57%
> 548.exchange2_r   -0.75%
> 557.xz_r  -1.28%
> Geomean   -0.21%
> 
> 503.bwaves_r  0.00%
> 507.cactuBSSN_r   2.66%
> 508.namd_r3.67%
> 510.parest_r  1.25%
> 511.povray_r  2.26%
> 519.lbm_r 1.69%
> 521.wrf_r 11.03%
> 526.blender_r 3.39%
> 527.cam4_r1.69%
> 538.imagick_r 64.59%
> 544.nab_r -0.54%
> 549.fotonik3d_r   2.68%
> 554.roms_r0.00%
> Geomean   6.19%
> 
> This patch improves -march=native performance on Skylake up to 60% and
> leaves -march=native performance unchanged on Haswell.
> 
> gcc/
> 
>   Backport from mainline
>   2018-07-12  H.J. Lu  
>   Sunil K Pandey  
> 
>   PR target/84413
>   * config/i386/i386.c (m_CORE_AVX512): New.
>   (m_CORE_AVX2): Likewise.
>   (m_CORE_ALL): Add m_CORE_AVX2.
>   * config/i386/x86-tune.def: Replace m_HASWELL with m_CORE_AVX2.
>   Replace m_SKYLAKE_AVX512 with m_CORE_AVX512 on avx256_optimal
>   and remove the rest of m_SKYLAKE_AVX512.
> 
> gcc/testsuite/
> 
>   Backport from mainline
>   2018-07-12  H.J. Lu  
>   Sunil K Pandey  
> 
>   PR target/84413
>   * gcc.target/i386/pr84413-1.c: New test.
>   * gcc.target/i386/pr84413-2.c: Likewise.
>   * gcc.target/i386/pr84413-3.c: Likewise.
>   * gcc.target/i386/pr84413-4.c: Likewise.
> ---
>  gcc/config/i386/i386.c|  5 -
>  gcc/config/i386/x86-tune.def  | 26 +++
>  gcc/testsuite/gcc.target/i386/pr84413-1.c | 17 +++
>  gcc/testsuite/gcc.target/i386/pr84

[wwwdocs] Mention LTO link-time issue fixed in gcc 8.2

2018-07-19 Thread Jan Hubicka
Hi,
since we now mention the problem with Intel tuning, I thought we may also
mention the LTO link-time issue that was fixed.  It was mentioned by several
folks on the Phoronix forums.  (Basically, sometimes the partition size
overflowed, which made the partitioner put every symbol into a separate
partition, causing a fork bomb while streaming and overall very slow compile
times.)

Honza

Index: changes.html
===
RCS file: /cvs/gcc/wwwdocs/htdocs/gcc-8/changes.html,v
retrieving revision 1.89
diff -u -r1.89 changes.html
--- changes.html	15 Jul 2018 12:57:34 -0000	1.89
+++ changes.html	19 Jul 2018 09:45:27 -0000
@@ -1321,6 +1321,12 @@
 complete (that is, it is possible that some PRs that have been fixed
 are not listed here).
 
+General Improvements
+  
+Fixed LTO link-time performance problems caused by an overflow
+   in the partitioning algorithm while building large binaries.
+  
+
 Target Specific Changes
 
 IA-32/x86-64


Re: [PATCH] Introduce __builtin_expect_with_probability (PR target/83610).

2018-07-31 Thread Jan Hubicka
> Hi.
> 
> This is implementation of new built-in that can be used for more fine
> tweaking of probability. Micro benchmark is attached as part of the PR.
> 
> Patch can bootstrap on ppc64le-redhat-linux and survives regression tests.
> 
> Ready to be installed?

It seems reasonable to me to add the feature.  Years ago I made a similar patch
and at that time it did not go in, based on the argument that programmers are
not good at guessing probabilities and that this is too fine-grained a control,
while the job should be done by profile feedback.

However, I guess it is better to have a way to specify the probability than to
tweak a --param that controls the builtin_expect outcome globally.

What I think would be useful is to tie this to the code giving the loop trip
estimate.  If you know that the loop iterates 100 times on average, you can
specify a probability of 1%.  For this it seems more sensible to me to have the
probability parameter be a double rather than a long, so one can specify larger
trip counts this way.
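
To make the loop-trip suggestion concrete, here is a hedged usage sketch: a loop expected to iterate ~100 times annotates its exit condition as true with 1% probability.  The expect_prob fallback macro is only there so the sketch also compiles on compilers lacking the builtin; it is not part of the proposed patch:

```c
#include <assert.h>
#include <stddef.h>

#ifndef __has_builtin
# define __has_builtin(x) 0
#endif
#if __has_builtin (__builtin_expect_with_probability) \
    || (defined (__GNUC__) && __GNUC__ >= 9)
# define expect_prob(expr, val, prob) \
    __builtin_expect_with_probability ((expr), (val), (prob))
#else
# define expect_prob(expr, val, prob) (expr)	/* Plain fallback.  */
#endif

/* Find the first zero; the loop is expected to run ~100 iterations,
   so the exit condition is annotated as true with probability 1%.  */
size_t
find_zero (const int *a, size_t n)
{
  size_t i;
  for (i = 0; i < n; i++)
    if (expect_prob (a[i] == 0, 1, 0.01))
      break;
  return i;
}
```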

Honza


> Martin
> 
> gcc/ChangeLog:
> 
> 2018-07-24  Martin Liska  
> 
> PR target/83610
>   * builtin-types.def (BT_FN_LONG_LONG_LONG_LONG): New type.
>   * builtins.c (expand_builtin_expect_with_probability):
> New function.
>   (expand_builtin): Handle also BUILT_IN_EXPECT_WITH_PROBABILITY.
>   (build_builtin_expect_predicate): Likewise.
>   (fold_builtin_expect): Likewise.
>   (fold_builtin_2): Likewise.
>   (fold_builtin_3): Likewise.
>   * builtins.def (BUILT_IN_EXPECT_WITH_PROBABILITY): Define new
> builtin.
>   * builtins.h (fold_builtin_expect): Add new argument
> (probability).
>   * doc/extend.texi: Document the new builtin.
>   * doc/invoke.texi: Likewise.
>   * gimple-fold.c (gimple_fold_call): Pass new argument.
>   * ipa-fnsummary.c (find_foldable_builtin_expect):
> Handle also BUILT_IN_EXPECT_WITH_PROBABILITY.
>   * predict.c (expr_expected_value): Add new out argument which
> is probability.
>   (expr_expected_value_1): Likewise.
>   (tree_predict_by_opcode): Predict edge based on
> provided probability.
>   (pass_strip_predict_hints::execute): Use newly added
> DECL_BUILT_IN_P macro.
>   * predict.def (PRED_BUILTIN_EXPECT_WITH_PROBABILITY):
> Define new predictor.
>   * tree.h (DECL_BUILT_IN_P): Define.
> 
> gcc/testsuite/ChangeLog:
> 
> 2018-07-24  Martin Liska  
> 
>   * gcc.dg/predict-16.c: New test.
> ---
>  gcc/builtin-types.def |  2 +
>  gcc/builtins.c| 65 ---
>  gcc/builtins.def  |  1 +
>  gcc/builtins.h|  2 +-
>  gcc/doc/extend.texi   |  8 
>  gcc/doc/invoke.texi   |  3 ++
>  gcc/gimple-fold.c |  3 +-
>  gcc/ipa-fnsummary.c   |  1 +
>  gcc/predict.c | 61 ++---
>  gcc/predict.def   |  5 +++
>  gcc/testsuite/gcc.dg/predict-16.c | 13 +++
>  gcc/tree.h|  6 +++
>  12 files changed, 140 insertions(+), 30 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.dg/predict-16.c
> 
> 

> diff --git a/gcc/builtin-types.def b/gcc/builtin-types.def
> index b01095c420f..6e87bcbbf1d 100644
> --- a/gcc/builtin-types.def
> +++ b/gcc/builtin-types.def
> @@ -531,6 +531,8 @@ DEF_FUNCTION_TYPE_3 (BT_FN_ULONG_ULONG_ULONG_ULONG,
>BT_ULONG, BT_ULONG, BT_ULONG, BT_ULONG)
>  DEF_FUNCTION_TYPE_3 (BT_FN_LONG_LONG_UINT_UINT,
>BT_LONG, BT_LONG, BT_UINT, BT_UINT)
> +DEF_FUNCTION_TYPE_3 (BT_FN_LONG_LONG_LONG_LONG,
> +  BT_LONG, BT_LONG, BT_LONG, BT_LONG)
>  DEF_FUNCTION_TYPE_3 (BT_FN_ULONG_ULONG_UINT_UINT,
>BT_ULONG, BT_ULONG, BT_UINT, BT_UINT)
>  DEF_FUNCTION_TYPE_3 (BT_FN_STRING_CONST_STRING_CONST_STRING_INT,
> diff --git a/gcc/builtins.c b/gcc/builtins.c
> index 539a6d17688..29d77d3d83b 100644
> --- a/gcc/builtins.c
> +++ b/gcc/builtins.c
> @@ -148,6 +148,7 @@ static rtx expand_builtin_unop (machine_mode, tree, rtx, 
> rtx, optab);
>  static rtx expand_builtin_frame_address (tree, tree);
>  static tree stabilize_va_list_loc (location_t, tree, int);
>  static rtx expand_builtin_expect (tree, rtx);
> +static rtx expand_builtin_expect_with_probability (tree, rtx);
>  static tree fold_builtin_constant_p (tree);
>  static tree fold_builtin_classify_type (tree);
>  static tree fold_builtin_strlen (location_t, tree, tree);
> @@ -5237,6 +5238,27 @@ expand_builtin_expect (tree exp, rtx target)
>return target;
>  }
>  
> +/* Expand a call to __builtin_expect_with_probability.  We just return our
> +   argument as the builtin_expect semantic should've been already executed by
> +   tree branch prediction pass.  */
> +
> +static rtx
> +expand_builtin_expect_with_probability (tree exp, rtx target)
> +{
> +  tree arg;
> +
> +  if (call_expr_nargs (exp) < 3)
> +return const0_rtx;

Re: [PATCH] Add malloc predictor (PR middle-end/83023).

2018-07-31 Thread Jan Hubicka
> Hi.
> 
> The following patch implements a new predictor that annotates malloc-like
> functions.  These almost always return a non-null value.
> 
> Patch can bootstrap on ppc64le-redhat-linux and survives regression tests.
> 
> Ready to be installed?
> Martin
> 
> gcc/ChangeLog:
> 
> 2018-07-26  Martin Liska  
> 
> PR middle-end/83023
>   * predict.c (expr_expected_value_1): Handle DECL_IS_MALLOC
> declarations.
>   * predict.def (PRED_MALLOC_NONNULL): New predictor.
> 
> gcc/testsuite/ChangeLog:
> 
> 2018-07-26  Martin Liska  
> 
> PR middle-end/83023
>   * gcc.dg/predict-16.c: New test.

These are two conceptually different things - whether you return new memory
and whether the return value is commonly non-null.  While the two go together
for the majority of malloc functions, I wonder if this is safe WRT the
auto-detected malloc attributes.  I do not know how common code is that
returns new memory only under some conditions.
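
A hypothetical example of the pattern worried about here - a function that returns new memory only on some paths, so "returns fresh memory" and "commonly returns non-null" need not coincide (illustrative code, not from the patch):

```c
#include <assert.h>
#include <stdlib.h>

/* Returns new memory only when no cached buffer exists.  A malloc
   attribute auto-detected from the allocation path would be wrong for
   the cache-hit path, which returns an alias rather than fresh memory,
   and the result may still be NULL when the allocation fails.  */
void *
get_buffer (void **cache, size_t n)
{
  if (*cache)
    return *cache;		/* Not fresh memory: aliases *cache.  */
  *cache = malloc (n);		/* Fresh memory; may still be NULL.  */
  return *cache;
}
```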

Honza


Re: [PATCH] Add memmove to value profiling (PR value-prof/35543).

2018-08-01 Thread Jan Hubicka
> Hi.
> 
> As requested in the PR, I would like to add value profiling for
> BUILT_IN_MEMMOVE.
> 
> Patch can bootstrap on x86_64-linux-gnu and survives regression tests.
> 
> Ready to be installed?
> Martin
> 
> gcc/ChangeLog:
> 
> 2018-07-31  Martin Liska  
> 
> PR value-prof/35543
>   * value-prof.c (interesting_stringop_to_profile_p):
> Simplify the code and add BUILT_IN_MEMMOVE.
>   (gimple_stringops_transform): Likewise.
OK, thanks!
We have other builtins that may fold into a string function which we expand
internally (the str* variants come to mind); perhaps they could be
instrumented, too.

Honza
> 
> gcc/testsuite/ChangeLog:
> 
> 2018-07-31  Martin Liska  
> 
> PR value-prof/35543
>   * gcc.dg/tree-prof/val-prof-7.c: Add __builtin_memmove.
> ---
>  gcc/testsuite/gcc.dg/tree-prof/val-prof-7.c | 10 ++
>  gcc/value-prof.c|  8 +++-
>  2 files changed, 13 insertions(+), 5 deletions(-)
> 
> 

> diff --git a/gcc/testsuite/gcc.dg/tree-prof/val-prof-7.c 
> b/gcc/testsuite/gcc.dg/tree-prof/val-prof-7.c
> index c9303e053ee..bb9dd210eec 100644
> --- a/gcc/testsuite/gcc.dg/tree-prof/val-prof-7.c
> +++ b/gcc/testsuite/gcc.dg/tree-prof/val-prof-7.c
> @@ -23,6 +23,11 @@ __attribute__((noinline)) \
>  void memset_test_ ## N (int len) \
>  { \
>__builtin_memset (buffer1, 'c', len); \
> +} \
> +__attribute__((noinline)) \
> +void memmove_test_ ## N (int len) \
> +{ \
> +  __builtin_memmove (buffer1, buffer2, len); \
>  } \
>   \
>  void test_stringops_ ## N(int len) \
> @@ -30,6 +35,7 @@ void test_stringops_ ## N(int len) \
>memcpy_test_## N (len); \
>mempcpy_test_ ## N (len); \
>memset_test_ ## N (len); \
> +  memmove_test_ ## N (len); \
>  } \
>   \
>  void test_stringops_with_values_ ## N (int common, int not_common) \
> @@ -70,3 +76,7 @@ int main() {
>  /* { dg-final-use-not-autofdo { scan-ipa-dump "Single value 8 stringop 
> transformation on __builtin_memset" "profile" } } */
>  /* { dg-final-use-not-autofdo { scan-ipa-dump "Single value 55 stringop 
> transformation on __builtin_memset" "profile" } } */
>  /* { dg-final-use-not-autofdo { scan-ipa-dump-times "Single value 32 
> stringop transformation on __builtin_memset" 0 "profile" } } */
> +
> +/* { dg-final-use-not-autofdo { scan-ipa-dump "Single value 8 stringop 
> transformation on __builtin_memmove" "profile" } } */
> +/* { dg-final-use-not-autofdo { scan-ipa-dump "Single value 55 stringop 
> transformation on __builtin_memmove" "profile" } } */
> +/* { dg-final-use-not-autofdo { scan-ipa-dump-times "Single value 32 
> stringop transformation on __builtin_memmove" 0 "profile" } } */
> diff --git a/gcc/value-prof.c b/gcc/value-prof.c
> index 77d4849d5b1..a7c4be7a7d8 100644
> --- a/gcc/value-prof.c
> +++ b/gcc/value-prof.c
> @@ -1527,14 +1527,11 @@ interesting_stringop_to_profile_p (gcall *call, int 
> *size_arg)
>enum built_in_function fcode;
>  
>fcode = DECL_FUNCTION_CODE (gimple_call_fndecl (call));
> -  if (fcode != BUILT_IN_MEMCPY && fcode != BUILT_IN_MEMPCPY
> -  && fcode != BUILT_IN_MEMSET && fcode != BUILT_IN_BZERO)
> -return false;
> -
>switch (fcode)
>  {
>   case BUILT_IN_MEMCPY:
>   case BUILT_IN_MEMPCPY:
> + case BUILT_IN_MEMMOVE:
> *size_arg = 2;
> return validate_gimple_arglist (call, POINTER_TYPE, POINTER_TYPE,
>  INTEGER_TYPE, VOID_TYPE);
> @@ -1547,7 +1544,7 @@ interesting_stringop_to_profile_p (gcall *call, int 
> *size_arg)
> return validate_gimple_arglist (call, POINTER_TYPE, INTEGER_TYPE,
>  VOID_TYPE);
>   default:
> -   gcc_unreachable ();
> +   return false;
>  }
>  }
>  
> @@ -1710,6 +1707,7 @@ gimple_stringops_transform (gimple_stmt_iterator *gsi)
>  {
>  case BUILT_IN_MEMCPY:
>  case BUILT_IN_MEMPCPY:
> +case BUILT_IN_MEMMOVE:
>src = gimple_call_arg (stmt, 1);
>src_align = get_pointer_alignment (src);
>if (!can_move_by_pieces (val, MIN (dest_align, src_align)))
> 



Re: [PATCH] Improve dumping of value profiling transformations.

2018-08-01 Thread Jan Hubicka
> Hi.
> 
> This is format clean-up of value transformations that can happen.
> It makes it easier to grep them and find how many were actually
> applied.
> 
> Patch can bootstrap on x86_64-linux-gnu and survives regression tests.
> 
> Ready to be installed?
> Martin
> 
> gcc/ChangeLog:
> 
> 2018-07-31  Martin Liska  
> 
>   * value-prof.c (gimple_divmod_fixed_value_transform): Unify
> format how successful transformation is dumped.
>   (gimple_mod_pow2_value_transform): Likewise.
>   (gimple_mod_subtract_transform): Likewise.
>   (gimple_stringops_transform): Likewise.

OK,
thanks!
Honza
> 
> gcc/testsuite/ChangeLog:
> 
> 2018-07-31  Martin Liska  
> 
>   * gcc.dg/tree-prof/stringop-1.c: Adjust scanned pattern.
>   * gcc.dg/tree-prof/stringop-2.c: Likewise.
>   * gcc.dg/tree-prof/val-prof-1.c: Likewise.
>   * gcc.dg/tree-prof/val-prof-2.c: Likewise.
>   * gcc.dg/tree-prof/val-prof-3.c: Likewise.
>   * gcc.dg/tree-prof/val-prof-4.c: Likewise.
>   * gcc.dg/tree-prof/val-prof-5.c: Likewise.
>   * gcc.dg/tree-prof/val-prof-7.c: Likewise.
> ---
>  gcc/testsuite/gcc.dg/tree-prof/stringop-1.c |  2 +-
>  gcc/testsuite/gcc.dg/tree-prof/stringop-2.c |  2 +-
>  gcc/testsuite/gcc.dg/tree-prof/val-prof-1.c |  2 +-
>  gcc/testsuite/gcc.dg/tree-prof/val-prof-2.c |  2 +-
>  gcc/testsuite/gcc.dg/tree-prof/val-prof-3.c |  2 +-
>  gcc/testsuite/gcc.dg/tree-prof/val-prof-4.c |  2 +-
>  gcc/testsuite/gcc.dg/tree-prof/val-prof-5.c |  2 +-
>  gcc/testsuite/gcc.dg/tree-prof/val-prof-7.c | 24 -
>  gcc/value-prof.c| 29 +++--
>  9 files changed, 28 insertions(+), 39 deletions(-)
> 
> 

> diff --git a/gcc/testsuite/gcc.dg/tree-prof/stringop-1.c 
> b/gcc/testsuite/gcc.dg/tree-prof/stringop-1.c
> index 6f8908a3431..d75b2548dbc 100644
> --- a/gcc/testsuite/gcc.dg/tree-prof/stringop-1.c
> +++ b/gcc/testsuite/gcc.dg/tree-prof/stringop-1.c
> @@ -15,7 +15,7 @@ main()
> return 0;
>  }
>  /* autofdo doesn't support value profiling for now: */
> -/* { dg-final-use-not-autofdo { scan-ipa-dump "Single value 4 stringop" 
> "profile"} } */
> +/* { dg-final-use-not-autofdo { scan-ipa-dump "Transformation done: single 
> value 4 stringop" "profile"} } */
>  /* Really this ought to simplify into assignment, but we are not there yet.  
> */
>  /* a[0] = b[0] is what we fold the resulting memcpy into.  */
>  /* { dg-final-use-not-autofdo { scan-tree-dump " = MEM.*&b" "optimized"} } */
> diff --git a/gcc/testsuite/gcc.dg/tree-prof/stringop-2.c 
> b/gcc/testsuite/gcc.dg/tree-prof/stringop-2.c
> index 330b159b7fc..3242cf5b8a2 100644
> --- a/gcc/testsuite/gcc.dg/tree-prof/stringop-2.c
> +++ b/gcc/testsuite/gcc.dg/tree-prof/stringop-2.c
> @@ -20,6 +20,6 @@ main()
> return 0;
>  }
>  /* autofdo doesn't support value profiling for now: */
> -/* { dg-final-use-not-autofdo { scan-ipa-dump "Single value 4 stringop" 
> "profile"} } */
> +/* { dg-final-use-not-autofdo { scan-ipa-dump "Transformation done: single 
> value 4 stringop" "profile"} } */
>  /* The versioned memset of size 4 should be optimized to an assignment.  */
>  /* { dg-final-use-not-autofdo { scan-tree-dump "MEM\\\[\\(void .\\)&a\\\] = 
> 168430090" "optimized"} } */
> diff --git a/gcc/testsuite/gcc.dg/tree-prof/val-prof-1.c 
> b/gcc/testsuite/gcc.dg/tree-prof/val-prof-1.c
> index 35e0f908f24..492c4c1c4b2 100644
> --- a/gcc/testsuite/gcc.dg/tree-prof/val-prof-1.c
> +++ b/gcc/testsuite/gcc.dg/tree-prof/val-prof-1.c
> @@ -17,6 +17,6 @@ main ()
>return 0;
>  }
>  /* autofdo does not do value profiling so far */
> -/* { dg-final-use-not-autofdo { scan-ipa-dump "Div.mod by constant 
> n_\[0-9\]*=257 transformation on insn" "profile"} } */
> +/* { dg-final-use-not-autofdo { scan-ipa-dump "Transformation done: div.mod 
> by constant 257" "profile"} } */
>  /* { dg-final-use-not-autofdo { scan-tree-dump "if \\(n_\[0-9\]* != 257\\)" 
> "optimized"} } */
>  /* { dg-final-use { scan-tree-dump-not "Invalid sum" "optimized"} } */
> diff --git a/gcc/testsuite/gcc.dg/tree-prof/val-prof-2.c 
> b/gcc/testsuite/gcc.dg/tree-prof/val-prof-2.c
> index ad78043ddd6..8cb3c64fd17 100644
> --- a/gcc/testsuite/gcc.dg/tree-prof/val-prof-2.c
> +++ b/gcc/testsuite/gcc.dg/tree-prof/val-prof-2.c
> @@ -25,7 +25,7 @@ main ()
>return 0;
>  }
>  /* autofdo does not do value profiling so far */
> -/* { dg-final-use-not-autofdo { scan-ipa-dump "Mod power of 2 transformation 
> on insn" "profile" } } */
> +/* { dg-final-use-not-autofdo { scan-ipa-dump "Transformation done: mod 
> power of 2" "profile" } } */
>  /* This is part of code checking that n is power of 2, so we are sure that 
> the transformation
> didn't get optimized out.  */
>  /* { dg-final-use-not-autofdo { scan-tree-dump "n_\[0-9\]* \\+ 
> (4294967295|0x0*)" "optimized"} } */
> diff --git a/gcc/testsuite/gcc.dg/tree-prof/val-prof-3.c 
> b/gcc/testsuite/gcc.dg/tree-prof/val-prof-3.c
> index 366ada1fa22..60953d09b15 100644
> --- 

Re: [PATCH][WIP] libiberty: Support for relocation output

2023-10-23 Thread Jan Hubicka
> This patch teaches libiberty to output X86-64 Relocations.
Hello,
for the actual patch submission you will need to add a ChangeLog :)
> diff --git a/libiberty/simple-object-elf.c b/libiberty/simple-object-elf.c
> index 86b7a27dc74..0bbaf4b489f 100644
> --- a/libiberty/simple-object-elf.c
> +++ b/libiberty/simple-object-elf.c
> @@ -238,6 +238,7 @@ typedef struct
>  #define STT_NOTYPE 0 /* Symbol type is unspecified */
>  #define STT_OBJECT 1 /* Symbol is a data object */
>  #define STT_FUNC 2 /* Symbol is a code object */
> +#define STT_SECTION 3 /* Symbol is associate with a section */
Associated I guess.
>  #define STT_TLS 6 /* Thread local data object */
>  #define STT_GNU_IFUNC 10 /* Symbol is an indirect code object */
> 
> @@ -248,6 +249,63 @@ typedef struct
>  #define STV_DEFAULT 0 /* Visibility is specified by binding type */
>  #define STV_HIDDEN 2 /* Can only be seen inside currect component */
> 
> +typedef struct
> +{
> +  unsigned char r_offset[4]; /* Address */
> +  unsigned char r_info[4];  /* relocation type and symbol index */
> +} Elf32_External_Rel;
> +
> +typedef struct
> +{
> +  unsigned char r_offset[8]; /* Address */
> +  unsigned char r_info[8]; /* Relocation type and symbol index */
> +} Elf64_External_Rel;
> +typedef struct
> +{
> +  unsigned char r_offset[4]; /* Address */
> +  unsigned char r_info[4];  /* Relocation type and symbol index */
> +  char r_addend[4]; /* Addend */
> +} Elf32_External_Rela;
> +typedef struct
> +{
> +  unsigned char r_offset[8]; /* Address */
> +  unsigned char r_info[8]; /* Relocation type and symbol index */
> +  unsigned char r_addend[8]; /* Addend */
> +} Elf64_External_Rela;
> +
> +/* How to extract and insert information held in the r_info field.  */
> +
> +#define ELF32_R_SYM(val) ((val) >> 8)
> +#define ELF32_R_TYPE(val) ((val) & 0xff)
> +#define ELF32_R_INFO(sym, type) (((sym) << 8) + ((type) & 0xff))
> +
> +#define ELF64_R_SYM(i) ((i) >> 32)
> +#define ELF64_R_TYPE(i) ((i) & 0xffffffff)
> +#define ELF64_R_INFO(sym,type) ((((unsigned long) (sym)) << 32) + (type))
> +
> +/* AMD x86-64 relocations.  */
> +#define R_X86_64_NONE 0 /* No reloc */
> +#define R_X86_64_64 1 /* Direct 64 bit  */
> +#define R_X86_64_PC32 2 /* PC relative 32 bit signed */
> +#define R_X86_64_GOT32 3 /* 32 bit GOT entry */
> +#define R_X86_64_PLT32 4 /* 32 bit PLT address */
> +#define R_X86_64_COPY 5 /* Copy symbol at runtime */
> +#define R_X86_64_GLOB_DAT 6 /* Create GOT entry */
> +#define R_X86_64_JUMP_SLOT 7 /* Create PLT entry */
> +#define R_X86_64_RELATIVE 8 /* Adjust by program base */
> +#define R_X86_64_GOTPCREL 9 /* 32 bit signed PC relative
> +   offset to GOT */
> +#define R_X86_64_32 10 /* Direct 32 bit zero extended */
> +#define R_X86_64_32S 11 /* Direct 32 bit sign extended */
> +#define R_X86_64_16 12 /* Direct 16 bit zero extended */

This will eventually need to go into a per-architecture table.
You support only those needed for dwarf2out output, right?

I think we need Ian's opinion on this part of the patch (he is the
maintainer of simple-object), but to me it looks reasonable.  For the longer
term it will be necessary to think about how to make this extensible to other
architectures without writing too much code.  (Have some more
declarative way to specify the relocations we output.)
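
As a quick sanity check, the standard ELF64 r_info packing that the patch's macros implement (symbol index in the high 32 bits, relocation type in the low 32) round-trips like this:

```c
#include <assert.h>
#include <stdint.h>

/* Standard ELF64 r_info packing, as in the gABI.  */
#define ELF64_R_SYM(i)		((i) >> 32)
#define ELF64_R_TYPE(i)		((i) & 0xffffffff)
#define ELF64_R_INFO(sym, type)	((((uint64_t) (sym)) << 32) + (type))

#define R_X86_64_PC32 2		/* PC relative 32 bit signed.  */

/* Pack a relocation entry's info word for symbol SYM and type TYPE.  */
uint64_t
pack_reloc (uint32_t sym, uint32_t type)
{
  return ELF64_R_INFO (sym, type);
}
```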

Honza


Re: [PATCH][WIP] dwarf2out: extend to output debug section directly to object file during debug_early phase

2023-10-23 Thread Jan Hubicka
Hello,
thanks for the patch.

Overall it looks to be in the right direction, except for the code duplication
in output_die and friends.
> +/* Given a die and id, produce the appropriate abbreviations
> +   directly to lto object file */
> +
> +static void
> +output_die_abbrevs_to_object_file(unsigned long abbrev_id, dw_die_ref
> abbrev)
> +{
> +  unsigned ix;
> +  dw_attr_node *a_attr;
> +
> +  output_data_uleb128_to_object_file(abbrev_id);
> +  output_data_uleb128_to_object_file(abbrev->die_tag);
> +
> +
> +  if (abbrev->die_child != NULL)
> +output_data_to_object_file(1,DW_children_yes);
> +  else
> +output_data_to_object_file(1,DW_children_no);
> +
> +  for (ix = 0; vec_safe_iterate (abbrev->die_attr, ix, &a_attr); ix++)
> +{
> +  output_data_uleb128_to_object_file(a_attr->dw_attr);
> +  output_value_format_to_object_file(a_attr);
> +  if (value_format (a_attr) == DW_FORM_implicit_const)
> + {
> +  if (AT_class (a_attr) == dw_val_class_file_implicit)
> +{
> +  int f = maybe_emit_file (a_attr->dw_attr_val.v.val_file);
> + output_data_sleb128_to_object_file(f);
> +}
> +  else
> +  output_data_sleb128_to_object_file(a_attr->dw_attr_val.v.val_int);
> + }
> +}
> +
> +  output_data_to_object_file (1, 0);
> +  output_data_to_object_file (1, 0);

So this basically renames dw2_asm_output_data to
output_data_to_object_file and similarly for other output functions.

What would be the main problems with making the dw2_asm_* functions do the
right thing when outputting to an object file?
Either by conditionals or by turning them into virtual functions/hooks as
Richi suggested?

It may be performance critical how quickly we spit out the bytecode.
In the future we may templatize this, but right now it is likely premature
optimization.
> 
> +struct lto_simple_object
lto_simple_object is declared in the LTO front end.  Why do you need to
duplicate it here?

It looks like adding relocations should be abstracted by the LTO API,
so you don't need to look inside this structure, which is local to
lto/lto-object.cc.

> +/* Output one line number table into the .debug_line section.  */
> +
> +static void
> +output_one_line_info_table (dw_line_info_table *table)
It is hard to tell from the diff.  Did you just move these functions
earlier in the source file?

Honza


Re: [PATCH][WIP] dwarf2out: extend to output debug section directly to object file during debug_early phase

2023-10-23 Thread Jan Hubicka
> > > +  output_data_to_object_file (1, 0);
> > > +  output_data_to_object_file (1, 0);
> >
> > So this basically renames dw2_asm_output_data to
> > output_data_to_object_file and similarly for other output functions.
> >
> > What would be the main problems with making the dw2_asm_* functions do the
> > right thing when outputting to an object file?
> > Either by conditionals or by turning them into virtual functions/hooks as
> > Richi suggested?
> >
> I think it's doable via conditionals. Can you explain the second approach
> in more detail?

Basically you want to have output functions
like dw2_asm_output_data to do the right thing and either store
it to the LTO simple object section or the assembly file.
So either we can add conditionals to every dw2_asm_* function needed
of the form
  if (outputting_to_lto)
 ... new code ...
  else
 ... existing code ...

Or have a virtual table with two different dw2_asm implementations.
Older GCC code uses hooks, which is essentially a structure holding
function pointers, mostly because it was implemented before we converted
the source base to C++.  Some newer code uses virtual functions for this.
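
A minimal sketch of the hooks variant: a structure of function pointers selected once, so an emitter in the dw2_asm_* style dispatches without a per-call conditional.  The names, the ".Nbyte" pseudo-directive, and the two backends here are illustrative stand-ins, not the actual GCC interface:

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Illustrative hook table: one implementation writes assembly text,
   the other raw little-endian bytes.  */
struct output_hooks
{
  void (*output_data) (unsigned size, unsigned long value);
};

static char text[256];
static unsigned char bytes[256];
static size_t text_len, bytes_len;

static void
asm_output_data (unsigned size, unsigned long value)
{
  text_len += (size_t) snprintf (text + text_len, sizeof text - text_len,
				 ".%ubyte 0x%lx\n", size, value);
}

static void
obj_output_data (unsigned size, unsigned long value)
{
  for (unsigned i = 0; i < size; i++)	/* Little-endian byte order.  */
    bytes[bytes_len++] = (unsigned char) (value >> (8 * i));
}

static const struct output_hooks asm_hooks = { asm_output_data };
static const struct output_hooks obj_hooks = { obj_output_data };

/* Generic emitter, analogous to a dw2_asm_output_data that dispatches
   through hooks instead of an if (outputting_to_lto) conditional.  */
void
emit_abbrev_header (const struct output_hooks *h,
		    unsigned long abbrev_id, unsigned long tag)
{
  h->output_data (1, abbrev_id);
  h->output_data (1, tag);
}
```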
> > > +struct lto_simple_object
> > lto_simple_object is declared in the LTO front end.  Why do you need to
> > duplicate it here?
> >
> > It looks like adding relocations should be abstracted by the LTO API,
> > so you don't need to look inside this structure, which is local to
> > lto/lto-object.cc.
> >
> I should have taken this approach, but instead I exposed simple objects to
> dwarf2out.  That's the reason for duplicating the above struct.  I will take
> care of this while refactoring and abstracting it via the LTO API.

Yep, this should not be hard to do.

Thanks for all the work!
Honza
> 
> 
> >
> > > +/* Output one line number table into the .debug_line section.  */
> > > +
> > > +static void
> > > +output_one_line_info_table (dw_line_info_table *table)
> > It is hard to tell from the diff.  Did you just move these functions
> > earlier in the source file?
> >
> Yeah.  I will refactor dwarf2out soon to clear up this confusion.
> 
> -- 
> Rishi
> 
> 
> >
> > Honza
> >


Re: [PATCH] Drop profile reproducibility flag as it is not used.

2021-01-21 Thread Jan Hubicka
> On 1/21/21 9:25 AM, Richard Biener wrote:
> > On Wed, Jan 20, 2021 at 5:25 PM Martin Liška  wrote:
> > > 
> > > On 1/20/21 5:00 PM, Jan Hubicka wrote:
> > > > There are two thinks that I would like to discuss first
> > > >1) I think the option is stil used for value profiling of divisors
> > > 
> > > It's not. Right now the only usage lives in get_nth_most_common_value 
> > > which
> > > is an entry point being used for stringops, indirect call and divmod 
> > > histograms.
> > > 
> > > >2) it is not clear to me how the new counter establishes
> > > >reproducibility for indiect calls that have more than 32 targets 
> > > > during
> > > >the train run.
> > > 
> > > We cannot ensure that, but I would say that it's very unlikely to happen.
> > > In case of Firefox, there are definitely other reasons why the build is 
> > > not reproducible.
> > > I would expect arc counters to differ in between multiple training runs.
> > > 
> > > If it's really a problem we can come up with other approaches:
> > > - GCOV run-time control over # of tracked values (32 right now)
> > > - similarly we can save more values in .gcda files
> > > 
> > > I'm sending updated version of the patch.
> > 
> > So the discussion tells me that we want the option and of course want to 
> > have
> > it work.
> 
> Yep, I've just sent patch for that.
> 
> > In the patch I see the =multithreaded enum part was not documented
> > (I don't see how it differs from =parallel-runs?), so I wonder if we really 
> > need
> > three states.
> 
It's a reserved option value, Honza, though it will be useful for the future (:

Not exactly - I intended it to work already in GCC 10 as follows.

 - With =serial one can trust all counters coming from the gcda file
   (I looked again in detail at the GCC 10 implementation and I think the
   handling of the sign bit is correct, contrary to my previous claim)
 - With =parallel-runs we can use the sign-bit trick to signal that
   merging was lossy
 - With =multithreaded we can only use the counter if the sum of individual
   targets matches the total number of executions (so we know no target
   got lost).

Basically =serial means that you get a reproducible profile only if the
events (profile counter invocation, profile streaming) come in precisely the
same order in both train runs (such as profiledbootstrap running with
make -j1).

With =parallel-runs you get a reproducible profile under the assumption
that the train run consists of multiple invocations (or does forking), each
invocation is reproducible, but streaming happens in random order
(profiledbootstrap with make -j16).

With =multithreaded you get a reproducible profile if the profile
counter invocations match in both train runs, but they can happen in any
order (profiledbootstrap with make -j16 once we make GCC multithreaded,
or a build of clang with FDO).
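
The =multithreaded rule above - trust a value counter only when the tracked targets account for every execution - can be sketched as follows.  The counter layout here is hypothetical, chosen only to illustrate the check, not the actual gcov format:

```c
#include <assert.h>

#define N_TRACKED 4

/* Hypothetical top-N value counter: total executions plus per-target
   hit counts.  If some target fell out of the tracked set, the sum of
   hits is smaller than TOTAL and the counter cannot be trusted.  */
struct topn_counter
{
  long total;
  long hits[N_TRACKED];
};

/* Returns 1 if no target was lost, i.e. the counter would be usable
   under the =multithreaded reproducibility mode described above.  */
int
counter_reproducible_p (const struct topn_counter *c)
{
  long sum = 0;
  for (int i = 0; i < N_TRACKED; i++)
    sum += c->hits[i];
  return sum == c->total;
}
```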

> 
> > 
> > That said, the option handling is indeed broken at the moment.  While
> > the implementation is not perfect, does it have some pieces that help?
> > 
> > That said, why not simply fix option handling by adding the missing =
> > to the option?  Using -fprofile-reproducibleserial etc. work but before
> > adding backwards compatibility aliases I'd say we change w/o them
> > for GCC 11 and only if there are complaints introduce them (and eventually
> > backport the option handling fix to GCC 10).
> 
> Yes, I would like to backport the option fix for GCC 10. But apparently, 
> there's
> nobody using the option.

I think an easy way to get users of this option is to make the profile
not reproducible by default and modify packages to use the right
reproducibility option when reproducible builds are intended.  It is not
a feature that comes for free, and I think most users of PGO do not
care, so I think it should be opt-in.

In general, to get a reproducible profile one needs to make the train
run reproducible, which is hard when you look at the details (such as
the /tmp/ file name generation issue in gcc) and may force the user to
annotate such code.

This will become a more common problem for multithreaded profiles, where
one needs to annotate, for example, locking and busy-waiting loops (or
the scheduler responsible for executing parallel tasks).

I can see this being practically achievable, but we probably want to
produce some guidelines for doing it and teach gcov-tool to compare
profiles and report to what degree they match (i.e. which functions
match at each level of reproducibility).

The problem is that profiles are continuous and so are the errors, but
optimizations look for certain thresholds, so small errors may lead to
code changes.  I therefore think our current method of looking at
relatively few packages and patching errors when they appear is not a
very good long-term strategy, especially if it makes us drop useful
transformations by default with -fprofile-use and no additional option.

Honza
> 
> Martin
> 
> > 
> > Richard.
> > 
> > > Martin
> 


Re: [PATCH] Drop profile reproducibility flag as it is not used.

2021-01-21 Thread Jan Hubicka
> 
> Plus I'm planning to send one more patch that will ignore time profile when 
> -fprofile-reproduce != serial.

Why do you need to disable time profiling?

Honza


Re: [PATCH] ipa-inline: Improve growth accumulation for recursive calls

2021-01-21 Thread Jan Hubicka
> Hi All,
> 
> James and I have been investigating this regression and have tracked it down 
> to register allocation.
> 
> I have create a new PR with our findings 
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98782 but unfortunately
> we don't know how to proceed.
> 
> This does seem like a genuine bug in RA.  It looks like some magic threshold 
> has been crossed, but we're having
> trouble determining what this magic number is.

Thank you for the analysis - it was on my TODO list for a very long
time, but the function is large.  I will read it carefully, and let's
see if we can come up with something useful.

Honza
> 
> Any help is appreciated.
> 
> Thanks,
> Tamar
> 
> > -Original Message-
> > From: Xionghu Luo 
> > Sent: Friday, October 16, 2020 9:47 AM
> > To: Tamar Christina ; Martin Jambor
> > ; Richard Sandiford ;
> > luoxhu via Gcc-patches 
> > Cc: seg...@kernel.crashing.org; wschm...@linux.ibm.com;
> > li...@gcc.gnu.org; Jan Hubicka ; dje@gmail.com
> > Subject: Re: [PATCH] ipa-inline: Improve growth accumulation for recursive
> > calls
> > 
> > 
> > 
> > On 2020/9/12 01:36, Tamar Christina wrote:
> > > Hi Martin,
> > >
> > >>
> > >> can you please confirm that the difference between these two is all
> > >> due to the last option -fno-inline-functions-called-once ?  Is LTo
> > >> necessary?  I.e., can you run the benchmark also built with the
> > >> branch compiler and -mcpu=native -Ofast -fomit-frame-pointer -fno-
> > inline-functions-called-once ?
> > >>
> > >
> > > Done, see below.
> > >
> > >>> +----------+------------------------------------------------+------+
> > >>> | Branch   | -mcpu=native -Ofast -fomit-frame-pointer -flto | -24% |
> > >>> +----------+------------------------------------------------+------+
> > >>> | Branch   | -mcpu=native -Ofast -fomit-frame-pointer       | -26% |
> > >>> +----------+------------------------------------------------+------+
> > >>
> > >>>
> > >>> (Hopefully the table shows up correct)
> > >>
> > >> it does show OK for me, thanks.
> > >>
> > >>>
> > >>> It looks like your patch definitely does improve the basic cases. So
> > >>> there's not much difference between lto and non-lto anymore and it's
> > >> much Better than GCC 10. However it still contains the regression
> > >> introduced by Honza's changes.
> > >>
> > >> I assume these are rates, not times, so negative means bad.  But do I
> > >> understand it correctly that you're comparing against GCC 10 with the
> > >> two parameters set to rather special values?  Because your table
> > >> seems to indicate that even for you, the branch is faster than GCC 10
> > >> with just - mcpu=native -Ofast -fomit-frame-pointer.
> > >
> > > Yes these are indeed rates, and indeed I am comparing against the same
> > > options we used to get the fastest rates on before which is the two
> > > parameters and the inline flag.
> > >
> > >>
> > >> So is the problem that the best obtainable run-time, even with
> > >> obscure options, from the branch is slower than the best obtainable
> > >> run-time from GCC 10?
> > >>
> > >
> > > Yeah that's the problem, when we compare the two we're still behind.
> > >
> > > I've done the additional two runs
> > >
> > > +----------+----------------------------------------------+--------------+
> > > | Compiler | Flags                                        | diff GCC 10  |
> >

Re: [PATCH] Drop profile reproducibility flag as it is not used.

2021-01-21 Thread Jan Hubicka
> On 1/21/21 3:01 PM, Jan Hubicka wrote:
> > > 
> > > Plus I'm planning to send one more patch that will ignore time profile 
> > > when -fprofile-reproduce != serial.
> > 
> > Why you need to disable time profiling?
> 
> Because you can have 2 training runs (running in parallel) when order is:
> runA: foo -> bar
> runB: bar -> foo
> 
> Then based on order of profile merging you get a final output.

For this reason we merge by computing the average, which is stable under
reordering of the indices.

Honza
> 
> I would like to address it with the attached patch.
> 
> Martin
> 
> > 
> > Honza
> > 
> 

> From fb4bc6f4b4b106d38fbf710f87e128d26fc1b988 Mon Sep 17 00:00:00 2001
> From: Martin Liska 
> Date: Thu, 21 Jan 2021 09:22:45 +0100
> Subject: [PATCH 2/2] Consider time profilers only when
>  -fprofile-reproducible=serial.
> 
> gcc/ChangeLog:
> 
>   PR gcov-profile/98739
>   * cgraphunit.c (expand_all_functions): Consider tp_first_run
>   only when -fprofile-reproducible=serial.
> 
> gcc/lto/ChangeLog:
> 
>   PR gcov-profile/98739
>   * lto-partition.c (lto_balanced_map): Consider tp_first_run
>   only when -fprofile-reproducible=serial.
> ---
>  gcc/cgraphunit.c| 5 +++--
>  gcc/lto/lto-partition.c | 3 ++-
>  2 files changed, 5 insertions(+), 3 deletions(-)
> 
> diff --git a/gcc/cgraphunit.c b/gcc/cgraphunit.c
> index b401f0817a3..042c03d819e 100644
> --- a/gcc/cgraphunit.c
> +++ b/gcc/cgraphunit.c
> @@ -1961,8 +1961,9 @@ expand_all_functions (void)
>}
>  
>/* First output functions with time profile in specified order.  */
> -  qsort (tp_first_run_order, tp_first_run_order_pos,
> -  sizeof (cgraph_node *), tp_first_run_node_cmp);
> +  if (flag_profile_reproducible == PROFILE_REPRODUCIBILITY_SERIAL)
> +qsort (tp_first_run_order, tp_first_run_order_pos,
> +sizeof (cgraph_node *), tp_first_run_node_cmp);
>for (i = 0; i < tp_first_run_order_pos; i++)
>  {
>node = tp_first_run_order[i];
> diff --git a/gcc/lto/lto-partition.c b/gcc/lto/lto-partition.c
> index 15761ac9eb5..f9e632776e6 100644
> --- a/gcc/lto/lto-partition.c
> +++ b/gcc/lto/lto-partition.c
> @@ -509,7 +509,8 @@ lto_balanced_map (int n_lto_partitions, int 
> max_partition_size)
>   unit tends to import a lot of global trees defined there.  We should
>   get better about minimizing the function bounday, but until that
>   things works smoother if we order in source order.  */
> -  order.qsort (tp_first_run_node_cmp);
> +  if (flag_profile_reproducible == PROFILE_REPRODUCIBILITY_SERIAL)
> +order.qsort (tp_first_run_node_cmp);
>noreorder.qsort (node_cmp);
>  
>if (dump_file)
> -- 
> 2.30.0
> 



Re: [PATCH] Drop profile reproducibility flag as it is not used.

2021-01-21 Thread Jan Hubicka
> On 1/21/21 7:45 PM, Jan Hubicka wrote:
> > For this reason we merge by computing average, which is stable over
> > reordering the indices
> 
> Looking at the implementation, we merge by using minimum value:
> 
> /* Time profiles are merged so that minimum from all valid (greater than zero)
>is stored. There could be a fork that creates new counters. To have
>the profile stable, we chosen to pick the smallest function visit time.  */

Yep, sorry for the confusion.  I just noticed that as well.
Minimum should still be safe for parallel-run profiling (not for
multithreaded, where we probably really want to disable it, but we can
do that on a per-function basis using opt_for_fn so it works with LTO).

Honza
> void
> __gcov_merge_time_profile (gcov_type *counters, unsigned n_counters)
> {
>   unsigned int i;
>   gcov_type value;
> 
>   for (i = 0; i < n_counters; i++)
> {
>   value = gcov_get_counter_target ();
> 
>   if (value && (!counters[i] || value < counters[i]))
> counters[i] = value;
> }
> }
> #endif /* L_gcov_merge_time_profile */
> 
> Martin
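The order-independence property can be seen from a stripped-down model
of the merge step above.  `merge_time` is a hypothetical helper
mirroring the update in `__gcov_merge_time_profile` (0 meaning "no
value recorded yet"); since taking the minimum is commutative and
associative, merging the same runs in any order gives the same result:

```c
/* Model of one merge step of __gcov_merge_time_profile: keep the
   smallest non-zero first-run time.  Min is commutative and
   associative, so the merged result does not depend on the order in
   which the per-run profiles arrive.  */

long long
merge_time (long long current, long long incoming)
{
  if (incoming && (!current || incoming < current))
    return incoming;
  return current;
}
```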


Re: [PATCH] Drop profile reproducibility flag as it is not used.

2021-01-21 Thread Jan Hubicka
> diff --git a/gcc/cgraphunit.c b/gcc/cgraphunit.c
> index b401f0817a3..042c03d819e 100644
> --- a/gcc/cgraphunit.c
> +++ b/gcc/cgraphunit.c
> @@ -1961,8 +1961,9 @@ expand_all_functions (void)
>}
>  
>/* First output functions with time profile in specified order.  */
> -  qsort (tp_first_run_order, tp_first_run_order_pos,
> -  sizeof (cgraph_node *), tp_first_run_node_cmp);
> +  if (flag_profile_reproducible == PROFILE_REPRODUCIBILITY_SERIAL)
> +qsort (tp_first_run_order, tp_first_run_order_pos,
> +sizeof (cgraph_node *), tp_first_run_node_cmp);

This you need to check earlier in

  for (i = 0; i < order_pos; i++)   
if (order[i]->process)  
  { 
if (order[i]->tp_first_run  
&& opt_for_fn (order[i]->decl, flag_profile_reorder_functions)) 
^ here
and check only for REPRODUCIBILITY_MULTITHREADED.  We probably also want
to document this.

However, an easier fix is to simply clear tp_first_run at profile read
time when we do multithreaded reproducibility, instead of attaching it
and ignoring it later.  This will make both places you modified do the
right thing.

Honza


>for (i = 0; i < tp_first_run_order_pos; i++)
>  {
>node = tp_first_run_order[i];
> diff --git a/gcc/lto/lto-partition.c b/gcc/lto/lto-partition.c
> index 15761ac9eb5..f9e632776e6 100644
> --- a/gcc/lto/lto-partition.c
> +++ b/gcc/lto/lto-partition.c
> @@ -509,7 +509,8 @@ lto_balanced_map (int n_lto_partitions, int 
> max_partition_size)
>   unit tends to import a lot of global trees defined there.  We should
>   get better about minimizing the function bounday, but until that
>   things works smoother if we order in source order.  */
> -  order.qsort (tp_first_run_node_cmp);
> +  if (flag_profile_reproducible == PROFILE_REPRODUCIBILITY_SERIAL)
> +order.qsort (tp_first_run_node_cmp);
>noreorder.qsort (node_cmp);
>  
>if (dump_file)
> -- 
> 2.30.0
> 



Re: [PATCH] Drop profile reproducibility flag as it is not used.

2021-01-21 Thread Jan Hubicka
> > This will become more common problem for multithreaded profiles where
> > one needs to annotate locking and busy waiting loops in them for example
> > (or the scheduler responsible for executing paralle tasks).
> > 
> > I can see this to be practically achievable but we probably want to
> > produce some guidelines for doing that and probably teach gcov-tool to
> > compare profiles and say to which degree they match (i.e. which
> > functions match for each of levels of reproducibility).
> > 
> > The problem is that profiles are continuous and the errors too, but
> > optimizaitons looks for certain thresholds, so small errors may lead to
> > code changes, so I think our current method of looking at relatively few
> > packages and patching errors when they appear is not very good long term
> > strategy... Especially if it makes us to drop useful transformations by
> > default with -fprofile-use and no additional option.
> 
> To be honest we have very few packages that use PGO in openSUSE:Factory.

We should aim to have more :)
> 
> Anyway, are you fine with the suggested?

What exactly is suggested?
Honza
> 
> Thanks for discussion,
> Martin


Re: [PATCH] gimple UIDs, LTO and -fanalyzer [PR98599]

2021-01-21 Thread Jan Hubicka
> On Thu, 2021-01-14 at 15:00 +0100, Jan Hubicka wrote:
> > > On Wed, Jan 13, 2021 at 11:04 PM David Malcolm via Gcc-patches
> > >  wrote:
> > > > gimple.h has this comment for gimple's uid field:
> > > > 
> > > >   /* UID of this statement.  This is used by passes that want to
> > > >  assign IDs to statements.  It must be assigned and used by
> > > > each
> > > >  pass.  By default it should be assumed to contain
> > > > garbage.  */
> > > >   unsigned uid;
> > > > 
> > > > and gimple_set_uid has:
> > > > 
> > > >Please note that this UID property is supposed to be undefined
> > > > at
> > > >pass boundaries.  This means that a given pass should not
> > > > assume it
> > > >contains any useful value when the pass starts and thus can
> > > > set it
> > > >to any value it sees fit.
> > > > 
> > > > which suggests that any pass can use the uid field as an
> > > > arbitrary
> > > > scratch space.
> > > > 
> > > > PR analyzer/98599 reports a case where this error occurs in LTO
> > > > mode:
> > > >   fatal error: Cgraph edge statement index out of range
> > > > on certain inputs with -fanalyzer.
> > > > 
> > > > The error occurs in the LTRANS phase after -fanalyzer runs in the
> > > > WPA phase.  The analyzer pass writes to the uid fields of all
> > > > stmts.
> > > > 
> > > > The error occurs when LTRANS is streaming callgraph edges back
> > > > in.
> > > > If I'm reading things correctly, the LTO format uses stmt uids to
> > > > associate call stmts with callgraph edges between WPA and LTRANS.
> > > > For example, in lto-cgraph.c, lto_output_edge writes out the
> > > > gimple_uid, and input_edge reads it back in.
> > > > 
> > > > Hence IPA passes that touch the uids in WPA need to restore them,
> > > > or the stream-in at LTRANS will fail.
> > > > 
> > > > Is it intended that the LTO machinery relies on the value of the
> > > > uid
> > > > field being preserved during WPA (or, at least, needs to be saved
> > > > and
> > > > restored by passes that touch it)?
> > > 
> > > I belive this is solely at the cgraph stream out to stream in
> > > boundary but
> > > this may be a blurred area since while we materialize the whole
> > > cgraph
> > > at once the function bodies are streamed in on demand.
> > > 
> > > Honza can probably clarify things.
> > 
> > Well, the uids are used to associate cgraph edges with
> > statements.  At
> > WPA stage you do not have function bodies and thus uids serves role
> > of
> > pointers to the statement.  If you load the body in (via get_body)
> > the
> > uids are replaced by pointers and when you stream out uids are
> > recomputed again.
> > 
> > When do you touch the uids? At WPA time or from small IPA pass in
> > ltrans?
> 
> The analyzer is here in passes.def:
>   INSERT_PASSES_AFTER (all_regular_ipa_passes)
>   NEXT_PASS (pass_analyzer);
> 
> and so in LTO runs as the first regular IPA pass at WPA time,
> when do_whole_program_analysis calls:
>   execute_ipa_pass_list (g->get_passes ()->all_regular_ipa_passes);
> 
> FWIW I hope to eventually have a way to summarize function bodies in
> the analyzer, but I don't yet, so I'm currently brute forcing things by
> loading all function bodies at the start of the analyzer (when
> -fanalyzer is enabled).
> 
> I wonder if that's messing things up somehow?

Actually I think it should work.  If you do get_body or
get_untransformed_body (which will be equal at that time) you ought to
get the uids in the symtab data structure rewritten to pointers, and at
stream-out time we should assign new uids...
> 
> Does the stream-out from WPA make any assumptions about the stmt uids?
> For example, 
>   #define STMT_UID_NOT_IN_RANGE(uid) \
> (gimple_stmt_max_uid (fn) < uid || uid == 0)
> seems to assume that the UIDs are per-function ranges from
>   [0-gimple_stmt_max_uid (fn)]
> which isn't the case for the uids set by the analyzer.  Maybe that's
> the issue here?
> 
> Sorry for not being more familiar with the IPA/LTO code

There is lto_prepare_function_for_streaming which assigns uids to be
incremental.  So I guess the problem is that it is not called at WPA
time if the function is in memory (since at m

Re: [PATCH] Drop profile reproducibility flag as it is not used.

2021-01-21 Thread Jan Hubicka
> On 1/21/21 8:03 PM, Jan Hubicka wrote:
> > What exactly is suggested?
> 
> This one.
> 
> Martin

> From 22bbf5342f2b73fad6c0286541bba6699c617380 Mon Sep 17 00:00:00 2001
> From: Martin Liska 
> Date: Thu, 21 Jan 2021 09:02:31 +0100
> Subject: [PATCH 1/2] Restore -fprofile-reproducibility flag.
> 
> gcc/ChangeLog:
> 
>   PR gcov-profile/98739
>   * common.opt: Add missing equal symbol.
>   * value-prof.c (get_nth_most_common_value): Fix comment
>   and skip TOP N counter properly when -fprofile-reproducibility
>   is not serial.
> ---
>  gcc/common.opt   |  2 +-
>  gcc/value-prof.c | 18 --
>  2 files changed, 9 insertions(+), 11 deletions(-)
> 
>  bool
>  get_nth_most_common_value (gimple *stmt, const char *counter_type,
> @@ -765,15 +762,16 @@ get_nth_most_common_value (gimple *stmt, const char 
> *counter_type,
>*count = 0;
>*value = 0;
>  
> -  gcov_type read_all = abs_hwi (hist->hvalue.counters[0]);
> +  gcov_type read_all = hist->hvalue.counters[0];
> +  gcov_type covered = 0;
> +  for (unsigned i = 0; i < counters; ++i)
> +covered += hist->hvalue.counters[2 * i + 3];
>  
>gcov_type v = hist->hvalue.counters[2 * n + 2];
>gcov_type c = hist->hvalue.counters[2 * n + 3];
>  
> -  if (hist->hvalue.counters[0] < 0
> -  && (flag_profile_reproducible == PROFILE_REPRODUCIBILITY_PARALLEL_RUNS
> -   || (flag_profile_reproducible
> -   == PROFILE_REPRODUCIBILITY_MULTITHREADED)))
> +  if (read_all != covered
> +  && flag_profile_reproducible != PROFILE_REPRODUCIBILITY_SERIAL)

This should be right for REPRODUCIBILITY_MULTITHREADED but is too strict
for PARALLEL_RUNS (and I think we now have data that this difference
matters).  If you
 1) re-add the logic that avoids streaming targets with no more than
 1/32 of the overall execution count from each run
 (we may want to have a way to tweak the threshold, but I guess we may
 want to first see if it is necessary, since it is easy to add and we
 already have a bit too many environment variables)
 2) re-add the logic tracking whether any values were lost during
 merging, using the sign of the first counter
 3) make PARALLEL_RUNS disregard the profile if the first counter is
 negative
we should be able to track most cases where the number of values exceeds
32 but one or two really dominate.
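Points 2) and 3) rely on the sign-bit convention mentioned earlier in
the thread.  A minimal sketch of that convention (the helper names are
made up for illustration and do not exist in libgcov) could look like:

```c
/* Sketch of the sign-bit trick: counters[0] stores the total execution
   count, and its sign is flipped when a merge had to drop values, so a
   reader can both detect lossy data and still recover the total.  */
#include <stdlib.h>   /* llabs */

long long
mark_lossy (long long counter0)
{
  return -llabs (counter0);   /* record that values were lost */
}

int
is_lossy (long long counter0)
{
  return counter0 < 0;
}

long long
total_executions (long long counter0)
{
  return llabs (counter0);    /* total is valid either way */
}
```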

Also I think it makes sense to default to =serial and use the new flag
in the few packages where we do profile feedback and care about
reproducibility.

Thanks a lot for looking into this!
Honza
>  return false;
>  
>/* Indirect calls can't be verified.  */
> -- 
> 2.30.0
> 



Re: [PATCH] Drop profile reproducibility flag as it is not used.

2021-01-21 Thread Jan Hubicka
> On 1/21/21 2:46 PM, Jan Hubicka wrote:
> > I think easy way to get users of this option is to make profile not
> > reproducible by default and modify packages to use right reproducibility
> > option when reproducible builds are intended.  It is not feature that
> > comes for free and I think most users of PGO does not care, so I think
> > it should be opt in.
> 
> I agree that most users really don't care.
> 
> > 
> > In general getting profile reroducible one needs to make train
> > reproducible that is hard when you look at details (such as/tmp/  file
> > name generation issue in gcc) and may lead to need for user to annotate
> > such code.
> 
> Yes, right now I'm testing both patches and I still see difference in GCC PGO
> bootstrap (with -fprofile-reproducible=parallel-runs):
> 
> $ objfolderdiff.py /dev/shm/objdir2/gcc /dev/shm/objdir3/gcc
>138/   649: cgraphunit.o: different
>230/   649: dwarf2out.o: different
>356/   649: ipa-cp.o: different
>357/   649: ipa-devirt.o: different
>360/   649: ipa-icf.o: different
>533/   649: tree-affine.o: different
>574/   649: tree-ssa-loop-im.o: different
>632/   649: var-tracking.o: different
> 
> Most of the changes are in known contexts:
> 
> ;; Function hash_table::hash_entry, false, 
> xcallocator>::find_with_hash 
> (_ZN10hash_tableIN8hash_mapIP10im_mem_refP6sm_aux21simple_hashmap_traitsI19default_hash_traitsIS2_ES4_EE10hash_entryELb0E11xcallocatorE14find_with_hashERKS2_j,
>  funcdef_no=3431, decl_uid=106116, cgraph_uid=2554, symbol_order=2712)
> 
> ;; Function hash_table, false, 
> xcallocator>::find_empty_slot_for_expand 
> (_ZN10hash_tableI19default_hash_traitsIPvELb0E11xcallocatorE26find_empty_slot_for_expandEj,
>  funcdef_no=4875, decl_uid=134656, cgraph_uid=3901, symbol_order=4074)
> 
> ;; Function hash_table_mod2 (_Z15hash_table_mod2jj, funcdef_no=1047, 
> decl_uid=32691, cgraph_uid=345, symbol_order=356)
> 
> ;; Function hash_table::hash_entry, 
> false, xcallocator>::find_empty_slot_for_expand 
> (_ZN10hash_tableIN8hash_mapIP9tree_nodeP14name_expansion21simple_hashmap_traitsI19default_hash_traitsIS2_ES4_EE10hash_entryELb0E11xcallocatorE26find_empty_slot_for_expandEj,
>  funcdef_no=3639, decl_uid=102478, cgraph_uid=2760, symbol_order=2916)
> 
> So likely hashing related functions where we hash some pointers :/ 
> Unfortunately, that's enough for final binary
> divergence.

Yep, this is really annoying (and it shows how hard full reproducibility
is).  I think we could try to disable profiling for the hash functions
and see if we can get reproducibility right for GCC...

Honza
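One possible mechanism for that (assuming the
no_profile_instrument_function attribute, available since GCC 7, is
acceptable here) is to mark the offending hash routines so they are
skipped by -fprofile-generate.  The function below is a made-up
stand-in for the real hashers, purely for illustration:

```c
#include <stdint.h>

/* Hypothetical example: exclude a pointer-hashing routine from profile
   instrumentation so its pointer-dependent counters cannot make the
   final binary diverge between otherwise identical builds.  */
__attribute__ ((no_profile_instrument_function))
unsigned
hash_pointer (const void *p)
{
  return (unsigned) ((uintptr_t) p >> 3);
}
```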
> 
> > 
> > This will become more common problem for multithreaded profiles where
> > one needs to annotate locking and busy waiting loops in them for example
> > (or the scheduler responsible for executing paralle tasks).
> > 
> > I can see this to be practically achievable but we probably want to
> > produce some guidelines for doing that and probably teach gcov-tool to
> > compare profiles and say to which degree they match (i.e. which
> > functions match for each of levels of reproducibility).
> > 
> > The problem is that profiles are continuous and the errors too, but
> > optimizaitons looks for certain thresholds, so small errors may lead to
> > code changes, so I think our current method of looking at relatively few
> > packages and patching errors when they appear is not very good long term
> > strategy... Especially if it makes us to drop useful transformations by
> > default with -fprofile-use and no additional option.
> 
> To be honest we have very few packages that use PGO in openSUSE:Factory.
> 
> Anyway, are you fine with the suggested?
> 
> Thanks for discussion,
> Martin


Re: [PATCH] Drop profile reproducibility flag as it is not used.

2021-01-22 Thread Jan Hubicka
> On 1/21/21 8:13 PM, Martin Liška wrote:
> > Yes, it will be a better place!
> > 
> > Martin
> 
> There's an updated version of the patch.
> 
> Thoughts?
> Thanks,
> Martin

> From 0be300d1d69e98624f7be5b54931126965f1436e Mon Sep 17 00:00:00 2001
> From: Martin Liska 
> Date: Fri, 22 Jan 2021 14:00:30 +0100
> Subject: [PATCH] Drop time profile for multi-threaded training run.
> 
> gcc/ChangeLog:
> 
>   PR gcov-profile/98739
>   * profile.c (compute_value_histograms): Drop time profile for
>   -fprofile-reproducible=multithreaded.

This is OK.  To save future debugging, perhaps I would keep the code
printing the tp_first_run value to the dump file and do
  fprintf (dump_file, "Read tp_first_run: %d; ignored because profile"
	   " reproducibility is multithreaded\n", node->tp_first_run);
In a few years we may forget about this logic and wonder why it does not
work...

Thanks,
Honza
> ---
>  gcc/profile.c | 5 -
>  1 file changed, 4 insertions(+), 1 deletion(-)
> 
> diff --git a/gcc/profile.c b/gcc/profile.c
> index 1f1d60c8180..010c5627c89 100644
> --- a/gcc/profile.c
> +++ b/gcc/profile.c
> @@ -897,7 +897,10 @@ compute_value_histograms (histogram_values values, 
> unsigned cfg_checksum,
> node->tp_first_run = 0;
>   }
>  
> -  if (dump_file)
> +   /* Drop profile for -fprofile-reproducible=multithreaded.  */
> +   if (flag_profile_reproducible == 
> PROFILE_REPRODUCIBILITY_MULTITHREADED)
> + node->tp_first_run = 0;
> +   else if (dump_file)
>  fprintf (dump_file, "Read tp_first_run: %d\n", 
> node->tp_first_run);
>  }
>  }
> -- 
> 2.30.0
> 



Re: [PATCH] gcov: use mmap pools for KVP.

2021-01-22 Thread Jan Hubicka
> Hello.
> 
> AS mentioned here, https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97461#c25, I 
> like
> what Richard suggested. So instead of usage of malloc, we should use mmap 
> memory
> chunks that serve as a memory pool for struct gcov_kvp.
> 
> Malloc is used as a fallback when mmap is not available. I also drop 
> statically
> pre-allocated static pool, mmap solves the root problem.
> 
> Patch can bootstrap on x86_64-linux-gnu and survives regression tests.
> 
> Ready to be installed?

This looks like a reasonable solution for Linux (I was thinking of it
too), but I wonder about setups without mmap support, like mingw32?
I think we need some fallback there.  I was wondering if simply
disabling top-N profiling until gcov_init time (where we seem to assume
that malloc already works) would work in that case.
We may lose some speculation during program construction, but that does
not seem very bad...

Honza
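The fallback suggested above could be as simple as a guard flag that
keeps top-N updates disabled until gcov_init has run.  This is a sketch
with made-up names, not the actual libgcov code:

```c
/* Hypothetical guard: before gcov_init runs (and malloc is known to
   work), top-N updates are simply dropped, losing some speculation
   during program construction as noted above.  */

int __gcov_topn_enabled;   /* would be set to 1 from gcov_init */

void
topn_counter_add (long long *counter, long long value)
{
  if (!__gcov_topn_enabled)
    return;                /* drop early updates instead of allocating */
  *counter += value;       /* stand-in for the real top-N list update */
}
```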
> Thanks,
> Martin
> 
> gcc/ChangeLog:
> 
>   PR gcov-profile/97461
>   * gcov-io.h (GCOV_PREALLOCATED_KVP): Remove.
> 
> libgcc/ChangeLog:
> 
>   PR gcov-profile/97461
>   * config.in: Regenerate.
>   * configure: Likewise.
>   * configure.ac: Check sys/mman.h header file
>   * libgcov-driver.c (struct gcov_kvp): Remove static
>   pre-allocated pool and use a dynamic one.
>   * libgcov.h (MMAP_CHUNK_SIZE): New.
>   (gcov_counter_add): Use mmap to allocate pool for struct
>   gcov_kvp.
> ---
>  gcc/gcov-io.h   |  3 ---
>  libgcc/config.in|  3 +++
>  libgcc/configure|  4 ++--
>  libgcc/configure.ac |  2 +-
>  libgcc/libgcov-driver.c | 11 +++
>  libgcc/libgcov.h| 42 -
>  6 files changed, 46 insertions(+), 19 deletions(-)
> 
> diff --git a/gcc/gcov-io.h b/gcc/gcov-io.h
> index baed67609e2..75f16a274c7 100644
> --- a/gcc/gcov-io.h
> +++ b/gcc/gcov-io.h
> @@ -292,9 +292,6 @@ GCOV_COUNTERS
>  /* Maximum number of tracked TOP N value profiles.  */
>  #define GCOV_TOPN_MAXIMUM_TRACKED_VALUES 32
> -/* Number of pre-allocated gcov_kvp structures.  */
> -#define GCOV_PREALLOCATED_KVP 64
> -
>  /* Convert a counter index to a tag.  */
>  #define GCOV_TAG_FOR_COUNTER(COUNT)  \
>   (GCOV_TAG_COUNTER_BASE + ((gcov_unsigned_t)(COUNT) << 17))
> diff --git a/libgcc/config.in b/libgcc/config.in
> index 5be5321d258..f93c64a00c3 100644
> --- a/libgcc/config.in
> +++ b/libgcc/config.in
> @@ -49,6 +49,9 @@
>  /* Define to 1 if you have the  header file. */
>  #undef HAVE_SYS_AUXV_H
> +/* Define to 1 if you have the  header file. */
> +#undef HAVE_SYS_MMAN_H
> +
>  /* Define to 1 if you have the  header file. */
>  #undef HAVE_SYS_STAT_H
> diff --git a/libgcc/configure b/libgcc/configure
> index 78fc22a5784..dd3afb2c957 100755
> --- a/libgcc/configure
> +++ b/libgcc/configure
> @@ -4458,7 +4458,7 @@ as_fn_arith $ac_cv_sizeof_long_double \* 8 && 
> long_double_type_size=$as_val
>  for ac_header in inttypes.h stdint.h stdlib.h ftw.h \
>   unistd.h sys/stat.h sys/types.h \
> - string.h strings.h memory.h sys/auxv.h
> + string.h strings.h memory.h sys/auxv.h sys/mman.h
>  do :
>as_ac_Header=`$as_echo "ac_cv_header_$ac_header" | $as_tr_sh`
>  ac_fn_c_check_header_preproc "$LINENO" "$ac_header" "$as_ac_Header"
> @@ -4913,7 +4913,7 @@ case "$host" in
>  case "$enable_cet" in
>auto)
>   # Check if target supports multi-byte NOPs
> - # and if assembler supports CET insn.
> + # and if compiler and assembler support CET insn.
>   cet_save_CFLAGS="$CFLAGS"
>   CFLAGS="$CFLAGS -fcf-protection"
>   cat confdefs.h - <<_ACEOF >conftest.$ac_ext
> diff --git a/libgcc/configure.ac b/libgcc/configure.ac
> index ed50c0e9b49..10ffb046415 100644
> --- a/libgcc/configure.ac
> +++ b/libgcc/configure.ac
> @@ -224,7 +224,7 @@ AC_SUBST(long_double_type_size)
>  AC_CHECK_HEADERS(inttypes.h stdint.h stdlib.h ftw.h \
>   unistd.h sys/stat.h sys/types.h \
> - string.h strings.h memory.h sys/auxv.h)
> + string.h strings.h memory.h sys/auxv.h sys/mman.h)
>  AC_HEADER_STDC
>  # Check for decimal float support.
> diff --git a/libgcc/libgcov-driver.c b/libgcc/libgcov-driver.c
> index e474e032b54..91462350132 100644
> --- a/libgcc/libgcov-driver.c
> +++ b/libgcc/libgcov-driver.c
> @@ -588,11 +588,14 @@ struct gcov_root __gcov_root;
>  struct gcov_master __gcov_master =
>{GCOV_VERSION, 0};
> -/* Pool of pre-allocated gcov_kvp strutures.  */
> -struct gcov_kvp __gcov_kvp_pool[GCOV_PREALLOCATED_KVP];
> +/* Dynamic pool for gcov_kvp structures.  */
> +struct gcov_kvp *__gcov_kvp_dynamic_pool;
> -/* Index to first free gcov_kvp in the pool.  */
> -unsigned __gcov_kvp_pool_index;
> +/* Index into __gcov_kvp_dynamic_pool array.  */
> +unsigned __gcov_kvp_dynamic_pool_index;
> +
> +/* Size of _gcov_kvp_dynamic_pool array.  */
> +unsigned __gcov_kvp_dynamic_pool_size;
>  void
>  __gcov_exit (void)
> diff --git a/libgcc/libgcov.h b/libgcc/libgcov.h
> index

Re: [PATCH] Drop profile reproducibility flag as it is not used.

2021-01-22 Thread Jan Hubicka
> diff --git a/Makefile.in b/Makefile.in
> index 247cb9c8711..03785200dc7 100644
> --- a/Makefile.in
> +++ b/Makefile.in
> @@ -565,7 +565,7 @@ STAGEprofile_TFLAGS = $(STAGE2_TFLAGS)
>  STAGEtrain_CFLAGS = $(filter-out -fchecking=1,$(STAGE3_CFLAGS))
>  STAGEtrain_TFLAGS = $(filter-out -fchecking=1,$(STAGE3_TFLAGS))
>  
> -STAGEfeedback_CFLAGS = $(STAGE4_CFLAGS) -fprofile-use
> +STAGEfeedback_CFLAGS = $(STAGE4_CFLAGS) -fprofile-use 
> -fprofile-reproducible=parallel-runs

I would make this go in separately from the feature itself (it is a
build machinery change), especially since you say it does not reach
reproducibility anyway until we patch the hash tables?
>  STAGEfeedback_TFLAGS = $(STAGE4_TFLAGS)
>  
>  STAGEautoprofile_CFLAGS = $(STAGE2_CFLAGS) -g
> diff --git a/gcc/common.opt b/gcc/common.opt
> index bde1711870d..a8a2b67a99d 100644
> --- a/gcc/common.opt
> +++ b/gcc/common.opt
> @@ -2248,7 +2248,7 @@ Enum(profile_reproducibility) String(parallel-runs) 
> Value(PROFILE_REPRODUCIBILIT
>  EnumValue
>  Enum(profile_reproducibility) String(multithreaded) 
> Value(PROFILE_REPRODUCIBILITY_MULTITHREADED)
>  
> -fprofile-reproducible
> +fprofile-reproducible=
>  Common Joined RejectNegative Var(flag_profile_reproducible) 
> Enum(profile_reproducibility) Init(PROFILE_REPRODUCIBILITY_SERIAL)
>  -fprofile-reproducible=[serial|parallel-runs|multithreaded]  Control level 
> of reproducibility of profile gathered by -fprofile-generate.
>  
> diff --git a/gcc/value-prof.c b/gcc/value-prof.c
> index 4c916f4994f..8078a9569d7 100644
> --- a/gcc/value-prof.c
> +++ b/gcc/value-prof.c
> @@ -747,8 +747,8 @@ gimple_divmod_fixed_value (gassign *stmt, tree value, 
> profile_probability prob,
>  
> abs (counters[0]) is the number of executions
> for i in 0 ... TOPN-1
> - counters[2 * i + 1] is target
> - abs (counters[2 * i + 2]) is corresponding hitrate counter.
> + counters[2 * i + 2] is target
> + counters[2 * i + 3] is corresponding hitrate counter.
>  
> Value of counters[0] negative when counter became
> full during merging and some values are lost.  */
> @@ -766,14 +766,18 @@ get_nth_most_common_value (gimple *stmt, const char 
> *counter_type,
>*value = 0;
>  
>gcov_type read_all = abs_hwi (hist->hvalue.counters[0]);
> +  gcov_type covered = 0;
> +  for (unsigned i = 0; i < counters; ++i)
> +covered += hist->hvalue.counters[2 * i + 3];
>  
>gcov_type v = hist->hvalue.counters[2 * n + 2];
>gcov_type c = hist->hvalue.counters[2 * n + 3];
>  
>if (hist->hvalue.counters[0] < 0
> -  && (flag_profile_reproducible == PROFILE_REPRODUCIBILITY_PARALLEL_RUNS
> -   || (flag_profile_reproducible
> -   == PROFILE_REPRODUCIBILITY_MULTITHREADED)))
> +  && flag_profile_reproducible == PROFILE_REPRODUCIBILITY_PARALLEL_RUNS)
> +return false;
> +  else if (covered != read_all
> +&& flag_profile_reproducible == 
> PROFILE_REPRODUCIBILITY_MULTITHREADED)
>  return false;

I think it would be useful to add dump messages here for easier
debugging.  Perhaps we could even have some kind of user-visible
missed-optimization warning?
>  
>/* Indirect calls can't be verified.  */
> diff --git a/libgcc/libgcov-driver.c b/libgcc/libgcov-driver.c
> index e474e032b54..55df1bafa79 100644
> --- a/libgcc/libgcov-driver.c
> +++ b/libgcc/libgcov-driver.c
> @@ -213,6 +213,63 @@ static struct gcov_fn_buffer *fn_buffer;
>  /* Including system dependent components. */
>  #include "libgcov-driver-system.c"
>  
> +/* Prune TOP N value COUNTERS.  It's needed in order to preserve
> +   reproducibility of builds.  */
> +
> +static void
> +prune_topn_counter (gcov_type *counters)
> +{
> +  gcov_type total = counters[0];
> +  gcov_type threshold = total / GCOV_TOPN_MAXIMUM_TRACKED_VALUES;
> +  gcov_type *nodes = &counters[1];
> +
> +  struct gcov_kvp **prev = (struct gcov_kvp **)(intptr_t)&counters[2];
> +
> +  for (struct gcov_kvp *node = *prev; node != NULL; node = node->next)
> +/* Count is small in this run, skip it.  */
> +{
> +  if (node->count < threshold)
> + {
> +   --(*nodes);
> +   /* Skip the node from linked list.  */
> +   *prev = node->next;
> + }
> +  else
> + prev = &node->next;
> +}
> +}

I remember we had issues with streaming running in parallel with
threads.  Can't we get corruption here without atomic updates of the
node count and the next pointer?

I also remember that these parallel updates were pretty bad, because if
you have multithreaded concurrent updates of a very frequent indirect
branch, like in Firefox, the likelihood that an update happens between
pruning and the quite slow stream-out is high.

One option would be to do the skipping non-destructively while streaming
out.  Another option would be to have a global flag and turn off topn
profile collection while streaming is in progress.
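For illustration, the first option could look roughly like this.  This is a
simplified sketch, not libgcov code: `struct kvp` stands in for the real
`gcov_kvp` node and the `emit` callback stands in for the stream-out routine;
only the skip-without-unlinking idea is the point.

```c
#include <stddef.h>

/* Simplified stand-in for libgcov's gcov_kvp linked-list node.  */
struct kvp
{
  long value;
  long count;
  struct kvp *next;
};

/* Stream out only nodes with count >= THRESHOLD, leaving the list
   itself untouched, so a concurrent updater never observes a
   half-unlinked node.  Returns the number of nodes emitted.  */
static int
stream_topn_nondestructive (const struct kvp *head, long threshold,
                            void (*emit) (long value, long count))
{
  int emitted = 0;
  for (const struct kvp *node = head; node != NULL; node = node->next)
    if (node->count >= threshold)
      {
        emit (node->value, node->count);
        ++emitted;
      }
  return emitted;
}
```

The trade-off is that below-threshold nodes keep occupying pool memory until
the next destructive prune, but stream-out itself needs no writes to the
shared structure.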
> diff --git a/libgcc/libgcov-merge.c b/libgcc/libgcov-merge.c
> index 9306e8d688c..35936a8364b 100644
> --- a/libgcc/libgcov-merge.c
> +++ b/libgcc/l

Re: [PATCH] gcov: use mmap pools for KVP.

2021-01-22 Thread Jan Hubicka
> On 1/22/21 2:38 PM, Jan Hubicka wrote:
> > This looks like reasonable solution for Linux (i was thinking of it too)
> > but I wonder what about setups w/o mmap support, like mingw32?
> 
> The code still uses malloc approach then.
> 
> > I think we need some fallback there.  I was wondering if simply
> > disabling topn profiling until gcov_init time (where we seems to assume
> > that malloc already works) would work in that case.
> > We may lose some speculation during program construction, but that does
> > not seem very bad...
> 
> This does not help you as we may still potentially call malloc during context
> when alternative allocator locks malloc/free functions.
> 
> Note that situation is very rare and assuming mmap seems to me a reasonable.

I definitely like using mmap since it should be quite robust on POSIX
platforms.  However, we essentially make it impossible to build Firefox
with profile feedback on Windows with jemalloc profiled, for example?

I am not sure how deeply we care, and I am sure MinGW must have a way to
implement memory allocation by hand as well, so we can just be ready to
add more ifdefs for targets where this becomes an issue.
It would be nice to add some documentation of this, since the hang is very
non-obvious, but I do not know if we have a good place for it.

Honza


Re: [PATCH] Drop profile reproducibility flag as it is not used.

2021-01-22 Thread Jan Hubicka
> 
> I remember we had issues with streaming running in parallel with
> threads.  Can't we get here corruption without atomic updates of nndoes
> and the next pointer?
> 
> I also remember that these parlalel updates was pretty bad, because if
> you have multithreaded concurent update of very frequent indirect
> branch, like in firefox, the likelyness that update happens between
> pruning and quite slow stream-out is high.
> 
> One option would be to do the skipping non-destructivly while streaming
> out. Other option would be to have global flag and turn off topn profile
> collecting while streaming is in progress.

Actually this would only be an issue if we asked for a reproducible
profile, and in that case all this logic is useless, so I suppose
parallel updates are not that much of an issue.

So I guess this is OK as long as we do not corrupt the data structure.

However, for -fprofile-reproducible=multithreaded the profile pruning
will disable optimization even in the case of indirect calls with fewer
than 32 targets.  I suppose this is something we can track incrementally
for GCC 12, but we should not forget it completely; perhaps by adding a
comment at the place where we drop the profile.

So I think we should
 1) add the dump message when the profile is dropped,
and add a TODO comment next to the conditional noting that pruning
interferes with multithreaded mode;
 2) ensure that the data structure is not corrupted.
And it should be good for GCC 11?
Honza
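The sign-bit scheme agreed on here can be illustrated in isolation.  The
helpers below mirror the idea (a negative total means some values were lost
during merging, and the consumer reads the magnitude with `abs`); the names
and the single `long` are illustrative, not the actual libgcov counter layout.

```c
/* Total counter whose sign doubles as a "values were lost" flag:
   the merger stores -total when the TOP-N table overflowed.  */

static void
mark_unreliable (long *total)
{
  if (*total > 0)
    *total = -*total;
}

/* What the consumer (gcc's value-prof code) does: take the magnitude.  */
static long
read_all (long total)
{
  return total < 0 ? -total : total;
}

/* Is the histogram still usable for reproducible-profile modes?  */
static int
reliable_p (long total)
{
  return total >= 0;
}
```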


Re: [PATCH] gcov: use mmap pools for KVP.

2021-01-22 Thread Jan Hubicka
> On Fri, Jan 22, 2021 at 2:42 PM Martin Liška  wrote:
> >
> > On 1/22/21 2:38 PM, Jan Hubicka wrote:
> > > This looks like reasonable solution for Linux (i was thinking of it too)
> > > but I wonder what about setups w/o mmap support, like mingw32?
> >
> > The code still uses malloc approach then.
> >
> > > I think we need some fallback there.  I was wondering if simply
> > > disabling topn profiling until gcov_init time (where we seems to assume
> > > that malloc already works) would work in that case.
> > > We may lose some speculation during program construction, but that does
> > > not seem very bad...
> >
> > This does not help you as we may still potentially call malloc during 
> > context
> > when alternative allocator locks malloc/free functions.
> >
> > Note that situation is very rare and assuming mmap seems to me a reasonable.
> 
> Btw, you may want to copy/split out the code from generic-morestack.c which
> has a comment
> 
> /* Some systems use LD_PRELOAD or similar tricks to add hooks to
>mmap/munmap.  That breaks this code, because when we call mmap
> ...
>   Try to avoid the
>problem by calling syscall directly.  We only do this on GNU/Linux
>for now, but it should be easy to add support for more systems with
>testing.  */
Fun, but I can imagine people doing that...
> 
> which suggests that we're going to run into the same issue as with malloc
> when people profile their mmap hook ...
> 
> Btw, I wonder if it's possible to bring back the original non-allocated 
> counter
> mode via some -fXYZ flag and using a different counter kind (so both
> allocation schemes can co-exist?).  On systems that cannot handle the
> mmap path we could default to the old scheme.

It is definitely doable (the gcov machinery is quite flexible with
respect to having more types of counters).  We could, however, also ask
users to either resort to -fno-profile-values or implement mmap-equivalent
ifdefs in libgcov if they really care about malloc profiling.

So personally I do not see this as a must-have feature (and I think Martin
was really looking forward to dropping the former topn profile
implementation :)

Another interesting case would be, of course, profiling of the kernel...

Honza



Re: [PATCH] gcov: use mmap pools for KVP.

2021-01-22 Thread Jan Hubicka
> > It is definitly doable (gcov machinery is quite flexible WRT having more
> > types of counters).
> 
> Yes, that would introduce back the dropped TOPN counters which I intentionally
> dropped.

We could bring back topn counters, or the easier dominating-value ones,
and add a command-line option.  However, perhaps better in the long term
would be to keep adding mmap ifdefs for targets where it is important...

Kernel folks seem to be getting profile feedback working with Clang, so we
should keep in mind the possibility of supporting that as well.
> 
> >  We could however also ask users to either resort to
> > -fno-profile-values or implement mmap equivalent ifdefs to libgcov if
> > they really care about malloc profiling.
> 
> Seems reasonable to me.
> 
> Well, the current approach makes some assumptions on mmap (and malloc), but
> they seem reasonable to me. I don't expect anybody building Mingw Firefox PGO 
> build,
> it's likely unsupported configuration.

It is possible to build Firefox with MinGW on Windows, and I would
expect profile feedback to work.
https://developer.mozilla.org/en-US/docs/Mozilla/Developer_guide/Build_Instructions/Compiling_Mozilla_With_Mingw
However, this is not only about Firefox; we notice problems with Firefox
because it is the only real-world app where we look at PGO systematically.
But we definitely should aim for PGO to be useful for apps of similar
complexity.  It would be nice to increase this testing coverage.

> 
> Another observation about the tcmalloc 'malloc' implementation. It hangs in a 
> PGO build,
> but libgcov would be happy if NULL would be returned.

Yep, I would expect other folks to try to PGO-optimize malloc
implementations, and we could run into a variety of issues...

Honza
> 
> Martin
> 
> > 
> > So personally I do not see this as a must feature (and I think Martin
> > was really looking forward to drop the former topn profile
> > implementation :)
> > 
> > Another intersting case would be, of course, profiling of kernel...
> > 
> > Honza
> > 
> 


Re: [PATCH] Drop profile reproducibility flag as it is not used.

2021-01-22 Thread Jan Hubicka
> > 
> > I would make this go in separately from the feature itself (it is build
> > machinery change).
> 
> All right.
> 
> > Especially since you say it does not reach
> > reproducibility anyway until we patch hashtables?
> 
> Yep, I'm testing a patch that should improve the reproducible build.
> 
Thanks!
> > I remember we had issues with streaming running in parallel with
> > threads.  Can't we get here corruption without atomic updates of nndoes
> > and the next pointer?
> 
> You are right, it can get into an inconsistent state in the pruning code.
> I likely tend to drop the pruning path now, it's optional feature.

Agreed, let's get the implementation working (i.e. the sign bit tracking
non-reproducible gcov merges and GCC consuming it correctly) and handle
pruning incrementally.

In the meantime I seem to have got the Firefox inlining working right in
GCC, so we can see how well the three settings work.
> 
> > 
> > I also remember that these parlalel updates was pretty bad, because if
> > you have multithreaded concurent update of very frequent indirect
> > branch, like in firefox, the likelyness that update happens between
> > pruning and quite slow stream-out is high.
> 
> Yes.

Yep, but as said in the other mail, it makes sense to care about profile
quality with parallel updates only if profile reproducibility is
multithreaded or serial.  I do not see the point of configuring it to
parallel-runs, since that buys you nothing.
> 
> Unfortunately, starting from this point we may set total to a negative value 
> (until
> it's streamed). But training run can still do total++ for each instrumented 
> value.

So this will have the effect of decreasing the counter rather than
increasing it.

One interesting data point is that the Firefox train run seems to spend more
time streaming than on other tasks (because LLVM training takes 7
minutes and GCC training circa 30 minutes), and if there is a parallel task
doing stuff like tab animation, it may have enough time to corrupt the
counter considerably.

Unless you see a very easy solution, I think we could handle this
incrementally as well.  With the basic infrastructure in place and a way to
trigger warnings on rejected profiles that would otherwise be useful, we
should get a better idea of how important this is.

Since we allocate the entries dynamically, perhaps we could steal the
bit somewhere other than in the actual counter to avoid concurrent-update
issues.

Honza
> 
> Martin
> 
> > 
> > Honza
> > 
> 
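The "steal the bit somewhere else" idea mentioned above could look like the
following sketch.  The struct and field names are made up for illustration
(the real `gcov_kvp`/counter layout differs); the point is that when the
lost-values flag lives in its own field, the hot `total++` path of the
training run and the merge-time flag update never race on the same word.

```c
/* TOP-N state where "values were lost" is a separate field instead of
   the sign bit of the total, so concurrent total increments cannot be
   misread as an overflow marker (and vice versa).  */
struct topn_state
{
  long total;   /* only ever incremented by the training run       */
  int lost;     /* only ever set by the (serialized) profile merge */
};

/* Called by the merger when the TOP-N table overflowed.  */
static void
merge_overflow (struct topn_state *s)
{
  s->lost = 1;          /* no sign games on TOTAL needed */
}

/* What the consumer would do: total stays non-negative, reliability
   is reported out of band.  */
static long
consumer_read_all (const struct topn_state *s, int *reliable)
{
  *reliable = !s->lost;
  return s->total;
}
```

The cost is one extra word per histogram; the benefit is that the counter
value itself never changes meaning mid-run.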


Re: [PATCH] Drop profile reproducibility flag as it is not used.

2021-01-22 Thread Jan Hubicka
> gcc/ChangeLog:
> 
>   PR gcov-profile/98739
>   * common.opt: Add missing sign symbol.
>   * value-prof.c (get_nth_most_common_value): Restore handling
>   of PROFILE_REPRODUCIBILITY_PARALLEL_RUNS and
>   PROFILE_REPRODUCIBILITY_MULTITHREADED.
> 
> libgcc/ChangeLog:
> 
>   PR gcov-profile/98739
>   * libgcov-merge.c (__gcov_merge_topn): Mark when merging
>   ends with a dropped counter.
>   * libgcov.h (gcov_topn_add_value): Add return value.
> ---
>  gcc/common.opt |  2 +-
>  gcc/value-prof.c   | 26 --
>  libgcc/libgcov-merge.c | 11 ---
>  libgcc/libgcov.h   | 13 +
>  4 files changed, 38 insertions(+), 14 deletions(-)
> 
> diff --git a/gcc/common.opt b/gcc/common.opt
> index bde1711870d..a8a2b67a99d 100644
> --- a/gcc/common.opt
> +++ b/gcc/common.opt
> @@ -2248,7 +2248,7 @@ Enum(profile_reproducibility) String(parallel-runs) 
> Value(PROFILE_REPRODUCIBILIT
>  EnumValue
>  Enum(profile_reproducibility) String(multithreaded) 
> Value(PROFILE_REPRODUCIBILITY_MULTITHREADED)
>  
> -fprofile-reproducible
> +fprofile-reproducible=
>  Common Joined RejectNegative Var(flag_profile_reproducible) 
> Enum(profile_reproducibility) Init(PROFILE_REPRODUCIBILITY_SERIAL)
>  -fprofile-reproducible=[serial|parallel-runs|multithreaded]  Control level 
> of reproducibility of profile gathered by -fprofile-generate.
>  
> diff --git a/gcc/value-prof.c b/gcc/value-prof.c
> index 4c916f4994f..3e899a39b84 100644
> --- a/gcc/value-prof.c
> +++ b/gcc/value-prof.c
> @@ -747,8 +747,8 @@ gimple_divmod_fixed_value (gassign *stmt, tree value, 
> profile_probability prob,
>  
> abs (counters[0]) is the number of executions
> for i in 0 ... TOPN-1
> - counters[2 * i + 1] is target
> - abs (counters[2 * i + 2]) is corresponding hitrate counter.
> + counters[2 * i + 2] is target
> + counters[2 * i + 3] is corresponding hitrate counter.
>  
> Value of counters[0] negative when counter became
> full during merging and some values are lost.  */
> @@ -766,15 +766,29 @@ get_nth_most_common_value (gimple *stmt, const char 
> *counter_type,
>*value = 0;
>  
>gcov_type read_all = abs_hwi (hist->hvalue.counters[0]);
> +  gcov_type covered = 0;
> +  for (unsigned i = 0; i < counters; ++i)
> +covered += hist->hvalue.counters[2 * i + 3];
>  
>gcov_type v = hist->hvalue.counters[2 * n + 2];
>gcov_type c = hist->hvalue.counters[2 * n + 3];
>  
>if (hist->hvalue.counters[0] < 0
> -  && (flag_profile_reproducible == PROFILE_REPRODUCIBILITY_PARALLEL_RUNS
> -   || (flag_profile_reproducible
> -   == PROFILE_REPRODUCIBILITY_MULTITHREADED)))
> -return false;
> +  && flag_profile_reproducible == PROFILE_REPRODUCIBILITY_PARALLEL_RUNS)
> +{
> +  if (dump_file)
> + fprintf (dump_file, "Histogram value dropped in %qs mode",
> +  "-fprofile-reproducible=parallel-runs");
> +  return false;
> +}
> +  else if (covered != read_all
> +&& flag_profile_reproducible == 
> PROFILE_REPRODUCIBILITY_MULTITHREADED)
> +{
> +  if (dump_file)
> + fprintf (dump_file, "Histogram value dropped in %qs mode",
> +  "-fprofile-reproducible=multithreaded");
> +  return false;
> +}

We can do that incrementally, but having opt-info=missed print a
warning when a profile is rejected, with info on whether it would be
useful and how frequent it is, would make it easy for me to track this
with Firefox builds.

It also seems a reasonable user-facing feature; we could mention it in
the documentation of the profile-reproducibility flag and tell users they
can look for warnings about disabled transformations.
>  
>/* Indirect calls can't be verified.  */
>if (stmt
> diff --git a/libgcc/libgcov-merge.c b/libgcc/libgcov-merge.c
> index 9306e8d688c..35936a8364b 100644
> --- a/libgcc/libgcov-merge.c
> +++ b/libgcc/libgcov-merge.c
> @@ -107,7 +107,9 @@ __gcov_merge_topn (gcov_type *counters, unsigned 
> n_counters)
>gcov_type all = gcov_get_counter_ignore_scaling (-1);
>gcov_type n = gcov_get_counter_ignore_scaling (-1);
>  
> -  counters[GCOV_TOPN_MEM_COUNTERS * i] += all;
> +  unsigned full = all < 0;
> +  gcov_type *total = &counters[GCOV_TOPN_MEM_COUNTERS * i];
> +  *total += full ? -all : all;
>  
>for (unsigned j = 0; j < n; j++)
>   {
> @@ -115,9 +117,12 @@ __gcov_merge_topn (gcov_type *counters, unsigned 
> n_counters)
> gcov_type count = gcov_get_counter_ignore_scaling (-1);
>  
> // TODO: we should use atomic here
Is the missing atomic possibly disastrous here?
> -   gcov_topn_add_value (counters + GCOV_TOPN_MEM_COUNTERS * i, value,
> -count, 0, 0);
> +   full |= gcov_topn_add_value (counters + GCOV_TOPN_MEM_COUNTERS * i,
> +value, count, 0, 0);
>   }
> +
> +  if (full)
> + *total = -(*total);

Please add comment somewhere in _

Re: [PATCH] Drop profile reproducibility flag as it is not used.

2021-01-25 Thread Jan Hubicka
> I've just installed the patch.
> 
> About the negative total value. Something similar can handle it:
> diff --git a/libgcc/libgcov.h b/libgcc/libgcov.h
> index df08e882dd7..ddc688509bd 100644
> --- a/libgcc/libgcov.h
> +++ b/libgcc/libgcov.h
> @@ -443,7 +443,13 @@ gcov_topn_add_value (gcov_type *counters, gcov_type 
> value, gcov_type count,
>int use_atomic, int increment_total)
>  {
>if (increment_total)
> -gcov_counter_add (&counters[0], 1, use_atomic);
> +{
> +  /* In the multi-threaded mode, we can have an already merged profile
> +  with a negative total value.  In that case, we should bail out.  */
> +  if (counters[0] < 0)
> + return 0;
> +  gcov_counter_add (&counters[0], 1, use_atomic);
> +}
>struct gcov_kvp *prev_node = NULL;
>struct gcov_kvp *minimal_node = NULL;
> 
> What do you think?

Looks good to me, modulo the obvious race condition of a concurrent update
between the check and the add, but that makes the window significantly
smaller than before (between merging and stream-out).  So I think it makes
sense to do it.

Thanks,
Honza
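If the remaining check-then-add window ever matters, it could be collapsed
into a single compare-and-swap loop.  A sketch with C11 atomics follows; this
is illustrative only, since the real libgcov goes through its own
`gcov_counter_add`-style wrappers rather than raw `<stdatomic.h>` calls.

```c
#include <stdatomic.h>

/* Increment TOTAL only while it is non-negative, in one atomic step,
   so a concurrent merge that flips the sign cannot slip in between the
   "is it negative?" check and the increment.  Returns 1 if the counter
   was incremented, 0 if it was already marked negative.  */
static int
total_add_if_nonnegative (_Atomic long *total)
{
  long old = atomic_load (total);
  while (old >= 0)
    /* On failure, OLD is reloaded with the current value, so the
       loop re-checks the sign before retrying.  */
    if (atomic_compare_exchange_weak (total, &old, old + 1))
      return 1;
  return 0;
}
```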


Re: [PATCH] Drop profile reproducibility flag as it is not used.

2021-01-25 Thread Jan Hubicka
> On 1/21/21 7:40 PM, Martin Liška wrote:
> > Most of the changes are in known contexts:
> 
> I've made some progress here, but still I'm unable to get a reproducible 
> build.
> Now I see difference in:
> 
>161/   649: cp/decl.o: different
>180/   649: cp/parser.o: different
>262/   649: generic-match.o: different
>283/   649: gimple-match.o: different
>343/   649: insn-emit.o: different
>360/   649: ipa-icf.o: different
> 
> looking at *profile dump files, I only have changes in tp_first_run which end
> with a different output.
> 
> Do you have any idea what can influence the minimum merging of tp_first_run 
> profiles?

What is the function with the smallest profile difference that differs?
Are there really no differences in edge counters at all?

I suppose with memory randomization we may end up with different hash
collisions and a different number of calls in the code resolving them, or
different leaders of equivalence classes, which may result in some
divergence; but I would expect this to also show up in edge counters...

Honza
> 
> Martin
> 

> diff --git a/Makefile.in b/Makefile.in
> index 03785200dc7..c8ebc90f622 100644
> --- a/Makefile.in
> +++ b/Makefile.in
> @@ -565,7 +565,7 @@ STAGEprofile_TFLAGS = $(STAGE2_TFLAGS)
>  STAGEtrain_CFLAGS = $(filter-out -fchecking=1,$(STAGE3_CFLAGS))
>  STAGEtrain_TFLAGS = $(filter-out -fchecking=1,$(STAGE3_TFLAGS))
>  
> -STAGEfeedback_CFLAGS = $(STAGE4_CFLAGS) -fprofile-use 
> -fprofile-reproducible=parallel-runs
> +STAGEfeedback_CFLAGS = $(STAGE4_CFLAGS) -fprofile-use 
> -fprofile-reproducible=parallel-runs -fdump-ipa-profile
>  STAGEfeedback_TFLAGS = $(STAGE4_TFLAGS)
>  
>  STAGEautoprofile_CFLAGS = $(STAGE2_CFLAGS) -g
> diff --git a/gcc/cp/decl.c b/gcc/cp/decl.c
> index 1a114a2e2d0..6aeeb7118e1 100644
> --- a/gcc/cp/decl.c
> +++ b/gcc/cp/decl.c
> @@ -3927,6 +3927,7 @@ struct typename_hasher : ggc_ptr_hash
>  
>static bool
>equal (tree t1, const typename_info *t2)
> +  __attribute__ ((no_profile_instrument_function))
>{
>  return (TYPE_IDENTIFIER (t1) == t2->name
>   && TYPE_CONTEXT (t1) == t2->scope
> diff --git a/gcc/cp/parser.c b/gcc/cp/parser.c
> index 4b2bca3fd11..3cd2130c3de 100644
> --- a/gcc/cp/parser.c
> +++ b/gcc/cp/parser.c
> @@ -1492,6 +1492,7 @@ static struct obstack declarator_obstack;
>  /* Alloc BYTES from the declarator memory pool.  */
>  
>  static inline void *
> +__attribute__ ((no_profile_instrument_function))
>  alloc_declarator (size_t bytes)
>  {
>return obstack_alloc (&declarator_obstack, bytes);
> @@ -23056,6 +23057,7 @@ cp_parser_late_return_type_opt (cp_parser* parser, 
> cp_declarator *declarator,
> unqualified-id.  */
>  
>  static tree
> +__attribute__ ((no_profile_instrument_function))
>  cp_parser_declarator_id (cp_parser* parser, bool optional_p)
>  {
>tree id;
> diff --git a/gcc/hash-table.h b/gcc/hash-table.h
> index a6e0ac8eea9..3a4d01d5ef8 100644
> --- a/gcc/hash-table.h
> +++ b/gcc/hash-table.h
> @@ -347,6 +347,7 @@ hash_table_mod1 (hashval_t hash, unsigned int index)
>  /* Compute the secondary table index for HASH given current prime index.  */
>  
>  inline hashval_t
> +__attribute__ ((no_profile_instrument_function))
>  hash_table_mod2 (hashval_t hash, unsigned int index)
>  {
>const struct prime_ent *p = &prime_tab[index];
> @@ -422,7 +423,7 @@ public:
>/* This function searches for a hash table entry equal to the given
>   COMPARABLE element starting with the given HASH value.  It cannot
>   be used to insert or delete an element. */
> -  value_type &find_with_hash (const compare_type &, hashval_t);
> +  value_type &find_with_hash (const compare_type &, hashval_t) __attribute__ 
> ((no_profile_instrument_function));
>  
>/* Like find_slot_with_hash, but compute the hash value from the element.  
> */
>value_type &find (const value_type &value)
> @@ -443,7 +444,7 @@ public:
>   write the value you want into the returned slot.  When inserting an
>   entry, NULL may be returned if memory allocation fails. */
>value_type *find_slot_with_hash (const compare_type &comparable,
> -hashval_t hash, enum insert_option insert);
> +hashval_t hash, enum insert_option insert)  
> __attribute__ ((no_profile_instrument_function));
>  
>/* This function deletes an element with the given COMPARABLE value
>   from hash table starting with the given HASH.  If there is no
> @@ -527,7 +528,7 @@ private:
>void empty_slow ();
>  
>value_type *alloc_entries (size_t n CXX_MEM_STAT_INFO) const;
> -  value_type *find_empty_slot_for_expand (hashval_t);
> +  value_type *find_empty_slot_for_expand (hashval_t) __attribute__ 
> ((no_profile_instrument_function));
>void verify (const compare_type &comparable, hashval_t hash);
>bool too_empty_p (unsigned int);
>void expand ();



Re: [PATCH] varpool: Restore GENERIC TREE_READONLY automatic var optimization [PR7260]

2021-01-26 Thread Jan Hubicka
> On Tue, Jan 26, 2021 at 10:03:16AM +0100, Richard Biener wrote:
> > > In 4.8 and earlier we used to fold the following to 0 during GENERIC 
> > > folding,
> > > but we don't do that anymore because ctor_for_folding etc. has been 
> > > turned into a
> > > GIMPLE centric API, but as the testcase shows, it is invoked even during
> > > GENERIC folding and there the automatic vars still should have meaningful
> > > initializers.  I've verified that the C++ FE drops TREE_READONLY on
> > > automatic vars with const qualified types if they require non-constant
> > > (runtime) initialization.
> 
> > > --- gcc/varpool.c.jj  2021-01-26 08:57:36.184290279 +0100
> > > +++ gcc/varpool.c 2021-01-26 09:46:16.453619140 +0100
> > > @@ -412,6 +412,12 @@ ctor_for_folding (tree decl)
> > >if (!TREE_STATIC (decl) && !DECL_EXTERNAL (decl))
> > >  {
> > >gcc_assert (!TREE_PUBLIC (decl));
> > > +  /* Unless this is called during FE folding.  */
> > > +  if (!in_gimple_form
> > 
> > Wouldn't it be better to use symtab->state == PARSING?  Why isn't
> 
> That would work for me too, in_gimple_form is just what is normally used
> for the folding.  And the routine is doing folding, but sits in a file where
> symtab->state == PARSING check is more natural.

I think being in GENERIC form is a property of the function.  We allow
functions to be constructed late, and we could allow doing that using
GENERIC (even though, as far as I know, we generate them all as GIMPLE:
profiling code, collected ctors, offlining, etc.).

So perhaps we don't need to tie that to the symtab state.

Honza
> 
> > it safe to always return DECL_INITIAL (but demote NULL to
> > error_mark_node and/or change the gimplifier to set it to
> 
> I'm not sure if it is safe to look at DECL_INITIAL after gimplification for
> automatic vars because the gimplifier is destructive and could have changed
> the initializers in random ways.  But I admit I haven't investigated whether
> e.g. debug info cares about DECL_INITIAL for automatic vars or not.
> 
> Looking at gimplify_decl_expr I see there:
>   if (init && init != error_mark_node)
> {
>   if (!TREE_STATIC (decl))
> {
>   DECL_INITIAL (decl) = NULL_TREE;
>   init = build2 (INIT_EXPR, void_type_node, decl, init);
>   gimplify_and_add (init, seq_p);
>   ggc_free (init);
> }
>   else
> /* We must still examine initializers for static variables
>as they may contain a label address.  */
> walk_tree (&init, force_labels_r, NULL, NULL);
> }
> so perhaps we are usually fine.
> gimplify_compound_literal_expr doesn't clear DECL_INITIAL when
> optimizing non-addressable complit decls though (though perhaps those
> decls shouldn't appear in the IL then).
> 
> > error_mark_node for autos) for TREE_READONLY && !TREE_SIDE_EFFECTS
> > decls (including autos)?  Thus, why key on is_gimple_form at all?
> 
> Perhaps it is not needed, I was just trying to be safe.  Because
> after gimplification it will generally not be useful anyway.
> 
>   Jakub
> 


Re: [PATCH] varpool: Restore GENERIC TREE_READONLY automatic var optimization [PR7260]

2021-01-26 Thread Jan Hubicka
> On Tue, Jan 26, 2021 at 10:55:35AM +0100, Jan Hubicka wrote:
> > > On Tue, Jan 26, 2021 at 10:03:16AM +0100, Richard Biener wrote:
> > > > > In 4.8 and earlier we used to fold the following to 0 during GENERIC 
> > > > > folding,
> > > > > but we don't do that anymore because ctor_for_folding etc. has been 
> > > > > turned into a
> > > > > GIMPLE centric API, but as the testcase shows, it is invoked even 
> > > > > during
> > > > > GENERIC folding and there the automatic vars still should have 
> > > > > meaningful
> > > > > initializers.  I've verified that the C++ FE drops TREE_READONLY on
> > > > > automatic vars with const qualified types if they require non-constant
> > > > > (runtime) initialization.
> > > 
> > > > > --- gcc/varpool.c.jj  2021-01-26 08:57:36.184290279 +0100
> > > > > +++ gcc/varpool.c 2021-01-26 09:46:16.453619140 +0100
> > > > > @@ -412,6 +412,12 @@ ctor_for_folding (tree decl)
> > > > >if (!TREE_STATIC (decl) && !DECL_EXTERNAL (decl))
> > > > >  {
> > > > >gcc_assert (!TREE_PUBLIC (decl));
> > > > > +  /* Unless this is called during FE folding.  */
> > > > > +  if (!in_gimple_form
> > > > 
> > > > Wouldn't it be better to use symtab->state == PARSING?  Why isn't
> > > 
> > > That would work for me too, in_gimple_form is just what is normally used
> > > for the folding.  And the routine is doing folding, but sits in a file 
> > > where
> > > symtab->state == PARSING check is more natural.
> > 
> > I think it is property of function to be in generic form.  We allow
> > functions to be constructed late and we could allow doing that using
> > generic (even though as far as I know we generate them all as gimple -
> > profiling code, collected ctors, offlining etc).
> > 
> > So perhaps we don't need to tie that with symtab state.
> 
> So do you want
> (cfun && (cfun->curr_properties & PROP_trees))
> instead then?

I think it would make more sense.
We could also simply clean it up in next stage1.

Honza
> I'd really prefer such a check at least for GCC11 because while for most
> automatic vars DECL_INITIAL will be cleared during gimplification, I don't
> see guarantees for that and guarantees that it isn't left to be random
> garbage after that.
> 
>   Jakub
> 


Re: [PATCH] rtl-optimization/98863 - tame i386 specific RPAD pass

2021-01-29 Thread Jan Hubicka
> This removes adding very expensive DF problems which we do not
> use and which somehow cause 5GB of memory to leak.

Impressive :)
> 
> Bootstrap & regtest running on x86_64-unknown-linux-gnu.
> 
> 2021-01-29  Richard Biener  
> 
>   PR rtl-optimization/98863
>   * config/i386/i386-features.c (remove_partial_avx_dependency):
>   Do not add DF chain and MD problems.
OK (if regtest passes :)
Honza
> ---
>  gcc/config/i386/i386-features.c | 2 --
>  1 file changed, 2 deletions(-)
> 
> diff --git a/gcc/config/i386/i386-features.c b/gcc/config/i386/i386-features.c
> index ef4f9406102..52c51a32e14 100644
> --- a/gcc/config/i386/i386-features.c
> +++ b/gcc/config/i386/i386-features.c
> @@ -2295,8 +2295,6 @@ remove_partial_avx_dependency (void)
>   {
> calculate_dominance_info (CDI_DOMINATORS);
> df_set_flags (DF_DEFER_INSN_RESCAN);
> -   df_chain_add_problem (DF_DU_CHAIN | DF_UD_CHAIN);
> -   df_md_add_problem ();
> df_analyze ();
> v4sf_const0 = gen_reg_rtx (V4SFmode);
>   }
> -- 
> 2.26.2


Re: [PATCH] rtl-optimization/98863 - tame i386 specific RPAD pass

2021-01-29 Thread Jan Hubicka
> On Fri, 29 Jan 2021, Jan Hubicka wrote:
> 
> > > This removes adding very expensive DF problems which we do not
> > > use and which somehow cause 5GB of memory to leak.

Reading through the logs, isn't the leak just caused by things going into a
memory pool that we do not trim during late optimization?
> > 
> > Impressive :)
> > > 
> > > Bootstrap & regtest running on x86_64-unknown-linux-gnu.
> > > 
> > > 2021-01-29  Richard Biener  
> > > 
> > >   PR rtl-optimization/98863
> > >   * config/i386/i386-features.c (remove_partial_avx_dependency):
> > >   Do not add DF chain and MD problems.
> > OK (if regtests passes :)
> 
> After discussion on IRC I am testing the following which removes
> the unneded df_analyze completely.
> 
> Bootstrap / regtest running on x86_64-unkown-linux-gnu.
> 
> Still OK if testing passes (I'll also build/test WRF which triggered
> this work)
> 
> Richard.
> 
> From 1657183c8cdbaea329df47fe4d76c4f871a06bdc Mon Sep 17 00:00:00 2001
> From: Richard Biener 
> Date: Fri, 29 Jan 2021 16:02:36 +0100
> Subject: [PATCH] rtl-optimization/98863 - tame i386 specific RPAD pass
> To: gcc-patches@gcc.gnu.org
> 
> This removes analyzing DF with expensive problems which we do not
> use at all and which somehow cause 5GB of memory to leak.  Instead
> just do a defered rescan of added insns.
> 
> 2021-01-29  Richard Biener  
> 
>   PR rtl-optimization/98863
>   * config/i386/i386-features.c (remove_partial_avx_dependency):
>   Do not perform DF analysis.
>   (pass_data_remove_partial_avx_dependency): Remove
>   TODO_df_finish.
> ---
>  gcc/config/i386/i386-features.c | 17 +++--
>  1 file changed, 7 insertions(+), 10 deletions(-)
> 
> diff --git a/gcc/config/i386/i386-features.c b/gcc/config/i386/i386-features.c
> index ef4f9406102..c845ba90caf 100644
> --- a/gcc/config/i386/i386-features.c
> +++ b/gcc/config/i386/i386-features.c
> @@ -2272,6 +2272,9 @@ remove_partial_avx_dependency (void)
>  
>auto_vec control_flow_insns;
>  
> +  /* We create invalid RTL initially so defer rescans.  */
> +  df_set_flags (DF_DEFER_INSN_RESCAN);
> +
>FOR_EACH_BB_FN (bb, cfun)
>  {
>FOR_BB_INSNS (bb, insn)
> @@ -2292,14 +2295,7 @@ remove_partial_avx_dependency (void)
>   continue;
>  
> if (!v4sf_const0)
> - {
> -   calculate_dominance_info (CDI_DOMINATORS);
> -   df_set_flags (DF_DEFER_INSN_RESCAN);
> -   df_chain_add_problem (DF_DU_CHAIN | DF_UD_CHAIN);
> -   df_md_add_problem ();
> -   df_analyze ();
> -   v4sf_const0 = gen_reg_rtx (V4SFmode);
> - }
> + v4sf_const0 = gen_reg_rtx (V4SFmode);
>  
> /* Convert PARTIAL_XMM_UPDATE_TRUE insns, DF -> SF, SF -> DF,
>SI -> SF, SI -> DF, DI -> SF, DI -> DF, to vec_dup and
> @@ -2360,6 +2356,7 @@ remove_partial_avx_dependency (void)
>  {
>/* (Re-)discover loops so that bb->loop_father can be used in the
>analysis below.  */
> +  calculate_dominance_info (CDI_DOMINATORS);
>loop_optimizer_init (AVOID_CFG_MODIFICATIONS);
>  
>/* Generate a vxorps at entry of the nearest dominator for basic
> @@ -2391,7 +2388,6 @@ remove_partial_avx_dependency (void)
>   set_insn = emit_insn_after (set,
>   insn ? PREV_INSN (insn) : BB_END (bb));
>df_insn_rescan (set_insn);
> -  df_process_deferred_rescans ();
>loop_optimizer_finalize ();
>  
>if (!control_flow_insns.is_empty ())
> @@ -2412,6 +2408,7 @@ remove_partial_avx_dependency (void)
>   }
>  }
>  
> +  df_process_deferred_rescans ();
>bitmap_obstack_release (NULL);
>BITMAP_FREE (convert_bbs);
>  
> @@ -2441,7 +2438,7 @@ const pass_data pass_data_remove_partial_avx_dependency 
> =
>0, /* properties_provided */
>0, /* properties_destroyed */
>0, /* todo_flags_start */
> -  TODO_df_finish, /* todo_flags_finish */
> +  0, /* todo_flags_finish */

I am not sure why df no longer needs finishing?

But the patch looks OK to me (though it is pushing my DF knowledge)

Honza
>  };
>  
>  class pass_remove_partial_avx_dependency : public rtl_opt_pass
> -- 
> 2.26.2
> 


Re: [PATCH] tree-optimization/98499 - fix modref analysis on RVO statements

2021-02-01 Thread Jan Hubicka
> From: Sergei Trofimovich 
> 
> Before the change RVO gimple statements were treated as local
> stores by modref analysis. But in practice RVO escapes the target.
> 
> 2021-01-30  Sergei Trofimovich  
> 
> gcc/ChangeLog:
> 
>   PR tree-optimization/98499
>   * ipa-modref.c: treat RVO conservatively and assume
>   all possible side-effects.
> 
> gcc/testsuite/ChangeLog:
> 
>   PR tree-optimization/98499
>   * g++.dg/pr98499.C: new test.

This is OK.  Thanks a lot for debugging this.
> +   /* Return slot optiomization would require bit of propagation;
> +  give up for now.  */
> +   if (gimple_call_return_slot_opt_p (call)
> +   && gimple_call_lhs (call) != NULL_TREE
> +   && TREE_ADDRESSABLE (TREE_TYPE (gimple_call_lhs (call
> + {
> +   if (dump_file)
> + fprintf (dump_file, "%*s  Unhandled return slot opt\n",
> +  depth * 4, "");
> +   lattice[index].merge (0);

Here we are really missing a way to track that the argument is "write
only".  We could probably still set EAF_DIRECT, but it is useless without
noescape anyway, so 0 is OK.

I implemented tracking of noescape here, but only in local modref, since
for global modref we are missing jump functions tracking the fact that
the return value is passed to another function; so that is probably
something for the next stage1 (I am gathering some stats now though)

Honza
> + }
> /* Recursion would require bit of propagation; give up for now.  */
> -   if (callee && !ipa && recursive_call_p (current_function_decl,
> +   else if (callee && !ipa && recursive_call_p (current_function_decl,
> callee))
>   lattice[index].merge (0);
> else
> diff --git a/gcc/testsuite/g++.dg/pr98499.C b/gcc/testsuite/g++.dg/pr98499.C
> new file mode 100644
> index 000..ace088aeed9
> --- /dev/null
> +++ b/gcc/testsuite/g++.dg/pr98499.C
> @@ -0,0 +1,31 @@
> +/* PR tree-optimization/98499.  */
> +/* { dg-do run } */
> +/* { dg-options "-O2" } */
> +
> +struct string {
> +  // pointer to local store
> +  char * _M_buf;
> +  // local store
> +  char _M_local_buf[16];
> +
> +  __attribute__((noinline)) string() : _M_buf(_M_local_buf) {}
> +
> +  ~string() {
> +if (_M_buf != _M_local_buf)
> +  __builtin_trap();
> +  }
> +
> +  string(const string &__str); // no copies
> +};
> +
> +__attribute__((noinline)) static string dir_name() { return string(); }
> +class Importer {
> +  string base_path;
> +
> +public:
> +  __attribute__((noinline)) Importer() : base_path (dir_name()) {}
> +};
> +
> +int main() {
> +  Importer imp;
> +}
> -- 
> 2.30.0
> 


Re: [PATCH] ipa/97346 - fix leak of reference_vars_to_consider

2021-02-14 Thread Jan Hubicka
> This cleans up allocation/deallocation of reference_vars_to_consider,
> specifically always releasing the vector allocated in ipa_init and
> also making sure to release it before re-allocating it in
> ipa_reference_write_optimization_summary.
> 
> Bootstrapped and tested on x86_64-unknown-linux-gnu, OK?
> 
> Thanks,
> Richard.
> 
> 2021-02-10  Richard Biener  
> 
>   PR ipa/97346
>   * ipa-reference.c (propagate): Always free
>   reference_vars_to_consider.
>   (ipa_reference_write_optimization_summary): Free
>   reference_vars_to_consider before re-allocating it.
>   (ipa_reference_write_optimization_summary): Use vec_free
>   and NULL reference_vars_to_consider.

Hi,
this is the version I committed after discussion on the PR
(it makes it more explicit that reference_vars_to_consider is used
during analysis only to aid dumping).

Honza


2021-02-14  Jan Hubicka  
Richard Biener  

PR ipa/97346
* ipa-reference.c (ipa_init): Only conditionally initialize
reference_vars_to_consider.
(propagate): Conditionally deinitialize reference_vars_to_consider.
(ipa_reference_write_optimization_summary): Sanity check that
reference_vars_to_consider is not allocated.

diff --git a/gcc/ipa-reference.c b/gcc/ipa-reference.c
index 2ea2a6d5327..6cf78ff94a6 100644
--- a/gcc/ipa-reference.c
+++ b/gcc/ipa-reference.c
@@ -458,8 +458,8 @@ ipa_init (void)
 
   ipa_init_p = true;
 
-  vec_alloc (reference_vars_to_consider, 10);
-
+  if (dump_file)
+vec_alloc (reference_vars_to_consider, 10);
 
   if (ipa_ref_opt_sum_summaries != NULL)
 {
@@ -967,8 +967,12 @@ propagate (void)
 }
 
   if (dump_file)
-vec_free (reference_vars_to_consider);
-  reference_vars_to_consider = NULL;
+{
+  vec_free (reference_vars_to_consider);
+  reference_vars_to_consider = NULL;
+}
+  else
+gcc_checking_assert (!reference_vars_to_consider);
   return remove_p ? TODO_remove_functions : 0;
 }
 
@@ -1059,6 +1063,7 @@ ipa_reference_write_optimization_summary (void)
   auto_bitmap ltrans_statics;
   int i;
 
+  gcc_checking_assert (!reference_vars_to_consider);
   vec_alloc (reference_vars_to_consider, ipa_reference_vars_uids);
   reference_vars_to_consider->safe_grow (ipa_reference_vars_uids, true);
 
@@ -1117,7 +1122,8 @@ ipa_reference_write_optimization_summary (void)
  }
   }
   lto_destroy_simple_output_block (ob);
-  delete reference_vars_to_consider;
+  vec_free (reference_vars_to_consider);
+  reference_vars_to_consider = NULL;
 }
 
 /* Deserialize the ipa info for lto.  */


Re: [PATCH] gcov: use mmap pools for KVP.

2021-03-03 Thread Jan Hubicka
> Hello.
> 
> AS mentioned here, https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97461#c25, I 
> like
> what Richard suggested. So instead of usage of malloc, we should use mmap 
> memory
> chunks that serve as a memory pool for struct gcov_kvp.
> 
> Malloc is used as a fallback when mmap is not available. I also drop 
> statically
> pre-allocated static pool, mmap solves the root problem.
> 
> Patch can bootstrap on x86_64-linux-gnu and survives regression tests.
> 
> Ready to be installed?
> Thanks,
> Martin
> 
> gcc/ChangeLog:
> 
>   PR gcov-profile/97461
>   * gcov-io.h (GCOV_PREALLOCATED_KVP): Remove.
> 
> libgcc/ChangeLog:
> 
>   PR gcov-profile/97461
>   * config.in: Regenerate.
>   * configure: Likewise.
>   * configure.ac: Check sys/mman.h header file
>   * libgcov-driver.c (struct gcov_kvp): Remove static
>   pre-allocated pool and use a dynamic one.
>   * libgcov.h (MMAP_CHUNK_SIZE): New.
>   (gcov_counter_add): Use mmap to allocate pool for struct
>   gcov_kvp.
OK, thanks.
We should be ready to add support for non-mmap targets (especially
mingw).  

Honza
> ---
>  gcc/gcov-io.h   |  3 ---
>  libgcc/config.in|  3 +++
>  libgcc/configure|  4 ++--
>  libgcc/configure.ac |  2 +-
>  libgcc/libgcov-driver.c | 11 +++
>  libgcc/libgcov.h| 42 -
>  6 files changed, 46 insertions(+), 19 deletions(-)
> 
> diff --git a/gcc/gcov-io.h b/gcc/gcov-io.h
> index baed67609e2..75f16a274c7 100644
> --- a/gcc/gcov-io.h
> +++ b/gcc/gcov-io.h
> @@ -292,9 +292,6 @@ GCOV_COUNTERS
>  /* Maximum number of tracked TOP N value profiles.  */
>  #define GCOV_TOPN_MAXIMUM_TRACKED_VALUES 32
> -/* Number of pre-allocated gcov_kvp structures.  */
> -#define GCOV_PREALLOCATED_KVP 64
> -
>  /* Convert a counter index to a tag.  */
>  #define GCOV_TAG_FOR_COUNTER(COUNT)  \
>   (GCOV_TAG_COUNTER_BASE + ((gcov_unsigned_t)(COUNT) << 17))
> diff --git a/libgcc/config.in b/libgcc/config.in
> index 5be5321d258..f93c64a00c3 100644
> --- a/libgcc/config.in
> +++ b/libgcc/config.in
> @@ -49,6 +49,9 @@
>  /* Define to 1 if you have the  header file. */
>  #undef HAVE_SYS_AUXV_H
> +/* Define to 1 if you have the  header file. */
> +#undef HAVE_SYS_MMAN_H
> +
>  /* Define to 1 if you have the  header file. */
>  #undef HAVE_SYS_STAT_H
> diff --git a/libgcc/configure b/libgcc/configure
> index 78fc22a5784..dd3afb2c957 100755
> --- a/libgcc/configure
> +++ b/libgcc/configure
> @@ -4458,7 +4458,7 @@ as_fn_arith $ac_cv_sizeof_long_double \* 8 && 
> long_double_type_size=$as_val
>  for ac_header in inttypes.h stdint.h stdlib.h ftw.h \
>   unistd.h sys/stat.h sys/types.h \
> - string.h strings.h memory.h sys/auxv.h
> + string.h strings.h memory.h sys/auxv.h sys/mman.h
>  do :
>as_ac_Header=`$as_echo "ac_cv_header_$ac_header" | $as_tr_sh`
>  ac_fn_c_check_header_preproc "$LINENO" "$ac_header" "$as_ac_Header"
> @@ -4913,7 +4913,7 @@ case "$host" in
>  case "$enable_cet" in
>auto)
>   # Check if target supports multi-byte NOPs
> - # and if assembler supports CET insn.
> + # and if compiler and assembler support CET insn.
>   cet_save_CFLAGS="$CFLAGS"
>   CFLAGS="$CFLAGS -fcf-protection"
>   cat confdefs.h - <<_ACEOF >conftest.$ac_ext
> diff --git a/libgcc/configure.ac b/libgcc/configure.ac
> index ed50c0e9b49..10ffb046415 100644
> --- a/libgcc/configure.ac
> +++ b/libgcc/configure.ac
> @@ -224,7 +224,7 @@ AC_SUBST(long_double_type_size)
>  AC_CHECK_HEADERS(inttypes.h stdint.h stdlib.h ftw.h \
>   unistd.h sys/stat.h sys/types.h \
> - string.h strings.h memory.h sys/auxv.h)
> + string.h strings.h memory.h sys/auxv.h sys/mman.h)
>  AC_HEADER_STDC
>  # Check for decimal float support.
> diff --git a/libgcc/libgcov-driver.c b/libgcc/libgcov-driver.c
> index e474e032b54..91462350132 100644
> --- a/libgcc/libgcov-driver.c
> +++ b/libgcc/libgcov-driver.c
> @@ -588,11 +588,14 @@ struct gcov_root __gcov_root;
>  struct gcov_master __gcov_master =
>{GCOV_VERSION, 0};
> -/* Pool of pre-allocated gcov_kvp strutures.  */
> -struct gcov_kvp __gcov_kvp_pool[GCOV_PREALLOCATED_KVP];
> +/* Dynamic pool for gcov_kvp structures.  */
> +struct gcov_kvp *__gcov_kvp_dynamic_pool;
> -/* Index to first free gcov_kvp in the pool.  */
> -unsigned __gcov_kvp_pool_index;
> +/* Index into __gcov_kvp_dynamic_pool array.  */
> +unsigned __gcov_kvp_dynamic_pool_index;
> +
> +/* Size of _gcov_kvp_dynamic_pool array.  */
> +unsigned __gcov_kvp_dynamic_pool_size;
>  void
>  __gcov_exit (void)
> diff --git a/libgcc/libgcov.h b/libgcc/libgcov.h
> index b4a7e942a7e..e848811d89d 100644
> --- a/libgcc/libgcov.h
> +++ b/libgcc/libgcov.h
> @@ -45,6 +45,10 @@
>  #include "libgcc_tm.h"
>  #include "gcov.h"
> +#if HAVE_SYS_MMAN_H
> +#include 
> +#endif
> +
>  #if __CHAR_BIT__ == 8
>  typedef unsigned gcov_unsigned_t __attribute__ ((mode (SI)));
>  typedef unsigned gcov_position_t __attribute__ ((mode (SI)));

Re: [PATCH] profiling: fix streaming of TOPN counters

2021-03-03 Thread Jan Hubicka
> 
> libgcc/ChangeLog:
> 
>   PR gcov-profile/99105
>   * libgcov-driver.c (write_top_counters): Rename to ...
>   (write_topn_counters): ... this.
>   (write_one_data): Pre-allocate buffer for number of items
>   in the corresponding linked lists.
>   * libgcov-merge.c (__gcov_merge_topn): Use renamed function.
> 
> gcc/testsuite/ChangeLog:
> 
>   PR gcov-profile/99105
>   * gcc.dg/tree-prof/indir-call-prof-malloc.c: Use profile
>   correction as the wrapped malloc is called one more time
>   from libgcov.

>for (unsigned i = 0; i < counters; i++)
>  {
> -  gcov_type pair_count = ci_ptr->values[GCOV_TOPN_MEM_COUNTERS * i + 1];
>gcov_write_counter (ci_ptr->values[GCOV_TOPN_MEM_COUNTERS * i]);
> -  gcov_write_counter (pair_count);
> +  gcov_write_counter (list_sizes[i]);
>gcov_type start = ci_ptr->values[GCOV_TOPN_MEM_COUNTERS * i + 2];
> +
> +  unsigned j = 0;
>for (struct gcov_kvp *node = (struct gcov_kvp *)(intptr_t)start;
> -node != NULL; node = node->next)
> +node != NULL; node = node->next, j++)
>   {
> gcov_write_counter (node->value);
> gcov_write_counter (node->count);
> +
> +   /* Stop when we reach expected number of items.  */
> +   if (j + 1 == list_sizes[i])
> + break;

Since you counted the number of entries earlier, I would expect the loop
to always terminate here, and thus the node != NULL condition in the for
loop above to be unnecessary.
>   }
>  }
> +
> +  free (list_sizes);

We already have our own mmap allocator, so I wonder if we don't want to
avoid the malloc/free pair here.  The counters are per-function, right?
I wonder if they happen to be large on some bigger project, but using
alloca for counts of reasonable size might reduce the risk of a user
messing this up with his own malloc/free implementation.
>  }
>  
>  /* Write counters in GI_PTR and the summary in PRG to a gcda file. In
> @@ -425,7 +446,7 @@ write_one_data (const struct gcov_info *gi_ptr,
> n_counts = ci_ptr->num;
>  
> if (t_ix == GCOV_COUNTER_V_TOPN || t_ix == GCOV_COUNTER_V_INDIR)
> - write_top_counters (ci_ptr, t_ix, n_counts);
> + write_topn_counters (ci_ptr, t_ix, n_counts);
> else
>   {
> /* Do not stream when all counters are zero.  */
> diff --git a/libgcc/libgcov-merge.c b/libgcc/libgcov-merge.c
> index 7db188a4f4c..3379b688128 100644
> --- a/libgcc/libgcov-merge.c
> +++ b/libgcc/libgcov-merge.c
> @@ -109,6 +109,7 @@ __gcov_merge_topn (gcov_type *counters, unsigned 
> n_counters)
>/* First value is number of total executions of the profiler.  */
>gcov_type all = gcov_get_counter_ignore_scaling (-1);
>gcov_type n = gcov_get_counter_ignore_scaling (-1);
> +  gcc_assert (n <= GCOV_TOPN_MAXIMUM_TRACKED_VALUES);

I think we do not want asserts checking implementation correctness in the
runtime, since they bloat it up.  So I would leave it out.

I wonder if we can have some testcase for parallel updating/streaming in
the testsuite?

Otherwise the patch looks good to me.
Honza
>  
>unsigned full = all < 0;
>gcov_type *total = &counters[GCOV_TOPN_MEM_COUNTERS * i];
> -- 
> 2.30.0
> 



Re: [PATCH] profiling: fix streaming of TOPN counters

2021-03-04 Thread Jan Hubicka
>  .../gcc.dg/tree-prof/indir-call-prof-malloc.c |  2 +-
>  gcc/testsuite/gcc.dg/tree-prof/pr97461.c  |  2 +-
>  libgcc/libgcov-driver.c   | 56 ---
>  3 files changed, 50 insertions(+), 10 deletions(-)
> 
> diff --git a/gcc/testsuite/gcc.dg/tree-prof/indir-call-prof-malloc.c 
> b/gcc/testsuite/gcc.dg/tree-prof/indir-call-prof-malloc.c
> index 454e224c95f..7bda4aedfc8 100644
> --- a/gcc/testsuite/gcc.dg/tree-prof/indir-call-prof-malloc.c
> +++ b/gcc/testsuite/gcc.dg/tree-prof/indir-call-prof-malloc.c
> @@ -1,4 +1,4 @@
> -/* { dg-options "-O2 -ldl" } */
> +/* { dg-options "-O2 -ldl -fprofile-correction" } */
>  
>  #define _GNU_SOURCE
>  #include 
> diff --git a/gcc/testsuite/gcc.dg/tree-prof/pr97461.c 
> b/gcc/testsuite/gcc.dg/tree-prof/pr97461.c
> index 213fac9af04..f684be4d80f 100644
> --- a/gcc/testsuite/gcc.dg/tree-prof/pr97461.c
> +++ b/gcc/testsuite/gcc.dg/tree-prof/pr97461.c
> @@ -1,5 +1,5 @@
>  /* PR gcov-profile/97461 */
> -/* { dg-options "-O2 -ldl" } */
> +/* { dg-options "-O2 -ldl -fprofile-correction" } */
>  
>  #define _GNU_SOURCE
>  
> diff --git a/libgcc/libgcov-driver.c b/libgcc/libgcov-driver.c
> index 91462350132..d2e60a5a6df 100644
> --- a/libgcc/libgcov-driver.c
> +++ b/libgcc/libgcov-driver.c
> @@ -42,6 +42,10 @@ void __gcov_init (struct gcov_info *p __attribute__ 
> ((unused))) {}
>  #include 
>  #endif
>  
> +#if HAVE_SYS_MMAN_H
> +#include 
> +#endif
> +
>  #ifdef L_gcov
>  
>  /* A utility function for outputting errors.  */
> @@ -334,30 +338,66 @@ read_error:
>return -1;
>  }
>  
> +#define MAX(X,Y) ((X) > (Y) ? (X) : (Y))
> +
>  /* Store all TOP N counters where each has a dynamic length.  */
>  
>  static void
> -write_top_counters (const struct gcov_ctr_info *ci_ptr,
> - unsigned t_ix,
> - gcov_unsigned_t n_counts)
> +write_topn_counters (const struct gcov_ctr_info *ci_ptr,
> +  unsigned t_ix,
> +  gcov_unsigned_t n_counts)
>  {
>unsigned counters = n_counts / GCOV_TOPN_MEM_COUNTERS;
>gcc_assert (n_counts % GCOV_TOPN_MEM_COUNTERS == 0);
> +
> +  /* It can happen in a multi-threaded environment that number of counters is
> + different from the size of the corresponding linked lists.  */
> +#define LIST_SIZE_MIN_LENGTH 4 * 1024
> +
> +  static unsigned *list_sizes = NULL;
> +  static unsigned list_size_length = 0;
> +
> +  if (list_sizes == NULL || counters > list_size_length)
> +{
> +  list_size_length = MAX (LIST_SIZE_MIN_LENGTH, counters);
> +#if HAVE_SYS_MMAN_H
> +  list_sizes = (unsigned *)mmap (NULL, list_size_length * sizeof 
> (unsigned),
> +  PROT_READ | PROT_WRITE,
> +  MAP_PRIVATE | MAP_ANONYMOUS, 0, 0);
> +#endif
> +
> +  /* Malloc fallback.  */
> +  if (list_sizes == NULL)
> + list_sizes = (unsigned *)xmalloc (list_size_length * sizeof (unsigned));
I see, you switched to allocating the buffer permanently.  This also
works.  Given that you do not deallocate the memory, I think you want to
allocate list_size_length*2 so we do not end up taking O(mn) memory,
where n is the largest number of counters in a file and m is the number
of sections with counters.

Can you please unify the mmap code in the list allocation and here, so
there is only one place in libgcov that Windows folks will need to update?

Otherwise the patch looks OK.
Honza


Re: ipa-modref: merge flags when adding escape

2021-08-11 Thread Jan Hubicka
> While working on some function splitting changes, I've got a
> miscompilation in stagefeedback that I've tracked down to a
> complicated scenario:
> 
> - ipa-modref miscomputes a function parameter as having EAF_DIRECT,
>   because it's dereferenced and passed on to another function, but
>   add_escape_point does not update the flags for the dereferenced
>   SSA_NAME passed as a parameter, and the EAF_UNUSED in the value that
>   first initializes it, that remains unchanged throughout, causes
>   deref_flags to set EAF_DIRECT, among other flags.
> 
> - structalias, seeing the EAF_DIRECT in the parameter for that
>   function, refrains from mak[ing]_transitive_closure_constraints for
>   a pointer passed in that parameter.
> 
> - tree dse2 concludes the initializer of the pointed-to variable is a
>   dead store and removes it.
> 
> The test depends on gimple passes's processing of functions in a
> certain order to expose parm flag miscomputed by ipa-modref.  A
> different order may enable the non-ipa modref2 pass to compute flags
> differently and avoid the problem.
> 
> I've arranged for add_escape_point to merge flags, as the non-ipa path
> does.  I've also caught and fixed an error in the dumping of escaping
> flags.
> 
> The problem affects at least trunk and gcc-11.  I've so far bootstrapped
> GCC 11, and I'm now regstrapping trunk.  Ok to install if it passes?
> 
> 
> for  gcc/ChangeLog
> 
>   * ipa-modref.c (modref_lattice::add_escape_point): Merge
>   min_flags into flags.
>   (modref_lattice::dump): Fix escape_point's min_flags dumping.
> 
> for  gcc/testsuite/ChangeLog
> 
>   * c-c++-common/modref-dse.c: New.

Hi,
thank you for looking into the bug and sorry for taking so long to
respond.  The fix you propose is a bit too generous, since it essentially
disables the IPA bits of ipa-modref (it will resort to the worst-case
solution w/o any IPA propagation).

In IPA mode the proper flags are supposed to be determined by
propagation via "escape points".  The bug is a bit subtle, caused by an
optimization that avoids recording flags for escape points where
we know that we do not care.  This is tested by comparing min_flags
(which is the known conservative estimate used by the local analysis)
with the flags of the value being determined.  If these flags are a
subset of min_flags there is nothing to gain.

While merging lattices there is a case where a direct escape becomes
indirect, and in this case I forgot to update min_flags to the
dereferenced version, which in turn makes the escape point be skipped.

This is the improved patch; I have bootstrapped/regtested it on
x86_64-linux and I am collecting stats for it (it should have minimal
effect on the overall effectiveness of modref).

Honza

gcc/ChangeLog:

2021-08-11  Jan Hubicka  
Alexandre Oliva  

* ipa-modref.c (modref_lattice::dump): Fix escape_point's min_flags
dumping.
(modref_lattice::merge_deref): Fix handling of indirect escape points.
(update_escape_summary_1): Likewise.
(update_escape_summary): Likewise.
(ipa_merge_modref_summary_after_inlining): Likewise.

gcc/testsuite/ChangeLog:

2021-08-11  Alexandre Oliva  

* c-c++-common/modref-dse.c: New test.

diff --git a/gcc/ipa-modref.c b/gcc/ipa-modref.c
index ef5e62beb0e..dccaf658720 100644
--- a/gcc/ipa-modref.c
+++ b/gcc/ipa-modref.c
@@ -1392,7 +1392,7 @@ modref_lattice::dump (FILE *out, int indent) const
  fprintf (out, "%*s  Arg %i (%s) min flags", indent, "",
   escape_points[i].arg,
   escape_points[i].direct ? "direct" : "indirect");
- dump_eaf_flags (out, flags, false);
+ dump_eaf_flags (out, escape_points[i].min_flags, false);
  fprintf (out, " in call ");
  print_gimple_stmt (out, escape_points[i].call, 0);
}
@@ -1489,10 +1489,18 @@ modref_lattice::merge_deref (const modref_lattice 
&with, bool ignore_stores)
   if (!flags)
 return changed;
   for (unsigned int i = 0; i < with.escape_points.length (); i++)
-changed |= add_escape_point (with.escape_points[i].call,
-with.escape_points[i].arg,
-with.escape_points[i].min_flags,
-false);
+{
+  int min_flags = with.escape_points[i].min_flags;
+
+  if (with.escape_points[i].direct)
+   min_flags = deref_flags (min_flags, ignore_stores);
+  else if (ignore_stores)
+   min_flags |= EAF_NOCLOBBER | EAF_NOESCAPE | EAF_NODIRECTESCAPE;
+  changed |= add_escape_point (with.escape_points[i].call,
+  with.escape_points[i].arg,
+  min_flags,
+  false);
+}
 

Fix condition testing void functions in ipa-split

2021-08-12 Thread Jan Hubicka
Hi,
while looking into the code I noticed the following thinko.
VOID_TYPE_P (TREE_TYPE (current_function_decl)) is always false since
TREE_TYPE (current_function_decl) is either a function_type or a
method_type.  One extra TREE_TYPE is needed to get to the type of the
return value.

Bootstrapped/regtested x86_64-linux. Committed.

gcc/ChangeLog:

2021-08-12  Jan Hubicka  

* ipa-split.c (consider_split): Fix condition testing void functions.

diff --git a/gcc/ipa-split.c b/gcc/ipa-split.c
index 5e918ee3fbf..c68577d04a9 100644
--- a/gcc/ipa-split.c
+++ b/gcc/ipa-split.c
@@ -546,8 +546,9 @@ consider_split (class split_point *current, bitmap 
non_ssa_vars,
}
}
 }
-  if (!VOID_TYPE_P (TREE_TYPE (current_function_decl)))
-call_overhead += estimate_move_cost (TREE_TYPE (current_function_decl),
+  if (!VOID_TYPE_P (TREE_TYPE (TREE_TYPE (current_function_decl
+call_overhead += estimate_move_cost (TREE_TYPE (TREE_TYPE
+(current_function_decl)),
 false);
 
   if (current->split_size <= call_overhead)


Introduce EAF_NOREAD and cleanup EAF_UNUSED + ipa-modref

2021-08-12 Thread Jan Hubicka
Hi,
this patch adds EAF_NOREAD (as discussed on IRC earlier) and fixes the
meaning of EAF_UNUSED to be really unused, and not "does not escape, is
not clobbered, read or returned", since we have separate flags for each
of those properties now.

The number of flags has grown, and thus I refactored the code a bit to
avoid repeated uses of complex flag combinations and also simplified the
logic of merging.

Merging is a bit tricky since we have flags that imply other flags (like
NOESCAPE implies NODIRECTESCAPE), but code may set only NOESCAPE and not
NODIRECTESCAPE, in which case a simple merge does not do the right
thing.  I added code that deals with the implications.  Perhaps it would
make sense to update fnspecs to always set a flag along with all of its
implications, but for now I am handling it in merge.

I made only a trivial update to tree-ssa-structalias and will send the
changes needed for EAF_NOREAD incrementally, so they can be discussed.
I think the logical step is to track whether a function reads/stores
global memory and rewrite the constraint generation so we can handle
normal, pure and const functions in a unified manner.

Bootstrapped/regtested x86_64-linux, plan to commit it after further testing.

The patch improves alias oracle stats for cc1plus somewhat.

From:

Alias oracle query stats:
  refs_may_alias_p: 72380497 disambiguations, 82649832 queries
  ref_maybe_used_by_call_p: 495184 disambiguations, 73366950 queries
  call_may_clobber_ref_p: 259312 disambiguations, 263253 queries
  nonoverlapping_component_refs_p: 0 disambiguations, 38006 queries
  nonoverlapping_refs_since_match_p: 21157 disambiguations, 65698 must 
overlaps, 87756 queries
  aliasing_component_refs_p: 63141 disambiguations, 2164695 queries
  TBAA oracle: 25975753 disambiguations 61449632 queries
   12138220 are in alias set 0
   11316663 queries asked about the same object
   144 queries asked about the same alias set
   0 access volatile
   10472885 are dependent in the DAG
   1545967 are aritificially in conflict with void *

Modref stats:
  modref use: 23857 disambiguations, 754515 queries
  modref clobber: 1392162 disambiguations, 17753512 queries
  3450241 tbaa queries (0.194341 per modref query)
  534816 base compares (0.030125 per modref query)

PTA query stats:
  pt_solution_includes: 12394915 disambiguations, 20235925 queries
  pt_solutions_intersect: 1365299 disambiguations, 14638068 queries

To:

Alias oracle query stats:
  refs_may_alias_p: 72629640 disambiguations, 8290 queries
  ref_maybe_used_by_call_p: 502474 disambiguations, 73612186 queries
  call_may_clobber_ref_p: 261806 disambiguations, 265659 queries
  nonoverlapping_component_refs_p: 0 disambiguations, 38007 queries
  nonoverlapping_refs_since_match_p: 21139 disambiguations, 65772 must 
overlaps, 87816 queries
  aliasing_component_refs_p: 63144 disambiguations, 2164330 queries
  TBAA oracle: 26059018 disambiguations 61571714 queries
   12158033 are in alias set 0
   11326115 queries asked about the same object
   144 queries asked about the same alias set
   0 access volatile
   10484493 are dependent in the DAG
   1543911 are aritificially in conflict with void *

Modref stats:
  modref use: 24008 disambiguations, 712712 queries
  modref clobber: 1395917 disambiguations, 17163694 queries
  3465657 tbaa queries (0.201918 per modref query)
  537591 base compares (0.031321 per modref query)

PTA query stats:
  pt_solution_includes: 12468934 disambiguations, 20295402 queries
  pt_solutions_intersect: 1391917 disambiguations, 14665265 queries

I think it is mostly due to better handling of EAF_NODIRECTESCAPE.

Honza

gcc/ChangeLog:

2021-08-12  Jan Hubicka  

* ipa-modref.c (dump_eaf_flags): Dump EAF_NOREAD.
(implicit_const_eaf_flags, implicit_pure_eaf_flags,
 ignore_stores_eaf_flags): New constants.
(remove_useless_eaf_flags): New function.
(eaf_flags_useful_p): Use it.
(deref_flags): Add EAF_NOT_RETURNED if flag is unused;
handle EAF_NOREAD.
(modref_lattice::init): Add EAF_NOREAD.
(modref_lattice::add_escape_point): Do not record escape point if
result is unused.
(modref_lattice::merge): EAF_NOESCAPE implies EAF_NODIRECTESCAPE;
use remove_useless_eaf_flags.
(modref_lattice::merge_deref): Use ignore_stores_eaf_flags.
(modref_lattice::merge_direct_load): Add EAF_NOREAD.
(analyze_ssa_name_flags): Fix handling of EAF_NOT_RETURNED.
(analyze_parms): Use remove_useless_eaf_flags.
(ipa_merge_modref_summary_after_inlining): Use ignore_stores_eaf_flags.
(modref_merge_call_site_flags): Add caller and ecf_flags parameter;
use remove_useless_eaf_flags.
(modref_propagate_flags_in_scc): Update.
* ipa-modref.h: Turn eaf_flags_t back to char.
* tree-core.h (EAF_NOT_RETURNED): Fix.

Re: [PATCH] ipa: do not make localaliases for target_clones [PR101261]

2021-08-13 Thread Jan Hubicka
> Hello.
> 
> Right now, the target_clone pass complains when a target_clone function
> is an alias.  That happens when a localalias is created by the callgraph.
> I think we should not create such aliases, as we won't benefit much from
> them in the case of target_clones.
> 
> Patch can bootstrap on x86_64-linux-gnu and survives regression tests.
> 
> Ready to be installed?
OK, thanks
Honza
> Thanks,
> Martin
> 
>   PR ipa/101261
> 
> gcc/ChangeLog:
> 
>   * symtab.c (symtab_node::noninterposable_alias): Do not create
> local aliases for target_clone functions as the cloning pass
> rejects aliases.
> 
> gcc/testsuite/ChangeLog:
> 
>   * gcc.target/i386/pr101261.c: New test.
> ---
>  gcc/symtab.c |  2 ++
>  gcc/testsuite/gcc.target/i386/pr101261.c | 11 +++
>  2 files changed, 13 insertions(+)
>  create mode 100644 gcc/testsuite/gcc.target/i386/pr101261.c
> 
> diff --git a/gcc/symtab.c b/gcc/symtab.c
> index 8c4cb70b015..c7ea8ecef74 100644
> --- a/gcc/symtab.c
> +++ b/gcc/symtab.c
> @@ -1959,6 +1959,8 @@ symtab_node::noninterposable_alias (void)
>/* If aliases aren't supported by the assembler, fail.  */
>if (!TARGET_SUPPORTS_ALIASES)
>  return NULL;
> +  else if (lookup_attribute ("target_clones", DECL_ATTRIBUTES (node->decl)))
> +return NULL;
>/* Otherwise create a new one.  */
>new_decl = copy_node (node->decl);
> diff --git a/gcc/testsuite/gcc.target/i386/pr101261.c 
> b/gcc/testsuite/gcc.target/i386/pr101261.c
> new file mode 100644
> index 000..d25d1a202c2
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/i386/pr101261.c
> @@ -0,0 +1,11 @@
> +/* PR middle-end/101261 */
> +/* { dg-do compile { target fpic } } */
> +/* { dg-options "-fno-semantic-interposition -fPIC" } */
> +/* { dg-require-ifunc "" } */
> +
> +void
> +__attribute__((target_clones("default", "avx2")))
> +dt_ioppr_transform_image_colorspace()
> +{
> +  dt_ioppr_transform_image_colorspace();
> +}
> -- 
> 2.32.0
> 


Re: [PATCH] ipa: "naked" attribute implies "noipa" attribute

2021-08-13 Thread Jan Hubicka
> Hi.
> 
> This is the first part of fixing the PR.  It makes sense to make
> "naked" functions "noipa".
> What's missing is IPA MOD pass support where the pass should not optimize fns
> with "noipa" attributes.
> 
> @Honza: Can you please implement that?

Hmm, I had a patch for that somewhere; will do that.
> 
> Patch can bootstrap on x86_64-linux-gnu and survives regression tests.
> 
> Ready to be installed?

OK, except for...
> +  && lookup_attribute_spec (get_identifier ("naked"))
> +  &&(lookup_attribute ("noipa", attributes) == NULL))

missing space or extra braces here.  I am not sure how much we play with
NULL_TREE these days.

Honza
> + attributes = tree_cons (get_identifier ("noipa"), NULL, attributes);
>/* A "noipa" function attribute implies "noinline", "noclone" and "no_icf"
>   for those targets that support it.  */
> -- 
> 2.32.0
> 


Re: [PATCH] ipa: add debug counter for IPA MODREF PTA

2021-08-22 Thread Jan Hubicka
> Hi.
> 
> We already have an IPA modref debug counter, but it's only used in
> tree-ssa-alias, which is only a part of what IPA modref does.  I used
> the dbg counter in isolation of PR101949.
> 
> Ready for master?
OK,
thanks!

Honza
> 
> gcc/ChangeLog:
> 
>   * dbgcnt.def (DEBUG_COUNTER): New counter.
>   * gimple.c (gimple_call_arg_flags): Use it in IPA PTA.
> ---
>  gcc/dbgcnt.def | 1 +
>  gcc/gimple.c   | 5 +++--
>  2 files changed, 4 insertions(+), 2 deletions(-)
> 
> diff --git a/gcc/dbgcnt.def b/gcc/dbgcnt.def
> index 2345899ba68..c2bcc4eef5e 100644
> --- a/gcc/dbgcnt.def
> +++ b/gcc/dbgcnt.def
> @@ -175,6 +175,7 @@ DEBUG_COUNTER (ipa_cp_bits)
>  DEBUG_COUNTER (ipa_cp_values)
>  DEBUG_COUNTER (ipa_cp_vr)
>  DEBUG_COUNTER (ipa_mod_ref)
> +DEBUG_COUNTER (ipa_mod_ref_pta)
>  DEBUG_COUNTER (ipa_sra_params)
>  DEBUG_COUNTER (ipa_sra_retvalues)
>  DEBUG_COUNTER (ira_move)
> diff --git a/gcc/gimple.c b/gcc/gimple.c
> index 4e2653cab2f..bed7ff9e71c 100644
> --- a/gcc/gimple.c
> +++ b/gcc/gimple.c
> @@ -48,7 +48,7 @@ along with GCC; see the file COPYING3.  If not see
>  #include "attr-fnspec.h"
>  #include "ipa-modref-tree.h"
>  #include "ipa-modref.h"
> -
> +#include "dbgcnt.h"
>  /* All the tuples have their operand vector (if present) at the very bottom
> of the structure.  Therefore, the offset required to find the
> @@ -1601,7 +1601,8 @@ gimple_call_arg_flags (const gcall *stmt, unsigned arg)
> if ((modref_flags & EAF_DIRECT) && !(flags & EAF_DIRECT))
>   modref_flags &= ~EAF_DIRECT;
>   }
> -   flags |= modref_flags;
> +   if (dbg_cnt (ipa_mod_ref_pta))
> + flags |= modref_flags;
>   }
>  }
>return flags;
> -- 
> 2.32.0
> 


Re: [PATCH] IPA: MODREF should skip EAF_* flags for indirect calls

2021-08-22 Thread Jan Hubicka
> Hello.
> 
> As shown in the PR, returning (EAF_NOCLOBBER | EAF_NOESCAPE) for an argument
> that is a function pointer is problematic. Doing such a function call is a 
> clobber.
> 
> Patch can bootstrap on x86_64-linux-gnu and survives regression tests.
> 
> Ready to be installed?
> Thanks,
> Martin
> 
>   PR 101949
> 
> gcc/ChangeLog:
> 
>   * ipa-modref.c (analyze_ssa_name_flags): Do not propagate EAF
> flags arguments for indirect functions.
> 
> gcc/testsuite/ChangeLog:
> 
>   * gcc.dg/lto/pr101949_0.c: New test.
>   * gcc.dg/lto/pr101949_1.c: New test.
> 
> Co-Authored-By: Richard Biener 
> ---
>  gcc/ipa-modref.c  |  3 +++
>  gcc/testsuite/gcc.dg/lto/pr101949_0.c | 20 
>  gcc/testsuite/gcc.dg/lto/pr101949_1.c |  4 
>  3 files changed, 27 insertions(+)
>  create mode 100644 gcc/testsuite/gcc.dg/lto/pr101949_0.c
>  create mode 100644 gcc/testsuite/gcc.dg/lto/pr101949_1.c
> 
> diff --git a/gcc/ipa-modref.c b/gcc/ipa-modref.c
> index fafd804d4ba..380ba6926b9 100644
> --- a/gcc/ipa-modref.c
> +++ b/gcc/ipa-modref.c
> @@ -1715,6 +1715,9 @@ analyze_ssa_name_flags (tree name, vec 
> &lattice, int depth,
> else if (callee && !ipa && recursive_call_p (current_function_decl,
> callee))
>   lattice[index].merge (0);
> +   /* Ignore indirect calls (PR101949).  */
> +   else if (callee == NULL_TREE)
> + lattice[index].merge (0);

Thanks for looking into this bug - it is interesting that ipa-pta
requires !EAF_NOCLOBBER when a function is called...

I have some work done on teaching ipa-modref (and other propagation
passes) to use ipa-devirt info when the full set of callees is known.
This goes the opposite way.

You can drop flags only when callee == NAME, and you can just drop
EAF_NOCLOBBER.  For example in the testcase

struct a {
  void (*foo)();
  void *bar;
};

void wrap (struct a *a)
{
  a->foo ();
}

will prevent us from figuring out that bar cannot be modified when you
pass a non-escaping instance of struct a to wrap.

Honza


Re: [PATCH] Try LTO partial linking. (Was: Speed of compiling gimple-match.c)

2021-08-22 Thread Jan Hubicka
> Good hint. I added hash based on object file name (I don't want to handle
> proper string escaping) and -frandom-seed.
> 
> What do you think about the patch?
Sorry for taking so long - I remember writing a reply earlier but it
seems I never sent it.
> Thanks,
> Martin

> From 372d2944571906932fd1419bfc51a949d67b857e Mon Sep 17 00:00:00 2001
> From: Martin Liska 
> Date: Fri, 21 May 2021 10:25:49 +0200
> Subject: [PATCH] LTO: add lto_priv suffix for LTO_LINKER_OUTPUT_NOLTOREL.
> 
> gcc/lto/ChangeLog:
> 
>   * lto-partition.c (privatize_symbol_name_1): Add random suffix
>   based on hash of the object file and -frandom-seed.
> ---
>  gcc/lto/lto-partition.c | 21 ++---
>  1 file changed, 18 insertions(+), 3 deletions(-)
> 
> diff --git a/gcc/lto/lto-partition.c b/gcc/lto/lto-partition.c
> index 15761ac9eb5..fef48c869a2 100644
> --- a/gcc/lto/lto-partition.c
> +++ b/gcc/lto/lto-partition.c
> @@ -35,6 +35,7 @@ along with GCC; see the file COPYING3.  If not see
>  #include "ipa-fnsummary.h"
>  #include "lto-partition.h"
>  #include "sreal.h"
> +#include "toplev.h"
>  
>  vec ltrans_partitions;
>  
> @@ -941,9 +942,23 @@ privatize_symbol_name_1 (symtab_node *node, tree decl)
>  
>name = maybe_rewrite_identifier (name);
>unsigned &clone_number = lto_clone_numbers->get_or_insert (name);
> -  symtab->change_decl_assembler_name (decl,
> -   clone_function_name (
> -   name, "lto_priv", clone_number));
> +
> +  char *suffix = NULL;
> +  if (flag_lto_linker_output == LTO_LINKER_OUTPUT_NOLTOREL)
> +{
> +  hashval_t fnhash = 0;
> +  if (node->lto_file_data != NULL)
> + fnhash = htab_hash_string (node->lto_file_data->file_name);
> +  suffix = XNEWVEC (char, 128);
> +  char sep = symbol_table::symbol_suffix_separator ();
> +  sprintf (suffix, "lto_priv%c%u%c%" PRIu64, sep, fnhash, sep,
> +(unsigned HOST_WIDE_INT)get_random_seed (false));

We have get_file_function_name which does similar work but also works
without random seeds.  Perhaps we can reuse it here: call
get_file_function_name once and use the result as a prefix, or compute
the hash from it.

The logic to get a unique symbol name is not completely easy and it would
be better not to duplicate it.  The patch is OK with that change
(and indeed it is a bugfix - even if it is relatively little used, partial
linking of LTO objects into non-LTO output should be supported and working).
Honza
> +}
> +
> +  tree clone
> += clone_function_name (name, suffix ? suffix : "lto_priv", clone_number);
> +  symtab->change_decl_assembler_name (decl, clone);
> +  free (suffix);
>clone_number++;
>  
>if (node->lto_file_data)
> -- 
> 2.31.1
> 



Re: fix latent bootstrap-debug issue (modref, tree-inline, tree jump-threading)

2021-08-22 Thread Jan Hubicka
> 
> for  gcc/ChangeLog
> 
>   * ipa-modref.c (analyze_function): Skip debug stmts.
>   * tree-inline.c (estimate_num_insn): Consider builtins even
>   without a cgraph_node.

OK, thanks for looking into this issue!
(for mainline and release branches a bit later)
> ---
>  gcc/ipa-modref.c  |3 ++-
>  gcc/tree-inline.c |4 ++--
>  2 files changed, 4 insertions(+), 3 deletions(-)
> 
> diff --git a/gcc/ipa-modref.c b/gcc/ipa-modref.c
> index fafd804d4bae4..f0cddbb077aaa 100644
> --- a/gcc/ipa-modref.c
> +++ b/gcc/ipa-modref.c
> @@ -2108,7 +2108,8 @@ analyze_function (function *f, bool ipa)
>FOR_EACH_BB_FN (bb, f)
>  {
>gimple_stmt_iterator si;
> -  for (si = gsi_after_labels (bb); !gsi_end_p (si); gsi_next (&si))
> +  for (si = gsi_start_nondebug_after_labels_bb (bb);
> +!gsi_end_p (si); gsi_next_nondebug (&si))

It seems that analyze_stmt indeed does not skip debug stmts.  It is very
odd we got so far without hitting a build difference.

Honza
>   {
> if (!analyze_stmt (summary, summary_lto,
>gsi_stmt (si), ipa, &recursive_calls)
> diff --git a/gcc/tree-inline.c b/gcc/tree-inline.c
> index d0e9f52d5f138..636130fe0019e 100644
> --- a/gcc/tree-inline.c
> +++ b/gcc/tree-inline.c
> @@ -4436,8 +4436,8 @@ estimate_num_insns (gimple *stmt, eni_weights *weights)
>   /* Do not special case builtins where we see the body.
>  This just confuse inliner.  */
>   struct cgraph_node *node;
> - if (!(node = cgraph_node::get (decl))
> - || node->definition)
> + if ((node = cgraph_node::get (decl))
> + && node->definition)
> ;
>   /* For buitins that are likely expanded to nothing or
>  inlined do not account operand costs.  */
> 
> 
> -- 
> Alexandre Oliva, happy hacker  https://FSFLA.org/blogs/lxo/
>Free Software Activist   GNU Toolchain Engineer
> Disinformation flourishes because many people care deeply about injustice
> but very few check the facts.  Ask me about 


Re: [PATCH] IPA: MODREF should skip EAF_* flags for indirect calls

2021-08-22 Thread Jan Hubicka
> Thanks for looking into this bug - it is interesting that ipa-pta
> requires !EAF_NOCLOBBER when function is called...
> 
> I have some work done on teaching ipa-modref (and other propagation
> passes) to use ipa-devirt info when the full set of callees is known.
> This goes oposite way.
> 
> You can drop flags only when callee == NAME and you can just frop
> EAF_NOCLOBBER.  For example in testcase
> 
> struct a {
>   void (*foo)();
>   void *bar;
> }
> 
> void wrap (struct a *a)
> {
>   a->foo ();
> }
> 
> will prevent us from figuring out that bar can not be modified when you
> pass non-ecaping instance of struct a to wrap.
> 

I am testing this updated patch which implements that.  I am not very
happy about it (it punishes the -fno-ipa-pta path for no very good
reason), but we need a bugfix for the release branch.

It is now very easy to add new EAF flags on the modref side,
so we can track EAF_NOT_CALLED.
The tree-ssa-structalias side is always a bit annoying wrt new EAF flags
because it has three copies of the code building constraints for calls
(normal, pure and const).

Modref is already tracking whether a function can read/modify global
memory.  I plan to add flags for NRC and the link chain and then we can
represent the effect of ECF_CONST and PURE by simply adding flags.  I would
thus like to merge that code.  We do various optimizations to reduce the
number of constraints produced, but hopefully this is not very important (or
can be implemented by special-casing in the unified code).

Honza

gcc/ChangeLog:

2021-08-22  Jan Hubicka  
Martin Liska  

* ipa-modref.c (analyze_ssa_name_flags): Indirect call implies
    ~EAF_NOCLOBBER.

gcc/testsuite/ChangeLog:

2021-08-22  Jan Hubicka  
Martin Liska  

* gcc.dg/lto/pr101949_0.c: New test.
* gcc.dg/lto/pr101949_1.c: New test.

diff --git a/gcc/ipa-modref.c b/gcc/ipa-modref.c
index fafd804d4ba..549153865b8 100644
--- a/gcc/ipa-modref.c
+++ b/gcc/ipa-modref.c
@@ -1700,6 +1700,15 @@ analyze_ssa_name_flags (tree name, vec 
&lattice, int depth,
   else if (gcall *call = dyn_cast  (use_stmt))
{
  tree callee = gimple_call_fndecl (call);
+
+ /* IPA PTA internally it treats calling a function as "writing" to
+the argument space of all functions the function pointer points to
+(PR101949).  We can not drop EAF_NOCLOBBER only when ipa-pta
+is on since that would allow propagation of this from -fno-ipa-pta
+to -fipa-pta functions.  */
+ if (gimple_call_fn (use_stmt) == name)
+   lattice[index].merge (~EAF_NOCLOBBER);
+
  /* Return slot optimization would require bit of propagation;
 give up for now.  */
  if (gimple_call_return_slot_opt_p (call)
diff --git a/gcc/testsuite/gcc.dg/lto/pr101949_0.c 
b/gcc/testsuite/gcc.dg/lto/pr101949_0.c
new file mode 100644
index 000..142dffe8780
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/lto/pr101949_0.c
@@ -0,0 +1,20 @@
+/* { dg-lto-do run } */
+/* { dg-lto-options { "-O2 -fipa-pta -flto -flto-partition=1to1" } } */
+
+extern int bar (int (*)(int *), int *);
+
+static int x;
+
+static int __attribute__ ((noinline)) foo (int *p)
+{
+  *p = 1;
+  x = 0;
+  return *p;
+}
+
+int main ()
+{
+  if (bar (foo, &x) != 0)
+__builtin_abort ();
+  return 0;
+}
diff --git a/gcc/testsuite/gcc.dg/lto/pr101949_1.c 
b/gcc/testsuite/gcc.dg/lto/pr101949_1.c
new file mode 100644
index 000..871d15c9bfb
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/lto/pr101949_1.c
@@ -0,0 +1,4 @@
+int __attribute__((noinline,noclone)) bar (int (*fn)(int *), int *p)
+{
+  return fn (p);
+}


Improve handling of return slots in ipa-modref

2021-08-23 Thread Jan Hubicka
Hi,
while looking at Martin's patch I also noticed that return slots are
handled, but over-eagerly.  We only care if the SSA name we analyze is the
base of the return slot.

Bootstrapped/regtested x86_64-linux, committed.

Honza

gcc/ChangeLog:

* ipa-modref.c (analyze_ssa_name_flags): Improve handling of return 
slot.

gcc/testsuite/ChangeLog:

* g++.dg/tree-ssa/modref-1.C: New test.

diff --git a/gcc/ipa-modref.c b/gcc/ipa-modref.c
index 549153865b8..cb0a314cbeb 100644
--- a/gcc/ipa-modref.c
+++ b/gcc/ipa-modref.c
@@ -1709,19 +1709,8 @@ analyze_ssa_name_flags (tree name, vec 
&lattice, int depth,
  if (gimple_call_fn (use_stmt) == name)
lattice[index].merge (~EAF_NOCLOBBER);
 
- /* Return slot optimization would require bit of propagation;
-give up for now.  */
- if (gimple_call_return_slot_opt_p (call)
- && gimple_call_lhs (call) != NULL_TREE
- && TREE_ADDRESSABLE (TREE_TYPE (gimple_call_lhs (call
-   {
- if (dump_file)
-   fprintf (dump_file, "%*s  Unhandled return slot opt\n",
-depth * 4, "");
- lattice[index].merge (0);
-   }
  /* Recursion would require bit of propagation; give up for now.  */
- else if (callee && !ipa && recursive_call_p (current_function_decl,
+ if (callee && !ipa && recursive_call_p (current_function_decl,
  callee))
lattice[index].merge (0);
  else
@@ -1735,7 +1724,15 @@ analyze_ssa_name_flags (tree name, vec 
&lattice, int depth,
  /* Handle *name = func (...).  */
  if (gimple_call_lhs (call)
  && memory_access_to (gimple_call_lhs (call), name))
-   lattice[index].merge_direct_store ();
+   {
+ lattice[index].merge_direct_store ();
+ /* Return slot optimization passes address of
+LHS to callee via hidden parameter and this
+may make LHS to escape.  See PR 98499.  */
+ if (gimple_call_return_slot_opt_p (call)
+ && TREE_ADDRESSABLE (TREE_TYPE (gimple_call_lhs (call
+   lattice[index].merge (EAF_NOREAD | EAF_DIRECT);
+   }
 
  /* We do not track accesses to the static chain (we could)
 so give up.  */
diff --git a/gcc/testsuite/g++.dg/tree-ssa/modref-1.C 
b/gcc/testsuite/g++.dg/tree-ssa/modref-1.C
new file mode 100644
index 000..c742dfe8b33
--- /dev/null
+++ b/gcc/testsuite/g++.dg/tree-ssa/modref-1.C
@@ -0,0 +1,32 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-tree-optimized" } */
+struct S { int a; char b[20]; S(); S(S const&); };
+volatile int global;
+
+__attribute__ ((noinline,noclone))
+struct S noescape (int *b)
+{
+  struct S a;
+  a.a = b!=0;
+  global = 1;
+  return a;
+}
+
+void escape (struct S *p);
+
+__attribute__ ((noinline,noclone))
+int
+test(int *b)
+{
+  struct S s = noescape (b);
+  escape (&s);
+  return *b;
+}
+int test2()
+{
+  int b=1234;
+  test (&b);
+  return b;
+}
+// ipa-modref should analyze parameter B of test as noescape.
+// { dg-final { scan-tree-dump "return 1234" } }


Re: [PATCH] IPA: MODREF should skip EAF_* flags for indirect calls

2021-08-23 Thread Jan Hubicka
> Hello.
> 
> Thanks for working on that. But have you really run the test-cases? The
> newly added test still aborts as it did before you installed this patch.

Eh, sorry, I had an earlier version of the patch that did

  if (gimple_call_fn (use_stmt) == name)
lattice[index].merge (0);

like yours, and then I noticed that dropping things like EAF_NOT_RETURNED
is not necessary. However, instead of

  if (gimple_call_fn (use_stmt) == name)
lattice[index].merge (~EAF_NOCLOBBER);

It should be

  if (gimple_call_fn (use_stmt) == name)
lattice[index].merge (~(EAF_NOCLOBBER | EAF_UNUSED));

Since EAF_UNUSED implies all the other flags, the merge becomes a no-op
with ~EAF_NOCLOBBER alone.  I will test the fix.
I remember re-running bootstrap & regtest; not sure how I missed the
failure.

Honza


Re: [PATCH] IPA: MODREF should skip EAF_* flags for indirect calls

2021-08-23 Thread Jan Hubicka
Hi,
> 
> Why does it "punish" -fno-ipa-pta?  It merely "punishes" modref of
> no longer claiming that we do not alter the instruction stream pointed
> to by a->foo, sth that shouldn't be very common.

For example
struct a {
  void (*foo)();
  void *bar;
};
void fn(struct a *a)
{
   a->foo();
}

With Martin's patch we will drop the EAF flags of A down to NODIRECTESCAPE
since we will think its dereference is used in all possible ways.

With my patch we get NOT_RETURNED | NOESCAPE.
Still we will make PTA think that whatever is pointed to by bar may
be clobbered, and this seems unnecessary.

I have to look into how ipa-pta handles the "instruction stream
clobbering". I was not aware it does something smart about indirect
calls.

Honza


Re: [PATCH] ipa/97565 - fix IPA PTA body availability check

2021-08-23 Thread Jan Hubicka
> Looks like the existing check using has_gimple_body_p isn't enough
> at LTRANS time but I need to check in_other_partition as well.
> 
> Bootstrapped on x86_64-unknown-linux-gnu, testing in progress.
> 
> OK?
> 
> Thanks,
> Richard.
> 
> 2021-08-23  Richard Biener  
> 
>   PR ipa/97565
>   * tree-ssa-structalias.c (ipa_pta_execute): Check in_other_partition
>   in addition to has_gimple_body.
> 
>   * g++.dg/lto/pr97565_0.C: New testcase.
>   * g++.dg/lto/pr97565_1.C: Likewise.
OK,
thanks!

Honza
> ---
>  gcc/testsuite/g++.dg/lto/pr97565_0.C |  7 +++
>  gcc/testsuite/g++.dg/lto/pr97565_1.C |  6 ++
>  gcc/tree-ssa-structalias.c   | 22 ++
>  3 files changed, 27 insertions(+), 8 deletions(-)
>  create mode 100644 gcc/testsuite/g++.dg/lto/pr97565_0.C
>  create mode 100644 gcc/testsuite/g++.dg/lto/pr97565_1.C
> 
> diff --git a/gcc/testsuite/g++.dg/lto/pr97565_0.C 
> b/gcc/testsuite/g++.dg/lto/pr97565_0.C
> new file mode 100644
> index 000..f4572e17bf5
> --- /dev/null
> +++ b/gcc/testsuite/g++.dg/lto/pr97565_0.C
> @@ -0,0 +1,7 @@
> +// { dg-lto-do link }
> +// { dg-lto-options { "-O -flto -fipa-pta" } }
> +
> +extern "C" void abort(void)
> +{
> +  abort();
> +}
> diff --git a/gcc/testsuite/g++.dg/lto/pr97565_1.C 
> b/gcc/testsuite/g++.dg/lto/pr97565_1.C
> new file mode 100644
> index 000..ff7b638e9c5
> --- /dev/null
> +++ b/gcc/testsuite/g++.dg/lto/pr97565_1.C
> @@ -0,0 +1,6 @@
> +extern "C" void abort(void);
> +
> +int main(int argc, char * argv[])
> +{
> +  abort();
> +}
> diff --git a/gcc/tree-ssa-structalias.c b/gcc/tree-ssa-structalias.c
> index fb0e4299703..c4308551d1b 100644
> --- a/gcc/tree-ssa-structalias.c
> +++ b/gcc/tree-ssa-structalias.c
> @@ -8220,10 +8220,12 @@ ipa_pta_execute (void)
>FOR_EACH_DEFINED_FUNCTION (node)
>  {
>varinfo_t vi;
> -  /* Nodes without a body are not interesting.  Especially do not
> - visit clones at this point for now - we get duplicate decls
> -  there for inline clones at least.  */
> -  if (!node->has_gimple_body_p () || node->inlined_to)
> +  /* Nodes without a body in this partition are not interesting.
> +  Especially do not visit clones at this point for now - we
> +  get duplicate decls there for inline clones at least.  */
> +  if (!node->has_gimple_body_p ()
> +   || node->in_other_partition
> +   || node->inlined_to)
>   continue;
>node->get_body ();
>  
> @@ -8301,8 +8303,10 @@ ipa_pta_execute (void)
>struct function *func;
>basic_block bb;
>  
> -  /* Nodes without a body are not interesting.  */
> -  if (!node->has_gimple_body_p () || node->clone_of)
> +  /* Nodes without a body in this partition are not interesting.  */
> +  if (!node->has_gimple_body_p ()
> +   || node->in_other_partition
> +   || node->clone_of)
>   continue;
>  
>if (dump_file)
> @@ -8431,8 +8435,10 @@ ipa_pta_execute (void)
>unsigned i;
>basic_block bb;
>  
> -  /* Nodes without a body are not interesting.  */
> -  if (!node->has_gimple_body_p () || node->clone_of)
> +  /* Nodes without a body in this partition are not interesting.  */
> +  if (!node->has_gimple_body_p ()
> +   || node->in_other_partition
> +   || node->clone_of)
>   continue;
>  
>fn = DECL_STRUCT_FUNCTION (node->decl);
> -- 
> 2.31.1


Re: [PATCH][v2] Remove --param vect-inner-loop-cost-factor

2021-08-23 Thread Jan Hubicka
> 
> Any strong opinions?
> 
> Richard.
> 
> 2021-08-23  Richard Biener  
> 
>   * doc/invoke.texi (vect-inner-loop-cost-factor): Remove
>   documentation.
>   * params.opt (--param vect-inner-loop-cost-factor): Remove.
>   * tree-vect-loop.c (_loop_vec_info::_loop_vec_info):
>   Initialize inner_loop_cost_factor to 1.
>   (vect_analyze_loop_form): Initialize inner_loop_cost_factor
>   from the estimated number of iterations of the inner loop.
> ---
>  gcc/doc/invoke.texi  |  5 -
>  gcc/params.opt   |  4 
>  gcc/tree-vect-loop.c | 12 +++-
>  3 files changed, 11 insertions(+), 10 deletions(-)
> 
> diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
> index c057cc1e4ae..054950132f6 100644
> --- a/gcc/doc/invoke.texi
> +++ b/gcc/doc/invoke.texi
> @@ -14385,11 +14385,6 @@ code to iterate.  2 allows partial vector loads and 
> stores in all loops.
>  The parameter only has an effect on targets that support partial
>  vector loads and stores.
>  
> -@item vect-inner-loop-cost-factor
> -The factor which the loop vectorizer applies to the cost of statements
> -in an inner loop relative to the loop being vectorized.  The default
> -value is 50.
> -
>  @item avoid-fma-max-bits
>  Maximum number of bits for which we avoid creating FMAs.
>  
> diff --git a/gcc/params.opt b/gcc/params.opt
> index f9264887b40..f7b19fa430d 100644
> --- a/gcc/params.opt
> +++ b/gcc/params.opt
> @@ -1113,8 +1113,4 @@ Bound on number of runtime checks inserted by the 
> vectorizer's loop versioning f
>  Common Joined UInteger Var(param_vect_partial_vector_usage) Init(2) 
> IntegerRange(0, 2) Param Optimization
>  Controls how loop vectorizer uses partial vectors.  0 means never, 1 means 
> only for loops whose need to iterate can be removed, 2 means for all loops.  
> The default value is 2.
>  
> --param=vect-inner-loop-cost-factor=
> -Common Joined UInteger Var(param_vect_inner_loop_cost_factor) Init(50) 
> IntegerRange(1, 99) Param Optimization
> -The factor which the loop vectorizer applies to the cost of statements in an 
> inner loop relative to the loop being vectorized.
> -
>  ; This comment is to ensure we retain the blank line above.
> diff --git a/gcc/tree-vect-loop.c b/gcc/tree-vect-loop.c
> index c521b43a47c..cb48717f20e 100644
> --- a/gcc/tree-vect-loop.c
> +++ b/gcc/tree-vect-loop.c
> @@ -841,7 +841,7 @@ _loop_vec_info::_loop_vec_info (class loop *loop_in, 
> vec_info_shared *shared)
>  single_scalar_iteration_cost (0),
>  vec_outside_cost (0),
>  vec_inside_cost (0),
> -inner_loop_cost_factor (param_vect_inner_loop_cost_factor),
> +inner_loop_cost_factor (1),
>  vectorizable (false),
>  can_use_partial_vectors_p (param_vect_partial_vector_usage != 0),
>  using_partial_vectors_p (false),
> @@ -1519,6 +1519,16 @@ vect_analyze_loop_form (class loop *loop, 
> vec_info_shared *shared)
>stmt_vec_info inner_loop_cond_info
>   = loop_vinfo->lookup_stmt (inner_loop_cond);
>STMT_VINFO_TYPE (inner_loop_cond_info) = loop_exit_ctrl_vec_info_type;
> +  /* If we have an estimate on the number of iterations of the inner
> +  loop use that as the scale for costing, otherwise conservatively
> +  assume a single inner iteration.  */
> +  widest_int nit;
> +  if (get_estimated_loop_iterations (loop->inner, &nit))
> + LOOP_VINFO_INNER_LOOP_COST_FACTOR (loop_vinfo)
> +   /* Since costing is done on unsigned int cap the scale on
> +  some large number consistent with what we'd see in
> +  CFG counts.  */
> +   = wi::smax (nit, REG_BR_PROB_BASE).to_uhwi ();
This looks sane to me, but for the case where profile info is missing,
you will get false from get_estimated_loop_iterations and in that case
it will assume that the inner loop iterates once, which seems a bit
unrealistic for vectorizable loop nests.

I assume you want to determine here that reducing the cost of a stmt in the
inner loop is more important than increasing the cost of a stmt outside.

So perhaps keeping the parameter and capping it with get_max_loop_iterations
would be sane?

Honza
>  }
>  
>gcc_assert (!loop->aux);
> -- 
> 2.31.1


Avoid redundant entries in modref's access lists

2021-08-23 Thread Jan Hubicka
Hi,
in PR101296 Richard noticed that modref is giving up on analysis in milc by
hitting the --param=modref-max-accesses limit.  While cleaning up the original
modref patch I removed code that tried to do smart things while merging
accesses because it had bugs, intending to reimplement it later, which I then
forgot.

This patch adds logic that avoids adding both an access and its subaccess to
the list, which is just a waste of memory and compile time.  Incrementally I
will add logic merging the ranges.

Bootstrapped/regtested x86_64-linux, committed.  Current cc1plus stats are

Alias oracle query stats:   
  refs_may_alias_p: 72546769 disambiguations, 82870545 queries  
  ref_maybe_used_by_call_p: 497089 disambiguations, 73535250 queries
  call_may_clobber_ref_p: 259485 disambiguations, 263258 queries
  nonoverlapping_component_refs_p: 0 disambiguations, 38042 queries 
  nonoverlapping_refs_since_match_p: 21125 disambiguations, 65780 must 
overlaps, 87810 queries
  aliasing_component_refs_p: 63132 disambiguations, 2186210 queries 
  TBAA oracle: 26058958 disambiguations 61665515 queries
   12157742 are in alias set 0  
   11350680 queries asked about the same object 
   144 queries asked about the same alias set   
   0 access volatile
   10552147 are dependent in the DAG
   1545844 are aritificially in conflict with void *

Modref stats:   
  modref use: 24018 disambiguations, 713486 queries 
  modref clobber: 1400403 disambiguations, 17119339 queries 
  3473726 tbaa queries (0.202912 per modref query)  
  535259 base compares (0.031266 per modref query)  

PTA query stats:
  pt_solution_includes: 12436890 disambiguations, 20321783 queries  
  pt_solutions_intersect: 1390457 disambiguations, 14654884 queries 

This is pretty much the same as last time I measured.  This is somewhat
expected, since we do not hit the limit on GCC very often - 43 times during
the cc1plus LTO build (release checking); however, code that does a lot of
array/field initialization may hit the limit easily.

gcc/ChangeLog:

2021-08-23  Jan Hubicka  

* ipa-modref-tree.h (modref_access_node::range_info_useful_p):
Improve range compare.
(modref_access_node::contains): New member function.
(modref_access_node::search): Remove.
(modref_access_node::insert): Be smarter about subaccesses.


gcc/testsuite/ChangeLog:

2021-08-23  Jan Hubicka  

* gcc.dg/tree-ssa/modref-7.c: New test.

diff --git a/gcc/ipa-modref-tree.h b/gcc/ipa-modref-tree.h
index d36c28c0470..2e26b75e21f 100644
--- a/gcc/ipa-modref-tree.h
+++ b/gcc/ipa-modref-tree.h
@@ -66,7 +66,10 @@ struct GTY(()) modref_access_node
   /* Return true if range info is useful.  */
   bool range_info_useful_p () const
 {
-  return parm_index != -1 && parm_offset_known;
+  return parm_index != -1 && parm_offset_known
+&& (known_size_p (size)
+|| known_size_p (max_size)
+|| known_ge (offset, 0));
 }
   /* Return true if both accesses are the same.  */
   bool operator == (modref_access_node &a) const
@@ -88,6 +91,35 @@ struct GTY(()) modref_access_node
return false;
   return true;
 }
+  /* Return true A is a subaccess.  */
+  bool contains (modref_access_node &a) const
+{
+  if (parm_index != a.parm_index)
+   return false;
+  if (parm_index >= 0)
+   {
+ if (parm_offset_known
+ && (!a.parm_offset_known
+ || !known_eq (parm_offset, a.parm_offset)))
+   return false;
+   }
+  if (range_info_useful_p ())
+   {
+ if (!a.range_info_useful_p ())
+   return false;
+ /* Sizes of stores are used to check that object is big enough
+to fit the store, so smaller or unknown sotre is more general
+than large store.  */
+ if (known_size_p (size)
+ && !known_le (size, a.size))
+   return false;
+ if (known_size_p (max_size))
+   return known_subrange_p (a.offset, a.max_size, offset, max_size);
+ else
+   return known_le (offset, a.offset);
+   }
+  return true;
+  

Re: [PATCH][v2] Remove --param vect-inner-loop-cost-factor

2021-08-24 Thread Jan Hubicka
> > 
> > I noticed loop-doloop.c use _int version and likely_max, maybe you want 
> > that here?
> >  
> >   est_niter = get_estimated_loop_iterations_int (loop);
> >   if (est_niter == -1)
> > est_niter = get_likely_max_loop_iterations_int (loop)
> 
> I think that are two different things - get_estimated_loop_iterations_int
> are the average number of iterations while 
> get_likely_max_loop_iterations_int is an upper bound.  I'm not sure we
> want to use an upper bound for costing.
> 
> Based on feedback from Honza I'm currently testing the variant below
> which keeps the --param and uses it to cap the estimated number of
> iterations.  That makes the scaling more precise for inner loops that
> don't iterate much but keeps the --param to avoid overflow and to
> keep the present behavior when there's no reliable profile info
> available.

Indeed, get_likely_max_loop_iterations_int may be very large.  In some
cases, however, it will give a useful value - for example when the loop
traverses a small array.

So what one can use it for is to cap the --param value.
> diff --git a/gcc/tree-vect-loop.c b/gcc/tree-vect-loop.c
> index c521b43a47c..cbdd5b407da 100644
> --- a/gcc/tree-vect-loop.c
> +++ b/gcc/tree-vect-loop.c
> @@ -1519,6 +1519,13 @@ vect_analyze_loop_form (class loop *loop, 
> vec_info_shared *shared)
>stmt_vec_info inner_loop_cond_info
>   = loop_vinfo->lookup_stmt (inner_loop_cond);
>STMT_VINFO_TYPE (inner_loop_cond_info) = loop_exit_ctrl_vec_info_type;
> +  /* If we have an estimate on the number of iterations of the inner
> +  loop use that to limit the scale for costing, otherwise use
> +  --param vect-inner-loop-cost-factor literally.  */
> +  widest_int nit;
> +  if (get_estimated_loop_iterations (loop->inner, &nit))
> + LOOP_VINFO_INNER_LOOP_COST_FACTOR (loop_vinfo)
> +   = wi::smin (nit, param_vect_inner_loop_cost_factor).to_uhwi ();

  if (get_estimated_loop_iterations (loop->inner, &nit))
LOOP_VINFO_INNER_LOOP_COST_FACTOR (loop_vinfo)
  = wi::smin (nit, REG_BR_PROB_BASE /*or other random big cap  
*/).to_uhwi ();
  else if (get_likely_max_loop_iterations (loop->inner, &nit))
LOOP_VINFO_INNER_LOOP_COST_FACTOR (loop_vinfo)
  = wi::smin (nit, param_vect_inner_loop_cost_factor).to_uhwi ();
  else
LOOP_VINFO_INNER_LOOP_COST_FACTOR (loop_vinfo)
  = param_vect_inner_loop_cost_factor;

I.e. if we really know the number of iterations, we probably want to
weight by it, but we want to cap it to avoid overflows.  I assume if we know
that the trip count is 1 or more we basically do not care about the damage
done to the outer loop as long as the inner loop improves?

If we know the max number of iterations and it is smaller than the param,
we want to use it as the cap.

The situation where get_estimated_loop_iterations returns a wrong value
should be rare - basically when the loop was duplicated by the inliner
(or another transform) and it behaves a lot differently than the average
execution of the loop in the train run.  In this case we could also
argue that the loop is not statistically important :)

Honza
>  }
>  
>gcc_assert (!loop->aux);
> -- 
> 2.31.1
> 


Merge stores/loads in modref summaries

2021-08-25 Thread Jan Hubicka
Hi,
this patch adds the logic needed to merge neighbouring accesses in ipa-modref
summaries.  This helps analyzing array initializers and similar code.  It is
a bit of work, since it breaks the fact that the modref tree makes a good
lattice for dataflow: the access ranges can be extended indefinitely.  For
this reason I added a counter tracking the number of adjustments and a cap
to limit them during the dataflow.  This triggers in:
void
recurse (char *p, int n)
{
*p = 0;
if (n)
  recurse (p+1,n-1);
}

Where we now work hard enough to determine:

access: Parm 0 param offset:0 offset:0 size:8 max_size:-1 adjusted 8 times

which is correct access info saying that param 0 can be accessed from byte 0
in 8-bit accesses with unknown max_size.

--param modref-max-accesses is now hit 8 times instead of 45 before the
patch.  We hit --param modref-max-adjustments once for the fft algorithm
(where the recursion really seems to walk an array) and max-bases once for
late modref and 9 times for IPA modref (it works on types rather than alias
sets, so it is more likely to hit the limit).

I would be happy for suggestions on how to simplify the merging logic.  It
is a bit convoluted since I need to know if I am going to adjust the range
and need to deal with poly_ints and possibly unknown sizes.

Incrementally I will add logic to improve the behaviour when the limits are
hit instead of just giving up on the analysis.

With the patch I get following cc1plus stats:

Alias oracle query stats:
  refs_may_alias_p: 83135089 disambiguations, 101581194 queries
  ref_maybe_used_by_call_p: 590484 disambiguations, 84157326 queries
  call_may_clobber_ref_p: 345434 disambiguations, 349295 queries
  nonoverlapping_component_refs_p: 0 disambiguations, 39520 queries
  nonoverlapping_refs_since_match_p: 33266 disambiguations, 66411 must 
overlaps, 100574 queries
  aliasing_component_refs_p: 66251 disambiguations, 9920037 queries
  TBAA oracle: 31033174 disambiguations 93485041 queries
   14359693 are in alias set 0
   11930606 queries asked about the same object
   129 queries asked about the same alias set
   0 access volatile
   34218393 are dependent in the DAG
   1943046 are aritificially in conflict with void *

Modref stats:
  modref use: 26293 disambiguations, 705198 queries 
  modref clobber: 1828340 disambiguations, 21213011 queries
  4748965 tbaa queries (0.223870 per modref query)  
  711083 base compares (0.033521 per modref query)  

PTA query stats:
  pt_solution_includes: 13119524 disambiguations, 33183481 queries
  pt_solutions_intersect: 1510541 disambiguations, 15368102 queries

this would suggest quite a large improvement over my last run
https://gcc.gnu.org/pipermail/gcc-patches/2021-August/577962.html
(about 12% on overall disambiguation count)

but I also updated my setup, so part of the increase may be accounted for by
different libraries.  The overall size of modref access lists is about halved on
cc1plus, which looks promising though. It may be that we less often hit the limit
on the number of queries done in tree-ssa-alias.

Bootstrapped/regtested x86_64-linux.
I plan to commit the patch after a bit more testing.

gcc/ChangeLog:

* doc/invoke.texi: Document --param modref-max-adjustments.
* ipa-modref-tree.c (test_insert_search_collapse): Update.
(test_merge): Update.
* ipa-modref-tree.h (struct modref_access_node): Add adjustments;
(modref_access_node::operator==): Fix handling of access ranges.
(modref_access_node::contains): Constify parameter; handle also
mismatched parm offsets.
(modref_access_node::update): New function.
(modref_access_node::merge): New function.
(unspecified_modref_access_node): Update constructor.
(modref_ref_node::insert_access): Add record_adjustments parameter;
handle merging.
(modref_ref_node::try_merge_with): New private function.
(modref_tree::insert): New record_adjustments parameter.
(modref_tree::merge): New record_adjustments parameter.
(modref_tree::copy_from): Update.
* ipa-modref.c (dump_access): Dump adjustments field.
(get_access): Update constructor.
(record_access): Update call of insert.
(record_access_lto): Update call of insert.
(merge_call_side_effects): Add record_adjustments parameter.
(get_access_for_fnspec): Update.
(process_fnspec): Update.
(analyze_call): Update.
(analyze_function): Update.
(read_modref_records): Update.
(ipa_merge_modref_summary_after_inlining): Update.
(propagate_unknown_call): Update.
(modref_propagate_in_scc): Update.
* params.opt (-param=modref-max-adjustments=): New.

gcc/testsuite/ChangeLog:

* gcc.dg/ipa/modref-1.c: Update testcase.
* gcc.dg/tree-ssa/modref-4.c: Update testcase.
* gcc.dg/tree-ssa/modref-8.c: New test.


Re: Merge stores/loads in modref summaries

2021-08-26 Thread Jan Hubicka
> 
> This patch is causing ICEs on arm:
>  FAIL: g++.dg/torture/pr89303.C   -O1  (internal compiler error)
> FAIL: g++.dg/torture/pr89303.C   -O1  (test for excess errors)

It seems to happen on 32-bit arches only.  For some reason we end up
merging
  access: Parm 0 param offset:12 offset:0 size:96 max_size:96
  access: Parm 0 param offset:0 offset:0 size:96 max_size:96
as 
  access: Parm 0 param offset:0 offset:0 size:96 max_size:192
which is correct but we already have
  access: Parm 0 param offset:0 offset:0 size:32 max_size:192
in the list and merging asserts, since we have a proper subaccess,
which is supposed to be handled earlier.

try_merge_with does not consider that case, but there is already a problem
with both
  access: Parm 0 param offset:12 offset:0 size:96 max_size:96
  access: Parm 0 param offset:0 offset:0 size:32 max_size:192
being in the list, since the first is a subaccess of the second.  So after
lunch I will need to debug how those two get into the list in the first
place...

Honza
  
> Excess errors:
> during GIMPLE pass: modref
> /gcc/testsuite/g++.dg/torture/pr89303.C:792:1: internal compiler error: in
> merge, at ipa-modref-tree.h:203
> 0xdc9b2b modref_access_node::merge(modref_access_node const&, bool)
> /gcc/ipa-modref-tree.h:203
> 0xdcbbb9 modref_ref_node::try_merge_with(unsigned long)
> /gcc/ipa-modref-tree.h:397
> 0xdcc4aa modref_ref_node::insert_access(modref_access_node, unsigned
> long, bool)
> /gcc/ipa-modref-tree.h:366
> 0xdcc71b modref_tree::insert(int, int, modref_access_node, bool)
> /gcc/ipa-modref-tree.h:597
> 0xdc1312 record_access
> /gcc/ipa-modref.c:713
> 0xdc1e34 analyze_store
> /gcc/ipa-modref.c:1245
> 0xd00f2e walk_stmt_load_store_addr_ops(gimple*, void*, bool (*)(gimple*,
> tree_node*, tree_node*, void*), bool (*)(gimple*, tree_node*, tree_node*,
> void*), bool (*)(gimple*, tree_node*, tree_node*, void*))
> /gcc/gimple-walk.c:767
> 0xdc6f4a analyze_stmt
> /gcc/ipa-modref.c:1269
> 0xdc6f4a analyze_function
> /gcc/ipa-modref.c:2131
> 0xdc860d execute
> /gcc/ipa-modref.c:2957
> 
>  FAIL: 20_util/enable_shared_from_this/89303.cc (test for excess errors)
> Excess errors:
> during GIMPLE pass: modref
> /libstdc++-v3/testsuite/20_util/enable_shared_from_this/89303.cc:39:
> internal compiler error: in merge, at ipa-modref-tree.h:203
> 0xdc9b2b modref_access_node::merge(modref_access_node const&, bool)
> /gcc/ipa-modref-tree.h:203
> 0xdcbbb9 modref_ref_node::try_merge_with(unsigned long)
> /gcc/ipa-modref-tree.h:397
> 0xdcc4aa modref_ref_node::insert_access(modref_access_node, unsigned
> long, bool)
> /gcc/ipa-modref-tree.h:366
> 0xdcc71b modref_tree::insert(int, int, modref_access_node, bool)
> /gcc/ipa-modref-tree.h:597
> 0xdc1312 record_access
> /gcc/ipa-modref.c:713
> 0xdc1e34 analyze_store
> /gcc/ipa-modref.c:1245
> 0xd00f2e walk_stmt_load_store_addr_ops(gimple*, void*, bool (*)(gimple*,
> tree_node*, tree_node*, void*), bool (*)(gimple*, tree_node*, tree_node*,
> void*), bool (*)(gimple*, tree_node*, tree_node*, void*))
> /gcc/gimple-walk.c:767
> 0xdc6f4a analyze_stmt
> /gcc/ipa-modref.c:1269
> 0xdc6f4a analyze_function
> /gcc/ipa-modref.c:2131
> 0xdc860d execute
> /gcc/ipa-modref.c:2957
> 
> Can you have a look?
> 
> thanks,
> 
> Christophe
> 
> 
> 
> 
> 
> > diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
> > index b8f5d9e1cce..b83bd902cec 100644
> > --- a/gcc/doc/invoke.texi
> > +++ b/gcc/doc/invoke.texi
> > @@ -13423,6 +13423,10 @@ Setting to 0 disables the analysis completely.
> >  @item modref-max-escape-points
> >  Specifies the maximum number of escape points tracked by modref per
> > SSA-name.
> >
> > +@item modref-max-adjustments
> > +Specifies the maximum number the access range is enlarged during modref
> > dataflow
> > +analysis.
> > +
> >  @item profile-func-internal-id
> >  A parameter to control whether to use function internal id in profile
> >  database lookup. If the value is 0, the compiler uses an id that
> > diff --git a/gcc/ipa-modref-tree.c b/gcc/ipa-modref-tree.c
> > index 64e57f52147..69395b0113c 100644
> > --- a/gcc/ipa-modref-tree.c
> > +++ b/gcc/ipa-modref-tree.c
> > @@ -41,7 +41,7 @@ test_insert_search_collapse ()
> >ASSERT_FALSE (t->every_base);
> >
> >/* Insert into an empty tree.  */
> > -  t->insert (1, 2, a);
> > +  t->insert (1, 2, a, false);
> >ASSERT_NE (t->bases, NULL);
> >ASSERT_EQ (t->bases->length (), 1);
> >ASSERT_FALSE (t->every_base);
> > @@ -59,7 +59,7 @@ test_insert_search_collapse ()
> >ASSERT_EQ (ref_node->ref, 2);
> >
> >/* Insert when base exists but ref does not.  */
> > -  t->insert (1, 3, a);
> > +  t->insert (1, 3, a, false);
> >ASSERT_NE (t->bases, NULL);
> >ASSERT_EQ (t->bases->length (), 1);
> >ASSERT_EQ (t->search (1), base_node);
> > @@ -72,7 +72,7 @@ test_insert_search_collapse ()
> >
> >/* Insert when base and

Re: Merge stores/loads in modref summaries

2021-08-26 Thread Jan Hubicka
> On 8/26/21 10:33, Christophe Lyon via Gcc-patches wrote:
> > Can you have a look?
> 
> Please create a PR for it.
I have a fix, so perhaps there is no need for a PR :)
I am testing the following - the problem was that try_merge_with missed
some merges because of how unordered_remove handles the vector.

Bootstrapping/regtesting x86_64-linux.

Honza

diff --git a/gcc/ipa-modref-tree.h b/gcc/ipa-modref-tree.h
index 6f6932f0875..b27c9689987 100644
--- a/gcc/ipa-modref-tree.h
+++ b/gcc/ipa-modref-tree.h
@@ -322,6 +322,20 @@ struct GTY((user)) modref_ref_node
 every_access = true;
   }
 
+  /* Verify that list does not contain redundant accesses.  */
+  void verify ()
+  {
+size_t i, i2;
+modref_access_node *a, *a2;
+
+FOR_EACH_VEC_SAFE_ELT (accesses, i, a)
+  {
+   FOR_EACH_VEC_SAFE_ELT (accesses, i2, a2)
+ if (i != i2)
+   gcc_assert (!a->contains (*a2));
+  }
+  }
+
   /* Insert access with OFFSET and SIZE.
  Collapse tree if it has more than MAX_ACCESSES entries.
  If RECORD_ADJUSTMENTs is true avoid too many interval extensions.
@@ -337,6 +351,9 @@ struct GTY((user)) modref_ref_node
 size_t i;
 modref_access_node *a2;
 
+if (flag_checking)
+  verify ();
+
 if (!a.useful_p ())
   {
if (!every_access)
@@ -392,13 +409,15 @@ private:
 size_t i;
 
 FOR_EACH_VEC_SAFE_ELT (accesses, i, a2)
-  if (i != index)
-   if ((*accesses)[index].contains (*a2)
-   || (*accesses)[index].merge (*a2, false))
+  if (i != index
+ && ((*accesses)[index].contains (*a2)
+ || (*accesses)[index].merge (*a2, false)))
{
- if (index == accesses->length () - 1)
-   index = i;
  accesses->unordered_remove (i);
+ if (index == accesses->length ())
+   index = i;
+ else
+   i--;
}
   }
 };


Improve handling of modref --params

2021-08-26 Thread Jan Hubicka
Hi,
this patch makes insertion into the modref access tree smarter when the --param
modref-max-bases and modref-max-refs limits are hit.  Instead of giving up,
we either give up on the base alias set (make it equal to the ref) or turn the
alias set to 0.  This lets us track useful info on quite large
functions, such as ggc_free.

Bootstrapped/regtested x86_64-linux, also tested with lto bootstrap.  Actual
modref disambiguation counts are not changed much. Will commit it soon.

The max-bases limit is hit 233 times and max-refs 243 times during WPA+ltrans.

Honza

gcc/ChangeLog:

* ipa-modref-tree.c (test_insert_search_collapse): Update test.
* ipa-modref-tree.h (modref_base_node::insert): Be smarter when
hitting the --param modref-max-refs limit.
(modref_tree:insert_base): Be smarter when hitting
--param modref-max-bases limit. Add new parameter REF.
(modref_tree:insert): Update.
(modref_tree:merge): Update.
* ipa-modref.c (read_modref_records): Update.

diff --git a/gcc/ipa-modref-tree.c b/gcc/ipa-modref-tree.c
index 69395b0113c..8d147a18aed 100644
--- a/gcc/ipa-modref-tree.c
+++ b/gcc/ipa-modref-tree.c
@@ -101,7 +101,7 @@ test_insert_search_collapse ()
   ASSERT_TRUE (base_node->every_ref);
 
   /* Insert base to trigger base list collapse.  */
-  t->insert (5, 6, a, false);
+  t->insert (5, 0, a, false);
   ASSERT_TRUE (t->every_base);
   ASSERT_EQ (t->bases, NULL);
   ASSERT_EQ (t->search (1), NULL);
diff --git a/gcc/ipa-modref-tree.h b/gcc/ipa-modref-tree.h
index 4edec4efded..97934a91ada 100644
--- a/gcc/ipa-modref-tree.h
+++ b/gcc/ipa-modref-tree.h
@@ -463,18 +463,23 @@ struct GTY((user)) modref_base_node
 if (ref_node)
   return ref_node;
 
-if (changed)
-  *changed = true;
-
-/* Collapse the node if too full already.  */
-if (refs && refs->length () >= max_refs)
+/* We always allow inserting ref 0.  For non-0 refs there is upper
+   limit on number of entries and if exceeded,
+   drop ref conservatively to 0.  */
+if (ref && refs && refs->length () >= max_refs)
   {
if (dump_file)
- fprintf (dump_file, "--param param=modref-max-refs limit reached\n");
-   collapse ();
-   return NULL;
+ fprintf (dump_file, "--param param=modref-max-refs limit reached;"
+  " using 0\n");
+   ref = 0;
+   ref_node = search (ref);
+   if (ref_node)
+ return ref_node;
   }
 
+if (changed)
+  *changed = true;
+
ref_node = new (ggc_alloc <modref_ref_node <T> > ()) modref_ref_node <T>
(ref);
 vec_safe_push (refs, ref_node);
@@ -532,9 +537,10 @@ struct GTY((user)) modref_tree
 
   /* Insert BASE; collapse tree if there are more than MAX_REFS.
  Return inserted base and if CHANGED is non-null set it to true if
- something changed.  */
+ something changed.
+ If table gets full, try to insert REF instead.  */
 
-  modref_base_node  *insert_base (T base, bool *changed = NULL)
+  modref_base_node  *insert_base (T base, T ref, bool *changed = NULL)
   {
 modref_base_node  *base_node;
 
@@ -547,18 +553,31 @@ struct GTY((user)) modref_tree
 if (base_node)
   return base_node;
 
-if (changed)
-  *changed = true;
-
-/* Collapse the node if too full already.  */
-if (bases && bases->length () >= max_bases)
+/* We always allow inserting base 0.  For non-0 base there is upper
+   limit on number of entries and if exceeded,
+   drop base conservatively to ref and if it still does not fit to 0.  */
+if (base && bases && bases->length () >= max_bases)
   {
+   base_node = search (ref);
+   if (base_node)
+ {
+   if (dump_file)
+ fprintf (dump_file, "--param param=modref-max-bases"
+  " limit reached; using ref\n");
+   return base_node;
+ }
if (dump_file)
- fprintf (dump_file, "--param param=modref-max-bases limit reached\n");
-   collapse ();
-   return NULL;
+ fprintf (dump_file, "--param param=modref-max-bases"
+  " limit reached; using 0\n");
+   base = 0;
+   base_node = search (base);
+   if (base_node)
+ return base_node;
   }
 
+if (changed)
+  *changed = true;
+
base_node = new (ggc_alloc <modref_base_node <T> > ())
modref_base_node <T> (base);
 vec_safe_push (bases, base_node);
@@ -582,8 +601,15 @@ struct GTY((user)) modref_tree
return true;
   }
 
-modref_base_node  *base_node = insert_base (base, &changed);
-if (!base_node || base_node->every_ref)
+modref_base_node  *base_node = insert_base (base, ref, &changed);
+base = base_node->base;
+/* If table got full we may end up with useless base.  */
+if (!base && !ref && !a.useful_p ())
+  {
+   collapse ();
+   return true;
+  }
+if (base_node->every_ref)
   return changed;
 gcc_checking_ass

Re: Merge stores/loads in modref summaries

2021-08-26 Thread Jan Hubicka
> 
> commit f075b8c5adcf9cb6336563c472c8d624c54184db
> Author: Jan Hubicka 
> Date:   Thu Aug 26 15:33:56 2021 +0200
> 
> Fix off-by-one error in try_merge_with
> 
> gcc/ChangeLog:
> 
> * ipa-modref-tree.h (modref_ref_node::verify): New member
> function.
> (modref_ref_node::insert): Use it.
> (modref_ref_node::try_merge_with): Fix off-by-one error.
> 
> caused libgo build failure on Linux/i686:
Sorry for that.  Jeff sent me an independent testcase and it seems to be
the same problem.  It turns out that after merging two access ranges one
needs to restart the walk, since after this earlier access ranges may
merge or be contained in the bigger range produced.  I missed this case
and apologize for it.

* ipa-modref-tree.h (modref_access_node::try_merge_with): Restart
search after merging.
diff --git a/gcc/ipa-modref-tree.h b/gcc/ipa-modref-tree.h
index 97934a91ada..fc55583e571 100644
--- a/gcc/ipa-modref-tree.h
+++ b/gcc/ipa-modref-tree.h
@@ -405,20 +411,35 @@ private:
   void
   try_merge_with (size_t index)
   {
-modref_access_node *a2;
 size_t i;
 
-FOR_EACH_VEC_SAFE_ELT (accesses, i, a2)
-  if (i != index
- && ((*accesses)[index].contains (*a2)
- || (*accesses)[index].merge (*a2, false)))
+for (i = 0; i < accesses->length ();)
+  if (i != index)
{
- accesses->unordered_remove (i);
- if (index == accesses->length ())
-   index = i;
+ bool found = false, restart = false;
+ modref_access_node *a = &(*accesses)[i];
+ modref_access_node *n = &(*accesses)[index];
+
+ if (n->contains (*a))
+   found = true;
+ if (!found && n->merge (*a, false))
+   found = restart = true;
+ if (found)
+   {
+ accesses->unordered_remove (i);
+ if (index == accesses->length ())
+   {
+ index = i;
+ i++;
+   }
+ if (restart)
+   i = 0;
+   }
  else
-   i--;
+   i++;
}
+  else
+   i++;
   }
 };
 


Re: fix latent bootstrap-debug issue (modref, tree-inline, tree jump-threading)

2021-08-28 Thread Jan Hubicka
> On Aug 22, 2021, Jan Hubicka  wrote:
> 
> > OK, thanks for looking into this issue!
> 
> Thanks, I've finally installed it in the trunk.
> 
> > It seems that analye_stmt indeed does not skip debug stmts.  It is very
> > odd we got so far without hitting build difference.
> 
> Indeed.  That got me thinking...  The comments state:
> 
>  If the statement cannot be analyzed (for any reason), the entire
>  function cannot be analyzed by modref.
> 
> but the implementation also tests, for every statement:
> 
> || ((!summary || !summary->useful_p (ecf_flags, false))
> && (!summary_lto
> || !summary_lto->useful_p (ecf_flags, false
> 
> which means that, if the first stmt of a block doesn't add useful
> information to the summary, we give up.  Was this really the intent?
It is just an early exit condition in case we already found enough side
effects to give up on any useful info about loads/stores.
Summaries are computed from an optimistic one (function has no side
effects) by becoming more pessimistic as statements are being visited.

So in most cases useful_p is true at the beginning of the loop (since we
see no loads/stores yet).  I suppose we were hitting the difference because
in const functions the summaries can be !useful_p from the beginning of the
loop.  I guess it is harmless to process the first statement in that
case (if we do not produce a debug bootstrap difference).
Honza
> 
> -- 
> Alexandre Oliva, happy hacker  https://FSFLA.org/blogs/lxo/
>   Free Software Activist   GNU Toolchain Engineer
> Disinformation flourishes because many people care deeply about injustice
> but very few check the facts.  Ask me about <https://stallmansupport.org>


Improve merging of modref_access_node

2021-08-28 Thread Jan Hubicka
Hi,
this should be the final bit of the fancy access merging.  We limit the number of
accesses to 16 and on overflow we currently just throw away the whole
table.  This patch instead looks for the closest pair of entries in the table and
merges them (losing some precision).  This does not happen very often during a
normal gcc bootstrap, but with -fno-strict-aliasing the overflows are much more
common and happen 272 times (for stuff like our autogenerated param handling).

I hope this may be useful for some real-world code that does a lot of array
manipulations.
Bootstrapped/regtested x86_64-linux, committed.

Since I produced aliasing stats for a cc1plus build with lto-bootstrap and
-fno-strict-aliasing, I am attaching them.

Alias oracle query stats:
  refs_may_alias_p: 73520790 disambiguations, 96898146 queries
  ref_maybe_used_by_call_p: 515551 disambiguations, 74487359 queries
  call_may_clobber_ref_p: 348504 disambiguations, 352504 queries
  nonoverlapping_component_refs_p: 0 disambiguations, 0 queries
  nonoverlapping_refs_since_match_p: 33251 disambiguations, 42895 must overlaps, 76905 queries
  aliasing_component_refs_p: 0 disambiguations, 0 queries
  TBAA oracle: 0 disambiguations 128 queries
   0 are in alias set 0
   0 queries asked about the same object
   128 queries asked about the same alias set
   0 access volatile
   0 are dependent in the DAG
   0 are aritificially in conflict with void *

Modref stats:
  modref use: 7260 disambiguations, 640326 queries
  modref clobber: 591264 disambiguations, 20039893 queries
  0 tbaa queries (0.00 per modref query)
  1145567 base compares (0.057164 per modref query)

PTA query stats:
  pt_solution_includes: 13729755 disambiguations, 35737015 queries
  pt_solutions_intersect: 1703678 disambiguations, 13200534 queries

It seems that modref is still quite effective on handling clobbers.

gcc/ChangeLog:

* ipa-modref-tree.h (modref_access_node::merge): Break out
logic combining offsets and logic merging ranges to ...
(modref_access_node::combined_offsets): ... here.
(modref_access_node::update2): ... and here.
(modref_access_node::closer_pair_p): New member function.
(modref_access_node::forced_merge): New member function.
(modref_ref_node::insert): Do merging when the table is full.

gcc/testsuite/ChangeLog:

* gcc.dg/tree-ssa/modref-9.c: New test.

diff --git a/gcc/ipa-modref-tree.h b/gcc/ipa-modref-tree.h
index a86e684a030..6a9ed5ce54b 100644
--- a/gcc/ipa-modref-tree.h
+++ b/gcc/ipa-modref-tree.h
@@ -196,8 +196,9 @@ struct GTY(()) modref_access_node
  was prolonged and punt when there are too many.  */
   bool merge (const modref_access_node &a, bool record_adjustments)
 {
-  poly_int64 aoffset_adj = 0, offset_adj = 0;
-  poly_int64 new_parm_offset = parm_offset;
+  poly_int64 offset1 = 0;
+  poly_int64 aoffset1 = 0;
+  poly_int64 new_parm_offset = 0;
 
   /* We assume that containment was tested earlier.  */
   gcc_checking_assert (!contains (a) && !a.contains (*this));
@@ -209,29 +210,13 @@ struct GTY(()) modref_access_node
{
  if (!a.parm_offset_known)
return false;
- if (known_le (a.parm_offset, parm_offset))
-   {
- offset_adj = (parm_offset - a.parm_offset)
-   << LOG2_BITS_PER_UNIT;
- aoffset_adj = 0;
- new_parm_offset = a.parm_offset;
-   }
- else if (known_le (parm_offset, a.parm_offset))
-   {
- aoffset_adj = (a.parm_offset - parm_offset)
-<< LOG2_BITS_PER_UNIT;
- offset_adj = 0;
-   }
- else
+ if (!combined_offsets (a, &new_parm_offset, &offset1, &aoffset1))
return false;
}
}
   /* See if we can merge ranges.  */
   if (range_info_useful_p ())
{
- poly_int64 offset1 = offset + offset_adj;
- poly_int64 aoffset1 = a.offset + aoffset_adj;
-
  /* In this case we have containment that should be
 handled earlier.  */
  gcc_checking_assert (a.range_info_useful_p ());
@@ -255,46 +240,211 @@ struct GTY(()) modref_access_node
return false;
  if (known_le (offset1, aoffset1))
{
- if (!known_size_p (max_size))
+ if (!known_size_p (max_size)
+ || known_ge (offset1 + max_size, aoffset1))
{
- update (new_parm_offset, offset1, size, max_size,
- record_adjustments);
- return true;
-   }
- else if (known_ge (offset1 + max_size, aoffset1))
-   {
- poly_int64 new_max_size = max_size;
- if (known_le (max_size, a.max_size + aoffset1 - offset1))
-

Re: [PATCH] options, lto: Optimize streaming of optimization nodes

2020-09-14 Thread Jan Hubicka
> On Mon, Sep 14, 2020 at 09:31:52AM +0200, Richard Biener wrote:
> > But does it make any noticable difference in the end?  Using
> 
> Yes.
> 
> > bp_pack_var_len_unsigned just causes us to [u]leb encode half-bytes
> > rather than full bytes.  Using hardcoded 8/16/32/64 makes it still
> > dependent on what 'int' is at maximum on the host.
> > 
> > That is, I'd indeed prefer bp_pack_var_len_unsigned over hard-coding
> > 8, 16, etc., but can you share a size comparison of the bitpack?
> > I guess with bp_pack_var_len_unsigned it might shrink in half
> > compared to the current code and streaming standard -O2?
> 
> So, I've tried
> --- gcc/tree-streamer-out.c.jj2020-07-28 15:39:10.079755251 +0200
> +++ gcc/tree-streamer-out.c   2020-09-14 10:31:29.106957258 +0200
> @@ -489,7 +489,11 @@ streamer_write_tree_bitfields (struct ou
>  pack_ts_translation_unit_decl_value_fields (ob, &bp, expr);
>  
>if (CODE_CONTAINS_STRUCT (code, TS_OPTIMIZATION))
> +{
> +long ts = ob->main_stream->total_size;
>  cl_optimization_stream_out (ob, &bp, TREE_OPTIMIZATION (expr));
> +fprintf (stderr, "total_size %ld\n", (long) (ob->main_stream->total_size - 
> ts));
> +}

You should be able to read the sizes from the streaming dump file as well.
>  
>if (CODE_CONTAINS_STRUCT (code, TS_CONSTRUCTOR))
>  bp_pack_var_len_unsigned (&bp, CONSTRUCTOR_NELTS (expr));
> hack without and with the following patch on a simple small testcase with
> -O2 -flto.
> Got 574 bytes without the opc-save-gen.awk change and 454 bytes with it,
> that is ~ 21% saving on the TREE_OPTIMIZATION streaming.
> 
> 2020-09-14  Jakub Jelinek  
> 
>   * optc-save-gen.awk: In cl_optimization_stream_out use
>   bp_pack_var_len_{int,unsigned} instead of bp_pack_value.  In
>   cl_optimization_stream_in use bp_unpack_var_len_{int,unsigned}
>   instead of bp_unpack_value.  Formatting fix.
> 
> --- gcc/optc-save-gen.awk.jj  2020-09-14 09:04:35.879854156 +0200
> +++ gcc/optc-save-gen.awk 2020-09-14 10:38:47.722424942 +0200
> @@ -1257,8 +1257,10 @@ for (i = 0; i < n_opt_val; i++) {
>   otype = var_opt_val_type[i];
>   if (otype ~ "^const char \\**$")
>   print "  bp_pack_string (ob, bp, ptr->" name", true);";
> + else if (otype ~ "^unsigned")
> + print "  bp_pack_var_len_unsigned (bp, ptr->" name");";
>   else
> - print "  bp_pack_value (bp, ptr->" name", 64);";
> + print "  bp_pack_var_len_int (bp, ptr->" name");";
>  }
>  print "  for (size_t i = 0; i < sizeof (ptr->explicit_mask) / sizeof 
> (ptr->explicit_mask[0]); i++)";
>  print "bp_pack_value (bp, ptr->explicit_mask[i], 64);";
> @@ -1274,14 +1276,15 @@ print "{";
>  for (i = 0; i < n_opt_val; i++) {
>   name = var_opt_val[i]
>   otype = var_opt_val_type[i];
> - if (otype ~ "^const char \\**$")
> - {
> -   print "  ptr->" name" = bp_unpack_string (data_in, bp);";
> -   print "  if (ptr->" name")";
> -   print "ptr->" name" = xstrdup (ptr->" name");";
> + if (otype ~ "^const char \\**$") {
> + print "  ptr->" name" = bp_unpack_string (data_in, bp);";
> + print "  if (ptr->" name")";
> + print "ptr->" name" = xstrdup (ptr->" name");";
>   }
> + else if (otype ~ "^unsigned")
> + print "  ptr->" name" = (" var_opt_val_type[i] ") 
> bp_unpack_var_len_unsigned (bp);";
>   else
> -   print "  ptr->" name" = (" var_opt_val_type[i] ") bp_unpack_value 
> (bp, 64);";
> + print "  ptr->" name" = (" var_opt_val_type[i] ") 
> bp_unpack_var_len_int (bp);";

Not making a difference between signed/unsigned was my implementation
laziness at the time the code was added.  So this looks like a nice cleanup.

Especially for the new param machinery, most of the streamed values are
probably going to be the default values.  Perhaps somehow we could
stream them more effectively.

Overall we should not get much more than one optimize/target node per unit,
so the size should show up only when you stream a lot of very small .o
files.

Honza
>  }
>  print "  for (size_t i = 0; i < sizeof (ptr->explicit_mask) / sizeof 
> (ptr->explicit_mask[0]); i++)";
>  print "ptr->explicit_mask[i] = bp_unpack_value (bp, 64);";
> 
> 
>   Jakub
> 


Fix availability of functions in other partitions

2020-09-17 Thread Jan Hubicka
Hi,
this patch fixes a rather old bug that prevents ipa-reference from working
across partition boundaries.  What happens is that the availability code
thinks that the function is not available and thus we ignore the summary we
stream.

Bootstrapped/regtested x86_64-linux. Committed.

I have a testcase that I will send with additional changes.

* cgraph.c (cgraph_node::get_availability): Fix availability of
functions in other partitions.
* varpool.c (varpool_node::get_availability): Likewise.
diff --git a/gcc/cgraph.c b/gcc/cgraph.c
index c0b45795059..b43adaac7c0 100644
--- a/gcc/cgraph.c
+++ b/gcc/cgraph.c
@@ -2360,7 +2360,7 @@ cgraph_node::get_availability (symtab_node *ref)
ref = cref->inlined_to;
 }
   enum availability avail;
-  if (!analyzed)
+  if (!analyzed && !in_other_partition)
 avail = AVAIL_NOT_AVAILABLE;
   else if (local)
 avail = AVAIL_LOCAL;
diff --git a/gcc/varpool.c b/gcc/varpool.c
index 458cdf1bf37..31ea2132331 100644
--- a/gcc/varpool.c
+++ b/gcc/varpool.c
@@ -479,7 +479,7 @@ varpool_node::add (tree decl)
 enum availability
 varpool_node::get_availability (symtab_node *ref)
 {
-  if (!definition)
+  if (!definition && !in_other_partition)
 return AVAIL_NOT_AVAILABLE;
   if (!TREE_PUBLIC (decl))
 return AVAIL_AVAILABLE;


Re: New modref/ipa_modref optimization passes

2020-09-19 Thread Jan Hubicka
 several areas for improvements but I think it is not in shape
for mainline and rest can be dealt with incrementally.

Bootstrapped/regtested x86_64-linux including ada, d and go.  I plan to commit
it after a bit more testing tomorrow.

2020-09-19  David Cepelik  
Jan Hubicka  

* Makefile.in: Add ipa-modref.c and ipa-modref-tree.c.
* alias.c: (reference_alias_ptr_type_1): Export.
* alias.h (reference_alias_ptr_type_1): Declare.
* common.opt (fipa-modref): New.
* gengtype.c (open_base_files): Add ipa-modref-tree.h and ipa-modref.h
* ipa-modref-tree.c: New file.
* ipa-modref-tree.h: New file.
* ipa-modref.c: New file.
* ipa-modref.h: New file.
* lto-section-in.c (lto_section_name): Add ipa_modref.
* lto-streamer.h (enum lto_section_type): Add LTO_section_ipa_modref.
* opts.c (default_options_table): Enable ipa-modref at -O1+.
* params.opt (-param=modref-max-bases, -param=modref-max-refs,
-param=modref-max-tests): New params.
* passes.def: Schedule pass_modref and pass_ipa_modref.
* timevar.def (TV_IPA_MODREF): New timevar.
(TV_TREE_MODREF): New timevar.
* tree-pass.h (make_pass_modref): Declare.
(make_pass_ipa_modref): Declare.
* tree-ssa-alias.c (dump_alias_stats): Include ipa-modref-tree.h
and ipa-modref.h
(alias_stats): Add modref_use_may_alias, modref_use_no_alias,
modref_clobber_may_alias, modref_clobber_no_alias, modref_tests.
(dump_alias_stats): Dump new stats.
(nonoverlapping_array_refs_p): Fix formating.
(modref_may_conflict): New function.
(ref_maybe_used_by_call_p_1): Use it.
(call_may_clobber_ref_p_1): Use it.
(call_may_clobber_ref_p): Update.
(stmt_may_clobber_ref_p_1): Update.
* tree-ssa-alias.h (call_may_clobber_ref_p_1): Update.

diff --git a/gcc/Makefile.in b/gcc/Makefile.in
index 79e854aa938..c710bad27b1 100644
--- a/gcc/Makefile.in
+++ b/gcc/Makefile.in
@@ -1419,6 +1419,8 @@ OBJS = \
ipa-visibility.o \
ipa-inline-analysis.o \
ipa-inline-transform.o \
+   ipa-modref.o \
+   ipa-modref-tree.o \
ipa-predicate.o \
ipa-profile.o \
ipa-prop.o \
@@ -2587,6 +2589,8 @@ GTFILES = $(CPPLIB_H) $(srcdir)/input.h 
$(srcdir)/coretypes.h \
   $(srcdir)/alias.c $(srcdir)/bitmap.c $(srcdir)/cselib.c $(srcdir)/cgraph.c \
   $(srcdir)/ipa-prop.c $(srcdir)/ipa-cp.c $(srcdir)/ipa-utils.h \
   $(srcdir)/ipa-param-manipulation.h $(srcdir)/ipa-sra.c $(srcdir)/dbxout.c \
+  $(srcdir)/ipa-modref.h $(srcdir)/ipa-modref.c \
+  $(srcdir)/ipa-modref-tree.h \
   $(srcdir)/signop.h \
   $(srcdir)/dwarf2out.h \
   $(srcdir)/dwarf2asm.c \
diff --git a/gcc/alias.c b/gcc/alias.c
index df85f07ee9a..1cb702be2ce 100644
--- a/gcc/alias.c
+++ b/gcc/alias.c
@@ -737,7 +737,7 @@ get_deref_alias_set (tree t)
adjusted to point to the outermost component reference that
can be used for assigning an alias set.  */
  
-static tree
+tree
 reference_alias_ptr_type_1 (tree *t)
 {
   tree inner;
diff --git a/gcc/alias.h b/gcc/alias.h
index 4453d9723ce..807af957f02 100644
--- a/gcc/alias.h
+++ b/gcc/alias.h
@@ -36,6 +36,7 @@ extern int objects_must_conflict_p (tree, tree);
 extern int nonoverlapping_memrefs_p (const_rtx, const_rtx, bool);
 extern void dump_alias_stats_in_alias_c (FILE *s);
 tree reference_alias_ptr_type (tree);
+tree reference_alias_ptr_type_1 (tree *);
 bool alias_ptr_types_compatible_p (tree, tree);
 int compare_base_decls (tree, tree);
 bool refs_same_for_tbaa_p (tree, tree);
diff --git a/gcc/common.opt b/gcc/common.opt
index dd68c61ae1d..b833b98dfb8 100644
--- a/gcc/common.opt
+++ b/gcc/common.opt
@@ -1825,6 +1825,10 @@ fipa-bit-cp
 Common Report Var(flag_ipa_bit_cp) Optimization
 Perform interprocedural bitwise constant propagation.
 
+fipa-modref
+Common Report Var(flag_ipa_modref) Optimization
+Perform interprocedural modref analysis
+
 fipa-profile
 Common Report Var(flag_ipa_profile) Init(0) Optimization
 Perform interprocedural profile propagation.
diff --git a/gcc/gengtype.c b/gcc/gengtype.c
index 981577481af..a59a8823f82 100644
--- a/gcc/gengtype.c
+++ b/gcc/gengtype.c
@@ -1726,7 +1726,7 @@ open_base_files (void)
   "except.h", "output.h",  "cfgloop.h", "target.h", "lto-streamer.h",
   "target-globals.h", "ipa-ref.h", "cgraph.h", "symbol-summary.h",
   "ipa-prop.h", "ipa-fnsummary.h", "dwarf2out.h", "omp-general.h",
-  "omp-offload.h", NULL
+  "omp-offload.h", "ipa-modref-tree.h", "ipa-modref.h", NULL
 };
 const char *const *ifp;
 outf_p gtype_desc_c;
diff --git a/gcc/ipa-modref.c b/gcc/ipa-modref.c
new file mode 100644
index 000..f982ce94a75
--- /dev/null
+++ b/gcc/ipa-modref.c
@@ -0,0 +1,

Re: New modref/ipa_modref optimization passes

2020-09-19 Thread Jan Hubicka
Hi,
this is the patch I am using to fix the assumed_alias_type.f90 failure by
simply arranging alias set 0 for the problematic array descriptor.

I am not sure this is the best option, but I suppose it is better than
setting all array descriptors to have the same canonical type (as done by
LTO)?

Honza

* trans-types.c (gfc_get_array_type_bounds): Set alias set to 0 for
arrays of unknown element type.
diff --git a/gcc/fortran/trans-types.c b/gcc/fortran/trans-types.c
index 26fdb2803a7..bef3d270c06 100644
--- a/gcc/fortran/trans-types.c
+++ b/gcc/fortran/trans-types.c
@@ -1903,6 +1903,12 @@ gfc_get_array_type_bounds (tree etype, int dimen, int 
codimen, tree * lbound,
   base_type = gfc_get_array_descriptor_base (dimen, codimen, false);
   TYPE_CANONICAL (fat_type) = base_type;
   TYPE_STUB_DECL (fat_type) = TYPE_STUB_DECL (base_type);
+  /* Arrays of unknown type must alias with all array descriptors.  */
+  if (etype == ptr_type_node)
+{
+  TYPE_ALIAS_SET (base_type) = 0;
+  TYPE_ALIAS_SET (fat_type) = 0;
+}
 
   tmp = TYPE_NAME (etype);
   if (tmp && TREE_CODE (tmp) == TYPE_DECL)


Re: New modref/ipa_modref optimization passes

2020-09-20 Thread Jan Hubicka
> On Sun, 2020-09-20 at 00:32 +0200, Jan Hubicka wrote:
> > Hi,
> > this is cleaned up version of the patch.  I removed unfinished bits,
> > fixed
> > propagation, cleaned it up and fixed fallout.
> 
> [...]
> 
> > While there are several areas for improvements but I think it is not
> > in shape
> > for mainline and rest can be dealt with incrementally.
> 
> FWIW I think you typoed:
>   "not in shape for mainline"
> when you meant:
>   "now in shape for mainline"
> given...

Yep, sorry for that :)
> 
> Should new C++ source files have a .cc suffix, rather than .c?
> 
> [...]
> 
> > +  $(srcdir)/ipa-modref.h $(srcdir)/ipa-modref.c \
> 
> ...which would affect this^

I was wondering about that and decided to stay with .c since it is what
other ipa passes use.  I can rename the files. We have some sources with
.c extension and others with .cc while they are now all C++. Is there
some plan to clean it up?
> > +void
> > +modref_tree_c_tests ()
> > +{
> > +  test_insert_search_collapse ();
> > +  test_merge ();
> > +}
> > +
> > +#endif
> 
> It looks like these tests aren't being called anywhere; should they be?
> 
> The convention is to put such tests inside namespace selftest and to
> call the "run all the tests in this file" function from
> selftest::run_tests.
> 
> If you change this to be a .cc file, then the _c_ in the function name
> should become a _cc_.
> 
> [...]
> 
> > diff --git a/gcc/ipa-modref-tree.h b/gcc/ipa-modref-tree.h
> > new file mode 100644
> > index 000..3bdd3058aa1
> > --- /dev/null
> > +++ b/gcc/ipa-modref-tree.h
> 
> [...]
> 
> > +void modref_c_tests ();
> 
> Likewise here; the convention is to declare these within
> selftest.h
> 
> [...]

Thanks, indeed it seems that the self tests code was dropped at some
point, which I did not notice.  I will fix that up.
> 
> Hope this is constructive
It is :)

Honza
> Dave
> 


Re: New modref/ipa_modref optimization passes

2020-09-21 Thread Jan Hubicka
> On Sun, 20 Sep 2020, Jan Hubicka wrote:
> 
> > Hi,
> > this is patch I am using to fix the assumed_alias_type.f90 failure by
> > simply arranging alias set 0 for the problematic array descriptor.
> 
> There's no such named testcase on trunk.  Can you be more specific
> as to the problem at hand?  It looks like gfortran.dg/assumed_type_9.f90
> execute FAILs at the moment.
> 
> In particular how's this not an issue w/o IPA modref?

> 
> For TYPE(*) I think the object itself cannot be accessed but for
> arrays the meta-info in the array descriptor can.  Now my question
> would be why the Fortran FE at the call site does not build an
> appropriately typed array descriptor?
> 
> CCing the fortran list.

The problem is:

alsize (struct array15_unknown & restrict a)
{
...
  _2 = *a_13(D).dtype.rank;
  _3 = (integer(kind=8)) _2;
...
}
}
and in main:

  struct array02_integer(kind=4) am;
   :
  MEM  [(struct dtype_type *)&am + 24B] = {};
  am.dtype.elem_len = 4;
  am.dtype.rank = 2;
  am.dtype.type = 1;
...
  _52 = alsize (&am);

Here array15_unknown and array02_integer are different structures with
different canonical types and thus we end up disambiguating the accesses
via base alias sets.

My understanding is that this _unknown array descriptor is supposed to
be universal and work with all kinds of arrays.

Without modref this works because alsize is not inlined (we think code
size would grow). Forcing the inliner to inline still leads to working code
because we first constant-propagate the pointer and then see accesses
from the same base DECL, thus bypassing the TBAA checks.  Disabling the
constant propagation leads to wrong code as well.

Honza


Re: New modref/ipa_modref optimization passes

2020-09-21 Thread Jan Hubicka
> > 
> > The problem is:
> > 
> > alsize (struct array15_unknown & restrict a)
> > {
> > ...
> >   _2 = *a_13(D).dtype.rank;
> >   _3 = (integer(kind=8)) _2;
> > ...
> > }
> > }
> > and in main:
> > 
> >   struct array02_integer(kind=4) am;
> >:
> >   MEM  [(struct dtype_type *)&am + 24B] = {};
> >   am.dtype.elem_len = 4;
> >   am.dtype.rank = 2;
> >   am.dtype.type = 1;
> > ...
> >   _52 = alsize (&am);
> > 
> > Here array15_unknown and array02_integer are different structures with
> > different canonical types and thus we end up disambiguating the accesses
> > via base alias sets.
> > 
> > My understanding is that this _unknown array descriptor is supposed to
> > be universal and work with all kinds of arrays.
> 
> But the FE builds a new descriptor for each individual call and thus
> should build a universal descriptor for a call to an universal
> descriptor argument.

I see, so you would expect the call to alsize to initialize things in
the array15_unknown type?  That would work too.

Honza


Fix sse2-andnpd-1.c, avx-vandnps-1.c and sse-andnps-1.c testscases

2020-09-21 Thread Jan Hubicka
Hi,
these testcases now fail because they contain invalid type punning
that happens via the const VALUE_TYPE *v pointer. Since the check function
is noinline, modref is needed to trigger the wrong code.
I think it is easiest to fix them by no-strict-aliasing.

Regtested x86_64-linux, OK?

* gcc.target/i386/m128-check.h: Add no-strict aliasing to
CHECK_EXP macro.

diff --git a/gcc/testsuite/gcc.target/i386/m128-check.h 
b/gcc/testsuite/gcc.target/i386/m128-check.h
index 48b23328539..6f414b07be7 100644
--- a/gcc/testsuite/gcc.target/i386/m128-check.h
+++ b/gcc/testsuite/gcc.target/i386/m128-check.h
@@ -78,6 +78,7 @@ typedef union
 
 #define CHECK_EXP(UINON_TYPE, VALUE_TYPE, FMT) \
 static int \
+__attribute__((optimize ("no-strict-aliasing")))   \
 __attribute__((noinline, unused))  \
 check_##UINON_TYPE (UINON_TYPE u, const VALUE_TYPE *v) \
 {  \


Re: New modref/ipa_modref optimization passes

2020-09-21 Thread Jan Hubicka
> On Sun, 2020-09-20 at 19:30 +0200, Jan Hubicka wrote:
> > > On Sun, 2020-09-20 at 00:32 +0200, Jan Hubicka wrote:
> > > > Hi,
> > > > this is cleaned up version of the patch.  I removed unfinished
> > > > bits,
> > > > fixed
> > > > propagation, cleaned it up and fixed fallout.
> > > 
> > > [...]
> > > 
> > > > While there are several areas for improvements but I think it is
> > > > not
> > > > in shape
> > > > for mainline and rest can be dealt with incrementally.
> > > 
> > > FWIW I think you typoed:
> > >   "not in shape for mainline"
> > > when you meant:
> > >   "now in shape for mainline"
> > > given...
> > 
> > Yep, sorry for that :)
> 
> I've started seeing crashes in the jit testsuite even with trivial
> inputs, which are happening at pass_modref::~pass_modref at:
> 
> 772   ggc_delete (summaries);
> 
> on the first in-process iteration of the code, with:
> 
> (gdb) p summaries
> $3 = (fast_function_summary *) 0x0
> 
> I'm still investigating (but may have to call halt for the night), but
> this could be an underlying issue with the new passes; the jit
> testsuite runs with the equivalent of:
> 
> --param=ggc-min-expand=0 --param=ggc-min-heapsize=0
> 
> throughout to shake out GC issues (to do a full collection at each GC
> opportunity).
> 
> Was this code tested with the jit?  Do you see issues in cc1 if you set
> those params?  Anyone else seeing "random" crashes?

I suppose this happens when the pass gets constructed but no summary is
computed.  Does the NULL pointer guard here help?

Honza
> 
> Thanks
> Dave
> 
> 


Re: New modref/ipa_modref optimization passes

2020-09-22 Thread Jan Hubicka
> > (gdb) p summaries
> > $3 = (fast_function_summary *) 0x0
> > 
> > I'm still investigating (but may have to call halt for the night), but
> > this could be an underlying issue with the new passes; the jit
> > testsuite runs with the equivalent of:
> > 
> > --param=ggc-min-expand=0 --param=ggc-min-heapsize=0
> > 
> > throughout to shake out GC issues (to do a full collection at each GC
> > opportunity).
> > 
> > Was this code tested with the jit?  Do you see issues in cc1 if you set
> > those params?  Anyone else seeing "random" crashes?
> 
> I suppose this happens when the pass gets constructed but no summary is
> computed.  Does the NULL pointer guard here help?

Hi,
I am currently on a train and cannot test the patch easily, but this
should help.  If you run the pass on empty input then the destruction
happens with a NULL summaries pointer.

My apologies for that.
Honza

diff --git a/gcc/ipa-modref.c b/gcc/ipa-modref.c
index af0b710333e..cd92b5a81d3 100644
--- a/gcc/ipa-modref.c
+++ b/gcc/ipa-modref.c
@@ -769,7 +885,8 @@ class pass_modref : public gimple_opt_pass
 
 ~pass_modref ()
   {
-   ggc_delete (summaries);
+   if (summaries)
+ ggc_delete (summaries);
summaries = NULL;
   }
 
> 
> Honza
> > 
> > Thanks
> > Dave
> > 
> > 


Re: [committed] ipa: Fix up ipa modref option help texts

2020-09-22 Thread Jan Hubicka
> On Tue, Sep 22, 2020 at 10:13:46AM +0200, Jakub Jelinek via Gcc-patches wrote:
> > --- gcc/params.opt.jj   2020-09-21 11:15:53.816516949 +0200
> > +++ gcc/params.opt  2020-09-22 09:59:37.121115589 +0200
> > @@ -882,7 +882,7 @@ Maximum number of refs stored in each mo
> >  
> >  -param=modref-max-tests=
> >  Common Joined UInteger Var(param_modref_max_tests) Init(64)
> > -Maximum number of tests perofmed by modref query
> > +Maximum number of tests perofmed by modref query.
> 
> And seeing the above typo led me to do some spell checking around.
> Here is the result, committed as obvious to trunk:
> 
> 2020-09-22  Jakub Jelinek  
> 
> gcc/
>   * params.opt (--param=modref-max-tests=): Fix typo in help text:
>   perofmed -> performed.
>   * common.opt: Fix typo: incrmeental -> incremental.
>   * ipa-modref.c: Fix typos: recroding -> recording, becaue -> because,
>   analsis -> analysis.
>   (class modref_summaries): Fix typo: betweehn -> between.
>   (analyze_call): Fix typo: calle -> callee.
>   (read_modref_records): Fix typo: expcted -> expected.
>   (pass_ipa_modref::execute): Fix typo: calle -> callee.
> gcc/c-family/
>   * c.opt (Wbuiltin-declaration-mismatch): Fix typo in variable name:
>   warn_builtin_declaraion_mismatch -> warn_builtin_declaration_mismatch.

Thanks a lot and sorry for these.

Honza


Re: New modref/ipa_modref optimization passes

2020-09-22 Thread Jan Hubicka
> 
> Thanks; with that it survives the first in-process iteration, but then
> dies inside the 3rd in-process iteration, on a different finalizer. 
> I'm beginning to suspect a pre-existing bad interaction between
> finalizers and jit which perhaps this patch has exposed.
> 
> I'll continue to investigate it.

I will look into it tonight (I have a meeting now about remote teaching).
I think the pass should not have a destructor; there should be a fini
function as other passes have, so I will clean this up.  Those dtors make
no sense and there also seem to be some memory leaks.

Honza
> 
> Dave
> 


Re: New modref/ipa_modref optimization passes

2020-09-22 Thread Jan Hubicka
David,
with jit I get the following:
/usr/local/x86_64-pc-linux-gnu/bin/ld: final link failed: nonrepresentable 
section on output
collect2: error: ld returned 1 exit status
make[3]: *** [../../gcc/jit/Make-lang.in:121: libgccjit.so.0.0.1] Error

Is there a fix/workaround?
The patch I am trying to test/debug is attached; it fixes the selftest
issue and the destructor.

Honza

diff --git a/gcc/ipa-modref-tree.c b/gcc/ipa-modref-tree.c
index e37dee67fa3..a84508a2268 100644
--- a/gcc/ipa-modref-tree.c
+++ b/gcc/ipa-modref-tree.c
@@ -28,6 +28,7 @@ along with GCC; see the file COPYING3.  If not see
 
 #if CHECKING_P
 
+namespace selftest {
 
 static void
 test_insert_search_collapse ()
@@ -156,12 +157,14 @@ test_merge ()
 
 
 void
-modref_tree_c_tests ()
+ipa_modref_tree_c_tests ()
 {
   test_insert_search_collapse ();
   test_merge ();
 }
 
+} // namespace selftest
+
 #endif
 
 void
diff --git a/gcc/ipa-modref.c b/gcc/ipa-modref.c
index af0b710333e..ac7579a9e75 100644
--- a/gcc/ipa-modref.c
+++ b/gcc/ipa-modref.c
@@ -767,12 +770,6 @@ class pass_modref : public gimple_opt_pass
 pass_modref (gcc::context *ctxt)
: gimple_opt_pass (pass_data_modref, ctxt) {}
 
-~pass_modref ()
-  {
-   ggc_delete (summaries);
-   summaries = NULL;
-  }
-
 /* opt_pass methods: */
 opt_pass *clone ()
 {
@@ -1373,4 +1370,14 @@ unsigned int pass_ipa_modref::execute (function *)
   return 0;
 }
 
+/* Summaries must stay alive until end of compilation.  */
+
+void
+ipa_modref_c_finalize ()
+{
+  if (summaries)
+ggc_delete (summaries);
+  summaries = NULL;
+}
+
 #include "gt-ipa-modref.h"
diff --git a/gcc/ipa-modref.h b/gcc/ipa-modref.h
index 6f979200cc2..6cccdfe7af3 100644
--- a/gcc/ipa-modref.h
+++ b/gcc/ipa-modref.h
@@ -44,5 +44,6 @@ struct GTY(()) modref_summary
 };
 
 modref_summary *get_modref_function_summary (cgraph_node *func);
+void ipa_modref_c_finalize ();
 
 #endif
diff --git a/gcc/selftest-run-tests.c b/gcc/selftest-run-tests.c
index f0a81d43fd6..7a89b2df5bd 100644
--- a/gcc/selftest-run-tests.c
+++ b/gcc/selftest-run-tests.c
@@ -90,6 +90,7 @@ selftest::run_tests ()
   read_rtl_function_c_tests ();
   digraph_cc_tests ();
   tristate_cc_tests ();
+  ipa_modref_tree_c_tests ();
 
   /* Higher-level tests, or for components that other selftests don't
  rely on.  */
diff --git a/gcc/selftest.h b/gcc/selftest.h
index 5cffa13aedd..6c6c7f28675 100644
--- a/gcc/selftest.h
+++ b/gcc/selftest.h
@@ -268,6 +268,7 @@ extern void vec_perm_indices_c_tests ();
 extern void wide_int_cc_tests ();
 extern void opt_proposer_c_tests ();
 extern void dbgcnt_c_tests ();
+extern void ipa_modref_tree_c_tests ();
 
 extern int num_passes;
 
diff --git a/gcc/toplev.c b/gcc/toplev.c
index cdd4b5b4f92..a4cb8bb262e 100644
--- a/gcc/toplev.c
+++ b/gcc/toplev.c
@@ -84,6 +84,8 @@ along with GCC; see the file COPYING3.  If not see
 #include "dump-context.h"
 #include "print-tree.h"
 #include "optinfo-emit-json.h"
+#include "ipa-modref-tree.h"
+#include "ipa-modref.h"
 
 #if defined(DBX_DEBUGGING_INFO) || defined(XCOFF_DEBUGGING_INFO)
 #include "dbxout.h"
@@ -2497,6 +2499,7 @@ toplev::finalize (void)
   /* Needs to be called before cgraph_c_finalize since it uses symtab.  */
   ipa_reference_c_finalize ();
   ipa_fnsummary_c_finalize ();
+  ipa_modref_c_finalize ();
 
   cgraph_c_finalize ();
   cgraphunit_c_finalize ();


Re: New modref/ipa_modref optimization passes

2020-09-22 Thread Jan Hubicka
> On Sun, 2020-09-20 at 19:30 +0200, Jan Hubicka wrote:
> > > 
> 
> [...]
> 
> > > Should new C++ source files have a .cc suffix, rather than .c?
> > > 
> > > [...]
> > > 
> > > > +  $(srcdir)/ipa-modref.h $(srcdir)/ipa-modref.c \
> > > 
> > > ...which would affect this^
> > 
> > I was wondering about that and decided to stay with .c since it is
> > what
> > other ipa passes use.  I can rename the files. 
> 
> Given that they're in the source tree now, maybe better to wait until
> some mass renaming in the future?

At the same time, it is only me who has patches against it, and I have no
problem updating the name.
> 
> > We have some sources with
> > .c extension and others with .cc while they are now all C++. Is there
> > some plan to clean it up?
> 
> I think we've been avoiding it, partly out of a concern of making
> backports harder, and also because someone has to do the work.
> 
> That said, it's yet another unfinished transition, and is technical
> debt for the project.  It's confusing to newcomers.
> 
> It's been bugging me for a while, so I might take a look at doing it in
> this cycle.

Agreed. It would be nice to do the mass rename and at the same time make
a sane directory structure so newcomers can locate optimization passes
and other components.

David Cepelik was my student so he may have some feedback about what he
found hard.  I think the main stoppers were the garbage collector (since he
decided to do the template for the modref tree) and the flat includes that
break if you do not do them in the right order and resolve all
dependencies.

Honza
> 
> Dave
> 


Re: New modref/ipa_modref optimization passes

2020-09-22 Thread Jan Hubicka
Hello,
this patch fixes the selftest and destructor issues noticed by David
(Malcolm).  According to David, jit still crashes at a different place, but
I am hitting a different build failure in libjit that I will need to
analyze tomorrow.

Bootstrapped/regtested x86_64-linux, comitted.

* ipa-modref-tree.c: Add namespace selftest.
(modref_tree_c_tests): Rename to ...
(ipa_modref_tree_c_tests): ... this.
* ipa-modref.c (pass_modref): Remove destructor.
(ipa_modref_c_finalize): New function.
* ipa-modref.h (ipa_modref_c_finalize): Declare.
* selftest-run-tests.c (selftest::run_tests): Call
ipa_modref_c_finalize.
* selftest.h (ipa_modref_tree_c_tests): Declare.
* toplev.c: Include ipa-modref-tree.h and ipa-modref.h
(toplev::finalize): Call ipa_modref_c_finalize.

diff --git a/gcc/ipa-modref-tree.c b/gcc/ipa-modref-tree.c
index e37dee67fa3..a84508a2268 100644
--- a/gcc/ipa-modref-tree.c
+++ b/gcc/ipa-modref-tree.c
@@ -28,6 +28,7 @@ along with GCC; see the file COPYING3.  If not see
 
 #if CHECKING_P
 
+namespace selftest {
 
 static void
 test_insert_search_collapse ()
@@ -156,12 +157,14 @@ test_merge ()
 
 
 void
-modref_tree_c_tests ()
+ipa_modref_tree_c_tests ()
 {
   test_insert_search_collapse ();
   test_merge ();
 }
 
+} // namespace selftest
+
 #endif
 
 void
diff --git a/gcc/ipa-modref.c b/gcc/ipa-modref.c
index af0b710333e..ac7579a9e75 100644
--- a/gcc/ipa-modref.c
+++ b/gcc/ipa-modref.c
@@ -767,12 +770,6 @@ class pass_modref : public gimple_opt_pass
 pass_modref (gcc::context *ctxt)
: gimple_opt_pass (pass_data_modref, ctxt) {}
 
-~pass_modref ()
-  {
-   ggc_delete (summaries);
-   summaries = NULL;
-  }
-
 /* opt_pass methods: */
 opt_pass *clone ()
 {
@@ -1373,4 +1370,14 @@ unsigned int pass_ipa_modref::execute (function *)
   return 0;
 }
 
+/* Summaries must stay alive until end of compilation.  */
+
+void
+ipa_modref_c_finalize ()
+{
+  if (summaries)
+ggc_delete (summaries);
+  summaries = NULL;
+}
+
 #include "gt-ipa-modref.h"
diff --git a/gcc/ipa-modref.h b/gcc/ipa-modref.h
index 6f979200cc2..6cccdfe7af3 100644
--- a/gcc/ipa-modref.h
+++ b/gcc/ipa-modref.h
@@ -44,5 +44,6 @@ struct GTY(()) modref_summary
 };
 
 modref_summary *get_modref_function_summary (cgraph_node *func);
+void ipa_modref_c_finalize ();
 
 #endif
diff --git a/gcc/selftest-run-tests.c b/gcc/selftest-run-tests.c
index f0a81d43fd6..7a89b2df5bd 100644
--- a/gcc/selftest-run-tests.c
+++ b/gcc/selftest-run-tests.c
@@ -90,6 +90,7 @@ selftest::run_tests ()
   read_rtl_function_c_tests ();
   digraph_cc_tests ();
   tristate_cc_tests ();
+  ipa_modref_tree_c_tests ();
 
   /* Higher-level tests, or for components that other selftests don't
  rely on.  */
diff --git a/gcc/selftest.h b/gcc/selftest.h
index 5cffa13aedd..6c6c7f28675 100644
--- a/gcc/selftest.h
+++ b/gcc/selftest.h
@@ -268,6 +268,7 @@ extern void vec_perm_indices_c_tests ();
 extern void wide_int_cc_tests ();
 extern void opt_proposer_c_tests ();
 extern void dbgcnt_c_tests ();
+extern void ipa_modref_tree_c_tests ();
 
 extern int num_passes;
 
diff --git a/gcc/toplev.c b/gcc/toplev.c
index cdd4b5b4f92..a4cb8bb262e 100644
--- a/gcc/toplev.c
+++ b/gcc/toplev.c
@@ -84,6 +84,8 @@ along with GCC; see the file COPYING3.  If not see
 #include "dump-context.h"
 #include "print-tree.h"
 #include "optinfo-emit-json.h"
+#include "ipa-modref-tree.h"
+#include "ipa-modref.h"
 
 #if defined(DBX_DEBUGGING_INFO) || defined(XCOFF_DEBUGGING_INFO)
 #include "dbxout.h"
@@ -2497,6 +2499,7 @@ toplev::finalize (void)
   /* Needs to be called before cgraph_c_finalize since it uses symtab.  */
   ipa_reference_c_finalize ();
   ipa_fnsummary_c_finalize ();
+  ipa_modref_c_finalize ();
 
   cgraph_c_finalize ();
   cgraphunit_c_finalize ();


Re: New modref/ipa_modref optimization passes

2020-09-22 Thread Jan Hubicka
> On Tue, 2020-09-22 at 20:39 +0200, Jan Hubicka wrote:
> > David,
> > with jit I get the following:
> > /usr/local/x86_64-pc-linux-gnu/bin/ld: final link failed:
> > nonrepresentable section on output
> > collect2: error: ld returned 1 exit status
> > make[3]: *** [../../gcc/jit/Make-lang.in:121: libgccjit.so.0.0.1]
> > Error
> > 
> > Is there a fix/workaround?
> 
> I don't recognize that specific error, but googling suggests it may
> relate to position-independent code.
> 
> Are you configuring with --enable-host-shared ?  This is needed when
> enabling "jit" in --enable-languages (but slows down the compiler by a
> few percent, which is why "jit" isn't in "all").

Yes --enable-languages=all,jit --enable-host-shared.
I suppose my binutils may be showing their age; I will check that tomorrow.
It looks like a weird error.
> 
> 
> > Patch I am trying to test/debug is attached, it fixes the selftest
> > issue
> > and the destructor.
> 
> Thanks.
> 
> Sadly it doesn't fix the jit crashes, which are now in bugzilla (as PR
> jit/97169).
> 
> Without the patch, the jit testcases crash at the end of the 1st in-
> process iteration, in the dtor for the the new pass.
> 
> With the patch the jit testcases crash inside the 3rd in-process
> iteration, invoking a modref_summaries finalizer at whichever GC-
> collection point happens first, I think, where the modref_summaries *
> seems to be pointing at corrupt data:
> 
> (gdb) p *(modref_summaries *)p
> $2 = {> =
> {> = {
>   _vptr.function_summary_base = 0x20001,
> m_symtab_insertion_hook = 0x1, m_symtab_removal_hook = 0x10004, 
>   m_symtab_duplication_hook = 0x0, m_symtab = 0x644210,
> m_insertion_enabled = 112, m_allocator = {m_allocator = {
>   m_name = 0x0, m_id = 0, m_elts_per_block = 1,
> m_returned_free_list = 0x7afafaf01, 
>   m_virgin_free_list = 0xafafafafafaf0001 <error: Cannot access memory at address 0xafafafafafaf0001>, 
>   m_virgin_elts_remaining = 0, m_elts_allocated =
> 140737080343888, m_elts_free = 0, m_blocks_allocated = 0, 
>   m_block_list = 0x0, m_elt_size = 6517120, m_size = 13,
> m_initialized = false, m_location = {
> m_filename = 0x0, m_function = 0x0, m_line = 1, m_origin =
> 2947481856, m_ggc = false, 
> m_vector = 0x0}, ipa = false}
> 
> I think this latter crash may be a pre-existing bug in how the jit
> interacts with gc finalizers.  I think the finalizers are accumulating
> from in-process run to run, leading to chaos, but I need to debug it
> some more to be sure.  Alternatively, is there a way that a finalizer
> is being registered, and then the object is somehow clobbered without
> the finalizer being unregistered from the vec of finalizers?

It looks like released memory. I saw a similar problem with ggc calling
finalizers in "wrong" order.  David's modref tree has two layers and the
destructor of one was freeing the other; that is fine if you destroy the
outer type first, but not if you do it in the wrong order.
I will try to reproduce it.  I also plan to turn the classes into PODs and
put them directly into the vectors.
I should not have allowed David to make a template in the first place :)

Honza
> 
> Dave
> 


Re: New modref/ipa_modref optimization passes

2020-09-22 Thread Jan Hubicka
> On Tue, 2020-09-22 at 16:13 -0400, David Malcolm wrote:
> > On Tue, 2020-09-22 at 20:39 +0200, Jan Hubicka wrote:
> > > David,
> > > with jit I get the following:
> > > /usr/local/x86_64-pc-linux-gnu/bin/ld: final link failed:
> > > nonrepresentable section on output
> > > collect2: error: ld returned 1 exit status
> > > make[3]: *** [../../gcc/jit/Make-lang.in:121: libgccjit.so.0.0.1]
> > > Error
> > > 
> > > Is there a fix/workaround?
> > 
> > I don't recognize that specific error, but googling suggests it may
> > relate to position-independent code.
> > 
> > Are you configuring with --enable-host-shared ?  This is needed when
> > enabling "jit" in --enable-languages (but slows down the compiler by
> > a
> > few percent, which is why "jit" isn't in "all").
> > 
> > 
> > > Patch I am trying to test/debug is attached, it fixes the selftest
> > > issue
> > > and the destructor.
> > 
> > Thanks.
> > 
> > Sadly it doesn't fix the jit crashes, which are now in bugzilla (as
> > PR
> > jit/97169).
> > 
> > Without the patch, the jit testcases crash at the end of the 1st in-
> > process iteration, in the dtor for the the new pass.
> > 
> > With the patch the jit testcases crash inside the 3rd in-process
> > iteration, invoking a modref_summaries finalizer at whichever GC-
> > collection point happens first, I think, where the modref_summaries *
> > seems to be pointing at corrupt data:
> > 
> > (gdb) p *(modref_summaries *)p
> > $2 = {> =
> > {> = {
> >   _vptr.function_summary_base = 0x20001,
> > m_symtab_insertion_hook = 0x1, m_symtab_removal_hook = 0x10004, 
> >   m_symtab_duplication_hook = 0x0, m_symtab = 0x644210,
> > m_insertion_enabled = 112, m_allocator = {m_allocator = {
> >   m_name = 0x0, m_id = 0, m_elts_per_block = 1,
> > m_returned_free_list = 0x7afafaf01, 
> >   m_virgin_free_list = 0xafafafafafaf0001 <error: Cannot access
> > memory at address 0xafafafafafaf0001>, 
> >   m_virgin_elts_remaining = 0, m_elts_allocated =
> > 140737080343888, m_elts_free = 0, m_blocks_allocated = 0, 
> >   m_block_list = 0x0, m_elt_size = 6517120, m_size = 13,
> > m_initialized = false, m_location = {
> > m_filename = 0x0, m_function = 0x0, m_line = 1, m_origin
> > =
> > 2947481856, m_ggc = false, 
> > m_vector = 0x0}, ipa = false}
> > 
> > I think this latter crash may be a pre-existing bug in how the jit
> > interacts with gc finalizers.  I think the finalizers are
> > accumulating
> > from in-process run to run, leading to chaos, but I need to debug it
> > some more to be sure.  Alternatively, is there a way that a finalizer
> > is being registered, and then the object is somehow clobbered without
> > the finalizer being unregistered from the vec of finalizers?
> 
> Aha: this patch on top of yours seems to fix it, at least for the test
> I've been debugging.
> 
> Calling gcc_delete on something seems to delete it without removing the
> finalizer, leaving the finalizer around to run on whatever the memory
> eventually gets reused for, leading to segfaults:
> 
> diff --git a/gcc/ipa-modref.c b/gcc/ipa-modref.c
> index 4b9c4db4ee9..64d314321cb 100644
> --- a/gcc/ipa-modref.c
> +++ b/gcc/ipa-modref.c
> @@ -1372,8 +1372,6 @@ unsigned int pass_ipa_modref::execute (function *)
>  void
>  ipa_modref_c_finalize ()
>  {
> -  if (summaries)
> -ggc_delete (summaries);
>summaries = NULL;
>  }

Ah, thanks.  That is very odd behaviour of delete indeed.

Honza
>  
> 
> 
> 


Re: Issue with ggc_delete and finalizers (was Re: New modref/ipa_modref optimization passes)

2020-09-23 Thread Jan Hubicka
> 
> Summarizing what's going on:
> 
> We have a use-after-ggc_delete happening with the finalizers code.
> 
> analyze_function has:
> 
> summaries = new (ggc_alloc <modref_summaries> ())
>modref_summaries (symtab);
> 
> ggc_alloc (as opposed to ggc_alloc_no_dtor) uses need_finalization_p
> and "sees" that the class has a nontrivial dtor, and hence it passes
> finalize to ggc_internal_alloc as the "f" callback.
> 
> Within ggc_internal_alloc we have:
> 
>   if (f)
> add_finalizer (result, f, s, n);
> 
> and so that callback is registered within G.finalizers - but there's
> nothing stored in the object itself to track that finalizer.
> 
> Later, in ipa_modref_c_finalize, gcc_delete is called on the
> mod_summaries object, but the finalizer is still registered in
> G.finalizers with its address.
> 
> Later, a GC happens, and if the bit for "marked" on that old
> modref_summaries object happens to be cleared (with whatever that
> memory is now being used for, if anything), the finalizer callback is
> called, and ~modref_summaries is called with its "this" pointing at
> random bits, and we have a segfault.
> 
> This seems like a big "gotcha" in ggc_delete: it doesn't remove any
> finalizer for the object, instead leaving it as a timebomb to happen in
> some future collection.
> 
> Should ggc_delete remove the finalizer?  It would have to scan the
> G.finalizer vecs to find them - O(N).  Alternatively, perhaps we could
> stash some kind of reference to the finalizer in memory near the object
> (perhaps allocating a header).

I think the difference between ggc_free and ggc_delete should be just
like the difference between free and delete, that is, ggc_delete should
call the finalizer.

I think the mistake is that ggc_delete would work well with
ggc_alloc_no_dtor, but the patch uses ggc_alloc.  I guess we do not see
the problem in normal compilation since ggc_delete is used in the patch
3 times for objects with no destructor and once for an object with a
destructor, but only at the end of compilation, when ggc does not run
without libjit.
> 
> Or we could put a big warning on ggc_delete saying not to use it on
> ggc_alloc-ed objects with dtors.

I do not think it is reasonable for ggc_delete to walk all finalizers,
so perhaps we just want a sanity check with --enable-checking that
things are not mixed up?
That is, maintain an on-the-side hash of finalizers introduced by ggc_alloc,
look it up in ggc_delete, and assert on the spot with a comment saying how
things are intended to work?
> 
> I'm not sure what the best approach here is.
> 
> There were only 4 places in the source tree where ggc_delete were used
> before the patch, which added 4 places; maybe we should just remove
> those new ggc_deletes and set the pointers to NULL and let them be
> cleared as regular garbage?

For modref we really need to release things explicitly since it runs at
WPA time and I am trying to maintain the invariant that WPA does not need
ggc_collect runs: they have a large memory footprint and it is easy to
explicitly manage all memory used by symtab/optimization summaries.

Of course I can reorganize the code to not have a destructor as well
(which is not very ++-sy).

In fact, since the templates have hand-written markers anyway, I was
thinking of moving them off ggc memory and just walking them to discover
the tree pointers in them.

Honza
> 
> Dave
> 


Cleanup handling of local/readonly memory in modref and ipa-pure-const

2020-09-23 Thread Jan Hubicka
Hi,
this is the first of the cleanup patches for the mod-ref interfaces.  It
removes code duplication between ipa-pure-const and ipa-modref, which both
want to check whether a given memory access can interfere with memory
accesses before a function call or after the function returns.

I pulled the logic out into refs_local_or_readonly_memory_p for references
and points_to_local_or_readonly_memory_p for pointers. It is in
ipa-fnsummary.c because I have incremental patches to track this info about
function parameters so we can ignore dereferences of parameters that are
local or readonly in the modref oracle.

I am not sure this is the right placement.  We could also move it to a more
general place, ipa-utils or some of the gimple infrastructure.
The extra includes in ipa-modref will however soon be necessary.

Bootstrapped/regtested x86_64-linux, OK?

Honza

* ipa-fnsummary.c (refs_local_or_readonly_memory_p): New function.
(points_to_local_or_readonly_memory_p): New function.
* ipa-fnsummary.h (refs_local_or_readonly_memory_p,
points_to_local_or_readonly_memory_p): Declare.
* ipa-modref.c: Include value-range.h, ipa-prop.h and ipa-fnsummary.h
(record_access_p): Use refs_local_or_readonly_memory_p.
* ipa-pure-const.c (check_op): Likewise.

* testsuite/gcc.dg/tree-ssa/local-pure-const.c: Update template.
diff --git a/gcc/ipa-fnsummary.c b/gcc/ipa-fnsummary.c
index 86d01addb44..ce168230105 100644
--- a/gcc/ipa-fnsummary.c
+++ b/gcc/ipa-fnsummary.c
@@ -2430,6 +2433,51 @@ fp_expression_p (gimple *stmt)
   return false;
 }
 
+/* Return true if T references memory location that is local
+   for the function (that means, dead after return) or read-only.  */
+
+bool
+refs_local_or_readonly_memory_p (tree t)
+{
+  /* Non-escaping memory is fine  */
+  t = get_base_address (t);
+  if ((TREE_CODE (t) == MEM_REF
+       || TREE_CODE (t) == TARGET_MEM_REF))
+    return points_to_local_or_readonly_memory_p (TREE_OPERAND (t, 0));
+
+  /* Automatic variables are fine.  */
+  if (DECL_P (t)
+      && auto_var_in_fn_p (t, current_function_decl))
+    return true;
+
+  /* Read-only variables are fine.  */
+  if (DECL_P (t) && TREE_READONLY (t))
+    return true;
+
+  return false;
+}
+
+/* Return true if T is a pointer pointing to memory location that is local
+   for the function (that means, dead after return) or read-only.  */
+
+bool
+points_to_local_or_readonly_memory_p (tree t)
+{
+  /*if (!POINTER_TYPE_P (TREE_TYPE (t)))
+    return false; */
+  STRIP_NOPS (t);
+  /* See if memory location is clearly invalid.  */
+  if (TREE_CODE (t) == INTEGER_CST)
+    return flag_delete_null_pointer_checks;
+  if (TREE_CODE (t) == SSA_NAME)
+    return SSA_NAME_POINTS_TO_READONLY_MEMORY (t)
+           || !ptr_deref_may_alias_global_p (t);
+  if (TREE_CODE (t) == ADDR_EXPR)
+    return refs_local_or_readonly_memory_p (TREE_OPERAND (t, 0));
+  return false;
+}
+
+
 /* Analyze function body for NODE.
EARLY indicates run from early optimization pipeline.  */
 
diff --git a/gcc/ipa-fnsummary.h b/gcc/ipa-fnsummary.h
index c6ddc9f3199..a3223d06acc 100644
--- a/gcc/ipa-fnsummary.h
+++ b/gcc/ipa-fnsummary.h
@@ -357,6 +357,8 @@ void estimate_ipcp_clone_size_and_time (struct cgraph_node *,
 void ipa_merge_fn_summary_after_inlining (struct cgraph_edge *edge);
 void ipa_update_overall_fn_summary (struct cgraph_node *node, bool reset = true);
 void compute_fn_summary (struct cgraph_node *, bool);
 void compute_fn_summary (struct cgraph_node *, bool);
+bool refs_local_or_readonly_memory_p (tree);
+bool points_to_local_or_readonly_memory_p (tree);
 
 
 void evaluate_properties_for_edge (struct cgraph_edge *e,
diff --git a/gcc/ipa-modref.c b/gcc/ipa-modref.c
index af0b710333e..d1b091ae29e 100644
--- a/gcc/ipa-modref.c
+++ b/gcc/ipa-modref.c
@@ -62,6 +62,9 @@ along with GCC; see the file COPYING3.  If not see
 #include "calls.h"
 #include "ipa-modref-tree.h"
 #include "ipa-modref.h"
+#include "value-range.h"
+#include "ipa-prop.h"
+#include "ipa-fnsummary.h"
 
 /* Class (from which there is one global instance) that holds modref summaries
for all analyzed functions.  */
@@ -317,36 +415,12 @@ record_access_lto (modref_records_lto *tt, ao_ref *ref)
 static bool
 record_access_p (tree expr)
 {
-  /* Non-escaping memory is fine  */
-  tree t = get_base_address (expr);
-  if (t && (INDIRECT_REF_P (t)
-            || TREE_CODE (t) == MEM_REF
-            || TREE_CODE (t) == TARGET_MEM_REF)
-      && TREE_CODE (TREE_OPERAND (t, 0)) == SSA_NAME
-      && !ptr_deref_may_alias_global_p (TREE_OPERAND (t, 0)))
+  if (refs_local_or_readonly_memory_p (expr))
     {
       if (dump_file)
-        fprintf (dump_file, "   - Non-escaping memory, ignoring.\n");
+        fprintf (dump_file, "   - Read-only or local, ignoring.\n");
       return false;
     }
-
-  /* Automatic variables are fine.  */
-  if (DECL_P (t)
-      && auto_var_in_fn_p (t, current_function_decl))
-    {
-      if (dump_file)
-        fprintf (dump_file, "   - Automatic variable, ignoring.\n");
-      return false;
-   

Re: Cleanup handling of local/readonly memory in modref and ipa-pure-const

2020-09-23 Thread Jan Hubicka
> > +/* Return true if T is a pointer pointing to memory location that is local
> > +   for the function (that means, dead after return) or read-only.  */
> > +
> > +bool
> > +points_to_local_or_readonly_memory_p (tree t)
> > +{
> > +  /*if (!POINTER_TYPE_P (TREE_TYPE (t)))
> > +    return false; */
> 
> remove ^^^
Ahh, sorry.
> 
> > +  STRIP_NOPS (t);
> 
> This wasn't in the original code - did you really run into (long)&x or what?

Old code works on references only.  In the followup I want to run it on
function arguments, so we can propagate info that i.e. this pointer
passed to function points to automatic variable.

I think I can see NOPs in the gimple args, so I should strip them?
I did not verify whether that happens in practice.

Honza


Re: [PATCH] Fix UBSAN errors in ipa-cp.

2020-09-23 Thread Jan Hubicka
> I see the following UBSAN errors:
> 
> ./xgcc -B. /home/marxin/Programming/gcc/gcc/testsuite/g++.dg/ipa/pr96806.C -std=c++11 -O -fipa-cp -fipa-cp-clone --param=ipa-cp-max-recursive-depth=94 --param=logical-op-non-short-circuit=0
> /home/marxin/Programming/gcc2/gcc/ipa-cp.c:3866:20: runtime error: signed integer overflow: 64 + 2147483584 cannot be represented in type 'int'
> /home/marxin/Programming/gcc2/gcc/ipa-cp.c:3843:16: runtime error: signed integer overflow: -2147483648 + -2147483648 cannot be represented in type 'int'
> /home/marxin/Programming/gcc2/gcc/ipa-cp.c:3864:20: runtime error: signed integer overflow: 1 + 2147483647 cannot be represented in type 'int'
> 
> Patch can bootstrap on x86_64-linux-gnu and survives regression tests.
> 
> Ready to be installed?
> Thanks,
> Martin
> 
> gcc/ChangeLog:
> 
>   * ipa-cp.c (safe_add): Handle also very small negative values.
>   (value_topo_info::propagate_effects): Use properly safe_add.

Perhaps it is time to turn the profile-count-scaled values into sreals,
like we do in inline heuristics and other places?

Honza
> ---
>  gcc/ipa-cp.c | 11 +++
>  1 file changed, 7 insertions(+), 4 deletions(-)
> 
> diff --git a/gcc/ipa-cp.c b/gcc/ipa-cp.c
> index b3e7d41ea10..e39ee28726d 100644
> --- a/gcc/ipa-cp.c
> +++ b/gcc/ipa-cp.c
> @@ -3832,13 +3832,15 @@ propagate_constants_topo (class ipa_topo_info *topo)
>  /* Return the sum of A and B if none of them is bigger than INT_MAX/2, return
> -   the bigger one if otherwise.  */
> +   the bigger one if otherwise.  Similarly for negative numbers.  */
>  static int
>  safe_add (int a, int b)
>  {
>    if (a > INT_MAX/2 || b > INT_MAX/2)
>      return a > b ? a : b;
> +  else if (a < -INT_MAX/2 || b < -INT_MAX/2)
> +    return a > b ? b : a;
>    else
>      return a + b;
>  }
> @@ -3861,9 +3863,10 @@ value_topo_info::propagate_effects ()
>    for (val = base; val; val = val->scc_next)
>      {
> -      time = safe_add (time,
> -                       val->local_time_benefit + val->prop_time_benefit);
> -      size = safe_add (size, val->local_size_cost + val->prop_size_cost);
> +      time = safe_add (time, val->local_time_benefit);
> +      time = safe_add (time, val->prop_time_benefit);
> +      size = safe_add (size, val->local_size_cost);
> +      size = safe_add (size, val->prop_size_cost);
>      }
>    for (val = base; val; val = val->scc_next)
> -- 
> 2.28.0
> 


Re: Cleanup handling of local/readonly memory in modref and ipa-pure-const

2020-09-23 Thread Jan Hubicka
> On Wed, Sep 23, 2020 at 11:55 AM Jan Hubicka  wrote:
> >
> > > > +/* Return true if T is a pointer pointing to memory location that is local
> > > > +   for the function (that means, dead after return) or read-only.  */
> > > > +
> > > > +bool
> > > > +points_to_local_or_readonly_memory_p (tree t)
> > > > +{
> > > > +  /*if (!POINTER_TYPE_P (TREE_TYPE (t)))
> > > > +    return false; */
> > >
> > > remove ^^^
> > Ahh, sorry.
> > >
> > > > +  STRIP_NOPS (t);
> > >
> > > This wasn't in the original code - did you really run into (long)&x or what?
> >
> > Old code works on references only.  In the followup I want to run it on
> > function arguments, so we can propagate info that i.e. this pointer
> > passed to function points to automatic variable.
> >
> > I think I can see nops in the gimple args so I should strip it?
> 
> No, there cannot be conversions in gimple args.

I will re-test with the strip dropped. OK with that change?

Honza
> 
> > I did not verify whether that happens in practice.
> >
> > Honza

