Re: New modref/ipa_modref optimization passes

2020-09-22 Thread Jan Hubicka
> > (gdb) p summaries
> > $3 = (fast_function_summary *) 0x0
> > 
> > I'm still investigating (but may have to call halt for the night), but
> > this could be an underlying issue with the new passes; the jit
> > testsuite runs with the equivalent of:
> > 
> > --param=ggc-min-expand=0 --param=ggc-min-heapsize=0
> > 
> > throughout to shake out GC issues (to do a full collection at each GC
> > opportunity).
> > 
> > Was this code tested with the jit?  Do you see issues in cc1 if you set
> > those params?  Anyone else seeing "random" crashes?
> 
> I suppose this happens when the pass gets constructed but no summary is
> computed.  Does the NULL pointer guard here help?

Hi,
I am currently on a train and cannot test the patch easily, but this
should help.  If you run the pass on empty input, then the destruction
happens with a NULL summaries pointer.

My apologies for that.
Honza

diff --git a/gcc/ipa-modref.c b/gcc/ipa-modref.c
index af0b710333e..cd92b5a81d3 100644
--- a/gcc/ipa-modref.c
+++ b/gcc/ipa-modref.c
@@ -769,7 +885,8 @@ class pass_modref : public gimple_opt_pass
 
   ~pass_modref ()
     {
-      ggc_delete (summaries);
+      if (summaries)
+	ggc_delete (summaries);
       summaries = NULL;
     }
 
> 
> Honza
> > 
> > Thanks
> > Dave
> > 
> > 
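The failure mode discussed above can be sketched outside of GCC. This is a minimal illustration, assuming a release helper that runs the destructor unconditionally before freeing the storage; `summary_table`, `destroy_and_release` and `toy_pass` are hypothetical names for this sketch, not GCC's own types or API.

```cpp
#include <cassert>
#include <cstdlib>
#include <new>

// Hypothetical stand-in for a summary table owned by a pass.
struct summary_table
{
  static int destroyed;
  ~summary_table () { ++destroyed; }
};
int summary_table::destroyed = 0;

// Hypothetical helper: runs the destructor, then releases the storage.
// It dereferences PTR unconditionally, so it must never be given NULL.
template <typename T>
void
destroy_and_release (T *ptr)
{
  assert (ptr != nullptr);
  ptr->~T ();
  std::free (ptr);
}

struct toy_pass
{
  summary_table *summaries = nullptr;

  // Only called when the pass has something to analyze (call at most once).
  void
  compute ()
  {
    void *mem = std::malloc (sizeof (summary_table));
    assert (mem != nullptr);
    summaries = new (mem) summary_table ();
  }

  ~toy_pass ()
  {
    // The guard: a pass that never computed a summary has nothing to free.
    if (summaries)
      destroy_and_release (summaries);
    summaries = nullptr;
  }
};
```

Without the guard, destroying a `toy_pass` on which `compute ()` was never called would hand a null pointer to the helper, which is the situation the patch avoids.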


Re: [r11-3315 Regression] FAIL: g++.dg/ext/timevar2.C -std=gnu++98 (test for excess errors) on Linux/x86_64 (-m64 -march=cascadelake)

2020-09-22 Thread Martin Liška

On 9/21/20 9:43 PM, Marek Polacek wrote:

Ok for trunk?  Tested by running timevar2.C a couple of dozen times.


Thank you for the fix. It seems obvious to me; I would install it.

Martin


Re: [Patch] OpenMP: Handle cpp_implicit_alias in declare-target discovery (PR96390)

2020-09-22 Thread Tobias Burnus

Hi Honza & Jakub,

@Honza: please look at the decl alias handling of the actual patch.

A minor update – testing only on gcn turned out to be insufficient:
nvptx does not support aliases, cf. PR 97102 + 97106; hence xfailed.

I also had a typo in one 'dg-do run' but as -O0 is set explicitly,
'dg-do run' does not make sense. (no dg-do = run once, dg-do run =
run with multiple options, dg-do compile = compile only).
Hence, I have now removed the dg-do line.

On 9/17/20 1:15 AM, Tobias Burnus wrote:

Hi Honza – some input would be really helpful!

Hi Jakub – updated version below.

On 9/16/20 12:36 PM, Jakub Jelinek wrote:

I think you want Honza on this primarily, I'm always lost
in the cgraph alias code.

(Likewise as this thread shows)

+  while (node->alias_target)
+node = symtab_node::get (node->alias_target);
+  node = node->ultimate_alias_target ();

I think the above is either you walk the aliases yourself, or use
ultimate_alias_target, but not both.


I think we need to distinguish between:
* aliases which end up with the same symbol name
  and are stored in the ref_list; example: cpp_implicit_alias.
* aliases like with the alias attribute, which is handled
  via alias_target and have different names.

Just experimentally:
* The 'while (node->alias_target)' properly resolves the
  attribute testcase (libgomp.c-c++-common/pr96390.c).
  Here, ultimate_alias_target () does not help as
  node->analyzed == 0.

* The 'node->ultimate_alias_target ()' works for the
  cpp_implicit_alias case (libgomp.c++/pr96390.C).
  Just looking at the alias target does not help as in this
  case, alias_target == NULL.


And the second thing is, I'm not sure how the aliases behave if the
ultimate alias target is properly marked as omp declare target, but some
of the aliases are not.  The offloaded code will still call the alias,
so do we somehow arrange for the aliases to be also emitted into the
offloading LTO IL?
[...] I wonder if the aliases that are needed shouldn't
be marked node->offloadable and have "omp declare target" attribute
added
for them too.


Done now.

Okay – or do we find more issues?

Tobias

-
Mentor Graphics (Deutschland) GmbH, Arnulfstraße 201, 80634 München / Germany
Registergericht München HRB 106955, Geschäftsführer: Thomas Heurung, Alexander 
Walter
OpenMP: Handle cpp_implicit_alias in declare-target discovery (PR96390)

gcc/ChangeLog:

	PR middle-end/96390
	* omp-offload.c (omp_discover_declare_target_tgt_fn_r): Handle
	alias nodes.

libgomp/ChangeLog:

	PR middle-end/96390
	* testsuite/libgomp.c++/pr96390.C: New test.
	* testsuite/libgomp.c-c++-common/pr96390.c: New test.

 gcc/omp-offload.c| 51 
 libgomp/testsuite/libgomp.c++/pr96390.C  | 49 +++
 libgomp/testsuite/libgomp.c-c++-common/pr96390.c | 26 
 3 files changed, 118 insertions(+), 8 deletions(-)

diff --git a/gcc/omp-offload.c b/gcc/omp-offload.c
index 32c2485abd4..222ebff1d1e 100644
--- a/gcc/omp-offload.c
+++ b/gcc/omp-offload.c
@@ -196,21 +196,56 @@ omp_declare_target_var_p (tree decl)
 static tree
 omp_discover_declare_target_tgt_fn_r (tree *tp, int *walk_subtrees, void *data)
 {
-  if (TREE_CODE (*tp) == FUNCTION_DECL
-  && !omp_declare_target_fn_p (*tp)
-  && !lookup_attribute ("omp declare target host", DECL_ATTRIBUTES (*tp)))
+  if (TREE_CODE (*tp) == FUNCTION_DECL)
 {
-  tree id = get_identifier ("omp declare target");
-  if (!DECL_EXTERNAL (*tp) && DECL_SAVED_TREE (*tp))
-	((vec *) data)->safe_push (*tp);
-  DECL_ATTRIBUTES (*tp) = tree_cons (id, NULL_TREE, DECL_ATTRIBUTES (*tp));
+  tree decl = *tp;
   symtab_node *node = symtab_node::get (*tp);
   if (node != NULL)
 	{
-	  node->offloadable = 1;
+	  /* First, find final FUNCTION_DECL; find final alias target and there
+	 ensure alias like cpp_implicit_alias are resolved by calling
+	 ultimate_alias_target; the latter does not resolve alias_target as
+	 node->analyzed = 0.  */
+	  symtab_node *orig_node = node;
+	  while (node->alias_target)
+	node = symtab_node::get (node->alias_target);
+	  node = node->ultimate_alias_target ();
+	  decl = node->decl;
+
+	  if (omp_declare_target_fn_p (decl)
+	  || lookup_attribute ("omp declare target host",
+   DECL_ATTRIBUTES (decl)))
+	return NULL_TREE;
+
 	  if (ENABLE_OFFLOADING)
 	g->have_offload = true;
+
+	  /* Now mark original node and all alias targets for offloading.  */
+	  node->offloadable = 1;
+	  if (orig_node != node)
+	{
+	  tree id = get_identifier ("omp declare target");
+	  while (orig_node->alias_target)
+		{
+		  orig_node = orig_node->ultimate_alias_target ();
+		  orig_node->offloadable = 1;
+		  DECL_ATTRIBUTES (orig_node->decl)
+		= tree_cons (id, NULL_TREE,
+ DECL_ATTRIBUTES (orig_node->decl));
+		  orig_node = symtab_node::get (orig_node->alias_target);
+		}
+	}
 	}
+  else if (omp_declare_ta

Re: Do we need to do a loop invariant motion after loop interchange ?

2020-09-22 Thread Richard Biener via Gcc-patches
On Tue, Sep 22, 2020 at 4:31 AM HAO CHEN GUI via Gcc-patches
 wrote:
>
> Bin,
>
> I just tested your patch on current trunk.  Here is my summary.
>
> 1. About some iv aren't moved out of inner loop (Lijia mentioned in his
> last email)
>
>[local count: 955630226]:
># l_32 = PHI <1(12), l_54(21)>
># ivtmp_165 = PHI <_446(12), ivtmp_155(21)>
>_26 = (integer(kind=8)) l_32;
>_27 = _25 + _26;
>y__I_lsm.119_136 = (*y_135(D))[_27];
>y__I_lsm.119_90 = m_55 != 1 ? y__I_lsm.119_136 : 0.0;
>_37 = _36 * stride.88_111;
>_38 = _35 + _37;
>_39 = _26 + _38;
>_40 = (*a_137(D))[_39];
>
> The offset _39 is not loop independent as it relies on _26. But _38 and
> _37 should be loop independent. So Lijia thought they should be moved
> out of loop.
>
> I checked the following passes and found that these statements are
> eliminated after vectorizing and dce.
>
> In vect dump,
>
> simple.F:27:23: note:  -->vectorizing statement: _37 = _36 *
> stride.88_111;
> simple.F:27:23: note:  -->vectorizing statement: _38 = _35 + _37;
> simple.F:27:23: note:  -->vectorizing statement: _39 = _26 + _38;
> simple.F:27:23: note:  -->vectorizing statement: _40 = (*a_137(D))[_39];
> simple.F:27:23: note:  transform statement.
> simple.F:27:23: note:  transform load. ncopies = 1
> simple.F:27:23: note:  create vector_type-pointer variable to type:
> vector(2) real(kind=8)  vectorizing an array ref: (*a_137(D))
> simple.F:27:23: note:  created vectp_a.131_383
> simple.F:27:23: note:  add new stmt: vect__40.132_374 = MEM  real(kind=8)> [(real(kind=8) *)vectp_a.130_376];
>
> In dce dump,
>
> Deleting : _39 = _26 + _38;
>
> Deleting : _38 = _35 + _37;
>
> Deleting : _37 = _36 * stride.88_111;
>
> So it's reasonable to only consider data references after loop
> interchange. Other statements may be eliminated or be moved out of the loop
> in the last lim pass if they're really expensive.
>
> 2. I tested the SPEC on powerpc64le-linux-gnu. 503.bwaves_r got 6.77%
> performance improvement with this patch. It has no impact on other
> benchmarks.
>
> 3. The patch passed bootstrap and regression tests on
> powerpc64le-linux-gnu.
>
> I think the patch works fine. Could you please add it into trunk? Thanks
> a lot.

I'd like to see us instead apply the existing LIM code (w/o store-motion)
to the loop nests we applied interchange to instead of adding yet another
invariant motion code.  It should be reasonably "easy" to make the code
have the complexity of the region it operates on (again, without store-motion
which would introduce some complexities).

Btw, with an extra LIM pass scheduled after interchange I observed ~20%
improvement on bwaves for x86_64, mainly due to better induction variable
selection IIRC.

Richard.
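To make the quoted observation concrete (that _37 and _38 do not depend on the inner induction variable while _39 does), here is a small hand-written sketch of the hoisting a LIM pass would perform. The function and parameter names are hypothetical and only mirror the shape of the GIMPLE above; this is not the code from simple.F.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: true if every index base + row * stride + l,
// for l in [1, n], is a valid index into A.
static bool
indices_in_bounds (const std::vector<double> &a, long base, long row,
		   long stride, long n)
{
  long first = base + row * stride + 1;
  long last = base + row * stride + n;
  return n >= 1 && first >= 0 && static_cast<std::size_t> (last) < a.size ();
}

// Shape of the loop before invariant motion: the products and sums that do
// not depend on L are recomputed on every iteration.
double
sum_inner_naive (const std::vector<double> &a, long base, long row,
		 long stride, long n)
{
  assert (indices_in_bounds (a, base, row, stride, n));
  double sum = 0.0;
  for (long l = 1; l <= n; ++l)
    {
      long t37 = row * stride;	// invariant in l
      long t38 = base + t37;	// invariant in l
      long t39 = l + t38;	// depends on l
      sum += a[t39];
    }
  return sum;
}

// Same loop after hoisting the invariant part by hand.
double
sum_inner_hoisted (const std::vector<double> &a, long base, long row,
		   long stride, long n)
{
  assert (indices_in_bounds (a, base, row, stride, n));
  const long t38 = base + row * stride;	// computed once, outside the loop
  double sum = 0.0;
  for (long l = 1; l <= n; ++l)
    sum += a[l + t38];
  return sum;
}
```

Both functions read the same elements; only the amount of address arithmetic inside the loop differs.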

>
> On 8/9/2020 6:18 PM, Bin.Cheng wrote:
> > On Mon, Sep 7, 2020 at 5:42 PM HAO CHEN GUI  wrote:
> >> Hi,
> >>
> >> I want to follow Lijia's work as I gained the performance benefit on
> >> some SPEC workloads by adding a im pass after loop interchange.  Could
> >> you send me the latest patches? I could do further testing. Thanks a lot.
> > Hi,
> > Hmm, not sure if this refers to me?  I only provided an example patch
> > (which isn't complete) before Lijia's.  Unfortunately I don't have any
> > latest patch about this either.
> > As Richard suggested, maybe you (if you work on this) can simplify the
> > implementation.  Anyway, we only need to hoist memory references here.
> >
> > Thanks,
> > bin
> >> https://gcc.gnu.org/pipermail/gcc/2020-February/232091.html
> >>


Re: [PATCH] gcov: fix TOPN streaming from shared libraries

2020-09-22 Thread Martin Liška

On 9/22/20 1:13 AM, Sergei Trofimovich wrote:

On Mon, 21 Sep 2020 20:38:07 +0300 (MSK)
Alexander Monakov  wrote:


On Mon, 21 Sep 2020, Martin Liška wrote:


On 9/6/20 1:24 PM, Sergei Trofimovich wrote:

From: Sergei Trofimovich 

Before the change gcc did not stream correctly TOPN counters
if counters belonged to a non-local shared object.

As a result zero-section optimization generated TOPN sections
in a form not recognizable by '__gcov_merge_topn'.

The problem happens because in a case of multiple shared objects
'__gcov_merge_topn' function is present in address space multiple
times (once per each object).

The fix is to never rely on function address and predicate on TOPN
counter types.


Hello.

Thank you for the analysis! I think it's the correct fix and it's probably
similar to what we used to see for indirect_call_tuple.

@Alexander: Am I right?


Yes, analysis presented by Sergei in Bugzilla looks correct. Pedantically I
wouldn't say the indirect call issue was similar: it's a different gotcha
arising from mixing static and dynamic linking. There we had some symbols
preempted by the main executable (but not all symbols), here we have lack
of preemption/unification as relevant libgcov symbol is hidden.


Thank you Alexander.



I cannot judge if the fix is correct (don't know the code that well) but it
looks reasonable. If you could come up with a clearer wording for the new
comment it would be nice, I struggled to understand it.


Yeah, I agree the comment is very misleading. The code is already very clear
about special casing of TOPN counters. How about dropping the comment?

v2:

 From 300585164f0a719a3a283c8da3a4061615f6da3a Mon Sep 17 00:00:00 2001
From: Sergei Trofimovich 
Date: Sun, 6 Sep 2020 12:13:54 +0100
Subject: [PATCH v2] gcov: fix TOPN streaming from shared libraries

Before the change gcc did not stream correctly TOPN counters
if counters belonged to a non-local shared object.

As a result zero-section optimization generated TOPN sections
in a form not recognizable by '__gcov_merge_topn'.

The problem happens because in a case of multiple shared objects
'__gcov_merge_topn' function is present in address space multiple
times (once per each object).

The fix is to never rely on function address and predicate on TOPN
counter types.

libgcc/ChangeLog:

 PR gcov-profile/96913
 * libgcov-driver.c (write_one_data): Avoid function pointer
 comparison in TOP streaming decision.
---
  libgcc/libgcov-driver.c | 2 +-
  1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/libgcc/libgcov-driver.c b/libgcc/libgcov-driver.c
index 58914268d4e..e53e4dc392a 100644
--- a/libgcc/libgcov-driver.c
+++ b/libgcc/libgcov-driver.c
@@ -424,7 +424,7 @@ write_one_data (const struct gcov_info *gi_ptr,

   n_counts = ci_ptr->num;

- if (gi_ptr->merge[t_ix] == __gcov_merge_topn)
+ if (t_ix == GCOV_COUNTER_V_TOPN || t_ix == GCOV_COUNTER_V_INDIR)
 write_top_counters (ci_ptr, t_ix, n_counts);
   else
 {



The fix is fine, please install it.

Martin
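The reasoning in the commit message can be illustrated with a small sketch. It assumes, as described above, that each object carries its own private copy of the merge routine, so the address registered by one object need not equal the address visible from another. All names and enumerator values below are hypothetical and are not the real libgcov definitions.

```cpp
#include <cassert>

typedef void (*merge_fn) (long *counters, unsigned n);

// Two functions standing in for the private copies of the merge routine
// that end up in the executable and in a shared library.
void merge_topn_copy_in_exe (long *counters, unsigned n) { (void) counters; (void) n; }
void merge_topn_copy_in_lib (long *counters, unsigned n) { (void) counters; (void) n; }

// Hypothetical counter kinds; the values are illustrative only.
enum toy_counter
{
  TOY_COUNTER_ARCS,
  TOY_COUNTER_V_TOPN,
  TOY_COUNTER_V_INDIR,
  TOY_COUNTERS
};

// Fragile: only true when the registered pointer happens to be the local copy.
bool
is_topn_by_address (merge_fn registered, merge_fn local_copy)
{
  return registered == local_copy;
}

// Robust: depends only on the counter kind, not on which copy was linked in.
bool
is_topn_by_kind (unsigned t_ix)
{
  assert (t_ix < TOY_COUNTERS);
  return t_ix == TOY_COUNTER_V_TOPN || t_ix == TOY_COUNTER_V_INDIR;
}
```

With two distinct copies in the address space, the address-based predicate answers "no" for counters that really are TOPN counters, whereas the kind-based predicate is unaffected.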


[PATCH] gcov: fix streaming corruption

2020-09-22 Thread Martin Liška

As mentioned in the PR, a line number equal to 0 is bogus. Moreover, it messes
up the GCNO file format, where we use 0 as a separator. I therefore suggest
clamping line numbers to be at least 1.

Patch can bootstrap on x86_64-linux-gnu and survives regression tests.
I'll install the patch if there are no comments.

Thanks,
Martin

gcc/ChangeLog:

PR gcov-profile/97069
* profile.c (branch_prob): Line number must be at least 1.

gcc/testsuite/ChangeLog:

PR gcov-profile/97069
* g++.dg/gcov/pr97069.C: New test.
---
 gcc/profile.c   |  6 +++---
 gcc/testsuite/g++.dg/gcov/pr97069.C | 20 
 2 files changed, 23 insertions(+), 3 deletions(-)
 create mode 100644 gcc/testsuite/g++.dg/gcov/pr97069.C

diff --git a/gcc/profile.c b/gcc/profile.c
index fe8963cc9e9..45409591629 100644
--- a/gcc/profile.c
+++ b/gcc/profile.c
@@ -1375,7 +1375,7 @@ branch_prob (bool thunk)
  seen_locations.add (loc);
  expanded_location curr_location = expand_location (loc);
  output_location (&streamed_locations, curr_location.file,
-  curr_location.line, &offset, bb);
+  MAX (1, curr_location.line), &offset, bb);
}
 
 	  for (gsi = gsi_start_bb (bb); !gsi_end_p (gsi); gsi_next (&gsi))

@@ -1386,7 +1386,7 @@ branch_prob (bool thunk)
{
  seen_locations.add (loc);
  output_location (&streamed_locations, gimple_filename (stmt),
-  gimple_lineno (stmt), &offset, bb);
+  MAX (1, gimple_lineno (stmt)), &offset, bb);
}
}
 
@@ -1401,7 +1401,7 @@ branch_prob (bool thunk)

{
  expanded_location curr_location = expand_location (loc);
  output_location (&streamed_locations, curr_location.file,
-  curr_location.line, &offset, bb);
+  MAX (1, curr_location.line), &offset, bb);
}
 
 	  if (offset)

diff --git a/gcc/testsuite/g++.dg/gcov/pr97069.C 
b/gcc/testsuite/g++.dg/gcov/pr97069.C
new file mode 100644
index 000..040e336662a
--- /dev/null
+++ b/gcc/testsuite/g++.dg/gcov/pr97069.C
@@ -0,0 +1,20 @@
+// PR gcov-profile/97069
+// { dg-options "--coverage" }
+// { dg-do run { target native } }
+
+# 0 "pr97069.C"
+# 0 ""
+# 0 ""
+# 1 "/usr/include/stdc-predef.h" 1 3 4
+# 0 "" 2
+# 1 "pr97069.C"
+int main()
+{
+  return 0;
+}
+# 0 "pr97069.C"
+void zero_line_directive()
+{
+}
+
+// { dg-final { run-gcov pr97069.C } }
--
2.28.0
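Why a zero line number corrupts a stream that uses 0 as a separator can be shown with a toy encoding. This is only a sketch under that single assumption; the layout below is hypothetical and much simpler than the real GCNO format.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Toy encoding: the line numbers of one block, followed by a 0 terminator.
// With CLAMP set, every line is forced to be at least 1, as in the patch.
void
write_lines (std::vector<unsigned> &out, const std::vector<unsigned> &lines,
	     bool clamp)
{
  for (unsigned line : lines)
    out.push_back (clamp ? std::max (1u, line) : line);
  out.push_back (0);
}

// Reads line numbers starting at POS until the terminator (or end of data)
// and leaves POS just past the terminator.
std::vector<unsigned>
read_lines (const std::vector<unsigned> &in, std::size_t &pos)
{
  assert (pos <= in.size ());
  std::vector<unsigned> lines;
  while (pos < in.size () && in[pos] != 0)
    lines.push_back (in[pos++]);
  if (pos < in.size ())
    ++pos;	// skip the terminator
  return lines;
}
```

Writing the lines {0, 3, 4} without clamping makes the reader stop at the very first entry and treat the remaining lines as the start of the next record; with clamping the record round-trips as {1, 3, 4}.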



Re: [Patch] OpenMP: Handle cpp_implicit_alias in declare-target discovery (PR96390)

2020-09-22 Thread Jakub Jelinek via Gcc-patches
On Tue, Sep 22, 2020 at 09:11:48AM +0200, Tobias Burnus wrote:
> > Okay – or do we find more issues?

I'm afraid so.

> +   if (omp_declare_target_fn_p (decl)
> +   || lookup_attribute ("omp declare target host",
> +DECL_ATTRIBUTES (decl)))
> + return NULL_TREE;

I'm worried that omp_declare_target_fn_p could be true and so this would
punt, but the intermediate aliases would be marked.
Or the aliases would be marked and the ultimate alias would not.
Consider:
int v;
#pragma omp declare target to (v)
void foo (void) { v++; }
void bar (void) __attribute__((alias ("foo")));
#pragma omp declare target to (bar)
void baz (void) __attribute__((alias ("foo")));
void qux (void) {
#pragma omp target
{
  bar (); // Here the ultimate alias is not marked, so the code marks it,
  // and adds another "omp declare target" attribute to bar,
  // which it shouldn't.
  baz (); // At this point, foo is marked, so the code wouldn't mark
  // baz alias as "omp declare target".
}
}

So, I think it is fine to find the ultimate alias, but the loop to mark
the intermediate aliases should be invoked regardless of how decl is or is
not marked, and should test in each step whether it should or should not be
marked.

Jakub
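The suggestion above (walk the whole alias chain regardless of how the ultimate target is marked, and decide per node whether it needs marking) could look roughly like this on a toy data structure. `toy_node` and its fields are hypothetical simplifications for this sketch and do not reflect the real symtab_node interface.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

// Hypothetical, heavily simplified symbol-table node.
struct toy_node
{
  std::string name;
  toy_node *alias_target = nullptr;	// null for the real definition
  bool offloadable = false;
  std::vector<std::string> attributes;
};

static bool
has_attribute (const toy_node &node, const std::string &attr)
{
  return std::find (node.attributes.begin (), node.attributes.end (), attr)
	 != node.attributes.end ();
}

// A node needs marking only if it carries neither attribute yet.
static bool
needs_marking (const toy_node &node)
{
  return !has_attribute (node, "omp declare target")
	 && !has_attribute (node, "omp declare target host");
}

// Walks from NODE to the end of its alias chain, marking each node that
// still needs it, and returns the final target.
toy_node *
mark_alias_chain (toy_node *node)
{
  assert (node != nullptr);
  for (toy_node *cur = node;; cur = cur->alias_target)
    {
      if (needs_marking (*cur))
	{
	  cur->attributes.push_back ("omp declare target");
	  cur->offloadable = true;
	}
      if (cur->alias_target == nullptr)
	return cur;
    }
}
```

Applied to the foo/bar/baz example, calling this first for bar and then for baz leaves bar with its single existing attribute, marks foo exactly once, and still marks baz even though foo is already marked by then.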



Re: [PATCH] aarch64: Add extend-as-extract-with-shift pattern [PR96998]

2020-09-22 Thread Alex Coplan
Hi Segher,

On 21/09/2020 18:35, Segher Boessenkool wrote:
> Hi!
> 
> So, I tested this patch.  The test builds Linux for all targets, and the
> number reported here is just binary size (usually a good indicator for
> combine effectiveness).  C0 is the unmodified compiler, C1 is with your
> patch.  A size of 0 means it did not build.
> 
>                     C0        C1
>        alpha   6403469  100.000%
>          arc         0         0
>          arm  10196358  100.000%
>        arm64         0  20228766
>        armhf  15042594  100.000%
>          c6x   2496218  100.000%
>         csky         0         0
>        h8300   1217198  100.000%
>         i386  11966700  100.000%
>         ia64  18814277  100.000%
>         m68k   3856350  100.000%
>   microblaze   5864258  100.000%
>         mips   9142108  100.000%
>       mips64   7344744  100.000%
>        nds32         0         0
>        nios2   3909477  100.000%
>     openrisc   4554446  100.000%
>       parisc   7721195  100.000%
>     parisc64         0         0
>      powerpc  10447477  100.000%
>    powerpc64  22257111  100.000%
>  powerpc64le  19292786  100.000%
>      riscv32   1630934  100.000%
>      riscv64   7628058  100.000%
>         s390  15173928  100.000%
>           sh   3410671  100.000%
>      shnommu   1685616  100.000%
>        sparc   4737096  100.000%
>      sparc64   7167122  100.000%
>       x86_64  19718928  100.000%
>       xtensa   2639363  100.000%

Thanks for doing this testing. The results look good, then: no code size
changes and no build regressions.

> 
> So, there is no difference for most targets (I checked some targets and
> there really is no difference).  The only exception is aarch64 (which
> the kernel calls "arm64"): the unpatched compiler ICEs!  (At least three
> times, even).

Indeed, this is the intended purpose of the patch, see the PR (96998).

> 
> during RTL pass: reload
> /home/segher/src/kernel/kernel/cgroup/cgroup.c: In function 
> 'rebind_subsystems':
> /home/segher/src/kernel/kernel/cgroup/cgroup.c:1777:1: internal compiler 
> error: in lra_set_insn_recog_data, at lra.c:1004
>  1777 | }
>   | ^
> 0x1096215f lra_set_insn_recog_data(rtx_insn*)
> /home/segher/src/gcc/gcc/lra.c:1004
> 0x109625d7 lra_get_insn_recog_data
> /home/segher/src/gcc/gcc/lra-int.h:488
> 0x109625d7 lra_update_insn_regno_info(rtx_insn*)
> /home/segher/src/gcc/gcc/lra.c:1625
> 0x10962d03 lra_update_insn_regno_info(rtx_insn*)
> /home/segher/src/gcc/gcc/lra.c:1623
> 0x10962d03 lra_push_insn_1
> /home/segher/src/gcc/gcc/lra.c:1780
> [etc]
> 
> This means LRA found an unrecognised insn; and that insn is
> 
> (insn 817 804 818 21 (set (reg:DI 324)
> (sign_extract:DI (ashift:DI (subreg:DI (reg:SI 232) 0)
> (const_int 3 [0x3]))
> (const_int 35 [0x23])
> (const_int 0 [0]))) 
> "/home/segher/src/kernel/kernel/cgroup/cgroup.c":1747:3 -1
>  (nil))
> 
> LRA created that as a reload for
> 
> (insn 347 819 348 21 (parallel [
> (set (mem/f:DI (reg:DI 324) [233 *__p_84+0 S8 A64])
> (asm_operands/v:DI ("stlr %1, %0") ("=Q") 0 [
> (reg:DI 325 [orig:106 prephitmp_18 ] [106])
> ]
>  [
> (asm_input:DI ("r") 
> /home/segher/src/kernel/kernel/cgroup/cgroup.c:1747)
> ]
>  [] /home/segher/src/kernel/kernel/cgroup/cgroup.c:1747))
> (clobber (mem:BLK (scratch) [0  A8]))
> ]) "/home/segher/src/kernel/kernel/cgroup/cgroup.c":1747:3 -1
>  (expr_list:REG_DEAD (reg:SI 232)
> (expr_list:REG_DEAD (reg:DI 106 [ prephitmp_18 ])
> (nil
> 
> as
> 
>   347: {[r324:DI]=asm_operands;clobber [scratch];}
>   REG_DEAD r232:SI
>   REG_DEAD r106:DI
> Inserting insn reload before:
>   817: r324:DI=sign_extract(r232:SI#0<<0x3,0x23,0)
>   818: r324:DI=r324:DI+r284:DI
>   819: r325:DI=r106:DI
> 
> (and then it died).
> 
> 
> Can you fix this first?  There probably is something target-specific
> wrong related to zero_extract.

The intent is to fix this in combine here. See the earlier replies in
this thread.

> 
> 
> Segher

Thanks,
Alex
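For readers less familiar with the RTL in insn 817: a sign_extract of width 35 at position 0 applied to a 32-bit value shifted left by 3 computes the same thing as sign-extending the 32-bit value and then shifting. The sketch below models that arithmetic in plain C++; `toy_sign_extract` is a hypothetical model written for this note, not a GCC function.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical model of sign_extract: take WIDTH bits of VALUE starting at
// bit POS (bit 0 is the least significant) and sign-extend them to 64 bits.
std::int64_t
toy_sign_extract (std::uint64_t value, unsigned width, unsigned pos)
{
  assert (width > 0 && width < 64 && pos + width <= 64);
  std::uint64_t field = (value >> pos) & ((std::uint64_t{1} << width) - 1);
  std::uint64_t sign = std::uint64_t{1} << (width - 1);
  if (field & sign)
    return -static_cast<std::int64_t> ((std::uint64_t{1} << width) - field);
  return static_cast<std::int64_t> (field);
}

// The form seen in the insn: REG holds a 32-bit value in its low half, and
// the upper half is treated as unspecified.
std::int64_t
extract_form (std::uint64_t reg)
{
  return toy_sign_extract (reg << 3, 35, 0);
}

// The equivalent "extend, then shift" form (multiplying by 8 avoids
// left-shifting a negative value).
std::int64_t
extend_form (std::int32_t x)
{
  return static_cast<std::int64_t> (x) * 8;
}
```

Bits above bit 31 of the source register are shifted past bit 34 and masked away, so both forms agree for every 32-bit input.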


Re: [committed] libstdc++: Use correct argument type for __use_alloc [PR 96803]

2020-09-22 Thread Jonathan Wakely via Gcc-patches

On 26/08/20 19:34 +0100, Jonathan Wakely wrote:

The _Tuple_impl constructor for allocator-extended construction from a
different tuple type uses the _Tuple_impl's own _Head type in the
__use_alloc test. That is incorrect, because the argument tuple could
have a different type. Using the wrong type might select the
leading-allocator convention when it should use the trailing-allocator
convention, or vice versa.

libstdc++-v3/ChangeLog:

PR libstdc++/96803
* include/std/tuple
(_Tuple_impl(allocator_arg_t, Alloc, const _Tuple_impl&)):
Replace parameter pack with a type parameter and a pack and pass
the first type to __use_alloc.
* testsuite/20_util/tuple/cons/96803.cc: New test.


While backporting 5494edae83ad33c769bd1ebc98f0c492453a6417 I noticed
that it's still not correct. I made the allocator-extended constructor
use the right type for the uses-allocator construction detection, but I
used an rvalue when it should be a const lvalue.

This should fix it properly this time.

libstdc++-v3/ChangeLog:

PR libstdc++/96803
* include/std/tuple
(_Tuple_impl(allocator_arg_t, Alloc, const _Tuple_impl&)):
Use correct value category in __use_alloc call.
* testsuite/20_util/tuple/cons/96803.cc: Check with constructors
that require correct value category to be used.


Tested powerpc64le-linux. Committed to trunk.



commit 7825399092d572ce8ea82c4aa8dfeb65076b0e52
Author: Jonathan Wakely 
Date:   Tue Sep 22 08:42:18 2020

libstdc++: Use correct argument type for __use_alloc, again [PR 96803]

While backporting 5494edae83ad33c769bd1ebc98f0c492453a6417 I noticed
that it's still not correct. I made the allocator-extended constructor
use the right type for the uses-allocator construction detection, but I
used an rvalue when it should be a const lvalue.

This should fix it properly this time.

libstdc++-v3/ChangeLog:

PR libstdc++/96803
* include/std/tuple
(_Tuple_impl(allocator_arg_t, Alloc, const _Tuple_impl&)):
Use correct value category in __use_alloc call.
* testsuite/20_util/tuple/cons/96803.cc: Check with constructors
that require correct value category to be used.

diff --git a/libstdc++-v3/include/std/tuple b/libstdc++-v3/include/std/tuple
index 06f56337ce4..11ad1991108 100644
--- a/libstdc++-v3/include/std/tuple
+++ b/libstdc++-v3/include/std/tuple
@@ -355,7 +355,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 		const _Tuple_impl<_Idx, _UHead, _UTails...>& __in)
 	: _Inherited(__tag, __a,
 		 _Tuple_impl<_Idx, _UHead, _UTails...>::_M_tail(__in)),
-	  _Base(__use_alloc<_Head, _Alloc, _UHead>(__a),
+	  _Base(__use_alloc<_Head, _Alloc, const _UHead&>(__a),
 		_Tuple_impl<_Idx, _UHead, _UTails...>::_M_head(__in))
 	{ }
 
diff --git a/libstdc++-v3/testsuite/20_util/tuple/cons/96803.cc b/libstdc++-v3/testsuite/20_util/tuple/cons/96803.cc
index 9d3c07d55b2..867a42150e0 100644
--- a/libstdc++-v3/testsuite/20_util/tuple/cons/96803.cc
+++ b/libstdc++-v3/testsuite/20_util/tuple/cons/96803.cc
@@ -38,4 +38,25 @@ test01()
   // std::tuple chooses wrong constructor for uses-allocator construction
   std::tuple o;
   std::tuple nok(std::allocator_arg, std::allocator(), o);
+
+  std::tuple oo;
+  std::tuple nn(std::allocator_arg, std::allocator(), oo);
+}
+
+struct Y
+{
+  using allocator_type = std::allocator;
+
+  Y(const X&) { }
+  Y(const X&, const allocator_type&) { }
+
+  Y(X&&) { }
+  Y(std::allocator_arg_t, const allocator_type&, X&&) { }
+};
+
+void
+test02()
+{
+  std::tuple o{1, 1};
+  std::tuple oo(std::allocator_arg, std::allocator(), o);
 }
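The effect of the value category can be checked with type traits alone. The sketch below reuses the shape of struct Y from the test; the concrete template arguments (std::allocator<int>, and the empty struct X) are assumptions made for this note, since the archive has stripped the angle brackets from the original.

```cpp
#include <memory>
#include <type_traits>

struct X { };

using Alloc = std::allocator<int>;	// assumed allocator type

struct Y
{
  using allocator_type = Alloc;

  Y (const X &) { }
  Y (const X &, const allocator_type &) { }

  Y (X &&) { }
  Y (std::allocator_arg_t, const allocator_type &, X &&) { }
};

// Leading-allocator convention: Y (allocator_arg, alloc, arg).
template <typename Arg>
constexpr bool
leading_ok ()
{
  return std::is_constructible<Y, std::allocator_arg_t, const Alloc &,
			       Arg>::value;
}

// Trailing-allocator convention: Y (arg, alloc).
template <typename Arg>
constexpr bool
trailing_ok ()
{
  return std::is_constructible<Y, Arg, const Alloc &>::value;
}

// Asking about an rvalue X selects the leading convention ...
static_assert (leading_ok<X> (), "rvalue X can use the leading form");
// ... but the tuple element is really constructed from a const lvalue,
// for which only the trailing form is viable.
static_assert (!leading_ok<const X &> (), "const lvalue cannot bind to X&&");
static_assert (trailing_ok<const X &> (), "const lvalue uses the trailing form");
```

So a detection step that asks about `X` while the construction later passes a `const X&` can pick a convention that has no viable constructor, which is the mismatch the second patch removes.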


Re: [PATCH 2/3] Use MiB unit when displaying memory allocation.

2020-09-22 Thread Christophe Lyon via Gcc-patches
On Wed, 2 Sep 2020 at 15:29, Martin Liška  wrote:
>
> On 9/1/20 4:04 PM, Jan Hubicka wrote:
> >> The patch is about usage of MiB in memory allocation reports.
> >> I see it much better readable than values displayed in KiB:
> >>
> >> Reading object files: tramp3d-v4.o {GC released 1 MiB} {GC 19 MiB -> 19 
> >> MiB} {GC 19 MiB}  {heap 12 MiB}
> >> Reading the symbol table:
> >> Merging declarations: {GC released 1 MiB madv_dontneed 0 MiB} {GC 27 MiB 
> >> -> 27 MiB} {GC 27 MiB}  {heap 15 MiB}
> >> Reading summaries:  {GC 27 MiB}  {heap 15 MiB}  {GC 
> >> 27 MiB}  {heap 15 MiB}  {GC 27 MiB}  {heap 15 MiB}  {GC 27 MiB}  
> >> {heap 15 MiB}  {GC 27 MiB}  {heap 15 MiB}  {GC 30 MiB}  
> >> {heap 15 MiB}  {GC 30 MiB}  {heap 15 MiB} {GC 30 MiB}
> >> Merging symbols: {heap 15 MiB}Materializing decls:
> >>{heap 15 MiB}  {heap 15 MiB}  
> >> {heap 15 MiB}  {heap 15 MiB}  {heap 15 MiB}  {heap 15 
> >> MiB}  {heap 15 MiB}  {heap 15 MiB}  {heap 15 MiB} 
> >>  {heap 15 MiB}  {heap 15 MiB}  {GC 
> >> released 1 MiB madv_dontneed 2 MiB} {GC trimmed to 27 MiB, 28 MiB mapped} 
> >> {heap 15 MiB}  {heap 15 MiB}  {heap 15 MiB} 
> >>  {heap 15 MiB}
> >> Streaming out {GC trimmed to 27 MiB, 28 MiB mapped} {heap 15 MiB} 
> >> ./a.ltrans0.o ( 11257 insns) ./a.ltrans1.o ( 11293 insns) ./a.ltrans2.o ( 
> >> 8669 insns) ./a.ltrans3.o ( 138934 insns)
> >
> > One problem I see here is that while it is OK for Firefox builds, it is
> > a bit too coarse for smaller testcases where the memory use is still
> > important.  I guess we may just print KBs until the number gets too
> > large, just like Norton Commander does? :)
>
> Sure, let's do it using SIZE_AMOUNT macro.
>
> Patch can bootstrap on x86_64-linux-gnu and survives regression tests.
>

Hi,

This change is causing g++.dg/ext/timevar[12].C to fail randomly, e.g.:
FAIL: g++.dg/ext/timevar1.C  -std=gnu++2a (test for excess errors)
Excess errors:
 phase opt and generate :   0.00 (  0%)   0.00 (  0%)   0.01 ( 50%)  8904  (  0%)
 callgraph construction :   0.00 (  0%)   0.00 (  0%)   0.01 ( 50%)  4096  (  0%)

because SIZE_AMOUNT generates no suffix if the size is < 10k, and those tests
now use dg-prune-output "k" and dg-prune-output " 0 "
which is not enough.

Can you fix this?

Thanks

Christophe
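The interaction described above can be sketched as follows. The thresholds (no suffix below 10k, then "k", then "M") are an assumption based on the description in this message, and both functions are hypothetical stand-ins for this note rather than the real SIZE_AMOUNT macro or the DejaGnu pruning logic.

```cpp
#include <string>

// Hypothetical formatter: small values are printed as plain byte counts,
// larger ones are scaled and given a suffix.
std::string
toy_size_amount (unsigned long long bytes)
{
  const unsigned long long one_k = 1024;
  const unsigned long long one_m = one_k * one_k;
  if (bytes < 10 * one_k)
    return std::to_string (bytes);
  if (bytes < 10 * one_m)
    return std::to_string (bytes / one_k) + "k";
  return std::to_string (bytes / one_m) + "M";
}

// Hypothetical model of the two prune patterns mentioned above: a line is
// dropped if it contains "k" or " 0 ".
bool
pruned_by_test (const std::string &line)
{
  return line.find ("k") != std::string::npos
	 || line.find (" 0 ") != std::string::npos;
}
```

A report line whose memory column is formatted as "8904" or "4096" carries no suffix, so neither pattern matches it and it shows up as an excess error.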

> Ready to be installed?
> Thanks,
> Martin
>
> >
> > Honza
> >>
> >> Thoughts?
> >> Thanks,
> >> Martin
>


[PATCH] c++: Ignore __sanitizer_ptr_{sub,cmp} builtin calls during constant expression evaluation [PR97145]

2020-09-22 Thread Jakub Jelinek via Gcc-patches
Hi!

These two builtin calls are added already during parsing before pointer
subtractions or comparisons.  Normally they perform runtime verification
of whether the pointers point to the same object or different objects,
but during constant expression evaluation we don't really need those
builtins for anything.

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

2020-09-22  Jakub Jelinek  

PR c++/97145
* constexpr.c (cxx_eval_builtin_function_call): Return void_node for
calls to __sanitizer_ptr_{sub,cmp} builtins.

* g++.dg/asan/pr97145.C: New test.

--- gcc/cp/constexpr.c.jj   2020-09-07 16:58:53.342195330 +0200
+++ gcc/cp/constexpr.c  2020-09-21 23:50:39.909211245 +0200
@@ -1355,6 +1355,12 @@ cxx_eval_builtin_function_call (const co
   case BUILT_IN_STRSTR:
strops = 2;
strret = 1;
+   break;
+  case BUILT_IN_ASAN_POINTER_COMPARE:
+  case BUILT_IN_ASAN_POINTER_SUBTRACT:
+   /* These builtins shall be ignored during constant expression
+  evaluation.  */
+   return void_node;
   default:
break;
   }
--- gcc/testsuite/g++.dg/asan/pr97145.C.jj  2020-09-21 17:37:50.408562876 
+0200
+++ gcc/testsuite/g++.dg/asan/pr97145.C 2020-09-21 17:38:26.961031023 +0200
@@ -0,0 +1,7 @@
+// PR c++/97145
+// { dg-do compile { target c++11 } }
+// { dg-options "-fsanitize=address,pointer-subtract,pointer-compare" }
+
+constexpr char *a = nullptr;
+constexpr auto b = a - a;
+constexpr auto c = a < a;

Jakub



Re: [RFC] update COUNTs of BB in loop.

2020-09-22 Thread Martin Liška

On 9/22/20 7:37 AM, guojiufu wrote:

Thanks for comments!!


Thanks for the patch. I can confirm it is a no-op for the SPEC2006 benchmarks.

Martin


[committed] ipa: Fix up ipa modref option help texts

2020-09-22 Thread Jakub Jelinek via Gcc-patches
Hi!

This fixes
FAIL: compiler driver --help=common option(s): "^ +-.*[^:.]$" absent from output: "  --param=modref-max-tests=   Maximum number of tests perofmed by modref query"
FAIL: compiler driver --help=optimizers option(s): "^ +-.*[^:.]$" absent from output: "  -fipa-modref                Perform interprocedural modref analysis"

Tested on x86_64-linux, committed to trunk.

2020-09-22  Jakub Jelinek  

* common.opt (-fipa-modref): Add dot at the end of option help.
* params.opt (--param=modref-max-tests=): Likewise.

--- gcc/common.opt.jj   2020-09-21 11:15:53.705518585 +0200
+++ gcc/common.opt  2020-09-22 10:00:00.034779245 +0200
@@ -1827,7 +1827,7 @@ Perform interprocedural bitwise constant
 
 fipa-modref
 Common Report Var(flag_ipa_modref) Optimization
-Perform interprocedural modref analysis
+Perform interprocedural modref analysis.
 
 fipa-profile
 Common Report Var(flag_ipa_profile) Init(0) Optimization
--- gcc/params.opt.jj   2020-09-21 11:15:53.816516949 +0200
+++ gcc/params.opt  2020-09-22 09:59:37.121115589 +0200
@@ -882,7 +882,7 @@ Maximum number of refs stored in each mo
 
 -param=modref-max-tests=
 Common Joined UInteger Var(param_modref_max_tests) Init(64)
-Maximum number of tests perofmed by modref query
+Maximum number of tests perofmed by modref query.
 
 -param=tm-max-aggregate-size=
 Common Joined UInteger Var(param_tm_max_aggregate_size) Init(9) Param 
Optimization


Jakub
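The check behind those FAILs looks for option help lines that do not end in a period or colon. The sketch below approximates it with std::regex; the testsuite itself uses its own regular-expression engine, so treat this as an illustration of the pattern rather than of the harness, and note that the function name is hypothetical.

```cpp
#include <regex>
#include <string>

// Returns true if LINE looks like an option help line (leading spaces, then
// a dash) whose last character is neither '.' nor ':'.
bool
help_line_lacks_final_punctuation (const std::string &line)
{
  static const std::regex pattern (R"(^ +-.*[^:.]$)");
  return std::regex_search (line, pattern);
}
```

For example, the -fipa-modref line quoted above matches the pattern before the patch and no longer matches once the trailing period is added.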



Re: Do we need to do a loop invariant motion after loop interchange ?

2020-09-22 Thread Bin.Cheng via Gcc-patches
On Tue, Sep 22, 2020 at 10:30 AM HAO CHEN GUI  wrote:
>
> Bin,
>
> I just tested your patch on current trunk.  Here is my summary.
>
> 1. About some iv aren't moved out of inner loop (Lijia mentioned in his
> last email)
>
>[local count: 955630226]:
># l_32 = PHI <1(12), l_54(21)>
># ivtmp_165 = PHI <_446(12), ivtmp_155(21)>
>_26 = (integer(kind=8)) l_32;
>_27 = _25 + _26;
>y__I_lsm.119_136 = (*y_135(D))[_27];
>y__I_lsm.119_90 = m_55 != 1 ? y__I_lsm.119_136 : 0.0;
>_37 = _36 * stride.88_111;
>_38 = _35 + _37;
>_39 = _26 + _38;
>_40 = (*a_137(D))[_39];
>
> The offset _39 is not loop independent as it relies on _26. But _38 and
> _37 should be loop independent. So Lijia thought they should be moved
> out of loop.
>
> I checked the following passes and found that these statements are
> eliminated after vectorizing and dce.
>
> In vect dump,
>
> simple.F:27:23: note:  -->vectorizing statement: _37 = _36 *
> stride.88_111;
> simple.F:27:23: note:  -->vectorizing statement: _38 = _35 + _37;
> simple.F:27:23: note:  -->vectorizing statement: _39 = _26 + _38;
> simple.F:27:23: note:  -->vectorizing statement: _40 = (*a_137(D))[_39];
> simple.F:27:23: note:  transform statement.
> simple.F:27:23: note:  transform load. ncopies = 1
> simple.F:27:23: note:  create vector_type-pointer variable to type:
> vector(2) real(kind=8)  vectorizing an array ref: (*a_137(D))
> simple.F:27:23: note:  created vectp_a.131_383
> simple.F:27:23: note:  add new stmt: vect__40.132_374 = MEM  real(kind=8)> [(real(kind=8) *)vectp_a.130_376];
>
> In dce dump,
>
> Deleting : _39 = _26 + _38;
>
> Deleting : _38 = _35 + _37;
>
> Deleting : _37 = _36 * stride.88_111;
>
> So it's reasonable to only consider data references after loop
> interchange. Other statements may be eliminated or be moved out of the loop
> in the last lim pass if they're really expensive.
>
> 2. I tested the SPEC on powerpc64le-linux-gnu. 503.bwaves_r got 6.77%
> performance improvement with this patch. It has no impact on other
> benchmarks.
>
> 3. The patch passed bootstrap and regression tests on
> powerpc64le-linux-gnu.
>
> I think the patch works fine. Could you please add it into trunk? Thanks
> a lot.
Hmm, IIRC the patch was intended to show what the missing transform
is, and I think it has latent bugs which I haven't got time to refine.
As Richard mentioned, could you please explore this with the existing
LIM facility, rather than introducing new code implementing existing
transforms?

Thanks,
bin
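
For readers following the thread, here is a minimal sketch of the kind of transform being discussed: a memory reference that does not depend on the inner loop index is loaded once outside the inner loop. The function names and data layout are made up for illustration and have nothing to do with the bwaves source or with GCC's LIM implementation.

```cpp
#include <cstddef>
#include <vector>

// Reference version: y[i] is reloaded on every inner iteration even though
// it does not depend on the inner index j.
double sum_naive (const std::vector<double> &y,
                  const std::vector<std::vector<double>> &a)
{
  double total = 0.0;
  for (std::size_t i = 0; i < a.size (); ++i)
    for (std::size_t j = 0; j < a[i].size (); ++j)
      total += y[i] * a[i][j];
  return total;
}

// Hand-hoisted version: the invariant load is moved out of the inner loop,
// which is what an invariant-motion pass would be expected to do.
double sum_hoisted (const std::vector<double> &y,
                    const std::vector<std::vector<double>> &a)
{
  double total = 0.0;
  for (std::size_t i = 0; i < a.size (); ++i)
    {
      const double yi = y[i];   // loop-invariant with respect to j
      for (std::size_t j = 0; j < a[i].size (); ++j)
        total += yi * a[i][j];
    }
  return total;
}
```

Both functions perform the same arithmetic in the same order, so they return identical results; only the number of loads of y[i] differs.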
>
>
> On 8/9/2020 6:18 PM, Bin.Cheng wrote:
> > On Mon, Sep 7, 2020 at 5:42 PM HAO CHEN GUI  wrote:
> >> Hi,
> >>
> >> I want to follow up on Lijia's work as I gained a performance benefit on
> >> some SPEC workloads by adding an im pass after loop interchange.  Could
> >> you send me the latest patches? I could do further testing. Thanks a lot.
> > Hi,
> > Hmm, not sure if this refers to me?  I only provided an example patch
> > (which isn't complete) before Lijia's.  Unfortunately I don't have any
> > latest patch about this either.
> > As Richard suggested, maybe you (if you work on this) can simplify the
> > implementation.  Anyway, we only need to hoist memory references here.
> >
> > Thanks,
> > bin
> >> https://gcc.gnu.org/pipermail/gcc/2020-February/232091.html
> >>


Re: [PATCH 2/3] Use MiB unit when displaying memory allocation.

2020-09-22 Thread Martin Liška

On 9/22/20 9:47 AM, Christophe Lyon wrote:

On Wed, 2 Sep 2020 at 15:29, Martin Liška  wrote:


On 9/1/20 4:04 PM, Jan Hubicka wrote:

The patch is about usage of MiB in memory allocation reports.
I see it much better readable than values displayed in KiB:

Reading object files: tramp3d-v4.o {GC released 1 MiB} {GC 19 MiB -> 19 MiB} 
{GC 19 MiB}  {heap 12 MiB}
Reading the symbol table:
Merging declarations: {GC released 1 MiB madv_dontneed 0 MiB} {GC 27 MiB -> 27 
MiB} {GC 27 MiB}  {heap 15 MiB}
Reading summaries:  {GC 27 MiB}  {heap 15 MiB}  {GC 27 MiB}  {heap 15 MiB}  
{GC 27 MiB}  {heap 15 MiB}  {GC 27 MiB}  {heap 15 MiB}  {GC 27 MiB}  {heap 15 MiB}  
{GC 30 MiB}  {heap 15 MiB}  {GC 30 MiB}  {heap 15 MiB} {GC 30 MiB}
Merging symbols: {heap 15 MiB}Materializing decls:
{heap 15 MiB}  {heap 15 MiB}  {heap 15 MiB}  {heap 15 MiB}  {heap 15 MiB}  
{heap 15 MiB}  {heap 15 MiB}  {heap 15 MiB}  {heap 15 MiB}  {heap 15 MiB}  {heap 15 MiB} 
 {GC released 1 MiB madv_dontneed 2 MiB} {GC trimmed to 27 MiB, 28 MiB mapped} {heap 15 MiB}  {heap 15 MiB}  
{heap 15 MiB}  {heap 15 MiB}
Streaming out {GC trimmed to 27 MiB, 28 MiB mapped} {heap 15 MiB} ./a.ltrans0.o 
( 11257 insns) ./a.ltrans1.o ( 11293 insns) ./a.ltrans2.o ( 8669 insns) 
./a.ltrans3.o ( 138934 insns)


One problem I see here is that while it is OK for Firefox builds it is
a bit too coarse for smaller testcases where the memory use is still
important.  I guess we may just print KBs before the value gets too
large, just like norton commander does? :)


Sure, let's do it using SIZE_AMOUNT macro.

Patch can bootstrap on x86_64-linux-gnu and survives regression tests.



Hi,

This change is causing g++.dg/ext/timevar[12].C to fail randomly, eg:
FAIL: g++.dg/ext/timevar1.C  -std=gnu++2a (test for excess errors)
Excess errors:
  phase opt and generate :   0.00 (  0%)   0.00 (  0%)   0.01 ( 50%)  8904  (  0%)
  callgraph construction :   0.00 (  0%)   0.00 (  0%)   0.01 ( 50%)  4096  (  0%)

because SIZE_AMOUNT generates no suffix if the size is < 10k, and those tests
now use dg-prune-output "k" and dg-prune-output " 0 "
which is not enough.

Can you fix this?
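
To illustrate why pruning on "k" misses these lines, here is a rough sketch of a size formatter that prints small values with no suffix at all. This is not GCC's SIZE_AMOUNT macro; the helper name and the 10 KiB / 10 MiB thresholds are assumptions made only for the example.

```cpp
#include <cstdint>
#include <string>

// Hypothetical helper (not the real SIZE_AMOUNT): scale a byte count and
// pick a unit suffix.  The thresholds are assumed for illustration.
std::string format_size (std::uint64_t bytes)
{
  const std::uint64_t one_k = 1024;
  const std::uint64_t one_m = 1024 * 1024;
  if (bytes < 10 * one_k)
    return std::to_string (bytes);               // small value: no suffix
  if (bytes < 10 * one_m)
    return std::to_string (bytes / one_k) + "k";
  return std::to_string (bytes / one_m) + "M";
}
```

With a formatter of this shape, the 8904 and 4096 byte rows in the failing output carry no "k", so a prune pattern that relies on the suffix no longer matches them.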


Sorry for the breakage. I hope Marek has a fix that he'll install.

Martin



Thanks

Christophe


Ready to be installed?
Thanks,
Martin



Honza


Thoughts?
Thanks,
Martin






Re: [committed] ipa: Fix up ipa modref option help texts

2020-09-22 Thread Jakub Jelinek via Gcc-patches
On Tue, Sep 22, 2020 at 10:13:46AM +0200, Jakub Jelinek via Gcc-patches wrote:
> --- gcc/params.opt.jj 2020-09-21 11:15:53.816516949 +0200
> +++ gcc/params.opt  2020-09-22 09:59:37.121115589 +0200
> @@ -882,7 +882,7 @@ Maximum number of refs stored in each mo
>  
>  -param=modref-max-tests=
>  Common Joined UInteger Var(param_modref_max_tests) Init(64)
> -Maximum number of tests perofmed by modref query
> +Maximum number of tests perofmed by modref query.

And seeing the above typo led me to do some spell checking around.
Here is the result, committed as obvious to trunk:

2020-09-22  Jakub Jelinek  

gcc/
* params.opt (--param=modref-max-tests=): Fix typo in help text:
perofmed -> performed.
* common.opt: Fix typo: incrmeental -> incremental.
* ipa-modref.c: Fix typos: recroding -> recording, becaue -> because,
analsis -> analysis.
(class modref_summaries): Fix typo: betweehn -> between.
(analyze_call): Fix typo: calle -> callee.
(read_modref_records): Fix typo: expcted -> expected.
(pass_ipa_modref::execute): Fix typo: calle -> callee.
gcc/c-family/
* c.opt (Wbuiltin-declaration-mismatch): Fix typo in variable name:
warn_builtin_declaraion_mismatch -> warn_builtin_declaration_mismatch.

--- gcc/params.opt.jj   2020-09-22 09:59:37.121115589 +0200
+++ gcc/params.opt  2020-09-22 10:17:30.232362715 +0200
@@ -882,7 +882,7 @@ Maximum number of refs stored in each mo
 
 -param=modref-max-tests=
 Common Joined UInteger Var(param_modref_max_tests) Init(64)
-Maximum number of tests perofmed by modref query.
+Maximum number of tests performed by modref query.
 
 -param=tm-max-aggregate-size=
 Common Joined UInteger Var(param_tm_max_aggregate_size) Init(9) Param Optimization
--- gcc/common.opt.jj   2020-09-22 10:00:00.034779245 +0200
+++ gcc/common.opt  2020-09-22 10:24:06.702542396 +0200
@@ -47,7 +47,7 @@ Variable
 bool in_lto_p = false
 
 ; This variable is set to non-0 only by LTO front-end.  1 indicates that
-; the output produced will be used for incrmeental linking (thus weak symbols
+; the output produced will be used for incremental linking (thus weak symbols
 ; can still be bound) and 2 indicates that the IL is going to be linked and
 ; and output to LTO object file.
 Variable
--- gcc/ipa-modref.c.jj 2020-09-21 11:15:53.759517788 +0200
+++ gcc/ipa-modref.c  2020-09-22 10:32:01.298577455 +0200
@@ -35,10 +35,10 @@ along with GCC; see the file COPYING3.
propagates across the callgraph and is able to handle recursion and works on
whole program during link-time analysis.
 
-   LTO mode differs from the local mode by not recroding alias sets but types
+   LTO mode differs from the local mode by not recording alias sets but types
that are translated to alias sets later.  This is necessary in order stream
-   the information becaue the alias sets are rebuild at stream-in time and may
-   not correspond to ones seen during analsis.  For this reason part of analysis
+   the information because the alias sets are rebuild at stream-in time and may
+   not correspond to ones seen during analysis.  For this reason part of analysis
is duplicated.  */
 
 #include "config.h"
@@ -77,7 +77,7 @@ public:
  modref_summary *src_data,
  modref_summary *dst_data);
   /* This flag controls whether newly inserted functions should be analyzed
- in IPA or normal mode.  Functions inserted betweehn IPA analysis and
+ in IPA or normal mode.  Functions inserted between IPA analysis and
  ipa-modref pass execution needs to be analyzed in IPA mode while all
  other insertions leads to normal analysis.  */
   bool ipa;
@@ -413,7 +413,7 @@ analyze_call (modref_summary *cur_summar
 
   struct cgraph_node *callee_node = cgraph_node::get_create (callee);
 
-  /* We can not safely optimize based on summary of calle if it does
+  /* We can not safely optimize based on summary of callee if it does
  not always bind to current def: it is possible that memory load
  was optimized out earlier which may not happen in the interposed
  variant.  */
@@ -815,7 +815,7 @@ write_modref_records (modref_records_lto
 /* Read a modref_tree from the input block IB using the data from DATA_IN.
This assumes that the tree was encoded using write_modref_tree.
Either nolto_ret or lto_ret is initialized by the tree depending whether
-   LTO streaming is expcted or not.  */
+   LTO streaming is expected or not.  */
 
 void
 read_modref_records (lto_input_block *ib, struct data_in *data_in,
@@ -1238,7 +1238,7 @@ unsigned int pass_ipa_modref::execute (f
fprintf (dump_file, "Call to %s\n",
 cur->dump_name ());
 
- /* We can not safely optimize based on summary of calle if it
+ /* We can not safely optimize based on summary of callee if it
 does not always bind to current def: it is possible th

Re: [PATCH] libstdc++: use a link test to test for -Wl,-z,relro

2020-09-22 Thread Jonathan Wakely via Gcc-patches

On 16/09/20 12:11 +, JonY via Libstdc++ wrote:

On 9/13/20 3:37 PM, JonY wrote:

On 9/10/20 2:23 PM, JonY wrote:

Do a link test instead of just a grep. The linker can
support multiple targets, but not all targets can use it.

Cygwin/MinGW ld can support ELF but the PE format for Windows itself
does not support such a feature. Attached patch OK?

I'm not confident with regenerating configure due to some unrelated
changes added, can someone else help with that?



Ping?



Ping 2?



I don't see a patch, or any previous email to the libstdc++ list.

Please resend with the patch, CCing libstdc++@

Thanks.





Re: [committed] ipa: Fix up ipa modref option help texts

2020-09-22 Thread Jan Hubicka
> On Tue, Sep 22, 2020 at 10:13:46AM +0200, Jakub Jelinek via Gcc-patches wrote:
> > --- gcc/params.opt.jj   2020-09-21 11:15:53.816516949 +0200
> > +++ gcc/params.opt  2020-09-22 09:59:37.121115589 +0200
> > @@ -882,7 +882,7 @@ Maximum number of refs stored in each mo
> >  
> >  -param=modref-max-tests=
> >  Common Joined UInteger Var(param_modref_max_tests) Init(64)
> > -Maximum number of tests perofmed by modref query
> > +Maximum number of tests perofmed by modref query.
> 
> And seeing the above typo led me to do some spell checking around.
> Here is the result, committed as obvious to trunk:
> 
> 2020-09-22  Jakub Jelinek  
> 
> gcc/
>   * params.opt (--param=modref-max-tests=): Fix typo in help text:
>   perofmed -> performed.
>   * common.opt: Fix typo: incrmeental -> incremental.
>   * ipa-modref.c: Fix typos: recroding -> recording, becaue -> because,
>   analsis -> analysis.
>   (class modref_summaries): Fix typo: betweehn -> between.
>   (analyze_call): Fix typo: calle -> callee.
>   (read_modref_records): Fix typo: expcted -> expected.
>   (pass_ipa_modref::execute): Fix typo: calle -> callee.
> gcc/c-family/
>   * c.opt (Wbuiltin-declaration-mismatch): Fix typo in variable name:
>   warn_builtin_declaraion_mismatch -> warn_builtin_declaration_mismatch.

Thanks a lot and sorry for these.

Honza


Re: [PATCH] libstdc++: use a link test to test for -Wl,-z,relro

2020-09-22 Thread JonY via Gcc-patches
On 9/22/20 8:50 AM, Jonathan Wakely wrote:
> 
> I don't see a patch, or any previous email to the libstdc++ list.
> 
> Please resend with the patch, CCing libstdc++@
> 
> Thanks.
> 
> 
> 

Resent for the record. I've been told it might not be appropriate
because some targets cannot link yet and therefore will fail incorrectly.

Currently, the linker support is misdetected since binutils can support
multiple targets. MinGW/Cygwin PE formats don't support such a flag.
From: Jonathan Yong <10wa...@gmail.com>
Date: Sun, 22 Mar 2020 01:59:37 + (+0800)
Subject: libstdc++: use a link test to test for -Wl,-z,relro
X-Git-Url: 
https://repo.or.cz/gcc/cygwin-gcc.git/commitdiff_plain/1b20e03e7468760828bfc70fc5e811b5b3738adf

libstdc++: use a link test to test for -Wl,-z,relro

Do a link test instead of just a grep. The linker can
support multiple targets, but not all targets can use it.

Signed-off-by: Jonathan Yong <10wa...@gmail.com>
---

diff --git a/libstdc++-v3/acinclude.m4 b/libstdc++-v3/acinclude.m4
index bc7d137dc74..209aa3a91f3 100644
--- a/libstdc++-v3/acinclude.m4
+++ b/libstdc++-v3/acinclude.m4
@@ -274,7 +274,16 @@ AC_DEFUN([GLIBCXX_CHECK_LINKER_FEATURES], [
   ac_ld_relro=no
   if test x"$with_gnu_ld" = x"yes"; then
 AC_MSG_CHECKING([for ld that supports -Wl,-z,relro])
-cxx_z_relo=`$LD -v --help 2>/dev/null | grep "z relro"`
+ac_save_ldflags="$LDFLAGS"
+LDFLAGS="$LDFLAGS -Wl,-z,relro"
+AC_LINK_IFELSE([
+  AC_LANG_SOURCE(
+[[int main() { return 0; }]]
+  )],
+  [cxx_z_relo="1"],
+  [cxx_z_relo=""])
+   LDFLAGS="$ac_save_ldflags"
+
 if test -n "$cxx_z_relo"; then
   OPT_LDFLAGS="-Wl,-z,relro"
   ac_ld_relro=yes




Re: [PATCH] libstdc++: use a link test to test for -Wl,-z,relro

2020-09-22 Thread Jonathan Wakely via Gcc-patches

On 22/09/20 09:40 +, JonY via Libstdc++ wrote:

On 9/22/20 8:50 AM, Jonathan Wakely wrote:


I don't see a patch, or any previous email to the libstdc++ list.

Please resend with the patch, CCing libstdc++@

Thanks.





Resent for the record.


Thanks.


I've been told it might not be appropriate
because some targets cannot link yet and therefore will fail incorrectly.



We only use the GLIBCXX_CHECK_LINKER_FEATURES macro for native builds,
or for specific cross targets.  And we already do AC_TRY_LINK just
before the relro check, so I think it's probably OK. But I admit to
not fully understanding this.



Currently, the linker support is misdetected since binutils can support
multiple targets. MinGW/Cygwin PE formats don't support such a flag.



From: Jonathan Yong <10wa...@gmail.com>
Date: Sun, 22 Mar 2020 01:59:37 + (+0800)
Subject: libstdc++: use a link test to test for -Wl,-z,relro
X-Git-Url: 
https://repo.or.cz/gcc/cygwin-gcc.git/commitdiff_plain/1b20e03e7468760828bfc70fc5e811b5b3738adf

libstdc++: use a link test to test for -Wl,-z,relro

Do a link test instead of just a grep. The linker can
support multiple targets, but not all targets can use it.

Signed-off-by: Jonathan Yong <10wa...@gmail.com>
---

diff --git a/libstdc++-v3/acinclude.m4 b/libstdc++-v3/acinclude.m4
index bc7d137dc74..209aa3a91f3 100644
--- a/libstdc++-v3/acinclude.m4
+++ b/libstdc++-v3/acinclude.m4
@@ -274,7 +274,16 @@ AC_DEFUN([GLIBCXX_CHECK_LINKER_FEATURES], [
  ac_ld_relro=no
  if test x"$with_gnu_ld" = x"yes"; then
AC_MSG_CHECKING([for ld that supports -Wl,-z,relro])
-cxx_z_relo=`$LD -v --help 2>/dev/null | grep "z relro"`
+ac_save_ldflags="$LDFLAGS"
+LDFLAGS="$LDFLAGS -Wl,-z,relro"
+AC_LINK_IFELSE([
+  AC_LANG_SOURCE(
+[[int main() { return 0; }]]
+  )],
+  [cxx_z_relo="1"],
+  [cxx_z_relo=""])
+   LDFLAGS="$ac_save_ldflags"
+
if test -n "$cxx_z_relo"; then
  OPT_LDFLAGS="-Wl,-z,relro"
  ac_ld_relro=yes







[PATCH] aarch64: Do not alter force_reg returned rtx expanding pauth builtins

2020-09-22 Thread Andrea Corallo
Hi all,

while looking for places where an rtx returned by force_reg is modified
later on, I found this other case in `aarch64_general_expand_builtin`,
in the expansion of the pointer authentication builtins.

Regtested and bootstrapped on aarch64-linux-gnu.

Okay for trunk?

  Andrea

From 8869ee04e3788fdec86aa7e5a13e2eb477091d0e Mon Sep 17 00:00:00 2001
From: Andrea Corallo 
Date: Mon, 21 Sep 2020 13:52:45 +0100
Subject: [PATCH] aarch64: Do not alter force_reg returned rtx expanding pauth
 builtins

2020-09-21  Andrea Corallo  

* config/aarch64/aarch64-builtins.c
(aarch64_general_expand_builtin): Do not alter value on a
force_reg returned rtx.
---
 gcc/config/aarch64/aarch64-builtins.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/gcc/config/aarch64/aarch64-builtins.c b/gcc/config/aarch64/aarch64-builtins.c
index b787719cf5e..a77718ccfac 100644
--- a/gcc/config/aarch64/aarch64-builtins.c
+++ b/gcc/config/aarch64/aarch64-builtins.c
@@ -2079,10 +2079,10 @@ aarch64_general_expand_builtin (unsigned int fcode, tree exp, rtx target,
   arg0 = CALL_EXPR_ARG (exp, 0);
   op0 = force_reg (Pmode, expand_normal (arg0));
 
-  if (!target)
+  if (!(target
+   && REG_P (target)
+   && GET_MODE (target) == Pmode))
target = gen_reg_rtx (Pmode);
-  else
-   target = force_reg (Pmode, target);
 
   emit_move_insn (target, op0);
 
-- 
2.17.1



Re: [PATH 3/3] libstdc++: Add std::advance ostreambuf_iterator overload

2020-09-22 Thread Jonathan Wakely via Gcc-patches

On 14/09/20 22:36 +0200, François Dumont via Libstdc++ wrote:

On 10/09/20 5:19 pm, Jonathan Wakely wrote:

On 09/09/20 22:12 +0200, François Dumont via Libstdc++ wrote:

libstdc++: Add std::advance overload for ostreambuf_iterator

Implement std::advance overload for ostreambuf_iterator using basic_streambuf
pubseekoff.

libstdc++-v3/ChangeLog:

        * include/bits/streambuf_iterator.h (ostreambuf_iterator): Add
        std::advance friend declaration.
        (advance(ostreambuf_iterator<>&, _Distance)): New.
        * testsuite/25_algorithms/advance/ostreambuf_iterator/char/1.cc:
        New test.
        * testsuite/25_algorithms/advance/ostreambuf_iterator/char/1_neg.cc:
        New test.
        * testsuite/25_algorithms/advance/ostreambuf_iterator/char/2.cc:
        New test.
        * testsuite/25_algorithms/advance/ostreambuf_iterator/char/2_neg.cc:
        New test.

Tested under Linux x86_64.

Ok to commit ?


I think relying on seeking here is a bad idea for the same reason as
in my previous email. We don't know what seek does for an arbitrary
derived streambuf, or even if it is possible at all.


After implementing it similarly to the overload for
istreambuf_iterator, I wrote a test to compare the std::advance
behavior with a manual increment. I then realized that incrementing
an ostreambuf_iterator is a no-op, and so std::advance must be one too,
unless you tell me otherwise.


There was a reason for this overload to be missing.


OK, good to know.
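
The no-op increment mentioned above can be checked with a small standalone sketch. The helper name is made up for the example; only standard library calls are used.

```cpp
#include <iterator>
#include <sstream>
#include <string>

// Increment an ostreambuf_iterator `steps` times, then write one character
// through it and return what ended up in the stream.
std::string write_after_increments (int steps, char c)
{
  std::ostringstream out;
  std::ostreambuf_iterator<char> it (out);
  for (int i = 0; i < steps; ++i)
    ++it;        // returns *this; nothing is written and nothing moves
  *it = c;       // the assignment is what actually puts the character
  return out.str ();
}
```

Whatever the number of increments, the stream ends up holding just the one character written, which is why an std::advance overload has nothing useful to do here.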




Re: [PATCH] libstdc++: Pretty printers for std::_Bit_reference, std::_Bit_iterator and std::_Bit_const_iterator

2020-09-22 Thread Jonathan Wakely via Gcc-patches

On 14/09/20 16:49 +0200, Michael Weghorn via Libstdc++ wrote:

Hi,

the attached patch implements pretty printers relevant for iteration
over std::vector<bool>, thus handling the TODO
added in commit 36d0dada6773d7fd7c5ace64c90e723930a3b81e
("Have std::vector printer's iterator return bool for vector<bool>",
2019-06-19):

   TODO add printer for vector<bool>'s _Bit_iterator and
_Bit_const_iterator

Tested on x86_64-pc-linux-gnu (Debian testing).

I haven't filed any copyright assignment for GCC yet, but I'm happy to
do so when pointed to the right place.


Thanks for the patch! I'll send you the form to start the copyright
assignment process.




Re: New modref/ipa_modref optimization passes

2020-09-22 Thread Tobias Burnus

On 9/21/20 10:10 AM, Richard Biener wrote:


I see, so you would expect call to alsize to initialize things in
array15_unkonwn type?  That would work too.

Yes, that's my expectation.  But let's see what fortran folks say.


RFC patch attached; I think the following should work, but I am not
sure whether I missed something.

I wonder what to do about
  '!GCC$ NO_ARG_CHECK :: x
but that seems to work fine (creates void* type) and as it only
permits assumed size or scalar variables, the descriptor issue
does not occur.

Thoughts?

Tobias

gcc/fortran/ChangeLog:

	* trans-array.c (gfc_conv_expr_descriptor):
	(gfc_conv_array_parameter):
	* trans-array.h (gfc_conv_expr_descriptor):

 gcc/fortran/trans-array.c | 15 +--
 gcc/fortran/trans-array.h |  3 ++-
 2 files changed, 11 insertions(+), 7 deletions(-)

diff --git a/gcc/fortran/trans-array.c b/gcc/fortran/trans-array.c
index 6566c47d4ae..a5d1b477a0a 100644
--- a/gcc/fortran/trans-array.c
+++ b/gcc/fortran/trans-array.c
@@ -7216,7 +7216,7 @@ walk_coarray (gfc_expr *e)
function call.  */
 
 void
-gfc_conv_expr_descriptor (gfc_se *se, gfc_expr *expr)
+gfc_conv_expr_descriptor (gfc_se *se, gfc_expr *expr, bool want_assumed_type)
 {
   gfc_ss *ss;
   gfc_ss_type ss_type;
@@ -7611,7 +7611,9 @@ gfc_conv_expr_descriptor (gfc_se *se, gfc_expr *expr)
   else
 	{
 	  /* Otherwise make a new one.  */
-	  if (expr->ts.type == BT_CHARACTER && expr->ts.deferred)
+	  if (want_assumed_type)
+	parmtype = ptr_type_node;
+	  else if (expr->ts.type == BT_CHARACTER && expr->ts.deferred)
 	parmtype = gfc_typenode_for_spec (&expr->ts);
 	  else
 	parmtype = gfc_get_element_type (TREE_TYPE (desc));
@@ -7950,7 +7952,8 @@ gfc_conv_array_parameter (gfc_se * se, gfc_expr * expr, bool g77,
 {
 	  if (sym->attr.dummy || sym->attr.result)
 	{
-	  gfc_conv_expr_descriptor (se, expr);
+	  gfc_conv_expr_descriptor (se, expr,
+	fsym && fsym->ts.type == BT_ASSUMED);
 	  tmp = se->expr;
 	}
 	  if (size)
@@ -8014,7 +8017,7 @@ gfc_conv_array_parameter (gfc_se * se, gfc_expr * expr, bool g77,
 
   if (no_pack || array_constructor || good_allocatable || ultimate_alloc_comp)
 {
-  gfc_conv_expr_descriptor (se, expr);
+  gfc_conv_expr_descriptor (se, expr, fsym && fsym->ts.type == BT_ASSUMED);
   /* Deallocate the allocatable components of structures that are
 	 not variable.  */
   if ((expr->ts.type == BT_DERIVED || expr->ts.type == BT_CLASS)
@@ -8037,7 +8040,7 @@ gfc_conv_array_parameter (gfc_se * se, gfc_expr * expr, bool g77,
   if (this_array_result)
 {
   /* Result of the enclosing function.  */
-  gfc_conv_expr_descriptor (se, expr);
+  gfc_conv_expr_descriptor (se, expr, fsym && fsym->ts.type == BT_ASSUMED);
   if (size)
 	array_parameter_size (se->expr, expr, size);
   se->expr = gfc_build_addr_expr (NULL_TREE, se->expr);
@@ -8053,7 +8056,7 @@ gfc_conv_array_parameter (gfc_se * se, gfc_expr * expr, bool g77,
 {
   /* Every other type of array.  */
   se->want_pointer = 1;
-  gfc_conv_expr_descriptor (se, expr);
+  gfc_conv_expr_descriptor (se, expr, fsym && fsym->ts.type == BT_ASSUMED);
 
   if (size)
 	array_parameter_size (build_fold_indirect_ref_loc (input_location,
diff --git a/gcc/fortran/trans-array.h b/gcc/fortran/trans-array.h
index e561605aaed..be3b1b79860 100644
--- a/gcc/fortran/trans-array.h
+++ b/gcc/fortran/trans-array.h
@@ -143,7 +143,8 @@ void gfc_get_dataptr_offset (stmtblock_t*, tree, tree, tree, bool, gfc_expr*);
 /* Obtain the span of an array.  */
 tree gfc_get_array_span (tree, gfc_expr *);
 /* Evaluate an array expression.  */
-void gfc_conv_expr_descriptor (gfc_se *, gfc_expr *);
+void gfc_conv_expr_descriptor (gfc_se *, gfc_expr *,
+			   bool want_assumed_type = false);
 /* Convert an array for passing as an actual function parameter.  */
 void gfc_conv_array_parameter (gfc_se *, gfc_expr *, bool,
 			   const gfc_symbol *, const char *, tree *);


[PATCH] AArch64: Implement missing vceq*_p* intrinsics

2020-09-22 Thread Kyrylo Tkachov
Hi all,

This patch implements some missing vceq* intrinsics on poly types.
The behaviour is to produce the appropriate CMEQ instruction as for the 
unsigned types.
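
As a rough scalar model of what a lane-wise compare-equal produces, here is a sketch using plain 64-bit integers in place of the NEON vector types. The type alias and function names are stand-ins invented for the example, not the arm_neon.h definitions.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// Hypothetical stand-in for a two-lane 64-bit polynomial vector.
using poly64x2 = std::array<std::uint64_t, 2>;

// Lane-wise compare-equal: each result lane is all ones when the inputs
// match and zero otherwise, exactly as for unsigned lanes.
poly64x2 ceq_lanes (const poly64x2 &a, const poly64x2 &b)
{
  poly64x2 mask{};
  for (std::size_t i = 0; i < mask.size (); ++i)
    mask[i] = (a[i] == b[i]) ? UINT64_MAX : 0;
  return mask;
}

// Compare-against-zero variant built on top of the same operation.
poly64x2 ceqz_lanes (const poly64x2 &a)
{
  return ceq_lanes (a, poly64x2{});
}
```

Since equality only looks at the bit pattern, treating the polynomial lanes as unsigned integers gives the same mask.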

Bootstrapped and tested on aarch64-none-linux-gnu.

Committing to trunk and backporting to the branches after testing.

Thanks,
Kyrill

2020-09-22  Kyrylo Tkachov  

PR target/71233
* config/aarch64/arm_neon.h (vceqq_p64, vceqz_p64, vceqzq_p64): Define.

2020-09-22  Kyrylo Tkachov  

PR target/71233
* gcc.target/aarch64/simd/vceq_poly_1.c: New test.


vceq.patch
Description: vceq.patch


[PATCH] AArch64: Implement poly-type vadd intrinsics

2020-09-22 Thread Kyrylo Tkachov
Hi all,

This implements the vadd[p]_p* intrinsics.
In terms of functionality they are aliases of veor operations on the relevant 
unsigned types.
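
To spell out why an exclusive OR is enough: adding two polynomials over GF(2) adds their coefficients modulo 2 with no carries between bit positions. A minimal scalar sketch, with a function name chosen only for this example:

```cpp
#include <cstdint>

// Add two 8-bit polynomials over GF(2).  Coefficients are added modulo 2
// and no carry propagates, so the sum is a bitwise XOR.
constexpr std::uint8_t poly_add8 (std::uint8_t a, std::uint8_t b)
{
  return static_cast<std::uint8_t> (a ^ b);
}
```

A consequence is that every polynomial is its own additive inverse, so poly_add8 (x, x) is always zero.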

Bootstrapped and tested on aarch64-none-linux-gnu.

Committing to trunk. Will backport it to the active branches after some time 
too.

Thanks,
Kyrill

gcc/
2020-09-18  Kyrylo Tkachov  

PR target/71233
* config/aarch64/arm_neon.h (vadd_p8, vadd_p16, vadd_p64, vaddq_p8,
vaddq_p16, vaddq_p64, vaddq_p128): Define.

gcc/testsuite
2020-09-18  Kyrylo Tkachov  

PR target/71233
* gcc.target/aarch64/simd/vadd_poly_1.c: New test.


vadd-poly.patch
Description: vadd-poly.patch


[PATCH] AArch64: Implement missing vcls intrinsics on unsigned types

2020-09-22 Thread Kyrylo Tkachov
Hi all,

This patch implements some missing intrinsics that perform a CLS on unsigned 
SIMD types.
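
For reference, a scalar sketch of the count-leading-sign-bits operation on a single 8-bit lane. The function name is made up for the example; the unsigned lane is simply read as a bit pattern.

```cpp
#include <cstdint>

// Count leading sign bits of an 8-bit lane: the number of bits directly
// below the top bit that are equal to it.  The top bit is not counted.
int cls8 (std::uint8_t x)
{
  const unsigned sign = (x >> 7) & 1u;
  int count = 0;
  for (int bit = 6; bit >= 0; --bit)
    {
      if (((x >> bit) & 1u) != sign)
        break;
      ++count;
    }
  return count;
}
```

The result is between 0 and 7 for an 8-bit lane, so it always fits a signed lane of the same width.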

Bootstrapped and tested on aarch64-none-linux-gnu.

Committing to trunk and to active branches after testing.

Thanks,
Kyrill

2020-09-22  Kyrylo Tkachov  

PR target/71233
* config/aarch64/arm_neon.h (vcls_u8, vcls_u16, vcls_u32,
vclsq_u8, vclsq_u16, vclsq_u32): Define.

2020-09-22  Kyrylo Tkachov  

PR target/71233
* gcc.target/aarch64/simd/vcls_unsigned_1.c: New test.


vcls-u.patch
Description: vcls-u.patch


[PATCH] aix: retrieve AR for configure for FAT library construction

2020-09-22 Thread CHIGOT, CLEMENT via Gcc-patches
Description:
"ar" was changed to use AR value coming from configure,
as several "ar" might be available. But all the -X flags must first be
removed if the -r option is used. Otherwise, it might replace the
wrong object files.

libgcc/Changelog:
2020-09-22 Clement Chigot 
 * config/rs6000/t-slibgcc-aix: Change ar to use AR value from configure

libatomic/Changelog:
2020-09-22 Clement Chigot 
 * config/t-aix: Change ar to use AR value from configure

libgomp/Changelog:
2020-09-22 Clement Chigot 
 * config/t-aix: Change ar to use AR value from configure

libstdc++/Changelog:
2020-09-22 Clement Chigot 
 * config/os/aix/t-aix: Change ar to use AR value from configure

libgfortran/Changelog:
2020-09-22 Clement Chigot 
 * config/t-aix: Change ar to use AR value from configure




Clément Chigot
ATOS Bull SAS
1 rue de Provence - 38432 Échirolles - France



0001-aix-retrieve-AR-for-configure-for-FAT-library-constr.patch
Description: 0001-aix-retrieve-AR-for-configure-for-FAT-library-constr.patch


Re: New modref/ipa_modref optimization passes

2020-09-22 Thread David Malcolm via Gcc-patches
On Tue, 2020-09-22 at 09:07 +0200, Jan Hubicka wrote:
> > > (gdb) p summaries
> > > $3 = (fast_function_summary *) 0x0
> > > 
> > > I'm still investigating (but may have to call halt for the
> > > night), but
> > > this could be an underlying issue with the new passes; the jit
> > > testsuite runs with the equivalent of:
> > > 
> > > --param=ggc-min-expand=0 --param=ggc-min-heapsize=0
> > > 
> > > throughout to shake out GC issues (to do a full collection at
> > > each GC
> > > opportunity).
> > > 
> > > Was this code tested with the jit?  Do you see issues in cc1 if
> > > you set
> > > those params?  Anyone else seeing "random" crashes?
> > 
> > I suppose this happens when the pass gets constructed but no summary is
> > computed.  Does the NULL pointer guard here help?
> 
> Hi,
> I am currently on a train and cannot test the patch easily, but this
> should help.  If you run the pass on empty input then the destruction
> happens with a NULL summaries pointer.
> 
> My apologies for that.
> Honza
> 
> diff --git a/gcc/ipa-modref.c b/gcc/ipa-modref.c
> index af0b710333e..cd92b5a81d3 100644
> --- a/gcc/ipa-modref.c
> +++ b/gcc/ipa-modref.c
> @@ -769,7 +885,8 @@ class pass_modref : public gimple_opt_pass
>  
>  ~pass_modref ()
>{
> - ggc_delete (summaries);
> + if (summaries)
> +   ggc_delete (summaries);
>   summaries = NULL;
>}

Thanks; with that it survives the first in-process iteration, but then
dies inside the 3rd in-process iteration, on a different finalizer. 
I'm beginning to suspect a pre-existing bad interaction between
finalizers and jit which perhaps this patch has exposed.

I'll continue to investigate it.

Dave
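
A standalone sketch of the guard pattern from the quoted hunk, for anyone who wants to experiment outside of GCC. Everything here is a hypothetical stand-in: destroy_and_free only mimics a deleter that runs the destructor explicitly and therefore must not be handed a null pointer; it is not ggc_delete.

```cpp
#include <cstdlib>
#include <new>

// Counts how many summary tables are currently alive.
static int live_tables = 0;

// Hypothetical stand-in for a GC-allocated summary table.
struct summary_table
{
  summary_table () { ++live_tables; }
  ~summary_table () { --live_tables; }
};

// Hypothetical deleter: runs the destructor explicitly, then frees the
// storage.  It dereferences PTR, so callers must not pass a null pointer.
template <typename T>
void destroy_and_free (T *ptr)
{
  ptr->~T ();
  std::free (ptr);
}

class pass_like
{
  summary_table *summaries = nullptr;

public:
  // May never be called, e.g. when the pass runs on empty input.
  void compute ()
  {
    void *mem = std::malloc (sizeof (summary_table));
    if (!mem)
      throw std::bad_alloc ();
    summaries = new (mem) summary_table ();
  }

  ~pass_like ()
  {
    if (summaries)   // nothing was allocated if compute () never ran
      destroy_and_free (summaries);
    summaries = nullptr;
  }
};

// Construct and destroy a pass, optionally computing a summary in between,
// and report how many tables are still alive afterwards.
int live_after_scope (bool run_compute)
{
  {
    pass_like pass;
    if (run_compute)
      pass.compute ();
  }
  return live_tables;
}
```

Both paths leave no live table behind; without the guard, the path that never calls compute () would run a destructor through a null pointer.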



[PATCH] switch lowering: limit number of cluster attempts

2020-09-22 Thread Martin Liška

Hi.

The patch adds a bail-out limit to switch lowering.
The clustering algorithm is currently quadratic, so it needs one. I've tested a value
of 100K, which corresponds to about 0.2s in the problematic test case before
the limit is reached.
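
The idea can be sketched on a toy version of the problem: a quadratic dynamic program over sorted case values that gives up once it has spent a fixed number of attempts. The density rule (value range at most twice the number of cases) and all names below are assumptions for illustration only; this is not the code in tree-switch-conversion.c.

```cpp
#include <cstddef>
#include <optional>
#include <vector>

// Minimal number of "dense" clusters covering SORTED_VALUES, or nullopt if
// more than MAX_ATTEMPTS candidate clusters had to be examined.
std::optional<std::size_t>
min_clusters (const std::vector<long> &sorted_values,
              std::size_t max_attempts)
{
  const std::size_t n = sorted_values.size ();
  std::vector<std::size_t> best (n + 1, 0);
  std::size_t attempts = 0;

  for (std::size_t i = 1; i <= n; ++i)
    {
      // A single-value cluster is always allowed.
      best[i] = best[i - 1] + 1;
      for (std::size_t j = 0; j + 1 < i; ++j)
        {
          // Bail out instead of letting the quadratic loop run on.
          if (attempts++ == max_attempts)
            return std::nullopt;

          const long range = sorted_values[i - 1] - sorted_values[j] + 1;
          const long count = static_cast<long> (i - j);
          if (range <= 2 * count && best[j] + 1 < best[i])
            best[i] = best[j] + 1;
        }
    }
  return best[n];
}
```

A caller would treat nullopt as "keep the clusters as they are", which corresponds to returning a copy of the input clusters in the patch.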

Patch can bootstrap on x86_64-linux-gnu and survives regression tests.

Ready to be installed?
Thanks,
Martin

gcc/ChangeLog:

PR tree-optimization/96979
* doc/invoke.texi: Document new param max-switch-clustering-attempts.
* params.opt: Add new parameter.
* tree-switch-conversion.c (jump_table_cluster::find_jump_tables):
Limit number of attempts.
(bit_test_cluster::find_bit_tests): Likewise.

gcc/testsuite/ChangeLog:

PR tree-optimization/96979
* g++.dg/tree-ssa/pr96979.C: New test.
---
 gcc/doc/invoke.texi |  4 ++
 gcc/params.opt  |  4 ++
 gcc/testsuite/g++.dg/tree-ssa/pr96979.C | 50 +
 gcc/tree-switch-conversion.c| 17 +
 4 files changed, 75 insertions(+)
 create mode 100644 gcc/testsuite/g++.dg/tree-ssa/pr96979.C

diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index 665c0ffc4a1..6a7833b1d75 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -13452,6 +13452,10 @@ The smallest number of different values for which it 
is best to use a
 jump-table instead of a tree of conditional branches.  If the value is
 0, use the default for the machine.
 
+@item max-switch-clustering-attempts

+The maximum number of clustering attempts used
+in bit-test and jump-table switch expansion.
+
 @item jump-table-max-growth-ratio-for-size
 The maximum code size growth ratio when expanding
 into a jump table (in percent).  The parameter is used when
diff --git a/gcc/params.opt b/gcc/params.opt
index 1d864047ad2..f4dcb5426c7 100644
--- a/gcc/params.opt
+++ b/gcc/params.opt
@@ -82,6 +82,10 @@ The maximum length of a constant string for a builtin string cmp call eligible f
 Common Joined UInteger Var(param_case_values_threshold) Param Optimization
 The smallest number of different values for which it is best to use a jump-table instead of a tree of conditional branches, if 0, use the default for the machine.
 
+-param=max-switch-clustering-attempts=
+Common Joined UInteger Var(param_max_switch_clustering_attempts) Param Optimization Init(10)
+The maximum number of clustering attempts used in bit-test and jump-table switch expansion.
+
 -param=comdat-sharing-probability=
 Common Joined UInteger Var(param_comdat_sharing_probability) Init(20) Param Optimization
 Probability that COMDAT function will be shared with different compilation unit.
diff --git a/gcc/testsuite/g++.dg/tree-ssa/pr96979.C 
b/gcc/testsuite/g++.dg/tree-ssa/pr96979.C
new file mode 100644
index 000..85c703a140d
--- /dev/null
+++ b/gcc/testsuite/g++.dg/tree-ssa/pr96979.C
@@ -0,0 +1,50 @@
+/* PR tree-optimization/96979 */
+/* { dg-do compile } */
+/* { dg-options "-std=c++17 -O2 -fdump-tree-switchlower1" } */
+
+using u64 = unsigned long long;
+
+constexpr inline u64
+foo (const char *str) noexcept
+{
+  u64 value = 0xcbf29ce484222325ULL;
+  for (u64 i = 0; str[i]; i++)
+value = (value ^ u64(str[i])) * 0x10001b3ULL;
+  return value;
+}
+
+struct V
+{
+  enum W
+  {
+#define A(n) n,
+#define B(n) A(n##0) A(n##1) A(n##2) A(n##3) A(n##4) A(n##5) A(n##6) A(n##7) A(n##8) A(n##9)
+#define C(n) B(n##0) B(n##1) B(n##2) B(n##3) B(n##4) B(n##5) B(n##6) B(n##7) B(n##8) B(n##9)
+#define D(n) C(n##0) C(n##1) C(n##2) C(n##3) C(n##4) C(n##5) C(n##6) C(n##7) C(n##8) C(n##9)
+#define E D(foo1) D(foo2) D(foo3)
+E
+last
+  };
+
+  constexpr static W
+  bar (const u64 h) noexcept
+  {
+switch (h)
+  {
+#undef A
+#define F(n) #n
+#define A(n) case foo (F(n)): return n;
+E
+  }
+return last;
+  }
+};
+
+int
+baz (const char *s)
+{
+  const u64 h = foo (s);
+  return V::bar (h);
+}
+
+/* { dg-final { scan-tree-dump-times ";; Bail out: --param=max-switch-clustering-attempts reached" 2 "switchlower1" } } */
diff --git a/gcc/tree-switch-conversion.c b/gcc/tree-switch-conversion.c
index 186411ff3c4..e6a2c7a6a84 100644
--- a/gcc/tree-switch-conversion.c
+++ b/gcc/tree-switch-conversion.c
@@ -1183,6 +1183,7 @@ jump_table_cluster::find_jump_tables (vec 
&clusters)
 
   min.quick_push (min_cluster_item (0, 0, 0));
 
+  HOST_WIDE_INT attempts = 0;

   for (unsigned i = 1; i <= l; i++)
 {
   /* Set minimal # of clusters with i-th item to infinite.  */
@@ -1194,6 +1195,14 @@ jump_table_cluster::find_jump_tables (vec 
&clusters)
  if (i - j < case_values_threshold ())
s += i - j;
 
+	  if (attempts++ == param_max_switch_clustering_attempts)

+   {
+ if (dump_file)
+   fprintf (dump_file, ";; Bail out: "
+"--param=max-switch-clustering-attempts reached\n");
+ return clusters.copy ();
+   }
+
  /* Prefer cluster

[PATCH][GCC 8] AArch64: Implement new intrinsics vabsd_s64 and vnegd_s64.

2020-09-22 Thread Kyrylo Tkachov
Hi all,

I'd like to backport this patch from Vlad implementing the missing vabsd_s64 
and vnegd_s64 intrinsics to the GCC 8 branch.
They should have been implemented from the start and they don't require any 
surgery/new builtins.

Bootstrapped and tested on aarch64-none-linux-gnu.

Committing to the branch.
Thanks,
Kyrill

gcc/
2018-08-31  Vlad Lazar  

PR target/71233
* config/aarch64/arm_neon.h (vabsd_s64): New.
(vnegd_s64): Likewise.

gcc/testsuite/
2018-08-31  Vlad Lazar  

PR target/71233
* gcc.target/aarch64/scalar_intrinsics.c (test_vnegd_s64): New.
* gcc.target/aarch64/vneg_s.c (RUN_TEST_SCALAR): New.
(test_vnegd_s64): Likewise.
* gcc.target/aarch64/vnegd_64.c: New.
* gcc.target/aarch64/vabsd_64.c: New.
* gcc.target/aarch64/vabs_intrinsic_3.c: New.


vlad.patch
Description: vlad.patch


Re: [PATCH] VEC_COND_EXPR: fix ICE in gimple_expand_vec_cond_expr

2020-09-22 Thread Martin Liška

@Richi: May I please ping this?

On 9/1/20 4:27 PM, Martin Liška wrote:

On 8/31/20 10:01 AM, Richard Biener wrote:

On Fri, Aug 28, 2020 at 4:18 PM Martin Liška  wrote:


Hey.

The patch handles a VEC_COND_EXPR comparison of an SSA_NAME with a constant,
which is an artifact of -fno-tree-ccp.

Patch can bootstrap on x86_64-linux-gnu and survives regression tests.

Ready to be installed?


Err, no - we shouldn't do this at RTL expansion time.  And piecewise
expansion is gross - in this case the operand is a constant boolean vector
IIRC (not sure what mode?)


Yes, it's:

(gdb) p op0.typed.type
$11 = 
(gdb) p mode
$13 = E_DImode

and TYPE_MODE (TREE_TYPE (op0)) is E_BLKmode


and ISEL only tries matching up with
a comparison operand.  But ISEL should also handle matching up with
a constant "boolean vector" ("boolean vector"s are artifacts of VEC_COND_EXPRs,
but some targets allow bitwise arithmetic on them...).  So the compensation
needs to happen in ISEL, recognizing

  _1 = {0}
  _2 = _1 ? ...;

as

  _2 = .VCOND (0 == 0, ...)


Do we really want scalar arguments for .VCOND? Note that in PR96453
(a similar bug), we end up with:

   _5 = { -1, -1 };
   _6 = VEC_COND_EXPR <_5, { -1, -1 }, { 0, 0 }>;

Thanks for the hints,
Martin



or so.

Richard.


Thanks,
Martin

gcc/ChangeLog:

 PR tree-optimization/96466
 * gimple-fold.c (expand_cmp_piecewise): New.
 * gimple-fold.h (nunits_for_known_piecewise_op): New.
 (expand_cmp_piecewise): Moved from ...
 * tree-vect-generic.c (expand_vector_comparison): ... here.
 (nunits_for_known_piecewise_op): Moved to gimple-fold.h.
 * gimple-isel.cc (gimple_expand_vec_cond_expr): Use
 expand_cmp_piecewise fallback for constants.

gcc/testsuite/ChangeLog:

 PR tree-optimization/96466
 * gcc.dg/vect/pr96466.c: New test.
---
   gcc/gimple-fold.c   | 28 
   gcc/gimple-fold.h   | 14 ++
   gcc/gimple-isel.cc  | 10 ---
   gcc/testsuite/gcc.dg/vect/pr96466.c | 18 +
   gcc/tree-vect-generic.c | 41 ++---
   5 files changed, 69 insertions(+), 42 deletions(-)
   create mode 100644 gcc/testsuite/gcc.dg/vect/pr96466.c

diff --git a/gcc/gimple-fold.c b/gcc/gimple-fold.c
index dcc1b56a273..86d5d0ed7d8 100644
--- a/gcc/gimple-fold.c
+++ b/gcc/gimple-fold.c
@@ -8056,3 +8056,31 @@ gimple_stmt_integer_valued_real_p (gimple *stmt, int 
depth)
 return false;
   }
   }
+
+tree
+expand_cmp_piecewise (gimple_stmt_iterator *gsi, tree type, tree op0, tree op1)
+{
+  tree inner_type = TREE_TYPE (TREE_TYPE (op0));
+  tree part_width = vector_element_bits_tree (TREE_TYPE (op0));
+  tree index = bitsize_int (0);
+  int nunits = nunits_for_known_piecewise_op (TREE_TYPE (op0));
+  int prec = GET_MODE_PRECISION (SCALAR_TYPE_MODE (type));
+  tree ret_type = build_nonstandard_integer_type (prec, 1);
+  tree ret_inner_type = boolean_type_node;
+  int i;
+  tree t = build_zero_cst (ret_type);
+
+  if (TYPE_PRECISION (ret_inner_type) != 1)
+    ret_inner_type = build_nonstandard_integer_type (1, 1);
+  for (i = 0; i < nunits;
+   i++, index = int_const_binop (PLUS_EXPR, index, part_width))
+    {
+  tree a = tree_vec_extract (gsi, inner_type, op0, part_width, index);
+  tree b = tree_vec_extract (gsi, inner_type, op1, part_width, index);
+  tree result = gimplify_build2 (gsi, NE_EXPR, ret_inner_type, a, b);
+  t = gimplify_build3 (gsi, BIT_INSERT_EXPR, ret_type, t, result,
+  bitsize_int (i));
+    }
+
+  return gimplify_build1 (gsi, VIEW_CONVERT_EXPR, type, t);
+}
diff --git a/gcc/gimple-fold.h b/gcc/gimple-fold.h
index 0ed1d1ffe83..7e843b34f53 100644
--- a/gcc/gimple-fold.h
+++ b/gcc/gimple-fold.h
@@ -147,6 +147,20 @@ gimple_build_vector (gimple_seq *seq, tree_vector_builder 
*builder)
   extern bool gimple_stmt_nonnegative_warnv_p (gimple *, bool *, int = 0);
   extern bool gimple_stmt_integer_valued_real_p (gimple *, int = 0);

+/* Return the number of elements in a vector type TYPE that we have
+   already decided needs to be expanded piecewise.  We don't support
+   this kind of expansion for variable-length vectors, since we should
+   always check for target support before introducing uses of those.  */
+
+static inline unsigned int
+nunits_for_known_piecewise_op (const_tree type)
+{
+  return TYPE_VECTOR_SUBPARTS (type).to_constant ();
+}
+
+extern tree expand_cmp_piecewise (gimple_stmt_iterator *gsi, tree lhs,
+ tree op0, tree op1);
+
   /* In gimple-match.c.  */
   extern tree gimple_simplify (enum tree_code, tree, tree,
  gimple_seq *, tree (*)(tree));
diff --git a/gcc/gimple-isel.cc b/gcc/gimple-isel.cc
index b330cf4c20e..32e3bc31f7f 100644
--- a/gcc/gimple-isel.cc
+++ b/gcc/gimple-isel.cc
@@ -33,8 +33,8 @@ along with GCC; see the file COPYING3.  If not see
   #include "gimplify-me.h"
   #includ

[committed][libgomp, nvptx] Print error log for link error

2020-09-22 Thread Tom de Vries
Hi,

By running libgomp test-case libgomp.c/target-28.c with GOMP_NVPTX_PTXRW=w
(using a maintenance patch that adds support for this env var), we dump the
ptx in target-28.exe to file.  By editing one ptx file to rename
gomp_nvptx_main to gomp_nvptx_main2 in both declaration and call, and
running with GOMP_NVPTX_PTXRW=r, we trigger a link error:
...
$ GOMP_NVPTX_PTXRW=r ./target-28.exe
libgomp: cuLinkComplete error: unknown error
...
The error is somewhat uninformative.

Fix this by dumping the error log returned by the failing cuda call, such
that we have instead:
...
$ GOMP_NVPTX_PTXRW=r ./target-28.exe
libgomp: Link error log error   : \
  Undefined reference to 'gomp_nvptx_main2' in ''
libgomp: cuLinkComplete error: unknown error
...

Built on x86_64 with nvptx accelerator, tested libgomp.

Committed to trunk.

Thanks,
- Tom

[libgomp, nvptx] Print error log for link error

libgomp/ChangeLog:

* plugin/plugin-nvptx.c (link_ptx): Print elog if cuLinkComplete call
fails.

---
 libgomp/plugin/plugin-nvptx.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/libgomp/plugin/plugin-nvptx.c b/libgomp/plugin/plugin-nvptx.c
index 390804ad1fa..a63dd1a99fb 100644
--- a/libgomp/plugin/plugin-nvptx.c
+++ b/libgomp/plugin/plugin-nvptx.c
@@ -701,6 +701,7 @@ link_ptx (CUmodule *module, const struct targ_ptx_obj 
*ptx_objs,
 
   if (r != CUDA_SUCCESS)
 {
+  GOMP_PLUGIN_error ("Link error log %s\n", &elog[0]);
   GOMP_PLUGIN_error ("cuLinkComplete error: %s", cuda_error (r));
   return false;
 }


Re: [PATCH] IBM Z: Try to make use of load-and-test instructions

2020-09-22 Thread Stefan Schulze Frielinghaus via Gcc-patches
On Mon, Sep 21, 2020 at 06:51:00PM +0200, Andreas Krebbel wrote:
> On 18.09.20 13:10, Stefan Schulze Frielinghaus wrote:
> > This patch enables a peephole2 optimization which transforms a load of
> > constant zero into a temporary register which is then finally used to
> > compare against a floating-point register of interest into a single load
> > and test instruction.  However, the optimization is only applied if both
> > registers are dead afterwards and if we test for (in)equality only.
> > This is relaxed in case of fast math.
> > 
> > This is a follow up to PR88856.
> > 
> > Bootstrapped and regtested on IBM Z.
> > 
> > gcc/ChangeLog:
> > 
> > * config/s390/s390.md ("*cmp_ccs_0", "*cmp_ccz_0",
> > "*cmp_ccs_0_fastmath"): Basically change "*cmp_ccs_0" into
> > "*cmp_ccz_0" and for fast math add "*cmp_ccs_0_fastmath".
> > 
> > gcc/testsuite/ChangeLog:
> > 
> > * gcc.target/s390/load-and-test-fp-1.c: Change test to include all
> > possible combinations of dead/live registers and comparisons (equality,
> > relational).
> > * gcc.target/s390/load-and-test-fp-2.c: Same as load-and-test-fp-1.c
> > but for fast math.
> > * gcc.target/s390/load-and-test-fp.h: New test included by
> > load-and-test-fp-{1,2}.c.
> 
> Ok for mainline. Please see below for some comments.

Pushed with the mentioned changes in commit 1a84651d164.

Thanks for the review!

Cheers,
Stefan
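
As a side note on the NaN rule in the new s390.md comment: a minimal C++
sketch, assuming an IEEE 754 target, of why only (in)equality tests are
safe for a load and test.  Equality comparisons are quiet (a quiet NaN
raises no exception), whereas relational comparisons are signaling (any NaN
is expected to raise FE_INVALID).  The helper names are made up for
illustration and are not part of GCC.  Whether the flag is actually
observed depends on the compiler and options, so the sketch only reports it.

```cpp
#include <cfenv>
#include <cstdio>
#include <limits>

// Hypothetical helpers, not part of GCC.  'volatile' keeps the compiler
// from folding the comparisons at compile time.
static bool equals_zero (volatile double x) { return x == 0.0; }
static bool less_than_zero (volatile double x) { return x < 0.0; }

// Run one comparison and report whether FE_INVALID was raised by it.
static bool invalid_raised_by (bool (*cmp) (double), double x)
{
  std::feclearexcept (FE_ALL_EXCEPT);
  volatile bool result = cmp (x);
  (void) result;
  return std::fetestexcept (FE_INVALID) != 0;
}

// Print the observed flag state for both kinds of comparison.
static void report ()
{
  const double qnan = std::numeric_limits<double>::quiet_NaN ();
  std::printf ("qNaN == 0.0 raised FE_INVALID: %d\n",
	       invalid_raised_by (equals_zero, qnan));
  std::printf ("qNaN <  0.0 raised FE_INVALID: %d\n",
	       invalid_raised_by (less_than_zero, qnan));
}
```

Both comparisons evaluate to false for a NaN operand; the difference the
patch cares about is only in the exception behaviour.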

> 
> Thanks!
> 
> Andreas
> 
> > ---
> >  gcc/config/s390/s390.md   | 54 +++
> >  .../gcc.target/s390/load-and-test-fp-1.c  | 19 +++
> >  .../gcc.target/s390/load-and-test-fp-2.c  | 17 ++
> >  .../gcc.target/s390/load-and-test-fp.h| 12 +
> >  4 files changed, 67 insertions(+), 35 deletions(-)
> >  create mode 100644 gcc/testsuite/gcc.target/s390/load-and-test-fp.h
> > 
> > diff --git a/gcc/config/s390/s390.md b/gcc/config/s390/s390.md
> > index 4c3e5400a2b..e591aa7c324 100644
> > --- a/gcc/config/s390/s390.md
> > +++ b/gcc/config/s390/s390.md
> > @@ -1391,23 +1391,55 @@
> >  ; (TF|DF|SF|TD|DD|SD) instructions
> >  
> >  
> > -; FIXME: load and test instructions turn SNaN into QNaN what is not
> > -; acceptable if the target will be used afterwards.  On the other hand
> > -; they are quite convenient for implementing comparisons with 0.0. So
> > -; try to enable them via splitter/peephole if the value isn't needed 
> > anymore.
> > -; See testcases: load-and-test-fp-1.c and load-and-test-fp-2.c
> > +; load and test instructions turn a signaling NaN into a quiet NaN.  Thus 
> > they
> > +; may only be used if the target register is dead afterwards or if fast 
> > math
> > +; is enabled.  The former is done via a peephole optimization.  Note, load 
> > and
> > +; test instructions may only be used for (in)equality comparisons because
> > +; relational comparisons must treat a quiet NaN like a signaling NaN which 
> > is
> > +; not the case for load and test instructions.  For fast math insn
> > +; "cmp_ccs_0_fastmath" applies.
> > +; See testcases load-and-test-fp-{1,2}.c
> > +
> > +(define_peephole2
> > +  [(set (match_operand:FP 0 "register_operand")
> > +   (match_operand:FP 1 "const0_operand"))
> > +   (set (reg:CCZ CC_REGNUM)
> > +   (compare:CCZ (match_operand:FP 2 "register_operand")
> > +(match_operand:FP 3 "register_operand")))]
> > +  "TARGET_HARD_FLOAT
> > +   && FP_REG_P (operands[2])
> > +   && REGNO (operands[0]) == REGNO (operands[3])
> > +   && peep2_reg_dead_p (2, operands[0])
> > +   && peep2_reg_dead_p (2, operands[2])"
> > +  [(parallel
> > +[(set (reg:CCZ CC_REGNUM)
> > + (match_op_dup 4 [(match_dup 2) (match_dup 1)]))
> > + (clobber (match_dup 2))])]
> > +  "operands[4] = gen_rtx_COMPARE (CCZmode, operands[2], operands[1]);")
> 
> Couldn't this be written as:
> 
>  [(parallel
> [(set (reg:CCZ CC_REGNUM)
> (compare:CCZ (match_dup 2) (match_dup 1)))
>  (clobber (match_dup 2))])])
> 
> >  
> >  ; ltxbr, ltdbr, ltebr, ltxtr, ltdtr
> > -(define_insn "*cmp_ccs_0"
> > -  [(set (reg CC_REGNUM)
> > -   (compare (match_operand:FP 0 "register_operand"  "f")
> > -(match_operand:FP 1 "const0_operand""")))
> > -   (clobber (match_operand:FP  2 "register_operand" "=0"))]
> > -  "s390_match_ccmode(insn, CCSmode) && TARGET_HARD_FLOAT"
> > +(define_insn "*cmp_ccz_0"
> > +  [(set (reg:CCZ CC_REGNUM)
> > +   (compare:CCZ (match_operand:FP 0 "register_operand" "f")
> > +(match_operand:FP 1 "const0_operand")))
> > +   (clobber (match_operand:FP 2 "register_operand" "=0"))]
> > +  "TARGET_HARD_FLOAT"
> >"ltr\t%0,%0"
> > [(set_attr "op_type" "RRE")
> >  (set_attr "type"  "fsimp")])
> >  
> > +(define_insn "*cmp_ccs_0_fastmath"
> > +  [(set (reg CC_REGNUM)
> > +   (compare (match_operand:FP 0 "register_operand" "f")
> > +(match_operand:FP 1 "const0_operand")))]
> > +  "s390_match_ccmode (insn, CCSmode)
> > +   && TARGET_HARD_FLOAT
> > +   && !flag_trap

Re: [PATCH] switch lowering: limit number of cluster attemps

2020-09-22 Thread Richard Biener via Gcc-patches
On September 22, 2020 1:22:12 PM GMT+02:00, "Martin Liška"  
wrote:
>Hi.
>
>The patch is about a bail out limit that needs to be added to switch
>lowering.
>Currently the algorithm is quadratic and needs some bail out. I've
>tested value
>of 100K which corresponds to about 0.2s in the problematic test-case
>before
>it's reached.
>
>Patch can bootstrap on x86_64-linux-gnu and survives regression tests.
>
>Ready to be installed?

OK. Though the default limit looks high? 

Richard. 

>Thanks,
>Martin
>
>gcc/ChangeLog:
>
>   PR tree-optimization/96979
>   * doc/invoke.texi: Document new param max-switch-clustering-attempts.
>   * params.opt: Add new parameter.
>   * tree-switch-conversion.c (jump_table_cluster::find_jump_tables):
>   Limit number of attempts.
>   (bit_test_cluster::find_bit_tests): Likewise.
>
>gcc/testsuite/ChangeLog:
>
>   PR tree-optimization/96979
>   * g++.dg/tree-ssa/pr96979.C: New test.
>---
>  gcc/doc/invoke.texi |  4 ++
>  gcc/params.opt  |  4 ++
> gcc/testsuite/g++.dg/tree-ssa/pr96979.C | 50 +
>  gcc/tree-switch-conversion.c| 17 +
>  4 files changed, 75 insertions(+)
>  create mode 100644 gcc/testsuite/g++.dg/tree-ssa/pr96979.C
>
>diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
>index 665c0ffc4a1..6a7833b1d75 100644
>--- a/gcc/doc/invoke.texi
>+++ b/gcc/doc/invoke.texi
>@@ -13452,6 +13452,10 @@ The smallest number of different values for
>which it is best to use a
> jump-table instead of a tree of conditional branches.  If the value is
>  0, use the default for the machine.
>  
>+@item max-switch-clustering-attempts
>+The maximum number of clustering attempts used
>+in bit-test and jump-table switch expansion.
>+
>  @item jump-table-max-growth-ratio-for-size
>  The maximum code size growth ratio when expanding
>  into a jump table (in percent).  The parameter is used when
>diff --git a/gcc/params.opt b/gcc/params.opt
>index 1d864047ad2..f4dcb5426c7 100644
>--- a/gcc/params.opt
>+++ b/gcc/params.opt
>@@ -82,6 +82,10 @@ The maximum length of a constant string for a
>builtin string cmp call eligible f
>Common Joined UInteger Var(param_case_values_threshold) Param
>Optimization
>The smallest number of different values for which it is best to use a
>jump-table instead of a tree of conditional branches, if 0, use the
>default for the machine.
>  
>+-param=max-switch-clustering-attempts=
>+Common Joined UInteger Var(param_max_switch_clustering_attempts) Param
>Optimization Init(10)
>+The maximum number of clustering attempts used in bit-test and
>jump-table switch expansion.
>+
>  -param=comdat-sharing-probability=
>Common Joined UInteger Var(param_comdat_sharing_probability) Init(20)
>Param Optimization
>Probability that COMDAT function will be shared with different
>compilation unit.
>diff --git a/gcc/testsuite/g++.dg/tree-ssa/pr96979.C
>b/gcc/testsuite/g++.dg/tree-ssa/pr96979.C
>new file mode 100644
>index 000..85c703a140d
>--- /dev/null
>+++ b/gcc/testsuite/g++.dg/tree-ssa/pr96979.C
>@@ -0,0 +1,50 @@
>+/* PR tree-optimization/96979 */
>+/* { dg-do compile } */
>+/* { dg-options "-std=c++17 -O2 -fdump-tree-switchlower1" } */
>+
>+using u64 = unsigned long long;
>+
>+constexpr inline u64
>+foo (const char *str) noexcept
>+{
>+  u64 value = 0xcbf29ce484222325ULL;
>+  for (u64 i = 0; str[i]; i++)
>+value = (value ^ u64(str[i])) * 0x10001b3ULL;
>+  return value;
>+}
>+
>+struct V
>+{
>+  enum W
>+  {
>+#define A(n) n,
>+#define B(n) A(n##0) A(n##1) A(n##2) A(n##3) A(n##4) A(n##5) A(n##6) A(n##7) A(n##8) A(n##9)
>+#define C(n) B(n##0) B(n##1) B(n##2) B(n##3) B(n##4) B(n##5) B(n##6) B(n##7) B(n##8) B(n##9)
>+#define D(n) C(n##0) C(n##1) C(n##2) C(n##3) C(n##4) C(n##5) C(n##6) C(n##7) C(n##8) C(n##9)
>+#define E D(foo1) D(foo2) D(foo3)
>+E
>+last
>+  };
>+
>+  constexpr static W
>+  bar (const u64 h) noexcept
>+  {
>+switch (h)
>+  {
>+#undef A
>+#define F(n) #n
>+#define A(n) case foo (F(n)): return n;
>+E
>+  }
>+return last;
>+  }
>+};
>+
>+int
>+baz (const char *s)
>+{
>+  const u64 h = foo (s);
>+  return V::bar (h);
>+}
>+
>+/* { dg-final { scan-tree-dump-times ";; Bail out: --param=max-switch-clustering-attempts reached" 2 "switchlower1" } } */
>diff --git a/gcc/tree-switch-conversion.c
>b/gcc/tree-switch-conversion.c
>index 186411ff3c4..e6a2c7a6a84 100644
>--- a/gcc/tree-switch-conversion.c
>+++ b/gcc/tree-switch-conversion.c
>@@ -1183,6 +1183,7 @@ jump_table_cluster::find_jump_tables (vec*> &clusters)
>  
>min.quick_push (min_cluster_item (0, 0, 0));
>  
>+  HOST_WIDE_INT attempts = 0;
>for (unsigned i = 1; i <= l; i++)
>  {
>/* Set minimal # of clusters with i-th item to infinite.  */
>@@ -1194,6 +1195,14 @@ jump_table_cluster::find_jump_tables
>(vec &clusters)
> if (i - j < case_values_threshold ())
>   s += i - j;
>  
>+if (attempts++ == param_max_switch_clust

Re: New modref/ipa_modref optimization passes

2020-09-22 Thread Jan Hubicka
> 
> Thanks; with that it survives the first in-process iteration, but then
> dies inside the 3rd in-process iteration, on a different finalizer. 
> I'm beginning to suspect a pre-existing bad interaction between
> finalizers and jit which perhaps this patch has exposed.
> 
> I'll continue to investigate it.

I will look into it tonight (I have a meeting now about remote teaching).
I think the pass should not have a destructor; there should be a fini
function, as other passes have, so I will clean this up.  Those dtors make
no sense, and there also seem to be some memory leaks.

Honza
> 
> Dave
> 


Re: [PATCH] switch lowering: limit number of cluster attemps

2020-09-22 Thread Martin Liška

On 9/22/20 2:19 PM, Richard Biener wrote:

OK. Though the default limit looks high?


Yep, I'm going to install it with the param default value
equal to 1.

Thanks,
Martin
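
For readers following the thread, a minimal C++ sketch of the idea behind
the bail-out, assuming a simplified version of the quadratic clustering
search.  The names min_clusters and can_be_jump_table and the density rule
are hypothetical and are not taken from tree-switch-conversion.c.  The
sketch returns -1 when the budget runs out, whereas the patch returns the
unchanged clusters.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical density rule: cases [start, end) may share one jump table
// if their value range is at most 8 times the number of cases.
static bool can_be_jump_table (const std::vector<std::int64_t> &values,
			       std::size_t start, std::size_t end)
{
  const std::int64_t range = values[end - 1] - values[start] + 1;
  return range <= 8 * static_cast<std::int64_t> (end - start);
}

// Minimal number of clusters covering the sorted case values, or -1 if
// more than MAX_ATTEMPTS candidate clusters would have to be examined.
static long min_clusters (const std::vector<std::int64_t> &values,
			  long max_attempts)
{
  const std::size_t n = values.size ();
  const long infinite = static_cast<long> (n) + 1;
  std::vector<long> best (n + 1, infinite);
  best[0] = 0;

  long attempts = 0;
  for (std::size_t i = 1; i <= n; i++)
    for (std::size_t j = 0; j < i; j++)
      {
	// Same shape as the check added by the patch: give up once the
	// attempt counter reaches the limit.
	if (attempts++ == max_attempts)
	  return -1;

	if (best[j] + 1 < best[i] && can_be_jump_table (values, j, i))
	  best[i] = best[j] + 1;
      }
  return best[n];
}
```

The inner loop runs n * (n + 1) / 2 times for n cases, which is why a
switch with thousands of sparse cases needs a cap of this kind.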


Re: [PATCH] S/390: Do not turn maybe-uninitialized warnings into errors

2020-09-22 Thread Andreas Krebbel via Gcc-patches
On 15.09.20 17:02, Stefan Schulze Frielinghaus wrote:
> Over the last couple of months quite a few warnings about uninitialized
> variables were raised while building GCC.  A reason why these warnings
> show up on S/390 only is due to the aggressive inlining settings here.
> Some of these warnings (2c832ffedf0, b776bdca932, 2786c0221b6,
> 1657178f59b) could be fixed or in case of a false positive silenced by
> initializing the corresponding variable.  Since the latter reoccurs and
> while bootstrapping such warnings are turned into errors bootstrapping
> fails on S/390 consistently.  Therefore, for the moment do not turn
> those warnings into errors.
> 
> config/ChangeLog:
> 
>   * warnings.m4: Do not turn maybe-uninitialized warnings into errors
>   on S/390.
> 
> fixincludes/ChangeLog:
> 
>   * configure: Regenerate.
> 
> gcc/ChangeLog:
> 
>   * configure: Regenerate.
> 
> libcc1/ChangeLog:
> 
>   * configure: Regenerate.
> 
> libcpp/ChangeLog:
> 
>   * configure: Regenerate.
> 
> libdecnumber/ChangeLog:
> 
>   * configure: Regenerate.

That change looks good to me. Could a global reviewer please comment?

Andreas
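
To illustrate the kind of code the quoted description refers to, here is a
minimal C++ sketch of a pattern that can draw a maybe-uninitialized warning
once inlining exposes it, together with the usual way of silencing it.  The
function names are hypothetical, and whether a given GCC version actually
warns on this exact code is not claimed.

```cpp
// Hypothetical helper: writes *out only on the success path.
static bool parse_positive (int in, int *out)
{
  if (in <= 0)
    return false;
  *out = in * 2;
  return true;
}

int doubled_or_zero (int in)
{
  // Without the initializer the compiler may be unable to prove that
  // 'value' is written before it is read; initializing it silences a
  // false positive at the cost of a redundant store.
  int value = 0;
  if (!parse_positive (in, &value))
    return 0;
  return value;
}
```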

> ---
>  config/warnings.m4 | 20 ++--
>  fixincludes/configure  |  8 +++-
>  gcc/configure  | 12 +---
>  libcc1/configure   |  8 +++-
>  libcpp/configure   |  8 +++-
>  libdecnumber/configure |  8 +++-
>  6 files changed, 51 insertions(+), 13 deletions(-)
> 
> diff --git a/config/warnings.m4 b/config/warnings.m4
> index ce007f9b73e..d977bfb20af 100644
> --- a/config/warnings.m4
> +++ b/config/warnings.m4
> @@ -101,8 +101,10 @@ AC_ARG_ENABLE(werror-always,
>  AS_HELP_STRING([--enable-werror-always],
>  [enable -Werror despite compiler version]),
>  [], [enable_werror_always=no])
> -AS_IF([test $enable_werror_always = yes],
> -  [acx_Var="$acx_Var${acx_Var:+ }-Werror"])
> +AS_IF([test $enable_werror_always = yes], [dnl
> +  acx_Var="$acx_Var${acx_Var:+ }-Werror"
> +  AS_CASE([$host], [s390*-*-*],
> +  [acx_Var="$acx_Var -Wno-error=maybe-uninitialized"])])
>   m4_if($1, [manual],,
>   [AS_VAR_PUSHDEF([acx_GCCvers], [acx_cv_prog_cc_gcc_$1_or_newer])dnl
>AC_CACHE_CHECK([whether $CC is GCC >=$1], acx_GCCvers,
> @@ -116,7 +118,9 @@ AS_IF([test $enable_werror_always = yes],
> [AS_VAR_SET(acx_GCCvers, yes)],
> [AS_VAR_SET(acx_GCCvers, no)])])
>   AS_IF([test AS_VAR_GET(acx_GCCvers) = yes],
> -   [acx_Var="$acx_Var${acx_Var:+ }-Werror"])
> +   [acx_Var="$acx_Var${acx_Var:+ }-Werror"
> +AS_CASE([$host], [s390*-*-*],
> +[acx_Var="$acx_Var -Wno-error=maybe-uninitialized"])])
>AS_VAR_POPDEF([acx_GCCvers])])
>  m4_popdef([acx_Var])dnl
>  AC_LANG_POP(C)
> @@ -205,8 +209,10 @@ AC_ARG_ENABLE(werror-always,
>  AS_HELP_STRING([--enable-werror-always],
>  [enable -Werror despite compiler version]),
>  [], [enable_werror_always=no])
> -AS_IF([test $enable_werror_always = yes],
> -  [acx_Var="$acx_Var${acx_Var:+ }-Werror"])
> +AS_IF([test $enable_werror_always = yes], [dnl
> +  acx_Var="$acx_Var${acx_Var:+ }-Werror"
> +  AS_CASE([$host], [s390*-*-*],
> +  [strict_warn="$strict_warn -Wno-error=maybe-uninitialized"])])
>   m4_if($1, [manual],,
>   [AS_VAR_PUSHDEF([acx_GXXvers], [acx_cv_prog_cxx_gxx_$1_or_newer])dnl
>AC_CACHE_CHECK([whether $CXX is G++ >=$1], acx_GXXvers,
> @@ -220,7 +226,9 @@ AS_IF([test $enable_werror_always = yes],
> [AS_VAR_SET(acx_GXXvers, yes)],
> [AS_VAR_SET(acx_GXXvers, no)])])
>   AS_IF([test AS_VAR_GET(acx_GXXvers) = yes],
> -   [acx_Var="$acx_Var${acx_Var:+ }-Werror"])
> +   [acx_Var="$acx_Var${acx_Var:+ }-Werror"
> +AS_CASE([$host], [s390*-*-*],
> +[acx_Var="$acx_Var -Wno-error=maybe-uninitialized"])])
>AS_VAR_POPDEF([acx_GXXvers])])
>  m4_popdef([acx_Var])dnl
>  AC_LANG_POP(C++)
> diff --git a/fixincludes/configure b/fixincludes/configure
> index 6e2d67b655b..e0d679cc18e 100755
> --- a/fixincludes/configure
> +++ b/fixincludes/configure
> @@ -4753,7 +4753,13 @@ else
>  fi
>  
>  if test $enable_werror_always = yes; then :
> -  WERROR="$WERROR${WERROR:+ }-Werror"
> +WERROR="$WERROR${WERROR:+ }-Werror"
> +  case $host in #(
> +  s390*-*-*) :
> +WERROR="$WERROR -Wno-error=maybe-uninitialized" ;; #(
> +  *) :
> + ;;
> +esac
>  fi
>  
>  ac_ext=c
> diff --git a/gcc/configure b/gcc/configure
> index 0a09777dd42..ea03581537a 100755
> --- a/gcc/configure
> +++ b/gcc/configure
> @@ -7064,7 +7064,13 @@ else
>  fi
>  
>  if test $enable_werror_always = yes; then :
> -  strict_warn="$strict_warn${strict_warn:+ }-Werror"
> +strict_warn="$strict_warn${strict_warn:+ }-Werror"
> +  case $host in #(
> +  s390*-*-*) :
> +strict_warn="$strict_warn -Wno-error=maybe-uninitialized" ;; #(
> +  *) :
> + ;;
> +esac
>  fi
>  
>  ac_ext=cpp
> @@ -19013,7 +19019,7 @@ else
>lt_dlunknown=0; lt_dlno_uscore=1; lt_dlneed_uscore=2
>lt_status=$lt_dl

Re: [r11-3315 Regression] FAIL: g++.dg/ext/timevar2.C -std=gnu++98 (test for excess errors) on Linux/x86_64 (-m64 -march=cascadelake)

2020-09-22 Thread Marek Polacek via Gcc-patches
On Tue, Sep 22, 2020 at 09:07:36AM +0200, Martin Liška wrote:
> On 9/21/20 9:43 PM, Marek Polacek wrote:
> > Ok for trunk?  Tested by running timevar2.C a couple of dozen times.
> 
> Thank you for the fix. It seems to me obvious, I would install it.

Ok, I've pushed the patch.

Marek



Re: [PATCH 2/3] Use MiB unit when displaying memory allocation.

2020-09-22 Thread Marek Polacek via Gcc-patches
On Tue, Sep 22, 2020 at 10:30:00AM +0200, Martin Liška wrote:
> On 9/22/20 9:47 AM, Christophe Lyon wrote:
> > On Wed, 2 Sep 2020 at 15:29, Martin Liška  wrote:
> > > 
> > > On 9/1/20 4:04 PM, Jan Hubicka wrote:
> > > > > The patch is about usage of MiB in memory allocation reports.
> > > > > I see it much better readable than values displayed in KiB:
> > > > > 
> > > > > Reading object files: tramp3d-v4.o {GC released 1 MiB} {GC 19 MiB -> 
> > > > > 19 MiB} {GC 19 MiB}  {heap 12 MiB}
> > > > > Reading the symbol table:
> > > > > Merging declarations: {GC released 1 MiB madv_dontneed 0 MiB} {GC 27 
> > > > > MiB -> 27 MiB} {GC 27 MiB}  {heap 15 MiB}
> > > > > Reading summaries:  {GC 27 MiB}  {heap 15 MiB} 
> > > > >  {GC 27 MiB}  {heap 15 MiB}  {GC 27 MiB}  
> > > > > {heap 15 MiB}  {GC 27 MiB}  {heap 15 MiB}  {GC 27 MiB}  
> > > > > {heap 15 MiB}  {GC 30 MiB}  {heap 15 MiB}  {GC 
> > > > > 30 MiB}  {heap 15 MiB} {GC 30 MiB}
> > > > > Merging symbols: {heap 15 MiB}Materializing decls:
> > > > > {heap 15 MiB}  {heap 15 MiB} 
> > > > >  {heap 15 MiB}  {heap 15 MiB}  {heap 
> > > > > 15 MiB}  {heap 15 MiB}  {heap 15 MiB}  {heap 15 MiB} 
> > > > >  {heap 15 MiB}  {heap 15 MiB}  {heap 
> > > > > 15 MiB}  {GC released 1 MiB madv_dontneed 2 MiB} {GC 
> > > > > trimmed to 27 MiB, 28 MiB mapped} {heap 15 MiB}  {heap 15 
> > > > > MiB}  {heap 15 MiB}  {heap 15 MiB}
> > > > > Streaming out {GC trimmed to 27 MiB, 28 MiB mapped} {heap 15 MiB} 
> > > > > ./a.ltrans0.o ( 11257 insns) ./a.ltrans1.o ( 11293 insns) 
> > > > > ./a.ltrans2.o ( 8669 insns) ./a.ltrans3.o ( 138934 insns)
> > > > 
> > > > One problem I see here is that while it is OK for Firefox builds it is
> > > > bit too coarse for smaller testcases where the memory use is still
> > > > importnat.  I guess we may just print KBs before the large gets too
> > > > large, just like norton commander does? :)
> > > 
> > > Sure, let's do it using SIZE_AMOUNT macro.
> > > 
> > > Patch can bootstrap on x86_64-linux-gnu and survives regression tests.
> > > 
> > 
> > Hi,
> > 
> > This change is causing gcc.dg/timevar[12].C to fail randomly, eg:
> > FAIL: g++.dg/ext/timevar1.C  -std=gnu++2a (test for excess errors)
> > Excess errors:
> >   phase opt and generate :   0.00 (  0%)   0.00 (  0%)
> > 0.01 ( 50%)  8904  (  0%)
> >   callgraph construction :   0.00 (  0%)   0.00 (  0%)
> > 0.01 ( 50%)  4096  (  0%)
> > 
> > because SIZE_AMOUNT generates no suffix if the size is < 10k, and those 
> > tests
> > now use dg-prune-output "k" and dg-prune-output " 0 "
> > which is not enough.
> > 
> > Can you fix this?
> 
> Sorry for the breakage. I hope Marek has a fix that he'll install.

Should be fixed now!

Marek
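
The failure mode described above can be sketched in C++ as follows.  This
is a hypothetical format_size helper modelled on the behaviour reported in
the thread (no suffix below 10k); the 'k' and 'M' tiers and their
thresholds are assumptions and may not match the real SIZE_AMOUNT macro.

```cpp
#include <cstddef>
#include <string>

// Hypothetical stand-in for the scaling done when printing memory sizes.
static std::string format_size (std::size_t bytes)
{
  const std::size_t one_k = 1024;
  const std::size_t one_m = 1024 * one_k;

  if (bytes < 10 * one_k)
    return std::to_string (bytes);		// plain byte count, no suffix
  if (bytes < 10 * one_m)
    return std::to_string (bytes / one_k) + "k";
  return std::to_string (bytes / one_m) + "M";
}
```

With this scheme the values 8904 and 4096 from the failing output print
without any suffix, so a prune pattern that relies on a trailing "k" does
not match them.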



Re: [PATCH 2/3] Use MiB unit when displaying memory allocation.

2020-09-22 Thread Marek Polacek via Gcc-patches
On Tue, Sep 22, 2020 at 09:02:08AM -0400, Marek Polacek via Gcc-patches wrote:
> On Tue, Sep 22, 2020 at 10:30:00AM +0200, Martin Liška wrote:
> > On 9/22/20 9:47 AM, Christophe Lyon wrote:
> > > On Wed, 2 Sep 2020 at 15:29, Martin Liška  wrote:
> > > > 
> > > > On 9/1/20 4:04 PM, Jan Hubicka wrote:
> > > > > > The patch is about usage of MiB in memory allocation reports.
> > > > > > I see it much better readable than values displayed in KiB:
> > > > > > 
> > > > > > Reading object files: tramp3d-v4.o {GC released 1 MiB} {GC 19 MiB 
> > > > > > -> 19 MiB} {GC 19 MiB}  {heap 12 MiB}
> > > > > > Reading the symbol table:
> > > > > > Merging declarations: {GC released 1 MiB madv_dontneed 0 MiB} {GC 
> > > > > > 27 MiB -> 27 MiB} {GC 27 MiB}  {heap 15 MiB}
> > > > > > Reading summaries:  {GC 27 MiB}  {heap 15 MiB} 
> > > > > >  {GC 27 MiB}  {heap 15 MiB}  {GC 27 MiB}  
> > > > > > {heap 15 MiB}  {GC 27 MiB}  {heap 15 MiB}  {GC 27 MiB}  
> > > > > > {heap 15 MiB}  {GC 30 MiB}  {heap 15 MiB}  
> > > > > > {GC 30 MiB}  {heap 15 MiB} {GC 30 MiB}
> > > > > > Merging symbols: {heap 15 MiB}Materializing decls:
> > > > > > {heap 15 MiB}  {heap 15 MiB} 
> > > > > >  {heap 15 MiB}  {heap 15 MiB}  {heap 
> > > > > > 15 MiB}  {heap 15 MiB}  {heap 15 MiB}  {heap 15 
> > > > > > MiB}  {heap 15 MiB}  {heap 15 MiB}  
> > > > > > {heap 15 MiB}  {GC released 1 MiB madv_dontneed 2 
> > > > > > MiB} {GC trimmed to 27 MiB, 28 MiB mapped} {heap 15 MiB} 
> > > > > >  {heap 15 MiB}  {heap 15 MiB}  
> > > > > > {heap 15 MiB}
> > > > > > Streaming out {GC trimmed to 27 MiB, 28 MiB mapped} {heap 15 MiB} 
> > > > > > ./a.ltrans0.o ( 11257 insns) ./a.ltrans1.o ( 11293 insns) 
> > > > > > ./a.ltrans2.o ( 8669 insns) ./a.ltrans3.o ( 138934 insns)
> > > > > 
> > > > > One problem I see here is that while it is OK for Firefox builds it is
> > > > > bit too coarse for smaller testcases where the memory use is still
> > > > > important.  I guess we may just print KBs before the large gets too
> > > > > large, just like norton commander does? :)
> > > > 
> > > > Sure, let's do it using SIZE_AMOUNT macro.
> > > > 
> > > > Patch can bootstrap on x86_64-linux-gnu and survives regression tests.
> > > > 
> > > 
> > > Hi,
> > > 
> > > This change is causing gcc.dg/timevar[12].C to fail randomly, eg:
> > > FAIL: g++.dg/ext/timevar1.C  -std=gnu++2a (test for excess errors)
> > > Excess errors:
> > >   phase opt and generate :   0.00 (  0%)   0.00 (  0%)
> > > 0.01 ( 50%)  8904  (  0%)
> > >   callgraph construction :   0.00 (  0%)   0.00 (  0%)
> > > 0.01 ( 50%)  4096  (  0%)
> > > 
> > > because SIZE_AMOUNT generates no suffix if the size is < 10k, and those 
> > > tests
> > > now use dg-prune-output "k" and dg-prune-output " 0 "
> > > which is not enough.
> > > 
> > > Can you fix this?
> > 
> > Sorry for the breakage. I hope Marek has a fix that he'll install.
> 
> Should be fixed now!

Ah, timevar1.C has the same problem.  Will fix that one now too.

Marek



Re: [PATCH] c++: Ignore __sanitizer_ptr_{sub,cmp} builtin calls during constant expression evaluation [PR97145]

2020-09-22 Thread Jason Merrill via Gcc-patches

On 9/22/20 3:48 AM, Jakub Jelinek wrote:

Hi!

These two builtin calls are added during parsing, before pointer
subtractions or comparisons.  Normally they perform runtime verification
of whether the pointers point to the same object or different objects,
but during constant expression evaluation we don't really need those
builtins for anything.

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?


OK.


2020-09-22  Jakub Jelinek  

PR c++/97145
* constexpr.c (cxx_eval_builtin_function_call): Return void_node for
calls to __sanitizer_ptr_{sub,cmp} builtins.

* g++.dg/asan/pr97145.C: New test.

--- gcc/cp/constexpr.c.jj   2020-09-07 16:58:53.342195330 +0200
+++ gcc/cp/constexpr.c  2020-09-21 23:50:39.909211245 +0200
@@ -1355,6 +1355,12 @@ cxx_eval_builtin_function_call (const co
case BUILT_IN_STRSTR:
strops = 2;
strret = 1;
+   break;
+  case BUILT_IN_ASAN_POINTER_COMPARE:
+  case BUILT_IN_ASAN_POINTER_SUBTRACT:
+   /* These builtins shall be ignored during constant expression
+  evaluation.  */
+   return void_node;
default:
break;
}
--- gcc/testsuite/g++.dg/asan/pr97145.C.jj  2020-09-21 17:37:50.408562876 +0200
+++ gcc/testsuite/g++.dg/asan/pr97145.C 2020-09-21 17:38:26.961031023 +0200
@@ -0,0 +1,7 @@
+// PR c++/97145
+// { dg-do compile { target c++11 } }
+// { dg-options "-fsanitize=address,pointer-subtract,pointer-compare" }
+
+constexpr char *a = nullptr;
+constexpr auto b = a - a;
+constexpr auto c = a < a;

Jakub
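
A slightly larger variant of the testcase, as a minimal C++11 sketch of the
constant expressions involved: pointer subtraction and comparison within
one array, both evaluated at compile time.  The variable names are made up
for illustration.  With the pointer-subtract and pointer-compare sanitizers
enabled, these are the expressions the parser wraps in the sanitizer
builtin calls that the patch ignores during constant evaluation.

```cpp
// Namespace-scope constexpr array, so its address is a constant expression.
constexpr int values[4] = {10, 20, 30, 40};

constexpr const int *first = values + 1;
constexpr const int *last = values + 3;

// Both initializers must be folded by the constexpr evaluator.
constexpr auto elem_distance = last - first;
constexpr bool ordered = first < last;

static_assert (elem_distance == 2, "pointer subtraction folds to 2");
static_assert (ordered, "pointer comparison folds to true");
```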





Re: [Patch] OpenMP: Handle cpp_implicit_alias in declare-target discovery (PR96390)

2020-09-22 Thread Tobias Burnus

On 9/22/20 9:36 AM, Jakub Jelinek wrote:


On Tue, Sep 22, 2020 at 09:11:48AM +0200, Tobias Burnus wrote:

Okay – or do we find more issues?

I'm afraid so.

We will slowly converge, hopefully ;-)

Consider:
int v;
#pragma omp declare target to (v)
void foo (void) { v++; }
void bar (void) __attribute__((alias ("foo")));
#pragma omp declare target to (bar)
void baz (void) __attribute__((alias ("foo")));
void qux (void) {


etc.  – I did not convert this into a testcase. Should I?

Do you spot something more? Or is it now fine? (It passes on gcn +
xfailed on nvptx.)

Tobias
-
Mentor Graphics (Deutschland) GmbH, Arnulfstraße 201, 80634 München / Germany
Registergericht München HRB 106955, Geschäftsführer: Thomas Heurung, Alexander Walter
OpenMP: Handle cpp_implicit_alias in declare-target discovery (PR96390)

gcc/ChangeLog:

	PR middle-end/96390
	* omp-offload.c (omp_discover_declare_target_tgt_fn_r): Handle
	alias nodes.

libgomp/ChangeLog:

	PR middle-end/96390
	* testsuite/libgomp.c++/pr96390.C: New test.
	* testsuite/libgomp.c-c++-common/pr96390.c: New test.

 gcc/omp-offload.c| 38 +++---
 libgomp/testsuite/libgomp.c++/pr96390.C  | 49 
 libgomp/testsuite/libgomp.c-c++-common/pr96390.c | 26 +
 3 files changed, 107 insertions(+), 6 deletions(-)

diff --git a/gcc/omp-offload.c b/gcc/omp-offload.c
index 32c2485abd4..a1b128f 100644
--- a/gcc/omp-offload.c
+++ b/gcc/omp-offload.c
@@ -196,21 +196,47 @@ omp_declare_target_var_p (tree decl)
 static tree
 omp_discover_declare_target_tgt_fn_r (tree *tp, int *walk_subtrees, void *data)
 {
-  if (TREE_CODE (*tp) == FUNCTION_DECL
-  && !omp_declare_target_fn_p (*tp)
-  && !lookup_attribute ("omp declare target host", DECL_ATTRIBUTES (*tp)))
+  if (TREE_CODE (*tp) == FUNCTION_DECL)
 {
+  tree decl = *tp;
   tree id = get_identifier ("omp declare target");
-  if (!DECL_EXTERNAL (*tp) && DECL_SAVED_TREE (*tp))
-	((vec *) data)->safe_push (*tp);
-  DECL_ATTRIBUTES (*tp) = tree_cons (id, NULL_TREE, DECL_ATTRIBUTES (*tp));
   symtab_node *node = symtab_node::get (*tp);
   if (node != NULL)
 	{
+	  while (node->alias_target)
+	{
+	  node = node->ultimate_alias_target ();
+	  if (!omp_declare_target_fn_p (node->decl)
+		  && !lookup_attribute ("omp declare target host",
+	DECL_ATTRIBUTES (node->decl)))
+		{
+		  node->offloadable = 1;
+		  DECL_ATTRIBUTES (node->decl)
+		= tree_cons (id, NULL_TREE, DECL_ATTRIBUTES (node->decl));
+		}
+	  node = symtab_node::get (node->alias_target);
+	}
+	  node = node->ultimate_alias_target ();
+	  decl = node->decl;
+
+	  if (omp_declare_target_fn_p (decl)
+	  || lookup_attribute ("omp declare target host",
+   DECL_ATTRIBUTES (decl)))
+	return NULL_TREE;
+
 	  node->offloadable = 1;
 	  if (ENABLE_OFFLOADING)
 	g->have_offload = true;
 	}
+  else if (omp_declare_target_fn_p (decl)
+	   || lookup_attribute ("omp declare target host",
+DECL_ATTRIBUTES (decl)))
+	return NULL_TREE;
+
+  if (!DECL_EXTERNAL (decl) && DECL_SAVED_TREE (decl))
+	((vec *) data)->safe_push (decl);
+  DECL_ATTRIBUTES (decl) = tree_cons (id, NULL_TREE,
+	  DECL_ATTRIBUTES (decl));
 }
   else if (TYPE_P (*tp))
 *walk_subtrees = 0;
diff --git a/libgomp/testsuite/libgomp.c++/pr96390.C b/libgomp/testsuite/libgomp.c++/pr96390.C
new file mode 100644
index 000..8c770ecb80c
--- /dev/null
+++ b/libgomp/testsuite/libgomp.c++/pr96390.C
@@ -0,0 +1,49 @@
+/* { dg-additional-options "-O0 -fdump-tree-omplower" } */
+/* { dg-xfail-if "PR 97106/PR 97102 - .alias not (yet) supported for nvptx" { offload_target_nvptx } } */
+
+#include 
+#include 
+
+template struct V {
+  int version_called;
+
+  template::type>
+  V ()
+  {
+version_called = 1;
+  }
+
+  template::type>::value)>::type>
+  V (TArg0)
+  {
+version_called = 2;
+  }
+};
+
+template struct S {
+  V v;
+};
+
+int
+main ()
+{
+  int version_set[2] = {-1, -1};
+
+#pragma omp target map(from: version_set[0:2])
+  {
+S<0> s;
+version_set[0] = s.v.version_called;
+V<1> v2((unsigned long) 1);
+version_set[1] = v2.version_called;
+  }
+
+  if (version_set[0] != 1 || version_set[1] != 2)
+abort ();
+  return 0;
+}
+
+/* "3" for S<0>::S, V<0>::V<>, and V<1>::V:  */
+/* { dg-final { scan-tree-dump-times "__attribute__..omp declare target" 3 "omplower" } } */
diff --git a/libgomp/testsuite/libgomp.c-c++-common/pr96390.c b/libgomp/testsuite/libgomp.c-c++-common/pr96390.c
new file mode 100644
index 000..692bd730069
--- /dev/null
+++ b/libgomp/testsuite/libgomp.c-c++-common/pr96390.c
@@ -0,0 +1,26 @@
+/* { dg-additional-options "-O0 -fdump-tree-omplower" } */
+/* { dg-xfail-if "PR 97102/PR 97106 - .alias not (yet) supported for nvptx" { offload_target_nvptx } } */
+
+#ifdef __cplusplus
+extern "C" {
+#endif
+
+int foo () { return 42; }
+int bar () __attribute__((alias ("foo"

[committed, OG10] dwarf: Multi-register CFI address support

2020-09-22 Thread Andrew Stubbs

On 03/09/2020 16:29, Andrew Stubbs wrote:
OK to commit? (Although, I'll hold off until AMD release the 
compatible GDB.)


The ROCm 3.8 ROCGDB is now released. I'm committing the attached patches 
to devel/omp/gcc-10 while I wait for review.


The first patch is the multi-register CFI support, and the second is the 
amdgcn DWARF configuration patch that relies on the first.


Andrew
amdgcn: CFI configuration

The necessary adjustments to support CFI in ROCGDB (ROCm 3.8+).

The -fomit-frame-pointer option is now enabled by default only at -O3 and
above (previously -O1 and above), because the frame pointer has now become
useful.  Otherwise the only change in output is in the debug info.

gcc/

	* common/config/gcn/gcn-common.c
	(gcn_option_optimization_table): Change OPT_fomit_frame_pointer to -O3.
	* config/gcn/gcn.c (move_callee_saved_registers): Emit CFI notes for
	prologue register saves.
	(gcn_expand_prologue): Prefer the frame pointer when emitting CFI.
	(gcn_frame_pointer_rqd): New function.
	(gcn_debug_unwind_info): Use UI_DWARF2.
	(gcn_dwarf_register_number): Map DWARF_LINK_REGISTER to DWARF PC.
	(gcn_dwarf_register_span): DWARF_LINK_REGISTER doesn't span.
	(TARGET_FRAME_POINTER_REQUIRED): Define new hook.
	* config/gcn/gcn.h (DWARF_FRAME_RETURN_COLUMN): New define.
	(DWARF_LINK_REGISTER): New define.
	(FIRST_PSEUDO_REGISTER): Increment.
	(FIXED_REGISTERS): Add entry for DWARF_LINK_REGISTER.
	(CALL_USED_REGISTERS): Likewise.
	(REGISTER_NAMES): Likewise.

diff --git a/gcc/common/config/gcn/gcn-common.c b/gcc/common/config/gcn/gcn-common.c
index 9642f9cc5a6..6d10cc9d63f 100644
--- a/gcc/common/config/gcn/gcn-common.c
+++ b/gcc/common/config/gcn/gcn-common.c
@@ -27,7 +27,7 @@
 /* Set default optimization options.  */
 static const struct default_options gcn_option_optimization_table[] =
   {
-{ OPT_LEVELS_1_PLUS, OPT_fomit_frame_pointer, NULL, 1 },
+{ OPT_LEVELS_3_PLUS, OPT_fomit_frame_pointer, NULL, 1 },
 { OPT_LEVELS_NONE, 0, NULL, 0 }
   };
 
diff --git a/gcc/config/gcn/gcn.c b/gcc/config/gcn/gcn.c
index d78a52fd8c9..63c14b648bf 100644
--- a/gcc/config/gcn/gcn.c
+++ b/gcc/config/gcn/gcn.c
@@ -2648,6 +2648,7 @@ move_callee_saved_registers (rtx sp, machine_function *offsets,
   rtx as = gen_rtx_CONST_INT (VOIDmode, STACK_ADDR_SPACE);
   HOST_WIDE_INT exec_set = 0;
   int offreg_set = 0;
+  auto_vec saved_sgprs;
 
   start_sequence ();
 
@@ -2664,7 +2665,10 @@ move_callee_saved_registers (rtx sp, machine_function *offsets,
 	int lane = saved_scalars % 64;
 
 	if (prologue)
-	  emit_insn (gen_vec_setv64si (vreg, reg, GEN_INT (lane)));
+	  {
+	emit_insn (gen_vec_setv64si (vreg, reg, GEN_INT (lane)));
+	saved_sgprs.safe_push (regno);
+	  }
 	else
 	  emit_insn (gen_vec_extractv64sisi (reg, vreg, GEN_INT (lane)));
 
@@ -2697,7 +2701,7 @@ move_callee_saved_registers (rtx sp, machine_function *offsets,
   gcn_gen_undef (V64SImode), exec));
 
   /* Move vectors.  */
-  for (regno = FIRST_VGPR_REG, offset = offsets->pretend_size;
+  for (regno = FIRST_VGPR_REG, offset = 0;
regno < FIRST_PSEUDO_REGISTER; regno++)
 if ((df_regs_ever_live_p (regno) && !call_used_or_fixed_reg_p (regno))
 	|| (regno == VGPR_REGNO (6) && saved_scalars > 0)
@@ -2718,8 +2722,67 @@ move_callee_saved_registers (rtx sp, machine_function *offsets,
 	  }
 
 	if (prologue)
-	  emit_insn (gen_scatterv64si_insn_1offset_exec (vsp, const0_rtx, reg,
-			 as, const0_rtx, exec));
+	  {
+	rtx insn = emit_insn (gen_scatterv64si_insn_1offset_exec
+  (vsp, const0_rtx, reg, as, const0_rtx,
+   exec));
+
+	/* Add CFI metadata.  */
+	rtx note;
+	if (regno == VGPR_REGNO (6) || regno == VGPR_REGNO (7))
+	  {
+		int start = (regno == VGPR_REGNO (7) ? 64 : 0);
+		int count = MIN (saved_scalars - start, 64);
+		int add_lr = (regno == VGPR_REGNO (6)
+			  && df_regs_ever_live_p (LINK_REGNUM));
+		int lrdest = -1;
+		rtvec seq = rtvec_alloc (count + add_lr);
+
+		/* Add an REG_FRAME_RELATED_EXPR entry for each scalar
+		   register that was saved in this batch.  */
+		for (int idx = 0; idx < count; idx++)
+		  {
+		int stackaddr = offset + idx * 4;
+		rtx dest = gen_rtx_MEM (SImode,
+	gen_rtx_PLUS
+	(DImode, sp,
+	 GEN_INT (stackaddr)));
+		rtx src = gen_rtx_REG (SImode, saved_sgprs[start + idx]);
+		rtx set = gen_rtx_SET (dest, src);
+		RTX_FRAME_RELATED_P (set) = 1;
+		RTVEC_ELT (seq, idx) = set;
+
+		if (saved_sgprs[start + idx] == LINK_REGNUM)
+		  lrdest = stackaddr;
+		  }
+
+		/* Add an additional expression for DWARF_LINK_REGISTER if
+		   LINK_REGNUM was saved.  */
+		if (lrdest != -1)
+		  {
+		rtx dest = gen_rtx_MEM (DImode,
+	gen_rtx_PLUS
+	(DImode, sp,
+	 GEN_INT (lrdest)));
+		rtx src = gen_rtx_REG (DImode, DWARF_LINK_REGISTER);
+		rtx set = gen_rtx_SET (dest, src);
+		RTX_FRAME_RELATED_P (set) = 1;
+		RTVEC_ELT (seq, count) = set;
+		  }
+
+		note = gen_rtx_SEQUENCE (VOIDmode, seq);
+	  }
+	else
+	  {
+		rtx dest = gen_rtx_MEM 

Re: [Patch] OpenMP: Handle cpp_implicit_alias in declare-target discovery (PR96390)

2020-09-22 Thread Jakub Jelinek via Gcc-patches
On Tue, Sep 22, 2020 at 04:11:19PM +0200, Tobias Burnus wrote:
> +   while (node->alias_target)
> + {
> +   node = node->ultimate_alias_target ();

At least in theory, ultimate_alias_target can look through multiple aliases.
While it might not do that most of the time because this is executed quite
early, I think we have no guarantees it will never do it.
So I'd prefer what you had in the earlier patch, i.e. do the
ultimate_alias_target call + loop to find the ultimate node, and then
in another loop go from the original node (inclusive) up to the ultimate one
(exclusive) and do what you do in this loop now.
Does that make sense?

> +   if (!omp_declare_target_fn_p (node->decl)
> +   && !lookup_attribute ("omp declare target host",
> + DECL_ATTRIBUTES (node->decl)))
> + {
> +   node->offloadable = 1;
> +   DECL_ATTRIBUTES (node->decl)
> + = tree_cons (id, NULL_TREE, DECL_ATTRIBUTES (node->decl));
> + }
> +   node = symtab_node::get (node->alias_target);

Jakub
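
A minimal, self-contained sketch of the two-loop structure suggested above:
first find the ultimate node, then walk from the original node (inclusive)
to the ultimate one (exclusive) and mark each node on the way.  The Node
struct and the names alias_of, find_ultimate and mark_alias_chain are
hypothetical stand-ins for illustration only; this is not GCC's symtab_node
API and not the code from the patch.

```cpp
#include <cassert>

// Toy stand-in for a symbol-table node; this is NOT GCC's symtab_node.
struct Node {
  Node *alias_of = nullptr;  // next node in the alias chain, null if none
  bool marked = false;       // models the "omp declare target" attribute
  bool offloadable = false;
};

// First loop: follow the alias chain to its ultimate (non-alias) node.
static Node *find_ultimate(Node *node) {
  assert(node != nullptr);
  while (node->alias_of != nullptr)
    node = node->alias_of;
  return node;
}

// Second loop: mark every node from the original one (inclusive) up to
// the ultimate one (exclusive).  Returns the ultimate node so the caller
// can decide separately what to do with it.
static Node *mark_alias_chain(Node *orig) {
  Node *ultimate = find_ultimate(orig);
  for (Node *n = orig; n != ultimate; n = n->alias_of)
    if (!n->marked) {
      n->marked = true;
      n->offloadable = true;
    }
  return ultimate;
}
```

With a chain baz -> bar -> foo, calling mark_alias_chain on baz marks baz
and bar and returns foo unmarked, which mirrors the "inclusive ... exclusive"
wording above.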



[PATCH] vect: Fix epilogue loop handling of partial vectors

2020-09-22 Thread Richard Sandiford
Richard Sandiford  writes:
> I'll try to have a patch ready tomorrow morning European time.

Well, I totally failed to hit that deadline.  When testing on Power,
I saw a couple of extra failures, but I now think they're improvements
rather than regressions.  See the point about single-iteration
epilogues below.

---

This patch fixes the fallout that Kewen reported on Power after
the recent change to avoid unnecessary use of partial vectors.
As Kewen said, the problem is that vect_analyze_loop_2 doesn't
know how many epilogue iterations there will be, and so it
cannot make a final decision about whether the number of
iterations forces an epilogue loop to use partial vectors.

This is similar to the current situation for peeling: we don't know
during initial analysis whether an epilogue loop will itself require
peeling.  Instead we decide that during vect_do_peeling, where the
final number of epilogue loop iterations is known.

The patch takes a similar approach for the decision about whether
to use partial vectors.  As the comments in the patch say, the
idea is that vect_analyze_loop_2 should make peeling and partial-
vector decisions based on the assumption that the loop_vinfo will
be used as the main loop, while vect_do_peeling should make them
in the knowledge that the loop_vinfo will be used as an epilogue loop.

This allows the same analysis to be used for both cases, which we
rely on for implementing VECT_COMPARE_COSTS; see the big comment
in vect_analyze_loop for details.

I hope the patch makes the (mostly preexisting) structure a bit
more obvious.  It isn't what anyone would design from scratch,
but that's the nature of working with a mature vector framework.

Arranging things this way means that vect_verify_full_masking
and vect_verify_loop_lens now become part of the “can” rather
than “will” test for partial vectors.

Also, while splitting out the logic that handles epilogues with
constant iterations, I added a check to make sure that we don't
try to use partial vectors to vectorise a single-scalar loop.
This required some changes to the Power tests.
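
As a rough illustration of the single-iteration check, here is a toy model
that assumes a known constant iteration count and ignores peeling for
alignment or gaps.  The names epilogue_scalar_iters and
consider_vector_epilogue are hypothetical and are not functions from the
patch.

```cpp
#include <cassert>

// Toy model: scalar iterations left for the epilogue when the main loop
// consumes `main_vf` scalar iterations per vector iteration.
static unsigned long epilogue_scalar_iters(unsigned long niters,
                                           unsigned long main_vf) {
  assert(main_vf > 0);
  return niters % main_vf;
}

// In this model a vector epilogue is only considered when more than one
// scalar iteration is left; zero means no epilogue is needed at all, and
// one means the epilogue is a single scalar iteration.
static bool consider_vector_epilogue(unsigned long niters,
                                     unsigned long main_vf) {
  return epilogue_scalar_iters(niters, main_vf) > 1;
}
```

For example, 17 iterations with a main vectorization factor of 16 leave a
single scalar iteration, so this model would not try to vectorise the
epilogue.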

Tested on aarch64-linux-gnu, arm-linux-gnueabi, x86_64-linux-gnu and
powerpc64le-linux-gnu.  OK to install?

Richard


gcc/
* tree-vectorizer.h (determine_peel_for_niter): Delete in favor of...
(vect_determine_partial_vectors_and_peeling): ...this new function.
* tree-vect-loop-manip.c (vect_update_epilogue_niters): New function.
Reject using vector epilogue loops for single iterations.  Install
the constant number of epilogue loop iterations in the associated
loop_vinfo.  Rely on vect_determine_partial_vectors_and_peeling
to do the main part of the test.
(vect_do_peeling): Use vect_update_epilogue_niters to handle
epilogue loops with a known number of iterations.  Skip recomputing
the number of iterations later in that case.  Otherwise, use
vect_determine_partial_vectors_and_peeling to decide whether the
epilogue loop needs to use partial vectors or peeling.
* tree-vect-loop.c (_loop_vec_info::_loop_vec_info): Set the
default can_use_partial_vectors_p to false if partial-vector-usage=0.
(determine_peel_for_niter): Remove in favor of...
(vect_determine_partial_vectors_and_peeling): ...this new function,
split out from...
(vect_analyze_loop_2): ...here.  Reflect the vect_verify_full_masking
and vect_verify_loop_lens results in CAN_USE_PARTIAL_VECTORS_P
rather than USING_PARTIAL_VECTORS_P.

gcc/testsuite/
* gcc.target/powerpc/p9-vec-length-epil-1.c: Do not expect the
single-iteration epilogues of the 64-bit loops to be vectorized.
* gcc.target/powerpc/p9-vec-length-epil-7.c: Likewise.
* gcc.target/powerpc/p9-vec-length-epil-8.c: Likewise.
---
 .../gcc.target/powerpc/p9-vec-length-epil-1.c |   4 +-
 .../gcc.target/powerpc/p9-vec-length-epil-7.c |   2 +-
 .../gcc.target/powerpc/p9-vec-length-epil-8.c |   4 +-
 gcc/tree-vect-loop-manip.c|  83 +---
 gcc/tree-vect-loop.c  | 196 --
 gcc/tree-vectorizer.h |   3 +-
 6 files changed, 192 insertions(+), 100 deletions(-)

diff --git a/gcc/tree-vectorizer.h b/gcc/tree-vectorizer.h
index 9dffc5570e5..b7fa6bc8d2f 100644
--- a/gcc/tree-vectorizer.h
+++ b/gcc/tree-vectorizer.h
@@ -1967,7 +1967,8 @@ extern tree vect_create_addr_base_for_vector_ref (vec_info *,
 extern widest_int vect_iv_limit_for_partial_vectors (loop_vec_info loop_vinfo);
 bool vect_rgroup_iv_might_wrap_p (loop_vec_info, rgroup_controls *);
 /* Used in tree-vect-loop-manip.c */
-extern void determine_peel_for_niter (loop_vec_info);
+extern opt_result vect_determine_partial_vectors_and_peeling (loop_vec_info,
+ bool);
 /* Used in gimple-loop-interchange.c and tree-parloops.c.  */
 extern bool check_reduction_path (d

[committed] libstdc++: Introduce new headers for C++20 ranges components

2020-09-22 Thread Jonathan Wakely via Gcc-patches
This introduces two new headers:

<bits/ranges_base.h> defines the minimal components needed
for using C++20 ranges (customization point objects such as
std::ranges::begin, concepts such as std::ranges::range, etc.)

<bits/ranges_util.h> includes <bits/ranges_base.h> and additionally
defines subrange, which is needed by <bits/ranges_algo.h>.

Most of the content of <bits/ranges_base.h> was previously defined in
<bits/range_access.h>, but a few pieces were only defined in <ranges>.
This meant the entire <ranges> header was needed in <span> and
<string_view>, even though they don't use all the range adaptors.

By moving the ranges components out of <bits/range_access.h> that file
is left defining just the contents of [iterator.range] i.e. std::begin,
std::end, std::size etc. and not C++20 ranges components.

For consistency with other C++20 ranges headers, <bits/range_cmp.h> is
renamed to <bits/ranges_cmp.h>.

libstdc++-v3/ChangeLog:

* include/Makefile.am: Add new headers and adjust for renamed
header.
* include/Makefile.in: Regenerate.
* include/bits/iterator_concepts.h: Adjust for renamed header.
* include/bits/range_access.h (ranges::*): Move to new
 header.
* include/bits/ranges_algobase.h: Include new 
header instead of .
* include/bits/ranges_algo.h: Include new 
header.
* include/bits/range_cmp.h: Moved to...
* include/bits/ranges_cmp.h: ...here.
* include/bits/ranges_base.h: New header.
* include/bits/ranges_util.h: New header.
* include/experimental/string_view: Include new
 header.
* include/std/functional: Adjust for renamed header.
* include/std/ranges (ranges::view_base, ranges::enable_view)
(ranges::dangling, ranges::borrowed_iterator_t): Move to new
 header.
(ranges::view_interface, ranges::subrange)
(ranges::borrowed_subrange_t): Move to new 
header.
* include/std/span: Include new  header.
* include/std/string_view: Likewise.
* testsuite/24_iterators/back_insert_iterator/pr93884.cc: Add
missing  header.
* testsuite/24_iterators/front_insert_iterator/pr93884.cc:
Likewise.

Tested powerpc64le-linux. Committed to trunk.

commit 160061ac10f9143d9698daac5f7e46b5a615825c
Author: Jonathan Wakely 
Date:   Tue Sep 22 15:45:54 2020

libstdc++: Introduce new headers for C++20 ranges components

This introduces two new headers:

 defines the minimal components needed
for using C++20 ranges (customization point objects such as
std::ranges::begin, concepts such as std::ranges::range, etc.)

 includes  and additionally
defines subrange, which is needed by .

Most of the content of  was previously defined in
, but a few pieces were only defined in .
This meant the entire  header was needed in  and
, even though they don't use all the range adaptors.

By moving the ranges components out of  that file
is left defining just the contents of [iterator.range] i.e. std::begin,
std::end, std::size etc. and not C++20 ranges components.

For consistency with other C++20 ranges headers,  is
renamed to .

libstdc++-v3/ChangeLog:

* include/Makefile.am: Add new headers and adjust for renamed
header.
* include/Makefile.in: Regenerate.
* include/bits/iterator_concepts.h: Adjust for renamed header.
* include/bits/range_access.h (ranges::*): Move to new
 header.
* include/bits/ranges_algobase.h: Include new 
header instead of .
* include/bits/ranges_algo.h: Include new 
header.
* include/bits/range_cmp.h: Moved to...
* include/bits/ranges_cmp.h: ...here.
* include/bits/ranges_base.h: New header.
* include/bits/ranges_util.h: New header.
* include/experimental/string_view: Include new
 header.
* include/std/functional: Adjust for renamed header.
* include/std/ranges (ranges::view_base, ranges::enable_view)
(ranges::dangling, ranges::borrowed_iterator_t): Move to new
 header.
(ranges::view_interface, ranges::subrange)
(ranges::borrowed_subrange_t): Move to new 
header.
* include/std/span: Include new  header.
* include/std/string_view: Likewise.
* testsuite/24_iterators/back_insert_iterator/pr93884.cc: Add
missing  header.
* testsuite/24_iterators/front_insert_iterator/pr93884.cc:
Likewise.

diff --git a/libstdc++-v3/include/Makefile.am b/libstdc++-v3/include/Makefile.am
index c9df9a9d6c6..28d273924ee 100644
--- a/libstdc++-v3/include/Makefile.am
+++ b/libstdc++-v3/include/Makefile.am
@@ -158,10 +158,12 @@ bits_headers = \
${bits_srcdir}/random.h \
${bits_srcdir}/random.tcc \
${bits_srcdir}/range_access.h \
-   ${bits_srcdir}/range_cmp.h \
${bits_srcdir}/ranges_algobase.h \
${bits_srcdir}/ranges_algo.h \
+   ${bits_srcdi

[PATCH] aarch64: Add HF routines to libgcc_s.so

2020-09-22 Thread Richard Sandiford
The libgcc HF support routines were being linked into libgcc_s.so,
but weren't being exported.

Tested on aarch64-linux-gnu and aarch64_be-elf.  Any thoughts?
I'll apply Monday next week if there are no objections by then.

I guess there's the question whether we should backport this
to release branches.  It seems a bit dangerous to change the
ABI there though, even as a pure extension.  It would also
make the choice of symbol version less clear-cut.

Richard


libgcc/
* config/aarch64/libgcc-softfp.ver: New file.
* config/aarch64/t-softfp (SHLIB_MAPFILES): Add it.
---
 libgcc/config/aarch64/libgcc-softfp.ver | 28 +
 libgcc/config/aarch64/t-softfp  |  1 +
 2 files changed, 29 insertions(+)
 create mode 100644 libgcc/config/aarch64/libgcc-softfp.ver

diff --git a/libgcc/config/aarch64/libgcc-softfp.ver b/libgcc/config/aarch64/libgcc-softfp.ver
new file mode 100644
index 000..b51a3dedb3f
--- /dev/null
+++ b/libgcc/config/aarch64/libgcc-softfp.ver
@@ -0,0 +1,28 @@
+# Copyright (C) 2020 Free Software Foundation, Inc.
+#
+# This file is part of GCC.
+#
+# GCC is free software; you can redistribute it and/or modify
+# it under the terms of the GNU General Public License as published by
+# the Free Software Foundation; either version 3, or (at your option)
+# any later version.
+#
+# GCC is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+# GNU General Public License for more details.
+#
+# You should have received a copy of the GNU General Public License
+# along with GCC; see the file COPYING3.  If not see
+# <http://www.gnu.org/licenses/>.
+
+GCC_11.0 {
+  __divhc3
+  __extendhftf2
+  __fixhfti
+  __fixunshfti
+  __floattihf
+  __floatuntihf
+  __mulhc3
+  __trunctfhf2
+}
diff --git a/libgcc/config/aarch64/t-softfp b/libgcc/config/aarch64/t-softfp
index c4ce0dc0097..981ced7444f 100644
--- a/libgcc/config/aarch64/t-softfp
+++ b/libgcc/config/aarch64/t-softfp
@@ -8,3 +8,4 @@ softfp_extras := fixhfti fixunshfti floattihf floatuntihf
 TARGET_LIBGCC2_CFLAGS += -Wno-missing-prototypes
 
 LIB2ADD += $(srcdir)/config/aarch64/sfp-exceptions.c
+SHLIB_MAPFILES += $(srcdir)/config/aarch64/libgcc-softfp.ver


Re: [PATCH] aarch64: Add extend-as-extract-with-shift pattern [PR96998]

2020-09-22 Thread Segher Boessenkool
Hi Alex,

On Tue, Sep 22, 2020 at 08:40:07AM +0100, Alex Coplan wrote:
> On 21/09/2020 18:35, Segher Boessenkool wrote:
> Thanks for doing this testing. The results look good, then: no code size
> changes and no build regressions.

No *code* changes.  I cannot test aarch64 like this.

> > So, there is no difference for most targets (I checked some targets and
> > there really is no difference).  The only exception is aarch64 (which
> > the kernel calls "arm64"): the unpatched compiler ICEs!  (At least three
> > times, even).
> 
> Indeed, this is the intended purpose of the patch, see the PR (96998).

You want to fix an ICE in LRA caused by an instruction created by LRA,
with a patch to combine?!  That doesn't sound right.

If what you want to do is a) fix the backend bug, and then b) get some
extra performance, then do *that*, and keep the patches separate.


> > Can you fix this first?  There probably is something target-specific
> > wrong related to zero_extract.
> 
> The intent is to fix this in combine here. See the earlier replies in
> this thread.

But that is logically impossible.  The infringing insn does not *exist*
yet during combine.


Segher


Re: [Patch] OpenMP: Handle cpp_implicit_alias in declare-target discovery (PR96390)

2020-09-22 Thread Tobias Burnus

On 9/22/20 4:24 PM, Jakub Jelinek wrote:

On Tue, Sep 22, 2020 at 04:11:19PM +0200, Tobias Burnus wrote:

+  while (node->alias_target)
+{
+  node = node->ultimate_alias_target ();

At least in theory, ultimate_alias_target can look through multiple aliases.


Granted. But we need to handle two things:
* target aliases (such as for __attribute__((alias(...)))):
  for these, I can walk 'node = node->alias_target';
  here, I need to mark all nodes on the way.
* C++ aliases:
  for these, I used node = node->ultimate_alias_target ();
  here, I only need to mark the last one, as only
  that function decl is streamed out.


While it might not do that most of the time because this is executed quite
early, I think we have no guarantees it will never do it.
So I'd prefer what you had in the earlier patch, i.e. do the
ultimate_alias_target call + loop to find the ultimate node, and then
in another loop go from the original node (inclusive) up to the ultimate one
(exclusive) and do what you do in this loop now.
Does that make sense?


I am lost. What do I gain by running the loops twice? Initially
the idea was to return NULL_TREE, but as your example showed, we need
to mark all unmarked ones up to the final node.

Thus, we have to mark all – even if the final one is already
'omp declare target'.

But in that case, why can't we do it in a single loop?



If we assume that there is no C++ aliasing, we could do:

  while (node->alias_target)
{
  // see assumption above: // node = node->ultimate_alias_target ();
  if (!omp_declare_target_fn_p (node->decl)
  && !lookup_attribute ("omp declare target host",
DECL_ATTRIBUTES (node->decl)))
{
  node->offloadable = 1;
  DECL_ATTRIBUTES (node->decl)
= tree_cons (id, NULL_TREE, DECL_ATTRIBUTES (node->decl));
}
  node = symtab_node::get (node->alias_target);
}
  // all node->alias_target resolved
  node = node->ultimate_alias_target ();

That would avoid calling ultimate_alias_target() in between, but it is still
called if there is no alias_target, or for the final alias target.

Is this (really) better?

BTW: If the assumption about the ordering ever changes completely,
the current __attribute__((alias(…))) testcase will fail; this might
not catch every issue, but it will at least catch a complete change.

Tobias

-
Mentor Graphics (Deutschland) GmbH, Arnulfstraße 201, 80634 München / Germany
Registergericht München HRB 106955, Geschäftsführer: Thomas Heurung, Alexander Walter


Re: [RFC] update COUNTs of BB in loop.

2020-09-22 Thread Segher Boessenkool
Hi!

(Just some random things; the change itself looks fine afaics.)

On Tue, Sep 22, 2020 at 01:37:47PM +0800, guojiufu wrote:
> While after pcom:
> 
>COUNT:268435456  pre-header
> |
> |  ..
> |  ||
> V  v|
>COUNT:268435456|
>/ \  |
>50%/   \ |
>  / \|
> v   v   |
> COUNT:134217728  COUNT:134217728  | 
> exit-edge |   latch |
>   ._.
> 
> COUNT != COUNT + COUNT

And  COUNT != COUNT  which does not make any sense.

> +/* Recacluate the COUNTs of BBs in LOOP, if the probility of exit edge
> +   is NEW_EXIT_P.  */

"Recalculate", "probability".  I would say "where the probability of the
single exit edge is NEW_EXIT_P".  But, _P means this is a predicate, and
it isn't (only functions can be predicates, by definition).
"new_exit_prob" maybe?

> +  /* Update BB counts in loop body.
> + COUNT = COUNT
> + COUNT = COUNT * exit_edge_probility
> + The COUNT=COUNT * old_exit_p / new_exit_p.  */

Spaces around the "=" please?


Segher
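
A small sketch of the flow-conservation relation that the counts in the
diagram would be expected to satisfy, assuming the header count should equal
the pre-header count plus the latch count, and that the loop has a single
exit edge taken with a fixed probability.  LoopCounts and consistent_counts
are hypothetical names for illustration; they are not part of the patch, and
plain doubles stand in for GCC's profile counts.

```cpp
#include <cassert>

struct LoopCounts {
  double header;  // executions of the loop header
  double latch;   // times the back edge (latch) is taken
  double exit;    // times the exit edge is taken
};

// Given the pre-header count and the probability of leaving the loop from
// the header, compute counts for which header == preheader + latch holds.
static LoopCounts consistent_counts(double preheader, double exit_prob) {
  assert(exit_prob > 0.0 && exit_prob <= 1.0);
  LoopCounts c;
  c.header = preheader / exit_prob;  // every entry must eventually exit
  c.exit = c.header * exit_prob;     // equals the pre-header count
  c.latch = c.header - c.exit;       // the rest goes around the back edge
  return c;
}
```

With the numbers from the diagram (pre-header 268435456, exit probability
50%), a consistent header count under these assumptions would be 536870912,
with 268435456 on each of the latch and exit edges.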


c++: fix injected friend of template class

2020-09-22 Thread Nathan Sidwell


In working on fixing hiddenness, I discovered some suspicious code in
template instantiation.  I suspect it dates from when we didn't do the
hidden friend injection thing at all.  The xref_tag_from_type call finds
the same class, but makes it visible to name lookup, which is wrong.
Hurrah, fixing a bug by deleting code!

gcc/cp/
* pt.c (instantiate_class_template_1): Do not repush and unhide
injected friend.
gcc/testsuite/
* g++.old-deja/g++.pt/friend34.C: Check injected friend is still
invisible.

pushed to trunk

nathan

--
Nathan Sidwell
diff --git i/gcc/cp/pt.c w/gcc/cp/pt.c
index 97d0c245f7e..44ca14afc4e 100644
--- i/gcc/cp/pt.c
+++ w/gcc/cp/pt.c
@@ -12030,25 +12030,6 @@ instantiate_class_template_1 (tree type)
 		adjust_processing_template_decl = true;
 		  --processing_template_decl;
 		}
-	  else if (TREE_CODE (friend_type) != BOUND_TEMPLATE_TEMPLATE_PARM
-		   && !CLASSTYPE_USE_TEMPLATE (friend_type)
-		   && TYPE_HIDDEN_P (friend_type))
-		{
-		  /* friend class C;
-
-		 where C hasn't been declared yet.  Let's lookup name
-		 from namespace scope directly, bypassing any name that
-		 come from dependent base class.  */
-		  tree ns = decl_namespace_context (TYPE_MAIN_DECL (friend_type));
-
-		  /* The call to xref_tag_from_type does injection for friend
-		 classes.  */
-		  push_nested_namespace (ns);
-		  friend_type =
-		xref_tag_from_type (friend_type, NULL_TREE,
-	/*tag_scope=*/ts_current);
-		  pop_nested_namespace (ns);
-		}
 	  else if (uses_template_parms (friend_type))
 		/* friend class C;  */
 		friend_type = tsubst (friend_type, args,
diff --git i/gcc/testsuite/g++.old-deja/g++.pt/friend34.C w/gcc/testsuite/g++.old-deja/g++.pt/friend34.C
index 5e80ab98b2e..dcd6df0ce55 100644
--- i/gcc/testsuite/g++.old-deja/g++.pt/friend34.C
+++ w/gcc/testsuite/g++.old-deja/g++.pt/friend34.C
@@ -6,9 +6,12 @@
 template 
 class bar {
 public:
-  friend class foo; // this is not bar::foo, it forward-declares ::foo
+  friend class foo; // this is not bar::foo, it injects hidden ::foo
   class foo {};
   bar() { foo(); } // but this should refer to bar::foo
 };
 
 bar<> baz;
+
+// We still have not made foo visible.
+foo *b;  // { dg-error "does not name a type" }


Re: [PATCH] aarch64: Add extend-as-extract-with-shift pattern [PR96998]

2020-09-22 Thread Richard Sandiford
Segher Boessenkool  writes:
> Hi Alex,
>
> On Tue, Sep 22, 2020 at 08:40:07AM +0100, Alex Coplan wrote:
>> On 21/09/2020 18:35, Segher Boessenkool wrote:
>> Thanks for doing this testing. The results look good, then: no code size
>> changes and no build regressions.
>
> No *code* changes.  I cannot test aarch64 likme this.
>
>> > So, there is no difference for most targets (I checked some targets and
>> > there really is no difference).  The only exception is aarch64 (which
>> > the kernel calls "arm64"): the unpatched compiler ICEs!  (At least three
>> > times, even).
>> 
>> Indeed, this is the intended purpose of the patch, see the PR (96998).
>
> You want to fix a ICE in LRA caused by an instruction created by LRA,
> with a patch to combine?!  That doesn't sound right.
>
> If what you want to do is a) fix the backend bug, and then b) get some
> extra performance, then do *that*, and keep the patches separate.

This patch isn't supposed to be a performance optimisation.  It's supposed
to be a canonicalisation improvement.

The situation as things stand is that aarch64 has a bug: it accepts
an odd sign_extract representation of addresses, but doesn't accept
that same odd form of address as an LEA.  We have two options:

(a) add back instructions that recognise the odd form of LEA, or
(b) remove the code that accepts the odd addresses

I think (b) is the way to go here.  But doing that on its own
would regress code quality.  The reason we recognised the odd
addresses in the first place was because that was the rtl that
combine happened to generate for an important case.

Normal operating procedure is to add patterns and address-matching
code that accepts whatever combine happens to throw at the target,
regardless of how sensible the representation is.  But sometimes I think
we should instead think about whether the representation that combine is
using is the right one.  And IMO this is one such case.

At the moment combine does this:

Trying 8 -> 9:
8: r98:DI=sign_extend(r92:SI)
  REG_DEAD r92:SI
9: [r98:DI*0x4+r96:DI]=asm_operands
  REG_DEAD r98:DI
Failed to match this instruction:
(set (mem:SI (plus:DI (sign_extract:DI (mult:DI (subreg:DI (reg/v:SI 92 [ g ]) 0)
(const_int 4 [0x4]))
(const_int 34 [0x22])
(const_int 0 [0]))
(reg/f:DI 96)) [3 *i_5+0 S4 A32])
(asm_operands:SI ("") ("=Q") 0 []
 []
 [] /tmp/foo.c:13))
allowing combination of insns 8 and 9
original costs 4 + 4 = 8
replacement cost 4

and so that's one of the forms that the aarch64 address code accepts.
But a natural substitution would simply replace r98 with the rhs of
the set:

  (set (mem:SI (plus:DI (mult:DI (sign_extend:DI (reg/v:SI 92))
 (const_int 4))
(reg:DI 96)))
   ...)

The only reason we don't do that is because the substitution
and simplification go through the expand_compound_operation/
make_compound_operation process.

The corresponding (ashift ... (const_int 2)) *does* end up using
the natural sign_extend form rather than sign_extract form.
The only reason we get this (IMO) weird behaviour for mult is
the rule that shifts have to be mults in addresses.  Some code
(like the code being patched) instead expects ashift to be the
canonical form in all situations.

If we make the substitution work “naturally” for mult as well as
ashift, we can remove the addressing-matching code that has no
corresponding LEA pattern, and make the aarch64 address code
self-consistent that way instead.
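
To make the equivalence of the two address forms concrete, here is a small
numeric sketch.  It models the sign_extract form (take the low 34 bits of
the 64-bit product and sign-extend them) and the sign_extend form (sign-extend
the 32-bit register, then multiply by 4) and lets the two be compared.  The
helper names are hypothetical, and the junk upper bits passed in stand in for
the undefined upper half of the paradoxical subreg; this is an illustration,
not code from the patch or from combine.

```cpp
#include <cassert>
#include <cstdint>

// Sign-extend the low `width` bits of `value`, i.e. a sign_extract at
// bit position 0.
static int64_t sign_extract_low(uint64_t value, unsigned width) {
  assert(width > 0 && width < 64);
  uint64_t low = value & ((uint64_t{1} << width) - 1);
  uint64_t sign = uint64_t{1} << (width - 1);
  return static_cast<int64_t>((low ^ sign) - sign);
}

// Models (sign_extract:DI (mult:DI (subreg:DI r 0) 4) 34 0), where only
// the low 32 bits of `wide` are meaningful.
static int64_t via_sign_extract(uint64_t wide) {
  return sign_extract_low(wide * 4, 34);
}

// Models (mult:DI (sign_extend:DI r) 4).
static int64_t via_sign_extend(uint64_t wide) {
  int64_t extended = static_cast<int32_t>(static_cast<uint32_t>(wide));
  return extended * 4;
}
```

Both helpers depend only on the low 32 bits of their argument, which is why
the simpler sign_extend form can represent the same address.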

Thanks,
Richard


Re: [PATCH] arm: Add new vector mode macros

2020-09-22 Thread Richard Sandiford
Kyrylo Tkachov  writes:
> Hi Richard,
>
>> -Original Message-
>> From: Richard Sandiford 
>> Sent: 16 September 2020 11:15
>> To: gcc-patches@gcc.gnu.org
>> Cc: ni...@redhat.com; Richard Earnshaw ;
>> Ramana Radhakrishnan ; Kyrylo
>> Tkachov ; Dennis Zhang
>> 
>> Subject: Re: [PATCH] arm: Add new vector mode macros
>> 
>> Ping
>> 
>> Richard Sandiford  writes:
>> > [ This is related to Dennis's subtraction patch
>> >   https://gcc.gnu.org/pipermail/gcc-patches/2020-September/553339.html
>> >   and the discussion about how the patterns were written.  I wanted
>> >   to see whether there was a way that we could simplify the current
>> >   addition handling that might perhaps make it easier to add other
>> >   MVE operations in future.  It seemed like one of those situations
>> >   in which the most productive thing would be to try it and see,
>> >   rather than just describe it in words.
>> >
>> >   One of the questions Ramana had in the thread above was: why does
>> >   MVE not need the flag_unsafe_math_optimizations flag?  AIUI the reason
>> >   is that MVE honours the FPSCR.FZ flag while SF Advanced SIMD always
>> >   flushes to zero.  (HF Advanced SIMD honours FPSCR.FZ16 and so also
>> >   doesn't need flag_unsafe_math_optimizations.) ]
>> >
>> > The AArch32 port now has three vector extensions: iwMMXt, Neon
>> > and MVE.  We already have some named expanders that are shared
>> > by all three, and soon we'll need more.
>> >
>> > One way of handling this would be to use define_mode_iterators
>> > that specify the condition for each mode.  For example,
>> >
>> >   (V16QI "TARGET_NEON || TARGET_HAVE_MVE")
>> >   (V8QI "TARGET_NEON || TARGET_REALLY_IWMMXT")
>> >   ...
>> >   (V2SF "TARGET_NEON && flag_unsafe_math_optimizations")
>> >
>> > etc.  However, we'll need several mode iterators, and it would
>> > be repetitive to specify the mode condition every time.
>> >
>> > This patch therefore introduces per-mode macros that say whether
>> > we can perform general arithmetic on the mode.  Initially there are
>> > two sets of macros:
>> >
>> > ARM_HAVE_NEON_<MODE>_ARITH
>> >   true if Neon can handle general arithmetic on <MODE>
>> >
>> > ARM_HAVE_<MODE>_ARITH
>> >   true if any vector extension can handle general arithmetic on <MODE>
>> >
>> > The macro definitions themselves are undeniably ugly, but hopefully
>> > they're justified by the simplifications they allow.
>> >
>> > The patch converts the addition patterns to use this scheme.
>> >
>> > Previously there were three copies of the V8HF and V4HF addition
>> > patterns for Neon:
>> >
>> > (1) *add3_neon, which provided plus:VnHF even without
>> > TARGET_NEON_FP16INST.  This was probably harmless since all the
>> > named patterns had an appropriate guard, but it is possible that
>> > something could have tried to generate the plus directly, such as
>> > by using a REG_EQUAL note to generate a new pattern.
>> >
>> > (2) addv8hf3_neon and addv4hf3, which had the correct
>> > TARGET_NEON_FP16INST target condition, but unnecessarily required
>> > flag_unsafe_math_optimizations.  Unlike VnSF operations, VnHF
>> > operations do not force flush to zero.
>> >
>> > (3) add3_fp16, which provided plus:VnHF with the
>> > correct conditions (TARGET_NEON_FP16INST, with no
>> > flag_unsafe_math_optimizations test).
>> >
>> > The patch in essence renames add3_fp16 to
>> *add3_neon
>> > (part of *add3_neon) and removes the other two patterns.
>> >
>> > WDYT?  Does this look like a way forward?
>
> This looks like a productive way forward to me.
> Okay if the other maintainers don't object by the end of the week.

Thanks.  Dennis pointed out off-list that it regressed
armv8_2-fp16-arith-2.c, which was expecting FP16 vectorisation
to be rejected for -fno-fast-math.  As mentioned above, that shouldn't
be necessary given that FP16 arithmetic (unlike FP32 arithmetic) has a
flush-to-zero control.

This version therefore updates the test to expect the same output
as the -ffast-math version.

Tested on arm-linux-gnueabi (hopefully for real this time -- I must
have messed up the testing last time).  OK for trunk?

FWIW, the non-testsuite part is the same as before.
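
For anyone skimming the thread, the per-mode macro scheme from the quoted
description can be sketched roughly as below.  The ARM_HAVE_* names come from
the ChangeLog, but the lower-case flag variables are hypothetical stand-ins
for the real target macros, and the definitions are only my reading of the
conditions quoted above, not a copy of arm.h.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for TARGET_NEON, TARGET_NEON_FP16INST,
   TARGET_HAVE_MVE, TARGET_REALLY_IWMMXT and
   flag_unsafe_math_optimizations.  */
static bool target_neon;
static bool target_neon_fp16inst;
static bool target_have_mve;
static bool target_really_iwmmxt;
static bool flag_unsafe_math;

/* True if Neon can handle general arithmetic on the mode.  */
#define ARM_HAVE_NEON_V16QI_ARITH (target_neon)
#define ARM_HAVE_NEON_V8QI_ARITH  (target_neon)
/* VnHF operations honour FPSCR.FZ16, so no unsafe-math test is needed.  */
#define ARM_HAVE_NEON_V4HF_ARITH  (target_neon_fp16inst)
/* VnSF Advanced SIMD always flushes to zero, hence the extra condition.  */
#define ARM_HAVE_NEON_V2SF_ARITH  (target_neon && flag_unsafe_math)

/* True if any vector extension can handle general arithmetic on the mode.  */
#define ARM_HAVE_V16QI_ARITH (ARM_HAVE_NEON_V16QI_ARITH || target_have_mve)
#define ARM_HAVE_V8QI_ARITH  (ARM_HAVE_NEON_V8QI_ARITH || target_really_iwmmxt)
#define ARM_HAVE_V4HF_ARITH  (ARM_HAVE_NEON_V4HF_ARITH)
#define ARM_HAVE_V2SF_ARITH  (ARM_HAVE_NEON_V2SF_ARITH)
```

A mode iterator can then refer to the single ARM_HAVE_*_ARITH condition for
each mode instead of repeating the target tests in every iterator.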

Richard


gcc/
* config/arm/arm.h (ARM_HAVE_NEON_V8QI_ARITH, ARM_HAVE_NEON_V4HI_ARITH)
(ARM_HAVE_NEON_V2SI_ARITH, ARM_HAVE_NEON_V16QI_ARITH): New macros.
(ARM_HAVE_NEON_V8HI_ARITH, ARM_HAVE_NEON_V4SI_ARITH): Likewise.
(ARM_HAVE_NEON_V2DI_ARITH, ARM_HAVE_NEON_V4HF_ARITH): Likewise.
(ARM_HAVE_NEON_V8HF_ARITH, ARM_HAVE_NEON_V2SF_ARITH): Likewise.
(ARM_HAVE_NEON_V4SF_ARITH, ARM_HAVE_V8QI_ARITH, ARM_HAVE_V4HI_ARITH)
(ARM_HAVE_V2SI_ARITH, ARM_HAVE_V16QI_ARITH, ARM_HAVE_V8HI_ARITH)
(ARM_HAVE_V4SI_ARITH, ARM_HAVE_V2DI_ARITH, ARM_HAVE_V4HF_ARITH)
(ARM_HAVE_V2SF_ARITH, ARM_HAVE_V8HF_ARITH, ARM_HAVE_V4SF_ARITH):
Likewise.
* config/arm/iterators.md (VNIM, VNINOTM): Delete.
* config/arm/vec-common.md (add3, addv8hf3)
   

Re: PING [Patch][Middle-end]Add -fzero-call-used-regs=[skip|used-gpr|all-gpr|used|all]

2020-09-22 Thread Richard Sandiford
Qing Zhao  writes:
>> On Sep 21, 2020, at 2:22 PM, Qing Zhao via Gcc-patches 
>>  wrote:
>> 
>> 
>> 
>>> On Sep 21, 2020, at 2:11 PM, Richard Sandiford  
>>> wrote:
>>> 
>>> Qing Zhao  writes:
> But in cases where there is no underlying concept that can sensibly
> be extracted out, it's OK if targets need to override the default
> to get correct behaviour.
 
 Then, on a target where the default code is not right and we haven’t 
 provided an overriding implementation, what should we tell the end user 
 about this?
 The user might see the documentation for -fzero-call-used-regs in the GCC 
 manual and try it on that specific target, where the default 
 implementation is not correct. How do we deal with this?
>>> 
>>> The point is that we're trying to implement this in a target-independent
>>> way, like for most compiler features.  If the option doesn't work for a
>>> particular target, then that's a bug like any other.  The most we can
>>> reasonably do is:
>>> 
>>> (a) try to implement the feature in a way that uses all the appropriate
>>>   pieces of compiler infrastructure (what we've been discussing)
>>> 
>>> (b) add tests for the feature that run on all targets
>>> 
>>> It's possible that bugs could slip through even then.  But that's true
>>> of anything.
>>> 
>>> Targets like x86 support many subtargets, many different compilation
>>> modes, and many different compiler features (register asms, various
>>> fancy function attributes, etc.).  So even after the option is
>>> committed and is supposedly supported on x86, it's possible that
>>> we'll find a bug in the feature on x86 itself.
>>> 
>>> I don't think anyone would suggest that we should warn the user that the
>>> option might be buggy on x86 (its initial target).  But I also don't
>>> see any reason for believing that a bug on x86 is less likely than
>>> a bug on other targets.
>> 
>> Okay, then I will add the default implementation as you suggested. And also 
>> provide the overridden optimized implementation on X86. 
>
> For X86, it looks like, in addition to the stack registers (st0 to st7), the 
> mask registers (k0 to k7) also do not need to be zeroed, and “mm0 to mm7” 
> should not be zeroed either.
>
> As I checked, MASK_REG_P and MMX_REG_P are x86-specific macros; can I use 
> them in the middle end in the same way as “STACK_REG_P”?

No, those are x86-specific like you say.

Taking each in turn: what is the reason for not clearing mask registers?
And what is the reason for not clearing mm0-7?  In each case, is it a
performance or a correctness issue?

Although the registers themselves are target-specific, the reason
for excluding them might be something that could be exposed to
target-independent code.

As a general comment, with at least three sets of excluded registers,
the “all” in one of the suggested option values is beginning to feel
like a misnomer.  (Maybe that has already been dropped though.)

Thanks,
Richard


Re: [Patch] [middle-end & nvptx] gcc/tracer.c: Don't split BB with SIMT LANE [PR95654]

2020-09-22 Thread Tom de Vries
On 9/16/20 12:39 PM, Tobias Burnus wrote:
> Hi Tom, hi Richard, hello all,
> 
> @Richard: does it look okay from the ME side?
> @Tom: Can you check which IFN_GOMP_SIMT should be
> excluded with -ftracer?
> 
> Pre-remark, I do not know much about SIMT – except that they
> only appear with nvptx and somehow relate to lanes on the
> GPU.
> 
> In any case, as the testcase libgomp.fortran/pr66199-5.f90 shows,
> if a basic block with  GOMP_SIMT_VOTE_ANY in it is duplicated,
> which happens via -ftracer for this testcase, the result is wrong.
> 
> The testcase ignores all the loop work but via "lastprivate" takes
> the upper loop bound (as assigned to the loop indices); instead of
> the expected 32*32 = 1024, either some number (like 4 or very large
> or negative) is returned.
> 
> While GOMP_SIMT_VOTE_ANY fixes the issue for this testcase, I
> have the feeling that at least GOMP_SIMT_LAST_LANE should be
> not copied - but I might be wrong.
> 
> Tom: Do you think GOMP_SIMT_LAST_LANE should be removed from
> that list – or any of the following added as well?
> GOMP_USE_SIMT, GOMP_SIMT_ENTER, GOMP_SIMT_ENTER_ALLOC, GOMP_SIMT_EXIT,
> GOMP_SIMT_VF, GOMP_SIMT_ORDERED_PRED, GOMP_SIMT_XCHG_BFLY,
> GOMP_SIMT_XCHG_IDX
> 
> OK for mainline?
> 

You can commit the test-case bit as obvious.

Thanks,
- Tom

> libgomp/ChangeLog:
> 
>   * testsuite/libgomp.fortran/pr66199-5.f90: Make stop codes unique.

> diff --git a/libgomp/testsuite/libgomp.fortran/pr66199-5.f90 
> b/libgomp/testsuite/libgomp.fortran/pr66199-5.f90
> index 9482f08..2627a81 100644
> --- a/libgomp/testsuite/libgomp.fortran/pr66199-5.f90
> +++ b/libgomp/testsuite/libgomp.fortran/pr66199-5.f90
> @@ -67,5 +67,5 @@ program main
>if (f1 (0, 1024) /= 1024) stop 1
>if (f2 (0, 1024, 17) /= 1024 + (17 + 5 * 1023)) stop 2
>if (f3 (0, 32, 0, 32) /= 64) stop 3
> -  if (f4 (0, 32, 0, 32) /= 64) stop 3
> +  if (f4 (0, 32, 0, 32) /= 64) stop 4
>  end



[PATCH][omp, ftracer] Don't duplicate blocks in SIMT region

2020-09-22 Thread Tom de Vries
[ was: Re: [Patch] [middle-end & nvptx] gcc/tracer.c: Don't split BB
with SIMT LANE [PR95654] ]

On 9/16/20 8:20 PM, Alexander Monakov wrote:
> 
> 
> On Wed, 16 Sep 2020, Tom de Vries wrote:
> 
>> [ cc-ing author omp support for nvptx. ]
> 
> The issue looks familiar. I recognized it back in 2017 (and LLVM people
> recognized it too for their GPU targets). In an attempt to get agreement
> to fix the issue "properly" for GCC I found a similar issue that affects
> all targets, not just offloading, and filed it as PR 80053.
> 
> (yes, there are no addressable labels involved in offloading, but nevertheless
> the nature of the middle-end issue is related)

Hi Alexander,

thanks for looking into this.

Seeing that the attempt to fix things properly is stalled, for now I'm
proposing a point-fix, similar to the original patch proposed by Tobias.

Richi, Jakub, OK for trunk?

Thanks,
- Tom

[omp, ftracer] Don't duplicate blocks in SIMT region

When running the libgomp testsuite on x86_64-linux with nvptx accelerator,
we run into:
...
FAIL: libgomp.fortran/pr66199-5.f90   -O3 -fomit-frame-pointer -funroll-loops \
  -fpeel-loops -ftracer -finline-functions  execution test
...

The problem is that ftracer duplicates a block containing GOMP_SIMT_VOTE_ANY.

That is, before ftracer we have (dropping the GOMP_SIMT_ prefix):
...
bb4(ENTER_ALLOC)
*----------------+
|                 \
|                  \
|                   v
|                   *
v                   bb8
*<------------------*
bb5(VOTE_ANY)
*-------------------+
|                   |
|                   |
|                   |
|                   |
|                   v
|                   *
v                   bb7(XCHG_IDX)
*<------------------*
bb6(EXIT)
...

The XCHG_IDX internal-fn does inter-SIMT-lane communication, which for nvptx
maps onto shfl, an operator which has the requirement that the warp executing
the operator is convergent.  The warp diverges at bb4, and
reconverges at bb5, and does not diverge by going to bb7, so the shfl is
indeed executed by a convergent warp.

After ftracer, we have:
...
bb4(ENTER_ALLOC)
*------------------+
|                   \
|                    \
|                     \
|                      \
v                       v
*                       *
bb5(VOTE_ANY)           bb8(VOTE_ANY)
*                       *
|\                     /|
| \  +----------------+ |
|  \/                   |
|  /\                   |
| /  +------------------v
|/                      *
v                       bb7(XCHG_IDX)
*<----------------------*
bb6(EXIT)
...

The warp diverges again at bb5, but does not reconverge again before bb6, so
the shfl is executed by a divergent warp, which causes the FAIL.

Fix this by making ftracer ignore blocks containing ENTER_ALLOC and EXIT,
effectively treating the SIMT region conservatively.

One could argue that the EXIT and VOTE_ANY can be generated by omp-low in
reverse order, in which case the VOTE_ANY could be duplicated.  This is the
reason VOTE_ANY is not explicitly listed as ignored in this patch.

An argument can also be made that the test needs to be added in a more
generic place, like gimple_can_duplicate_bb_p or some such, and that ftracer
then needs to use the generic test.  But that's a discussion with a much
broader scope, so I'm leaving that for another patch.

Built on x86_64-linux with nvptx accelerator, tested with libgomp.

gcc/ChangeLog:

	PR fortran/95654
	* tracer.c (ignore_bb_p): Ignore GOMP_SIMT_ENTER_ALLOC
	and GOMP_SIMT_EXIT.

---
 gcc/tracer.c | 10 ++
 1 file changed, 10 insertions(+)

diff --git a/gcc/tracer.c b/gcc/tracer.c
index 82ede722534..de80416f163 100644
--- a/gcc/tracer.c
+++ b/gcc/tracer.c
@@ -108,6 +108,16 @@ ignore_bb_p (const_basic_block bb)
 	return true;
 }
 
+  for (gimple_stmt_iterator gsi = gsi_start_bb (CONST_CAST_BB (bb));
+   !gsi_end_p (gsi); gsi_next (&gsi))
+{
+  gimple *g = gsi_stmt (gsi);
+  if (is_gimple_call (g)
+	  && (gimple_call_internal_p (g, IFN_GOMP_SIMT_ENTER_ALLOC)
+	  || gimple_call_internal_p (g, IFN_GOMP_SIMT_EXIT)))
+	return true;
+}
+
   return false;
 }
 


Re: PING [Patch][Middle-end]Add -fzero-call-used-regs=[skip|used-gpr|all-gpr|used|all]

2020-09-22 Thread Richard Sandiford
Qing Zhao  writes:
>> On Sep 17, 2020, at 11:27 AM, Richard Sandiford  
>> wrote:
>> 
>> Qing Zhao <qing.z...@oracle.com> writes:
 On Sep 17, 2020, at 1:17 AM, Richard Sandiford  
 wrote:
 
 Qing Zhao  writes:
> Segher and Richard, 
> 
> Now there are two major concerns from the discussion so far:
> 
> 1. (From Richard):  Inserting zero insns should be done after 
> pass_thread_prologue_and_epilogue since later passes (for example, 
> pass_regrename) might introduce new used caller-saved registers. 
>So, we should do this in the beginning of pass_late_compilation (some 
> targets wouldn’t cope with doing it later). 
> 
> 2. (From Segher): The inserted zero insns should stay together with the 
> return, no other insns should move in-between zero insns and return 
> insns. Otherwise, a valid gadget could be formed. 
> 
> I think that both of the above 2 concerns are important and should be 
> addressed for the correct implementation. 
> 
> In order to support 1, we cannot implement it in 
> “targetm.gen_return()” and “targetm.gen_simple_return()”, since 
> “targetm.gen_return()” and “targetm.gen_simple_return()” are called 
> during pass_thread_prologue_and_epilogue; at that time, the use 
> information is still not correct. 
> 
> In order to support 2, enhancing EPILOGUE_USES to include the zeroed 
> registers is NOT enough to prevent all the zero insns from moving around.
 
 Right.  The purpose of EPILOGUE_USES was instead to stop the moves from
 being deleted as dead.
 
> More restrictions need to be added to these new zero insns.  (I think 
> that marking these new zeroed registers as “unspec_volatile” at RTL level 
> is necessary to prevent them from being deleted or moved around). 
> 
> 
> So, based on the above, I propose the following approach that will 
> resolve the above 2 concerns:
> 
> 1. Add 2 new target hooks:
>  A. targetm.pro_epilogue_use (reg)
>  This hook should return an UNSPEC_VOLATILE rtx that marks a register as in
>  use, to prevent register-setting instructions in the prologue and epilogue
>  from being deleted.
> 
>  B. targetm.gen_zero_call_used_regs(need_zeroed_hardregs)
>  This hook will generate a sequence of zeroing insns that zero the registers 
>  that are specified in NEED_ZEROED_HARDREGS.
> 
>   A default handler for “gen_zero_call_used_regs” could be defined in the 
>  middle end, which uses mov insns to zero registers and then uses 
>  “targetm.pro_epilogue_use(reg)” to mark each zeroed register. 
 
 This sounds like you're going back to using:
 
 (insn 18 16 19 2 (set (reg:SI 1 dx)
   (const_int 0 [0])) "t10.c":11:1 -1
(nil))
 (insn 19 18 20 2 (unspec_volatile [
   (reg:SI 1 dx)
   ] UNSPECV_PRO_EPILOGUE_USE) "t10.c":11:1 -1
(nil))
 
 This also doesn't prevent the zeroing from being moved around.  Like the
 EPILOGUE_USES approach, it only prevents the clearing from being removed
 as dead.  I still think that the EPILOGUE_USES approach is the better
 way of doing that.
>>> 
>>> The following is what I see from i386.md: (I didn’t look at how 
>>> “UNSPEC_volatile” is used in data flow analysis in GCC yet)
>>> 
>>> ;; UNSPEC_VOLATILE is considered to use and clobber all hard registers and
>>> ;; all of memory.  This blocks insns from being moved across this point.
>> 
>> Heh, it looks like that comment dates back to 1994. :-)
>> 
>> The comment is no longer correct though.  I wasn't around at the time,
>> but I assume the comment was only locally true even then.
>> 
>> If what the comment said was true, then something like:
>> 
>> (define_insn "cld"
>>  [(unspec_volatile [(const_int 0)] UNSPECV_CLD)]
>>  ""
>>  "cld"
>>  [(set_attr "length" "1")
>>   (set_attr "length_immediate" "0")
>>   (set_attr "modrm" "0")])
>> 
>> would invalidate the entire register file and so would require all values
>> to be spilt to the stack around the CLD.
>
> Okay, thanks for the info. 
> then, what’s the current definition of UNSPEC_VOLATILE? 

I'm not sure it's written down anywhere TBH.  rtl.texi just says:

  @code{unspec_volatile} is used for volatile operations and operations
  that may trap; @code{unspec} is used for other operations.

which seems like a cyclic definition: volatile expressions are defined
to be expressions that are volatile.

But IMO the semantics are that unspec_volatile patterns with a given
set of inputs and outputs act for dataflow purposes like volatile asms
with the same inputs and outputs.  The semantics of asm volatile are
at least slightly more well-defined (if only by example); see extend.texi
for details.  In particular:

  Note that the compiler can move even @code{volatile asm} instructions relative
  to other code, including across jump instructions. For example, on many 
  targets there i

Re: [PATCH] Fix overflow handling in std::align

2020-09-22 Thread Jonathan Wakely via Gcc-patches

On 21/09/20 15:50 +0100, Jonathan Wakely wrote:

On 21/09/20 10:42 -0400, Glen Fernandes via Libstdc++ wrote:

On Mon, Sep 14, 2020 at 5:44 PM Thomas Rodgers  wrote:

On Sep 14, 2020, at 7:30 AM, Ville Voutilainen  wrote:

On Mon, 14 Sep 2020 at 15:49, Glen Fernandes  wrote:

Sounds like a good idea. Updated patch attached.


Looks good to me.


Agree.


Rebased patch on latest changes to bits/align.h.


Oh nice, I was about to do that myself.

I'll get the patch committed today, thanks!


It's still today by my clock, although it might be broken ;-)

Pushed to master. Thanks for the patch.

N.B. GCC no longer requires updates to the ChangeLog files. Those
files now get auto-generated from the Git commit logs (which still
need to be in the same format, but you don't modify the ChangeLog
directly).




Go patch committed: Use runtime.eqtype for type switches on AIX

2020-09-22 Thread Ian Lance Taylor via Gcc-patches
This patch by Clément Chigot changes the Go frontend to call
runtime.eqtype for non-interface type switches on AIX.  All type
switch clauses must call runtime.eqtype if the linker isn't able to
merge type descriptors pointers. Previously, only interface-type
clauses were doing it.  This is for https://golang.org/issue/39276.
Bootstrapped and ran Go tests on x86_64-pc-linux-gnu.  Committed to
mainline.
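
As a rough illustration of why pointer comparison is not enough when the
linker cannot merge type descriptors, here is a toy model in C.  The struct
and both functions are hypothetical; in particular, comparing by name is only
a placeholder and is not meant to describe what runtime.eqtype actually
checks.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Toy type descriptor: one instance may exist per object file if the
   linker does not merge duplicates.  */
struct type_descriptor
{
  const char *name;
};

/* What a plain OPERATOR_EQEQ on descriptor pointers amounts to.  */
static bool
same_type_by_pointer (const struct type_descriptor *a,
                      const struct type_descriptor *b)
{
  return a == b;
}

/* Placeholder for a call to an eqtype-style helper: fast path on pointer
   identity, then fall back to comparing the descriptors' contents.  */
static bool
same_type_by_eqtype (const struct type_descriptor *a,
                     const struct type_descriptor *b)
{
  if (a == b)
    return true;
  return strcmp (a->name, b->name) == 0;
}
```

With two unmerged copies of the descriptor for the same type, the pointer
test reports a mismatch while the content-based test still matches, which is
the situation the type switch clauses need to cope with.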

Ian
bd68301dee0f1fd6419ab7e1e416f724dffe8bc4
diff --git a/gcc/go/gofrontend/MERGE b/gcc/go/gofrontend/MERGE
index e4f8fac5ab3..a8ba5a35e44 100644
--- a/gcc/go/gofrontend/MERGE
+++ b/gcc/go/gofrontend/MERGE
@@ -1,4 +1,4 @@
-a59167c29d6ad2ddf533b3a12b365f72df0e1476
+b24062f0b2e8f6173731d5654afe0addf857270e
 
 The first line of this file holds the git revision number of the last
 merge done from the gofrontend repository.
diff --git a/gcc/go/gofrontend/statements.cc b/gcc/go/gofrontend/statements.cc
index a059ee4d0d9..ad898070f6e 100644
--- a/gcc/go/gofrontend/statements.cc
+++ b/gcc/go/gofrontend/statements.cc
@@ -4627,7 +4627,8 @@ Type_case_clauses::Type_case_clause::traverse(Traverse* 
traverse)
 // statements.
 
 void
-Type_case_clauses::Type_case_clause::lower(Type* switch_val_type,
+Type_case_clauses::Type_case_clause::lower(Gogo* gogo,
+  Type* switch_val_type,
   Block* b,
   Temporary_statement* descriptor_temp,
   Unnamed_label* break_label,
@@ -4666,9 +4667,16 @@ Type_case_clauses::Type_case_clause::lower(Type* 
switch_val_type,
   Expression::make_nil(loc),
   loc);
   else if (type->interface_type() == NULL)
-cond = Expression::make_binary(OPERATOR_EQEQ, ref,
-   Expression::make_type_descriptor(type, 
loc),
-   loc);
+   {
+ if (!gogo->need_eqtype())
+   cond = Expression::make_binary(OPERATOR_EQEQ, ref,
+  
Expression::make_type_descriptor(type, loc),
+  loc);
+ else
+   cond = Runtime::make_call(Runtime::EQTYPE, loc, 2,
+ Expression::make_type_descriptor(type, 
loc),
+ ref);
+   }
   else
cond = Runtime::make_call(Runtime::IFACET2IP, loc, 2,
  Expression::make_type_descriptor(type, loc),
@@ -4826,7 +4834,8 @@ Type_case_clauses::check_duplicates() const
 // BREAK_LABEL is the label at the end of the type switch.
 
 void
-Type_case_clauses::lower(Type* switch_val_type, Block* b,
+Type_case_clauses::lower(Gogo* gogo, Type* switch_val_type,
+Block* b,
 Temporary_statement* descriptor_temp,
 Unnamed_label* break_label) const
 {
@@ -4838,7 +4847,7 @@ Type_case_clauses::lower(Type* switch_val_type, Block* b,
++p)
 {
   if (!p->is_default())
-   p->lower(switch_val_type, b, descriptor_temp, break_label,
+   p->lower(gogo, switch_val_type, b, descriptor_temp, break_label,
 &stmts_label);
   else
{
@@ -4850,7 +4859,7 @@ Type_case_clauses::lower(Type* switch_val_type, Block* b,
   go_assert(stmts_label == NULL);
 
   if (default_case != NULL)
-default_case->lower(switch_val_type, b, descriptor_temp, break_label,
+default_case->lower(gogo, switch_val_type, b, descriptor_temp, break_label,
NULL);
 }
 
@@ -4905,7 +4914,7 @@ Type_switch_statement::do_traverse(Traverse* traverse)
 // equality testing.
 
 Statement*
-Type_switch_statement::do_lower(Gogo*, Named_object*, Block* enclosing,
+Type_switch_statement::do_lower(Gogo* gogo, Named_object*, Block* enclosing,
Statement_inserter*)
 {
   const Location loc = this->location();
@@ -4943,7 +4952,7 @@ Type_switch_statement::do_lower(Gogo*, Named_object*, 
Block* enclosing,
   b->add_statement(s);
 
   if (this->clauses_ != NULL)
-this->clauses_->lower(val_type, b, descriptor_temp, this->break_label());
+this->clauses_->lower(gogo, val_type, b, descriptor_temp, 
this->break_label());
 
   s = Statement::make_unnamed_label_statement(this->break_label_);
   b->add_statement(s);
diff --git a/gcc/go/gofrontend/statements.h b/gcc/go/gofrontend/statements.h
index f1c6be9c98a..47092b4912a 100644
--- a/gcc/go/gofrontend/statements.h
+++ b/gcc/go/gofrontend/statements.h
@@ -2089,7 +2089,7 @@ class Type_case_clauses
 
   // Lower to if and goto statements.
   void
-  lower(Type*, Block*, Temporary_statement* descriptor_temp,
+  lower(Gogo*, Type*, Block*, Temporary_statement* descriptor_temp,
Unnamed_label* break_label) const;
 
   // Return true if these clauses may fall through to the statements
@@ -2138,7 +2138,7 @@ class Type_ca

Re: [PATCH v3] c, c++: Implement -Wsizeof-array-div [PR91741]

2020-09-22 Thread Marek Polacek via Gcc-patches
Ping.

On Tue, Sep 15, 2020 at 04:33:05PM -0400, Marek Polacek via Gcc-patches wrote:
> On Tue, Sep 15, 2020 at 09:04:41AM +0200, Jakub Jelinek via Gcc-patches wrote:
> > On Mon, Sep 14, 2020 at 09:30:44PM -0400, Marek Polacek via Gcc-patches 
> > wrote:
> > > --- a/gcc/c/c-tree.h
> > > +++ b/gcc/c/c-tree.h
> > > @@ -147,6 +147,11 @@ struct c_expr
> > >   etc), so we stash a copy here.  */
> > >source_range src_range;
> > >  
> > > +  /* True iff the sizeof expression was enclosed in parentheses.
> > > + NB: This member is currently only initialized when .original_code
> > > + is a SIZEOF_EXPR.  ??? Add a default constructor to this class.  */
> > > +  bool parenthesized_p;
> > > +
> > >/* Access to the first and last locations within the source spelling
> > >   of this expression.  */
> > >location_t get_start () const { return src_range.m_start; }
> > 
> > I think a magic tree code would be better, c_expr is used in too many places
> > and returned by many functions, so it is copied over and over.
> > Even if you must add it, it would be better to change the struct layout,
> > because right now there are fields: tree, location_t, tree, 2xlocation_t,
> > which means 32-bit gap on 64-bit hosts before the second tree, so the new
> > field would fit in there.  But, if it is mostly uninitialized, it is kind of
> > unclean.
> 
> Ok, here's a version with PAREN_SIZEOF_EXPR.  It doesn't require changes to
> c_expr, but adding a new tree code is always a pain...
> 
> Bootstrapped/regtested on x86_64-pc-linux-gnu, ok for trunk?
> 
> -- >8 --
> This patch implements a new warning, -Wsizeof-array-div.  It warns about
> code like
> 
>   int arr[10];
>   sizeof (arr) / sizeof (short);
> 
> where we have a division of two sizeof expressions, where the first
> argument is an array, and the second sizeof does not equal the size
> of the array element.  See e.g. .
> 
> Clang makes it possible to suppress the warning by parenthesizing the
> second sizeof like this:
> 
>   sizeof (arr) / (sizeof (short));
> 
> so I followed suit.  In the C++ FE this was rather easy, because
> finish_parenthesized_expr already set TREE_NO_WARNING.  In the C FE
> I've added a new tree code, PAREN_SIZEOF_EXPR, to discern between the
> non-() and () versions.
> 
> This warning is enabled by -Wall.  An example of the output:
> 
> x.c:5:23: warning: expression does not compute the number of elements in this 
> array; element type is ‘int’, not ‘short int’ [-Wsizeof-array-div]
> 5 |   return sizeof (arr) / sizeof (short);
>   |  ~^~~~
> x.c:5:25: note: add parentheses around ‘sizeof (short int)’ to silence this 
> warning
> 5 |   return sizeof (arr) / sizeof (short);
>   | ^~
>   | ( )
> x.c:4:7: note: array ‘arr’ declared here
> 4 |   int arr[10];
>   |   ^~~
> 
> gcc/c-family/ChangeLog:
> 
>   PR c++/91741
>   * c-common.c (verify_tree): Handle PAREN_SIZEOF_EXPR.
>   (c_common_init_ts): Likewise.
>   * c-common.def (PAREN_SIZEOF_EXPR): New tree code.
>   * c-common.h (maybe_warn_sizeof_array_div): Declare.
>   * c-warn.c (sizeof_pointer_memaccess_warning): Unwrap NOP_EXPRs.
>   (maybe_warn_sizeof_array_div): New function.
>   * c.opt (Wsizeof-array-div): New option.
> 
> gcc/c/ChangeLog:
> 
>   PR c++/91741
>   * c-parser.c (c_parser_binary_expression): Implement -Wsizeof-array-div.
>   (c_parser_postfix_expression): Set PAREN_SIZEOF_EXPR.
>   (c_parser_expr_list): Handle PAREN_SIZEOF_EXPR like SIZEOF_EXPR.
>   * c-tree.h (char_type_p): Declare.
>   * c-typeck.c (char_type_p): No longer static.
> 
> gcc/cp/ChangeLog:
> 
>   PR c++/91741
>   * typeck.c (cp_build_binary_op): Implement -Wsizeof-array-div.
> 
> gcc/ChangeLog:
> 
>   PR c++/91741
>   * doc/invoke.texi: Document -Wsizeof-array-div.
> 
> gcc/testsuite/ChangeLog:
> 
>   PR c++/91741
>   * c-c++-common/Wsizeof-pointer-div.c: Add dg-warning.
>   * c-c++-common/Wsizeof-array-div1.c: New test.
>   * g++.dg/warn/Wsizeof-array-div1.C: New test.
>   * g++.dg/warn/Wsizeof-array-div2.C: New test.
> ---
>  gcc/c-family/c-common.c   |  2 +
>  gcc/c-family/c-common.def |  3 +
>  gcc/c-family/c-common.h   |  1 +
>  gcc/c-family/c-warn.c | 47 
>  gcc/c-family/c.opt|  5 ++
>  gcc/c/c-parser.c  | 48 ++--
>  gcc/c/c-tree.h|  1 +
>  gcc/c/c-typeck.c  |  2 +-
>  gcc/cp/typeck.c   | 10 +++-
>  gcc/doc/invoke.texi   | 19 +++
>  .../c-c++-common/Wsizeof-array-div1.c | 56 +++
>  .../c-c++-common/Wsizeof-pointer-div

PING [GCC 10] [PATCH] IRA: Don't make a global register eliminable

2020-09-22 Thread H.J. Lu via Gcc-patches
On Fri, Sep 18, 2020 at 10:21 AM H.J. Lu  wrote:
>
> On Thu, Sep 17, 2020 at 3:52 PM Jeff Law  wrote:
> >
> >
> > On 9/16/20 8:46 AM, Richard Sandiford wrote:
> >
> > "H.J. Lu"  writes:
> >
> > On Tue, Sep 15, 2020 at 7:44 AM Richard Sandiford
> >  wrote:
> >
> > Thanks for looking at this.
> >
> > "H.J. Lu"  writes:
> >
> > commit 1bcb4c4faa4bd6b1c917c75b100d618faf9e628c
> > Author: Richard Sandiford 
> > Date:   Wed Oct 2 07:37:10 2019 +
> >
> > [LRA] Don't make eliminable registers live (PR91957)
> >
> > didn't make eliminable registers live which breaks
> >
> > register void *cur_pro asm("reg");
> >
> > where "reg" is an eliminable register.  Make fixed eliminable registers
> > live to fix it.
> >
> > I don't think fixedness itself is the issue here: it's usual for at
> > least some registers involved in eliminations to be fixed registers.
> >
> > I think what makes this case different is instead that cur_pro/ebp
> > is a global register.  But IMO things have already gone wrong if we
> > think that a global register is eliminable.
> >
> > So I wonder if instead we should check global_regs at the beginning of:
> >
> >   for (i = 0; i < fp_reg_count; i++)
> > if (!TEST_HARD_REG_BIT (crtl->asm_clobbers,
> > HARD_FRAME_POINTER_REGNUM + i))
> >   {
> > SET_HARD_REG_BIT (eliminable_regset,
> >   HARD_FRAME_POINTER_REGNUM + i);
> > if (frame_pointer_needed)
> >   SET_HARD_REG_BIT (ira_no_alloc_regs,
> > HARD_FRAME_POINTER_REGNUM + i);
> >   }
> > else if (frame_pointer_needed)
> >   error ("%s cannot be used in % here",
> >  reg_names[HARD_FRAME_POINTER_REGNUM + i]);
> > else
> >   df_set_regs_ever_live (HARD_FRAME_POINTER_REGNUM + i, true);
> >
> > (ira_setup_eliminable_regset), and handle the global_regs[] case in
> > the same way as the else case, i.e. short-circuiting both of the ifs.
> >
> > Like this?
> >
> > Sorry for the delay.  I was testing this in parallel.
> >
> > Bootstrapped & regression-tested on x86_64-linux-gnu.
> >
> > Thanks,
> > Richard
> >
> >
> > 0001-ira-Fix-elimination-for-global-hard-FPs-PR91957.patch
> >
> > From af4499845d26fe65573b21197a79fd22fd38694e Mon Sep 17 00:00:00 2001
> > From: "H.J. Lu" 
> > Date: Tue, 15 Sep 2020 06:23:26 -0700
> > Subject: [PATCH] ira: Fix elimination for global hard FPs [PR91957]
> > MIME-Version: 1.0
> > Content-Type: text/plain; charset=UTF-8
> > Content-Transfer-Encoding: 8bit
> >
> > If the hard frame pointer is being used as a global register,
> > we should skip the usual handling for eliminations.  As the
> > comment says, the register cannot in that case be eliminated
> > (or eliminated to) and is already marked live where appropriate.
> >
> > Doing this removes the duplicate error for gcc.target/i386/pr82673.c.
> > The “cannot be used in 'asm' here” message is meant to be for asm
> > statements rather than register asms, and the function that the
> > error is reported against doesn't use asm.
> >
> > gcc/
> > 2020-09-16  Richard Sandiford  
> >
> > PR middle-end/91957
> > * ira.c (ira_setup_eliminable_regset): Skip the special elimination
> > handling of the hard frame pointer if the hard frame pointer is fixed.
> >
> > gcc/testsuite/
> > 2020-09-16  H.J. Lu  
> >Richard Sandiford  
> >
> > PR middle-end/91957
> > * g++.target/i386/pr97054.C: New test.
> > * gcc.target/i386/pr82673.c: Remove redundant extra message.
> >
> > OK
>
> OK for GCC 10 branch?
>
> Thanks.

PING:

https://gcc.gnu.org/pipermail/gcc-patches/2020-September/554268.html

-- 
H.J.


Re: [PATCH] bpf: use xBPF signed div, mod insns when available

2020-09-22 Thread Jose E. Marchesi via Gcc-patches


Hi Segher!

> On Thu, Sep 17, 2020 at 10:15:30AM -0700, David Faust via Gcc-patches wrote:
>> The 'mod' and 'div' operators in eBPF are unsigned, with no signed
>> counterpart. xBPF adds two new ALU operations, sdiv and smod, for
>> signed division and modulus, respectively. Update bpf.md with
>> 'define_insn' blocks for signed div and mod to use them when targetting
>> xBPF, and add new tests to ensure they are used appropriately.
>
> So why does xBPF have signed versions of the divides?  Is it because it
> is wanted to have it in eBPF eventually?  Is it because the libgcc
> routines are just too slow?  Is it because (the generic) libgcc does not
> trap for MIN_INT / -1 ?  Some other reason?
>
> (I'm just curious; I cannot figure it out :-) )

I don't know if eBPF will be adopting signed division instructions at
some point... judging from what can be read in their documentation, most
probably they will not.

In xBPF we mainly want to avoid the funcall.  Linking of BPF objects is
still an... er, fuzzy area.  Also, we reckon some potential applications
of xBPF (like using it instead of DWARF in the Infinity project) may
find it simpler to have the instructions than to rely on a software
implementation like libgcc's.
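
As a rough illustration of the difference (a sketch only, not part of the
patch; the helper names below are made up), this is what an unsigned-only
division yields when it is handed the bit patterns of signed operands,
compared with what a signed division or modulus is expected to yield:

```cpp
#include <cstdint>

// Hypothetical helper: models an unsigned-only 'div' that is given the
// bit patterns of two signed 64-bit operands.
std::uint64_t unsigned_div(std::int64_t a, std::int64_t b)
{
  return static_cast<std::uint64_t>(a) / static_cast<std::uint64_t>(b);
}

// Hypothetical helper: the quotient a signed division should produce
// (C++ integer division truncates toward zero).
std::int64_t signed_div(std::int64_t a, std::int64_t b)
{
  return a / b;
}

// Hypothetical helper: the remainder a signed modulus should produce
// (the sign follows the dividend in C++).
std::int64_t signed_mod(std::int64_t a, std::int64_t b)
{
  return a % b;
}
```

With a = -8 and b = 2 the signed helper gives -4, whereas the unsigned
one divides 2^64 - 8 by 2, so the two results disagree as soon as an
operand is negative.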


Re: PING [Patch][Middle-end]Add -fzero-call-used-regs=[skip|used-gpr|all-gpr|used|all]

2020-09-22 Thread Qing Zhao via Gcc-patches
Hi, Hongjiu, 


> On Sep 22, 2020, at 11:31 AM, Richard Sandiford  
> wrote:
> 
> Qing Zhao  writes:
>>> On Sep 21, 2020, at 2:22 PM, Qing Zhao via Gcc-patches 
>>>  wrote:
>>> 
>>> 
>>> 
>>>> On Sep 21, 2020, at 2:11 PM, Richard Sandiford
>>>> wrote:
>>>>
>>>> Qing Zhao  writes:
>>>>>> But in cases where there is no underlying concept that can sensibly
>>>>>> be extracted out, it's OK if targets need to override the default
>>>>>> to get correct behaviour.
>>>>>
>>>>> Then, on the target that the default code is not right, and we haven’t
>>>>> provide overridden implementation, what should we inform the end user
>>>>> about this?
>>>>> The user might see the documentation about -fzero-call-used-regs in gcc
>>>>> manual, and might try it on that specific target, but the default
>>>>> implementation is not correct, how to deal this?
>>>>
>>>> The point is that we're trying to implement this in a target-independent
>>>> way, like for most compiler features.  If the option doesn't work for a
>>>> particular target, then that's a bug like any other.  The most we can
>>>> reasonably do is:
>>>>
>>>> (a) try to implement the feature in a way that uses all the appropriate
>>>>     pieces of compiler infrastructure (what we've been discussing)
>>>>
>>>> (b) add tests for the feature that run on all targets
>>>>
>>>> It's possible that bugs could slip through even then.  But that's true
>>>> of anything.
>>>>
>>>> Targets like x86 support many subtargets, many different compilation
>>>> modes, and many different compiler features (register asms, various
>>>> fancy function attributes, etc.).  So even after the option is
>>>> committed and is supposedly supported on x86, it's possible that
>>>> we'll find a bug in the feature on x86 itself.
>>>>
>>>> I don't think anyone would suggest that we should warn the user that the
>>>> option might be buggy on x86 (it's initial target).  But I also don't
>>>> see any reason for believing that a bug on x86 is less likely than
>>>> a bug on other targets.
>>> 
>>> Okay, then I will add the default implementation as you suggested. And also 
>>> provide the overriden optimized implementation on X86. 
>> 
>> For X86, looks like that in addition to stack registers (st0 to st7), mask 
>> registers (k0 to k7) also do not need to be zeroed, and also “mm0 to mm7”  
>> should Not be zeroed too.
>> 
>> As I checked, MASK_REG_P and MMX_REG_P are x86 specific macros, can I use 
>> them in middle end similar as “STACK_REG_P”?
> 
> No, those are x86-specific like you say.
> 
> Taking each in turn: what is the reason for not clearing mask registers?
> And what is the reason for not clearing mm0-7?  In each case, is it a
> performance or a correctness issue?

Could you please provide more information on the above questions? (Why do we
exclude mask registers and mm0-7 registers from ALL on x86?)

thanks.

Qing

> 
> Although the registers themselves are target-specific, the reason
> for excluding them might be something that could be exposed to
> target-independent code.
> 
> As a general comment, with at least three sets of excluded registers,
> the “all” in one of the suggested option values is beginning to feel
> like a misnomer.  (Maybe that has already been dropped though.)
> 
> Thanks,
> Richard



Re: [PATCH] c++: Return only in-scope tparms in keep_template_parm [PR95310]

2020-09-22 Thread Patrick Palka via Gcc-patches
On Mon, 21 Sep 2020, Jason Merrill wrote:

> On 9/19/20 3:49 PM, Patrick Palka wrote:
> > In the testcase below, the dependent specializations iter_reference_t
> > and iter_reference_t share the same tree due to specialization
> > caching.  So when find_template_parameters walks through the
> > requires-expression (as part of normalization), it sees and includes the
> > out-of-scope template parameter F in the list of template parameters
> > it found within the requires-expression (along with Out and N).
> > 
> >  From a correctness perspective this is harmless since the parameter mapping
> > routines only care about the level and index of each parameter, so F is
> > no different from Out in this sense.  (And it's also harmless that two
> > parameters in the parameter mapping have the same level and index.)
> > 
> > But having both Out and F in the parameter mapping is extra work for
> > hash_atomic_constrant, tsubst_parameter_mapping and get_mapped_args; and
> > it also means we print this irrelevant template parameter in the
> > testcase's diagnostics (via pp_cxx_parameter_mapping):
> > 
> >in requirements with ‘Out o’ [with N = (const int&)&a; F = const int*;
> > Out = const int*]
> > 
> > This patch makes keep_template_parm return only in-scope template
> > parameters by looking into ctx_parms for the corresponding in-scope one.
> > 
> > (That we sometimes print irrelevant template parameters in diagnostics is
> > also the subject of PR99 and PR66968, so the above diagnostic issue
> > could likely be fixed in a more general way, but this targeted fix to
> > keep_template_parm is perhaps worthwhile on its own.)
> > 
> > Bootstrapped and regtested on x86_64-pc-linux-gnu, and also tested on
> > cmcstl2 and range-v3.  Does this look OK for trunk?
> > 
> > gcc/cp/ChangeLog:
> > 
> > PR c++/95310
> > * pt.c (keep_template_parm): Adjust the given template parameter
> > to the corresponding in-scope one from ctx_parms.
> > 
> > gcc/testsuite/ChangeLog:
> > 
> > PR c++/95310
> > * g++.dg/concepts/diagnostic15.C: New test.
> > * g++.dg/cpp2a/concepts-ttp2.C: New test.
> > ---
> >   gcc/cp/pt.c  | 19 +++
> >   gcc/testsuite/g++.dg/concepts/diagnostic15.C | 16 
> >   2 files changed, 35 insertions(+)
> >   create mode 100644 gcc/testsuite/g++.dg/concepts/diagnostic15.C
> > 
> > diff --git a/gcc/cp/pt.c b/gcc/cp/pt.c
> > index fe45de8d796..c2c70ff02b9 100644
> > --- a/gcc/cp/pt.c
> > +++ b/gcc/cp/pt.c
> > @@ -10550,6 +10550,25 @@ keep_template_parm (tree t, void* data)
> >  BOUND_TEMPLATE_TEMPLATE_PARM itself.  */
> >   t = TREE_TYPE (TEMPLATE_TEMPLATE_PARM_TEMPLATE_DECL (t));
> >   +  /* This template parameter might be an argument to a cached dependent
> > + specalization that was formed earlier inside some other template, in
> > which
> > + case the parameter is not among the ones that are in-scope.  Look in
> > + CTX_PARMS to find the corresponding in-scope template parameter and
> > + always return that instead.  */
> > +  tree cparms = ftpi->ctx_parms;
> > +  while (TMPL_PARMS_DEPTH (cparms) > level)
> > +cparms = TREE_CHAIN (cparms);
> > +  gcc_assert (TMPL_PARMS_DEPTH (cparms) == level);
> > +  if (TREE_VEC_LENGTH (TREE_VALUE (cparms)))
> > +{
> > +  t = TREE_VALUE (TREE_VEC_ELT (TREE_VALUE (cparms), index));
> > +  /* As in template_parm_to_arg.  */
> > +  if (TREE_CODE (t) == TYPE_DECL || TREE_CODE (t) == TEMPLATE_DECL)
> > +   t = TREE_TYPE (t);
> > +  else
> > +   t = DECL_INITIAL (t);
> > +}
> 
> This seems like a useful separate function: given a parmlist and a single
> template parm (or index+level), return the corresponding parm from the
> parmlist.  Basically the reverse of canonical_type_parameter.

Sounds good.  Like this?

-- >8 --

gcc/cp/ChangeLog:

PR c++/95310
* pt.c (corresponding_template_parameter): Define.
(keep_template_parm): Use it to adjust the given template
parameter to the corresponding in-scope one from ctx_parms.

gcc/testsuite/ChangeLog:

PR c++/95310
* g++.dg/concepts/diagnostic15.C: New test.
* g++.dg/cpp2a/concepts-ttp2.C: New test.
---
 gcc/cp/pt.c  | 44 
 gcc/testsuite/g++.dg/concepts/diagnostic15.C | 16 +++
 2 files changed, 60 insertions(+)
 create mode 100644 gcc/testsuite/g++.dg/concepts/diagnostic15.C

diff --git a/gcc/cp/pt.c b/gcc/cp/pt.c
index 44ca14afc4e..bec8396f9f4 100644
--- a/gcc/cp/pt.c
+++ b/gcc/cp/pt.c
@@ -10244,6 +10244,42 @@ lookup_and_finish_template_variable (tree templ, tree 
targs,
   return convert_from_reference (templ);
 }
 
+/* If the set of template parameters PARMS contains a template with
+   the given LEVEL and INDEX, then return this parameter.  Otherwise
+   return NULL_TREE.  */
+
+static tree
+corresponding_template_parameter (tree parms, int level, int index)
+{
+  while (TMPL_PARMS_
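
Independently of the patch text above, the lookup being asked for (find
the parameter with a given level and index in a parameter list, or
report that there is none) can be sketched on plain containers.  This is
only an illustration under stated assumptions: the function name, the
use of strings for parameters, and the vector-of-levels layout are all
hypothetical and do not reflect GCC's tree representation; levels are
taken to be 1-based and indices 0-based.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical model: one vector of parameter names per template level,
// outermost level first.
using level_parms = std::vector<std::string>;

// Return a pointer to the parameter at 1-based LEVEL and 0-based INDEX
// in PARMS, or nullptr if PARMS has no such parameter.
const std::string *
corresponding_parm(const std::vector<level_parms> &parms, int level, int index)
{
  if (level < 1 || static_cast<std::size_t>(level) > parms.size())
    return nullptr;
  const level_parms &lp = parms[level - 1];
  if (index < 0 || static_cast<std::size_t>(index) >= lp.size())
    return nullptr;
  return &lp[index];
}
```

Two parameters that share a level and index map to the same slot here,
which is the property the lookup relies on.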

Re: PING [Patch][Middle-end]Add -fzero-call-used-regs=[skip|used-gpr|all-gpr|used|all]

2020-09-22 Thread H.J. Lu via Gcc-patches
On Tue, Sep 22, 2020 at 11:25 AM Qing Zhao  wrote:
>
> Hi, Hongjiu,
>
>
> > On Sep 22, 2020, at 11:31 AM, Richard Sandiford  
> > wrote:
> >
> > Qing Zhao  writes:
> >>> On Sep 21, 2020, at 2:22 PM, Qing Zhao via Gcc-patches 
> >>>  wrote:
> >>>
> >>>
> >>>
> >>>> On Sep 21, 2020, at 2:11 PM, Richard Sandiford
> >>>>  wrote:
> >>>>
> >>>> Qing Zhao  writes:
> >>>>>> But in cases where there is no underlying concept that can sensibly
> >>>>>> be extracted out, it's OK if targets need to override the default
> >>>>>> to get correct behaviour.
> >>>>>
> >>>>> Then, on the target that the default code is not right, and we haven’t
> >>>>> provide overridden implementation, what should we inform the end user
> >>>>> about this?
> >>>>> The user might see the documentation about -fzero-call-used-regs in gcc
> >>>>> manual, and might try it on that specific target, but the default
> >>>>> implementation is not correct, how to deal this?
> >>>>
> >>>> The point is that we're trying to implement this in a target-independent
> >>>> way, like for most compiler features.  If the option doesn't work for a
> >>>> particular target, then that's a bug like any other.  The most we can
> >>>> reasonably do is:
> >>>>
> >>>> (a) try to implement the feature in a way that uses all the appropriate
> >>>>     pieces of compiler infrastructure (what we've been discussing)
> >>>>
> >>>> (b) add tests for the feature that run on all targets
> >>>>
> >>>> It's possible that bugs could slip through even then.  But that's true
> >>>> of anything.
> >>>>
> >>>> Targets like x86 support many subtargets, many different compilation
> >>>> modes, and many different compiler features (register asms, various
> >>>> fancy function attributes, etc.).  So even after the option is
> >>>> committed and is supposedly supported on x86, it's possible that
> >>>> we'll find a bug in the feature on x86 itself.
> >>>>
> >>>> I don't think anyone would suggest that we should warn the user that the
> >>>> option might be buggy on x86 (it's initial target).  But I also don't
> >>>> see any reason for believing that a bug on x86 is less likely than
> >>>> a bug on other targets.
> >>>
> >>> Okay, then I will add the default implementation as you suggested. And 
> >>> also provide the overriden optimized implementation on X86.
> >>
> >> For X86, looks like that in addition to stack registers (st0 to st7), mask 
> >> registers (k0 to k7) also do not need to be zeroed, and also “mm0 to mm7”  
> >> should Not be zeroed too.
> >>
> >> As I checked, MASK_REG_P and MMX_REG_P are x86 specific macros, can I use 
> >> them in middle end similar as “STACK_REG_P”?
> >
> > No, those are x86-specific like you say.
> >
> > Taking each in turn: what is the reason for not clearing mask registers?
> > And what is the reason for not clearing mm0-7?  In each case, is it a
> > performance or a correctness issue?
>
> Could you please provide more information on the above questions? (Why we 
> exclude mask registers and mm0-7 registers from ALL on x86?)
>

No particular reason.  You can add them.

H.J.


Re: [PATCH] bpf: use xBPF signed div, mod insns when available

2020-09-22 Thread Jose E. Marchesi via Gcc-patches


>>>> The 'mod' and 'div' operators in eBPF are unsigned, with no signed
>>>> counterpart. xBPF adds two new ALU operations, sdiv and smod, for
>>>> signed division and modulus, respectively. Update bpf.md with
>>>> 'define_insn' blocks for signed div and mod to use them when targetting
>>>> xBPF, and add new tests to ensure they are used appropriately.
>>>>
>>>> 2020-09-17  David Faust
>>>>
>>>> gcc/
>>>>     * config/bpf/bpf.md: Add defines for signed div and mod operators.
>>>>
>>>> gcc/testsuite/
>>>>     * gcc.target/bpf/diag-sdiv.c: New test.
>>>>     * gcc.target/bpf/diag-smod.c: New test.
>>>>     * gcc.target/bpf/xbpf-sdiv-1.c: New test.
>>>>     * gcc.target/bpf/xbpf-smod-1.c: New test.
>>>
>>> OK.  But give Jose 48hrs before committing just in case there's
>>> something he wants to comment on.  Y'all are far more familiar with bpf
>>> than I ;-)
>> 
>> Looks good to me! :)
>> But the related pending patch in binutils should go in first.
>> 
> Hi!
>
> I just checked in the binutils patch, but I don't actually have write
> access for gcc.

Just did it on your behalf.
Thanks!


Re: New modref/ipa_modref optimization passes

2020-09-22 Thread Jan Hubicka
David,
with jit I get the following:
/usr/local/x86_64-pc-linux-gnu/bin/ld: final link failed: nonrepresentable 
section on output
collect2: error: ld returned 1 exit status
make[3]: *** [../../gcc/jit/Make-lang.in:121: libgccjit.so.0.0.1] Error

Is there a fix/workaround?
The patch I am trying to test/debug is attached; it fixes the selftest issue
and the destructor.

Honza

diff --git a/gcc/ipa-modref-tree.c b/gcc/ipa-modref-tree.c
index e37dee67fa3..a84508a2268 100644
--- a/gcc/ipa-modref-tree.c
+++ b/gcc/ipa-modref-tree.c
@@ -28,6 +28,7 @@ along with GCC; see the file COPYING3.  If not see
 
 #if CHECKING_P
 
+namespace selftest {
 
 static void
 test_insert_search_collapse ()
@@ -156,12 +157,14 @@ test_merge ()
 
 
 void
-modref_tree_c_tests ()
+ipa_modref_tree_c_tests ()
 {
   test_insert_search_collapse ();
   test_merge ();
 }
 
+} // namespace selftest
+
 #endif
 
 void
diff --git a/gcc/ipa-modref.c b/gcc/ipa-modref.c
index af0b710333e..ac7579a9e75 100644
--- a/gcc/ipa-modref.c
+++ b/gcc/ipa-modref.c
@@ -767,12 +770,6 @@ class pass_modref : public gimple_opt_pass
 pass_modref (gcc::context *ctxt)
: gimple_opt_pass (pass_data_modref, ctxt) {}
 
-~pass_modref ()
-  {
-   ggc_delete (summaries);
-   summaries = NULL;
-  }
-
 /* opt_pass methods: */
 opt_pass *clone ()
 {
@@ -1373,4 +1370,14 @@ unsigned int pass_ipa_modref::execute (function *)
   return 0;
 }
 
+/* Summaries must stay alive until end of compilation.  */
+
+void
+ipa_modref_c_finalize ()
+{
+  if (summaries)
+ggc_delete (summaries);
+  summaries = NULL;
+}
+
 #include "gt-ipa-modref.h"
diff --git a/gcc/ipa-modref.h b/gcc/ipa-modref.h
index 6f979200cc2..6cccdfe7af3 100644
--- a/gcc/ipa-modref.h
+++ b/gcc/ipa-modref.h
@@ -44,5 +44,6 @@ struct GTY(()) modref_summary
 };
 
 modref_summary *get_modref_function_summary (cgraph_node *func);
+void ipa_modref_c_finalize ();
 
 #endif
diff --git a/gcc/selftest-run-tests.c b/gcc/selftest-run-tests.c
index f0a81d43fd6..7a89b2df5bd 100644
--- a/gcc/selftest-run-tests.c
+++ b/gcc/selftest-run-tests.c
@@ -90,6 +90,7 @@ selftest::run_tests ()
   read_rtl_function_c_tests ();
   digraph_cc_tests ();
   tristate_cc_tests ();
+  ipa_modref_tree_c_tests ();
 
   /* Higher-level tests, or for components that other selftests don't
  rely on.  */
diff --git a/gcc/selftest.h b/gcc/selftest.h
index 5cffa13aedd..6c6c7f28675 100644
--- a/gcc/selftest.h
+++ b/gcc/selftest.h
@@ -268,6 +268,7 @@ extern void vec_perm_indices_c_tests ();
 extern void wide_int_cc_tests ();
 extern void opt_proposer_c_tests ();
 extern void dbgcnt_c_tests ();
+extern void ipa_modref_tree_c_tests ();
 
 extern int num_passes;
 
diff --git a/gcc/toplev.c b/gcc/toplev.c
index cdd4b5b4f92..a4cb8bb262e 100644
--- a/gcc/toplev.c
+++ b/gcc/toplev.c
@@ -84,6 +84,8 @@ along with GCC; see the file COPYING3.  If not see
 #include "dump-context.h"
 #include "print-tree.h"
 #include "optinfo-emit-json.h"
+#include "ipa-modref-tree.h"
+#include "ipa-modref.h"
 
 #if defined(DBX_DEBUGGING_INFO) || defined(XCOFF_DEBUGGING_INFO)
 #include "dbxout.h"
@@ -2497,6 +2499,7 @@ toplev::finalize (void)
   /* Needs to be called before cgraph_c_finalize since it uses symtab.  */
   ipa_reference_c_finalize ();
   ipa_fnsummary_c_finalize ();
+  ipa_modref_c_finalize ();
 
   cgraph_c_finalize ();
   cgraphunit_c_finalize ();


Re: [PATCH] c++: Return only in-scope tparms in keep_template_parm [PR95310]

2020-09-22 Thread Patrick Palka via Gcc-patches
On Tue, 22 Sep 2020, Patrick Palka wrote:

> On Mon, 21 Sep 2020, Jason Merrill wrote:
> 
> > On 9/19/20 3:49 PM, Patrick Palka wrote:
> > > In the testcase below, the dependent specializations iter_reference_t
> > > and iter_reference_t share the same tree due to specialization
> > > caching.  So when find_template_parameters walks through the
> > > requires-expression (as part of normalization), it sees and includes the
> > > out-of-scope template parameter F in the list of template parameters
> > > it found within the requires-expression (along with Out and N).
> > > 
> > >  From a correctness perspective this is harmless since the parameter 
> > > mapping
> > > routines only care about the level and index of each parameter, so F is
> > > no different from Out in this sense.  (And it's also harmless that two
> > > parameters in the parameter mapping have the same level and index.)
> > > 
> > > But having both Out and F in the parameter mapping is extra work for
> > > hash_atomic_constrant, tsubst_parameter_mapping and get_mapped_args; and
> > > it also means we print this irrelevant template parameter in the
> > > testcase's diagnostics (via pp_cxx_parameter_mapping):
> > > 
> > >in requirements with ‘Out o’ [with N = (const int&)&a; F = const int*;
> > > Out = const int*]
> > > 
> > > This patch makes keep_template_parm return only in-scope template
> > > parameters by looking into ctx_parms for the corresponding in-scope one.
> > > 
> > > (That we sometimes print irrelevant template parameters in diagnostics is
> > > also the subject of PR99 and PR66968, so the above diagnostic issue
> > > could likely be fixed in a more general way, but this targeted fix to
> > > keep_template_parm is perhaps worthwhile on its own.)
> > > 
> > > Bootstrapped and regtested on x86_64-pc-linux-gnu, and also tested on
> > > cmcstl2 and range-v3.  Does this look OK for trunk?
> > > 
> > > gcc/cp/ChangeLog:
> > > 
> > >   PR c++/95310
> > >   * pt.c (keep_template_parm): Adjust the given template parameter
> > >   to the corresponding in-scope one from ctx_parms.
> > > 
> > > gcc/testsuite/ChangeLog:
> > > 
> > >   PR c++/95310
> > >   * g++.dg/concepts/diagnostic15.C: New test.
> > >   * g++.dg/cpp2a/concepts-ttp2.C: New test.
> > > ---
> > >   gcc/cp/pt.c  | 19 +++
> > >   gcc/testsuite/g++.dg/concepts/diagnostic15.C | 16 
> > >   2 files changed, 35 insertions(+)
> > >   create mode 100644 gcc/testsuite/g++.dg/concepts/diagnostic15.C
> > > 
> > > diff --git a/gcc/cp/pt.c b/gcc/cp/pt.c
> > > index fe45de8d796..c2c70ff02b9 100644
> > > --- a/gcc/cp/pt.c
> > > +++ b/gcc/cp/pt.c
> > > @@ -10550,6 +10550,25 @@ keep_template_parm (tree t, void* data)
> > >  BOUND_TEMPLATE_TEMPLATE_PARM itself.  */
> > >   t = TREE_TYPE (TEMPLATE_TEMPLATE_PARM_TEMPLATE_DECL (t));
> > >   +  /* This template parameter might be an argument to a cached dependent
> > > + specalization that was formed earlier inside some other template, in
> > > which
> > > + case the parameter is not among the ones that are in-scope.  Look in
> > > + CTX_PARMS to find the corresponding in-scope template parameter and
> > > + always return that instead.  */
> > > +  tree cparms = ftpi->ctx_parms;
> > > +  while (TMPL_PARMS_DEPTH (cparms) > level)
> > > +cparms = TREE_CHAIN (cparms);
> > > +  gcc_assert (TMPL_PARMS_DEPTH (cparms) == level);
> > > +  if (TREE_VEC_LENGTH (TREE_VALUE (cparms)))
> > > +{
> > > +  t = TREE_VALUE (TREE_VEC_ELT (TREE_VALUE (cparms), index));
> > > +  /* As in template_parm_to_arg.  */
> > > +  if (TREE_CODE (t) == TYPE_DECL || TREE_CODE (t) == TEMPLATE_DECL)
> > > + t = TREE_TYPE (t);
> > > +  else
> > > + t = DECL_INITIAL (t);
> > > +}
> > 
> > This seems like a useful separate function: given a parmlist and a single
> > template parm (or index+level), return the corresponding parm from the
> > parmlist.  Basically the reverse of canonical_type_parameter.
> 
> Sounds good.  Like this?
> 
> -- >8 --
> 
> gcc/cp/ChangeLog:
> 
>   PR c++/95310
>   * pt.c (corresponding_template_parameter): Define.
>   (keep_template_parm): Use it to adjust the given template
>   parameter to the corresponding in-scope one from ctx_parms.
> 
> gcc/testsuite/ChangeLog:
> 
>   PR c++/95310
>   * g++.dg/concepts/diagnostic15.C: New test.
>   * g++.dg/cpp2a/concepts-ttp2.C: New test.

Whoops, consider this stray ChangeLog line removed.  diagnostic15.C is
the only new test.

> ---
>  gcc/cp/pt.c  | 44 
>  gcc/testsuite/g++.dg/concepts/diagnostic15.C | 16 +++
>  2 files changed, 60 insertions(+)
>  create mode 100644 gcc/testsuite/g++.dg/concepts/diagnostic15.C
> 
> diff --git a/gcc/cp/pt.c b/gcc/cp/pt.c
> index 44ca14afc4e..bec8396f9f4 100644
> --- a/gcc/cp/pt.c
> +++ b/gcc/cp/pt.c
> @@ -10244,6 +10244,42 @@ lookup_and_finish_template_variable (tree templ, 
> 

Re: New modref/ipa_modref optimization passes

2020-09-22 Thread Jan Hubicka
> On Sun, 2020-09-20 at 19:30 +0200, Jan Hubicka wrote:
> > > 
> 
> [...]
> 
> > > Should new C++ source files have a .cc suffix, rather than .c?
> > > 
> > > [...]
> > > 
> > > > +  $(srcdir)/ipa-modref.h $(srcdir)/ipa-modref.c \
> > > 
> > > ...which would affect this^
> > 
> > I was wondering about that and decided to stay with .c since it is
> > what
> > other ipa passes use.  I can rename the files. 
> 
> Given that they're in the source tree now, maybe better to wait until
> some mass renaming in the future?

At the same time, I am the only one with patches against it, and I have no
problem updating the name.
> 
> > We have some sources with
> > .c extension and others with .cc while they are now all C++. Is there
> > some plan to clean it up?
> 
> I think we've been avoiding it, partly out of a concern of making
> backports harder, and also because someone has to do the work.
> 
> That said, it's yet another unfinished transition, and is technical
> debt for the project.  It's confusing to newcomers.
> 
> It's been bugging me for a while, so I might take a look at doing it in
> this cycle.

Agreed. It would be nice to do the mass rename and at the same time make
a sane directory structure so newcomers can locate optimization passes
and other components.

David Cepelik was my student so he may have some feedback about what he
found hard.  I think the main stoppers were the garbage collector (since he
decided to do the template for modref tree) and the flat includes that
break if you do not do them in the right order and resolve all
dependencies.

Honza
> 
> Dave
> 


[committed] analyzer: add -fdump-analyzer-json

2020-09-22 Thread David Malcolm via Gcc-patches
I've found this useful for debugging state explosions in the analyzer.

Successfully bootstrapped & regrtested on x86_64-pc-linux-gnu.
Pushed to master as 809192e77e6e112a0fe32dee7fada7a49fbf25cd.

gcc/analyzer/ChangeLog:
* analysis-plan.cc: Include "json.h".
* analyzer.opt (fdump-analyzer-json): New.
* call-string.cc: Include "json.h".
(call_string::to_json): New.
* call-string.h (call_string::to_json): New decl.
* checker-path.cc: Include "json.h".
* constraint-manager.cc: Include "json.h".
(equiv_class::to_json): New.
(constraint::to_json): New.
(constraint_manager::to_json): New.
* constraint-manager.h (equiv_class::to_json): New decl.
(constraint::to_json): New decl.
(constraint_manager::to_json): New decl.
* diagnostic-manager.cc: Include "json.h".
(saved_diagnostic::to_json): New.
(diagnostic_manager::to_json): New.
* diagnostic-manager.h (saved_diagnostic::to_json): New decl.
(diagnostic_manager::to_json): New decl.
* engine.cc: Include "json.h", .
(exploded_node::status_to_str): New.
(exploded_node::to_json): New.
(exploded_edge::to_json): New.
(exploded_graph::to_json): New.
(dump_analyzer_json): New.
(impl_run_checkers): Call it.
* exploded-graph.h (exploded_node::status_to_str): New decl.
(exploded_node::to_json): New.
(exploded_edge::to_json): New.
(exploded_graph::to_json): New.
* pending-diagnostic.cc: Include "json.h".
* program-point.cc: Include "json.h".
(program_point::to_json): New.
* program-point.h (program_point::to_json): New decl.
* program-state.cc: Include "json.h".
(extrinsic_state::to_json): New.
(sm_state_map::to_json): New.
(program_state::to_json): New.
* program-state.h (extrinsic_state::to_json): New decl.
(sm_state_map::to_json): New decl.
(program_state::to_json): New decl.
* region-model-impl-calls.cc: Include "json.h".
* region-model-manager.cc: Include "json.h".
* region-model-reachability.cc: Include "json.h".
* region-model.cc: Include "json.h".
* region-model.h (svalue::to_json): New decl.
(region::to_json): New decl.
* region.cc: Include "json.h".
(region::to_json): New.
* sm-file.cc: Include "json.h".
* sm-malloc.cc: Include "json.h".
* sm-pattern-test.cc: Include "json.h".
* sm-sensitive.cc: Include "json.h".
* sm-signal.cc: Include "json.h".
(signal_delivery_edge_info_t::to_json): New.
* sm-taint.cc: Include "json.h".
* sm.cc: Include "diagnostic.h", "tree-diagnostic.h", and
"json.h".
(state_machine::state::to_json): New.
(state_machine::to_json): New.
* sm.h (state_machine::state::to_json): New.
(state_machine::to_json): New.
* state-purge.cc: Include "json.h".
* store.cc: Include "json.h".
(binding_key::get_desc): New.
(binding_map::to_json): New.
(binding_cluster::to_json): New.
(store::to_json): New.
* store.h (binding_key::get_desc): New decl.
(binding_map::to_json): New decl.
(binding_cluster::to_json): New decl.
(store::to_json): New decl.
* supergraph.cc: Include "json.h".
(supergraph::to_json): New.
(supernode::to_json): New.
(superedge::to_json): New.
* supergraph.h (supergraph::to_json): New decl.
(supernode::to_json): New decl.
(superedge::to_json): New decl.
* svalue.cc: Include "json.h".
(svalue::to_json): New.

gcc/ChangeLog:
* doc/analyzer.texi (Other Debugging Techniques): Mention
-fdump-analyzer-json.
* doc/invoke.texi (Static Analyzer Options): Add
-fdump-analyzer-json.
---
 gcc/analyzer/analysis-plan.cc |   1 +
 gcc/analyzer/analyzer.opt |   4 +
 gcc/analyzer/call-string.cc   |  29 +
 gcc/analyzer/call-string.h|   2 +
 gcc/analyzer/checker-path.cc  |   1 +
 gcc/analyzer/constraint-manager.cc|  77 
 gcc/analyzer/constraint-manager.h |   6 +
 gcc/analyzer/diagnostic-manager.cc|  58 +
 gcc/analyzer/diagnostic-manager.h |   4 +
 gcc/analyzer/engine.cc| 146 ++
 gcc/analyzer/exploded-graph.h |   7 ++
 gcc/analyzer/pending-diagnostic.cc|   1 +
 gcc/analyzer/program-point.cc |  38 ++
 gcc/analyzer/program-point.h  |   2 +
 gcc/analyzer/program-state.cc |  85 +
 gcc/analyzer/program-state.h  |   6 +
 gcc/analyzer/region-model-impl-calls.cc   |   1 +
 gcc/analyzer/region-model-manager.cc  |   1 +
 gcc/analyzer/region-model-reachability.cc |   1 +

[committed] libstdc++: Fix out-of-bounds string_view access in filesystem::path [PR 97167]

2020-09-22 Thread Jonathan Wakely via Gcc-patches
libstdc++-v3/ChangeLog:

PR libstdc++/97167
* src/c++17/fs_path.cc (path::_Parser::root_path()): Check
for empty string before inspecting the first character.
* testsuite/27_io/filesystem/path/append/source.cc: Append
empty string_view to path.

Tested powerpc64le-linux. Committed to trunk.

This needs to be backported to gcc-9 and gcc-10 too.

commit 49ff88bd0d8a36a9e903f01ce05685cfe07dee5d
Author: Jonathan Wakely 
Date:   Tue Sep 22 20:02:58 2020

libstdc++: Fix out-of-bounds string_view access in filesystem::path [PR 
97167]

libstdc++-v3/ChangeLog:

PR libstdc++/97167
* src/c++17/fs_path.cc (path::_Parser::root_path()): Check
for empty string before inspecting the first character.
* testsuite/27_io/filesystem/path/append/source.cc: Append
empty string_view to path.

diff --git a/libstdc++-v3/src/c++17/fs_path.cc 
b/libstdc++-v3/src/c++17/fs_path.cc
index cea7aa08601..6e907b1c54d 100644
--- a/libstdc++-v3/src/c++17/fs_path.cc
+++ b/libstdc++-v3/src/c++17/fs_path.cc
@@ -81,7 +81,7 @@ struct path::_Parser
 const size_t len = input.size();
 
 // look for root name or root directory
-if (is_dir_sep(input[0]))
+if (len && is_dir_sep(input[0]))
   {
 #if SLASHSLASH_IS_ROOTNAME
// look for root name, such as "//foo"
diff --git a/libstdc++-v3/testsuite/27_io/filesystem/path/append/source.cc 
b/libstdc++-v3/testsuite/27_io/filesystem/path/append/source.cc
index 2fceee9b774..dc7331945fe 100644
--- a/libstdc++-v3/testsuite/27_io/filesystem/path/append/source.cc
+++ b/libstdc++-v3/testsuite/27_io/filesystem/path/append/source.cc
@@ -161,6 +161,15 @@ test06()
   test(p2, s.c_str());
 }
 
+void
+test07()
+{
+  path p, p0;
+  std::string_view s;
+  p /= s; // PR libstdc++/97167
+  compare_paths(p, p0);
+}
+
 int
 main()
 {
@@ -170,4 +179,5 @@ main()
   test04();
   test05();
   test06();
+  test07();
 }
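
The essence of the fix is to check the length before indexing, since
reading element 0 of an empty string_view is out of bounds.  A minimal
standalone sketch of that guard follows; both function names are
hypothetical stand-ins rather than the library's internals, and only '/'
is treated as a separator here.

```cpp
#include <cstddef>
#include <string_view>

// Hypothetical stand-in for the parser's directory-separator test.
inline bool is_dir_sep(char c)
{
  return c == '/';
}

// Return true if INPUT begins with a directory separator.  The length
// test comes first, so input[0] is never evaluated for an empty view.
inline bool starts_with_dir_sep(std::string_view input)
{
  const std::size_t len = input.size();
  return len && is_dir_sep(input[0]);
}
```

Because && short-circuits, an empty view returns false without touching
the underlying storage.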


c++: Remove a broken error-recovery path

2020-09-22 Thread Nathan Sidwell


The remaining use of xref_tag_from_type was also suspicious.  It turns
out to be an error path.  At parse time we diagnose that a class
definition cannot appear, but we swallow the definition.  This code
was attempting to push it into the global scope (or find a conflict).
This seems needless; just return error_mark_node.  This was a
simpler fix than going through the parser and figuring out how to get
it to put in error_mark_node at the right point.

gcc/cp/
* cp-tree.h (xref_tag_from_type): Don't declare.
* decl.c (xref_tag_from_type): Delete.
* pt.c (lookup_template_class_1): Erroneously located class
definitions just give error_mark, don't try and inject it into the
namespace.

pushed to trunk

--
Nathan Sidwell
diff --git i/gcc/cp/cp-tree.h w/gcc/cp/cp-tree.h
index 71353814973..029a165a3e8 100644
--- i/gcc/cp/cp-tree.h
+++ w/gcc/cp/cp-tree.h
@@ -6502,7 +6502,6 @@ extern void grok_special_member_properties	(tree);
 extern bool grok_ctor_properties		(const_tree, const_tree);
 extern bool grok_op_properties			(tree, bool);
 extern tree xref_tag(enum tag_types, tree, tag_scope, bool);
-extern tree xref_tag_from_type			(tree, tree, tag_scope);
 extern void xref_basetypes			(tree, tree);
 extern tree start_enum(tree, tree, tree, tree, bool, bool *);
 extern void finish_enum_value_list		(tree);
diff --git i/gcc/cp/decl.c w/gcc/cp/decl.c
index bbecebe7a62..f3fdfe3d896 100644
--- i/gcc/cp/decl.c
+++ w/gcc/cp/decl.c
@@ -15120,23 +15120,6 @@ xref_tag (enum tag_types tag_code, tree name,
   return ret;
 }
 
-
-tree
-xref_tag_from_type (tree old, tree id, tag_scope scope)
-{
-  enum tag_types tag_kind;
-
-  if (TREE_CODE (old) == RECORD_TYPE)
-tag_kind = (CLASSTYPE_DECLARED_CLASS (old) ? class_type : record_type);
-  else
-tag_kind  = union_type;
-
-  if (id == NULL_TREE)
-id = TYPE_IDENTIFIER (old);
-
-  return xref_tag (tag_kind, id, scope, false);
-}
-
 /* Create the binfo hierarchy for REF with (possibly NULL) base list
BASE_LIST.  For each element on BASE_LIST the TREE_PURPOSE is an
access_* node, and the TREE_VALUE is the type of the base-class.
diff --git i/gcc/cp/pt.c w/gcc/cp/pt.c
index 44ca14afc4e..69946da09bf 100644
--- i/gcc/cp/pt.c
+++ w/gcc/cp/pt.c
@@ -9856,12 +9856,11 @@ lookup_template_class_1 (tree d1, tree arglist, tree in_decl, tree context,
 	  && !PRIMARY_TEMPLATE_P (gen_tmpl)
 	  && !LAMBDA_TYPE_P (TREE_TYPE (gen_tmpl))
 	  && TREE_CODE (CP_DECL_CONTEXT (gen_tmpl)) == NAMESPACE_DECL)
-	{
-	  found = xref_tag_from_type (TREE_TYPE (gen_tmpl),
-  DECL_NAME (gen_tmpl),
-  /*tag_scope=*/ts_global);
-	  return found;
-	}
+	/* This occurs when the user has tried to define a tagged type
+	   in a scope that forbids it.  We emitted an error during the
+	   parse.  We didn't complete the bail out then, so here we
+	   are.  */
+	return error_mark_node;
 
   context = DECL_CONTEXT (gen_tmpl);
   if (context && TYPE_P (context))


Re: PING [Patch][Middle-end]Add -fzero-call-used-regs=[skip|used-gpr|all-gpr|used|all]

2020-09-22 Thread Qing Zhao via Gcc-patches



> On Sep 22, 2020, at 1:35 PM, H.J. Lu  wrote:
> 
> On Tue, Sep 22, 2020 at 11:25 AM Qing Zhao  > wrote:
>> 
>> Hi, Hongjiu,
>> 
>> 
>>> On Sep 22, 2020, at 11:31 AM, Richard Sandiford  
>>> wrote:
>>> 
>>> Qing Zhao  writes:
> On Sep 21, 2020, at 2:22 PM, Qing Zhao via Gcc-patches 
>  wrote:
> 
> 
> 
>> On Sep 21, 2020, at 2:11 PM, Richard Sandiford 
>>  wrote:
>> 
>> Qing Zhao  writes:
 But in cases where there is no underlying concept that can sensibly
 be extracted out, it's OK if targets need to override the default
 to get correct behaviour.
>>> 
>>> Then, on a target where the default code is not right and we haven’t 
>>> provided an overriding implementation, what should we tell the end user 
>>> about this?
>>> The user might see the documentation for -fzero-call-used-regs in the gcc 
>>> manual and try it on that specific target, where the default 
>>> implementation is not correct. How do we deal with this?
>> 
>> The point is that we're trying to implement this in a target-independent
>> way, like for most compiler features.  If the option doesn't work for a
>> particular target, then that's a bug like any other.  The most we can
>> reasonably do is:
>> 
>> (a) try to implement the feature in a way that uses all the appropriate
>> pieces of compiler infrastructure (what we've been discussing)
>> 
>> (b) add tests for the feature that run on all targets
>> 
>> It's possible that bugs could slip through even then.  But that's true
>> of anything.
>> 
>> Targets like x86 support many subtargets, many different compilation
>> modes, and many different compiler features (register asms, various
>> fancy function attributes, etc.).  So even after the option is
>> committed and is supposedly supported on x86, it's possible that
>> we'll find a bug in the feature on x86 itself.
>> 
>> I don't think anyone would suggest that we should warn the user that the
>> option might be buggy on x86 (its initial target).  But I also don't
>> see any reason for believing that a bug on x86 is less likely than
>> a bug on other targets.
> 
> Okay, then I will add the default implementation as you suggested. And 
> also provide the overridden optimized implementation on X86.
 
 For X86, it looks like, in addition to stack registers (st0 to st7), mask 
 registers (k0 to k7) also do not need to be zeroed, and “mm0 to mm7” 
 should not be zeroed either.
 
 As I checked, MASK_REG_P and MMX_REG_P are x86-specific macros; can I use 
 them in the middle end, similarly to “STACK_REG_P”?
>>> 
>>> No, those are x86-specific like you say.
>>> 
>>> Taking each in turn: what is the reason for not clearing mask registers?
>>> And what is the reason for not clearing mm0-7?  In each case, is it a
>>> performance or a correctness issue?
>> 
>> Could you please provide more information on the above questions? (Why do we 
>> exclude mask registers and mm0-7 registers from ALL on x86?)
>> 
> 
> No particular reason.  You can add them.

Okay, thanks.

Then I guess that the reason we didn’t zero mask registers and mm0-7 registers 
on x86 is mainly performance.
There might not be much benefit for mitigating ROP attacks if we zero these 
additional registers, but we would get much more performance overhead.

What’s your opinion, Richard?

Qing




> 
> H.J.



Re: [PING][PATCH] correct handling of indices into arrays with elements larger than 1 (PR c++/96511)

2020-09-22 Thread Martin Sebor via Gcc-patches

The rebased and retested patches are attached.

On 9/21/20 3:17 PM, Martin Sebor wrote:

Ping: https://gcc.gnu.org/pipermail/gcc-patches/2020-September/553906.html

(I'm working on rebasing the patch on top of the latest trunk which
has changed some of the same code but it'd be helpful to get a go-ahead
on the substance of the changes.  I don't expect the rebase to
require any substantive modifications.)

Martin

On 9/14/20 4:01 PM, Martin Sebor wrote:

On 9/4/20 11:14 AM, Jason Merrill wrote:

On 9/3/20 2:44 PM, Martin Sebor wrote:

On 9/1/20 1:22 PM, Jason Merrill wrote:

On 8/11/20 12:19 PM, Martin Sebor via Gcc-patches wrote:

-Wplacement-new handles array indices and pointer offsets the same:
by adjusting them by the size of the element.  That's correct for
the latter but wrong for the former, causing false positives when
the element size is greater than one.
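
To make the distinction concrete, here is a small sketch of how much storage is left for a placement new at an array index versus at a byte offset.  It is not taken from the PR or the patch; the helper names are made up for illustration, and the double-scaling case is only one assumed way a false positive could arise.

```cpp
#include <cassert>
#include <cstddef>

// Bytes left in an array of COUNT elements of ELT_SIZE bytes each when
// starting at element INDEX (an array index counts elements).
constexpr std::size_t
bytes_remaining_at_index (std::size_t count, std::size_t elt_size,
                          std::size_t index)
{
  return index <= count ? (count - index) * elt_size : 0;
}

// Bytes left in the same array when starting OFFSET bytes from its
// beginning (a byte offset is not scaled by the element size).
constexpr std::size_t
bytes_remaining_at_offset (std::size_t count, std::size_t elt_size,
                           std::size_t offset)
{
  return offset <= count * elt_size ? count * elt_size - offset : 0;
}

// For a 4-element array of 2-byte elements, element 2 leaves 4 bytes,
// enough for a 4-byte object.
static_assert (bytes_remaining_at_index (4, 2, 2) == 4, "index is scaled once");

// Scaling the index twice (2 * 2 * 2 = 8 bytes) would leave nothing and
// make a valid placement new look like an overflow.
static_assert (bytes_remaining_at_offset (4, 2, 2 * 2 * 2) == 0,
               "double scaling underestimates the space");
```

The two helpers agree only when the index has been scaled by the element size exactly once, which is why mixing up the two kinds of operand matters only for elements larger than one byte.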

In addition, the warning doesn't even attempt to handle arrays of
arrays.  I'm not sure if I forgot or if I simply didn't think of
it.

The attached patch corrects these oversights by replacing most
of the -Wplacement-new code with a call to compute_objsize which
handles all this correctly (plus more), and is also better tested.
But even compute_objsize has bugs: it trips up while converting
wide_int to offset_int for some pointer offset ranges.  Since
handling the C++ IL required changes in this area the patch also
fixes that.

For review purposes, the patch affects just the middle end.
The C++ diff pretty much just removes code from the front end.


The C++ changes are OK.


Thank you for looking at the rest as well.




-compute_objsize (tree ptr, int ostype, access_ref *pref,
-    bitmap *visited, const vr_values *rvals /* = NULL */)
+compute_objsize (tree ptr, int ostype, access_ref *pref, bitmap *visited,
+    const vr_values *rvals)


This reformatting seems unnecessary, and I prefer to keep the 
comment about the default argument.


This overload doesn't take a default argument.  (There was a stray
declaration of a similar function at the top of the file that had
one.  I've removed it.)


Ah, true.


-  if (!size || TREE_CODE (size) != INTEGER_CST)
-   return false;

 >...

You change some failure cases in compute_objsize to return success 
with a maximum range, while others continue to return failure.  
This needs commentary about the design rationale.


This is too much for a comment in the code but the background is
this: compute_objsize initially returned the object size as a constant.
Recently, I have enhanced it to return a range to improve warnings for
allocated objects.  With that, a failure can be turned into success by
having the function set the range to that of the largest object.  That
should simplify the function's callers and could even improve
the detection of some invalid accesses.  Once this change is made
it might even be possible to change its return type to void.
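
As a rough illustration of that idea (a sketch only; `size_range`, `object_size_range` and `access_may_fit` are hypothetical names, not the actual compute_objsize interface), a failure can be folded into the result by reporting the widest possible range:

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for the size range the function reports.
struct size_range
{
  std::uint64_t min;
  std::uint64_t max;
};

// Assumed upper bound on the size of any object, used when nothing
// better is known.
constexpr std::uint64_t max_object_size
  = static_cast<std::uint64_t> (PTRDIFF_MAX);

// Instead of failing when the size is not a known constant, report the
// range of the largest possible object so callers need no failure path.
inline size_range
object_size_range (bool size_known, std::uint64_t size)
{
  if (size_known)
    return { size, size };
  return { 0, max_object_size };
}

// A caller can then test any access the same way.
inline bool
access_may_fit (size_range r, std::uint64_t access_size)
{
  return access_size <= r.max;
}
```

With this shape, an unknown size never causes a diagnostic by itself, while a known size still lets an oversized access be detected.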

The change that caught your eye is necessary to make the function
a drop-in replacement for the C++ front end code which makes this
same assumption.  Without it, a number of test cases that exercise
VLAs fail in g++.dg/warn/Wplacement-new-size-5.C.  For example:

   void f (int n)
   {
 char a[n];
 new (a - 1) int ();
   }

Changing any of the other places isn't necessary for existing tests
to pass (and I didn't want to introduce too much churn).  But I do
want to change the rest of the function along the same lines at some
point.


Please do change the other places to be consistent; better to have 
more churn than to leave the function half-updated.  That can be a 
separate patch if you prefer, but let's do it now rather than later.


I've made most of these changes in the other patch (also attached).
I'm quite happy with the result but it turned out to be a lot more
work than either of us expected, mostly due to the amount of testing.

I've left a couple of failing cases in place mainly as reminders
to handle them better (which means I also didn't change the caller
to avoid testing for failures).  I've also added TODO notes with
reminders to handle some of the new codes more completely.




+  special_array_member sam{ };


sam is always set by component_ref_size, so I don't think it's 
necessary to initialize it at the declaration.


I find initializing pass-by-pointer local variables helpful but
I don't insist on it.




@@ -187,7 +187,7 @@ decl_init_size (tree decl, bool min)
   tree last_type = TREE_TYPE (last);
   if (TREE_CODE (last_type) != ARRAY_TYPE
   || TYPE_SIZE (last_type))
-    return size;
+    return size ? size : TYPE_SIZE_UNIT (type);


This change seems to violate the comment for the function.


By my reading (and writing) the change is covered by the first
sentence:

    Returns the size of the object designated by DECL considering
    its initializer if it either has one or if it would not affect
    its size, ...


OK, I see it now.


It handles a number of cases in Wplacement-new-size.C fail

Re: [PATCH] c++: Return only in-scope tparms in keep_template_parm [PR95310]

2020-09-22 Thread Jason Merrill via Gcc-patches

On 9/22/20 2:41 PM, Patrick Palka wrote:

On Tue, 22 Sep 2020, Patrick Palka wrote:


On Mon, 21 Sep 2020, Jason Merrill wrote:


On 9/19/20 3:49 PM, Patrick Palka wrote:

In the testcase below, the dependent specializations iter_reference_t
and iter_reference_t share the same tree due to specialization
caching.  So when find_template_parameters walks through the
requires-expression (as part of normalization), it sees and includes the
out-of-scope template parameter F in the list of template parameters
it found within the requires-expression (along with Out and N).

  From a correctness perspective this is harmless since the parameter mapping
routines only care about the level and index of each parameter, so F is
no different from Out in this sense.  (And it's also harmless that two
parameters in the parameter mapping have the same level and index.)

But having both Out and F in the parameter mapping is extra work for
hash_atomic_constraint, tsubst_parameter_mapping and get_mapped_args; and
it also means we print this irrelevant template parameter in the
testcase's diagnostics (via pp_cxx_parameter_mapping):

in requirements with ‘Out o’ [with N = (const int&)&a; F = const int*;
Out = const int*]

This patch makes keep_template_parm return only in-scope template
parameters by looking into ctx_parms for the corresponding in-scope one.

(That we sometimes print irrelevant template parameters in diagnostics is
also the subject of PR99 and PR66968, so the above diagnostic issue
could likely be fixed in a more general way, but this targeted fix to
keep_template_parm is perhaps worthwhile on its own.)

Bootstrapped and regtested on x86_64-pc-linux-gnu, and also tested on
cmcstl2 and range-v3.  Does this look OK for trunk?

gcc/cp/ChangeLog:

PR c++/95310
* pt.c (keep_template_parm): Adjust the given template parameter
to the corresponding in-scope one from ctx_parms.

gcc/testsuite/ChangeLog:

PR c++/95310
* g++.dg/concepts/diagnostic15.C: New test.
* g++.dg/cpp2a/concepts-ttp2.C: New test.
---
   gcc/cp/pt.c  | 19 +++
   gcc/testsuite/g++.dg/concepts/diagnostic15.C | 16 
   2 files changed, 35 insertions(+)
   create mode 100644 gcc/testsuite/g++.dg/concepts/diagnostic15.C

diff --git a/gcc/cp/pt.c b/gcc/cp/pt.c
index fe45de8d796..c2c70ff02b9 100644
--- a/gcc/cp/pt.c
+++ b/gcc/cp/pt.c
@@ -10550,6 +10550,25 @@ keep_template_parm (tree t, void* data)
  BOUND_TEMPLATE_TEMPLATE_PARM itself.  */
   t = TREE_TYPE (TEMPLATE_TEMPLATE_PARM_TEMPLATE_DECL (t));
   +  /* This template parameter might be an argument to a cached dependent
+ specialization that was formed earlier inside some other template, in which
+ case the parameter is not among the ones that are in-scope.  Look in
+ CTX_PARMS to find the corresponding in-scope template parameter and
+ always return that instead.  */
+  tree cparms = ftpi->ctx_parms;
+  while (TMPL_PARMS_DEPTH (cparms) > level)
+cparms = TREE_CHAIN (cparms);
+  gcc_assert (TMPL_PARMS_DEPTH (cparms) == level);
+  if (TREE_VEC_LENGTH (TREE_VALUE (cparms)))
+{
+  t = TREE_VALUE (TREE_VEC_ELT (TREE_VALUE (cparms), index));
+  /* As in template_parm_to_arg.  */
+  if (TREE_CODE (t) == TYPE_DECL || TREE_CODE (t) == TEMPLATE_DECL)
+   t = TREE_TYPE (t);
+  else
+   t = DECL_INITIAL (t);
+}


This seems like a useful separate function: given a parmlist and a single
template parm (or index+level), return the corresponding parm from the
parmlist.  Basically the reverse of canonical_type_parameter.
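
A toy model of such a lookup, for illustration only: it uses plain vectors of names rather than GCC's tree representation, `parm_levels` and `corresponding_parm` are made-up names, and it assumes 1-based levels and 0-based indices.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Toy parameter list: levels[0] holds the outermost (level 1) parameters.
using parm_levels = std::vector<std::vector<std::string>>;

// Return a pointer to the parameter at 1-based LEVEL and 0-based INDEX,
// or nullptr if PARMS has no such parameter.
inline const std::string *
corresponding_parm (const parm_levels &parms, std::size_t level,
                    std::size_t index)
{
  if (level == 0 || level > parms.size ())
    return nullptr;
  const std::vector<std::string> &vec = parms[level - 1];
  return index < vec.size () ? &vec[index] : nullptr;
}
```

In this model an out-of-scope parameter such as F, which shares level 1 and index 0 with Out, would be mapped to Out when looked up in the in-scope list { { "Out", "N" } }.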


Sounds good.  Like this?

-- >8 --

gcc/cp/ChangeLog:

PR c++/95310
* pt.c (corresponding_template_parameter): Define.
(keep_template_parm): Use it to adjust the given template
parameter to the corresponding in-scope one from ctx_parms.

gcc/testsuite/ChangeLog:

PR c++/95310
* g++.dg/concepts/diagnostic15.C: New test.
* g++.dg/cpp2a/concepts-ttp2.C: New test.


Whoops, consider this stray ChangeLog line removed.  diagnostic15.C is
the only new test.


OK.


---
  gcc/cp/pt.c  | 44 
  gcc/testsuite/g++.dg/concepts/diagnostic15.C | 16 +++
  2 files changed, 60 insertions(+)
  create mode 100644 gcc/testsuite/g++.dg/concepts/diagnostic15.C

diff --git a/gcc/cp/pt.c b/gcc/cp/pt.c
index 44ca14afc4e..bec8396f9f4 100644
--- a/gcc/cp/pt.c
+++ b/gcc/cp/pt.c
@@ -10244,6 +10244,42 @@ lookup_and_finish_template_variable (tree templ, tree targs,
return convert_from_reference (templ);
  }
  
+/* If the set of template parameters PARMS contains a template with


s/template with/template parameter at/


+   the given LEVEL and INDEX, then return this parameter.  Otherwise
+   return NULL_TREE.  */
+
+static tree
+corresponding_template_parameter (tree parms, int level, int index)
+{
+  while (TMPL_PARMS_DEPTH (parms) > level)
+

Re: [PATCH v3] c, c++: Implement -Wsizeof-array-div [PR91741]

2020-09-22 Thread Jason Merrill via Gcc-patches

On 9/22/20 1:29 PM, Marek Polacek wrote:

Ping.


The C++ change is OK.


On Tue, Sep 15, 2020 at 04:33:05PM -0400, Marek Polacek via Gcc-patches wrote:

On Tue, Sep 15, 2020 at 09:04:41AM +0200, Jakub Jelinek via Gcc-patches wrote:

On Mon, Sep 14, 2020 at 09:30:44PM -0400, Marek Polacek via Gcc-patches wrote:

--- a/gcc/c/c-tree.h
+++ b/gcc/c/c-tree.h
@@ -147,6 +147,11 @@ struct c_expr
   etc), so we stash a copy here.  */
source_range src_range;
  
+  /* True iff the sizeof expression was enclosed in parentheses.

+ NB: This member is currently only initialized when .original_code
+ is a SIZEOF_EXPR.  ??? Add a default constructor to this class.  */
+  bool parenthesized_p;
+
/* Access to the first and last locations within the source spelling
   of this expression.  */
location_t get_start () const { return src_range.m_start; }


I think a magic tree code would be better, c_expr is used in too many places
and returned by many functions, so it is copied over and over.
Even if you must add it, it would be better to change the struct layout,
because right now there are fields: tree, location_t, tree, 2xlocation_t,
which means 32-bit gap on 64-bit hosts before the second tree, so the new
field would fit in there.  But, if it is mostly uninitialized, it is kind of
unclean.


Ok, here's a version with PAREN_SIZEOF_EXPR.  It doesn't require changes to
c_expr, but adding a new tree code is always a pain...

Bootstrapped/regtested on x86_64-pc-linux-gnu, ok for trunk?

-- >8 --
This patch implements a new warning, -Wsizeof-array-div.  It warns about
code like

   int arr[10];
   sizeof (arr) / sizeof (short);

where we have a division of two sizeof expressions, where the first
argument is an array, and the second sizeof does not equal the size
of the array element.  See e.g. .

Clang makes it possible to suppress the warning by parenthesizing the
second sizeof like this:

   sizeof (arr) / (sizeof (short));

so I followed suit.  In the C++ FE this was rather easy, because
finish_parenthesized_expr already set TREE_NO_WARNING.  In the C FE
I've added a new tree code, PAREN_SIZEOF_EXPR, to discern between the
non-() and () versions.
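
For comparison, here is a short sketch of two ways to get the element count that the warning has no reason to complain about.  The helper names `element_count` and `count_by_sizeof` are made up for this example and are not part of the patch.

```cpp
#include <cassert>
#include <cstddef>

// Element count of a built-in array, deduced from its type; unlike a
// sizeof division it cannot be paired with the wrong element type.
template <typename T, std::size_t N>
constexpr std::size_t
element_count (const T (&)[N])
{
  return N;
}

// The intended idiom divides by the size of the array's own element.
inline std::size_t
count_by_sizeof ()
{
  int arr[10];
  return sizeof (arr) / sizeof (arr[0]);
}
```

Both forms yield 10 for `int arr[10]`, whereas `sizeof (arr) / sizeof (short)` depends on the relative sizes of int and short.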

This warning is enabled by -Wall.  An example of the output:

x.c:5:23: warning: expression does not compute the number of elements in this array; element type is ‘int’, not ‘short int’ [-Wsizeof-array-div]
 5 |   return sizeof (arr) / sizeof (short);
   |  ~^~~~
x.c:5:25: note: add parentheses around ‘sizeof (short int)’ to silence this warning
 5 |   return sizeof (arr) / sizeof (short);
   | ^~
   | ( )
x.c:4:7: note: array ‘arr’ declared here
 4 |   int arr[10];
   |   ^~~

gcc/c-family/ChangeLog:

PR c++/91741
* c-common.c (verify_tree): Handle PAREN_SIZEOF_EXPR.
(c_common_init_ts): Likewise.
* c-common.def (PAREN_SIZEOF_EXPR): New tree code.
* c-common.h (maybe_warn_sizeof_array_div): Declare.
* c-warn.c (sizeof_pointer_memaccess_warning): Unwrap NOP_EXPRs.
(maybe_warn_sizeof_array_div): New function.
* c.opt (Wsizeof-array-div): New option.

gcc/c/ChangeLog:

PR c++/91741
* c-parser.c (c_parser_binary_expression): Implement -Wsizeof-array-div.
(c_parser_postfix_expression): Set PAREN_SIZEOF_EXPR.
(c_parser_expr_list): Handle PAREN_SIZEOF_EXPR like SIZEOF_EXPR.
* c-tree.h (char_type_p): Declare.
* c-typeck.c (char_type_p): No longer static.

gcc/cp/ChangeLog:

PR c++/91741
* typeck.c (cp_build_binary_op): Implement -Wsizeof-array-div.

gcc/ChangeLog:

PR c++/91741
* doc/invoke.texi: Document -Wsizeof-array-div.

gcc/testsuite/ChangeLog:

PR c++/91741
* c-c++-common/Wsizeof-pointer-div.c: Add dg-warning.
* c-c++-common/Wsizeof-array-div1.c: New test.
* g++.dg/warn/Wsizeof-array-div1.C: New test.
* g++.dg/warn/Wsizeof-array-div2.C: New test.
---
  gcc/c-family/c-common.c   |  2 +
  gcc/c-family/c-common.def |  3 +
  gcc/c-family/c-common.h   |  1 +
  gcc/c-family/c-warn.c | 47 
  gcc/c-family/c.opt|  5 ++
  gcc/c/c-parser.c  | 48 ++--
  gcc/c/c-tree.h|  1 +
  gcc/c/c-typeck.c  |  2 +-
  gcc/cp/typeck.c   | 10 +++-
  gcc/doc/invoke.texi   | 19 +++
  .../c-c++-common/Wsizeof-array-div1.c | 56 +++
  .../c-c++-common/Wsizeof-pointer-div.c|  2 +-
  .../g++.dg/warn/Wsizeof-array-div1.C  | 37 
  .../g++.dg/warn/Wsizeof-array-div2.C  | 15 +
  14 files changed, 

Re: New modref/ipa_modref optimization passes

2020-09-22 Thread David Malcolm via Gcc-patches
On Tue, 2020-09-22 at 20:39 +0200, Jan Hubicka wrote:
> David,
> with jit I get the following:
> /usr/local/x86_64-pc-linux-gnu/bin/ld: final link failed:
> nonrepresentable section on output
> collect2: error: ld returned 1 exit status
> make[3]: *** [../../gcc/jit/Make-lang.in:121: libgccjit.so.0.0.1]
> Error
> 
> Is there a fix/workaround?

I don't recognize that specific error, but googling suggests it may
relate to position-independent code.

Are you configuring with --enable-host-shared ?  This is needed when
enabling "jit" in --enable-languages (but slows down the compiler by a
few percent, which is why "jit" isn't in "all").


> Patch I am trying to test/debug is attached, it fixes the selftest
> issue
> and the destructor.

Thanks.

Sadly it doesn't fix the jit crashes, which are now in bugzilla (as PR
jit/97169).

Without the patch, the jit testcases crash at the end of the 1st
in-process iteration, in the dtor for the new pass.

With the patch the jit testcases crash inside the 3rd in-process
iteration, invoking a modref_summaries finalizer at whichever GC-
collection point happens first, I think, where the modref_summaries *
seems to be pointing at corrupt data:

(gdb) p *(modref_summaries *)p
$2 = {> =
{> = {
  _vptr.function_summary_base = 0x20001,
m_symtab_insertion_hook = 0x1, m_symtab_removal_hook = 0x10004, 
  m_symtab_duplication_hook = 0x0, m_symtab = 0x644210,
m_insertion_enabled = 112, m_allocator = {m_allocator = {
  m_name = 0x0, m_id = 0, m_elts_per_block = 1,
m_returned_free_list = 0x7afafaf01, 
  m_virgin_free_list = 0xafafafafafaf0001 <error: Cannot access memory at address 0xafafafafafaf0001>, 
  m_virgin_elts_remaining = 0, m_elts_allocated =
140737080343888, m_elts_free = 0, m_blocks_allocated = 0, 
  m_block_list = 0x0, m_elt_size = 6517120, m_size = 13,
m_initialized = false, m_location = {
m_filename = 0x0, m_function = 0x0, m_line = 1, m_origin =
2947481856, m_ggc = false, 
m_vector = 0x0}, ipa = false}

I think this latter crash may be a pre-existing bug in how the jit
interacts with gc finalizers.  I think the finalizers are accumulating
from in-process run to run, leading to chaos, but I need to debug it
some more to be sure.  Alternatively, is there a way that a finalizer
is being registered, and then the object is somehow clobbered without
the finalizer being unregistered from the vec of finalizers?
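
To illustrate that second possibility, here is a toy model of a list of pending finalizers.  It is not GCC's ggc code, and `finalizer_registry`, `delete_now`, `sweep` and `count_finalize` are all made-up names.  If explicit deletion did not also drop the entry, a later sweep would run the finalizer on stale memory; the sketch shows the variant that unregisters.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Toy stand-in for a collector's list of pending finalizers.
class finalizer_registry
{
public:
  using finalizer = void (*) (void *);

  // Remember that FN must be run on OBJ before OBJ's memory is reused.
  void add (void *obj, finalizer fn) { m_entries.push_back ({ obj, fn }); }

  // Run OBJ's finalizer now and forget it, so a later sweep cannot run
  // it again on memory that has since been reused.
  void delete_now (void *obj)
  {
    for (std::size_t i = 0; i < m_entries.size (); ++i)
      if (m_entries[i].first == obj)
        {
          m_entries[i].second (obj);
          m_entries.erase (m_entries.begin () + i);
          return;
        }
  }

  // Run and drop every finalizer that is still registered.
  void sweep ()
  {
    for (auto &e : m_entries)
      e.second (e.first);
    m_entries.clear ();
  }

  std::size_t pending () const { return m_entries.size (); }

private:
  std::vector<std::pair<void *, finalizer>> m_entries;
};

// Example finalizer that only counts how often it has been run.
inline int &
finalize_count ()
{
  static int n = 0;
  return n;
}

inline void
count_finalize (void *)
{
  ++finalize_count ();
}
```

In this model, calling `delete_now` followed by `sweep` runs the finalizer exactly once; a deletion that skipped the `erase` would run it twice, the second time on whatever then occupies the memory.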

Dave



Re: New modref/ipa_modref optimization passes

2020-09-22 Thread Jan Hubicka
Hello,
this patch fixes the selftests and destructor issue noticed by David
(Malcolm).  According to David, jit still crashes at a different point, but
I am hitting a different build failure in libjit that I will need to analyze
tomorrow.

Bootstrapped/regtested x86_64-linux, committed.

* ipa-modref-tree.c: Add namespace selftest.
(modref_tree_c_tests): Rename to ...
(ipa_modref_tree_c_tests): ... this.
* ipa-modref.c (pass_modref): Remove destructor.
(ipa_modref_c_finalize): New function.
* ipa-modref.h (ipa_modref_c_finalize): Declare.
* selftest-run-tests.c (selftest::run_tests): Call
ipa_modref_c_finalize.
* selftest.h (ipa_modref_tree_c_tests): Declare.
* toplev.c: Include ipa-modref-tree.h and ipa-modref.h
(toplev::finalize): Call ipa_modref_c_finalize.

diff --git a/gcc/ipa-modref-tree.c b/gcc/ipa-modref-tree.c
index e37dee67fa3..a84508a2268 100644
--- a/gcc/ipa-modref-tree.c
+++ b/gcc/ipa-modref-tree.c
@@ -28,6 +28,7 @@ along with GCC; see the file COPYING3.  If not see
 
 #if CHECKING_P
 
+namespace selftest {
 
 static void
 test_insert_search_collapse ()
@@ -156,12 +157,14 @@ test_merge ()
 
 
 void
-modref_tree_c_tests ()
+ipa_modref_tree_c_tests ()
 {
   test_insert_search_collapse ();
   test_merge ();
 }
 
+} // namespace selftest
+
 #endif
 
 void
diff --git a/gcc/ipa-modref.c b/gcc/ipa-modref.c
index af0b710333e..ac7579a9e75 100644
--- a/gcc/ipa-modref.c
+++ b/gcc/ipa-modref.c
@@ -767,12 +770,6 @@ class pass_modref : public gimple_opt_pass
 pass_modref (gcc::context *ctxt)
: gimple_opt_pass (pass_data_modref, ctxt) {}
 
-~pass_modref ()
-  {
-   ggc_delete (summaries);
-   summaries = NULL;
-  }
-
 /* opt_pass methods: */
 opt_pass *clone ()
 {
@@ -1373,4 +1370,14 @@ unsigned int pass_ipa_modref::execute (function *)
   return 0;
 }
 
+/* Summaries must stay alive until end of compilation.  */
+
+void
+ipa_modref_c_finalize ()
+{
+  if (summaries)
+ggc_delete (summaries);
+  summaries = NULL;
+}
+
 #include "gt-ipa-modref.h"
diff --git a/gcc/ipa-modref.h b/gcc/ipa-modref.h
index 6f979200cc2..6cccdfe7af3 100644
--- a/gcc/ipa-modref.h
+++ b/gcc/ipa-modref.h
@@ -44,5 +44,6 @@ struct GTY(()) modref_summary
 };
 
 modref_summary *get_modref_function_summary (cgraph_node *func);
+void ipa_modref_c_finalize ();
 
 #endif
diff --git a/gcc/selftest-run-tests.c b/gcc/selftest-run-tests.c
index f0a81d43fd6..7a89b2df5bd 100644
--- a/gcc/selftest-run-tests.c
+++ b/gcc/selftest-run-tests.c
@@ -90,6 +90,7 @@ selftest::run_tests ()
   read_rtl_function_c_tests ();
   digraph_cc_tests ();
   tristate_cc_tests ();
+  ipa_modref_tree_c_tests ();
 
   /* Higher-level tests, or for components that other selftests don't
  rely on.  */
diff --git a/gcc/selftest.h b/gcc/selftest.h
index 5cffa13aedd..6c6c7f28675 100644
--- a/gcc/selftest.h
+++ b/gcc/selftest.h
@@ -268,6 +268,7 @@ extern void vec_perm_indices_c_tests ();
 extern void wide_int_cc_tests ();
 extern void opt_proposer_c_tests ();
 extern void dbgcnt_c_tests ();
+extern void ipa_modref_tree_c_tests ();
 
 extern int num_passes;
 
diff --git a/gcc/toplev.c b/gcc/toplev.c
index cdd4b5b4f92..a4cb8bb262e 100644
--- a/gcc/toplev.c
+++ b/gcc/toplev.c
@@ -84,6 +84,8 @@ along with GCC; see the file COPYING3.  If not see
 #include "dump-context.h"
 #include "print-tree.h"
 #include "optinfo-emit-json.h"
+#include "ipa-modref-tree.h"
+#include "ipa-modref.h"
 
 #if defined(DBX_DEBUGGING_INFO) || defined(XCOFF_DEBUGGING_INFO)
 #include "dbxout.h"
@@ -2497,6 +2499,7 @@ toplev::finalize (void)
   /* Needs to be called before cgraph_c_finalize since it uses symtab.  */
   ipa_reference_c_finalize ();
   ipa_fnsummary_c_finalize ();
+  ipa_modref_c_finalize ();
 
   cgraph_c_finalize ();
   cgraphunit_c_finalize ();


Re: New modref/ipa_modref optimization passes

2020-09-22 Thread David Malcolm via Gcc-patches
On Tue, 2020-09-22 at 16:13 -0400, David Malcolm wrote:
> On Tue, 2020-09-22 at 20:39 +0200, Jan Hubicka wrote:
> > David,
> > with jit I get the following:
> > /usr/local/x86_64-pc-linux-gnu/bin/ld: final link failed:
> > nonrepresentable section on output
> > collect2: error: ld returned 1 exit status
> > make[3]: *** [../../gcc/jit/Make-lang.in:121: libgccjit.so.0.0.1]
> > Error
> > 
> > Is there a fix/workaround?
> 
> I don't recognize that specific error, but googling suggests it may
> relate to position-independent code.
> 
> Are you configuring with --enable-host-shared ?  This is needed when
> enabling "jit" in --enable-languages (but slows down the compiler by
> a
> few percent, which is why "jit" isn't in "all").
> 
> 
> > Patch I am trying to test/debug is attached, it fixes the selftest
> > issue
> > and the destructor.
> 
> Thanks.
> 
> Sadly it doesn't fix the jit crashes, which are now in bugzilla (as
> PR
> jit/97169).
> 
> Without the patch, the jit testcases crash at the end of the 1st
> in-process iteration, in the dtor for the new pass.
> 
> With the patch the jit testcases crash inside the 3rd in-process
> iteration, invoking a modref_summaries finalizer at whichever GC-
> collection point happens first, I think, where the modref_summaries *
> seems to be pointing at corrupt data:
> 
> (gdb) p *(modref_summaries *)p
> $2 = {> =
> {> = {
>   _vptr.function_summary_base = 0x20001,
> m_symtab_insertion_hook = 0x1, m_symtab_removal_hook = 0x10004, 
>   m_symtab_duplication_hook = 0x0, m_symtab = 0x644210,
> m_insertion_enabled = 112, m_allocator = {m_allocator = {
>   m_name = 0x0, m_id = 0, m_elts_per_block = 1,
> m_returned_free_list = 0x7afafaf01, 
>   m_virgin_free_list = 0xafafafafafaf0001 <error: Cannot access
> memory at address 0xafafafafafaf0001>, 
>   m_virgin_elts_remaining = 0, m_elts_allocated =
> 140737080343888, m_elts_free = 0, m_blocks_allocated = 0, 
>   m_block_list = 0x0, m_elt_size = 6517120, m_size = 13,
> m_initialized = false, m_location = {
> m_filename = 0x0, m_function = 0x0, m_line = 1, m_origin
> =
> 2947481856, m_ggc = false, 
> m_vector = 0x0}, ipa = false}
> 
> I think this latter crash may be a pre-existing bug in how the jit
> interacts with gc finalizers.  I think the finalizers are
> accumulating
> from in-process run to run, leading to chaos, but I need to debug it
> some more to be sure.  Alternatively, is there a way that a finalizer
> is being registered, and then the object is somehow clobbered without
> the finalizer being unregistered from the vec of finalizers?

Aha: this patch on top of yours seems to fix it, at least for the test
I've been debugging.

Calling ggc_delete on something seems to delete it without removing the
finalizer, leaving the finalizer around to run on whatever the memory
eventually gets reused for, leading to segfaults:

diff --git a/gcc/ipa-modref.c b/gcc/ipa-modref.c
index 4b9c4db4ee9..64d314321cb 100644
--- a/gcc/ipa-modref.c
+++ b/gcc/ipa-modref.c
@@ -1372,8 +1372,6 @@ unsigned int pass_ipa_modref::execute (function *)
 void
 ipa_modref_c_finalize ()
 {
-  if (summaries)
-ggc_delete (summaries);
   summaries = NULL;
 }
 





Re: New modref/ipa_modref optimization passes

2020-09-22 Thread Jan Hubicka
> On Tue, 2020-09-22 at 20:39 +0200, Jan Hubicka wrote:
> > David,
> > with jit I get the following:
> > /usr/local/x86_64-pc-linux-gnu/bin/ld: final link failed:
> > nonrepresentable section on output
> > collect2: error: ld returned 1 exit status
> > make[3]: *** [../../gcc/jit/Make-lang.in:121: libgccjit.so.0.0.1]
> > Error
> > 
> > Is there a fix/workaround?
> 
> I don't recognize that specific error, but googling suggests it may
> relate to position-independent code.
> 
> Are you configuring with --enable-host-shared ?  This is needed when
> enabling "jit" in --enable-languages (but slows down the compiler by a
> few percent, which is why "jit" isn't in "all").

Yes, --enable-languages=all,jit --enable-host-shared.
I suppose my binutils may be showing their age; I will check that tomorrow.
It looks like a weird error.
> 
> 
> > Patch I am trying to test/debug is attached, it fixes the selftest
> > issue
> > and the destructor.
> 
> Thanks.
> 
> Sadly it doesn't fix the jit crashes, which are now in bugzilla (as PR
> jit/97169).
> 
> Without the patch, the jit testcases crash at the end of the 1st
> in-process iteration, in the dtor for the new pass.
> 
> With the patch the jit testcases crash inside the 3rd in-process
> iteration, invoking a modref_summaries finalizer at whichever GC-
> collection point happens first, I think, where the modref_summaries *
> seems to be pointing at corrupt data:
> 
> (gdb) p *(modref_summaries *)p
> $2 = {> =
> {> = {
>   _vptr.function_summary_base = 0x20001,
> m_symtab_insertion_hook = 0x1, m_symtab_removal_hook = 0x10004, 
>   m_symtab_duplication_hook = 0x0, m_symtab = 0x644210,
> m_insertion_enabled = 112, m_allocator = {m_allocator = {
>   m_name = 0x0, m_id = 0, m_elts_per_block = 1,
> m_returned_free_list = 0x7afafaf01, 
>   m_virgin_free_list = 0xafafafafafaf0001 <error: Cannot access memory at address 0xafafafafafaf0001>, 
>   m_virgin_elts_remaining = 0, m_elts_allocated =
> 140737080343888, m_elts_free = 0, m_blocks_allocated = 0, 
>   m_block_list = 0x0, m_elt_size = 6517120, m_size = 13,
> m_initialized = false, m_location = {
> m_filename = 0x0, m_function = 0x0, m_line = 1, m_origin =
> 2947481856, m_ggc = false, 
> m_vector = 0x0}, ipa = false}
> 
> I think this latter crash may be a pre-existing bug in how the jit
> interacts with gc finalizers.  I think the finalizers are accumulating
> from in-process run to run, leading to chaos, but I need to debug it
> some more to be sure.  Alternatively, is there a way that a finalizer
> is being registered, and then the object is somehow clobbered without
> the finalizer being unregistered from the vec of finalizers?

It looks like released memory. I saw a similar problem with ggc calling
finalizers in the "wrong" order.  David's modref tree has two layers, and the
destructor of one was freeing the other, which is fine if you destroy the
outer type first, but not if you do it in the wrong order.
I will try to reproduce it.  I also plan to turn the classes into PODs and
put them directly into the vectors.
I should not have allowed David to make a template in the first place :)

Honza
> 
> Dave
> 


Re: New modref/ipa_modref optimization passes

2020-09-22 Thread Jan Hubicka
> On Tue, 2020-09-22 at 16:13 -0400, David Malcolm wrote:
> > On Tue, 2020-09-22 at 20:39 +0200, Jan Hubicka wrote:
> > > David,
> > > with jit I get the following:
> > > /usr/local/x86_64-pc-linux-gnu/bin/ld: final link failed:
> > > nonrepresentable section on output
> > > collect2: error: ld returned 1 exit status
> > > make[3]: *** [../../gcc/jit/Make-lang.in:121: libgccjit.so.0.0.1]
> > > Error
> > > 
> > > Is there a fix/workaround?
> > 
> > I don't recognize that specific error, but googling suggests it may
> > relate to position-independent code.
> > 
> > Are you configuring with --enable-host-shared ?  This is needed when
> > enabling "jit" in --enable-languages (but slows down the compiler by
> > a
> > few percent, which is why "jit" isn't in "all").
> > 
> > 
> > > Patch I am trying to test/debug is attached, it fixes the selftest
> > > issue
> > > and the destructor.
> > 
> > Thanks.
> > 
> > Sadly it doesn't fix the jit crashes, which are now in bugzilla (as
> > PR
> > jit/97169).
> > 
> > Without the patch, the jit testcases crash at the end of the 1st
> > in-process iteration, in the dtor for the new pass.
> > 
> > With the patch the jit testcases crash inside the 3rd in-process
> > iteration, invoking a modref_summaries finalizer at whichever GC-
> > collection point happens first, I think, where the modref_summaries *
> > seems to be pointing at corrupt data:
> > 
> > (gdb) p *(modref_summaries *)p
> > $2 = {> =
> > {> = {
> >   _vptr.function_summary_base = 0x20001,
> > m_symtab_insertion_hook = 0x1, m_symtab_removal_hook = 0x10004, 
> >   m_symtab_duplication_hook = 0x0, m_symtab = 0x644210,
> > m_insertion_enabled = 112, m_allocator = {m_allocator = {
> >   m_name = 0x0, m_id = 0, m_elts_per_block = 1,
> > m_returned_free_list = 0x7afafaf01, 
> >   m_virgin_free_list = 0xafafafafafaf0001 <error: Cannot access
> > memory at address 0xafafafafafaf0001>, 
> >   m_virgin_elts_remaining = 0, m_elts_allocated =
> > 140737080343888, m_elts_free = 0, m_blocks_allocated = 0, 
> >   m_block_list = 0x0, m_elt_size = 6517120, m_size = 13,
> > m_initialized = false, m_location = {
> > m_filename = 0x0, m_function = 0x0, m_line = 1, m_origin
> > =
> > 2947481856, m_ggc = false, 
> > m_vector = 0x0}, ipa = false}
> > 
> > I think this latter crash may be a pre-existing bug in how the jit
> > interacts with gc finalizers.  I think the finalizers are
> > accumulating
> > from in-process run to run, leading to chaos, but I need to debug it
> > some more to be sure.  Alternatively, is there a way that a finalizer
> > is being registered, and then the object is somehow clobbered without
> > the finalizer being unregistered from the vec of finalizers?
> 
> Aha: this patch on top of yours seems to fix it, at least for the test
> I've been debugging.
> 
> Calling ggc_delete on something seems to delete it without removing the
> finalizer, leaving the finalizer around to run on whatever the memory
> eventually gets reused for, leading to segfaults:
> 
> diff --git a/gcc/ipa-modref.c b/gcc/ipa-modref.c
> index 4b9c4db4ee9..64d314321cb 100644
> --- a/gcc/ipa-modref.c
> +++ b/gcc/ipa-modref.c
> @@ -1372,8 +1372,6 @@ unsigned int pass_ipa_modref::execute (function *)
>  void
>  ipa_modref_c_finalize ()
>  {
> -  if (summaries)
> -ggc_delete (summaries);
>summaries = NULL;
>  }

Ah, thanks.  That is very odd behaviour of delete indeed.

Honza
>  
> 
> 
> 


[COMMITTED] c++: Add test for PR96652

2020-09-22 Thread Patrick Palka via Gcc-patches
Fixed by r11-3361.

gcc/testsuite/ChangeLog:

PR c++/96652
* g++.dg/cpp0x/decltype-96652.C: New test.
---
 gcc/testsuite/g++.dg/cpp0x/decltype-96652.C | 14 ++
 1 file changed, 14 insertions(+)
 create mode 100644 gcc/testsuite/g++.dg/cpp0x/decltype-96652.C

diff --git a/gcc/testsuite/g++.dg/cpp0x/decltype-96652.C 
b/gcc/testsuite/g++.dg/cpp0x/decltype-96652.C
new file mode 100644
index 000..249cce2b9e8
--- /dev/null
+++ b/gcc/testsuite/g++.dg/cpp0x/decltype-96652.C
@@ -0,0 +1,14 @@
+// PR c++/96652
+// { dg-do compile { target c++11 } }
+
+struct A {};
+
+template 
+struct B
+{
+  A m;
+  friend decltype(m);
+};
+
+A a;
+B b;
-- 
2.28.0.497.g54e85e7af1



Issue with ggc_delete and finalizers (was Re: New modref/ipa_modref optimization passes)

2020-09-22 Thread David Malcolm via Gcc-patches
On Tue, 2020-09-22 at 22:24 +0200, Jan Hubicka wrote:
> > On Tue, 2020-09-22 at 16:13 -0400, David Malcolm wrote:
> > > On Tue, 2020-09-22 at 20:39 +0200, Jan Hubicka wrote:
> > > > David,
> > > > with jit I get the following:
> > > > /usr/local/x86_64-pc-linux-gnu/bin/ld: final link failed:
> > > > nonrepresentable section on output
> > > > collect2: error: ld returned 1 exit status
> > > > make[3]: *** [../../gcc/jit/Make-lang.in:121:
> > > > libgccjit.so.0.0.1]
> > > > Error
> > > > 
> > > > Is there a fix/workaround?
> > > 
> > > I don't recognize that specific error, but googling suggests it
> > > may
> > > relate to position-independent code.
> > > 
> > > Are you configuring with --enable-host-shared ?  This is needed
> > > when
> > > enabling "jit" in --enable-languages (but slows down the compiler
> > > by
> > > a
> > > few percent, which is why "jit" isn't in "all").
> > > 
> > > 
> > > > Patch I am trying to test/debug is attached, it fixes the
> > > > selftest
> > > > issue
> > > > and the destructor.
> > > 
> > > Thanks.
> > > 
> > > Sadly it doesn't fix the jit crashes, which are now in bugzilla
> > > (as
> > > PR
> > > jit/97169).
> > > 
> > > > Without the patch, the jit testcases crash at the end of the 1st
> > > > in-process iteration, in the dtor for the new pass.
> > > 
> > > With the patch the jit testcases crash inside the 3rd in-process
> > > iteration, invoking a modref_summaries finalizer at whichever GC-
> > > collection point happens first, I think, where the
> > > modref_summaries *
> > > seems to be pointing at corrupt data:
> > > 
> > > (gdb) p *(modref_summaries *)p
> > > $2 = {> =
> > > {> = {
> > >   _vptr.function_summary_base = 0x20001,
> > > m_symtab_insertion_hook = 0x1, m_symtab_removal_hook =
> > > 0x10004, 
> > >   m_symtab_duplication_hook = 0x0, m_symtab = 0x644210,
> > > m_insertion_enabled = 112, m_allocator = {m_allocator = {
> > >   m_name = 0x0, m_id = 0, m_elts_per_block = 1,
> > > m_returned_free_list = 0x7afafaf01, 
> > > >   m_virgin_free_list = 0xafafafafafaf0001 <error: Cannot access
> > > > memory at address 0xafafafafafaf0001>, 
> > >   m_virgin_elts_remaining = 0, m_elts_allocated =
> > > 140737080343888, m_elts_free = 0, m_blocks_allocated = 0, 
> > >   m_block_list = 0x0, m_elt_size = 6517120, m_size = 13,
> > > m_initialized = false, m_location = {
> > > m_filename = 0x0, m_function = 0x0, m_line = 1,
> > > m_origin
> > > =
> > > 2947481856, m_ggc = false, 
> > > m_vector = 0x0}, ipa = false}
> > > 
> > > I think this latter crash may be a pre-existing bug in how the
> > > jit
> > > interacts with gc finalizers.  I think the finalizers are
> > > accumulating
> > > from in-process run to run, leading to chaos, but I need to debug
> > > it
> > > some more to be sure.  Alternatively, is there a way that a
> > > finalizer
> > > is being registered, and then the object is somehow clobbered
> > > without
> > > the finalizer being unregistered from the vec of finalizers?
> > 
> > Aha: this patch on top of yours seems to fix it, at least for the
> > test
> > I've been debugging.
> > 
> > Calling ggc_delete on something seems to delete it without removing
> > the
> > finalizer, leaving the finalizer around to run on whatever the
> > memory
> > eventually gets reused for, leading to segfaults:
> > 
> > diff --git a/gcc/ipa-modref.c b/gcc/ipa-modref.c
> > index 4b9c4db4ee9..64d314321cb 100644
> > --- a/gcc/ipa-modref.c
> > +++ b/gcc/ipa-modref.c
> > @@ -1372,8 +1372,6 @@ unsigned int pass_ipa_modref::execute
> > (function *)
> >  void
> >  ipa_modref_c_finalize ()
> >  {
> > -  if (summaries)
> > -ggc_delete (summaries);
> >summaries = NULL;
> >  }
> 
> Ah, thanks.  That is very odd behaviour of delete indeed.

Summarizing what's going on:

We have a use-after-ggc_delete happening with the finalizers code.

analyze_function has:

summaries = new (ggc_alloc <modref_summaries> ())
 modref_summaries (symtab);

ggc_alloc (as opposed to ggc_alloc_no_dtor) uses need_finalization_p
and "sees" that the class has a nontrivial dtor, and hence it passes
finalize to ggc_internal_alloc as the "f" callback.

Within ggc_internal_alloc we have:

  if (f)
add_finalizer (result, f, s, n);

and so that callback is registered within G.finalizers - but there's
nothing stored in the object itself to track that finalizer.

Later, in ipa_modref_c_finalize, ggc_delete is called on the
modref_summaries object, but the finalizer is still registered in
G.finalizers with its address.

Later, a GC happens, and if the bit for "marked" on that old
modref_summaries object happens to be cleared (with whatever that
memory is now being used for, if anything), the finalizer callback is
called, and ~modref_summaries is called with its "this" pointing at
random bits, and we have a segfault.
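
To make the sequence above concrete, here is a minimal toy model. It is not
GCC's ggc code: every name in it (toy_alloc, toy_delete_leaky,
toy_delete_unregistering, toy_finalizers, toy_summaries) is hypothetical.
It only assumes, as described above, that allocation records a finalizer in
a side table and that an explicit delete does not consult that table.

```cpp
#include <algorithm>
#include <cassert>
#include <new>
#include <vector>

// Hypothetical side table: one entry per object that needs a destructor call.
struct toy_finalizer
{
  void *addr;             // object the callback will later be run on
  void (*run) (void *);   // invokes the destructor for that object
};

static std::vector<toy_finalizer> toy_finalizers;

template <typename T>
void toy_finalize (void *p)
{
  static_cast<T *> (p)->~T ();
}

// Hand out raw storage and register a finalizer for it; nothing is stored
// in the object itself to remember that registration.
template <typename T>
T *toy_alloc ()
{
  void *p = ::operator new (sizeof (T));
  toy_finalizers.push_back ({p, toy_finalize<T>});
  return static_cast<T *> (p);
}

// Explicit delete that ignores the side table: the entry for P stays behind
// and would later be run on whatever reuses that memory.
template <typename T>
void toy_delete_leaky (T *p)
{
  p->~T ();
  ::operator delete (p);
}

// Variant that drops any finalizer registered for P before freeing it.
template <typename T>
void toy_delete_unregistering (T *p)
{
  auto stale = std::remove_if (toy_finalizers.begin (), toy_finalizers.end (),
                               [p] (const toy_finalizer &f)
                               { return f.addr == p; });
  toy_finalizers.erase (stale, toy_finalizers.end ());
  p->~T ();
  ::operator delete (p);
}

// Stand-in for a class with a nontrivial destructor.
struct toy_summaries
{
  explicit toy_summaries (int *counter) : destroyed (counter) {}
  ~toy_summaries () { ++*destroyed; }
  int *destroyed;
};
```

With an object created as new (toy_alloc<toy_summaries> ()) toy_summaries (&n),
toy_delete_leaky leaves one entry in toy_finalizers pointing at freed memory,
which is the stale finalizer a later collection would run;
toy_delete_unregistering leaves the table empty.  The sketch does not settle
whether the delete should unregister or callers should avoid it.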

This seems like a big "gotcha" in ggc_delete: it doesn't remove any
finalizer for the object, instead leaving it as a timebomb to happ

Re: [committed] analyzer: add -fdump-analyzer-json

2020-09-22 Thread Tobias Burnus

This patch breaks the cross build here:
...gcc/analyzer/engine.cc:65:10: fatal error: zlib.h: No such file or directory

I think you need to do something similar in Makefile.in as lto-compress has:

# lto-compress.o needs $(ZLIBINC) added to the include flags.
CFLAGS-lto-compress.o += $(ZLIBINC) $(ZSTD_INC)

Tobias

On 9/22/20 8:49 PM, David Malcolm via Gcc-patches wrote:

I've found this useful for debugging state explosions in the analyzer.

Successfully bootstrapped & regrtested on x86_64-pc-linux-gnu.
Pushed to master as 809192e77e6e112a0fe32dee7fada7a49fbf25cd.

gcc/analyzer/ChangeLog:
  * analysis-plan.cc: Include "json.h".
  * analyzer.opt (fdump-analyzer-json): New.
  * call-string.cc: Include "json.h".
  (call_string::to_json): New.
  * call-string.h (call_string::to_json): New decl.
  * checker-path.cc: Include "json.h".
  * constraint-manager.cc: Include "json.h".
  (equiv_class::to_json): New.
  (constraint::to_json): New.
  (constraint_manager::to_json): New.
  * constraint-manager.h (equiv_class::to_json): New decl.
  (constraint::to_json): New decl.
  (constraint_manager::to_json): New decl.
  * diagnostic-manager.cc: Include "json.h".
  (saved_diagnostic::to_json): New.
  (diagnostic_manager::to_json): New.
  * diagnostic-manager.h (saved_diagnostic::to_json): New decl.
  (diagnostic_manager::to_json): New decl.
  * engine.cc: Include "json.h", <zlib.h>.
  (exploded_node::status_to_str): New.
  (exploded_node::to_json): New.
  (exploded_edge::to_json): New.
  (exploded_graph::to_json): New.
  (dump_analyzer_json): New.
  (impl_run_checkers): Call it.
  * exploded-graph.h (exploded_node::status_to_str): New decl.
  (exploded_node::to_json): New.
  (exploded_edge::to_json): New.
  (exploded_graph::to_json): New.
  * pending-diagnostic.cc: Include "json.h".
  * program-point.cc: Include "json.h".
  (program_point::to_json): New.
  * program-point.h (program_point::to_json): New decl.
  * program-state.cc: Include "json.h".
  (extrinsic_state::to_json): New.
  (sm_state_map::to_json): New.
  (program_state::to_json): New.
  * program-state.h (extrinsic_state::to_json): New decl.
  (sm_state_map::to_json): New decl.
  (program_state::to_json): New decl.
  * region-model-impl-calls.cc: Include "json.h".
  * region-model-manager.cc: Include "json.h".
  * region-model-reachability.cc: Include "json.h".
  * region-model.cc: Include "json.h".
  * region-model.h (svalue::to_json): New decl.
  (region::to_json): New decl.
  * region.cc: Include "json.h".
  (region::to_json): New.
  * sm-file.cc: Include "json.h".
  * sm-malloc.cc: Include "json.h".
  * sm-pattern-test.cc: Include "json.h".
  * sm-sensitive.cc: Include "json.h".
  * sm-signal.cc: Include "json.h".
  (signal_delivery_edge_info_t::to_json): New.
  * sm-taint.cc: Include "json.h".
  * sm.cc: Include "diagnostic.h", "tree-diagnostic.h", and
  "json.h".
  (state_machine::state::to_json): New.
  (state_machine::to_json): New.
  * sm.h (state_machine::state::to_json): New.
  (state_machine::to_json): New.
  * state-purge.cc: Include "json.h".
  * store.cc: Include "json.h".
  (binding_key::get_desc): New.
  (binding_map::to_json): New.
  (binding_cluster::to_json): New.
  (store::to_json): New.
  * store.h (binding_key::get_desc): New decl.
  (binding_map::to_json): New decl.
  (binding_cluster::to_json): New decl.
  (store::to_json): New decl.
  * supergraph.cc: Include "json.h".
  (supergraph::to_json): New.
  (supernode::to_json): New.
  (superedge::to_json): New.
  * supergraph.h (supergraph::to_json): New decl.
  (supernode::to_json): New decl.
  (superedge::to_json): New decl.
  * svalue.cc: Include "json.h".
  (svalue::to_json): New.

gcc/ChangeLog:
  * doc/analyzer.texi (Other Debugging Techniques): Mention
  -fdump-analyzer-json.
  * doc/invoke.texi (Static Analyzer Options): Add
  -fdump-analyzer-json.
---
  gcc/analyzer/analysis-plan.cc |   1 +
  gcc/analyzer/analyzer.opt |   4 +
  gcc/analyzer/call-string.cc   |  29 +
  gcc/analyzer/call-string.h|   2 +
  gcc/analyzer/checker-path.cc  |   1 +
  gcc/analyzer/constraint-manager.cc|  77 
  gcc/analyzer/constraint-manager.h |   6 +
  gcc/analyzer/diagnostic-manager.cc|  58 +
  gcc/analyzer/diagnostic-manager.h |   4 +
  gcc/analyzer/engine.cc| 146 ++
  gcc/analyzer/exploded-graph.h |   7 ++
  gcc/analyzer/pending-diagnostic.cc|   1 +
  gcc/analyzer/program-point.cc |  38 ++
  gcc/analyzer/program-point.h  |   2 +
  gcc/analyzer/program-state.cc

Re: New modref/ipa_modref optimization passes

2020-09-22 Thread David Malcolm via Gcc-patches
On Tue, 2020-09-22 at 22:23 +0200, Jan Hubicka wrote:
> > On Tue, 2020-09-22 at 20:39 +0200, Jan Hubicka wrote:
> > > David,
> > > with jit I get the following:
> > > /usr/local/x86_64-pc-linux-gnu/bin/ld: final link failed:
> > > nonrepresentable section on output
> > > collect2: error: ld returned 1 exit status
> > > make[3]: *** [../../gcc/jit/Make-lang.in:121: libgccjit.so.0.0.1]
> > > Error
> > > 
> > > Is there a fix/workaround?
> > 
> > I don't recognize that specific error, but googling suggests it may
> > relate to position-independent code.
> > 
> > Are you configuring with --enable-host-shared ?  This is needed
> > when
> > enabling "jit" in --enable-languages (but slows down the compiler
> > by a
> > few percent, which is why "jit" isn't in "all").
> 
> Yes --enable-languages=all,jit --enable-host-shared.
> I suppose my binutils may show the age, I will check that tomorrow.
> It
> looks like weird error.

FWIW if you do get it to build, you can reproduce the crash via running
this in builddir/gcc:

[gcc] $ PRESERVE_EXECUTABLES= \
  make check-jit \
RUNTESTFLAGS="-v -v -v jit.exp=test-factorial.c"

[gcc] $ PATH=.:$PATH \
LD_LIBRARY_PATH=. \
LIBRARY_PATH=. \
  gdb --args \
testsuite/jit/test-factorial.c.exe

(taken from 
https://gcc.gnu.org/onlinedocs/jit/internals/index.html#running-the-test-suite
)



Re: PING [Patch][Middle-end]Add -fzero-call-used-regs=[skip|used-gpr|all-gpr|used|all]

2020-09-22 Thread Qing Zhao via Gcc-patches



> On Sep 22, 2020, at 12:06 PM, Richard Sandiford  
> wrote:
 
>>>> The following is what I see from i386.md: (I didn’t look at how 
>>>> “UNSPEC_volatile” is used in data flow analysis in GCC yet)
>>>> 
>>>> ;; UNSPEC_VOLATILE is considered to use and clobber all hard registers and
>>>> ;; all of memory.  This blocks insns from being moved across this point.
>>> 
>>> Heh, it looks like that comment dates back to 1994. :-)
>>> 
>>> The comment is no longer correct though.  I wasn't around at the time,
>>> but I assume the comment was only locally true even then.
>>> 
>>> If what the comment said was true, then something like:
>>> 
>>> (define_insn "cld"
>>> [(unspec_volatile [(const_int 0)] UNSPECV_CLD)]
>>> ""
>>> "cld"
>>> [(set_attr "length" "1")
>>>  (set_attr "length_immediate" "0")
>>>  (set_attr "modrm" "0")])
>>> 
>>> would invalidate the entire register file and so would require all values
>>> to be spilt to the stack around the CLD.
>> 
>> Okay, thanks for the info. 
>> then, what’s the current definition of UNSPEC_VOLATILE? 
> 
> I'm not sure it's written down anywhere TBH.  rtl.texi just says:
> 
>  @code{unspec_volatile} is used for volatile operations and operations
>  that may trap; @code{unspec} is used for other operations.
> 
> which seems like a cyclic definition: volatile expressions are defined
> to be expressions that are volatile.
> 
> But IMO the semantics are that unspec_volatile patterns with a given
> set of inputs and outputs act for dataflow purposes like volatile asms
> with the same inputs and outputs.  The semantics of asm volatile are
> at least slightly more well-defined (if only by example); see extend.texi
> for details.  In particular:
> 
>  Note that the compiler can move even @code{volatile asm} instructions 
> relative
>  to other code, including across jump instructions. For example, on many 
>  targets there is a system register that controls the rounding mode of 
>  floating-point operations. Setting it with a @code{volatile asm} statement,
>  as in the following PowerPC example, does not work reliably.
> 
>  @example
>  asm volatile("mtfsf 255, %0" : : "f" (fpenv));
>  sum = x + y;
>  @end example
> 
>  The compiler may move the addition back before the @code{volatile asm}
>  statement. To make it work as expected, add an artificial dependency to
>  the @code{asm} by referencing a variable in the subsequent code, for
>  example:
> 
>  @example
>  asm volatile ("mtfsf 255,%1" : "=X" (sum) : "f" (fpenv));
>  sum = x + y;
>  @end example
> 
> which is very similar to the unspec_volatile case we're talking about.
> 
> To take an x86 example:
> 
>  void
>  f (char *x)
>  {
>asm volatile ("");
>x[0] = 0;
>asm volatile ("");
>x[1] = 0;
>asm volatile ("");
>  }

If we change the above to the following (the asm syntax might not be
correct):

void
f (char *x)
{
  asm volatile ("x[0]");
  x[0] = 0;
  asm volatile ("x[1]");
  x[1] = 0;
  asm volatile ("");
}

Will the moving and merging be blocked?
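
For comparison, the "memory clobber" mentioned in the quoted reply is usually
written with GNU extended asm syntax rather than by putting C expressions in
the asm string.  A minimal sketch, assuming a GCC-compatible compiler; the
function name f_with_barriers is made up for illustration:

```cpp
#include <cassert>

// Variant of f () in which each empty asm declares a "memory" clobber.
// The clobber tells the compiler the asm may read or write any memory, so
// the two byte stores should not be moved across the asms or merged into a
// single 16-bit store.
void
f_with_barriers (char *x)
{
  asm volatile ("" : : : "memory");
  x[0] = 0;
  asm volatile ("" : : : "memory");
  x[1] = 0;
  asm volatile ("" : : : "memory");
}
```

This only constrains memory accesses; it says nothing about registers, so it
is not by itself an answer to the register-zeroing ordering question above.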


I found the following code in df-scan.c:

static void
df_uses_record (class df_collection_rec *collection_rec,
rtx *loc, enum df_ref_type ref_type,
basic_block bb, struct df_insn_info *insn_info,
int flags)
{
…

case ASM_OPERANDS:
case UNSPEC_VOLATILE:
case TRAP_IF:
case ASM_INPUT:
…
if (code == ASM_OPERANDS)
  {
int j;

for (j = 0; j < ASM_OPERANDS_INPUT_LENGTH (x); j++)
  df_uses_record (collection_rec, &ASM_OPERANDS_INPUT (x, j),
  DF_REF_REG_USE, bb, insn_info, flags);
return;
  }
break;
…
}


Looks like ONLY the operands of “ASM_OPERANDS” are recorded as USES in df 
analysis; the operands of “UNSPEC_VOLATILE” are NOT. 

If we use “ASM_OPERANDS” instead of “UNSPEC_VOLATILE” as you suggested, the 
data flow analysis should automatically pick up the operands of “ASM_OPERANDS” 
and fix the data flow, right?


> 
> gets optimised to:
> 
>xorl%eax, %eax
>movw%ax, (%rdi)
> 
> with the two stores being merged.  The same thing is IMO valid for
> unspec_volatile.  In both cases, you would need some kind of memory
> clobber to prevent the move and merge from happening.
> 
>>> 
>>> There's also no dataflow reason why this couldn't be reordered to:
>>> 
>>> (insn 18 16 19 2 (set (reg:SI 1 dx)
>>>  (const_int 0 [0])) "t10.c":11:1 -1
>>>   (nil))
>>> (insn 19 18 20 2 (unspec_volatile [
>>>  (reg:SI 1 dx)
>>>  ] UNSPECV_PRO_EPILOGUE_USE) "t10.c":11:1 -1
>>>   (nil))
>>> (insn 17 … pop a register other than dx from the stack …)
>>> 
>> 
>> This is the place where I don’t quite agree at this moment; maybe I still 
>> don’t quite understand the “UNSPEC_volatile”.
>> 
>> I checked several places in GCC that handle “UNSPEC_VOLATILE”, for example,  
>> the routine “can_move_insns_across” in gcc/df-problems.c:
>> 
>>  if (NONDEBUG_INSN_P (insn))
>>{

[PATCH] Add $(ZLIBINC) to CFLAGS-analyzer/engine.o

2020-09-22 Thread David Malcolm via Gcc-patches
On Tue, 2020-09-22 at 22:55 +0200, Tobias Burnus wrote:
> This patch breaks the cross build here:
> ...gcc/analyzer/engine.cc:65:10: fatal error: zlib.h: No such file or
> directory
> 
> I think you need to do something similar in Makefile.in as lto-
> compress has:
> 
> # lto-compress.o needs $(ZLIBINC) added to the include flags.
> CFLAGS-lto-compress.o += $(ZLIBINC) $(ZSTD_INC)
> 
> Tobias

Sorry about that.

Does the following fix it for you?

gcc/ChangeLog:
* Makefile.in: Add $(ZLIBINC) to CFLAGS-analyzer/engine.o.
---
 gcc/Makefile.in | 1 +
 1 file changed, 1 insertion(+)

diff --git a/gcc/Makefile.in b/gcc/Makefile.in
index c710bad27b1..9c6c1c93b97 100644
--- a/gcc/Makefile.in
+++ b/gcc/Makefile.in
@@ -2296,6 +2296,7 @@ s-bversion: BASE-VER
 
 CFLAGS-toplev.o += -DTARGET_NAME=\"$(target_noncanonical)\"
 CFLAGS-optinfo-emit-json.o += -DTARGET_NAME=\"$(target_noncanonical)\" 
$(ZLIBINC)
+CFLAGS-analyzer/engine.o += $(ZLIBINC)
 
 pass-instances.def: $(srcdir)/passes.def $(PASSES_EXTRA) \
$(srcdir)/gen-pass-instances.awk
-- 
2.26.2



Re: PING [Patch][Middle-end]Add -fzero-call-used-regs=[skip|used-gpr|all-gpr|used|all]

2020-09-22 Thread Segher Boessenkool
Hi!

On Tue, Sep 22, 2020 at 06:06:30PM +0100, Richard Sandiford wrote:
> Qing Zhao  writes:
> > Okay, thanks for the info. 
> > then, what’s the current definition of UNSPEC_VOLATILE? 
> 
> I'm not sure it's written down anywhere TBH.  rtl.texi just says:
> 
>   @code{unspec_volatile} is used for volatile operations and operations
>   that may trap; @code{unspec} is used for other operations.
> 
> which seems like a cyclic definition: volatile expressions are defined
> to be expressions that are volatile.

volatile_insn_p returns true for unspec_volatile (and all other volatile
things).  Unfortunately the comment on this function is just as confused
as pretty much everything else :-/

> But IMO the semantics are that unspec_volatile patterns with a given
> set of inputs and outputs act for dataflow purposes like volatile asms
> with the same inputs and outputs.  The semantics of asm volatile are
> at least slightly more well-defined (if only by example); see extend.texi
> for details.  In particular:
> 
>   Note that the compiler can move even @code{volatile asm} instructions 
> relative
>   to other code, including across jump instructions. For example, on many 
>   targets there is a system register that controls the rounding mode of 
>   floating-point operations. Setting it with a @code{volatile asm} statement,
>   as in the following PowerPC example, does not work reliably.
> 
>   @example
>   asm volatile("mtfsf 255, %0" : : "f" (fpenv));
>   sum = x + y;
>   @end example
> 
>   The compiler may move the addition back before the @code{volatile asm}
>   statement. To make it work as expected, add an artificial dependency to
>   the @code{asm} by referencing a variable in the subsequent code, for
>   example:
> 
>   @example
>   asm volatile ("mtfsf 255,%1" : "=X" (sum) : "f" (fpenv));
>   sum = x + y;
>   @end example
> 
> which is very similar to the unspec_volatile case we're talking about.

So just like volatile memory accesses, they have an (unknown) side
effect, which means they have to execute on the real machine as on the
abstract machine (wrt sequence points).  All side effects have to happen
exactly as often as prescribed, and in the same order.  Just like
volatile asm, too.

And there is no magic to it, there are no other effects.

> To take an x86 example:
> 
>   void
>   f (char *x)
>   {
> asm volatile ("");
> x[0] = 0;
> asm volatile ("");
> x[1] = 0;
> asm volatile ("");
>   }
> 
> gets optimised to:
> 
> xorl%eax, %eax
> movw%ax, (%rdi)

(If you use "#" or "#smth" you can see those in the generated asm --
completely empty asm is helpfully (uh...) not printed.)

> with the two stores being merged.  The same thing is IMO valid for
> unspec_volatile.  In both cases, you would need some kind of memory
> clobber to prevent the move and merge from happening.

Even then, x[] could be optimised away completely (with whole program
optimisation, or something).  The only way to really prevent the
compiler from optimising memory accesses is to make it not see the
details (with an asm or an unspec, for example).

> The above is conservatively correct.  But not all passes do it.
> E.g. combine does have a similar approach:
> 
>   /* If INSN contains volatile references (specifically volatile MEMs),
>  we cannot combine across any other volatile references.

And this is correct, and the *minimum* to do even (this could change the
order of the side effects, depending how combine places the resulting
insns in I2 and I3).

>  Even if INSN doesn't contain volatile references, any intervening
>  volatile insn might affect machine state.  */

Confusingly stated, but essentially correct (it is possible we place
the volatile at I2, and everything would still be sequenced correctly,
but combine does not guarantee that).

>   is_volatile_p = volatile_refs_p (PATTERN (insn))
> ? volatile_refs_p
> : volatile_insn_p;

Too much subtlety in there, heh.


Segher


Re: Go patch committed: Finalize methods for type aliases of struct types

2020-09-22 Thread Ian Lance Taylor via Gcc-patches
On Mon, Sep 21, 2020 at 3:54 PM Ian Lance Taylor  wrote:
>
> This patch to the Go frontend finalizes methods for type aliases of
> struct types.  Previously we would finalize the methods of the alias
> type itself, but since it's a type alias we really need to finalize the
> methods of the aliased type.
>
> This patch also handles method expressions of unnamed struct types.
>
> The test case for both is https://golang.org/cl/251168.
>
> This fixes https://golang.org/issue/38125.
>
> Bootstrapped and tested on x86_64-pc-linux-gnu.  Committed to mainline.

This requires a change to one of the Go tests.  Updated like so,
committed to mainline.

Ian
diff --git a/gcc/testsuite/go.test/test/fixedbugs/issue4458.go 
b/gcc/testsuite/go.test/test/fixedbugs/issue4458.go
index 820f18cb8d7..82b104a0fdf 100644
--- a/gcc/testsuite/go.test/test/fixedbugs/issue4458.go
+++ b/gcc/testsuite/go.test/test/fixedbugs/issue4458.go
@@ -16,5 +16,5 @@ func (T) foo() {}
 func main() {
av := T{}
pav := &av
-   (**T).foo(&pav) // ERROR "no method foo|requires named type or pointer 
to named"
+   (**T).foo(&pav) // ERROR "no method|requires named type or pointer to 
named"
 }


Re: PR97107, libgo fails to build for power10

2020-09-22 Thread Segher Boessenkool
Hi!

On Tue, Sep 22, 2020 at 09:55:12AM +0930, Alan Modra wrote:
> Calls from split-stack code to non-split-stack code need to expand
> mapped stack memory via __morestack.  Even tail calls.
> 
> __morestack is quite a surprising function on powerpc in that it calls
> back to its caller, and a tail call will continue running in the
> context of extra mapped stack.

Also known as "pure evil" :-)

>   PR target/97107
>   * config/rs6000/rs6000-internal.h (struct rs6000_stack): Improve
>   calls_p comment.
>   * config/rs6000/rs6000-logue.c (rs6000_stack_info): Likewise.
>   (rs6000_expand_split_stack_prologue): Emit the prologue for
>   functions that make a sibling call.

>if (!info->push_p)
> -return;
> +{
> +  /* We need the -fsplit-stack prologue for functions that make
> +  tail calls.  Tail calls don't count against crtl->is_leaf.  */
> +  for (insn = get_topmost_sequence ()->first; insn; insn = NEXT_INSN 
> (insn))
> + if (CALL_P (insn))
> +   break;
> +  if (!insn)
> + return;
> +}

I don't think that get_topmost_sequence is correct.

Other than that this is fine for trunk (and backports).  Thanks!


Segher


Re: [RS6000] Power10 libffi fixes

2020-09-22 Thread Segher Boessenkool
Hi!

On Tue, Sep 22, 2020 at 10:00:11AM +0930, Alan Modra wrote:
> Power10 pc-relative code doesn't use or preserve r2 as a TOC pointer.
> That means calling between pc-relative and TOC using code can't be
> done without intervening linker stubs, and a call from TOC code to
> pc-relative code must have a nop after the bl in order to restore r2.
> 
> Now the PowerPC libffi assembly code doesn't use r2 except for the
> implicit use when making calls back to C, ffi_closure_helper_LINUX64
> and ffi_prep_args64.  So changing the assembly to interoperate with
> pc-relative code without stubs is easily done.  Controlling that is a
> new built-in macro.
> 
> Upstream libffi currently has a different patch applied to work around
> the power10 build failure.  I'll post a delta for upstream.
> Bootstrapped and regression tested on power8, built for power10.
> 
> gcc/
>   * config/rs6000/rs6000-c.c (rs6000_target_modify_macros):
>   Conditionally define __PCREL__.

Please do that as a separate (earlier) patch (because it *is*, and to
simplify backports, etc).

> libffi/
>   * src/powerpc/linux64.S (ffi_call_LINUX64): Don't emit global
>   entry when __PCREL__.  Call using @notoc.
>   (ffi_closure_LINUX64, ffi_go_closure_linux64): Likewise.

This is okay for trunk, and for backports (possibly expedited, talk
with Peter for what is wanted/needed for AT).

Thanks!


Segher


Re: libbacktrace patch committed: Avoid ambiguous binary search

2020-09-22 Thread Ian Lance Taylor via Gcc-patches
On Tue, Sep 8, 2020 at 6:22 PM Ian Lance Taylor  wrote:
>
> This patch to libbacktrace avoids ambiguous binary searches.
> Searching for a range match can cause the search order to not match
> the sort order, which can cause libbacktrace to miss matching entries.
> This patch allocates an extra entry at the end of function_addrs and
> unit_addrs vectors, so that we can safely compare to the next entry
> when searching.  It adjusts the matching code accordingly.  This fixes
> https://github.com/ianlancetaylor/libbacktrace/issues/44.
> Bootstrapped and ran libbacktrace and libgo tests on
> x86_64-pc-linux-gnu.  Committed to mainline.

I realized that this isn't quite right for the case where the PC value
we are looking up is equal to the low value in the array we are
searching.  In that case to ensure consistent results we have to step
forward to the end of the sequence of identical low values, and only
then step backward.  This patch implements that.  It also ensures that
the right thing happens if someone decides to look up the PC value -1.
Bootstrapped and ran libbacktrace and Go tests on x86_64-pc-linux-gnu.
Committed to mainline.
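
The lookup order described above can be sketched as follows.  This is not
the libbacktrace code: the names addr_range and lookup_pc are hypothetical,
and std::upper_bound stands in for the bsearch-with-sentinel approach, so
the PC == -1 guard from the patch is not modelled.  The sketch assumes the
entries are sorted by low, with equal lows ordered by decreasing high.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical address range [low, high) with an id to tell entries apart.
struct addr_range
{
  std::uintptr_t low;
  std::uintptr_t high;
  int id;
};

// Return the first range found walking backward that contains PC, or
// nullptr if there is none.
const addr_range *
lookup_pc (const std::vector<addr_range> &ranges, std::uintptr_t pc)
{
  // Find one past the last entry whose low is <= pc.  Stepping back once
  // lands on the end of the run of entries sharing that low value, which is
  // where the forward walk in the description ends up.
  auto it = std::upper_bound (ranges.begin (), ranges.end (), pc,
                              [] (std::uintptr_t value, const addr_range &r)
                              { return value < r.low; });
  if (it == ranges.begin ())
    return nullptr;   // pc is below every range
  --it;

  // Walk backward through the entries with the same low value and use the
  // first one that includes pc.
  while (true)
    {
      if (pc < it->high)
        return &*it;
      if (it == ranges.begin () || (it - 1)->low < it->low)
        return nullptr;
      --it;
    }
}
```

Because the search always starts from the end of a run of equal low values,
a PC equal to the shared low gives the same answer as a PC just above it,
which is the consistency property the fix is after.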

Ian

* dwarf.c (report_inlined_functions): Handle PC == -1 and PC ==
p->low.
(dwarf_lookup_pc): Likewise.
diff --git a/libbacktrace/dwarf.c b/libbacktrace/dwarf.c
index 386701bffea..582f34bc816 100644
--- a/libbacktrace/dwarf.c
+++ b/libbacktrace/dwarf.c
@@ -3558,6 +3558,11 @@ report_inlined_functions (uintptr_t pc, struct function 
*function,
   if (function->function_addrs_count == 0)
 return 0;
 
+  /* Our search isn't safe if pc == -1, as that is the sentinel
+ value.  */
+  if (pc + 1 == 0)
+return 0;
+
   p = ((struct function_addrs *)
bsearch (&pc, function->function_addrs,
function->function_addrs_count,
@@ -3567,9 +3572,12 @@ report_inlined_functions (uintptr_t pc, struct function 
*function,
 return 0;
 
   /* Here pc >= p->low && pc < (p + 1)->low.  The function_addrs are
- sorted by low, so we are at the end of a range of function_addrs
- with the same low alue.  Walk backward and use the first range
- that includes pc.  */
+ sorted by low, so if pc > p->low we are at the end of a range of
+ function_addrs with the same low value.  If pc == p->low walk
+ forward to the end of the range with that low value.  Then walk
+ backward and use the first range that includes pc.  */
+  while (pc == (p + 1)->low)
+++p;
   match = NULL;
   while (1)
 {
@@ -3636,8 +3644,10 @@ dwarf_lookup_pc (struct backtrace_state *state, struct 
dwarf_data *ddata,
 
   *found = 1;
 
-  /* Find an address range that includes PC.  */
-  entry = (ddata->addrs_count == 0
+  /* Find an address range that includes PC.  Our search isn't safe if
+ PC == -1, as we use that as a sentinel value, so skip the search
+ in that case.  */
+  entry = (ddata->addrs_count == 0 || pc + 1 == 0
   ? NULL
   : bsearch (&pc, ddata->addrs, ddata->addrs_count,
  sizeof (struct unit_addrs), unit_addrs_search));
@@ -3649,9 +3659,12 @@ dwarf_lookup_pc (struct backtrace_state *state, struct 
dwarf_data *ddata,
 }
 
   /* Here pc >= entry->low && pc < (entry + 1)->low.  The unit_addrs
- are sorted by low, so we are at the end of a range of unit_addrs
- with the same low value.  Walk backward and use the first range
- that includes pc.  */
+ are sorted by low, so if pc > p->low we are at the end of a range
+ of unit_addrs with the same low value.  If pc == p->low walk
+ forward to the end of the range with that low value.  Then walk
+ backward and use the first range that includes pc.  */
+  while (pc == (entry + 1)->low)
+++entry;
   found_entry = 0;
   while (1)
 {
@@ -3832,9 +3845,12 @@ dwarf_lookup_pc (struct backtrace_state *state, struct 
dwarf_data *ddata,
 return callback (data, pc, ln->filename, ln->lineno, NULL);
 
   /* Here pc >= p->low && pc < (p + 1)->low.  The function_addrs are
- sorted by low, so we are at the end of a range of function_addrs
- with the same low alue.  Walk backward and use the first range
- that includes pc.  */
+ sorted by low, so if pc > p->low we are at the end of a range of
+ function_addrs with the same low value.  If pc == p->low walk
+ forward to the end of the range with that low value.  Then walk
+ backward and use the first range that includes pc.  */
+  while (pc == (p + 1)->low)
+    ++p;
   fmatch = NULL;
   while (1)
 {
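
The lookup described in the comments above (bail out on the sentinel PC, walk forward over entries that share a low value, then walk backward to the first range that includes pc) can be sketched in isolation.  This is a minimal sketch under stated assumptions, not the libbacktrace code: `struct addr_range` and `find_range` are hypothetical names, the binary search is hand-written rather than a bsearch comparator, and entries that share a low value are assumed to be sorted by decreasing high.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for function_addrs / unit_addrs.  */
struct addr_range
{
  uintptr_t low;   /* First address covered by the range.  */
  uintptr_t high;  /* One past the last address covered.  */
};

/* Return the index of the innermost range that includes PC, or -1.
   RANGES is sorted by low; entries sharing a low value are assumed to be
   sorted by decreasing high.  RANGES[COUNT] must be a sentinel whose low
   is (uintptr_t) -1.  */
static long
find_range (const struct addr_range *ranges, size_t count, uintptr_t pc)
{
  size_t lo = 0;
  size_t hi = count;
  size_t i = count;

  /* PC == -1 collides with the sentinel, so skip the search.  */
  if (count == 0 || pc + 1 == 0)
    return -1;

  /* Find some I with ranges[I].low <= pc <= ranges[I + 1].low.  This may
     land in the middle of a run of entries with the same low value.  */
  while (lo < hi)
    {
      size_t mid = lo + (hi - lo) / 2;
      if (pc < ranges[mid].low)
        hi = mid;
      else if (pc > ranges[mid + 1].low)
        lo = mid + 1;
      else
        {
          i = mid;
          break;
        }
    }
  if (i == count)
    return -1;

  /* If pc equals the next low value, walk forward to the end of the run
     of entries with that low value.  */
  while (pc == ranges[i + 1].low)
    ++i;

  /* Walk backward within the run and use the first range that includes
     pc.  */
  while (1)
    {
      if (pc < ranges[i].high)
        return (long) i;
      if (i == 0 || ranges[i - 1].low < ranges[i].low)
        return -1;
      --i;
    }
}
```

Without the forward walk, a pc equal to a shared low value could stop the search in the middle of the run, and the backward walk would then never look at the later, smaller ranges.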


Re: PR97107, libgo fails to build for power10

2020-09-22 Thread Alan Modra via Gcc-patches
Hi Segher,

On Tue, Sep 22, 2020 at 06:59:42PM -0500, Segher Boessenkool wrote:
> On Tue, Sep 22, 2020 at 09:55:12AM +0930, Alan Modra wrote:
> >if (!info->push_p)
> > -return;
> > +{
> > +  /* We need the -fsplit-stack prologue for functions that make
> > +tail calls.  Tail calls don't count against crtl->is_leaf.  */
> > +  for (insn = get_topmost_sequence ()->first; insn; insn = NEXT_INSN 
> > (insn))
> > +   if (CALL_P (insn))
> > + break;
> > +  if (!insn)
> > +   return;
> > +}
> 
> I don't think that get_topmost_sequence is correct.

We get here from inside a sequence so we definitely need a way to look
at the function RTL rather than the empty sequence.  You want
push_topmost_sequence / pop_topmost_sequence around the rtl scan?

I wasn't aware there was a need for that when not emitting insns, and I
can't see any reason for it when I look carefully, except that it seems
to be customary.  grep shows only one other place using
get_topmost_sequence as I did.

> Other than that this is fine for trunk (and backports).  Thanks!
> 
> 
> Segher

-- 
Alan Modra
Australia Development Lab, IBM


libgo patch committed: Fix build errors on AIX

2020-09-22 Thread Ian Lance Taylor via Gcc-patches
This libgo patch by Clément Chigot fixes build errors on AIX.
Bootstrapped and ran Go testsuite on x86_64-pc-linux-gnu.  Committed
to mainline.

Ian
63cd53d2f5da07856340bbea11ee09ab1125e8c0
diff --git a/gcc/go/gofrontend/MERGE b/gcc/go/gofrontend/MERGE
index a8ba5a35e44..d17d39702c8 100644
--- a/gcc/go/gofrontend/MERGE
+++ b/gcc/go/gofrontend/MERGE
@@ -1,4 +1,4 @@
-b24062f0b2e8f6173731d5654afe0addf857270e
+5605a0727d3395becba1fbd4447807073984ec13
 
 The first line of this file holds the git revision number of the last
 merge done from the gofrontend repository.
diff --git a/libgo/go/internal/cpu/cpu_no_init.go b/libgo/go/internal/cpu/cpu_no_init.go
index fb381e1ce2d..e7ff87383c3 100644
--- a/libgo/go/internal/cpu/cpu_no_init.go
+++ b/libgo/go/internal/cpu/cpu_no_init.go
@@ -6,6 +6,7 @@
 // +build !amd64
 // +build !arm
 // +build !arm64
+// +build !ppc
 // +build !ppc64
 // +build !ppc64le
 // +build !s390x
diff --git a/libgo/go/internal/cpu/cpu_ppc64x.go b/libgo/go/internal/cpu/cpu_ppc64x.go
deleted file mode 100644
index b726cc86d52..000
--- a/libgo/go/internal/cpu/cpu_ppc64x.go
+++ /dev/null
@@ -1,44 +0,0 @@
-// Copyright 2017 The Go Authors. All rights reserved.
-// Use of this source code is governed by a BSD-style
-// license that can be found in the LICENSE file.
-
-// +build ppc64 ppc64le
-
-package cpu
-
-// ppc64x doesn't have a 'cpuid' equivalent, so we rely on HWCAP/HWCAP2.
-// These are initialized by archauxv and should not be changed after they are
-// initialized.
-// On aix/ppc64, these values are initialized early in the runtime in runtime/os_aix.go.
-var HWCap uint
-var HWCap2 uint
-
-// HWCAP/HWCAP2 bits. These are exposed by the kernel.
-const (
-   // ISA Level
-   PPC_FEATURE2_ARCH_2_07 = 0x80000000
-   PPC_FEATURE2_ARCH_3_00 = 0x00800000
-
-   // CPU features
-   PPC_FEATURE2_DARN = 0x00200000
-   PPC_FEATURE2_SCV  = 0x00100000
-)
-
-func doinit() {
-   options = []option{
-   {Name: "darn", Feature: &PPC64.HasDARN},
-   {Name: "scv", Feature: &PPC64.HasSCV},
-   {Name: "power9", Feature: &PPC64.IsPOWER9},
-   {Name: "power8", Feature: &PPC64.IsPOWER8},
-   }
-
-   // HWCAP2 feature bits
-   PPC64.IsPOWER8 = isSet(HWCap2, PPC_FEATURE2_ARCH_2_07)
-   PPC64.IsPOWER9 = isSet(HWCap2, PPC_FEATURE2_ARCH_3_00)
-   PPC64.HasDARN = isSet(HWCap2, PPC_FEATURE2_DARN)
-   PPC64.HasSCV = isSet(HWCap2, PPC_FEATURE2_SCV)
-}
-
-func isSet(hwc uint, value uint) bool {
-   return hwc&value != 0
-}
diff --git a/libgo/go/internal/cpu/cpu_ppcx.go b/libgo/go/internal/cpu/cpu_ppcx.go
new file mode 100644
index 000..56ff87524ee
--- /dev/null
+++ b/libgo/go/internal/cpu/cpu_ppcx.go
@@ -0,0 +1,44 @@
+// Copyright 2017 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+// +build ppc ppc64 ppc64le
+
+package cpu
+
+// ppc64x doesn't have a 'cpuid' equivalent, so we rely on HWCAP/HWCAP2.
+// These are initialized by archauxv and should not be changed after they are
+// initialized.
+// On aix/ppc64, these values are initialized early in the runtime in runtime/os_aix.go.
+var HWCap uint
+var HWCap2 uint
+
+// HWCAP/HWCAP2 bits. These are exposed by the kernel.
+const (
+   // ISA Level
+   PPC_FEATURE2_ARCH_2_07 = 0x80000000
+   PPC_FEATURE2_ARCH_3_00 = 0x00800000
+
+   // CPU features
+   PPC_FEATURE2_DARN = 0x00200000
+   PPC_FEATURE2_SCV  = 0x00100000
+)
+
+func doinit() {
+   options = []option{
+   {Name: "darn", Feature: &PPC64.HasDARN},
+   {Name: "scv", Feature: &PPC64.HasSCV},
+   {Name: "power9", Feature: &PPC64.IsPOWER9},
+   {Name: "power8", Feature: &PPC64.IsPOWER8},
+   }
+
+   // HWCAP2 feature bits
+   PPC64.IsPOWER8 = isSet(HWCap2, PPC_FEATURE2_ARCH_2_07)
+   PPC64.IsPOWER9 = isSet(HWCap2, PPC_FEATURE2_ARCH_3_00)
+   PPC64.HasDARN = isSet(HWCap2, PPC_FEATURE2_DARN)
+   PPC64.HasSCV = isSet(HWCap2, PPC_FEATURE2_SCV)
+}
+
+func isSet(hwc uint, value uint) bool {
+   return hwc&value != 0
+}
diff --git a/libgo/go/net/interface_aix.go b/libgo/go/net/interface_aix.go
index f57c5ff6622..bd5538699bb 100644
--- a/libgo/go/net/interface_aix.go
+++ b/libgo/go/net/interface_aix.go
@@ -33,8 +33,6 @@ const _RTAX_NETMASK = 2
 const _RTAX_IFA = 5
 const _RTAX_MAX = 8
 
-const _SIOCGIFMTU = -0x3fd796aa
-
 func getIfList() ([]byte, error) {
needed, err := syscall.Getkerninfo(_KINFO_RT_IFLIST, 0, 0, 0)
if err != nil {
diff --git a/libgo/go/runtime/os_aix.go b/libgo/go/runtime/os_aix.go
index b337330c8f2..951aeb6cffd 100644
--- a/libgo/go/runtime/os_aix.go
+++ b/libgo/go/runtime/os_aix.go
@@ -46,7 +46,7 @@ func clock_gettime(clock_id int64, timeout *timespec) int32
 
 //go:nosplit
 func semacreate(mp *m) {
-   if mp.mos.waitsema != 0 {
+   if mp.waitsema != 0 {
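
For readers more at home in C than Go, the HWCAP2 decoding done by doinit in cpu_ppcx.go can be transliterated roughly as follows.  This is only an illustrative sketch: `struct ppc_features` and `decode_hwcap2` are hypothetical names that do not appear in libgo, and the bit values are the PowerPC HWCAP2 values from the Linux kernel headers.

```c
#include <stdbool.h>

/* PowerPC HWCAP2 bits, as exposed by the Linux kernel.  */
#define PPC_FEATURE2_ARCH_2_07 0x80000000UL  /* ISA 2.07 (POWER8).  */
#define PPC_FEATURE2_ARCH_3_00 0x00800000UL  /* ISA 3.00 (POWER9).  */
#define PPC_FEATURE2_DARN      0x00200000UL  /* darn instruction.  */
#define PPC_FEATURE2_SCV       0x00100000UL  /* scv instruction.  */

/* Hypothetical counterpart of the PPC64 feature struct used by doinit.  */
struct ppc_features
{
  bool is_power8;
  bool is_power9;
  bool has_darn;
  bool has_scv;
};

/* Counterpart of isSet: true if any bit of VALUE is set in HWC.  */
static bool
is_set (unsigned long hwc, unsigned long value)
{
  return (hwc & value) != 0;
}

/* Decode a raw HWCAP2 word into individual feature flags.  */
static struct ppc_features
decode_hwcap2 (unsigned long hwcap2)
{
  struct ppc_features f;

  f.is_power8 = is_set (hwcap2, PPC_FEATURE2_ARCH_2_07);
  f.is_power9 = is_set (hwcap2, PPC_FEATURE2_ARCH_3_00);
  f.has_darn = is_set (hwcap2, PPC_FEATURE2_DARN);
  f.has_scv = is_set (hwcap2, PPC_FEATURE2_SCV);
  return f;
}
```

On Linux with glibc the raw word would typically come from getauxval (AT_HWCAP2) in <sys/auxv.h>; on AIX the patch's comment says the values are set up early in runtime/os_aix.go instead.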

Re: PR97107, libgo fails to build for power10

2020-09-22 Thread Segher Boessenkool
On Wed, Sep 23, 2020 at 10:00:01AM +0930, Alan Modra wrote:
> Hi Segher,
> 
> On Tue, Sep 22, 2020 at 06:59:42PM -0500, Segher Boessenkool wrote:
> > On Tue, Sep 22, 2020 at 09:55:12AM +0930, Alan Modra wrote:
> > >if (!info->push_p)
> > > -return;
> > > +{
> > > +  /* We need the -fsplit-stack prologue for functions that make
> > > +  tail calls.  Tail calls don't count against crtl->is_leaf.  */
> > > +  for (insn = get_topmost_sequence ()->first; insn; insn = NEXT_INSN 
> > > (insn))
> > > + if (CALL_P (insn))
> > > +   break;
> > > +  if (!insn)
> > > + return;
> > > +}
> > 
> > I don't think that get_topmost_sequence is correct.
> 
> We get here from inside a sequence so we definitely need a way to look
> at the function RTL rather than the empty sequence.  You want
> push_topmost_sequence / pop_topmost_sequence around the rtl scan?

That will not help I think.

It isn't obvious that the outer sequence is always what we want.  If
there is some nicer way to get at the info you want, that would help.

But this is the expander only, so I guess we are okay here: the situations
in which we can be called here are quite limited.  Add a comment though?

> I wasn't aware there was a need for that when not emitting insns.  And
> can't see any reason why when I look carefully, except that seems to
> be customary, grep only shows one other place using
> get_topmost_sequence as I did.

Yes, and not in a widely used target.  So I wonder if there is some less
tricky way to get at the wanted info?


Segher
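
The check under discussion (emit the -fsplit-stack prologue if the function pushes a frame, or if any insn in the function body is a call, tail calls included) can be modelled outside GCC as a plain list scan.  This is a toy sketch only: `struct insn`, `enum insn_kind` and `needs_split_stack_prologue` are hypothetical and are not the RTL API; in the patch the list head comes from get_topmost_sequence ()->first and the test is CALL_P.

```c
#include <stdbool.h>
#include <stddef.h>

/* Toy instruction kinds; real RTL has many more.  */
enum insn_kind { INSN_PLAIN, INSN_CALL, INSN_JUMP };

/* Toy singly linked instruction list.  */
struct insn
{
  enum insn_kind kind;
  struct insn *next;
};

/* Return true if the split-stack prologue is needed: either the function
   pushes a frame (PUSH_P), or it contains a call of any kind.  A function
   whose only calls are tail calls can still look like a leaf, which is why
   the list has to be scanned.  */
static bool
needs_split_stack_prologue (bool push_p, const struct insn *first)
{
  const struct insn *insn;

  if (push_p)
    return true;

  for (insn = first; insn; insn = insn->next)
    if (insn->kind == INSN_CALL)
      return true;

  return false;
}
```

The scan only reads the list, which matches Alan's point that nothing is being emitted; whether the outermost sequence is always the right list to read is the open question in the thread.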


Re: [PATCH] Add $(ZLIBINC) to CFLAGS-analyzer/engine.o

2020-09-22 Thread David Malcolm via Gcc-patches
On Tue, 2020-09-22 at 17:47 -0400, David Malcolm wrote:
> On Tue, 2020-09-22 at 22:55 +0200, Tobias Burnus wrote:
> > This patch breaks the cross build here:
> > ...gcc/analyzer/engine.cc:65:10: fatal error: zlib.h: No such file
> > or
> > directory
> > 
> > I think you need to do something similar in Makefile.in as lto-
> > compress has:
> > 
> > # lto-compress.o needs $(ZLIBINC) added to the include flags.
> > CFLAGS-lto-compress.o += $(ZLIBINC) $(ZSTD_INC)
> > 
> > Tobias
> 
> Sorry about that.
> 
> Does the following fix it for you?

It successfully bootstrapped & regrtested on x86_64-pc-linux-gnu, and I
see it adding  -I../../src/gcc/../zlib in my test builds, so I've gone
ahead and pushed it to master
as c1c2ccc74cb6f547118431d8142bc894991b104a.

Please let me know if this doesn't fix it.

Dave

> gcc/ChangeLog:
>   * Makefile.in: Add $(ZLIBINC) to CFLAGS-analyzer/engine.o.
> ---
>  gcc/Makefile.in | 1 +
>  1 file changed, 1 insertion(+)
> 
> diff --git a/gcc/Makefile.in b/gcc/Makefile.in
> index c710bad27b1..9c6c1c93b97 100644
> --- a/gcc/Makefile.in
> +++ b/gcc/Makefile.in
> @@ -2296,6 +2296,7 @@ s-bversion: BASE-VER
>  
>  CFLAGS-toplev.o += -DTARGET_NAME=\"$(target_noncanonical)\"
>  CFLAGS-optinfo-emit-json.o += -DTARGET_NAME=\"$(target_noncanonical)\" $(ZLIBINC)
> +CFLAGS-analyzer/engine.o += $(ZLIBINC)
>  
>  pass-instances.def: $(srcdir)/passes.def $(PASSES_EXTRA) \
>   $(srcdir)/gen-pass-instances.awk



[committed] analyzer: use switch in exploded_node::on_stmt

2020-09-22 Thread David Malcolm via Gcc-patches
This patch replaces a sequence of dyn_cast to different gimple stmt
types in exploded_node::on_stmt with a switch on the gimple_code.  This
makes clearer which kinds of stmt are currently treated as no-ops, as a
precursor to handling them properly.

No functional change intended.

Successfully bootstrapped & regrtested on x86_64-pc-linux-gnu.
Pushed to master as fefc209299236593fcc3004c874b2602a3735056.

gcc/analyzer/ChangeLog:
* engine.cc (exploded_node::on_stmt): Replace sequence of dyn_cast
with switch.
---
 gcc/analyzer/engine.cc | 134 -
 1 file changed, 80 insertions(+), 54 deletions(-)

diff --git a/gcc/analyzer/engine.cc b/gcc/analyzer/engine.cc
index df7e33564f1..437429798f2 100644
--- a/gcc/analyzer/engine.cc
+++ b/gcc/analyzer/engine.cc
@@ -1063,62 +1063,88 @@ exploded_node::on_stmt (exploded_graph &eg,
  &old_state, state,
  stmt);
 
-  if (const gassign *assign = dyn_cast <const gassign *> (stmt))
-state->m_region_model->on_assignment (assign, &ctxt);
-
-  if (const greturn *return_ = dyn_cast <const greturn *> (stmt))
-state->m_region_model->on_return (return_, &ctxt);
-
-  /* Track whether we have a gcall to a function that's not recognized by
- anything, for which we don't have a function body, or for which we
- don't know the fndecl.  */
   bool unknown_side_effects = false;
-  if (const gcall *call = dyn_cast <const gcall *> (stmt))
+
+  switch (gimple_code (stmt))
 {
-  /* Debugging/test support.  */
-  if (is_special_named_call_p (call, "__analyzer_describe", 2))
-   state->m_region_model->impl_call_analyzer_describe (call, &ctxt);
-  else if (is_special_named_call_p (call, "__analyzer_dump", 0))
-   {
- /* Handle the builtin "__analyzer_dump" by dumping state
-to stderr.  */
- state->dump (eg.get_ext_state (), true);
-   }
-  else if (is_special_named_call_p (call, "__analyzer_dump_path", 0))
-   {
- /* Handle the builtin "__analyzer_dump_path" by queuing a
-diagnostic at this exploded_node.  */
- ctxt.warn (new dump_path_diagnostic ());
-   }
-  else if (is_special_named_call_p (call, "__analyzer_dump_region_model", 0))
-   {
- /* Handle the builtin "__analyzer_dump_region_model" by dumping
-the region model's state to stderr.  */
- state->m_region_model->dump (false);
-   }
-  else if (is_special_named_call_p (call, "__analyzer_eval", 1))
-   state->m_region_model->impl_call_analyzer_eval (call, &ctxt);
-  else if (is_special_named_call_p (call, "__analyzer_break", 0))
-   {
- /* Handle the builtin "__analyzer_break" by triggering a
-breakpoint.  */
- /* TODO: is there a good cross-platform way to do this?  */
- raise (SIGINT);
-   }
-  else if (is_special_named_call_p (call, "__analyzer_dump_exploded_nodes",
-   1))
-   {
- /* This is handled elsewhere.  */
-   }
-  else if (is_setjmp_call_p (call))
-   state->m_region_model->on_setjmp (call, this, &ctxt);
-  else if (is_longjmp_call_p (call))
-   {
- on_longjmp (eg, call, state, &ctxt);
- return on_stmt_flags::terminate_path ();
-   }
-  else
-   unknown_side_effects = state->m_region_model->on_call_pre (call, &ctxt);
+default:
+  /* No-op for now.  */
+  break;
+
+case GIMPLE_ASSIGN:
+  {
+   const gassign *assign = as_a <const gassign *> (stmt);
+   state->m_region_model->on_assignment (assign, &ctxt);
+  }
+  break;
+
+case GIMPLE_ASM:
+  /* No-op for now.  */
+  break;
+
+case GIMPLE_CALL:
+  {
+   /* Track whether we have a gcall to a function that's not recognized by
+  anything, for which we don't have a function body, or for which we
+  don't know the fndecl.  */
+   const gcall *call = as_a <const gcall *> (stmt);
+
+   /* Debugging/test support.  */
+   if (is_special_named_call_p (call, "__analyzer_describe", 2))
+ state->m_region_model->impl_call_analyzer_describe (call, &ctxt);
+   else if (is_special_named_call_p (call, "__analyzer_dump", 0))
+ {
+   /* Handle the builtin "__analyzer_dump" by dumping state
+  to stderr.  */
+   state->dump (eg.get_ext_state (), true);
+ }
+   else if (is_special_named_call_p (call, "__analyzer_dump_path", 0))
+ {
+   /* Handle the builtin "__analyzer_dump_path" by queuing a
+  diagnostic at this exploded_node.  */
+   ctxt.warn (new dump_path_diagnostic ());
+ }
+   else if (is_special_named_call_p (call, "__analyzer_dump_region_model",
+ 0))
+ {
+   /* Handle the builtin "__analyzer_dump_region_model" by dumping
+  the region model's state to stderr.  */
+   state->m_region_mo
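
The shape of the refactoring (one switch on the statement code, with the kinds that are deliberately ignored spelled out as explicit no-op cases) can be shown with a toy model.  This is a hedged sketch, not analyzer code: `enum stmt_code`, `struct stmt`, `struct stmt_state` and `toy_on_stmt` are all hypothetical, and the counters merely stand in for the region_model callbacks.

```c
#include <stdbool.h>

/* Toy statement codes standing in for gimple_code values.  */
enum stmt_code { STMT_ASSIGN, STMT_ASM, STMT_CALL, STMT_RETURN, STMT_NOP };

struct stmt
{
  enum stmt_code code;
  bool known_callee;  /* Only meaningful for STMT_CALL.  */
};

/* Toy analysis state: counts how often each handler ran.  */
struct stmt_state
{
  int assignments;
  int calls;
  int returns;
};

/* Dispatch on the statement code.  Returns true if the statement may have
   side effects the toy analysis cannot see.  */
static bool
toy_on_stmt (struct stmt_state *state, const struct stmt *stmt)
{
  bool unknown_side_effects = false;

  switch (stmt->code)
    {
    default:
      /* No-op for now.  */
      break;

    case STMT_ASSIGN:
      state->assignments++;
      break;

    case STMT_ASM:
      /* No-op for now; listed so the gap is visible.  */
      break;

    case STMT_CALL:
      state->calls++;
      unknown_side_effects = !stmt->known_callee;
      break;

    case STMT_RETURN:
      state->returns++;
      break;
    }

  return unknown_side_effects;
}
```

Compared with a chain of independent dyn_cast tests, the switch puts every unhandled kind in one visible place, which is the stated motivation for the change.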

Re: [PATCH] c: Fix -Wduplicated-branches ICE [PR97125]

2020-09-22 Thread Sandra Loosemore

On 9/20/20 5:08 PM, Marek Polacek via Gcc-patches wrote:

We crash here because since r11-3302 the C FE uses codes like SWITCH_STMT
in the else branches in the attached test, and inchash::add_expr in
do_warn_duplicated_branches doesn't handle these front-end codes.  In
the C++ FE this works because by the time we get to do_warn_duplicated_branches
we've already cp_genericize'd the SWITCH_STMT tree into a SWITCH_EXPR.

The fix is to call do_warn_duplicated_branches_r only after loops and other
structured control constructs have been lowered.

Bootstrapped/regtested on x86_64-pc-linux-gnu, ok for trunk?

gcc/c-family/ChangeLog:

PR c/97125
* c-gimplify.c (c_genericize): Only call do_warn_duplicated_branches_r
after loops and other structured control constructs have been lowered.

gcc/testsuite/ChangeLog:

PR c/97125
* c-c++-common/Wduplicated-branches-15.c: New test.


As I noted in the issue, this was my bad, and the fix looks good to me 
(although I have no authority to approve it).


The same problem was independently reported as PR c/97157.

-Sandra
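
For illustration, the kind of input involved is an if/else whose two arms are identical and contain a switch.  The function below is a hypothetical reduced example written for this note; it is not the committed Wduplicated-branches-15.c test, and the exact diagnostics it produces under -Wduplicated-branches are not claimed here.

```c
/* Both arms of the if are textually identical switch statements.  In the
   C front end after r11-3302 each arm is still a SWITCH_STMT at the point
   where the duplicated-branches check used to run.  */
static int
classify (int use_first, int n)
{
  int r = 0;

  if (use_first)
    switch (n)
      {
      case 0:
        r = 1;
        break;
      default:
        r = 2;
        break;
      }
  else
    switch (n)
      {
      case 0:
        r = 1;
        break;
      default:
        r = 2;
        break;
      }

  return r;
}
```

Because the arms are identical the result does not depend on use_first, which is exactly the situation the warning exists to point out.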


libgo patch committed: Remove ptrace for ppc64 AIX

2020-09-22 Thread Ian Lance Taylor via Gcc-patches
This libgo patch by Clément Chigot removes the ptrace syscall on ppc64
AIX.  ptrace is available only for 32-bit programs.  Bootstrapped and
ran Go testsuite on x86_64-pc-linux-gnu.  Committed to mainline.

Ian
bbc644a3e0d9da37d0987918be5764d17a6069c4
diff --git a/gcc/go/gofrontend/MERGE b/gcc/go/gofrontend/MERGE
index d17d39702c8..59b580f0956 100644
--- a/gcc/go/gofrontend/MERGE
+++ b/gcc/go/gofrontend/MERGE
@@ -1,4 +1,4 @@
-5605a0727d3395becba1fbd4447807073984ec13
+99ab98d2ed8fa8a33947c52925f89b344d7cb8ae
 
 The first line of this file holds the git revision number of the last
 merge done from the gofrontend repository.
diff --git a/libgo/go/syscall/libcall_aix.go b/libgo/go/syscall/libcall_aix.go
index 8d9f59eb62f..27b469e1e47 100644
--- a/libgo/go/syscall/libcall_aix.go
+++ b/libgo/go/syscall/libcall_aix.go
@@ -16,9 +16,6 @@ const SYS_EXECVE = 0
 //sys  Openat(dirfd int, path string, flags int, mode uint32) (fd int, err error)
 //open64at(dirfd _C_int, path *byte, flags _C_int, mode Mode_t) _C_int
 
-//sys  ptrace(request int, id int, addr uintptr, data int, buff uintptr) (val int)
-//ptrace(request _C_int, id int, addr uintptr, data _C_int, buff *byte) _C_int
-
 //sys  ptrace64(request int, id int64, addr int64, data int, buff uintptr) (err error)
 //ptrace64(request _C_int, id int64, addr int64, data _C_int, buff *byte) _C_int
 
diff --git a/libgo/go/syscall/syscall_aix_ppc.go b/libgo/go/syscall/syscall_aix_ppc.go
index 83ed1e64c3a..2e89081 100644
--- a/libgo/go/syscall/syscall_aix_ppc.go
+++ b/libgo/go/syscall/syscall_aix_ppc.go
@@ -8,6 +8,9 @@ package syscall
 
 import "unsafe"
 
+//sys  ptrace(request int, id int, addr uintptr, data int, buff uintptr) (val int)
+//ptrace(request _C_int, id int, addr uintptr, data _C_int, buff *byte) _C_int
+
 // AIX does not define a specific structure but instead uses separate
 // ptrace calls for the different registers.
 type PtraceRegs struct {

