date:20191120

Re: [PATCH v3] libgomp/test: Add flags to find libatomic in build-tree testing

2019-11-20 Thread Jakub Jelinek

On Wed, Nov 20, 2019 at 02:13:52AM +, Maciej W. Rozycki wrote:

Ok with appropriate ChangeLog entry.

> --- gcc.orig/libgomp/testsuite/lib/libgomp.exp
> +++ gcc/libgomp/testsuite/lib/libgomp.exp
> @@ -174,6 +174,20 @@ proc libgomp_init { args } {
>  # For build-tree testing, also consider the library paths used for 
> builing.
>  # For installed testing, we assume all that to be provided in the 
> sysroot.
>  if { $blddir != "" } {
> + # The `-fopenacc' and `-fopenmp' options imply `-pthread', and
> + # that implies `-latomic' on some hosts, so wire in libatomic
> + # build directories.
> + if [ishost "riscv*-*-linux*"] {
> + set shlib_ext [get_shlib_extension]
> + set atomic_library_path "${blddir}/../libatomic/.libs"
> + if { [file exists "${atomic_library_path}/libatomic.a"]
> +  || [file exists \
> +  "${atomic_library_path}/libatomic.${shlib_ext}"] } {
> + lappend ALWAYS_CFLAGS \
> + "additional_flags=-L${atomic_library_path}"
> + append always_ld_library_path ":${atomic_library_path}"
> + }
> + }
>   global cuda_driver_include
>   global cuda_driver_lib
>   if { $cuda_driver_include != "" } {

Jakub

Re: [patch,avr] Add suport for devices from the 0-series.

2019-11-20 Thread Georg-Johann Lay


Am 16.11.19 um 18:36 schrieb Jeff Law:

On 11/8/19 9:19 AM, Georg-Johann Lay wrote:

Hi,

this patch adds support for a few more AVR devices.  Because the offset
where flash is seen in RAM deviates from the settings for the family
(and hence also from the linker script defaults), a new field in
avr_mcu_t is needed to express this so that specs can be generated
appropriately.

The AVR_MCU lines in avr-mcus.def are longer than 80 chars because it's
easier to maintain 1 device = 1 line entries.  And it's easier to scan
them with the awk scripts.

Ok for trunk?

Johann

 Add support for AVR devices from the 0-series.

 * config/avr/avr-arch.h (avr_mcu_t) : New field.
 * config/avr/avr-devices.c (avr_mcu_types): Adjust initializers.
 * config/avr/avr-mcus.def (AVR_MCU): Add respective field.
 * config/avr/specs.h (LINK_SPEC) <%(link_pm_base_address)>: Add.
 * config/avr/gen-avr-mmcu-specs.c (print_mcu)
 <*cpp, *cpp_mcu, *cpp_avrlibc, *link_pm_base_address>: Emit code
 for spec definitions.
 * doc/avr-mmcu.texi: Regenerate.

OK
jeff


Also applied the following patchlet atop of this.  It adds the
definition of __RODATA_PM_OFFSET__ at a different place so that
the generated specs file also works with older versions of
the compiler.  For the current version, it is an no-op.

Johann

Make 0-series device specs work with older versions of avr-gcc.

PR target/92545
* config/avr/specs.h (LINK_SPEC) <%(link_pm_base_address)>: Remove.
* config/avr/gen-avr-mmcu-specs.c (print_mcu)
<*link_pm_base_address>: Don't write spec.
<*link_arch>: Add --defsym=__RODATA_PM_OFFSET__= as needed.

Index: config/avr/gen-avr-mmcu-specs.c
===
--- config/avr/gen-avr-mmcu-specs.c (revision 278477)
+++ config/avr/gen-avr-mmcu-specs.c (revision 278478)
@@ -253,7 +253,11 @@ bool is_arch = mcu->macro == NULL;

   fprintf (f, "*link_relax:\n\t%s\n\n", LINK_RELAX_SPEC);

-  fprintf (f, "*link_arch:\n\t%s\n\n", LINK_ARCH_SPEC);
+  fprintf (f, "*link_arch:\n\t%s", LINK_ARCH_SPEC);
+  if (is_device
+  && flash_pm_offset)
+fprintf (f, " --defsym=__RODATA_PM_OFFSET__=0x%x", flash_pm_offset);
+  fprintf (f, "\n\n");

   if (is_device)
 {
@@ -270,14 +274,6 @@ bool is_arch = mcu->macro == NULL;
   fprintf (f, "\n\n");
 }

-  if (is_device
-  && flash_pm_offset)
-{
-  fprintf (f, "*link_pm_base_address:\n");
-  fprintf (f, "\t--defsym=__RODATA_PM_OFFSET__=0x%x", flash_pm_offset);
-  fprintf (f, "\n\n");
-}
-
   // Specs known to GCC.

   if (is_device)
Index: config/avr/specs.h
===
--- config/avr/specs.h  (revision 278477)
+++ config/avr/specs.h  (revision 278478)
@@ -68,7 +68,6 @@ along with GCC; see the file COPYING3.
   "%(link_text_start) " \
   "%(link_relax) "  \
   "%(link_pmem_wrap) "  \
-  "%(link_pm_base_address) "\
   "%{shared:%eshared is not supported} "

 #undef  LIB_SPEC

Re: [PATCH 3/3] [ARC] Register ARC specific passes with a .def file.

2019-11-20 Thread Claudiu Zissulescu

Thank you for your review, patch pushed.

//Claudiu

On Wed, Nov 20, 2019 at 1:58 AM Jeff Law  wrote:
>
> On 11/19/19 2:02 AM, Claudiu Zissulescu wrote:
> > Use arc-passes.def to register ARC specific passes.
> >
> > Ok to apply?
> > Claudiu
> >
> > gcc/
> > -xx-xx  Claudiu Zissulescu  
> >
> >   * config/arc/arc-protos.h (make_pass_arc_ifcvt): Declare.
> >   (make_pass_arc_predicate_delay_insns): Likewise.
> >   * config/arc/arc.c (class pass_arc_ifcvt): Reformat text, add gate
> >   method.
> >   (class pass_arc_predicate_delay_insns): Likewise.
> >   (arc_init): Remove registering of ARC specific passes.
> >   * config/arc/t-arc (PASSES_EXTRA): Add arc-passes.def.
> >   * config/arc/arc-passes.def: New file.
> OK
> jeff
>

Re: [PATCH 2/3] [ARC] Add scaled load pattern

2019-11-20 Thread Claudiu Zissulescu

You are right, I should start with cleaning up the movsi pattern :) I
will come back to you with a proper patch when it is ready.

Thank you,
Claudiu

On Wed, Nov 20, 2019 at 2:00 AM Jeff Law  wrote:
>
> On 11/19/19 2:02 AM, Claudiu Zissulescu wrote:
> > ARC processors can use scaled addresses, i.e., the offset part of the
> > load address can be shifted by 2 (multiplied by 4). Add this pattern
> > and a test for it.
> >
> > gcc/
> > -xx-xx  Claudiu Zissulescu  
> >
> >   * config/arc/arc.md (load_scaledsi): New pattern.
> >
> > testcase/
> > -xx-xx  Claudiu Zissulescu  
> >
> >   * gcc.target/arc/scaled-ld.c: New test.
> This is worrisome.  I'm pretty sure this has to be folded into the
> existing move pattern to satisfy obscure reload requirements.
>
> Jeff
>

[PATCH] Fix PR92537

2019-11-20 Thread Richard Biener



There's a bit left in this PR which the following fixes.  The root
only became external late and the check more naturally belongs to
the new place.

Boostrap and regtest running on x86_64-unknown-linux-gnu.

Richard.

2019-11-20  Richard Biener  

PR tree-optimization/92537
* tree-vect-slp.c (vect_analyze_slp_instance): Move CTOR
vectorization validity check...
(vect_slp_analyze_operations): ... here.

* gfortran.dg/pr92537.f90: New testcase.

Index: gcc/tree-vect-slp.c
===
--- gcc/tree-vect-slp.c (revision 278477)
+++ gcc/tree-vect-slp.c (working copy)
@@ -2253,18 +2264,6 @@ vect_analyze_slp_instance (vec_info *vin
  matches[group_size / const_max_nunits * const_max_nunits] = false;
  vect_free_slp_tree (node, false);
}
-  else if (constructor
-  && SLP_TREE_DEF_TYPE (node) != vect_internal_def)
-   {
- /* CONSTRUCTOR vectorization relies on a vector stmt being
-generated, that doesn't work for fully external ones.  */
- if (dump_enabled_p ())
-   dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
-"Build SLP failed: CONSTRUCTOR of external "
-"or constant elements\n");
- vect_free_slp_tree (node, false);
- return false;
-   }
   else
{
  /* Create a new SLP instance.  */
@@ -2939,7 +2939,12 @@ vect_slp_analyze_operations (vec_info *v
   if (!vect_slp_analyze_node_operations (vinfo,
 SLP_INSTANCE_TREE (instance),
 instance, visited, lvisited,
-&cost_vec))
+&cost_vec)
+ /* Instances with a root stmt require vectorized defs for the
+SLP tree root.  */
+ || (SLP_INSTANCE_ROOT_STMT (instance)
+ && (SLP_TREE_DEF_TYPE (SLP_INSTANCE_TREE (instance))
+ != vect_internal_def)))
 {
  slp_tree node = SLP_INSTANCE_TREE (instance);
  stmt_vec_info stmt_info = SLP_TREE_SCALAR_STMTS (node)[0];
Index: gcc/testsuite/gfortran.dg/pr92537.f90
===
--- gcc/testsuite/gfortran.dg/pr92537.f90   (nonexistent)
+++ gcc/testsuite/gfortran.dg/pr92537.f90   (working copy)
@@ -0,0 +1,32 @@
+! { dg-do compile }
+! { dg-options "-O2 -ftree-vectorize -fno-inline" }
+! { dg-additional-options "-march=skylake" { target x86_64-*-* i?86-*-* } }
+MODULE pr93527
+  implicit none
+  integer, parameter :: wp = kind (1.d0)
+  interface p_min
+ module procedure p_min_wp
+  end interface
+contains
+  subroutine foo (pr)
+real(wp), pointer :: pr(:)
+integer  ::  nzd
+real(wp) ::  pmin
+real(wp) ::  pmin_diag
+integer  ::  i
+nzd  = 15
+allocate (pr(nzd))
+pmin_diag = 4000._wp
+pmin = p_min(pmin_diag)
+pmin = min (pmin,pmin_diag)
+pr(1) = log(pmin)
+do i=1,nzd-1
+   pr(i+1) = log(pmin) + i
+end do
+  end subroutine foo
+  function p_min_wp (x) result (p_min)
+real(wp), intent(in) :: x
+real(wp) :: p_min
+p_min = x
+  end function p_min_wp
+end module pr93527

[PATCH] Do not attempt to vectorize uniform CTORs

2019-11-20 Thread Richard Biener



We're doing quite some useless work here and in the case we
actually manage to "vectorize" it, we've done a no-op (bb-slp-42.c).

It also refactors the routine a bit and only dumps about "vectorizable"
CTORs when we actually analyze the SLP tree (when all CTOR elements
were internally defined).

Bootstrap and regtest running on x86_64-unknown-linux-gnu.

Richard.

2019-11-20  Richard Biener  

* tree-vect-slp.c (vect_analyze_slp_instance): Dump
constructors we are actually analyzing.
(vect_slp_check_for_constructors): Do not vectorize uniform
constuctors, do not dump here.

* gcc.dg/vect/bb-slp-42.c: Adjust.

Index: gcc/tree-vect-slp.c
===
--- gcc/tree-vect-slp.c (revision 278484)
+++ gcc/tree-vect-slp.c (working copy)
@@ -2183,6 +2183,10 @@ vect_analyze_slp_instance (vec_info *vin
  else
return false;
}
+  if (dump_enabled_p ())
+   dump_printf_loc (MSG_NOTE, vect_location,
+"Analyzing vectorizable constructor: %G\n",
+stmt_info->stmt);
 }
   else
 {
@@ -3123,31 +3127,22 @@ vect_slp_check_for_constructors (bb_vec_
   gimple_stmt_iterator gsi;
 
   for (gsi = bb_vinfo->region_begin;
-  gsi_stmt (gsi) != gsi_stmt (bb_vinfo->region_end); gsi_next (&gsi))
+   gsi_stmt (gsi) != gsi_stmt (bb_vinfo->region_end); gsi_next (&gsi))
 {
-  gimple *stmt = gsi_stmt (gsi);
-
-  if (is_gimple_assign (stmt)
- && gimple_assign_rhs_code (stmt) == CONSTRUCTOR
- && TREE_CODE (gimple_assign_lhs (stmt)) == SSA_NAME
- && TREE_CODE (TREE_TYPE (gimple_assign_lhs (stmt))) == VECTOR_TYPE)
-   {
- tree rhs = gimple_assign_rhs1 (stmt);
-
- if (CONSTRUCTOR_NELTS (rhs) == 0)
-   continue;
-
- poly_uint64 subparts = TYPE_VECTOR_SUBPARTS (TREE_TYPE (rhs));
+  gassign *stmt = dyn_cast  (gsi_stmt (gsi));
+  if (!stmt || gimple_assign_rhs_code (stmt) != CONSTRUCTOR)
+   continue;
 
- if (maybe_ne (subparts, CONSTRUCTOR_NELTS (rhs)))
-   continue;
+  tree rhs = gimple_assign_rhs1 (stmt);
+  if (!VECTOR_TYPE_P (TREE_TYPE (rhs))
+ || maybe_ne (TYPE_VECTOR_SUBPARTS (TREE_TYPE (rhs)),
+  CONSTRUCTOR_NELTS (rhs))
+ || VECTOR_TYPE_P (TREE_TYPE (CONSTRUCTOR_ELT (rhs, 0)->value))
+ || uniform_vector_p (rhs))
+   continue;
 
- if (dump_enabled_p ())
-   dump_printf_loc (MSG_NOTE, vect_location,
-"Found vectorizable constructor: %G\n", stmt);
- stmt_vec_info stmt_info = bb_vinfo->lookup_stmt (stmt);
- BB_VINFO_GROUPED_STORES (bb_vinfo).safe_push (stmt_info);
-   }
+  stmt_vec_info stmt_info = bb_vinfo->lookup_stmt (stmt);
+  BB_VINFO_GROUPED_STORES (bb_vinfo).safe_push (stmt_info);
 }
 }
 
Index: gcc/testsuite/gcc.dg/vect/bb-slp-42.c
===
--- gcc/testsuite/gcc.dg/vect/bb-slp-42.c   (revision 278484)
+++ gcc/testsuite/gcc.dg/vect/bb-slp-42.c   (working copy)
@@ -44,6 +44,5 @@ main ()
 
 }
 
-/* See that we vectorize an SLP instance.  */
-/* { dg-final { scan-tree-dump "Found vectorizable constructor" "slp1" { 
target { ! vect_fully_masked } } } } */
-/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 4 "slp1" { 
target { ! vect_fully_masked } } } } */
+/* See that we do not try to vectorize the uniform CTORs.  */
+/* { dg-final { scan-tree-dump-not "Analyzing vectorizable constructor" "slp1" 
} } */

Re: introduce -fcallgraph-info option

2019-11-20 Thread Richard Biener

On Wed, 20 Nov 2019, Alexandre Oliva wrote:

> On Nov  6, 2019, Alexandre Oliva  wrote:
> 
> > (CALLEE_FROM_CGRAPH_P): New.
> > (dump_final_callee_vcg, dump_final_node_vcg): New.
> 
> I goofed in the testing of this last-minute change.  It only works with
> optimization disabled.  With any optimization enabled,
> pass_remove_cgraph_callee_edges runs first thing in
> pass_all_optimizations{,_g} without a subsequent
> pass_rebuild_cgraph_edges.  This release takes place long before we'd
> even know which calls make to expand, and thus to the callgraph-info
> expected output.
> 
> It has just occurred to me that I could retain the logic but make it
> conditional on !optimize, but that would be error prone (AFAICT there's
> no reason why we don't release callees early without optimization) and
> probably not worth it.
> 
> 
> drop attempt to reuse cgraph callees for -fcallgraph-info
> 
> The information in cgraph callees is released long before we get to
> the point in which -fcallgraph-info edges are dumped, or even
> expanded.  It doesn't make sense to retain it longer: the edges
> created for -fcallgraph-info are much smaller, and they don't even
> coexist, so not even peak use grows.
> 
> Regstrapped on x86_64-linux-gnu.  Ok to install?

OK.

Richard.

> 
> for  gcc/ChangeLog
> 
>   * function.h (CALLEE_FROM_CGRAPH_P): Remove.
>   * function.c (record_final_call): Record even calls that might
>   have been in the cgraph.
>   * toplev.c (dump_final_node_vcg): Skip iteration over cgraph
>   callees.
> ---
>  gcc/function.c |3 ---
>  gcc/function.h |5 +
>  gcc/toplev.c   |8 
>  3 files changed, 1 insertion(+), 15 deletions(-)
> 
> diff --git a/gcc/function.c b/gcc/function.c
> index 1fe956b..2534c92 100644
> --- a/gcc/function.c
> +++ b/gcc/function.c
> @@ -6406,9 +6406,6 @@ rest_of_handle_thread_prologue_and_epilogue (void)
>  void
>  record_final_call (tree callee, location_t location)
>  {
> -  if (!callee || CALLEE_FROM_CGRAPH_P (callee))
> -return;
> -
>struct callinfo_callee datum = { location, callee };
>vec_safe_push (cfun->su->callees, datum);
>  }
> diff --git a/gcc/function.h b/gcc/function.h
> index 14794c4..beb5c7d 100644
> --- a/gcc/function.h
> +++ b/gcc/function.h
> @@ -192,15 +192,12 @@ public:
>poly_int64 length;
>  };
>  
> -/* Describe emitted builtin calls for -fcallgraph-info.  Those that
> -   are not builtin are taken from cgraph edges.  */
> +/* Describe emitted calls for -fcallgraph-info.  */
>  struct GTY(()) callinfo_callee
>  {
>location_t location;
>tree decl;
>  };
> -#define CALLEE_FROM_CGRAPH_P(T)  \
> -  (!fndecl_built_in_p (T) && !DECL_IS_BUILTIN (T))
>  
>  /* Describe dynamic allocation for -fcallgraph-info=da.  */
>  struct GTY(()) callinfo_dalloc
> diff --git a/gcc/toplev.c b/gcc/toplev.c
> index d4583ba..3b9f9ee 100644
> --- a/gcc/toplev.c
> +++ b/gcc/toplev.c
> @@ -1093,14 +1093,6 @@ dump_final_node_vcg (FILE *f)
>  dump_final_callee_vcg (f, c->location, c->decl);
>vec_free (cfun->su->callees);
>cfun->su->callees = NULL;
> -
> -  cgraph_node *cnode = cgraph_node::get (current_function_decl);
> -  for (cgraph_edge *e = cnode->callees; e; e = e->next_callee)
> -if (CALLEE_FROM_CGRAPH_P (e->callee->decl))
> -  dump_final_callee_vcg (f, gimple_location (e->call_stmt),
> -  e->callee->decl);
> -  for (cgraph_edge *e = cnode->indirect_calls; e; e = e->next_callee)
> -dump_final_callee_vcg (f, gimple_location (e->call_stmt), NULL);
>  }
>  
>  /* Output stack usage and callgraph info, as requested.  */
> 
> 
> 
> 
> 

-- 
Richard Biener 
SUSE Software Solutions Germany GmbH, Maxfeldstrasse 5, 90409 Nuernberg,
Germany; GF: Felix Imendörffer; HRB 36809 (AG Nuernberg)

Re: [C++ PATCH] Fix up lambda decl specifier parsing ICE (PR c++/90842)

2019-11-20 Thread Jakub Jelinek

On Tue, Nov 19, 2019 at 11:35:02PM -0500, Jason Merrill wrote:
> It would seem better to break after consuming the token, so we just skip the
> extra processing and still give the same error.
> 
> And instead of this, maybe set CP_PARSER_FLAGS_NO_TYPE_DEFINITIONS so we
> keep the same diagnostic for other type-specifiers?

That seems to work well.
So like this if it passes bootstrap/regtest?

2019-11-20  Jakub Jelinek  
Jason Merrill  

PR c++/90842
* parser.c (cp_parser_decl_specifier_seq): For concept or typedef
break early if CP_PARSER_FLAGS_ONLY_MUTABLE_OR_CONSTEXPR.
For type specifiers, set CP_PARSER_FLAGS_NO_TYPE_DEFINITIONS
if CP_PARSER_FLAGS_ONLY_MUTABLE_OR_CONSTEXPR is set.

* g++.dg/cpp1y/lambda-generic-90842.C: New test.

--- gcc/cp/parser.c.jj  2019-11-19 22:26:46.171295670 +0100
+++ gcc/cp/parser.c 2019-11-20 10:05:27.24499 +0100
@@ -14097,6 +14097,9 @@ cp_parser_decl_specifier_seq (cp_parser*
   ds = ds_concept;
   cp_lexer_consume_token (parser->lexer);
 
+ if (flags & CP_PARSER_FLAGS_ONLY_MUTABLE_OR_CONSTEXPR)
+   break;
+
   /* Warn for concept as a decl-specifier. We'll rewrite these as
  concept declarations later.  */
   if (!flag_concepts_ts)
@@ -14139,6 +14142,10 @@ cp_parser_decl_specifier_seq (cp_parser*
  ds = ds_typedef;
  /* Consume the token.  */
  cp_lexer_consume_token (parser->lexer);
+
+ if (flags & CP_PARSER_FLAGS_ONLY_MUTABLE_OR_CONSTEXPR)
+   break;
+
  /* A constructor declarator cannot appear in a typedef.  */
  constructor_possible_p = false;
  /* The "typedef" keyword can only occur in a declaration; we
@@ -14235,6 +14242,9 @@ cp_parser_decl_specifier_seq (cp_parser*
  bool is_cv_qualifier;
  tree type_spec;
 
+ if (flags & CP_PARSER_FLAGS_ONLY_MUTABLE_OR_CONSTEXPR)
+   flags |= CP_PARSER_FLAGS_NO_TYPE_DEFINITIONS;
+
  type_spec
= cp_parser_type_specifier (parser, flags,
decl_specs,
--- gcc/testsuite/g++.dg/cpp1y/lambda-generic-90842.C.jj2019-11-20 
09:36:32.434390276 +0100
+++ gcc/testsuite/g++.dg/cpp1y/lambda-generic-90842.C   2019-11-20 
10:08:18.037769485 +0100
@@ -0,0 +1,10 @@
+// PR c++/90842
+// { dg-do compile { target c++14 } }
+
+auto a = [](auto x) struct C { void foo (); } {};  // { dg-error 
"expected" }
+   // { dg-error 
"type-specifier invalid in lambda" "" { target *-*-* } .-1 }
+auto b = [](auto x) mutable typedef {};// { dg-error 
"'typedef' invalid in lambda" }
+#if __cpp_concepts >= 201907L
+auto c = [](auto x) constexpr concept {};  // { dg-error 
"'concept' invalid in lambda" "" { target c++2a } }
+#endif
+auto d = [](auto x) mutable friend {}; // { dg-error "'friend' 
invalid in lambda" }


Jakub

Re: [arm, v3] Follow up for asm-flags (thumb1, ilp32)

2019-11-20 Thread Christophe Lyon

On Tue, 19 Nov 2019 at 11:35, Richard Henderson
 wrote:
>
> I'm not sure what happened to v2.  I can see it in my sent email, but it never
> made it to the mailing list, and possibly not to Richard E. either.
>
> So resending, with an extra testsuite fix for ilp32, spotted by Christophe.
>
> Re thumb1, rather than an ifdef in config/arm/aarch-common.c, as I did in v1, 
> I
> am swapping out a targetm hook when changing into and out of thumb1 mode.
>

There were small problems in this patch, we I have fixed as obvious in r278487:

[testsuite][arm] Fix asm-flag-[45].c tests

In asm-flag-4.c, we need to use dg-message instead of dg-error because
we have to match "sorry, unimplemented:" rather than "error:".  In
asm-flag-5.c, fix the dg-error syntax.

2019-11-20  Christophe Lyon  

* gcc.target/arm/asm-flag-4.c: Replace dg-error with dg-message.
* gcc.target/arm/asm-flag-5.c: Add quotes around dg-error
messages.

Index: gcc/testsuite/gcc.target/arm/asm-flag-4.c
===
--- gcc/testsuite/gcc.target/arm/asm-flag-4.c   (revision 278486)
+++ gcc/testsuite/gcc.target/arm/asm-flag-4.c   (revision 278487)
@@ -9,5 +9,5 @@

 void __attribute__((target("thumb"))) g(char *out)
 {
-  asm("" : "=@ccne"(out[0]));  /* { dg-error asm flags not supported } */
+  asm("" : "=@ccne"(out[0]));  /* { dg-message "asm flags not supported" } */
 }
Index: gcc/testsuite/gcc.target/arm/asm-flag-5.c
===
--- gcc/testsuite/gcc.target/arm/asm-flag-5.c   (revision 278486)
+++ gcc/testsuite/gcc.target/arm/asm-flag-5.c   (revision 278487)
@@ -13,13 +13,13 @@
 void f_f(void)
 {
   float x;
-  asm("" : "=@"(x)); /* { dg-error invalid type } */
+  asm("" : "=@"(x)); /* { dg-error "invalid type" } */
 }

 void f_d(void)
 {
   double x;
-  asm("" : "=@"(x)); /* { dg-error invalid type } */
+  asm("" : "=@"(x)); /* { dg-error "invalid type" } */
 }

 struct S { int x[3]; };
@@ -27,5 +27,5 @@
 void f_S(void)
 {
   struct S x;
-  asm("" : "=@"(x)); /* { dg-error invalid type } */
+  asm("" : "=@"(x)); /* { dg-error "invalid type" } */
 }

>
> r~
>
>

Re: Reverting r278411

2019-11-20 Thread Richard Sandiford

Segher Boessenkool  writes:
>> find_sets_in_insn has:
>> 
>>   /* Don't count call-insns, (set (reg 0) (call ...)), as a set.
>>   The hard function value register is used only once, to copy to
>>   someplace else, so it isn't worth cse'ing.  */
>>   else if (GET_CODE (SET_SRC (x)) == CALL)
>>  ;
>> 
>> So n_sets == 1 can be true if we have a single SET in parallel
>> with a call.  Unsurprising, I guess no-one had MEM sets in
>> parallel with a call, so it didn't matter until now.
>> 
>> I'm retesting with a !CALL_P check added to make sure that was
>> the only problem.
>> 
>> Before I resubmit, why is the simplify-rtx.c part all wrong?
>
> It was nice and simple, and it isn't anymore.  8 4 2 1 for the four of
> lt gt eq un are hardly worth documenting or making symbolic constants
> for, because a) they are familiar to all powerpc users anyway (the
> assembler has those as predefined constants!), admittedly this isn't a
> strong argument for most people;

Ah, OK.  I was totally unaware of the connection.

> but also b) they were only used in two short and obvious routines.
> After your patch the routines are no longer short or obvious.
>
> A comparison result is exactly one of four things: lt, gt, eq, or un.
> So we can express testing for any subset of those with just an OR of
> the masks for the individual conditions.  Whether a comparison is
> floating point, floating point no-nans, signed, or unsigned, is just
> a property of the comparison, not of the result, in this representation.

Yeah, this works for OR because we never lose the unorderdness.

There were two aspects to my patch.  One was adding AND, and had:

  /* We only handle AND if we can ignore unordered cases.  */
  bool honor_nans_p = HONOR_NANS (GET_MODE (op0));
  if (code != IOR && (code != AND || honor_nans_p))
return 0;

This is needed because e.g. UNLT & ORDERED -> LT is only conditionally
valid.  It doesn't sound like you're objecting to that bit, is that right?
Or was this what you had in mind with the reference to no-nans?

As mentioned in the covering note, I punted for this because handling
trapping FP comparisons correctly for AND would need its own set of
testcases.  The current code punts for AND and for unsigned comparisons,
so continuing to punt for ANDs on this case seemed fair game.

> If you do not mix signed and unsigned comparisons (they make not much
> sense mixed anyway(*)), you need no changes at all: just translate
> ltu/gtu/leu/geu to/from lt/gt/le/ge on the way in and out of this
> function (there already are helper functions for that, signed_condition
> and unsigned_condition).

So this all seems to come down to whether unsigned comparisons are handled
as separate mask bits or whether they're dealt with by removing the
unsignedness and then adding it back.  ISTM both are legitimate ways
of doing it.  I don't think one of them is "all wrong".

FWIW, I'd posted a patch to do the IOR/AND thing in July:

  https://gcc.gnu.org/ml/gcc-patches/2019-07/msg00933.html

But it turns out I'd got the signalling NaN behaviour wrong:

  https://gcc.gnu.org/ml/gcc-patches/2019-08/msg01006.html

(hence punting above :-)).

I was very belatedly getting around to dealing with Joseph's comment
when you sent your patch and had it approved.  Since that patch seemed
to be more favourably received in general, I was trying to work within
the existing style of your version.  And without the powerpc background,
I just assumed that the mask values were "documented" by the first block
of case statements:

case LT:
  return 8;
case GT:
  return 4;
case EQ:
  return 2;
case UNORDERED:
  return 1;

Adding two more cases here didn't seem to make things any more unclear.
But maybe it is more jarring if you've memorised the powerpc mapping.

I'd actually considered converting to signed and back instead of adding
extra cases, but I thought that would be rejected as too inefficient.
(That was a concern with my patch above.)  It seemed like one of the selling
points of doing it your way was that everything was handled by one switch
statement "in" and one switch statement "out", and I was trying to
preserve that.

signed_condition and unsigned_condition assert on unordered comparisons,
so if we're going to go that route, we either need to filter them out
first or add maybe_* versions of the routines that return UNKNOWN.
We also have to deal with the neutrality of EQ and NE.  E.g. something
like:

  rtx_code signed_code0 = maybe_signed_condition (code0);
  rtx_code signed_code1 = maybe_signed_condition (code1);
  if (signed_code0 != UNKNOWN && signed_code1 != UNKNOWN)
code0 = signed_code0, code1 = signed_code1;
  else if (signed_code0 != UNKNOWN || signed_code1 != UNKNOWN)
return 0;

  ...

  if (signed_code0 == UNKNOWN)
code = unsigned_condition (code);

Or (without the maybe_* versions) something like:

  bool unsigned0_p
= (code0 == LTU || code0 == GTU || code0 == LEU || code0 ==

Re: Implement the part of C++20 p1032 Misc constexpr bits.

2019-11-20 Thread Christophe Lyon

On Thu, 14 Nov 2019 at 16:55, Jonathan Wakely  wrote:
>
> On 09/11/19 02:07 +, Smith-Rowland, Edward M wrote:
> >Here is the  part of C++20 p1032 Misc constexpr bits.
> >
> >Tested on x86_64-linux. OK?
>
> OK for trunk, thanks.
>

Hi,

The new test constexpr_allocator_arg_t.cc fails on arm and aarch64 and
many other targets according to gcc-testresults.
Is that expected?

Christophe

Re: [PATCH] Fix PR92537

2019-11-20 Thread Richard Sandiford

Richard Biener  writes:
> There's a bit left in this PR which the following fixes.  The root
> only became external late and the check more naturally belongs to
> the new place.
>
> Boostrap and regtest running on x86_64-unknown-linux-gnu.

FWIW, I was just about to post the patch below before seeing your
message. :-)  Thought I might as well post it anyway just in case.

I guess the slight advantage to keeping the vect_analyze_slp_instance
test is that we can still free child nodes at that point, so it might
be more efficient to catch it "early".  It probably doesn't make much
differnce in practice though.

Tested on aarch64-linux-gnu.

Thanks,
Richard


2019-11-20  Richard Sandiford  

gcc/
PR tree-optimization/92537
* tree-vect-slp.c (vect_slp_analyze_node_operations): Don't
allow the root node to be external.

gcc/testsuite/
PR tree-optimization/92537
* gcc.dg/vect/pr92537.c: New test.
* gfortran.dg/vect/pr92537.f90: Likewise.

Index: gcc/tree-vect-slp.c
===
--- gcc/tree-vect-slp.c 2019-11-18 15:21:12.919993766 +
+++ gcc/tree-vect-slp.c 2019-11-20 09:48:57.145101295 +
@@ -2848,7 +2848,7 @@ vect_slp_analyze_node_operations (vec_in
   slp_tree child;
 
   if (SLP_TREE_DEF_TYPE (node) != vect_internal_def)
-return true;
+return node != SLP_INSTANCE_TREE (node_instance);
 
   /* If we already analyzed the exact same set of scalar stmts we're done.
  We share the generated vector stmts for those.
Index: gcc/testsuite/gcc.dg/vect/pr92537.c
===
--- /dev/null   2019-09-17 11:41:18.176664108 +0100
+++ gcc/testsuite/gcc.dg/vect/pr92537.c 2019-11-20 09:48:57.145101295 +
@@ -0,0 +1,16 @@
+/* { dg-do compile } */
+/* { dg-additional-options "-O3" } */
+
+int a[64];
+int b, c, e;
+short d;
+short f[64];
+void g() {
+  b = 0;
+  c = d >> 3;
+  for (; b < 64 - 1; b++) {
+e = 0;
+e -= a[b] * c;
+f[b] = e;
+  }
+}
Index: gcc/testsuite/gfortran.dg/vect/pr92537.f90
===
--- /dev/null   2019-09-17 11:41:18.176664108 +0100
+++ gcc/testsuite/gfortran.dg/vect/pr92537.f90  2019-11-20 09:48:57.145101295 
+
@@ -0,0 +1,32 @@
+! { dg-do compile } */
+! { dg-additional-options "-fno-inline -march=skylake" }
+
+MODULE pr93527
+  implicit none
+  integer, parameter :: wp = kind (1.d0)
+  interface p_min
+ module procedure p_min_wp
+  end interface
+contains
+  subroutine foo (pr)
+real(wp), pointer :: pr(:)
+integer  ::  nzd
+real(wp) ::  pmin
+real(wp) ::  pmin_diag
+integer  ::  i
+nzd  = 15
+allocate (pr(nzd))
+pmin_diag = 4000._wp
+pmin = p_min(pmin_diag)
+pmin = min (pmin,pmin_diag)
+pr(1) = log(pmin)
+do i=1,nzd-1
+   pr(i+1) = log(pmin) + i
+end do
+  end subroutine foo
+  function p_min_wp (x) result (p_min)
+real(wp), intent(in) :: x
+real(wp) :: p_min
+p_min = x
+  end function p_min_wp
+end module pr93527

Re: RFA; patch to fix PR90007

2019-11-20 Thread Richard Biener

On Tue, Nov 19, 2019 at 5:07 PM Vladimir Makarov  wrote:
>
> The following patch fixes
>
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90007
>
> Sometime ago a code which permits LRA to reload hard register into
> memory as it did for pseudo were added.  But this LRA possibility was
> not reflected in recog.c.  The following patch fixes this discrepancy
> and as a result fixes PR90007.
>
> OK to commit?

I guess the change is OK but for the bug itself it sounds like
selective scheduling doesn't properly recognize insns it
proagates into (and avoids doing that then)?  That is,
selective scheduling creates invalid RTL?

Thanks,
Richard.

>

Re: [PATCH, Modula-2 (C/C++/D/F/Go/Jit)] (Register spec fn) (v2)

2019-11-20 Thread Gaius Mulley

Matthias Klose  writes:

> On 08.07.19 23:19, Matthias Klose wrote:
>> On 14.06.19 15:09, Gaius Mulley wrote:
>>>
>>> Hello,
>>>
>>> here is version two of the patches which introduce Modula-2 into the
>>> GCC trunk.  The patches include:
>>>
>>>   (*)  a patch to allow all front ends to register a lang spec function.
>>>(included are patches for all front ends to provide an empty
>>> callback function).
>>>   (*)  patch diffs to allow the Modula-2 front end driver to be
>>>built using GCC Makefile and friends.
>>>
>>> The compressed tarball includes:
>>>
>>>   (*)  gcc/m2  (compiler driver and lang-spec stuff for Modula-2).
>>>Including the need for registering lang spec functions.
>>>   (*)  gcc/testsuite/gm2  (a Modula-2 dejagnu test to ensure that
>>>the gm2 driver is built and can understands --version).
>>>
>>> These patches have been re-written after taking on board the comments
>>> found in this thread:
>>>
>>>https://gcc.gnu.org/ml/gcc-patches/2013-11/msg02620.html
>>>
>>> it is a revised patch set from:
>>>
>>>https://gcc.gnu.org/ml/gcc-patches/2019-06/msg00220.html
>>>
>>> I've run make bootstrap and run the regression tests on trunk and no
>>> extra failures occur for all languages touched in the ChangeLog.
>>>
>>> I'm currently tracking gcc trunk and gcc-9 with gm2 (which works well
>>> with amd64/arm64/i386) - these patches are currently simply for the
>>> driver to minimise the patch size.  There are also > 1800 tests in a
>>> dejagnu testsuite for gm2 which can be included at some future time.
>> 
>> I had a look at the GCC 9 version of the patches, with a build
>> including a make
>> install. Some comments:
>
> A look at the licenses:
>
> libgm2/p2c/*: GPL 3+
> libgm2/libiso/*: LGPL 2.1+
> libgm2/libmin/libc.c: GPL 3+
> libgm2/liblog/*: LGPL 2.1+
> libgm2/libpim/*: LGPL 2.1+
> libgm2/libpim/Selective.c: GPL 3+
> libgm2/libpim/wrapc.c: GPL 3+
> libgm2/libpth/*: LGPL 2.1+
>
> gcc/gm2/ulm-lib-gm2/* GPL 3+, Ulm copyright holder?
>
> gcc/gm2/gm2-libs/*.def GPL 3+
> gcc/gm2/gm2-libs/Break.def LGPL 2.1+
> gcc/gm2/gm2-libs/*.mod LGPL 2.1+
> gcc/gm2/gm2-libs/Builtins.mod GPL 3+
>
> I didn't look at everything in gcc/gm2, however it's not clear for me when a
> file is LGPL or GPL.  And at least in gm2-libs, it seems to be mixed randomly.
> First I thought all definition modules were GPL, and implementation modules 
> were
> LGPL, but that's not the case.
>
> So currently all code linked with the runtime libs becomes GPL 3+?
>
> For the ulm lib, the files mention the Ulm university as the copyright holder,
> but it's not clear which license these files had before they were imported.
>
> libgm2 seems to be mostly LGPL except for two files. Intended?
>
> Matthias

Hi Matthias,

many thanks for the analysis.  I've now:

  *  removed all gnu pth code and rewritten the coroutine libraries to
 use libgcc/gthr.h.
  *  removed all Univ Ulm libraries
  *  changed all remaining library code to use GPL3+ gcc runtime
 extensions
  *  checked all compiler code is GPL3

(on the master branch of gm2)

[I've still working on the testsuite]


regards,
Gaius

Re: Implement the part of C++20 p1032 Misc constexpr bits.

2019-11-20 Thread Ville Voutilainen

On Wed, 20 Nov 2019 at 11:47, Christophe Lyon
 wrote:
>
> On Thu, 14 Nov 2019 at 16:55, Jonathan Wakely  wrote:
> >
> > On 09/11/19 02:07 +, Smith-Rowland, Edward M wrote:
> > >Here is the  part of C++20 p1032 Misc constexpr bits.
> > >
> > >Tested on x86_64-linux. OK?
> >
> > OK for trunk, thanks.
> >
>
> Hi,
>
> The new test constexpr_allocator_arg_t.cc fails on arm and aarch64 and
> many other targets according to gcc-testresults.
> Is that expected?

No, that's not expected. Can you give a link to the build log?

Re: Implement the part of C++20 p1032 Misc constexpr bits.

2019-11-20 Thread Christophe Lyon

On Wed, 20 Nov 2019 at 11:10, Ville Voutilainen
 wrote:
>
> On Wed, 20 Nov 2019 at 11:47, Christophe Lyon
>  wrote:
> >
> > On Thu, 14 Nov 2019 at 16:55, Jonathan Wakely  wrote:
> > >
> > > On 09/11/19 02:07 +, Smith-Rowland, Edward M wrote:
> > > >Here is the  part of C++20 p1032 Misc constexpr bits.
> > > >
> > > >Tested on x86_64-linux. OK?
> > >
> > > OK for trunk, thanks.
> > >
> >
> > Hi,
> >
> > The new test constexpr_allocator_arg_t.cc fails on arm and aarch64 and
> > many other targets according to gcc-testresults.
> > Is that expected?
>
> No, that's not expected. Can you give a link to the build log?

On (cross) aarch64, I can see:
FAIL: 20_util/tuple/cons/constexpr_allocator_arg_t.cc (test for excess errors)
Excess errors:
/libstdc++-v3/testsuite/20_util/tuple/cons/constexpr_allocator_arg_t.cc:48:
error: non-constant condition for static assertion
/libstdc++-v3/testsuite/20_util/tuple/cons/constexpr_allocator_arg_t.cc:48:
  in 'constexpr' expansion of 'test_tuple()'
/libstdc++-v3/testsuite/20_util/tuple/cons/constexpr_allocator_arg_t.cc:31:
  in 'constexpr' expansion of 'ta.std::tuple::tuple >(std::allocator_arg, alloc)'
/aci-gcc-fsf/builds/gcc-fsf-gccsrc/obj-aarch64-none-linux-gnu/gcc3/aarch64-none-linux-gnu/libstdc++-v3/include/tuple:705:
error: 'constexpr std::_Tuple_impl<_Idx, _Head, _Tail
...>::_Tuple_impl(std::allocator_arg_t, const _Alloc&) [with _Alloc =
std::allocator; long unsigned int _Idx = 0; _Head = int; _Tail =
{double, double}]' called in a constant expression
/aci-gcc-fsf/builds/gcc-fsf-gccsrc/obj-aarch64-none-linux-gnu/gcc3/aarch64-none-linux-gnu/libstdc++-v3/include/tuple:251:
error: call to non-'constexpr' function 'std::__uses_alloc_t<_Tp,
_Alloc, _Args ...> std::__use_alloc(const _Alloc&) [with _Tp = int;
_Alloc = std::allocator; _Args = {}; std::__uses_alloc_t<_Tp,
_Alloc, _Args ...> = std::__uses_alloc_t >]'


I see it failing at r278333, was it fixed since then?

Re: Reverting r278411

2019-11-20 Thread Richard Sandiford

Richard Sandiford  writes:
> Segher Boessenkool  writes:
>>> find_sets_in_insn has:
>>> 
>>>   /* Don't count call-insns, (set (reg 0) (call ...)), as a set.
>>>  The hard function value register is used only once, to copy to
>>>  someplace else, so it isn't worth cse'ing.  */
>>>   else if (GET_CODE (SET_SRC (x)) == CALL)
>>> ;
>>> 
>>> So n_sets == 1 can be true if we have a single SET in parallel
>>> with a call.  Unsurprising, I guess no-one had MEM sets in
>>> parallel with a call, so it didn't matter until now.
>>> 
>>> I'm retesting with a !CALL_P check added to make sure that was
>>> the only problem.
>>> 
>>> Before I resubmit, why is the simplify-rtx.c part all wrong?
>>
>> It was nice and simple, and it isn't anymore.  8 4 2 1 for the four of
>> lt gt eq un are hardly worth documenting or making symbolic constants
>> for, because a) they are familiar to all powerpc users anyway (the
>> assembler has those as predefined constants!), admittedly this isn't a
>> strong argument for most people;
>
> Ah, OK.  I was totally unaware of the connection.
>
>> but also b) they were only used in two short and obvious routines.
>> After your patch the routines are no longer short or obvious.
>>
>> A comparison result is exactly one of four things: lt, gt, eq, or un.
>> So we can express testing for any subset of those with just an OR of
>> the masks for the individual conditions.  Whether a comparison is
>> floating point, floating point no-nans, signed, or unsigned, is just
>> a property of the comparison, not of the result, in this representation.
>
> Yeah, this works for OR because we never lose the unorderdness.
>
> There were two aspects to my patch.  One was adding AND, and had:
>
>   /* We only handle AND if we can ignore unordered cases.  */
>   bool honor_nans_p = HONOR_NANS (GET_MODE (op0));
>   if (code != IOR && (code != AND || honor_nans_p))
> return 0;
>
> This is needed because e.g. UNLT & ORDERED -> LT is only conditionally
> valid.  It doesn't sound like you're objecting to that bit, is that right?
> Or was this what you had in mind with the reference to no-nans?
>
> As mentioned in the covering note, I punted for this because handling
> trapping FP comparisons correctly for AND would need its own set of
> testcases.  The current code punts for AND and for unsigned comparisons,
> so continuing to punt for ANDs on this case seemed fair game.
>
>> If you do not mix signed and unsigned comparisons (they make not much
>> sense mixed anyway(*)), you need no changes at all: just translate
>> ltu/gtu/leu/geu to/from lt/gt/le/ge on the way in and out of this
>> function (there already are helper functions for that, signed_condition
>> and unsigned_condition).
>
> So this all seems to come down to whether unsigned comparisons are handled
> as separate mask bits or whether they're dealt with by removing the
> unsignedness and then adding it back.  ISTM both are legitimate ways
> of doing it.  I don't think one of them is "all wrong".
>
> FWIW, I'd posted a patch to do the IOR/AND thing in July:
>
>   https://gcc.gnu.org/ml/gcc-patches/2019-07/msg00933.html
>
> But it turns out I'd got the signalling NaN behaviour wrong:
>
>   https://gcc.gnu.org/ml/gcc-patches/2019-08/msg01006.html
>
> (hence punting above :-)).
>
> I was very belatedly getting around to dealing with Joseph's comment
> when you sent your patch and had it approved.  Since that patch seemed
> to be more favourably received in general, I was trying to work within
> the existing style of your version.  And without the powerpc background,
> I just assumed that the mask values were "documented" by the first block
> of case statements:
>
> case LT:
>   return 8;
> case GT:
>   return 4;
> case EQ:
>   return 2;
> case UNORDERED:
>   return 1;
>
> Adding two more cases here didn't seem to make things any more unclear.
> But maybe it is more jarring if you've memorised the powerpc mapping.
>
> I'd actually considered converting to signed and back instead of adding
> extra cases, but I thought that would be rejected as too inefficient.
> (That was a concern with my patch above.)  It seemed like one of the selling
> points of doing it your way was that everything was handled by one switch
> statement "in" and one switch statement "out", and I was trying to
> preserve that.
>
> signed_condition and unsigned_condition assert on unordered comparisons,
> so if we're going to go that route, we either need to filter them out
> first or add maybe_* versions of the routines that return UNKNOWN.
> We also have to deal with the neutrality of EQ and NE.  E.g. something
> like:
>
>   rtx_code signed_code0 = maybe_signed_condition (code0);
>   rtx_code signed_code1 = maybe_signed_condition (code1);
>   if (signed_code0 != UNKNOWN && signed_code1 != UNKNOWN)
> code0 = signed_code0, code1 = signed_code1;
>   else if (signed_code0 != UNKNOWN || signed_code1 != UNKNOWN)
> return 0;
>
>   ...
>
>   if

Re: [PATCH 00/49] RFC: Add a static analysis framework to GCC

2019-11-20 Thread Richard Biener

On Tue, Nov 19, 2019 at 11:02 PM David Malcolm  wrote:
>
> > > The checker is implemented as a GCC plugin.
> > >
> > > The patch kit adds support for "in-tree" plugins i.e. GCC plugins
> > > that
> > > would live in the GCC source tree and be shipped as part of the GCC
> > > tarball,
> > > with a new:
> > >   --enable-plugins=[LIST OF PLUGIN NAMES]
> > > configure option, analogous to --enable-languages (the
> > > Makefile/configure
> > > machinery for handling in-tree GCC plugins is adapted from how we
> > > support
> > > frontends).
> >
> > I like that.  Implementing this as a plugin surely must help to
> > either
> > document the GCC plugin interface as powerful/mature for such a
> > task.  Or
> > make it so, if it isn't yet.  ;-)
>
> Our plugin "interface" as such is very broad.

Just to sneak in here I don't like exposing our current plugin "non-API"
more.  In fact I'd just build the analyzer into GCC with maybe an
option to disable its build (in case it is very fat?).

>From what I read it seems the analyzer could do with a proper
plugin API that just exposes introspection - and I really hope
somebody finds the time to complete (or rewrite...) the
proposed introspection API that ideally is even cross-compiler
(proven by implementing said API ontop of both GCC and clang/llvm).
That way the Analyzer would work with both GCC and clang [and golang
and rustc...].

So it would be interesting if you could try to sketch the kind of API
the Analyzer needs?  That is, merely the detail on which it inspects
statements, the CFG and the callgraph.

Richard.

Re: [C/C++ PATCH] Fix up build of GCC 4.6 and earlier with GCC 9+ (PR c/90677)

2019-11-20 Thread Richard Biener

On Wed, Nov 20, 2019 at 12:30 AM Jakub Jelinek  wrote:
>
> Hi!
>
> The following patch fixes build of older GCC releases with newer ones.
> In GCC 4.6 and earlier, we had:
> struct cgraph_node;
> struct cgraph_node *cgraph_node (tree);
> and that is something on which GCC 9+ code errors on if it sees any
> __gcc_diag__ and similar attributes, because cgraph_node it wants to find is
> not a type.
>
> As older GCC releases don't have the __gcc_diag__ etc. attributes guarded on
> no newer GCC releases, only on minimum GCC version that does support it,
> I think we need to make sure we don't reject what older GCCs used to have.
>
> The following patch does that.  In addition, get_pointer_to_named_type
> looked misnamed, because we actually aren't interested in getting gimple *
> etc. type, but gimple.
>
> Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk
> and eventually 9 too?

Nothing on the specific patch but I wonder if we should change the
error ()s in init_dynamic_diag_info to warnings to make stage1 uses
fine?  That probably then boils down to ignoring __gcc_diag__ formats
when initializing wasn't suceessful.

It also looks like we repeatedly do work in init_dynamic_diag_info ...

Richard.

> 2019-11-19  Jakub Jelinek  
>
> PR c/90677
> * c-format.c (get_pointer_to_named_type): Renamed to ...
> (get_named_type): ... this.  If result is FUNCTION_DECL for
> cgraph_node, return cgraph_node from pointee of return type if
> possible instead of emitting an error.
> (init_dynamic_diag_info): Adjust get_pointer_to_named_type callers
> to call get_named_type instead.  Formatting fixes.
>
> * c-c++-common/pr90677.c: New test.
>
> --- gcc/c-family/c-format.c.jj  2019-10-05 09:35:12.917997709 +0200
> +++ gcc/c-family/c-format.c 2019-11-19 13:05:10.113308578 +0100
> @@ -4899,21 +4899,39 @@ init_dynamic_gfc_info (void)
>  }
>  }
>
> -/* Lookup the type named NAME and return a pointer-to-NAME type if found.
> -   Otherwise, return void_type_node if NAME has not been used yet, or 
> NULL_TREE if
> -   NAME is not a type (issuing an error).  */
> +/* Lookup the type named NAME and return a NAME type if found.
> +   Otherwise, return void_type_node if NAME has not been used yet,
> +   or NULL_TREE if NAME is not a type (issuing an error).  */
>
>  static tree
> -get_pointer_to_named_type (const char *name)
> +get_named_type (const char *name)
>  {
>tree result;
> -  if ((result = maybe_get_identifier (name)))
> +  if (tree id = maybe_get_identifier (name))
>  {
> -  result = identifier_global_value (result);
> +  result = identifier_global_value (id);
>if (result)
> {
>   if (TREE_CODE (result) != TYPE_DECL)
> {
> + if (TREE_CODE (result) == FUNCTION_DECL
> + && !strcmp (name, "cgraph_node")
> + && TREE_CODE (TREE_TYPE (result)) == FUNCTION_TYPE)
> +   {
> + /* Before GCC 4.7, there used to be
> +struct cgraph_node;
> +struct cgraph_node *cgraph_node (tree);
> +Don't error in this case, so that GCC 9+ can still
> +compile GCC 4.6 and earlier.  */
> + tree res = TREE_TYPE (TREE_TYPE (result));
> + if (TREE_CODE (res) == POINTER_TYPE
> + && (TYPE_NAME (TREE_TYPE (res)) == id
> + || (TREE_CODE (TYPE_NAME (TREE_TYPE (res)))
> + == TYPE_DECL
> + && (DECL_NAME (TYPE_NAME (TREE_TYPE (res)))
> + == id
> +   return TREE_TYPE (res);
> +   }
>   error ("%qs is not defined as a type", name);
>   result = NULL_TREE;
> }
> @@ -4953,23 +4971,24 @@ init_dynamic_diag_info (void)
>  an extra type level.  */
>if ((local_tree_type_node = maybe_get_identifier ("tree")))
> {
> - local_tree_type_node = identifier_global_value 
> (local_tree_type_node);
> + local_tree_type_node
> +   = identifier_global_value (local_tree_type_node);
>   if (local_tree_type_node)
> {
>   if (TREE_CODE (local_tree_type_node) != TYPE_DECL)
> {
>   error ("% is not defined as a type");
> - local_tree_type_node = 0;
> + local_tree_type_node = NULL_TREE;
> }
>   else if (TREE_CODE (TREE_TYPE (local_tree_type_node))
>!= POINTER_TYPE)
> {
>   error ("% is not defined as a pointer type");
> - local_tree_type_node = 0;
> + local_tree_type_node = NULL_TREE;
> }
>   else
> -   local_tree_type_node =
> - TREE_TYPE (TREE_TYPE (local_tree_type_node))

Re: [C/C++ PATCH] Fix up build of GCC 4.6 and earlier with GCC 9+ (PR c/90677)

2019-11-20 Thread Jakub Jelinek

On Wed, Nov 20, 2019 at 11:25:46AM +0100, Richard Biener wrote:
> Nothing on the specific patch but I wonder if we should change the
> error ()s in init_dynamic_diag_info to warnings to make stage1 uses
> fine?  That probably then boils down to ignoring __gcc_diag__ formats
> when initializing wasn't suceessful.

Turning the errors into warnings is fine with me, but I think it still
doesn't hurt to handle the common case of cgraph_node being a function
the way the patch does, without any warning, as otherwise it would warn on
every single GCC file.
There is another thing, gimple, in the past we had typedef union ... *gimple;
and now we have struct gimple instead.  Though, both %G and %C that use
gimple * or cgraph_node * aren't going to appear in GCC 4.6 or earlier
anyway, so as long as we don't error (or ideally warn) all the time
on that, we just don't care.

> It also looks like we repeatedly do work in init_dynamic_diag_info ...

Yes, but only as long as the types aren't found yet.  I bet it is for the
case that there is
prototype with __gcc_diag__
struct cgraph_node;
or similar ordering, but that is just a guess.

Jakub

Re: [PATCH] Fix PR92537

2019-11-20 Thread Richard Biener

On Wed, 20 Nov 2019, Richard Sandiford wrote:

> Richard Biener  writes:
> > There's a bit left in this PR which the following fixes.  The root
> > only became external late and the check more naturally belongs to
> > the new place.
> >
> > Boostrap and regtest running on x86_64-unknown-linux-gnu.
> 
> FWIW, I was just about to post the patch below before seeing your
> message. :-)  Thought I might as well post it anyway just in case.

I was thinking whether it makes sense to have external root nodes
and they probably can only appear with the CTORs right now (otherwise
they'll be stores).  So possibly generalizing the check like you
do makes sense, still the check belongs in vect_slp_analyze_operations ;)

> I guess the slight advantage to keeping the vect_analyze_slp_instance
> test is that we can still free child nodes at that point, so it might
> be more efficient to catch it "early".  It probably doesn't make much
> differnce in practice though.

But if the root is external there are no child nodes (well, besides
that root).  But yeah, having the check twice looked odd to me.

> Tested on aarch64-linux-gnu.

I've installed my variant before seeing you mail, so...

I also have a prototype for finding "random" roots for SLP
vectorization (just a bunch of same stmts) which runs into
the same issue (but also very many others ;))

Thanks,
Richard.

> 
> Thanks,
> Richard
> 
> 
> 2019-11-20  Richard Sandiford  
> 
> gcc/
>   PR tree-optimization/92537
>   * tree-vect-slp.c (vect_slp_analyze_node_operations): Don't
>   allow the root node to be external.
> 
> gcc/testsuite/
>   PR tree-optimization/92537
>   * gcc.dg/vect/pr92537.c: New test.
>   * gfortran.dg/vect/pr92537.f90: Likewise.
> 
> Index: gcc/tree-vect-slp.c
> ===
> --- gcc/tree-vect-slp.c   2019-11-18 15:21:12.919993766 +
> +++ gcc/tree-vect-slp.c   2019-11-20 09:48:57.145101295 +
> @@ -2848,7 +2848,7 @@ vect_slp_analyze_node_operations (vec_in
>slp_tree child;
>  
>if (SLP_TREE_DEF_TYPE (node) != vect_internal_def)
> -return true;
> +return node != SLP_INSTANCE_TREE (node_instance);
>  
>/* If we already analyzed the exact same set of scalar stmts we're done.
>   We share the generated vector stmts for those.
> Index: gcc/testsuite/gcc.dg/vect/pr92537.c
> ===
> --- /dev/null 2019-09-17 11:41:18.176664108 +0100
> +++ gcc/testsuite/gcc.dg/vect/pr92537.c   2019-11-20 09:48:57.145101295 
> +
> @@ -0,0 +1,16 @@
> +/* { dg-do compile } */
> +/* { dg-additional-options "-O3" } */
> +
> +int a[64];
> +int b, c, e;
> +short d;
> +short f[64];
> +void g() {
> +  b = 0;
> +  c = d >> 3;
> +  for (; b < 64 - 1; b++) {
> +e = 0;
> +e -= a[b] * c;
> +f[b] = e;
> +  }
> +}
> Index: gcc/testsuite/gfortran.dg/vect/pr92537.f90
> ===
> --- /dev/null 2019-09-17 11:41:18.176664108 +0100
> +++ gcc/testsuite/gfortran.dg/vect/pr92537.f902019-11-20 
> 09:48:57.145101295 +
> @@ -0,0 +1,32 @@
> +! { dg-do compile } */
> +! { dg-additional-options "-fno-inline -march=skylake" }
> +
> +MODULE pr93527
> +  implicit none
> +  integer, parameter :: wp = kind (1.d0)
> +  interface p_min
> + module procedure p_min_wp
> +  end interface
> +contains
> +  subroutine foo (pr)
> +real(wp), pointer :: pr(:)
> +integer  ::  nzd
> +real(wp) ::  pmin
> +real(wp) ::  pmin_diag
> +integer  ::  i
> +nzd  = 15
> +allocate (pr(nzd))
> +pmin_diag = 4000._wp
> +pmin = p_min(pmin_diag)
> +pmin = min (pmin,pmin_diag)
> +pr(1) = log(pmin)
> +do i=1,nzd-1
> +   pr(i+1) = log(pmin) + i
> +end do
> +  end subroutine foo
> +  function p_min_wp (x) result (p_min)
> +real(wp), intent(in) :: x
> +real(wp) :: p_min
> +p_min = x
> +  end function p_min_wp
> +end module pr93527
> 

-- 
Richard Biener 
SUSE Software Solutions Germany GmbH, Maxfeldstrasse 5, 90409 Nuernberg,
Germany; GF: Felix Imendörffer; HRB 36809 (AG Nuernberg)

Re: [C++ PATCH] Fix weird addr_expr not supported by dump_expr messages (PR c++/90767)

2019-11-20 Thread David Malcolm

On Wed, 2019-11-20 at 00:38 +0100, Jakub Jelinek wrote:
> Hi!
> 
> The following patch is a minimal fix to avoid
> cannot convert ‘‘addr_expr’ not supported by dump_type’
> to ‘X*’
> and similar messages.  The recently added complain_about_bad_argument
> function expects a from_type argument, but conv->from isn't
> necessarily a
> type, it can be an expression too.
> 
> With this patch one gets error like:
> cannot convert ‘const X*’ to ‘X*’
> and note like
> initializing argument 'this' of ‘void X::foo()’
> Still, perhaps what GCC 8 and earlier used to emit might be clearer:
> pr90767-1.C: In member function ‘X::operator T() const’:
> pr90767-1.C:12:7: error: no matching function for call to ‘X::foo()
> const’
> pr90767-1.C:6:8: note: candidate: ‘void X::foo()’ 
> pr90767-1.C:6:8: note:   passing ‘const X*’ as ‘this’ argument
> discards qualifiers
> There is the print_conversion_rejection function that handles the
> various
> cases, like this vs. other arguments, conv->from with expr type vs.
> type
> etc.
> Though, I must say I don't understand the reasons why
> complain_about_bad_argument
> has been added and whether we'd want to emit there what
> print_conversion_rejection prints as notes with 2 leading spaces
> instead as
> errors with no leading spaces.

I added complain_about_bad_argument in r264250 (aka
b78e49d1ddf1a09e16152353b41bf7c0b44d6405); the intent is to special-
case when there's a single candidate due to a bad argument, underlining
the pertinent bogus argument at the callsite, rather than having the
caret at the final close-paren of the call.

But it looks like I forgot about the "this" case.  Does it make more
sense for pr90767-1.C to reject the special-casing if conv->n_arg is -1
for the "this"?  (either in
complain_about_no_candidates_for_method_call, or in
maybe_get_bad_conversion_for_unmatched_call); perhaps like this:

diff --git a/gcc/cp/call.c b/gcc/cp/call.c
index 0034c1c..eefb7ab 100644
--- a/gcc/cp/call.c
+++ b/gcc/cp/call.c
@@ -9855,16 +9855,18 @@ complain_about_no_candidates_for_method_call (tree 
instance,
   else
 {
   /* Special-case for when there's a single candidate that's failing
-due to a bad argument type.  */
+due to a bad argument type (but not for the "this" argument),
+so that we can underline the pertinent argument at the callsite.  */
   if (z_candidate *candidate = single_z_candidate (candidates))
  if (const conversion_info *conv
= maybe_get_bad_conversion_for_unmatched_call (candidate))
-   {
- complain_about_bad_argument (conv->loc,
-  conv->from, conv->to_type,
-  candidate->fn, conv->n_arg);
- return;
-   }
+   if (conv->n_arg != -1)
+ {
+   complain_about_bad_argument (conv->loc,
+conv->from, conv->to_type,
+candidate->fn, conv->n_arg);
+   return;
+ }
 
   tree arglist = build_tree_list_vec (user_args);
   tree errname = name;


(caveat: barely tested)

Dave

> In any case, I think the patch below is a step in the right
> direction.
> 
> 2019-11-19  Jakub Jelinek  
> 
>   PR c++/90767
>   * call.c (complain_about_no_candidates_for_method_call): If
>   conv->from is not a type, pass to complain_about_bad_argument
>   lvalue_type of conv->from.
> 
>   * g++.dg/diagnostic/pr90767-1.C: New test.
>   * g++.dg/diagnostic/pr90767-2.C: New test.
> 
> --- gcc/cp/call.c.jj  2019-11-18 18:49:14.461880924 +0100
> +++ gcc/cp/call.c 2019-11-19 14:40:19.121937148 +0100
> @@ -9861,8 +9861,11 @@ complain_about_no_candidates_for_method_
> if (const conversion_info *conv
>   = maybe_get_bad_conversion_for_unmatched_call
> (candidate))
>   {
> +   tree from_type = conv->from;
> +   if (!TYPE_P (conv->from))
> + from_type = lvalue_type (conv->from);
> complain_about_bad_argument (conv->loc,
> -conv->from, conv->to_type,
> +from_type, conv->to_type,
>  candidate->fn, conv->n_arg);
> return;
>   }
> --- gcc/testsuite/g++.dg/diagnostic/pr90767-1.C.jj2019-11-19
> 14:48:00.386041586 +0100
> +++ gcc/testsuite/g++.dg/diagnostic/pr90767-1.C   2019-11-19
> 14:46:53.395043036 +0100
> @@ -0,0 +1,15 @@
> +// PR c++/90767
> +// { dg-do compile }
> +
> +struct X {
> +  int n;
> +  void foo ();   // { dg-message "initializing argument 'this'"
> }
> +
> +  template
> +  operator T () const
> +{
> +  if (n == 0)
> + foo (); // { dg-error "cannot convert 'const X\\*' to 'X\\*'"
> }
> +  return n;
> +}
> +};
> --- gcc/testsuite/g++.dg/diagnostic/pr90767-2.C.jj2019-11-19
> 14:50:48.923522136 +0100
> +

[PATCH] Update comment in libsanitizer/*/libtool-version files.

2019-11-20 Thread Martin Liška


Hi.

The patch is about removal of an unused libtool-version file,
and comments in other libtool-version files are legacy.

Patch can bootstrap on x86_64-linux-gnu and survives regression tests.
I'm going to install the patch if there are no objections.

Thanks,
Martin

libsanitizer/ChangeLog:

2019-11-20  Martin Liska  

* libtool-version: Remove.
* lsan/libtool-version: Upate comment to not mention libmudflap.
* tsan/libtool-version: Likewise.
* ubsan/libtool-version: Likewise.
---
 libsanitizer/libtool-version   | 6 --
 libsanitizer/lsan/libtool-version  | 2 +-
 libsanitizer/tsan/libtool-version  | 2 +-
 libsanitizer/ubsan/libtool-version | 2 +-
 4 files changed, 3 insertions(+), 9 deletions(-)
 delete mode 100644 libsanitizer/libtool-version


diff --git a/libsanitizer/libtool-version b/libsanitizer/libtool-version
deleted file mode 100644
index 204fdd2d8e5..000
--- a/libsanitizer/libtool-version
+++ /dev/null
@@ -1,6 +0,0 @@
-# This file is used to maintain libtool version info for libmudflap.  See
-# the libtool manual to understand the meaning of the fields.  This is
-# a separate file so that version updates don't involve re-running
-# automake.
-# CURRENT:REVISION:AGE
-0:0:0
diff --git a/libsanitizer/lsan/libtool-version b/libsanitizer/lsan/libtool-version
index 204fdd2d8e5..c63710360f6 100644
--- a/libsanitizer/lsan/libtool-version
+++ b/libsanitizer/lsan/libtool-version
@@ -1,4 +1,4 @@
-# This file is used to maintain libtool version info for libmudflap.  See
+# This file is used to maintain libtool version info for liblsan.  See
 # the libtool manual to understand the meaning of the fields.  This is
 # a separate file so that version updates don't involve re-running
 # automake.
diff --git a/libsanitizer/tsan/libtool-version b/libsanitizer/tsan/libtool-version
index 204fdd2d8e5..11974598ac5 100644
--- a/libsanitizer/tsan/libtool-version
+++ b/libsanitizer/tsan/libtool-version
@@ -1,4 +1,4 @@
-# This file is used to maintain libtool version info for libmudflap.  See
+# This file is used to maintain libtool version info for libtsan.  See
 # the libtool manual to understand the meaning of the fields.  This is
 # a separate file so that version updates don't involve re-running
 # automake.
diff --git a/libsanitizer/ubsan/libtool-version b/libsanitizer/ubsan/libtool-version
index 4b494765130..26538f99f19 100644
--- a/libsanitizer/ubsan/libtool-version
+++ b/libsanitizer/ubsan/libtool-version
@@ -1,4 +1,4 @@
-# This file is used to maintain libtool version info for libmudflap.  See
+# This file is used to maintain libtool version info for libubsan.  See
 # the libtool manual to understand the meaning of the fields.  This is
 # a separate file so that version updates don't involve re-running
 # automake.

Re: Implement the part of C++20 p1032 Misc constexpr bits.

2019-11-20 Thread Ville Voutilainen

On Wed, 20 Nov 2019 at 12:16, Christophe Lyon
 wrote:
>
> On Wed, 20 Nov 2019 at 11:10, Ville Voutilainen
>  wrote:
> >
> > On Wed, 20 Nov 2019 at 11:47, Christophe Lyon
> >  wrote:
> > >
> > > On Thu, 14 Nov 2019 at 16:55, Jonathan Wakely  wrote:
> > > >
> > > > On 09/11/19 02:07 +, Smith-Rowland, Edward M wrote:
> > > > >Here is the  part of C++20 p1032 Misc constexpr bits.
> > > > >
> > > > >Tested on x86_64-linux. OK?
> > > >
> > > > OK for trunk, thanks.
> > > >
> > >
> > > Hi,
> > >
> > > The new test constexpr_allocator_arg_t.cc fails on arm and aarch64 and
> > > many other targets according to gcc-testresults.
> > > Is that expected?
> >
> > No, that's not expected. Can you give a link to the build log?
>
> On (cross) aarch64, I can see:
> FAIL: 20_util/tuple/cons/constexpr_allocator_arg_t.cc (test for excess errors)
> Excess errors:
> /libstdc++-v3/testsuite/20_util/tuple/cons/constexpr_allocator_arg_t.cc:48:
> error: non-constant condition for static assertion
> /libstdc++-v3/testsuite/20_util/tuple/cons/constexpr_allocator_arg_t.cc:48:
>   in 'constexpr' expansion of 'test_tuple()'
> /libstdc++-v3/testsuite/20_util/tuple/cons/constexpr_allocator_arg_t.cc:31:
>   in 'constexpr' expansion of 'ta.std::tuple double>::tuple >(std::allocator_arg, alloc)'
> /aci-gcc-fsf/builds/gcc-fsf-gccsrc/obj-aarch64-none-linux-gnu/gcc3/aarch64-none-linux-gnu/libstdc++-v3/include/tuple:705:
> error: 'constexpr std::_Tuple_impl<_Idx, _Head, _Tail
> ...>::_Tuple_impl(std::allocator_arg_t, const _Alloc&) [with _Alloc =
> std::allocator; long unsigned int _Idx = 0; _Head = int; _Tail =
> {double, double}]' called in a constant expression
> /aci-gcc-fsf/builds/gcc-fsf-gccsrc/obj-aarch64-none-linux-gnu/gcc3/aarch64-none-linux-gnu/libstdc++-v3/include/tuple:251:
> error: call to non-'constexpr' function 'std::__uses_alloc_t<_Tp,
> _Alloc, _Args ...> std::__use_alloc(const _Alloc&) [with _Tp = int;
> _Alloc = std::allocator; _Args = {}; std::__uses_alloc_t<_Tp,
> _Alloc, _Args ...> = std::__uses_alloc_t >]'
>
>
> I see it failing at r278333, was it fixed since then?

Yes, in r278373.

Re: [C++ PATCH] Fix weird addr_expr not supported by dump_expr messages (PR c++/90767)

2019-11-20 Thread Jakub Jelinek

On Wed, Nov 20, 2019 at 05:54:48AM -0500, David Malcolm wrote:
> I added complain_about_bad_argument in r264250 (aka
> b78e49d1ddf1a09e16152353b41bf7c0b44d6405); the intent is to special-
> case when there's a single candidate due to a bad argument, underlining
> the pertinent bogus argument at the callsite, rather than having the
> caret at the final close-paren of the call.
> 
> But it looks like I forgot about the "this" case.  Does it make more
> sense for pr90767-1.C to reject the special-casing if conv->n_arg is -1
> for the "this"?  (either in
> complain_about_no_candidates_for_method_call, or in
> maybe_get_bad_conversion_for_unmatched_call); perhaps like this:

I don't know.  Either something like that, or handle those cases in
complain_about_bad_argument specially too.
Note, print_conversion_rejection apparently has special cases for
n_arg == -1 (TYPE_P conv->from vs. non-TYPE_P)
!TYPE_P conv->from
n_arg == -2
the rest
I really don't know what exactly they mean, appart from n_arg == -1
being this, n_arg seems to be return value, but dunno when that triggers,
and it is unclear what TYPE_P conv->from vs. non-TYPE_P means.

Jakub

Re: [PATCH] Switch gcc ftp URL's to http

2019-11-20 Thread Janne Blomqvist

PING

On Wed, Nov 13, 2019 at 11:05 PM Janne Blomqvist
 wrote:
>
> On Wed, Nov 13, 2019 at 10:41 PM Andrew Pinski  wrote:
> >
> > On Wed, Nov 13, 2019 at 12:37 PM Janne Blomqvist
> >  wrote:
> > >
> > > The FTP protocol is getting long in the tooth, and we should emphasize
> > > HTTP where that is available. This patch changes various gcc.gnu.org
> > > URL's to instead use HTTP.
> >
> > May I suggest you use https instead of http here?  Because it will be
> > redirected anyways to use https.
>
> For me, when I use my normal web browser (firefox), it does redirect
> to https. But I'm using the "HTTPS everywhere" extension, so I'm not
> sure if it's the extension that does it, or if the server redirects
> me, or if it's some other web-security-thingy that does it. When I use
> curl, and if I manage to interpret the output correctly, it does not
> redirect.
>
> That being said, it's probably a good idea to use https anyway. So
> yes, consider it done (I'm not sending a new iteration of the patch
> for this).
>
>
> --
> Janne Blomqvist



-- 
Janne Blomqvist

Re: [SVE] PR89007 - Implement generic vector average expansion

2019-11-20 Thread Richard Sandiford

Hi,

Thanks for doing this.  Adding Richard on cc:, since the SVE subject
tag might have put him off.  There's not really anything SVE-specific
here apart from the testcase.

> 2019-11-19  Prathamesh Kulkarni  
>
>   PR tree-optimization/89007
>   * tree-vect-patterns.c (vect_recog_average_pattern): If there is no
>   target support available, generate code to distribute rshift over plus
>   and add one depending upon floor or ceil rounding.
>
> testsuite/
>   * gcc.target/aarch64/sve/pr89007.c: New test.
>
> diff --git a/gcc/testsuite/gcc.target/aarch64/sve/pr89007-1.c 
> b/gcc/testsuite/gcc.target/aarch64/sve/pr89007-1.c
> new file mode 100644
> index 000..32095c63c61
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/aarch64/sve/pr89007-1.c
> @@ -0,0 +1,29 @@
> +/* { dg-do assemble { target aarch64_asm_sve_ok } } */
> +/* { dg-options "-O -ftree-vectorize -march=armv8.2-a+sve --save-temps" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> +
> +#define N 1024
> +unsigned char dst[N];
> +unsigned char in1[N];
> +unsigned char in2[N];
> +
> +/*
> +**  foo: 
> +**   ...
> +**   lsr (z[0-9]+\.b), z[0-9]+\.b, #1
> +**   lsr (z[0-9]+\.b), z[0-9]+\.b, #1
> +**   add (z[0-9]+\.b), \1, \2
> +**   orr (z[0-9]+)\.d, z[0-9]+\.d, z[0-9]+\.d
> +**   and (z[0-9]+\.b), \4\.b, #0x1
> +**   add z0.b, \3, \5

It'd probably be more future-proof to allow (\1, \2|\2, \1) and
(\3, \5|\5, \3).  Same for the other testcase.

> +**   ...
> +*/
> +void
> +foo ()
> +{
> +  for( int x = 0; x < N; x++ )
> +dst[x] = (in1[x] + in2[x] + 1) >> 1;
> +}
> +
> +/* { dg-final { scan-assembler-not {\tuunpklo\t} } } */
> +/* { dg-final { scan-assembler-not {\tuunpkhi\t} } } */
> diff --git a/gcc/testsuite/gcc.target/aarch64/sve/pr89007-2.c 
> b/gcc/testsuite/gcc.target/aarch64/sve/pr89007-2.c
> new file mode 100644
> index 000..cc40f45046b
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/aarch64/sve/pr89007-2.c
> @@ -0,0 +1,29 @@
> +/* { dg-do assemble { target aarch64_asm_sve_ok } } */
> +/* { dg-options "-O -ftree-vectorize -march=armv8.2-a+sve --save-temps" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> +
> +#define N 1024
> +unsigned char dst[N];
> +unsigned char in1[N];
> +unsigned char in2[N];
> +
> +/*
> +**  foo: 
> +**   ...
> +**   lsr (z[0-9]+\.b), z[0-9]+\.b, #1
> +**   lsr (z[0-9]+\.b), z[0-9]+\.b, #1
> +**   add (z[0-9]+\.b), \1, \2
> +**   and (z[0-9]+)\.d, z[0-9]+\.d, z[0-9]+\.d
> +**   and (z[0-9]+\.b), \4\.b, #0x1
> +**   add z0.b, \3, \5
> +**   ...
> +*/
> +void
> +foo ()
> +{
> +  for( int x = 0; x < N; x++ )
> +dst[x] = (in1[x] + in2[x]) >> 1;
> +}
> +
> +/* { dg-final { scan-assembler-not {\tuunpklo\t} } } */
> +/* { dg-final { scan-assembler-not {\tuunpkhi\t} } } */
> diff --git a/gcc/tree-vect-patterns.c b/gcc/tree-vect-patterns.c
> index 8ebbcd76b64..7025a3b4dc2 100644
> --- a/gcc/tree-vect-patterns.c
> +++ b/gcc/tree-vect-patterns.c
> @@ -2019,22 +2019,59 @@ vect_recog_average_pattern (stmt_vec_info 
> last_stmt_info, tree *type_out)
 
>/* Check for target support.  */
>tree new_vectype = get_vectype_for_scalar_type (vinfo, new_type);
> -  if (!new_vectype
> -  || !direct_internal_fn_supported_p (ifn, new_vectype,
> -   OPTIMIZE_FOR_SPEED))
> +
> +  if (!new_vectype)
>  return NULL;
> 
> +  bool ifn_supported
> += direct_internal_fn_supported_p (ifn, new_vectype, OPTIMIZE_FOR_SPEED);
> +
>/* The IR requires a valid vector type for the cast result, even though
>   it's likely to be discarded.  */
>*type_out = get_vectype_for_scalar_type (vinfo, type);
>if (!*type_out)
>  return NULL;
 >
> -  /* Generate the IFN_AVG* call.  */
>tree new_var = vect_recog_temp_ssa_var (new_type, NULL);
>tree new_ops[2];
>vect_convert_inputs (last_stmt_info, 2, new_ops, new_type,
>  unprom, new_vectype);
> +
> +  if (!ifn_supported)

I guess this is personal preference, but I'm not sure there's much
benefit to splitting this out of the "if" into a separate variable.

> +{
> +  /* If there is no target support available, generate code
> +  to distribute rshift over plus and add one depending
> +  upon floor or ceil rounding.  */

Maybe "and add a carry"?  It'd be good to write out the expansion in
pseudo-code too.

The transform is only valid on unsigned types.  We still need to
bail out for signed types because the carry would act in the wrong
direction for negative results.  E.g.:

  (-3 + 1) >> 1 == -2
  (-3 >> 1) == -1,  (1 >> 1) == 0,  (-1 & 1) == 1   total == 0

Handling that case could be future work.

> +  tree one_cst = build_one_cst (new_type);
> +
> +  tree tmp1 = vect_recog_temp_ssa_var (new_type, NULL);
> +  gassign *g1 = gimple_build_assign (tmp1, RSHIFT_EXPR, new_ops[0], 
> one_cst);
> +
> +  tree tmp2 = vect_recog_temp_ssa_var (new_type, NULL);
> +  gassign *g2 = gi

Re: [PATCH 1/X] [libsanitizer] Tie the hwasan library into our build system

2019-11-20 Thread Martin Liška


On 11/7/19 7:37 PM, Matthew Malcomson wrote:

+# This file is used to maintain libtool version info for libmudflap. See


Just a small nit here, the comment is a copy&paste leftover, please rename
it to libhwasan. I've just sent similar patch that fixes that in other
places:
https://gcc.gnu.org/ml/gcc-patches/2019-11/msg01922.html

Martin

Re: [PATCH 4/X] [libsanitizer][options] Add hwasan flags and argument parsing

2019-11-20 Thread Martin Liška


On 11/7/19 7:37 PM, Matthew Malcomson wrote:

These flags can't be used at the same time as any of the other
sanitizers.
We add an equivalent flag to -static-libasan in -static-libhwasan to
ensure static linking.

The -fsanitize=kernel-hwaddress option is for compiling targeting the
kernel.  This flag has defaults that allow compiling KASAN with tags as
it is currently implemented.
These defaults are that we do not sanitize variables on the stack and
always recover from a detected bug.
Stack tagging in the kernel is a future aim, stack instrumentation has
not yet been enabled for the kernel for clang either
(https://lists.infradead.org/pipermail/linux-arm-kernel/2019-October/687121.html).

We introduce a backend hook `targetm.memtag.can_tag_addresses` that
indicates to the mid-end whether a target has a feature like AArch64 TBI
where the top byte of an address is ignored.
Without this feature hwasan sanitization is not done.

gcc/ChangeLog:

2019-11-07  Matthew Malcomson  

* asan.c (memory_tagging_p): New.
* asan.h (memory_tagging_p): New.
* common.opt (flag_sanitize_recover): Default for kernel
hwaddress.
(static-libhwasan): New cli option.
* config/aarch64/aarch64.c (aarch64_can_tag_addresses): New.
(TARGET_MEMTAG_CAN_TAG_ADDRESSES): New.
* config/gnu-user.h (LIBHWASAN_EARLY_SPEC): hwasan equivalent of
asan command line flags.
* cppbuiltin.c (define_builtin_macros_for_compilation_flags):
Add hwasan equivalent of __SANITIZE_ADDRESS__.
* doc/tm.texi: Document new hook.
* doc/tm.texi.in: Document new hook.
* flag-types.h (enum sanitize_code): New sanitizer values.
* gcc.c (STATIC_LIBHWASAN_LIBS): New macro.
(LIBHWASAN_SPEC): New macro.
(LIBHWASAN_EARLY_SPEC): New macro.
(SANITIZER_EARLY_SPEC): Update to include hwasan.
(SANITIZER_SPEC): Update to include hwasan.
(sanitize_spec_function): Use hwasan options.
* opts.c (finish_options): Describe conflicts between address
sanitizers.
(sanitizer_opts): Introduce new sanitizer flags.
(common_handle_option): Add defaults for kernel sanitizer.
* params.def (PARAM_HWASAN_RANDOM_FRAME_TAG): New.
(PARAM_HWASAN_STACK): New.
* params.h (HWASAN_STACK): New.
(HWASAN_RANDOM_FRAME_TAG): New.
* target.def (HOOK_PREFIX): Add new hook.
* targhooks.c (default_memtag_can_tag_addresses): New.
* toplev.c (process_options): Ensure hwasan only on TBI
architectures.

gcc/c-family/ChangeLog:

2019-11-07  Matthew Malcomson  

* c-attribs.c (handle_no_sanitize_hwaddress_attribute): New
attribute.



### Attachment also inlined for ease of reply###


diff --git a/gcc/c-family/c-attribs.c b/gcc/c-family/c-attribs.c
index 
1c9f28587fbb2348cc30e302e889a5a22906901a..a5e68061ff956018957b6be137a7b2f2b7353647
 100644
--- a/gcc/c-family/c-attribs.c
+++ b/gcc/c-family/c-attribs.c
@@ -54,6 +54,8 @@ static tree handle_cold_attribute (tree *, tree, tree, int, 
bool *);
  static tree handle_no_sanitize_attribute (tree *, tree, tree, int, bool *);
  static tree handle_no_sanitize_address_attribute (tree *, tree, tree,
  int, bool *);
+static tree handle_no_sanitize_hwaddress_attribute (tree *, tree, tree,
+   int, bool *);
  static tree handle_no_sanitize_thread_attribute (tree *, tree, tree,
 int, bool *);
  static tree handle_no_address_safety_analysis_attribute (tree *, tree, tree,
@@ -412,6 +414,8 @@ const struct attribute_spec c_common_attribute_table[] =
  handle_no_sanitize_attribute, NULL },
{ "no_sanitize_address",0, 0, true, false, false, false,
  handle_no_sanitize_address_attribute, NULL },
+  { "no_sanitize_hwaddress",0, 0, true, false, false, false,
+ handle_no_sanitize_hwaddress_attribute, NULL },
{ "no_sanitize_thread", 0, 0, true, false, false, false,
  handle_no_sanitize_thread_attribute, NULL },
{ "no_sanitize_undefined",  0, 0, true, false, false, false,
@@ -941,6 +945,22 @@ handle_no_sanitize_address_attribute (tree *node, tree 
name, tree, int,
return NULL_TREE;
  }
  
+/* Handle a "no_sanitize_hwaddress" attribute; arguments as in

+   struct attribute_spec.handler.  */
+
+static tree
+handle_no_sanitize_hwaddress_attribute (tree *node, tree name, tree, int,
+ bool *no_add_attrs)
+{
+  *no_add_attrs = true;
+  if (TREE_CODE (*node) != FUNCTION_DECL)
+warning (OPT_Wattributes, "%qE attribute ignored", name);
+  else
+add_no_sanitize_value (*node, SANITIZE_HWADDRESS);
+
+  return NULL_TREE;
+}
+
  /* Handle a "no_sanitize_thread" attribute; arguments as in

Re: [C++ coroutines 6/6] Testsuite.

2019-11-20 Thread Iain Sandoe

Hello JunMa,

JunMa  wrote:

> 在 2019/11/17 下午6:28, Iain Sandoe 写道:

> I find that the patches donot support 'for co_await'. And it is
> quiet simple to implement range based 'for co_await' based on your
> patches, since it's just need few more works on range for source to
> source transform. Any reason for that?

If I understand your question correctly,

for example TS n4775, there was:

[stmt.stmt]
  ….
  for co_await ( for-range-declaration : for-range-initializer ) statement

yes?

This was removed by a later committee resolution, and was *not* merged
to the C++20 Working Draft (I am currently working to n4835).

So, the reason it is not implemented in GCC at present, is that it is not clear
exactly what form it might take if introduced in some future proposal for
enhanced coroutines.

hope that answers the question,
thanks
Iain

Re: [SVE] PR89007 - Implement generic vector average expansion

2019-11-20 Thread Richard Biener

On Wed, 20 Nov 2019, Richard Sandiford wrote:

> Hi,
> 
> Thanks for doing this.  Adding Richard on cc:, since the SVE subject
> tag might have put him off.  There's not really anything SVE-specific
> here apart from the testcase.

Ah.

> > 2019-11-19  Prathamesh Kulkarni  
> >
> > PR tree-optimization/89007
> > * tree-vect-patterns.c (vect_recog_average_pattern): If there is no
> > target support available, generate code to distribute rshift over plus
> > and add one depending upon floor or ceil rounding.
> >
> > testsuite/
> > * gcc.target/aarch64/sve/pr89007.c: New test.
> >
> > diff --git a/gcc/testsuite/gcc.target/aarch64/sve/pr89007-1.c 
> > b/gcc/testsuite/gcc.target/aarch64/sve/pr89007-1.c
> > new file mode 100644
> > index 000..32095c63c61
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.target/aarch64/sve/pr89007-1.c
> > @@ -0,0 +1,29 @@
> > +/* { dg-do assemble { target aarch64_asm_sve_ok } } */
> > +/* { dg-options "-O -ftree-vectorize -march=armv8.2-a+sve --save-temps" } 
> > */
> > +/* { dg-final { check-function-bodies "**" "" } } */
> > +
> > +#define N 1024
> > +unsigned char dst[N];
> > +unsigned char in1[N];
> > +unsigned char in2[N];
> > +
> > +/*
> > +**  foo: 
> > +** ...
> > +** lsr (z[0-9]+\.b), z[0-9]+\.b, #1
> > +** lsr (z[0-9]+\.b), z[0-9]+\.b, #1
> > +** add (z[0-9]+\.b), \1, \2
> > +** orr (z[0-9]+)\.d, z[0-9]+\.d, z[0-9]+\.d
> > +** and (z[0-9]+\.b), \4\.b, #0x1
> > +** add z0.b, \3, \5
> 
> It'd probably be more future-proof to allow (\1, \2|\2, \1) and
> (\3, \5|\5, \3).  Same for the other testcase.
> 
> > +** ...
> > +*/
> > +void
> > +foo ()
> > +{
> > +  for( int x = 0; x < N; x++ )
> > +dst[x] = (in1[x] + in2[x] + 1) >> 1;
> > +}
> > +
> > +/* { dg-final { scan-assembler-not {\tuunpklo\t} } } */
> > +/* { dg-final { scan-assembler-not {\tuunpkhi\t} } } */
> > diff --git a/gcc/testsuite/gcc.target/aarch64/sve/pr89007-2.c 
> > b/gcc/testsuite/gcc.target/aarch64/sve/pr89007-2.c
> > new file mode 100644
> > index 000..cc40f45046b
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.target/aarch64/sve/pr89007-2.c
> > @@ -0,0 +1,29 @@
> > +/* { dg-do assemble { target aarch64_asm_sve_ok } } */
> > +/* { dg-options "-O -ftree-vectorize -march=armv8.2-a+sve --save-temps" } 
> > */
> > +/* { dg-final { check-function-bodies "**" "" } } */
> > +
> > +#define N 1024
> > +unsigned char dst[N];
> > +unsigned char in1[N];
> > +unsigned char in2[N];
> > +
> > +/*
> > +**  foo: 
> > +** ...
> > +** lsr (z[0-9]+\.b), z[0-9]+\.b, #1
> > +** lsr (z[0-9]+\.b), z[0-9]+\.b, #1
> > +** add (z[0-9]+\.b), \1, \2
> > +** and (z[0-9]+)\.d, z[0-9]+\.d, z[0-9]+\.d
> > +** and (z[0-9]+\.b), \4\.b, #0x1
> > +** add z0.b, \3, \5
> > +** ...
> > +*/
> > +void
> > +foo ()
> > +{
> > +  for( int x = 0; x < N; x++ )
> > +dst[x] = (in1[x] + in2[x]) >> 1;
> > +}
> > +
> > +/* { dg-final { scan-assembler-not {\tuunpklo\t} } } */
> > +/* { dg-final { scan-assembler-not {\tuunpkhi\t} } } */
> > diff --git a/gcc/tree-vect-patterns.c b/gcc/tree-vect-patterns.c
> > index 8ebbcd76b64..7025a3b4dc2 100644
> > --- a/gcc/tree-vect-patterns.c
> > +++ b/gcc/tree-vect-patterns.c
> > @@ -2019,22 +2019,59 @@ vect_recog_average_pattern (stmt_vec_info 
> > last_stmt_info, tree *type_out)
>  
> >/* Check for target support.  */
> >tree new_vectype = get_vectype_for_scalar_type (vinfo, new_type);
> > -  if (!new_vectype
> > -  || !direct_internal_fn_supported_p (ifn, new_vectype,
> > - OPTIMIZE_FOR_SPEED))
> > +
> > +  if (!new_vectype)
> >  return NULL;
> > 
> > +  bool ifn_supported
> > += direct_internal_fn_supported_p (ifn, new_vectype, 
> > OPTIMIZE_FOR_SPEED);
> > +
> >/* The IR requires a valid vector type for the cast result, even though
> >   it's likely to be discarded.  */
> >*type_out = get_vectype_for_scalar_type (vinfo, type);
> >if (!*type_out)
> >  return NULL;
>  >
> > -  /* Generate the IFN_AVG* call.  */
> >tree new_var = vect_recog_temp_ssa_var (new_type, NULL);
> >tree new_ops[2];
> >vect_convert_inputs (last_stmt_info, 2, new_ops, new_type,
> >unprom, new_vectype);
> > +
> > +  if (!ifn_supported)
> 
> I guess this is personal preference, but I'm not sure there's much
> benefit to splitting this out of the "if" into a separate variable.
> 
> > +{
> > +  /* If there is no target support available, generate code
> > +to distribute rshift over plus and add one depending
> > +upon floor or ceil rounding.  */
> 
> Maybe "and add a carry"?  It'd be good to write out the expansion in
> pseudo-code too.
> 
> The transform is only valid on unsigned types.  We still need to
> bail out for signed types because the carry would act in the wrong
> direction for negative results.  E.g.:
> 
>   (-3 + 1) >> 1 == -2
>   (-3 >> 1) == -1,  (1 >> 1) == 0,  (-1 & 1) == 1   total == 0
> 
> Handling that case could be future w

Optimize fibonacci heap allocations

2019-11-20 Thread Jan Hubicka

Hi,
since inliner got faster malloc overhead starts to show up in profiles.
This patch eliminates a lot of malloc/free traffic by putting fibonaci
heaps nodes into pool allocators.

It also fixes a silly issue with constant sized vector not being
allocated on stack (which actually seems most important problem here).

I went with pool_allocator rather than object_allocator because I do not
know how to call non-default constructor which of fibonnaci node which
is used by fibonacci_heap::insert.

By default every heap has its own allocator.  We implement (but do not
use) union_with operation for which I needed to add a way to allocate
multiple heaps in one allocator.

Bootstrapped/regtested x86_64-linux, OK?

Honza

* fibonacci_heap.h (fibonacci_heap::fibonacci_heap):
Add allocator parameter.
(fibonacci_heap::~fibonacci_heap): Optimize destruction.
(fibonacci_heap::m_allocator): New.
(fibonacci_heap::m_own_allocator): New.
(fibonacci_heap::insert): Use allocator.
(fibonacci_heap::extract_min): Likewise.
(fibonacci_heap::union_with): Assert that both heaps share
allocator.
(fibonacci_heap::consolidate): Allocate constant sized vector
on stack.
* fibonacci_heap.c: Include alloc-pool
(test_empty_heap): Initialize allocator.
(test_union): Likewise.
* bb-reorder.c: Include alloc-pool.h.
* tracer.c: Inlclude alloc-pool.h.
Index: fibonacci_heap.h
===
--- fibonacci_heap.h(revision 278464)
+++ fibonacci_heap.h(working copy)
@@ -145,17 +145,36 @@ class fibonacci_heap
   friend class fibonacci_node;
 
 public:
-  /* Default constructor.  */
-  fibonacci_heap (K global_min_key): m_nodes (0), m_min (NULL), m_root (NULL),
-m_global_min_key (global_min_key)
+  /* Default constructor.  ALLOCATOR is optional and is primarily useful
+ when heaps are going to be merged (in that case they need to be allocated
+ in same alloc pool).  */
+  fibonacci_heap (K global_min_key, pool_allocator *allocator = NULL):
+m_nodes (0), m_min (NULL), m_root (NULL),
+m_global_min_key (global_min_key),
+m_allocator (allocator), m_own_allocator (false)
   {
+if (!m_allocator)
+  {
+m_allocator = new pool_allocator ("Fibonacci heap",
+   sizeof (fibonacci_node_t));
+   m_own_allocator = true;
+  }
   }
 
   /* Destructor.  */
   ~fibonacci_heap ()
   {
-while (m_min != NULL)
-  delete (extract_minimum_node ());
+/* Actual memory will be released by the destructor of m_allocator.  */
+if (need_finalization_p () || !m_own_allocator)
+  while (m_min != NULL)
+   {
+  fibonacci_node_t *n = extract_minimum_node ();
+ n->~fibonacci_node_t ();
+ if (!m_own_allocator)
+   m_allocator->remove (n);
+   }
+if (m_own_allocator)
+  delete m_allocator;
   }
 
   /* Insert new node given by KEY and DATA associated with the key.  */
@@ -259,6 +278,11 @@ private:
   fibonacci_node_t *m_root;
   /* Global minimum given in the heap construction.  */
   K m_global_min_key;
+
+  /* Allocator used to hold nodes.  */
+  pool_allocator *m_allocator;
+  /* True if alocator is owned by the current heap only.  */
+  bool m_own_allocator;
 };
 
 /* Remove fibonacci heap node.  */
@@ -333,7 +357,8 @@ fibonacci_node*
 fibonacci_heap::insert (K key, V *data)
 {
   /* Create the new node.  */
-  fibonacci_node *node = new fibonacci_node_t (key, data);
+  fibonacci_node *node = new (m_allocator->allocate ())
+ fibonacci_node_t (key, data);
 
   return insert_node (node);
 }
@@ -438,7 +463,10 @@ fibonacci_heap::extract_min (bool r
   ret = z->m_data;
 
   if (release)
-delete (z);
+   {
+ z->~fibonacci_node_t ();
+  m_allocator->remove (z);
+   }
 }
 
   return ret;
@@ -474,6 +502,9 @@ fibonacci_heap::union_with (fibonac
 
   fibonacci_node *a_root, *b_root;
 
+  /* Both heaps must share allocator.  */
+  gcc_checking_assert (m_allocator == heapb->m_allocator);
+
   /* If one of the heaps is empty, the union is just the other heap.  */
   if ((a_root = heapa->m_root) == NULL)
 {
@@ -616,8 +647,8 @@ fibonacci_heap::remove_root (fibona
 template
 void fibonacci_heap::consolidate ()
 {
-  int D = 1 + 8 * sizeof (long);
-  auto_vec *> a (D);
+  const int D = 1 + 8 * sizeof (long);
+  auto_vec *, D> a;
   a.safe_grow_cleared (D);
   fibonacci_node *w, *x, *y;
   int i, d;
Index: fibonacci_heap.c
===
--- fibonacci_heap.c(revision 278464)
+++ fibonacci_heap.c(working copy)
@@ -21,6 +21,7 @@ along with GCC; see the file COPYING3.
 #include "config.h"
 #include "system.h"
 #include "coretypes.h"
+#include "alloc-pool.h"
 #include "fibonacci_heap.h"
 #include "selftest.h"
 
@@ -38,13 +39,14 @@

Re: Optimize fibonacci heap allocations

2019-11-20 Thread Richard Biener

On Wed, 20 Nov 2019, Jan Hubicka wrote:

> Hi,
> since inliner got faster malloc overhead starts to show up in profiles.
> This patch eliminates a lot of malloc/free traffic by putting fibonaci
> heaps nodes into pool allocators.
> 
> It also fixes a silly issue with constant sized vector not being
> allocated on stack (which actually seems most important problem here).
> 
> I went with pool_allocator rather than object_allocator because I do not
> know how to call non-default constructor which of fibonnaci node which
> is used by fibonacci_heap::insert.
> 
> By default every heap has its own allocator.  We implement (but do not
> use) union_with operation for which I needed to add a way to allocate
> multiple heaps in one allocator.
> 
> Bootstrapped/regtested x86_64-linux, OK?
> 
> Honza
> 
>   * fibonacci_heap.h (fibonacci_heap::fibonacci_heap):
>   Add allocator parameter.
>   (fibonacci_heap::~fibonacci_heap): Optimize destruction.
>   (fibonacci_heap::m_allocator): New.
>   (fibonacci_heap::m_own_allocator): New.
>   (fibonacci_heap::insert): Use allocator.
>   (fibonacci_heap::extract_min): Likewise.
>   (fibonacci_heap::union_with): Assert that both heaps share
>   allocator.
>   (fibonacci_heap::consolidate): Allocate constant sized vector
>   on stack.
>   * fibonacci_heap.c: Include alloc-pool
>   (test_empty_heap): Initialize allocator.
>   (test_union): Likewise.
>   * bb-reorder.c: Include alloc-pool.h.
>   * tracer.c: Inlclude alloc-pool.h.
> Index: fibonacci_heap.h
> ===
> --- fibonacci_heap.h  (revision 278464)
> +++ fibonacci_heap.h  (working copy)
> @@ -145,17 +145,36 @@ class fibonacci_heap
>friend class fibonacci_node;
>  
>  public:
> -  /* Default constructor.  */
> -  fibonacci_heap (K global_min_key): m_nodes (0), m_min (NULL), m_root 
> (NULL),
> -m_global_min_key (global_min_key)
> +  /* Default constructor.  ALLOCATOR is optional and is primarily useful
> + when heaps are going to be merged (in that case they need to be 
> allocated
> + in same alloc pool).  */
> +  fibonacci_heap (K global_min_key, pool_allocator *allocator = NULL):
> +m_nodes (0), m_min (NULL), m_root (NULL),
> +m_global_min_key (global_min_key),
> +m_allocator (allocator), m_own_allocator (false)
>{
> +if (!m_allocator)
> +  {
> +m_allocator = new pool_allocator ("Fibonacci heap",
> + sizeof (fibonacci_node_t));
> + m_own_allocator = true;
> +  }
>}
>  
>/* Destructor.  */
>~fibonacci_heap ()
>{
> -while (m_min != NULL)
> -  delete (extract_minimum_node ());
> +/* Actual memory will be released by the destructor of m_allocator.  */
> +if (need_finalization_p () || !m_own_allocator)
> +  while (m_min != NULL)
> + {
> +  fibonacci_node_t *n = extract_minimum_node ();
> +   n->~fibonacci_node_t ();
> +   if (!m_own_allocator)
> + m_allocator->remove (n);
> + }
> +if (m_own_allocator)
> +  delete m_allocator;
>}
>  
>/* Insert new node given by KEY and DATA associated with the key.  */
> @@ -259,6 +278,11 @@ private:
>fibonacci_node_t *m_root;
>/* Global minimum given in the heap construction.  */
>K m_global_min_key;
> +
> +  /* Allocator used to hold nodes.  */
> +  pool_allocator *m_allocator;
> +  /* True if alocator is owned by the current heap only.  */
> +  bool m_own_allocator;
>  };
>  
>  /* Remove fibonacci heap node.  */
> @@ -333,7 +357,8 @@ fibonacci_node*
>  fibonacci_heap::insert (K key, V *data)
>  {
>/* Create the new node.  */
> -  fibonacci_node *node = new fibonacci_node_t (key, data);
> +  fibonacci_node *node = new (m_allocator->allocate ())
> +   fibonacci_node_t (key, data);
>  
>return insert_node (node);
>  }
> @@ -438,7 +463,10 @@ fibonacci_heap::extract_min (bool r
>ret = z->m_data;
>  
>if (release)
> -delete (z);
> + {
> +   z->~fibonacci_node_t ();
> +  m_allocator->remove (z);
> + }

tab/space mixup here

>  }
>  
>return ret;
> @@ -474,6 +502,9 @@ fibonacci_heap::union_with (fibonac
>  
>fibonacci_node *a_root, *b_root;
>  
> +  /* Both heaps must share allocator.  */
> +  gcc_checking_assert (m_allocator == heapb->m_allocator);
> +
>/* If one of the heaps is empty, the union is just the other heap.  */
>if ((a_root = heapa->m_root) == NULL)
>  {
> @@ -616,8 +647,8 @@ fibonacci_heap::remove_root (fibona
>  template
>  void fibonacci_heap::consolidate ()
>  {
> -  int D = 1 + 8 * sizeof (long);
> -  auto_vec *> a (D);
> +  const int D = 1 + 8 * sizeof (long);
> +  auto_vec *, D> a;
>a.safe_grow_cleared (D);

this can now use a.quick_grow_cleared (D)

Ok with those changes.

Thanks,
Richard.

>fibonacci_node *w, *x, *y;
>int i, d;
> Index: fibonacci_heap

Stack allocate DFS::scc_stack and DFS::worklist

2019-11-20 Thread Jan Hubicka

Hi,
another common (ab)use of malloc/free is the DEF worklist and stack
used in streaming out.  Since most of SCCs are small, we could easily
use stack for them in majority of time.

Bootstrapped/regtested x86_64-linux, OK?

* lto-streamer-out.c (DFS::sccstack): Turn into auto-vec;
preallocate for 32 entries.
(DFS::worklist_vec): Likewise.
(DFS::DFS): Do not initialize sccstack and worklist.
(DFS::~DFS): Do not release sccstack.

Index: lto-streamer-out.c
===
--- lto-streamer-out.c  (revision 278464)
+++ lto-streamer-out.c  (working copy)
@@ -514,7 +514,7 @@ public:
 tree t;
 hashval_t hash;
   };
-  vec sccstack;
+  auto_vec sccstack;
 
 private:
   struct sccs
@@ -544,7 +544,7 @@ private:
bool ref_p, bool this_ref_p);
 
   hash_map sccstate;
-  vec worklist_vec;
+  auto_vec worklist_vec;
   struct obstack sccstate_obstack;
 };
 
@@ -558,9 +558,7 @@ DFS::DFS (struct output_block *ob, tree
  bool single_p)
 {
   unsigned int next_dfs_num = 1;
-  sccstack.create (0);
   gcc_obstack_init (&sccstate_obstack);
-  worklist_vec = vNULL;
   DFS_write_tree (ob, NULL, expr, ref_p, this_ref_p);
   while (!worklist_vec.is_empty ())
 {
@@ -735,12 +733,10 @@ DFS::DFS (struct output_block *ob, tree
from_state->low = MIN (cstate->dfsnum, from_state->low);
   worklist_vec.pop ();
 }
-  worklist_vec.release ();
 }
 
 DFS::~DFS ()
 {
-  sccstack.release ();
   obstack_free (&sccstate_obstack, NULL);
 }

Re: Record leader nodes in the SLP graph (PR92516)

2019-11-20 Thread Richard Sandiford

Richard Biener  writes:
> On Sat, 16 Nov 2019, Richard Sandiford wrote:
>> If stmt analysis failed for an SLP node, r278246 tried building the
>> node from scalars instead.  By that point we've already processed child
>> nodes (usually successfully) and might have made some of them the lead
>> nodes for their stmt list.  This is why we couldn't free the child nodes
>> when deciding to build from scalars:
>> 
>>   /* Don't remove and free the child nodes here, since they could be
>>  referenced by other structures.  The analysis and scheduling phases
>>  (need to) ignore child nodes of anything that isn't vect_internal_def.  
>> */
>> 
>> The problem in this PR is that we (correctly) don't process the unused
>> child nodes again during the scheduling phase, which means that those
>> nodes won't become the leader again.  So some other node with same stmts
>> becomes the leader instead.  However, any internal-def child nodes of
>> this new leader won't have been processed during the analysis phase,
>> because we also (correctly) skip child nodes of non-leader nodes.
>> 
>> We talked on IRC about fixing this by sharing nodes between SLP
>> instances, so that it becomes a "proper" graph.  But that seems
>> like it could throw up some problems too, so I wanted to fix the
>> PR in a less invasive way first.
>> 
>> This patch therefore records the leader nodes during the analysis
>> phase and reuses that choice during scheduling.  When scheduling
>> a non-leader node, we first try to schedule the leader node and
>> then reuse its vector stmts.  At the moment, doing this means we
>> need to know the leader node's SLP instance, so the patch records
>> that in the SLP node too.
>> 
>> While there, I noticed that the two-operation handling returned
>> early without restoring the original def types.  That's really
>> an independent bug fix.
>
> Can you split that out please?

Sure, done as below.  Tested on aarch64-linux-gnu and x86_64-linux-gnu.
OK to install?

> For the rest I prefer the following which I am now fully testing
> on x86_64-unknown-linux-gnu (vect.exp testing went fine).  As
> discussed on IRC this makes the SLP graph nodes shared between
> SLP instances (which it has in fact been, just "virtual" via
> all the leader computations).  So SLP instances are now just
> the collection of entries into the directed SLP graph.

Looks great.  Obviously I chickened out too early :-)

Thanks,
Richard


2019-11-19  Richard Sandiford  

gcc/
* tree-vect-slp.c (vect_schedule_slp_instance): Restore stmt
def types for two-operation SLP.

Index: gcc/tree-vect-slp.c
===
--- gcc/tree-vect-slp.c 2019-11-20 09:48:57.145101295 +
+++ gcc/tree-vect-slp.c 2019-11-20 12:11:56.818216093 +
@@ -4165,6 +4165,7 @@ vect_schedule_slp_instance (slp_tree nod
 
   /* Handle two-operation SLP nodes by vectorizing the group with
  both operations and then performing a merge.  */
+  bool done_p = false;
   if (SLP_TREE_TWO_OPERATORS (node))
 {
   gassign *stmt = as_a  (stmt_info->stmt);
@@ -4235,10 +4236,11 @@ vect_schedule_slp_instance (slp_tree nod
}
  v0.release ();
  v1.release ();
- return;
+ done_p = true;
}
 }
-  vect_transform_stmt (stmt_info, &si, node, instance);
+  if (!done_p)
+vect_transform_stmt (stmt_info, &si, node, instance);
 
   /* Restore stmt def-types.  */
   FOR_EACH_VEC_ELT (SLP_TREE_CHILDREN (node), i, child)

Add more markup to vect-alias-check-{1,18}.c (PR 92543)

2019-11-20 Thread Richard Sandiford

In vect-alias-check-1.c we unroll the inner loop and then vectorise
the stores at a[c + 1][b].  Since the access has no guaranteed
alignemnt, we need a realignment mechanism or support for unaligned
accesses in order to vectorise.

In vect-alias-check-18.c we use a reverse access and so need
permute support in order to vectorise.

I'm not really sure when this part of the testsuite prefers
{ xfail { ! foo } } and when it prefers { target foo }.  xfail
seems like the most common choice for the alignment restriction,
whereas vect_int and vect_perm are mostly dg-require-effective-target
style features, so I went with that combination.

Tested on aarch64-linux-gnu, x86_64-linux-gnu, powerpc64-linux-gnu
(Power 7) and sparc-sun-solaris2.11.  OK to install?

Richard


2019-11-20  Richard Sandiford  

gcc/testsuite/
PR testsuite/92543
* gcc.dg/vect/vect-alias-check-1.c: XFAIL the alias check message
if there is no realignment support and no support for unaligned
accesses.
* gcc.dg/vect/vect-alias-check-18.c: Restrict the test for the
alias message to targets that have permute support.

Index: gcc/testsuite/gcc.dg/vect/vect-alias-check-1.c
===
--- gcc/testsuite/gcc.dg/vect/vect-alias-check-1.c  2019-11-19 
16:25:49.0 +
+++ gcc/testsuite/gcc.dg/vect/vect-alias-check-1.c  2019-11-20 
12:35:16.140748524 +
@@ -15,5 +15,5 @@ fn1 ()
 }
 
 /* { dg-final { scan-tree-dump "improved number of alias checks from \[0-9\]* 
to 1" "vect" } } */
-/* { dg-final { scan-tree-dump "using an address-based overlap test" "vect" } 
} */
+/* { dg-final { scan-tree-dump "using an address-based overlap test" "vect" { 
xfail { vect_no_align && { ! vect_hw_misalign } } } } } */
 /* { dg-final { scan-tree-dump-not "using an index-based" "vect" } } */
Index: gcc/testsuite/gcc.dg/vect/vect-alias-check-18.c
===
--- gcc/testsuite/gcc.dg/vect/vect-alias-check-18.c 2019-11-19 
16:25:49.0 +
+++ gcc/testsuite/gcc.dg/vect/vect-alias-check-18.c 2019-11-20 
12:35:16.140748524 +
@@ -60,5 +60,5 @@ main (void)
 }
 
 /* { dg-final { scan-tree-dump {flags: *WAR\n} "vect" { target vect_int } } } 
*/
-/* { dg-final { scan-tree-dump "using an index-based WAR/WAW test" "vect" } } 
*/
+/* { dg-final { scan-tree-dump "using an index-based WAR/WAW test" "vect" { 
target { vect_int && vect_perm } } } } */
 /* { dg-final { scan-tree-dump-not "using an address-based" "vect" } } */

Adjust expected output for bb-slp-21.c (PR 92527)

2019-11-20 Thread Richard Sandiford

After r278246, we can try building the out[] store value from scalars
if the target has no multiplication support.  That's not necessarily
a good thing, but like most of vect/, this test is run with the cost
model disabled.

Tested on aarch64-linux-gnu, x86_64-linux-gnu, powerpc64-linux-gnu
(Power 7) and sparc-sun-solaris2.11.  OK to install?

Richard


2019-11-20  Richard Sandiford  

gcc/testsuite/
PR testsuite/92527
* gcc.dg/vect/bb-slp-21.c: Expect both SLP groups to be vectorized,
regardless of whether the target supports multiplication.

Index: gcc/testsuite/gcc.dg/vect/bb-slp-21.c
===
--- gcc/testsuite/gcc.dg/vect/bb-slp-21.c   2019-11-20 12:46:19.0 
+
+++ gcc/testsuite/gcc.dg/vect/bb-slp-21.c   2019-11-20 12:46:19.940266697 
+
@@ -64,6 +64,5 @@ int main (void)
 }
 
 /* { dg-final { scan-tree-dump-times "basic block vectorized" 1 "slp2"  } } */
-/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 1 "slp2" { 
target { ! {vect_int_mult } } } } } */
-/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 2 "slp2" { 
target vect_int_mult } } } */
+/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 2 "slp2" } 
} */

Re: RFA; patch to fix PR90007

2019-11-20 Thread Alexander Monakov

On Wed, 20 Nov 2019, Richard Biener wrote:

> On Tue, Nov 19, 2019 at 5:07 PM Vladimir Makarov  wrote:
> >
> > The following patch fixes
> >
> > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90007
> >
> > Sometime ago a code which permits LRA to reload hard register into
> > memory as it did for pseudo were added.  But this LRA possibility was
> > not reflected in recog.c.  The following patch fixes this discrepancy
> > and as a result fixes PR90007.
> >
> > OK to commit?
> 
> I guess the change is OK but for the bug itself it sounds like
> selective scheduling doesn't properly recognize insns it
> proagates into (and avoids doing that then)?  That is,
> selective scheduling creates invalid RTL?

We validate the substitution by invoking validate_replace_rtx_part_nosimplify
from substitute_reg_in_expr.  I think that should be sufficient?  I see similar
code in haifa-sched uses attempt_change, which also ultimately uses
apply_change_group.

FWIW, here's gdb session transcript showing that the substituted insn is
greenlighted:

Breakpoint 1, substitute_reg_in_expr (expr=0x248ee98, insn=0x7791b880, 
undo=false) at /home/monoid/gcc-vecdoc/gcc/sel-sched.c:743
743   bool has_rhs = VINSN_RHS (*vi) != NULL;
(gdb) call debug_expr(expr)
[((17;r95=flt(r98);)type:set;count:3;)prio:8;orig_bb:2;]
(gdb) call debug_insn(insn)
([((35;r98=r8;)type:use;count:1;)prio:9;orig_bb:2;];seqno:2;)
(gdb) b validate_replace_rtx_part_nosimplify
Breakpoint 2 at 0xcc1cfb: file /home/monoid/gcc-vecdoc/gcc/recog.c, line 835.
(gdb) c
Continuing.

Breakpoint 2, validate_replace_rtx_part_nosimplify 
(from=from@entry=0x77aa2588, to=to@entry=0x77a9acd8, 
where=0x77aa2d00, insn=insn@entry=0x7791b940) at 
/home/monoid/gcc-vecdoc/gcc/recog.c:835
835   validate_replace_rtx_1 (where, from, to, insn, false);
(gdb) call debug_rtx(from)
(reg:SI 98)
(gdb) call debug_rtx(to)
(reg:SI 36 r8 [ e7 ])
(gdb) call debug_rtx(*where)
(float:DF (reg:SI 98))
(gdb) call debug_rtx(insn)
(insn 39 0 0 (set (reg:DF 95)
(float:DF (reg:SI 98))) 167 {*floatsidf2}
 (expr_list:REG_DEAD (reg:SI 98)
(nil)))
(gdb) fin
Run till exit from #0  validate_replace_rtx_part_nosimplify 
(from=from@entry=0x77aa2588, to=to@entry=0x77a9acd8, 
where=0x77aa2d00, insn=insn@entry=0x7791b940) at 
/home/monoid/gcc-vecdoc/gcc/recog.c:835
substitute_reg_in_expr (expr=0x248ee98, insn=, undo=) at /home/monoid/gcc-vecdoc/gcc/sel-sched.c:783
783   if (new_insn_valid)
Value returned is $1 = 1
(gdb) call debug_rtx(new_insn)
(insn 39 0 0 (set (reg:DF 95)
(float:DF (reg:SI 36 r8 [ e7 ]))) 167 {*floatsidf2}
 (expr_list:REG_DEAD (reg:SI 98)
(nil)))

Alexander

[Patch][OpenMP/OpenACC/Fortran] Fix mapping of optional (present|absent) arguments

2019-11-20 Thread Tobias Burnus

This patch does two things regarding explicit and automatical variable 
mapping to offloaded devices:


* Fixes bugs with optional arguments, which are present. They were 
mapped but the mapping had issues causing run-time failures.

* It now also handles absent optional arguments.

Compared to the previous patch set,** I added several OpenMP test cases 
– and fixed the fallout.


Except for trivial changes to libgomp/oacc-mem.c and omp-low.c, all 
changes are in fortran/trans-openmp.c and only affect optional arguments.


The patch was bootstrapped and tested on x86_64-gnu-linux w/o 
offloading-support configured and with nvptx offloading.


Tobias

** Included in the attached patch are the following previously posted 
patches: [1] the trivial libgomp/oacc-mem.c change, [2] only the 
remaining single-line change in omp-low.c, [3] the trans-openmp.c 
changes (which had to be modified+extended), and [5] the test cases. 
([2] and [4] are already in GCC 10.) See: 
https://gcc.gnu.org/ml/gcc-patches/2019-07/threads.html#00960 for the 
original patches.


PS: For full OpenMP support, (absent) optional arguments also needed to 
be handled for data-share clauses.


2019-10-20  Tobias Burnus  
	Kwok Cheung Yeung 

	gcc/fortran/
	* trans-openmp.c (gfc_build_conditional_assign, 
	gfc_build_conditional_assign_expr): New static functions.
	(gfc_omp_finish_clause, gfc_trans_omp_clauses): Handle mapping of
	absent optional arguments and fix mapping of present optional args.

	gcc/
	* omp-low.c (lower_omp_target): For optional arguments, deref once
	more to obtain the type.

	libgomp/
	* oacc-mem.c (update_dev_host, gomp_acc_insert_pointer): Just return
	if input it a NULL pointer.
	* testsuite/libgomp.oacc-c-c++-common/lib-43.c: Remove; dependent on
	diagnostic of NULL pointer.
	* testsuite/libgomp.oacc-c-c++-common/lib-47.c: Ditto.
	* testsuite/libgomp.fortran/optional-map.f90: New.
	* testsuite/libgomp.fortran/use_device_addr-1.f90
	(test_dummy_opt_callee_1_absent): New.
	(test_dummy_opt_call_1): Call it.
	* testsuite/libgomp.fortran/use_device_addr-2.f90: Likewise.
	* testsuite/libgomp.fortran/use_device_addr-3.f90: Likewise.
	* testsuite/libgomp.fortran/use_device_addr-4.f90: Likewise.
	* testsuite/libgomp.oacc-fortran/optional-cache.f95: New.
	* testsuite/libgomp.oacc-fortran/optional-data-copyin-by-value.f90: New.
	* testsuite/libgomp.oacc-fortran/optional-data-copyin.f90: New.
	* testsuite/libgomp.oacc-fortran/optional-data-copyout.f90: New.
	* testsuite/libgomp.oacc-fortran/optional-data-enter-exit.f90: New.
	* testsuite/libgomp.oacc-fortran/optional-declare.f90: New.
	* testsuite/libgomp.oacc-fortran/optional-firstprivate.f90: New.
	* testsuite/libgomp.oacc-fortran/optional-host_data.f90: New.
	* testsuite/libgomp.oacc-fortran/optional-nested-calls.f90: New.
	* testsuite/libgomp.oacc-fortran/optional-private.f90: New.
	* testsuite/libgomp.oacc-fortran/optional-reduction.f90: New.
	* testsuite/libgomp.oacc-fortran/optional-update-device.f90: New.
	* testsuite/libgomp.oacc-fortran/optional-update-host.f90: New.

 gcc/fortran/trans-openmp.c   | 224 ---
 gcc/omp-low.c|   3 +-
 libgomp/oacc-mem.c   |   9 ++
 libgomp/testsuite/libgomp.fortran/optional-map.f90   | 119 +
 libgomp/testsuite/libgomp.fortran/use_device_addr-1.f90  |  36 +++
 libgomp/testsuite/libgomp.fortran/use_device_addr-2.f90  |  36 +++
 libgomp/testsuite/libgomp.fortran/use_device_addr-3.f90  |  27 +
 libgomp/testsuite/libgomp.fortran/use_device_addr-4.f90  |  27 +
 libgomp/testsuite/libgomp.oacc-c-c++-common/lib-43.c |  51 -
 libgomp/testsuite/libgomp.oacc-c-c++-common/lib-47.c |  49 -
 libgomp/testsuite/libgomp.oacc-fortran/optional-cache.f95|  23 
 libgomp/testsuite/libgomp.oacc-fortran/optional-data-copyin-by-value.f90 |  29 +
 libgomp/testsuite/libgomp.oacc-fortran/optional-data-copyin.f90  | 140 
 libgomp/testsuite/libgomp.oacc-fortran/optional-data-copyout.f90 |  96 +
 libgomp/testsuite/libgomp.oacc-fortran/optional-data-enter-exit.f90  |  91 
 libgomp/testsuite/libgomp.oacc-fortran/optional-declare.f90  |  87 +++
 libgomp/testsuite/libgomp.oacc-fortran/optional-firstprivate.f90 | 112 
 libgomp/testsuite/libgomp.oacc-fortran/optional-host_data.f90|  39 +++
 libgomp/testsuite/libgomp.oacc-fortran/optional-nested-calls.f90 | 135 +++
 libgomp/testsuite/libgomp.oacc-fortran/optional-private.f90  | 115 
 libgomp/testsuite/libgomp.oacc-fortran/optional-reductio

Re: [C++ coroutines 6/6] Testsuite.

2019-11-20 Thread JunMa


在 2019/11/20 下午7:22, Iain Sandoe 写道:

Hello JunMa,

JunMa  wrote:


在 2019/11/17 下午6:28, Iain Sandoe 写道:
I find that the patches donot support 'for co_await'. And it is
quiet simple to implement range based 'for co_await' based on your
patches, since it's just need few more works on range for source to
source transform. Any reason for that?

If I understand your question correctly,

for example TS n4775, there was:

[stmt.stmt]
   ….
   for co_await ( for-range-declaration : for-range-initializer ) statement

yes?

This was removed by a later committee resolution, and was *not* merged
to the C++20 Working Draft (I am currently working to n4835).

So, the reason it is not implemented in GCC at present, is that it is not clear
exactly what form it might take if introduced in some future proposal for
enhanced coroutines.

hope that answers the question,
thanks
Iain

Hi Iain,
    Thanks, that make sense.

Regards
JunMa

[PATCH] rs6000: Fix UNORDERED without NaNs, for DFP (PR92573)

2019-11-20 Thread Segher Boessenkool

This is the analogue of r278103, but for DFP.

Committing to trunk.


Segher


2019-11-20  Segher Boessenkool  

PR target/92573
* config/rs6000/dfp.md (dfptstsfi__ for DFP_TEST and DDTD):
Handle UNORDERED if !HONOR_NANS.

---
 gcc/config/rs6000/dfp.md | 6 ++
 1 file changed, 6 insertions(+)

diff --git a/gcc/config/rs6000/dfp.md b/gcc/config/rs6000/dfp.md
index 659b3c9..687858c 100644
--- a/gcc/config/rs6000/dfp.md
+++ b/gcc/config/rs6000/dfp.md
@@ -289,6 +289,12 @@ (define_expand "dfptstsfi__"
   ]
   "TARGET_P9_MISC"
 {
+  if ( == UNORDERED && !HONOR_NANS (mode))
+{
+  emit_move_insn (operands[0], const0_rtx);
+  DONE;
+}
+
   operands[3] = gen_reg_rtx (CCFPmode);
 })
 
-- 
1.8.3.1

Re: [PATCH 0/4] Eliminate cc0 from m68k

2019-11-20 Thread Mikael Pettersson

On Mon, Nov 18, 2019 at 9:57 PM Mikael Pettersson  wrote:
>
> On Mon, Nov 18, 2019 at 8:31 PM Bernd Schmidt  wrote:
> >
> > Hi Mikael,
> >
> > > This fixed the problem, thanks.
> >
> > Could you also run the testsuite to see if you can reproduce the
> > g++.old-deja failures Andreas reported?
>
> Yes, but it will probably take another week before the native
> bootstrap (on aranym) and test suite run is finished.  It's currently
> in stage 2.

Failed later in stage 2:

/mnt/scratch/objdir10/./gcc/xgcc -B/mnt/scratch/objdir10/./gcc/
-B/mnt/scratch/install10/m68k-unknown-linux-gnu/bin/
-B/mnt/scratch/install10/m68k-unknown-linux-gnu/lib/ -isystem
/mnt/scratch/install10/m68k-unknown-linux-gnu/include -isystem
/mnt/scratch/install10/m68k-unknown-linux-gnu/sys-include
-fno-checking -g -O2 -O2  -g -O2 -DIN_GCC-W -Wall -Wno-narrowing
-Wwrite-strings -Wcast-qual -Wno-error=format-diag -Wstrict-prototypes
-Wmissing-prototypes -Wno-error=format-diag -Wold-style-definition
-isystem ./include   -fPIC -g -DIN_LIBGCC2 -fbuilding-libgcc
-fno-stack-protector   -fPIC -I. -I. -I../.././gcc
-I/mnt/scratch/gcc-10-20191110/libgcc
-I/mnt/scratch/gcc-10-20191110/libgcc/.
-I/mnt/scratch/gcc-10-20191110/libgcc/../gcc
-I/mnt/scratch/gcc-10-20191110/libgcc/../include  -DHAVE_CC_TLS  -o
_powixf2.o -MT _powixf2.o -MD -MP -MF _powixf2.dep -DL_powixf2 -c
/mnt/scratch/gcc-10-20191110/libgcc/libgcc2.c -fvisibility=hidden
-DHIDE_EXPORTS
/mnt/scratch/gcc-10-20191110/libgcc/libgcc2.c: In function '__powixf2':
/mnt/scratch/gcc-10-20191110/libgcc/libgcc2.c:1888:1: error:
unrecognizable insn:
 1888 | }
  | ^
(insn 116 115 117 13 (parallel [
(set (reg/f:SI 15 %sp)
(plus:SI (reg/f:SI 15 %sp)
(const_int -12 [0xfff4])))
(set (reg:XF 18 %fp2)
(mem/c:XF (reg/f:SI 15 %sp) [3  S12 A8]))
]) "/mnt/scratch/gcc-10-20191110/libgcc/libgcc2.c":1888:1 -1
 (nil))
during RTL pass: cprop_hardreg
/mnt/scratch/gcc-10-20191110/libgcc/libgcc2.c:1888:1: internal
compiler error: in extract_insn, at recog.c:2294
Please submit a full bug report,
with preprocessed source if appropriate.
See  for instructions.
Makefile:500: recipe for target '_powixf2.o' failed
make[3]: *** [_powixf2.o] Error 1
make[3]: Leaving directory '/mnt/scratch/objdir10/m68k-unknown-linux-gnu/libgcc'
Makefile:16905: recipe for target 'all-stage2-target-libgcc' failed
make[2]: *** [all-stage2-target-libgcc] Error 2
make[2]: Leaving directory '/mnt/scratch/objdir10'
Makefile:20204: recipe for target 'stage2-bubble' failed
make[1]: *** [stage2-bubble] Error 2
make[1]: Leaving directory '/mnt/scratch/objdir10'
Makefile:20399: recipe for target 'bootstrap' failed
make: *** [bootstrap] Error 2

I'll try to reduce it to a test case later today.

Re: Stack allocate DFS::scc_stack and DFS::worklist

2019-11-20 Thread Richard Biener

On Wed, 20 Nov 2019, Jan Hubicka wrote:

> Hi,
> another common (ab)use of malloc/free is the DEF worklist and stack
> used in streaming out.  Since most of SCCs are small, we could easily
> use stack for them in majority of time.
> 
> Bootstrapped/regtested x86_64-linux, OK?

OK.

>   * lto-streamer-out.c (DFS::sccstack): Turn into auto-vec;
>   preallocate for 32 entries.
>   (DFS::worklist_vec): Likewise.
>   (DFS::DFS): Do not initialize sccstack and worklist.
>   (DFS::~DFS): Do not release sccstack.
> 
> Index: lto-streamer-out.c
> ===
> --- lto-streamer-out.c(revision 278464)
> +++ lto-streamer-out.c(working copy)
> @@ -514,7 +514,7 @@ public:
>  tree t;
>  hashval_t hash;
>};
> -  vec sccstack;
> +  auto_vec sccstack;
>  
>  private:
>struct sccs
> @@ -544,7 +544,7 @@ private:
>   bool ref_p, bool this_ref_p);
>  
>hash_map sccstate;
> -  vec worklist_vec;
> +  auto_vec worklist_vec;
>struct obstack sccstate_obstack;
>  };
>  
> @@ -558,9 +558,7 @@ DFS::DFS (struct output_block *ob, tree
> bool single_p)
>  {
>unsigned int next_dfs_num = 1;
> -  sccstack.create (0);
>gcc_obstack_init (&sccstate_obstack);
> -  worklist_vec = vNULL;
>DFS_write_tree (ob, NULL, expr, ref_p, this_ref_p);
>while (!worklist_vec.is_empty ())
>  {
> @@ -735,12 +733,10 @@ DFS::DFS (struct output_block *ob, tree
>   from_state->low = MIN (cstate->dfsnum, from_state->low);
>worklist_vec.pop ();
>  }
> -  worklist_vec.release ();
>  }
>  
>  DFS::~DFS ()
>  {
> -  sccstack.release ();
>obstack_free (&sccstate_obstack, NULL);
>  }
>  
> 

-- 
Richard Biener 
SUSE Software Solutions Germany GmbH, Maxfeldstrasse 5, 90409 Nuernberg,
Germany; GF: Felix Imendörffer; HRB 36809 (AG Nuernberg)

Re: Record leader nodes in the SLP graph (PR92516)

2019-11-20 Thread Richard Biener

On Wed, 20 Nov 2019, Richard Sandiford wrote:

> Richard Biener  writes:
> > On Sat, 16 Nov 2019, Richard Sandiford wrote:
> >> If stmt analysis failed for an SLP node, r278246 tried building the
> >> node from scalars instead.  By that point we've already processed child
> >> nodes (usually successfully) and might have made some of them the lead
> >> nodes for their stmt list.  This is why we couldn't free the child nodes
> >> when deciding to build from scalars:
> >> 
> >>   /* Don't remove and free the child nodes here, since they could be
> >>  referenced by other structures.  The analysis and scheduling phases
> >>  (need to) ignore child nodes of anything that isn't 
> >> vect_internal_def.  */
> >> 
> >> The problem in this PR is that we (correctly) don't process the unused
> >> child nodes again during the scheduling phase, which means that those
> >> nodes won't become the leader again.  So some other node with same stmts
> >> becomes the leader instead.  However, any internal-def child nodes of
> >> this new leader won't have been processed during the analysis phase,
> >> because we also (correctly) skip child nodes of non-leader nodes.
> >> 
> >> We talked on IRC about fixing this by sharing nodes between SLP
> >> instances, so that it becomes a "proper" graph.  But that seems
> >> like it could throw up some problems too, so I wanted to fix the
> >> PR in a less invasive way first.
> >> 
> >> This patch therefore records the leader nodes during the analysis
> >> phase and reuses that choice during scheduling.  When scheduling
> >> a non-leader node, we first try to schedule the leader node and
> >> then reuse its vector stmts.  At the moment, doing this means we
> >> need to know the leader node's SLP instance, so the patch records
> >> that in the SLP node too.
> >> 
> >> While there, I noticed that the two-operation handling returned
> >> early without restoring the original def types.  That's really
> >> an independent bug fix.
> >
> > Can you split that out please?
> 
> Sure, done as below.  Tested on aarch64-linux-gnu and x86_64-linux-gnu.
> OK to install?

OK.

Thanks,
Richard.
 
> > For the rest I prefer the following which I am now fully testing
> > on x86_64-unknown-linux-gnu (vect.exp testing went fine).  As
> > discussed on IRC this makes the SLP graph nodes shared between
> > SLP instances (which it has in fact been, just "virtual" via
> > all the leader computations).  So SLP instances are now just
> > the collection of entries into the directed SLP graph.
> 
> Looks great.  Obviously I chickened out too early :-)
> 
> Thanks,
> Richard
> 
> 
> 2019-11-19  Richard Sandiford  
> 
> gcc/
>   * tree-vect-slp.c (vect_schedule_slp_instance): Restore stmt
>   def types for two-operation SLP.
> 
> Index: gcc/tree-vect-slp.c
> ===
> --- gcc/tree-vect-slp.c   2019-11-20 09:48:57.145101295 +
> +++ gcc/tree-vect-slp.c   2019-11-20 12:11:56.818216093 +
> @@ -4165,6 +4165,7 @@ vect_schedule_slp_instance (slp_tree nod
>  
>/* Handle two-operation SLP nodes by vectorizing the group with
>   both operations and then performing a merge.  */
> +  bool done_p = false;
>if (SLP_TREE_TWO_OPERATORS (node))
>  {
>gassign *stmt = as_a  (stmt_info->stmt);
> @@ -4235,10 +4236,11 @@ vect_schedule_slp_instance (slp_tree nod
>   }
> v0.release ();
> v1.release ();
> -   return;
> +   done_p = true;
>   }
>  }
> -  vect_transform_stmt (stmt_info, &si, node, instance);
> +  if (!done_p)
> +vect_transform_stmt (stmt_info, &si, node, instance);
>  
>/* Restore stmt def-types.  */
>FOR_EACH_VEC_ELT (SLP_TREE_CHILDREN (node), i, child)
> 

-- 
Richard Biener 
SUSE Software Solutions Germany GmbH, Maxfeldstrasse 5, 90409 Nuernberg,
Germany; GF: Felix Imendörffer; HRB 36809 (AG Nuernberg)

Re: [PATCH v3] Extend the simd function attribute

2019-11-20 Thread Szabolcs Nagy

On 14/11/2019 20:23, Szabolcs Nagy wrote:
> Sorry v2 had a bug.
> 
> v2: added documentation and tests.
> v3: fixed expand_simd_clones so different isa variants are actually
> generated.
> 
> GCC currently supports two ways to declare the availability of vector
> variants of a scalar function:
> 
>   #pragma omp declare simd
>   void f (void);
> 
> and
> 
>   __attribute__ ((simd))
>   void f (void);
> 
> However these declare a set of symbols that are different simd variants
> of f, so a library either provides definitions for all those symbols or
> it cannot use these declarations. (The set of declared symbols can be
> narrowed down with additional omp clauses, but not enough to allow
> declaring a single symbol. OpenMP 5 has a declare variant feature that
> allows declaring more specific simd variants, but it is complicated and
> still requires gcc or vendor extension for unambiguous declarations.)
> 
> This patch extends the gcc specific simd attribute such that it can
> specify a single vector variant of simple scalar functions (functions
> that only take and return scalar integer or floating type values):
> 
>   __attribute__ ((simd (mask, simdlen, simdabi, name

ping.

so the comments so far

- make all attribute arguments mandatory (e.g don't allow
  simd(mask, simdlen)), this is fine with me if others agree.

- support the linear clause for pointer arguments (sincos).
  this requires listing arguments to which linear applies,
  i would only want to do that if there is a hope that it
  will ever be useful (currently gcc won't vectorize calls
  with pointer arguments, but maybe it should?). i don't know
  of a precedent for "list of integers" used in the attribute
  syntax, so i wonder what's the right way to do it.

- plain simd should have fixed abi for a given architecture:
  aarch64 can of course do this, but if we include sve, then
  libmvec with plain simd attr won't be testable with gcc-10
  since gcc-10 does not support simd attr for sve, so we still
  need the attribute extension to do work on libmvec.

any more comments on supporting linear clause in simd attr?
or if the posted patch is reasonable as is?

> 
> where mask is "inbranch" or "notinbranch" like now, simdlen is an int
> with the same meaning as in omp declare simd and simdabi is a string
> specifying the call ABI (which the intel vector ABI calls ISA). The
> name is optional and allows a library to use a different symbol name
> than what the vector ABI specifies.
> 
> The simd attribute currently can be used for both declarations and
> definitions, in the latter case the simd varaints of the function are
> generated, which should work with the extended simd attribute too.
> 
> Tested on aarch64-linux-gnu and x86_64-linux-gnu.
> 
> gcc/ChangeLog:
> 
> 2019-11-14  Szabolcs Nagy  
> 
>   * cgraph.h (struct cgraph_simd_clone): Add simdname field.
>   * doc/extend.texi: Update the simd attribute documentation.
>   * tree.h (OMP_CLAUSE__SIMDABI__EXPR): Define.
>   (OMP_CLAUSE__SIMDNAME__EXPR): Define.
>   * tree.c (walk_tree_1): Handle new omp clauses.
>   * tree-core.h (enum omp_clause_code): Likewise.
>   * tree-nested.c (convert_nonlocal_omp_clauses): Likewise.
>   * tree-pretty-print.c (dump_omp_clause): Likewise.
>   * omp-low.c (scan_sharing_clauses): Likewise.
>   * omp-simd-clone.c (simd_clone_clauses_extract): Likewise.
>   (simd_clone_mangle): Handle simdname.
>   (expand_simd_clones): Reset vecsize_mangle when generating clones.
>   * config/aarch64/aarch64.c
>   (aarch64_simd_clone_compute_vecsize_and_simdlen): Warn about
>   unsupported SIMD ABI.
>   * config/i386/i386.c
>   (ix86_simd_clone_compute_vecsize_and_simdlen): Likewise.
> 
> gcc/c-family/ChangeLog:
> 
> 2019-11-14  Szabolcs Nagy  
> 
>   * c-attribs.c (handle_simd_attribute): Handle 4 arguments.
> 
> gcc/testsuite/ChangeLog:
> 
> 2019-11-14  Szabolcs Nagy  
> 
>   * c-c++-common/attr-simd-5.c: Update.
>   * c-c++-common/attr-simd-6.c: New test.
>   * c-c++-common/attr-simd-7.c: New test.
>   * c-c++-common/attr-simd-8.c: New test.
>

Re: [PATCH 5/X] [libsanitizer][mid-end] Introduce stack variable handling for HWASAN

2019-11-20 Thread Martin Liška


On 11/7/19 7:37 PM, Matthew Malcomson wrote:

Handling stack variables has three features.

1) Ensure HWASAN required alignment for stack variables

When tagging shadow memory, we need to ensure that each tag granule is
only used by one variable at a time.

This is done by ensuring that each tagged variable is aligned to the tag
granule representation size and also ensure that the end of each
variable as an alignment boundary between the end and the start of any
other data stored on the stack.

This patch ensures that by adding alignment requirements in
`align_local_variable` and forcing all stack variable allocation to be
deferred so that `expand_stack_vars` can ensure the stack pointer is
aligned before allocating any variable for the current frame.

2) Put tags into each stack variable pointer

Make sure that every pointer to a stack variable includes a tag of some
sort on it.

The way tagging works is:
   1) For every new stack frame, a random tag is generated.
   2) A base register is formed from the stack pointer value and this
  random tag.
   3) References to stack variables are now formed with RTL describing an
  offset from this base in both tag and value.

The random tag generation is handled by a backend hook.  This hook
decides whether to introduce a random tag or use the stack background
based on the parameter hwasan-random-frame-tag.  Using the stack
background is necessary for testing and bootstrap.  It is necessary
during bootstrap to avoid breaking the `configure` test program for
determining stack direction.

Using the stack background means that every stack frame has the initial
tag of zero and variables are tagged with incrementing tags from 1,
which also makes debugging a bit easier.

The tag&value offsets are also handled by a backend hook.

This patch also adds some macros defining how the HWASAN shadow memory
is stored and how a tag is stored in a pointer.

3) For each stack variable, tag and untag the shadow stack on function
prologue and epilogue.

On entry to each function we tag the relevant shadow stack region for
each stack variable the tag to match the tag added to each pointer for
that variable.

This is the first patch where we use the HWASAN shadow space, so we need
to add in the libhwasan initialisation code that creates this shadow
memory region into the binary we produce.  This instrumentation is done
in `compile_file`.

When exiting a function we need to ensure the shadow stack for this
function has no remaining tag.  Without clearing the shadow stack area
for this stack frame, later function calls could get false positives
when those later function calls check untagged areas (such as parameters
passed on the stack) against a shadow stack area with left-over tag.

Hence we ensure that the entire stack frame is cleared on function exit.

gcc/ChangeLog:

2019-11-07  Matthew Malcomson  

* asan.c (hwasan_record_base): New function.
(hwasan_emit_untag_frame): New.
(hwasan_increment_tag): New function.
(hwasan_with_tag): New function.
(hwasan_tag_init): New function.
(initialize_sanitizer_builtins): Define new builtins.
(ATTR_NOTHROW_LIST): New macro.
(hwasan_current_tag): New.
(hwasan_emit_prologue): New.
(hwasan_create_untagged_base): New.
(hwasan_finish_file): New.
(hwasan_sanitize_stack_p): New.
(memory_tagging_p): New.
* asan.h (hwasan_record_base): New declaration.
(hwasan_emit_untag_frame): New.
(hwasan_increment_tag): New declaration.
(hwasan_with_tag): New declaration.
(hwasan_sanitize_stack_p): New declaration.
(hwasan_tag_init): New declaration.
(memory_tagging_p): New declaration.
(HWASAN_TAG_SIZE): New macro.
(HWASAN_TAG_GRANULE_SIZE):New macro.
(HWASAN_SHIFT):New macro.
(HWASAN_SHIFT_RTX):New macro.
(HWASAN_STACK_BACKGROUND):New macro.
(hwasan_finish_file): New.
(hwasan_current_tag): New.
(hwasan_create_untagged_base): New.
(hwasan_emit_prologue): New.
* cfgexpand.c (struct stack_vars_data): Add information to
record hwasan variable stack offsets.
(expand_stack_vars): Ensure variables are offset from a tagged
base. Record offsets for hwasan. Ensure alignment.
(expand_used_vars): Call function to emit prologue, and get
untagging instructions for function exit.
(align_local_variable): Ensure alignment.
(defer_stack_allocation): Ensure all variables are deferred so
they can be handled by `expand_stack_vars`.
(expand_one_stack_var_at): Account for tags in
variables when using HWASAN.
(expand_one_stack_var_1): Pass new argument to
expand_one_stack_var_at.
(init_vars_expansion): Initialise hwasan internal variables when
starting variable expansion.
* doc/tm.texi (TARGET_MEMTAG_GENTAG): Document.

Re: [PATCH, rs6000] Refactor FP vector comparison operators

2019-11-20 Thread Segher Boessenkool

Hi!

On Wed, Nov 20, 2019 at 03:31:36PM +0800, Kewen.Lin wrote:
> >> +(define_expand "vector_ge"
> >> +  [(set (match_operand:VEC_F 0 "vlogical_operand")
> >> +  (ge:VEC_F (match_operand:VEC_F 1 "vlogical_operand")
> >> +(match_operand:VEC_F 2 "vlogical_operand")))]
> >>"VECTOR_UNIT_ALTIVEC_OR_VSX_P (mode)"
> >> +  "")
> > 
> > This already exists?  Line 764 in vector.md?
> > 
> 
> Yes, that one is only for VEC_F (aka. vector FP), move to here to centralize
> them.  Does it make sense?

Ah, I see.  Yeah, that makes sense.

> > Does this work for (say) ungt?  I would do two switch statements: the
> > first only to do the invert, and then the second to do the swap (with
> > the modified cond!)  So:
> 
> Yes, it works but it has to call further splitting for "le".  It looks
> better to split to the end at once.  Thanks a lot for your example!

Yes, exactly.  The code is simpler in this case, and more importantly,
it is clearer that the code the splitter outputs is somewhat optimsed.
Output from splitters does not get most basic RTL optimisations anymore,
the splitters run after those!

> >   if (cond == UNLE || cond == UNLT || cond == NE
> >   || cond == UNGE || cond == UNGT)
> > {
> >   cond = reverse_condition_maybe_unordered (cond);
> >   need_invert = true;
> > }
> > 
> >   if (cond == LT || cond == LE)
> > {
> >   cond = swap_condition (cond);
> >   std::swap (operands[1], operands[2]);
> > }
> 
> I added one assert here to guard the codes.

Sure, if you aren't certain no other codes can happen, or you expect
things to ever change, etc.

> > Also, can_create_pseudo in the split condition can fail, in theory
> > anyway.  It should be part of the insn condition, instead, and even
> > then it can fail: after the last split pass, but before reload. such
> > an insn can in principle be created, and then it sill be never split.
> > 
> > In this case, maybe you should add a scratch register; in the previous
> > case, you can reuse operands[0] for res instead of making a new reg
> > for it, if !can_create_pseudo ().
> 
> I've updated the previous case with operands[0] with !can_create_pseudo
> check.  But for the later one I met ICE if I added two clobbers with
> scratch into the RTL pattern like:
> 
>   [(set (match_operand:VEC_F 0 "vfloat_operand")
>(vector_fp_extra_comparison_operator:VEC_F
>(match_operand:VEC_F 1 "vfloat_operand")
>(match_operand:VEC_F 2 "vfloat_operand")))
>(clobber (match_scratch:VEC_F 3))
>(clobber (match_scratch:VEC_F 4))]
> 
> Some pattern without clobber as below can't be recognized (early in vregs).
> 
> (insn 24 23 25 4 (set (reg:V4SF 142)
> (uneq:V4SF (reg:V4SF 141 [ vect__4.11 ])
> (reg:V4SF 127 [ vect_cst__47 ]))) -1
>  (nil))

Yeah.  Just doing can_create_pseudo in the insn condition (and in the
split condition, via &&) will work -- there just is this window of
failure you should be aware of, and we need to do something about that,
eventually.  It's a very general problem.  There are many places in
rs6000 where we already do evil things with this (and some much worse
than this even), so one more won't hurt too much.

It does need can_create_pseudo in the split condition then though.

> >   rtx comp1 = gen_rtx_fmt_ee (cond, mode, operands[1], operands[2]);
> >   rtx res1 = gen_reg_rtx (mode);
> >   emit_insn (gen_rtx_SET (res1, comp1));
> >   rtx comp2 = gen_rtx_fmt_ee (cond, mode, operands[2], operands[1]);
> >   rtx res2 = gen_reg_rtx (mode);
> >   emit_insn (gen_rtx_SET (res2, comp2));
> > 
> >   if (need_invert)
> > {
> >   rtx comp3 = gen_rtx_fmt_ee (IOR, mode, res1, res2);
> >   rtx comp4 = gen_rtx_fmt_e (NOT, mode, comp3);
> 
> I had to add one more gen_rtx_SET for the IOR here, otherwise it gets the 
> error
> on the pattern can't be recognized.

Oh, sorry.  The canonical way of writing a NOR is as an ANDCC, so this
should be

  rtx not1 = gen_rtx_fmt_e (NOT, mode, res1);
  rtx not2 = gen_rtx_fmt_e (NOT, mode, res2);
  rtx comp3 = gen_rtx_fmt_ee (AND, mode, not1, not2);
  emit_insn (gen_rtx_SET (operands[0], comp3));

> diff --git a/gcc/config/rs6000/vector.md b/gcc/config/rs6000/vector.md
> index b132037..deeab9f 100644
> --- a/gcc/config/rs6000/vector.md
> +++ b/gcc/config/rs6000/vector.md
> @@ -107,6 +107,12 @@
>(smin "smin")
>(smax "smax")])
>  
> +;; code iterators and attributes for vector FP comparison operators:
> +(define_code_iterator
> + vector_fp_comparison_operator [lt le ne ungt unge unlt unle])

Can you change this name so it is clear it is just the simple ones?


Segher

Re: [PATCH 6/X] [libsanitizer] Add hwasan pass and associated gimple changes

2019-11-20 Thread Martin Liška


On 11/7/19 7:37 PM, Matthew Malcomson wrote:

There are four main features to this change:

1) Check pointer tags match address tags.

In the new `hwasan` pass we put HWASAN_CHECK internal functions around
all memory accesses, to check that tags in the pointer being used match
the tag stored in shadow memory for the memory region being used.

These internal functions are expanded into actual checks in the sanopt
pass that happens just before expansion into RTL.

We use the same mechanism that currently inserts ASAN_CHECK internal
functions to insert the new HWASAN_CHECK functions.

2) Instrument known builtin function calls.

Handle all builtin functions that we know use memory accesses.
This commit uses the machinery added for ASAN to identify builtin
functions that access memory.

The main differences between the approaches for HWASAN and ASAN are:
  - libhwasan intercepts much less builtin functions.
  - Alloca needs to be transformed differently (instead of adding
redzones it needs to tag shadow memory and return a tagged pointer).
  - stack_restore needs to untag the shadow stack between the current
position and where it's going.
  - `noreturn` functions can not be handled by simply unpoisoning the
entire shadow stack -- there is no "always valid" tag.
(exceptions and things such as longjmp need to be handled in a
different way).

For hardware implemented checking (such as AArch64's memory tagging
extension) alloca and stack_restore will need to be handled by hooks in
the backend rather than transformation at the gimple level.  This will
allow architecture specific handling of such stack modifications.

3) Introduce HWASAN block-scope poisoning

Here we use exactly the same mechanism as ASAN_MARK to poison/unpoison
variables on entry/exit of a block.

In order to simply use the exact same machinery we're using the same
internal functions until the SANOPT pass.  This means that all handling
of ASAN_MARK is the same.
This has the negative that the naming may be a little confusing, but a
positive that handling of the internal function doesn't have to be
duplicated for a function that behaves exactly the same but has a
different name.

gcc/ChangeLog:

2019-11-07  Matthew Malcomson  

* asan.c (handle_builtin_stack_restore): Account for HWASAN.
(handle_builtin_alloca): Account for HWASAN.
(get_mem_refs_of_builtin_call): Special case strlen for HWASAN.
(report_error_func): Assert not HWASAN.
(build_check_stmt): Make HWASAN_CHECK instead of ASAN_CHECK.
(instrument_derefs): HWASAN does not tag globals.
(maybe_instrument_call): Don't instrument `noreturn` functions.
(initialize_sanitizer_builtins): Add new type.
(asan_expand_mark_ifn): Account for HWASAN.
(asan_expand_check_ifn): Assert never called by HWASAN.
(asan_expand_poison_ifn): Account for HWASAN.
(hwasan_instrument): New.
(hwasan_base): New.
(hwasan_emit_untag_frame): Free block-scope-var hash map.
(hwasan_check_func): New.
(hwasan_expand_check_ifn): New.
(hwasan_expand_mark_ifn): New.
(gate_hwasan): New.
(class pass_hwasan): New.
(make_pass_hwasan): New.
(class pass_hwasan_O0): New.
(make_pass_hwasan_O0): New.
* asan.h (hwasan_base): New decl.
(hwasan_expand_check_ifn): New decl.
(hwasan_expand_mark_ifn): New decl.
(gate_hwasan): New decl.
(enum hwasan_mark_flags): New.
(asan_intercepted_p): Always false for hwasan.
(asan_sanitize_use_after_scope): Account for HWASAN.
* builtin-types.def (BT_FN_PTR_CONST_PTR_UINT8): New.
* gimple-pretty-print.c (dump_gimple_call_args): Account for
HWASAN.
* gimplify.c (asan_poison_variable): Account for HWASAN.
(gimplify_function_tree): Remove requirement of
SANITIZE_ADDRESS, requiring asan or hwasan is accounted for in
`asan_sanitize_use_after_scope`.
* internal-fn.c (expand_HWASAN_CHECK): New.
(expand_HWASAN_CHOOSE_TAG): New.
(expand_HWASAN_MARK): New.
* internal-fn.def (HWASAN_CHOOSE_TAG): New.
(HWASAN_CHECK): New.
(HWASAN_MARK): New.
* passes.def: Add hwasan and hwasan_O0 passes.
* sanitizer.def (BUILT_IN_HWASAN_LOAD1): New.
(BUILT_IN_HWASAN_LOAD2): New.
(BUILT_IN_HWASAN_LOAD4): New.
(BUILT_IN_HWASAN_LOAD8): New.
(BUILT_IN_HWASAN_LOAD16): New.
(BUILT_IN_HWASAN_LOADN): New.
(BUILT_IN_HWASAN_STORE1): New.
(BUILT_IN_HWASAN_STORE2): New.
(BUILT_IN_HWASAN_STORE4): New.
(BUILT_IN_HWASAN_STORE8): New.
(BUILT_IN_HWASAN_STORE16): New.
(BUILT_IN_HWASAN_STOREN): New.
(BUILT_IN_HWASAN_LOAD1_NOABORT): New.
(BUILT_IN_HWASAN_LOAD2_NOABORT): New.
(BUILT_IN_HWASAN_LOAD4_NOABORT): New.
(BUILT_IN_HWASAN_LOAD8_NOABORT): New.
(BUILT_IN_HWASAN

Re: [PATCH 7/X] [libsanitizer] Add tests

2019-11-20 Thread Martin Liška


On 11/7/19 7:37 PM, Matthew Malcomson wrote:

Adding hwasan tests.

Frankly, these could be tidied up a little.
I will be tidying them up while getting feedback on the hwasan introduction.


Which is file, however I would consider adding some dynamic allocation tests.
One can easily inspire either in GCC for ASAN, or here:
https://github.com/llvm/llvm-project/tree/master/compiler-rt/test/hwasan

Martin

Re: [PATCH 0/4] Eliminate cc0 from m68k

2019-11-20 Thread Bernd Schmidt

On 11/20/19 2:50 PM, Mikael Pettersson wrote:
> On Mon, Nov 18, 2019 at 9:57 PM Mikael Pettersson  
> wrote:
>>
>> On Mon, Nov 18, 2019 at 8:31 PM Bernd Schmidt  wrote:
>>>
>>> Hi Mikael,
>>>
 This fixed the problem, thanks.
>>>
>>> Could you also run the testsuite to see if you can reproduce the
>>> g++.old-deja failures Andreas reported?
>>
>> Yes, but it will probably take another week before the native
>> bootstrap (on aranym) and test suite run is finished.  It's currently
>> in stage 2.

Ugh, that suggests the stage2 compiler was miscompiled. That would be
nasty to track down.

Probably best to just run tests on stage1 and hope something shows up.

What distro are you using for native builds? The m68k debian I'm using
does not have an installable gcc package.

Bernd

Re: v2 [PATCH 0/X] Introduce HWASAN sanitizer to GCC

2019-11-20 Thread Martin Liška


On 11/7/19 7:37 PM, Matthew Malcomson wrote:

I have rebased this series onto Martin Liska's patches that take the most
recent libhwasan from upstream LLVM.
https://gcc.gnu.org/ml/gcc-patches/2019-11/msg00340.html

I've also cleared up some nomenclature (I had previously used the word 'colour'
a few times instead of the word 'tag' and that clashes with other descriptions)
and based the patch series off a more recent GCC revision (r277678).

There's an ongoing discussion on whether to have __SANITIZER_ADDRESS__, or
__SANITIZER_HWADDRESS__, but I'm keeping that discussion to the existing
thread.

Similarly there's still the question around C++ exceptions that I'm keeping to
the existing thread (on the first patch series).


NOTE:
   Unfortunately, there's a bug in the more recent version of GCC I rebased
   onto.
   Hwasan catches this when bootstrapping, which means bootstrapping with hwasan
   fails.
   I'm working on tracking the bug down now, but sending this series upstream
   for visibility while that happens.

   Bugzilla link:
   https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92410

Entire patch series attached to cover letter.=



Hello.

Thank you very much for the patch set, I've just wrote some inline replies
and I have also some questions/suggestions that I'll summarize here. One caveat,
I'm not maintainer in any of the ideas and I must confirm that I haven't made
a deep review of the part which relates to new RTL hooks and stack expansion.
I expect Jakub can help us here.

Questions/notes:
a) For ASAN we do have bunch of parameters:

gcc --help=param | grep asan
  asan-stack  Enable asan stack protection.
  asan-instrument-allocas Enable asan allocas/VLAs protection.
  asan-globalsEnable asan globals protection.
  asan-instrument-writes  Enable asan store operations protection.
  asan-instrument-reads   Enable asan load operations protection.
  asan-memintrin  Enable asan builtin functions protection.
  asan-use-after-return   Enable asan detection of use-after-return bugs.
  asan-instrumentation-with-call-threshold Use callbacks instead of inline code 
if number of accesses in function becomes greater or equal to this number.

We can probably use some of these in order to drive hwaddress in a similar way. 
Most of them makes sense for hwaddress as well

b) Apparently clang prints also value of all registers. Do we want to do the 
same?

$ clang 
/home/marxin/Programming/gcc/gcc/testsuite/gcc.dg/hwasan/stack-tagging-basic-1.c 
-fsanitize=hwaddress  && ./a.out
...
Tags for short granules around the buggy address (one tag corresponds to 16 
bytes):
   ..  ..  ..  ..  ..  ..  ..  ..  ..  ..  ..  ..  ..  ..  ..  ..
=> ..  ..  ..  ..  ..  ..  ..  ..  e9  .. [..] ..  ..  ..  ..  .. <=
   ..  ..  ..  ..  ..  ..  ..  ..  ..  ..  ..  ..  ..  ..  ..  ..
See 
https://clang.llvm.org/docs/HardwareAssistedAddressSanitizerDesign.html#short-granules
 for a description of short granule tags
Registers where the failure occurred (pc 0xb2976d60):
x0  0063  x1  f763249c  x2  e900f76324a4  x3  
b2976d94
x4    x5  165a043ab8eb3734  x6    x7  
b299f99c
x8  0011  x9  0200efff  x10   x11 
0063
x12 0200fffeff763241  x13 0011  x14 0008  x15 
c000e9e9
x16 b2959b88  x17 0007  x18 0040  x19 
b2977148
x20   x21 b2950388  x22   x23 

x24   x25   x26   x27 

x28   x29 f76324a0  x30 b2976d60
SUMMARY: HWAddressSanitizer: tag-mismatch 
(/home/marxin/Programming/gcc/a.out+0x2cd5c) in using_stack

c) I'm a bit confused by the pointer tags, where clang does:

$ clang 
/home/marxin/Programming/gcc/gcc/testsuite/gcc.dg/hwasan/stack-tagging-basic-1.c 
-fsanitize=hwaddress  && ./a.out
==30028==ERROR: HWAddressSanitizer: tag-mismatch on address 0xf76324a4 at 
pc 0xb2976d60
READ of size 4 at 0xf76324a4 tags: e9/00 (ptr/mem) in thread T0
#0 0xb2976d5c in using_stack 
(/home/marxin/Programming/gcc/a.out+0x2cd5c)
#1 0xb2976e08 in main (/home/marxin/Programming/gcc/a.out+0x2ce08)
#2 0x83ca73e8 in __libc_start_main (/lib64/libc.so.6+0x243e8)
#3 0xb29503b8 in _start 
/home/abuild/rpmbuild/BUILD/glibc-2.30/csu/../sysdeps/aarch64/start.S:92

Address 0xf76324a4 is located in stack of thread T0
Thread: T0 0xeffe2000 stack: [0xf6e33000,0xf7633000) sz: 8388608 
tls: [0x83f82d20,0x83f83470)
Previously allocated frames:
  record_addr:0x83c68748 record:0x324ab2976c8c in using_stack 
(/home/marxin/Programming/gcc/a.out+0x2cc8c)
Memory tags around the buggy address (one tag corresponds to 16 bytes):
   00  00  00  00  00  00  00  00  00  00  00  00  0

Re: [PATCH 3/X] [libsanitizer] Add option to bootstrap using HWASAN

2019-11-20 Thread Martin Liška


On 11/13/19 4:24 PM, Matthew Malcomson wrote:

On 12/11/2019 12:08, Martin Liška wrote:

On 11/11/19 5:03 PM, Matthew Malcomson wrote:

Ah!
My apologies -- I sent up a series with a few documentation mistakes.
(the others were wording problems so less noticeable)


That's fine, I fixed that very easily.

Right now, I can confirm using a aarch64 KVM with the following linux
kernel:
5.4.0-rc6-3.g7068448-default works. I haven't tried HWASAN bootstrap,
but I can
run almost all hwasan.exp tests.

There are 2 exceptions:

FAIL: gcc.dg/hwasan/stack-tagging-basic-1.c   -O2 -flto
-fuse-linker-plugin -fno-fat-lto-objects  execution test
FAIL: gcc.dg/hwasan/large-aligned-1.c   -O2 -flto -fuse-linker-plugin
-fno-fat-lto-objects  execution test


Wow, I have no idea how I missed that but thanks for the catch!



These fail due to unused value of a function that returns int. The
attached patch fixes that.
I'm planning to make a proper comments about the series starting next week.

For the meantime, I have some libsanitizer upstream suggestions
that you can may be discuss. It's mostly about
shadow memory dump differences in between ASAN and HWASAN:





Improvements I see:
a) HWASAN uses less compact dump (2 spaces compared to one)
b) HWASAN is not using colors and it would be handy to know which color
is used for "uninitialized" tags
     and I would mark the 2 compares tags in dumps (ptr/mem)
c) "Tags for short granules around the buggy address" dump is using a
dot notation which seems a bit misleading
d) For HWASAN address offset is missing for each line in both shadow
memory and the pointer


Hi.



To mention my thoughts in turn:
a) The dump of HWASAN takes more space, but represents more application
 bytes for a given space on the console.
 Each byte of HWASAN shadow space represents 16 application bytes
 while each byte in ASAN shadow space represents 8 application bytes.
 I personally appreciate the extra spacing, and figure that the 8|16
 byte difference allows for this.


Thanks for explanation, it works for me.



b) Marking 'ptr' and 'mem' in the dump sounds like a good idea to me.
 Exactly how I'm not sure -- maybe with a colourscheme?  Do you have a
 marking in mind?


Libsanitizer is capable of using colors for report printing.
I can help with that and come up with a patch for upstream.



 Uninitialised shadow space has the zero tag, however, there are a few
 extra details that help understanding these traces:

 On the stack, zero is both uninitialized and "the background" (i.e.
 the tag for anything not specially instrumented, like register spills
 and parameters passed on the stack).
 However, accessible tagged objects can be given a zero tag.


Question here would be if we should use non-zero tags here? Maybe related
to my comment about skipping of HWASAN_STACK_BACKGROUND tag?


 We allow this to avoid runtime checks when incrementing the random
 frame tag to get the tag for a new local variable.
 We can easily avoid the zero tag at compile-time if we don't use a
 random tag for each frame.  I had this in development at one point
 and found it quite useful for verification.  I already have an option
 to disable random tags for each frame that this ability could go
 under.
 I don't believe (but am not 100% certain) this option is in LLVM.

 On the heap uninitialised is tag zero, but memory that has been
 `free`d is given a random tag, so non-zero in a dump does not mean a
 valid object.

c) Is there an alternate notation you have in mind?
 I would guess the "dots" are there to say "this granule has no
 short-tag information", and I'm not sure what would be a better
 way to demonstrate that.


Now I've got it here. Dot means that top-byte of a pointer equals to zero.
Right?



d) I agree, an address offset annotation on each line of the shadow
 memory sounds useful.


I can come up with an upstream patch as well.

Thank you,
Martin



Cheers,
MM



Thanks,
Martin



I'm attaching the entire updated patch series (with the other
documentation fixes in it too) and the fixed patch for just this part in
case you just want to compile and test right now.

Re: [PATCH 5/X] [libsanitizer][mid-end] Introduce stack variable handling for HWASAN

2019-11-20 Thread Matthew Malcomson

Hi Martin,

Thanks for the review,
I'll get working on your comments now, but since I really enjoyed 
finding this bug in ./configure when I hit it I thought I'd answer this 
right away.

On 20/11/2019 14:02, Martin Liška wrote:
> On 11/7/19 7:37 PM, Matthew Malcomson wrote:
>>
>> diff --git a/config/bootstrap-hwasan.mk b/config/bootstrap-hwasan.mk
>> index 
>> 4f60bed3fd6e98b47a3a38aea6eba2a7c320da25..91989f4bb1db6ccff56438357b896645e541
>>  
>> 100644
>> --- a/config/bootstrap-hwasan.mk
>> +++ b/config/bootstrap-hwasan.mk
>> @@ -1,7 +1,11 @@
>>   # This option enables -fsanitize=hwaddress for stage2 and stage3.
>> +# We need to disable random frame tags for bootstrap since the 
>> autoconf check
>> +# for which direction the stack is growing has UB that a random frame 
>> tag
>> +# breaks.  Running with a random frame tag gives approx. 50% chance of
>> +# bootstrap comparison diff in libiberty/alloca.c.
> 
> Here I would like to see what's exactly the problem. I would expect ASAN 
> will
> have exactly the same problem? Can you please isolate it and file a bug. 
> I bet
> a configure script should not expose an undefined behavior.
> 

The configure problem is this snippet below:

find_stack_direction ()
{
   static char *addr = 0;
   auto char dummy;
   if (addr == 0)
 {
   addr = &dummy;
   return find_stack_direction ();
 }
   else
 return (&dummy > addr) ? 1 : -1;
}
main ()
{
   exit (find_stack_direction() < 0);
}

configure uses this to determine the direction that the stack grows.

`find_stack_direction` compares the address of two different objects and 
uses that to make a decision.

With HWASAN random frame tags the answer to the comparison is mostly 
determined by what random tag was assigned to the object in each frame, 
rather than the memory layout of the stack -- which means this configure 
test program can end up getting different answers on different runs.

This is not a problem for ASAN since ASAN does not store tags in the 
pointers of variables.

You're right -- I should file a bug on that for configure.

For reference the UB clause in the standard is 6.5.8 #5 (relational 
operators) where there's a sentence at the end saying "In all other 
cases, the behaviour is undefined".  Essentially, this program is 
comparing the address of two different objects on the stack, and that's 
not allowed.

Re: [PATCH] musl: Fix invalid tls model in libgomp and libitm PR91938

2019-11-20 Thread Szabolcs Nagy

On 15/11/2019 09:55, Szabolcs Nagy wrote:
> Musl does not support initial-exec tls in dynamically loaded shared
> libraries.

ping.

> 
> libgomp/ChangeLog:
> 
> 2019-11-15  Szabolcs Nagy  
> 
>   * configure.tgt: Avoid IE tls on *-*-musl*.

i should add PR libgomp/91938 here.

> 
> libitm/ChangeLog:
> 
> 2019-11-15  Szabolcs Nagy  
> 
>   * configure.tgt: Avoid IE tls on *-*-musl*.

Re: [PATCH 5/X] [libsanitizer][mid-end] Introduce stack variable handling for HWASAN

2019-11-20 Thread Martin Liška


On 11/20/19 3:37 PM, Matthew Malcomson wrote:

Hi Martin,

Thanks for the review,


You're welcome.


I'll get working on your comments now, but since I really enjoyed
finding this bug in ./configure when I hit it I thought I'd answer this
right away.


Heh :)




On 20/11/2019 14:02, Martin Liška wrote:

On 11/7/19 7:37 PM, Matthew Malcomson wrote:


diff --git a/config/bootstrap-hwasan.mk b/config/bootstrap-hwasan.mk
index
4f60bed3fd6e98b47a3a38aea6eba2a7c320da25..91989f4bb1db6ccff56438357b896645e541
100644
--- a/config/bootstrap-hwasan.mk
+++ b/config/bootstrap-hwasan.mk
@@ -1,7 +1,11 @@
   # This option enables -fsanitize=hwaddress for stage2 and stage3.
+# We need to disable random frame tags for bootstrap since the
autoconf check
+# for which direction the stack is growing has UB that a random frame
tag
+# breaks.  Running with a random frame tag gives approx. 50% chance of
+# bootstrap comparison diff in libiberty/alloca.c.


Here I would like to see what's exactly the problem. I would expect ASAN
will
have exactly the same problem? Can you please isolate it and file a bug.
I bet
a configure script should not expose an undefined behavior.



The configure problem is this snippet below:


find_stack_direction ()
{
static char *addr = 0;
auto char dummy;
if (addr == 0)
  {
addr = &dummy;
return find_stack_direction ();
  }
else
  return (&dummy > addr) ? 1 : -1;
}
main ()
{
exit (find_stack_direction() < 0);
}


configure uses this to determine the direction that the stack grows.

`find_stack_direction` compares the address of two different objects and
uses that to make a decision.

With HWASAN random frame tags the answer to the comparison is mostly
determined by what random tag was assigned to the object in each frame,
rather than the memory layout of the stack -- which means this configure
test program can end up getting different answers on different runs.

This is not a problem for ASAN since ASAN does not store tags in the
pointers of variables.


You're right -- I should file a bug on that for configure.

For reference the UB clause in the standard is 6.5.8 #5 (relational
operators) where there's a sentence at the end saying "In all other
cases, the behaviour is undefined".  Essentially, this program is
comparing the address of two different objects on the stack, and that's
not allowed.


Well, to be honest, this is quite cute violation of the standard. I would
have written exactly the same code for the stack direction direction. I 
understand
that a top byte will (a.k.a. tag) make the randomness.

Do you have an idea how can we rewrite the check?
Thanks,
Martin

Re: RFA; patch to fix PR90007

2019-11-20 Thread Vladimir Makarov




On 2019-11-20 5:06 a.m., Richard Biener wrote:

On Tue, Nov 19, 2019 at 5:07 PM Vladimir Makarov  wrote:

The following patch fixes

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90007

Sometime ago a code which permits LRA to reload hard register into
memory as it did for pseudo were added.  But this LRA possibility was
not reflected in recog.c.  The following patch fixes this discrepancy
and as a result fixes PR90007.

OK to commit?

I guess the change is OK but for the bug itself it sounds like
selective scheduling doesn't properly recognize insns it
proagates into (and avoids doing that then)?  That is,
selective scheduling creates invalid RTL?

It is hard for me to say what should be enough for new insn validation 
in any scheduler code.  I think recog code would be a safe choice as it 
is already used in DFA.


On the other hand using non-recog code for insn validation helped to 
find the code discrepancy between recog and LRA.


In any case I believe that the patch should be committed anyway and it 
fixes the problem.

Re: [PATCH 2/7] Include new generated gcc/params.opt file.

2019-11-20 Thread David Malcolm

On Wed, 2019-11-06 at 11:30 +0100, Martin Liska wrote:
> gcc/ChangeLog:
> 
> 2019-11-06  Martin Liska  
> 
>   * Makefile.in: Include params.opt.
>   * flag-types.h (enum parloops_schedule_type): Add
>   parloops_schedule_type used in params.opt.
>   * params.opt: New file.
> ---
>  gcc/Makefile.in  |   2 +-
>  gcc/flag-types.h |  11 +
>  gcc/params.opt   | 967
> +++
>  3 files changed, 979 insertions(+), 1 deletion(-)
>  create mode 100644 gcc/params.opt

Is your params.def -> params.opt script available somewhere?  (sorry if
I missed it, I didn't see it looking over the patch kit)

(I'm rebasing my static analyzer patch kit and am about to convert my
params.def additions to be params.opt additions, though I'm only adding
4 params, so hopefully doing it by hand will be trivial)

Thanks
Dave

Re: [PATCH 2/7] Include new generated gcc/params.opt file.

2019-11-20 Thread Martin Liška


On 11/20/19 3:49 PM, David Malcolm wrote:

On Wed, 2019-11-06 at 11:30 +0100, Martin Liska wrote:

gcc/ChangeLog:

2019-11-06  Martin Liska  

* Makefile.in: Include params.opt.
* flag-types.h (enum parloops_schedule_type): Add
parloops_schedule_type used in params.opt.
* params.opt: New file.
---
  gcc/Makefile.in  |   2 +-
  gcc/flag-types.h |  11 +
  gcc/params.opt   | 967
+++
  3 files changed, 979 insertions(+), 1 deletion(-)
  create mode 100644 gcc/params.opt


Is your params.def -> params.opt script available somewhere?  (sorry if
I missed it, I didn't see it looking over the patch kit)


Hi David.



(I'm rebasing my static analyzer patch kit and am about to convert my
params.def additions to be params.opt additions, though I'm only adding
4 params, so hopefully doing it by hand will be trivial)


No, it's not public. Please convert it by hand, it will be faster ;)

Martin



Thanks
Dave

Re: [PATCH 5/X] [libsanitizer][mid-end] Introduce stack variable handling for HWASAN

2019-11-20 Thread Matthew Malcomson

On 20/11/2019 14:46, Martin Liška wrote:
> On 11/20/19 3:37 PM, Matthew Malcomson wrote:
>> Hi Martin,
>>
>> Thanks for the review,
> 
> You're welcome.
> 
>> I'll get working on your comments now, but since I really enjoyed
>> finding this bug in ./configure when I hit it I thought I'd answer this
>> right away.
> 
> Heh :)
> 
>>
>>
>> On 20/11/2019 14:02, Martin Liška wrote:
>>> On 11/7/19 7:37 PM, Matthew Malcomson wrote:

 diff --git a/config/bootstrap-hwasan.mk b/config/bootstrap-hwasan.mk
 index
 4f60bed3fd6e98b47a3a38aea6eba2a7c320da25..91989f4bb1db6ccff56438357b896645e541
  

 100644
 --- a/config/bootstrap-hwasan.mk
 +++ b/config/bootstrap-hwasan.mk
 @@ -1,7 +1,11 @@
    # This option enables -fsanitize=hwaddress for stage2 and stage3.
 +# We need to disable random frame tags for bootstrap since the
 autoconf check
 +# for which direction the stack is growing has UB that a random frame
 tag
 +# breaks.  Running with a random frame tag gives approx. 50% chance of
 +# bootstrap comparison diff in libiberty/alloca.c.
>>>
>>> Here I would like to see what's exactly the problem. I would expect ASAN
>>> will
>>> have exactly the same problem? Can you please isolate it and file a bug.
>>> I bet
>>> a configure script should not expose an undefined behavior.
>>>
>>
>> The configure problem is this snippet below:
>>
>>
>> find_stack_direction ()
>> {
>>     static char *addr = 0;
>>     auto char dummy;
>>     if (addr == 0)
>>   {
>>     addr = &dummy;
>>     return find_stack_direction ();
>>   }
>>     else
>>   return (&dummy > addr) ? 1 : -1;
>> }
>> main ()
>> {
>>     exit (find_stack_direction() < 0);
>> }
>>
>>
>> configure uses this to determine the direction that the stack grows.
>>
>> `find_stack_direction` compares the address of two different objects and
>> uses that to make a decision.
>>
>> With HWASAN random frame tags the answer to the comparison is mostly
>> determined by what random tag was assigned to the object in each frame,
>> rather than the memory layout of the stack -- which means this configure
>> test program can end up getting different answers on different runs.
>>
>> This is not a problem for ASAN since ASAN does not store tags in the
>> pointers of variables.
>>
>>
>> You're right -- I should file a bug on that for configure.
>>
>> For reference the UB clause in the standard is 6.5.8 #5 (relational
>> operators) where there's a sentence at the end saying "In all other
>> cases, the behaviour is undefined".  Essentially, this program is
>> comparing the address of two different objects on the stack, and that's
>> not allowed.
> 
> Well, to be honest, this is quite cute violation of the standard. I would
> have written exactly the same code for the stack direction direction. I 
> understand
> that a top byte will (a.k.a. tag) make the randomness.
> 
> Do you have an idea how can we rewrite the check?
> Thanks,
> Martin
> 

I don't have much of a plan.

The most promising lead I have is that libiberty/alloca.c has a similar 
functionality but with macros to account for a special case.

Instead of just using '&' it uses a macro `ADDRESS_FUNCTION`.
I can use that macro to ensure the libiberty/alloca.c function could 
handle tags, but I'm not sure that architecture specific conditions will 
neatly fit into autoconf.

Re: [PATCH 2/7] Include new generated gcc/params.opt file.

2019-11-20 Thread David Malcolm

On Wed, 2019-11-20 at 15:51 +0100, Martin Liška wrote:
> On 11/20/19 3:49 PM, David Malcolm wrote:
> > On Wed, 2019-11-06 at 11:30 +0100, Martin Liska wrote:
> > > gcc/ChangeLog:
> > > 
> > > 2019-11-06  Martin Liska  
> > > 
> > >   * Makefile.in: Include params.opt.
> > >   * flag-types.h (enum parloops_schedule_type): Add
> > >   parloops_schedule_type used in params.opt.
> > >   * params.opt: New file.
> > > ---
> > >   gcc/Makefile.in  |   2 +-
> > >   gcc/flag-types.h |  11 +
> > >   gcc/params.opt   | 967
> > > +++
> > >   3 files changed, 979 insertions(+), 1 deletion(-)
> > >   create mode 100644 gcc/params.opt
> > 
> > Is your params.def -> params.opt script available
> > somewhere?  (sorry if
> > I missed it, I didn't see it looking over the patch kit)
> 
> Hi David.
> 
> > (I'm rebasing my static analyzer patch kit and am about to convert
> > my
> > params.def additions to be params.opt additions, though I'm only
> > adding
> > 4 params, so hopefully doing it by hand will be trivial)
> 
> No, it's not public. Please convert it by hand, it will be faster ;)

Fair enough.

Does this new machinery mean we can have per-frontend params (by
putting the Param options in the pertinent .opt file) ?  (not sure what
that will do to LTO)

Dave

Free memory_block pool

2019-11-20 Thread Jan Hubicka

Hi,
I have noticed that for Firefox around 1GB of peak memory use goes into
the fact that we never free memory_block_pool::freelist.

This patch adds memory_block_pool::trim which reduces freelist to a given
size.  It is called from ggc_collect which is a convenient place to return
heap allocations too and fully freeed prior forking in ggc_collect.

I originaly was freeing block directly in memory_block_pool::release
but that makes it non-leaf function which prevents optimization.
So I decided to go this way we get tiny bit better code
given that we already have ggc_collect that is conveninet place
to do such a bookeeping.

Bootstrapped/regtested x86_64-linux, tested on Firefox build, OK?

Honza

* memory-block.h (memory_block_pool::freelist): New constant.
(memory_block_pool::clear_free_list): Rename to ...
(memory_block_pool::reduce_free_list): ... this.
(memory_block_pool::trim): New function.
(memory_block_pool::block_list): Add m_prev.
(memory_block_pool::m_num_blocks): New field.
(memory_block_pool::m_block_end): New field.
(memory_block_pool::allocate): Maintain m_num_blocks and m_blocks_end.
(memory_block_pool::release): Likewise.
* memory-block.cc (memory_block_pool::memory_block_pool): Initialize
new fields.
(memory_block_pool::clear_free_list): Rename to ...
(memory_block_pool::reduce_free_list): ... this one; free from end
and add NUM parameter.
(memory_block_pool::trim): New.
* ggc-page.c (ggc_collect): Call memory_block_pool::trim.

* lto.c: Call memory_block_pool::trim.
Index: memory-block.h
===
--- memory-block.h  (revision 278464)
+++ memory-block.h  (working copy)
@@ -28,12 +28,15 @@ class memory_block_pool
 public:
   /* Blocks have fixed size.  This is necessary for sharing.  */
   static const size_t block_size = 64 * 1024;
+  /* Number of blocks we keep in the freelists.  */
+  static const size_t freelist_size = 1024 * 1024 / block_size;
 
   memory_block_pool ();
 
   static inline void *allocate () ATTRIBUTE_MALLOC;
   static inline void release (void *);
-  void clear_free_list ();
+  static void trim (int nblocks = freelist_size);
+  void reduce_free_list (int);
 
 private:
   /* memory_block_pool singleton instance, defined in memory-block.cc.  */
@@ -42,10 +45,13 @@ private:
   struct block_list
   {
 block_list *m_next;
+block_list *m_prev;
   };
 
   /* Free list.  */
   block_list *m_blocks;
+  block_list *m_blocks_end;
+  int m_num_blocks;
 };
 
 /* Allocate a single block.  Reuse a previously returned block, if possible.  
*/
@@ -57,6 +63,9 @@ memory_block_pool::allocate ()
 
   void *result = instance.m_blocks;
   instance.m_blocks = instance.m_blocks->m_next;
+  instance.m_num_blocks--;
+  if (!instance.m_blocks)
+instance.m_blocks_end = NULL;
   VALGRIND_DISCARD (VALGRIND_MAKE_MEM_UNDEFINED (result, block_size));
   return result;
 }
@@ -67,7 +76,12 @@ memory_block_pool::release (void *uncast
 {
   block_list *block = new (uncast_block) block_list;
   block->m_next = instance.m_blocks;
+  if (instance.m_blocks)
+instance.m_blocks->m_prev = block;
+  else
+instance.m_blocks_end = block;
   instance.m_blocks = block;
+  instance.m_num_blocks++;
 
   VALGRIND_DISCARD (VALGRIND_MAKE_MEM_NOACCESS ((char *)uncast_block
+ sizeof (block_list),
Index: memory-block.cc
===
--- memory-block.cc (revision 278464)
+++ memory-block.cc (working copy)
@@ -26,18 +27,27 @@ along with GCC; see the file COPYING3.
 /* Global singleton-like instance.  */
 memory_block_pool memory_block_pool::instance;
 
-memory_block_pool::memory_block_pool () : m_blocks (NULL) {}
+/* Default constructor.  */
+memory_block_pool::memory_block_pool ()
+ : m_blocks (NULL), m_blocks_end (NULL), m_num_blocks (0)
+{
+}
 
-/* Return all blocks from free list to the OS.  */
+/* Reduce free list to NUM blocks.  */
 void
-memory_block_pool::clear_free_list ()
+memory_block_pool::reduce_free_list (int num)
 {
-  while (m_blocks)
+  gcc_checking_assert (num >= 0);
+  while (m_num_blocks > num)
 {
-  block_list *next = m_blocks->m_next;
-  XDELETEVEC (m_blocks);
-  m_blocks = next;
+  block_list *prev = m_blocks_end->m_prev;
+  XDELETEVEC (m_blocks_end);
+  m_blocks_end = prev;
+  prev->m_next = NULL;
+  m_num_blocks--;
 }
+  if (!m_num_blocks)
+m_blocks = m_blocks_end = 0;
 }
 
 /* Allocate a chunk for obstack.  Use the pool if requested chunk size matches
@@ -62,3 +72,10 @@ mempool_obstack_chunk_free (void *chunk)
   else
 XDELETEVEC (chunk);
 }
+
+/* Return allocated memory back to malloc (and to system).  */
+void
+memory_block_pool::trim (int num)
+{
+  instance.reduce_free_list (num);
+}
Index: ggc-page.c
===

Re: Free memory_block pool

2019-11-20 Thread Martin Liška


On 11/20/19 4:03 PM, Jan Hubicka wrote:

I have noticed that for Firefox around 1GB of peak memory use goes into
the fact that we never free memory_block_pool::freelist.


Great, can you reduce the peak memory by the 1GB with the patch?

Thanks,
Martin

Re: [PATCH 2/7] Include new generated gcc/params.opt file.

2019-11-20 Thread Martin Liška


On 11/20/19 4:02 PM, David Malcolm wrote:

On Wed, 2019-11-20 at 15:51 +0100, Martin Liška wrote:

On 11/20/19 3:49 PM, David Malcolm wrote:

On Wed, 2019-11-06 at 11:30 +0100, Martin Liska wrote:

gcc/ChangeLog:

2019-11-06  Martin Liska  

* Makefile.in: Include params.opt.
* flag-types.h (enum parloops_schedule_type): Add
parloops_schedule_type used in params.opt.
* params.opt: New file.
---
   gcc/Makefile.in  |   2 +-
   gcc/flag-types.h |  11 +
   gcc/params.opt   | 967
+++
   3 files changed, 979 insertions(+), 1 deletion(-)
   create mode 100644 gcc/params.opt


Is your params.def -> params.opt script available
somewhere?  (sorry if
I missed it, I didn't see it looking over the patch kit)


Hi David.


(I'm rebasing my static analyzer patch kit and am about to convert
my
params.def additions to be params.opt additions, though I'm only
adding
4 params, so hopefully doing it by hand will be trivial)


No, it's not public. Please convert it by hand, it will be faster ;)


Fair enough.

Does this new machinery mean we can have per-frontend params (by
putting the Param options in the pertinent .opt file) ?  (not sure what
that will do to LTO)


I guess so. Note that now parameters are first class citizens same as options.
So that, having a per-FE should work fine.

Martin



Dave

Re: Free memory_block pool

2019-11-20 Thread Jan Hubicka

> On 11/20/19 4:03 PM, Jan Hubicka wrote:
> > I have noticed that for Firefox around 1GB of peak memory use goes into
> > the fact that we never free memory_block_pool::freelist.
> 
> Great, can you reduce the peak memory by the 1GB with the patch?

About half of it, approx 0.5GB (8.3GB -> 7.8GB) judging from memory use
graphs.  I was checking and malloc returns to system everything over
128k (at least it claims so) and malloc_trim does not make much
difference.  I am not sure how the vmstat reacts to MADV_DONTNEED.

Honza

Restrict bb-slp-40.c to targets with VnQI addition (PR 92366)

2019-11-20 Thread Richard Sandiford

bb-slp-40.c fails on SPARC targets without VIS4 because it
requires addition on vectors of bytes.  There doesn't seem to be
an existing target selector for this, so I added vect_char_add.
(Wasn't sure whether to use vect_char_add, for consistency
with vect_no_int_add/vect_int_mult etc., or vect_add_char for
consistency with vect_shift_char etc.)

I took the target list from vect_int and removed targets that didn't
seem to support the operation (namely sparc*, since we don't seem to
have any test for VIS4, niagara7 or m8, and alpha*-*-*.)

Tested on aarch64-linux-gnu, x86_64-linux-gnu, powerpc64-linux-gnu
(Power 7) and sparc-sun-solaris2.11.  OK to install?

Richard


2019-11-20  Richard Sandiford  

gcc/
PR testsuite/92366
* doc/sourcebuild.texi (vect_char_add): Document.

gcc/testsuite/
PR testsuite/92366
* lib/target-supports.exp (check_effective_target_vect_char_add):
New proc.
* gcc.dg/vect/bb-slp-40.c: Require vect_char_add instead of vect_int.

Index: gcc/doc/sourcebuild.texi
===
--- gcc/doc/sourcebuild.texi2019-11-18 15:36:04.865884928 +
+++ gcc/doc/sourcebuild.texi2019-11-20 15:11:52.572010515 +
@@ -1522,6 +1522,10 @@ Target does not support a vector add ins
 @item vect_no_bitwise
 Target does not support vector bitwise instructions.
 
+@item vect_char_add
+Target supports addition of @code{char} vectors for at least one
+vector length.
+
 @item vect_char_mult
 Target supports @code{vector char} multiplication.
 
Index: gcc/testsuite/lib/target-supports.exp
===
--- gcc/testsuite/lib/target-supports.exp   2019-11-18 15:36:04.869884903 
+
+++ gcc/testsuite/lib/target-supports.exp   2019-11-20 15:11:52.580010459 
+
@@ -5749,6 +5749,27 @@ proc check_effective_target_vect_bswap {
 || [istarget amdgcn-*-*] }}]
 }
 
+# Return 1 if the target supports addition of char vectors for at least
+# one vector length.
+
+proc check_effective_target_vect_char_add { } {
+return [check_cached_effective_target_indexed vect_int {
+  expr {
+ [istarget i?86-*-*] || [istarget x86_64-*-*]
+ || ([istarget powerpc*-*-*]
+&& ![istarget powerpc-*-linux*paired*])
+|| [istarget amdgcn-*-*]
+|| [istarget ia64-*-*]
+|| [istarget aarch64*-*-*]
+|| [is-effective-target arm_neon]
+|| ([istarget mips*-*-*]
+&& ([et-is-effective-target mips_loongson_mmi]
+|| [et-is-effective-target mips_msa]))
+|| ([istarget s390*-*-*]
+&& [check_effective_target_s390_vx])
+   }}]
+}
+
 # Return 1 if the target supports hardware vector shift operation for char.
 
 proc check_effective_target_vect_shift_char { } {
Index: gcc/testsuite/gcc.dg/vect/bb-slp-40.c
===
--- gcc/testsuite/gcc.dg/vect/bb-slp-40.c   2019-11-19 16:25:49.0 
+
+++ gcc/testsuite/gcc.dg/vect/bb-slp-40.c   2019-11-20 15:11:52.572010515 
+
@@ -1,6 +1,6 @@
 /* { dg-do compile } */
 /* { dg-additional-options "-fvect-cost-model=dynamic" } */
-/* { dg-require-effective-target vect_int } */
+/* { dg-require-effective-target vect_char_add } */
 
 char g_d[1024], g_s1[1024], g_s2[1024];
 void foo(void)

Re: Reverting r278411

2019-11-20 Thread Segher Boessenkool

Hi!

On Wed, Nov 20, 2019 at 09:42:46AM +, Richard Sandiford wrote:
> Segher Boessenkool  writes:
> >> Before I resubmit, why is the simplify-rtx.c part all wrong?
> >
> > It was nice and simple, and it isn't anymore.  8 4 2 1 for the four of
> > lt gt eq un are hardly worth documenting or making symbolic constants
> > for, because a) they are familiar to all powerpc users anyway (the
> > assembler has those as predefined constants!), admittedly this isn't a
> > strong argument for most people;
> 
> Ah, OK.  I was totally unaware of the connection.

Yeah, I should have documented it.  Time pressure was high the last N
weeks, with everyone else putting stuff in before end of stage 1 things
broke left and right for me.  Normally I would just not update trunk the
last two months or so of stage 1 (for development -- for testing you
always need current trunk of course), but I needed some stuff from it.
Oh well, I finally dug myself out.

> > but also b) they were only used in two short and obvious routines.
> > After your patch the routines are no longer short or obvious.
> >
> > A comparison result is exactly one of four things: lt, gt, eq, or un.
> > So we can express testing for any subset of those with just an OR of
> > the masks for the individual conditions.  Whether a comparison is
> > floating point, floating point no-nans, signed, or unsigned, is just
> > a property of the comparison, not of the result, in this representation.
> 
> Yeah, this works for OR because we never lose the unorderdness.

It works for everything, including things with a NOT.

If the mode does not allow unordered, you mask that away all the way at
the end, and nothing else needs to be done.  This doesn't have to be done
for IOR because you can never end up with unordered in the mask if you
didn't start out with some code that allows unordered.

You also can encode NE as not allowing unordered, if you have a mode
where that does not exist, but that is purely cosmetic.

8421  "full" "no-nan"
  false  false
1000  lt lt
0100  gt gt
1100  ltgt   ne
0010  eq eq
1010  le le
0110  ge ge
1110  orderedtrue
0001  unordered
1001  unlt
0101  ungt
1101  ne
0011  uneq
1011  unle
0111  unge
  true

So for !HONOR_NANS modes we need to output LTGT as NE, and ORDERED as
true, and that is all.

> There were two aspects to my patch.  One was adding AND, and had:
> 
>   /* We only handle AND if we can ignore unordered cases.  */
>   bool honor_nans_p = HONOR_NANS (GET_MODE (op0));
>   if (code != IOR && (code != AND || honor_nans_p))
> return 0;
> 
> This is needed because e.g. UNLT & ORDERED -> LT is only conditionally
> valid.  It doesn't sound like you're objecting to that bit, is that right?
> Or was this what you had in mind with the reference to no-nans?

UNLT & ORDERED is always LT.  When would it not be true?

> As mentioned in the covering note, I punted for this because handling
> trapping FP comparisons correctly for AND would need its own set of
> testcases.

This isn't trapping arithmetic.  Unordered is a perfectly normal result.
As IEEE 754 says:
  Four mutually exclusive relations are possible: less than, equal,
  greater than, and unordered.
This same is true when !HONOR_NANS, for signed integer comparisons, and
for unsigned integer comparisons: just UNORDERED never happens.

> > If you do not mix signed and unsigned comparisons (they make not much
> > sense mixed anyway(*)), you need no changes at all: just translate
> > ltu/gtu/leu/geu to/from lt/gt/le/ge on the way in and out of this
> > function (there already are helper functions for that, signed_condition
> > and unsigned_condition).
> 
> So this all seems to come down to whether unsigned comparisons are handled
> as separate mask bits or whether they're dealt with by removing the
> unsignedness and then adding it back.  ISTM both are legitimate ways
> of doing it.  I don't think one of them is "all wrong".

It violates the whole design of the thing left and right.  I never
documented that well (or at all), of course :-/

> I was very belatedly getting around to dealing with Joseph's comment
> when you sent your patch and had it approved.  Since that patch seemed
> to be more favourably received in general, I was trying to work within
> the existing style of your version.  And without the powerpc background,
> I just assumed that the mask values were "documented" by the first block
> of case statements:
> 
> case LT:
>   return 8;
> case GT:
>   return 4;
> case EQ:
>   return 2;
> case UNORDERED:
>   return 1;

Yeah, but it may not be obvious that exactly one of those is true for
any comparison, and you can combine them to a mask and do any logical
operations on it.

> Adding two more cases here didn't seem to make things any more unclear.
> But maybe it is more jarring if you've memorised the powerpc mapping.

It's also that 2**4 is small, but 2**6 is not.  The 2**4 cases all

Reject versioning for alignment with different masks (PR 92526)

2019-11-20 Thread Richard Sandiford

Allowing mixed vector sizes broke the assumption in the following assert,
since it's now possible for different accesses to require different
levels of alignment:

  /* FORNOW: use the same mask to test all potentially unaligned
 references in the loop.  The vectorizer currently supports
 a single vector size, see the reference to
 GET_MODE_NUNITS (TYPE_MODE (vectype)) where the
 vectorization factor is computed.  */
  gcc_assert (!LOOP_VINFO_PTR_MASK (loop_vinfo)
  || LOOP_VINFO_PTR_MASK (loop_vinfo) == mask);

I guess we could try to over-align smaller accesses so that all
of them are consistent, or try to support multiple alignment masks,
but for now the easiest fix seems to be to turn the assert into a
bail-out check.

Tested on aarch64-linux-gnu and x86_64-linux-gnu.  OK to install?

Richard


2019-11-20  Richard Sandiford  

gcc/
PR tree-optimization/92526
* tree-vect-data-refs.c (vect_enhance_data_refs_alignment): Reject
versioning for alignment if the accesses do not have a consistent
mask, rather than asserting that the masks are consistent.

gcc/testsuite/
PR tree-optimization/92526
* gcc.target/aarch64/pr92526.c: New test.

Index: gcc/tree-vect-data-refs.c
===
--- gcc/tree-vect-data-refs.c   2019-11-16 11:40:19.105159717 +
+++ gcc/tree-vect-data-refs.c   2019-11-20 15:27:49.385346722 +
@@ -2266,13 +2266,15 @@ vect_enhance_data_refs_alignment (loop_v
  mask must be 15 = 0xf. */
  mask = size - 1;
 
-  /* FORNOW: use the same mask to test all potentially unaligned
- references in the loop.  The vectorizer currently supports
- a single vector size, see the reference to
- GET_MODE_NUNITS (TYPE_MODE (vectype)) where the
- vectorization factor is computed.  */
-  gcc_assert (!LOOP_VINFO_PTR_MASK (loop_vinfo)
-  || LOOP_VINFO_PTR_MASK (loop_vinfo) == mask);
+ /* FORNOW: use the same mask to test all potentially unaligned
+references in the loop.  */
+ if (LOOP_VINFO_PTR_MASK (loop_vinfo)
+ && LOOP_VINFO_PTR_MASK (loop_vinfo) != mask)
+   {
+ do_versioning = false;
+ break;
+   }
+
   LOOP_VINFO_PTR_MASK (loop_vinfo) = mask;
  LOOP_VINFO_MAY_MISALIGN_STMTS (loop_vinfo).safe_push (stmt_info);
 }
Index: gcc/testsuite/gcc.target/aarch64/pr92526.c
===
--- /dev/null   2019-09-17 11:41:18.176664108 +0100
+++ gcc/testsuite/gcc.target/aarch64/pr92526.c  2019-11-20 15:27:49.385346722 
+
@@ -0,0 +1,9 @@
+/* { dg-options "-O3 -mstrict-align" } */
+
+void
+f (unsigned int *restrict x, unsigned int *restrict y,
+   unsigned char *restrict z, unsigned int n)
+{
+  for (unsigned int i = 0; i < n % 4; ++i)
+x[i] = x[i] + y[i] + z[i];
+}

Re: Optimize fibonacci heap allocations

2019-11-20 Thread Jan Hubicka

Hi,
sadly this patch seems to trigger false positive 
../../gcc/vec.h:311:10: error: attempt to free a non-heap object �@Xa�@Y
[-Werror=free-nonheap-object]

which is about vec destructor freeing heap allocated memory if the
vector was ever resized from hits stack allocated place.  In this case
it never is.

I am testing the following workaround which turns it into simple array
(only benefit of auto_vec here is the bounds check that I implemented by
hand).

I will commit it once x86_64-linux boostrap finishes unless someone
comes with better alternative. Sorry for this - it did not show up with
LTO build for me.

Honza

* fibonacci_heap.h (fibonacci_heap::consolidate): Turn auto_vec
to ordinary array.

Index: fibonacci_heap.h
===
--- fibonacci_heap.h(revision 278501)
+++ fibonacci_heap.h(working copy)
@@ -648,17 +648,18 @@ template
 void fibonacci_heap::consolidate ()
 {
   const int D = 1 + 8 * sizeof (long);
-  auto_vec *, D> a;
+  fibonacci_node *a[D];
   fibonacci_node *w, *x, *y;
   int i, d;
 
-  a.quick_grow_cleared (D);
+  memset (a, 0, sizeof (a));
 
   while ((w = m_root) != NULL)
 {
   x = w;
   remove_root (w);
   d = x->m_degree;
+  gcc_checking_assert (d < D);
   while (a[d] != NULL)
{
  y = a[d];

Re: [PATCH 3/X] [libsanitizer] Add option to bootstrap using HWASAN

2019-11-20 Thread Matthew Malcomson

On 20/11/2019 14:33, Martin Liška wrote:
> On 11/13/19 4:24 PM, Matthew Malcomson wrote:
>> On 12/11/2019 12:08, Martin Liška wrote:
>>> On 11/11/19 5:03 PM, Matthew Malcomson wrote:
 Ah!
 My apologies -- I sent up a series with a few documentation mistakes.
> 
>>
>> b) Marking 'ptr' and 'mem' in the dump sounds like a good idea to me.
>>  Exactly how I'm not sure -- maybe with a colourscheme?  Do you 
>> have a
>>  marking in mind?
> 
> Libsanitizer is capable of using colors for report printing.
> I can help with that and come up with a patch for upstream.
> 
>>
>>  Uninitialised shadow space has the zero tag, however, there are a 
>> few
>>  extra details that help understanding these traces:
>>
>>  On the stack, zero is both uninitialized and "the background" (i.e.
>>  the tag for anything not specially instrumented, like register 
>> spills
>>  and parameters passed on the stack).
>>  However, accessible tagged objects can be given a zero tag.
> 
> Question here would be if we should use non-zero tags here? Maybe related
> to my comment about skipping of HWASAN_STACK_BACKGROUND tag?

Unfortunately we can't skip non-zero tags at compile time when using a 
random frame tag.  This is because we don't know at compile time what 
the random frame tag will be.

On each entry to a frame a "base tag" is generated randomly at runtime.
Each local object in the frame has a compile-time offset that's what 
gets calculated in `hwasan_increment_tag` -- the offset from this random 
tag.
The tag assigned to a local object is the runtime random frame tag plus 
the compile-time constant offset.

I could avoid HWASAN_STACK_BACKGROUND as a tag when the parameter 
`hwasan-random-frame-tag` is false, since then there is no runtime 
random base tag (instead I start with zero).

I'll be happy to add that in if you'd like -- I decided against it since 
it would only matter when a function has 256 or more variables, but I 
flip-flopped on the decision a few times.

> 
>>  We allow this to avoid runtime checks when incrementing the random
>>  frame tag to get the tag for a new local variable.
>>  We can easily avoid the zero tag at compile-time if we don't use a
>>  random tag for each frame.  I had this in development at one point
>>  and found it quite useful for verification.  I already have an 
>> option
>>  to disable random tags for each frame that this ability could go
>>  under.
>>  I don't believe (but am not 100% certain) this option is in LLVM.
>>
>>  On the heap uninitialised is tag zero, but memory that has been
>>  `free`d is given a random tag, so non-zero in a dump does not mean a
>>  valid object.
>>
>> c) Is there an alternate notation you have in mind?
>>  I would guess the "dots" are there to say "this granule has no
>>  short-tag information", and I'm not sure what would be a better
>>  way to demonstrate that.
> 
> Now I've got it here. Dot means that top-byte of a pointer equals to zero.
> Right?

Ah!
I think I never described the "short-tag" functionality, and the fact 
it's in the debug output is getting confusing.

This will also be part of answering your question "c)", and question 
"h", in the other email 
https://gcc.gnu.org/ml/gcc-patches/2019-11/msg01950.html .

--

The main tagging behaviour as described has a natural limitation.
Invalid accesses that do not cross a 16 byte boundary are not caught, 
since each shadow-memory tag applies to a 16 byte chunk.

To account for this, HWASAN has a "short-tag" functionality.
This functionality was introduced in llvm-svn revision 365551.

Usually, a shadow-memory byte records the *tag* that is valid for access 
to the relevant 16 byte granule in normal memory.
When using short-tags, if an object fills only part of a 16 byte granule 
in normal memory, the corresponding shadow-memory byte stores the 
*length (in bytes) into this granule that is valid*.
The *tag* is then stored in the last byte of the 16 byte granule in 
normal memory.
(We know that last byte is unused, since this is a "short" granule tag).

Now, checking a memory access consists of two parts.
1) A normal tag comparison.
2) A fallback in the tag-mismatch case.
This fallback checks if the accessing pointer is accessing less bytes
into the granule than the length given in shadow-memory.
Then if that's the case it also checks the pointers tag matches the
last byte in this 16 byte granule.

That is a little difficult to explain clearly in text, so I apologise if 
the above doesn't make sense.

--

The hwasan error-reporting output lists both memory tags *and* short-tags.
These are the two sections under the titles of "Memory tags ..." and 
"Tags for short granules ...".

The first printed section shows what is stored in shadow-memory.
This is usually the tag, but can be a length if using "short" tags.
The second section contains the "last byte of a granule" for ever

Re: [PATCH 1/2] PR 92463 MPFR modernization in GFortran

2019-11-20 Thread Tobias Burnus


Hi Janne,

On 11/18/19 9:34 PM, Janne Blomqvist wrote:

Now that we require a minimum of MPFR 3.1.0+ to build GCC, we can do
some modernization of the MPFR usage in the GFortran frontend.


OK – thanks for the cleanup.

[For reference, those changes were introduced in MPFR 3.0.0, cf. 
https://www.mpfr.org/mpfr-3.0.0/#changes ; 3.0 and 3.1 also added a 
bunch of new functions, MPFR_RNDA and did remove some undefined behaviour.]


Do you intent update the other places as well? Namely:


This patch replaces
1) GMP_RND* with MPFR_RND*
Also used by: configure.ac, gcc/builtins.c, gcc/fold-const-call.c, 
gcc/gimple-ssa-sprintf.c, gcc/go/…, gcc/real.c, gcc/realmpfr.h, 
gcc/ubsan.c.


2) mp_exp_t with mpfr_exp_t

gcc/realmpfr.c + gcc/go/…


3) mp_prec_t with mpfr_prec_t

(only used by gcc/fortran)

4) mp_rnd_t with mpfr_rnd_t


gcc/builtins.c, gcc/fold-const-call.c, gcc/realmpfr.c

Cheers,

Tobias


gcc/fortran/ChangeLog:

2019-11-18  Janne Blomqvist  

PR fortran/92463
* arith.c (gfc_mpfr_to_mpz): Change mp_exp_t to mpfr_exp_t.
(gfc_check_real_range): Likewise.
* gfortran.h (GFC_RND_MODE): Change GMP_RNDN to MPFR_RNDN.
* module.c (mio_gmp_real): Change mp_exp_t to mpfr_exp_t.
* simplify.c (degrees_f): Change mp_rnd_t to mpfr_rnd_t.
(radians_f): Likewise.
(fullprec_erfc_scaled): Change mp_prec_t to mpfr_prec_t.
(asympt_erfc_scaled): Likewise.
(gfc_simplify_nearest): Change mp_exp_t to mpfr_exp_t, and
GMP_RND* to MPFR_RND*.
---
  gcc/fortran/arith.c|  8 
  gcc/fortran/gfortran.h |  2 +-
  gcc/fortran/module.c   |  2 +-
  gcc/fortran/simplify.c | 20 ++--
  4 files changed, 16 insertions(+), 16 deletions(-)

[PATCH] PR c++/92236: Improve static assertions of concepts

2019-11-20 Thread Andrew Sutton

This patch builds on https://gcc.gnu.org/ml/gcc-patches/2019-11/msg01034.html.

Tie static assertion diagnostics into constraint diagnostic. For
example, this program:

template
concept iterator =
  requires (I i)
  {
++i;
*i;
  };

static_assert(iterator);

Yields these errors:

x.cpp:11:15: error: static assertion failed
   11 | static_assert(iterator);
  |   ^
x.cpp:11:15: note: constraints not satisfied
x.cpp:4:9:   required by the constraints of ‘template concept iterator’
x.cpp:5:3:   in requirements with ‘int i’
x.cpp:8:5: note: the required expression ‘* i’ is invalid
8 | *i;
  | ^~

Andrew Sutton


0001-Emit-detailed-diagnostics-for-static-assertion-failu.patch
Description: Binary data

Re: Reverting r278411

2019-11-20 Thread Richard Sandiford

Segher Boessenkool  writes:
> Hi!
>
> On Wed, Nov 20, 2019 at 09:42:46AM +, Richard Sandiford wrote:
>> Segher Boessenkool  writes:
>> >> Before I resubmit, why is the simplify-rtx.c part all wrong?
>> >
>> > It was nice and simple, and it isn't anymore.  8 4 2 1 for the four of
>> > lt gt eq un are hardly worth documenting or making symbolic constants
>> > for, because a) they are familiar to all powerpc users anyway (the
>> > assembler has those as predefined constants!), admittedly this isn't a
>> > strong argument for most people;
>> 
>> Ah, OK.  I was totally unaware of the connection.
>
> Yeah, I should have documented it.  Time pressure was high the last N
> weeks, with everyone else putting stuff in before end of stage 1 things
> broke left and right for me.  Normally I would just not update trunk the
> last two months or so of stage 1 (for development -- for testing you
> always need current trunk of course), but I needed some stuff from it.
> Oh well, I finally dug myself out.
>
>> > but also b) they were only used in two short and obvious routines.
>> > After your patch the routines are no longer short or obvious.
>> >
>> > A comparison result is exactly one of four things: lt, gt, eq, or un.
>> > So we can express testing for any subset of those with just an OR of
>> > the masks for the individual conditions.  Whether a comparison is
>> > floating point, floating point no-nans, signed, or unsigned, is just
>> > a property of the comparison, not of the result, in this representation.
>> 
>> Yeah, this works for OR because we never lose the unorderdness.
>
> It works for everything, including things with a NOT.
>
> If the mode does not allow unordered, you mask that away all the way at
> the end, and nothing else needs to be done.  This doesn't have to be done
> for IOR because you can never end up with unordered in the mask if you
> didn't start out with some code that allows unordered.
>
> You also can encode NE as not allowing unordered, if you have a mode
> where that does not exist, but that is purely cosmetic.
>
> 8421  "full" "no-nan"
>   false  false
> 1000  lt lt
> 0100  gt gt
> 1100  ltgt   ne
> 0010  eq eq
> 1010  le le
> 0110  ge ge
> 1110  orderedtrue
> 0001  unordered
> 1001  unlt
> 0101  ungt
> 1101  ne
> 0011  uneq
> 1011  unle
> 0111  unge
>   true
>
> So for !HONOR_NANS modes we need to output LTGT as NE, and ORDERED as
> true, and that is all.
>
>> There were two aspects to my patch.  One was adding AND, and had:
>> 
>>   /* We only handle AND if we can ignore unordered cases.  */
>>   bool honor_nans_p = HONOR_NANS (GET_MODE (op0));
>>   if (code != IOR && (code != AND || honor_nans_p))
>> return 0;
>> 
>> This is needed because e.g. UNLT & ORDERED -> LT is only conditionally
>> valid.  It doesn't sound like you're objecting to that bit, is that right?
>> Or was this what you had in mind with the reference to no-nans?
>
> UNLT & ORDERED is always LT.  When would it not be true?

LT traps on quiet NaNs for -ftrapping-math, UNLT and ORDERED don't.

>> As mentioned in the covering note, I punted for this because handling
>> trapping FP comparisons correctly for AND would need its own set of
>> testcases.
>
> This isn't trapping arithmetic.  Unordered is a perfectly normal result.
> As IEEE 754 says:
>   Four mutually exclusive relations are possible: less than, equal,
>   greater than, and unordered.
> This same is true when !HONOR_NANS, for signed integer comparisons, and
> for unsigned integer comparisons: just UNORDERED never happens.
>
>> > If you do not mix signed and unsigned comparisons (they make not much
>> > sense mixed anyway(*)), you need no changes at all: just translate
>> > ltu/gtu/leu/geu to/from lt/gt/le/ge on the way in and out of this
>> > function (there already are helper functions for that, signed_condition
>> > and unsigned_condition).
>> 
>> So this all seems to come down to whether unsigned comparisons are handled
>> as separate mask bits or whether they're dealt with by removing the
>> unsignedness and then adding it back.  ISTM both are legitimate ways
>> of doing it.  I don't think one of them is "all wrong".
>
> It violates the whole design of the thing left and right.  I never
> documented that well (or at all), of course :-/
>
>> I was very belatedly getting around to dealing with Joseph's comment
>> when you sent your patch and had it approved.  Since that patch seemed
>> to be more favourably received in general, I was trying to work within
>> the existing style of your version.  And without the powerpc background,
>> I just assumed that the mask values were "documented" by the first block
>> of case statements:
>> 
>> case LT:
>>   return 8;
>> case GT:
>>   return 4;
>> case EQ:
>>   return 2;
>> case UNORDERED:
>>   return 1;
>
> Yeah, but it may not be obvious that exactly one of those is true for
> any comparison, and you can combine them t

Re: [PATCH v3] libgomp/test: Add flags to find libatomic in build-tree testing

2019-11-20 Thread Maciej W. Rozycki

On Wed, 20 Nov 2019, Jakub Jelinek wrote:

> Ok with appropriate ChangeLog entry.

 I've used one included with the submission.  Change applied now, thanks 
for your review.

  Maciej

Re: [PATCH 2/2] PR 92463 MPFR modernization: Revert r269139

2019-11-20 Thread Tobias Burnus


LGTM.

Thanks,

Tobias

PS: For reference, mpfr_regular_p was added in MPFR 3.0.0 (as stated); 
acting as follows:
mpfr_number_p = returns nonzero if ordinary number (i.e., neither NaN 
nor an infinity),
mpfr_regular_p = returns nonzero if regular number (i.e., neither NaN, 
nor an infinity nor zero)


On 11/18/19 9:34 PM, Janne Blomqvist wrote:

Commit r269139 fixed an accidental dependency on MPFR 3.0. As we now
require at least MPFR 3.1.0+ we can revert it and instead use the
simpler MPFR 3.0+ code.

ChangeLog entry of the original commit was:

2019-02-23  David Malcolm  
 Jakub Jelinek  

 PR middle-end/88074
 * simplify.c (norm2_do_sqrt, gfc_simplify_norm2): Use
 mpfr_number_p && !mpfr_zero_p instead of mpfr_regular_p.
 (norm2_add_squared): Likewise.  Use mp_exp_t rather than mpfr_exp_t.
---
  gcc/fortran/simplify.c | 14 +-
  1 file changed, 5 insertions(+), 9 deletions(-)

diff --git a/gcc/fortran/simplify.c b/gcc/fortran/simplify.c
index a5c940ca2d5..b48bf014121 100644
--- a/gcc/fortran/simplify.c
+++ b/gcc/fortran/simplify.c
@@ -6023,8 +6023,8 @@ norm2_add_squared (gfc_expr *result, gfc_expr *e)
  
gfc_set_model_kind (result->ts.kind);

int index = gfc_validate_kind (BT_REAL, result->ts.kind, false);
-  mp_exp_t exp;
-  if (mpfr_number_p (result->value.real) && !mpfr_zero_p (result->value.real))
+  mpfr_exp_t exp;
+  if (mpfr_regular_p (result->value.real))
  {
exp = mpfr_get_exp (result->value.real);
/* If result is getting close to overflowing, scale down.  */
@@ -6038,7 +6038,7 @@ norm2_add_squared (gfc_expr *result, gfc_expr *e)
  }
  
mpfr_init (tmp);

-  if (mpfr_number_p (e->value.real) && !mpfr_zero_p (e->value.real))
+  if (mpfr_regular_p (e->value.real))
  {
exp = mpfr_get_exp (e->value.real);
/* If e**2 would overflow or close to overflowing, scale down.  */
@@ -6079,9 +6079,7 @@ norm2_do_sqrt (gfc_expr *result, gfc_expr *e)
if (result != e)
  mpfr_set (result->value.real, e->value.real, GFC_RND_MODE);
mpfr_sqrt (result->value.real, result->value.real, GFC_RND_MODE);
-  if (norm2_scale
-  && mpfr_number_p (result->value.real)
-  && !mpfr_zero_p (result->value.real))
+  if (norm2_scale && mpfr_regular_p (result->value.real))
  {
mpfr_t tmp;
mpfr_init (tmp);
@@ -6120,9 +6118,7 @@ gfc_simplify_norm2 (gfc_expr *e, gfc_expr *dim)
result = simplify_transformation_to_scalar (result, e, NULL,
  norm2_add_squared);
mpfr_sqrt (result->value.real, result->value.real, GFC_RND_MODE);
-  if (norm2_scale
- && mpfr_number_p (result->value.real)
- && !mpfr_zero_p (result->value.real))
+  if (norm2_scale && mpfr_regular_p (result->value.real))
{
  mpfr_t tmp;
  mpfr_init (tmp);

[PATCH] doc: Correct `--enable-version-specific-runtime-libs' support information

2019-11-20 Thread Maciej W. Rozycki

The `--enable-version-specific-runtime-libs' configuration option is now 
supported throughout all of our target library subdirectories, so update 
installation documentation accordingly and also mention that the default 
for the option is `yes' for libada and `no' for the remaining libraries.

gcc/
* doc/install.texi (Options specification): Remove the list of 
target library subdirectories supporting 
`--enable-version-specific-runtime-libs'.  Document defaults for 
the option.
---
 gcc/doc/install.texi |4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

gcc-doc-version-specific-runtime-libs.diff
Index: gcc/gcc/doc/install.texi
===
--- gcc.orig/gcc/doc/install.texi
+++ gcc/gcc/doc/install.texi
@@ -1560,8 +1560,8 @@ addition, @samp{libstdc++}'s include fil
 @file{@var{libdir}} unless you overruled it by using
 @option{--with-gxx-include-dir=@var{dirname}}.  Using this option is
 particularly useful if you intend to use several versions of GCC in
-parallel.  This is currently supported by @samp{libgfortran},
-@samp{libstdc++}, and @samp{libobjc}.
+parallel.  The default is @samp{yes} for @samp{libada}, and @samp{no} for
+the remaining libraries.
 
 @item @anchor{WithAixSoname}--with-aix-soname=@samp{aix}, @samp{svr4} or 
@samp{both}
 Traditional AIX shared library versioning (versioned @code{Shared Object}

Fix nonlinearity in estimate_edge_growth

2019-11-20 Thread Jan Hubicka

Hi,
this patch treads ages old problem that compile time needed to estimate call
size is not constant, but a function of a number of calls of the callee.
If there is a function with many callers and many callees this triggers
quadratic behaviour.

This patch adds summary info for all callees of a node which is of same format
as size_time_table used to account normal code.  It is a different table since,
unlike normal code, call statemetns are changing thorough the IPA ooptimization
queue. In a followup patch I will add incremental update of this summary which
will solve the second traditional quadratic bottleneck of greedy inliner
(with inlinining very many calls into large function).

This has became problem for compiling cc1 itself because with enablement of
-finline-functions at -O2 we end up inlining a lot into the auto-generated
insn-* pattern matchers (since a lot of predicates are small).
Thus WPA inlining time regressed from 6s in gcc 9 to 65s in trunk two weeks
back.

The memory use of new tables is 64bytes per function summary and at most one
entry in the allocated vector per call (whole point of the change is to glob
calls with same predicates together and also cap number of predicates we care
about). Overal 10MB for cc1 (out of 900MB we need at WPA time).

It would be possible to merge both size_time_table and call_size_time_table to
one saving some of overhead.  This would however make it impossible to
recalculate the data and do some sanity checking and I am affraid of making
that very hard to maintain

The profile of WPA cc1 shows:

-   47.85% 0.30%  lto1-wpa lto1  [.] 
inline_small_functions
   - 47.55% inline_small_functions
  - 24.20% update_callee_keys
 - 6.95% can_inline_edge_p
  1.32% sanitize_flags_p
+ 1.32% is_tm_pure
 - 4.34% update_edge_key
  3.48% edge_badness
   4.24% want_inline_small_function_p
 - 3.51% can_inline_edge_by_limits_p
  0.61% estimate_size_after_inlining
   0.89% cgraph_node::get_availability
  - 13.44% inline_call
 + 10.57% ipa_update_overall_fn_summary
 + 1.23% clone_inlined_nodes
   1.06% ipa_merge_fn_summary_after_inlining
  - 5.27% update_caller_keys
 + 4.46% want_inline_small_function_p
   0.62% can_inline_edge_p
1.34% fibonacci_heap::extract_minimum_node
  + 0.83% want_inline_small_function_p
  + 0.73% estimate_growth

ipa_update_overall_fn_summary is the nonlinearity of updating summaries after
inlining (each inline update is function of size of the inline tree of caller).

Honza

* ipa-fnsummary.c (ipa_fn_summary::account_size_time): Add CALL
parameter and update call_size_time_table.
(ipa_fn_summary::max_size_time_table_size): New constant.
(estimate_calls_size_and_time_1): Break out from ...
(estimate_calls_size_and_time): ... here; implement summary production.
(summarize_calls_size_and_time): New function.
(ipa_call_context::estimate_size_and_time): Bypass
estimate_calls_size_and_time for leaf functions.
(ipa_update_overall_fn_summary): Likewise.
* ipa-fnsummary.h (call_size_time_table): New.
(ipa_fn_summary::account_size_time): Update prototype.
Index: ipa-fnsummary.c
===
--- ipa-fnsummary.c (revision 278496)
+++ ipa-fnsummary.c (working copy)
@@ -146,17 +147,22 @@ ipa_dump_hints (FILE *f, ipa_hints hints
 /* Record SIZE and TIME to SUMMARY.
The accounted code will be executed when EXEC_PRED is true.
When NONCONST_PRED is false the code will evaulate to constant and
-   will get optimized out in specialized clones of the function.   */
+   will get optimized out in specialized clones of the function.
+   If CALL is true account to call_size_time_table rather than
+   size_time_table.   */
 
 void
 ipa_fn_summary::account_size_time (int size, sreal time,
   const predicate &exec_pred,
-  const predicate &nonconst_pred_in)
+  const predicate &nonconst_pred_in,
+  bool call)
 {
   size_time_entry *e;
   bool found = false;
   int i;
   predicate nonconst_pred;
+  vec *table = call
+  ? call_size_time_table : size_time_table;
 
   if (exec_pred == false)
 return;
@@ -168,23 +174,23 @@ ipa_fn_summary::account_size_time (int s
 
   /* We need to create initial empty unconitional clause, but otherwie
  we don't need to account empty times and sizes.  */
-  if (!size && time == 0 && size_time_table)
+  if (!size && time == 0 && table)
 return;
 
   gcc_assert (time >= 0);
 
-  for (i = 0; vec_safe_iterate (size_time_table, i, &e); i++)
+  for (i = 0; vec_safe_iterate (table, i, &e); i++)
 if (e->exec_predicate == exec_pred
&& e->nonc

Re: Reverting r278411

2019-11-20 Thread Segher Boessenkool

On Wed, Nov 20, 2019 at 03:46:29PM +, Richard Sandiford wrote:
> Segher Boessenkool  writes:
> > UNLT & ORDERED is always LT.  When would it not be true?
> 
> LT traps on quiet NaNs for -ftrapping-math, UNLT and ORDERED don't.

No?  -ftrapping-math makes nothing trap.  The only thing it does is to
not do optimisations that are not valid if traps are considered to be
a user-visible thing.

Almost nothing ever traps on quiet NaNs.

And indeed the existing code is wrong for this already, as I mentioned
before (in the context of IEEE SNaNs).  But yeah, we can do almost no
optimisation if trapping math is true, so we should skip everything in
that case.  I wonder how much of existing GCC code breaks this as well
though, hrm.

Segher

Optimize allocations for evaluate_properties_for_edges

2019-11-20 Thread Jan Hubicka

Hi,
this patch avoids malloc traffic in evaluate_properties_for_edge which
allocates vectors holding known values, aggregates and polyorphic contextes.
I made the vector allocated by a caller who can place them at stack using
auto_vec.

The patch also adds logic to avoid calculation and clearing of these vectors
for parameters which are not used.

I realize that API for evaluate_properties_for_edge became somewhat ugly
with adding a lot of additional stuff.  I will clean it up next stage1.

With this patch we still do a lot of allocations to hold
ipa_agg_value_set::items which are vectors within vectors.  I wonder how much
of this work would go away if we would further track what parameters are being
used as aggregates?

Bootstrapped/regtested x86_64-linux, will commit it shortly.

Honza

* ipa-fnsummary.c (evaluate_conditions_for_known_args): Be
ready for some vectors to not be allocated.
(evaluate_properties_for_edge): Document better; make
known_vals and known_aggs caller allocated; avoid determining
values of parameters which are not used.
(ipa_merge_fn_summary_after_inlining): Pre allocate known_vals and
known_aggs.
* ipa-inline-analysis.c (do_estimate_edge_time): Likewise.
(do_estimate_edge_size): Likewise.
(do_estimate_edge_hints): Likewise.
Index: ipa-fnsummary.c
===
--- ipa-fnsummary.c (revision 278464)
+++ ipa-fnsummary.c (working copy)
@@ -329,7 +329,7 @@ evaluate_conditions_for_known_args (stru
 
   for (i = 0; vec_safe_iterate (info->conds, i, &c); i++)
 {
-  tree val;
+  tree val = NULL;
   tree res;
   int j;
   struct expr_eval_op *op;
@@ -338,14 +338,8 @@ evaluate_conditions_for_known_args (stru
  (especially for K&R style programs).  So bound check here (we assume
  known_aggs vector, if non-NULL, has the same length as
  known_vals).  */
-  gcc_checking_assert (!known_aggs.exists ()
+  gcc_checking_assert (!known_aggs.length () || !known_vals.length ()
   || (known_vals.length () == known_aggs.length ()));
-  if (c->operand_num >= (int) known_vals.length ())
-   {
- clause |= 1 << (i + predicate::first_dynamic_condition);
- nonspec_clause |= 1 << (i + predicate::first_dynamic_condition);
- continue;
-   }
 
   if (c->agg_contents)
{
@@ -353,19 +347,24 @@ evaluate_conditions_for_known_args (stru
 
  if (c->code == predicate::changed
  && !c->by_ref
+ && c->operand_num < (int)known_vals.length ()
  && (known_vals[c->operand_num] == error_mark_node))
continue;
 
- if (known_aggs.exists ())
+ if (c->operand_num < (int)known_aggs.length ())
{
  agg = &known_aggs[c->operand_num];
- val = ipa_find_agg_cst_for_param (agg, known_vals[c->operand_num],
+ val = ipa_find_agg_cst_for_param (agg,
+   c->operand_num
+  < (int) known_vals.length ()
+   ? known_vals[c->operand_num]
+   : NULL,
c->offset, c->by_ref);
}
  else
val = NULL_TREE;
}
-  else
+  else if (c->operand_num < (int) known_vals.length ())
{
  val = known_vals[c->operand_num];
  if (val == error_mark_node && c->code != predicate::changed)
@@ -491,7 +490,18 @@ evaluate_conditions_for_known_args (stru
 }
 
 
-/* Work out what conditions might be true at invocation of E.  */
+/* Work out what conditions might be true at invocation of E.
+   Compute costs for inlined edge if INLINE_P is true.
+
+   Return in CLAUSE_PTR the evaluated condistions and in NONSPEC_CLAUSE_PTR
+   (if non-NULL) conditions evaluated for nonspecialized clone called
+   in a given context.
+
+   KNOWN_VALS_PTR and KNOWN_AGGS_PTR must be non-NULL and will be filled by
+   known canstant and aggregate values of parameters.
+
+   KNOWN_CONTEXT_PTR, if non-NULL, will be filled by polymorphic call contexts
+   of parameter used by a polymorphic call.  */
 
 void
 evaluate_properties_for_edge (struct cgraph_edge *e, bool inline_p,
@@ -504,113 +514,141 @@ evaluate_properties_for_edge (struct cgr
 {
   struct cgraph_node *callee = e->callee->ultimate_alias_target ();
   class ipa_fn_summary *info = ipa_fn_summaries->get (callee);
-  vec known_vals = vNULL;
   auto_vec known_value_ranges;
-  vec known_aggs = vNULL;
   class ipa_edge_args *args;
 
   if (clause_ptr)
 *clause_ptr = inline_p ? 0 : 1 << predicate::not_inlined_condition;
-  if (known_vals_ptr)
-known_vals_ptr->create (0);
-  if (known_contexts_ptr)
-known_contexts_ptr->create (0);
 
   if (ipa_node_params_sum

Re: Reverting r278411

2019-11-20 Thread Jakub Jelinek

On Wed, Nov 20, 2019 at 10:17:38AM -0600, Segher Boessenkool wrote:
> On Wed, Nov 20, 2019 at 03:46:29PM +, Richard Sandiford wrote:
> > Segher Boessenkool  writes:
> > > UNLT & ORDERED is always LT.  When would it not be true?
> > 
> > LT traps on quiet NaNs for -ftrapping-math, UNLT and ORDERED don't.
> 
> No?  -ftrapping-math makes nothing trap.  The only thing it does is to
> not do optimisations that are not valid if traps are considered to be
> a user-visible thing.
> 
> Almost nothing ever traps on quiet NaNs.

A lot traps even with quiet NaNs, assuming exceptions are enabled.
E.g. for x86 https://www.felixcloutier.com/x86/cmppd#tbl-3-1 lists the
details which operations rise FE_INVALID and which don't if it is enabled.

Jakub

Re: Reverting r278411

2019-11-20 Thread Richard Sandiford

Richard Sandiford  writes:
>>> I'd actually considered converting to signed and back instead of adding
>>> extra cases, but I thought that would be rejected as too inefficient.
>>> (That was a concern with my patch above.)  It seemed like one of the selling
>>> points of doing it your way was that everything was handled by one switch
>>> statement "in" and one switch statement "out", and I was trying to
>>> preserve that.
>>> 
>>> signed_condition and unsigned_condition assert on unordered comparisons,
>>> so if we're going to go that route, we either need to filter them out
>>> first or add maybe_* versions of the routines that return UNKNOWN.
>>
>> Yeah.  rs6000 has
>>
>> (define_predicate "unsigned_comparison_operator"
>>   (match_code "ltu,gtu,leu,geu"))
>> (define_predicate "signed_comparison_operator"
>>   (match_code "lt,gt,le,ge"))
>>
>> Maybe we should have those be for every target?
>>
>>   bool is_signed = (signed_comparison_operator (code0)
>> || signed_comparison_operator (code1));
>>   bool is_unsigned = (unsigned_comparison_operator (code0)
>>   || unsigned_comparison_operator (code1));
>>
>>   /* Don't allow mixing signed and unsigned comparisons.  */
>>   if (is_signed && is_unsigned)
>> return 0;
>>
>> (this takes care of your EQ/NE concern automatically, btw)
>>
>>   if (unsigned_comparison_operator (code0))
>> code0 = signed_condition (code0);
>>   if (unsigned_comparison_operator (code1))
>> code1 = signed_condition (code1);
>>
>> and at the end
>>
>>   if (is_unsigned && signed_comparison_operator (code))
>> code = unsigned_condition (code);
>
> OK, thanks, I'll do it this way.

Actually, this doesn't work because *_operators want rtxes rather
than codes.  I can get around that by passing op0 and op1 for
the existing rtxes.  For the conversion at the end, I can do:

  machine_mode compared_mode = GET_MODE (XEXP (op0, 0));

  if (code == ORDERED && INTEGRAL_MODE_P (compared_mode))
return const_true_rtx;

  if (is_unsigned)
code = unsigned_condition (code);

Or I can add signed_comparison_p and unsigned_comparison_p functions
that take codes instead of rtxes.

Do either of those sound OK, or would you prefer something else?

Richard

Re: RFA; patch to fix PR90007

2019-11-20 Thread Alexander Monakov

On Wed, 20 Nov 2019, Alexander Monakov wrote:

> On Wed, 20 Nov 2019, Richard Biener wrote:
> 
> > On Tue, Nov 19, 2019 at 5:07 PM Vladimir Makarov  
> > wrote:
> > >
> > > The following patch fixes
> > >
> > > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90007
> > >
> > > Sometime ago a code which permits LRA to reload hard register into
> > > memory as it did for pseudo were added.  But this LRA possibility was
> > > not reflected in recog.c.  The following patch fixes this discrepancy
> > > and as a result fixes PR90007.
> > >
> > > OK to commit?
> > 
> > I guess the change is OK but for the bug itself it sounds like
> > selective scheduling doesn't properly recognize insns it
> > proagates into (and avoids doing that then)?  That is,
> > selective scheduling creates invalid RTL?
> 
> We validate the substitution by invoking validate_replace_rtx_part_nosimplify
> from substitute_reg_in_expr.  I think that should be sufficient?  I see 
> similar
> code in haifa-sched uses attempt_change, which also ultimately uses
> apply_change_group.

Although looking at this more, I see that we specifically arrange for a call to
constrain_operands in replace_src_with_reg_ok_p (with a comment), but here in
substitute_reg_in_expr we have a '???' comment that seems to mention that
theoretically there might be a problem, but it never came up in practice.

So there's an inconsistency in sel-sched as well.

Alexander

Document -Wc11-c2x-compat

2019-11-20 Thread Joseph Myers

My patch that added initial C2X support and associated command-line
options missed documenting -Wc11-c2x-compat although the other options
were properly documented.  This patch adds the missing documentation.

Tested with "make info" and "make pdf".  Applied to mainline.  Will apply 
to GCC 9 branch after reducing the list of features covered to reflect the 
more limited C2X support in GCC 9.

2019-11-20  Joseph Myers  

* doc/invoke.texi (-Wc11-c2x-compat): Document.

Index: gcc/doc/invoke.texi
===
--- gcc/doc/invoke.texi (revision 278507)
+++ gcc/doc/invoke.texi (working copy)
@@ -295,6 +295,7 @@
 -Wbool-compare  -Wbool-operation @gol
 -Wno-builtin-declaration-mismatch @gol
 -Wno-builtin-macro-redefined  -Wc90-c99-compat  -Wc99-c11-compat @gol
+-Wc11-c2x-compat @gol
 -Wc++-compat  -Wc++11-compat  -Wc++14-compat  -Wc++17-compat  @gol
 -Wc++20-compat  @gol
 -Wcast-align  -Wcast-align=strict  -Wcast-function-type  -Wcast-qual  @gol
@@ -6830,6 +6831,16 @@
 and so on.  This option is independent of the standards mode.  Warnings are
 disabled in the expression that follows @code{__extension__}.
 
+@item -Wc11-c2x-compat @r{(C and Objective-C only)}
+@opindex Wc11-c2x-compat
+@opindex Wno-c11-c2x-compat
+Warn about features not present in ISO C11, but present in ISO C2X.
+For instance, warn about omitting the string in @code{_Static_assert},
+use of @samp{[[]]} syntax for attributes, use of decimal
+floating-point types, and so on.  This option is independent of the
+standards mode.  Warnings are disabled in the expression that follows
+@code{__extension__}.
+
 @item -Wc++-compat @r{(C and Objective-C only)}
 @opindex Wc++-compat
 @opindex Wno-c++-compat

-- 
Joseph S. Myers
jos...@codesourcery.com

[C++ Patch] Avoid redundant error messages from build_x_arrow

2019-11-20 Thread Paolo Carlini


Hi,

while working on improving the locations of cp_build_indirect_ref_1 & 
co, I noticed this nit which seems a separate issue.


In a nutshell, at variance with many other cases, in build_x_arrow we 
don't immediately check for error_mark_node the return value of 
decay_conversion. Then, for the testcase, after a sensible:


error: invalid use of member function ‘C* C::f()’ (did you forget the 
‘()’ ?)


we also issue:

error: base operand of ‘->’ is not a pointer

which is certainly redundant and a bit misleading, is talking about the 
'f' mentioned in the first message. The amended behavior is also 
consistent with EDG and CLANG.


Tested x86_64-linux, as usual.

Thanks, Paolo.

//

/gcc
2019-11-20  Paolo Carlini  

* typeck2.c (build_x_arrow): Early return if decay_conversion
returns error_mark_node.

/testsuite
2019-11-20  Paolo Carlini  

* g++.dg/parse/error43.C: Adjust expected error.
Index: cp/typeck2.c
===
--- cp/typeck2.c(revision 278499)
+++ cp/typeck2.c(working copy)
@@ -2044,7 +2044,11 @@ build_x_arrow (location_t loc, tree expr, tsubst_f
last_rval = convert_from_reference (last_rval);
 }
   else
-last_rval = decay_conversion (expr, complain);
+{
+  last_rval = decay_conversion (expr, complain);
+  if (last_rval == error_mark_node)
+   return error_mark_node;
+}
 
   if (TYPE_PTR_P (TREE_TYPE (last_rval)))
 {
Index: testsuite/g++.dg/parse/error43.C
===
--- testsuite/g++.dg/parse/error43.C(revision 278499)
+++ testsuite/g++.dg/parse/error43.C(working copy)
@@ -2,4 +2,4 @@
 // { dg-options "" }
 
 class C { public: C* f(); int get(); };
-int f(C* p) { return p->f->get(); }  // { dg-error "forget the '\\(\\)'|base 
operand" }
+int f(C* p) { return p->f->get(); }  // { dg-error "25:invalid use of member 
function" }

Fix (most of) nonlinearity in update_callee_keys

2019-11-20 Thread Jan Hubicka

Hi,
this patch makes inliner to update callee keys only in function within
newly inlined clone rather than in the whole function it is inlined to.
This is possible only when the remaining edges are not becoming more hot
for inlining and on this I rely on monotonocity of the badness function:
if caller gets bigger and slower the call is not becoming hotter.
This is not true when wrapper penalty is applied and thus a special case
is needed.

Bootstrapped/regtested x86_64-linux, comitted.

Honza

* ipa-inline.c (wrapper_heuristics_may_apply): Break out from ...
(edge_badness): ... here.
(inline_small_functions): Use monotonicity of badness calculation
to avoid redundant updates.
Index: ipa-inline.c
===
--- ipa-inline.c(revision 278459)
+++ ipa-inline.c(working copy)
@@ -1097,6 +1097,17 @@ want_inline_function_to_all_callers_p (s
   return true;
 }
 
+/* Return true if WHERE of SIZE is a possible candidate for wrapper heuristics
+   in estimate_edge_badness.  */
+
+static bool
+wrapper_heuristics_may_apply (struct cgraph_node *where, int size)
+{
+  return size < (DECL_DECLARED_INLINE_P (where->decl)
+? inline_insns_single (where, false)
+: inline_insns_auto (where, false));
+}
+
 /* A cost model driving the inlining heuristics in a way so the edges with
smallest badness are inlined first.  After each inlining is performed
the costs of all caller edges of nodes affected are recomputed so the
@@ -1227,10 +1238,8 @@ edge_badness (struct cgraph_edge *edge,
 and it is not called once.  */
  if (!caller_info->single_caller && overall_growth < caller_growth
  && caller_info->inlinable
- && ipa_size_summaries->get (caller)->size
-< (DECL_DECLARED_INLINE_P (caller->decl)
-   ? inline_insns_single (caller, false)
-   : inline_insns_auto (caller, false)))
+ && wrapper_heuristics_may_apply
+(caller, ipa_size_summaries->get (caller)->size))
{
  if (dump)
fprintf (dump_file,
@@ -2158,11 +2167,24 @@ inline_small_functions (void)
fprintf (dump_file, " Peeling recursion with depth %i\n", depth);
 
  gcc_checking_assert (!callee->inlined_to);
+
+ int old_size = ipa_size_summaries->get (where)->size;
+ sreal old_time = ipa_fn_summaries->get (where)->time;
+
  inline_call (edge, true, &new_indirect_edges, &overall_size, true);
  reset_edge_caches (edge->callee);
  add_new_edges_to_heap (&edge_heap, new_indirect_edges);
 
- update_callee_keys (&edge_heap, where, updated_nodes);
+ /* If caller's size and time increased we do not need to update
+all edges becuase badness is not going to decrease.  */
+ if (old_size <= ipa_size_summaries->get (where)->size
+ && old_time <= ipa_fn_summaries->get (where)->time
+ /* Wrapper penalty may be non-monotonous in this respect.
+Fortunately it only affects small functions.  */
+ && !wrapper_heuristics_may_apply (where, old_size))
+   update_callee_keys (&edge_heap, edge->callee, updated_nodes);
+ else
+   update_callee_keys (&edge_heap, where, updated_nodes);
}
   where = edge->caller;
   if (where->inlined_to)

Re: Reverting r278411

2019-11-20 Thread Segher Boessenkool

On Wed, Nov 20, 2019 at 05:30:48PM +0100, Jakub Jelinek wrote:
> On Wed, Nov 20, 2019 at 10:17:38AM -0600, Segher Boessenkool wrote:
> > On Wed, Nov 20, 2019 at 03:46:29PM +, Richard Sandiford wrote:
> > > Segher Boessenkool  writes:
> > > > UNLT & ORDERED is always LT.  When would it not be true?
> > > 
> > > LT traps on quiet NaNs for -ftrapping-math, UNLT and ORDERED don't.
> > 
> > No?  -ftrapping-math makes nothing trap.  The only thing it does is to
> > not do optimisations that are not valid if traps are considered to be
> > a user-visible thing.
> > 
> > Almost nothing ever traps on quiet NaNs.
> 
> A lot traps even with quiet NaNs, assuming exceptions are enabled.
> E.g. for x86 https://www.felixcloutier.com/x86/cmppd#tbl-3-1 lists the
> details which operations rise FE_INVALID and which don't if it is enabled.

That is what ordered comparisons (aka signaling comparisons) do, sure.
This is part of "almost nothing" in my count ;-)

Ordered comparisons should trap both with and without -ftrapping-math.
The difference is that with -fno-trapping math GCC can ignore that and
just optimise code how it wants to.


Segher

Re: [PATCH 0/4] Eliminate cc0 from m68k

2019-11-20 Thread Jeff Law

On 11/20/19 7:16 AM, Bernd Schmidt wrote:
> On 11/20/19 2:50 PM, Mikael Pettersson wrote:
>> On Mon, Nov 18, 2019 at 9:57 PM Mikael Pettersson  
>> wrote:
>>>
>>> On Mon, Nov 18, 2019 at 8:31 PM Bernd Schmidt  
>>> wrote:

 Hi Mikael,

> This fixed the problem, thanks.

 Could you also run the testsuite to see if you can reproduce the
 g++.old-deja failures Andreas reported?
>>>
>>> Yes, but it will probably take another week before the native
>>> bootstrap (on aranym) and test suite run is finished.  It's currently
>>> in stage 2.
> 
> Ugh, that suggests the stage2 compiler was miscompiled. That would be
> nasty to track down.
> 
> Probably best to just run tests on stage1 and hope something shows up.
> 
> What distro are you using for native builds? The m68k debian I'm using
> does not have an installable gcc package.
Definitely agreed, best place to start is with teh stage1 testresults
and see if we can nail it down that way.

Debugging user mode qemu bits is *painful*.  For that we may ultimately
be better off with Aranym.

jeff

Re: Reverting r278411

2019-11-20 Thread Richard Sandiford

Segher Boessenkool  writes:
> On Wed, Nov 20, 2019 at 05:30:48PM +0100, Jakub Jelinek wrote:
>> On Wed, Nov 20, 2019 at 10:17:38AM -0600, Segher Boessenkool wrote:
>> > On Wed, Nov 20, 2019 at 03:46:29PM +, Richard Sandiford wrote:
>> > > Segher Boessenkool  writes:
>> > > > UNLT & ORDERED is always LT.  When would it not be true?
>> > > 
>> > > LT traps on quiet NaNs for -ftrapping-math, UNLT and ORDERED don't.
>> > 
>> > No?  -ftrapping-math makes nothing trap.  The only thing it does is to
>> > not do optimisations that are not valid if traps are considered to be
>> > a user-visible thing.
>> > 
>> > Almost nothing ever traps on quiet NaNs.
>> 
>> A lot traps even with quiet NaNs, assuming exceptions are enabled.
>> E.g. for x86 https://www.felixcloutier.com/x86/cmppd#tbl-3-1 lists the
>> details which operations rise FE_INVALID and which don't if it is enabled.
>
> That is what ordered comparisons (aka signaling comparisons) do, sure.
> This is part of "almost nothing" in my count ;-)
>
> Ordered comparisons should trap both with and without -ftrapping-math.
> The difference is that with -fno-trapping math GCC can ignore that and
> just optimise code how it wants to.

Sure, no-one was disputing that.  I think you're arguing against
a strawman here.

Richard

Re: Reverting r278411

2019-11-20 Thread Segher Boessenkool

On Wed, Nov 20, 2019 at 04:35:24PM +, Richard Sandiford wrote:
> Actually, this doesn't work because *_operators want rtxes rather
> than codes.  I can get around that by passing op0 and op1 for
> the existing rtxes.  For the conversion at the end, I can do:
> 
>   machine_mode compared_mode = GET_MODE (XEXP (op0, 0));
> 
>   if (code == ORDERED && INTEGRAL_MODE_P (compared_mode))
> return const_true_rtx;

This should be all !HONOR_NANS?  Also LTGT should be turned into NE,
under that same condition.  So something like

  if (!HONOR_NANS (mode))
{
  /* UNORDERED cannot happen without NaNs.  */
  mask &= ~1;

  /* LTGT is written as NE, and ORDERED just is always true,
 without NaNs.  */
  if (mask == 12 || mask == 14)
mask |= 1;
}

before returning true for 15.

>   if (is_unsigned)
> code = unsigned_condition (code);
> 
> Or I can add signed_comparison_p and unsigned_comparison_p functions
> that take codes instead of rtxes.

That may be easier, dunno.

Thanks,


Segher

Re: Reverting r278411

2019-11-20 Thread Segher Boessenkool

On Wed, Nov 20, 2019 at 04:59:49PM +, Richard Sandiford wrote:
> Segher Boessenkool  writes:
> > On Wed, Nov 20, 2019 at 05:30:48PM +0100, Jakub Jelinek wrote:
> >> On Wed, Nov 20, 2019 at 10:17:38AM -0600, Segher Boessenkool wrote:
> >> > On Wed, Nov 20, 2019 at 03:46:29PM +, Richard Sandiford wrote:
> >> > > Segher Boessenkool  writes:
> >> > > > UNLT & ORDERED is always LT.  When would it not be true?
> >> > > 
> >> > > LT traps on quiet NaNs for -ftrapping-math, UNLT and ORDERED don't.
> >> > 
> >> > No?  -ftrapping-math makes nothing trap.  The only thing it does is to
> >> > not do optimisations that are not valid if traps are considered to be
> >> > a user-visible thing.
> >> > 
> >> > Almost nothing ever traps on quiet NaNs.
> >> 
> >> A lot traps even with quiet NaNs, assuming exceptions are enabled.
> >> E.g. for x86 https://www.felixcloutier.com/x86/cmppd#tbl-3-1 lists the
> >> details which operations rise FE_INVALID and which don't if it is enabled.
> >
> > That is what ordered comparisons (aka signaling comparisons) do, sure.
> > This is part of "almost nothing" in my count ;-)
> >
> > Ordered comparisons should trap both with and without -ftrapping-math.
> > The difference is that with -fno-trapping math GCC can ignore that and
> > just optimise code how it wants to.
> 
> Sure, no-one was disputing that.  I think you're arguing against
> a strawman here.

No, I don't understand the question I guess?  Everything in that table
that traps on quiet NaNs is a signaling comparison.

We should simply not do simplify_logical_relational_operation for
floating point types and signaling comparisons.  There is no "only for
some codes", etc.


Segher

Re: Restrict bb-slp-40.c to targets with VnQI addition (PR 92366)

2019-11-20 Thread Jeff Law

On 11/20/19 8:15 AM, Richard Sandiford wrote:
> bb-slp-40.c fails on SPARC targets without VIS4 because it
> requires addition on vectors of bytes.  There doesn't seem to be
> an existing target selector for this, so I added vect_char_add.
> (Wasn't sure whether to use vect_char_add, for consistency
> with vect_no_int_add/vect_int_mult etc., or vect_add_char for
> consistency with vect_shift_char etc.)
> 
> I took the target list from vect_int and removed targets that didn't
> seem to support the operation (namely sparc*, since we don't seem to
> have any test for VIS4, niagara7 or m8, and alpha*-*-*.)
> 
> Tested on aarch64-linux-gnu, x86_64-linux-gnu, powerpc64-linux-gnu
> (Power 7) and sparc-sun-solaris2.11.  OK to install?
> 
> Richard
> 
> 
> 2019-11-20  Richard Sandiford  
> 
> gcc/
>   PR testsuite/92366
>   * doc/sourcebuild.texi (vect_char_add): Document.
> 
> gcc/testsuite/
>   PR testsuite/92366
>   * lib/target-supports.exp (check_effective_target_vect_char_add):
>   New proc.
>   * gcc.dg/vect/bb-slp-40.c: Require vect_char_add instead of vect_int.
OK
jeff

Re: Adjust expected output for bb-slp-21.c (PR 92527)

2019-11-20 Thread Jeff Law

On 11/20/19 5:46 AM, Richard Sandiford wrote:
> After r278246, we can try building the out[] store value from scalars
> if the target has no multiplication support.  That's not necessarily
> a good thing, but like most of vect/, this test is run with the cost
> model disabled.
> 
> Tested on aarch64-linux-gnu, x86_64-linux-gnu, powerpc64-linux-gnu
> (Power 7) and sparc-sun-solaris2.11.  OK to install?
> 
> Richard
> 
> 
> 2019-11-20  Richard Sandiford  
> 
> gcc/testsuite/
>   PR testsuite/92527
>   * gcc.dg/vect/bb-slp-21.c: Expect both SLP groups to be vectorized,
>   regardless of whether the target supports multiplication.
OK.
Jeff

Re: [PATCH] Update comment in libsanitizer/*/libtool-version files.

2019-11-20 Thread Jeff Law

On 11/20/19 4:02 AM, Martin Liška wrote:
> Hi.
> 
> The patch is about removal of an unused libtool-version file,
> and comments in other libtool-version files are legacy.
> 
> Patch can bootstrap on x86_64-linux-gnu and survives regression tests.
> I'm going to install the patch if there are no objections.
> 
> Thanks,
> Martin
> 
> libsanitizer/ChangeLog:
> 
> 2019-11-20  Martin Liska  
> 
> * libtool-version: Remove.
> * lsan/libtool-version: Upate comment to not mention libmudflap.
> * tsan/libtool-version: Likewise.
> * ubsan/libtool-version: Likewise.
OK
jeff

Re: v2 [PATCH 0/X] Introduce HWASAN sanitizer to GCC

2019-11-20 Thread Matthew Malcomson

On 20/11/2019 14:29, Martin Liška wrote:
> On 11/7/19 7:37 PM, Matthew Malcomson wrote:
>> I have rebased this series onto Martin Liska's patches that take the most
>> recent libhwasan from upstream LLVM.
>> https://gcc.gnu.org/ml/gcc-patches/2019-11/msg00340.html
>>
>> I've also cleared up some nomenclature (I had previously used the word 
>> 'colour'
>> a few times instead of the word 'tag' and that clashes with other 
>> descriptions)
>> and based the patch series off a more recent GCC revision (r277678).
>>
>> There's an ongoing discussion on whether to have 
>> __SANITIZER_ADDRESS__, or
>> __SANITIZER_HWADDRESS__, but I'm keeping that discussion to the existing
>> thread.
>>
>> Similarly there's still the question around C++ exceptions that I'm 
>> keeping to
>> the existing thread (on the first patch series).
>>
>>
>> NOTE:
>>    Unfortunately, there's a bug in the more recent version of GCC I 
>> rebased
>>    onto.
>>    Hwasan catches this when bootstrapping, which means bootstrapping 
>> with hwasan
>>    fails.
>>    I'm working on tracking the bug down now, but sending this series 
>> upstream
>>    for visibility while that happens.
>>
>>    Bugzilla link:
>>    https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92410
>>
>> Entire patch series attached to cover letter.=
>>
> 
> Hello.
> 
> Thank you very much for the patch set, I've just wrote some inline replies
> and I have also some questions/suggestions that I'll summarize here. One 
> caveat,
> I'm not maintainer in any of the ideas and I must confirm that I haven't 
> made
> a deep review of the part which relates to new RTL hooks and stack 
> expansion.
> I expect Jakub can help us here.
> 
> Questions/notes:
> a) For ASAN we do have bunch of parameters:
> 
> gcc --help=param | grep asan
>    asan-stack  Enable asan stack protection.
>    asan-instrument-allocas Enable asan allocas/VLAs protection.
>    asan-globals    Enable asan globals protection.
>    asan-instrument-writes  Enable asan store operations protection.
>    asan-instrument-reads   Enable asan load operations protection.
>    asan-memintrin  Enable asan builtin functions protection.
>    asan-use-after-return   Enable asan detection of use-after-return 
> bugs.
>    asan-instrumentation-with-call-threshold Use callbacks instead of 
> inline code if number of accesses in function becomes greater or equal 
> to this number.
> 
> We can probably use some of these in order to drive hwaddress in a 
> similar way. Most of them makes sense for hwaddress as well

I would prefer to keep the options different but would be happy to share 
them if others want.

I figure that there will be some parameters that make sense for hwasan 
but not for asan.
For example hwasan-random-frame-tag doesn't have any analogue for asan.
Re-using `asan-stack` for hwasan would mean we would then need to decide 
between having options with either `hwasan` or `asan` prefix being able 
to affect hwasan, or introducing a parameter `asan-random-frame-tag` 
that has no meaning on asan.

> 
> b) Apparently clang prints also value of all registers. Do we want to do 
> the same?
> 

I believe this only happens on clang for inline checking (try testing 
clang using the parameter `-mllvm --hwasan-instrument-with-calls` to see 
without).

This would be nice to have, but I'm not planning on looking at it any 
time soon.
This is currently implemented in clang on top of two not-yet implemented 
features: Inline instrumentation, and generated checking functions with 
special calling conventions.

It's certainly not something I'm aiming to get into GCC 10.

> 
> c) I'm a bit confused by the pointer tags, where clang does:
> 

Yeah -- I left out a description of short-tags.
Hopefully my explanation in the below email helped.
https://gcc.gnu.org/ml/gcc-patches/2019-11/msg01968.html

> 
> d) Are you planning to come up with inline stack variable 
> tagging/untagging for GCC 10?

To make sure I understand the question correctly:
I think you're asking about writing tags into the shadow memory.

I wasn't planning on this.

What I've posted has all the functionality I'm aiming to get in.
My stretch goal at the moment is to handle exceptions (where I currently 
have a fundamental problem I'm trying to resolve).

> e) In hwasan_increment_tag, shouldn't you skip HWASAN_STACK_BACKGROUND 
> color?

Hopefully answered in 
https://gcc.gnu.org/ml/gcc-patches/2019-11/msg01968.html

> f) I would rename ASAN_MARK in tree dumps to HWASAN_MARK, similarly to 
> what you did
>     for HWASAN_CHECK?

This is an artifact of a shortcut I made (and tried but probably failed 
to explain well in 
https://gcc.gnu.org/ml/gcc-patches/2019-09/msg00392.html).

For HWASAN_CHECK I introduced a new internal function that takes the 
same arguments as ASAN_CHECK, hence this new function is printed 
differently.

For HWASAN_MARK, I used a trick to avoid adding "almost duplicate" code 
everywhere in the

Re: [PATCH 1/3] [ARC] Fix failing pr77309 for ARC700

2019-11-20 Thread Jeff Law

On 11/19/19 2:02 AM, Claudiu Zissulescu wrote:
> The patterns neg_scc_insn and not_scc_insn are not correct, leading to
> failing pr77309 test for ARC700. Add two new bic compare with zero
> patterns to improve output code.
> 
> gcc/
> -xx-xx  Claudiu Zissulescu  
> 
>   * config/arc/arc.md (bic_f): Use cc_set_register predicate.
>   (bic_cmp0_noout): New pattern.
>   (bic_cmp0): Likewise.
>   (neg_scc_insn): Remove pattern.
>   (not_scc_insn): Likewise.
OK.

FYI between

e00e0f8c391779c8d6cc9ad2fff8056a73c765c2 (good)

fcae029b424f0546ee0efe574dff150be41271ea (bad)

I got the following regression.  I haven't looked into it yet at all,
but figured passing it along might be helpful.

> # Comparing 4 common sum files
> ## /bin/sh gcc/contrib/compare_tests  /tmp/gxx-sum1.60617 /tmp/gxx-sum2.60617
> Tests that now fail, but worked before (1 tests):
> 
> arc-sim: gcc.dg/pr86991.c (test for excess errors)


Jeff

Re: [PATCH v2] Add `--with-install-sysroot=' configuration option

2019-11-20 Thread Joseph Myers

On Wed, 20 Nov 2019, Maciej W. Rozycki wrote:

> > But even then, if you configure GCC using "--with-sysroot" or 
> > "--with-build-sysroot", both of those paths are the top-level sysroot, to 
> > which the sysroot suffix gets appended before GCC uses it for any purpose, 
> > unless you explicitly build using --no-sysroot-suffix.  So I still think 
> > it's natural for a user of GCC's configure scripts to expect the new 
> > option, like the other sysroot-related configure options, also to be one 
> > to which the per-multilib sysroot suffix gets appended before GCC uses it.  
> > And if it's not like that, the documentation needs to say so explicitly.
> 
>  Thanks for your concern, however again, AFAICT this change is tangential 
> to any sysroot suffix, which necessarily has to be already included in the 
> multilib OS directory as given by `-print-multi-os-directory', so that it 
> gets embedded within $toolexeclibdir for the purpose of target library 
> installation across the relevant subdirs, as per this excerpt from 
> `configure' code right after the assignments quoted in the example above:
> 
> multi_os_directory=`$CC -print-multi-os-directory`
> case $multi_os_directory in
>   .) ;; # Avoid trailing /.
>   *) toolexeclibdir=$toolexeclibdir/$multi_os_directory ;;
> esac
> 
> or otherwise the existing arrangement where 
> toolexeclibdir='$(toolexecdir)/lib' wouldn't have worked either (and 
> neither would in the native case where toolexeclibdir='$(libdir)').
> 
>  Does this answer clear your concern?  OK to apply with the documentation 
> thinko fixed?

The answer explains the reasoning behind the design of the option (i.e., 
the design that means it's not particularly useful with sysroot suffixes, 
because the user would still need to relocate libraries manually to the 
correct suffixed sysroot).  It is indeed the case that making a version 
useful with sysroot suffixes would not simply be a configuration change 
but involve changes in the compiler driver to disentangle two different 
uses of multilib OS directory suffixes.

However, it's not enough to answer the question about the semantics of the 
option on the mailing list.  The question is a natural one for anyone who 
knows about sysroot suffixes and is reading the documentation of the 
option.  And so *the actual GCC documentation proposed to be committed*, 
not just explanations on the mailing list that people reading that 
documentation won't see, needs to say explicitly that the OS directory 
suffix is appended to lib/ in the *unsuffixed* sysroot, so that all 
libraries get installed in the same sysroot even if suffixes are in use.

-- 
Joseph S. Myers
jos...@codesourcery.com

Re: [PATCH] include size and offset in -Wstringop-overflow

2019-11-20 Thread Jeff Law

On 11/18/19 9:15 AM, Martin Sebor wrote:
> On 11/17/19 12:03 PM, Jeff Law wrote:
>> On 11/12/19 12:55 PM, Martin Sebor wrote:
>>> On 11/12/19 10:54 AM, Jeff Law wrote:
 On 11/12/19 1:15 AM, Richard Biener wrote:
> On Tue, Nov 12, 2019 at 6:10 AM Jeff Law  wrote:
>>
>> On 11/6/19 3:34 PM, Martin Sebor wrote:
>>> On 11/6/19 2:06 PM, Martin Sebor wrote:
 On 11/6/19 1:39 PM, Jeff Law wrote:
> On 11/6/19 1:27 PM, Martin Sebor wrote:
>> On 11/6/19 11:55 AM, Jeff Law wrote:
>>> On 11/6/19 11:00 AM, Martin Sebor wrote:
 The -Wstringop-overflow warnings for single-byte and multi-byte
 stores mention the amount of data being stored and the
 amount of
 space remaining in the destination, such as:

 warning: writing 4 bytes into a region of size 0
 [-Wstringop-overflow=]

   123 |   *p = 0;
   |   ~~~^~~
 note: destination object declared here
    45 |   char b[N];
   |    ^

 A warning like this can take some time to analyze.  First, the
 size
 of the destination isn't mentioned and may not be easy to tell
 from
 the sources.  In the note above, when N's value is the
 result of
 some non-trivial computation, chasing it down may be a small
 project
 in and of itself.  Second, it's also not clear why the region
 size
 is zero.  It could be because the offset is exactly N, or
 because
 it's negative, or because it's in some range greater than N.

 Mentioning both the size of the destination object and the
 offset
 makes the existing messages clearer, are will become essential
 when
 GCC starts diagnosing overflow into allocated buffers (as my
 follow-on patch does).

 The attached patch enhances -Wstringop-overflow to do this by
 letting compute_objsize return the offset to its caller, doing
 something similar in get_stridx, and adding a new function to
 the strlen pass to issue this enhanced warning (eventually, I'd
 like the function to replace the -Wstringop-overflow handler in
 builtins.c).  With the change, the note above might read
 something
 like:

 note: at offset 11 to object ‘b’ with size 8 declared here
    45 |   char b[N];
   |    ^

 Tested on x86_64-linux.

 Martin

 gcc-store-offset.diff

 gcc/ChangeLog:

    * builtins.c (compute_objsize): Add an argument and set
 it to
 offset
    into destination.
    * builtins.h (compute_objsize): Add an argument.
    * tree-object-size.c (addr_object_size): Add an argument
 and
 set it
    to offset into destination.
    (compute_builtin_object_size): Same.
    * tree-object-size.h (compute_builtin_object_size):
 Add an
 argument.
    * tree-ssa-strlen.c (get_addr_stridx): Add an
 argument and
 set it
    to offset into destination.
    (maybe_warn_overflow): New function.
    (handle_store): Call maybe_warn_overflow to issue
 warnings.

 gcc/testsuite/ChangeLog:

    * c-c++-common/Wstringop-overflow-2.c: Adjust text of
 expected
 messages.
    * g++.dg/warn/Wstringop-overflow-3.C: Same.
    * gcc.dg/Wstringop-overflow-17.c: Same.

>>>
 Index: gcc/tree-ssa-strlen.c
 ===

 --- gcc/tree-ssa-strlen.c    (revision 277886)
 +++ gcc/tree-ssa-strlen.c    (working copy)
 @@ -189,6 +189,52 @@ struct laststmt_struct
  static int get_stridx_plus_constant (strinfo *, unsigned
 HOST_WIDE_INT, tree);
  static void handle_builtin_stxncpy (built_in_function,
 gimple_stmt_iterator *);
  +/* Sets MINMAX to either the constant value or the
 range VAL
 is in
 +   and returns true on success.  */
 +
 +static bool
 +get_range (tree val, wide_int minmax[2], const vr_values
 *rvals =
 NULL)
 +{

Re: [PATCH] Switch gcc ftp URL's to http

2019-11-20 Thread Joseph Myers

This patch is OK with http changed to https.  (That is, with it changed 
where the patch is already changing the URL.  While changing http to https 
makes sense more generally in the documentation whenever a site supports 
https, that's probably best not mixed with the move from ftp.)

-- 
Joseph S. Myers
jos...@codesourcery.com

[committed] jit: fix ICE with GCC_JIT_BOOL_OPTION_SELFCHECK_GC since r278084 (PR jit/92483)

2019-11-20 Thread David Malcolm

Since r278084 (part of the params refactoring), most of libgccjit's
test suite has been ICEing.

The root cause is that jit-playback.c injects params to its fake_args
here:

  /* Aggressively garbage-collect, to shake out bugs: */
  if (get_bool_option (GCC_JIT_BOOL_OPTION_SELFCHECK_GC))
{
  ADD_ARG ("--param");
  ADD_ARG ("ggc-min-expand=0");
  ADD_ARG ("--param");
  ADD_ARG ("ggc-min-heapsize=0");
}

(building a vec of char * where the char * are allocated using xstrdup)

and r278084 added this logic to decode_cmdline_options_to_array:

964   /* Interpret "--param" "key=name" as "--param=key=name".  */
965   const char *needle = "--param";
966   if (i + 1 < argc && strcmp (opt, needle) == 0)
967 {
968   const char *replacement
969 = opts_concat (needle, "=", argv[i + 1], NULL);
970   argv[++i] = replacement;
971 }

Note that at line 970 it manipulates the argv in-place, inserting a
new option allocated with opts_concat, which uses opts_obstack
(itself initialized from toplev::main).

jit-playback.c cleans up its fake arguments using "free", at which
point we have a free of the middle of an obstack and an ICE.

This patch fixes the issue by using the new syntax for the params.

Successfully bootstrapped & regrtested on x86_64-pc-linux-gnu.

Fixes all 60 FAILs in jit.sum, restoring the number of PASS results
from 2033 to 10469.

Committed to trunk as r278515.

gcc/jit/ChangeLog:
PR jit/92483
* jit-playback.c (gcc::jit::playback::context::make_fake_args):
Update GCC_JIT_BOOL_OPTION_SELFCHECK_GC for new --param syntax.
---
 gcc/jit/jit-playback.c | 6 ++
 1 file changed, 2 insertions(+), 4 deletions(-)

diff --git a/gcc/jit/jit-playback.c b/gcc/jit/jit-playback.c
index 9eeb2a7..c043d69 100644
--- a/gcc/jit/jit-playback.c
+++ b/gcc/jit/jit-playback.c
@@ -2273,10 +2273,8 @@ make_fake_args (vec  *argvec,
   /* Aggressively garbage-collect, to shake out bugs: */
   if (get_bool_option (GCC_JIT_BOOL_OPTION_SELFCHECK_GC))
 {
-  ADD_ARG ("--param");
-  ADD_ARG ("ggc-min-expand=0");
-  ADD_ARG ("--param");
-  ADD_ARG ("ggc-min-heapsize=0");
+  ADD_ARG ("--param=ggc-min-expand=0");
+  ADD_ARG ("--param=ggc-min-heapsize=0");
 }
 
   if (get_bool_option (GCC_JIT_BOOL_OPTION_DUMP_EVERYTHING))
-- 
1.8.5.3

Re: [PATCH] Fix PR92462

2019-11-20 Thread Jeff Law

On 11/19/19 12:42 AM, Richard Biener wrote:
> On Mon, 18 Nov 2019, Jeff Law wrote:
> 
>> On 11/18/19 3:37 AM, Richard Biener wrote:
>>> On Mon, 18 Nov 2019, Jakub Jelinek wrote:
>>>
 On Mon, Nov 18, 2019 at 11:07:22AM +0100, Richard Biener wrote:
> On Mon, 18 Nov 2019, Jakub Jelinek wrote:
>
>> On Mon, Nov 18, 2019 at 10:44:14AM +0100, Richard Biener wrote:
>>> The following adjusts the find_base_{term,value} heuristic when
>>> looking through ANDs to the case where it is obviously not
>>> aligning a base (when the LSB is set).
>>
>> What is specific about the LSB?  I mean & 2 is obviously not an aligning
>> AND either.
>
> It aligns 0x1 and 0x3 ;)
>
>> Shouldn't we recurse only if the CONST_INT_P operand has
>> some set bits followed by clear bits, i.e. after the != 0 check
>> compute ctz_hwi and verify that INTVAL >> ctz == -1?
>
> I thought of more advanced heuristics but I guess that
> any contiguous set of bits with LSB unset might be aligning
> if the programmer knows upper bits are zero anyways?  So I fear
> the == -1 test would not work reliable?

 I'd say once a user does & 0x1ff8 or similar, he is doing something weird,
 and not recursing is the conservatively correct answer (or maybe it isn't?
 Aren't there cases where we punt if from a binary op we get base terms from
 both sides and just use one if we get it only from one side?  If so,
 we might need to have some kind of annotated return, this could have a base
 term but please don't use it).
>>>
>>> Yes, we might run into the "wrong" one via binary op handling, so
>>> there isn't really a conservative answer here :/
>>>
 I guess the only non-weird case would be if the target has weird pointer
 sizes and only has larger or smaller ints, say 24-bit pointer, and 8/16/32
 integers, so the aligning then could be & 0xf8 etc.
>>>
>>> Yeah.  Or the weird ones are exposed by nonzero bits "optimizations"
>>> of constants.
>> IIRC all the low order bitmask handling for pointers was primarily to
>> benefit the Alpha.  I think over time we saw it was helpful for
>> varargs/stdarg, but that's about it.  I'd be surprised if it's all that
>> important to dig deeply into addresses of this form.
> 
> The whole find_base_{value,term} thing is most important for
> stack accesses which we otherwise can't handle well in aliasing
> and code effects mostly materialize in DSE.  See my analysis
> in PR49330.  But the code is really broken and we're clawing
> to the extra DSE in exchange for these wrong-code issues... :/
I'd rather fix the wrong-code issues :-)

Note that REG_POINTER should already be conservative, if it isn't, then
that needs to be fixed.  Some backends certainly depend on anything with
REG_POINTER actually being a valid pointer.

For this RTX in c#18:

> (plus:DI (reg:DI 83 [ d.0_2 ])
> (symbol_ref:DI ("y") [flags 0x2]  ))

I don't see how (reg 83) could ever be marked as REG_POINTER.  It's an
index, not a pointer.  I would _not_ expect the result (assuming its
stored in a REG) to be marked with REG_POINTER either since we don't
know if the addition of (reg 83) creates a result outside the object
(without additional information not in the BZ).


Jeff

Re: [patch, fortran] Load scalar intent-in variables at the beginning of procedures

2019-11-20 Thread Bernhard Reutner-Fischer

On 19 November 2019 23:54:55 CET, Thomas Koenig  wrote:
>Am 19.11.19 um 11:39 schrieb Bernhard Reutner-Fischer:
>> +  char name[GFC_MAX_SYMBOL_LEN + 1];
>> +  snprintf (name, GFC_MAX_SYMBOL_LEN, "__dummy_%d_%s", var_num++,
>> +f->sym->name);
>> +
>> +  if (gfc_get_sym_tree (name, ns, &symtree, false) != 0)
>> 
>> (I) you should + sizeof(__dummy__) + 10 for unsigned long %d or the
>like.
>
>GFC_MAX_SYMBOL_LEN is the maximum length of a gfortran symbol. AFAIK,

This is only true for user-provided names in the source code.

Think e.g. class names as can be seen in the dumps..

>it
>is not possible to use a longer symbol name than that, so it needs to
>be
>truncated. Uniqueness of the variable name is guaranteed by the var_num
>variable.
>
>If the user puts dummy arguments Supercalifragilisticexpialidociousa
>and
>Supercalifragilisticexpialidociousb into the argument list of a
>procedure, he will have to look at the numbers to differentiate them
>:-)
>
>> (II) s/__dummy/__intent_in/ for clarity?
>
>It's moved away a bit from INTENT(IN) now, because an argument which
>cannot be modified (even by passing to a procedure with a dummy
>argument
>with unknown intent) is now also handled.

So maybe __readonly_ , __rdonly_, __rd_or the like? dummy is really misleading 
a name in the dumps..

thanks,

1 2 >

1 - 100 of 164 matches

Mail list logo