Re: Updating the backend status for h8300 on the wiki

2020-12-07 Thread Eric Botcazou
> Now there are only AVR and CR16 that need to be converted. Great progress!

Indeed, but why does CR16 not have the 'c' letter then?

-- 
Eric Botcazou




[patch] don't build insn-extract.o with rtl checking

2020-12-07 Thread Matthias Klose
As seen in PR98144, building insn-extract.o with rtl checking takes some memory,
and it doesn't work on 32-bit architectures at all (PR97314).  Richard suggested
on irc to disable rtl checking for this auto-generated file, like it's already
done for genconditions.c.  The patch does it the same way as for genconditions.c.
Ok for the trunk?

Matthias

--- a/gcc/genextract.c
+++ b/gcc/genextract.c
@@ -365,6 +365,8 @@ print_header (void)
 #define IN_TARGET_CODE 1\n\
 #include \"config.h\"\n\
 #include \"system.h\"\n\
+#undef ENABLE_RTL_CHECKING\n\
+#undef ENABLE_RTL_FLAG_CHECKING\n\
 #include \"coretypes.h\"\n\
 #include \"tm.h\"\n\
 #include \"rtl.h\"\n\


Re: [patch] don't build insn-extract.o with rtl checking

2020-12-07 Thread Richard Biener
On Mon, 7 Dec 2020, Matthias Klose wrote:

> As seen in PR98144, building insn-extract.o with rtl checking takes some
> memory, and it doesn't work on 32-bit architectures at all (PR97314).
> Richard suggested on irc to disable rtl checking for this auto-generated
> file, like it's already done for genconditions.c.  The patch does it the
> same way as for genconditions.c.  Ok for the trunk?

OK.  There's nothing to be gained from RTL checking on this generated
code.

Thanks,
Richard.

> Matthias
> 
> --- a/gcc/genextract.c
> +++ b/gcc/genextract.c
> @@ -365,6 +365,8 @@ print_header (void)
>  #define IN_TARGET_CODE 1\n\
>  #include \"config.h\"\n\
>  #include \"system.h\"\n\
> +#undef ENABLE_RTL_CHECKING\n\
> +#undef ENABLE_RTL_FLAG_CHECKING\n\
>  #include \"coretypes.h\"\n\
>  #include \"tm.h\"\n\
>  #include \"rtl.h\"\n\
> 

-- 
Richard Biener 
SUSE Software Solutions Germany GmbH, Maxfeldstrasse 5, 90409 Nuernberg,
Germany; GF: Felix Imendörffer; HRB 36809 (AG Nuernberg)


Re: Updating the backend status for h8300 on the wiki

2020-12-07 Thread John Paul Adrian Glaubitz
On 12/7/20 9:06 AM, Eric Botcazou wrote:
>> Now there are only AVR and CR16 that need to be converted. Great progress!
> 
> Indeed, but why does CR16 not have the 'c' letter then?

I noticed that as well.

Adrian

-- 
 .''`.  John Paul Adrian Glaubitz
: :' :  Debian Developer - glaub...@debian.org
`. `'   Freie Universitaet Berlin - glaub...@physik.fu-berlin.de
  `-GPG: 62FF 8A75 84E0 2956 9546  0006 7426 3B37 F5B5 F913



Re: Updating the backend status for h8300 on the wiki

2020-12-07 Thread Eric Botcazou
> I noticed that as well.

OK, now corrected.

-- 
Eric Botcazou




[patch] [ada] Fix PR ada/97504 for mips*-linux

2020-12-07 Thread Matthias Klose
Fix PR ada/97504 for mips*-linux; with this change the bootstrap works again
on mips*-linux.  Ok for the trunk?

gcc/ada/
PR ada/97504
* Makefile.rtl (LIBGNAT_TARGET_PAIRS) <mips*-linux*>: Use wraplf
version of Aux_Long_Long_Float.

--- a/gcc/ada/Makefile.rtl
+++ b/gcc/ada/Makefile.rtl
@@ -2288,6 +2288,7 @@ endif
 ifeq ($(strip $(filter-out mips% linux%,$(target_cpu) $(target_os))),)
   LIBGNAT_TARGET_PAIRS = \
   a-intnam.ads<libgnat/a-intnam__linux.ads \
+  a-nallfl.ads<libgnat/a-nallfl__wraplf.ads \
   s-inmaop.adb<libgnat/s-inmaop__posix.adb \
   s-intman.adb<libgnat/s-intman__posix.adb \
   s-linux.ads<libgnat/s-linux.ads \

Re: [patch] [ada] Fix PR ada/97504 for mips*-linux

2020-12-07 Thread Arnaud Charlet
> Fix PR ada/97504 for mips*-linux; with this change the bootstrap works
> again on mips*-linux.  Ok for the trunk?

OK, thanks.

> gcc/ada/
>   PR ada/97504
>   * Makefile.rtl (LIBGNAT_TARGET_PAIRS) <mips*-linux*>: Use wraplf
>   version of Aux_Long_Long_Float.
> 
> --- a/gcc/ada/Makefile.rtl
> +++ b/gcc/ada/Makefile.rtl
> @@ -2288,6 +2288,7 @@ endif
>  ifeq ($(strip $(filter-out mips% linux%,$(target_cpu) $(target_os))),)
>    LIBGNAT_TARGET_PAIRS = \
>    a-intnam.ads<libgnat/a-intnam__linux.ads \
> +  a-nallfl.ads<libgnat/a-nallfl__wraplf.ads \
>    s-inmaop.adb<libgnat/s-inmaop__posix.adb \
>    s-intman.adb<libgnat/s-intman__posix.adb \
>    s-linux.ads<libgnat/s-linux.ads \


[Committed] IBM Z: Change Pmode to word_mode for stack probes

2020-12-07 Thread Andreas Krebbel via Gcc-patches
In s390.c we are still using Pmode for the stack probes. This breaks
with -m31 -mzarch where Pmode != word_mode.
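
To make the mode mismatch concrete, here is my reading of the mode layout
in that configuration (the details are an assumption on my part, not taken
from the patch itself):

/* With -m31 -mzarch: Pmode == SImode, since addresses are still 31-bit,
   but word_mode == DImode, since z/Architecture mode enables 64-bit GPRs.
   A probe MEM created in Pmode therefore no longer matches the word-sized
   compare ("cg" rather than "c") that the probe_stack pattern emits.  */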

The patch also adds a new target check to s390.exp which allows us to
implement zarch specific checks in the testcases.

Bootstrapped and regression tested on s390x with and without zarch
default.

gcc/ChangeLog:

* config/s390/s390.c (s390_emit_stack_probe): Change Pmode to
word_mode.

gcc/testsuite/ChangeLog:

* gcc.target/s390/s390.exp: New target check s390_zarch.
* gcc.target/s390/stack-clash-1.c: Use s390_zarch instead of lp64.
* gcc.target/s390/stack-clash-2.c: Likewise.
* gcc.target/s390/stack-clash-3.c: Likewise.
* gcc.target/s390/stack-clash-5.c: New test.
---
 gcc/config/s390/s390.c|  2 +-
 gcc/testsuite/gcc.target/s390/s390.exp|  7 +++
 gcc/testsuite/gcc.target/s390/stack-clash-1.c |  4 ++--
 gcc/testsuite/gcc.target/s390/stack-clash-2.c |  4 ++--
 gcc/testsuite/gcc.target/s390/stack-clash-3.c |  4 ++--
 gcc/testsuite/gcc.target/s390/stack-clash-5.c | 10 ++
 6 files changed, 24 insertions(+), 7 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/s390/stack-clash-5.c

diff --git a/gcc/config/s390/s390.c b/gcc/config/s390/s390.c
index fb48102559d..2f839882d96 100644
--- a/gcc/config/s390/s390.c
+++ b/gcc/config/s390/s390.c
@@ -11082,7 +11082,7 @@ s390_prologue_plus_offset (rtx target, rtx reg, rtx offset, bool frame_related_p
 static void
 s390_emit_stack_probe (rtx addr)
 {
-  rtx mem = gen_rtx_MEM (Pmode, addr);
+  rtx mem = gen_rtx_MEM (word_mode, addr);
   MEM_VOLATILE_P (mem) = 1;
   emit_insn (gen_probe_stack (mem));
 }
diff --git a/gcc/testsuite/gcc.target/s390/s390.exp b/gcc/testsuite/gcc.target/s390/s390.exp
index 00e0555d55c..d76d80dd0f3 100644
--- a/gcc/testsuite/gcc.target/s390/s390.exp
+++ b/gcc/testsuite/gcc.target/s390/s390.exp
@@ -202,6 +202,13 @@ proc check_effective_target_s390_z14_hw { } {
}
 }] "-march=z14 -m64 -mzarch" ] } { return 0 } else { return 1 }
 }
+# Return 1 if the default compiler options enable z/Architecture mode
+proc check_effective_target_s390_zarch { } {
+    return [check_no_compiler_messages s390_zarch object {
+	int dummy[sizeof (int __attribute__((__mode__(__word__)))) == 8
+		  ? 1 : -1];
+    }]
+}
 
 # If a testcase doesn't have special options, use these.
 global DEFAULT_CFLAGS
diff --git a/gcc/testsuite/gcc.target/s390/stack-clash-1.c b/gcc/testsuite/gcc.target/s390/stack-clash-1.c
index 3d29cab9446..45221c4ef82 100644
--- a/gcc/testsuite/gcc.target/s390/stack-clash-1.c
+++ b/gcc/testsuite/gcc.target/s390/stack-clash-1.c
@@ -13,5 +13,5 @@ void large_stack() {
 
 /* We use a compare for the stack probe.  There needs to be one inside
a loop and another for the remaining bytes.  */
-/* { dg-final { scan-assembler-times "cg\t" 2 { target lp64 } } } */
-/* { dg-final { scan-assembler-times "c\t" 2 { target { ! lp64 } } } } */
+/* { dg-final { scan-assembler-times "cg\t" 2 { target s390_zarch } } } */
+/* { dg-final { scan-assembler-times "c\t" 2 { target { ! s390_zarch } } } } */
diff --git a/gcc/testsuite/gcc.target/s390/stack-clash-2.c b/gcc/testsuite/gcc.target/s390/stack-clash-2.c
index e554ad5ed0d..20f645de347 100644
--- a/gcc/testsuite/gcc.target/s390/stack-clash-2.c
+++ b/gcc/testsuite/gcc.target/s390/stack-clash-2.c
@@ -13,5 +13,5 @@ foo ()
 /* For alloca a common code routine emits the probes.  Make sure the
"probe_stack" expander is used in that case. We want to use mem
compares instead of stores.  */
-/* { dg-final { scan-assembler-times "cg\t" 5 { target lp64 } } } */
-/* { dg-final { scan-assembler-times "c\t" 5 { target { ! lp64 } } } } */
+/* { dg-final { scan-assembler-times "cg\t" 5 { target s390_zarch } } } */
+/* { dg-final { scan-assembler-times "c\t" 5 { target { ! s390_zarch } } } } */
diff --git a/gcc/testsuite/gcc.target/s390/stack-clash-3.c b/gcc/testsuite/gcc.target/s390/stack-clash-3.c
index 929d3fbb365..12a2d34cacf 100644
--- a/gcc/testsuite/gcc.target/s390/stack-clash-3.c
+++ b/gcc/testsuite/gcc.target/s390/stack-clash-3.c
@@ -13,5 +13,5 @@ foo ()
 /* For alloca a common code routine emits the probes.  Make sure the
"probe_stack" expander is used in that case. We want to use mem
compares instead of stores.  */
-/* { dg-final { scan-assembler-times "cg\t" 5 { target lp64 } } } */
-/* { dg-final { scan-assembler-times "c\t" 5 { target { ! lp64 } } } } */
+/* { dg-final { scan-assembler-times "cg\t" 5 { target s390_zarch } } } */
+/* { dg-final { scan-assembler-times "c\t" 5 { target { ! s390_zarch } } } } */
diff --git a/gcc/testsuite/gcc.target/s390/stack-clash-5.c b/gcc/testsuite/gcc.target/s390/stack-clash-5.c
new file mode 100644
index 000..81e202e2aab
--- /dev/null
+++ b/gcc/testsuite/gcc.target/s390/stack-clash-5.c
@@ -0,0 +1,10 @@
+/* { dg-do compile } */
+/* { dg-options "-O1 -m31 -mzarch -fstack-clash-protection" } */
+
+extern void bar (ch

Re: [Ada] Build support units for 128-bit integer types on 64-bit platforms

2020-12-07 Thread Maciej W. Rozycki
On Thu, 3 Dec 2020, Andreas Schwab wrote:

> > .../gcc/gnatbind -x impbit.ali
> > error: "s-imgllli.ali" not found, "s-imgllli.ads" must be compiled
> > gnatmake: *** bind failed.
> >  Failed to compile impbit
> 
> This means GNATRTL_128BIT_OBJS is missing from
> EXTRA_GNATRTL_NONTASKING_OBJS.

 Yep, I came to the same conclusion and that's what Matthias's commit 
ba97b5326048 fixed.

  Maciej


[Ada] Fix problematic conversion to boolean type

2020-12-07 Thread Eric Botcazou
The new ranger exposed a problematic conversion to boolean type done in gigi.
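
To illustrate the underlying semantic issue with a standalone C analogy
(this is my own example, not the gigi code):

--cut here--
/* Converting an integer to a boolean must compare against zero;
   a plain truncation to the low bit gives a different answer.  */
int main (void)
{
  int i = 2;
  _Bool by_value = i;        /* i != 0, so 1 */
  int by_truncation = i & 1; /* low bit only, so 0 */
  return by_value == by_truncation; /* the two disagree */
}
--cut here--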

Tested on x86-64/Linux, applied on the mainline.


2020-12-07  Eric Botcazou  

* gcc-interface/utils.c (convert) <INTEGER_TYPE>: Call fold_convert
in the cases where convert_to_integer is not called.
<BOOLEAN_TYPE>: Call fold_convert instead of convert_to_integer.

-- 
Eric Botcazou

Index: gcc-interface/utils.c
===================================================================
--- gcc-interface/utils.c	(revision 340023)
+++ gcc-interface/utils.c	(revision 340024)
@@ -4930,10 +4930,6 @@ convert (tree type, tree expr)
 	  convert (TREE_TYPE (type),
 		   TYPE_MIN_VALUE (type)));
 
-  /* ... fall through ... */
-
-case ENUMERAL_TYPE:
-case BOOLEAN_TYPE:
   /* If we are converting an additive expression to an integer type
 	 with lower precision, be wary of the optimization that can be
 	 applied by convert_to_integer.  There are 2 problematic cases:
@@ -4945,8 +4941,7 @@ convert (tree type, tree expr)
 	 intermediate conversion that changes the sign could
 	 be inserted and thus introduce an artificial overflow
 	 at compile time when the placeholder is substituted.  */
-  if (code == INTEGER_TYPE
-	  && ecode == INTEGER_TYPE
+  if (ecode == INTEGER_TYPE
 	  && TYPE_PRECISION (type) < TYPE_PRECISION (etype)
 	  && (TREE_CODE (expr) == PLUS_EXPR || TREE_CODE (expr) == MINUS_EXPR))
 	{
@@ -4955,11 +4950,18 @@ convert (tree type, tree expr)
 	  if ((TREE_CODE (TREE_TYPE (op0)) == INTEGER_TYPE
 	   && TYPE_BIASED_REPRESENTATION_P (TREE_TYPE (op0)))
 	  || CONTAINS_PLACEHOLDER_P (expr))
-	return build1 (NOP_EXPR, type, expr);
+	return fold_convert (type, expr);
 	}
 
+  /* ... fall through ... */
+
+case ENUMERAL_TYPE:
   return fold (convert_to_integer (type, expr));
 
+case BOOLEAN_TYPE:
+  /* Do not use convert_to_integer with boolean types.  */
+  return fold_convert_loc (EXPR_LOCATION (expr), type, expr);
+
 case POINTER_TYPE:
 case REFERENCE_TYPE:
   /* If converting between two thin pointers, adjust if needed to account


[Ada] Fix corner case issue with discriminated record type

2020-12-07 Thread Eric Botcazou
The compiler generates code that writes too much data into a component of a 
record subject to a representation clause, when the source of the assignment 
is a call to a function that returns a discriminated record type with default 
discriminants, variable size and a statically known upper bound for this size, 
and the size of the component given by the representation clause is lower than 
the value of this bound rounded up to the alignment.
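
A worked instance with made-up numbers (my illustration of the scenario
described above):

/* The function result is a discriminated record whose size varies up to
   a statically known bound of, say, 16 bytes; a representation clause
   gives the destination component only 4 bytes.  Copying the rounded-up
   bound instead of the actual size then clobbers the 12 bytes that
   follow the component.  */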

Tested on x86-64/Linux, applied on the mainline.


2020-12-07  Eric Botcazou  

* gcc-interface/trans.c (Call_to_gnu): Also create a temporary for
the return value if the LHS is a bit-field and the return type is
a type padding a self-referential type.
(gnat_to_gnu): Do not remove the padding on the result if it is too
small with regard to the natural padding size.

-- 
Eric Botcazou

diff --git a/gcc/ada/gcc-interface/trans.c b/gcc/ada/gcc-interface/trans.c
index 0eec1788e05..07e5a285b2b 100644
--- a/gcc/ada/gcc-interface/trans.c
+++ b/gcc/ada/gcc-interface/trans.c
@@ -4513,7 +4513,11 @@ Call_to_gnu (Node_Id gnat_node, tree *gnu_result_type_p, tree gnu_target,
 	  and the return type has variable size, because the gimplifier
 	  doesn't handle these cases.
 
-   4. There is no target and we have misaligned In Out or Out parameters
+   4. There is a target which is a bit-field and the function returns an
+	  unconstrained record type with default discriminant, because the
+	  return may copy more data than the bit-field can contain.
+
+   5. There is no target and we have misaligned In Out or Out parameters
 	  passed by reference, because we need to preserve the return value
 	  before copying back the parameters.  However, in this case, we'll
 	  defer creating the temporary, see below.
@@ -4536,7 +4540,11 @@ Call_to_gnu (Node_Id gnat_node, tree *gnu_result_type_p, tree gnu_target,
 		  || (TREE_CODE (TREE_TYPE (gnu_target)) == ARRAY_TYPE
 		  && TREE_CODE (TYPE_SIZE (TREE_TYPE (gnu_target)))
 			 == INTEGER_CST))
-	  && TREE_CODE (TYPE_SIZE (gnu_result_type)) != INTEGER_CST)))
+	  && TREE_CODE (TYPE_SIZE (gnu_result_type)) != INTEGER_CST)
+	  || (gnu_target
+	  && TREE_CODE (gnu_target) == COMPONENT_REF
+	  && DECL_BIT_FIELD (TREE_OPERAND (gnu_target, 1))
+	  && type_is_padding_self_referential (gnu_result_type))))
 {
   gnu_retval = create_temporary ("R", gnu_result_type);
   DECL_RETURN_VALUE_P (gnu_retval) = 1;
@@ -8249,8 +8257,10 @@ gnat_to_gnu (Node_Id gnat_node)
   /* Remove padding only if the inner object is of self-referential
 	 size: in that case it must be an object of unconstrained type
 	 with a default discriminant and we want to avoid copying too
-	 much data.  */
-  if (type_is_padding_self_referential (TREE_TYPE (gnu_result)))
+	 much data.  But do not remove it if it is already too small.  */
+  if (type_is_padding_self_referential (TREE_TYPE (gnu_result))
+	  && !(TREE_CODE (gnu_result) == COMPONENT_REF
+	   && DECL_BIT_FIELD (TREE_OPERAND (gnu_result, 1))))
 	gnu_result = convert (TREE_TYPE (TYPE_FIELDS (TREE_TYPE (gnu_result))),
 			  gnu_result);
 }


Re: [patch] Enhance debug info for fixed-point types

2020-12-07 Thread Eric Botcazou
> 2020-11-11  Eric Botcazou  
> 
>   * exp_dbug.adb (Is_Handled_Scale_Factor): Delete.
>   (Get_Encoded_Name): Do not call it.
>   * gcc-interface/decl.c (gnat_to_gnu_entity) <Fixed_Point_Type>:
>   Tidy up and always use a meaningful description for arbitrary
>   scale factors.
>   * gcc-interface/misc.c (gnat_get_fixed_point_type_info): Remove
>   obsolete block and adjust the description of the scale factor.

There was an oversight in the patch, fixed thus.

* gcc-interface/decl.c (gnat_to_gnu_entity) <Fixed_Point_Type>: Put
back the "else" unduly removed.

-- 
Eric Botcazou

diff --git a/gcc/ada/gcc-interface/decl.c b/gcc/ada/gcc-interface/decl.c
index a0f17b1aafc..7caca6a6fb2 100644
--- a/gcc/ada/gcc-interface/decl.c
+++ b/gcc/ada/gcc-interface/decl.c
@@ -1764,6 +1764,7 @@ gnat_to_gnu_entity (Entity_Id gnat_entity, tree gnu_expr, bool definition)
 	/* Use the arbitrary scale factor description.  Note that we support
 	   a Small_Value whose magnitude is larger than 64-bit even on 32-bit
 	   platforms, so we unconditionally use a (dummy) 128-bit type.  */
+	else
 	  {
 	const Uint gnat_num = Norm_Num (gnat_small_value);
 	const Uint gnat_den = Norm_Den (gnat_small_value);


[PATCH] tree-optimization/98117 - fix range set by vectorization on niter IVs

2020-12-07 Thread Richard Biener
This avoids the degenerate case of a TYPE_MAX_VALUE latch iteration
count value causing wrong range info for the vector IV.  There's
still the case of VF == 1 where if we don't know whether we hit the
above case we cannot emit a range.
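
A worked example of the degenerate case, matching the testcase below
(the numbers are mine):

/* type = unsigned char, VF = 4.  The latch iteration count is 255,
   i.e. TYPE_MAX_VALUE, so the loop body executes 256 times.  The old
   bound TYPE_MAX_VALUE >> log_vf = 255 >> 2 = 63 is too small: the
   vector loop really runs (255 + 1) >> 2 = 64 iterations.  */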

Bootstrap and regtest running on x86_64-unknown-linux-gnu.

2020-12-07  Richard Biener  

PR tree-optimization/98117
* tree-vect-loop-manip.c (vect_gen_vector_loop_niters):
Properly handle degenerate niter when setting the vector
loop IV range.

* gcc.dg/torture/pr98117.c: New testcase.
---
 gcc/testsuite/gcc.dg/torture/pr98117.c | 19 +
 gcc/tree-vect-loop-manip.c | 28 --
 2 files changed, 41 insertions(+), 6 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/torture/pr98117.c

diff --git a/gcc/testsuite/gcc.dg/torture/pr98117.c b/gcc/testsuite/gcc.dg/torture/pr98117.c
new file mode 100644
index 000..f2160257263
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/torture/pr98117.c
@@ -0,0 +1,19 @@
+/* { dg-do run } */
+/* { dg-additional-options "-fno-tree-scev-cprop" } */
+
+unsigned char c;
+void __attribute__((noipa))
+e()
+{
+  do
+{
+}
+  while (++c);
+}
+int main()
+{
+  e();
+  if (c != 0)
+__builtin_abort ();
+  return 0;
+}
diff --git a/gcc/tree-vect-loop-manip.c b/gcc/tree-vect-loop-manip.c
index 36179188f6d..2370b879b21 100644
--- a/gcc/tree-vect-loop-manip.c
+++ b/gcc/tree-vect-loop-manip.c
@@ -2034,13 +2034,29 @@ vect_gen_vector_loop_niters (loop_vec_info loop_vinfo, tree niters,
   niters_vector = force_gimple_operand (niters_vector, &stmts, true, var);
   gsi_insert_seq_on_edge_immediate (pe, stmts);
   /* Peeling algorithm guarantees that vector loop bound is at least ONE,
-we set range information to make niters analyzer's life easier.  */
+we set range information to make niters analyzer's life easier.
+Note the number of latch iteration value can be TYPE_MAX_VALUE so
+we have to represent the vector niter TYPE_MAX_VALUE + 1 >> log_vf.  */
   if (stmts != NULL && log_vf)
-   set_range_info (niters_vector, VR_RANGE,
-   wi::to_wide (build_int_cst (type, 1)),
-   wi::to_wide (fold_build2 (RSHIFT_EXPR, type,
- TYPE_MAX_VALUE (type),
- log_vf)));
+   {
+ if (niters_no_overflow)
+   set_range_info (niters_vector, VR_RANGE,
+   wi::one (TYPE_PRECISION (type)),
+   wi::rshift (wi::max_value (TYPE_PRECISION (type),
+  TYPE_SIGN (type)),
+   exact_log2 (const_vf),
+   TYPE_SIGN (type)));
+ /* For VF == 1 the vector IV might also overflow so we cannot
+assert a minimum value of 1.  */
+ else if (const_vf > 1)
+   set_range_info (niters_vector, VR_RANGE,
+   wi::one (TYPE_PRECISION (type)),
+   wi::rshift (wi::max_value (TYPE_PRECISION (type),
+  TYPE_SIGN (type))
+   - (const_vf - 1),
+   exact_log2 (const_vf), TYPE_SIGN (type))
+   + 1);
+   }
 }
   *niters_vector_ptr = niters_vector;
   *step_vector_ptr = step_vector;
-- 
2.26.2


[Ada] Fix assembler name collision

2020-12-07 Thread Eric Botcazou
Gigi uses a dummy global variable to register global types for debug info 
purposes and its name can now collide with user variables.
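
For illustration, assuming the usual ELF ASM_FORMAT_PRIVATE_NAME scheme
of "<name>.<number>" (an assumption on my part):

/* With index 0 the dummy object gets a name like "prog__var.0", which
   the compiler can also generate for user-level entities; indexing with
   ULONG_MAX moves it far outside the numbering that is normally handed
   out.  */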

Tested on x86-64/Linux, applied on the mainline and 10 branch.


2020-12-07  Eric Botcazou  

* gcc-interface/trans.c (lvalue_for_aggregate_p): Also return true
for return statements.
* gcc-interface/utils.c (gnat_write_global_declarations): Use the
maximum index for the dummy object to avoid a name collision.

-- 
Eric Botcazou

diff --git a/gcc/ada/gcc-interface/trans.c b/gcc/ada/gcc-interface/trans.c
index 07e5a285b2b..bf8289ba323 100644
--- a/gcc/ada/gcc-interface/trans.c
+++ b/gcc/ada/gcc-interface/trans.c
@@ -970,6 +970,10 @@ lvalue_for_aggregate_p (Node_Id gnat_node, tree gnu_type)
   /* Even if the parameter is by copy, prefer an lvalue.  */
   return true;
 
+case N_Simple_Return_Statement:
+  /* Likewise for a return value.  */
+  return true;
+
 case N_Indexed_Component:
 case N_Selected_Component:
   /* If an elementary component is used, take it from the constant.  */
diff --git a/gcc/ada/gcc-interface/utils.c b/gcc/ada/gcc-interface/utils.c
index 1d49db9fb1b..494f60e0879 100644
--- a/gcc/ada/gcc-interface/utils.c
+++ b/gcc/ada/gcc-interface/utils.c
@@ -5903,7 +5903,7 @@ gnat_write_global_declarations (void)
   struct varpool_node *node;
   char *label;
 
-  ASM_FORMAT_PRIVATE_NAME (label, first_global_object_name, 0);
+  ASM_FORMAT_PRIVATE_NAME (label, first_global_object_name, ULONG_MAX);
   dummy_global
 	= build_decl (BUILTINS_LOCATION, VAR_DECL, get_identifier (label),
 		  void_type_node);


[Ada] Fix internal error on library-level type extended locally

2020-12-07 Thread Eric Botcazou
This is a regression present on the mainline, 10 and 9 branches: the compiler 
aborts on the local extension of a tagged type declared at library level, with 
a progenitor given by an interface type having a primitive that is a homograph 
of a primitive of the tagged type.

Tested on x86-64/Linux, applied on the mainline, 10 and 9 branches.


2020-12-07  Eric Botcazou  

* gcc-interface/trans.c (maybe_make_gnu_thunk): Return false if the
target is local and thunk and target do not have the same context.

-- 
Eric Botcazou

diff --git a/gcc/ada/gcc-interface/trans.c b/gcc/ada/gcc-interface/trans.c
index bf8289ba323..4ab26d3e2dd 100644
--- a/gcc/ada/gcc-interface/trans.c
+++ b/gcc/ada/gcc-interface/trans.c
@@ -10730,8 +10730,11 @@ maybe_make_gnu_thunk (Entity_Id gnat_thunk, tree gnu_thunk)
 
   tree gnu_target = gnat_to_gnu_entity (gnat_target, NULL_TREE, false);
 
-  /* Thunk and target must have the same nesting level, if any.  */
-  gcc_assert (DECL_CONTEXT (gnu_thunk) == DECL_CONTEXT (gnu_target));
+  /* If the target is local, then thunk and target must have the same context
+ because cgraph_node::expand_thunk can only forward the static chain.  */
+  if (DECL_STATIC_CHAIN (gnu_target)
+  && DECL_CONTEXT (gnu_thunk) != DECL_CONTEXT (gnu_target))
+return false;
 
   /* If the target returns by invisible reference and is external, apply the
  same transformation as Subprogram_Body_to_gnu here.  */


Re: [PATCH] tree-optimization/98117 - fix range set by vectorization on niter IVs

2020-12-07 Thread Richard Sandiford via Gcc-patches
Richard Biener  writes:
> This avoids the degenerate case of a TYPE_MAX_VALUE latch iteration
> count value causing wrong range info for the vector IV.  There's
> still the case of VF == 1 where if we don't know whether we hit the
> above case we cannot emit a range.
>
> Bootstrap and regtest running on x86_64-unknown-linux-gnu.
>
> 2020-12-07  Richard Biener  
>
>   PR tree-optimization/98117
>   * tree-vect-loop-manip.c (vect_gen_vector_loop_niters):
>   Properly handle degenerate niter when setting the vector
>   loop IV range.
>
>   * gcc.dg/torture/pr98117.c: New testcase.
> ---
>  gcc/testsuite/gcc.dg/torture/pr98117.c | 19 +
>  gcc/tree-vect-loop-manip.c | 28 --
>  2 files changed, 41 insertions(+), 6 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.dg/torture/pr98117.c
>
> diff --git a/gcc/testsuite/gcc.dg/torture/pr98117.c 
> b/gcc/testsuite/gcc.dg/torture/pr98117.c
> new file mode 100644
> index 000..f2160257263
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/torture/pr98117.c
> @@ -0,0 +1,19 @@
> +/* { dg-do run } */
> +/* { dg-additional-options "-fno-tree-scev-cprop" } */
> +
> +unsigned char c;
> +void __attribute__((noipa))
> +e()
> +{
> +  do
> +{
> +}
> +  while (++c);
> +}
> +int main()
> +{
> +  e();
> +  if (c != 0)
> +__builtin_abort ();
> +  return 0;
> +}
> diff --git a/gcc/tree-vect-loop-manip.c b/gcc/tree-vect-loop-manip.c
> index 36179188f6d..2370b879b21 100644
> --- a/gcc/tree-vect-loop-manip.c
> +++ b/gcc/tree-vect-loop-manip.c
> @@ -2034,13 +2034,29 @@ vect_gen_vector_loop_niters (loop_vec_info 
> loop_vinfo, tree niters,
>niters_vector = force_gimple_operand (niters_vector, &stmts, true, 
> var);
>gsi_insert_seq_on_edge_immediate (pe, stmts);
>/* Peeling algorithm guarantees that vector loop bound is at least ONE,
> -  we set range information to make niters analyzer's life easier.  */
> +  we set range information to make niters analyzer's life easier.
> +  Note the number of latch iteration value can be TYPE_MAX_VALUE so
> +  we have to represent the vector niter TYPE_MAX_VALUE + 1 >> log_vf.  */
>if (stmts != NULL && log_vf)
> - set_range_info (niters_vector, VR_RANGE,
> - wi::to_wide (build_int_cst (type, 1)),
> - wi::to_wide (fold_build2 (RSHIFT_EXPR, type,
> -   TYPE_MAX_VALUE (type),
> -   log_vf)));
> + {
> +   if (niters_no_overflow)
> + set_range_info (niters_vector, VR_RANGE,
> + wi::one (TYPE_PRECISION (type)),
> + wi::rshift (wi::max_value (TYPE_PRECISION (type),
> +TYPE_SIGN (type)),
> + exact_log2 (const_vf),
> + TYPE_SIGN (type)));
> +   /* For VF == 1 the vector IV might also overflow so we cannot
> +  assert a minimum value of 1.  */
> +   else if (const_vf > 1)

This code is common to const vf and non-const vf, so const_vf isn't
guaranteed to be valid.

Thanks,
Richard

> + set_range_info (niters_vector, VR_RANGE,
> + wi::one (TYPE_PRECISION (type)),
> + wi::rshift (wi::max_value (TYPE_PRECISION (type),
> +TYPE_SIGN (type))
> + - (const_vf - 1),
> + exact_log2 (const_vf), TYPE_SIGN (type))
> + + 1);
> + }
>  }
>*niters_vector_ptr = niters_vector;
>*step_vector_ptr = step_vector;


Re: [PATCH] tree-optimization/98117 - fix range set by vectorization on niter IVs

2020-12-07 Thread Richard Biener
On Mon, 7 Dec 2020, Richard Sandiford wrote:

> Richard Biener  writes:
> > This avoids the degenerate case of a TYPE_MAX_VALUE latch iteration
> > count value causing wrong range info for the vector IV.  There's
> > still the case of VF == 1 where if we don't know whether we hit the
> > above case we cannot emit a range.
> >
> > Bootstrap and regtest running on x86_64-unknown-linux-gnu.
> >
> > 2020-12-07  Richard Biener  
> >
> > PR tree-optimization/98117
> > * tree-vect-loop-manip.c (vect_gen_vector_loop_niters):
> > Properly handle degenerate niter when setting the vector
> > loop IV range.
> >
> > * gcc.dg/torture/pr98117.c: New testcase.
> > ---
> >  gcc/testsuite/gcc.dg/torture/pr98117.c | 19 +
> >  gcc/tree-vect-loop-manip.c | 28 --
> >  2 files changed, 41 insertions(+), 6 deletions(-)
> >  create mode 100644 gcc/testsuite/gcc.dg/torture/pr98117.c
> >
> > diff --git a/gcc/testsuite/gcc.dg/torture/pr98117.c 
> > b/gcc/testsuite/gcc.dg/torture/pr98117.c
> > new file mode 100644
> > index 000..f2160257263
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.dg/torture/pr98117.c
> > @@ -0,0 +1,19 @@
> > +/* { dg-do run } */
> > +/* { dg-additional-options "-fno-tree-scev-cprop" } */
> > +
> > +unsigned char c;
> > +void __attribute__((noipa))
> > +e()
> > +{
> > +  do
> > +{
> > +}
> > +  while (++c);
> > +}
> > +int main()
> > +{
> > +  e();
> > +  if (c != 0)
> > +__builtin_abort ();
> > +  return 0;
> > +}
> > diff --git a/gcc/tree-vect-loop-manip.c b/gcc/tree-vect-loop-manip.c
> > index 36179188f6d..2370b879b21 100644
> > --- a/gcc/tree-vect-loop-manip.c
> > +++ b/gcc/tree-vect-loop-manip.c
> > @@ -2034,13 +2034,29 @@ vect_gen_vector_loop_niters (loop_vec_info 
> > loop_vinfo, tree niters,
> >niters_vector = force_gimple_operand (niters_vector, &stmts, true, 
> > var);
> >gsi_insert_seq_on_edge_immediate (pe, stmts);
> >/* Peeling algorithm guarantees that vector loop bound is at least 
> > ONE,
> > -we set range information to make niters analyzer's life easier.  */
> > +we set range information to make niters analyzer's life easier.
> > +Note the number of latch iteration value can be TYPE_MAX_VALUE so
> > +we have to represent the vector niter TYPE_MAX_VALUE + 1 >> log_vf.  */
> >if (stmts != NULL && log_vf)
> > -   set_range_info (niters_vector, VR_RANGE,
> > -   wi::to_wide (build_int_cst (type, 1)),
> > -   wi::to_wide (fold_build2 (RSHIFT_EXPR, type,
> > - TYPE_MAX_VALUE (type),
> > - log_vf)));
> > +   {
> > + if (niters_no_overflow)
> > +   set_range_info (niters_vector, VR_RANGE,
> > +   wi::one (TYPE_PRECISION (type)),
> > +   wi::rshift (wi::max_value (TYPE_PRECISION (type),
> > +  TYPE_SIGN (type)),
> > +   exact_log2 (const_vf),
> > +   TYPE_SIGN (type)));
> > + /* For VF == 1 the vector IV might also overflow so we cannot
> > +assert a minimum value of 1.  */
> > + else if (const_vf > 1)
> 
> This code is common to const vf and non-const vf, so const_vf isn't
> guaranteed to be valid.

It's guarded with log_vf != NULL which we only set in the
constant vf case.

Richard.

> Thanks,
> Richard
> 
> > +   set_range_info (niters_vector, VR_RANGE,
> > +   wi::one (TYPE_PRECISION (type)),
> > +   wi::rshift (wi::max_value (TYPE_PRECISION (type),
> > +  TYPE_SIGN (type))
> > +   - (const_vf - 1),
> > +   exact_log2 (const_vf), TYPE_SIGN (type))
> > +   + 1);
> > +   }
> >  }
> >*niters_vector_ptr = niters_vector;
> >*step_vector_ptr = step_vector;
> 

-- 
Richard Biener 
SUSE Software Solutions Germany GmbH, Maxfeldstrasse 5, 90409 Nuernberg,
Germany; GF: Felix Imendörffer; HRB 36809 (AG Nuernberg)


Re: [PATCH] Remove misleading debug line entries

2020-12-07 Thread Richard Biener
On Fri, 4 Dec 2020, Bernd Edlinger wrote:

> On 12/3/20 9:30 AM, Richard Biener wrote:
> > On Wed, 2 Dec 2020, Bernd Edlinger wrote:
> > 
> >> On 12/2/20 8:50 AM, Richard Biener wrote:
> >>> On Tue, 1 Dec 2020, Bernd Edlinger wrote:
> >>>
>  Hi!
> 
> 
>  This removes gimple_debug stmts without block info after a
>  NULL INLINE_ENTRY.
> 
>  The line numbers from these stmts are from the inline function,
>  but since the inline function is completely optimized away,
>  there will be no DW_TAG_inlined_subroutine so the debugger has
>  no callstack available at this point, and therefore those
>  line table entries are not helpful to the user.
> 
>  2020-11-20  Bernd Edlinger  
> 
>   * cfgexpand.c (expand_gimple_basic_block): Remove debug_begin_stmts
>   following a removed debug_inline_entry.
> 
> 
>  Bootstrapped and reg-tested on x86_64-pc-linux-gnu.
>  Is it OK for trunk?
> >>>
> >>> So are those visited by clear_unused_block_pointer?  If so wouldn't
> >>> it be more appropriate to remove those there, when we elide the
> >>> inlined block scope?
> >>>
> >>
> >> That's what I thought initially, but that is only true for 99% of the 
> >> inline statements.  However 1% of the inline_entries without block info,
> >> and debug_begin_stmts without block info, that have line numbers from
> >> an inline header, do actually originate here:
> >>
> >> copy_debug_stmt (gdebug *stmt, copy_body_data *id)
> >> {
> >>   tree t, *n;
> >>   struct walk_stmt_info wi;
> >>
> >>   if (tree block = gimple_block (stmt))
> >> {
> >>   n = id->decl_map->get (block);
> >>   gimple_set_block (stmt, n ? *n : id->block);
> >> }
> >>
> >> because id->block is NULL, and decl_map does not have
> >> an entry.
> >>
> >> So I tracked it down why that happens.
> >>
> >> I think remap_gimple_stmt should just drop those nonbind markers
> >> on the floor when the call statement has no block information.
> >>
> >> Once that is fixed, the special handling of inline entries without
> >> block info can as well be moved from remap_gimple_stmt to
> >> clear_unused_block_pointer.
> >>
> >> What do you think of this (not yet fully tested) patch?
> >>
> >> Is it OK when bootstrap and reg-testing passes?
> > 
> > diff --git a/gcc/tree-inline.c b/gcc/tree-inline.c
> > index d9814bd..e87c653 100644
> > --- a/gcc/tree-inline.c
> > +++ b/gcc/tree-inline.c
> > @@ -1819,7 +1819,8 @@ remap_gimple_stmt (gimple *stmt, copy_body_data *id)
> >   /* If the inlined function has too many debug markers,
> >  don't copy them.  */
> >   if (id->src_cfun->debug_marker_count
> > - > param_max_debug_marker_count)
> > + > param_max_debug_marker_count
> > + || !id->block)
> > return stmts;
> > 
> > Isn't this overly pessimistic in throwing away all debug markers
> > of an inline rather than just those which are associated with
> > the outermost scope (that's mapped to NULL if !id->block)?  Can
> > we instead remap the block here (move it from copy_debug_stmt)
> > and elide the copy only when it maps to NULL?
> > 
> 
> Yes, indeed, I missed the fact that this is also called up from
> tree_function_versioning.  id->block is always NULL in that case.
> But since this should be a 1:1 copy, missing block info should not
> get worse than it already is.  Fortunately it is possible to distinguish
> that from the actual inlining by looking at id->call_stmt.
> 
> 
> > 
> >   gdebug *copy = as_a  (gimple_copy (stmt));
> > diff --git a/gcc/tree-ssa-live.c b/gcc/tree-ssa-live.c
> > index 21a9ee4..ca119c6 100644
> > --- a/gcc/tree-ssa-live.c
> > +++ b/gcc/tree-ssa-live.c
> > @@ -623,13 +623,25 @@ clear_unused_block_pointer (void)
> >{
> > unsigned i;
> > tree b;
> > -   gimple *stmt = gsi_stmt (gsi);
> > +   gimple *stmt;
> >  
> > +  next:
> > +   stmt = gsi_stmt (gsi);
> > if (!is_gimple_debug (stmt) && !gimple_clobber_p (stmt))
> >   continue;
> > b = gimple_block (stmt);
> > if (b && !TREE_USED (b))
> > - gimple_set_block (stmt, NULL);
> > + {
> > +   if (gimple_debug_nonbind_marker_p (stmt)
> > +   && BLOCK_ABSTRACT_ORIGIN (b))
> > 
> > why only for inlined BLOCKs?  Did you want to restrict it further
> > to inlined_function_outer_scope_p?
> > 
> 
> Yes.
> I had assumed that check would be sufficient, but as you said,
> I have to walk the block structure, until I find a
> inlined_function_outer_scope_p.
> 
> I don't know if there is a chance that any of the debug lines will
> get a block info assigned in the end, if id->block == NULL, but I think
> it does not hurt to remove the debug statements in copy_debug_stmt.
> 
> > I guess I don't understand the debug situation fully - I guess it is
> > about jumping to locations in inlines where the call stack does
> > not show we are in the actual inlined functi

Re: [PATCH] d: Add darwin support for D language front-end

2020-12-07 Thread Iain Buclaw via Gcc-patches
Hi,

Sorry it's been delayed by a bit.  Currently Iain (not that Iain) is
running tests on a wide net of versions and architectures, with mixed
success rates that seem to boil down to some C binding issue.

Most of the published effort on my side is currently sitting in
users/ibuclaw/darwin.

Iain.

Excerpts from ciel's message of December 7, 2020 3:43 am:
> Iain-san,
> 
> Thank you for finally working on GDC Darwin support.
> 
> T. Yamada
> 
> 2020年11月30日(月) 7:28 :
>>
>> Send Gcc-patches mailing list submissions to
>> gcc-patches@gcc.gnu.org
>>
>> To subscribe or unsubscribe via the World Wide Web, visit
>> https://gcc.gnu.org/mailman/listinfo/gcc-patches
>> or, via email, send a message with subject or body 'help' to
>> gcc-patches-requ...@gcc.gnu.org
>>
>> You can reach the person managing the list at
>> gcc-patches-ow...@gcc.gnu.org
>>
>> When replying, please edit your Subject line so it is more specific
>> than "Re: Contents of Gcc-patches digest..."
>> Today's Topics:
>>
>>1. Re: [PATCH] configure: Support building D front-end on
>>   *-*-darwin* (Iain Buclaw)
>>2. Re: [PATCH] d: Add darwin support for D language front-end
>>   (Iain Buclaw)
>>3. Fix freeing of thunk_info (Jan Hubicka)
>>4. Re: [PATCH] handle conditionals in -Wstringop-overflow et al.
>>   (PR 92936) (Martin Sebor)
>>
>>
>>
>> -- Forwarded message --
>> From: Iain Buclaw 
>> To: Iain Sandoe 
>> Cc: gcc-patches@gcc.gnu.org
>> Bcc:
>> Date: Sun, 29 Nov 2020 22:13:58 +0100
>> Subject: Re: [PATCH] configure: Support building D front-end on *-*-darwin*
>> Excerpts from Iain Sandoe's message of November 29, 2020 10:35 am:
>> > Hi Iain
>> >
>> > Iain Buclaw  wrote:
>> >
>> >> The bootstrap has been succeeding for some time now, there's no need to
>> >> set it as an unsupported language.
>> >>
>> >> OK for mainline?
>> >
>> > At present, this breaks bootstrap on 32 bit Darwin hosts.
>> >
>> > Once that’s resolved, then it seems a reasonable to make sure that the D
>> > FE is built and tested (even though there is still work to do on the 
>> > library
>> > support).
>> >
>> > So, OK once all the currently tested Darwin hosts bootstrap with D enabled.
>> >
>>
>> As confirmed off list, powerpc and i686 darwin9 bootstrapped OK, so I've
>> committed it as r11-5521.
>>
>> Iain.
>>
>>
>>
>>
>> -- Forwarded message --
>> From: Iain Buclaw 
>> To: Iain Sandoe 
>> Cc: gcc-patches@gcc.gnu.org
>> Bcc:
>> Date: Sun, 29 Nov 2020 22:21:26 +0100
>> Subject: Re: [PATCH] d: Add darwin support for D language front-end
>> Excerpts from Iain Sandoe's message of November 29, 2020 1:49 pm:
>> > Hi Iain
>> >
>> > Iain Buclaw  wrote:
>> >
>> >> This patch adds necessary predefined version symbols and support for
>> >> moduleinfo sections for darwin to allow testing libphobos support.
>> >>
>> >> OK for mainline?
>> >
>> > As we discussed off-list, this sets ABI (the section names are visible and
>> > part
>> > of the contract with the runtime).
>> >
>> > It’s highly desirable that we can interoperate with D objects built by
>> > other impls.
>> > so the section naming convention should be agreed.
>> >
>> > As far as I understand things, the names you have here agree with those 
>> > used
>> > by “upstream” (but current the LLVM D variant uses something different).
>> >
>> > The Darwin parts are OK, with a note to (re-)check on the section names
>> > before
>> > the first release*.
>> >
>> > thanks
>> > Iain
>> >
>> > * perhaps it might be worth pinging the relevant upstream and LLV folks 
>> > long
>> > before that - since the release cycles there mean that there will likely be
>> > another
>> > one before GCC11 ships.
>> >
>>
>> I've sent Martin a message, so hoping for a response within the week on
>> which name he'd prefer I go with.  I suspect it'd be better to align with
>> LLVM instead of the reference DMD compiler.  As the compiler/run-time
>> integration of modules uses the same machinery, and we both want to
>> support loading shared DSOs (DMD is static only on OSX).
>>
>> Adjusted the patch with some copy/paste fixes (missing tabs, Glibc gets
>> a mention).  Committed the following as r11-5522.
>>
>> Regards,
>> Iain
>>
>> ---
>> gcc/ChangeLog:
>>
>> * config.gcc (*-*-darwin*): Set d_target_objs and 
>> target_has_targetdm.
>> * config/elfos.h (TARGET_D_MINFO_SECTION): New macro.
>> (TARGET_D_MINFO_START_NAME): New macro.
>> (TARGET_D_MINFO_END_NAME): New macro.
>> * config/t-darwin: Add darwin-d.o.
>> * doc/tm.texi: Regenerate.
>> * doc/tm.texi.in (D language and ABI): Add @hook for
>> TARGET_D_MINFO_SECTION, TARGET_D_MINFO_START_NAME, and
>> TARGET_D_MINFO_END_NAME.
>> * config/darwin-d.c: New file.
>>
>> gcc/d/ChangeLog:
>>
>> * d-target.def (d_minfo_section): New hook.
>> (d_minfo_start_name): New hook.
>> (d_minfo_end_name): New hook.
>> * modules.cc: Includ

Re: [PR66791][ARM] Replace calls to __builtin_neon_vmvn* by ~ for vmvn intrinsics

2020-12-07 Thread Prathamesh Kulkarni via Gcc-patches
On Thu, 3 Dec 2020 at 16:05, Kyrylo Tkachov  wrote:
>
>
>
> > -Original Message-
> > From: Gcc-patches  On Behalf Of
> > Prathamesh Kulkarni via Gcc-patches
> > Sent: 03 December 2020 10:30
> > To: gcc Patches ; Kyrill Tkachov
> > 
> > Subject: Re: [PR66791][ARM] Replace calls to __builtin_neon_vmvn* by ~ for
> > vmvn intrinsics
> >
> > On Wed, 25 Nov 2020 at 22:01, Prathamesh Kulkarni
> >  wrote:
> > >
> > > Hi,
> > > This patch replaces calls to __builtin_neon_vmvnv* builtins with ~
> > > operator in arm_neon.h.
> > > Cross-tested on arm*-*-*.
> > > OK to commit ?
>
> Ok.
Hi Kyrill,
I have attached an updated patch that removes entry for vmvn from
arm_neon_builtins.def,
but I am not sure if that's sufficient ?

I tried this:
__extension__ extern __inline uint32x2_t
__attribute__  ((__always_inline__, __gnu_inline__, __artificial__))
vmvn_u32 (uint32x2_t __a)
{
  return (uint32x4_t)__builtin_neon_vmvnv4si ((int32x4_t) __a);
//  return ~__a;
}

after removing the entry for vmvn, from arm_neon_builtins.def, but
compilation appeared to proceed successfully,
so I guess it hasn't stopped the builtin from being created ?

Thanks,
Prathamesh
> Sorry for missing this.
> Thanks,
> Kyrill
>
> P.S. please use my @arm.com address when CC'ing me, I don't have access to 
> the foss.arm.com one currently...
>
> > ping https://gcc.gnu.org/pipermail/gcc-patches/2020-
> > November/560223.html
> >
> > Thanks,
> > Prathamesh
> > >
> > > Thanks,
> > > Prathamesh


vmvn-2.diff
Description: Binary data


Re: [PATCH] Optimize macro: make it more predictable

2020-12-07 Thread Martin Liška

PING^2

On 11/26/20 2:56 PM, Martin Liška wrote:

PING^1

On 11/9/20 11:35 AM, Martin Liška wrote:

On 11/3/20 2:34 PM, Jakub Jelinek wrote:

On Tue, Nov 03, 2020 at 02:27:52PM +0100, Richard Biener wrote:

On Fri, Oct 23, 2020 at 1:47 PM Martin Liška  wrote:

This is a follow-up of the discussion that happened in the thread about the
no_stack_protector attribute:
https://gcc.gnu.org/pipermail/gcc-patches/2020-May/545916.html

The current optimize attribute works in the following way:
- 1) we take current global_options as base
- 2) maybe_default_options is called for the currently selected optimization 
level, which
   means all rules in default_options_table are executed
- 3) attribute values are applied (via decode_options)

So the step 2) is problematic: in case of -O2 -fno-omit-frame-pointer and
__attribute__((optimize("-fno-stack-protector"))) one basically ends up with
-O2 -fno-stack-protector, because -fomit-frame-pointer is the default at -O1
and above:
  /* -O1 and -Og optimizations.  */
  { OPT_LEVELS_1_PLUS, OPT_fomit_frame_pointer, NULL, 1 },
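
A sketch of the surprise with a hypothetical translation unit (file and
function names are made up):

--cut here--
/* Compile with: gcc -O2 -fno-omit-frame-pointer -c surprise.c
   Within f, re-running the -O2 defaults reinstates
   -fomit-frame-pointer, silently dropping the explicit command-line
   -fno-omit-frame-pointer.  */
__attribute__((optimize ("-fno-stack-protector")))
void f (void) { }

void g (void) { } /* keeps the frame pointer as requested */
--cut here--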

My patch handles that, and with it the optimize attribute really behaves the
same as appending the attribute value to the command line.  So far so good.
We should also reflect that in the documentation entry, which is quite vague
right now:

"""
The optimize attribute is used to specify that a function is to be compiled 
with different optimization options than specified on the command line.
"""

and we may want to handle -Ox in the attribute in a special way. I guess many 
macro/pragma users expect that

-O2 -ftree-vectorize and __attribute__((optimize(1))) will end with -O1 and not
with -ftree-vectorize -O1 ?


Hmm.  I guess the only two reasonable options are to append to the active set
and thus end up with -ftree-vectorize -O1 or to start from an empty set and thus
end up with -O1.


I'd say we always want to append, but only take into account explicit
options.


Yes, I also prefer to always append and basically drop the "reset" 
functionality.


So basically get the effect of
take the command line, append to that options from the optimize/target
pragmas in effect and append to that options from optimize/target
attributes and only from that figure out the implicit options.


Few notes here:
- target and optimize attributes are separate so parsing happens independently; 
however
   they use global_options and global_options_set as a starting point
- you can have a series of wrapped optimize/pragma macros and again information 
is shared
in global_options/global_options_set
- target and optimize options interact, but in a controlled way with 
SET_OPTION_IF_UNSET

That said, I hope the biggest offender is right now the handling of -Olevel.

@Jakub: Do you see a situation with my patch where it breaks?

Thanks,
Martin



Jakub


RE: [PR66791][ARM] Replace calls to __builtin_neon_vmvn* by ~ for vmvn intrinsics

2020-12-07 Thread Kyrylo Tkachov via Gcc-patches
Hi Prathamesh,

> -Original Message-
> From: Prathamesh Kulkarni 
> Sent: 07 December 2020 11:01
> To: Kyrylo Tkachov 
> Cc: gcc-patches@gcc.gnu.org
> Subject: Re: [PR66791][ARM] Replace calls to __builtin_neon_vmvn* by ~ for
> vmvn intrinsics
> 
> On Thu, 3 Dec 2020 at 16:05, Kyrylo Tkachov 
> wrote:
> >
> >
> >
> > > -Original Message-
> > > From: Gcc-patches  On Behalf Of
> > > Prathamesh Kulkarni via Gcc-patches
> > > Sent: 03 December 2020 10:30
> > > To: gcc Patches ; Kyrill Tkachov
> > > 
> > > Subject: Re: [PR66791][ARM] Replace calls to __builtin_neon_vmvn* by ~
> for
> > > vmvn intrinsics
> > >
> > > On Wed, 25 Nov 2020 at 22:01, Prathamesh Kulkarni
> > >  wrote:
> > > >
> > > > Hi,
> > > > This patch replaces calls to __builtin_neon_vmvnv* builtins with ~
> > > > operator in arm_neon.h.
> > > > Cross-tested on arm*-*-*.
> > > > OK to commit ?
> >
> > Ok.
> Hi Kyrill,
> I have attached an updated patch that removes entry for vmvn from
> arm_neon_builtins.def,
> but I am not sure if that's sufficient ?
> 
> I tried this:
> __extension__ extern __inline uint32x2_t
> __attribute__  ((__always_inline__, __gnu_inline__, __artificial__))
> vmvn_u32 (uint32x2_t __a)
> {
>   return (uint32x4_t)__builtin_neon_vmvnv4si ((int32x4_t) __a);
> //  return ~__a;
> }
> 
> after removing the entry for vmvn, from arm_neon_builtins.def, but
> compilation appeared to proceed successfully,
> so I guess it hasn't stopped the builtin from being created ?
> 

I think you'll want to look at the generated assembly?
IIRC if the builtin isn't created then the assembly will contain a call to 
__builtin_neon_vmvnv4si, which of course will fail at link-time.

Thanks,
Kyrill

> Thanks,
> Prathamesh
> > Sorry for missing this.
> > Thanks,
> > Kyrill
> >
> > P.S. please use my @arm.com address when CC'ing me, I don't have access
> to the foss.arm.com one currently...
> >
> > > ping https://gcc.gnu.org/pipermail/gcc-patches/2020-
> > > November/560223.html
> > >
> > > Thanks,
> > > Prathamesh
> > > >
> > > > Thanks,
> > > > Prathamesh


[PATCH] builtins: Avoid ICE with __builtin_clear_padding on POINTERS_EXTEND_UNSIGNED targets [PR98147]

2020-12-07 Thread Jakub Jelinek via Gcc-patches
Hi!

The function that calls targetm.emit_call_builtin___clear_cache
asserts that each of the begin and end operands has either ptr_mode or
Pmode.
On most targets that is the same mode, but e.g. on aarch64 -mabi=ilp32
or a few others it is different.  When a target has a clear cache
non-library handler, it will use create_address_operand which will do the
conversion to the right mode automatically, but when emitting a library
call, we just say the operands are ptr_mode even when they can be Pmode
too; in that case we need to convert explicitly.
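
For concreteness (my understanding of the aarch64 ILP32 mode layout):

/* On aarch64 -mabi=ilp32, ptr_mode == SImode (32-bit pointers) while
   Pmode == DImode, so begin/end can arrive as DImode values and must be
   narrowed with convert_memory_address before being passed to the
   ptr_mode libcall.  */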

Fixed thusly, tested on the testcase with cross to aarch64 -mabi=ilp32,
bootstrapped/regtested on x86_64-linux and i686-linux (I really don't
have aarch64 ilp32 testing setup), ok for trunk?

2020-12-07  Jakub Jelinek  

PR target/98147
* builtins.c (default_emit_call_builtin___clear_cache): Call
convert_memory_address to ptr_mode on both begin and end.

* gcc.dg/pr98147.c: New test.

--- gcc/builtins.c.jj   2020-12-04 08:08:06.350436898 +0100
+++ gcc/builtins.c  2020-12-05 14:47:09.555476027 +0100
@@ -7790,8 +7790,8 @@ default_emit_call_builtin___clear_cache
 
   emit_library_call (callee,
 LCT_NORMAL, VOIDmode,
-begin, ptr_mode,
-end, ptr_mode);
+convert_memory_address (ptr_mode, begin), ptr_mode,
+convert_memory_address (ptr_mode, end), ptr_mode);
 }
 
 /* Emit a call to __builtin___clear_cache, unless the target specifies
--- gcc/testsuite/gcc.dg/pr98147.c.jj   2020-12-05 14:49:53.817626223 +0100
+++ gcc/testsuite/gcc.dg/pr98147.c  2020-12-05 14:48:20.984671644 +0100
@@ -0,0 +1,10 @@
+/* PR target/98147 */
+
+char buffer[32] = "foo bar";
+
+int
+main ()
+{
+  __builtin___clear_cache (buffer, buffer + 32);
+  return 0;
+}

Jakub



Re: [PR66791][ARM] Replace calls to __builtin_neon_vmvn* by ~ for vmvn intrinsics

2020-12-07 Thread Prathamesh Kulkarni via Gcc-patches
On Mon, 7 Dec 2020 at 16:34, Kyrylo Tkachov  wrote:
>
> Hi Prathamesh,
>
> > -Original Message-
> > From: Prathamesh Kulkarni 
> > Sent: 07 December 2020 11:01
> > To: Kyrylo Tkachov 
> > Cc: gcc-patches@gcc.gnu.org
> > Subject: Re: [PR66791][ARM] Replace calls to __builtin_neon_vmvn* by ~ for
> > vmvn intrinsics
> >
> > On Thu, 3 Dec 2020 at 16:05, Kyrylo Tkachov 
> > wrote:
> > >
> > >
> > >
> > > > -Original Message-
> > > > From: Gcc-patches  On Behalf Of
> > > > Prathamesh Kulkarni via Gcc-patches
> > > > Sent: 03 December 2020 10:30
> > > > To: gcc Patches ; Kyrill Tkachov
> > > > 
> > > > Subject: Re: [PR66791][ARM] Replace calls to __builtin_neon_vmvn* by ~
> > for
> > > > vmvn intrinsics
> > > >
> > > > On Wed, 25 Nov 2020 at 22:01, Prathamesh Kulkarni
> > > >  wrote:
> > > > >
> > > > > Hi,
> > > > > This patch replaces calls to __builtin_neon_vmvnv* builtins with ~
> > > > > operator in arm_neon.h.
> > > > > Cross-tested on arm*-*-*.
> > > > > OK to commit ?
> > >
> > > Ok.
> > Hi Kyrill,
> > I have attached an updated patch that removes entry for vmvn from
> > arm_neon_builtins.def,
> > but I am not sure if that's sufficient ?
> >
> > I tried this:
> > __extension__ extern __inline uint32x2_t
> > __attribute__  ((__always_inline__, __gnu_inline__, __artificial__))
> > vmvn_u32 (uint32x2_t __a)
> > {
> >   return (uint32x4_t)__builtin_neon_vmvnv4si ((int32x4_t) __a);
> > //  return ~__a;
> > }
> >
> > after removing the entry for vmvn, from arm_neon_builtins.def, but
> > compilation appeared to proceed successfully,
> > so I guess it hasn't stopped the builtin from being created ?
> >
>
> I think you'll want to look at the generated assembly?
> IIRC if the builtin isn't created then the assembly will contain a call to 
> __builtin_neon_vmvnv4si, which of course will fail at link-time.
Ah, indeed. When I try to call it, it now results in:
vmvn.c:6:22: warning: implicit declaration of function
‘__builtin_neon_vmvnv2si’; did you mean ‘__builtin_neon_vmlav2si’?
[-Wimplicit-function-declaration]
6 |   return (uint32x2_t)__builtin_neon_vmvnv2si ((int32x2_t) __a);
  |  ^~~
  |  __builtin_neon_vmlav2si
vmvn.c:6:3: error: cannot convert a value of type ‘int’ to vector type
‘__simd64_uint32_t’ which has different size
6 |   return (uint32x2_t)__builtin_neon_vmvnv2si ((int32x2_t) __a);
  |   ^~

so I suppose it has correctly removed it.
Is it OK to commit the patch ?

Thanks,
Prathamesh
>
> Thanks,
> Kyrill
>
> > Thanks,
> > Prathamesh
> > > Sorry for missing this.
> > > Thanks,
> > > Kyrill
> > >
> > > P.S. please use my @arm.com address when CC'ing me, I don't have access
> > to the foss.arm.com one currently...
> > >
> > > > ping https://gcc.gnu.org/pipermail/gcc-patches/2020-
> > > > November/560223.html
> > > >
> > > > Thanks,
> > > > Prathamesh
> > > > >
> > > > > Thanks,
> > > > > Prathamesh


RE: [PR66791][ARM] Replace calls to __builtin_neon_vmvn* by ~ for vmvn intrinsics

2020-12-07 Thread Kyrylo Tkachov via Gcc-patches


> -Original Message-
> From: Prathamesh Kulkarni 
> Sent: 07 December 2020 11:26
> To: Kyrylo Tkachov 
> Cc: gcc-patches@gcc.gnu.org
> Subject: Re: [PR66791][ARM] Replace calls to __builtin_neon_vmvn* by ~ for
> vmvn intrinsics
> 
> On Mon, 7 Dec 2020 at 16:34, Kyrylo Tkachov 
> wrote:
> >
> > Hi Prathamesh,
> >
> > > -Original Message-
> > > From: Prathamesh Kulkarni 
> > > Sent: 07 December 2020 11:01
> > > To: Kyrylo Tkachov 
> > > Cc: gcc-patches@gcc.gnu.org
> > > Subject: Re: [PR66791][ARM] Replace calls to __builtin_neon_vmvn* by ~
> for
> > > vmvn intrinsics
> > >
> > > On Thu, 3 Dec 2020 at 16:05, Kyrylo Tkachov 
> > > wrote:
> > > >
> > > >
> > > >
> > > > > -Original Message-
> > > > > From: Gcc-patches  On Behalf
> Of
> > > > > Prathamesh Kulkarni via Gcc-patches
> > > > > Sent: 03 December 2020 10:30
> > > > > To: gcc Patches ; Kyrill Tkachov
> > > > > 
> > > > > Subject: Re: [PR66791][ARM] Replace calls to __builtin_neon_vmvn*
> by ~
> > > for
> > > > > vmvn intrinsics
> > > > >
> > > > > On Wed, 25 Nov 2020 at 22:01, Prathamesh Kulkarni
> > > > >  wrote:
> > > > > >
> > > > > > Hi,
> > > > > > This patch replaces calls to __builtin_neon_vmvnv* builtins with ~
> > > > > > operator in arm_neon.h.
> > > > > > Cross-tested on arm*-*-*.
> > > > > > OK to commit ?
> > > >
> > > > Ok.
> > > Hi Kyrill,
> > > I have attached an updated patch that removes entry for vmvn from
> > > arm_neon_builtins.def,
> > > but I am not sure if that's sufficient ?
> > >
> > > I tried this:
> > > __extension__ extern __inline uint32x2_t
> > > __attribute__  ((__always_inline__, __gnu_inline__, __artificial__))
> > > vmvn_u32 (uint32x2_t __a)
> > > {
> > >   return (uint32x4_t)__builtin_neon_vmvnv4si ((int32x4_t) __a);
> > > //  return ~__a;
> > > }
> > >
> > > after removing the entry for vmvn, from arm_neon_builtins.def, but
> > > compilation appeared to proceed successfully,
> > > so I guess it hasn't stopped the builtin from being created ?
> > >
> >
> > I think you'll want to look at the generated assembly?
> > IIRC if the builtin isn't created then the assembly will contain a call to
> __builtin_neon_vmvnv4si, which of course will fail at link-time.
> Ah, indeed. When I try to call it, it now results in:
> vmvn.c:6:22: warning: implicit declaration of function
> ‘__builtin_neon_vmvnv2si’; did you mean ‘__builtin_neon_vmlav2si’?
> [-Wimplicit-function-declaration]
> 6 |   return (uint32x2_t)__builtin_neon_vmvnv2si ((int32x2_t) __a);
>   |  ^~~
>   |  __builtin_neon_vmlav2si
> vmvn.c:6:3: error: cannot convert a value of type ‘int’ to vector type
> ‘__simd64_uint32_t’ which has different size
> 6 |   return (uint32x2_t)__builtin_neon_vmvnv2si ((int32x2_t) __a);
>   |   ^~
> 
> so I suppose it has correctly removed it.
> Is it OK to commit the patch ?

Ok.
Thanks,
Kyrill

> 
> Thanks,
> Prathamesh
> >
> > Thanks,
> > Kyrill
> >
> > > Thanks,
> > > Prathamesh
> > > > Sorry for missing this.
> > > > Thanks,
> > > > Kyrill
> > > >
> > > > P.S. please use my @arm.com address when CC'ing me, I don't have
> access
> > > to the foss.arm.com one currently...
> > > >
> > > > > ping https://gcc.gnu.org/pipermail/gcc-patches/2020-
> > > > > November/560223.html
> > > > >
> > > > > Thanks,
> > > > > Prathamesh
> > > > > >
> > > > > > Thanks,
> > > > > > Prathamesh


Re: [PATCH 00/31] VAX: Bring the port up to date (yes, MODE_CC conversion is included)

2020-12-07 Thread Maciej W. Rozycki
On Sun, 29 Nov 2020, Martin Sebor wrote:

> > Perhaps it has been fixed since, but mentioning it in case it has not.
> 
> I wouldn't expect to see this warning after r11-5073.  If it persists,
> can you please open a bug and attach the translation unit to it?

 Indeed, the issue has gone now, now that I have pushed the VAX changes 
and rebuilt a refreshed tree.  Thanks for looking into it.

  Maciej


Re: [PATCH] i386: Add combine splitters to allow combining multiple insns into reg1 = const; reg2 = rotate (reg1, reg3 & cst) [PR96226]

2020-12-07 Thread Uros Bizjak via Gcc-patches
On Fri, Dec 4, 2020 at 7:26 PM Segher Boessenkool
 wrote:
>
> Hi!
>
> On Fri, Dec 04, 2020 at 07:06:45PM +0100, Uros Bizjak wrote:
> > On Fri, Dec 4, 2020 at 6:57 PM Jakub Jelinek  wrote:
> > >
> > > On Fri, Dec 04, 2020 at 06:53:49PM +0100, Uros Bizjak wrote:
> > > > > > I was trying that first, but it didn't work.  Without the
> > > > > > clobber it actually works right, we don't have the rotate insn with 
> > > > > > the
> > > > > > masking and no clobber, so in the end combiner does add the clobber 
> > > > > > there
> > > > > > (or would fail if the clobber couldn't be added).
> > > > >
> > > > > I was not aware of that detail ...
> > > >
> > > > That said, IMO, it would be better to rewrite other _mask and _mask_1
> > > > patterns that remove useless masking to combine splitter.
> > > > Unfortunately, the combine splitter expects exactly two output
> > > > instructions for some reason, but these patterns split to one
> > > > instruction. Perhaps it is possible to relax this limitation of
> > > > combine splitters and also allow one output instruction.
> > >
> > > I've already checked it in.  Guess I can try to change the combine 
> > > splitters
> > > (can it wait till Monday?) so that they remove the masking when splitting
> > > the insn into two, so that the pre-reload splitters aren't involved.
> >
> > No, I didn't want to burden you with the additional task - the patch
> > is OK as it is. I was just thinking out loud, as I remembered that
> > changing bt patterns to combine splitter regressed one testcase. IIRC
> > combination of two insns blocked better combination of three insns, or
> > something like that.
> >
> > > To turn those pre-reload define_insn_and_splits I'm afraid we'd indeed
> > > need combiner's changes, so that would need to be discussed with Segher
> > > first.
> >
> > Yes, that is the long-term plan. Segher CC'd.
>
> A splitter can *already* split to only one insn.

Unfortunately, the attached patch with the following testcase:

--cut here--
int test (int a, int b)
{
 return a << (b & 31);
}
--cut here--

does not trigger the call to combine_split_insns.  The reason is in the
following condition:

  /* If we were combining three insns and the result is a simple SET
 with no ASM_OPERANDS that wasn't recognized, try to split it into two
 insns.  There are two ways to do this.  It can be split using a
 machine-specific method (like when you have an addition of a large
 constant) or by combine in the function find_split_point.  */

  if (i1 && insn_code_number < 0 && GET_CODE (newpat) == SET
  && asm_noperands (newpat) < 0)

where i1 is null.

Uros.
diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md
index 21f0044179f..124495c1c4c 100644
--- a/gcc/config/i386/i386.md
+++ b/gcc/config/i386/i386.md
@@ -10883,50 +10883,35 @@
 })
 
 ;; Avoid useless masking of count operand.
-(define_insn_and_split "*ashl3_mask"
+(define_split
   [(set (match_operand:SWI48 0 "nonimmediate_operand")
(ashift:SWI48
  (match_operand:SWI48 1 "nonimmediate_operand")
  (subreg:QI
(and:SI
- (match_operand:SI 2 "register_operand" "c,r")
- (match_operand:SI 3 "const_int_operand")) 0)))
-   (clobber (reg:CC FLAGS_REG))]
+ (match_operand:SI 2 "register_operand")
+ (match_operand:SI 3 "const_int_operand")) 0)))]
   "ix86_binary_operator_ok (ASHIFT, mode, operands)
&& (INTVAL (operands[3]) & (GET_MODE_BITSIZE (mode)-1))
-  == GET_MODE_BITSIZE (mode)-1
-   && ix86_pre_reload_split ()"
-  "#"
-  "&& 1"
-  [(parallel
- [(set (match_dup 0)
-  (ashift:SWI48 (match_dup 1)
-(match_dup 2)))
-  (clobber (reg:CC FLAGS_REG))])]
-  "operands[2] = gen_lowpart (QImode, operands[2]);"
-  [(set_attr "isa" "*,bmi2")])
+  == GET_MODE_BITSIZE (mode)-1"
+  [(set (match_dup 0)
+   (ashift:SWI48 (match_dup 1)
+ (match_dup 2)))]
+  "operands[2] = gen_lowpart (QImode, operands[2]);")
 
-(define_insn_and_split "*ashl3_mask_1"
+(define_split
   [(set (match_operand:SWI48 0 "nonimmediate_operand")
(ashift:SWI48
  (match_operand:SWI48 1 "nonimmediate_operand")
  (and:QI
-   (match_operand:QI 2 "register_operand" "c,r")
-   (match_operand:QI 3 "const_int_operand"
-   (clobber (reg:CC FLAGS_REG))]
+   (match_operand:QI 2 "register_operand")
+   (match_operand:QI 3 "const_int_operand"]
   "ix86_binary_operator_ok (ASHIFT, mode, operands)
&& (INTVAL (operands[3]) & (GET_MODE_BITSIZE (mode)-1))
-  == GET_MODE_BITSIZE (mode)-1
-   && ix86_pre_reload_split ()"
-  "#"
-  "&& 1"
-  [(parallel
- [(set (match_dup 0)
-  (ashift:SWI48 (match_dup 1)
-(match_dup 2)))
-  (clobber (reg:CC FLAGS_REG))])]
-  ""
-  [(set_attr "isa" "*,bmi2")])
+  == GET_MODE_BITSIZE (mode)-1"
+  [(set (match_dup 0)
+   (ashift:SWI48 (match_dup 1)
+ (match_dup 2)))])

[committed] doc: "used" attribute saves decls from linker garbage collection

2020-12-07 Thread Jozef Lawrynowicz
Since 6fbec038f7a, the "used" attribute will save the symbol declaration
it is applied to from linker garbage collection, if the target supports
the SHF_GNU_RETAIN section flag.

This patch documents this new functionality.

Committed as obvious.
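
As a quick illustration (symbol name made up, not from the patch), a
translation unit like:

--cut here--
__attribute__((used))
static void cleanup_hook (void)
{
  /* Referenced only from asm or a linker script, never from C.  */
}
--cut here--

compiled with -ffunction-sections on a Binutils 2.36 toolchain should
end up with cleanup_hook in its own section carrying the SHF_GNU_RETAIN
flag, so ld --gc-sections no longer discards it despite the lack of
C-level references.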
From 724390745213d5192af04a51bb08cf44da7c396d Mon Sep 17 00:00:00 2001
From: Jozef Lawrynowicz 
Date: Mon, 7 Dec 2020 14:26:46 +
Subject: [PATCH] doc: "used" attribute saves decls from linker garbage
 collection

gcc/ChangeLog:

* doc/extend.texi (used function attribute): Document saving
the declaration from linker garbage collection.
(used variable attribute): Likewise.
---
 gcc/doc/extend.texi | 16 
 1 file changed, 16 insertions(+)

diff --git a/gcc/doc/extend.texi b/gcc/doc/extend.texi
index fd282aa0157..0c969085d1f 100644
--- a/gcc/doc/extend.texi
+++ b/gcc/doc/extend.texi
@@ -3859,6 +3859,14 @@ When applied to a member function of a C++ class 
template, the
 attribute also means that the function is instantiated if the
 class itself is instantiated.
 
+For ELF targets that support the GNU or FreeBSD OSABIs, this attribute
+will also save the function from linker garbage collection.  To support
+this behavior, functions that have not been placed in specific sections
+(e.g. by the @code{section} attribute, or the @code{-ffunction-sections}
+option), will be placed in new, unique sections.
+
+This additional functionality requires Binutils version 2.36 or later.
+
 @item visibility ("@var{visibility_type}")
 @cindex @code{visibility} function attribute
 This attribute affects the linkage of the declaration to which it is attached.
@@ -7420,6 +7428,14 @@ When applied to a static data member of a C++ class 
template, the
 attribute also means that the member is instantiated if the
 class itself is instantiated.
 
+For ELF targets that support the GNU or FreeBSD OSABIs, this attribute
+will also save the variable from linker garbage collection.  To support
+this behavior, variables that have not been placed in specific sections
+(e.g. by the @code{section} attribute, or the @code{-fdata-sections} option),
+will be placed in new, unique sections.
+
+This additional functionality requires Binutils version 2.36 or later.
+
 @item vector_size (@var{bytes})
 @cindex @code{vector_size} variable attribute
 This attribute specifies the vector size for the type of the declared
-- 
2.29.2



[patch] Fix PR tree-optimization/96344

2020-12-07 Thread Eric Botcazou
Hi,

the very recent addition of the if_to_switch pass has partially disabled the 
optimization I added back in June:

2020-06-26  Eric Botcazou  

* tree-ssa-reassoc.c (dump_range_entry): New function.
(debug_range_entry): New debug function.
(update_range_test): Invoke dump_range_entry for dumping.
(optimize_range_tests_to_bit_test): Merge the entry test in the
bit test when possible and lower the profitability threshold.

as witnessed by the 3 new failures in the gnat.dg testsuite.  It turns out 
that both tree-ssa-reassoc.c and tree-switch-conversion.c can turn things into 
bit tests so the optimization must also be added to bit_test_cluster::emit.

The patch also contains a secondary optimization, whereby the full bit-test 
sequence is sent to the folder before being gimplified in case there is only 
one test, so that the most optimized sequence (bt + jc on x86) can be emitted 
like with optimize_range_tests_to_bit_test.
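
For reference, the shape of code this targets is a switch whose case
values fit in one word, e.g. (made-up example, not from the PR):

--cut here--
int in_set (int c)
{
  switch (c)
    {
    case 0: case 3: case 5: case 7:
      return 1;
    default:
      return 0;
    }
}
--cut here--

which can be lowered to a single test of ((1 << c) & 0xa9) != 0; the
point of the first change is that the preceding range check on c can be
skipped or folded into the shift when c's value range is already known
to be a valid shift amount.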

Bootstrapped/regtested on x86-64/Linux, OK for the mainline?


2020-12-07  Eric Botcazou  

PR tree-optimization/96344
* tree-switch-conversion.c (bit_test_cluster::emit): Compute the
range only if an entry test is necessary.  Merge the entry test in
the bit test when possible.  Use PREC local variable consistently.
When there is only one test, do a single gimplification at the end.

-- 
Eric Botcazou

diff --git a/gcc/tree-switch-conversion.c b/gcc/tree-switch-conversion.c
index a87a2a3cd15..989bd7710d1 100644
--- a/gcc/tree-switch-conversion.c
+++ b/gcc/tree-switch-conversion.c
@@ -1511,7 +1511,6 @@ bit_test_cluster::emit (tree index_expr, tree index_type,
 
   tree minval = get_low ();
   tree maxval = get_high ();
-  tree range = int_const_binop (MINUS_EXPR, maxval, minval);
   unsigned HOST_WIDE_INT bt_range = get_range (minval, maxval);
 
   /* Go through all case labels, and collect the case labels, profile
@@ -1550,11 +1549,38 @@ bit_test_cluster::emit (tree index_expr, tree index_type,
 
   qsort (test, count, sizeof (*test), case_bit_test::cmp);
 
+  /* If every possible relative value of the index expression is a valid shift
+ amount, then we can merge the entry test in the bit test.  */
+  wide_int min, max;
+  bool entry_test_needed;
+  if (TREE_CODE (index_expr) == SSA_NAME
+  && get_range_info (index_expr, &min, &max) == VR_RANGE
+  && wi::leu_p (max - min, prec - 1))
+{
+  wide_int iminval = wi::to_wide (minval);
+  tree minval_type = TREE_TYPE (minval);
+  if (wi::lt_p (min, iminval, TYPE_SIGN (minval_type)))
+	{
+	  minval = wide_int_to_tree (minval_type, min);
+	  for (i = 0; i < count; i++)
+	test[i].mask = wi::lshift (test[i].mask, iminval - min);
+	}
+  else if (wi::gt_p (min, iminval, TYPE_SIGN (minval_type)))
+	{
+	  minval = wide_int_to_tree (minval_type, min);
+	  for (i = 0; i < count; i++)
+	test[i].mask = wi::lrshift (test[i].mask, min - iminval);
+	}
+  maxval = wide_int_to_tree (minval_type, max);
+  entry_test_needed = false;
+}
+  else
+entry_test_needed = true;
+
   /* If all values are in the 0 .. BITS_PER_WORD-1 range, we can get rid of
  the minval subtractions, but it might make the mask constants more
  expensive.  So, compare the costs.  */
-  if (compare_tree_int (minval, 0) > 0
-  && compare_tree_int (maxval, GET_MODE_BITSIZE (word_mode)) < 0)
+  if (compare_tree_int (minval, 0) > 0 && compare_tree_int (maxval, prec) < 0)
 {
   int cost_diff;
   HOST_WIDE_INT m = tree_to_uhwi (minval);
@@ -1577,7 +1603,6 @@ bit_test_cluster::emit (tree index_expr, tree index_type,
 	  for (i = 0; i < count; i++)
 	test[i].mask = wi::lshift (test[i].mask, m);
 	  minval = build_zero_cst (TREE_TYPE (minval));
-	  range = maxval;
 	}
 }
 
@@ -1593,8 +1618,9 @@ bit_test_cluster::emit (tree index_expr, tree index_type,
   /*simple=*/true, NULL_TREE,
   /*before=*/true, GSI_SAME_STMT);
 
-  if (m_handles_entire_switch)
+  if (m_handles_entire_switch && entry_test_needed)
 {
+  tree range = int_const_binop (MINUS_EXPR, maxval, minval);
   /* if (idx > range) goto default */
   range
 	= force_gimple_operand_gsi (&gsi,
@@ -1608,16 +1634,22 @@ bit_test_cluster::emit (tree index_expr, tree index_type,
   gsi = gsi_last_bb (new_bb);
 }
 
-  /* csui = (1 << (word_mode) idx) */
-  csui = make_ssa_name (word_type_node);
   tmp = fold_build2 (LSHIFT_EXPR, word_type_node, word_mode_one,
 		 fold_convert (word_type_node, idx));
-  tmp = force_gimple_operand_gsi (&gsi, tmp,
-  /*simple=*/false, NULL_TREE,
-  /*before=*/true, GSI_SAME_STMT);
-  shift_stmt = gimple_build_assign (csui, tmp);
-  gsi_insert_before (&gsi, shift_stmt, GSI_SAME_STMT);
-  update_stmt (shift_stmt);
+
+  /* csui = (1 << (word_mode) idx) */
+  if (count > 1)
+{
+  csui = make_ssa_name (word_type_node);
+  tmp = force_gimple_operand_gsi (&gsi, tmp,
+ /*simple=*/false, NULL_TREE,
+

Re: [PATCH v2] c++: ICE with switch and scoped enum bit-fields [PR98043]

2020-12-07 Thread Jason Merrill via Gcc-patches

On 12/4/20 10:46 PM, Marek Polacek wrote:

On Wed, Dec 02, 2020 at 09:50:33PM -0500, Jason Merrill wrote:

On 12/2/20 6:18 PM, Marek Polacek wrote:

In this testcase we are crashing trying to gimplify a switch, because
the types of the switch condition and case constants have different
TYPE_PRECISIONs.

This started with my r5-3726 fix: SWITCH_STMT_TYPE is supposed to be the
original type of the switch condition before any conversions, so in the
C++ FE we need to use unlowered_expr_type to get the unlowered type of
enum bit-fields.

Normally, the switch type is subject to integral promotions, but here
we have a scoped enum type and those don't promote:

enum class B { A };
struct C { B c : 8; };

switch (x.c) // type B
  case B::A: // type int, will be converted to B

Here TREE_TYPE is "signed char" but SWITCH_STMT_TYPE is "B".  When
gimplifying this in gimplify_switch_expr, the index type is "B" and
we convert all the case values to "B" in preprocess_case_label_vec,
but SWITCH_COND is of type "signed char": gimple_switch_index should
be the (possibly promoted) type, not the original type, so we gimplify
the "x.c" SWITCH_COND to a SSA_NAME of type "signed char".  And then
we crash because the precision of the index type doesn't match the
precision of the case value type.

I think it makes sense to do the following; at the end of pop_switch
we've already issued the switch warnings, and since scoped enums don't
promote, it should be okay to use the type of SWITCH_STMT_COND.  The
r5-3726 change was about giving warnings for enum bit-fields anyway.

Bootstrapped/regtested on x86_64-pc-linux-gnu, ok for trunk/10?

gcc/cp/ChangeLog:

PR c++/98043
* decl.c (pop_switch): If SWITCH_STMT_TYPE is a scoped enum type,
set it to the type of SWITCH_STMT_COND.


It might make sense to do this in cp_genericize_r instead, but here is fine.


Right.  In the end I chose pop_switch due to the other SWITCH_STMT_* handling,
so that it's in the same function.


--- a/gcc/cp/decl.c
+++ b/gcc/cp/decl.c
@@ -3711,6 +3711,17 @@ pop_switch (void)
   SWITCH_STMT_ALL_CASES_P (cs->switch_stmt) = 1;
 if (!cs->break_stmt_seen_p)
   SWITCH_STMT_NO_BREAK_P (cs->switch_stmt) = 1;
+  /* Now that we're done with the switch warnings, set the switch type
+ to the type of the condition if the index type was of scoped enum type.
+ (Such types don't participate in the integer promotions.)  We do this
+ because of bit-fields whose declared type is a scoped enum type:
+ gimplification will use the lowered index type, but convert the
+ case values to SWITCH_STMT_TYPE, which would have been the declared type
+ and verify_gimple_switch doesn't accept that.  */
+  if (SWITCH_STMT_TYPE (cs->switch_stmt)
+  && SCOPED_ENUM_P (SWITCH_STMT_TYPE (cs->switch_stmt)))
+SWITCH_STMT_TYPE (cs->switch_stmt)
+  = TREE_TYPE (SWITCH_STMT_COND (cs->switch_stmt));


What would be the impact of doing this for all
is_bitfield_expr_with_lowered_type conditions, rather than all scoped enum
conditions?


The impact is the same: for ordinary bit-fields and unscoped enum bit-fields,
cond will already have been promoted here, so e.g. "(int) a.b", and
is_bitfield_expr_with_lowered_type will return NULL_TREE.  And for scoped
enum bit-fields we will do the same thing as in v1.  I think the SCOPED_ENUM_P
check is cheaper than is_bitfield_expr_with_lowered_type but I'm fine with
either version.  Thanks,

Bootstrapped/regtested on x86_64-pc-linux-gnu, ok for trunk/10?


OK.


-- >8 --
In this testcase we are crashing trying to gimplify a switch, because
the types of the switch condition and case constants have different
TYPE_PRECISIONs.

This started with my r5-3726 fix: SWITCH_STMT_TYPE is supposed to be the
original type of the switch condition before any conversions, so in the
C++ FE we need to use unlowered_expr_type to get the unlowered type of
enum bit-fields.

Normally, the switch type is subject to integral promotions, but here
we have a scoped enum type and those don't promote:

   enum class B { A };
   struct C { B c : 8; };

   switch (x.c) // type B
 case B::A: // type int, will be converted to B

Here TREE_TYPE is "signed char" but SWITCH_STMT_TYPE is "B".  When
gimplifying this in gimplify_switch_expr, the index type is "B" and
we convert all the case values to "B" in preprocess_case_label_vec,
but SWITCH_COND is of type "signed char": gimple_switch_index should
be the (possibly promoted) type, not the original type, so we gimplify
the "x.c" SWITCH_COND to a SSA_NAME of type "signed char".  And then
we crash because the precision of the index type doesn't match the
precision of the case value type.

I think it makes sense to do the following; at the end of pop_switch
we've already issued the switch warnings, and since scoped enums don't
promote, it should be okay to use the type of SWITCH_STMT_COND.  The
r5-3726 change was about giving warnings for enum bit-fields anyway.

gcc/cp/C

Re: [PATCH] c-family: Fix hang with -Wsequence-point [PR98126]

2020-12-07 Thread Jason Merrill via Gcc-patches

On 12/4/20 10:40 PM, Marek Polacek wrote:

verify_sequence_points uses verify_tree to recursively walk the
subexpressions of an expression, and while recursing, it also
keeps lists of expressions found after/before a sequence point.
For a large expression, the list can grow significantly.  And
merge_tlist is at least O(n^2): for a list of length n it will
iterate n(n-1) times, and call candidate_equal_p each time, and
that can recurse further.  warn_for_collision also has to go
through the whole list.  With a large-enough expression, the
compilation can easily get stuck here for 24 hours.

This patch is a simple kludge: if we see that the expression is
overly complex, don't even try.

Bootstrapped/regtested on x86_64-pc-linux-gnu, ok for trunk?


OK.


gcc/c-family/ChangeLog:

PR c++/98126
* c-common.c (verify_tree_lim_r): New function.
(verify_sequence_points): Use it.  Use nullptr instead of 0.

gcc/testsuite/ChangeLog:

PR c++/98126
* g++.dg/warn/Wsequence-point-4.C: New test.
---
  gcc/c-family/c-common.c   | 32 +--
  gcc/testsuite/g++.dg/warn/Wsequence-point-4.C | 53 +++
  2 files changed, 80 insertions(+), 5 deletions(-)
  create mode 100644 gcc/testsuite/g++.dg/warn/Wsequence-point-4.C

diff --git a/gcc/c-family/c-common.c b/gcc/c-family/c-common.c
index dda23520b96..0b348aec77b 100644
--- a/gcc/c-family/c-common.c
+++ b/gcc/c-family/c-common.c
@@ -2056,23 +2056,45 @@ verify_tree (tree x, struct tlist **pbefore_sp, struct 
tlist **pno_sp,
  }
  }
  
+static constexpr size_t verify_sequence_points_limit = 1024;
+
+/* Called from verify_sequence_points via walk_tree.  */
+
+static tree
+verify_tree_lim_r (tree *tp, int *walk_subtrees, void *data)
+{
+  if (++*((size_t *) data) > verify_sequence_points_limit)
+return integer_zero_node;
+
+  if (TYPE_P (*tp))
+*walk_subtrees = 0;
+
+  return NULL_TREE;
+}
+
  /* Try to warn for undefined behavior in EXPR due to missing sequence
 points.  */
  
  void
  verify_sequence_points (tree expr)
  {
-  struct tlist *before_sp = 0, *after_sp = 0;
+  tlist *before_sp = nullptr, *after_sp = nullptr;
+
+  /* verify_tree is highly recursive, and merge_tlist is O(n^2),
+ so we return early if the expression is too big.  */
+  size_t n = 0;
+  if (walk_tree (&expr, verify_tree_lim_r, &n, nullptr))
+return;
  
-  warned_ids = 0;
-  save_expr_cache = 0;
-  if (tlist_firstobj == 0)
+  warned_ids = nullptr;
+  save_expr_cache = nullptr;
+  if (!tlist_firstobj)
  {
gcc_obstack_init (&tlist_obstack);
tlist_firstobj = (char *) obstack_alloc (&tlist_obstack, 0);
  }
  
-  verify_tree (expr, &before_sp, &after_sp, 0);
+  verify_tree (expr, &before_sp, &after_sp, NULL_TREE);
warn_for_collisions (after_sp);
obstack_free (&tlist_obstack, tlist_firstobj);
  }
diff --git a/gcc/testsuite/g++.dg/warn/Wsequence-point-4.C 
b/gcc/testsuite/g++.dg/warn/Wsequence-point-4.C
new file mode 100644
index 000..1382ab5a934
--- /dev/null
+++ b/gcc/testsuite/g++.dg/warn/Wsequence-point-4.C
@@ -0,0 +1,53 @@
+// PR c++/98126
+// { dg-do compile }
+// { dg-options "-Wsequence-point" }
+// Make sure we don't hang when verify_tree processes a large expression.
+
+struct T { bool operator==(const T &ot) const; };
+
+#define CMP(M, N, L) t[100 * M + 10 * N + L] == ot.t[100 * M + 10 * N + L] &&
+
+#define CMP1(M, N) \
+  CMP(M, N, 0) \
+  CMP(M, N, 1) \
+  CMP(M, N, 2) \
+  CMP(M, N, 3) \
+  CMP(M, N, 4) \
+  CMP(M, N, 5) \
+  CMP(M, N, 6) \
+  CMP(M, N, 7) \
+  CMP(M, N, 8) \
+  CMP(M, N, 9)
+
+#define CMP2(M) \
+  CMP1(M, 0) \
+  CMP1(M, 1) \
+  CMP1(M, 2) \
+  CMP1(M, 3) \
+  CMP1(M, 4) \
+  CMP1(M, 5) \
+  CMP1(M, 6) \
+  CMP1(M, 7) \
+  CMP1(M, 8) \
+  CMP1(M, 9)
+
+#define GENERATE_CMPS \
+  CMP2(0) \
+  CMP2(1) \
+  CMP2(2) \
+  CMP2(3) \
+  CMP2(4) \
+  CMP2(5) \
+  CMP2(6) \
+  CMP2(7) \
+  CMP2(8) \
+  CMP2(9)
+
+struct C {
+  bool operator==(const C &ot) const {
+return
+  GENERATE_CMPS
+  true;
+  }
+  T t[999];
+};

base-commit: df933e307b1950ce12472660dcac1765b8eb431d





[committed][wwwdocs] gcc-11/changes: "used" attribute saves decls from linker garbage collection

2020-12-07 Thread Jozef Lawrynowicz
Since 6fbec038f7a, the "used" attribute will save the symbol declaration
it is applied to from linker garbage collection, if the target supports
the SHF_GNU_RETAIN section flag.

This patch documents this functionality in the GCC 11 changes document.
Some users might need to amend their source code to handle the fact that
GCC can now place "used" decls in a new, unique section.

Committed as obvious.
From 7c56c86ebe102849ec239f0c57e93988169d90f4 Mon Sep 17 00:00:00 2001
From: Jozef Lawrynowicz 
Date: Mon, 7 Dec 2020 14:41:13 +
Subject: [PATCH] gcc-11/changes: "used" attribute saves decls from linker
 garbage collection

---
 htdocs/gcc-11/changes.html | 12 
 1 file changed, 12 insertions(+)

diff --git a/htdocs/gcc-11/changes.html b/htdocs/gcc-11/changes.html
index ed289744..4d3efed5 100644
--- a/htdocs/gcc-11/changes.html
+++ b/htdocs/gcc-11/changes.html
@@ -158,6 +158,18 @@ a work-in-progress.
   
 
   
+  
+For ELF targets that support the GNU or FreeBSD OSABIs, the
+used attribute will now save the symbol declaration it is
+applied to from linker garbage collection.
+
+To support this behavior, used symbols that have not
+been placed in specific sections (e.g. with the section
+attribute, or the -f{function,data}-sections options) will
+be placed in new, unique sections.
+
+This functionality requires Binutils version 2.36 or later.
+  
 
 
 C++
-- 
2.29.2



Re: [PATCH] Avoid atomic for guard acquire when that is expensive

2020-12-07 Thread Jason Merrill via Gcc-patches

On 12/5/20 7:37 AM, Bernd Edlinger wrote:

On 12/2/20 7:57 PM, Jason Merrill wrote:

On 12/1/20 1:28 PM, Bernd Edlinger wrote:

On 11/24/20 11:10 PM, Jason Merrill wrote:

On 11/22/20 3:05 AM, Bernd Edlinger wrote:

Hi,

this avoids the need to use -fno-threadsafe-statics on
arm-none-eabi or working around that problem by supplying
a dummy __sync_synchronize function which might
just lead to silent code failure of the worst kind
(non-reproducable, racy) at runtime, as was pointed out
on previous discussions here.

When the atomic access involves a call to __sync_synchronize
it is better to call __cxa_guard_acquire unconditionally,
since it handles the atomics too, or is a non-threaded
implementation when there is no gthread support for this target.

This fixes also a bug for the ARM EABI big-endian target,
that is, previously the wrong bit was checked.


Instead of a new target macro, can't you follow 
fold_builtin_atomic_always_lock_free/can_atomic_load_p?


Yes, thanks, that should work too.
Would you like this better?



+is_atomic_expensive_p (machine_mode mode)
+{
+  if (!flag_inline_atomics)
+    return false;


Why not true?


Ooops...
Yes, I ought to return true here.
I must have made a mistake when I tested the last version of this patch,
sorry for the confusion.


+  if (!can_compare_and_swap_p (mode, false) || !can_atomic_load_p (mode))
+    return false;


This also seems backwards; I'd think we want to return false if either of those 
tests are true.  Or maybe just can_atomic_load_p, and not bother about 
compare-and-swap.


Yes, you are right.
Unfortuately can_atomic_load_p is too weak, since it does not cover
the memory barrier.

And can_compare_and_swap_p (..., false) is actually a bit too strong,
but if it returns true, we should be able to use any atomic without
need for a library call.



+  tree type = targetm.cxx.guard_mask_bit ()
+  ? TREE_TYPE (guard) : char_type_node;
+
+  if (is_atomic_expensive_p (TYPE_MODE (type)))
+    guard = integer_zero_node;
+  else
+    guard = build_atomic_load_type (guard, MEMMODEL_ACQUIRE, type);


It should still work to load a single byte, it just needs to be the 
least-significant byte.


I still don't think you need to load the whole word to check the guard bit.


And this isn't an EABI issue; it looks like the non-EABI code is also broken 
for big-endian targets, both the atomic load and the normal load in 
get_guard_bits.




I think the non-EABI code is always using bit 0 in the first byte,
by using the endian-neutral #define _GLIBCXX_GUARD_BIT __guard_test_bit (0, 1).


Except that set_guard sets the least-significant bit on all targets.


Only ARM EABI uses bit 0 in byte 3 if big-endian and bit 0 in byte 0 otherwise.

For all other targets when _GLIBCXX_USE_FUTEX is defined,
__cxa_guard_XXX accesses the value as int* while the memory
is a 64-bit long, so I could imagine that is an aliasing violation.


But nothing that needs to be fixed immediately.


Agreed.


Attached is the corrected patch.

Tested again on arm-none-eabi with arm-sim.
Is it OK for trunk?

Thanks
Bernd.


Jason





Re: [PATCH] i386: Add combine splitters to allow combining multiple insns into reg1 = const; reg2 = rotate (reg1, reg3 & cst) [PR96226]

2020-12-07 Thread Segher Boessenkool
Hi!

On Mon, Dec 07, 2020 at 03:27:14PM +0100, Uros Bizjak wrote:
> On Fri, Dec 4, 2020 at 7:26 PM Segher Boessenkool
>  wrote:
> > A splitter can *already* split to only one insn.
> 
> Unfortunately, the attached patch with the following testcase:
> 
> --cut here--
> int test (int a, int b)
> {
>  return a << (b & 31);
> }
> --cut here--
> 
> does not trigger the call to combine_split_insns. The reason in the
> following condition:
> 
>   /* If we were combining three insns and the result is a simple SET
>  with no ASM_OPERANDS that wasn't recognized, try to split it into two
>  insns.  There are two ways to do this.  It can be split using a
>  machine-specific method (like when you have an addition of a large
>  constant) or by combine in the function find_split_point.  */
> 
>   if (i1 && insn_code_number < 0 && GET_CODE (newpat) == SET
>   && asm_noperands (newpat) < 0)
> 
> where i1 is null.

I see.  We could also allow this for 2->2 combinations (w/ the same
restrictions we do for other 2->2 combinations, but that probably falls
out automatically).

This will need careful testing of course, but I don't expect anything
bad (famous last words of course).

Could you please open a PR for this?  Just so I can keep track :-)


Segher


[COMMITTED] Fix location info in ipa_param_body_adjustments::modify_call_stmt

2020-12-07 Thread Bernd Edlinger
On 12/7/20 11:50 AM, Richard Biener wrote:
> 
> The ipa-param-manipulation.c hunk is OK, please commit separately.
> 

Okay, this is the part I just committed.

commit 92e563d91b012f09da8fd152e934f6b964ae49cb
Author: Bernd Edlinger 
Date:   Mon Dec 7 16:00:00 2020 +0100

Fix location info in ipa_param_body_adjustments::modify_call_stmt

Copy the location info from the passed in call stmt
to the newly built gimple call stmt.

2020-12-07  Bernd Edlinger  

* ipa-param-manipulation.c
(ipa_param_body_adjustments::modify_call_stmt): Set location info.

diff --git a/gcc/ipa-param-manipulation.c b/gcc/ipa-param-manipulation.c
index 2bbea21..9ab4a10 100644
--- a/gcc/ipa-param-manipulation.c
+++ b/gcc/ipa-param-manipulation.c
@@ -1681,6 +1681,8 @@ ipa_param_body_adjustments::modify_call_stmt (gcall 
**stmt_p)
}
}
   gcall *new_stmt = gimple_build_call_vec (gimple_call_fn (stmt), vargs);
+  if (gimple_has_location (stmt))
+   gimple_set_location (new_stmt, gimple_location (stmt));
   gimple_call_set_chain (new_stmt, gimple_call_chain (stmt));
   gimple_call_copy_flags (new_stmt, stmt);
   if (tree lhs = gimple_call_lhs (stmt))



Re: [PATCH] i386: Add combine splitters to allow combining multiple insns into reg1 = const; reg2 = rotate (reg1, reg3 & cst) [PR96226]

2020-12-07 Thread Jakub Jelinek via Gcc-patches
On Mon, Dec 07, 2020 at 09:28:19AM -0600, Segher Boessenkool wrote:
> I see.  We could also allow this for 2->2 combinations (w/ the same
> restrictions we do for other 2->2 combinations, but that probably falls
> out automatically).

In this case we actually want a 2->1 combination, where the result isn't
recognized, but the splitter splits the 1 insn into another 1.
Could be limited just to that, but yes, even 2->1 combination where
the 1 is split into 2 insns and costs say it is ok (and other restrictions
satisfied) should be ok too.

Jakub



Re: [PATCH] i386: Add combine splitters to allow combining multiple insns into reg1 = const; reg2 = rotate (reg1, reg3 & cst) [PR96226]

2020-12-07 Thread Segher Boessenkool
On Mon, Dec 07, 2020 at 04:40:39PM +0100, Jakub Jelinek wrote:
> On Mon, Dec 07, 2020 at 09:28:19AM -0600, Segher Boessenkool wrote:
> > I see.  We could also allow this for 2->2 combinations (w/ the same
> > restrictions we do for other 2->2 combinations, but that probably falls
> > out automatically).
> 
> In this case we actually want a 2->1 combination, where the result isn't
> recognized, but the splitter splits the 1 insn into another 1.
> Could be limited just to that, but yes, even 2->1 combination where
> the 1 is split into 2 insns and costs say it is ok (and other restrictions
> satisfied) should be ok too.

Yes, that needs exactly the same code (allow a split if we started with
just two insns), and it all falls out naturally :-)


Segher


Re: [PATCH] i386: Add combine splitters to allow combining multiple insns into reg1 = const; reg2 = rotate (reg1, reg3 & cst) [PR96226]

2020-12-07 Thread Uros Bizjak via Gcc-patches
On Mon, Dec 7, 2020 at 4:30 PM Segher Boessenkool
 wrote:
>
> Hi!
>
> On Mon, Dec 07, 2020 at 03:27:14PM +0100, Uros Bizjak wrote:
> > On Fri, Dec 4, 2020 at 7:26 PM Segher Boessenkool
> >  wrote:
> > > A splitter can *already* split to only one insn.
> >
> > Unfortunately, the attached patch with the following testcase:
> >
> > --cut here--
> > int test (int a, int b)
> > {
> >  return a << (b & 31);
> > }
> > --cut here--
> >
> > does not trigger the call to combine_split_insns. The reason in the
> > following condition:
> >
> >   /* If we were combining three insns and the result is a simple SET
> >  with no ASM_OPERANDS that wasn't recognized, try to split it into two
> >  insns.  There are two ways to do this.  It can be split using a
> >  machine-specific method (like when you have an addition of a large
> >  constant) or by combine in the function find_split_point.  */
> >
> >   if (i1 && insn_code_number < 0 && GET_CODE (newpat) == SET
> >   && asm_noperands (newpat) < 0)
> >
> > where i1 is null.
>
> I see.  We could also allow this for 2->2 combinations (w/ the same
> restrictions we do for other 2->2 combinations, but that probably falls
> out automatically).
>
> This will need careful testing of course, but I don't expect anything
> bad (famous last words of course).
>
> Could you please open a PR for this?  Just so I can keep track :-)

I have opened PR98178.

Thanks,
Uros.


Re: [PATCH] Avoid atomic for guard acquire when that is expensive

2020-12-07 Thread Bernd Edlinger
On 12/7/20 4:04 PM, Jason Merrill wrote:
> On 12/5/20 7:37 AM, Bernd Edlinger wrote:
>> On 12/2/20 7:57 PM, Jason Merrill wrote:
>>> On 12/1/20 1:28 PM, Bernd Edlinger wrote:
 +  tree type = targetm.cxx.guard_mask_bit ()
 +  ? TREE_TYPE (guard) : char_type_node;
 +
 +  if (is_atomic_expensive_p (TYPE_MODE (type)))
 +    guard = integer_zero_node;
 +  else
 +    guard = build_atomic_load_type (guard, MEMMODEL_ACQUIRE, type);
>>>
>>> It should still work to load a single byte, it just needs to be the 
>>> least-significant byte.
> 
> I still don't think you need to load the whole word to check the guard bit.
> 

Of course that is also possible.  But I would not expect an
address offset and a byte access to be cheaper than a word access.

I just saw that get_guard_bits does the same thing:
access the first byte if !targetm.cxx.guard_mask_bit ()
and access the whole word otherwise, which is only
there for ARM EABI.
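
For readers following along, the Itanium C++ ABI sequence being
discussed is, schematically (details vary per target):

--cut here--
static __guard g;               /* 64-bit guard (32-bit on ARM EABI) */
if (!(guard_bits (g) & 1))      /* fast path: is the init bit set?   */
  if (__cxa_guard_acquire (&g)) /* slow path: runtime does atomics   */
    {
      /* ... run the constructor ... */
      __cxa_guard_release (&g); /* sets the init bit */
    }
--cut here--

guard_bits stands in here for whatever get_guard_bits extracts (first
byte or whole word, as above); the open question is whether that
fast-path load must be atomic, and what that costs on a given target.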

>> And this isn't an EABI issue; it looks like the non-EABI code is also broken 
>> for big-endian targets, both the atomic load and the normal load in 
>> get_guard_bits.
>>>
>>
>> I think the non-EABI code is always using bit 0 in the first byte,
>> by using the endian-neutral #define _GLIBCXX_GUARD_BIT __guard_test_bit (0, 
>> 1).
> 
> Except that set_guard sets the least-significant bit on all targets.
> 

Hmm, as I said, get_guard_bits gets the guard as the first byte if
!targetm.cxx.guard_mask_bit (),
and as a word otherwise.  Of course it could get the third byte,
if !targetm.cxx.guard_mask_bit () && BYTES_BIG_ENDIAN, but it would be more 
complicated
this way, wouldn't it?


Bernd.

>> Only ARM EABI uses bit 0 in byte 3 if big-endian and bit 0 in byte 0 
>> otherwise.
>>
>> For all other targets when _GLIBCXX_USE_FUTEX is defined,
>> __cxa_guard_XXX accesses the value as int* while the memory
>> is a 64-bit long, so I could imagine that is an aliasing violation.
>>
>>
>> But nothing that needs to be fixed immediately.
> 
> Agreed.
> 
>> Attached is the corrected patch.
>>
>> Tested again on arm-none-eabi with arm-sim.
>> Is it OK for trunk?
>>
>> Thanks
>> Bernd.
>>
>>> Jason
>>>
> 


Re: [committed] Fix non-unique testnames

2020-12-07 Thread Martin Sebor via Gcc-patches

On 12/4/20 3:17 PM, Jeff Law via Gcc-patches wrote:



On 12/4/20 2:55 PM, Mike Stump wrote:

On Nov 30, 2020, at 8:00 AM, Jeff Law via Gcc-patches  
wrote:

This patch fixes a handful of tests with non-unique names which confuse
the living hell out of compare_tests, particularly if one of two tests
[x]fail while the other is [x]pass which compare_tests will flag as a
regression each and every run.

Thanks.  The other way to fix the issue is to fix the tools so that they never 
fail.  :-)

Yes, but either way tests should be unique.


I know I introduced some (or even most) of these and I still have
to change the strings and make them unique.  What bothers me is
that it's yet another unnecessary hoop for us to jump through.
It's not documented anywhere that these things need to be unique,
there's no tooling to detect when they're not, and it gets missed
in code reviews.  Why not instead change the test harness to do
this for us?

Martin


Re: How to traverse all the local variables that declared in the current routine?

2020-12-07 Thread Qing Zhao via Gcc-patches



> On Dec 7, 2020, at 1:12 AM, Richard Biener  wrote:
> 
> On Fri, Dec 4, 2020 at 5:19 PM Qing Zhao  > wrote:
>> 
>> 
>> 
>> On Dec 4, 2020, at 2:50 AM, Richard Biener  
>> wrote:
>> 
>> On Thu, Dec 3, 2020 at 6:33 PM Richard Sandiford
>>  wrote:
>> 
>> 
>> Richard Biener via Gcc-patches  writes:
>> 
>> On Tue, Nov 24, 2020 at 4:47 PM Qing Zhao  wrote:
>> 
>> Another issue is, in order to check whether an auto-variable has 
>> initializer, I plan to add a new bit in “decl_common” as:
>> /* In a VAR_DECL, this is DECL_IS_INITIALIZED.  */
>> unsigned decl_is_initialized :1;
>> 
>> /* IN VAR_DECL, set when the decl is initialized at the declaration.  */
>> #define DECL_IS_INITIALIZED(NODE) \
>> (DECL_COMMON_CHECK (NODE)->decl_common.decl_is_initialized)
>> 
>> set this bit when setting DECL_INITIAL for the variables in FE. then keep it
>> even though DECL_INITIAL might be NULLed.
>> 
>> 
>> For locals it would be more reliable to set this flag during gimplification.
>> 
>> Do you have any comment and suggestions?
>> 
>> 
>> As said above - do you want to cover registers as well as locals?  I'd do
>> the actual zeroing during RTL expansion instead since otherwise you
>> have to figure yourself whether a local is actually used (see 
>> expand_stack_vars)
>> 
>> Note that optimization will already have made use of "uninitialized" state
>> of locals so depending on what the actual goal is here "late" may be too 
>> late.
>> 
>> 
>> Haven't thought about this much, so it might be a daft idea, but would a
>> compromise be to use a const internal function:
>> 
>> X1 = .DEFERRED_INIT (X0, INIT)
>> 
>> where the X0 argument is an uninitialised value and the INIT argument
>> describes the initialisation pattern?  So for a decl we'd have:
>> 
>> X = .DEFERRED_INIT (X, INIT)
>> 
>> and for an SSA name we'd have:
>> 
>> X_2 = .DEFERRED_INIT (X_1(D), INIT)
>> 
>> with all other uses of X_1(D) being replaced by X_2.  The idea is that:
>> 
>> * Having the X0 argument would keep the uninitialised use of the
>> variable around for the later warning passes.
>> 
>> * Using a const function should still allow the UB to be deleted as dead
>> if X1 isn't needed.
>> 
>> * Having a function in the way should stop passes from taking advantage
>> of direct uninitialised uses for optimisation.
>> 
>> This means we won't be able to optimise based on the actual init
>> value at the gimple level, but that seems like a fair trade-off.
>> AIUI this is really a security feature or anti-UB hardening feature
>> (in the sense that users are more likely to see predictable behaviour
>> “in the field” even if the program has UB).
>> 
>> 
>> The question is whether it's in line with people's expectations that
>> explicitly zero-initialized code behaves differently from
>> implicitly zero-initialized code with respect to optimization
>> and secondary side-effects (late diagnostics, latent bugs, etc.).
>> 
>> Introducing a new concept like .DEFERRED_INIT is much more
>> heavy-weight than an explicit zero initializer.
>> 
>> 
>> What exactly do you mean by “heavy-weight”? More difficult to implement or much 
>> more run-time overhead or both? Or something else?
>> 
>> The major benefit of the approach of “.DEFERRED_INIT” is to enable us to keep 
>> the current -Wuninitialized analysis untouched and also pass
>> the “uninitialized” info from source code level to “pass_expand”.
> 
> Well, "untouched" is a bit oversimplified.  You do need to handle
> .DEFERRED_INIT as not
> being an initialization which will definitely get interesting.

Yes, during uninitialized variable analysis pass, we should specially handle 
the defs with “.DEFERRED_INIT”, to treat them as uninitializations.

>> If we want to keep the current -Wuninitialized analysis untouched, this is a 
>> quite reasonable approach.
>> 
>> However, if it’s not required to keep the current -Wuninitialized analysis 
>> untouched, adding zero-initializer directly during gimplification should
>> be much easier and simpler, and also smaller run-time overhead.
>> 
>> 
>> As for optimization I fear you'll get a load of redundant zero-init
>> actually emitted if you can just rely on RTL DSE/DCE to remove it.
>> 
>> 
>> Runtime overhead for -fauto-init=zero is one important consideration for the 
>> whole feature, we should minimize the runtime overhead for zero
>> Initialization since it will be used in production build.
>> We can do some run-time performance evaluation when we have an 
>> implementation ready.
> 
> Note there will be other passes "confused" by .DEFERRED_INIT.  Note
> that there's going to be other
> considerations - namely where to emit the .DEFERRED_INIT - when
> emitting it during gimplification
> you can emit it at the start of the block of block-scope variables.
> When emitting after gimplification
> you have to emit at function start which will probably make stack slot
> sharing inefficient because
> the deferred init will cause overlapping lifetimes.  With emi

c++: check alias match for specializations [PR98116]

2020-12-07 Thread Nathan Sidwell

This fixes the underlying problem my recent (backed-out) changes to
array type creation uncovered.  We had paths through
structural_comptypes that ignored alias templates, even when
significant.  This adds the necessary checks.
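
The classic case where a dependent alias specialization must stay
distinct from its underlying type is void_t-style detection (an
illustration only, not the PR testcase):

--cut here--
template<typename...> using void_t = void;

template<typename T, typename = void>
struct has_type { static const bool value = false; };

template<typename T>
struct has_type<T, void_t<typename T::type>>
{ static const bool value = true; };
--cut here--

If type comparison collapsed void_t<typename T::type> into plain void,
the partial specialization would lose its SFINAE on T::type.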

PR c++/98116
gcc/cp/
* typeck.c (structural_comptypes): Move early outs to comptype.
Always check template-alias match when comparing_specializations.
(comptypes): Do early out checking here.
gcc/testsuite/
* g++.dg/template/pr98116.C: Remove dg-ice.
* g++.dg/template/pr98116-2.C: New.

pushing to trunk
--
Nathan Sidwell
diff --git c/gcc/cp/typeck.c w/gcc/cp/typeck.c
index 267b284ea40..4d499af5ccb 100644
--- c/gcc/cp/typeck.c
+++ w/gcc/cp/typeck.c
@@ -1247,14 +1247,8 @@ cxx_safe_function_type_cast_p (tree t1, tree t2)
 static bool
 structural_comptypes (tree t1, tree t2, int strict)
 {
-  if (t1 == t2)
-return true;
-
-  /* Suppress errors caused by previously reported errors.  */
-  if (t1 == error_mark_node || t2 == error_mark_node)
-return false;
-
-  gcc_assert (TYPE_P (t1) && TYPE_P (t2));
+  /* Both should be types that are not obviously the same.  */
+  gcc_checking_assert (t1 != t2 && TYPE_P (t1) && TYPE_P (t2));
 
   if (!comparing_specializations)
 {
@@ -1300,13 +1294,13 @@ structural_comptypes (tree t1, tree t2, int strict)
   /* Allow for two different type nodes which have essentially the same
  definition.  Note that we already checked for equality of the type
  qualifiers (just above).  */
-
   if (TREE_CODE (t1) != ARRAY_TYPE
   && TYPE_MAIN_VARIANT (t1) == TYPE_MAIN_VARIANT (t2))
-return true;
-
+goto check_alias;
 
-  /* Compare the types.  Break out if they could be the same.  */
+  /* Compare the types.  Return false on known not-same. Break on not
+ known.   Never return true from this switch -- you'll break
+ specialization comparison.*/
   switch (TREE_CODE (t1))
 {
 case VOID_TYPE:
@@ -1332,7 +1326,11 @@ structural_comptypes (tree t1, tree t2, int strict)
 	 have identical properties, different TYPE_MAIN_VARIANTs, but
 	 represent the same type.  The canonical type system keeps
 	 track of equivalence in this case, so we fall back on it.  */
-  return TYPE_CANONICAL (t1) == TYPE_CANONICAL (t2);
+  if (TYPE_CANONICAL (t1) != TYPE_CANONICAL (t2))
+	return false;
+
+  /* We don't need or want the attribute comparison.  */
+  goto check_alias;
 
 case TEMPLATE_TEMPLATE_PARM:
 case BOUND_TEMPLATE_TEMPLATE_PARM:
@@ -1477,24 +1475,28 @@ structural_comptypes (tree t1, tree t2, int strict)
   return false;
 }
 
-  /* Don't treat an alias template specialization with dependent
- arguments as equivalent to its underlying type when used as a
- template argument; we need them to be distinct so that we
- substitute into the specialization arguments at instantiation
- time.  And aliases can't be equivalent without being ==, so
- we don't need to look any deeper.  */
+  /* If we get here, we know that from a target independent POV the
+ types are the same.  Make sure the target attributes are also
+ the same.  */
+  if (!comp_type_attributes (t1, t2))
+return false;
+
+ check_alias:
   if (comparing_specializations)
 {
+  /* Don't treat an alias template specialization with dependent
+	 arguments as equivalent to its underlying type when used as a
+	 template argument; we need them to be distinct so that we
+	 substitute into the specialization arguments at instantiation
+	 time.  And aliases can't be equivalent without being ==, so
+	 we don't need to look any deeper.  */
   tree dep1 = dependent_alias_template_spec_p (t1, nt_transparent);
   tree dep2 = dependent_alias_template_spec_p (t2, nt_transparent);
   if ((dep1 || dep2) && dep1 != dep2)
 	return false;
 }
 
-  /* If we get here, we know that from a target independent POV the
- types are the same.  Make sure the target attributes are also
- the same.  */
-  return comp_type_attributes (t1, t2);
+  return true;
 }
 
 /* Return true if T1 and T2 are related as allowed by STRICT.  STRICT
@@ -1509,6 +1511,13 @@ comptypes (tree t1, tree t2, int strict)
   gcc_checking_assert (TREE_CODE (t1) != TYPE_ARGUMENT_PACK
 		   && TREE_CODE (t2) != TYPE_ARGUMENT_PACK);
 
+  if (t1 == t2)
+return true;
+
+  /* Suppress errors caused by previously reported errors.  */
+  if (t1 == error_mark_node || t2 == error_mark_node)
+return false;
+
   if (strict == COMPARE_STRICT && comparing_specializations
   && (t1 != TYPE_CANONICAL (t1) || t2 != TYPE_CANONICAL (t2)))
 /* If comparing_specializations, treat dependent aliases as distinct.  */
@@ -1516,12 +1525,6 @@ comptypes (tree t1, tree t2, int strict)
 
   if (strict == COMPARE_STRICT)
 {
-  if (t1 == t2)
-	return true;
-
-  if (t1 == error_mark_node || t2 == error_mark_node)
-	return false;
-
   if (TYPE_STRUCTURAL_EQUALITY_P (t1) ||

c++: Adjust array type construction

2020-12-07 Thread Nathan Sidwell

This restores the dependent array changes I reverted, now that pr98116
appears fixed.  As mentioned before, when deserializing a module we
need to construct arrays without using the dependent-type predicates
themselves.
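
(Usage sketch, call site illustrative rather than from this patch: a
caller that already knows the answer passes it in, e.g.
build_cplus_array_type (elt, domain, /*is_dep=*/true) from the module
streaming code, while existing callers keep the defaulted -1 and the
function computes dependency via uses_template_parms as before.)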

gcc/cp/
* cp-tree.h (build_cplus_array_type): Add defaulted DEP parm.
* tree.c (set_array_type_common): Add DEP parm.
(build_cplus_array_type): Add DEP parm, determine dependency if
needed.  Mark dependency of new types.
(cp_build_qualified_type_real): Adjust array-building call, assert
no surprising dependency.
(strip_typedefs): Likewise.

pushing to trunk

--
Nathan Sidwell
diff --git i/gcc/cp/cp-tree.h w/gcc/cp/cp-tree.h
index 00901fe42d4..b72069eecda 100644
--- i/gcc/cp/cp-tree.h
+++ w/gcc/cp/cp-tree.h
@@ -7559,7 +7559,7 @@ extern bool is_local_temp			(tree);
 extern tree build_aggr_init_expr		(tree, tree);
 extern tree get_target_expr			(tree);
 extern tree get_target_expr_sfinae		(tree, tsubst_flags_t);
-extern tree build_cplus_array_type		(tree, tree);
+extern tree build_cplus_array_type		(tree, tree, int is_dep = -1);
 extern tree build_array_of_n_type		(tree, int);
 extern bool array_of_runtime_bound_p		(tree);
 extern bool vla_type_p(tree);
diff --git i/gcc/cp/tree.c w/gcc/cp/tree.c
index 4e6bf9abba6..d9fa505041f 100644
--- i/gcc/cp/tree.c
+++ w/gcc/cp/tree.c
@@ -998,7 +998,7 @@ build_min_array_type (tree elt_type, tree index_type)
build_cplus_array_type.  */
 
 static void
-set_array_type_canon (tree t, tree elt_type, tree index_type)
+set_array_type_canon (tree t, tree elt_type, tree index_type, bool dep)
 {
   /* Set the canonical type for this new node.  */
   if (TYPE_STRUCTURAL_EQUALITY_P (elt_type)
@@ -1009,30 +1009,33 @@ set_array_type_canon (tree t, tree elt_type, tree index_type)
 TYPE_CANONICAL (t)
   = build_cplus_array_type (TYPE_CANONICAL (elt_type),
 index_type
-? TYPE_CANONICAL (index_type) : index_type);
+? TYPE_CANONICAL (index_type) : index_type,
+dep);
   else
 TYPE_CANONICAL (t) = t;
 }
 
 /* Like build_array_type, but handle special C++ semantics: an array of a
variant element type is a variant of the array of the main variant of
-   the element type.  */
+   the element type.  IS_DEPENDENT is -ve if we should determine the
+   dependency.  Otherwise its bool value indicates dependency.  */
 
 tree
-build_cplus_array_type (tree elt_type, tree index_type)
+build_cplus_array_type (tree elt_type, tree index_type, int dependent)
 {
   tree t;
 
   if (elt_type == error_mark_node || index_type == error_mark_node)
 return error_mark_node;
 
-  bool dependent = (uses_template_parms (elt_type)
-		|| (index_type && uses_template_parms (index_type)));
+  if (dependent < 0)
+dependent = (uses_template_parms (elt_type)
+		 || (index_type && uses_template_parms (index_type)));
 
   if (elt_type != TYPE_MAIN_VARIANT (elt_type))
 /* Start with an array of the TYPE_MAIN_VARIANT.  */
 t = build_cplus_array_type (TYPE_MAIN_VARIANT (elt_type),
-index_type);
+index_type, dependent);
   else if (dependent)
 {
   /* Since type_hash_canon calls layout_type, we need to use our own
@@ -1062,13 +1065,20 @@ build_cplus_array_type (tree elt_type, tree index_type)
 	  *e = t;
 
 	  /* Set the canonical type for this new node.  */
-	  set_array_type_canon (t, elt_type, index_type);
+	  set_array_type_canon (t, elt_type, index_type, dependent);
+
+	  /* Mark it as dependent now, this saves time later.  */
+	  TYPE_DEPENDENT_P_VALID (t) = true;
+	  TYPE_DEPENDENT_P (t) = true;
 	}
 }
   else
 {
   bool typeless_storage = is_byte_access_type (elt_type);
   t = build_array_type (elt_type, index_type, typeless_storage);
+
+  /* Mark as non-dependent now, this will save time later.  */
+  TYPE_DEPENDENT_P_VALID (t) = true;
 }
 
   /* Now check whether we already have this array variant.  */
@@ -1083,7 +1093,10 @@ build_cplus_array_type (tree elt_type, tree index_type)
   if (!t)
 	{
 	  t = build_min_array_type (elt_type, index_type);
-	  set_array_type_canon (t, elt_type, index_type);
+	  /* Mark dependency now, this saves time later.  */
+	  TYPE_DEPENDENT_P_VALID (t) = true;
+	  TYPE_DEPENDENT_P (t) = dependent;
+	  set_array_type_canon (t, elt_type, index_type, dependent);
 	  if (!dependent)
 	{
 	  layout_type (t);
@@ -1319,7 +1332,10 @@ cp_build_qualified_type_real (tree type,
 
   if (!t)
 	{
-	  t = build_cplus_array_type (element_type, TYPE_DOMAIN (type));
+	  gcc_checking_assert (TYPE_DEPENDENT_P_VALID (type)
+			   || !dependent_type_p (type));
+	  t = build_cplus_array_type (element_type, TYPE_DOMAIN (type),
+  TYPE_DEPENDENT_P (type));
 
 	  /* Keep the typedef name.  */
 	  if (TYPE_NAME (t) != TYPE_NAME (type))
@@ -1555,7 +1571,9 @@ strip_typedefs (tree t, bool *remove_attributes, unsigned int flags)
 case ARRAY_TYPE:
   type = strip_typedefs (TREE_TYPE (t), remove_attributes

Re: [PATCH] Overflow-trapping integer arithmetic routines

2020-12-07 Thread Stefan Kanthak
Jeff Law wrote Wednesday, November 25, 2020 7:11 PM:

> On 11/25/20 6:18 AM, Stefan Kanthak wrote:
>> Jeff Law  wrote:

[...]

>>> My inclination is to leave the overflow checking double-word multiplier
>>> as-is.
>> See but  ff.
> Already read and considered it. 
>>
>>> Though I guess you could keep the same structure as the existing
>>> implementation which tries to avoid unnecessary multiplies and still use
>>> the __builtin_{add,mul}_overflow to simplify the code a bit less
>>> aggressively.

Only relying on GCC to generate efficient code for __builtin_mul_overflow,
this should be either

DWtype
__mulvDI3 (DWtype u, DWtype v)
{
  DWtype w;
  if (__builtin_mul_overflow (u, v, &w))
    abort ();
  return w;
}

or

DWtype
__mulvDI3 (DWtype u, DWtype v)
{
  UDWtype w, s = 0 - (u < 0), t = 0 - (v < 0);
  if (__builtin_mul_overflow ((u ^ s) - s, (v ^ t) - t, &w))
    abort ();
  s ^= t;
  w = (w ^ s) - s;
  if ((DWtype) (w ^ s) < 0)
    abort ();
  return w;
}

if the latter produces better machine code (which it does at least for
i386 and AMD64).
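
(The second variant works because s = 0 - (u < 0) is an all-ones mask
exactly when u is negative, and (u ^ s) - s is the usual branch-free
conditional negation: for u = -5, s is -1, u ^ s is 4, and 4 - (-1)
gives 5.  It multiplies the magnitudes unsigned, restores the sign via
the combined mask s ^= t, and the final (DWtype) (w ^ s) < 0 test
rejects results whose magnitude does not fit the signed type.)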

> Instead keep the tests that detect the special cases that don't need as
> many multiplies and use the overflow builtins within that implementation
> framework. In cases where we can use the operands directly, that's
> helpful as going through the struct/union likely leads to unnecessary
> register shuffling.

regards
Stefan


Re: How to traverse all the local variables that declared in the current routine?

2020-12-07 Thread Richard Sandiford via Gcc-patches
Qing Zhao via Gcc-patches  writes:
>> On Dec 7, 2020, at 1:12 AM, Richard Biener  
>> wrote:
>> 
>> On Fri, Dec 4, 2020 at 5:19 PM Qing Zhao > > wrote:
>>> 
>>> 
>>> 
>>> On Dec 4, 2020, at 2:50 AM, Richard Biener  
>>> wrote:
>>> 
>>> On Thu, Dec 3, 2020 at 6:33 PM Richard Sandiford
>>>  wrote:
>>> 
>>> 
>>> Richard Biener via Gcc-patches  writes:
>>> 
>>> On Tue, Nov 24, 2020 at 4:47 PM Qing Zhao  wrote:
>>> 
>>> Another issue is, in order to check whether an auto-variable has 
>>> initializer, I plan to add a new bit in “decl_common” as:
>>> /* In a VAR_DECL, this is DECL_IS_INITIALIZED.  */
>>> unsigned decl_is_initialized :1;
>>> 
>>> /* IN VAR_DECL, set when the decl is initialized at the declaration.  */
>>> #define DECL_IS_INITIALIZED(NODE) \
>>> (DECL_COMMON_CHECK (NODE)->decl_common.decl_is_initialized)
>>> 
>>> set this bit when setting DECL_INITIAL for the variables in FE. then keep it
>>> even though DECL_INITIAL might be NULLed.
>>> 
>>> 
>>> For locals it would be more reliable to set this flag during gimplification.
>>> 
>>> Do you have any comment and suggestions?
>>> 
>>> 
>>> As said above - do you want to cover registers as well as locals?  I'd do
>>> the actual zeroing during RTL expansion instead since otherwise you
>>> have to figure yourself whether a local is actually used (see 
>>> expand_stack_vars)
>>> 
>>> Note that optimization will already have made use of "uninitialized" state
>>> of locals so depending on what the actual goal is here "late" may be too 
>>> late.
>>> 
>>> 
>>> Haven't thought about this much, so it might be a daft idea, but would a
>>> compromise be to use a const internal function:
>>> 
>>> X1 = .DEFERRED_INIT (X0, INIT)
>>> 
>>> where the X0 argument is an uninitialised value and the INIT argument
>>> describes the initialisation pattern?  So for a decl we'd have:
>>> 
>>> X = .DEFERRED_INIT (X, INIT)
>>> 
>>> and for an SSA name we'd have:
>>> 
>>> X_2 = .DEFERRED_INIT (X_1(D), INIT)
>>> 
>>> with all other uses of X_1(D) being replaced by X_2.  The idea is that:
>>> 
>>> * Having the X0 argument would keep the uninitialised use of the
>>> variable around for the later warning passes.
>>> 
>>> * Using a const function should still allow the UB to be deleted as dead
>>> if X1 isn't needed.
>>> 
>>> * Having a function in the way should stop passes from taking advantage
>>> of direct uninitialised uses for optimisation.
>>> 
>>> This means we won't be able to optimise based on the actual init
>>> value at the gimple level, but that seems like a fair trade-off.
>>> AIUI this is really a security feature or anti-UB hardening feature
>>> (in the sense that users are more likely to see predictable behaviour
>>> “in the field” even if the program has UB).
>>> 
>>> 
>>> The question is whether it's in line with people's expectations that
>>> explicitly zero-initialized code behaves differently from
>>> implicitly zero-initialized code with respect to optimization
>>> and secondary side-effects (late diagnostics, latent bugs, etc.).
>>> 
>>> Introducing a new concept like .DEFERRED_INIT is much more
>>> heavy-weight than an explicit zero initializer.
>>> 
>>> 
>>> What exactly do you mean by “heavy-weight”? More difficult to implement or 
>>> much more run-time overhead or both? Or something else?
>>> 
>>> The major benefit of the approach of “.DEFERRED_INIT” is to enable us to keep 
>>> the current -Wuninitialized analysis untouched and also pass
>>> the “uninitialized” info from source code level to “pass_expand”.
>> 
>> Well, "untouched" is a bit oversimplified.  You do need to handle
>> .DEFERRED_INIT as not
>> being an initialization which will definitely get interesting.
>
> Yes, during uninitialized variable analysis pass, we should specially handle 
> the defs with “.DEFERRED_INIT”, to treat them as uninitializations.

Are you sure we need to do that?  The point of having the first argument
to .DEFERRED_INIT was that that argument would still provide an
uninitialised use of the variable.  And the values are passed and
returned by value, so the lack of initialisation is explicit in
the gcall itself, without knowing what the target function does.

The idea is that we can essentially treat .DEFERRED_INIT as a normal
(const) function call.  I'd be surprised if many passes needed to
handle it specially.
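
As a concrete picture (a sketch, using the -fauto-init=zero spelling
from earlier in the thread, with ZERO standing for whatever INIT
encoding gets chosen):

--cut here--
int f (int flag)
{
  int x;         /* no initializer in the source */
  if (flag)
    x = 1;
  return x;      /* possibly-uninitialized use */
}
--cut here--

Here x's default-def would become x_2 = .DEFERRED_INIT (x_1(D), ZERO),
other uses of x_1(D) would be rewritten to x_2, and a later phase would
expand the call into the actual zeroing -- while the x_1(D) operand
keeps the late -Wmaybe-uninitialized machinery looking at the same
information as today.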

Thanks,
Richard


Re: How to traverse all the local variables that declared in the current routine?

2020-12-07 Thread Richard Sandiford via Gcc-patches
Richard Biener  writes:
> On Thu, Dec 3, 2020 at 6:33 PM Richard Sandiford
>  wrote:
>>
>> Richard Biener via Gcc-patches  writes:
>> > On Tue, Nov 24, 2020 at 4:47 PM Qing Zhao  wrote:
>> >> Another issue is, in order to check whether an auto-variable has 
>> >> initializer, I plan to add a new bit in “decl_common” as:
>> >>   /* In a VAR_DECL, this is DECL_IS_INITIALIZED.  */
>> >>   unsigned decl_is_initialized :1;
>> >>
>> >> /* IN VAR_DECL, set when the decl is initialized at the declaration.  */
>> >> #define DECL_IS_INITIALIZED(NODE) \
>> >>   (DECL_COMMON_CHECK (NODE)->decl_common.decl_is_initialized)
>> >>
>> >> set this bit when setting DECL_INITIAL for the variables in FE. then keep 
>> >> it
>> >> even though DECL_INITIAL might be NULLed.
>> >
>> > For locals it would be more reliable to set this flag during 
>> > gimplification.
>> >
>> >> Do you have any comment and suggestions?
>> >
>> > As said above - do you want to cover registers as well as locals?  I'd do
>> > the actual zeroing during RTL expansion instead since otherwise you
>> > have to figure yourself whether a local is actually used (see 
>> > expand_stack_vars)
>> >
>> > Note that optimization will already have made use of "uninitialized" state
>> > of locals so depending on what the actual goal is here "late" may be too 
>> > late.
>>
>> Haven't thought about this much, so it might be a daft idea, but would a
>> compromise be to use a const internal function:
>>
>>   X1 = .DEFERRED_INIT (X0, INIT)
>>
>> where the X0 argument is an uninitialised value and the INIT argument
>> describes the initialisation pattern?  So for a decl we'd have:
>>
>>   X = .DEFERRED_INIT (X, INIT)
>>
>> and for an SSA name we'd have:
>>
>>   X_2 = .DEFERRED_INIT (X_1(D), INIT)
>>
>> with all other uses of X_1(D) being replaced by X_2.  The idea is that:
>>
>> * Having the X0 argument would keep the uninitialised use of the
>>   variable around for the later warning passes.
>>
>> * Using a const function should still allow the UB to be deleted as dead
>>   if X1 isn't needed.
>>
>> * Having a function in the way should stop passes from taking advantage
>>   of direct uninitialised uses for optimisation.
>>
>> This means we won't be able to optimise based on the actual init
>> value at the gimple level, but that seems like a fair trade-off.
>> AIUI this is really a security feature or anti-UB hardening feature
>> (in the sense that users are more likely to see predictable behaviour
>> “in the field” even if the program has UB).
>
> The question is whether it's in line with people's expectations that
> explicitly zero-initialized code behaves differently from
> implicitly zero-initialized code with respect to optimization
> and secondary side-effects (late diagnostics, latent bugs, etc.).

From my understanding, that's OK.  I don't think this option is like -g,
which is supposed to have no observable effect other than adding or
removing debug info.

It's OK for implicit zero initialisation to be slower than explicit
zero initialisation.  After all, if someone actively wants something
to be initialised to zero, they're still expected to do it in the
source code.  The implicit initalisation is just a safety net.

Similarly, I think it's OK that code won't be optimised identically
with and without .DEFERRED_INIT (or whatever other mechanism we use),
and so won't provide identical late warnings.  In both cases we should
just do our best to diagnose what we can.

> Btw, I don't think there's any reason to cling onto clang's semantics
> for a particular switch.  We'll never be able to emulate 1:1 behavior
> and our -Wuninit behavior is probably vastly different already.

Yeah, this isn't about trying to match compilers diagnostic-for-diagnostic.
It's more about matching them principle-for-principle.

Thanks,
Richard


Re: [AArch64] Add --with-tune configure flag

2020-12-07 Thread Richard Sandiford via Gcc-patches
"Pop, Sebastian"  writes:
> On 11/19/20, 10:52 AM, "Richard Earnshaw (lists)"  
> wrote:
>> Having the same option have a completely different meaning would be even
>> worse than not having the option at all.  So no, that's a non-starter.
>
> The attached patch 0001 removes --with-{cpu,arch,tune}-32.
> Bootstrap and regression testing pass on aarch64-linux.
> Ok to commit to trunk and active branches?
>
> I would like to ping the two patches from Wilco Dijkstra that fix issues in 
> configure --with-mtune flag:
> https://gcc.gnu.org/pipermail/gcc-patches/2020-September/553865.html
> https://gcc.gnu.org/pipermail/gcc-patches/2020-September/553866.html
>
> Please see patches 0002 and 0003 attached, rebased on trunk as of today and 
> tested with bootstrap and regression testing.
> Ok to commit to trunk and all active branches?
>
> Thanks,
> Sebastian
>
> From f1aa5f45dde8df5eee41ae166176775f5c5f1acc Mon Sep 17 00:00:00 2001
> From: Sebastian Pop 
> Date: Thu, 3 Dec 2020 17:35:18 +
> Subject: [PATCH 1/3] [AArch64] disable --with-{cpu,arch,tune}-32
>
> gcc/
>   * config.gcc (aarch64*-*-*): Remove --with-{cpu,arch,tune}-32 flags.

OK, thanks.

> ChangeLog:
> 2020-09-03  Wilco Dijkstra  
>
> * config.gcc (aarch64*-*-*): Simplify --with-cpu and --with-arch
> processing.  Add support for architectural extensions.
> * config/aarch64/aarch64.h (TARGET_CPU_DEFAULT): Remove
> AARCH64_CPU_DEFAULT_FLAGS.
> * config/aarch64/aarch64.c (AARCH64_CPU_DEFAULT_FLAGS): Remove define.
> (get_tune_cpu): Assert CPU is always valid.
> (get_arch): Assert architecture is always valid.
> (aarch64_override_options): Cleanup CPU selection code and simplify 
> logic.

I share Richard E's concern about the effect of this on people who run
./cc1 directly.  (And I'm being selfish here, because I regularly run
./cc1 directly on toolchains configured with --with-arch=armv8.2-a+sve.)
So TBH my preference would be to keep the TARGET_CPU_DEFAULT_FLAGS stuff.

However, if one of the other maintainers strongly thinks this is the
right way to go, I'm happy to defer to them.

> Add support for --with-tune. Like --with-cpu and --with-arch, the argument is
> validated and transformed into a -mtune option to be processed like any other
> command-line option.  --with-tune has no effect if a -mcpu or -mtune option
> is used. The validating code didn't allow --with-cpu=native, so explicitly
> allow that.
>
> Co-authored-by:  Delia Burduv  
>
> Bootstrap OK, regress pass, OK to commit?
>
> ChangeLog
> 2020-09-03  Wilco Dijkstra  
>
> * config.gcc
> (aarch64*-*-*): Add --with-tune. Support --with-cpu=native.
> * config/aarch64/aarch64.h (OPTION_DEFAULT_SPECS): Add --with-tune.
>
> 2020-09-03  Wilco Dijkstra  
>
> * gcc/testsuite/lib/target-supports.exp:
> (check_effective_target_tune_cortex_a76): New effective target test.
> * gcc.target/aarch64/with-tune-config.c: New test.
> * gcc.target/aarch64/with-tune-march.c: Likewise.
> * gcc.target/aarch64/with-tune-mcpu.c: Likewise.
> * gcc.target/aarch64/with-tune-mtune.c: Likewise.

OK for this one, thanks.

Richard


Go patch committed: Don't name type descriptor for alias type

2020-12-07 Thread Ian Lance Taylor via Gcc-patches
This patch to the Go frontend avoids using an alias type name when
naming a type descriptor.  The test case for this is
https://golang.org/cl/275632.  Bootstrapped and ran Go testsuite on
x86_64-pc-linux-gnu.  Committed to mainline.

Ian
47cf6054ce8d11aa681846d08433afdb4404abe7
diff --git a/gcc/go/gofrontend/MERGE b/gcc/go/gofrontend/MERGE
index 019aafdde9a..02083edeece 100644
--- a/gcc/go/gofrontend/MERGE
+++ b/gcc/go/gofrontend/MERGE
@@ -1,4 +1,4 @@
-342e5f0b349553a69d7c99a18162ae2a1e6e5775
+2184750d74d37580486e90df1284c07fdee91670
 
 The first line of this file holds the git revision number of the last
 merge done from the gofrontend repository.
diff --git a/gcc/go/gofrontend/names.cc b/gcc/go/gofrontend/names.cc
index 0097417cff0..8e73e5e157c 100644
--- a/gcc/go/gofrontend/names.cc
+++ b/gcc/go/gofrontend/names.cc
@@ -1017,7 +1017,7 @@ Gogo::type_descriptor_backend_name(const Type* type, Named_type* nt,
   bool is_pointer = false;
   if (nt == NULL && type->points_to() != NULL)
 {
-  nt = type->points_to()->named_type();
+  nt = type->points_to()->unalias()->named_type();
   is_pointer = true;
 }
 


Re: How to traverse all the local variables that declared in the current routine?

2020-12-07 Thread Qing Zhao via Gcc-patches



> On Dec 7, 2020, at 11:10 AM, Richard Sandiford  
> wrote:
 
 Another issue is, in order to check whether an auto-variable has 
 initializer, I plan to add a new bit in “decl_common” as:
 /* In a VAR_DECL, this is DECL_IS_INITIALIZED.  */
 unsigned decl_is_initialized :1;
 
 /* IN VAR_DECL, set when the decl is initialized at the declaration.  */
 #define DECL_IS_INITIALIZED(NODE) \
 (DECL_COMMON_CHECK (NODE)->decl_common.decl_is_initialized)
 
 set this bit when setting DECL_INITIAL for the variables in FE. then keep 
 it
 even though DECL_INITIAL might be NULLed.
 
 
 For locals it would be more reliable to set this flag during 
 gimplification.
 
 Do you have any comment and suggestions?
 
 
 As said above - do you want to cover registers as well as locals?  I'd do
 the actual zeroing during RTL expansion instead since otherwise you
 have to figure youself whether a local is actually used (see 
 expand_stack_vars)
 
 Note that optimization will already have made use of "uninitialized" state
 of locals, so depending on what the actual goal is here, "late" may be too
 late.
 
 
 Haven't thought about this much, so it might be a daft idea, but would a
 compromise be to use a const internal function:
 
 X1 = .DEFERRED_INIT (X0, INIT)
 
 where the X0 argument is an uninitialised value and the INIT argument
 describes the initialisation pattern?  So for a decl we'd have:
 
 X = .DEFERRED_INIT (X, INIT)
 
 and for an SSA name we'd have:
 
 X_2 = .DEFERRED_INIT (X_1(D), INIT)
 
 with all other uses of X_1(D) being replaced by X_2.  The idea is that:
 
 * Having the X0 argument would keep the uninitialised use of the
 variable around for the later warning passes.
 
 * Using a const function should still allow the UB to be deleted as dead
 if X1 isn't needed.
 
 * Having a function in the way should stop passes from taking advantage
 of direct uninitialised uses for optimisation.
 
 This means we won't be able to optimise based on the actual init
 value at the gimple level, but that seems like a fair trade-off.
 AIUI this is really a security feature or anti-UB hardening feature
 (in the sense that users are more likely to see predictable behaviour
 “in the field” even if the program has UB).
 
 
 The question is whether it's in line with people's expectation that
 explicitly zero-initialized code behaves differently from
 implicitly zero-initialized code with respect to optimization
 and secondary side-effects (late diagnostics, latent bugs, etc.).
 
 Introducing a new concept like .DEFERRED_INIT is much more
 heavy-weight than an explicit zero initializer.
 
 
 What exactly you mean by “heavy-weight”? More difficult to implement or 
 much more run-time overhead or both? Or something else?
 
 The major benefit of the approach of “.DEFERRED_INIT”  is to enable us 
 keep the current -Wuninitialized analysis untouched and also pass
 the “uninitialized” info from source code level to “pass_expand”.
>>> 
>>> Well, "untouched" is a bit oversimplified.  You do need to handle
>>> .DEFERRED_INIT as not
>>> being an initialization which will definitely get interesting.
>> 
>> Yes, during uninitialized variable analysis pass, we should specially handle 
>> the defs with “.DEFERRED_INIT”, to treat them as uninitializations.
> 
> Are you sure we need to do that?  The point of having the first argument
> to .DEFERRED_INIT was that that argument would still provide an
> uninitialised use of the variable.  And the values are passed and
> returned by value, so the lack of initialisation is explicit in
> the gcall itself, without knowing what the target function does.
> 
> The idea is that we can essentially treat .DEFERRED_INIT as a normal
> (const) function call.  I'd be surprised if many passes needed to
> handle it specially.
> 

Just checked with a small testing case (to emulate the .DEFERRED_INIT approach):

qinzhao@gcc10:~/Bugs/auto-init$ cat t.c
extern int DEFFERED_INIT (int, int) __attribute__ ((const));

int foo (int n, int r)
{
  int v;

  v = DEFFERED_INIT (v, 0);
  if (n < 10) 
v = r;

  return v;
}
qinzhao@gcc10:~/Bugs/auto-init$ sh t
/home/qinzhao/Install/latest_write/bin/gcc -O -Wuninitialized -fdump-tree-all 
-S t.c
t.c: In function ‘foo’:
t.c:7:7: warning: ‘v’ is used uninitialized [-Wuninitialized]
7 |   v = DEFFERED_INIT (v, 0);
  |   ^~~~

We can see that the current uninitialized variable analysis treats the newly
added artificial initialization as the first use of the uninitialized variable,
and therefore reports the warning there.
However, we should report the warning at “return v”.
So, I think that we still need to specifically handle the newly added
artificial initialization.
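
To make the intended behaviour concrete, here is a plain-C sketch of the
same test (my illustration; .DEFERRED_INIT is an internal function, so it
can only be spelled in a comment):

int foo (int n, int r)
{
  int v;

  /* Conceptually inserted by the compiler, not a real use of v:
     v = .DEFERRED_INIT (v, 0);  */
  if (n < 10)
    v = r;

  return v;   /* the warning should point here, at the first real use */
}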

Re: [PATCH] builtins: Avoid ICE with __builtin_clear_padding on POINTERS_EXTEND_UNSIGNED targets [PR98147]

2020-12-07 Thread Richard Sandiford via Gcc-patches
Jakub Jelinek via Gcc-patches  writes:
> Hi!
>
> The function that calls targetm.emit_call_builtin___clear_cache
> asserts that each of the begin and end operands has either ptr_mode or
> Pmode.
> On most targets that is the same mode, but e.g. on aarch64 -mabi=ilp32
> or a few others it is different.  When a target has a clear cache
> non-library handler, it will use create_address_operand which will do the
> conversion to the right mode automatically, but when emitting a library
> call, we just say the operands are ptr_mode even when they can be Pmode
> too; in that case we need to convert explicitly.
>
> Fixed thusly, tested on the testcase with cross to aarch64 -mabi=ilp32,
> bootstrapped/regtested on x86_64-linux and i686-linux (I really don't
> have aarch64 ilp32 testing setup), ok for trunk?
>
> 2020-12-07  Jakub Jelinek  
>
>   PR target/98147
>   * builtins.c (default_emit_call_builtin___clear_cache): Call
>   convert_memory_address to ptr_mode on both begin and end.

OK, thanks.

Richard

>
>   * gcc.dg/pr98147.c: New test.
>
> --- gcc/builtins.c.jj 2020-12-04 08:08:06.350436898 +0100
> +++ gcc/builtins.c2020-12-05 14:47:09.555476027 +0100
> @@ -7790,8 +7790,8 @@ default_emit_call_builtin___clear_cache
>  
>emit_library_call (callee,
>LCT_NORMAL, VOIDmode,
> -  begin, ptr_mode,
> -  end, ptr_mode);
> +  convert_memory_address (ptr_mode, begin), ptr_mode,
> +  convert_memory_address (ptr_mode, end), ptr_mode);
>  }
>  
>  /* Emit a call to __builtin___clear_cache, unless the target specifies
> --- gcc/testsuite/gcc.dg/pr98147.c.jj 2020-12-05 14:49:53.817626223 +0100
> +++ gcc/testsuite/gcc.dg/pr98147.c2020-12-05 14:48:20.984671644 +0100
> @@ -0,0 +1,10 @@
> +/* PR target/98147 */
> +
> +char buffer[32] = "foo bar";
> +
> +int
> +main ()
> +{
> +  __builtin___clear_cache (buffer, buffer + 32);
> +  return 0;
> +}
>
>   Jakub


Go testsuite patch committed: Don't quote quoted parentheses

2020-12-07 Thread Ian Lance Taylor via Gcc-patches
This patch fixes go-test.exp to not backslash quote parentheses that
are already backslash quoted.  This regexp quoting isn't fully
general; it just has to handle the cases that arise in the testsuite.
This change fixes a case in a new test, not yet committed.  (It would
be nice to have a general fix if anybody has one).  Bootstrapped and
ran Go testsuite on x86_64-pc-linux-gnu.  Committed to mainline.
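
For the record, the general shape of such a fix is easy to state outside
Tcl: escape a regexp metacharacter only when it is not already preceded by
a backslash.  A rough C sketch; the function name, buffer API and the
metacharacter set here are mine, purely illustrative:

#include <stdio.h>
#include <string.h>

/* Quote regexp metacharacters in SRC into DST (size N), copying
   already-escaped characters verbatim.  */
static void
quote_regex (const char *src, char *dst, size_t n)
{
  size_t j = 0;
  for (size_t i = 0; src[i] != '\0' && j + 2 < n; i++)
    {
      if (src[i] == '\\' && src[i + 1] != '\0')
        {
          /* Already escaped: copy the backslash and the next byte.  */
          dst[j++] = src[i++];
          dst[j++] = src[i];
        }
      else if (strchr ("().[]*+?{}|^$", src[i]) != NULL)
        {
          dst[j++] = '\\';
          dst[j++] = src[i];
        }
      else
        dst[j++] = src[i];
    }
  dst[j] = '\0';
}

int
main (void)
{
  char buf[128];
  quote_regex ("a(b) already \\( quoted", buf, sizeof buf);
  puts (buf);   /* prints: a\(b\) already \( quoted */
  return 0;
}

The regsub-based quoting in go-test.exp approximates this piecemeal, which
is why the pre-quoted cases need the extra guards in the diff below.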

Ian

* go.test/go-test.exp (errchk): Don't backslash quote parentheses
that are already backslash quoted.
diff --git a/gcc/testsuite/go.test/go-test.exp 
b/gcc/testsuite/go.test/go-test.exp
index 8f17cb3db71..d129e1c65da 100644
--- a/gcc/testsuite/go.test/go-test.exp
+++ b/gcc/testsuite/go.test/go-test.exp
@@ -131,11 +131,11 @@ proc errchk { test opts } {
set index [string first "dg-error" $out_line]
regsub -start $index -all "\(\[^]\)\}\(.\)" $out_line 
"\\1\[\\\}\]\\2" out_line
}
-   if [string match "*dg-error*\(*" $out_line] {
+   if [string match "*dg-error*\\\[^\]\(*" $out_line] {
set index [string first "dg-error" $out_line]
regsub -start $index -all "\\\(" $out_line "\[\\\(\]" 
out_line
}
-   if [string match "*dg-error*\)*\}" $out_line] {
+   if [string match "*dg-error*\\\[^\]\)*\}" $out_line] {
set index [string first "dg-error" $out_line]
regsub -start $index -all "\\\)\(.\)" $out_line 
"\[\\\)\]\\1" out_line
}


Go patch committed: The type of a string index is byte

2020-12-07 Thread Ian Lance Taylor via Gcc-patches
This patch to the Go frontend sets the type of a string index
expression to byte, which is a type alias.  To make this work from the
do_type method, we add "byte" and "rune" to the list of known integer
types, and look them up that way rather than via gogo->lookup_global.
This is for https://golang.org/issue/8745.  Bootstrapped and ran Go
testsuite on x86_64-pc-linux-gnu.  Committed to mainline.

Ian
94f7dfa37366244e0bb12497df3d1527b07f141c
diff --git a/gcc/go/gofrontend/MERGE b/gcc/go/gofrontend/MERGE
index 02083edeece..711353d2550 100644
--- a/gcc/go/gofrontend/MERGE
+++ b/gcc/go/gofrontend/MERGE
@@ -1,4 +1,4 @@
-2184750d74d37580486e90df1284c07fdee91670
+81687fccc568a088fee8f627ddde599e17c648c2
 
 The first line of this file holds the git revision number of the last
 merge done from the gofrontend repository.
diff --git a/gcc/go/gofrontend/expressions.cc b/gcc/go/gofrontend/expressions.cc
index ebe1b36eb53..6d484d9a339 100644
--- a/gcc/go/gofrontend/expressions.cc
+++ b/gcc/go/gofrontend/expressions.cc
@@ -13468,7 +13468,7 @@ Type*
 String_index_expression::do_type()
 {
   if (this->end_ == NULL)
-return Type::lookup_integer_type("uint8");
+return Type::lookup_integer_type("byte");
   else
 return this->string_->type();
 }
@@ -14021,7 +14021,7 @@ Field_reference_expression::do_lower(Gogo* gogo, Named_object* function,
 
   Expression* length_expr = Expression::make_integer_ul(s.length(), NULL, loc);
 
-  Type* byte_type = gogo->lookup_global("byte")->type_value();
+  Type* byte_type = Type::lookup_integer_type("byte");
   Array_type* array_type = Type::make_array_type(byte_type, length_expr);
   array_type->set_is_array_incomparable();
 
diff --git a/gcc/go/gofrontend/gogo.cc b/gcc/go/gofrontend/gogo.cc
index a5e4521469b..e31a038dbd3 100644
--- a/gcc/go/gofrontend/gogo.cc
+++ b/gcc/go/gofrontend/gogo.cc
@@ -117,17 +117,11 @@ Gogo::Gogo(Backend* backend, Linemap* linemap, int, int pointer_size)
 
   // "byte" is an alias for "uint8".
   uint8_type->integer_type()->set_is_byte();
-  Named_object* byte_type = Named_object::make_type("byte", NULL, uint8_type,
-   loc);
-  byte_type->type_value()->set_is_alias();
-  this->add_named_type(byte_type->type_value());
+  this->add_named_type(Type::make_integer_type_alias("byte", uint8_type));
 
   // "rune" is an alias for "int32".
   int32_type->integer_type()->set_is_rune();
-  Named_object* rune_type = Named_object::make_type("rune", NULL, int32_type,
-   loc);
-  rune_type->type_value()->set_is_alias();
-  this->add_named_type(rune_type->type_value());
+  this->add_named_type(Type::make_integer_type_alias("rune", int32_type));
 
   this->add_named_type(Type::make_named_bool_type());
 
@@ -765,7 +759,7 @@ Gogo::register_gc_vars(const std::vector& var_gc,
 
   Type* pvt = Type::make_pointer_type(Type::make_void_type());
   Type* uintptr_type = Type::lookup_integer_type("uintptr");
-  Type* byte_type = this->lookup_global("byte")->type_value();
+  Type* byte_type = Type::lookup_integer_type("byte");
   Type* pointer_byte_type = Type::make_pointer_type(byte_type);
   Struct_type* root_type =
 Type::make_builtin_struct_type(4,
diff --git a/gcc/go/gofrontend/statements.cc b/gcc/go/gofrontend/statements.cc
index 25e25364cee..af82f36bc43 100644
--- a/gcc/go/gofrontend/statements.cc
+++ b/gcc/go/gofrontend/statements.cc
@@ -6341,7 +6341,7 @@ For_range_statement::do_lower(Gogo* gogo, Named_object*, 
Block* enclosing,
   else if (range_type->is_string_type())
 {
   index_type = Type::lookup_integer_type("int");
-  value_type = gogo->lookup_global("rune")->type_value();
+  value_type = Type::lookup_integer_type("rune");
 }
   else if (range_type->map_type() != NULL)
 {
@@ -6812,7 +6812,7 @@ For_range_statement::lower_range_string(Gogo* gogo,
 rune_type = value_temp->type();
   else
 {
-  rune_type = gogo->lookup_global("rune")->type_value();
+  rune_type = Type::lookup_integer_type("rune");
   value_temp = Statement::make_temporary(rune_type, NULL, loc);
   init->add_statement(value_temp);
 }
diff --git a/gcc/go/gofrontend/types.cc b/gcc/go/gofrontend/types.cc
index 23d1647c1fa..d2741f6db58 100644
--- a/gcc/go/gofrontend/types.cc
+++ b/gcc/go/gofrontend/types.cc
@@ -2764,7 +2764,7 @@ class Ptrmask
   symname() const;
 
   Expression*
-  constructor(Gogo* gogo) const;
+  constructor() const;
 
  private:
   void
@@ -2959,10 +2959,10 @@ Ptrmask::symname() const
 // initialize the runtime ptrmask value.
 
 Expression*
-Ptrmask::constructor(Gogo* gogo) const
+Ptrmask::constructor() const
 {
   Location bloc = Linemap::predeclared_location();
-  Type* byte_type = gogo->lookup_global("byte")->type_value();
+  Type* byte_type = Type::lookup_integer_type("byte");
   Expression* len = Expression::make_integer_ul(this->bits_.size(), NULL,
bloc);
   Array_type* at = Type::

Re: How to traverse all the local variables that declared in the current routine?

2020-12-07 Thread Richard Sandiford via Gcc-patches
Qing Zhao  writes:
>> On Dec 7, 2020, at 11:10 AM, Richard Sandiford  
>> wrote:
> 
> Another issue is, in order to check whether an auto-variable has 
> initializer, I plan to add a new bit in “decl_common” as:
> /* In a VAR_DECL, this is DECL_IS_INITIALIZED.  */
> unsigned decl_is_initialized :1;
> 
> /* IN VAR_DECL, set when the decl is initialized at the declaration.  */
> #define DECL_IS_INITIALIZED(NODE) \
> (DECL_COMMON_CHECK (NODE)->decl_common.decl_is_initialized)
> 
> set this bit when setting DECL_INITIAL for the variables in FE. then keep 
> it
> even though DECL_INITIAL might be NULLed.
> 
> 
> For locals it would be more reliable to set this flag during 
> gimplification.
> 
> Do you have any comment and suggestions?
> 
> 
> As said above - do you want to cover registers as well as locals?  I'd do
> the actual zeroing during RTL expansion instead since otherwise you
> have to figure youself whether a local is actually used (see 
> expand_stack_vars)
> 
> Note that optimization will already have made use of "uninitialized" state
> of locals, so depending on what the actual goal is here, "late" may be too
> late.
> 
> 
> Haven't thought about this much, so it might be a daft idea, but would a
> compromise be to use a const internal function:
> 
> X1 = .DEFERRED_INIT (X0, INIT)
> 
> where the X0 argument is an uninitialised value and the INIT argument
> describes the initialisation pattern?  So for a decl we'd have:
> 
> X = .DEFERRED_INIT (X, INIT)
> 
> and for an SSA name we'd have:
> 
> X_2 = .DEFERRED_INIT (X_1(D), INIT)
> 
> with all other uses of X_1(D) being replaced by X_2.  The idea is that:
> 
> * Having the X0 argument would keep the uninitialised use of the
> variable around for the later warning passes.
> 
> * Using a const function should still allow the UB to be deleted as dead
> if X1 isn't needed.
> 
> * Having a function in the way should stop passes from taking advantage
> of direct uninitialised uses for optimisation.
> 
> This means we won't be able to optimise based on the actual init
> value at the gimple level, but that seems like a fair trade-off.
> AIUI this is really a security feature or anti-UB hardening feature
> (in the sense that users are more likely to see predictable behaviour
> “in the field” even if the program has UB).
> 
> 
> The question is whether it's in line with people's expectation that
> explicitly zero-initialized code behaves differently from
> implicitly zero-initialized code with respect to optimization
> and secondary side-effects (late diagnostics, latent bugs, etc.).
> 
> Introducing a new concept like .DEFERRED_INIT is much more
> heavy-weight than an explicit zero initializer.
> 
> 
> What exactly you mean by “heavy-weight”? More difficult to implement or 
> much more run-time overhead or both? Or something else?
> 
> The major benefit of the approach of “.DEFERRED_INIT”  is to enable us 
> keep the current -Wuninitialized analysis untouched and also pass
> the “uninitialized” info from source code level to “pass_expand”.
 
 Well, "untouched" is a bit oversimplified.  You do need to handle
 .DEFERRED_INIT as not
 being an initialization which will definitely get interesting.
>>> 
>>> Yes, during uninitialized variable analysis pass, we should specially 
>>> handle the defs with “.DEFERRED_INIT”, to treat them as uninitializations.
>> 
>> Are you sure we need to do that?  The point of having the first argument
>> to .DEFERRED_INIT was that that argument would still provide an
>> uninitialised use of the variable.  And the values are passed and
>> returned by value, so the lack of initialisation is explicit in
>> the gcall itself, without knowing what the target function does.
>> 
>> The idea is that we can essentially treat .DEFERRED_INIT as a normal
>> (const) function call.  I'd be surprised if many passes needed to
>> handle it specially.
>> 
>
> Just checked with a small testing case (to emulate the .DEFERRED_INIT 
> approach):
>
> qinzhao@gcc10:~/Bugs/auto-init$ cat t.c
> extern int DEFFERED_INIT (int, int) __attribute__ ((const));
>
> int foo (int n, int r)
> {
>   int v;
>
>   v = DEFFERED_INIT (v, 0);
>   if (n < 10) 
> v = r;
>
>   return v;
> }
> qinzhao@gcc10:~/Bugs/auto-init$ sh t
> /home/qinzhao/Install/latest_write/bin/gcc -O -Wuninitialized -fdump-tree-all 
> -S t.c
> t.c: In function ‘foo’:
> t.c:7:7: warning: ‘v’ is used uninitialized [-Wuninitialized]
> 7 |   v = DEFFERED_INIT (v, 0);
>   |   ^~~~
>
> We can see that the current uninitialized variable analysis treats the newly
> added artificial initialization as the first use of the uninitialized variable.
>  Therefore repor

Re: How to traverse all the local variables that declared in the current routine?

2020-12-07 Thread Qing Zhao via Gcc-patches



> On Dec 7, 2020, at 12:05 PM, Richard Sandiford  
> wrote:
> 
> Qing Zhao mailto:qing.z...@oracle.com>> writes:
>>> On Dec 7, 2020, at 11:10 AM, Richard Sandiford  
>>> wrote:
>> 
>> Another issue is, in order to check whether an auto-variable has 
>> initializer, I plan to add a new bit in “decl_common” as:
>> /* In a VAR_DECL, this is DECL_IS_INITIALIZED.  */
>> unsigned decl_is_initialized :1;
>> 
>> /* IN VAR_DECL, set when the decl is initialized at the declaration.  */
>> #define DECL_IS_INITIALIZED(NODE) \
>> (DECL_COMMON_CHECK (NODE)->decl_common.decl_is_initialized)
>> 
>> set this bit when setting DECL_INITIAL for the variables in FE. then 
>> keep it
>> even though DECL_INITIAL might be NULLed.
>> 
>> 
>> For locals it would be more reliable to set this flag during 
>> gimplification.
>> 
>> Do you have any comment and suggestions?
>> 
>> 
>> As said above - do you want to cover registers as well as locals?  I'd do
>> the actual zeroing during RTL expansion instead since otherwise you
>> have to figure youself whether a local is actually used (see 
>> expand_stack_vars)
>> 
>> Note that optimization will already have made use of "uninitialized"
>> state of locals, so depending on what the actual goal is here, "late" may
>> be too late.
>> 
>> 
>> Haven't thought about this much, so it might be a daft idea, but would a
>> compromise be to use a const internal function:
>> 
>> X1 = .DEFERRED_INIT (X0, INIT)
>> 
>> where the X0 argument is an uninitialised value and the INIT argument
>> describes the initialisation pattern?  So for a decl we'd have:
>> 
>> X = .DEFERRED_INIT (X, INIT)
>> 
>> and for an SSA name we'd have:
>> 
>> X_2 = .DEFERRED_INIT (X_1(D), INIT)
>> 
>> with all other uses of X_1(D) being replaced by X_2.  The idea is that:
>> 
>> * Having the X0 argument would keep the uninitialised use of the
>> variable around for the later warning passes.
>> 
>> * Using a const function should still allow the UB to be deleted as dead
>> if X1 isn't needed.
>> 
>> * Having a function in the way should stop passes from taking advantage
>> of direct uninitialised uses for optimisation.
>> 
>> This means we won't be able to optimise based on the actual init
>> value at the gimple level, but that seems like a fair trade-off.
>> AIUI this is really a security feature or anti-UB hardening feature
>> (in the sense that users are more likely to see predictable behaviour
>> “in the field” even if the program has UB).
>> 
>> 
>> The question is whether it's in line with people's expectation that
>> explicitly zero-initialized code behaves differently from
>> implicitly zero-initialized code with respect to optimization
>> and secondary side-effects (late diagnostics, latent bugs, etc.).
>> 
>> Introducing a new concept like .DEFERRED_INIT is much more
>> heavy-weight than an explicit zero initializer.
>> 
>> 
>> What exactly you mean by “heavy-weight”? More difficult to implement or 
>> much more run-time overhead or both? Or something else?
>> 
>> The major benefit of the approach of “.DEFERRED_INIT”  is to enable us 
>> keep the current -Wuninitialized analysis untouched and also pass
>> the “uninitialized” info from source code level to “pass_expand”.
> 
> Well, "untouched" is a bit oversimplified.  You do need to handle
> .DEFERRED_INIT as not
> being an initialization which will definitely get interesting.
 
 Yes, during uninitialized variable analysis pass, we should specially 
 handle the defs with “.DEFERRED_INIT”, to treat them as uninitializations.
>>> 
>>> Are you sure we need to do that?  The point of having the first argument
>>> to .DEFERRED_INIT was that that argument would still provide an
>>> uninitialised use of the variable.  And the values are passed and
>>> returned by value, so the lack of initialisation is explicit in
>>> the gcall itself, without knowing what the target function does.
>>> 
>>> The idea is that we can essentially treat .DEFERRED_INIT as a normal
>>> (const) function call.  I'd be surprised if many passes needed to
>>> handle it specially.
>>> 
>> 
>> Just checked with a small testing case (to emulate the .DEFERRED_INIT 
>> approach):
>> 
>> qinzhao@gcc10:~/Bugs/auto-init$ cat t.c
>> extern int DEFFERED_INIT (int, int) __attribute__ ((const));
>> 
>> int foo (int n, int r)
>> {
>>  int v;
>> 
>>  v = DEFFERED_INIT (v, 0);
>>  if (n < 10) 
>>v = r;
>> 
>>  return v;
>> }
>> qinzhao@gcc10:~/Bugs/auto-init$ sh t
>> /home/qinzhao/Install/latest_write/bin/gcc -O -Wuninitialized 
>> -fdump-tree-all -S t.c
>> t.c: In function ‘foo’:
>> t.c:7:7: warning: ‘v’ is used uninitialized [-Wuninitialized]
>>7 |   v = DEFFERED_INIT

Re: [AArch64] Add --with-tune configure flag

2020-12-07 Thread Wilco Dijkstra via Gcc-patches
Hi Richard,

> I share Richard E's concern about the effect of this on people who run
> ./cc1 directly.  (And I'm being selfish here, because I regularly run
> ./cc1 directly on toolchains configured with --with-arch=armv8.2-a+sve.)
> So TBH my preference would be to keep the TARGET_CPU_DEFAULT_FLAGS stuff.

And what exactly should it do? It didn't work correctly before, and even now
it's not clear what TARGET_CPU_DEFAULT should be set to for all possible
combinations of --with-cpu=X --with-arch=Y --with-tune=Z. Should it have
exactly the same behaviour with and without the driver? If so, that's
difficult to achieve in config.gcc, and it would require a completely
different mechanism to ensure the default ends up the same without the
driver (eg. store the original strings in TARGET_CPU_DEFAULT and then
postprocess in the backend as if they were actual --cpu/--arch/--tune
commands if the command-line didn't already set them, ie. replicating what
the driver already does). And it would need to be done for --with-abi as
well.

So is changing your preferred config to --with-cpu=neoverse-v1 really a problem?

Cheers,
Wilco

Re: [PATCH 1/2] libstdc++: Add --enable-pure-stdio-libstdcxx option

2020-12-07 Thread Jonathan Wakely via Gcc-patches

On 07/12/20 10:39 -0800, Keith Packard via Libstdc++ wrote:

This option directs the library to only use simple stdio APIs instead
of using fileno to get the file descriptor for use with POSIX APIs.


Makes sense, thanks.
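
For readers following along: the distinction amounts to something like
this sketch.  It is illustrative only, not the real basic_file_stdio.cc
code (xwrite is my name), though _GLIBCXX_USE_PURE_STDIO is the macro the
patch introduces.

#include <stdio.h>

#ifndef _GLIBCXX_USE_PURE_STDIO
# include <unistd.h>   /* fileno() + write() path */
#endif

static long
xwrite (FILE *f, const char *buf, size_t n)
{
#ifdef _GLIBCXX_USE_PURE_STDIO
  /* Pure-stdio build: stay within the C stdio API.  */
  return (long) fwrite (buf, 1, n, f);
#else
  /* Default build: bypass stdio buffering via the POSIX descriptor.  */
  return (long) write (fileno (f), buf, n);
#endif
}

The point of the option is that a libc need not provide the second path
at all.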

I've replaced the gcc@ list with gcc-patches@ in the recipients.


Signed-off-by: Keith Packard 
---
libstdc++-v3/ChangeLog | 13 ++
libstdc++-v3/acinclude.m4  | 13 ++
libstdc++-v3/config/io/basic_file_stdio.cc | 46 +++---
libstdc++-v3/configure.ac  |  1 +
4 files changed, 68 insertions(+), 5 deletions(-)

diff --git a/libstdc++-v3/ChangeLog b/libstdc++-v3/ChangeLog
index d3ee18ed7d1..7f6b0697534 100644
--- a/libstdc++-v3/ChangeLog
+++ b/libstdc++-v3/ChangeLog
@@ -1147,6 +1147,19 @@
* include/bits/uniform_int_dist.h (uniform_int_distribution::_S_nd):
Use qualified-id to refer to static member functions.

+2020-11-04  Keith Packard  
+
+   * acinclude.m4: Add GLIBCXX_ENABLE_PURE_STDIO for
+   --enable-libstdcxx-pure-stdio
+   * configure.ac: Use GLIBCXX_ENABLE_PURE_STDIO for
+   --enable-libstdcxx-pure-stdio
+   * configure: regenerate
+   * config.h.in: regenerate
+   * config/io/basic_file_stdio.cc: Add support for
+   _GLIBCXX_USE_PURE_STDIO. This makes libstdc++ use only defined
+   stdio entry points for all I/O operations, without direct calls to
+   underlying POSIX functions.
+
2020-11-03  Jonathan Wakely  


GCC changelog files are autogenerated now, so patches should not touch
them. Just include the ChangeLog entry in the Git commit log (which
will usually end up being quoted in the patch and/or the email body of
the mail to gcc-patches).

I think the right way to do this (or at least, the way that was
intended when basic_file_stdio.cc was added) is to provide a new file
and change GLIBCXX_ENABLE_CSTDIO in acinclude.m4 to use that new file.

The two biggest downsides of that are that it duplicates a lot of the
file (because the diffs for your changes are small) and that the
correct name for your new file is already taken!

I think we should rename basic_file_stdio.{h,cc} to basic_file_posix
to vacate the basic_file_stdio name for what you want.

However, it's rather late in the GCC 11 process to make a change like
that (even though it's really just renaming some files). Would you be
OK waiting until after GCC 11 is released (in 4-5 months) to do it
"properly"? Is this blocking something that would require doing it
sooner?




Re: [PATCH] Remove misleading debug line entries

2020-12-07 Thread Bernd Edlinger
On 12/7/20 11:50 AM, Richard Biener wrote:
> The ipa-param-manipulation.c hunk is OK, please commit separately.
> 

done.

> The tree-inline.c and cfgexpand.c changes are OK as well, for the
> tree-ssa-live.c change see below
> 
> 
> From 85b0e37d0c0d3ecac4908ebbfd67edc612ef22b2 Mon Sep 17 00:00:00 2001
> From: Bernd Edlinger 
> Date: Wed, 2 Dec 2020 12:32:02 +0100
> Subject: [PATCH] Remove misleading debug line entries
> 
> This removes gimple_debug_begin_stmts without block info which remain
> after a gimple block originating from an inline function is unused.
> 
> The line numbers from these stmts are from the inline function,
> but since the inline function is completely optimized away,
> there will be no DW_TAG_inlined_subroutine so the debugger has
> no callstack available at this point, and therefore those
> line table entries are not helpful to the user.
> 
> 2020-12-02  Bernd Edlinger  
> 
>   * cfgexpand.c (expand_gimple_basic_block): Remove special handling
>   of debug_inline_entries without block info.
>   * ipa-param-manipulation.c
>   (ipa_param_body_adjustments::modify_call_stmt): Set location info.
>   * tree-inline.c (remap_gimple_stmt): Drop debug_nonbind_markers when
>   the call statement has no block info.
>   (copy_debug_stmt): Remove debug_nonbind_markers when inlining
>   and the block info is mapped to NULL.
>   * tree-ssa-live.c (clear_unused_block_pointer): Remove
>   debug_nonbind_markers originating from inlined functions.
> ---
>  gcc/cfgexpand.c  |  9 +
>  gcc/ipa-param-manipulation.c |  2 ++
>  gcc/tree-inline.c| 14 ++
>  gcc/tree-ssa-live.c  | 22 --
>  4 files changed, 33 insertions(+), 14 deletions(-)
> 
> diff --git a/gcc/cfgexpand.c b/gcc/cfgexpand.c
> index 7e0bdd58e85..df7b62080b6 100644
> --- a/gcc/cfgexpand.c
> +++ b/gcc/cfgexpand.c
> @@ -5953,14 +5953,7 @@ expand_gimple_basic_block (basic_block bb, bool disable_tail_calls)
> else if (gimple_debug_begin_stmt_p (stmt))
>   val = GEN_RTX_DEBUG_MARKER_BEGIN_STMT_PAT ();
> else if (gimple_debug_inline_entry_p (stmt))
> - {
> -   tree block = gimple_block (stmt);
> -
> -   if (block)
> - val = GEN_RTX_DEBUG_MARKER_INLINE_ENTRY_PAT ();
> -   else
> - goto delink_debug_stmt;
> - }
> + val = GEN_RTX_DEBUG_MARKER_INLINE_ENTRY_PAT ();
> else
>   gcc_unreachable ();
>  
> diff --git a/gcc/ipa-param-manipulation.c b/gcc/ipa-param-manipulation.c
> index 2bbea21be2e..9ab4a10096d 100644
> --- a/gcc/ipa-param-manipulation.c
> +++ b/gcc/ipa-param-manipulation.c
> @@ -1681,6 +1681,8 @@ ipa_param_body_adjustments::modify_call_stmt (gcall **stmt_p)
>   }
>   }
>gcall *new_stmt = gimple_build_call_vec (gimple_call_fn (stmt), vargs);
> +  if (gimple_has_location (stmt))
> + gimple_set_location (new_stmt, gimple_location (stmt));
>gimple_call_set_chain (new_stmt, gimple_call_chain (stmt));
>gimple_call_copy_flags (new_stmt, stmt);
>if (tree lhs = gimple_call_lhs (stmt))
> diff --git a/gcc/tree-inline.c b/gcc/tree-inline.c
> index d9814bd10d3..360b85f14dc 100644
> --- a/gcc/tree-inline.c
> +++ b/gcc/tree-inline.c
> @@ -1819,12 +1819,11 @@ remap_gimple_stmt (gimple *stmt, copy_body_data *id)
> /* If the inlined function has too many debug markers,
>don't copy them.  */
> if (id->src_cfun->debug_marker_count
> -   > param_max_debug_marker_count)
> +   > param_max_debug_marker_count
> +   || id->reset_location)
>   return stmts;
>  
> gdebug *copy = as_a <gdebug *> (gimple_copy (stmt));
> -   if (id->reset_location)
> - gimple_set_location (copy, input_location);
> id->debug_stmts.safe_push (copy);
> gimple_seq_add_stmt (&stmts, copy);
> return stmts;
> @@ -3169,7 +3168,14 @@ copy_debug_stmt (gdebug *stmt, copy_body_data *id)
>  }
>  
>if (gimple_debug_nonbind_marker_p (stmt))
> -return;
> +{
> +  if (id->call_stmt && !gimple_block (stmt))
> + {
> +   gimple_stmt_iterator gsi = gsi_for_stmt (stmt);
> +   gsi_remove (&gsi, true);
> + }
> +  return;
> +}
>  
>/* Remap all the operands in COPY.  */
>memset (&wi, 0, sizeof (wi));
> diff --git a/gcc/tree-ssa-live.c b/gcc/tree-ssa-live.c
> index 21a9ee43e6b..acba4a58626 100644
> --- a/gcc/tree-ssa-live.c
> +++ b/gcc/tree-ssa-live.c
> @@ -623,13 +623,31 @@ clear_unused_block_pointer (void)
>{
>   unsigned i;
>   tree b;
> - gimple *stmt = gsi_stmt (gsi);
> + gimple *stmt;
>  
> +  next:
> + stmt = gsi_stmt (gsi);
>   if (!is_gimple_debug (stmt) && !gimple_clobber_p (stmt))
> continue;
>   b = gimple_block (stmt);
>   if (b && !TREE_USED (b))
> -   gimple_set_block (stmt, NU

Re: [AArch64] Add --with-tune configure flag

2020-12-07 Thread Richard Sandiford via Gcc-patches
Wilco Dijkstra via Gcc-patches  writes:
> Hi Richard,
>
>> I share Richard E's concern about the effect of this on people who run
>> ./cc1 directly.  (And I'm being selfish here, because I regularly run
>> ./cc1 directly on toolchains configured with --with-arch=armv8.2-a+sve.)
>> So TBH my preference would be to keep the TARGET_CPU_DEFAULT_FLAGS stuff.
>
> And what exactly should it do? It didn't work correctly before, and even now 
> it's not
> clear what TARGET_CPU_DEFAULT should be set to for all possible combinations 
> of
> --with-cpu=X --with-arch=Y --with-tune=Z. Should it have exactly the same 
> behaviour
> with and without the driver? If so, that's difficult to achieve in 
> config.gcc, and it would
> require a completely different mechanism to ensure the default ends up the 
> same
> without the driver (eg. store the original strings in TARGET_CPU_DEFAULT and 
> then
> postprocess in the backend as if they were actual --cpu/--arch/--tune 
> commands if
> the command-line didn't already set them, ie. replicating what the driver 
> already does).
> And it would need to be done for --with-abi as well.

The specific case I mentioned works well enough though (given that I also
run ./cc1 without -march, -mcpu or -mtune flags).  I agree it might not
work for all combinations of --with-arch, --with-cpu and --with-tune,
but I'm not sure that's a strong reason to remove the cases that do work
(or at least, work well enough).  This cc1 behaviour is a feature for
developers rather than users after all.

Perhaps I've missed the point, but why do you need to remove this code
in order to do what you need to do?

> So is changing your preferred config to --with-cpu=neoverse-v1 really a 
> problem?

I specifically want to test generic SVE rather than SVE tuned for a
specific core, so --with-arch=armv8.2-a+sve is the thing I want to test.
I think the question is instead whether it's really a problem to change
my workflow so that I don't run cc1 directly any more, or that I remember
to respecify the --with-arch value as an -march flag when running cc1.
And yeah, maybe the right answer is that I should learn to do that.
It's just that I don't personally understand why this change is necessary.

Like I say though, I'm happy to defer to a maintainer who does think the
patch is a good thing.

Thanks,
Richard


Re: [PATCH 1/2] libstdc++: Add --enable-pure-stdio-libstdcxx option

2020-12-07 Thread Keith Packard via Gcc-patches
Jonathan Wakely  writes:

> GCC changelog files are autogenerated now, so patches should not touch
> them. Just include the ChangeLog entry in the Git commit log (which
> will usually end up being quoted in the patch and/or the email body of
> the mail to gcc-patches).

Awesome.

> I think the right way to do this (or at least, the way that was
> intended when basic_file_stdio.cc was added) is to provide a new file
> and change GLIBCXX_ENABLE_CSTDIO in acinclude.m4 to use that new file.
>
> The two biggest downsides of that are that it duplicates a lot of the
> file (because the diffs for your changes are small) and that the
> correct name for your new file is already taken!

I can definitely see a reason to use a separate file when implementing
the basic_file interface on top of something other than stdio, but
this patch doesn't do that -- it only changes the interaction between
basic_file and stdio in a few places.

I think it makes the best long-term sense to leave everything in
basic_file_stdio.cc and avoid having the two implementations diverge in
the future.

> However, it's rather late in the GCC 11 process to make a change like
> that (even though it's really just renaming some files). Would you be
> OK waiting until after GCC 11 is released (in 4-5 months) to do it
> "properly"? Is this blocking something that would require doing it
> sooner?

This patch enables the use of C++ with picolibc, a libc designed for 32-
and 64-bit embedded systems.

Right now, I'm working on getting picolibc support integrated into
Zephyr, which uses toolchains built by crosstool-ng. I've gotten
picolibc support merged to crosstool-ng; the Zephyr developers are
interested in having a single toolchain support three different libc
implementations (newlib, newlib-nano and picolibc), but that's blocked
on having C++ support available in all three libraries.

So, you're at the bottom of my current dependency graph :-)

I don't particularly need this released in gcc, but I would like to
get patches reviewed and the general approach agreed on so that I can
feel more confident in preparing patches to be applied to gcc in
crosstool-ng itself.

Once that's done, I'll also be able to release new Debian packages of
GCC for embedded ARM and RISC-V and have those include suitable patches
so that we can support embedded C++ development there too. 

-- 
-keith




Re: [PATCH,rs6000] Combine patterns for p10 load-cmpi fusion

2020-12-07 Thread will schmidt via Gcc-patches
On Fri, 2020-12-04 at 13:19 -0600, acsawdey--- via Gcc-patches wrote:
> From: Aaron Sawdey 
> 

Assorted comments sprinkled around below.
thanks
-Will


> This patch adds the first batch of patterns to support p10 fusion. These
> will allow combine to create a single insn for a pair of instructions
> that that power10 can fuse and execute. These particular ones have the

Just one that, or maybe 'that the'.
s/ones/fusion pairs/ ?

> requirement that only cr0 can be used when fusing a load with a compare
> immediate of -1/0/1 (if signed) or 0/1 (if unsigned), so we want combine
> to put that requirement in, and if it doesn't work out later the splitter
> can get used.

... splitter can get used, or ... splitter will 
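
For illustration, the sort of source that yields such a fusible pair (my
example; with -mcpu=power10 the load and the compare against zero below
can become an adjacent ld + cmpdi, with the compare restricted to cr0 as
the description says):

long
first_is_zero (long *p)
{
  return *p == 0;
}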

> 
> The patterns are generated by a script genfusion.pl and live in new file
> fusion.md. This script will be expanded to generate more patterns for
> fusion.

ok

> 
> This also adds option -mpower10-fusion which defaults on for power10 and
> will gate all these fusion patterns. In addition I have added an
> undocumented option -mpower10-fusion-ld-cmpi (which may be removed later)
> that just controls the load+compare-immediate patterns. I have make

made

> these default on for power10 but they are not disallowed for earlier
> processors because it is still valid code. This allows us to test the
> correctness of fusion code generation by turning it on explicitly.
> 
> If bootstrap/regtest is clean, ok for trunk?
> 
> Thanks!
> 
>Aaron
> 
> gcc/ChangeLog:
> 
>   * config/rs6000/genfusion.pl: New file, script to generate
>   define_insn_and_split patterns so combine can arrange fused
>   instructions next to each other.

New script to generate ...

>   * config/rs6000/fusion.md: New file, generated fused instruction
>   patterns for combine.

>   * config/rs6000/predicates.md (const_m1_to_1_operand): New predicate.
>   (non_update_memory_operand): New predicate.
ok
>   * config/rs6000/rs6000-cpus.def: Add OPTION_MASK_P10_FUSION and
>   OPTION_MASK_P10_FUSION_LD_CMPI to ISA_3_1_MASKS_SERVER and
>   POWERPC_MASKS.
>   * config/rs6000/rs6000-protos.h (address_is_non_pfx_d_or_x): Add
>   prototype.

All usages of address_is_non_pfx_d_or_x() appear to be negated, i.e.
+   || !address_is_non_pfx_d_or_x (XEXP (operands[1],0), DImode, NON_PREFIXED_DS))"
Fully understanding that naming is hard, I wonder if that can be adjusted
to avoid the double negative, something like
address_load_mode_requires_prefix (...foo) ?


>   * config/rs6000/rs6000.c (rs6000_option_override_internal):
>   automatically set -mpower10-fusion and -mpower10-fusion-ld-cmpi
>   if target is power10.  (rs600_opt_masks): Allow -mpower10-fusion
>   in function attributes.  (address_is_non_pfx_d_or_x): New function.

ok

>   * config/rs6000/rs6000.h: Add MASK_P10_FUSION.
>   * config/rs6000/rs6000.md: Include fusion.md.
>   * config/rs6000/rs6000.opt: Add -mpower10-fusion
>   and -mpower10-fusion-ld-cmpi.

ok

>   * config/rs6000/t-rs6000: Add dependencies involving fusion.md.

ok


> ---
>  gcc/config/rs6000/fusion.md   | 357 ++
>  gcc/config/rs6000/genfusion.pl| 144 
>  gcc/config/rs6000/predicates.md   |  14 ++
>  gcc/config/rs6000/rs6000-cpus.def |   6 +-
>  gcc/config/rs6000/rs6000-protos.h |   2 +
>  gcc/config/rs6000/rs6000.c|  51 +
>  gcc/config/rs6000/rs6000.h|   1 +
>  gcc/config/rs6000/rs6000.md   |   1 +
>  gcc/config/rs6000/rs6000.opt  |   8 +
>  gcc/config/rs6000/t-rs6000|   6 +-
>  10 files changed, 588 insertions(+), 2 deletions(-)
>  create mode 100644 gcc/config/rs6000/fusion.md
>  create mode 100755 gcc/config/rs6000/genfusion.pl
> 
> diff --git a/gcc/config/rs6000/fusion.md b/gcc/config/rs6000/fusion.md
> new file mode 100644
> index 000..a4d3a6ae7f3
> --- /dev/null
> +++ b/gcc/config/rs6000/fusion.md
> @@ -0,0 +1,357 @@
> +;; -*- buffer-read-only: t -*-
> +;; Generated automatically by genfusion.pl
> +
> +;; Copyright (C) 2020 Free Software Foundation, Inc.
> +;;
> +;; This file is part of GCC.
> +;;
> +;; GCC is free software; you can redistribute it and/or modify it under
> +;; the terms of the GNU General Public License as published by the Free
> +;; Software Foundation; either version 3, or (at your option) any later
> +;; version.
> +;;
> +;; GCC is distributed in the hope that it will be useful, but WITHOUT ANY
> +;; WARRANTY; without even the implied warranty of MERCHANTABILITY or
> +;; FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License
> +;; for more details.
> +;;
> +;; You should have received a copy of the GNU General Public License
> +;; along with GCC; see the file COPYING3.  If not see
> +;; .
> +
> +;; load-cmpi fusion pattern generated by gen_ld_cmpi_p10
> +;; load mode is DI result mode is clobber compare mode is CC extend is none
> +(define_ins

[PATCH][GCC] aarch64: Add +pauth to -march

2020-12-07 Thread Przemyslaw Wirkus via Gcc-patches
New +pauth (Pointer Authentication from Armv8.3-A) feature option for the
-march command line option.

Please note that the majority of PAUTH instructions are implemented in the
HINT instruction space. PAUTH remains an Armv8.3-A feature but can now be
assigned to other architectures or CPUs.

Patch includes:
- new +pauth command line option.
- docs update to the +flagm command line option.

Regression tested and no issues.

OK for master?

gcc/ChangeLog:

* config/aarch64/aarch64-option-extensions.def
(AARCH64_OPT_EXTENSION): New +pauth option in -march for AArch64.
* config/aarch64/aarch64.h (AARCH64_FL_PAUTH): New pauth extension 
bitmask.
(AARCH64_ISA_PUATH): New ISA bitmask for PAUTH.
(AARCH64_FL_FOR_ARCH8_3): Add PAUTH to Armv8.3-A.
(TARGET_PAUTH): New target mask to isolate PAUTH instructions.
* config/aarch64/aarch64.md (do_return): Condition set to TARGET_PAUTH.
* doc/invoke.texi: Update docs (+flagm, +pauth).




Re: [PATCH] PR target/98152: Checking python is available before using

2020-12-07 Thread Jim Wilson
On Sat, Dec 5, 2020 at 10:12 PM Kito Cheng  wrote:

> gcc/ChangeLog:
>
> * config.gcc (riscv*-*-*): Check whether python, python3 or python2
> is available, and skip with_arch canonicalization if no python is
> available.
>

Looks good to me.

Jim


Re: Merge from trunk to gccgo branch

2020-12-07 Thread Ian Lance Taylor via Gcc-patches
I've now merged trunk revision
b737b70fad398728f6006e8397d1bb31ccea4ce7 to the gccgo branch.

Ian


Re: [C PATCH] fix atomic loads [PR 97981]

2020-12-07 Thread Joseph Myers
On Sat, 5 Dec 2020, Uecker, Martin wrote:

> I should have taken the new warning for
> 
> _Atomic int y;
> y; // warning statement with no effect 
> 
> as a tell-tale sign that something is wrong,
> although I still think the warning would be
> correct. Or has a atomic load some special
> semantics which imply that it can't be
> removed?

The warning is logically correct for an atomic load (even if in some cases 
it's built on primitives that require write access and don't work for a 
load from read-only memory).
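
A minimal illustration of that caveat (my example; whether a given load is
lock-free, and hence safe on read-only memory, is target-dependent):

#include <stdatomic.h>

/* Small enough for a lock-free load on common targets.  */
_Atomic int y;

/* Typically too wide for a lock-free load: GCC emits a call to the
   generic __atomic_load in libatomic, which may take a lock, i.e. it
   needs write access and can fail on read-only memory.  */
struct big { int a[8]; };
_Atomic struct big b;

int load_y (void) { return atomic_load (&y); }

struct big load_b (void) { return atomic_load (&b); }  /* link -latomic */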

> C: Fix atomic loads. [PR97981]
> 
> To handle atomic loads correctly, we need to move the code
> that drops qualifiers in lvalue conversion after the code that
> handles atomics.

The patch is OK.

-- 
Joseph S. Myers
jos...@codesourcery.com


Re: [AArch64] Add --with-tune configure flag

2020-12-07 Thread Wilco Dijkstra via Gcc-patches
Hi Richard,

>>> I share Richard E's concern about the effect of this on people who run
>>> ./cc1 directly.  (And I'm being selfish here, because I regularly run
>>> ./cc1 directly on toolchains configured with --with-arch=armv8.2-a+sve.)
>>> So TBH my preference would be to keep the TARGET_CPU_DEFAULT_FLAGS stuff.
>>
>> And what exactly should it do? It didn't work correctly before, and even now 
>> it's not
>> clear what TARGET_CPU_DEFAULT should be set to for all possible combinations 
>> of
>> --with-cpu=X --with-arch=Y --with-tune=Z. Should it have exactly the same 
>> behaviour
>> with and without the driver? If so, that's difficult to achieve in 
>> config.gcc, and it would
>> require a completely different mechanism to ensure the default ends up the 
>> same
>> without the driver (eg. store the original strings in TARGET_CPU_DEFAULT and 
>> then
>> postprocess in the backend as if they were actual --cpu/--arch/--tune 
>> commands if
>> the command-line didn't already set them, ie. replicating what the driver 
>> already does).
>> And it would need to be done for --with-abi as well.
>
> The specific case I mentioned works well enough though (given that I also
> run ./cc1 without -march, -mcpu or -mtune flags).  I agree it might not
> work for all combinations of --with-arch, --with-cpu and --with-tune,
> but I'm not sure that's a strong reason to remove the cases that do work
> (or at least, work well enough).  This cc1 behaviour is a feature for
> developers rather than users after all.

Well, how do you suggest we keep the cases that work but block the cases that
don't work, but only in cc1, without knowing when someone uses cc1 without the
driver?

> Perhaps I've missed the point, but why do you need to remove this code
> in order to do what you need to do?

Basically it's buggy by design. We have 3 related options which are compressed
into a single CPU value plus a set of architecture extension flags. That
obviously means we can't describe the --with- flags, let alone any
combinations, right?

For example the architecture to compile for is taken from that single CPU,
so the architecture is often wrong (depending on the order of processing
in config.gcc).
Try bootstrapping trunk using --with-tune=neoverse-n2 to see what I mean.
It doesn't just set the tune, it forces an incorrect architecture!

My patch works precisely because I ignore the incorrect TARGET_CPU_DEFAULT
and allow the driver to do its job properly in prioritizing the options.
Then it's a simple matter of reading out the --cpu/arch/tune flags.

Like I said, it may be possible to fix this if we pass the --with strings
to the backend and process them there. But that means duplicating
functionality of the driver for a few users who use cc1. Is that really
worth the effort?

>> So is changing your preferred config to --with-cpu=neoverse-v1 really a 
>> problem?
>
> I specifically want to test generic SVE rather than SVE tuned for a
> specific core, so --with-arch=armv8.2-a+sve is the thing I want to test.
> I think the question is instead whether it's really a problem to change
> my workflow so that I don't run cc1 directly any more, or that I remember
> to respecify the --with-arch value as an -march flag when running cc1.
> And yeah, maybe the right answer is that I should learn to do that.
> It's just that I don't personally understand why this change is necessary.

How about creating a macro that adds the --arch for you? Eg. cc1 becomes
./cc1 -march=armv8.2-a+sve

Cheers,
Wilco

[PATCH, powerpc] testsuite update tests for powerpc power10 target codegen.

2020-12-07 Thread will schmidt via Gcc-patches
[PATCH, powerpc] testsuite update tests for powerpc power10 target codegen.

Hi,

Assorted fix-ups to include prefixed load and store instructions in the
scan-assembler stanzas for the gcc.target/powerpc tests.
For these tests, we simply need pstxv or plxv added to the chain of
expected instructions so the load or store codegen tests also cover
power10 target codegen.

Sniff-testing completed ok against a gcc built to target power10.  Undergoing
regtest to ensure I didn't trip up anything on p7,p8,p9 (older) targets.

OK for trunk?

Thanks,
-Will


testsuite/ChangeLog:
* fold-vec-load-builtin_vec_xl-char.c: Update scan-assembler-times
stanza to reflect power10 target codegen.
* fold-vec-load-builtin_vec_xl-double.c: Ditto.
* fold-vec-load-builtin_vec_xl-float.c: Ditto.
* fold-vec-load-builtin_vec_xl-int.c: Ditto.
* fold-vec-load-builtin_vec_xl-longlong.c: Ditto.
* fold-vec-load-builtin_vec_xl-short.c: Ditto.
* fold-vec-load-vec_vsx_ld-char.c: Ditto.
* fold-vec-load-vec_vsx_ld-double.c: Ditto.
* fold-vec-load-vec_vsx_ld-float.c: Ditto.
* fold-vec-load-vec_vsx_ld-int.c: Ditto.
* fold-vec-load-vec_vsx_ld-longlong.c: Ditto.
* fold-vec-load-vec_vsx_ld-short.c: Ditto.
* fold-vec-load-vec_xl-char.c: Ditto.
* fold-vec-load-vec_xl-double.c: Ditto.
* fold-vec-load-vec_xl-float.c: Ditto.
* fold-vec-load-vec_xl-int.c: Ditto.
* fold-vec-load-vec_xl-longlong.c: Ditto.
* fold-vec-load-vec_xl-short.c: Ditto.
* fold-vec-splat-floatdouble.c: Ditto.
* fold-vec-splat-longlong.c: Ditto.
* fold-vec-store-builtin_vec_xst-char.c: Ditto.
* fold-vec-store-builtin_vec_xst-double.c: Ditto.
* fold-vec-store-builtin_vec_xst-float.c: Ditto.
* fold-vec-store-builtin_vec_xst-int.c: Ditto.
* fold-vec-store-builtin_vec_xst-longlong.c: Ditto.
* fold-vec-store-builtin_vec_xst-short.c: Ditto.
* fold-vec-store-vec_vsx_st-char.c: Ditto.
* fold-vec-store-vec_vsx_st-double.c: Ditto.
* fold-vec-store-vec_vsx_st-float.c: Ditto.
* fold-vec-store-vec_vsx_st-int.c: Ditto.
* fold-vec-store-vec_vsx_st-longlong.c: Ditto.
* fold-vec-store-vec_vsx_st-short.c: Ditto.
* fold-vec-store-vec_xst-char.c: Ditto.
* fold-vec-store-vec_xst-double.c: Ditto.
* fold-vec-store-vec_xst-float.c: Ditto.
* fold-vec-store-vec_xst-int.c: Ditto.
* fold-vec-store-vec_xst-longlong.c: Ditto.
* fold-vec-store-vec_xst-short.c: Ditto.


diff --git 
a/gcc/testsuite/gcc.target/powerpc/fold-vec-load-builtin_vec_xl-char.c 
b/gcc/testsuite/gcc.target/powerpc/fold-vec-load-builtin_vec_xl-char.c
index 9b199c219bf6..104710700c89 100644
--- a/gcc/testsuite/gcc.target/powerpc/fold-vec-load-builtin_vec_xl-char.c
+++ b/gcc/testsuite/gcc.target/powerpc/fold-vec-load-builtin_vec_xl-char.c
@@ -34,6 +34,6 @@ BUILD_CST_TEST( test9, vector signed char,   6, vector signed 
char);
 
 BUILD_VAR_TEST( test10, vector unsigned char, signed long long, vector 
unsigned char);
 BUILD_VAR_TEST( test11, vector unsigned char, signed int, vector unsigned 
char);
 BUILD_CST_TEST( test12, vector unsigned char, 8, vector unsigned char);
 
-/* { dg-final { scan-assembler-times {\mlxvw4x\M|\mlxvd2x\M|\mlxvx\M|\mlvx\M} 
12 } } */
+/* { dg-final { scan-assembler-times 
{\mlxvw4x\M|\mlxvd2x\M|\mlxvx\M|\mlvx\M|\mplxv\M} 12 } } */
diff --git 
a/gcc/testsuite/gcc.target/powerpc/fold-vec-load-builtin_vec_xl-double.c 
b/gcc/testsuite/gcc.target/powerpc/fold-vec-load-builtin_vec_xl-double.c
index c49dfe8d95b8..bfb3cfbc081e 100644
--- a/gcc/testsuite/gcc.target/powerpc/fold-vec-load-builtin_vec_xl-double.c
+++ b/gcc/testsuite/gcc.target/powerpc/fold-vec-load-builtin_vec_xl-double.c
@@ -26,6 +26,6 @@ BUILD_CST_TEST( test3, vector double, 12, double);
 
 BUILD_VAR_TEST( test4, vector double, signed long long, vector double);
 BUILD_VAR_TEST( test5, vector double, signed int, vector double);
 BUILD_CST_TEST( test6, vector double, 12, vector double);
 
-/* { dg-final { scan-assembler-times {\mlxvd2x\M|\mlxvx\M|\mlvx\M} 6 } } */
+/* { dg-final { scan-assembler-times {\mlxvd2x\M|\mlxvx\M|\mlvx\M|\mplxv\M} 6 
} } */
diff --git 
a/gcc/testsuite/gcc.target/powerpc/fold-vec-load-builtin_vec_xl-float.c 
b/gcc/testsuite/gcc.target/powerpc/fold-vec-load-builtin_vec_xl-float.c
index cdded361b128..373bead2e605 100644
--- a/gcc/testsuite/gcc.target/powerpc/fold-vec-load-builtin_vec_xl-float.c
+++ b/gcc/testsuite/gcc.target/powerpc/fold-vec-load-builtin_vec_xl-float.c
@@ -26,6 +26,6 @@ BUILD_CST_TEST( test3, vector float, 12, float);
 
 BUILD_VAR_TEST( test4, vector float, signed long long, vector float);
 BUILD_VAR_TEST( test5, vector float, signed in

[PATCH v4] rs6000, vector integer multiply/divide/modulo instructions

2020-12-07 Thread Carl Love via Gcc-patches
Will:

I have addressed your comments with regard to the ChangeLog entries.

The extra define vec_div was removed.

Added the missing entries for DIVU_V2DI and DIVS_V2DI in rs6000-call.c.

The extra MULLD_V2DI case statement entry was removed.

Added comment in rs6000.md about size for vector types per discussion
with Pat.

  Carl


GCC maintainers:

The following patch adds new builtins for the vector integer multiply,
divide and modulo operations.  The builtins are: vec_mulh(),
vec_dive(), vec_mod() for signed and unsigned integers and long
long integers.  The existing support for the vec_div() and vec_mul()
builtins emulates the vector operations with multiple scalar
instructions.  This patch adds support for these builtins using the new
vector instructions for Power 10.
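
For illustration, a small usage example (mine, not from the patch; compile
with -mcpu=power10):

#include <altivec.h>

/* These should each map to a single Power10 instruction (for signed
   words: vmulhsw, vdivesw, vmodsw) instead of the old multi-instruction
   scalar emulation.  */
vector signed int
demo (vector signed int a, vector signed int b)
{
  vector signed int hi = vec_mulh (a, b);  /* high halves of products */
  vector signed int qe = vec_dive (a, b);  /* divide extended */
  vector signed int rm = vec_mod (a, b);   /* remainders */
  return vec_add (vec_add (hi, qe), rm);
}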

The patch was compiled and tested on:

  powerpc64le-unknown-linux-gnu (Power 9 LE)
  powerpc64le-unknown-linux-gnu (Power 10 LE)

with no regressions. Additionally, the new test case was compiled and
executed by hand on Mambo to verify that it passes.

Please let me know if this patch is acceptable for mainline.  Thanks.

Carl Love

-

From 15f9c090106c62af83cc405414466ad03d1a4c55 Mon Sep 17 00:00:00 2001
From: Carl Love 
Date: Fri, 4 Sep 2020 19:24:22 -0500
Subject: [PATCH] rs6000, vector integer multiply/divide/modulo instructions

2020-12-07  Carl Love  

gcc/
* config/rs6000/altivec.h (vec_mulh, vec_dive, vec_mod): New defines.
* config/rs6000/altivec.md (VIlong): Move define to file vsx.md.
* config/rs6000/rs6000-builtin.def (DIVES_V4SI, DIVES_V2DI,
DIVEU_V4SI, DIVEU_V2DI, DIVS_V4SI, DIVS_V2DI, DIVU_V4SI,
DIVU_V2DI, MODS_V2DI, MODS_V4SI, MODU_V2DI, MODU_V4SI,
MULHS_V2DI, MULHS_V4SI, MULHU_V2DI, MULHU_V4SI, MULLD_V2DI):
Add builtin define.
(MULH, DIVE, MOD): Add new BU_P10_OVERLOAD_2 definitions.
* config/rs6000/rs6000-call.c (altivec_overloaded_builtins): Add
VSX_BUILTIN_VEC_DIV, P10_BUILTIN_VEC_VDIVE,
P10_BUILTIN_VEC_VMOD, P10_BUILTIN_VEC_VMULH
overloaded definitions.
(builtin_function_type) [P10V_BUILTIN_DIVEU_V4SI,
P10V_BUILTIN_DIVEU_V2DI, P10V_BUILTIN_DIVU_V4SI,
P10V_BUILTIN_DIVU_V2DI, P10V_BUILTIN_MODU_V2DI,
P10V_BUILTIN_MODU_V4SI, P10V_BUILTIN_MULHU_V2DI,
P10V_BUILTIN_MULHU_V4SI, P10V_BUILTIN_MULLD_V2DI]: Add case
statements for builtins.
* config/rs6000/rs6000.md (bits): Add new attribute sizes.
* config/rs6000/vsx.md (VIlong): New define_mode_iterator.
(UNSPEC_VDIVES, UNSPEC_VDIVEU): New unspec definitions.
(vsx_mul_v2di): Add if TARGET_POWER10 statement.
(vsx_udiv_v2di): Add if TARGET_POWER10 statement.
(dives_<mode>, diveu_<mode>, div<mode>3, uvdiv<mode>3,
mods_<mode>, modu_<mode>, mulhs_<mode>, mulhu_<mode>, mulv2di3):
Add define_insn, mode is VIlong.
* doc/extend.texi (vec_mulh, vec_mul, vec_div, vec_dive, vec_mod): Add
builtin descriptions.

gcc/testsuite/
* gcc.target/powerpc/builtins-1-p10-runnable.c: New test file.
---
 gcc/config/rs6000/altivec.h   |   4 +
 gcc/config/rs6000/altivec.md  |   2 -
 gcc/config/rs6000/rs6000-builtin.def  |  22 +
 gcc/config/rs6000/rs6000-call.c   |  53 +++
 gcc/config/rs6000/rs6000.md   |   4 +-
 gcc/config/rs6000/vsx.md  | 212 +++---
 gcc/doc/extend.texi   | 120 ++
 .../powerpc/builtins-1-p10-runnable.c | 398 ++
 8 files changed, 762 insertions(+), 53 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/powerpc/builtins-1-p10-runnable.c

diff --git a/gcc/config/rs6000/altivec.h b/gcc/config/rs6000/altivec.h
index e1884f51bd8..b678e5cf28d 100644
--- a/gcc/config/rs6000/altivec.h
+++ b/gcc/config/rs6000/altivec.h
@@ -750,6 +750,10 @@ __altivec_scalar_pred(vec_any_nle,
 #define vec_strir_p(a) __builtin_vec_strir_p (a)
 #define vec_stril_p(a) __builtin_vec_stril_p (a)
 
+#define vec_mulh(a, b) __builtin_vec_mulh ((a), (b))
+#define vec_dive(a, b) __builtin_vec_dive ((a), (b))
+#define vec_mod(a, b) __builtin_vec_mod ((a), (b))
+
 /* VSX Mask Manipulation builtin. */
 #define vec_genbm __builtin_vec_mtvsrbm
 #define vec_genhm __builtin_vec_mtvsrhm
diff --git a/gcc/config/rs6000/altivec.md b/gcc/config/rs6000/altivec.md
index 6a6ce0f84ed..f10f1cdd8a7 100644
--- a/gcc/config/rs6000/altivec.md
+++ b/gcc/config/rs6000/altivec.md
@@ -193,8 +193,6 @@
 
 ;; Short vec int modes
 (define_mode_iterator VIshort [V8HI V16QI])
-;; Longer vec int modes for rotate/mask ops
-(define_mode_iterator VIlong [V2DI V4SI])
 ;; Vec float modes
 (define_mode_iterator VF [V4SF])
 ;; Vec modes, pity mode iterators are not composable
diff --git a/gcc/config/rs6000/rs6000-builtin.def b/gcc/config/rs6000/rs6000-builtin.def

libgo patch commited: Don't use AF_LINK on Hurd

2020-12-07 Thread Ian Lance Taylor via Gcc-patches
This libgo patch by Svante Signell fixes the libgo build by not using
AF_LINK on Hurd systems.  This fixes GCC PR 98153.  Bootstrapped and
ran Go tests on x86_64-pc-linux-gnu.  Committed to mainline.

Ian
eafa578e83e4d4f5e29e7e69e9701b4c55e3bd02
diff --git a/gcc/go/gofrontend/MERGE b/gcc/go/gofrontend/MERGE
index 711353d2550..d4c8e30d1b4 100644
--- a/gcc/go/gofrontend/MERGE
+++ b/gcc/go/gofrontend/MERGE
@@ -1,4 +1,4 @@
-81687fccc568a088fee8f627ddde599e17c648c2
+3363fc239f642d3c3fb9a138d2833985d85dc083
 
 The first line of this file holds the git revision number of the last
 merge done from the gofrontend repository.
diff --git a/libgo/go/syscall/socket_bsd.go b/libgo/go/syscall/socket_bsd.go
index b230a3212e6..983d554849b 100644
--- a/libgo/go/syscall/socket_bsd.go
+++ b/libgo/go/syscall/socket_bsd.go
@@ -4,7 +4,7 @@
 // Use of this source code is governed by a BSD-style
 // license that can be found in the LICENSE file.
 
-// +build darwin dragonfly freebsd hurd openbsd netbsd
+// +build darwin dragonfly freebsd openbsd netbsd
 
 package syscall
 
diff --git a/libgo/go/syscall/socket_hurd.go b/libgo/go/syscall/socket_hurd.go
new file mode 100644
index 000..fcb239ef1ca
--- /dev/null
+++ b/libgo/go/syscall/socket_hurd.go
@@ -0,0 +1,101 @@
+// socket_hurd.go -- Socket handling specific to GNU/Hurd based systems.
+// This file is derived from socket_bsd.go without SockaddrDatalink since
+// AF_LINK is not yet supported on GNU/Hurd.
+
+// Copyright 2010 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+// +build hurd
+
+package syscall
+
+import "unsafe"
+
+const SizeofSockaddrInet4 = 16
+const SizeofSockaddrInet6 = 28
+const SizeofSockaddrUnix = 110
+
+type RawSockaddrInet4 struct {
+   Len    uint8
+   Family uint8
+   Port   uint16
+   Addr   [4]byte /* in_addr */
+   Zero   [8]uint8
+}
+
+func (sa *RawSockaddrInet4) setLen() Socklen_t {
+   sa.Len = SizeofSockaddrInet4
+   return SizeofSockaddrInet4
+}
+
+type RawSockaddrInet6 struct {
+   Len  uint8
+   Family   uint8
+   Port uint16
+   Flowinfo uint32
+   Addr [16]byte /* in6_addr */
+   Scope_id uint32
+}
+
+func (sa *RawSockaddrInet6) setLen() Socklen_t {
+   sa.Len = SizeofSockaddrInet6
+   return SizeofSockaddrInet6
+}
+
+type RawSockaddrUnix struct {
+   Len    uint8
+   Family uint8
+   Path   [108]int8
+}
+
+func (sa *RawSockaddrUnix) setLen(n int) {
+   sa.Len = uint8(3 + n) // 2 for Family, Len; 1 for NUL.
+}
+
+func (sa *RawSockaddrUnix) getLen() (int, error) {
+   if sa.Len < 2 || sa.Len > SizeofSockaddrUnix {
+   return 0, EINVAL
+   }
+
+   // Some BSDs include the trailing NUL in the length, whereas
+   // others do not. Work around this by subtracting the leading
+   // family and len. The path is then scanned to see if a NUL
+   // terminator still exists within the length.
+   n := int(sa.Len) - 2 // subtract leading Family, Len
+   for i := 0; i < n; i++ {
+   if sa.Path[i] == 0 {
+   // found early NUL; assume Len included the NUL
+   // or was overestimating.
+   n = i
+   break
+   }
+   }
+   return n, nil
+}
+
+func (sa *RawSockaddrUnix) adjustAbstract(sl Socklen_t) Socklen_t {
+   return sl
+}
+
+type RawSockaddr struct {
+   Len    uint8
+   Family uint8
+   Data   [14]int8
+}
+
+// BindToDevice binds the socket associated with fd to device.
+func BindToDevice(fd int, device string) (err error) {
+   return ENOSYS
+}
+
+func anyToSockaddrOS(rsa *RawSockaddrAny) (Sockaddr, error) {
+   return nil, EAFNOSUPPORT
+}
+
+func GetsockoptIPv6MTUInfo(fd, level, opt int) (*IPv6MTUInfo, error) {
+   var value IPv6MTUInfo
+   vallen := Socklen_t(SizeofIPv6MTUInfo)
+   err := getsockopt(fd, level, opt, unsafe.Pointer(&value), &vallen)
+   return &value, err
+}


Re: [PATCH] nvptx: Cache stacks block for OpenMP kernel launch

2020-12-07 Thread Julian Brown
Ping?

Thanks,

Julian

On Fri, 13 Nov 2020 20:54:54 +
Julian Brown  wrote:

> Hi Alexander,
> 
> Thanks for the review! Comments below.
> 
> On Tue, 10 Nov 2020 00:32:36 +0300
> Alexander Monakov  wrote:
> 
> > On Mon, 26 Oct 2020, Jakub Jelinek wrote:
> >   
> > > On Mon, Oct 26, 2020 at 07:14:48AM -0700, Julian Brown wrote:
> > > > This patch adds caching for the stack block allocated for
> > > > offloaded OpenMP kernel launches on NVPTX. This is a performance
> > > > optimisation -- we observed an average 11% or so performance
> > > > improvement with this patch across a set of accelerated GPU
> > > > benchmarks on one machine (results vary according to individual
> > > > benchmark and with hardware used).
> > 
> > In this patch you're folding two changes together: reuse of
> > allocated stacks and removing one host-device synchronization.  Why
> > is that? Can you report performance change separately for each
> > change (and split out the patches)?  
> 
> An accident of the development process of the patch, really -- the
> idea for removing the post-kernel-launch synchronisation came from the
> OpenACC side, and adapting it to OpenMP meant the stacks had to remain
> allocated after the return of the GOMP_OFFLOAD_run function.
> 
> > > > A given kernel launch will reuse the stack block from the
> > > > previous launch if it is large enough, else it is freed and
> > > > reallocated. A slight caveat is that memory will not be freed
> > > > until the device is closed, so e.g. if code is using highly
> > > > variable launch geometries and large amounts of GPU RAM, you
> > > > might run out of resources slightly quicker with this patch.
> > > > 
> > > > Another way this patch gains performance is by omitting the
> > > > synchronisation at the end of an OpenMP offload kernel launch --
> > > > it's safe for the GPU and CPU to continue executing in parallel
> > > > at that point, because e.g. copies-back from the device will be
> > > > synchronised properly with kernel completion anyway.
> > 
> > I don't think this explanation is sufficient. My understanding is
> > that OpenMP forbids the host to proceed asynchronously after the
> > target construct unless it is a 'target nowait' construct. This may
> > be observable if there's a printf in the target region for example
> > (or if it accesses memory via host pointers).
> > 
> > So this really needs to be a separate patch with more explanation
> > why this is okay (if it is okay).  
> 
> As long as the offload kernel only touches GPU memory and does not
> have any CPU-visible side effects (like the printf you mentioned -- I
> hadn't really considered that, oops!), it's probably OK.
> 
> But anyway, the benefit obtained on OpenMP code (the same set of
> benchmarks run before) of omitting the synchronisation at the end of
> GOMP_OFFLOAD_run seems minimal. So it's good enough to just do the
> stacks caching, and miss out the synchronisation removal for now. (It
> might still be something worth considering later, perhaps, as long as
> we can show some given kernel doesn't use printf or access memory via
> host pointers -- I guess the former might be easier than the latter. I
> have observed the equivalent OpenACC patch provide a significant boost
> on some benchmarks, so there's probably something that could be gained
> on the OpenMP side too.)
> 
> The benefit with the attached patch -- just stacks caching, no
> synchronisation removal -- is about 12% on the same set of benchmarks
> as before. Results are a little noisy on the machine I'm benchmarking
> on, so this isn't necessarily proof that the synchronisation removal
> is harmful for performance!
> 
> > > > In turn, the last part necessitates a change to the way
> > > > "(perhaps abort was called)" errors are detected and reported.
> > > >   
> > 
> > As already mentioned using callbacks is problematic. Plus, I'm sure
> > the way you lock out other threads is a performance loss when
> > multiple threads have target regions: even though they will not run
> > concurrently on the GPU, you still want to allow host threads to
> > submit GPU jobs while the GPU is occupied.
> > 
> > I would suggest to have a small pool (up to 3 entries perhaps) of
> > stacks. Then you can arrange reuse without totally serializing host
> > threads on target regions.  
> 
> I'm really wary of the additional complexity of adding a stack pool,
> and the memory allocation/freeing code paths in CUDA appear to be so
> slow that we get a benefit with this patch even when the GPU stream
> has to wait for the CPU to unlock the stacks block. Also, for large
> GPU launches, the size of the soft-stacks block isn't really trivial
> (I've seen something like 50MB on the hardware I'm using, with default
> options), and multiplying that by 3 could start to eat into the GPU
> heap memory for "useful data" quite significantly.
> 
> Consider the attached (probably not amazingly-written) microbenchmark.
> It spawns 8 threads which each lau
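
As a sketch of the caching policy being discussed (a simplification, not
the actual libgomp nvptx plugin code; the state would be per-device and
protected by a lock, and error checking is omitted):

    #include <cuda.h>

    static CUdeviceptr stacks_block;  /* block from the previous launch */
    static size_t stacks_size;

    /* Return a soft-stacks block of at least SIZE bytes, reusing the
       previous launch's block when it is large enough, else freeing
       and reallocating it.  */
    static CUdeviceptr
    nvptx_get_stacks (size_t size)
    {
      if (stacks_block != 0 && stacks_size >= size)
        return stacks_block;
      if (stacks_block != 0)
        cuMemFree (stacks_block);
      cuMemAlloc (&stacks_block, size);
      stacks_size = size;
      return stacks_block;
    }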

Re: H8 cc0 conversion

2020-12-07 Thread Maciej W. Rozycki
On Sun, 22 Nov 2020, Jeff Law via Gcc-patches wrote:

> This is a combination of my son's work as well as my own.

 Congratulations to Austin for this achievement!

  Maciej


Re: [patch] Fix PR tree-optimization/96344

2020-12-07 Thread Richard Biener via Gcc-patches
On Mon, Dec 7, 2020 at 3:53 PM Eric Botcazou  wrote:
>
> Hi,
>
> the very recent addition of the if_to_switch pass has partially disabled the
> optimization I added back in June:
>
> 2020-06-26  Eric Botcazou  
>
> * tree-ssa-reassoc.c (dump_range_entry): New function.
> (debug_range_entry): New debug function.
> (update_range_test): Invoke dump_range_entry for dumping.
> (optimize_range_tests_to_bit_test): Merge the entry test in the
> bit test when possible and lower the profitability threshold.
>
> as witnessed by the 3 new failures in the gnat.dg testsuite.  It turns out
> that both tree-ssa-reassoc.c and tree-switch-conversion.c can turn things into
> bit tests so the optimization must also be added to bit_test_cluster::emit.
>
> The patch also contains a secondary optimization, whereby the full bit-test
> sequence is sent to the folder before being gimplified in case there is only
> one test, so that the most optimized sequence (bt + jc on x86) can be emitted
> like with optimize_range_tests_to_bit_test.
>
> Bootstrapped/regtested on x86-64/Linux, OK for the mainline?

OK.

Richard.

>
> 2020-12-07  Eric Botcazou  
>
> PR tree-optimization/96344
> * tree-switch-conversion.c (bit_test_cluster::emit): Compute the
> range only if an entry test is necessary.  Merge the entry test in
> the bit test when possible.  Use PREC local variable consistently.
> When there is only one test, do a single gimplification at the end.
>
> --
> Eric Botcazou
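
To make the transformation concrete, this is the classic kind of source
that both passes can lower to a bit test (an illustrative example, not
taken from the patch):

    int
    is_vowel (int c)
    {
      return c == 'a' || c == 'e' || c == 'i' || c == 'o' || c == 'u';
    }

The range tests collapse to roughly
(unsigned) (c - 'a') <= 20 && (0x104111 >> (c - 'a')) & 1, and once the
entry test is merged into the bit test only a single test remains, so
folding the sequence lets the x86 back-end emit the bt + jc pair
mentioned above.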


Re: How to traverse all the local variables that declared in the current routine?

2020-12-07 Thread Richard Biener via Gcc-patches
On Mon, Dec 7, 2020 at 7:34 PM Qing Zhao  wrote:
>
>
>
> On Dec 7, 2020, at 12:05 PM, Richard Sandiford  
> wrote:
>
> Qing Zhao  writes:
>
> On Dec 7, 2020, at 11:10 AM, Richard Sandiford  
> wrote:
>
>
> Another issue is, in order to check whether an auto-variable has initializer, 
> I plan to add a new bit in “decl_common” as:
> /* In a VAR_DECL, this is DECL_IS_INITIALIZED.  */
> unsigned decl_is_initialized :1;
>
> /* IN VAR_DECL, set when the decl is initialized at the declaration.  */
> #define DECL_IS_INITIALIZED(NODE) \
> (DECL_COMMON_CHECK (NODE)->decl_common.decl_is_initialized)
>
> set this bit when setting DECL_INITIAL for the variables in FE. then keep it
> even though DECL_INITIAL might be NULLed.
>
>
> For locals it would be more reliable to set this flag during gimplification.
>
> Do you have any comment and suggestions?
>
>
> As said above - do you want to cover registers as well as locals?  I'd do
> the actual zeroing during RTL expansion instead since otherwise you
> have to figure out yourself whether a local is actually used (see
> expand_stack_vars)
>
> Note that optimization will already have made use of "uninitialized" state
> of locals so depending on what the actual goal is here "late" may be too late.
>
>
> Haven't thought about this much, so it might be a daft idea, but would a
> compromise be to use a const internal function:
>
> X1 = .DEFERRED_INIT (X0, INIT)
>
> where the X0 argument is an uninitialised value and the INIT argument
> describes the initialisation pattern?  So for a decl we'd have:
>
> X = .DEFERRED_INIT (X, INIT)
>
> and for an SSA name we'd have:
>
> X_2 = .DEFERRED_INIT (X_1(D), INIT)
>
> with all other uses of X_1(D) being replaced by X_2.  The idea is that:
>
> * Having the X0 argument would keep the uninitialised use of the
> variable around for the later warning passes.
>
> * Using a const function should still allow the UB to be deleted as dead
> if X1 isn't needed.
>
> * Having a function in the way should stop passes from taking advantage
> of direct uninitialised uses for optimisation.
>
> This means we won't be able to optimise based on the actual init
> value at the gimple level, but that seems like a fair trade-off.
> AIUI this is really a security feature or anti-UB hardening feature
> (in the sense that users are more likely to see predictable behaviour
> “in the field” even if the program has UB).
>
>
> The question is whether it's in line of peoples expectation that
> explicitely zero-initialized code behaves differently from
> implicitely zero-initialized code with respect to optimization
> and secondary side-effects (late diagnostics, latent bugs, etc.).
>
> Introducing a new concept like .DEFERRED_INIT is much more
> heavy-weight than an explicit zero initializer.
>
>
> What exactly do you mean by “heavy-weight”? More difficult to implement,
> much more run-time overhead, or both? Or something else?
>
> The major benefit of the approach of “.DEFERRED_INIT” is to enable us to keep
> the current -Wuninitialized analysis untouched and also pass
> the “uninitialized” info from source code level to “pass_expand”.
>
>
> Well, "untouched" is a bit oversimplified.  You do need to handle
> .DEFERRED_INIT as not being an initialization, which will definitely
> get interesting.
>
>
> Yes, during the uninitialized variable analysis pass, we should specially handle
> the defs with “.DEFERRED_INIT”, to treat them as uninitializations.
>
>
> Are you sure we need to do that?  The point of having the first argument
> to .DEFERRED_INIT was that that argument would still provide an
> uninitialised use of the variable.  And the values are passed and
> returned by value, so the lack of initialisation is explicit in
> the gcall itself, without knowing what the target function does.
>
> The idea is that we can essentially treat .DEFERRED_INIT as a normal
> (const) function call.  I'd be surprised if many passes needed to
> handle it specially.
>
>
> Just checked with a small test case (to emulate the .DEFERRED_INIT
> approach):
>
> qinzhao@gcc10:~/Bugs/auto-init$ cat t.c
> extern int DEFFERED_INIT (int, int) __attribute__ ((const));
>
> int foo (int n, int r)
> {
>  int v;
>
>  v = DEFFERED_INIT (v, 0);
>  if (n < 10)
>v = r;
>
>  return v;
> }
> qinzhao@gcc10:~/Bugs/auto-init$ sh t
> /home/qinzhao/Install/latest_write/bin/gcc -O -Wuninitialized -fdump-tree-all 
> -S t.c
> t.c: In function ‘foo’:
> t.c:7:7: warning: ‘v’ is used uninitialized [-Wuninitialized]
>7 |   v = DEFFERED_INIT (v, 0);
>  |   ^~~~
>
> We can see that the current uninitialized variable analysis treats the
> newly added artificial initialization as the first use of the
> uninitialized variable, and therefore reports the warning there.
> However, we should report the warning at “return v”.
>
>
> Ah, OK, so this is about the quality of the warning, rather than about
> whether we report a warning or not?
>
> So, I think that we still need to specifically handle the newly added
> a

Re: How to traverse all the local variables that declared in the current routine?

2020-12-07 Thread Richard Biener via Gcc-patches
On Mon, Dec 7, 2020 at 5:20 PM Qing Zhao  wrote:
>
>
>
> On Dec 7, 2020, at 1:12 AM, Richard Biener  wrote:
>
> On Fri, Dec 4, 2020 at 5:19 PM Qing Zhao  wrote:
>
>
>
>
> On Dec 4, 2020, at 2:50 AM, Richard Biener  wrote:
>
> On Thu, Dec 3, 2020 at 6:33 PM Richard Sandiford
>  wrote:
>
>
> Richard Biener via Gcc-patches  writes:
>
> On Tue, Nov 24, 2020 at 4:47 PM Qing Zhao  wrote:
>
> Another issue is, in order to check whether an auto-variable has initializer, 
> I plan to add a new bit in “decl_common” as:
> /* In a VAR_DECL, this is DECL_IS_INITIALIZED.  */
> unsigned decl_is_initialized :1;
>
> /* IN VAR_DECL, set when the decl is initialized at the declaration.  */
> #define DECL_IS_INITIALIZED(NODE) \
> (DECL_COMMON_CHECK (NODE)->decl_common.decl_is_initialized)
>
> set this bit when setting DECL_INITIAL for the variables in FE. then keep it
> even though DECL_INITIAL might be NULLed.
>
>
> For locals it would be more reliable to set this flag during gimplification.
>
> Do you have any comment and suggestions?
>
>
> As said above - do you want to cover registers as well as locals?  I'd do
> the actual zeroing during RTL expansion instead since otherwise you
> have to figure out yourself whether a local is actually used (see
> expand_stack_vars)
>
> Note that optimization will already have made use of "uninitialized" state
> of locals so depending on what the actual goal is here "late" may be too late.
>
>
> Haven't thought about this much, so it might be a daft idea, but would a
> compromise be to use a const internal function:
>
> X1 = .DEFERRED_INIT (X0, INIT)
>
> where the X0 argument is an uninitialised value and the INIT argument
> describes the initialisation pattern?  So for a decl we'd have:
>
> X = .DEFERRED_INIT (X, INIT)
>
> and for an SSA name we'd have:
>
> X_2 = .DEFERRED_INIT (X_1(D), INIT)
>
> with all other uses of X_1(D) being replaced by X_2.  The idea is that:
>
> * Having the X0 argument would keep the uninitialised use of the
> variable around for the later warning passes.
>
> * Using a const function should still allow the UB to be deleted as dead
> if X1 isn't needed.
>
> * Having a function in the way should stop passes from taking advantage
> of direct uninitialised uses for optimisation.
>
> This means we won't be able to optimise based on the actual init
> value at the gimple level, but that seems like a fair trade-off.
> AIUI this is really a security feature or anti-UB hardening feature
> (in the sense that users are more likely to see predictable behaviour
> “in the field” even if the program has UB).
>
>
> The question is whether it's in line of peoples expectation that
> explicitely zero-initialized code behaves differently from
> implicitely zero-initialized code with respect to optimization
> and secondary side-effects (late diagnostics, latent bugs, etc.).
>
> Introducing a new concept like .DEFERRED_INIT is much more
> heavy-weight than an explicit zero initializer.
>
>
> What exactly do you mean by “heavy-weight”? More difficult to implement,
> much more run-time overhead, or both? Or something else?
>
> The major benefit of the approach of “.DEFERRED_INIT” is to enable us to keep
> the current -Wuninitialized analysis untouched and also pass
> the “uninitialized” info from source code level to “pass_expand”.
>
>
> Well, "untouched" is a bit oversimplified.  You do need to handle
> .DEFERRED_INIT as not being an initialization, which will definitely
> get interesting.
>
>
> Yes, during the uninitialized variable analysis pass, we should specially handle
> the defs with “.DEFERRED_INIT”, to treat them as uninitializations.
>
> If we want to keep the current -Wuninitialized analysis untouched, this is
> quite a reasonable approach.
>
> However, if it’s not required to keep the current -Wuninitialized analysis 
> untouched, adding a zero-initializer directly during gimplification should
> be much easier and simpler, and also have smaller run-time overhead.
>
>
> As for optimization I fear you'll get a load of redundant zero-init
> actually emitted if you can just rely on RTL DSE/DCE to remove it.
>
>
> Runtime overhead for -fauto-init=zero is one important consideration for
> the whole feature; we should minimize the runtime overhead for zero
> initialization since it will be used in production builds.
> We can do some run-time performance evaluation when we have an implementation 
> ready.
>
>
> Note there will be other passes "confused" by .DEFERRED_INIT.  Note
> that there's going to be other
> considerations - namely where to emit the .DEFERRED_INIT - when
> emitting it during gimplification
> you can emit it at the start of the block of block-scope variables.
> When emitting after gimplification
> you have to emit at function start which will probably make stack slot
> sharing inefficient because
> the deferred init will cause overlapping lifetimes.  With emitting at
> block boundary the .DEFERRED_INIT
> will act as code-motion barrier (and it itself likely cannot be moved)
>

Re: [PATCH] PR target/98152: Checking python is available before using

2020-12-07 Thread Kito Cheng via Gcc-patches
Committed

On Tue, Dec 8, 2020 at 5:33 AM Jim Wilson  wrote:
>
> On Sat, Dec 5, 2020 at 10:12 PM Kito Cheng  wrote:
>
> > gcc/ChangeLog:
> >
> > * config.gcc (riscv*-*-*): Check whether python, python3 or python2
> > is available, and skip doing with_arch canonicalization if no
> > python is available.
> >
>
> Looks good to me.
>
> Jim