Re: [PATCH][Middle-end]79538 missing -Wformat-overflow with %s and non-member array arguments

2017-12-14 Thread Richard Biener
On Wed, 13 Dec 2017, Qing Zhao wrote:

> Hi,
> 
> I updated gimple-fold.c as you suggested, bootstrapped and re-tested on both 
> x86 and aarch64. no any issue.
> 
> 
> diff --git a/gcc/gimple-fold.c b/gcc/gimple-fold.c
> index 353a46e..eb6a87a 100644
> --- a/gcc/gimple-fold.c
> +++ b/gcc/gimple-fold.c
> @@ -1323,6 +1323,19 @@ get_range_strlen (tree arg, tree length[2], bitmap 
> *visited, int type,
>the array could have zero length.  */
> *minlen = ssize_int (0);
>   }
> +
> +   if (VAR_P (arg) 
> +   && TREE_CODE (TREE_TYPE (arg)) == ARRAY_TYPE)
> + {
> +   val = TYPE_SIZE_UNIT (TREE_TYPE (arg));
> +   if (!val || TREE_CODE (val) != INTEGER_CST || integer_zerop (val))
> + return false;
> +   val = wide_int_to_tree (TREE_TYPE (val), 
> +   wi::sub(wi::to_wide (val), 1));
> +   /* Set the minimum size to zero since the string in
> +  the array could have zero length.  */
> +   *minlen = ssize_int (0);
> + }
>   }
> 
> 
> I plan to commit the change very soon. 
> let me know if you have further comment.

Looks good to me.

Richard.

> thanks.
> 
> Qing
> 
> ==
> 
> the updated full patch is as following:
> 
> gcc/ChangeLog
> 
> 2017-12-13  Qing Zhao  
> 
>  PR middle_end/79538
>  * gimple-fold.c (get_range_strlen): Add the handling of non-member array.
> 
> gcc/fortran/ChangeLog
> 
> 2017-12-13  Qing Zhao  
> 
>   PR middle_end/79538
>   * class.c (gfc_build_class_symbol): Replace call to
>   sprintf with xasprintf to avoid format-overflow warning.
>   (generate_finalization_wrapper): Likewise.
>   (gfc_find_derived_vtab): Likewise.
>   (find_intrinsic_vtab): Likewise.
> 
> 
> gcc/c-family/ChangeLog
> 
> 2017-12-13  Qing Zhao  
> 
>  PR middle_end/79538 
>   * c-cppbuiltin.c (builtin_define_with_hex_fp_value):
>  Adjust the size of buf1 and buf2, add a new buf to avoid
>  format-overflow warning.
> 
> gcc/testsuite/ChangeLog
> 
> 2017-12-13  Qing Zhao  
> 
>  PR middle_end/79538
>  * gcc.dg/pr79538.c: New test.
> 
> ---
>  gcc/c-family/c-cppbuiltin.c| 10 -
>  gcc/fortran/class.c| 49 
> --
>  gcc/gimple-fold.c  | 13 +++
>  gcc/testsuite/gcc.dg/pr79538.c | 23 
>  4 files changed, 69 insertions(+), 26 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.dg/pr79538.c
> 
> diff --git a/gcc/c-family/c-cppbuiltin.c b/gcc/c-family/c-cppbuiltin.c
> index 2ac9616..9e33aed 100644
> --- a/gcc/c-family/c-cppbuiltin.c
> +++ b/gcc/c-family/c-cppbuiltin.c
> @@ -1613,7 +1613,7 @@ builtin_define_with_hex_fp_value (const char *macro,
> const char *fp_cast)
>  {
>REAL_VALUE_TYPE real;
> -  char dec_str[64], buf1[256], buf2[256];
> +  char dec_str[64], buf[256], buf1[128], buf2[64];
>  
>/* This is very expensive, so if possible expand them lazily.  */
>if (lazy_hex_fp_value_count < 12
> @@ -1656,11 +1656,11 @@ builtin_define_with_hex_fp_value (const char *macro,
>  
>/* Assemble the macro in the following fashion
>   macro = fp_cast [dec_str fp_suffix] */
> -  sprintf (buf1, "%s%s", dec_str, fp_suffix);
> -  sprintf (buf2, fp_cast, buf1);
> -  sprintf (buf1, "%s=%s", macro, buf2);
> +  sprintf (buf2, "%s%s", dec_str, fp_suffix);
> +  sprintf (buf1, fp_cast, buf2);
> +  sprintf (buf, "%s=%s", macro, buf1);
>  
> -  cpp_define (parse_in, buf1);
> +  cpp_define (parse_in, buf);
>  }
>  
>  /* Return a string constant for the suffix for a value of type TYPE
> diff --git a/gcc/fortran/class.c b/gcc/fortran/class.c
> index ebbd41b..a08fb8d 100644
> --- a/gcc/fortran/class.c
> +++ b/gcc/fortran/class.c
> @@ -602,7 +602,8 @@ bool
>  gfc_build_class_symbol (gfc_typespec *ts, symbol_attribute *attr,
>   gfc_array_spec **as)
>  {
> -  char name[GFC_MAX_SYMBOL_LEN+1], tname[GFC_MAX_SYMBOL_LEN+1];
> +  char tname[GFC_MAX_SYMBOL_LEN+1];
> +  char *name;
>gfc_symbol *fclass;
>gfc_symbol *vtab;
>gfc_component *c;
> @@ -633,17 +634,17 @@ gfc_build_class_symbol (gfc_typespec *ts, 
> symbol_attribute *attr,
>rank = !(*as) || (*as)->rank == -1 ? GFC_MAX_DIMENSIONS : (*as)->rank;
>get_unique_hashed_string (tname, ts->u.derived);
>if ((*as) && attr->allocatable)
> -sprintf (name, "__class_%s_%d_%da", tname, rank, (*as)->corank);
> +name = xasprintf ("__class_%s_%d_%da", tname, rank, (*as)->corank);
>else if ((*as) && attr->pointer)
> -sprintf (name, "__class_%s_%d_%dp", tname, rank, (*as)->corank);
> +name = xasprintf ("__class_%s_%d_%dp", tname, rank, (*as)->corank);
>else if ((*as))
> -sprintf (name, "__class_%s_%d_%dt", tname, rank, (*as)->corank);
> +name = xasprintf ("__class_%s_%d_%dt", tname, rank, (*as)->corank);
>else if (attr->pointer)
> -sprintf (name, "__class_%s_p", tn

[AARCH64] Neon vld1_*_x3, vst1_*_x2 and vst1_*_x3 intrinsics

2017-12-14 Thread Sameera Deshpande
Hi!

Please find attached the patch implementing vld1_*_x3, vst1_*_x2 and
vst1_*_x3 intrinsics as defined by Neon document.

Ok for trunk?

- Thanks and regards,
  Sameera D.

gcc/Changelog:

2017-11-14  Sameera Deshpande  


* config/aarch64/aarch64-simd-builtins.def (ld1x3): New.
(st1x2): Likewise.
(st1x3): Likewise.
* config/aarch64/aarch64-simd.md
(aarch64_ld1x3): New pattern.
(aarch64_ld1_x3_): Likewise
(aarch64_st1x2): Likewise
(aarch64_st1_x2_): Likewise
(aarch64_st1x3): Likewise
(aarch64_st1_x3_): Likewise
* config/aarch64/arm_neon.h (vld1_u8_x3): New function.
(vld1_s8_x3): Likewise.
(vld1_u16_x3): Likewise.
(vld1_s16_x3): Likewise.
(vld1_u32_x3): Likewise.
(vld1_s32_x3): Likewise.
(vld1_u64_x3): Likewise.
(vld1_s64_x3): Likewise.
(vld1_fp16_x3): Likewise.
(vld1_f32_x3): Likewise.
(vld1_f64_x3): Likewise.
(vld1_p8_x3): Likewise.
(vld1_p16_x3): Likewise.
(vld1_p64_x3): Likewise.
(vld1q_u8_x3): Likewise.
(vld1q_s8_x3): Likewise.
(vld1q_u16_x3): Likewise.
(vld1q_s16_x3): Likewise.
(vld1q_u32_x3): Likewise.
(vld1q_s32_x3): Likewise.
(vld1q_u64_x3): Likewise.
(vld1q_s64_x3): Likewise.
(vld1q_f16_x3): Likewise.
(vld1q_f32_x3): Likewise.
(vld1q_f64_x3): Likewise.
(vld1q_p8_x3): Likewise.
(vld1q_p16_x3): Likewise.
(vld1q_p64_x3): Likewise.
(vst1_s64_x2): Likewise.
(vst1_u64_x2): Likewise.
(vst1_f64_x2): Likewise.
(vst1_s8_x2): Likewise.
(vst1_p8_x2): Likewise.
(vst1_s16_x2): Likewise.
(vst1_p16_x2): Likewise.
(vst1_s32_x2): Likewise.
(vst1_u8_x2): Likewise.
(vst1_u16_x2): Likewise.
(vst1_u32_x2): Likewise.
(vst1_f16_x2): Likewise.
(vst1_f32_x2): Likewise.
(vst1_p64_x2): Likewise.
(vst1q_s8_x2): Likewise.
(vst1q_p8_x2): Likewise.
(vst1q_s16_x2): Likewise.
(vst1q_p16_x2): Likewise.
(vst1q_s32_x2): Likewise.
(vst1q_s64_x2): Likewise.
(vst1q_u8_x2): Likewise.
(vst1q_u16_x2): Likewise.
(vst1q_u32_x2): Likewise.
(vst1q_u64_x2): Likewise.
(vst1q_f16_x2): Likewise.
(vst1q_f32_x2): Likewise.
(vst1q_f64_x2): Likewise.
(vst1q_p64_x2): Likewise.
(vst1_s64_x3): Likewise.
(vst1_u64_x3): Likewise.
(vst1_f64_x3): Likewise.
(vst1_s8_x3): Likewise.
(vst1_p8_x3): Likewise.
(vst1_s16_x3): Likewise.
(vst1_p16_x3): Likewise.
(vst1_s32_x3): Likewise.
(vst1_u8_x3): Likewise.
(vst1_u16_x3): Likewise.
(vst1_u32_x3): Likewise.
(vst1_f16_x3): Likewise.
(vst1_f32_x3): Likewise.
(vst1_p64_x3): Likewise.
(vst1q_s8_x3): Likewise.
(vst1q_p8_x3): Likewise.
(vst1q_s16_x3): Likewise.
(vst1q_p16_x3): Likewise.
(vst1q_s32_x3): Likewise.
(vst1q_s64_x3): Likewise.
(vst1q_u8_x3): Likewise.
(vst1q_u16_x3): Likewise.
(vst1q_u32_x3): Likewise.
(vst1q_u64_x3): Likewise.
(vst1q_f16_x3): Likewise.
(vst1q_f32_x3): Likewise.
(vst1q_f64_x3): Likewise.
(vst1q_p64_x3): Likewise.
diff --git a/gcc/config/aarch64/aarch64-simd-builtins.def b/gcc/config/aarch64/aarch64-simd-builtins.def
index 52d01342372..fa623e90017 100644
--- a/gcc/config/aarch64/aarch64-simd-builtins.def
+++ b/gcc/config/aarch64/aarch64-simd-builtins.def
@@ -441,6 +441,15 @@
   BUILTIN_VALL_F16 (STORE1, st1, 0)
   VAR1(STORE1P, st1, 0, v2di)
 
+  /* Implemented by aarch64_ld1x3.  */
+  BUILTIN_VALLDIF (LOADSTRUCT, ld1x3, 0)
+
+  /* Implemented by aarch64_st1x2.  */
+  BUILTIN_VALLDIF (STORESTRUCT, st1x2, 0)
+
+  /* Implemented by aarch64_st1x3.  */
+  BUILTIN_VALLDIF (STORESTRUCT, st1x3, 0)
+
   /* Implemented by fma4.  */
   BUILTIN_VHSDF (TERNOP, fma, 4)
   VAR1 (TERNOP, fma, 4, hf)
diff --git a/gcc/config/aarch64/aarch64-simd.md b/gcc/config/aarch64/aarch64-simd.md
index 4fd34c18f95..852bcf0c16a 100644
--- a/gcc/config/aarch64/aarch64-simd.md
+++ b/gcc/config/aarch64/aarch64-simd.md
@@ -5038,6 +5038,70 @@
 }
 })
 
+
+(define_expand "aarch64_ld1x3"
+  [(match_operand:CI 0 "register_operand" "=w")
+   (match_operand:DI 1 "register_operand" "r")
+   (unspec:VALLDIF [(const_int 0)] UNSPEC_VSTRUCTDUMMY)]
+  "TARGET_SIMD"
+{
+  rtx mem = gen_rtx_MEM (CImode, operands[1]);
+  emit_insn (gen_aarch64_ld1_x3_ (operands[0], mem));
+  DONE;
+})
+
+(define_insn "aarch64_ld1_x3_"
+  [(set (match_operand:CI 0 "register_operand" "=w")
+(unspec:CI
+	  [(match_operand:CI 1 "aarch64_simd_struct_operand" "Utv")
+	   (unspec:VALLDIF [(const_int 3)] UNSPEC_VSTRUCTDUMMY)] UNSPEC_LD1))]
+  "TARGET_SIMD"
+  "ld1\\t{%S0. - %U0.}, %1"
+  [(set_attr "type" "neon_load1_3reg")

[PATCH] Fix PR83418

2017-12-14 Thread Richard Biener

IVOPTs (at least) leaves unfolded stmts in the IL and VRP overzealously
asserts they cannot happen.

Bootstrap and regtest running on x86_64-unknown-linux-gnu.

Richard.

2017-12-14  Richard Biener  

PR tree-optimization/83418
* vr-values.c (vr_values::extract_range_for_var_from_comparison_expr):
Instead of asserting we don't get unfolded comparisons deal with
them.

* gcc.dg/torture/pr83418.c: New testcase.

Index: gcc/vr-values.c
===
--- gcc/vr-values.c (revision 255622)
+++ gcc/vr-values.c (working copy)
@@ -445,11 +445,12 @@ vr_values::extract_range_for_var_from_co
   tree  min, max, type;
   value_range *limit_vr;
   type = TREE_TYPE (var);
-  gcc_assert (limit != var);
 
   /* For pointer arithmetic, we only keep track of pointer equality
- and inequality.  */
-  if (POINTER_TYPE_P (type) && cond_code != NE_EXPR && cond_code != EQ_EXPR)
+ and inequality.  If we arrive here with unfolded conditions like
+ _1 > _1 do not derive anything.  */
+  if ((POINTER_TYPE_P (type) && cond_code != NE_EXPR && cond_code != EQ_EXPR)
+  || limit == var)
 {
   set_value_range_to_varying (vr_p);
   return;
Index: gcc/testsuite/gcc.dg/torture/pr83418.c
===
--- gcc/testsuite/gcc.dg/torture/pr83418.c  (nonexistent)
+++ gcc/testsuite/gcc.dg/torture/pr83418.c  (working copy)
@@ -0,0 +1,17 @@
+/* { dg-do compile } */
+
+void
+yj (int j4)
+{
+  int t3;
+
+  for (t3 = 0; t3 < 6; ++t3)
+{
+  short int v4 = t3;
+
+  if (v4 == j4 || v4 > t3)
+   for (;;)
+ {
+ }
+}
+}


[PATCH][C] Fix PR83415

2017-12-14 Thread Richard Biener

The C FE fails to handle VIEW_CONVERT_EXPR properly when folding
it in a lvalue context.  Fixed as follows.

Bootstrap and regtest running on x86_64-unknown-linux-gnu.

Will commit as obvious.

Thanks,
Richard.

2017-12-14  Richard Biener  

PR c/83415
c/
* c-fold.c (c_fully_fold_internal): Treat VIEW_CONVERT_EXPR
like REALPART_EXPR for the behavior of whether its operand
is an lvalue.

* gcc.dg/pr83415.c: New testcase.

Index: gcc/c/c-fold.c
===
--- gcc/c/c-fold.c  (revision 255622)
+++ gcc/c/c-fold.c  (working copy)
@@ -434,6 +434,7 @@ c_fully_fold_internal (tree expr, bool i
   goto unary;
 case REALPART_EXPR:
 case IMAGPART_EXPR:
+case VIEW_CONVERT_EXPR:
   op0_lval = lval;
   /* FALLTHRU */
 case INDIRECT_REF:
@@ -441,7 +442,6 @@ c_fully_fold_internal (tree expr, bool i
 case FLOAT_EXPR:
 CASE_CONVERT:
 case ADDR_SPACE_CONVERT_EXPR:
-case VIEW_CONVERT_EXPR:
 case NON_LVALUE_EXPR:
 case NEGATE_EXPR:
 case BIT_NOT_EXPR:
Index: gcc/testsuite/gcc.dg/pr83415.c
===
--- gcc/testsuite/gcc.dg/pr83415.c  (nonexistent)
+++ gcc/testsuite/gcc.dg/pr83415.c  (working copy)
@@ -0,0 +1,12 @@
+/* { dg-do compile } */
+/* { dg-options "-O" } */
+
+const short __attribute__((vector_size(16))) y = { 0, 1, 2, 3, 4, 5, 6, 7 };
+
+int
+main (int argc, short *argv[])
+{
+  int i = argc;
+  y[i] = 7 - i; /* { dg-warning "read-only" } */
+  return 0;
+}


[PATCH] Fix gimple-ssa-sprintf.c ICE (PR tree-optimization/83198, take 2)

2017-12-14 Thread Jakub Jelinek
On Wed, Dec 13, 2017 at 03:55:13PM -0700, Martin Sebor wrote:
> For the second part, can you please also add a compile-time test
> to verify that the result isn't constrained to the same range as
> with a real argument?  Checking that the abort below isn't
> eliminated would do it for %f:
> 
>   void f (char *d)
>   {
> int n = __builtin_sprintf (d, "%f", 1.0Q);
> if (n < 8 || 13 < n)
>   __builtin_abort ();
>   }
> 
> A test that has a convenient setup for this is tree-ssa/builtin-
> sprintf-2.c in case you want to add to it.

Can't use that test, because I need __float128, add options for it etc.

Anyway, here is the same patch with added extra test, tested on x86_64-linux
and powerpc64-linux.  The tree-ssa test FAILs without the patch on both
targets.

2017-12-14  Jakub Jelinek  

PR tree-optimization/83198
* gimple-ssa-sprintf.c (format_floating): Set type solely based on
dir.modifier, regardless of TREE_TYPE (arg).  Assume non-REAL_CST
value if arg is a REAL_CST with incompatible type.

* gcc.dg/pr83198.c: New test.
* gcc.dg/tree-ssa/pr83198.c: New test.

--- gcc/gimple-ssa-sprintf.c.jj 2017-11-03 15:37:04.0 +0100
+++ gcc/gimple-ssa-sprintf.c2017-12-13 13:37:59.289435623 +0100
@@ -1885,6 +1885,8 @@ static fmtresult
 format_floating (const directive &dir, tree arg)
 {
   HOST_WIDE_INT prec[] = { dir.prec[0], dir.prec[1] };
+  tree type = (dir.modifier == FMT_LEN_L || dir.modifier == FMT_LEN_ll
+  ? long_double_type_node : double_type_node);
 
   /* For an indeterminate precision the lower bound must be assumed
  to be zero.  */
@@ -1892,10 +1894,6 @@ format_floating (const directive &dir, t
 {
   /* Get the number of fractional decimal digits needed to represent
 the argument without a loss of accuracy.  */
-  tree type = arg ? TREE_TYPE (arg) :
-   (dir.modifier == FMT_LEN_L || dir.modifier == FMT_LEN_ll
-? long_double_type_node : double_type_node);
-
   unsigned fmtprec
= REAL_MODE_FORMAT (TYPE_MODE (type))->p;
 
@@ -1946,7 +1944,9 @@ format_floating (const directive &dir, t
}
 }
 
-  if (!arg || TREE_CODE (arg) != REAL_CST)
+  if (!arg
+  || TREE_CODE (arg) != REAL_CST
+  || !useless_type_conversion_p (type, TREE_TYPE (arg)))
 return format_floating (dir, prec);
 
   /* The minimum and maximum number of bytes produced by the directive.  */
--- gcc/testsuite/gcc.dg/pr83198.c.jj   2017-12-13 13:43:36.056192309 +0100
+++ gcc/testsuite/gcc.dg/pr83198.c  2017-12-13 13:47:11.716474956 +0100
@@ -0,0 +1,18 @@
+/* PR tree-optimization/83198 */
+/* { dg-do compile } */
+/* { dg-options "-Wall -Wno-format" } */
+
+int
+foo (char *d[6], int x)
+{
+  int r = 0;
+  r += __builtin_sprintf (d[0], "%f", x);
+  r += __builtin_sprintf (d[1], "%a", x);
+  r += __builtin_sprintf (d[2], "%f", "foo");
+  r += __builtin_sprintf (d[3], "%a", "bar");
+#ifdef __SIZEOF_FLOAT128__
+  r += __builtin_sprintf (d[4], "%a", 1.0Q);
+  r += __builtin_sprintf (d[5], "%Lf", 1.0Q);
+#endif
+  return r;
+}
--- gcc/testsuite/gcc.dg/tree-ssa/pr83198.c.jj  2017-12-14 09:43:07.534102549 
+0100
+++ gcc/testsuite/gcc.dg/tree-ssa/pr83198.c 2017-12-14 09:52:39.772964559 
+0100
@@ -0,0 +1,23 @@
+/* PR tree-optimization/83198 */
+/* { dg-do compile { target __float128 } } */
+/* { dg-options "-O2 -fprintf-return-value -Wno-format -fdump-tree-optimized" 
} */
+/* { dg-add-options __float128 } */
+
+void bar (void);
+void link_error (void);
+
+void
+foo (char *x)
+{
+  int a = __builtin_sprintf (x, "%f", 1.0Q);
+  if (a < 8)
+link_error ();
+  if (a > 13)
+bar ();
+  if (a > 322)
+link_error ();
+}
+
+/* Verify we don't optimize return value to [8, 13].  */
+/* { dg-final { scan-tree-dump-not "link_error \\(\\);" "optimized" } } */
+/* { dg-final { scan-tree-dump "bar \\(\\);" "optimized" } } */


Jakub


Re: [PATCH] Fix Bug 83237 - Values returned by std::poisson_distribution are not distributed correctly

2017-12-14 Thread mpezz
  If Luc's explicit green light will not arrive before it is decision
time, Paolo's point 2- below is doable.

Il 13.12.2017 12:51 Jonathan
Wakely ha scritto: 

> On 12/12/17 21:37 +0100, Paolo Carlini wrote:
>

>> Hi, On 12/12/2017 19:42, Michele Pezzutti wrote: 
>> 
>>> Hi. Yes, I
looked at the text before submitting the patch. I contacted Devroye and
he confirmed that another reader had also pointed out this bug but not
the solution. I sent him my proposed patch, he will look into it (no
idea when though).
>> Nice. 
>> 
>>> I would state that "comparison
function for x = 1 is e^(1/78)" (which becomes 1/78 as the algorithm
uses log-probabilities). I think the change is needed because otherwise,
for that particular bin, the rejection probability is lower than it
should be, resulting in a higher number of samples.
>> Ok. Ideally I
would be much less nervous about committing the patch if we either 1-
Had Luc's explicit green light; 2- Were able to *rigorously deduce*
within the framework of the book why the change is needed. That said,
the patch makes sense to me and so far holds up well in all my tests
(I'm currently running a full make check). I would say, let's wait a
week or so and then make the final decision. Jon, do you agree? Ideas
about further testing? (eg, some code you are aware of stressing
Poisson?)
> 
> No, I have nothing useful to add here, but I CC'd Ed on
the PR as I'd
> like his input.
  


Con Mobile Open 7 GB a 9 euro/4 sett navighi veloce con 7 GB di Internet e hai 
200 minuti ed SMS a 12 cent. Passa a Tiscali Mobile! 
http://tisca.li/OPEN7GBFirma



[PATCH] Fix PR83326

2017-12-14 Thread Richard Biener

After the previous fix of mine to no longer peel loops during cunrolli
to avoid littering an outer loop with tons of loop exit tests and thus
disabling vectorization there's a reported slowdown in the exchange
benchmark.  This is because we no longer fully unroll a loop with
appearant constant iterations early.  This is a long-standing issue
of cunrolli running before loop header copying and thus receiving
niters like (a < b) ? 2 : 0.  The fix is to treat zero or constant
iteration counts like if they were constant, doing loop-header copying
on-the fly during unrolling by simply not removing the first peeled
copies exit test.

Bootstrap and regtest running on x86_64-unknown-linux-gnu.

Richard.

2017-12-14  Richard Biener  

PR tree-optimization/83326
* tree-ssa-loop-ivcanon.c (try_unroll_loop_completely): Add
may_be_zero parameter and handle it by not marking the first
peeled copy as not exiting the loop.
(try_peel_loop): Likewise.
(canonicalize_loop_induction_variables): Use number_of_iterations_exit
to handle the case of constant or zero iterations and perform
loop header copying on-the-fly.

* gcc.dg/tree-ssa/pr81388-2.c: Adjust.

Index: gcc/tree-ssa-loop-ivcanon.c
===
--- gcc/tree-ssa-loop-ivcanon.c (revision 255622)
+++ gcc/tree-ssa-loop-ivcanon.c (working copy)
@@ -681,7 +681,7 @@ unloop_loops (bitmap loop_closed_ssa_inv
 
 static bool
 try_unroll_loop_completely (struct loop *loop,
-   edge exit, tree niter,
+   edge exit, tree niter, bool may_be_zero,
enum unroll_level ul,
HOST_WIDE_INT maxiter,
location_t locus, bool allow_peel)
@@ -893,6 +893,8 @@ try_unroll_loop_completely (struct loop
  exit = NULL;
  bitmap_clear (wont_exit);
}
+  if (may_be_zero)
+   bitmap_clear_bit (wont_exit, 1);
 
   if (!gimple_duplicate_loop_to_header_edge (loop, loop_preheader_edge 
(loop),
 n_unroll, wont_exit,
@@ -977,7 +979,7 @@ estimated_peeled_sequence_size (struct l
 
 static bool
 try_peel_loop (struct loop *loop,
-  edge exit, tree niter,
+  edge exit, tree niter, bool may_be_zero,
   HOST_WIDE_INT maxiter)
 {
   HOST_WIDE_INT npeel;
@@ -1080,6 +1082,8 @@ try_peel_loop (struct loop *loop,
   exit = NULL;
   bitmap_clear (wont_exit);
 }
+  if (may_be_zero)
+bitmap_clear_bit (wont_exit, 1);
   if (!gimple_duplicate_loop_to_header_edge (loop, loop_preheader_edge (loop),
 npeel, wont_exit,
 exit, &edges_to_remove,
@@ -1152,13 +1156,35 @@ canonicalize_loop_induction_variables (s
   HOST_WIDE_INT maxiter;
   bool modified = false;
   location_t locus = UNKNOWN_LOCATION;
+  struct tree_niter_desc niter_desc;
+  bool may_be_zero = false;
 
-  niter = number_of_latch_executions (loop);
+  /* For unrolling allow conditional constant or zero iterations, thus
+ perform loop-header copying on-the-fly.  */
   exit = single_exit (loop);
+  niter = chrec_dont_know;
+  if (exit && number_of_iterations_exit (loop, exit, &niter_desc, false))
+{
+  niter = niter_desc.niter;
+  may_be_zero
+   = niter_desc.may_be_zero && !integer_zerop (niter_desc.may_be_zero);
+}
   if (TREE_CODE (niter) == INTEGER_CST)
 locus = gimple_location (last_stmt (exit->src));
   else
 {
+  /* For non-constant niter fold may_be_zero into niter again.  */
+  if (may_be_zero)
+   {
+ if (COMPARISON_CLASS_P (niter_desc.may_be_zero))
+   niter = fold_build3 (COND_EXPR, TREE_TYPE (niter),
+niter_desc.may_be_zero,
+build_int_cst (TREE_TYPE (niter), 0), niter);
+ else
+   niter = chrec_dont_know;
+ may_be_zero = false;
+   }
+
   /* If the loop has more than one exit, try checking all of them
 for # of iterations determinable through scev.  */
   if (!exit)
@@ -1213,17 +1239,31 @@ canonicalize_loop_induction_variables (s
  populates the loop bounds.  */
   modified |= remove_redundant_iv_tests (loop);
 
-  if (try_unroll_loop_completely (loop, exit, niter, ul, maxiter, locus,
- allow_peel))
+  if (try_unroll_loop_completely (loop, exit, niter, may_be_zero, ul,
+ maxiter, locus, allow_peel))
 return true;
 
   if (create_iv
   && niter && !chrec_contains_undetermined (niter)
   && exit && just_once_each_iteration_p (loop, exit->src))
-create_canonical_iv (loop, exit, niter);
+{
+  tree iv_niter = niter;
+  if (may_be_zero)
+   {
+ if (COMPARISON_CLASS_P (niter_desc.may_be_zero))
+   iv_niter = fold_b

Re: [PATCH] Fix gimple-ssa-sprintf.c ICE (PR tree-optimization/83198)

2017-12-14 Thread Richard Biener
On Wed, 13 Dec 2017, Jakub Jelinek wrote:

> Hi!
> 
> This patch fixes 2 issues in format_floating.  One is that when determining
> precision, we should consider solely the type *printf* will read the
> argument as (i.e. double unless L or ll modifier is used, in which case
> long double), not the type of the argument, because the corresponding
> argument could have any type, even not floating, or say __float128 etc.
> 
> This is fixed in the first 2 hunks.
> 
> The last hunk is to treat REAL_CSTs arguments of incompatible types
> as unknown argument, we really don't know what say __float128 passed to
> %f or double passed to %La will do; that is something diagnosed by -Wformat,
> so the patch just treats it as arbitrary value of the type.
> 
> Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

Ok.

Richard.

> 2017-12-13  Jakub Jelinek  
> 
>   PR tree-optimization/83198
>   * gimple-ssa-sprintf.c (format_floating): Set type solely based on
>   dir.modifier, regardless of TREE_TYPE (arg).  Assume non-REAL_CST
>   value if arg is a REAL_CST with incompatible type.
> 
>   * gcc.dg/pr83198.c: New test.
> 
> --- gcc/gimple-ssa-sprintf.c.jj   2017-11-03 15:37:04.0 +0100
> +++ gcc/gimple-ssa-sprintf.c  2017-12-13 13:37:59.289435623 +0100
> @@ -1885,6 +1885,8 @@ static fmtresult
>  format_floating (const directive &dir, tree arg)
>  {
>HOST_WIDE_INT prec[] = { dir.prec[0], dir.prec[1] };
> +  tree type = (dir.modifier == FMT_LEN_L || dir.modifier == FMT_LEN_ll
> +? long_double_type_node : double_type_node);
>  
>/* For an indeterminate precision the lower bound must be assumed
>   to be zero.  */
> @@ -1892,10 +1894,6 @@ format_floating (const directive &dir, t
>  {
>/* Get the number of fractional decimal digits needed to represent
>the argument without a loss of accuracy.  */
> -  tree type = arg ? TREE_TYPE (arg) :
> - (dir.modifier == FMT_LEN_L || dir.modifier == FMT_LEN_ll
> -  ? long_double_type_node : double_type_node);
> -
>unsigned fmtprec
>   = REAL_MODE_FORMAT (TYPE_MODE (type))->p;
>  
> @@ -1946,7 +1944,9 @@ format_floating (const directive &dir, t
>   }
>  }
>  
> -  if (!arg || TREE_CODE (arg) != REAL_CST)
> +  if (!arg
> +  || TREE_CODE (arg) != REAL_CST
> +  || !useless_type_conversion_p (type, TREE_TYPE (arg)))
>  return format_floating (dir, prec);
>  
>/* The minimum and maximum number of bytes produced by the directive.  */
> --- gcc/testsuite/gcc.dg/pr83198.c.jj 2017-12-13 13:43:36.056192309 +0100
> +++ gcc/testsuite/gcc.dg/pr83198.c2017-12-13 13:47:11.716474956 +0100
> @@ -0,0 +1,18 @@
> +/* PR tree-optimization/83198 */
> +/* { dg-do compile } */
> +/* { dg-options "-Wall -Wno-format" } */
> +
> +int
> +foo (char *d[6], int x)
> +{
> +  int r = 0;
> +  r += __builtin_sprintf (d[0], "%f", x);
> +  r += __builtin_sprintf (d[1], "%a", x);
> +  r += __builtin_sprintf (d[2], "%f", "foo");
> +  r += __builtin_sprintf (d[3], "%a", "bar");
> +#ifdef __SIZEOF_FLOAT128__
> +  r += __builtin_sprintf (d[4], "%a", 1.0Q);
> +  r += __builtin_sprintf (d[5], "%Lf", 1.0Q);
> +#endif
> +  return r;
> +}
> 
>   Jakub
> 
> 

-- 
Richard Biener 
SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Graham Norton, HRB 
21284 (AG Nuernberg)


Re: [PATCH] Fix https://gcc.gnu.org/bugzilla/show_bug.cgi?id=83396#c28 on ia64 (PR bootstrap/83396, take 2)

2017-12-14 Thread Richard Biener
On Wed, 13 Dec 2017, Jakub Jelinek wrote:

> On Wed, Dec 13, 2017 at 03:25:07PM +0100, Jakub Jelinek wrote:
> > I think there are 2 issues.  One is that the ia64 backend emits
> > the group barrier insns before BB_HEAD label, so it isn't part of a bb,
> > but has BLOCK_FOR_INSN of the following block, that looks invalid to me
> > and the ia64.c hunk ought to fix that, except that I don't have access to
> > ia64 anymore and so can't test it.  Andreas, could you try that?
> > 
> > Another thing is that if we because of this end up with insns outside of
> > basic blocks, the vt_initialize asserts will fire again.  Here, first of
> > all, IMNSHO we should assert that debug bind insns aren't outside of basic
> > blocks, the other patches and checking should ensure that (and if any slips
> > in, we want to fix that too rather than work-around).
> > Another is that while walking from get_first_insn to one before BB_HEAD 
> > (bb->next_bb),
> > we can encounter insns outside of bb not just before BB_HEAD (bb), but also
> > after BB_END (bb), both cases are outside of a bb and thus we can
> > expect BLOCK_FOR_INSN being NULL.
> > 
> > Bootstrapped/regtested on x86_64-linux, i686-linux, powerpc64le-linux,
> > regtest on powerpc64-linux pending.  Ok for trunk perhaps without the
> > ia64.c bits until that gets tested?
> > 
> > Or, in the PR there is a variant patch which just doesn't do the asserts and
> > doesn't have to track outside_bb.
> 
> Here is another variant, without trying to change ia64 backend which
> apparently doesn't bootstrap for other reasons.
> 
> This patch instead ignores insns outside of basic blocks during var-tracking
> exactly as it has been ignoring them before, and just processes the debug
> begin stmt markers in there (and verifies no debug bind stmts appear in
> between bbs).
> 
> Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

Ok.

Richard.

> 2017-12-13  Jakub Jelinek  
> 
>   PR bootstrap/83396
>   * var-tracking.c (vt_initialize): Ignore non-DEBUG_INSNs outside of
>   basic blocks.  Assert debug bind insns don't appear outside of bbs,
>   don't reset them.  Assert insns without BLOCK_FOR_INSN are outside of
>   bb.  Simplify.
> 
>   * gcc.dg/pr83396.c: New test.
> 
> --- gcc/var-tracking.c.jj 2017-12-13 13:22:59.651783152 +0100
> +++ gcc/var-tracking.c2017-12-13 19:11:13.895699735 +0100
> @@ -10157,25 +10157,31 @@ vt_initialize (void)
>insns that might be before it too.  Unfortunately,
>BB_HEADER and BB_FOOTER are not set while we run this
>pass.  */
> -   insn = get_first_insn (bb);
> -   for (rtx_insn *next;
> -insn != BB_HEAD (bb->next_bb)
> -  ? next = NEXT_INSN (insn), true : false;
> +   rtx_insn *next;
> +   bool outside_bb = true;
> +   for (insn = get_first_insn (bb); insn != BB_HEAD (bb->next_bb);
>  insn = next)
>   {
> +   if (insn == BB_HEAD (bb))
> + outside_bb = false;
> +   else if (insn == NEXT_INSN (BB_END (bb)))
> + outside_bb = true;
> +   next = NEXT_INSN (insn);
> if (INSN_P (insn))
>   {
> +   if (outside_bb)
> + {
> +   /* Ignore non-debug insns outside of basic blocks.  */
> +   if (!DEBUG_INSN_P (insn))
> + continue;
> +   /* Debug binds shouldn't appear outside of bbs.  */
> +   gcc_assert (!DEBUG_BIND_INSN_P (insn));
> + }
> basic_block save_bb = BLOCK_FOR_INSN (insn);
> if (!BLOCK_FOR_INSN (insn))
>   {
> +   gcc_assert (outside_bb);
> BLOCK_FOR_INSN (insn) = bb;
> -   gcc_assert (DEBUG_INSN_P (insn));
> -   /* Reset debug insns between basic blocks.
> -  Their location is not reliable, because they
> -  were probably not maintained up to date.  */
> -   if (DEBUG_BIND_INSN_P (insn))
> - INSN_VAR_LOCATION_LOC (insn)
> -   = gen_rtx_UNKNOWN_VAR_LOC ();
>   }
> else
>   gcc_assert (BLOCK_FOR_INSN (insn) == bb);
> --- gcc/testsuite/gcc.dg/pr83396.c.jj 2017-12-13 15:53:15.446687005 +0100
> +++ gcc/testsuite/gcc.dg/pr83396.c2017-12-13 15:53:15.446687005 +0100
> @@ -0,0 +1,12 @@
> +/* PR bootstrap/83396 */
> +/* { dg-do compile } */
> +/* { dg-options "-O2 -g" } */
> +
> +int bar (int);
> +int baz (int);
> +
> +int
> +foo (int x)
> +{
> +  return bar (x) || baz (x) != 0;
> +}
> 
> 
>   Jakub
> 
> 

-- 
Richard Biener 
SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Graham Norton, HRB 
21284 (AG Nuernberg)


Re: [PATCH] Fix gimple-ssa-sprintf.c ICE (PR tree-optimization/83198, take 2)

2017-12-14 Thread Richard Biener
On Thu, 14 Dec 2017, Jakub Jelinek wrote:

> On Wed, Dec 13, 2017 at 03:55:13PM -0700, Martin Sebor wrote:
> > For the second part, can you please also add a compile-time test
> > to verify that the result isn't constrained to the same range as
> > with a real argument?  Checking that the abort below isn't
> > eliminated would do it for %f:
> > 
> >   void f (char *d)
> >   {
> > int n = __builtin_sprintf (d, "%f", 1.0Q);
> > if (n < 8 || 13 < n)
> >   __builtin_abort ();
> >   }
> > 
> > A test that has a convenient setup for this is tree-ssa/builtin-
> > sprintf-2.c in case you want to add to it.
> 
> Can't use that test, because I need __float128, add options for it etc.
> 
> Anyway, here is the same patch with added extra test, tested on x86_64-linux
> and powerpc64-linux.  The tree-ssa test FAILs without the patch on both
> targets.

Ok

> 2017-12-14  Jakub Jelinek  
> 
>   PR tree-optimization/83198
>   * gimple-ssa-sprintf.c (format_floating): Set type solely based on
>   dir.modifier, regardless of TREE_TYPE (arg).  Assume non-REAL_CST
>   value if arg is a REAL_CST with incompatible type.
> 
>   * gcc.dg/pr83198.c: New test.
>   * gcc.dg/tree-ssa/pr83198.c: New test.
> 
> --- gcc/gimple-ssa-sprintf.c.jj   2017-11-03 15:37:04.0 +0100
> +++ gcc/gimple-ssa-sprintf.c  2017-12-13 13:37:59.289435623 +0100
> @@ -1885,6 +1885,8 @@ static fmtresult
>  format_floating (const directive &dir, tree arg)
>  {
>HOST_WIDE_INT prec[] = { dir.prec[0], dir.prec[1] };
> +  tree type = (dir.modifier == FMT_LEN_L || dir.modifier == FMT_LEN_ll
> +? long_double_type_node : double_type_node);
>  
>/* For an indeterminate precision the lower bound must be assumed
>   to be zero.  */
> @@ -1892,10 +1894,6 @@ format_floating (const directive &dir, t
>  {
>/* Get the number of fractional decimal digits needed to represent
>the argument without a loss of accuracy.  */
> -  tree type = arg ? TREE_TYPE (arg) :
> - (dir.modifier == FMT_LEN_L || dir.modifier == FMT_LEN_ll
> -  ? long_double_type_node : double_type_node);
> -
>unsigned fmtprec
>   = REAL_MODE_FORMAT (TYPE_MODE (type))->p;
>  
> @@ -1946,7 +1944,9 @@ format_floating (const directive &dir, t
>   }
>  }
>  
> -  if (!arg || TREE_CODE (arg) != REAL_CST)
> +  if (!arg
> +  || TREE_CODE (arg) != REAL_CST
> +  || !useless_type_conversion_p (type, TREE_TYPE (arg)))
>  return format_floating (dir, prec);
>  
>/* The minimum and maximum number of bytes produced by the directive.  */
> --- gcc/testsuite/gcc.dg/pr83198.c.jj 2017-12-13 13:43:36.056192309 +0100
> +++ gcc/testsuite/gcc.dg/pr83198.c2017-12-13 13:47:11.716474956 +0100
> @@ -0,0 +1,18 @@
> +/* PR tree-optimization/83198 */
> +/* { dg-do compile } */
> +/* { dg-options "-Wall -Wno-format" } */
> +
> +int
> +foo (char *d[6], int x)
> +{
> +  int r = 0;
> +  r += __builtin_sprintf (d[0], "%f", x);
> +  r += __builtin_sprintf (d[1], "%a", x);
> +  r += __builtin_sprintf (d[2], "%f", "foo");
> +  r += __builtin_sprintf (d[3], "%a", "bar");
> +#ifdef __SIZEOF_FLOAT128__
> +  r += __builtin_sprintf (d[4], "%a", 1.0Q);
> +  r += __builtin_sprintf (d[5], "%Lf", 1.0Q);
> +#endif
> +  return r;
> +}
> --- gcc/testsuite/gcc.dg/tree-ssa/pr83198.c.jj2017-12-14 
> 09:43:07.534102549 +0100
> +++ gcc/testsuite/gcc.dg/tree-ssa/pr83198.c   2017-12-14 09:52:39.772964559 
> +0100
> @@ -0,0 +1,23 @@
> +/* PR tree-optimization/83198 */
> +/* { dg-do compile { target __float128 } } */
> +/* { dg-options "-O2 -fprintf-return-value -Wno-format 
> -fdump-tree-optimized" } */
> +/* { dg-add-options __float128 } */
> +
> +void bar (void);
> +void link_error (void);
> +
> +void
> +foo (char *x)
> +{
> +  int a = __builtin_sprintf (x, "%f", 1.0Q);
> +  if (a < 8)
> +link_error ();
> +  if (a > 13)
> +bar ();
> +  if (a > 322)
> +link_error ();
> +}
> +
> +/* Verify we don't optimize return value to [8, 13].  */
> +/* { dg-final { scan-tree-dump-not "link_error \\(\\);" "optimized" } } */
> +/* { dg-final { scan-tree-dump "bar \\(\\);" "optimized" } } */
> 
> 
>   Jakub
> 
> 

-- 
Richard Biener 
SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Graham Norton, HRB 
21284 (AG Nuernberg)


Re: [07/13] Make vec_perm_indices use new vector encoding

2017-12-14 Thread Richard Biener
On Tue, Dec 12, 2017 at 4:46 PM, Richard Sandiford
 wrote:
> Richard Biener  writes:
>> On Sun, Dec 10, 2017 at 12:20 AM, Richard Sandiford
>>  wrote:
>>> This patch changes vec_perm_indices from a plain vec<> to a class
>>> that stores a canonicalised permutation, using the same encoding
>>> as for VECTOR_CSTs.  This means that vec_perm_indices now carries
>>> information about the number of vectors being permuted (currently
>>> always 1 or 2) and the number of elements in each input vector.
>>
>> Before I dive into  the C++ details can you explain why it needs this
>> info and how it encodes it for variable-length vectors?  To interleave
>> two vectors you need sth like { 0, N, 1, N+1, ... }, I'm not sure we
>> can directly encode N here, can we?  extract even/odd should just
>> work as { 0, 2, 4, 6, ...} without knowledge of whether we permute
>> one or two vectors (the one vector case just has two times the same
>> vector) or how many elements each of the vectors (or the result) has.
>
> One of the later patches switches the element types to HOST_WIDE_INT,
> so that we can represent all ssizetypes.  Then there's a poly_int
> patch (not yet posted) to make that poly_int64, so that we can
> represent the N even for variable-length vectors.
>
> The class needs to know the number of elements because that affects
> the canonical representation.  E.g. extract even on fixed-length
> vectors with both inputs the same should be { 0, 2, 4, ..., 0, 2, 4 ... },
> which we can't encode as a simple series.  Interleave low with both
> inputs the same should be { 0, 0, 1, 1, ... } for both fixed-length and
> variable-length vectors.

Huh?  extract even is { 0, 2, 4, 6, 8 ... } indexes in the selection vector
are referencing concat'ed input vectors.  So yes, for two same vectors
that's effectively { 0, 2, 4, ..., 0, 2, 4, ... } but I don't see why
that should
be the canonical form?

> Also, operator[] is supposed to return an in-range selector even if
> the selector element is only implicitly encoded.  So we need to know
> the number of input elements there.
>
> Separating the number of input elements into the number of inputs
> and the number of elements per input isn't really necessary, but made
> it easier to provide routines for testing whether all selected
> elements come from a particular input, and for rotating the selector
> by a whole number of inputs.
>
> Thanks,
> Richard


Re: [PATCH PR81228][AARCH64] Fix ICE by adding LTGT in vec_cmp

2017-12-14 Thread Sudakshina Das

Hi

On 13/12/17 16:56, James Greenhalgh wrote:

On Wed, Dec 13, 2017 at 04:45:33PM +, Sudi Das wrote:

On 13/12/17 16:42, Sudakshina Das wrote:

Hi

This patch is a follow up to the existing discussions on
https://gcc.gnu.org/ml/gcc-patches/2017-07/msg01904.html
Bin had earlier submitted a patch to fix the ICE that occurs because of
the missing LTGT in aarch64-simd.md.
That discussion opened up a new bug report PR81647 for an inconsistent
behavior.

As discussed earlier on the gcc-patches discussion and on the bug
report, PR81647 was occurring because of how UNEQ was handled in
aarch64-simd.md rather than LTGT. Since __builtin_islessgreater is
guaranteed to not give an FP exception but LTGT might,
__builtin_islessgreater gets converted to ~UNEQ very early on in
fold_builtin_unordered_cmp. Thus I will post a separate patch for
correcting how UNEQ and other unordered comparisons are handled in
aarch64-simd.md.

This patch is only adding the missing LTGT to plug the ICE.

Testing done: Checked for regressions on bootstrapped
aarch64-none-linux-gnu and added a new compile time test case that gives
out LTGT to make sure it doesn't ICE.

Is this ok for trunk?


OK.

Thanks,
James



Thanks for the review.
Committed as r255625.
I think this needs a back-port as well to gcc-7-branch. Is that ok?

Sudi


Re: [PATCH] set range for strlen(array) to avoid spurious -Wstringop-overflow (PR 83373 , PR 78450)

2017-12-14 Thread Richard Biener
On Wed, Dec 13, 2017 at 4:47 AM, Martin Sebor  wrote:
> On 12/12/2017 05:35 PM, Jeff Law wrote:
>>
>> On 12/12/2017 01:15 PM, Martin Sebor wrote:
>>>
>>> Bug 83373 - False positive reported by -Wstringop-overflow, is
>>> another example of warning triggered by a missed optimization
>>> opportunity, this time in the strlen pass.  The optimization
>>> is discussed in pr78450 - strlen(s) return value can be assumed
>>> to be less than the size of s.  The gist of it is that the result
>>> of strlen(array) can be assumed to be less than the size of
>>> the array (except in the corner case of last struct members).
>>>
>>> To avoid the false positive the attached patch adds this
>>> optimization to the strlen pass.  Although the patch passes
>>> bootstrap and regression tests for all front-ends I'm not sure
>>> the way it determines the upper bound of the range is 100%
>>> correct for languages with arrays with a non-zero lower bound.
>>> Maybe it's just not as tight as it could be.
>>
>> What about something hideous like
>>
>> struct fu {
>>   char x1[10];
>>   char x2[10];
>>   int avoid_trailing_array;
>> }
>>
>> Where objects stored in x1 are not null terminated.  Are we in the realm
>> of undefined behavior at that point (I hope so)?
>
>
> Yes, this is undefined.  Pointer arithmetic (either direct or
> via standard library functions) is only defined for pointers
> to the same object or subobject.  So even something like
>
>  memcpy (pfu->x1, pfu->x1 + 10, 10);
>
> is undefined.

There's nothing undefined here - computing the pointer pointing
to one-after-the-last element of an array is valid (you are just
not allowed to dereference it).

>  _FORTIFY_SOURCE allows it for raw memory functions
> because some low level code copies regions of the same object that
> span two or more members (I've seen an example or two in the Linux
> kernel but, IIRC, nowhere else).  With the patch, using memchr
> would be the only way to get away with it.
>
> Other than that, the new -Wstringop-truncation warning is designed
> to prevent creating unterminated character arrays and the manual
> suggests using attribute nonstring when it's necessary.  The
> -Wrestrict warning I'm about to check in also complains about
> forming invalid pointers by built-ins with restrict-qualified
> arguments (although it won't diagnose the above).
>
> Although I would prefer not to, I suppose if letting strlen cross
> the boundaries of subobjects was considered an important use to
> accommodate in limited cases the optimization could be disabled
> for member arrays declared with the new nonstring attribute (while
> still issuing a warning for it as GCC does today).
>
> Another alternative (if the above use case is considered important
> enough) might be to suppress the optimization for member character
> arrays that are immediately followed by another such array.
>
> Martin


Re: [PATCH v2] vrp_prop: Use dom_walker for -Warray-bounds (PR tree-optimization/83312)

2017-12-14 Thread Richard Biener
On Wed, Dec 13, 2017 at 10:30 PM, David Malcolm  wrote:
> On Wed, 2017-12-13 at 10:47 -0700, Jeff Law wrote:
>> On 12/13/2017 09:24 AM, Richard Biener wrote:
>> > >
>> > > Alternately we could to the dom_walker ctor that an initial state
>> > > of
>> > > EDGE_EXECUTABLE is already set.
>> >
>> > I'm quite sure that wouldn't help for VRP.
>>
>> Not sure why.  But it's not worth digging deep into.
>>
>> I do think the current structure could still fail to pick up some
>> secondary cases where blocks become unreachable as a result of both
>> not
>> needing to visit during the lattice propagation step and the
>> substitution step.  But I'd expect this to be rare.
>>
>> > I think David's approach is fine just we don't need any other API
>> > to get at a known executable outgoing edge. We can improve the
>> > existing one or just add the trivial folding required.
>>
>> I think Michael's suggestion to pass in NULL for the value and allow
>> find_edge to try and determine the value makes the most sense here.
>>
>> Jeff
>
> Michael: thanks for the hint about find_taken_edge; I assumed that such
> a "find the relevant out-edge" function would already exist; but I
> didn't find it (I'm relatively unfamiliar with this part of the code).
>
> Here's an updated version of the patch, which eliminates the stuff I
> added to gimple.h/gimple.c changes in favor of using
> find_taken_edge (bb, NULL_TREE),
> generalizing it to work with arbitrary bbs, so that the dom_walker
> vfunc can simply use:
>   return find_taken_edge (bb, NULL_TREE);
> without having to check e.g. for there being a last stmt (ENTRY
> and EXIT), or having to check that it is indeed a control statement
> (is there a requirement at this point of the IR that we don't just
> fall off the last statment through an out-edge?)
>
> I handled var == NULL_TREE for GIMPLE_COND and GIMPLE_SWITCH,
> but not for computed goto (find_taken_edge already handles that by
> bailing out).
>
> I also made some things "const" whilst I was touching it.
>
> Successfully bootstrapped®rtested on x86_64-pc-linux-gnu.
>
> OK for trunk?
>
> gcc/ChangeLog:
> PR tree-optimization/83312
> * domwalk.h (dom_walker::dom_walker): Fix typo in comment.
> * tree-cfg.c (find_taken_edge): Update to handle NULL_TREE for
> "val" param, and to cope with arbitrary basic blocks.
> (find_taken_edge_cond_expr): Add "cond_stmt" param and use it to
> handle NULL_TREE for "val".
> (find_taken_edge_switch_expr): Make "switch_stmt" param const.
> Handle NULL_TREE for "val".
> (find_case_label_for_value): Make "switch_stmt" param const.
> * tree-vrp.c (class check_array_bounds_dom_walker): New subclass
> of dom_walker.
> (vrp_prop::check_all_array_refs): Reimplement as...
> (check_array_bounds_dom_walker::before_dom_children): ...this new
> vfunc.  Replace linear search through BB block list, excluding
> those with non-executable in-edges via dominator walk.
>
> gcc/testsuite/ChangeLog:
> PR tree-optimization/83312
> * gcc.dg/pr83312.c: New test case.
> ---
>  gcc/domwalk.h  |  2 +-
>  gcc/testsuite/gcc.dg/pr83312.c | 30 +
>  gcc/tree-cfg.c | 59 +---
>  gcc/tree-vrp.c | 76 
> ++
>  4 files changed, 117 insertions(+), 50 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.dg/pr83312.c
>
> diff --git a/gcc/domwalk.h b/gcc/domwalk.h
> index 6ac93eb..c7e3450 100644
> --- a/gcc/domwalk.h
> +++ b/gcc/domwalk.h
> @@ -32,7 +32,7 @@ class dom_walker
>  public:
>static const edge STOP;
>
> -  /* Use SKIP_UNREACHBLE_BLOCKS = true when your client can discover
> +  /* Use SKIP_UNREACHABLE_BLOCKS = true when your client can discover
>   that some edges are not executable.
>
>   If a client can discover that a COND, SWITCH or GOTO has a static
> diff --git a/gcc/testsuite/gcc.dg/pr83312.c b/gcc/testsuite/gcc.dg/pr83312.c
> new file mode 100644
> index 000..2eb241d
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/pr83312.c
> @@ -0,0 +1,30 @@
> +/* { dg-options "-O2 -Warray-bounds" } */
> +
> +struct ptlrpcd_ctl {
> +  char pc_name[20];
> +};
> +struct ptlrpcd {
> +  struct ptlrpcd_ctl pd_threads[6];
> +};
> +struct ptlrpcd *ptlrpcd_init_pd;
> +static void ptlrpcd_ctl_init(struct ptlrpcd_ctl *pc, int index) {
> +  if (index < 0)
> +__builtin_snprintf(pc->pc_name, sizeof(pc->pc_name), "ptlrpcd_rcv");
> +  else
> +__builtin_snprintf(pc->pc_name, sizeof(pc->pc_name), "ptlrpcd_%d", 
> index);
> +}
> +int ptlrpcd_init_ncpts;
> +static int ptlrpcd_init(int nthreads) {
> +  int j;
> +  if (ptlrpcd_init_ncpts) {
> +ptlrpcd_ctl_init(&ptlrpcd_init_pd->pd_threads[0], -1);
> +for (j = 1; j < nthreads; j++)
> +  ptlrpcd_ctl_init(&ptlrpcd_init_pd->pd_threads[j], j);
> +  }
> +  return 0;
> +}
> +int ptlrpcd_init_groupsize;
> +void ptlrpcd_a

Re: [PATCH][GCC][ARM] Fix fragile arm fpu attribute tests.

2017-12-14 Thread Tamar Christina
The 12/13/2017 08:49, Christophe Lyon wrote:
> On 12 December 2017 at 18:29, Tamar Christina  wrote:
> > Hi All,
> >
> > The previous test made use of arm_neon.h which made the whole test
> > rather fragile and only applicable to some of the arm targets.
> >
> > So instead I make use of different fpus now to test the generation of
> > fmla instructions. The actual instruction itself is not tested as all
> > we care about if that the proper .fpu directives are generated.
> >
> > Regtested on arm-none-eabi and arm-none-linux-gnueabihf
> > with no regressions.
> >
> > Ok for trunk?
> >
> >
> > gcc/testsuite/
> > 2017-12-12  Tamar Christina  
> >
> > PR target/82641
> > * gcc.target/arm/pragma_fpu_attribute.c: New.
> > * gcc.target/arm/pragma_fpu_attribute_2.c: New.

Hi Christophe,

My apologies, I have rebased the patch.
New Changelog:

gcc/testsuite/
2017-12-14  Tamar Christina  

PR target/82641
* gcc.target/arm/pragma_fpu_attribute.c: Rewrite to use
no NEON.
* gcc.target/arm/pragma_fpu_attribute_2.c: Likewise.

> >
> Sorry, it seems your patch does not apply against ToT, and
> the ChangeLog looks incorrect (these are not new files)
> 
> Christophe

Thanks,
Tamar

-- 
diff --git a/gcc/testsuite/gcc.target/arm/pragma_fpu_attribute.c b/gcc/testsuite/gcc.target/arm/pragma_fpu_attribute.c
index f47c745855e4acc099afd554838dcf7d031f798c..8191deac25965564a3c78dc08959a5e5637a0066 100644
--- a/gcc/testsuite/gcc.target/arm/pragma_fpu_attribute.c
+++ b/gcc/testsuite/gcc.target/arm/pragma_fpu_attribute.c
@@ -1,18 +1,19 @@
 /* Test for target attribute assembly extension generations.  */
 /* { dg-do compile } */
-/* { dg-require-effective-target arm_arch_v8a_ok } */
-/* { dg-additional-options "-std=gnu99 -mfpu=vfpv3-d16" } */
+/* { dg-additional-options "-std=gnu99" } */
 
 #include 
-#include 
 
-extern uint32_t bar();
+extern uint32_t bar ();
 
-__attribute__((target("fpu=crypto-neon-fp-armv8"))) poly64x1_t vsricw(poly64x1_t crc, uint32_t val)
+#pragma GCC target("fpu=vfpv3-d16")
+
+extern float fmaf (float, float, float);
+
+float
+__attribute__((target("fpu=vfpv4"))) vfma32 (float x, float y, float z)
 {
-poly64x1_t res;
-asm("vsri %0, %1, %2" : "=r"(res) : "r"(crc), "r"(val));
-return res;
+  return fmaf (x, y, z);
 }
 
 uint32_t restored ()
@@ -20,5 +21,5 @@ uint32_t restored ()
   return bar();
 }
 
-/* { dg-final { scan-assembler-times {\.fpu\s+crypto-neon-fp-armv8} 1 } } */
+/* { dg-final { scan-assembler-times {\.fpu\s+vfpv4} 1 } } */
 /* { dg-final { scan-assembler-times {\.fpu\s+vfpv3-d16} 1 } } */
diff --git a/gcc/testsuite/gcc.target/arm/pragma_fpu_attribute_2.c b/gcc/testsuite/gcc.target/arm/pragma_fpu_attribute_2.c
index f23fd83779e57e48c0035b6688a21850d12cb4ab..b50d4107b56ed7abd8b95fd2a3a08df8ab54410a 100644
--- a/gcc/testsuite/gcc.target/arm/pragma_fpu_attribute_2.c
+++ b/gcc/testsuite/gcc.target/arm/pragma_fpu_attribute_2.c
@@ -1,20 +1,22 @@
 /* Test for #pragma assembly extension generations.  */
 /* { dg-do compile } */
-/* { dg-require-effective-target arm_arch_v8a_ok } */
-/* { dg-additional-options "-std=gnu99 -mfpu=vfpv3-d16" } */
+/* { dg-additional-options "-std=gnu99" } */
 
 #include 
 #include 
 
+#pragma GCC target("fpu=vfpv3-d16")
+
 extern uint32_t bar();
 
 #pragma GCC push_options
-#pragma GCC target("fpu=crypto-neon-fp-armv8")
-poly64x1_t vsricw(poly64x1_t crc, uint32_t val)
+#pragma GCC target("fpu=vfpv4")
+extern float fmaf (float, float, float);
+
+float
+vfma32 (float x, float y, float z)
 {
-poly64x1_t res;
-asm("vsri %0, %1, %2" : "=r"(res) : "r"(crc), "r"(val));
-return res;
+  return fmaf (x, y, z);
 }
 #pragma GCC pop_options
 
@@ -23,5 +25,5 @@ uint32_t restored ()
   return bar();
 }
 
-/* { dg-final { scan-assembler-times {\.fpu\s+crypto-neon-fp-armv8} 1 } } */
+/* { dg-final { scan-assembler-times {\.fpu\s+vfpv4} 1 } } */
 /* { dg-final { scan-assembler-times {\.fpu\s+vfpv3-d16} 1 } } */



Re: Support for aliasing with variable strides

2017-12-14 Thread Richard Biener
On Fri, Nov 17, 2017 at 11:17 PM, Richard Sandiford
 wrote:
> This patch adds runtime alias checks for loops with variable strides,
> so that we can vectorise them even without a restrict qualifier.
> There are several parts to doing this:
>
> 1) For accesses like:
>
>  x[i * n] += 1;
>
>we need to check whether n (and thus the DR_STEP) is nonzero.
>vect_analyze_data_ref_dependence records values that need to be
>checked in this way, then prune_runtime_alias_test_list records a
>bounds check on DR_STEP being outside the range [0, 0].
>
> 2) For accesses like:
>
>  x[i * n] = x[i * n + 1] + 1;
>
>we simply need to test whether abs (n) >= 2.
>prune_runtime_alias_test_list looks for cases like this and tries
>to guess whether it is better to use this kind of check or a check
>for non-overlapping ranges.  (We could do an OR of the two conditions
>at runtime, but that isn't implemented yet.)
>
> 3) Checks for overlapping ranges need to cope with variable strides.
>At present the "length" of each segment in a range check is
>represented as an offset from the base that lies outside the
>touched range, in the same direction as DR_STEP.  The length
>can therefore be negative and is sometimes conservative.
>
>With variable steps it's easier to reaon about if we split
>this into two:
>
>  seg_len:
>distance travelled from the first iteration of interest
>to the last, e.g. DR_STEP * (VF - 1)
>
>  access_size:
>the number of bytes accessed in each iteration
>
>with access_size always being a positive constant and seg_len
>possibly being variable.  We can then combine alias checks
>for two accesses that are a constant number of bytes apart by
>adjusting the access size to account for the gap.  This leaves
>the segment length unchanged, which allows the check to be combined
>with further accesses.
>
>When seg_len is positive, the runtime alias check has the form:
>
> base_a >= base_b + seg_len_b + access_size_b
>  || base_b >= base_a + seg_len_a + access_size_a
>
>In many accesses the base will be aligned to the access size, which
>allows us to skip the addition:
>
> base_a > base_b + seg_len_b
>  || base_b > base_a + seg_len_a
>
>A similar saving is possible with "negative" lengths.
>
>The patch therefore tracks the alignment in addition to seg_len
>and access_size.
>
> Tested on aarch64-linux-gnu (with and without SVE), x86_64-linux-gnu
> and powerpc64le-linux-gnu.  OK to install?

Ok.

Thanks,
Richard.

> Richard
>
> PS. This is the last SVE patch for GCC 8!
>
>
> 2017-11-17  Richard Sandiford  
> Alan Hayward  
> David Sherwood  
>
> gcc/
> * tree-vectorizer.h (vec_lower_bound): New structure.
> (_loop_vec_info): Add check_nonzero and lower_bounds.
> (LOOP_VINFO_CHECK_NONZERO): New macro.
> (LOOP_VINFO_LOWER_BOUNDS): Likewise.
> (LOOP_REQUIRES_VERSIONING_FOR_ALIAS): Check lower_bounds too.
> * tree-data-ref.h (dr_with_seg_len): Add access_size and align
> fields.  Make seg_len the distance travelled, not including the
> access size.
> (dr_direction_indicator): Declare.
> (dr_zero_step_indicator): Likewise.
> (dr_known_forward_stride_p): Likewise.
> * tree-data-ref.c: Include stringpool.h, tree-vrp.h and
> tree-ssanames.h.
> (runtime_alias_check_p): Allow runtime alias checks with
> variable strides.
> (operator ==): Compare access_size and align.
> (prune_runtime_alias_test_list): Rework for new distinction between
> the access_size and seg_len.
> (create_intersect_range_checks_index): Likewise.  Cope with polynomial
> segment lengths.
> (get_segment_min_max): New function.
> (create_intersect_range_checks): Use it.
> (dr_step_indicator): New function.
> (dr_direction_indicator): Likewise.
> (dr_zero_step_indicator): Likewise.
> (dr_known_forward_stride_p): Likewise.
> * tree-loop-distribution.c (data_ref_segment_size): Return
> DR_STEP * (niters - 1).
> (compute_alias_check_pairs): Update call to the dr_with_seg_len
> constructor.
> * tree-vect-data-refs.c (vect_check_nonzero_value): New function.
> (vect_preserves_scalar_order_p): New function, split out from...
> (vect_analyze_data_ref_dependence): ...here.  Check for zero steps.
> (vect_vfa_segment_size): Return DR_STEP * (length_factor - 1).
> (vect_vfa_access_size): New function.
> (vect_vfa_align): Likewise.
> (vect_compile_time_alias): Take access_size_a and access_b arguments.
> (dump_lower_bound): New function.
> (vect_check_lower_bound): Likewise.
> (vect_small_gap_p): Likewise.
> (vectorizable_with_step_bound_p): Likewise.
> (vect_pr

Re: [SFN] Bootstrap broken

2017-12-14 Thread Alexandre Oliva
Jakub, I summed up to you yesterday on IRC what I expand below; this is
just for the public record.

On Dec 13, 2017, Jakub Jelinek  wrote:

> Furthermore, I must say I don't understand why
> can_move_early_debug_stmts should care whether there are any labels in
> dest bb or not.

An earlier attempt tried to move all labels and debug stmts in a single
loop, and started with this early debug setting and inserting at the
block start stmt, and later switched to the preexisting debug setting
and inserting at the after-label stmt.  In the end, I decided this was
trickier and riskier than two separate loops, but failed to simplify the
logic back all the way.

> Though, if
> gsi_stmt (gsi) == gsi_stmt (gsie), then the loop right below it will not
> do anything and nothing cares about can_move_early_debug_stmts afterwards.
> So, in short, can_move_early_debug_stmts is used only if
> gsi_stmt (gsi) != gsi_stmt (gsie), and therefore
> can_move_early_debug_stmts if it is used is can_move_debug_stmts && (1 || 
> ...);

*nod*

> So, can we get rid of can_move_early_debug_stmts altogether and just use
> can_move_debug_stmts in there instead?

Yes.

> Another thing I find risky is that you compute gsie_to so early and don't
> update it.

The point of computing it early was *precisely* to preserve the
insertion point should we have both before-label and after-label debug
stmts to move.  But in the end it doesn't matter: we'll either find the
right spot after moving labels, or we'll be at it if there weren't any
labels to move.


Thanks for the improvements!

-- 
Alexandre Oliva, freedom fighterhttp://FSFLA.org/~lxoliva/
You must be the change you wish to see in the world. -- Gandhi
Be Free! -- http://FSFLA.org/   FSF Latin America board member
Free Software Evangelist|Red Hat Brasil GNU Toolchain Engineer


Re: [C++ Patch PING] [C++ Patch] PR 82235 (Copy ctor is not found for copying array of an object when it's marked explicit)

2017-12-14 Thread Paolo Carlini

Hi Jason,

On 13/12/2017 23:27, Jason Merrill wrote:

These two don't match:


+   When initializing a temporary to be bound to the first
+   parameter of a constructor where the parameter is of type



+/* Return true if current_function_decl is a constructor
+   and its first argument is a reference type and it is


The language is talking about the function being called, and 
ref_first_parm_of_constructor is looking at the function we're 
currently in.

Indeed. Thanks Jason for the feedback.

I was going to send an improved patch, among other things using 
CP_DECL_CONTEXT, when I noticed that this issue seems essentially a 
special case of *your* core/899, right? For 899 we already have some 
infrastructure in place and I believe we only have to figure out why we 
don't handle correctly *arrays*, because otherwise we already accept a 
similar testcase involving a plain 'Foo m' member. Please let me know, 
otherwise I'm going to work in this direction!


Thanks again,
Paolo.


Re: [SFN] Bootstrap broken

2017-12-14 Thread Alexandre Oliva
On Dec 13, 2017, Jakub Jelinek  wrote:

> In particular, this testcase is using selective scheduling, therefore
> we turn off -fvar-tracking-assignments, but the debug stmt markers are
> emitted anyway.

*nod*, that much was intended (though I could be convinced to change it ;-)

> -fvar-tracking is still true, so the var-tracking pass does everything it
> normally does (successfully), then the free_cfg pass removes all
> BLOCK_FOR_INSN notes, then some targets in their machine reorg recompute
> those, but sparc apparently doesn't, and finally in final.c:

>   /* Turn debug markers into notes.  */
>   if (!MAY_HAVE_DEBUG_BIND_INSNS && MAY_HAVE_DEBUG_MARKER_INSNS)
> variable_tracking_main ();

> Eeek, this runs all of the var tracking again,

Eeek, indeed ;-)  /me takes a mental note that flag_var_tracking !=
> MAY_HAVE_DEBUG_BIND_INSNS, and underlines it several times ;-D

Thanks for spotting the deeper problem and for fixing it!

I'm glad the patch I posted to fix the shallower one still serves as a
basis for the ongoing attempts to fix one of the remaining ia64 issues.
I'm on it.

-- 
Alexandre Oliva, freedom fighterhttp://FSFLA.org/~lxoliva/
You must be the change you wish to see in the world. -- Gandhi
Be Free! -- http://FSFLA.org/   FSF Latin America board member
Free Software Evangelist|Red Hat Brasil GNU Toolchain Engineer


Re: Add an "early rematerialisation" pass

2017-12-14 Thread Richard Biener
On Fri, Nov 17, 2017 at 4:58 PM, Richard Sandiford
 wrote:
> This patch looks for pseudo registers that are live across a call
> and for which no call-preserved hard registers exist.  It then
> recomputes the pseudos as necessary to ensure that they are no
> longer live across a call.  The comment at the head of the file
> describes the approach.
>
> A new target hook selects which modes should be treated in this way.
> By default none are, in which case the pass is skipped very early.
>
> It might also be worth looking for cases like:
>
>C1: R1 := f (...)
>...
>C2: R2 := f (...)
>C3: R1 := C2
>
> and giving the same value number to C1 and C3, effectively treating
> it like:
>
>C1: R1 := f (...)
>...
>C2: R2 := f (...)
>C3: R1 := f (...)
>
> Another (much more expensive) enhancement would be to apply value
> numbering to all pseudo registers (not just rematerialisation
> candidates), so that we can handle things like:
>
>   C1: R1 := f (...R2...)
>   ...
>   C2: R1 := f (...R3...)
>
> where R2 and R3 hold the same value.  But the current pass seems
> to catch the vast majority of cases.
>
> Tested on aarch64-linux-gnu (with and without SVE), x86_64-linux-gnu
> and powerpc64le-linux-gnu.  OK to install?

Can you tell anything about the complexity of the algorithm?  How
does it relate to what LRA can do?  AFAIK LRA doesn't try to find
any global optimal solution and previous hardreg assignments
may work against it?

That said - I would have expected remat to be done before the
first scheduling pass?  Even before pass_sms (not sure
what pass_live_range_shrinkage does).  Or be integrated
with scheduling and it's register pressure cost model.

Also I would have expected the approach to apply to all modes,
just the cost of spilling is different.  But if you can, say, reduce
register pressure by one by rematerializing a bit-not then that
should be always profitable, no?  postreload-cse will come to
the rescue anyhow.

Disclaimer: didn't look at the pass implementation, RTL isn't my
primary expertise.

Richard.

> Richard
>
>
> 2017-11-17  Richard Sandiford  
>
> gcc/
> * Makefile.in (OBJS): Add early-remat.o.
> * target.def (select_early_remat_modes): New hook.
> * doc/tm.texi.in (TARGET_SELECT_EARLY_REMAT_MODES): New hook.
> * doc/tm.texi: Regenerate.
> * targhooks.h (default_select_early_remat_modes): Declare.
> * targhooks.c (default_select_early_remat_modes): New function.
> * timevar.def (TV_EARLY_REMAT): New timevar.
> * passes.def (pass_early_remat): New pass.
> * tree-pass.h (make_pass_early_remat): Declare.
> * early-remat.c: New file.
> * config/aarch64/aarch64.c (aarch64_select_early_remat_modes): New
> function.
> (TARGET_SELECT_EARLY_REMAT_MODES): Define.
>
> gcc/testsuite/
> * gcc.target/aarch64/sve_spill_1.c: Also test that no predicates
> are spilled.
> * gcc.target/aarch64/sve_spill_2.c: New test.
> * gcc.target/aarch64/sve_spill_3.c: Likewise.
> * gcc.target/aarch64/sve_spill_4.c: Likewise.
> * gcc.target/aarch64/sve_spill_5.c: Likewise.
> * gcc.target/aarch64/sve_spill_6.c: Likewise.
> * gcc.target/aarch64/sve_spill_7.c: Likewise.
>
> Index: gcc/Makefile.in
> ===
> --- gcc/Makefile.in 2017-11-17 15:56:28.653842781 +
> +++ gcc/Makefile.in 2017-11-17 15:56:53.016759921 +
> @@ -1273,6 +1273,7 @@ OBJS = \
> dwarf2asm.o \
> dwarf2cfi.o \
> dwarf2out.o \
> +   early-remat.o \
> emit-rtl.o \
> et-forest.o \
> except.o \
> Index: gcc/target.def
> ===
> --- gcc/target.def  2017-11-17 15:56:45.551761353 +
> +++ gcc/target.def  2017-11-17 15:56:53.022759920 +
> @@ -5531,6 +5531,19 @@ reload from using some alternatives, lik
>   default_preferred_output_reload_class)
>
>  DEFHOOK
> +(select_early_remat_modes,
> + "On some targets, certain modes cannot be held in registers around a\n\
> +standard ABI call and are relatively expensive to spill to the stack.\n\
> +The early rematerialization pass can help in such cases by aggressively\n\
> +recomputing values after calls, so that they don't need to be spilled.\n\
> +\n\
> +This hook returns the set of such modes by setting the associated bits\n\
> +in @var{modes}.  The default implementation selects no modes, which has\n\
> +the effect of disabling the early rematerialization pass.",
> + void, (sbitmap modes),
> + default_select_early_remat_modes)
> +
> +DEFHOOK
>  (class_likely_spilled_p,
>   "A target hook which returns @code{true} if pseudos that have been 
> assigned\n\
>  to registers of class @var{rclass} would likely be spilled because\n\
> Index: gcc/doc/tm.texi.in
> ===
> --- gcc/doc

Re: [PATCH] PR libstdc++/59568 fix error handling for std::complex stream extraction

2017-12-14 Thread Jonathan Wakely

On 13/12/17 18:42 +, Jonathan Wakely wrote:

The bug here is that we called putback even if the initial __is >> __ch
extraction failed and set eofbit, and putback clears the eofbit. I
found a number of other problems though, such as not even trying to
call putback after failing to find the ',' and ')' characters.

I decided to rewrite the function following the proposed resolution
for https://wg21.link/lwg2714 which is a much more precise
specification for much more desirable semantics.

PR libstdc++/59568
* include/std/complex (operator>>): Implement proposed resolution to
LWG 2714. Use putback if and only if a character has been successfully
extracted but isn't a delimiter. Use ctype::widen and traits::eq when
testing if extracted characters match delimiters.
* testsuite/26_numerics/complex/dr2714.cc: New test.

Tested powerpc64le-linux, committed to trunk.


Here's a tweak for the testcase. Tested x86_64-linux, committed to trunk.


commit 4b8abc556e9ae9d4ff76f0b446eadf4a6f216638
Author: Jonathan Wakely 
Date:   Wed Dec 13 19:47:19 2017 +

Improve std::complex test and move to sub-directory

* testsuite/26_numerics/complex/dr2714.cc: Move to ...
* testsuite/26_numerics/complex/inserters_extractors/char/dr2714.cc:
... Here. Remove duplicate header and dg-options. Check first invalid
character gets putback. Remove wchar_t test.

diff --git a/libstdc++-v3/testsuite/26_numerics/complex/dr2714.cc b/libstdc++-v3/testsuite/26_numerics/complex/inserters_extractors/char/dr2714.cc
similarity index 92%
rename from libstdc++-v3/testsuite/26_numerics/complex/dr2714.cc
rename to libstdc++-v3/testsuite/26_numerics/complex/inserters_extractors/char/dr2714.cc
+++ b/libstdc++-v3/testsuite/26_numerics/complex/inserters_extractors/char/dr2714.cc
@@ -15,11 +15,8 @@
 // with this library; see the file COPYING3.  If not see
 // .
 
-// { dg-options "-std=gnu++98" }
-
 #include 
 #include 
-#include 
 #include 
 
 void
@@ -37,16 +34,6 @@ test01()
 void
 test02()
 {
-  std::wistringstream in(L" ( 2.0 , 0.5 ) ");
-  std::complex c;
-  in >> c;
-  VERIFY( in.good() );
-  VERIFY( c.real() == 2.0 && c.imag() == 0.5 );
-}
-
-void
-test03()
-{
   std::istringstream in;
   std::complex c(-1, -1);
   const std::complex c0 = c;
@@ -55,6 +42,7 @@ test03()
   in >> c;
   VERIFY( in.fail() );
   in.clear();
+  VERIFY( in.get() == 'a' );
 
   in.str(" ( ) ");
   in >> c;
@@ -71,11 +59,10 @@ test03()
   in.str("(b)");
   in >> c;
   VERIFY( in.fail() );
-
   in.clear();
   VERIFY( in.get() == 'b' );
-  in.str("( c)");
 
+  in.str("( c)");
   in >> c;
   VERIFY( in.fail() );
   in.clear();
@@ -121,7 +108,7 @@ test03()
 }
 
 void
-test04()
+test03()
 {
   // PR libstdc++/59568
   std::istringstream in;
@@ -164,5 +151,4 @@ main()
   test01();
   test02();
   test03();
-  test04();
 }


[Ada] Fix discrepancy in annotated vs computed sizes

2017-12-14 Thread Eric Botcazou
This fixes the discrepancy present in annotated vs computed sizes for objects 
either directly subject to an alignment clause or of a type itself subject to 
an alignment clause.  In the former case, the annotated size can be wrong 
whereas the computed size can be wrong in the latter case.

Tested on x86_64-suse-linux, applied on the mainline.


2017-12-14  Eric Botcazou  

* gcc-interface/gigi.h (pad_type_has_rm_size): Declare.
* gcc-interface/decl.c (gnat_to_gnu_entity) : Do notbuild
a padding type for the alignment before validating the size.
Flip conditional construct and add a comment.
* gcc-interface/trans.c (Attribute_to_gnu) : Make sure to
apply the exception for padded objects to the type of the object.
* gcc-interface/utils.c (hash_pad_type): New static function.
(lookup_and_insert_pad_type): Rename into...
(canonicalize_pad_type): ...this.  Call hash_pad_type, do only one
lookup with insertion and always return the canonical type.
(maybe_pad_type): Adjust to above changes.  Set debug type later.
(pad_type_has_rm_size): New predicate.
(set_reverse_storage_order_on_pad_type): Adjust to above changes.


2017-12-14  Eric Botcazou  

* gnat.dg/alignment11.adb: New test.
* gnat.dg/alignment12.adb: Likewise.

-- 
Eric BotcazouIndex: gcc-interface/decl.c
===
--- gcc-interface/decl.c	(revision 255622)
+++ gcc-interface/decl.c	(working copy)
@@ -713,48 +713,20 @@ gnat_to_gnu_entity (Entity_Id gnat_entit
 	  }
 
 	/* If an alignment is specified, use it if valid.  Note that exceptions
-	   are objects but don't have an alignment.  We must do this before we
-	   validate the size, since the alignment can affect the size.  */
-	if (kind != E_Exception && Known_Alignment (gnat_entity))
-	  {
-	gcc_assert (Present (Alignment (gnat_entity)));
-
-	align = validate_alignment (Alignment (gnat_entity), gnat_entity,
-	TYPE_ALIGN (gnu_type));
-
-	/* No point in changing the type if there is an address clause
-	   as the final type of the object will be a reference type.  */
-	if (Present (Address_Clause (gnat_entity)))
-	  align = 0;
-	else
-	  {
-		tree orig_type = gnu_type;
-
-		gnu_type
-		  = maybe_pad_type (gnu_type, NULL_TREE, align, gnat_entity,
-false, false, definition, true);
-
-		/* If a padding record was made, declare it now since it will
-		   never be declared otherwise.  This is necessary to ensure
-		   that its subtrees are properly marked.  */
-		if (gnu_type != orig_type && !DECL_P (TYPE_NAME (gnu_type)))
-		  create_type_decl (TYPE_NAME (gnu_type), gnu_type, true,
-debug_info_p, gnat_entity);
-	  }
-	  }
-
-	/* If we are defining the object, see if it has a Size and validate it
-	   if so.  If we are not defining the object and a Size clause applies,
-	   simply retrieve the value.  We don't want to ignore the clause and
-	   it is expected to have been validated already.  Then get the new
-	   type, if any.  */
-	if (definition)
-	  gnu_size = validate_size (Esize (gnat_entity), gnu_type,
-gnat_entity, VAR_DECL, false,
-Has_Size_Clause (gnat_entity));
-	else if (Has_Size_Clause (gnat_entity))
-	  gnu_size = UI_To_gnu (Esize (gnat_entity), bitsizetype);
+	   are objects but don't have an alignment and there is also no point in
+	   setting it for an address clause, since the final type of the object
+	   will be a reference type.  */
+	if (Known_Alignment (gnat_entity)
+	&& kind != E_Exception
+	&& No (Address_Clause (gnat_entity)))
+	  align = validate_alignment (Alignment (gnat_entity), gnat_entity,
+  TYPE_ALIGN (gnu_type));
 
+	/* Likewise, if a size is specified, use it if valid.  */
+	if (Known_Esize (gnat_entity) && No (Address_Clause (gnat_entity)))
+	  gnu_size
+	= validate_size (Esize (gnat_entity), gnu_type, gnat_entity,
+			 VAR_DECL, false, Has_Size_Clause (gnat_entity));
 	if (gnu_size)
 	  {
 	gnu_type
@@ -4580,15 +4552,15 @@ gnat_to_gnu_entity (Entity_Id gnat_entit
 	  gnu_type = change_qualified_type (gnu_type, quals);
 	}
 
-  if (!gnu_decl)
-	gnu_decl = create_type_decl (gnu_entity_name, gnu_type,
- artificial_p, debug_info_p,
- gnat_entity);
-  else
+  /* If we already made a decl, just set the type, otherwise create it.  */
+  if (gnu_decl)
 	{
 	  TREE_TYPE (gnu_decl) = gnu_type;
 	  TYPE_STUB_DECL (gnu_type) = gnu_decl;
 	}
+  else
+	gnu_decl = create_type_decl (gnu_entity_name, gnu_type, artificial_p,
+ debug_info_p, gnat_entity);
 }
 
   /* If we got a type that is not dummy, back-annotate the alignment of the
Index: gcc-interface/gigi.h
===
--- gcc-interface/gigi.h	(revision 255622)
+++ gcc-interface/gigi.h	(working copy)
@@ -151,6 +151,9 @@ extern tree maybe_pad_type (tree type, t

Re: [PATCH] PR libstdc++/59568 fix error handling for std::complex stream extraction

2017-12-14 Thread Jonathan Wakely

On 14/12/17 11:28 +, Jonathan Wakely wrote:

On 13/12/17 18:42 +, Jonathan Wakely wrote:

The bug here is that we called putback even if the initial __is >> __ch
extraction failed and set eofbit, and putback clears the eofbit. I
found a number of other problems though, such as not even trying to
call putback after failing to find the ',' and ')' characters.

I decided to rewrite the function following the proposed resolution
for https://wg21.link/lwg2714 which is a much more precise
specification for much more desirable semantics.

PR libstdc++/59568
* include/std/complex (operator>>): Implement proposed resolution to
LWG 2714. Use putback if and only if a character has been successfully
extracted but isn't a delimiter. Use ctype::widen and traits::eq when
testing if extracted characters match delimiters.
* testsuite/26_numerics/complex/dr2714.cc: New test.

Tested powerpc64le-linux, committed to trunk.


Here's a tweak for the testcase. Tested x86_64-linux, committed to trunk.


And an extra test for whitespace handling. Committed to trunk.


commit fdcb019de0eab2e6b3d0844422e8984bf5d95622
Author: Jonathan Wakely 
Date:   Thu Dec 14 11:41:43 2017 +

Test whitespace handling in std::complex extraction

* testsuite/26_numerics/complex/inserters_extractors/char/dr2714.cc:
Add tests using noskipws.

diff --git a/libstdc++-v3/testsuite/26_numerics/complex/inserters_extractors/char/dr2714.cc b/libstdc++-v3/testsuite/26_numerics/complex/inserters_extractors/char/dr2714.cc
index 17fb8a249d9..952c52f4a2b 100644
--- a/libstdc++-v3/testsuite/26_numerics/complex/inserters_extractors/char/dr2714.cc
+++ b/libstdc++-v3/testsuite/26_numerics/complex/inserters_extractors/char/dr2714.cc
@@ -145,10 +145,38 @@ test03()
   in.clear();
 }
 
+void
+test04()
+{
+  // Test noskipws handling
+  std::istringstream in;
+  const char* bad_inputs[] = {
+" 1", " (2)", "( 2)", "(2 )", "(2 ,3)", "(2,3 )", 0
+  };
+  const std::complex c0(-1, -1);
+  std::complex c;
+  for (int i = 0; bad_inputs[i]; ++i)
+  {
+c = c0;
+in.clear();
+in.str(bad_inputs[i]);
+in >> std::noskipws >> c;
+VERIFY( in.fail() );
+VERIFY( c == c0 );
+
+in.clear();
+in.str(bad_inputs[i]);
+in >> std::skipws >> c;
+VERIFY( !in.fail() );
+VERIFY( c != c0 );
+  }
+}
+
 int
 main()
 {
   test01();
   test02();
   test03();
+  test04();
 }


Re: [SFN+LVU+IEPM v4 1/9] [SFN] adjust RTL insn-walking API

2017-12-14 Thread Alexandre Oliva
On Dec 12, 2017, Alexandre Oliva  wrote:

> On Dec  7, 2017, Jeff Law  wrote:
>> On 11/09/2017 07:34 PM, Alexandre Oliva wrote:
>>> (prev_nonnote_insn_bb, next_nonnote_insn_bb): Remove.
> Thanks, FTR, here it is, as installed:

On Aug 31, 2017, Alexandre Oliva  wrote:

>   (prev_nonnote_insn_bb, next_nonnote_insn_bb): Remove.

[SFN] next/prev_nonnote_insn_bb are no more, even for ports

The patch that added _nondebug to next_ and prev_nonnote_insn_bb
failed to find and adjust uses within config.  Fixed.

Sanity-checked by cross-building both *-elf targets (with binutils and
newlib) on x86_64-linux-gnu.  I'm going ahead and checking it in
shortly.

for  gcc/ChangeLog

PR bootstrap/83396
* config/arc/arc.c (hwloop_optimize): Skip debug insns.
* config/sh/sh-protos.h (sh_find_set_of_reg): Adjust.
* config/sh/sh.c: Skip debug insns besides notes.
* config/sh/sh.md: Likewise.
* config/sh/sh_treg_combine.cc: Likewise.
* config/sh/sync.md: Likewise.
---
 gcc/config/arc/arc.c |2 +-
 gcc/config/sh/sh-protos.h|2 +-
 gcc/config/sh/sh.c   |   10 +-
 gcc/config/sh/sh.md  |   18 +-
 gcc/config/sh/sh_treg_combine.cc |8 
 gcc/config/sh/sync.md|2 +-
 6 files changed, 21 insertions(+), 21 deletions(-)

diff --git a/gcc/config/arc/arc.c b/gcc/config/arc/arc.c
index b8eec10086dd..9974a1f999b5 100644
--- a/gcc/config/arc/arc.c
+++ b/gcc/config/arc/arc.c
@@ -7499,7 +7499,7 @@ hwloop_optimize (hwloop_info loop)
  && NOTE_KIND (entry_after) != NOTE_INSN_CALL_ARG_LOCATION))
 entry_after = NEXT_INSN (entry_after);
 #endif
-  entry_after = next_nonnote_insn_bb (entry_after);
+  entry_after = next_nonnote_nondebug_insn_bb (entry_after);
 
   gcc_assert (entry_after);
   emit_insn_before (seq, entry_after);
diff --git a/gcc/config/sh/sh-protos.h b/gcc/config/sh/sh-protos.h
index e98030d31bd7..0a83fbe17011 100644
--- a/gcc/config/sh/sh-protos.h
+++ b/gcc/config/sh/sh-protos.h
@@ -122,7 +122,7 @@ struct set_of_reg
 
 /* Given a reg rtx and a start insn, try to find the insn that sets the
specified reg by using the specified insn stepping function, such as 
-   'prev_nonnote_insn_bb'.  When the insn is found, try to extract the rtx
+   'prev_nonnote_nondebug_insn_bb'.  When the insn is found, try to extract 
the rtx
of the reg set.  */
 template  inline set_of_reg
 sh_find_set_of_reg (rtx reg, rtx_insn* insn, F stepfunc,
diff --git a/gcc/config/sh/sh.c b/gcc/config/sh/sh.c
index 0d7d7bc53ca2..3776415f1589 100644
--- a/gcc/config/sh/sh.c
+++ b/gcc/config/sh/sh.c
@@ -11897,7 +11897,7 @@ sh_is_logical_t_store_expr (rtx op, rtx_insn* insn)
   else
{
  set_of_reg op_set = sh_find_set_of_reg (ops[i], insn,
- prev_nonnote_insn_bb);
+ 
prev_nonnote_nondebug_insn_bb);
  if (op_set.set_src == NULL_RTX)
continue;
 
@@ -11929,7 +11929,7 @@ sh_try_omit_signzero_extend (rtx extended_op, rtx_insn* 
insn)
   if (GET_MODE (extended_op) != SImode)
 return NULL_RTX;
 
-  set_of_reg s = sh_find_set_of_reg (extended_op, insn, prev_nonnote_insn_bb);
+  set_of_reg s = sh_find_set_of_reg (extended_op, insn, 
prev_nonnote_nondebug_insn_bb);
   if (s.set_src == NULL_RTX)
 return NULL_RTX;
 
@@ -11966,9 +11966,9 @@ sh_split_movrt_negc_to_movt_xor (rtx_insn* curr_insn, 
rtx operands[])
 return false;
 
   set_of_reg t_before_negc = sh_find_set_of_reg (get_t_reg_rtx (), curr_insn,
-prev_nonnote_insn_bb);
+prev_nonnote_nondebug_insn_bb);
   set_of_reg t_after_negc = sh_find_set_of_reg (get_t_reg_rtx (), curr_insn,
-   next_nonnote_insn_bb);
+   next_nonnote_nondebug_insn_bb);
 
   if (t_before_negc.set_rtx != NULL_RTX && t_after_negc.set_rtx != NULL_RTX
   && rtx_equal_p (t_before_negc.set_rtx, t_after_negc.set_rtx)
@@ -12010,7 +12010,7 @@ sh_find_extending_set_of_reg (rtx reg, rtx_insn* 
curr_insn)
  cases, where a zero_extend is followed an (implicit) sign_extend, and it
  fails to see the sign_extend.  */
   sh_extending_set_of_reg result =
-   sh_find_set_of_reg (reg, curr_insn, prev_nonnote_insn_bb, true);
+   sh_find_set_of_reg (reg, curr_insn, prev_nonnote_nondebug_insn_bb, 
true);
 
   if (result.set_src != NULL)
 {
diff --git a/gcc/config/sh/sh.md b/gcc/config/sh/sh.md
index 840fd922d41e..587af3b25581 100644
--- a/gcc/config/sh/sh.md
+++ b/gcc/config/sh/sh.md
@@ -848,7 +848,7 @@
   /* FIXME: Maybe also search the predecessor basic blocks to catch
  more cases.  */
   set_of_reg op = sh_find_set_of_reg (operands[0], curr_insn,
- prev_nonnote_insn_bb);
+  

Re: [SFN+LVU+IEPM v4 1/9] [SFN] adjust RTL insn-walking API

2017-12-14 Thread Jakub Jelinek
On Thu, Dec 14, 2017 at 09:55:30AM -0200, Alexandre Oliva wrote:
> for  gcc/ChangeLog
> 
>   PR bootstrap/83396
>   * config/arc/arc.c (hwloop_optimize): Skip debug insns.
>   * config/sh/sh-protos.h (sh_find_set_of_reg): Adjust.
>   * config/sh/sh.c: Skip debug insns besides notes.
>   * config/sh/sh.md: Likewise.
>   * config/sh/sh_treg_combine.cc: Likewise.
>   * config/sh/sync.md: Likewise.

Please fix up formatting.  Otherwise LGTM.

> --- a/gcc/config/sh/sh-protos.h
> +++ b/gcc/config/sh/sh-protos.h
> @@ -122,7 +122,7 @@ struct set_of_reg
>  
>  /* Given a reg rtx and a start insn, try to find the insn that sets the
> specified reg by using the specified insn stepping function, such as 
> -   'prev_nonnote_insn_bb'.  When the insn is found, try to extract the rtx
> +   'prev_nonnote_nondebug_insn_bb'.  When the insn is found, try to extract 
> the rtx

Too long line.

> of the reg set.  */
>  template  inline set_of_reg
>  sh_find_set_of_reg (rtx reg, rtx_insn* insn, F stepfunc,
> diff --git a/gcc/config/sh/sh.c b/gcc/config/sh/sh.c
> index 0d7d7bc53ca2..3776415f1589 100644
> --- a/gcc/config/sh/sh.c
> +++ b/gcc/config/sh/sh.c
> @@ -11897,7 +11897,7 @@ sh_is_logical_t_store_expr (rtx op, rtx_insn* insn)
>else
>   {
> set_of_reg op_set = sh_find_set_of_reg (ops[i], insn,
> -   prev_nonnote_insn_bb);
> +   
> prev_nonnote_nondebug_insn_bb);

Likewise.  Just do:
- set_of_reg op_set = sh_find_set_of_reg (ops[i], insn,
- prev_nonnote_insn_bb);
+ set_of_reg op_set
+   = sh_find_set_of_reg (ops[i], insn, prev_nonnote_nondebug_insn_bb);

> @@ -11929,7 +11929,7 @@ sh_try_omit_signzero_extend (rtx extended_op, 
> rtx_insn* insn)
>if (GET_MODE (extended_op) != SImode)
>  return NULL_RTX;
>  
> -  set_of_reg s = sh_find_set_of_reg (extended_op, insn, 
> prev_nonnote_insn_bb);
> +  set_of_reg s = sh_find_set_of_reg (extended_op, insn, 
> prev_nonnote_nondebug_insn_bb);

Likewise.

>if (s.set_src == NULL_RTX)
>  return NULL_RTX;
>  
> @@ -11966,9 +11966,9 @@ sh_split_movrt_negc_to_movt_xor (rtx_insn* curr_insn, 
> rtx operands[])
>  return false;
>  
>set_of_reg t_before_negc = sh_find_set_of_reg (get_t_reg_rtx (), curr_insn,
> -  prev_nonnote_insn_bb);
> +  prev_nonnote_nondebug_insn_bb);
>set_of_reg t_after_negc = sh_find_set_of_reg (get_t_reg_rtx (), curr_insn,
> - next_nonnote_insn_bb);
> + next_nonnote_nondebug_insn_bb);

2x further.
>  
>if (t_before_negc.set_rtx != NULL_RTX && t_after_negc.set_rtx != NULL_RTX
>&& rtx_equal_p (t_before_negc.set_rtx, t_after_negc.set_rtx)
> @@ -12010,7 +12010,7 @@ sh_find_extending_set_of_reg (rtx reg, rtx_insn* 
> curr_insn)
>   cases, where a zero_extend is followed an (implicit) sign_extend, and it
>   fails to see the sign_extend.  */
>sh_extending_set_of_reg result =
> - sh_find_set_of_reg (reg, curr_insn, prev_nonnote_insn_bb, true);
> + sh_find_set_of_reg (reg, curr_insn, prev_nonnote_nondebug_insn_bb, 
> true);

Likewise.

> @@ -3106,7 +3106,7 @@
> if (GET_CODE (pat) == SET
> && t_reg_operand (XEXP (pat, 0), SImode)
> && negt_reg_operand (XEXP (pat, 1), SImode))
> -   prev_set_t_insn = prev_nonnote_insn_bb (prev_set_t_insn);
> +   prev_set_t_insn = prev_nonnote_nondebug_insn_bb (prev_set_t_insn);

Likewise.

> @@ -3206,7 +3206,7 @@
> if (GET_CODE (pat) == SET
> && t_reg_operand (XEXP (pat, 0), SImode)
> && negt_reg_operand (XEXP (pat, 1), SImode))
> -   prev_set_t_insn = prev_nonnote_insn_bb (prev_set_t_insn);
> +   prev_set_t_insn = prev_nonnote_nondebug_insn_bb (prev_set_t_insn);

Likewise.

> @@ -750,7 +750,7 @@ sh_treg_combine::record_set_of_reg (rtx reg, rtx_insn 
> *start_insn,
>log_msg ("tracing ccreg\n");
>new_entry.setcc =
> find_set_of_reg_bb (m_ccreg,
> -   prev_nonnote_insn_bb (new_entry.cstore.insn));
> +   prev_nonnote_nondebug_insn_bb 
> (new_entry.cstore.insn));

Likewise.  + = shouldn't be at the end of line.  I'm afraid best will be to
use a temporary here.
>  
>// If cstore was found but setcc was not found continue anyway, as
>// for some of the optimization types the setcc is irrelevant.
> @@ -1353,7 +1353,7 @@ sh_treg_combine::try_optimize_cbranch (rtx_insn *insn)
>//   (set (reg ccreg) (eq (reg) (const_int 0)))
>// The testing insn could also be outside of the current basic block, but
>// for now we limit the search to the current basic block.
> -  trace.setcc = find_set_of_reg_bb (m_cc

RE: [compare-debug] use call loc for nop_endbr

2017-12-14 Thread Tsimbalist, Igor V
> -Original Message-
> From: Alexandre Oliva [mailto:aol...@redhat.com]
> Sent: Wednesday, December 13, 2017 8:34 AM
> To: gcc-patches@gcc.gnu.org
> Cc: Tsimbalist, Igor V 
> Subject: [compare-debug] use call loc for nop_endbr
> 
> We skip debug insns and notes after a call that needs a nop_endbr, but
> since a debug insn could be the last in a block, it may affect the loc
> in the emitted nop_endbr insn.  Although this has no effect on
> codegen, it does mess with debug info a bit, and it causes
> -fcompare-debug to fail for e.g. libsanitizer's
> tsan/tsan_platform_linux.cc on x86_64.
> 
> So, pick the location of the call insn for the nop_endbr insn, to
> avoid the line number differences in dumps, including -fcompare-debug
> ones.
> 
> Also, we don't need to determine what the insert point would be unless
> we're actually emitting the nop_endbr insn after the call, so
> rearrange the code to avoid wasting cycles.
> 
> Finally, it seems like testing for barriers is a mistake.  We probably
> never actually pass that test, for the barriers would hit BB_END
> first.  If we did, we'd end up emitting the nop_endbr outside any BB,
> even after the end of the function!  That would be Very Bad (TM).
> Now, since the test as it is can't hurt, I figured I wouldn't change
> the logic right now, just add a comment so that someone involved in
> endbr stuff can have a second look and hopefully fix it.
> 
> I'd appreciate if you'd try to drop the BARRIER_P from the loop test,
> Igor, so as to address the final ??? in the comment I add.  Narrowing
> the skipped notes to only the relevant post-call ones might make sense
> as well, but it's not quite as important IMHO.
> 
> Regstrapping with -fcompare-debug on stage3 host and target builds on
> x86_64- and i686-linux-gnu; ok to install?

Ok from me.

Am I correct the error you had was related to improper location information,
not the placement of the instruction? I will try to skip NOTE insns only.

Igor

> for  gcc/ChangeLog
> 
>   * config/i386/i386.c (rest_of_insert_endbranch): Use call loc
>   for its nop_endbr.
> ---
>  gcc/config/i386/i386.c |   20 
>  1 file changed, 12 insertions(+), 8 deletions(-)
> 
> diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
> index e323102cef59..8960b966b7fc 100644
> --- a/gcc/config/i386/i386.c
> +++ b/gcc/config/i386/i386.c
> @@ -2609,21 +2609,25 @@ rest_of_insert_endbranch (void)
>   {
> if (INSN_P (insn) && GET_CODE (insn) == CALL_INSN)
>   {
> -   rtx_insn *next_insn = insn;
> +   if (find_reg_note (insn, REG_SETJMP, NULL) == NULL)
> + continue;
> +   /* Generate ENDBRANCH after CALL, which can return more than
> +  twice, setjmp-like functions.  */
> 
> +   /* Skip notes and debug insns that must be next to the
> +  call insn.  ??? This might skip a lot more than
> +  that...  ??? Skipping barriers and emitting code
> +  after them surely looks like a mistake; we probably
> +  won't ever hit it, for we'll hit BB_END first.  */
> +   rtx_insn *next_insn = insn;
> while ((next_insn != BB_END (bb))
> && (DEBUG_INSN_P (NEXT_INSN (next_insn))
> || NOTE_P (NEXT_INSN (next_insn))
> || BARRIER_P (NEXT_INSN (next_insn
>   next_insn = NEXT_INSN (next_insn);
> 
> -   /* Generate ENDBRANCH after CALL, which can return more than
> -  twice, setjmp-like functions.  */
> -   if (find_reg_note (insn, REG_SETJMP, NULL) != NULL)
> - {
> -   cet_eb = gen_nop_endbr ();
> -   emit_insn_after (cet_eb, next_insn);
> - }
> +   cet_eb = gen_nop_endbr ();
> +   emit_insn_after_setloc (cet_eb, next_insn, INSN_LOCATION
> (insn));
> continue;
>   }
> 
> 
> --
> Alexandre Oliva, freedom fighterhttp://FSFLA.org/~lxoliva/
> You must be the change you wish to see in the world. -- Gandhi
> Be Free! -- http://FSFLA.org/   FSF Latin America board member
> Free Software Evangelist|Red Hat Brasil GNU Toolchain Engineer


Re: [PATCH GCC]More conservative interchanging small loops with const initialized simple reduction

2017-12-14 Thread Richard Biener
On Tue, Dec 12, 2017 at 1:42 PM, Bin.Cheng  wrote:
> On Fri, Dec 8, 2017 at 2:40 PM, Richard Biener
>  wrote:
>> On Fri, Dec 8, 2017 at 1:43 PM, Bin.Cheng  wrote:
>>> On Fri, Dec 8, 2017 at 12:17 PM, Richard Biener
>>>  wrote:
 On Fri, Dec 8, 2017 at 12:46 PM, Bin Cheng  wrote:
> Hi,
> This simple patch makes interchange even more conservative for small 
> loops with constant initialized simple reduction.
> The reason is undoing such reduction introduces new data reference and 
> cond_expr, which could cost too much in a small
> loop.
> Test gcc.target/aarch64/pr62178.c is fixed with this patch.  Is it OK if 
> test passes?

 Shouldn't we do this even for non-constant initialzied simple
 reduction?  Because for any simple
 reduction we add two DRs that are not innermost, for constant
 initialized we add an additional
 cond-expr.  So ...

 +  /* Conservatively skip interchange in cases only have few data 
 references
 + and constant initialized simple reduction since it introduces new 
 data
 + reference as well as ?: operation.  */
 +  if (num_old_inv_drs + num_const_init_simple_reduc * 2 >= 
 datarefs.length ())
 +return false;
 +

 can you, instead of carrying num_const_init_simple_reduc simply loop
 over m_reductions
 and classify them in this function accordingly?  I think we want to
 cost non-constant-init
 reductions as well.  The :? can eventually count for another DR for
 cost purposes.
>>> Number of non-constant-init reductions can still be carried in struct
>>> loop_cand?  I am not very sure what's the advantage of an additional
>>> loop over m_reductions getting the same information.
>>> Perhaps the increase of stmts should be counted like:
>>>   num_old_inv_drs + num_const_init_simple_reduc * 2 - num_new_inv_drs
>>> Question is which number should this be compared against.  (we may
>>> need to shift num_new_inv_drs to the other side for wrapping issue).
>>>

 It looks like we do count the existing DRs for the reduction?  Is that
 why you arrive
 at the num_const_init_simple_reduc * 2 figure? (one extra load plus one ?:)
>>> Yes.
 But we don't really know whether the DR was invariant in the outer
 loop (well, I suppose
>>> Hmm, I might misunderstand here.  num_old_inv_drs tracks the number of
>>> invariant reference with regarding to inner loop, rather than the
>>> outer loop.  The same to num_new_inv_drs,
>>> which means a reference become invariant after loop interchange with
>>> regarding to (the new) inner loop.  This invariant information is
>>> always known from data reference, right?
>>> As for DRs for reduction, we know it's invariant because we set its
>>> inner loop stride to zero.
>>>
 we could remember the DR in m_reductions).

 Note that the good thing is that the ?: has an invariant condition and
 thus vectorization
 can hoist the mask generation out of the vectorized loop which means
 it boils down to
 cheap operations.  My gut feeling is that just looking at the number
 of memory references
 isn't a good indicator of profitability as the regular stmt workload
 has a big impact on
 profitability of vectorization.
>>> It's not specific to vectorization.  The generated new code also costs
>>> too much in small loops without vectorization.  But yes, # of mem_refs
>>> may be too inaccurate, maybe we should check against num_stmts.
>>
>> Not specific to vectorization but the interchange may pay off only when
>> vectorizing a loop.  Would the loop in loop-interchange-5.c be still
>> interchanged?  If we remove the multiplication and just keep
>> c[i][j] = c[i][j] + b[k][j];
>> ?  That is, why is the constant init so special?  Even for non-constant init
>> we're changing two outer loop DRs to two non-consecutive inner loop DRs.
> Hi Richard,
> This is updated patch taking stmt cost into consideration.
>
> Firstly stmt cost (from # of stmt)
> of loops are recorded.  Then stmt cost of outer loop is adjusted by decreasing
> number of IVs, increasing by the number of constant initialized simple
> reductions.
> Lastly we check stmt cost between inner/outer loops and give up on interchange
> if outer loop has too many stmts.
>
> Test gcc.target/aarch64/pr62178.c is fixed with this patch.  Bootstrap and 
> test
> on x86_64 andAArch64.  Any comment?

This looks good now!

Ok for trunk.

Thanks,
Richard.

> Thanks,
> bin
> 2017-12-12  Bin Cheng  
>
> * gimple-loop-interchange.cc (STMT_COST_RATIO): New macro.
> (loop_cand::m_num_stmts, loop_cand::m_const_init_reduc): New members.
> (loop_cand::loop_cand): Initialize above members.
> (loop_cand::supported_operations): Delete.
> (loop_cand::can_interchange_p): Inline above function.
> (loop_cand::classify_simple_reduction): Record number of constant
> initialized simple reductions.
> (should_interchange_loops): Ne

[testsuite] Require effective target weak_undefined for visibility-22.c

2017-12-14 Thread Tom de Vries

Hi,

this patch introduces an effective target weak_undefined, and uses it in 
test-case visibility-22.c.


I've tested this on trunk for x86_64, and the test still runs and passes.

I've tested this on an internal branch for the gcn target, where it 
makes the test unsupported (instead of timing out due to random code 
execution).


OK for trunk?

Thanks,
- Tom
Require effective target weak_undefined for visibility-22.c

2017-12-14  Tom de Vries  

	* lib/target-supports.exp (check_effective_target_weak_undefined): New
	proc.
	* gcc.dg/visibility-22.c: Require effective target weak_undefined.

---
 gcc/testsuite/gcc.dg/visibility-22.c  | 1 +
 gcc/testsuite/lib/target-supports.exp | 9 +
 2 files changed, 10 insertions(+)

diff --git a/gcc/testsuite/gcc.dg/visibility-22.c b/gcc/testsuite/gcc.dg/visibility-22.c
index 52f59be..49a95ce 100644
--- a/gcc/testsuite/gcc.dg/visibility-22.c
+++ b/gcc/testsuite/gcc.dg/visibility-22.c
@@ -1,6 +1,7 @@
 /* PR target/32219 */
 /* { dg-do run } */
 /* { dg-require-visibility "" } */
+/* { dg-require-effective-target weak_undefined } */
 /* { dg-options "-O2 -fPIC" { target fpic } } */
 /* This test requires support for undefined weak symbols.  This support
is not available on hppa*-*-hpux*.  The test is skipped rather than
diff --git a/gcc/testsuite/lib/target-supports.exp b/gcc/testsuite/lib/target-supports.exp
index 9750b7a..227d8f6 100644
--- a/gcc/testsuite/lib/target-supports.exp
+++ b/gcc/testsuite/lib/target-supports.exp
@@ -328,6 +328,15 @@ proc check_weak_available { } {
 }
 }
 
+# return 1 if weak undefined symbols are supported.
+
+proc check_effective_target_weak_undefined { } {
+return [check_runtime weak_undefined {
+	extern void foo () __attribute__((weak));
+	int main(void) { if (foo) return 1; return 0; }
+} ""]
+}
+
 ###
 # proc check_weak_override_available { }
 ###


Re: [testsuite] Require effective target weak_undefined for visibility-22.c

2017-12-14 Thread Jakub Jelinek
On Thu, Dec 14, 2017 at 02:40:12PM +0100, Tom de Vries wrote:
> --- a/gcc/testsuite/gcc.dg/visibility-22.c
> +++ b/gcc/testsuite/gcc.dg/visibility-22.c
> @@ -1,6 +1,7 @@
>  /* PR target/32219 */
>  /* { dg-do run } */
>  /* { dg-require-visibility "" } */
> +/* { dg-require-effective-target weak_undefined } */
>  /* { dg-options "-O2 -fPIC" { target fpic } } */
>  /* This test requires support for undefined weak symbols.  This support
> is not available on hppa*-*-hpux*.  The test is skipped rather than

Shouldn't then the:
/* This test requires support for undefined weak symbols.  This support
   is not available on hppa*-*-hpux*.  The test is skipped rather than
   xfailed to suppress the warning that would otherwise arise.  */
/* { dg-skip-if "" { "hppa*-*-hpux*" "*-*-aix*" "*-*-darwin*" } }  */
stuff be dropped too?

Jakub


Re: [testsuite] Require effective target weak_undefined for visibility-22.c

2017-12-14 Thread Tom de Vries

On 12/14/2017 02:47 PM, Jakub Jelinek wrote:

On Thu, Dec 14, 2017 at 02:40:12PM +0100, Tom de Vries wrote:

--- a/gcc/testsuite/gcc.dg/visibility-22.c
+++ b/gcc/testsuite/gcc.dg/visibility-22.c
@@ -1,6 +1,7 @@
  /* PR target/32219 */
  /* { dg-do run } */
  /* { dg-require-visibility "" } */
+/* { dg-require-effective-target weak_undefined } */
  /* { dg-options "-O2 -fPIC" { target fpic } } */
  /* This test requires support for undefined weak symbols.  This support
 is not available on hppa*-*-hpux*.  The test is skipped rather than


Shouldn't then the:
/* This test requires support for undefined weak symbols.  This support
is not available on hppa*-*-hpux*.  The test is skipped rather than
xfailed to suppress the warning that would otherwise arise.  */
/* { dg-skip-if "" { "hppa*-*-hpux*" "*-*-aix*" "*-*-darwin*" } }  */
stuff be dropped too?


I don't know whether the new effective target test will fail for each of 
these 3 targets. But the warning mentioned for hppa*-*-hpux* will make 
the effective target test fail, so I think that one can be removed.


Thanks,
- Tom


Re: [testsuite] Require effective target weak_undefined for visibility-22.c

2017-12-14 Thread Jakub Jelinek
On Thu, Dec 14, 2017 at 03:09:02PM +0100, Tom de Vries wrote:
> On 12/14/2017 02:47 PM, Jakub Jelinek wrote:
> > On Thu, Dec 14, 2017 at 02:40:12PM +0100, Tom de Vries wrote:
> > > --- a/gcc/testsuite/gcc.dg/visibility-22.c
> > > +++ b/gcc/testsuite/gcc.dg/visibility-22.c
> > > @@ -1,6 +1,7 @@
> > >   /* PR target/32219 */
> > >   /* { dg-do run } */
> > >   /* { dg-require-visibility "" } */
> > > +/* { dg-require-effective-target weak_undefined } */
> > >   /* { dg-options "-O2 -fPIC" { target fpic } } */
> > >   /* This test requires support for undefined weak symbols.  This support
> > >  is not available on hppa*-*-hpux*.  The test is skipped rather than
> > 
> > Shouldn't then the:
> > /* This test requires support for undefined weak symbols.  This support
> > is not available on hppa*-*-hpux*.  The test is skipped rather than
> > xfailed to suppress the warning that would otherwise arise.  */
> > /* { dg-skip-if "" { "hppa*-*-hpux*" "*-*-aix*" "*-*-darwin*" } }  */
> > stuff be dropped too?
> 
> I don't know whether the new effective target test will fail for each of
> these 3 targets. But the warning mentioned for hppa*-*-hpux* will make the
> effective target test fail, so I think that one can be removed.

Or you can remove all 3, and if unsure, just add those to the weak_undefined
effective target (return 0 for them).  And ask target maintainers to verify
and perhaps remove.

Jakub


Re: [testsuite] Require effective target weak_undefined for visibility-22.c

2017-12-14 Thread Rainer Orth
Hi Tom,

> this patch introduces an effective target weak_undefined, and uses it in
> test-case visibility-22.c.
>
> I've tested this on trunk for x86_64, and the test still runs and passes.
>
> I've tested this on an internal branch for the gcn target, where it makes
> the test unsupported (instead of timing out due to random code execution).
>
> OK for trunk?

I like the idea, thanks for doing this.

> Require effective target weak_undefined for visibility-22.c
>
> 2017-12-14  Tom de Vries  
>
>   * lib/target-supports.exp (check_effective_target_weak_undefined): New
>   proc.
>   * gcc.dg/visibility-22.c: Require effective target weak_undefined.

Ok with documentation added to sourcebuild.texi and the dg-skip-if
removed.  If need be, the proc can still be refined.

Rainer

-- 
-
Rainer Orth, Center for Biotechnology, Bielefeld University


Re: [testsuite] Require effective target weak_undefined for visibility-22.c

2017-12-14 Thread Rainer Orth
Hi Jakub,

> On Thu, Dec 14, 2017 at 03:09:02PM +0100, Tom de Vries wrote:
>> On 12/14/2017 02:47 PM, Jakub Jelinek wrote:
>> > On Thu, Dec 14, 2017 at 02:40:12PM +0100, Tom de Vries wrote:
>> > > --- a/gcc/testsuite/gcc.dg/visibility-22.c
>> > > +++ b/gcc/testsuite/gcc.dg/visibility-22.c
>> > > @@ -1,6 +1,7 @@
>> > >   /* PR target/32219 */
>> > >   /* { dg-do run } */
>> > >   /* { dg-require-visibility "" } */
>> > > +/* { dg-require-effective-target weak_undefined } */
>> > >   /* { dg-options "-O2 -fPIC" { target fpic } } */
>> > >   /* This test requires support for undefined weak symbols.  This support
>> > >  is not available on hppa*-*-hpux*.  The test is skipped rather than
>> > 
>> > Shouldn't then the:
>> > /* This test requires support for undefined weak symbols.  This support
>> > is not available on hppa*-*-hpux*.  The test is skipped rather than
>> > xfailed to suppress the warning that would otherwise arise.  */
>> > /* { dg-skip-if "" { "hppa*-*-hpux*" "*-*-aix*" "*-*-darwin*" } }  */
>> > stuff be dropped too?
>> 
>> I don't know whether the new effective target test will fail for each of
>> these 3 targets. But the warning mentioned for hppa*-*-hpux* will make the
>> effective target test fail, so I think that one can be removed.
>
> Or you can remove all 3, and if unsure, just add those to the weak_undefined
> effective target (return 0 for them).  And ask target maintainers to verify
> and perhaps remove.

I'd do it the other way round: remove dg-skip-if completely and ping the
target maintainers to check (and eventually improve) the proc.
Otherwise those (probably unnecessary) special cases tend to stay
around forever ;-)

Rainer

-- 
-
Rainer Orth, Center for Biotechnology, Bielefeld University


Re: [SFN] Bootstrap broken

2017-12-14 Thread Andreas Schwab
This fixes the m68k ICE.

Andreas.

PR bootstrap/83396
* reload1.c (emit_input_reload_insns): Skip debug markers.

---
 gcc/reload1.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/gcc/reload1.c b/gcc/reload1.c
index fe1ec0d011..baedc43b75 100644
--- a/gcc/reload1.c
+++ b/gcc/reload1.c
@@ -7345,12 +7345,12 @@ emit_input_reload_insns (struct insn_chain *chain, 
struct reload *rl,
 
  /* Adjust any debug insns between temp and insn.  */
  while ((temp = NEXT_INSN (temp)) != insn)
-   if (DEBUG_INSN_P (temp))
+   if (DEBUG_BIND_INSN_P (temp))
  INSN_VAR_LOCATION_LOC (temp)
= simplify_replace_rtx (INSN_VAR_LOCATION_LOC (temp),
old, reloadreg);
else
- gcc_assert (NOTE_P (temp));
+ gcc_assert (DEBUG_INSN_P (temp) || NOTE_P (temp));
}
  else
{
-- 
2.15.1

-- 
Andreas Schwab, sch...@linux-m68k.org
GPG Key fingerprint = 58CA 54C7 6D53 942B 1756  01D3 44D5 214B 8276 4ED5
"And now for something completely different."


Re: [SFN] Bootstrap broken

2017-12-14 Thread Jakub Jelinek
On Thu, Dec 14, 2017 at 03:41:24PM +0100, Andreas Schwab wrote:
> This fixes the m68k ICE.
> 
> Andreas.
> 
>   PR bootstrap/83396
>   * reload1.c (emit_input_reload_insns): Skip debug markers.

This is ok for trunk, thanks.

> --- a/gcc/reload1.c
> +++ b/gcc/reload1.c
> @@ -7345,12 +7345,12 @@ emit_input_reload_insns (struct insn_chain *chain, 
> struct reload *rl,
>  
> /* Adjust any debug insns between temp and insn.  */
> while ((temp = NEXT_INSN (temp)) != insn)
> - if (DEBUG_INSN_P (temp))
> + if (DEBUG_BIND_INSN_P (temp))
> INSN_VAR_LOCATION_LOC (temp)
>   = simplify_replace_rtx (INSN_VAR_LOCATION_LOC (temp),
>   old, reloadreg);
>   else
> -   gcc_assert (NOTE_P (temp));
> +   gcc_assert (DEBUG_INSN_P (temp) || NOTE_P (temp));
>   }
> else
>   {

Jakub


[PATCH] Fix PR67842

2017-12-14 Thread Richard Biener

The PR says just remove the #if 0 code.  Done below, committed as obvious.

Richard.

2017-12-14  Richard Biener  

PR tree-optimization/67842
* sese.h (bb_in_region): Remove #if 0'ed code.

Index: gcc/sese.h
===
--- gcc/sese.h  (revision 255622)
+++ gcc/sese.h  (working copy)
@@ -120,20 +120,6 @@ sese_nb_params (sese_info_p region)
 static inline bool
 bb_in_region (const_basic_block bb, const_basic_block entry, const_basic_block 
exit)
 {
-  /* FIXME: PR67842.  */
-#if 0
-  if (flag_checking)
-{
-  edge e;
-  edge_iterator ei;
-
-  /* Check that there are no edges coming in the region: all the
-predecessors of EXIT are dominated by ENTRY.  */
-  FOR_EACH_EDGE (e, ei, exit->preds)
-   gcc_assert (dominated_by_p (CDI_DOMINATORS, e->src, entry));
-}
-#endif
-
   return dominated_by_p (CDI_DOMINATORS, bb, entry)
 && !(dominated_by_p (CDI_DOMINATORS, bb, exit)
  && !dominated_by_p (CDI_DOMINATORS, entry, exit));


[PATCH] Testcase for fixed PR65258

2017-12-14 Thread Richard Biener

Committed.

Richard.

2017-12-14  Richard Biener  

PR tree-optimization/65258
* gcc.dg/Warray-bounds-23.c: New testcase.

Index: gcc/testsuite/gcc.dg/Warray-bounds-23.c
===
--- gcc/testsuite/gcc.dg/Warray-bounds-23.c (nonexistent)
+++ gcc/testsuite/gcc.dg/Warray-bounds-23.c (working copy)
@@ -0,0 +1,15 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -Warray-bounds" } */
+
+int main()
+{
+  unsigned i, j, a[10] = {0};
+
+  for (j = 23; j < 25; j++){
+  for (i = j / 8; i --> 0;) a[i] = 0; /* { dg-bogus "array bounds" } */
+  for (i = 1; i --> 0;) __builtin_printf("%u", a[i]);
+  }
+
+  return 0;
+}
+


Re: [testsuite] Require effective target weak_undefined for visibility-22.c

2017-12-14 Thread Tom de Vries

On 12/14/2017 03:18 PM, Rainer Orth wrote:

Hi Jakub,


On Thu, Dec 14, 2017 at 03:09:02PM +0100, Tom de Vries wrote:

On 12/14/2017 02:47 PM, Jakub Jelinek wrote:

On Thu, Dec 14, 2017 at 02:40:12PM +0100, Tom de Vries wrote:

--- a/gcc/testsuite/gcc.dg/visibility-22.c
+++ b/gcc/testsuite/gcc.dg/visibility-22.c
@@ -1,6 +1,7 @@
   /* PR target/32219 */
   /* { dg-do run } */
   /* { dg-require-visibility "" } */
+/* { dg-require-effective-target weak_undefined } */
   /* { dg-options "-O2 -fPIC" { target fpic } } */
   /* This test requires support for undefined weak symbols.  This support
  is not available on hppa*-*-hpux*.  The test is skipped rather than

Shouldn't then the:
/* This test requires support for undefined weak symbols.  This support
 is not available on hppa*-*-hpux*.  The test is skipped rather than
 xfailed to suppress the warning that would otherwise arise.  */
/* { dg-skip-if "" { "hppa*-*-hpux*" "*-*-aix*" "*-*-darwin*" } }  */
stuff be dropped too?

I don't know whether the new effective target test will fail for each of
these 3 targets. But the warning mentioned for hppa*-*-hpux* will make the
effective target test fail, so I think that one can be removed.

Or you can remove all 3, and if unsure, just add those to the weak_undefined
effective target (return 0 for them).  And ask target maintainers to verify
and perhaps remove.

I'd do it the other way round: remove dg-skip-if completely and ping the
target maintainers to check (and eventually improve) the proc.


Hi,

I've committed patch below which removes this dg-skip-if from 
visibility-22.c and replaces it with a test for a new effective target 
weak_undefined:

...
/* { dg-skip-if "" { "hppa*-*-hpux*" "*-*-aix*" "*-*-darwin*" } }  */
...

Could you please verify that the test for effective target 
weak_undefined indeed fails for these targets?


Thanks,
- Tom
Require effective target weak_undefined for visibility-22.c

2017-12-14  Tom de Vries  

	* lib/target-supports.exp (check_effective_target_weak_undefined): New
	proc.
	* gcc.dg/visibility-22.c: Require effective target weak_undefined.

	* doc/sourcebuild.texi (Effective-Target Keywords, Other attributes):
	Add item for weak_undefined.

---
 gcc/doc/sourcebuild.texi  | 3 +++
 gcc/testsuite/gcc.dg/visibility-22.c  | 5 +
 gcc/testsuite/lib/target-supports.exp | 9 +
 3 files changed, 13 insertions(+), 4 deletions(-)

diff --git a/gcc/doc/sourcebuild.texi b/gcc/doc/sourcebuild.texi
index 60b6b77..04e18df 100644
--- a/gcc/doc/sourcebuild.texi
+++ b/gcc/doc/sourcebuild.texi
@@ -2238,6 +2238,9 @@ Target supports @code{wchar_t} that is compatible with @code{char32_t}.
 @item comdat_group
 Target uses comdat groups.
 
+@item weak_undefined
+Target supports weak undefined symbols.
+
 @item word_mode_no_slow_unalign
 Target does not have slow unaligned access when doing word size accesses.
 @end table
diff --git a/gcc/testsuite/gcc.dg/visibility-22.c b/gcc/testsuite/gcc.dg/visibility-22.c
index 5e8cdad..e2b78d1 100644
--- a/gcc/testsuite/gcc.dg/visibility-22.c
+++ b/gcc/testsuite/gcc.dg/visibility-22.c
@@ -1,11 +1,8 @@
 /* PR target/32219 */
 /* { dg-do run } */
 /* { dg-require-visibility "" } */
+/* { dg-require-effective-target weak_undefined } */
 /* { dg-options "-O2 -fPIC" { target fpic } } */
-/* This test requires support for undefined weak symbols.  This support
-   is not available on hppa*-*-hpux*.  The test is skipped rather than
-   xfailed to suppress the warning that would otherwise arise.  */
-/* { dg-skip-if "" { "hppa*-*-hpux*" "*-*-aix*" "*-*-darwin*" } }  */
 
 extern void foo () __attribute__((weak,visibility("hidden")));
 int
diff --git a/gcc/testsuite/lib/target-supports.exp b/gcc/testsuite/lib/target-supports.exp
index ff8c805..114c1f1 100644
--- a/gcc/testsuite/lib/target-supports.exp
+++ b/gcc/testsuite/lib/target-supports.exp
@@ -328,6 +328,15 @@ proc check_weak_available { } {
 }
 }
 
+# return 1 if weak undefined symbols are supported.
+
+proc check_effective_target_weak_undefined { } {
+return [check_runtime weak_undefined {
+	extern void foo () __attribute__((weak));
+	int main (void) { if (foo) return 1; return 0; }
+} ""]
+}
+
 ###
 # proc check_weak_override_available { }
 ###


[PATCH] Testcase for PR66974

2017-12-14 Thread Richard Biener

Committed.

2017-12-14  Richard Biener  

PR tree-optimization/66974
* gcc.dg/Warray-bounds-24.c: New testcase.

Index: gcc/testsuite/gcc.dg/Warray-bounds-24.c
===
--- gcc/testsuite/gcc.dg/Warray-bounds-24.c (nonexistent)
+++ gcc/testsuite/gcc.dg/Warray-bounds-24.c (working copy)
@@ -0,0 +1,15 @@
+/* { dg-do compile } */
+/* { dg-options "-O3 -Warray-bounds" } */
+
+int foo(unsigned order)
+{
+  int c[3] = {1, 2, 3};
+  unsigned i, j;
+  for (i = 1; i < order; i++) {
+  for (j = 0; j < i / 2; j++) {
+ c[j] += c[i] * c[i-j-1]; /* { dg-bogus "array bounds" } */
+ c[i-j-1] += c[i] * c[j]; /* { dg-bogus "array bounds" } */
+  }
+  }
+  return c[0];
+}


[committed] Testcase for PR lto/81406

2017-12-14 Thread Jakub Jelinek
Hi!

This got fixed by r251220.  Testcase tested on x86_64-linux,
trunk as well as r251218 (where it FAILs), committed to trunk as obvious.

2017-12-14  Jakub Jelinek  

PR lto/81406
* gcc.dg/lto/pr81406_0.c: New test.

--- gcc/testsuite/gcc.dg/lto/pr81406_0.c.jj 2017-12-14 16:20:44.867849609 
+0100
+++ gcc/testsuite/gcc.dg/lto/pr81406_0.c2017-12-14 16:17:34.0 
+0100
@@ -0,0 +1,20 @@
+/* PR lto/81406 */
+/* { dg-lto-do link } */
+/* { dg-lto-options { { -O2 -g -flto } } } */
+/* { dg-extra-ld-options { -g -r -nostdlib } } */
+
+int a;
+int *foo (void);
+
+static inline int __attribute__ ((__artificial__))
+bar (void)
+{
+  if (a)
+*foo () = 2;
+}
+
+void *
+baz (void)
+{
+  return (void *) bar;
+}

Jakub


Re: [PATCH] Fix PR83418

2017-12-14 Thread Jeff Law
On 12/14/2017 01:54 AM, Richard Biener wrote:
> 
> IVOPTs (at least) leaves unfolded stmts in the IL and VRP overzealously
> asserts they cannot happen.
> 
> Bootstrap and regtest running on x86_64-unknown-linux-gnu.
> 
> Richard.
> 
> 2017-12-14  Richard Biener  
> 
>   PR tree-optimization/83418
>   * vr-values.c (vr_values::extract_range_for_var_from_comparison_expr):
>   Instead of asserting we don't get unfolded comparisons deal with
>   them.
> 
>   * gcc.dg/torture/pr83418.c: New testcase.
I think this also potentially affects dumping.  I've seen the dumper
crash trying to access a INTEGER_CST where we expected to find an
SSA_NAME while iterating over a statement's operands.

I haven't submitted the workaround because I hadn't tracked down the
root cause to verify something deeper isn't wrong.

Jeff


[PATCH][arm] Add -mverbose-cost-dump and de-verbosify cost dumps

2017-12-14 Thread Kyrill Tkachov

Hi all,

This patch adds an -mverbose-cost-dump option, similar to the one in 
aarch64.
It makes the RTX cost dump print the RTX we're costing in the backend, 
as well as its cost.

This can be distracting in other cost-related RTL dumps like combine's.

So now we don't dump the backend information by default, but provide the 
power-user option -mverbose-cost-dump

to enable the old verbose dumping.

This option is for GCC developers debugging the compiler only, so no 
documentation are added.

It's also trivially simple in functionality so no test is added either.

Bootstrapped and tested on arm-none-linux-gnueabihf.

Committed to trunk.

Thanks,
Kyrill

2017-12-14  Kyrylo Tkachov  

* config/arm/arm.opt (mverbose-cost-dump): New option.
* config/arm/arm.c (arm_rtx_costs): Use it.
diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
index f34b6e0..9b29a0a 100644
--- a/gcc/config/arm/arm.c
+++ b/gcc/config/arm/arm.c
@@ -11068,7 +11068,7 @@ arm_rtx_costs (rtx x, machine_mode mode ATTRIBUTE_UNUSED, int outer_code,
 current_tune->insn_extra_cost,
 total, speed);
 
-  if (dump_file && (dump_flags & TDF_DETAILS))
+  if (dump_file && arm_verbose_cost)
 {
   print_rtl_single (dump_file, x);
   fprintf (dump_file, "\n%s cost: %d (%s)\n", speed ? "Hot" : "Cold",
diff --git a/gcc/config/arm/arm.opt b/gcc/config/arm/arm.opt
index 6060516..48f64588 100644
--- a/gcc/config/arm/arm.opt
+++ b/gcc/config/arm/arm.opt
@@ -254,6 +254,10 @@ mvectorize-with-neon-double
 Target Report RejectNegative Mask(NEON_VECTORIZE_DOUBLE)
 Use Neon double-word (rather than quad-word) registers for vectorization.
 
+mverbose-cost-dump
+Common Undocumented Var(arm_verbose_cost) Init(0)
+Enable more verbose RTX cost dumps during debug.  For GCC developers use only.
+
 mword-relocations
 Target Report Var(target_word_relocations) Init(TARGET_DEFAULT_WORD_RELOCATIONS)
 Only generate absolute relocations on word sized values.


Re: Add an "early rematerialisation" pass

2017-12-14 Thread Jeff Law
On 12/14/2017 04:09 AM, Richard Biener wrote:
> On Fri, Nov 17, 2017 at 4:58 PM, Richard Sandiford
>  wrote:
>> This patch looks for pseudo registers that are live across a call
>> and for which no call-preserved hard registers exist.  It then
>> recomputes the pseudos as necessary to ensure that they are no
>> longer live across a call.  The comment at the head of the file
>> describes the approach.
>>
>> A new target hook selects which modes should be treated in this way.
>> By default none are, in which case the pass is skipped very early.
>>
>> It might also be worth looking for cases like:
>>
>>C1: R1 := f (...)
>>...
>>C2: R2 := f (...)
>>C3: R1 := C2
>>
>> and giving the same value number to C1 and C3, effectively treating
>> it like:
>>
>>C1: R1 := f (...)
>>...
>>C2: R2 := f (...)
>>C3: R1 := f (...)
>>
>> Another (much more expensive) enhancement would be to apply value
>> numbering to all pseudo registers (not just rematerialisation
>> candidates), so that we can handle things like:
>>
>>   C1: R1 := f (...R2...)
>>   ...
>>   C2: R1 := f (...R3...)
>>
>> where R2 and R3 hold the same value.  But the current pass seems
>> to catch the vast majority of cases.
>>
>> Tested on aarch64-linux-gnu (with and without SVE), x86_64-linux-gnu
>> and powerpc64le-linux-gnu.  OK to install?
> 
> Can you tell anything about the complexity of the algorithm?  How
> does it relate to what LRA can do?  AFAIK LRA doesn't try to find
> any global optimal solution and previous hardreg assignments
> may work against it?
> 
> That said - I would have expected remat to be done before the
> first scheduling pass?  Even before pass_sms (not sure
> what pass_live_range_shrinkage does).  Or be integrated
> with scheduling and it's register pressure cost model.
> 
> Also I would have expected the approach to apply to all modes,
> just the cost of spilling is different.  But if you can, say, reduce
> register pressure by one by rematerializing a bit-not then that
> should be always profitable, no?  postreload-cse will come to
> the rescue anyhow.
> 
> Disclaimer: didn't look at the pass implementation, RTL isn't my
> primary expertise.
I'd just started looking at this yesterday.

At a high level, yes, we could consider doing this on other modes.  It
certainly would have helped Sparc in the past (I have no idea if it
still would).  The basic problem was that all FP registers were call
clobbered, so our FP performance really sucked.

We eventually fixed by dramatically improving the basic caller-saves
costing model, avoid unnecessary saves/restores and save/restore in
wider modes (it tended to use word_mode chunks, even for things like 64
bit FP values).

Something like an aggressive remat pass would likely have helped as well.

Things have changed quite a lot since those days (1990-ish), thankfully.

There's other targets that have register classes which are call
clobbered and very painful to spill -- but they tend not to be heavily
used classes.  I'm thinking of things like loop counter registers, shift
amount registers and the like.

Anyway, hoping to dig further into the remat bits today.  And there's
the fully-predicated loop bits that are still unreviewed as well -- want
to take those Richi?


Jeff


Re: [PATCH] set range for strlen(array) to avoid spurious -Wstringop-overflow (PR 83373 , PR 78450)

2017-12-14 Thread Martin Sebor

On 12/14/2017 03:43 AM, Richard Biener wrote:

On Wed, Dec 13, 2017 at 4:47 AM, Martin Sebor  wrote:

On 12/12/2017 05:35 PM, Jeff Law wrote:


On 12/12/2017 01:15 PM, Martin Sebor wrote:


Bug 83373 - False positive reported by -Wstringop-overflow, is
another example of warning triggered by a missed optimization
opportunity, this time in the strlen pass.  The optimization
is discussed in pr78450 - strlen(s) return value can be assumed
to be less than the size of s.  The gist of it is that the result
of strlen(array) can be assumed to be less than the size of
the array (except in the corner case of last struct members).

To avoid the false positive the attached patch adds this
optimization to the strlen pass.  Although the patch passes
bootstrap and regression tests for all front-ends I'm not sure
the way it determines the upper bound of the range is 100%
correct for languages with arrays with a non-zero lower bound.
Maybe it's just not as tight as it could be.


What about something hideous like

struct fu {
  char x1[10];
  char x2[10];
  int avoid_trailing_array;
}

Where objects stored in x1 are not null terminated.  Are we in the realm
of undefined behavior at that point (I hope so)?



Yes, this is undefined.  Pointer arithmetic (either direct or
via standard library functions) is only defined for pointers
to the same object or subobject.  So even something like

 memcpy (pfu->x1, pfu->x1 + 10, 10);

is undefined.


There's nothing undefined here - computing the pointer pointing
to one-after-the-last element of an array is valid (you are just
not allowed to dereference it).


Right, and memcpy dereferences it, so it's undefined.

Martin



[PATCH][ARM][gcc-7] Fix regression on soft float targets for armv8_2-fp16-move-2.c

2017-12-14 Thread Sudakshina Das

Hi

This patch is a follow up on my previous patch with r255536 that was a 
back-port for fixing a wrong code generation

(https://gcc.gnu.org/ml/gcc-patches/2017-11/msg02209.html).
As pointed out by Christophe Lyon 
(https://gcc.gnu.org/ml/gcc-patches/2017-12/msg00718.html) the test case 
started to fail on the new dejagnu for arm-none-linux-gnueabi and 
arm-none-eabi.
This patch just removes the dg-add-options from the test case because I 
think dg-options has all that is needed anyway.


Testing: Since I could not reproduce the failure on my machine, 
Christophe would it be possible for you to check if this patch fixes the

regression for you?

Thanks
Sudi
diff --git a/gcc/testsuite/gcc.target/arm/armv8_2-fp16-move-2.c b/gcc/testsuite/gcc.target/arm/armv8_2-fp16-move-2.c
index ac7d4e3f2a9fb1d70a9ce95062dc6db4a69272ff..09adddfd57ca13e831c276aef25621e7340bcfff 100644
--- a/gcc/testsuite/gcc.target/arm/armv8_2-fp16-move-2.c
+++ b/gcc/testsuite/gcc.target/arm/armv8_2-fp16-move-2.c
@@ -1,7 +1,6 @@
 /* { dg-do compile }  */
 /* { dg-require-effective-target arm_v8_2a_fp16_scalar_ok }  */
 /* { dg-options "-O2 -mfpu=fp-armv8 -march=armv8.2-a+fp16 -marm -mfloat-abi=hard" }  */
-/* { dg-add-options arm_v8_2a_fp16_scalar }  */
 
 __fp16
 test_select (__fp16 a, __fp16 b, __fp16 c)


Re: [PATCH] set range for strlen(array) to avoid spurious -Wstringop-overflow (PR 83373 , PR 78450)

2017-12-14 Thread Jeff Law
On 12/12/2017 08:47 PM, Martin Sebor wrote:
>>
>> struct fu {
>>   char x1[10];
>>   char x2[10];
>>   int avoid_trailing_array;
>> }
>>
>> Where objects stored in x1 are not null terminated.  Are we in the realm
>> of undefined behavior at that point (I hope so)?
> 
> Yes, this is undefined.  Pointer arithmetic (either direct or
> via standard library functions) is only defined for pointers
> to the same object or subobject.  So even something like
That's what I expected -- I just wanted to be sure there wasn't some
kind of escape clause that would allow one to compose an object of that
nature, then use pfu->x1 to read/write pfu->x2.


> 
>  memcpy (pfu->x1, pfu->x1 + 10, 10);
> 
> is undefined.  _FORTIFY_SOURCE allows it for raw memory functions
> because some low level code copies regions of the same object that
> span two or more members (I've seen an example or two in the Linux
> kernel but, IIRC, nowhere else).  With the patch, using memchr
> would be the only way to get away with it.
Presumably the right way to go about things in this situation is to use
a pointer to the outer object.

> 
> Other than that, the new -Wstringop-truncation warning is designed
> to prevent creating unterminated character arrays and the manual
> suggests using attribute nonstring when it's necessary.  The
> -Wrestrict warning I'm about to check in also complains about
> forming invalid pointers by built-ins with restrict-qualified
> arguments (although it won't diagnose the above).
Right.  The concern is that it's an new option and folks likely haven't
cleaned up their codebases for these kinds of issues.

> 
> Although I would prefer not to, I suppose if letting strlen cross
> the boundaries of subobjects was considered an important use to
> accommodate in limited cases the optimization could be disabled
> for member arrays declared with the new nonstring attribute (while
> still issuing a warning for it as GCC does today).
> 
> Another alternative (if the above use case is considered important
> enough) might be to suppress the optimization for member character
> arrays that are immediately followed by another such array.
History tells us that there will be someone out there that does this
kind of thing -- the question is how pervasive is it.  My suspicion is
that it is not common.

Given that I don't expect those uses to be common, the only thing that
should break is non-conforming code and we have a (new) warning for such
code my inclination is to go forward.

So I'm OK with the patch.  I'd give folks till Monday to chime in with
dissenting opinions.

Jeff


Re: [PATCH] set range for strlen(array) to avoid spurious -Wstringop-overflow (PR 83373 , PR 78450)

2017-12-14 Thread Jakub Jelinek
On Thu, Dec 14, 2017 at 09:13:21AM -0700, Jeff Law wrote:
> > Although I would prefer not to, I suppose if letting strlen cross
> > the boundaries of subobjects was considered an important use to
> > accommodate in limited cases the optimization could be disabled
> > for member arrays declared with the new nonstring attribute (while
> > still issuing a warning for it as GCC does today).
> > 
> > Another alternative (if the above use case is considered important
> > enough) might be to suppress the optimization for member character
> > arrays that are immediately followed by another such array.
> History tells us that there will be someone out there that does this
> kind of thing -- the question is how pervasive is it.  My suspicion is
> that it is not common.
> 
> Given that I don't expect those uses to be common, the only thing that
> should break is non-conforming code and we have a (new) warning for such
> code my inclination is to go forward.
> 
> So I'm OK with the patch.  I'd give folks till Monday to chime in with
> dissenting opinions.

Well, it would be nice to get sanitizers diagnose this at runtime.  If we
know the array length at compile time, simply compare after the strlen
call the result and fail if it returns something above it.  Or replace
the strlen call with strnlen for the compile time known size and add
instrumentation if strnlen returns the second argument.

Jakub


Re: [AARCH64] Neon vld1_*_x3, vst1_*_x2 and vst1_*_x3 intrinsics

2017-12-14 Thread Christophe Lyon
2017-12-14 9:29 GMT+01:00 Sameera Deshpande :
> Hi!
>
> Please find attached the patch implementing vld1_*_x3, vst1_*_x2 and
> vst1_*_x3 intrinsics as defined by Neon document.
>
> Ok for trunk?
>
> - Thanks and regards,
>   Sameera D.
>
> gcc/Changelog:
>
> 2017-11-14  Sameera Deshpande  
>
>
> * config/aarch64/aarch64-simd-builtins.def (ld1x3): New.
> (st1x2): Likewise.
> (st1x3): Likewise.
> * config/aarch64/aarch64-simd.md
> (aarch64_ld1x3): New pattern.
> (aarch64_ld1_x3_): Likewise
> (aarch64_st1x2): Likewise
> (aarch64_st1_x2_): Likewise
> (aarch64_st1x3): Likewise
> (aarch64_st1_x3_): Likewise
> * config/aarch64/arm_neon.h (vld1_u8_x3): New function.
> (vld1_s8_x3): Likewise.
> (vld1_u16_x3): Likewise.
> (vld1_s16_x3): Likewise.
> (vld1_u32_x3): Likewise.
> (vld1_s32_x3): Likewise.
> (vld1_u64_x3): Likewise.
> (vld1_s64_x3): Likewise.
> (vld1_fp16_x3): Likewise.
> (vld1_f32_x3): Likewise.
> (vld1_f64_x3): Likewise.
> (vld1_p8_x3): Likewise.
> (vld1_p16_x3): Likewise.
> (vld1_p64_x3): Likewise.
> (vld1q_u8_x3): Likewise.
> (vld1q_s8_x3): Likewise.
> (vld1q_u16_x3): Likewise.
> (vld1q_s16_x3): Likewise.
> (vld1q_u32_x3): Likewise.
> (vld1q_s32_x3): Likewise.
> (vld1q_u64_x3): Likewise.
> (vld1q_s64_x3): Likewise.
> (vld1q_f16_x3): Likewise.
> (vld1q_f32_x3): Likewise.
> (vld1q_f64_x3): Likewise.
> (vld1q_p8_x3): Likewise.
> (vld1q_p16_x3): Likewise.
> (vld1q_p64_x3): Likewise.
> (vst1_s64_x2): Likewise.
> (vst1_u64_x2): Likewise.
> (vst1_f64_x2): Likewise.
> (vst1_s8_x2): Likewise.
> (vst1_p8_x2): Likewise.
> (vst1_s16_x2): Likewise.
> (vst1_p16_x2): Likewise.
> (vst1_s32_x2): Likewise.
> (vst1_u8_x2): Likewise.
> (vst1_u16_x2): Likewise.
> (vst1_u32_x2): Likewise.
> (vst1_f16_x2): Likewise.
> (vst1_f32_x2): Likewise.
> (vst1_p64_x2): Likewise.
> (vst1q_s8_x2): Likewise.
> (vst1q_p8_x2): Likewise.
> (vst1q_s16_x2): Likewise.
> (vst1q_p16_x2): Likewise.
> (vst1q_s32_x2): Likewise.
> (vst1q_s64_x2): Likewise.
> (vst1q_u8_x2): Likewise.
> (vst1q_u16_x2): Likewise.
> (vst1q_u32_x2): Likewise.
> (vst1q_u64_x2): Likewise.
> (vst1q_f16_x2): Likewise.
> (vst1q_f32_x2): Likewise.
> (vst1q_f64_x2): Likewise.
> (vst1q_p64_x2): Likewise.
> (vst1_s64_x3): Likewise.
> (vst1_u64_x3): Likewise.
> (vst1_f64_x3): Likewise.
> (vst1_s8_x3): Likewise.
> (vst1_p8_x3): Likewise.
> (vst1_s16_x3): Likewise.
> (vst1_p16_x3): Likewise.
> (vst1_s32_x3): Likewise.
> (vst1_u8_x3): Likewise.
> (vst1_u16_x3): Likewise.
> (vst1_u32_x3): Likewise.
> (vst1_f16_x3): Likewise.
> (vst1_f32_x3): Likewise.
> (vst1_p64_x3): Likewise.
> (vst1q_s8_x3): Likewise.
> (vst1q_p8_x3): Likewise.
> (vst1q_s16_x3): Likewise.
> (vst1q_p16_x3): Likewise.
> (vst1q_s32_x3): Likewise.
> (vst1q_s64_x3): Likewise.
> (vst1q_u8_x3): Likewise.
> (vst1q_u16_x3): Likewise.
> (vst1q_u32_x3): Likewise.
> (vst1q_u64_x3): Likewise.
> (vst1q_f16_x3): Likewise.
> (vst1q_f32_x3): Likewise.
> (vst1q_f64_x3): Likewise.
> (vst1q_p64_x3): Likewise.

Hi,
I'm not a maintainer, but I suspect you should add some tests.

Christophe


Re: [RFC] Add means to split dump file into several files -- Use in lra

2017-12-14 Thread Vladimir Makarov



On 12/07/2017 09:53 AM, Tom de Vries wrote:

Hi,

I'm currently debugging a problem in lra, and got a bit lost in the 
20k+ lines dump file.


I observed that:
- the lra dump file is one of the biggest ones
- lra itself consists of a looping of sub-passes.

So, I've:
- written a dump infrastructure addition that can be used within a pass
  to mark the end of the current (sub)dump file and start a next
  subdump file.
- used that infrastructure to instrument lra to dump info from
  different subpasses into separate files.

Using this patch I managed to split the reload dump file into smaller 
bits:

...
$ wc -l *.reload.*
   3 no-scevccp-outer-10.c.276r.reload
   0 no-scevccp-outer-10.c.276r.reload.001.lra_start
   3 no-scevccp-outer-10.c.276r.reload.002.remove_scratches
    2335 no-scevccp-outer-10.c.276r.reload.003.lra_constraints
    1781 no-scevccp-outer-10.c.276r.reload.004.lra_create_live_ranges
 460 no-scevccp-outer-10.c.276r.reload.005.lra_inheritance
 920 no-scevccp-outer-10.c.276r.reload.006.lra_create_live_ranges
 563 no-scevccp-outer-10.c.276r.reload.007.lra_assign
 184 no-scevccp-outer-10.c.276r.reload.008.lra_undo_inheritance
 830 no-scevccp-outer-10.c.276r.reload.009.lra_create_live_ranges
   3 no-scevccp-outer-10.c.276r.reload.010.lra_coalesce
 165 no-scevccp-outer-10.c.276r.reload.011.lra_constraints
 844 no-scevccp-outer-10.c.276r.reload.012.lra_create_live_ranges
 110 no-scevccp-outer-10.c.276r.reload.013.lra_inheritance
 879 no-scevccp-outer-10.c.276r.reload.014.lra_create_live_ranges
  22 no-scevccp-outer-10.c.276r.reload.015.lra_assign
  74 no-scevccp-outer-10.c.276r.reload.016.lra_undo_inheritance
  19 no-scevccp-outer-10.c.276r.reload.017.lra_constraints
 845 no-scevccp-outer-10.c.276r.reload.018.lra_create_live_ranges
  80 no-scevccp-outer-10.c.276r.reload.019.lra_remat
  27 no-scevccp-outer-10.c.276r.reload.020.lra_spill
 866 no-scevccp-outer-10.c.276r.reload.021.lra_constraints
 830 no-scevccp-outer-10.c.276r.reload.022.lra_create_live_ranges
   0 no-scevccp-outer-10.c.276r.reload.023.lra_inheritance
 830 no-scevccp-outer-10.c.276r.reload.024.lra_create_live_ranges
  53 no-scevccp-outer-10.c.276r.reload.025.lra_assign
   5 no-scevccp-outer-10.c.276r.reload.026.lra_constraints
 370 no-scevccp-outer-10.c.276r.reload.027.lra_finishing
    4137 no-scevccp-outer-10.c.276r.reload.028.lra_end
   0 no-scevccp-outer-10.c.276r.reload.029.lra_start
  27 no-scevccp-outer-10.c.276r.reload.030.remove_scratches
 553 no-scevccp-outer-10.c.276r.reload.031.lra_constraints
 188 no-scevccp-outer-10.c.276r.reload.032.lra_create_live_ranges
   8 no-scevccp-outer-10.c.276r.reload.033.lra_inheritance
 188 no-scevccp-outer-10.c.276r.reload.034.lra_create_live_ranges
  21 no-scevccp-outer-10.c.276r.reload.035.lra_assign
   3 no-scevccp-outer-10.c.276r.reload.036.lra_undo_inheritance
   5 no-scevccp-outer-10.c.276r.reload.037.lra_constraints
  99 no-scevccp-outer-10.c.276r.reload.038.lra_finishing
 515 no-scevccp-outer-10.c.276r.reload.039.lra_end
...

Notes:
- dump info from different functions is not put together
- this is on by default atm. We probably want to enable this only
  using a switch fsplit-dump or some such.
- the lra_end dump files ends with ";; Function ...", which should be in
  the next lra_start dump file. Once we enable this using a switch we
  can probably do better.

Any comments?

Personally for me the size of LRA dump file is not a problem.  What is 
inconvenient is that sometimes I need to see RTL after each subpass.  
Currently I do it by calling debug_rtx_range in gdb after the subpass in 
question.


 So this approach would be useful at least for me in some cases. But I 
think it should be not a default approach because in many other cases 
the current LRA dump file is enough.  Moreover I even prefer to have one 
file for IRA and LRA and frequently use -fira-verbose=1 which 
output both IRA and LRA dumps into stderr.




Re: patch to fix PR82353

2017-12-14 Thread Vladimir Makarov



On 12/13/2017 07:34 AM, Tom de Vries wrote:

On 10/16/2017 10:38 PM, Vladimir Makarov wrote:

This is another version of the patch to fix

    https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82353

The patch was successfully bootstrapped on x86-64 with Go and Ada.

Committed as rev. 253796.


Hi Vladimir,

AFAIU this bit of the patch makes sure that the flags register show up 
in the bb_livein of the bb in which it's used (and not defined before 
the use), but not in the bb_liveout of the predecessors of that bb.


I wonder if that's a compile-speed optimization, or an oversight.

Hi, Tom.  It was just a minimal fix.  I prefer minimal fixes for LRA 
because even for me it is hard to predict in many cases how the patch 
will affect all the targets.  Therefore many LRA patches have a few 
iterations before to be final.


I remember that I had some serious problems in the past when I tried to 
implement fixed hard reg liveness propagation in LRA.  It was long ago 
so we could try it again.  If you send patch you mentioned to gcc 
mailing list, I'll review and approve it.  But we need to be ready to 
revert it if some problems occur again.




[Ada] Fix misalignment in record with aliased field and rep clause

2017-12-14 Thread Eric Botcazou
This is a regression present on all actives branches: the compiler doesn't 
correctly align a field in a record type if there is a partial representation 
clause on this record type that does not cover the field, but covers another 
field which is aliased (or contains an aliased subcomponent) and whose nominal 
alignment is smaller than that of the former field.

Tested on x86_64-suse-linux, applied on the mainline, 7 & 6 branches.


2017-12-14  Eric Botcazou  

* gcc-interface/decl.c (gnat_to_gnu_field): Do not set the alignment
of the enclosing record type if it is not already set.


2017-12-14  Eric Botcazou  

* gnat.dg/alignment13.adb: New test.

-- 
Eric Botcazou-- { dg-do run }
-- { dg-options "-gnatws" }

procedure Alignment13 is

  type Rec is record
I1 : aliased Short_Integer;
I2 : Integer;
  end record;

  for Rec use record
I1 at 0 range 0 .. 15;
  end record;

  R : Rec;

begin
  if R.I2'Bit_Position /= 32 then
raise Program_Error;
  end if;
end;
Index: gcc-interface/decl.c
===
--- gcc-interface/decl.c	(revision 255631)
+++ gcc-interface/decl.c	(working copy)
@@ -6890,7 +6890,8 @@ gnat_to_gnu_field (Entity_Id gnat_field,
 	{
 	  const unsigned int type_align = TYPE_ALIGN (gnu_field_type);
 
-	  if (TYPE_ALIGN (gnu_record_type) < type_align)
+	  if (TYPE_ALIGN (gnu_record_type)
+	  && TYPE_ALIGN (gnu_record_type) < type_align)
 	SET_TYPE_ALIGN (gnu_record_type, type_align);
 
 	  /* If the position is not a multiple of the alignment of the type,


Re: [PATCH][arm] Add -mverbose-cost-dump and de-verbosify cost dumps

2017-12-14 Thread Sandra Loosemore

On 12/14/2017 08:53 AM, Kyrill Tkachov wrote:

Hi all,

This patch adds an -mverbose-cost-dump option, similar to the one in 
aarch64.
It makes the RTX cost dump print the RTX we're costing in the backend, 
as well as its cost.

This can be distracting in other cost-related RTL dumps like combine's.

So now we don't dump the backend information by default, but provide the 
power-user option -mverbose-cost-dump

to enable the old verbose dumping.

This option is for GCC developers debugging the compiler only, so no 
documentation are added.


No, it doesn't work that way.  There is a section in invoke.texi "GCC 
Developer Options" that is specifically for options that are only 
interesting to GCC developers debugging the compiler.  Please document 
this new option there.


-Sandra


[Ada] Fix compilation time explosion due to recursive inlining

2017-12-14 Thread Eric Botcazou
This clears DECL_DISREGARD_INLINE_LIMITS on recursive expression functions.

Tested on x86_64-suse-linux, applied on the mainline.


2017-12-14  Eric Botcazou  

* gcc-interface/trans.c (Call_to): Set DECL_DISREGARD_INLINE_LIMITS
to 0 on the callee if the call is recursive.

-- 
Eric BotcazouIndex: gcc-interface/trans.c
===
--- gcc-interface/trans.c	(revision 255631)
+++ gcc-interface/trans.c	(working copy)
@@ -4306,10 +4306,16 @@ Call_to_gnu (Node_Id gnat_node, tree *gn
   return call_expr;
 }
 
-  /* For a call to a nested function, check the inlining status.  */
-  if (TREE_CODE (gnu_subprog) == FUNCTION_DECL
-  && decl_function_context (gnu_subprog))
-check_inlining_for_nested_subprog (gnu_subprog);
+  if (TREE_CODE (gnu_subprog) == FUNCTION_DECL)
+{
+  /* For a call to a nested function, check the inlining status.  */
+  if (decl_function_context (gnu_subprog))
+	check_inlining_for_nested_subprog (gnu_subprog);
+
+  /* For a recursive call, avoid explosion due to recursive inlining.  */
+  if (gnu_subprog == current_function_decl)
+	DECL_DISREGARD_INLINE_LIMITS (gnu_subprog) = 0;
+}
 
   /* The only way we can be making a call via an access type is if Name is an
  explicit dereference.  In that case, get the list of formal args from the


Re: [PATCH] Fix PR83418

2017-12-14 Thread Richard Biener
On December 14, 2017 4:43:42 PM GMT+01:00, Jeff Law  wrote:
>On 12/14/2017 01:54 AM, Richard Biener wrote:
>> 
>> IVOPTs (at least) leaves unfolded stmts in the IL and VRP
>overzealously
>> asserts they cannot happen.
>> 
>> Bootstrap and regtest running on x86_64-unknown-linux-gnu.
>> 
>> Richard.
>> 
>> 2017-12-14  Richard Biener  
>> 
>>  PR tree-optimization/83418
>>  * vr-values.c
>(vr_values::extract_range_for_var_from_comparison_expr):
>>  Instead of asserting we don't get unfolded comparisons deal with
>>  them.
>> 
>>  * gcc.dg/torture/pr83418.c: New testcase.
>I think this also potentially affects dumping.  I've seen the dumper
>crash trying to access a INTEGER_CST where we expected to find an
>SSA_NAME while iterating over a statement's operands.
>
>I haven't submitted the workaround because I hadn't tracked down the
>root cause to verify something deeper isn't wrong.

Yes, I've seen this as well, see my comment in the PR. The issue is that DOM 
calls VRP analyze (and dump) routines with not up to date operands during 
optimize_stmt. 

Richard. 

>Jeff



[Ada] Minor tweak to default_pass_by_ref

2017-12-14 Thread Eric Botcazou
Tested on x86_64-suse-linux, applied on the mainline.


2017-12-14  Eric Botcazou  

* gcc-interface/misc.c (default_pass_by_ref): Minor tweak.

-- 
Eric BotcazouIndex: gcc-interface/misc.c
===
--- gcc-interface/misc.c	(revision 331642)
+++ gcc-interface/misc.c	(revision 331643)
@@ -1144,18 +1144,18 @@ default_pass_by_ref (tree gnu_type)
  is an In Out parameter, but it's probably best to err on the side of
  passing more things by reference.  */
 
-  if (pass_by_reference (NULL, TYPE_MODE (gnu_type), gnu_type, true))
-return true;
-
-  if (targetm.calls.return_in_memory (gnu_type, NULL_TREE))
-return true;
-
   if (AGGREGATE_TYPE_P (gnu_type)
   && (!valid_constant_size_p (TYPE_SIZE_UNIT (gnu_type))
 	  || 0 < compare_tree_int (TYPE_SIZE_UNIT (gnu_type),
    TYPE_ALIGN (gnu_type
 return true;
 
+  if (pass_by_reference (NULL, TYPE_MODE (gnu_type), gnu_type, true))
+return true;
+
+  if (targetm.calls.return_in_memory (gnu_type, NULL_TREE))
+return true;
+
   return false;
 }
 


[committed] v3: vrp_prop: Use dom_walker for -Warray-bounds (PR tree-optimization/83312)

2017-12-14 Thread David Malcolm
On Thu, 2017-12-14 at 11:53 +0100, Richard Biener wrote:
> On Wed, Dec 13, 2017 at 10:30 PM, David Malcolm 
> wrote:
> > On Wed, 2017-12-13 at 10:47 -0700, Jeff Law wrote:
> > > On 12/13/2017 09:24 AM, Richard Biener wrote:
> > > > > 
> > > > > Alternately we could to the dom_walker ctor that an initial
> > > > > state
> > > > > of
> > > > > EDGE_EXECUTABLE is already set.
> > > > 
> > > > I'm quite sure that wouldn't help for VRP.
> > > 
> > > Not sure why.  But it's not worth digging deep into.
> > > 
> > > I do think the current structure could still fail to pick up some
> > > secondary cases where blocks become unreachable as a result of
> > > both
> > > not
> > > needing to visit during the lattice propagation step and the
> > > substitution step.  But I'd expect this to be rare.
> > > 
> > > > I think David's approach is fine just we don't need any other
> > > > API
> > > > to get at a known executable outgoing edge. We can improve the
> > > > existing one or just add the trivial folding required.
> > > 
> > > I think Michael's suggestion to pass in NULL for the value and
> > > allow
> > > find_edge to try and determine the value makes the most sense
> > > here.
> > > 
> > > Jeff
> > 
> > Michael: thanks for the hint about find_taken_edge; I assumed that
> > such
> > a "find the relevant out-edge" function would already exist; but I
> > didn't find it (I'm relatively unfamiliar with this part of the
> > code).
> > 
> > Here's an updated version of the patch, which eliminates the stuff
> > I
> > added to gimple.h/gimple.c changes in favor of using
> > find_taken_edge (bb, NULL_TREE),
> > generalizing it to work with arbitrary bbs, so that the dom_walker
> > vfunc can simply use:
> >   return find_taken_edge (bb, NULL_TREE);
> > without having to check e.g. for there being a last stmt (ENTRY
> > and EXIT), or having to check that it is indeed a control statement
> > (is there a requirement at this point of the IR that we don't just
> > fall off the last statment through an out-edge?)
> > 
> > I handled var == NULL_TREE for GIMPLE_COND and GIMPLE_SWITCH,
> > but not for computed goto (find_taken_edge already handles that by
> > bailing out).
> > 
> > I also made some things "const" whilst I was touching it.
> > 
> > Successfully bootstrapped®rtested on x86_64-pc-linux-gnu.
> > 
> > OK for trunk?
> > 
> > gcc/ChangeLog:
> > PR tree-optimization/83312
> > * domwalk.h (dom_walker::dom_walker): Fix typo in comment.
> > * tree-cfg.c (find_taken_edge): Update to handle NULL_TREE
> > for
> > "val" param, and to cope with arbitrary basic blocks.
> > (find_taken_edge_cond_expr): Add "cond_stmt" param and use
> > it to
> > handle NULL_TREE for "val".
> > (find_taken_edge_switch_expr): Make "switch_stmt" param
> > const.
> > Handle NULL_TREE for "val".
> > (find_case_label_for_value): Make "switch_stmt" param
> > const.
> > * tree-vrp.c (class check_array_bounds_dom_walker): New
> > subclass
> > of dom_walker.
> > (vrp_prop::check_all_array_refs): Reimplement as...
> > (check_array_bounds_dom_walker::before_dom_children):
> > ...this new
> > vfunc.  Replace linear search through BB block list,
> > excluding
> > those with non-executable in-edges via dominator walk.
> > 
> > gcc/testsuite/ChangeLog:
> > PR tree-optimization/83312
> > * gcc.dg/pr83312.c: New test case.
> > ---
> >  gcc/domwalk.h  |  2 +-
> >  gcc/testsuite/gcc.dg/pr83312.c | 30 +
> >  gcc/tree-cfg.c | 59 +-
> > --
> >  gcc/tree-vrp.c | 76 ++--
> > --
> >  4 files changed, 117 insertions(+), 50 deletions(-)
> >  create mode 100644 gcc/testsuite/gcc.dg/pr83312.c
> > 
> > diff --git a/gcc/domwalk.h b/gcc/domwalk.h
> > index 6ac93eb..c7e3450 100644
> > --- a/gcc/domwalk.h
> > +++ b/gcc/domwalk.h
> > @@ -32,7 +32,7 @@ class dom_walker
> >  public:
> >static const edge STOP;
> > 
> > -  /* Use SKIP_UNREACHBLE_BLOCKS = true when your client can
> > discover
> > +  /* Use SKIP_UNREACHABLE_BLOCKS = true when your client can
> > discover
> >   that some edges are not executable.
> > 
> >   If a client can discover that a COND, SWITCH or GOTO has a
> > static
> > diff --git a/gcc/testsuite/gcc.dg/pr83312.c
> > b/gcc/testsuite/gcc.dg/pr83312.c
> > new file mode 100644
> > index 000..2eb241d
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.dg/pr83312.c
> > @@ -0,0 +1,30 @@
> > +/* { dg-options "-O2 -Warray-bounds" } */
> > +
> > +struct ptlrpcd_ctl {
> > +  char pc_name[20];
> > +};
> > +struct ptlrpcd {
> > +  struct ptlrpcd_ctl pd_threads[6];
> > +};
> > +struct ptlrpcd *ptlrpcd_init_pd;
> > +static void ptlrpcd_ctl_init(struct ptlrpcd_ctl *pc, int index) {
> > +  if (index < 0)
> > +__builtin_snprintf(pc->pc_name, sizeof(pc->pc_name),
> > "ptlrpcd_rcv");
> > +  else
> > +__bu

[PATCH] PR libstdc++/83427 detect weak result type from noexcept functions

2017-12-14 Thread Jonathan Wakely

Richard Smith pointed out that our "weak result type" implementation
(used by std::reference_wrapper and std::bind) doesn't work for
noexcept functions.

The simple fix is to adjust every partial specialization to include
noexcept deduction (which I'll do for gcc-7-branch), but I took the
opportunity to clean things up a bit on trunk. We already have the
required logic in _Mem_fn_traits, so I replaced all the partial
specializations for member function pointers with new helpers that use
_Mem_fn_traits. That adds support for ref-qualifiers as well, which
was missing.

I've also removed a number of useless partial specializations that can
never match anything. We don't need to support cv-qualified functions
types (so-called abominable function types) in std::bind, because you
can never call it with an argument of such a type, so I removed all
the _Weak_result_type_impl partial specializations. We
do need to support those types in std::reference_wrapper, but that
seems to be an obvious defect, so I've reported an issue to fix the
standard.  In the meanwhile I'm not adding noexcept deduction to those
specializations, because I want them to go away, not get fixed.
Finally, we also don't need _Weak_result_type_impl because
std::bind decays its functor, and reference_wrapper doesn't need a
weak result type for references to functions (and my defect report
might make them undefined anyway).

PR libstdc++/83427
* include/bits/refwrap.h (_Maybe_unary_or_binary_function): Move here
from .
(_Mem_fn_traits_base, _Mem_fn_traits): Move here, from .
(_Weak_result_type_impl, _Reference_wrapper_base): Deduce noexcept
for function types. Remove partial specializations for member
functions.
(_Weak_result_type_impl): Remove unused partial specializations for
non-referenceable function types and for references to functions.
(_Weak_result_type_memfun, _Reference_wrapper_base_memfun): New
helpers to handle member functions via _Mem_fn_traits.
(_Weak_result_type, reference_wrapper): Derive from new helpers.
* include/bits/std_function.h (_Maybe_unary_or_binary_function): Move
to .
* include/std/functional (_Pack, _AllConvertible, _NotSame): Remove.
(_Mem_fn_traits_base, _Mem_fn_traits): Move to .
* testsuite/20_util/bind/83427.cc: New test.
* testsuite/20_util/bind/refqual.cc: Add noexcept to functions and
check for weak result types.
* testsuite/20_util/reference_wrapper/83427.cc: New test.
   
Tested powerpc64le-linux, committed to trunk.


commit 0088fed20e8dadd79606876d896314e1dab19945
Author: Jonathan Wakely 
Date:   Thu Dec 14 16:07:15 2017 +

PR libstdc++/83427 detect weak result type from noexcept functions

PR libstdc++/83427
* include/bits/refwrap.h (_Maybe_unary_or_binary_function): Move 
here
from .
(_Mem_fn_traits_base, _Mem_fn_traits): Move here, from .
(_Weak_result_type_impl, _Reference_wrapper_base): Deduce noexcept
for function types. Remove partial specializations for member
functions.
(_Weak_result_type_impl): Remove unused partial specializations for
non-referenceable function types and for references to functions.
(_Weak_result_type_memfun, _Reference_wrapper_base_memfun): New
helpers to handle member functions via _Mem_fn_traits.
(_Weak_result_type, reference_wrapper): Derive from new helpers.
* include/bits/std_function.h (_Maybe_unary_or_binary_function): 
Move
to .
* include/std/functional (_Pack, _AllConvertible, _NotSame): Remove.
(_Mem_fn_traits_base, _Mem_fn_traits): Move to .
* testsuite/20_util/bind/83427.cc: New test.
* testsuite/20_util/bind/refqual.cc: Add noexcept to functions and
check for weak result types.
* testsuite/20_util/reference_wrapper/83427.cc: New test.

diff --git a/libstdc++-v3/include/bits/refwrap.h 
b/libstdc++-v3/include/bits/refwrap.h
index 5e5b61060e3..1b64cd8fbf5 100644
--- a/libstdc++-v3/include/bits/refwrap.h
+++ b/libstdc++-v3/include/bits/refwrap.h
@@ -44,6 +44,69 @@ namespace std _GLIBCXX_VISIBILITY(default)
 {
 _GLIBCXX_BEGIN_NAMESPACE_VERSION
 
+  /**
+   * Derives from @c unary_function or @c binary_function, or perhaps
+   * nothing, depending on the number of arguments provided. The
+   * primary template is the basis case, which derives nothing.
+   */
+  template
+struct _Maybe_unary_or_binary_function { };
+
+  /// Derives from @c unary_function, as appropriate.
+  template
+struct _Maybe_unary_or_binary_function<_Res, _T1>
+: std::unary_function<_T1, _Res> { };
+
+  /// Derives from @c binary_function, as appropriate.
+  template
+struct _Maybe_unary_or_binary_function<_Res, _T1, _T2>
+: std::binary_function<_T1, _T2, _Res> { };
+
+

Re: [PATCH] PR libstdc++/83427 detect weak result type from noexcept functions

2017-12-14 Thread Jonathan Wakely

On 14/12/17 17:29 +, Jonathan Wakely wrote:

Richard Smith pointed out that our "weak result type" implementation
(used by std::reference_wrapper and std::bind) doesn't work for
noexcept functions.

The simple fix is to adjust every partial specialization to include
noexcept deduction (which I'll do for gcc-7-branch), [...]


Here's the patch for the branch, which keeps all the partial
specializations for member functions, adding the deduction to them.

This doesn't remove the unused partial specializations, but also
doesn't "fix" them to detect noexcept.

Tested x86_64-linux, committed to trunk.



commit d12e6a7a3c605b95ee526bb6d11dbc286663492f
Author: Jonathan Wakely 
Date:   Thu Dec 14 16:47:47 2017 +

PR libstdc++/83427 detect weak result type from noexcept functions

PR libstdc++/83427
* include/bits/refwrap.h (_Weak_result_type_impl)
(_Reference_wrapper_base): Deduce noexcept for function types.
* testsuite/20_util/bind/83427.cc: New test.
* testsuite/20_util/reference_wrapper/83427.cc: New test.

diff --git a/libstdc++-v3/include/bits/refwrap.h b/libstdc++-v3/include/bits/refwrap.h
index 124ee97bd2a..86260dac993 100644
--- a/libstdc++-v3/include/bits/refwrap.h
+++ b/libstdc++-v3/include/bits/refwrap.h
@@ -64,12 +64,12 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 { };
 
   /// Retrieve the result type for a function type.
-  template
-struct _Weak_result_type_impl<_Res(_ArgTypes...)>
+  template
+struct _Weak_result_type_impl<_Res(_ArgTypes...) _GLIBCXX_NOEXCEPT_QUAL>
 { typedef _Res result_type; };
 
-  template
-struct _Weak_result_type_impl<_Res(_ArgTypes..)>
+  template
+struct _Weak_result_type_impl<_Res(_ArgTypes..) _GLIBCXX_NOEXCEPT_QUAL>
 { typedef _Res result_type; };
 
   template
@@ -106,50 +106,65 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 { typedef _Res result_type; };
 
   /// Retrieve the result type for a function pointer.
-  template
-struct _Weak_result_type_impl<_Res(*)(_ArgTypes...)>
+  template
+struct _Weak_result_type_impl<_Res(*)(_ArgTypes...) _GLIBCXX_NOEXCEPT_QUAL>
 { typedef _Res result_type; };
 
-  template
-struct _Weak_result_type_impl<_Res(*)(_ArgTypes..)>
+  template
+struct _Weak_result_type_impl<_Res(*)(_ArgTypes..)
+  _GLIBCXX_NOEXCEPT_QUAL>
 { typedef _Res result_type; };
 
   /// Retrieve result type for a member function pointer.
-  template
-struct _Weak_result_type_impl<_Res (_Class::*)(_ArgTypes...)>
+  template
+struct _Weak_result_type_impl<_Res (_Class::*)(_ArgTypes...)
+  _GLIBCXX_NOEXCEPT_QUAL>
 { typedef _Res result_type; };
 
-  template
-struct _Weak_result_type_impl<_Res (_Class::*)(_ArgTypes..)>
+  template
+struct _Weak_result_type_impl<_Res (_Class::*)(_ArgTypes..)
+  _GLIBCXX_NOEXCEPT_QUAL>
 { typedef _Res result_type; };
 
   /// Retrieve result type for a const member function pointer.
-  template
-struct _Weak_result_type_impl<_Res (_Class::*)(_ArgTypes...) const>
+  template
+struct _Weak_result_type_impl<_Res (_Class::*)(_ArgTypes...) const
+  _GLIBCXX_NOEXCEPT_QUAL>
 { typedef _Res result_type; };
 
-  template
-struct _Weak_result_type_impl<_Res (_Class::*)(_ArgTypes..) const>
+  template
+struct _Weak_result_type_impl<_Res (_Class::*)(_ArgTypes..) const
+  _GLIBCXX_NOEXCEPT_QUAL>
 { typedef _Res result_type; };
 
   /// Retrieve result type for a volatile member function pointer.
-  template
-struct _Weak_result_type_impl<_Res (_Class::*)(_ArgTypes...) volatile>
+  template
+struct _Weak_result_type_impl<_Res (_Class::*)(_ArgTypes...) volatile
+  _GLIBCXX_NOEXCEPT_QUAL>
 { typedef _Res result_type; };
 
-  template
-struct _Weak_result_type_impl<_Res (_Class::*)(_ArgTypes..) volatile>
+  template
+struct _Weak_result_type_impl<_Res (_Class::*)(_ArgTypes..) volatile
+  _GLIBCXX_NOEXCEPT_QUAL>
 { typedef _Res result_type; };
 
   /// Retrieve result type for a const volatile member function pointer.
-  template
+  template
 struct _Weak_result_type_impl<_Res (_Class::*)(_ArgTypes...)
-  const volatile>
+  const volatile _GLIBCXX_NOEXCEPT_QUAL>
 { typedef _Res result_type; };
 
-  template
+  template
 struct _Weak_result_type_impl<_Res (_Class::*)(_ArgTypes..)
-  const volatile>
+  const volatile _GLIBCXX_NOEXCEPT_QUAL>
 { typedef _Res result_type; };
 
   /**
@@ -201,8 +216,8 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 { };
 
   // - a function type (unary)
-  template
-struct _Reference_wrapper_base<_Res(_T1)>
+  template
+struct _Reference_wrapper_base<_Res(_T1) _GLIBCXX_NOEXCEPT_QUAL>
 : unary_function<_T1, _Res>
 { };
 
@@ -222,8 +237,8 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 { };
 
   // - a function type (binary)
-  template
-struct _Reference_wrapper_base<_Res(_T1, _T2)>
+  template
+struct _Reference_wrapper_base<_Res(_

Re: [PATCH][ARM][gcc-7] Fix regression on soft float targets for armv8_2-fp16-move-2.c

2017-12-14 Thread Christophe Lyon
On 14 December 2017 at 17:05, Sudakshina Das  wrote:
> Hi
>
> This patch is a follow up on my previous patch with r255536 that was a
> back-port for fixing a wrong code generation
> (https://gcc.gnu.org/ml/gcc-patches/2017-11/msg02209.html).
> As pointed out by Christophe Lyon
> (https://gcc.gnu.org/ml/gcc-patches/2017-12/msg00718.html) the test case
> started to fail on the new dejagnu for arm-none-linux-gnueabi and
> arm-none-eabi.
> This patch just removes the dg-add-options from the test case because I
> think dg-options has all that is needed anyway.
>
> Testing: Since I could not reproduce the failure on my machine, Christophe
> would it be possible for you to check if this patch fixes the
> regression for you?
>

Manually  tested on one of the offending configs, it did the trick.
Thanks

Christophe

> Thanks
> Sudi


Re: [Patch][Aarch64] Fix aarch64 libatomic build with older binutils

2017-12-14 Thread James Greenhalgh
On Thu, Dec 07, 2017 at 11:56:55PM +, Steve Ellcey wrote:
> James,
> 
> Here is a patch that will turn off the use of IFUNC and the LSE
> instructions in libatomic if the compiler/assembler toolchain do not
> understand the '-march=armv8-a+lse' option (changed from
> -march=armv8.1-a).  Rather than check the assembler directly, I used
> the existing ACX_PROG_CC_WARNING_OPTS macro to test this.  This will
> cause the GCC being built to send the option in question to the
> assembler and if the assembler complains that is enough to cause us to
> not set enable_aarch64_lse, and thus not set try_ifunc.

Thanks for the fix.

OK,

James

> 
> Steve Ellcey
> sell...@cavium.com
> 
> 
> 2017-12-07  Steve Ellcey  
> 
>   * Makefile.am (IFUNC_OPTIONS): Change aarch64
>   option from -march=armv8.1-a to -march=armv8-a+lse.
>   * configure.ac (*aarch64*): Check to see if
>   compiler understands -march=armv8-a+lse option.
>   * configure.tgt (*aarch64*): Only set try_ifunc
>   if compiler understands -march=armv8-a+lse option.
>   * Makefile.in: Regenerate.
>   * configure: Regenerate.
>   * aclocal.m4: Regenerate.
> 



Re: [PATCH][arm] Add -mverbose-cost-dump and de-verbosify cost dumps

2017-12-14 Thread Kyrill Tkachov

Hi Sandra,

On 14/12/17 17:09, Sandra Loosemore wrote:

On 12/14/2017 08:53 AM, Kyrill Tkachov wrote:

Hi all,

This patch adds an -mverbose-cost-dump option, similar to the one in 
aarch64.
It makes the RTX cost dump print the RTX we're costing in the 
backend, as well as its cost.

This can be distracting in other cost-related RTL dumps like combine's.

So now we don't dump the backend information by default, but provide 
the power-user option -mverbose-cost-dump

to enable the old verbose dumping.

This option is for GCC developers debugging the compiler only, so no 
documentation are added.


No, it doesn't work that way.  There is a section in invoke.texi "GCC 
Developer Options" that is specifically for options that are only 
interesting to GCC developers debugging the compiler.  Please document 
this new option there.


-Sandra


Thanks for pointing me to this. I had assumed that we only want
to document target-independent options as I didn't see any other
target-dependent options in that section, but you are right.

There is no reason from a readers perspective not to document the
target-specific ones as well.

This patch adds a target-specific developer options subsection and adds
a short entry for the aarch64 and arm ones that I'm aware of.

make pdf works fine for me and the resulting pdf looks reasonable to my 
eyes.


That said, I'm not very familiar with the texi directives and I'm not 
sure what
the argument to the @table directive means. I've used @gcctabopt as I've 
seen it used
in other @table directives. Please let me know if I should use something 
else.


Ok for trunk?

Thanks,
Kyrill

2017-12-14  Kyrylo Tkachov  

* doc/invoke.texi (GCC Developer options): Add Target-specific
developer options subsection.  Populate it with AArch64 and ARM
options.
commit 34622cad273e03df44d2aaabc5e4bb664af50862
Author: Kyrylo Tkachov 
Date:   Thu Dec 14 17:44:22 2017 +

[doc] Add target-specific developer options subsection

diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index 0d565b4..00de74c 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -14223,6 +14223,24 @@ Print the compiler's built-in specs---and don't do anything else.  (This
 is used when GCC itself is being built.)  @xref{Spec Files}.
 @end table
 
+@subsection Target-specific developer options
+Some targets define additional debug options that are documented here.
+@subsubsection AArch64 developer options
+@table @gcctabopt
+@item -mverbose-cost-dump
+Enable verbose cost model dumping in the debug dump files.
+@item -moverride=
+Override CPU optimization parameters.
+@end table
+
+@subsubsection ARM developer options
+@table @gcctabopt
+@item -mverbose-cost-dump
+Enable verbose cost model dumping in the debug dump files.
+@item -mflip-thumb
+Switch ARM/Thumb modes on alternating functions.
+@end table
+
 @node Submodel Options
 @section Machine-Dependent Options
 @cindex submodel options


Re: [SFN] Bootstrap broken

2017-12-14 Thread Alexandre Oliva
On Dec 13, 2017, Alexandre Oliva  wrote:

> On Dec 12, 2017, David Edelsohn  wrote:
>> Rainer,
>> PR83396 opened.  you can add Solaris to the list of targets.

> Andreas,

> Here's a fix for the ia64 regression you mentioned in that PR.

And here's a patch that fixes the two other ia64 build failures you'd
mentioned.

Regstrapped on x86_64- and i686-linux-gnu by myself; bootstrapped on
ia64-linux-gnu by yourself IIUC, though with some weird bootstrap
compare errors I'm very curious to learn more about.

Ok to install?

Emitting markers before labels turned out to not be worth the trouble.
The markers outside BBs confuse the ebb scheduler, and they don't add
any useful information.  I'll arrange for markers to be moved past
labels, even in gimple, but for now this will fix the two remaining
known problems on ia64.

for  gcc/ChangeLog

PR bootstrap/83396
* cfgexpand.c (expand_gimple_basic_block): Expand label first,
even if there are markers before it.
* cfgrtl.c (rtl_verify_bb_layout): Reject DEBUG_INSNs outside BBs.
---
 gcc/cfgexpand.c |   12 
 gcc/cfgrtl.c|1 -
 2 files changed, 4 insertions(+), 9 deletions(-)

diff --git a/gcc/cfgexpand.c b/gcc/cfgexpand.c
index ce98264214ae..30d9bac1118e 100644
--- a/gcc/cfgexpand.c
+++ b/gcc/cfgexpand.c
@@ -5510,20 +5510,16 @@ expand_gimple_basic_block (basic_block bb, bool 
disable_tail_calls)
   gimple *label_stmt = stmt;
   rtx_code_label **elt = lab_rtx_for_bb->get (bb);
 
-  if (stmt)
-/* We'll get to it in the loop below, and get back to
-   emit_label_and_note then.  */
-;
-  else if (stmt || elt)
+  if (stmt || elt)
 {
-emit_label_and_note:
   gcc_checking_assert (!note);
   last = get_last_insn ();
 
   if (stmt)
{
  expand_gimple_stmt (stmt);
- gsi_next (&gsi);
+ if (gsi_stmt (gsi) == stmt)
+   gsi_next (&gsi);
}
 
   if (elt)
@@ -5550,7 +5546,7 @@ expand_gimple_basic_block (basic_block bb, bool 
disable_tail_calls)
   stmt = gsi_stmt (gsi);
 
   if (stmt == label_stmt)
-   goto emit_label_and_note;
+   continue;
 
   /* If this statement is a non-debug one, and we generate debug
 insns, then this one might be the last real use of a TERed
diff --git a/gcc/cfgrtl.c b/gcc/cfgrtl.c
index b127ea1a0b38..bc1e3ee7ece8 100644
--- a/gcc/cfgrtl.c
+++ b/gcc/cfgrtl.c
@@ -2954,7 +2954,6 @@ rtl_verify_bb_layout (void)
{
case BARRIER:
case NOTE:
-   case DEBUG_INSN:
  break;
 
case CODE_LABEL:


-- 
Alexandre Oliva, freedom fighterhttp://FSFLA.org/~lxoliva/
You must be the change you wish to see in the world. -- Gandhi
Be Free! -- http://FSFLA.org/   FSF Latin America board member
Free Software Evangelist|Red Hat Brasil GNU Toolchain Engineer


Re: [SFN] Bootstrap broken

2017-12-14 Thread Jakub Jelinek
On Thu, Dec 14, 2017 at 04:08:45PM -0200, Alexandre Oliva wrote:
> On Dec 13, 2017, Alexandre Oliva  wrote:
> 
> > On Dec 12, 2017, David Edelsohn  wrote:
> >> Rainer,
> >> PR83396 opened.  you can add Solaris to the list of targets.
> 
> > Andreas,
> 
> > Here's a fix for the ia64 regression you mentioned in that PR.
> 
> And here's a patch that fixes the two other ia64 build failures you'd
> mentioned.
> 
> Regstrapped on x86_64- and i686-linux-gnu by myself; bootstrapped on
> ia64-linux-gnu by yourself IIUC, though with some weird bootstrap
> compare errors I'm very curious to learn more about.
> 
> Ok to install?
> 
> Emitting markers before labels turned out to not be worth the trouble.
> The markers outside BBs confuse the ebb scheduler, and they don't add
> any useful information.  I'll arrange for markers to be moved past
> labels, even in gimple, but for now this will fix the two remaining
> known problems on ia64.
> 
> for  gcc/ChangeLog
> 
>   PR bootstrap/83396
>   * cfgexpand.c (expand_gimple_basic_block): Expand label first,
>   even if there are markers before it.
>   * cfgrtl.c (rtl_verify_bb_layout): Reject DEBUG_INSNs outside BBs.

Ok, but please work on reversion of everything that has been changed
in order to support those (on RTL e.g. the var-tracking.c
  for (insn = get_first_insn (bb);
   insn != BB_HEAD (bb->next_bb)
 ? next = NEXT_INSN (insn), true : false;
   insn = next)
and
  rtx_insn *next;
  bool outside_bb = true;
  for (insn = get_first_insn (bb); insn != BB_HEAD (bb->next_bb);
   insn = next)
{
  if (insn == BB_HEAD (bb))
outside_bb = false;
  else if (insn == NEXT_INSN (BB_END (bb)))
outside_bb = true;
...
  if (outside_bb)
{
  /* Ignore non-debug insns outside of basic blocks.  */
  if (!DEBUG_INSN_P (insn))
continue;
  /* Debug binds shouldn't appear outside of bbs.  */
  gcc_assert (!DEBUG_BIND_INSN_P (insn));
}
and the NULL BLOCK_FOR_INSN stuff, on GIMPLE the gsi_after_labels changes,
the tree-cfg.c verification needs to be changed to disallow even the
markers before labels, ...).  That can be done incrementally, it is better
to unbreak the tree ASAP.

Jakub


Re: [PATCH][ARM][gcc-7] Fix regression on soft float targets for armv8_2-fp16-move-2.c

2017-12-14 Thread Sudakshina Das

Hi

On 14/12/17 17:37, Christophe Lyon wrote:

On 14 December 2017 at 17:05, Sudakshina Das  wrote:

Hi

This patch is a follow up on my previous patch with r255536 that was a
back-port for fixing a wrong code generation
(https://gcc.gnu.org/ml/gcc-patches/2017-11/msg02209.html).
As pointed out by Christophe Lyon
(https://gcc.gnu.org/ml/gcc-patches/2017-12/msg00718.html) the test case
started to fail on the new dejagnu for arm-none-linux-gnueabi and
arm-none-eabi.
This patch just removes the dg-add-options from the test case because I
think dg-options has all that is needed anyway.

Testing: Since I could not reproduce the failure on my machine, Christophe
would it be possible for you to check if this patch fixes the
regression for you?



Manually  tested on one of the offending configs, it did the trick.
Thanks



Thank you so much. I will wait for an OK and commit it!

Sudi


Christophe


Thanks
Sudi




Re: [PATCH][arm] Add -mverbose-cost-dump and de-verbosify cost dumps

2017-12-14 Thread Sandra Loosemore

On 12/14/2017 10:56 AM, Kyrill Tkachov wrote:


2017-12-14  Kyrylo Tkachov  

     * doc/invoke.texi (GCC Developer options): Add Target-specific
     developer options subsection.  Populate it with AArch64 and ARM
     options.

>

diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index 0d565b4..00de74c 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -14223,6 +14223,24 @@ Print the compiler's built-in specs---and don't do 
anything else.  (This
 is used when GCC itself is being built.)  @xref{Spec Files}.
 @end table
 
+@subsection Target-specific developer options


Throughout this patch, please capitalize the first word of the section 
titles, and add @node and @cindex for each section.  There are plenty of 
examples in this chapter to copy the style from.



+Some targets define additional debug options that are documented here.
+@subsubsection AArch64 developer options
+@table @gcctabopt
+@item -mverbose-cost-dump
+Enable verbose cost model dumping in the debug dump files.
+@item -moverride=
+Override CPU optimization parameters.


What parameters are these?  What is the syntax?  This isn't enough 
information to be useful.



+@end table
+
+@subsubsection ARM developer options
+@table @gcctabopt
+@item -mverbose-cost-dump
+Enable verbose cost model dumping in the debug dump files.
+@item -mflip-thumb
+Switch ARM/Thumb modes on alternating functions.
+@end table
+
 @node Submodel Options
 @section Machine-Dependent Options
 @cindex submodel options


Please also list the new options in the "Option Summary" section.  Note 
that the list for developer options is currently mostly alphabetized.


-Sandra


Re: [SFN+LVU+IEPM v4 1/9] [SFN] adjust RTL insn-walking API

2017-12-14 Thread Alexandre Oliva
On Dec 14, 2017, Jakub Jelinek  wrote:

> On Thu, Dec 14, 2017 at 09:55:30AM -0200, Alexandre Oliva wrote:
>> for  gcc/ChangeLog

> Please fix up formatting.  Otherwise LGTM.

Thanks, here's what I ended up installing (formatting fixes were
slightly different from what you suggested, but to the same effect)

[SFN] next/prev_nonnote_insn_bb are no more, even for ports

The patch that added _nondebug to next_ and prev_nonnote_insn_bb
failed to find and adjust uses within config.  Fixed.

for  gcc/ChangeLog

PR bootstrap/83396
* config/arc/arc.c (hwloop_optimize): Skip debug insns.
* config/sh/sh-protos.h (sh_find_set_of_reg): Adjust.
* config/sh/sh.c: Skip debug insns besides notes.
* config/sh/sh.md: Likewise.
* config/sh/sh_treg_combine.cc: Likewise.
* config/sh/sync.md: Likewise.
---
 gcc/config/arc/arc.c |2 +-
 gcc/config/sh/sh-protos.h|8 
 gcc/config/sh/sh.c   |   19 ++-
 gcc/config/sh/sh.md  |   20 +++-
 gcc/config/sh/sh_treg_combine.cc |   16 
 gcc/config/sh/sync.md|2 +-
 6 files changed, 35 insertions(+), 32 deletions(-)

diff --git a/gcc/config/arc/arc.c b/gcc/config/arc/arc.c
index b8eec10086dd..9974a1f999b5 100644
--- a/gcc/config/arc/arc.c
+++ b/gcc/config/arc/arc.c
@@ -7499,7 +7499,7 @@ hwloop_optimize (hwloop_info loop)
  && NOTE_KIND (entry_after) != NOTE_INSN_CALL_ARG_LOCATION))
 entry_after = NEXT_INSN (entry_after);
 #endif
-  entry_after = next_nonnote_insn_bb (entry_after);
+  entry_after = next_nonnote_nondebug_insn_bb (entry_after);
 
   gcc_assert (entry_after);
   emit_insn_before (seq, entry_after);
diff --git a/gcc/config/sh/sh-protos.h b/gcc/config/sh/sh-protos.h
index e98030d31bd7..938487501f30 100644
--- a/gcc/config/sh/sh-protos.h
+++ b/gcc/config/sh/sh-protos.h
@@ -120,10 +120,10 @@ struct set_of_reg
   rtx set_src;
 };
 
-/* Given a reg rtx and a start insn, try to find the insn that sets the
-   specified reg by using the specified insn stepping function, such as 
-   'prev_nonnote_insn_bb'.  When the insn is found, try to extract the rtx
-   of the reg set.  */
+/* Given a reg rtx and a start insn, try to find the insn that sets
+   the specified reg by using the specified insn stepping function,
+   such as 'prev_nonnote_nondebug_insn_bb'.  When the insn is found,
+   try to extract the rtx of the reg set.  */
 template  inline set_of_reg
 sh_find_set_of_reg (rtx reg, rtx_insn* insn, F stepfunc,
bool ignore_reg_reg_copies = false)
diff --git a/gcc/config/sh/sh.c b/gcc/config/sh/sh.c
index 0d7d7bc53ca2..85cc77b873a5 100644
--- a/gcc/config/sh/sh.c
+++ b/gcc/config/sh/sh.c
@@ -11896,8 +11896,8 @@ sh_is_logical_t_store_expr (rtx op, rtx_insn* insn)
 
   else
{
- set_of_reg op_set = sh_find_set_of_reg (ops[i], insn,
- prev_nonnote_insn_bb);
+ set_of_reg op_set = sh_find_set_of_reg
+   (ops[i], insn, prev_nonnote_nondebug_insn_bb);
  if (op_set.set_src == NULL_RTX)
continue;
 
@@ -11929,7 +11929,8 @@ sh_try_omit_signzero_extend (rtx extended_op, rtx_insn* 
insn)
   if (GET_MODE (extended_op) != SImode)
 return NULL_RTX;
 
-  set_of_reg s = sh_find_set_of_reg (extended_op, insn, prev_nonnote_insn_bb);
+  set_of_reg s = sh_find_set_of_reg (extended_op, insn,
+prev_nonnote_nondebug_insn_bb);
   if (s.set_src == NULL_RTX)
 return NULL_RTX;
 
@@ -11965,10 +11966,10 @@ sh_split_movrt_negc_to_movt_xor (rtx_insn* curr_insn, 
rtx operands[])
   if (!can_create_pseudo_p ())
 return false;
 
-  set_of_reg t_before_negc = sh_find_set_of_reg (get_t_reg_rtx (), curr_insn,
-prev_nonnote_insn_bb);
-  set_of_reg t_after_negc = sh_find_set_of_reg (get_t_reg_rtx (), curr_insn,
-   next_nonnote_insn_bb);
+  set_of_reg t_before_negc = sh_find_set_of_reg
+(get_t_reg_rtx (), curr_insn, prev_nonnote_nondebug_insn_bb);
+  set_of_reg t_after_negc = sh_find_set_of_reg
+(get_t_reg_rtx (), curr_insn, next_nonnote_nondebug_insn_bb);
 
   if (t_before_negc.set_rtx != NULL_RTX && t_after_negc.set_rtx != NULL_RTX
   && rtx_equal_p (t_before_negc.set_rtx, t_after_negc.set_rtx)
@@ -12009,8 +12010,8 @@ sh_find_extending_set_of_reg (rtx reg, rtx_insn* 
curr_insn)
  Also try to look through the first extension that we hit.  There are some
  cases, where a zero_extend is followed an (implicit) sign_extend, and it
  fails to see the sign_extend.  */
-  sh_extending_set_of_reg result =
-   sh_find_set_of_reg (reg, curr_insn, prev_nonnote_insn_bb, true);
+  sh_extending_set_of_reg result = sh_find_set_of_reg
+(reg, curr_insn, prev_nonnote_nondebug_insn_bb, true);
 
   if (result.set_src != NULL)
 {
diff --git 

Re: [PATCH][ARM][gcc-7] Fix regression on soft float targets for armv8_2-fp16-move-2.c

2017-12-14 Thread Kyrill Tkachov


On 14/12/17 18:17, Sudi Das wrote:

Hi

On 14/12/17 17:37, Christophe Lyon wrote:
> On 14 December 2017 at 17:05, Sudakshina Das  wrote:
>> Hi
>>
>> This patch is a follow up on my previous patch with r255536 that was a
>> back-port for fixing a wrong code generation
>> (https://gcc.gnu.org/ml/gcc-patches/2017-11/msg02209.html).
>> As pointed out by Christophe Lyon
>> (https://gcc.gnu.org/ml/gcc-patches/2017-12/msg00718.html) the test 
case

>> started to fail on the new dejagnu for arm-none-linux-gnueabi and
>> arm-none-eabi.
>> This patch just removes the dg-add-options from the test case because I
>> think dg-options has all that is needed anyway.
>>
>> Testing: Since I could not reproduce the failure on my machine, 
Christophe

>> would it be possible for you to check if this patch fixes the
>> regression for you?
>>
>
> Manually  tested on one of the offending configs, it did the trick.
> Thanks
>

Thank you so much. I will wait for an OK and commit it!



Thanks Sudi and Christophe.
The patch is ok with an appropriate ChangeLog entry.

Kyrill


Sudi

> Christophe
>
>> Thanks
>> Sudi





Re: [PATCHv3] Add a warning for invalid function casts

2017-12-14 Thread Bernd Edlinger
On 12/07/17 21:48, Bernd Edlinger wrote:
> On 12/06/17 23:35, Jason Merrill wrote:
>> On Fri, Dec 1, 2017 at 7:42 AM, Bernd Edlinger
>>  wrote:
>>> this version of the patch improves the heuristic check to take the
>>> target hook into account, to handle cases correctly when both or only
>>> one parameter is _not_ promoted to int.
>>
>> In looking at this, I discovered that the argument to
>> promote_prototypes should be the function type, not the parameter
>> type; the existing uses in the C++ front end were wrong.
>>
> 
> Bah, sorry.
> 
> Yes, it looks like there are at least two more target hooks that change
> the type promotion rules later-on: targetm.calls.promote_function_mode
> and PROMOTE_MODE.
> 
> In the ada FE we pass NULL_TREE to promote_prototypes which seems to
> mean if the target wants type promotion in principle.  So it returns
> true if some function may promote, and false if no function promotes.
> At least this is how the sh-target handles this parameter.
> This is BTW the only target that uses the argument of this callback.
> 
> So I would think for the purpose of this warning the following check
> should be sufficient:
> 
>    if (INTEGRAL_TYPE_P (t1)
>    && INTEGRAL_TYPE_P (t2)
>    && TYPE_PRECISION (t1) == TYPE_PRECISION (t2)
>    && (TYPE_UNSIGNED (t1) == TYPE_UNSIGNED (t2)
>    || !targetm.calls.promote_prototypes (NULL_TREE)
>    || TYPE_PRECISION (t1) >= TYPE_PRECISION (integer_type_node)))
> 
> 
> What do you think?
> Is the patch OK with this change?
> 

So, is the simplified heuristic using only promote_prototypes (NULL)
OK?


Thanks
Bernd.


Re: [compare-debug] use call loc for nop_endbr

2017-12-14 Thread Alexandre Oliva
On Dec 14, 2017, "Tsimbalist, Igor V"  wrote:

>> Regstrapping with -fcompare-debug on stage3 host and target builds on
>> x86_64- and i686-linux-gnu; ok to install?

> Ok from me.

Thanks, I went ahead and installed it.

> Am I correct the error you had was related to improper location information,

Yeah, only location information.

> I will try to skip NOTE insns only.

You probably want to skip debug insns and notes, too.  Actually, IIRC
you insert these insns after var-tracking, so you probably only have to
deal with notes.  You don't have to, but if bindings are intended to
take effect right after the call, it would probably be nice if they
still did so, e.g., even if you happen to single-step out of the call
and stop at the nop_endbr insn.


BTW, is this the subject of a Cauldron 2017 talk in which I raised an
issue about PLT entries possibly needing special opcodes to enable them
to be used as call targets or somesuch?  I had initially retracted my
question, when it was stated that only indirect calls needed special
treatment, but later I realized that in some cases PLT entries *are*
used as function addresses even for functions that have their addresses
taken.  Please let me know if you're familiar with the issue and would
like me to detail the problem.

-- 
Alexandre Oliva, freedom fighterhttp://FSFLA.org/~lxoliva/
You must be the change you wish to see in the world. -- Gandhi
Be Free! -- http://FSFLA.org/   FSF Latin America board member
Free Software Evangelist|Red Hat Brasil GNU Toolchain Engineer


Re: [PR59319] output friends in debug info

2017-12-14 Thread Jason Merrill

On 12/07/2017 04:04 PM, Alexandre Oliva wrote:

On Apr  7, 2017, Alexandre Oliva  wrote:


On Mar 21, 2017, Alexandre Oliva  wrote:

On Jan 27, 2017, Alexandre Oliva  wrote:

On Oct 19, 2016, Alexandre Oliva  wrote:

On Sep 23, 2016, Alexandre Oliva  wrote:

On Aug 30, 2016, Alexandre Oliva  wrote:

Handling non-template friends is kind of easy, [...]

Ping?

Ping?  (conflicts resolved, patch refreshed and retested)

Ping?  (trivial conflicts resolved)

Ping?  https://gcc.gnu.org/ml/gcc-patches/2017-01/msg02112.html

Ping?

Ping? (refreshed, retested)

[PR59319] output friends in debug info

Handling non-template friends is kind of easy, but it required a bit
of infrastructure in dwarf2out to avoid (i) forcing debug info for
unused types or functions: DW_TAG_friend DIEs are only emitted if
their DW_AT_friend DIE is emitted, and (ii) creating DIEs for such
types or functions just to have them discarded at the end.  To this
end, I introduced a list (vec, actually) of types with friends,
processed at the end of the translation unit, and a list of
DW_TAG_friend DIEs that, when we're pruning unused types, reference
DIEs that are still not known to be used, revisited after we finish
deciding all other DIEs, so that we prune DIEs that would have
referenced pruned types or functions.

Handling template friends turned out to be trickier: there's no
representation in DWARF for templates.  I decided to give debuggers as
much information as possible, enumerating all specializations of
friend templates and outputting DW_TAG_friend DIEs referencing them as
well.  I considered marking those as DW_AT_artificial, to indicate
they're not explicitly stated in the source code, but in the end we
decided that was not useful.  The greatest challenge was to enumerate
all specializations of a template.  It looked trivial at first, given
DECL_TEMPLATE_INSTANTIATIONS, but it won't list specializations of
class-scoped functions and of nested templates.  For other templates,
I ended up writing code to look for specializations in the hashtables
of decl or type specializations.  That's not exactly efficient, but it
gets the job done.


How inefficient is it, exactly?  I'm concerned about the impact on 
compile time of scanning the entire hash table for each friend 
declaration in a template instantiation.  This sounds prohibitive for 
template-heavy code that uses friends.


I wonder about changing register_specialization to fill out 
DECL_TEMPLATE_INSTANTIATIONS for more templates (even more than you 
already do).



+  /* At DETAIL level 0, returns non-NULL if the named class TYPE has
+ any friends, NULL otherwise.  At higher detail levels, return a
+ tree list with the friends of the named class type.  Each
+ TREE_VALUE contains one friend type or function decl.  For
+ non-template friends, TREE_PURPOSE is NULL.  For template friend
+ declarations, the returned entries depend on the DETAIL level.
+ At level 1, and only at level 1, an entry with NULL TREE_VALUE
+ and non-NULL TREE_PURPOSE will START the returned list to
+ indicate the named class TYPE has at least one template friend.
+ At level 2, each template friend will be in an entry with NULL
+ TREE_VALUE, and with the TEMPLATE_DECL in TREE_PURPOSE.  At level
+ 3, instead of a NULL TREE_VALUE, we add one entry for each
+ instantiation or specialization of the template that fits the
+ template friend declaration, as long as there is at least one
+ instantiation or specialization; if there isn't any, an entry
+ with NULL TREE_VALUE is created.  A negative detail level will
+ omit non-template friends from the returned list.  */


The calls I see only seem to use details 0 and 3, while I would expect 
level 3 to only be used with -g3.  What is the purpose of the negative 
level?


Jason


Re: [PATCH][arm] Add -mverbose-cost-dump and de-verbosify cost dumps

2017-12-14 Thread Kyrill Tkachov


On 14/12/17 18:23, Sandra Loosemore wrote:

On 12/14/2017 10:56 AM, Kyrill Tkachov wrote:


2017-12-14  Kyrylo Tkachov  

 * doc/invoke.texi (GCC Developer options): Add Target-specific
 developer options subsection.  Populate it with AArch64 and ARM
 options.

>

diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index 0d565b4..00de74c 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -14223,6 +14223,24 @@ Print the compiler's built-in specs---and 
don't do anything else.  (This

 is used when GCC itself is being built.)  @xref{Spec Files}.
 @end table

+@subsection Target-specific developer options


Throughout this patch, please capitalize the first word of the section 
titles, and add @node and @cindex for each section.  There are plenty 
of examples in this chapter to copy the style from.




Thanks, done.


+Some targets define additional debug options that are documented here.
+@subsubsection AArch64 developer options
+@table @gcctabopt
+@item -mverbose-cost-dump
+Enable verbose cost model dumping in the debug dump files.
+@item -moverride=
+Override CPU optimization parameters.


What parameters are these?  What is the syntax?  This isn't enough 
information to be useful.




I'm afraid I don't know. I've not used this option myself, I'm just 
aware of its existence.
Do you think it's better to leave it out altogether and document it 
separately at a later date

(or ping its author to document it perhaps)?


+@end table
+
+@subsubsection ARM developer options
+@table @gcctabopt
+@item -mverbose-cost-dump
+Enable verbose cost model dumping in the debug dump files.
+@item -mflip-thumb
+Switch ARM/Thumb modes on alternating functions.
+@end table
+
 @node Submodel Options
 @section Machine-Dependent Options
 @cindex submodel options


Please also list the new options in the "Option Summary" section. Note 
that the list for developer options is currently mostly alphabetized.




Thanks, done. I haven't created a new Target-specific developers options 
table but instead listed the targets the options apply to in parentheses.

Attached is the latest iteration.

Thank you for the review,
Kyrill

2017-12-14  Kyrylo Tkachov  

* doc/invoke.texi (GCC Developer options): Add Target-specific
developer options subsection.  Populate it with AArch64 and ARM
options.
(Options Summary): List the above.
commit 1c92b22d63cc16296cbbbcf4791951971b62094d
Author: Kyrylo Tkachov 
Date:   Thu Dec 14 17:44:22 2017 +

[doc] Add target-specific developer options subsection

diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index 0d565b4..b8936d1 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -574,6 +574,8 @@ Objective-C and Objective-C++ Dialects}.
 -fsel-sched-verbose  -fsel-sched-dump-cfg  -fsel-sched-pipelining-verbose @gol
 -fstats  -fstack-usage  -ftime-report  -ftime-report-details @gol
 -fvar-tracking-assignments-toggle  -gtoggle @gol
+-mflip-thumb (ARM) -moverride=@var{string} (AArch64) @gol
+-mverbose-cost-dump (ARM, AArch64) @gol
 -print-file-name=@var{library}  -print-libgcc-file-name @gol
 -print-multi-directory  -print-multi-lib  -print-multi-os-directory @gol
 -print-prog-name=@var{program}  -print-search-dirs  -Q @gol
@@ -14223,6 +14225,31 @@ Print the compiler's built-in specs---and don't do anything else.  (This
 is used when GCC itself is being built.)  @xref{Spec Files}.
 @end table
 
+@node Target-specific Developer Options
+@cindex target-specific developer options
+@subsection Target-specific Developer Options
+Some targets define additional debug options that are documented here.
+
+@node AArch64 Developer Options
+@cindex aarch64 developer options
+@subsubsection AArch64 Developer Options
+@table @gcctabopt
+@item -moverride=
+Override CPU optimization parameters.
+@item -mverbose-cost-dump
+Enable verbose cost model dumping in the debug dump files.
+@end table
+
+@node ARM Developer Options
+@cindex arm developer options
+@subsubsection ARM Developer Options
+@table @gcctabopt
+@item -mflip-thumb
+Switch ARM/Thumb modes on alternating functions.
+@item -mverbose-cost-dump
+Enable verbose cost model dumping in the debug dump files.
+@end table
+
 @node Submodel Options
 @section Machine-Dependent Options
 @cindex submodel options


Re: [PATCHv3] Add a warning for invalid function casts

2017-12-14 Thread Jason Merrill
On Thu, Dec 7, 2017 at 3:48 PM, Bernd Edlinger
 wrote:
> On 12/06/17 23:35, Jason Merrill wrote:
>> On Fri, Dec 1, 2017 at 7:42 AM, Bernd Edlinger
>>  wrote:
>>> this version of the patch improves the heuristic check to take the
>>> target hook into account, to handle cases correctly when both or only
>>> one parameter is _not_ promoted to int.
>>
>> In looking at this, I discovered that the argument to
>> promote_prototypes should be the function type, not the parameter
>> type; the existing uses in the C++ front end were wrong.
>>
>
> Bah, sorry.
>
> Yes, it looks like there are at least two more target hooks that change
> the type promotion rules later-on: targetm.calls.promote_function_mode
> and PROMOTE_MODE.
>
> In the ada FE we pass NULL_TREE to promote_prototypes which seems to
> mean if the target wants type promotion in principle.  So it returns
> true if some function may promote, and false if no function promotes.
> At least this is how the sh-target handles this parameter.
> This is BTW the only target that uses the argument of this callback.
>
> So I would think for the purpose of this warning the following check
> should be sufficient:
>
>if (INTEGRAL_TYPE_P (t1)
>&& INTEGRAL_TYPE_P (t2)
>&& TYPE_PRECISION (t1) == TYPE_PRECISION (t2)
>&& (TYPE_UNSIGNED (t1) == TYPE_UNSIGNED (t2)
>|| !targetm.calls.promote_prototypes (NULL_TREE)
>|| TYPE_PRECISION (t1) >= TYPE_PRECISION (integer_type_node)))
>
>
> What do you think?
> Is the patch OK with this change?

Yes.

Jason


Re: [PATCH] set range for strlen(array) to avoid spurious -Wstringop-overflow (PR 83373 , PR 78450)

2017-12-14 Thread Martin Sebor

On 12/14/2017 09:18 AM, Jakub Jelinek wrote:

On Thu, Dec 14, 2017 at 09:13:21AM -0700, Jeff Law wrote:

Although I would prefer not to, I suppose if letting strlen cross
the boundaries of subobjects was considered an important use to
accommodate in limited cases the optimization could be disabled
for member arrays declared with the new nonstring attribute (while
still issuing a warning for it as GCC does today).

Another alternative (if the above use case is considered important
enough) might be to suppress the optimization for member character
arrays that are immediately followed by another such array.

History tells us that there will be someone out there that does this
kind of thing -- the question is how pervasive is it.  My suspicion is
that it is not common.

Given that I don't expect those uses to be common, the only thing that
should break is non-conforming code and we have a (new) warning for such
code my inclination is to go forward.

So I'm OK with the patch.  I'd give folks till Monday to chime in with
dissenting opinions.


Well, it would be nice to get sanitizers diagnose this at runtime.  If we
know the array length at compile time, simply compare after the strlen
call the result and fail if it returns something above it.  Or replace
the strlen call with strnlen for the compile time known size and add
instrumentation if strnlen returns the second argument.


Sure, that sounds like a useful enhancement.  I'll look into
adding it as a follow-on patch unless you feel that it needs
to be part of the same package.

Martin


Re: [PATCH] set range for strlen(array) to avoid spurious -Wstringop-overflow (PR 83373 , PR 78450)

2017-12-14 Thread Jakub Jelinek
On Thu, Dec 14, 2017 at 11:51:26AM -0700, Martin Sebor wrote:
> > Well, it would be nice to get sanitizers diagnose this at runtime.  If we
> > know the array length at compile time, simply compare after the strlen
> > call the result and fail if it returns something above it.  Or replace
> > the strlen call with strnlen for the compile time known size and add
> > instrumentation if strnlen returns the second argument.
> 
> Sure, that sounds like a useful enhancement.  I'll look into
> adding it as a follow-on patch unless you feel that it needs
> to be part of the same package.

The problem is if we'll need changes to libubsan for that (which we'll
likely do), then those need to be upstreamed, and e.g. my attempts
to upstream simple patch to diagnose noreturn function returns is suspended
upstream because clang doesn't have that support (and I have no interest
in adding to to clang).

In theory we could have some GCC only file in there, but then we'd be ABI
incompatible with them.

Jakub


Re: [PATCH] set range for strlen(array) to avoid spurious -Wstringop-overflow (PR 83373 , PR 78450)

2017-12-14 Thread Jeff Law
On 12/14/2017 11:55 AM, Jakub Jelinek wrote:
> On Thu, Dec 14, 2017 at 11:51:26AM -0700, Martin Sebor wrote:
>>> Well, it would be nice to get sanitizers diagnose this at runtime.  If we
>>> know the array length at compile time, simply compare after the strlen
>>> call the result and fail if it returns something above it.  Or replace
>>> the strlen call with strnlen for the compile time known size and add
>>> instrumentation if strnlen returns the second argument.
>>
>> Sure, that sounds like a useful enhancement.  I'll look into
>> adding it as a follow-on patch unless you feel that it needs
>> to be part of the same package.
> 
> The problem is if we'll need changes to libubsan for that (which we'll
> likely do), then those need to be upstreamed, and e.g. my attempts
> to upstream simple patch to diagnose noreturn function returns is suspended
> upstream because clang doesn't have that support (and I have no interest
> in adding to to clang).
> 
> In theory we could have some GCC only file in there, but then we'd be ABI
> incompatible with them.
So defer the sanitization side until Clang catches up?

jeff


[PATCH] Slight cost adjustment in SLSR

2017-12-14 Thread Bill Schmidt
Hi,

While looking at PR83253, I noticed that the cost model for MULT_EXPR
replacement can be improved.  Right now we use mult_by_coeff_cost to
determine the value of the possible replacement, but we use mul_cost
to determine the savings from removing the original expression.  This
overcounts the savings when the expression being replaced is a multiply
by a constant.  In such cases we can again use mult_by_coeff_cost to
determine the savings.  This will reduce the chance of making an
unprofitable replacement.

Bootstrapped and tested on powerpc64le-unknown-linux-gnu with no
regressions.  Is this okay for trunk?

Thanks,
Bill


2017-12-14  Bill Schmidt  

* gimple-ssa-strength-reduction.c (analyze_increments):
Distinguish replacement costs for constant strides from those for
unknown strides.


Index: gcc/gimple-ssa-strength-reduction.c
===
--- gcc/gimple-ssa-strength-reduction.c (revision 255588)
+++ gcc/gimple-ssa-strength-reduction.c (working copy)
@@ -3083,7 +3083,17 @@ analyze_increments (slsr_cand_t first_dep, machine
   else if (first_dep->kind == CAND_MULT)
{
  int cost = mult_by_coeff_cost (incr, mode, speed);
- int repl_savings = mul_cost (speed, mode) - add_cost (speed, mode);
+ int repl_savings;
+
+ if (tree_fits_shwi_p (first_dep->stride))
+   {
+ HOST_WIDE_INT hwi_stride = tree_to_shwi (first_dep->stride);
+ repl_savings = mult_by_coeff_cost (hwi_stride, mode, speed);
+   }
+ else
+   repl_savings = mul_cost (speed, mode);
+ repl_savings -= add_cost (speed, mode);
+
  if (speed)
cost = lowest_cost_path (cost, repl_savings, first_dep,
 incr_vec[i].incr, COUNT_PHIS);



Re: [PATCH][arm] Add -mverbose-cost-dump and de-verbosify cost dumps

2017-12-14 Thread Kyrill Tkachov


On 14/12/17 18:48, Kyrill Tkachov wrote:


On 14/12/17 18:23, Sandra Loosemore wrote:
> On 12/14/2017 10:56 AM, Kyrill Tkachov wrote:
>>
>> 2017-12-14  Kyrylo Tkachov 
>>
>>  * doc/invoke.texi (GCC Developer options): Add Target-specific
>>  developer options subsection.  Populate it with AArch64 and ARM
>>  options.
> >
>> diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
>> index 0d565b4..00de74c 100644
>> --- a/gcc/doc/invoke.texi
>> +++ b/gcc/doc/invoke.texi
>> @@ -14223,6 +14223,24 @@ Print the compiler's built-in specs---and
>> don't do anything else.  (This
>>  is used when GCC itself is being built.) @xref{Spec Files}.
>>  @end table
>>
>> +@subsection Target-specific developer options
>
> Throughout this patch, please capitalize the first word of the section
> titles, and add @node and @cindex for each section. There are plenty
> of examples in this chapter to copy the style from.
>

Thanks, done.

>> +Some targets define additional debug options that are documented here.
>> +@subsubsection AArch64 developer options
>> +@table @gcctabopt
>> +@item -mverbose-cost-dump
>> +Enable verbose cost model dumping in the debug dump files.
>> +@item -moverride=
>> +Override CPU optimization parameters.
>
> What parameters are these?  What is the syntax?  This isn't enough
> information to be useful.
>

I'm afraid I don't know. I've not used this option myself, I'm just
aware of its existence.
Do you think it's better to leave it out altogether and document it
separately at a later date
(or ping its author to document it perhaps)?

>> +@end table
>> +
>> +@subsubsection ARM developer options
>> +@table @gcctabopt
>> +@item -mverbose-cost-dump
>> +Enable verbose cost model dumping in the debug dump files.
>> +@item -mflip-thumb
>> +Switch ARM/Thumb modes on alternating functions.
>> +@end table
>> +
>>  @node Submodel Options
>>  @section Machine-Dependent Options
>>  @cindex submodel options
>
> Please also list the new options in the "Option Summary" section. Note
> that the list for developer options is currently mostly alphabetized.
>

Thanks, done. I haven't created a new Target-specific developers options
table but instead listed the targets the options apply to in parentheses.
Attached is the latest iteration.



Actually trying to build the tree normally, not through make pdf I get 
errors such as:
" Node `Target-specific Developer Options' requires a sectioning command 
(e.g., @unnumberedsubsec)."


that I don't know how to fix yet.
So I'll work on getting that fixed first...

Sorry for the noise,
Kyrill


Thank you for the review,
Kyrill

2017-12-14  Kyrylo Tkachov  

 * doc/invoke.texi (GCC Developer options): Add Target-specific
 developer options subsection.  Populate it with AArch64 and ARM
 options.
 (Options Summary): List the above.




Re: [PATCH][Middle-end]79538 missing -Wformat-overflow with %s and non-member array arguments

2017-12-14 Thread Qing Zhao

> On Dec 14, 2017, at 2:05 AM, Richard Biener  wrote:
> 
> On Wed, 13 Dec 2017, Qing Zhao wrote:
> 
>> Hi,
>> 
>> I updated gimple-fold.c as you suggested, bootstrapped and re-tested on both 
>> x86 and aarch64. no any issue.
>> 
>> 
>> diff --git a/gcc/gimple-fold.c b/gcc/gimple-fold.c
>> index 353a46e..eb6a87a 100644
>> --- a/gcc/gimple-fold.c
>> +++ b/gcc/gimple-fold.c
>> @@ -1323,6 +1323,19 @@ get_range_strlen (tree arg, tree length[2], bitmap 
>> *visited, int type,
>>   the array could have zero length.  */
>>*minlen = ssize_int (0);
>>  }
>> +
>> +  if (VAR_P (arg) 
>> +  && TREE_CODE (TREE_TYPE (arg)) == ARRAY_TYPE)
>> +{
>> +  val = TYPE_SIZE_UNIT (TREE_TYPE (arg));
>> +  if (!val || TREE_CODE (val) != INTEGER_CST || integer_zerop (val))
>> +return false;
>> +  val = wide_int_to_tree (TREE_TYPE (val), 
>> +  wi::sub(wi::to_wide (val), 1));
>> +  /* Set the minimum size to zero since the string in
>> + the array could have zero length.  */
>> +  *minlen = ssize_int (0);
>> +}
>>  }
>> 
>> 
>> I plan to commit the change very soon. 
>> let me know if you have further comment.
> 
> Looks good to me.

thanks a lot for your review.

committed the patch as revision 255654
https://gcc.gnu.org/viewcvs/gcc?view=revision&revision=255654 


PR 79538 was filed against GCC7.0, So, I assume that this patch need to be 
backported to GCC7 branch.
I will do the backporting to GCC7 later this week if there is no objection. 

Qing
> Richard.
> 



Re: [PATCH] have -Wnonnull print inlining stack (PR 83369)

2017-12-14 Thread Jeff Law
On 12/11/2017 03:18 PM, Martin Sebor wrote:
> On 12/11/2017 02:08 PM, David Malcolm wrote:
>> On Mon, 2017-12-11 at 09:51 -0700, Martin Sebor wrote:
>>> Bug 83369 - Missing diagnostics during inlining, notes that when
>>> -Wnonnull is issued for an inlined call to a built-in function,
>>> GCC doesn't print the inlining stack, making it hard to debug
>>> where the problem comes from.
>>>
>>> When the -Wnonnull warning was introduced into the middle-end
>>> the diagnostic machinery provided no way to print the inlining
>>> stack (analogous to %K for trees).  Since then GCC has gained
>>> support for the %G directive which does just that.  The attached
>>> patch makes use of the directive to print the inlining context
>>> for -Wnonnull.
>>>
>>> The patch doesn't include a test because the DejaGnu framework
>>> provides no mechanism to validate this part of GCC output (see
>>> also bug 83336).
>>>
>>> Tested on x86_64-linux with no regressions.
>>>
>>> Martin
>>
>> I'm wondering if we should eliminate %K and %G altogether, and make
>> tree-diagnostic.c and friends automatically print the inlining stack
>> -they just need a location_t (the issue is with system headers, I
>> suppose, but maybe we can just make that smarter: perhaps only suppress
>> if every location in the chain is in a system header?).  I wonder if
>> that would be GCC 9 material at this point though?
> 
> Getting rid of %G and %K sounds fine to me.  I can't think of
> a use case for suppressing middle end diagnostics in system
> headers so unless someone else can it might be a non-issue.
> Since the change would fix a known bug it seems to me that it
> should be acceptable even at this stage.
My recollection is we suppress warnings from bits in system headers
because neither we nor the end users necessarily have control over the
contents of the system headers -- in which case we'd be issuing warnings
for something that only the vendor can fix.


I believe default suppression of warnings from system headers probably
needs to stay.
jeff



Re: [PATCH 03/14] C++: add location_t wrapper nodes during parsing (minimal impl)

2017-12-14 Thread David Malcolm
On Mon, 2017-12-11 at 21:10 -0500, Jason Merrill wrote:
> On 11/10/2017 04:45 PM, David Malcolm wrote:
> > The initial version of the patch kit added location wrapper nodes
> > around constants and uses-of-declarations, along with some other
> > places in the parser (typeid, alignof, sizeof, offsetof).
> > 
> > This version takes a much more minimal approach: it only adds
> > location wrapper nodes around the arguments at callsites, thus
> > not adding wrapper nodes around uses of constants and decls in
> > other
> > locations.
> > 
> > It keeps them for the other places in the parser (typeid, alignof,
> > sizeof, offsetof).
> > 
> > In addition, for now, each site that adds wrapper nodes is guarded
> > with !processing_template_decl, suppressing the creation of wrapper
> > nodes when processing template declarations.  This is to simplify
> > the patch kit so that we don't have to support wrapper nodes during
> > template expansion.
> 
> Hmm, it should be easy to support them, since NON_LVALUE_EXPR and 
> VIEW_CONVERT_EXPR don't otherwise appear in template trees.
> 
> Jason

I don't know if it's "easy"; it's at least non-trivial.

I attempted to support them in the obvious way by adding the two codes
to the switch statement tsubst_copy, reusing the case used by NOP_EXPR
and others, but ran into a issue when dealing with template parameter
packs.

Attached is the reproducer I've been testing with (minimized using
"delta" from a stdlib reproducer); my code was failing with:

../../src/cp-stdlib.ii: In instantiation of ‘struct 
allocator_traits >’:
../../src/cp-stdlib.ii:31:8:   required from ‘struct 
__alloc_traits, char>’
../../src/cp-stdlib.ii:43:75:   required from ‘class basic_string >’
../../src/cp-stdlib.ii:47:58:   required from here
../../src/cp-stdlib.ii:27:55: sorry, unimplemented: use of 
‘type_pack_expansion’ in template
 -> decltype(_S_construct(__a, __p, forward<_Args>(__args)...))  {   }
   ^~

The issue is that normally "__args" would be a PARM_DECL of type
TYPE_PACK_EXPANSION, and that's handled by tsubst_decl, but on adding a
wrapper node we now have a VIEW_CONVERT_EXPR of the same type i.e.
TYPE_PACK_EXPANSION wrapping the PARM_DECL.

When tsubst traverses the tree, the VIEW_CONVERT_EXPR is reached first,
and it attempts to substitute the type TYPE_PACK_EXPANSION, which leads
to the "sorry".

If I understand things right, during substitution, only tsubst_decl on
PARM_DECL can handle nodes with type with code TYPE_PACK_EXPANSION.

The simplest approach seems to be to not create wrapper nodes for decls
of type TYPE_PACK_EXPANSION, and that seems to fix the issue.

Alternatively I can handle TYPE_PACK_EXPANSION for VIEW_CONVERT_EXPR in
tsubst by remapping the type to that of what they wrap after
substitution; doing so also fixes the issue.

Does this sound correct and sane?  Do you have a preferred approach?

On fixing that I'm running into a different issue when testing the
stdlib (again with parameter packs, I think), but I'm still
investigating that one...

Thanks
Davetypedef long unsigned int size_t;

template
struct remove_reference {};

template
constexpr _Tp&&
forward(typename remove_reference<_Tp>::type& __t) noexcept
{
}

struct __allocator_traits_base {
  template
  struct __rebind
  {
using type = typename _Tp::template rebind<_Up>::other;
  };
};

template
using __alloc_rebind = typename __allocator_traits_base::template 
__rebind<_Alloc, _Up>::type;

template  struct allocator_traits {
  template  using rebind_alloc = __alloc_rebind<_Alloc, _Tp>;
  template
  static auto construct(_Alloc& __a, _Tp* __p, _Args&&... __args)
-> decltype(_S_construct(__a, __p, forward<_Args>(__args)...))  {   }
};

template
struct __alloc_traits: allocator_traits<_Alloc>{
  typedef allocator_traits<_Alloc> _Base_type;
  template   struct rebind   {   typedef typename 
_Base_type::template rebind_alloc<_Tp> other;   };
};

template class allocator {
  typedef _Tp value_type;
  template  struct rebind  {   typedef allocator<_Tp1> other;   
};
};

template
class basic_string {
  typedef typename __alloc_traits<_Alloc>::template rebind<_CharT>::other 
_Char_alloc_type;
};

template  struct _Base_bitset {
  static void foo (basic_string >) {}
};


Re: Add an "early rematerialisation" pass

2017-12-14 Thread Richard Sandiford
Jeff Law  writes:
> On 12/14/2017 04:09 AM, Richard Biener wrote:
>> On Fri, Nov 17, 2017 at 4:58 PM, Richard Sandiford
>>  wrote:
>>> This patch looks for pseudo registers that are live across a call
>>> and for which no call-preserved hard registers exist.  It then
>>> recomputes the pseudos as necessary to ensure that they are no
>>> longer live across a call.  The comment at the head of the file
>>> describes the approach.
>>>
>>> A new target hook selects which modes should be treated in this way.
>>> By default none are, in which case the pass is skipped very early.
>>>
>>> It might also be worth looking for cases like:
>>>
>>>C1: R1 := f (...)
>>>...
>>>C2: R2 := f (...)
>>>C3: R1 := C2
>>>
>>> and giving the same value number to C1 and C3, effectively treating
>>> it like:
>>>
>>>C1: R1 := f (...)
>>>...
>>>C2: R2 := f (...)
>>>C3: R1 := f (...)
>>>
>>> Another (much more expensive) enhancement would be to apply value
>>> numbering to all pseudo registers (not just rematerialisation
>>> candidates), so that we can handle things like:
>>>
>>>   C1: R1 := f (...R2...)
>>>   ...
>>>   C2: R1 := f (...R3...)
>>>
>>> where R2 and R3 hold the same value.  But the current pass seems
>>> to catch the vast majority of cases.
>>>
>>> Tested on aarch64-linux-gnu (with and without SVE), x86_64-linux-gnu
>>> and powerpc64le-linux-gnu.  OK to install?
>> 
>> Can you tell anything about the complexity of the algorithm?

Have to get back to you on that one. :-)

>> How does it relate to what LRA can do?  AFAIK LRA doesn't try to find
>> any global optimal solution and previous hardreg assignments may work
>> against it?

Yeah, both of those are problems.  But the more important problem is
that it can't increase the live ranges of input registers as easily.
Doing it before RA means that IRA gets to see the new ranges.

>> That said - I would have expected remat to be done before the
>> first scheduling pass?  Even before pass_sms (not sure
>> what pass_live_range_shrinkage does).  Or be integrated
>> with scheduling and it's register pressure cost model.

SMS shouldn't be a problem.  Early remat wouldn't introduce new
instructions into a loop unless the loop also had a call, which would
prevent SMS.  And although it's theoretically possible that it could
remove instructions from a loop, that would only happen if:

  (a) the instruction actually computes the same value every time, so
  could have been moved outside the loop; and

  (b) the result is only used after a following call (and in particular
  isn't used within the loop itself)

(a) is a missed optimisation and (b) seems unlikely.

Integrating remat into scheduling would make it much less powerful,
since scheduling does only limited code motion between blocks.

Doing it before scheduling would be good in principle, but there
would then need to be a fake dependency between the call and remat
instructions to stop the scheduler moving the remat instructions
back before the call.  Adding early remat was a way of avoiding such
fake dependencies in "every" pass, but it might be that scheduling
is one case in which the dependencies make sense.

Either way, being able to run the pass before scheduling seems
like a future enhancement, blocked on a future enhancement to
the scheduler.

>> Also I would have expected the approach to apply to all modes,
>> just the cost of spilling is different.  But if you can, say, reduce
>> register pressure by one by rematerializing a bit-not then that
>> should be always profitable, no?  postreload-cse will come to
>> the rescue anyhow.

But that would then mean taking the register pressure into account when
deciding whether to rematerialise.  On the one hand would make it hard
to do before scheduling (which decides the final pre-RA pressure).
It would also make it a significantly different algorithm, since it
wouldn't be a standard availability problem any more.

For that use case, pressure-dependent remat in the scheduler might
be a better approach, like you were suggesting.  The early remat
pass is specifically for the extreme case of no registers being
call-preserved, where it's more important that we don't miss
remat opportunities, and more important that we treat it as
a global problem.

Thanks,
Richard


Re: [PATCH] have -Wnonnull print inlining stack (PR 83369)

2017-12-14 Thread Jeff Law
On 12/11/2017 06:16 PM, David Malcolm wrote:
> On Mon, 2017-12-11 at 15:18 -0700, Martin Sebor wrote:
>> On 12/11/2017 02:08 PM, David Malcolm wrote:
>>> On Mon, 2017-12-11 at 09:51 -0700, Martin Sebor wrote:
 Bug 83369 - Missing diagnostics during inlining, notes that when
 -Wnonnull is issued for an inlined call to a built-in function,
 GCC doesn't print the inlining stack, making it hard to debug
 where the problem comes from.

 When the -Wnonnull warning was introduced into the middle-end
 the diagnostic machinery provided no way to print the inlining
 stack (analogous to %K for trees).  Since then GCC has gained
 support for the %G directive which does just that.  The attached
 patch makes use of the directive to print the inlining context
 for -Wnonnull.

 The patch doesn't include a test because the DejaGnu framework
 provides no mechanism to validate this part of GCC output (see
 also bug 83336).

 Tested on x86_64-linux with no regressions.

 Martin
>>>
>>> I'm wondering if we should eliminate %K and %G altogether, and make
>>> tree-diagnostic.c and friends automatically print the inlining
>>> stack
>>> -they just need a location_t (the issue is with system headers, I
>>> suppose, but maybe we can just make that smarter: perhaps only
>>> suppress
>>> if every location in the chain is in a system header?).  I wonder
>>> if
>>> that would be GCC 9 material at this point though?
>>
>> Getting rid of %G and %K sounds fine to me.  I can't think of
>> a use case for suppressing middle end diagnostics in system
>> headers so unless someone else can it might be a non-issue.
>> Since the change would fix a known bug it seems to me that it
>> should be acceptable even at this stage.
Is there any kind of consensus on what we want to do here -- do we want
to try to tackle %G/%K removal for gcc-8 or defer it?  If we defer it,
how do we want to handle printing the inline stack in the interm?

jeff


[PR C++/59930] template friend classes & default args

2017-12-14 Thread Nathan Sidwell
PR 59930 concerns some problems with templated friend classes (of 
templates).  In rying to clean up our handling, I discovered we were 
accepting default args of such things.  This is ill formed


[temp.param]/12 'A default template-argument shall not be specified in a 
friend class template declaration.'


This patch addresses that problem by extending check_default_tmpl_args 
to deal with such friends.


We'll still have to deal with:
template  class X {
   template  friend class Y;
};

which is probably just as involved.  But at least this removes one 
source of confusion.


nathan
--
Nathan Sidwell
2017-12-14  Nathan Sidwell  

	PR c++/59930
	* decl.c (xref_tag_1): Correct comments about template friends and
	default args.
	* friend.c (make_friend_class): Move comments concerning
	self-friendliness to code dealing with such.
	* pt.c (check_default_tmpl_args): Deal with template friend
	classes too.
	(push_template_decl_real): Adjust for template friend classes.

	PR c++/59930
	* g++.dg/cpp0x/temp_default4.C: Adjust diagnostic.
	* g++.old-deja/g++.pt/friend23.C: Likewise.
	* g++.old-deja/g++.pt/friend24.C: Delete.

Index: cp/decl.c
===
--- cp/decl.c	(revision 255634)
+++ cp/decl.c	(working copy)
@@ -13538,37 +13538,28 @@ xref_tag_1 (enum tag_types tag_code, tre
 	 processing a (member) template declaration of a template
 	 class, we must be very careful; consider:
 
-	   template 
-	   struct S1
+	   template  struct S1
 
-	   template 
-	   struct S2
-	   { template 
-	   friend struct S1; };
+	   template  struct S2
+	   {
+	 template  friend struct S1;
+	   };
 
 	 Here, the S2::S1 declaration should not be confused with the
 	 outer declaration.  In particular, the inner version should
-	 have a template parameter of level 2, not level 1.  This
-	 would be particularly important if the member declaration
-	 were instead:
-
-	   template  friend struct S1;
-
-	 say, when we should tsubst into `U' when instantiating
-	 S2.  On the other hand, when presented with:
-
-	   template 
-	   struct S1 {
-	 template 
-	 struct S2 {};
-	 template 
-	 friend struct S2;
+	 have a template parameter of level 2, not level 1.
+
+	 On the other hand, when presented with:
+
+	   template  struct S1
+	   {
+	 template  struct S2 {};
+	 template  friend struct S2;
 	   };
 
-	 we must find the inner binding eventually.  We
-	 accomplish this by making sure that the new type we
-	 create to represent this declaration has the right
-	 TYPE_CONTEXT.  */
+	 the friend must find S1::S2 eventually.  We accomplish this
+	 by making sure that the new type we create to represent this
+	 declaration has the right TYPE_CONTEXT.  */
   context = TYPE_CONTEXT (t);
   t = NULL_TREE;
 }
@@ -13622,9 +13613,10 @@ xref_tag_1 (enum tag_types tag_code, tre
 	  return error_mark_node;
 	}
 
-  /* Make injected friend class visible.  */
   if (scope != ts_within_enclosing_non_class && TYPE_HIDDEN_P (t))
 	{
+	  /* This is no longer an invisible friend.  Make it
+	 visible.  */
 	  tree decl = TYPE_NAME (t);
 
 	  DECL_ANTICIPATED (decl) = false;
Index: cp/friend.c
===
--- cp/friend.c	(revision 255634)
+++ cp/friend.c	(working copy)
@@ -283,21 +283,18 @@ make_friend_class (tree type, tree frien
 return;
 
   if (friend_depth)
-/* If the TYPE is a template then it makes sense for it to be
-   friends with itself; this means that each instantiation is
-   friends with all other instantiations.  */
 {
+  /* [temp.friend] Friend declarations shall not declare partial
+	 specializations.  */
   if (CLASS_TYPE_P (friend_type)
 	  && CLASSTYPE_TEMPLATE_SPECIALIZATION (friend_type)
 	  && uses_template_parms (friend_type))
 	{
-	  /* [temp.friend]
-	 Friend declarations shall not declare partial
-	 specializations.  */
 	  error ("partial specialization %qT declared %",
 		 friend_type);
 	  return;
 	}
+
   if (TYPE_TEMPLATE_INFO (friend_type)
 	  && !PRIMARY_TEMPLATE_P (TYPE_TI_TEMPLATE (friend_type)))
 	{
@@ -311,7 +308,11 @@ make_friend_class (tree type, tree frien
 	  return;
 	}
 }
-  else if (same_type_p (type, friend_type))
+
+  /* It makes sense for a template class to be friends with itself,
+ that means the instantiations can be friendly.  Other cases are
+ not so meaningful.  */
+  if (!friend_depth && same_type_p (type, friend_type))
 {
   if (complain)
 	warning (0, "class %qT is implicitly friends with itself",
Index: cp/pt.c
===
--- cp/pt.c	(revision 255634)
+++ cp/pt.c	(working copy)
@@ -4980,9 +4980,10 @@ fixed_parameter_pack_p (tree parm)
a primary template.  IS_PARTIAL is true if DECL is a partial
specialization.
 
-   IS_FRIEND_DECL is nonzero if DECL is a friend function template
-   declaration (but not a definition); 1

Re: [PATCH] have -Wnonnull print inlining stack (PR 83369)

2017-12-14 Thread Jeff Law
On 12/11/2017 09:51 AM, Martin Sebor wrote:
> Bug 83369 - Missing diagnostics during inlining, notes that when
> -Wnonnull is issued for an inlined call to a built-in function,
> GCC doesn't print the inlining stack, making it hard to debug
> where the problem comes from.
> 
> When the -Wnonnull warning was introduced into the middle-end
> the diagnostic machinery provided no way to print the inlining
> stack (analogous to %K for trees).  Since then GCC has gained
> support for the %G directive which does just that.  The attached
> patch makes use of the directive to print the inlining context
> for -Wnonnull.
> 
> The patch doesn't include a test because the DejaGnu framework
> provides no mechanism to validate this part of GCC output (see
> also bug 83336).
> 
> Tested on x86_64-linux with no regressions.
So thinking more about this, I don't see a strong reason not to move
forward with this fix in the immediate term.

If we ultimately remove %K/%G, then obviously we'll revisit this chunk
of code in the process.

OK for the trunk.

jeff


Re: Add an "early rematerialisation" pass

2017-12-14 Thread Richard Biener
On December 14, 2017 8:26:49 PM GMT+01:00, Richard Sandiford 
 wrote:
>Jeff Law  writes:
>> On 12/14/2017 04:09 AM, Richard Biener wrote:
>>> On Fri, Nov 17, 2017 at 4:58 PM, Richard Sandiford
>>>  wrote:
 This patch looks for pseudo registers that are live across a call
 and for which no call-preserved hard registers exist.  It then
 recomputes the pseudos as necessary to ensure that they are no
 longer live across a call.  The comment at the head of the file
 describes the approach.

 A new target hook selects which modes should be treated in this
>way.
 By default none are, in which case the pass is skipped very early.

 It might also be worth looking for cases like:

C1: R1 := f (...)
...
C2: R2 := f (...)
C3: R1 := C2

 and giving the same value number to C1 and C3, effectively treating
 it like:

C1: R1 := f (...)
...
C2: R2 := f (...)
C3: R1 := f (...)

 Another (much more expensive) enhancement would be to apply value
 numbering to all pseudo registers (not just rematerialisation
 candidates), so that we can handle things like:

   C1: R1 := f (...R2...)
   ...
   C2: R1 := f (...R3...)

 where R2 and R3 hold the same value.  But the current pass seems
 to catch the vast majority of cases.

 Tested on aarch64-linux-gnu (with and without SVE),
>x86_64-linux-gnu
 and powerpc64le-linux-gnu.  OK to install?
>>> 
>>> Can you tell anything about the complexity of the algorithm?
>
>Have to get back to you on that one. :-)
>
>>> How does it relate to what LRA can do?  AFAIK LRA doesn't try to
>find
>>> any global optimal solution and previous hardreg assignments may
>work
>>> against it?
>
>Yeah, both of those are problems.  But the more important problem is
>that it can't increase the live ranges of input registers as easily.
>Doing it before RA means that IRA gets to see the new ranges.
>
>>> That said - I would have expected remat to be done before the
>>> first scheduling pass?  Even before pass_sms (not sure
>>> what pass_live_range_shrinkage does).  Or be integrated
>>> with scheduling and it's register pressure cost model.
>
>SMS shouldn't be a problem.  Early remat wouldn't introduce new
>instructions into a loop unless the loop also had a call, which would
>prevent SMS.  And although it's theoretically possible that it could
>remove instructions from a loop, that would only happen if:
>
>  (a) the instruction actually computes the same value every time, so
>  could have been moved outside the loop; and
>
>  (b) the result is only used after a following call (and in particular
>  isn't used within the loop itself)
>
>(a) is a missed optimisation and (b) seems unlikely.
>
>Integrating remat into scheduling would make it much less powerful,
>since scheduling does only limited code motion between blocks.
>
>Doing it before scheduling would be good in principle, but there
>would then need to be a fake dependency between the call and remat
>instructions to stop the scheduler moving the remat instructions
>back before the call.  Adding early remat was a way of avoiding such
>fake dependencies in "every" pass, but it might be that scheduling
>is one case in which the dependencies make sense.
>
>Either way, being able to run the pass before scheduling seems
>like a future enhancement, blocked on a future enhancement to
>the scheduler.
>
>>> Also I would have expected the approach to apply to all modes,
>>> just the cost of spilling is different.  But if you can, say, reduce
>>> register pressure by one by rematerializing a bit-not then that
>>> should be always profitable, no?  postreload-cse will come to
>>> the rescue anyhow.
>
>But that would then mean taking the register pressure into account when
>deciding whether to rematerialise.  On the one hand would make it hard
>to do before scheduling (which decides the final pre-RA pressure).
>It would also make it a significantly different algorithm, since it
>wouldn't be a standard availability problem any more.
>
>For that use case, pressure-dependent remat in the scheduler might
>be a better approach, like you were suggesting.  The early remat
>pass is specifically for the extreme case of no registers being
>call-preserved, where it's more important that we don't miss
>remat opportunities, and more important that we treat it as
>a global problem.

On x86_64 all xmm registers are caller saved for example. That means all FP 
regs and all vectors. (yeah, stupid ABI decision)

Richard. 

>Thanks,
>Richard



Re: [PATCH] Slight cost adjustment in SLSR

2017-12-14 Thread Richard Biener
On December 14, 2017 8:12:36 PM GMT+01:00, Bill Schmidt 
 wrote:
>Hi,
>
>While looking at PR83253, I noticed that the cost model for MULT_EXPR
>replacement can be improved.  Right now we use mult_by_coeff_cost to
>determine the value of the possible replacement, but we use mul_cost
>to determine the savings from removing the original expression.  This
>overcounts the savings when the expression being replaced is a multiply
>by a constant.  In such cases we can again use mult_by_coeff_cost to
>determine the savings.  This will reduce the chance of making an
>unprofitable replacement.
>
>Bootstrapped and tested on powerpc64le-unknown-linux-gnu with no
>regressions.  Is this okay for trunk?

OK. 

Richard. 

>Thanks,
>Bill
>
>
>2017-12-14  Bill Schmidt  
>
>   * gimple-ssa-strength-reduction.c (analyze_increments):
>   Distinguish replacement costs for constant strides from those for
>   unknown strides.
>
>
>Index: gcc/gimple-ssa-strength-reduction.c
>===
>--- gcc/gimple-ssa-strength-reduction.c(revision 255588)
>+++ gcc/gimple-ssa-strength-reduction.c(working copy)
>@@ -3083,7 +3083,17 @@ analyze_increments (slsr_cand_t first_dep,
>machine
>   else if (first_dep->kind == CAND_MULT)
>   {
> int cost = mult_by_coeff_cost (incr, mode, speed);
>-int repl_savings = mul_cost (speed, mode) - add_cost (speed, mode);
>+int repl_savings;
>+
>+if (tree_fits_shwi_p (first_dep->stride))
>+  {
>+HOST_WIDE_INT hwi_stride = tree_to_shwi (first_dep->stride);
>+repl_savings = mult_by_coeff_cost (hwi_stride, mode, speed);
>+  }
>+else
>+  repl_savings = mul_cost (speed, mode);
>+repl_savings -= add_cost (speed, mode);
>+
> if (speed)
>   cost = lowest_cost_path (cost, repl_savings, first_dep,
>incr_vec[i].incr, COUNT_PHIS);



Re: [PATCH][Middle-end]79538 missing -Wformat-overflow with %s and non-member array arguments

2017-12-14 Thread Jeff Law
On 12/14/2017 12:22 PM, Qing Zhao wrote:
> 
>> On Dec 14, 2017, at 2:05 AM, Richard Biener > > wrote:
>>
>> On Wed, 13 Dec 2017, Qing Zhao wrote:
>>
>>> Hi,
>>>
>>> I updated gimple-fold.c as you suggested, bootstrapped and re-tested
>>> on both x86 and aarch64. no any issue.
>>>
>>> 
>>> diff --git a/gcc/gimple-fold.c b/gcc/gimple-fold.c
>>> index 353a46e..eb6a87a 100644
>>> --- a/gcc/gimple-fold.c
>>> +++ b/gcc/gimple-fold.c
>>> @@ -1323,6 +1323,19 @@ get_range_strlen (tree arg, tree length[2],
>>> bitmap *visited, int type,
>>>  the array could have zero length.  */
>>>   *minlen = ssize_int (0);
>>> }
>>> +
>>> +  if (VAR_P (arg) 
>>> +  && TREE_CODE (TREE_TYPE (arg)) == ARRAY_TYPE)
>>> +{
>>> +  val = TYPE_SIZE_UNIT (TREE_TYPE (arg));
>>> +  if (!val || TREE_CODE (val) != INTEGER_CST || integer_zerop (val))
>>> +return false;
>>> +  val = wide_int_to_tree (TREE_TYPE (val), 
>>> +  wi::sub(wi::to_wide (val), 1));
>>> +  /* Set the minimum size to zero since the string in
>>> + the array could have zero length.  */
>>> +  *minlen = ssize_int (0);
>>> +}
>>> }
>>> 
>>>
>>> I plan to commit the change very soon. 
>>> let me know if you have further comment.
>>
>> Looks good to me.
> 
> thanks a lot for your review.
> 
> committed the patch as revision 255654
> https://gcc.gnu.org/viewcvs/gcc?view=revision&revision=255654
> 
> PR 79538 was filed against GCC7.0, So, I assume that this patch need to
> be backported to GCC7 branch.
> I will do the backporting to GCC7 later this week if there is no objection. 
We don't try to backport all fixes to the release branches -- we tend to
focus more on regressions that apply to those releases and codegen
correctness issues.

I'd think a missed warning isn't that important to backport.

Jeff


[PATCH][Middle-end]2nd patch of PR78809 and PR83026

2017-12-14 Thread Qing Zhao
Hi,

I am not sure whether it’s proper to send this patch during this late stage.
however, since the patch itself is quite straightforward, I decided to send it 
now. 

=

2nd Patch for PR78009
Patch for PR83026

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=78809 

Inline strcmp with small constant strings

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=83026 

missing strlen optimization for strcmp of unequal strings

The design doc for PR78809 is at:
https://www.mail-archive.com/gcc@gcc.gnu.org/msg83822.html 


this patch is for the second part of change of PR78809 and PR83026:

B. for strncmp (s1, s2, n) (!)= 0 or strcmp (s1, s2) (!)= 0

  B.1. (PR83026) When both arguments are constant strings and it's a strcmp:
 * if the length of the strings are NOT equal, we can safely fold the call
   to a non-zero value.
 * otherwise, do nothing now.

  B.2. (PR78809) When one of the arguments is a constant string, try to replace
  the call with a __builtin_memcmp_eq call where possible, i.e:

  strncmp (s, STR, C) (!)= 0 in which, s is a pointer to a string, STR is a
  constant string, C is a constant.
if (C <= strlen(STR) && sizeof_array(s) > C)
  {
replace this call with
memcmp (s, STR, C) (!)= 0
  }
if (C > strlen(STR)
  {
it can be safely treated as a call to strcmp (s, STR) (!)= 0
can handled by the following strcmp.
  }

  strcmp (s, STR) (!)= 0 in which, s is a pointer to a string, STR is a
  constant string.
if  (sizeof_array(s) > strlen(STR))
  {
replace this call with
memcmp (s, STR, strlen(STR)+1) (!)= 0
  }

  (NOTE, currently, memcmp(s1, s2, N) (!)=0 has been optimized to a simple
   sequence to access all bytes and accumulate the overall result in GCC by
   https://gcc.gnu.org/bugzilla/show_bug.cgi?id=52171 


   as a result, such str(n)cmp call would like to be replaced by simple
   sequences to access all types and accumulate the overall results.

adding test case strcmpopt_2.c into gcc.dg for part B of PR78809
adding test case strcmpopt_3.c into gcc.dg for PR83026

bootstraped and tested on both X86 and Aarch64. no regression.

Okay for trunk?

thanks.

Qing


gcc/ChangeLog:

2017-12-11  Qing Zhao  mailto:qing.z...@oracle.com>>

PR middle-end/78809
PR middle-end/83026
* tree-ssa-strlen.c (compute_string_length): New function.
(handle_builtin_string_cmp): New function to handle calls to
string compare functions.
(strlen_optimize_stmt): Add handling to builtin string compare
calls. 

gcc/testsuite/ChangeLog:

2017-12-11  Qing Zhao  mailto:qing.z...@oracle.com>>

PR middle-end/78809
* gcc.dg/strcmpopt_2.c: New testcase.

PR middle-end/83026
* gcc.dg/strcmpopt_3.c: New testcase.


---
 gcc/testsuite/gcc.dg/strcmpopt_2.c |  67 +
 gcc/testsuite/gcc.dg/strcmpopt_3.c |  27 +
 gcc/tree-ssa-strlen.c  | 197 +
 3 files changed, 291 insertions(+)
 create mode 100644 gcc/testsuite/gcc.dg/strcmpopt_2.c
 create mode 100644 gcc/testsuite/gcc.dg/strcmpopt_3.c

diff --git a/gcc/testsuite/gcc.dg/strcmpopt_2.c 
b/gcc/testsuite/gcc.dg/strcmpopt_2.c
new file mode 100644
index 000..ac8ff39
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/strcmpopt_2.c
@@ -0,0 +1,67 @@
+/* { dg-do run } */
+/* { dg-options "-O2 -fdump-tree-strlen" } */
+
+char s[100] = {'a','b','c','d'};
+typedef struct { int x; char s[8]; } S;
+
+__attribute__ ((noinline)) int 
+f1 (S *s) 
+{ 
+  return __builtin_strcmp (s->s, "abc") != 0; 
+}
+
+__attribute__ ((noinline)) int 
+f2 (void) 
+{ 
+  return __builtin_strcmp (s, "abc") != 0; 
+}
+
+__attribute__ ((noinline)) int 
+f3 (S *s) 
+{ 
+  return __builtin_strcmp ("abc", s->s) != 0; 
+}
+
+__attribute__ ((noinline)) int 
+f4 (void) 
+{ 
+  return __builtin_strcmp ("abc", s) != 0; 
+}
+
+__attribute__ ((noinline)) int 
+f5 (S *s) 
+{ 
+  return __builtin_strncmp (s->s, "abc", 3) != 0; 
+}
+
+__attribute__ ((noinline)) int 
+f6 (void) 
+{ 
+  return __builtin_strncmp (s, "abc", 2) != 0; 
+}
+
+__attribute__ ((noinline)) int 
+f7 (S *s) 
+{ 
+  return __builtin_strncmp ("abc", s->s, 3) != 0; 
+}
+
+__attribute__ ((noinline)) int 
+f8 (void) 
+{ 
+  return __builtin_strncmp ("abc", s, 2) != 0; 
+}
+
+int main (void)
+{
+  S ss = {2, {'a','b','c'}};
+
+  if (f1 (&ss) != 0 || f2 () != 1 || f3 (&ss) != 0 ||
+  f4 () != 1 || f5 (&ss) != 0 || f6 () != 0 ||
+  f7 (&ss) != 0 || f8 () != 0)
+__builtin_abort ();
+
+  return 0;
+}
+
+/* { dg-final { scan-tree-dump-times "memcmp_eq \\(" 8 "strlen" } } */
diff --git a/gcc/testsuite/gcc.dg/strcmpopt_3.c 
b/gcc/testsuite/gcc.dg/strcmpopt_3.c
new file mode

Re: [PATCH][Middle-end]79538 missing -Wformat-overflow with %s and non-member array arguments

2017-12-14 Thread Qing Zhao

> On Dec 14, 2017, at 1:36 PM, Jeff Law  wrote:
> 
> On 12/14/2017 12:22 PM, Qing Zhao wrote:
>> 
>>> On Dec 14, 2017, at 2:05 AM, Richard Biener >> > wrote:
>>> 
>>> On Wed, 13 Dec 2017, Qing Zhao wrote:
>>> 
 Hi,
 
 I updated gimple-fold.c as you suggested, bootstrapped and re-tested
 on both x86 and aarch64. no any issue.
 
 
 diff --git a/gcc/gimple-fold.c b/gcc/gimple-fold.c
 index 353a46e..eb6a87a 100644
 --- a/gcc/gimple-fold.c
 +++ b/gcc/gimple-fold.c
 @@ -1323,6 +1323,19 @@ get_range_strlen (tree arg, tree length[2],
 bitmap *visited, int type,
  the array could have zero length.  */
   *minlen = ssize_int (0);
 }
 +
 +  if (VAR_P (arg) 
 +  && TREE_CODE (TREE_TYPE (arg)) == ARRAY_TYPE)
 +{
 +  val = TYPE_SIZE_UNIT (TREE_TYPE (arg));
 +  if (!val || TREE_CODE (val) != INTEGER_CST || integer_zerop (val))
 +return false;
 +  val = wide_int_to_tree (TREE_TYPE (val), 
 +  wi::sub(wi::to_wide (val), 1));
 +  /* Set the minimum size to zero since the string in
 + the array could have zero length.  */
 +  *minlen = ssize_int (0);
 +}
 }
 
 
 I plan to commit the change very soon. 
 let me know if you have further comment.
>>> 
>>> Looks good to me.
>> 
>> thanks a lot for your review.
>> 
>> committed the patch as revision 255654
>> https://gcc.gnu.org/viewcvs/gcc?view=revision&revision=255654
>> 
>> PR 79538 was filed against GCC7.0, So, I assume that this patch need to
>> be backported to GCC7 branch.
>> I will do the backporting to GCC7 later this week if there is no objection. 
> We don't try to backport all fixes to the release branches -- we tend to
> focus more on regressions that apply to those releases and codegen
> correctness issues.
> 
> I'd think a missed warning isn't that important to backport.

Okay. I see. 

then I will close PR79538 as fixed.

thanks.

Qing
> 
> Jeff



[PATCH] Fix -fcompare-debug due to DEBUG_BEGIN_STMTs (PR debug/83419)

2017-12-14 Thread Jakub Jelinek
Hi!

The following testcase FAILs -fcompare-debug, because one COND_EXPR
branch from the FE during gimplifications is just >
which doesn't have TREE_SIDE_EFFECTS, but for -gstatement-frontiers it
is a STATEMENT_LIST which contains # DEBUG BEGIN_STMT and that >.  Neither # DEBUG BEGIN_STMT nor that NOP_EXPR have
TREE_SIDE_EFFECTS, but STATEMENT_LIST has TREE_SIDE_EFFECTS already from
make_node and the gimplifier (and apparently the C++ FE too) checks
just that bit.  With { { { 0; } { 1; } { 2; } { 3; } } } one can end up
with quite large STATEMENT_LIST subtrees which in reality still don't
have any side-effects.
Maintaining accurate TREE_SIDE_EFFECTS bit on STATEMENT_LIST would be hard,
if we would only add and never remove statements, then we could just clear
it during make_node and set it whenever adding TREE_SIDE_EFFECTS statement
into it, but as soon as we sometimes remove from STATEMENT_LIST, or merge
STATEMENT_LISTs etc., maintaining this is too IMHO expensive, especially
when we usually just will not care about it.
So, I think it is better to just compute this lazily in the few spots where
we are interested about this, in real-world testcases most of the
STATEMENT_LISTs will have side-effects and should find them pretty early
when walking the tree.
As a side-effect, this patch will handle those
{ { { 0; } { 1; } { 2; } { 3; } } } and similar then/else statement lists
better.

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

2017-12-14  Jakub Jelinek  

PR debug/83419
* tree.h (statement_with_side_effects_p): Declare.
* tree.c (statement_with_side_effects_p): New function.
* gimplify.c (shortcut_cond_expr, gimplify_cond_expr): Use it.

* cp-gimplify.c (genericize_if_stmt, gimplify_expr_stmt,
cp_fold): Use statement_with_side_effects_p instead of
just TREE_SIDE_EFFECTS.

* gcc.dg/pr83419.c: New test.

--- gcc/tree.h.jj   2017-12-12 09:48:15.0 +0100
+++ gcc/tree.h  2017-12-14 13:22:34.157858781 +0100
@@ -4780,6 +4780,7 @@ extern tree obj_type_ref_class (const_tr
 extern bool types_same_for_odr (const_tree type1, const_tree type2,
bool strict=false);
 extern bool contains_bitfld_component_ref_p (const_tree);
+extern bool statement_with_side_effects_p (tree);
 extern bool block_may_fallthru (const_tree);
 extern void using_eh_for_cleanups (void);
 extern bool using_eh_for_cleanups_p (void);
--- gcc/tree.c.jj   2017-12-12 09:48:15.0 +0100
+++ gcc/tree.c  2017-12-14 13:21:25.857752480 +0100
@@ -12296,6 +12296,26 @@ contains_bitfld_component_ref_p (const_t
   return false;
 }
 
+/* Return true if STMT has side-effects.  This is like
+   TREE_SIDE_EFFECTS (stmt), except it returns false for NULL and if STMT
+   is a STATEMENT_LIST, it recurses on the statements.  */
+
+bool
+statement_with_side_effects_p (tree stmt)
+{
+  if (stmt == NULL_TREE)
+return false;
+  if (TREE_CODE (stmt) != STATEMENT_LIST)
+return TREE_SIDE_EFFECTS (stmt);
+
+  for (tree_stmt_iterator i = tsi_start (stmt);
+   !tsi_end_p (i); tsi_next (&i))
+if (statement_with_side_effects_p (tsi_stmt (i)))
+  return true;
+
+  return false;
+}
+
 /* Try to determine whether a TRY_CATCH expression can fall through.
This is a subroutine of block_may_fallthru.  */
 
--- gcc/gimplify.c.jj   2017-12-14 11:53:34.907142223 +0100
+++ gcc/gimplify.c  2017-12-14 13:18:19.261184074 +0100
@@ -3637,8 +3637,8 @@ shortcut_cond_expr (tree expr)
   tree *true_label_p;
   tree *false_label_p;
   bool emit_end, emit_false, jump_over_else;
-  bool then_se = then_ && TREE_SIDE_EFFECTS (then_);
-  bool else_se = else_ && TREE_SIDE_EFFECTS (else_);
+  bool then_se = statement_with_side_effects_p (then_);
+  bool else_se = statement_with_side_effects_p (else_);
 
   /* First do simple transformations.  */
   if (!else_se)
@@ -3656,7 +3656,7 @@ shortcut_cond_expr (tree expr)
  if (rexpr_has_location (pred))
SET_EXPR_LOCATION (expr, rexpr_location (pred));
  then_ = shortcut_cond_expr (expr);
- then_se = then_ && TREE_SIDE_EFFECTS (then_);
+ then_se = statement_with_side_effects_p (then_);
  pred = TREE_OPERAND (pred, 0);
  expr = build3 (COND_EXPR, void_type_node, pred, then_, NULL_TREE);
  SET_EXPR_LOCATION (expr, locus);
@@ -3678,7 +3678,7 @@ shortcut_cond_expr (tree expr)
  if (rexpr_has_location (pred))
SET_EXPR_LOCATION (expr, rexpr_location (pred));
  else_ = shortcut_cond_expr (expr);
- else_se = else_ && TREE_SIDE_EFFECTS (else_);
+ else_se = statement_with_side_effects_p (else_);
  pred = TREE_OPERAND (pred, 0);
  expr = build3 (COND_EXPR, void_type_node, pred, NULL_TREE, else_);
  SET_EXPR_LOCATION (expr, locus);
@@ -3982,9 +3982,9 @@ gimplify_cond_expr (tree *expr_p, gimple
  if (gimplify_ctxp->allow_rhs_cond_expr
  /* If either branch has s

[PATCH] Fix (-A) - B -> (-B) - A optimization in fold_binary_loc (PR tree-optimization/83269)

2017-12-14 Thread Jakub Jelinek
Hi!

As the following testcase shows, the (-A) - B -> (-B) - A optimization can't
be done the way it is if the negation of A is performed in type with
wrapping behavior while the subtraction is done in signed type (with the
same precision), as if A is (unsigned) INT_MIN, then (int) -(unsigned) INT_MIN
is INT_MIN and INT_MIN - B is different from (-B) - INT_MIN.
The reason we can see this is because we check that arg0 is NEGATE_EXPR, but
arg0 is STRIP_NOPS from op0.  If the NEGATE_EXPR is already done in signed
type, then it would be already UB if A was INT_MIN and so we can safely do
it.

Whether we perform the subtraction in the unsigned type or just don't
optimize I think doesn't matter that much, at least the only spot during
x86_64-linux and i686-linux bootstraps/regtests this new condition triggered
was the new testcase, nothing else.  So if you instead prefer to punt, I can
tweak the patch, move the negated condition to the if above it.

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

2017-12-14  Jakub Jelinek  

PR tree-optimization/83269
* fold-const.c (fold_binary_loc): Perform (-A) - B -> (-B) - A
subtraction in arg0's type if type is signed and arg0 is unsigned.
Formatting fix.

* gcc.c-torture/execute/pr83269.c: New test.

--- gcc/fold-const.c.jj 2017-12-08 00:50:27.0 +0100
+++ gcc/fold-const.c2017-12-14 17:42:31.221398170 +0100
@@ -9098,8 +9098,8 @@ expr_not_equal_to (tree t, const wide_in
return NULL_TREE.  */
 
 tree
-fold_binary_loc (location_t loc,
-enum tree_code code, tree type, tree op0, tree op1)
+fold_binary_loc (location_t loc, enum tree_code code, tree type,
+tree op0, tree op1)
 {
   enum tree_code_class kind = TREE_CODE_CLASS (code);
   tree arg0, arg1, tem;
@@ -9770,10 +9770,34 @@ fold_binary_loc (location_t loc,
   /* (-A) - B -> (-B) - A  where B is easily negated and we can swap.  */
   if (TREE_CODE (arg0) == NEGATE_EXPR
  && negate_expr_p (op1))
-   return fold_build2_loc (loc, MINUS_EXPR, type,
-   negate_expr (op1),
-   fold_convert_loc (loc, type,
- TREE_OPERAND (arg0, 0)));
+   {
+ /* If arg0 is e.g. unsigned int and type is int, then we need to
+perform the subtraction in arg0's type, because if A is
+INT_MIN at runtime, the original expression can be well defined
+while the latter is not.  See PR83269.  */
+ if (ANY_INTEGRAL_TYPE_P (type)
+ && TYPE_OVERFLOW_UNDEFINED (type)
+ && ANY_INTEGRAL_TYPE_P (TREE_TYPE (arg0))
+ && !TYPE_OVERFLOW_UNDEFINED (TREE_TYPE (arg0)))
+   {
+ /* Don't do this when sanitizing, as by doing the subtraction
+in unsigned type we won't notice if the original program
+has been buggy.  */
+ if (!TYPE_OVERFLOW_SANITIZED (type))
+   {
+ tem = fold_build2_loc (loc, MINUS_EXPR, TREE_TYPE (arg0),
+fold_convert_loc (loc,
+  TREE_TYPE (arg0),
+  negate_expr (op1)),
+TREE_OPERAND (arg0, 0));
+ return fold_convert_loc (loc, type, tem);
+   }
+   }
+ else
+   return fold_build2_loc (loc, MINUS_EXPR, type, negate_expr (op1),
+   fold_convert_loc (loc, type,
+ TREE_OPERAND (arg0, 0)));
+   }
 
   /* Fold __complex__ ( x, 0 ) - __complex__ ( 0, y ) to
 __complex__ ( x, -y ).  This is not the same for SNaNs or if
--- gcc/testsuite/gcc.c-torture/execute/pr83269.c.jj2017-12-14 
17:43:24.534710997 +0100
+++ gcc/testsuite/gcc.c-torture/execute/pr83269.c   2017-12-14 
17:43:10.0 +0100
@@ -0,0 +1,14 @@
+/* PR tree-optimization/83269 */
+
+int
+main ()
+{
+#if __SIZEOF_INT__ == 4 && __SIZEOF_LONG_LONG__ > 4 && __CHAR_BIT__ == 8
+  volatile unsigned char a = 1;
+  long long b = 0x8000L;
+  int c = -((int)(-b) - (-0x7fff * a));
+  if (c != 1)
+__builtin_abort ();
+#endif
+  return 0;
+}

Jakub


[committed] Small formatting fix

2017-12-14 Thread Jakub Jelinek
Hi!

While looking into PR83422 I've noticed this formatting glitch,
bootstrapped/regtested on x86_64-linux and i686-linux, committed to
trunk as obvious.

2017-12-14  Jakub Jelinek  

* var-tracking.c (variable_tracking_main_1): Formatting fix.

--- gcc/var-tracking.c.jj   2017-12-14 12:02:07.0 +0100
+++ gcc/var-tracking.c  2017-12-14 17:58:12.136309252 +0100
@@ -10424,8 +10424,8 @@ variable_tracking_main_1 (void)
   if (!flag_var_tracking)
 return 0;
 
-  if (n_basic_blocks_for_fn (cfun) > 500 &&
-  n_edges_for_fn (cfun) / n_basic_blocks_for_fn (cfun) >= 20)
+  if (n_basic_blocks_for_fn (cfun) > 500
+  && n_edges_for_fn (cfun) / n_basic_blocks_for_fn (cfun) >= 20)
 {
   vt_debug_insns_local (true);
   return 0;

Jakub


Re: [PATCH] make canonicalize_condition keep its promise

2017-12-14 Thread Jeff Law
On 11/21/2017 10:45 AM, Aaron Sawdey wrote:
> On Tue, 2017-11-21 at 10:06 -0700, Jeff Law wrote:
>> On 11/20/2017 06:41 AM, Aaron Sawdey wrote:
>>> On Sun, 2017-11-19 at 16:44 -0700, Jeff Law wrote:
 On 11/15/2017 08:40 AM, Aaron Sawdey wrote:
> So, the story of this very small patch starts with me adding
> patterns
> for ppc instructions bdz[tf] and bdnz[tf] such as this:
>
>   [(set (pc)
>   (if_then_else
>     (and
>    (ne (match_operand:P 1 "register_operand"
> "c,*b,*b,*b")
>    (const_int 1))
>    (match_operator 3 "branch_comparison_operator"
>     [(match_operand 4 "cc_reg_operand"
> "y,y,y,y")
>      (const_int 0)]))
>     (label_ref (match_operand 0))
>     (pc)))
>    (set (match_operand:P 2 "nonimmediate_operand"
> "=1,*r,m,*d*wi*c*l")
>   (plus:P (match_dup 1)
>   (const_int -1)))
>    (clobber (match_scratch:P 5 "=X,X,&r,r"))
>    (clobber (match_scratch:CC 6 "=X,&y,&y,&y"))
>    (clobber (match_scratch:CCEQ 7 "=X,&y,&y,&y"))]
>
> However when this gets to the loop_doloop pass, we get an
> assert
> fail
> in iv_number_of_iterations():
>
>   gcc_assert (COMPARISON_P (condition));
>
> This is happening because this branch insn tests two things
> ANDed
> together so the and is at the top of the expression, not a
> comparison.

 Is this something you've created for an existing
 loop?  Presumably an
 existing loop that previously was a simple loop?
>>>
>>> The rtl to use this instruction is generated by new code I'm
>>> working on
>>> to do a builtin expansion of memcmp using a loop. I call
>>> gen_bdnztf_di() to generate rtl for the insn. It would be nice to
>>> be
>>> able to generate this instruction from doloop conversion but that
>>> is
>>> beyond the scope of what I'm working on presently.
>>
>> Understood.
>>
>> So what I think (and I'm hoping you can confirm one way or the other)
>> is
>> that by generating this instruction you're turing a loop which
>> previously was considered a simple loop by the IV code and turning it
>> into something the IV bits no longer think is a simple loop.
>>
>> I think that's problematical as when the loop is thought to be a
>> simple
>> loop, it has to have a small number of forms for its loop back/exit
>> loop
>> tests and whether or not a loop is a simple loop is cached in the
>> loop
>> structure.
>>
>> I think we need to dig into that first.  If my suspicion is correct
>> then
>> this patch is really just papering over that deeper problem.  So I
>> think
>> you need to dig a big deeper into why you're getting into the code in
>> question (canonicalize_condition) and whether or not the call chain
>> makes any sense given the changes you've made to the loop.
>>
> 
> Jeff,
>   There is no existing loop structure. This starts with a memcmp() call
> and then goes down through the builtin expansion mechanism, which is
> ultimately expanding the pattern cmpmemsi which is where my code is
> generating a loop that finishes with bdnzt. The code that's ultimately
> generated looks like this:
Understood.  But what I still struggle with is how you're getting into
check_simple_exit to begin with and whether or not that should be happening.


The only way to get into check_simple_exit is via find_simple_exit which
is only called from get_simple_loop_desc.

And if you're calling get_simple_loop_desc, then there is some kind of
loop structure in place AFAICT that contains this insn which is rather
surprising.

> 
> I really think the ultimate problem here is that both
> canonicalize_condition and get_condition promise in their documenting
> comments that they will return something that has a cond at the root of
> the rtx, or 0 if they don't understand what they're given. In this case
> they do not understand the rtx of bdnzt and are returning rtx rooted
> with an and, not a cond. This may seem like papering over the problem,
> but I think it is legitimate for these functions to return 0 when the
> branch insn in question does not have a simple cond at the heart of it.
> And bootstrap/regtest did pass with my patch on ppc64le and x86_64.
> Ultimately, yes something better ought to be done here.



Your pattern has the form:

  [(set (pc)
(if_then_else
  (and
 (ne (match_operand:P 1 "register_operand" "c,*b,*b,*b")
 (const_int 1))
 (match_operator 3 "branch_comparison_operator"
  [(match_operand 4 "cc_reg_operand" "y,y,y,y")
   (const_int 0)]))
  (label_ref (match_operand 0))
  (pc)))
   (set (match_operand:P 2 "nonimmediate_operand" "=1,*r,m,*d*wi*c*l")
(plus:P (match_dup 1)
(const_int -1)))
   (clobber (match_scratch:P 5 "=X,X,&r,r"))
   (clobber (match_scratch:CC 6 "=X,&y,&y,&y"))
   (

Re: [wwwdocs] mention AVR additions

2017-12-14 Thread Gerald Pfeifer

Hi Johann,

On Wed, 13 Dec 2017, Georg-Johann Lay wrote:

This adds AVR improvements to v8 Release Notes.


that's quite impressive a set of improvements!

Index: changes.html
===
The new devices are filed under 
https://gcc.gnu.org/onlinedocs/gcc/AVR-Options.html";>-mmcu=avrxmega3.


I'm not sure whether "filed under" is the best choice of wording (and
when I say that, I really am not).  So perhaps use "listed" under or
similar?


features like PROGMEM and __flash
are no more needed (as opposed to other AVR families for which


"are not needed any more"


that the compiler is used with Binutils 2.29 or newer so that
read-only data will be located in flash memory, see feature
https://sourceware.org/PR21472";>PR21472.


How about using "read-only data will be located in flash memory" the
text for that link, and omit the "see feature PR..." part?  That'll
make it easier to read.


  A new command line option -mshort-calls is supported.


"command-line"


This option is used internally for multilib selection of the
avrxmega3 variants.
It is not an optimization option, and you don't need to set
it by hand.


"do not" to emphasize this more strongly.


The compiler now implements feature http://gcc.gnu.org/PR20296";>PR20296
and will generate more efficient interrupt service routine (ISR)
prologues and epilogues.


Here I suggest "The compiler now generates..." (and perhaps make this a
link to the PR, but use https here).


  A new command line option -mno-gas-isr-prologues


"command-line"


has been added.  It disables the generation of the


How about "A new command line option -mno-gas-isr-prologues
disable the generation..."?  Shorter and easier to read.


Any non-naked ISR will save and restore SREG, tmp_reg and zero_reg,


... around SREG, tmp_reg and zero_reg.


  The feature is turned on per default for all optimization levels
except for -O0 and -Og. It can still be
  enabled by means of option -mgas-isr-prologues.


"It is explicitly enabled by..."


  Support has been added for a new
https://gcc.gnu.org/onlinedocs/gcc/AVR-Function-Attributes.html";>AVR 
function attribute
no_gccisr.


Is there something missing after "a new"?  "A new" what?


Please go ahead and commit (taking the above into account) and just
share the final patch here in response.

Thank you for taking the time to document all this!

Gerald


[PATCH] PR libstdc++/68519 use native duration to avoid rounding errors

2017-12-14 Thread Jonathan Wakely

The result of system_clock::now() + duration(1) is a
duration with the same value as now(), due to the limited
precision of float.

This patch converts the duration to system_clock::duration before
doing the sum, so that we don't lose precision.

PR libstdc++/68519
* include/std/condition_variable (condition_variable::wait_for):
Convert duration to native clock's duration before addition.
* testsuite/30_threads/condition_variable/members/68519.cc: New test.

Tested powerpc64le-linux, committed to trunk.

commit bf3d2e0ce692a43f9370a3c98fd7dec62334ffe7
Author: Jonathan Wakely 
Date:   Thu Dec 14 20:18:30 2017 +

PR libstdc++/68519 use native duration to avoid rounding errors

PR libstdc++/68519
* include/std/condition_variable (condition_variable::wait_for):
Convert duration to native clock's duration before addition.
* testsuite/30_threads/condition_variable/members/68519.cc: New 
test.

diff --git a/libstdc++-v3/include/std/condition_variable 
b/libstdc++-v3/include/std/condition_variable
index 1d8f057ceb6..6d20d365531 100644
--- a/libstdc++-v3/include/std/condition_variable
+++ b/libstdc++-v3/include/std/condition_variable
@@ -135,14 +135,26 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
   cv_status
   wait_for(unique_lock& __lock,
   const chrono::duration<_Rep, _Period>& __rtime)
-  { return wait_until(__lock, __clock_t::now() + __rtime); }
+  {
+   using __dur = typename __clock_t::duration;
+   auto __reltime = chrono::duration_cast<__dur>(__rtime);
+   if (__reltime < __rtime)
+ ++__reltime;
+   return wait_until(__lock, __clock_t::now() + __reltime);
+  }
 
 template
   bool
   wait_for(unique_lock& __lock,
   const chrono::duration<_Rep, _Period>& __rtime,
   _Predicate __p)
-  { return wait_until(__lock, __clock_t::now() + __rtime, std::move(__p)); 
}
+  {
+   using __dur = typename __clock_t::duration;
+   auto __reltime = chrono::duration_cast<__dur>(__rtime);
+   if (__reltime < __rtime)
+ ++__reltime;
+   return wait_until(__lock, __clock_t::now() + __reltime, std::move(__p));
+  }
 
 native_handle_type
 native_handle()
diff --git 
a/libstdc++-v3/testsuite/30_threads/condition_variable/members/68519.cc 
b/libstdc++-v3/testsuite/30_threads/condition_variable/members/68519.cc
new file mode 100644
index 000..71c1d29e231
--- /dev/null
+++ b/libstdc++-v3/testsuite/30_threads/condition_variable/members/68519.cc
@@ -0,0 +1,51 @@
+// Copyright (C) 2017 Free Software Foundation, Inc.
+//
+// This file is part of the GNU ISO C++ Library.  This library is free
+// software; you can redistribute it and/or modify it under the
+// terms of the GNU General Public License as published by the
+// Free Software Foundation; either version 3, or (at your option)
+// any later version.
+
+// This library is distributed in the hope that it will be useful,
+// but WITHOUT ANY WARRANTY; without even the implied warranty of
+// MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+// GNU General Public License for more details.
+
+// You should have received a copy of the GNU General Public License along
+// with this library; see the file COPYING3.  If not see
+// .
+
+// { dg-do run }
+// { dg-options "-pthread"  }
+// { dg-require-effective-target c++11 }
+// { dg-require-effective-target pthread }
+// { dg-require-cstdint "" }
+// { dg-require-gthreads "" }
+
+#include 
+#include 
+
+// PR libstdc++/68519
+
+bool val = false;
+std::mutex mx;
+std::condition_variable cv;
+
+void
+test01()
+{
+  for (int i = 0; i < 3; ++i)
+  {
+std::unique_lock l(mx);
+auto start = std::chrono::system_clock::now();
+cv.wait_for(l, std::chrono::duration(1), [] { return val; });
+auto t = std::chrono::system_clock::now();
+VERIFY( (t - start) >= std::chrono::seconds(1) );
+  }
+}
+
+int
+main()
+{
+  test01();
+}


Re: [PATCH][Middle-end]2nd patch of PR78809 and PR83026

2017-12-14 Thread Jakub Jelinek
On Thu, Dec 14, 2017 at 01:45:21PM -0600, Qing Zhao wrote:
> 2017-12-11  Qing Zhao  mailto:qing.z...@oracle.com>>

No " " in ChangeLog entries please.

> --- a/gcc/tree-ssa-strlen.c
> +++ b/gcc/tree-ssa-strlen.c
> @@ -2541,6 +2541,198 @@ handle_builtin_memcmp (gimple_stmt_iterator *gsi)
>return false;
>  }
>  
> +/* Given an index to the strinfo vector, compute the string length for the
> +   corresponding string. Return -1 when unknown.  */
> + 
> +static HOST_WIDE_INT 
> +compute_string_length (int idx)
> +{
> +  HOST_WIDE_INT string_leni = -1; 
> +  gcc_assert (idx != 0);
> +
> +  if (idx < 0)
> +string_leni = ~idx;
> +  else
> +{
> +  strinfo *si = get_strinfo (idx);
> +  if (si)
> + {
> +   tree const_string_len = get_string_length (si);
> +   string_leni
> + = (const_string_len && tree_fits_uhwi_p (const_string_len)
> +? tree_to_uhwi(const_string_len) : -1); 

So, you are returning a signed HWI, then clearly tree_fits_uhwi_p and
tree_to_uhwi are inappropriate, you should have used tree_fits_shwi_p
and tree_to_shwi.  Space after function name is missing too.
And, as you start by initializing string_leni to -1, there is no
point to write it this way rather than
  if (const_string_len && tree_fits_shwi_p (const_string_len))
string_leni = tree_to_shwi (const_string_len);

> + }
> +}

Maybe also do
  if (string_leni < 0)
return -1;

> +  return string_leni;

unless the callers just look for negative value as unusable.

> +  tree len = gimple_call_arg (stmt, 2);
> +  if (tree_fits_uhwi_p (len))
> +length = tree_to_uhwi (len);

Similarly to above, you are mixing signed and unsigned HWIs too much.

> +  if (gimple_code (ustmt) == GIMPLE_ASSIGN)

  if (is_gimple_assign (ustmt))

Usually we use use_stmt instead of ustmt.

> + {
> +   gassign *asgn = as_a  (ustmt);

No need for the gassign and ugly as_a, gimple_assign_rhs_code
as well as gimple_assign_rhs2 can be called on gimple * too.

> +   tree_code code = gimple_assign_rhs_code (asgn);
> +   if ((code != EQ_EXPR && code != NE_EXPR)
> +   || !integer_zerop (gimple_assign_rhs2 (asgn)))
> + return true;
> + }
> +  else if (gimple_code (ustmt) == GIMPLE_COND)
> + {
> +   tree_code code = gimple_cond_code (ustmt);
> +   if ((code != EQ_EXPR && code != NE_EXPR)
> +   || !integer_zerop (gimple_cond_rhs (ustmt)))
> + return true;

There is another case you are missing, assign stmt with
gimple_assign_rhs_code COND_EXPR, where gimple_assign_rhs1 is
tree with TREE_CODE EQ_EXPR or NE_EXPR with TREE_OPERAND (rhs1, 1)
integer_zerop.

> +  /* When both arguments are known, and their strlens are unequal, we can 
> + safely fold the call to a non-zero value for strcmp;
> + othewise, do nothing now.  */
> +  if (idx1 != 0 && idx2 != 0)
> +{
> +  HOST_WIDE_INT const_string_leni1 = -1;
> +  HOST_WIDE_INT const_string_leni2 = -1;
> +  const_string_leni1 = compute_string_length (idx1);
> +  const_string_leni2 = compute_string_length (idx2);

Why do you initialize the vars when you immediately overwrite it?
Just do
  HOST_WIDE_INT const_string_leni1 = compute_string_length (idx1);
etc.

> +  /* When one of args is constant string.  */
> +  tree var_string;
> +  HOST_WIDE_INT const_string_leni = -1;
> +  
> +  if (idx1)
> +{
> +  const_string_leni = compute_string_length (idx1);
> +  var_string = arg2;
> +} 
> +  else if (idx2)
> +{
> +  const_string_leni = compute_string_length (idx2);
> +  var_string = arg1;
> +} 

Haven't you checked earlier that one of idx1 and idx2 is non-zero?
If so, then the else if (idx2) will just might confuse -Wuninitialized,
if you just use else, you don't need to initialize const_string_leni
either.

> +  /* Try to get the min and max string length for var_string, the max length 
> is
> + the size of the array - 1, recorded in size[1].  */ 
> +  get_range_strlen (var_string, size);
> +  if (size[1] && tree_fits_uhwi_p (size[1]))
> +var_sizei = tree_to_uhwi (size[1]) + 1;

This is something that looks problematic to me.  get_range_strlen returns
some conservative upper bound on the string length, which is fine if
var_string points to say a TREE_STATIC variable where you know the allocated
size, or automatic variable.  But if somebody passes you a pointer to a
structure and the source doesn't contain aggregate copying for it, not sure
if you can take for granted that all the bytes are readable after the '\0'
in the string.  Hopefully at least for flexible array members and arrays in
such positions get_range_strlen will not provide the upper bound, but even
in other cases it doesn't feel safe to me.

Furthermore, in the comments you say that you do it only for small strings,
but in the patch I can't see any upper bound, so you could transform strlen
that would happen to return 

Re: [PATCH][GCC][ARM] Fix fragile arm fpu attribute tests.

2017-12-14 Thread Christophe Lyon
On 14 December 2017 at 11:56, Tamar Christina  wrote:
> The 12/13/2017 08:49, Christophe Lyon wrote:
>> On 12 December 2017 at 18:29, Tamar Christina  
>> wrote:
>> > Hi All,
>> >
>> > The previous test made use of arm_neon.h which made the whole test
>> > rather fragile and only applicable to some of the arm targets.
>> >
>> > So instead I make use of different fpus now to test the generation of
>> > fmla instructions. The actual instruction itself is not tested as all
>> > we care about if that the proper .fpu directives are generated.
>> >
>> > Regtested on arm-none-eabi and arm-none-linux-gnueabihf
>> > with no regressions.
>> >
>> > Ok for trunk?
>> >
>> >
>> > gcc/testsuite/
>> > 2017-12-12  Tamar Christina  
>> >
>> > PR target/82641
>> > * gcc.target/arm/pragma_fpu_attribute.c: New.
>> > * gcc.target/arm/pragma_fpu_attribute_2.c: New.
>
> Hi Christophe,
>
> My apologies, I have rebased the patch.
> New Changelog:
>
> gcc/testsuite/
> 2017-12-14  Tamar Christina  
>
> PR target/82641
> * gcc.target/arm/pragma_fpu_attribute.c: Rewrite to use
> no NEON.
> * gcc.target/arm/pragma_fpu_attribute_2.c: Likewise.
>

Hi,

Sorry I think there is still something wrong with this patch.
In pragma_fpu_attribute_2.c, you are not removing
#include 
as the ChangeLog seems to imply?

So, with this patch, there are problems on arm-none-linux-gnueabi and
arm-none-eabi:
FAIL:gcc.target/arm/pragma_fpu_attribute.c scan-assembler-times
\\.fpu\\s+vfpv3-d16 1 (found 0 times)
FAIL:gcc.target/arm/pragma_fpu_attribute.c scan-assembler-times
\\.fpu\\s+vfpv4 1 (found 0 times)

and pragma_fpu_attribute_2.c still fails to compile:
In file included from /gcc/testsuite/gcc.target/arm/pragma_fpu_attribute_2.c:6:
/aci-gcc-fsf/builds/gcc-fsf-gccsrc/obj-arm-none-eabi/gcc3/gcc/include/arm_neon.h:31:2:
error: #error "NEON intrinsics not available with the soft-float ABI.
Please use -mfloat-abi=softfp or -mfloat-abi=hard"

I'm not sure why you don't see this when testing on arm-none-eabi?

If you want to see more details:
http://people.linaro.org/~christophe.lyon/cross-validation/gcc-test-patches/255624-rb8655.patch-2/report-build-info.html
(ignore the lines with "interrupted", this means there was a problem
on the host during the build)

Christophe


>> >
>> Sorry, it seems your patch does not apply against ToT, and
>> the ChangeLog looks incorrect (these are not new files)
>>
>> Christophe
>
> Thanks,
> Tamar
>
> --


[PATCH] PR libstdc++/83279 handle sendfile not copying entire file

2017-12-14 Thread Jonathan Wakely

I failed to notice that the man page for sendfile(3) says it won't
copy more than 2GiB. This refactors the code to first try sendfile
(because it's fast if it works) and if that fails, or stops before the
end of the file, then use filebufs to copy what's left over.

I'm not adding a test because creating two files larger than 2G isn't
a good idea, but I've tested it locally.

PR libstdc++/83279
* src/filesystem/std-ops.cc (do_copy_file): Handle sendfile not
copying entire file.

Tested x86_64-linux, committed to trunk. Backports to follow.

commit ae3ff01506b1bb3232be1063b3fc7df1c80994cb
Author: Jonathan Wakely 
Date:   Thu Dec 14 21:29:29 2017 +

PR libstdc++/83279 handle sendfile not copying entire file

PR libstdc++/83279
* src/filesystem/std-ops.cc (do_copy_file): Handle sendfile not
copying entire file.

diff --git a/libstdc++-v3/src/filesystem/std-ops.cc 
b/libstdc++-v3/src/filesystem/std-ops.cc
index fa5e19a36ba..a15857c31bf 100644
--- a/libstdc++-v3/src/filesystem/std-ops.cc
+++ b/libstdc++-v3/src/filesystem/std-ops.cc
@@ -382,48 +382,71 @@ fs::do_copy_file(const char* from, const char* to,
   return false;
 }
 
+  ssize_t n = 0;
+  size_t count = from_st->st_size;
 #ifdef _GLIBCXX_USE_SENDFILE
-  off_t offset = 0;
-  const auto n = ::sendfile(out.fd, in.fd, &offset, from_st->st_size);
-  if (n < 0 && (errno == ENOSYS || errno == EINVAL))
+  n = ::sendfile(out.fd, in.fd, nullptr, count);
+  if (n < 0 && errno != ENOSYS && errno != EINVAL)
 {
-#endif // _GLIBCXX_USE_SENDFILE
-  __gnu_cxx::stdio_filebuf sbin(in.fd, std::ios::in);
-  __gnu_cxx::stdio_filebuf sbout(out.fd, std::ios::out);
-  if (sbin.is_open())
-   in.fd = -1;
-  if (sbout.is_open())
-   out.fd = -1;
-  if (from_st->st_size && !(std::ostream(&sbout) << &sbin))
-   {
- ec = std::make_error_code(std::errc::io_error);
- return false;
-   }
-  if (!sbout.close() || !sbin.close())
+  ec.assign(errno, std::generic_category());
+  return false;
+}
+  if ((size_t)n == count)
+{
+  if (!out.close() || !in.close())
{
  ec.assign(errno, std::generic_category());
  return false;
}
-
   ec.clear();
   return true;
-
-#ifdef _GLIBCXX_USE_SENDFILE
 }
-  if (n != from_st->st_size)
+  else if (n > 0)
+count -= n;
+#endif // _GLIBCXX_USE_SENDFILE
+
+  __gnu_cxx::stdio_filebuf sbin(in.fd, std::ios::in);
+  __gnu_cxx::stdio_filebuf sbout(out.fd, std::ios::out);
+
+  if (sbin.is_open())
+in.fd = -1;
+  if (sbout.is_open())
+out.fd = -1;
+
+  const std::streampos errpos(std::streamoff(-1));
+
+  if (n < 0)
+{
+  auto p1 = sbin.pubseekoff(0, std::ios_base::beg, std::ios_base::in);
+  auto p2 = sbout.pubseekoff(0, std::ios_base::beg, std::ios_base::out);
+  if (p1 == errpos || p2 == errpos)
+   {
+ ec = std::make_error_code(std::errc::io_error);
+ return false;
+   }
+}
+  else if (n > 0)
+{
+  auto p = sbout.pubseekoff(n, std::ios_base::beg, std::ios_base::out);
+  if (p == errpos)
+   {
+ ec = std::make_error_code(std::errc::io_error);
+ return false;
+   }
+}
+
+  if (count && !(std::ostream(&sbout) << &sbin))
+{
+  ec = std::make_error_code(std::errc::io_error);
+  return false;
+}
+  if (!sbout.close() || !sbin.close())
 {
   ec.assign(errno, std::generic_category());
   return false;
 }
-  if (!out.close() || !in.close())
-{
-  ec.assign(errno, std::generic_category());
-  return false;
-}
-
   ec.clear();
   return true;
-#endif // _GLIBCXX_USE_SENDFILE
 }
 #endif // NEED_DO_COPY_FILE
 #endif // _GLIBCXX_HAVE_SYS_STAT_H


Re: [PATCH] make canonicalize_condition keep its promise

2017-12-14 Thread Aaron Sawdey
On Thu, 2017-12-14 at 13:43 -0700, Jeff Law wrote:
> On 11/21/2017 10:45 AM, Aaron Sawdey wrote:
> > On Tue, 2017-11-21 at 10:06 -0700, Jeff Law wrote:
> > > On 11/20/2017 06:41 AM, Aaron Sawdey wrote:
> > > > On Sun, 2017-11-19 at 16:44 -0700, Jeff Law wrote:
> > > > > On 11/15/2017 08:40 AM, Aaron Sawdey wrote:
> > > > > > So, the story of this very small patch starts with me
> > > > > > adding
> > > > > > patterns
> > > > > > for ppc instructions bdz[tf] and bdnz[tf] such as this:
> > > > > > 
> > > > > >   [(set (pc)
> > > > > > (if_then_else
> > > > > >   (and
> > > > > >  (ne (match_operand:P 1 "register_operand"
> > > > > > "c,*b,*b,*b")
> > > > > >  (const_int 1))
> > > > > >  (match_operator 3 "branch_comparison_operator"
> > > > > >   [(match_operand 4 "cc_reg_operand"
> > > > > > "y,y,y,y")
> > > > > >    (const_int 0)]))
> > > > > >   (label_ref (match_operand 0))
> > > > > >   (pc)))
> > > > > >    (set (match_operand:P 2 "nonimmediate_operand"
> > > > > > "=1,*r,m,*d*wi*c*l")
> > > > > > (plus:P (match_dup 1)
> > > > > > (const_int -1)))
> > > > > >    (clobber (match_scratch:P 5 "=X,X,&r,r"))
> > > > > >    (clobber (match_scratch:CC 6 "=X,&y,&y,&y"))
> > > > > >    (clobber (match_scratch:CCEQ 7 "=X,&y,&y,&y"))]
> > > > > > 
> > > > > > However when this gets to the loop_doloop pass, we get an
> > > > > > assert
> > > > > > fail
> > > > > > in iv_number_of_iterations():
> > > > > > 
> > > > > >   gcc_assert (COMPARISON_P (condition));
> > > > > > 
> > > > > > This is happening because this branch insn tests two things
> > > > > > ANDed
> > > > > > together so the and is at the top of the expression, not a
> > > > > > comparison.
> > > > > 
> > > > > Is this something you've created for an existing
> > > > > loop?  Presumably an
> > > > > existing loop that previously was a simple loop?
> > > > 
> > > > The rtl to use this instruction is generated by new code I'm
> > > > working on
> > > > to do a builtin expansion of memcmp using a loop. I call
> > > > gen_bdnztf_di() to generate rtl for the insn. It would be nice
> > > > to
> > > > be
> > > > able to generate this instruction from doloop conversion but
> > > > that
> > > > is
> > > > beyond the scope of what I'm working on presently.
> > > 
> > > Understood.
> > > 
> > > So what I think (and I'm hoping you can confirm one way or the
> > > other)
> > > is
> > > that by generating this instruction you're turing a loop which
> > > previously was considered a simple loop by the IV code and
> > > turning it
> > > into something the IV bits no longer think is a simple loop.
> > > 
> > > I think that's problematical as when the loop is thought to be a
> > > simple
> > > loop, it has to have a small number of forms for its loop
> > > back/exit
> > > loop
> > > tests and whether or not a loop is a simple loop is cached in the
> > > loop
> > > structure.
> > > 
> > > I think we need to dig into that first.  If my suspicion is
> > > correct
> > > then
> > > this patch is really just papering over that deeper problem.  So
> > > I
> > > think
> > > you need to dig a big deeper into why you're getting into the
> > > code in
> > > question (canonicalize_condition) and whether or not the call
> > > chain
> > > makes any sense given the changes you've made to the loop.
> > > 
> > 
> > Jeff,
> >   There is no existing loop structure. This starts with a memcmp()
> > call
> > and then goes down through the builtin expansion mechanism, which
> > is
> > ultimately expanding the pattern cmpmemsi which is where my code is
> > generating a loop that finishes with bdnzt. The code that's
> > ultimately
> > generated looks like this:
> 
> Understood.  But what I still struggle with is how you're getting
> into
> check_simple_exit to begin with and whether or not that should be
> happening.
> 
> 
> The only way to get into check_simple_exit is via find_simple_exit
> which
> is only called from get_simple_loop_desc.
> 
> And if you're calling get_simple_loop_desc, then there is some kind
> of
> loop structure in place AFAICT that contains this insn which is
> rather
> surprising.
> 
> > 
> > I really think the ultimate problem here is that both
> > canonicalize_condition and get_condition promise in their
> > documenting
> > comments that they will return something that has a cond at the
> > root of
> > the rtx, or 0 if they don't understand what they're given. In this
> > case
> > they do not understand the rtx of bdnzt and are returning rtx
> > rooted
> > with an and, not a cond. This may seem like papering over the
> > problem,
> > but I think it is legitimate for these functions to return 0 when
> > the
> > branch insn in question does not have a simple cond at the heart of
> > it.
> > And bootstrap/regtest did pass with my patch on ppc64le and x86_64.
> > Ultimately, yes something better ought to be done here.
> 
> 
> 
> Your pattern has the form:
> 

Re: [PATCH] have -Wnonnull print inlining stack (PR 83369)

2017-12-14 Thread Martin Sebor

On 12/14/2017 12:24 PM, Jeff Law wrote:

On 12/11/2017 03:18 PM, Martin Sebor wrote:

On 12/11/2017 02:08 PM, David Malcolm wrote:

On Mon, 2017-12-11 at 09:51 -0700, Martin Sebor wrote:

Bug 83369 - Missing diagnostics during inlining, notes that when
-Wnonnull is issued for an inlined call to a built-in function,
GCC doesn't print the inlining stack, making it hard to debug
where the problem comes from.

When the -Wnonnull warning was introduced into the middle-end
the diagnostic machinery provided no way to print the inlining
stack (analogous to %K for trees).  Since then GCC has gained
support for the %G directive which does just that.  The attached
patch makes use of the directive to print the inlining context
for -Wnonnull.

The patch doesn't include a test because the DejaGnu framework
provides no mechanism to validate this part of GCC output (see
also bug 83336).

Tested on x86_64-linux with no regressions.

Martin


I'm wondering if we should eliminate %K and %G altogether, and make
tree-diagnostic.c and friends automatically print the inlining stack
-they just need a location_t (the issue is with system headers, I
suppose, but maybe we can just make that smarter: perhaps only suppress
if every location in the chain is in a system header?).  I wonder if
that would be GCC 9 material at this point though?


Getting rid of %G and %K sounds fine to me.  I can't think of
a use case for suppressing middle end diagnostics in system
headers so unless someone else can it might be a non-issue.
Since the change would fix a known bug it seems to me that it
should be acceptable even at this stage.

My recollection is we suppress warnings from bits in system headers
because neither we nor the end users necessarily have control over the
contents of the system headers -- in which case we'd be issuing warnings
for something that only the vendor can fix.


I believe default suppression of warnings from system headers probably
needs to stay.


I agree. I wasn't suggesting to get rid of -Wsystem altogether.
Rather, I meant that unlike front-end warnings, I don't know
and can't think of middle-end warnings (and only those) that
should be suppressed for system header code.   They indicate
bugs in the emitted and (in most cases) reachable object code.
There should be no such code in system headers, and the middle
end warnings I've worked on go out of their way to make sure
they are emitted regardless.  (It's easy to forget this part
and end up with Glibc macros like strncpy suppressing warnings
for serious bugs in user code.)

> Is there any kind of consensus on what we want to do here -- do we want
> to try to tackle %G/%K removal for gcc-8 or defer it?  If we defer it,
> how do we want to handle printing the inline stack in the interm?

I would support it but I don't expect to have the cycles to
do the work myself at this stage.  I view it as a nice design
improvement but give its minimal impact on functionality (IIUC)
it's lower priority than the bugs I'd like to fix.  I don't
have the impression that improving the detail included in
the inlining stack is predicated on this API change, but I'll
let David speak to that.

Martin

>
> jeff




  1   2   >