[testsuite, PR81662, committed] Skip fpatchable-function-entry tests for nvptx
Hi,

fpatchable-function-entry requires nop, which nvptx does not have.  I've
disabled the corresponding test for nvptx.

I suppose I could add an effective target has_nop, but I'm not sure if
it's worth the trouble.

Committed.

Thanks,
- Tom

Skip fpatchable-function-entry tests for nvptx

2017-08-02  Tom de Vries

	PR target/81662
	* c-c++-common/patchable_function_entry-decl.c: Skip for nvptx.
	* c-c++-common/patchable_function_entry-default.c: Same.
	* c-c++-common/patchable_function_entry-definition.c: Same.
---
 gcc/testsuite/c-c++-common/patchable_function_entry-decl.c       | 2 +-
 gcc/testsuite/c-c++-common/patchable_function_entry-default.c    | 2 +-
 gcc/testsuite/c-c++-common/patchable_function_entry-definition.c | 2 +-
 3 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/gcc/testsuite/c-c++-common/patchable_function_entry-decl.c b/gcc/testsuite/c-c++-common/patchable_function_entry-decl.c
index 8514b10..5c39a35 100644
--- a/gcc/testsuite/c-c++-common/patchable_function_entry-decl.c
+++ b/gcc/testsuite/c-c++-common/patchable_function_entry-decl.c
@@ -1,4 +1,4 @@
-/* { dg-do compile } */
+/* { dg-do compile { target { ! nvptx*-*-* } } } */
 /* { dg-options "-O2 -fpatchable-function-entry=3,1" } */
 /* { dg-final { scan-assembler-times "nop" 2 } } */
 
diff --git a/gcc/testsuite/c-c++-common/patchable_function_entry-default.c b/gcc/testsuite/c-c++-common/patchable_function_entry-default.c
index 0dcf118..48094f7 100644
--- a/gcc/testsuite/c-c++-common/patchable_function_entry-default.c
+++ b/gcc/testsuite/c-c++-common/patchable_function_entry-default.c
@@ -1,4 +1,4 @@
-/* { dg-do compile } */
+/* { dg-do compile { target { ! nvptx*-*-* } } } */
 /* { dg-options "-O2 -fpatchable-function-entry=3,1" } */
 /* { dg-final { scan-assembler-times "nop" 3 } } */
 
diff --git a/gcc/testsuite/c-c++-common/patchable_function_entry-definition.c b/gcc/testsuite/c-c++-common/patchable_function_entry-definition.c
index a007867..af8202f 100644
--- a/gcc/testsuite/c-c++-common/patchable_function_entry-definition.c
+++ b/gcc/testsuite/c-c++-common/patchable_function_entry-definition.c
@@ -1,4 +1,4 @@
-/* { dg-do compile } */
+/* { dg-do compile { target { ! nvptx*-*-* } } } */
 /* { dg-options "-O2 -fpatchable-function-entry=3,1" } */
 /* { dg-final { scan-assembler-times "nop" 1 } } */
[testsuite, committed] Require label_values for gcc.dg/torture/pr80163.c
Hi,

this patch requires effective target label_values for
gcc.dg/torture/pr80163.c.

Committed.

Thanks,
- Tom

Require label_values for gcc.dg/torture/pr80163.c

2017-08-02  Tom de Vries

	* gcc.dg/torture/pr80163.c: Require label_values.
---
 gcc/testsuite/gcc.dg/torture/pr80163.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/gcc/testsuite/gcc.dg/torture/pr80163.c b/gcc/testsuite/gcc.dg/torture/pr80163.c
index 80cc68d..a9a4438 100644
--- a/gcc/testsuite/gcc.dg/torture/pr80163.c
+++ b/gcc/testsuite/gcc.dg/torture/pr80163.c
@@ -1,5 +1,6 @@
 /* PR c/80163 */
 /* { dg-do compile { target int128 } } */
+/* { dg-require-effective-target label_values } */
 
 volatile int v;
[testsuite, committed] Require alloca for gcc.dg/attr-noipa.c
Hi,

this patch requires effective target alloca for gcc.dg/attr-noipa.c.

Committed.

Thanks,
- Tom

Require alloca for gcc.dg/attr-noipa.c

2017-08-02  Tom de Vries

	* gcc.dg/attr-noipa.c: Require alloca.
---
 gcc/testsuite/gcc.dg/attr-noipa.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/gcc/testsuite/gcc.dg/attr-noipa.c b/gcc/testsuite/gcc.dg/attr-noipa.c
index 1d2b868..e2349b6 100644
--- a/gcc/testsuite/gcc.dg/attr-noipa.c
+++ b/gcc/testsuite/gcc.dg/attr-noipa.c
@@ -1,6 +1,7 @@
 /* Test the noipa attribute. */
 /* { dg-do compile } */
 /* { dg-options "-O2 -fdump-tree-optimized" } */
+/* { dg-require-effective-target alloca } */
 
 static inline int __attribute__((noipa))
 fn1 (void) /* { dg-warning "inline function \[^\n\]* given attribute noinline" "" } */
[testsuite, committed] Require alias for gcc.dg/pr56727-2.c
Hi,

this patch requires alias support for gcc.dg/pr56727-2.c.

Committed.

Thanks,
- Tom

Require alias for gcc.dg/pr56727-2.c

2017-08-02  Tom de Vries

	* gcc.dg/pr56727-2.c: Require alias.
---
 gcc/testsuite/gcc.dg/pr56727-2.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/gcc/testsuite/gcc.dg/pr56727-2.c b/gcc/testsuite/gcc.dg/pr56727-2.c
index e47ee3c..62a74d1 100644
--- a/gcc/testsuite/gcc.dg/pr56727-2.c
+++ b/gcc/testsuite/gcc.dg/pr56727-2.c
@@ -1,5 +1,6 @@
 /* { dg-do compile { target fpic } } */
 /* { dg-options "-O2 -fPIC" } */
+/* { dg-require-alias "" } */
 /* { dg-final { scan-assembler "@(PLT|plt)" { target i?86-*-* x86_64-*-* powerpc*-*-linux* } } } */
 
 __attribute__((noinline, noclone))
PR78888 - add value range info for tolower/toupper
Hi, The attached patch adds value-range info for __builtin_tolower and __builtin_toupper. In the patch, I have just settled for anti-range ~['a', 'z'] for return value of toupper. Would that be correct albeit imprecise ? A more precise range would be: [0, UCHAR_MAX] intersect ~['a', 'z'] union EOF as mentioned in PR. I am not sure though if it's possible for a SSA_NAME to have multiple disjoint ranges ? Bootstrap+tested on x86_64-unknown-linux-gnu. Thanks, Prathamesh diff --git a/gcc/testsuite/gcc.dg/tree-ssa/pr7.c b/gcc/testsuite/gcc.dg/tree-ssa/pr7.c new file mode 100644 index 000..2bcddf1f2c3 --- /dev/null +++ b/gcc/testsuite/gcc.dg/tree-ssa/pr7.c @@ -0,0 +1,16 @@ +/* { dg-do compile } */ +/* { dg-options "-O2 -fdump-tree-evrp-slim" } */ + +void g (int x) +{ + if (__builtin_toupper ((unsigned char)x) == 'a') +__builtin_abort (); +} + +void h (int x) +{ + if (__builtin_tolower ((unsigned char)x) == 'A') +__builtin_abort (); +} + +/* { dg-final { scan-tree-dump-not "__builtin_abort" "evrp" } } */ diff --git a/gcc/tree-vrp.c b/gcc/tree-vrp.c index 79a29bf0efb..7137a4c52ec 100644 --- a/gcc/tree-vrp.c +++ b/gcc/tree-vrp.c @@ -3778,6 +3778,19 @@ extract_range_basic (value_range *vr, gimple *stmt) return; } break; + case CFN_BUILT_IN_TOUPPER: + case CFN_BUILT_IN_TOLOWER: + if (tree lhs = gimple_call_lhs (stmt)) + { + tree type = TREE_TYPE (lhs); + unsigned char min = (cfn == CFN_BUILT_IN_TOUPPER) ? 'a' : 'A'; + unsigned char max = (cfn == CFN_BUILT_IN_TOUPPER) ? 'z' : 'Z'; + tree range_min = build_int_cstu (type, min); + tree range_max = build_int_cstu (type, max); + set_value_range (vr, VR_ANTI_RANGE, range_min, range_max, NULL); + return; + } + break; default: break; }
Re: PR78888 - add value range info for tolower/toupper
On Thu, Aug 03, 2017 at 12:58:06PM +0530, Prathamesh Kulkarni wrote: > --- a/gcc/tree-vrp.c > +++ b/gcc/tree-vrp.c > @@ -3778,6 +3778,19 @@ extract_range_basic (value_range *vr, gimple *stmt) > return; > } > break; > + case CFN_BUILT_IN_TOUPPER: > + case CFN_BUILT_IN_TOLOWER: > + if (tree lhs = gimple_call_lhs (stmt)) > + { > + tree type = TREE_TYPE (lhs); > + unsigned char min = (cfn == CFN_BUILT_IN_TOUPPER) ? 'a' : 'A'; > + unsigned char max = (cfn == CFN_BUILT_IN_TOUPPER) ? 'z' : 'Z'; > + tree range_min = build_int_cstu (type, min); > + tree range_max = build_int_cstu (type, max); > + set_value_range (vr, VR_ANTI_RANGE, range_min, range_max, NULL); > + return; You are hardcoding here host characters and using it for target. I think you need to use lang_hooks.to_target_charset (really no idea how it works or doesn't in LTO, but gimple-fold.c is already using it among others). Also, you're assuming that the 'a'-'z' and 'A'-'Z' ranges are without gaps, which isn't the case for e.g. EBCDIC. So I think you'd need to verify (once?) that the target charset has this property. Jakub
[PATCH] Improve var_bound range test opt (PR tree-optimization/81655)
Hi! For the PR81588 testcase, on targets with branch cost 1, we end up with: b.0_1 = b; _2 = (long long int) b.0_1; a.1_3 = a; _4 = _2 > a.1_3; _5 = (int) _4; if (a.1_3 < 0) goto ; [36.00%] [count: INV] else goto ; [64.00%] [count: INV] [64.00%] [count: INV]: if (_4 != 1) goto ; [46.88%] [count: INV] else goto ; [53.13%] [count: INV] [66.00%] [count: INV]: c = 0; [100.00%] [count: INV]: The reason why we punt is the unexpected _4 != 1 condition, the code is prepared to handle just _4 == 0 (or _4 != 0) where _4 == 0 is equivalent to _4 != 1 for boolean type. The following patch handles even comparison with 1 if the type is unsigned 1-bit precision. Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk? 2017-08-03 Jakub Jelinek PR tree-optimization/81655 PR tree-optimization/81588 * tree-ssa-reassoc.c (optimize_range_tests_var_bound): Handle also the case when ranges[i].low and high are 1 for unsigned type with precision 1. --- gcc/tree-ssa-reassoc.c.jj 2017-08-01 10:28:50.0 +0200 +++ gcc/tree-ssa-reassoc.c 2017-08-02 11:28:44.789134681 +0200 @@ -2918,11 +2918,22 @@ optimize_range_tests_var_bound (enum tre for (i = 0; i < length; i++) { + bool in_p = ranges[i].in_p; if (ranges[i].low == NULL_TREE - || ranges[i].high == NULL_TREE - || !integer_zerop (ranges[i].low) - || !integer_zerop (ranges[i].high)) + || ranges[i].high == NULL_TREE) continue; + if (!integer_zerop (ranges[i].low) + || !integer_zerop (ranges[i].high)) + { + if (ranges[i].exp + && TYPE_PRECISION (TREE_TYPE (ranges[i].exp)) == 1 + && TYPE_UNSIGNED (TREE_TYPE (ranges[i].exp)) + && integer_onep (ranges[i].low) + && integer_onep (ranges[i].high)) + in_p = !in_p; + else + continue; + } gimple *stmt; tree_code ccode; @@ -2964,7 +2975,7 @@ optimize_range_tests_var_bound (enum tre default: continue; } - if (ranges[i].in_p) + if (in_p) ccode = invert_tree_comparison (ccode, false); switch (ccode) { Jakub
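[Illustrative note: C source of roughly the following shape produces the
GIMPLE quoted above on targets with branch cost 1.  This is a hedged
sketch reconstructed from the dump, not the actual PR81588/PR81655
testcase; the types and the exact condition may differ.]

long long a;
int b, c;

void
foo (void)
{
  /* With branch cost 1 the second comparison is evaluated
     unconditionally into a boolean temporary (_4 above), and the branch
     then tests _4 != 1.  The patch teaches optimize_range_tests_var_bound
     that for an unsigned 1-bit value this is the same as _4 == 0.  */
  if (a < 0 || (long long) b <= a)
    c = 0;
}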
Re: [PATCH 0/2] Python testcases to check DWARF output
On 08/02/2017 05:43 PM, Jeff Law wrote:
> I hate to throw in a wrench at this point, but has anyone looked at
> dwgrep from Petr Machata?  He's not doing much with it anymore, but it
> might provide enough of a dwarf scanning framework to be useful for
> testing purposes.

Sure, no problem: I first started talking publicly about this one week
ago, so it’s definitely not too late to mention alternatives. ;-)

I learned about dwgrep two years ago and forgot about it, so thank you
for the idea. I started to have a look at it, and for now I don’t think
it’s a good match in this context:

 1. it’s an ELF only tool;

 2. it must be built, requiring external dependencies: cmake and
    elfutils;

 3. in order to use it, one must learn a dedicated post-fix language
    (Zwerg)

For 1. I think this is a true problem, as it means for instance that we
could not test DWARF on Windows and Darwin setups. Unless we add PE and
Mach-O handling in dwgrep of course, but that does not sound easy and
will bring other external dependencies.

For 3. I feel that, for someone who is comfortable with Python, it will
be easier to deal with a Python library (the dwarfutils in my patch)
than having to learn yet another DSL. I think that’s precisely why some
people would like to have a Python test framework rather than a TCL one.
Working with a “usual” imperative language looks easier than with
postfix expressions. Smaller cognitive load.

Actually I see another problem: patterns will have to vary depending on
the target platform (for instance 32/64bit or depending on the DWARF
version). Of course we could duplicate whole patterns in testcases to
take this into account, but that’s like code duplication: I think we
should be able to include small “X if 32bit else Y” in patterns, and I
don’t think we can do that with Zwerg (no way to pass something like
environment variables).

Of course, I have written a “competitor” tool: I guess my judgment is
biased. :-) So other opinions are welcome!

-- 
Pierre-Marie de Rodat
Re: GCC 7.2 Status Report (2017-08-02)
On Wed, 2 Aug 2017, Ian Lance Taylor wrote:

> On Wed, Aug 2, 2017 at 4:17 AM, Richard Biener wrote:
> >
> > Status
> > ==
> >
> > The GCC 7 branch is now frozen for the upcoming release candidate
> > and release.  All changes require release manager approval.
>
> Whoops, I committed a patch to the GCC 7 branch this morning (my time)
> before I saw this e-mail message (SVN revision 250833).  Sorry about
> that.  Let me know if you want me to revert it, but I think it's
> pretty safe.

Yes, looks harmless to me (go isn't release critical anyway).

Thanks,
Richard.
Re: [PATCH 1/2] Introduce testsuite support to run Python tests
On 08/02/2017 08:43 PM, Jeff Law wrote:
> If it was trivial, then let's support 2.6.  But if it's nontrivial, I'd
> support stepping to something more modern.

It is trivial. I’ve done it locally. :-)

-- 
Pierre-Marie de Rodat
[committed] Diagnose invalid #pragma omp simd even for -fopenmp-simd (PR middle-end/81052)
Hi! The (mini)pass to diagnose invalid OpenMP/OpenACC/Cilk+ constructs wasn't mistakenly run for -fopenmp-simd, so non-conforming simd constructs weren't diagnosed, but could easily ICE later on, because we assume the body of the loop is a structured block (single entry, single exit, no branches into it or out of it, no exceptions thrown out of it etc.). Bootstrapped/regtested on x86_64-linux and i686-linux, committed to trunk so far, queued for backporting. 2017-08-03 Jakub Jelinek PR middle-end/81052 * omp-low.c (diagnose_sb_0): Handle flag_openmp_simd like flag_openmp. (pass_diagnose_omp_blocks::gate): Enable also for flag_openmp_simd. * c-c++-common/pr81052.c: New test. --- gcc/omp-low.c.jj2017-07-06 20:31:32.0 +0200 +++ gcc/omp-low.c 2017-08-02 12:52:47.608735787 +0200 @@ -9083,7 +9083,7 @@ diagnose_sb_0 (gimple_stmt_iterator *gsi } if (kind == NULL) { - gcc_checking_assert (flag_openmp); + gcc_checking_assert (flag_openmp || flag_openmp_simd); kind = "OpenMP"; } @@ -9343,7 +9343,7 @@ public: /* opt_pass methods: */ virtual bool gate (function *) { -return flag_cilkplus || flag_openacc || flag_openmp; +return flag_cilkplus || flag_openacc || flag_openmp || flag_openmp_simd; } virtual unsigned int execute (function *) { --- gcc/testsuite/c-c++-common/pr81052.c.jj 2017-08-02 12:59:28.092223248 +0200 +++ gcc/testsuite/c-c++-common/pr81052.c2017-08-02 13:00:30.603510773 +0200 @@ -0,0 +1,28 @@ +/* PR middle-end/81052 */ +/* { dg-do compile } */ +/* { dg-options "-fopenmp-simd -O2" } */ + +int +foo (int x, int y) +{ + int i; +#pragma omp simd + for (i = x; i < y; ++i) +return 0; /* { dg-error "invalid branch to/from OpenMP structured block" } */ + return 1; +} + +#ifdef __cplusplus +template +T +bar (T x, T y) +{ + T i; +#pragma omp simd + for (i = x; i < y; ++i) +return 0; /* { dg-error "invalid branch to/from OpenMP structured block" "" { target c++ } } */ + return 1; +} + +int x = bar (1, 7); +#endif Jakub
[PATCH] Some -Walloc-size-larger-than= warning fixes (PR driver/81650)
Hi! This patch attempts to fix some overflows during computation of the size and then effectively uses MIN (SSIZE_MAX, user_specified_value) as the bound. Not really sure what exact behavior we want, whether that, or default to SSIZE_MAX (note the documentation mistakenly talks about SIZE_MAX / 2 instead) and use MIN (SIZE_MAX, user_specified_value) for user specified value, or error out if user_specified_value is larger than SSIZE_MAX or larger than SIZE_MAX. Also the else unit = 0; case probably should get diagnostics. Another issue is if we want to diagnose, that right now it will be diagnosed only if some allocation routine is seen, diagnosing stuff during option processing is unfortunately too early because we don't have sizetype/ssizetype built yet. Anyway, the following has been bootstrapped/regtested on x86_64-linux and i686-linux. 2017-08-03 Jakub Jelinek PR driver/81650 * calls.c (alloc_max_size): Use HOST_WIDE_INT_UC (10??) instead of 10??LU, perform unit multiplication in wide_int, don't change alloc_object_size_limit if the limit is larger than SSIZE_MAX. * gcc.dg/pr81650.c: New test. --- gcc/calls.c.jj 2017-06-12 12:42:01.0 +0200 +++ gcc/calls.c 2017-08-02 13:41:00.887324420 +0200 @@ -1222,32 +1222,38 @@ alloc_max_size (void) else if (!strcasecmp (end, "KiB") || strcmp (end, "KB")) unit = 1024; else if (!strcmp (end, "MB")) - unit = 1000LU * 1000; + unit = HOST_WIDE_INT_UC (1000) * 1000; else if (!strcasecmp (end, "MiB")) - unit = 1024LU * 1024; + unit = HOST_WIDE_INT_UC (1024) * 1024; else if (!strcasecmp (end, "GB")) - unit = 1000LU * 1000 * 1000; + unit = HOST_WIDE_INT_UC (1000) * 1000 * 1000; else if (!strcasecmp (end, "GiB")) - unit = 1024LU * 1024 * 1024; + unit = HOST_WIDE_INT_UC (1024) * 1024 * 1024; else if (!strcasecmp (end, "TB")) - unit = 1000LU * 1000 * 1000 * 1000; + unit = HOST_WIDE_INT_UC (1000) * 1000 * 1000 * 1000; else if (!strcasecmp (end, "TiB")) - unit = 1024LU * 1024 * 1024 * 1024; + unit = HOST_WIDE_INT_UC (1024) * 1024 * 1024 * 1024; else if (!strcasecmp (end, "PB")) - unit = 1000LU * 1000 * 1000 * 1000 * 1000; + unit = HOST_WIDE_INT_UC (1000) * 1000 * 1000 * 1000 * 1000; else if (!strcasecmp (end, "PiB")) - unit = 1024LU * 1024 * 1024 * 1024 * 1024; + unit = HOST_WIDE_INT_UC (1024) * 1024 * 1024 * 1024 * 1024; else if (!strcasecmp (end, "EB")) - unit = 1000LU * 1000 * 1000 * 1000 * 1000 * 1000; + unit = HOST_WIDE_INT_UC (1000) * 1000 * 1000 * 1000 * 1000 + * 1000; else if (!strcasecmp (end, "EiB")) - unit = 1024LU * 1024 * 1024 * 1024 * 1024 * 1024; + unit = HOST_WIDE_INT_UC (1024) * 1024 * 1024 * 1024 * 1024 + * 1024; else unit = 0; } if (unit) - alloc_object_size_limit - = build_int_cst (ssizetype, limit * unit); + { + wide_int w = wi::uhwi (limit, HOST_BITS_PER_WIDE_INT + 64); + w *= unit; + if (wi::ltu_p (w, alloc_object_size_limit)) + alloc_object_size_limit = wide_int_to_tree (ssizetype, w); + } } } } --- gcc/testsuite/gcc.dg/pr81650.c.jj 2017-08-02 14:52:22.864787221 +0200 +++ gcc/testsuite/gcc.dg/pr81650.c 2017-08-02 14:21:11.0 +0200 @@ -0,0 +1,9 @@ +/* PR driver/81650 */ +/* { dg-do compile } */ +/* { dg-options "-Walloc-size-larger-than=9223372036854775807" } */ + +void * +foo (void) +{ + return __builtin_malloc (5); +} Jakub
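[Illustrative note, not part of the patch: the reason the multiplication
has to be widened is that with limit and unit both held in 64-bit host
integers, limit * unit can wrap silently for large user-specified limits
combined with the bigger unit suffixes.  A minimal standalone sketch:]

#include <stdint.h>
#include <stdio.h>

int
main (void)
{
  uint64_t limit = UINT64_C (9223372036854775807);		/* SSIZE_MAX */
  uint64_t unit = UINT64_C (1024) * 1024 * 1024 * 1024 * 1024;	/* "PiB" */
  uint64_t product;

  /* The product is roughly 2^113, far beyond 64 bits.  */
  if (__builtin_mul_overflow (limit, unit, &product))
    puts ("limit * unit overflows 64 bits; a wider computation is needed");
  return 0;
}

[This overflow is what the wide_int of HOST_BITS_PER_WIDE_INT + 64 bits in
the patch avoids.]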
[PATCH] Fix df-related ICE due to bbpart pass bug (PR target/81621)
Hi! The following testcase ICEs on s390x. The problem is that the bbpart pass calls df_set_flags (DF_DEFER_INSN_RESCAN); because it wants to defer rescanning, but doesn't actually df_finish_pass (it does in one case, but then calls df_set_flags with another changeable flag, so it has the same issue), and if the IRA pass is invoked soon after it without any df_finish_pass calls in between, we end up with deferred insn rescanning during IRA which heavily relies on immediate insn rescanning. Fixed thusly, bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk? 2017-08-03 Jakub Jelinek PR target/81621 * bb-reorder.c (pass_partition_blocks::execute): Return TODO_df_finish after setting changeable df flags. * gcc.dg/pr81621.c: New test. --- gcc/bb-reorder.c.jj 2017-07-21 10:28:13.0 +0200 +++ gcc/bb-reorder.c2017-08-02 19:43:58.797243254 +0200 @@ -2904,7 +2904,7 @@ pass_partition_blocks::execute (function crossing_edges = find_rarely_executed_basic_blocks_and_crossing_edges (); if (!crossing_edges.exists ()) -return 0; +return TODO_df_finish; crtl->has_bb_partition = true; @@ -2970,7 +2970,7 @@ pass_partition_blocks::execute (function df_analyze (); } - return 0; + return TODO_df_finish; } } // anon namespace --- gcc/testsuite/gcc.dg/pr81621.c.jj 2017-08-02 19:52:08.435831121 +0200 +++ gcc/testsuite/gcc.dg/pr81621.c 2017-08-02 19:52:00.026924067 +0200 @@ -0,0 +1,5 @@ +/* PR target/81621 */ +/* { dg-do compile { target freorder } } */ +/* { dg-options "-Og -fno-split-wide-types -freorder-blocks-and-partition" } */ + +#include "graphite/scop-10.c" Jakub
Re: [PING #2] [PATCH] enhance -Wrestrict to handle string built-ins (PR 78918)
On Wed, Aug 2, 2017 at 7:10 PM, Jeff Law wrote: > On 08/01/2017 03:25 AM, Richard Biener wrote: >> On Tue, Aug 1, 2017 at 11:23 AM, Richard Biener >> wrote: >>> On Tue, Aug 1, 2017 at 4:27 AM, Martin Sebor wrote: Richard, in discussing this work Jeff mentioned that your comments on the tree-ssa-alias.c parts would be helpful. When you have a chance could you please give it a once over and let me know if you have any suggestions or concerns? There are no visible changes to existing clients of the pass, just extensions that are relied on only by the new diagnostics. https://gcc.gnu.org/ml/gcc-patches/2017-07/msg01264.html I expect to revisit the sprintf %s patch mentioned below and see how to simplify it by taking advantage of the changes implemented here. >>> >>> Your ptr_deref_alias_decl_p returns true when must_alias is true >>> but there is no must-alias relationship. >>> >>> The ao_ref_init_from_ptr_and_size doesn't belong there. >>> >>> I dislike all of the changes related to returning an alias "size". >>> >>> Please keep away from the alias machinery. >>> >>> Warning during folding is similarly bad design. Please don't add >>> more such things. >>> >>> Thanks for putting this on my radar though. >>> Richard. >> >> Oh, for a constructive comment this all feels like sth for either >> sanitizers or a proper static analysis tool. As I outlined repeatedly >> I belive GCC could host a static analysis tool but it surely should >> not be interwinded into it's optimization passes or tools. > I disagree strongly here. > > Sanitiers catch problems after the fact and only if you've compiled your > code with sanitization and manage to find a way to trigger the problem path. > > Other static analysis tools are useful, but they aren't an integral part > of the edit, compile, debug cycle and due to their cost are often run > long after code was committed. > > Finding useful warnings that can be issued as part of the compile, edit, > debug cycle is, IMHO, far more useful than sanitizers or independent > static checkers. > > -- > > > I think finding a way to exploit information that our various analyzers > can provide to give precise warnings is a good thing. So it seemed > natural that if we wanted to ask a must-alias question that we should be > querying the alias oracle rather than implementing that kind of query > within the sprintf warnings. I'm not sure why you'd say "Please keep > away from the alias machinery". > > I'm a little confused here... Well, simply because the way as implemented isn't a must-alias query but maybe one that's good enough for warnings (reduces false positives but surely doesn't eliminate them). There's a must alias query already, in stmt_kills_ref_p. It's a matter of refactoring to make a refs_must_alias_p. Then propose that "overlap range" stuff separately. But I'm against lumping this all in as an innocent change suggesting the machinery can do sth (must alias) when it really can't. I also do not like adding a "must alias" bool to the may-alias APIs as the implementation is fundamentally may-alias, must-alias really is very different. And to repeat, no, I do not want a "good-enough-for-warnings" must-alias in an API that's supposed to be used by optimizations where "good enough" is not good enough. Richard. > > > Jeff
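[Illustrative note, independent of the patch under discussion, to keep the
two notions in this exchange apart: a may-alias oracle only has to answer
"these accesses possibly overlap", while a must-alias query of the
stmt_kills_ref_p kind asserts that they certainly do, which is the
stronger property warning code wants established before it fires.]

void
f (int *p, int *q)
{
  *p = 1;
  *q = 2;  /* May overlap *p: a may-alias "yes", but not a must-alias one.  */
}

void
g (int *p)
{
  *p = 1;
  *p = 2;  /* Certainly overwrites the first store: a genuine must-alias.  */
}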
Re: [PATCH] Improve var_bound range test opt (PR tree-optimization/81655)
On Thu, 3 Aug 2017, Jakub Jelinek wrote: > Hi! > > For the PR81588 testcase, on targets with branch cost 1, we end up with: > b.0_1 = b; > _2 = (long long int) b.0_1; > a.1_3 = a; > _4 = _2 > a.1_3; > _5 = (int) _4; > if (a.1_3 < 0) > goto ; [36.00%] [count: INV] > else > goto ; [64.00%] [count: INV] > >[64.00%] [count: INV]: > if (_4 != 1) > goto ; [46.88%] [count: INV] > else > goto ; [53.13%] [count: INV] > >[66.00%] [count: INV]: > c = 0; > >[100.00%] [count: INV]: > The reason why we punt is the unexpected _4 != 1 condition, the code > is prepared to handle just _4 == 0 (or _4 != 0) where _4 == 0 is equivalent > to _4 != 1 for boolean type. Hmm, I thought we had code to canonicalize boolean compares (but I can't find that right now). Some is in forwprop:forward_propagate_into_gimple_cond where it canonicalizes != 1 to == 0. > The following patch handles even comparison with 1 if the type is > unsigned 1-bit precision. > > Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk? Ok. Thanks, Richard. > 2017-08-03 Jakub Jelinek > > PR tree-optimization/81655 > PR tree-optimization/81588 > * tree-ssa-reassoc.c (optimize_range_tests_var_bound): Handle also > the case when ranges[i].low and high are 1 for unsigned type with > precision 1. > > --- gcc/tree-ssa-reassoc.c.jj 2017-08-01 10:28:50.0 +0200 > +++ gcc/tree-ssa-reassoc.c2017-08-02 11:28:44.789134681 +0200 > @@ -2918,11 +2918,22 @@ optimize_range_tests_var_bound (enum tre > >for (i = 0; i < length; i++) > { > + bool in_p = ranges[i].in_p; >if (ranges[i].low == NULL_TREE > - || ranges[i].high == NULL_TREE > - || !integer_zerop (ranges[i].low) > - || !integer_zerop (ranges[i].high)) > + || ranges[i].high == NULL_TREE) > continue; > + if (!integer_zerop (ranges[i].low) > + || !integer_zerop (ranges[i].high)) > + { > + if (ranges[i].exp > + && TYPE_PRECISION (TREE_TYPE (ranges[i].exp)) == 1 > + && TYPE_UNSIGNED (TREE_TYPE (ranges[i].exp)) > + && integer_onep (ranges[i].low) > + && integer_onep (ranges[i].high)) > + in_p = !in_p; > + else > + continue; > + } > >gimple *stmt; >tree_code ccode; > @@ -2964,7 +2975,7 @@ optimize_range_tests_var_bound (enum tre > default: > continue; > } > - if (ranges[i].in_p) > + if (in_p) > ccode = invert_tree_comparison (ccode, false); >switch (ccode) > { > > Jakub > > -- Richard Biener SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Graham Norton, HRB 21284 (AG Nuernberg)
Re: [PATCH] Some -Walloc-size-larger-than= warning fixes (PR driver/81650)
On Thu, 3 Aug 2017, Jakub Jelinek wrote: > Hi! > > This patch attempts to fix some overflows during computation of the size > and then effectively uses MIN (SSIZE_MAX, user_specified_value) as the > bound. > > Not really sure what exact behavior we want, whether that, or default > to SSIZE_MAX (note the documentation mistakenly talks about SIZE_MAX / 2 > instead) and use MIN (SIZE_MAX, user_specified_value) for user specified > value, or error out if user_specified_value is larger than > SSIZE_MAX or larger than SIZE_MAX. Also the else unit = 0; case > probably should get diagnostics. Another issue is if we want to diagnose, > that right now it will be diagnosed only if some allocation routine is seen, > diagnosing stuff during option processing is unfortunately too early because > we don't have sizetype/ssizetype built yet. > > Anyway, the following has been bootstrapped/regtested on x86_64-linux and > i686-linux. Ok. Richard. > 2017-08-03 Jakub Jelinek > > PR driver/81650 > * calls.c (alloc_max_size): Use HOST_WIDE_INT_UC (10??) > instead of 10??LU, perform unit multiplication in wide_int, > don't change alloc_object_size_limit if the limit is larger > than SSIZE_MAX. > > * gcc.dg/pr81650.c: New test. > > --- gcc/calls.c.jj2017-06-12 12:42:01.0 +0200 > +++ gcc/calls.c 2017-08-02 13:41:00.887324420 +0200 > @@ -1222,32 +1222,38 @@ alloc_max_size (void) > else if (!strcasecmp (end, "KiB") || strcmp (end, "KB")) > unit = 1024; > else if (!strcmp (end, "MB")) > - unit = 1000LU * 1000; > + unit = HOST_WIDE_INT_UC (1000) * 1000; > else if (!strcasecmp (end, "MiB")) > - unit = 1024LU * 1024; > + unit = HOST_WIDE_INT_UC (1024) * 1024; > else if (!strcasecmp (end, "GB")) > - unit = 1000LU * 1000 * 1000; > + unit = HOST_WIDE_INT_UC (1000) * 1000 * 1000; > else if (!strcasecmp (end, "GiB")) > - unit = 1024LU * 1024 * 1024; > + unit = HOST_WIDE_INT_UC (1024) * 1024 * 1024; > else if (!strcasecmp (end, "TB")) > - unit = 1000LU * 1000 * 1000 * 1000; > + unit = HOST_WIDE_INT_UC (1000) * 1000 * 1000 * 1000; > else if (!strcasecmp (end, "TiB")) > - unit = 1024LU * 1024 * 1024 * 1024; > + unit = HOST_WIDE_INT_UC (1024) * 1024 * 1024 * 1024; > else if (!strcasecmp (end, "PB")) > - unit = 1000LU * 1000 * 1000 * 1000 * 1000; > + unit = HOST_WIDE_INT_UC (1000) * 1000 * 1000 * 1000 * 1000; > else if (!strcasecmp (end, "PiB")) > - unit = 1024LU * 1024 * 1024 * 1024 * 1024; > + unit = HOST_WIDE_INT_UC (1024) * 1024 * 1024 * 1024 * 1024; > else if (!strcasecmp (end, "EB")) > - unit = 1000LU * 1000 * 1000 * 1000 * 1000 * 1000; > + unit = HOST_WIDE_INT_UC (1000) * 1000 * 1000 * 1000 * 1000 > +* 1000; > else if (!strcasecmp (end, "EiB")) > - unit = 1024LU * 1024 * 1024 * 1024 * 1024 * 1024; > + unit = HOST_WIDE_INT_UC (1024) * 1024 * 1024 * 1024 * 1024 > +* 1024; > else > unit = 0; > } > > if (unit) > - alloc_object_size_limit > - = build_int_cst (ssizetype, limit * unit); > + { > + wide_int w = wi::uhwi (limit, HOST_BITS_PER_WIDE_INT + 64); > + w *= unit; > + if (wi::ltu_p (w, alloc_object_size_limit)) > + alloc_object_size_limit = wide_int_to_tree (ssizetype, w); > + } > } > } > } > --- gcc/testsuite/gcc.dg/pr81650.c.jj 2017-08-02 14:52:22.864787221 +0200 > +++ gcc/testsuite/gcc.dg/pr81650.c2017-08-02 14:21:11.0 +0200 > @@ -0,0 +1,9 @@ > +/* PR driver/81650 */ > +/* { dg-do compile } */ > +/* { dg-options "-Walloc-size-larger-than=9223372036854775807" } */ > + > +void * > +foo (void) > +{ > + return __builtin_malloc (5); > +} > > Jakub > > -- Richard Biener SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, 
Graham Norton, HRB 21284 (AG Nuernberg)
Re: [PATCH][RFC] Make expansion of balanced binary trees of switches on tree level.
> On 08/02/2017 01:51 PM, Richard Biener wrote: > > On Wed, Aug 2, 2017 at 1:20 PM, Martin Liška wrote: > >> Hello. > >> > >> After some discussions with Honza, I've decided to convert current code in > >> stmt.c that > >> is responsible for switch expansion. More precisely, I would like to > >> convert the code > >> to expand gswitch statements on tree level. Currently the newly created > >> pass is executed > >> at the end of tree optimizations. > >> > >> My plan for future is to inspire in [1] and come up with some more > >> sophisticated switch > >> expansions. For that I've been working on a paper where I'll summarize > >> statistics based > >> on what I've collected in openSUSE distribution with specially > >> instrumented GCC. If I'll be > >> happy I can also fit in to schedule of this year's Cauldron with a talk. > >> > >> Patch can bootstrap on ppc64le-redhat-linux and survives regression tests. > >> > >> Thoughts? > > > > First of all thanks. > > > > I think part of switch expansion moved to switch-conversion some time ago > > (emit_case_bit_tests). So maybe the full lowering should be in at least > > the same source file and it should maybe applied earlier for a subset of > > cases (very low number of cases for example). > > Yep, good idea. I'll take a look. > > > > > Did you base the code on the RTL expansion code or did you re-write it from > > scratch? > > It's based, I've just changed the function that create CFG. I have talked Martin to do this in first step. Switch expansion is infinitely difficult problem and I think changing representation first and keeping the basic algorithm is easiest way to get something done. Algorithm will be improved next as far as I know :) Honza > > Martin
[PATCH, ARM] fix .cfi inconsistency out of builtin_eh_return
Hello, Activating dwarf2 based eh for real on VxWorks6 (patch to come) triggers a libgcc build failure, where most functions resorting to __builtin_eh_return fail this check from maybe_record_trace_start in dwarf2cfi.c: /* We ought to have the same state incoming to a given trace no matter how we arrive at the trace. Anything else means we've got some kind of optimization error. */ #if CHECKING_P if (!cfi_row_equal_p (cur_row, ti->beg_row)) ... The inconsistency is introduced by a store inserted in the middle of the insn stream by arm_set_return_address for __builtin_eh_return, marked FRAME_RELATED with: /* The store needs to be marked as frame related in order to prevent DSE from deleting it as dead if it is based on fp. */ rtx insn = emit_move_insn (gen_frame_mem (Pmode, addr), source); RTX_FRAME_RELATED_P (insn) = 1; add_reg_note (insn, REG_CFA_RESTORE, gen_rtx_REG (Pmode, LR_REGNUM)); The FRAME_RELATED ness was setup to fixed DWARF2 unwinding at the time already, indeed broken by DSE removing the store - see https://gcc.gnu.org/ml/gcc-patches/2014-10/msg01669.html. The attached patch is a proposal to fix this by setting MEM_VOLATILE_P on the store destination mem instead of setting RTX_FRAME_RELATED_P on the insn, which alleviates the .cfi complications and is as effective in preventing the store removal by DSE, from: /* Assuming that there are sets in these insns, we cannot delete them. */ if ((GET_CODE (PATTERN (insn)) == CLOBBER) || volatile_refs_p (PATTERN (insn)) || (!cfun->can_delete_dead_exceptions && !insn_nothrow_p (insn)) || (RTX_FRAME_RELATED_P (insn)) || find_reg_note (insn, REG_FRAME_RELATED_EXPR, NULL_RTX)) insn_info->cannot_delete = true; For testing, I verified that - an arm-wrs-vxworks build with the extra patch to activate DWARF2 eh proceeds to completion on mainline, - the same symptoms were visible and cured on our in-house gcc-7 based toolchain, and that - ACATS are clean for Ada in this configuration after the patch, confirming proper behavior of the DWARF2 exception propagation circuitry. OK to commit ? Thanks much in advance for your feedback, With Kind Regards, Olivier arm-set-return-address.patch Description: Binary data
Re: [PATCH] Fix df-related ICE due to bbpart pass bug (PR target/81621)
On Thu, 3 Aug 2017, Jakub Jelinek wrote: > Hi! > > The following testcase ICEs on s390x. The problem is that the bbpart pass > calls > df_set_flags (DF_DEFER_INSN_RESCAN); > because it wants to defer rescanning, but doesn't actually df_finish_pass > (it does in one case, but then calls df_set_flags with another changeable > flag, > so it has the same issue), and if the IRA pass is invoked soon after it > without any df_finish_pass calls in between, we end up with deferred insn > rescanning during IRA which heavily relies on immediate insn rescanning. > > Fixed thusly, bootstrapped/regtested on x86_64-linux and i686-linux, ok for > trunk? Maybe add a comment in case somebody wonders later? > 2017-08-03 Jakub Jelinek > > PR target/81621 > * bb-reorder.c (pass_partition_blocks::execute): Return TODO_df_finish > after setting changeable df flags. > > * gcc.dg/pr81621.c: New test. > > --- gcc/bb-reorder.c.jj 2017-07-21 10:28:13.0 +0200 > +++ gcc/bb-reorder.c 2017-08-02 19:43:58.797243254 +0200 > @@ -2904,7 +2904,7 @@ pass_partition_blocks::execute (function > >crossing_edges = find_rarely_executed_basic_blocks_and_crossing_edges (); >if (!crossing_edges.exists ()) > -return 0; > +return TODO_df_finish; I suppose we can avoid this if we move the df_set_flags after this? I doubt find_rarely_executed_basic_blocks_and_crossing_edges modifies anything. Ok with those changes (if the latter is possible). Thanks, Richard. >crtl->has_bb_partition = true; > > @@ -2970,7 +2970,7 @@ pass_partition_blocks::execute (function >df_analyze (); > } > > - return 0; > + return TODO_df_finish; > } > > } // anon namespace > --- gcc/testsuite/gcc.dg/pr81621.c.jj 2017-08-02 19:52:08.435831121 +0200 > +++ gcc/testsuite/gcc.dg/pr81621.c2017-08-02 19:52:00.026924067 +0200 > @@ -0,0 +1,5 @@ > +/* PR target/81621 */ > +/* { dg-do compile { target freorder } } */ > +/* { dg-options "-Og -fno-split-wide-types -freorder-blocks-and-partition" } > */ > + > +#include "graphite/scop-10.c" > > Jakub > > -- Richard Biener SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Graham Norton, HRB 21284 (AG Nuernberg)
Re: [PATCH, ARM] fix .cfi inconsistency out of builtin_eh_return
> On 03 Aug 2017, at 11:02, Olivier Hainque wrote:
> The attached patch is a proposal to fix this by setting MEM_VOLATILE_P on
> the store destination mem instead of setting RTX_FRAME_RELATED_P on the insn,
> which alleviates the .cfi complications and is as effective in preventing
> the store removal by DSE

Forgot the ChangeLog, here it is:

2017-08-03  Olivier Hainque

	* config/arm/arm.c (arm_set_return_address): Use MEM_VOLATILE_P
	on the target mem instead of RTX_FRAME_RELATED_P on the insn to
	prevent DSE.
	(thumb_set_return_address): Likewise.

> OK to commit ?
>
> Thanks much in advance for your feedback,
>
> With Kind Regards,
>
> Olivier
Re: [PATCH] Fix df-related ICE due to bbpart pass bug (PR target/81621)
On Thu, Aug 03, 2017 at 11:02:51AM +0200, Richard Biener wrote: > On Thu, 3 Aug 2017, Jakub Jelinek wrote: > > > Hi! > > > > The following testcase ICEs on s390x. The problem is that the bbpart pass > > calls > > df_set_flags (DF_DEFER_INSN_RESCAN); > > because it wants to defer rescanning, but doesn't actually df_finish_pass > > (it does in one case, but then calls df_set_flags with another changeable > > flag, > > so it has the same issue), and if the IRA pass is invoked soon after it > > without any df_finish_pass calls in between, we end up with deferred insn > > rescanning during IRA which heavily relies on immediate insn rescanning. > > > > Fixed thusly, bootstrapped/regtested on x86_64-linux and i686-linux, ok for > > trunk? > > Maybe add a comment in case somebody wonders later? Ok. > > --- gcc/bb-reorder.c.jj 2017-07-21 10:28:13.0 +0200 > > +++ gcc/bb-reorder.c2017-08-02 19:43:58.797243254 +0200 > > @@ -2904,7 +2904,7 @@ pass_partition_blocks::execute (function > > > >crossing_edges = find_rarely_executed_basic_blocks_and_crossing_edges (); > >if (!crossing_edges.exists ()) > > -return 0; > > +return TODO_df_finish; > > I suppose we can avoid this if we move the df_set_flags after this? > I doubt find_rarely_executed_basic_blocks_and_crossing_edges modifies > anything. > > Ok with those changes (if the latter is possible). I was looking through find_rarely_executed_basic_blocks_and_crossing_edges before writing the patch and while I could prove for some functions that it doesn't modify anything, but e.g. for fix_up_crossing_landing_pad I'm pretty sure it can modify instructions in several ways. So I think we can't do that. Jakub
Re: [PATCH] Improve var_bound range test opt (PR tree-optimization/81655)
On Thu, Aug 03, 2017 at 10:58:07AM +0200, Richard Biener wrote:
> > The reason why we punt is the unexpected _4 != 1 condition, the code
> > is prepared to handle just _4 == 0 (or _4 != 0) where _4 == 0 is equivalent
> > to _4 != 1 for boolean type.
>
> Hmm, I thought we had code to canonicalize boolean compares (but I can't
> find that right now).  Some is in

I was looking for that too, but didn't find anything that would be done
always.

> forwprop:forward_propagate_into_gimple_cond where it canonicalizes
> != 1 to == 0.

Yes, but from
  Canonicalize _Bool == 0 and _Bool != 1 to _Bool != 0 by swapping edges.
it seems that it can keep _Bool == 1 around, which is something
optimize_range_tests_var_bound didn't handle before either and now does.

	Jakub
[GCC][PATCH][mid-end][Version 3] Optimize x * copysign (1.0, y) [Patch (1/2)]
Hi All, this patch implements a optimization rewriting x * copysign (1.0, y) to: x ^ (y & (1 << sign_bit_position)) This is only done when not honoring signaling NaNs. This transormation is done at ssa mult widening time and is gated on the a check for the optab "xorsign". If the optab is not available then copysign is expanded as normal. If the optab exists then the expression is replaced with a call to the internal function XORSIGN. This patch is a revival of a previous patches https://gcc.gnu.org/ml/gcc-patches/2015-10/msg00069.html https://gcc.gnu.org/ml/gcc-patches/2017-06/msg00749.html Bootstrapped on both aarch64-none-linux-gnu and x86_64 with no issues. Regression done on aarch64-none-linux-gnu and no regressions. Ok for trunk? gcc/ 2017-08-03 Tamar Christina Andrew Pinski PR middle-end/19706 * internal-fn.def (XORSIGN): New. * optabs.def (xorsign_optab): New. * tree-ssa-math-opts.c (is_copysign_call_with_1): New. (convert_expand_mult_copysign): New. (pass_optimize_widening_mul::execute): Call convert_expand_mult_copysign. -- diff --git a/gcc/internal-fn.def b/gcc/internal-fn.def index a9a3f7606eb2a79f64dab1b7fdeef0d308e3061d..58e5f4a322a92ccb842ab05cc4213933ffa59679 100644 --- a/gcc/internal-fn.def +++ b/gcc/internal-fn.def @@ -129,6 +129,8 @@ DEF_INTERNAL_FLT_FN (REMAINDER, ECF_CONST, remainder, binary) DEF_INTERNAL_FLT_FN (SCALB, ECF_CONST, scalb, binary) DEF_INTERNAL_FLT_FN (FMIN, ECF_CONST, fmin, binary) DEF_INTERNAL_FLT_FN (FMAX, ECF_CONST, fmax, binary) +DEF_INTERNAL_OPTAB_FN (XORSIGN, ECF_CONST, xorsign, binary) + /* FP scales. */ DEF_INTERNAL_FLT_FN (LDEXP, ECF_CONST, ldexp, binary) diff --git a/gcc/optabs.def b/gcc/optabs.def index f21f2267ec2118d5cd0e74b18721525a564d16f2..54afe2d796ee9af3bd7b25d93eb0789a70e47c7b 100644 --- a/gcc/optabs.def +++ b/gcc/optabs.def @@ -255,6 +255,7 @@ OPTAB_D (asin_optab, "asin$a2") OPTAB_D (atan2_optab, "atan2$a3") OPTAB_D (atan_optab, "atan$a2") OPTAB_D (copysign_optab, "copysign$F$a3") +OPTAB_D (xorsign_optab, "xorsign$F$a3") OPTAB_D (cos_optab, "cos$a2") OPTAB_D (exp10_optab, "exp10$a2") OPTAB_D (exp2_optab, "exp2$a2") diff --git a/gcc/tree-ssa-math-opts.c b/gcc/tree-ssa-math-opts.c index 7ac1659fa0670b7080685f3f9513939807073a63..780a7f76ce756cfe025e80845208b00568eda56c 100644 --- a/gcc/tree-ssa-math-opts.c +++ b/gcc/tree-ssa-math-opts.c @@ -3145,6 +3145,96 @@ is_widening_mult_p (gimple *stmt, return true; } +/* Check to see if the CALL statement is an invocation of copysign + with 1. being the first argument. */ +static bool +is_copysign_call_with_1 (gimple *call) +{ + if (!is_gimple_call (call)) +return false; + + enum combined_fn code = gimple_call_combined_fn (call); + + if (code == CFN_LAST) +return false; + + gcall *c = as_a (call); + + if (builtin_fn_p (code)) +{ + switch (as_builtin_fn (code)) + { + CASE_FLT_FN (BUILT_IN_COPYSIGN): + CASE_FLT_FN_FLOATN_NX (BUILT_IN_COPYSIGN): + return real_onep (gimple_call_arg (c, 0)); + default: + return false; + } +} + + if (internal_fn_p (code)) +{ + switch (as_internal_fn (code)) + { + case IFN_COPYSIGN: + return real_onep (gimple_call_arg (c, 0)); + default: + return false; + } +} + + return false; +} + +/* Try to expand the pattern x * copysign (1, y) into xorsign (x, y). + This only happens when the the xorsign optab is defined, if the + pattern is not a xorsign pattern or if expansion fails FALSE is + returned, otherwise TRUE is returned. 
*/ +static bool +convert_expand_mult_copysign (gimple *stmt, gimple_stmt_iterator *gsi) +{ + tree treeop0, treeop1, lhs, type; + location_t loc = gimple_location (stmt); + lhs = gimple_assign_lhs (stmt); + treeop0 = gimple_assign_rhs1 (stmt); + treeop1 = gimple_assign_rhs2 (stmt); + type = TREE_TYPE (lhs); + machine_mode mode = TYPE_MODE (type); + + if (HONOR_SNANS (type)) +return false; + + if (TREE_CODE (treeop0) == SSA_NAME && TREE_CODE (treeop1) == SSA_NAME) +{ + gimple *call0 = SSA_NAME_DEF_STMT (treeop0); + if (!is_copysign_call_with_1 (call0)) + { + /* IPA sometimes inlines and then extracts the function again, + resulting in an incorrect order, so check both ways. */ + call0 = SSA_NAME_DEF_STMT (treeop1); + if (!is_copysign_call_with_1 (call0)) + return false; + + treeop1 = treeop0; + } + + if (optab_handler (xorsign_optab, mode) == CODE_FOR_nothing) + return false; + + gcall *c = as_a (call0); + treeop0 = gimple_call_arg (c, 1); + + gcall *call_stmt + = gimple_build_call_internal (IFN_XORSIGN, 2, treeop1,treeop0); + gimple_set_lhs (call_stmt, lhs); + gimple_set_location (call_stmt, loc); + gsi_insert_after (gsi, call_stmt, GSI_SAME_STMT); + return true; +} + + return false; +} + /* Process a single gimple statement STMT, which has a MULT_EXPR as its rhs, and try to convert it into a WIDEN_MULT_EXPR. The return value is true iff we converted th
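[Illustrative note for readers, not part of the patch: the identity the
pass relies on is easy to check with plain C.  For ordinary (non-NaN)
doubles, multiplying x by copysign (1.0, y) flips the sign of x exactly
when y is negative, i.e. it XORs y's sign bit into x; that is the
rewrite the pass performs when signalling NaNs need not be honored (the
HONOR_SNANS check above).]

#include <stdint.h>
#include <string.h>
#include <math.h>

/* Bit-level reference: copy x and XOR in the sign bit of y.  */
static double
xorsign_ref (double x, double y)
{
  uint64_t xb, yb;
  memcpy (&xb, &x, sizeof xb);
  memcpy (&yb, &y, sizeof yb);
  xb ^= yb & (UINT64_C (1) << 63);
  memcpy (&x, &xb, sizeof x);
  return x;
}

/* Multiplication form that the pass rewrites; for non-NaN inputs the
   two functions return bit-identical results.  */
static double
mult_copysign (double x, double y)
{
  return x * copysign (1.0, y);
}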
[PATCH, ARM] Handle DWARF2_UNWIND_INFO in arm_except_unwind_info
Hello, On top of the correction for fallouts proposed in https://gcc.gnu.org/ml/gcc-patches/2017-08/msg00271.html the attached patch is needed to support targets still resorting to DWARF2 unwinding on ARM, such as VxWorks6 (VxWorks7 has moved to ARM EH). Tested by verifying that a build with mainline sources for arm-wrs-vxworks proceeds to completion, that a similar gcc-7 based compiler builds fine as well and passes Ada ACATS, which confirms proper operation of the DWARF2 based exceptions. OK to commit ? Thanks in advance for your feedback, Olivier 2017-08-03 Olivier Hainque * common/config/arm/arm-common.c (arm_except_unwind_info): Handle DWARF2_UNWIND_INFO. arm-handle-dwarf2.diff Description: Binary data
Re: PR78888 - add value range info for tolower/toupper
On 3 August 2017 at 13:21, Jakub Jelinek wrote: > On Thu, Aug 03, 2017 at 12:58:06PM +0530, Prathamesh Kulkarni wrote: >> --- a/gcc/tree-vrp.c >> +++ b/gcc/tree-vrp.c >> @@ -3778,6 +3778,19 @@ extract_range_basic (value_range *vr, gimple *stmt) >> return; >> } >> break; >> + case CFN_BUILT_IN_TOUPPER: >> + case CFN_BUILT_IN_TOLOWER: >> + if (tree lhs = gimple_call_lhs (stmt)) >> + { >> + tree type = TREE_TYPE (lhs); >> + unsigned char min = (cfn == CFN_BUILT_IN_TOUPPER) ? 'a' : 'A'; >> + unsigned char max = (cfn == CFN_BUILT_IN_TOUPPER) ? 'z' : 'Z'; >> + tree range_min = build_int_cstu (type, min); >> + tree range_max = build_int_cstu (type, max); >> + set_value_range (vr, VR_ANTI_RANGE, range_min, range_max, NULL); >> + return; > > You are hardcoding here host characters and using it for target. > I think you need to use > lang_hooks.to_target_charset > (really no idea how it works or doesn't in LTO, but gimple-fold.c is already > using it among others). > Also, you're assuming that the 'a'-'z' and 'A'-'Z' ranges are without gaps, > which isn't the case for e.g. EBCDIC. So I think you'd need to verify > (once?) that the target charset has this property. Hi Jakub, Thanks for the suggestions, I wasn't aware about lang_hooks.to_target_charset. Would following be a correct check that target has ascii charset and only then set range to ~[a, z] ? target_a = lang_hooks.to_target_charset ('a'); target_z = lang_hooks.to_target_charset('z'); if (target_a == 'a' && target_z == 'z') { // set vr to ~['a', 'z'] } Thanks, Prathamesh > > Jakub
Re: [PATCH][2/2] early LTO debug, main part
On Wed, 2 Aug 2017, Jason Merrill wrote: > On Wed, Aug 2, 2017 at 6:35 AM, Richard Biener wrote: > > On Wed, 2 Aug 2017, Jason Merrill wrote: > > > >> On 05/19/2017 06:42 AM, Richard Biener wrote: > >> > + /* ??? In some cases the C++ FE (at least) fails to > >> > +set DECL_CONTEXT properly. Simply globalize stuff > >> > +in this case. For example > >> > +__dso_handle created via iostream line 74 col 25. */ > >> > + parent = comp_unit_die (); > >> > >> I've now fixed __dso_handle, so that can be removed from the comment, but > >> it > >> still makes sense to have this fall-back for the (permitted) case of null > >> DECL_CONTEXT. > > > > Adjusted the comment. > > > >> > + /* ??? LANG issue - DW_TAG_module for fortran. Either look > >> > +at the input language (if we have enough DECL_CONTEXT to follow) > >> > +or use a bit in tree_decl_with_vis to record the distinction. */ > >> > >> Sure, you should be able to look at TRANSLATION_UNIT_LANGUAGE. > > > > Yeah, the comment says we might be able to walk DECL_CONTEXT up to > > a TRANSLATION_UNIT_DECL. I've amended is_fortran similar to as I > > amended is_cxx, providing an overload for a decl, factoring out common > > code. So this is now if (is_fortran (decl)) ... = new_die > > (DW_TAG_module,...). > > > >> > ! /* ??? We cannot unconditionally output die_offset if > >> > !non-zero - at least -feliminate-dwarf2-dups will > >> > !create references to those DIEs via symbols. And we > >> > !do not clear its DIE offset after outputting it > >> > !(and the label refers to the actual DIEs, not the > >> > !DWARF CU unit header which is when using label + offset > >> > !would be the correct thing to do). > >> > >> As in our previous discussion, I think -feliminate-dwarf2-dups can go away > >> now. But this is a more general issue: die_offset has meant the offset > >> from > >> the beginning of the CU, but if with_offset is set it means an offset from > >> die_symbol. Since with_offset changes the meaning of die_symbol and > >> die_offset, having different code here depending on that flag makes sense. > >> > >> It seems likely that when -fEDD goes away, we will never again want to do > >> direct symbolic references to DIEs, in which case we could drop the current > >> meaning of die_symbol, and so we wouldn't need the with_offset flag. > > > > Yeah, I've been playing with a patch to remove -fEDD but it has conflicts > > with the early LTO debug work and thus I wanted to postpone it until > > after that goes in to avoid further churn. I hope that's fine, it's > > sth I'll tackle soon after this patch lands on trunk. > > Sure. > > >> > ! /* Don't output the symbol twice. For LTO we want the label > >> > ! on the section beginning, not on the actual DIE. */ > >> > ! && (!flag_generate_lto > >> > ! || die->die_tag != DW_TAG_compile_unit)) > >> > >> I think this check should just be !with_offset; if that flag is set the DIE > >> doesn't actually have its own symbol. > > > > with_offset is set only during LTRANS phase for the DIEs refering to > > the early DIEs via the CU label. But the above is guarding the > > early phase when we do not want to output that CU label twice. > > > > Can we revisit this check when -fEDD has gone away? > > Yes. > > >> > ! if (old_die > >> > ! && (c = get_AT_ref (old_die, DW_AT_abstract_origin)) > >> > ! /* ??? In LTO all origin DIEs still refer to the early > >> > !debug copy. Detect that. */ > >> > ! && get_AT (c, DW_AT_inline)) > >> ... > >> > ! /* "Unwrap" the decls DIE which we put in the imported unit > >> > context. 
> >> > ! ??? If we finish dwarf2out_function_decl refactoring we can > >> > ! do this in a better way from the start and only lazily emit > >> > ! the early DIE references. */ > >> > >> It seems like in gen_subprogram_die you deliberately avoid reusing the DIE > >> from dwarf2out_register_external_die (since it doesn't have DW_AT_inline), > >> and > >> then in add_abstract_origin_attribute you need to look through that > >> redundant > >> die. Why not reuse it? > > > > What we're doing here is dealing with the case of an inlined > > instance which is adjusted to point back to the early debug copy > > directly than to the wrapping DIE (supposed to eventually contain > > the concrete instance). > > But we enter this block when we're emitting the concrete out-of-line > instance, and the DW_AT_inline check prevents us from using the > wrapping DIE for the out-of-line instance. > > The comment should really change "inlined instance" to "concrete > instance"; inlined instances are handled in > gen_inlined_subroutine_die. You are right. I suspect I got confused by the comment when looking for a way to paper over the check_die ICE removing the check causes. We end up add
Re: PR78888 - add value range info for tolower/toupper
On Thu, Aug 03, 2017 at 04:13:38PM +0530, Prathamesh Kulkarni wrote:
> > You are hardcoding here host characters and using it for target.
> > I think you need to use
> > lang_hooks.to_target_charset
> > (really no idea how it works or doesn't in LTO, but gimple-fold.c is already
> > using it among others).
> > Also, you're assuming that the 'a'-'z' and 'A'-'Z' ranges are without gaps,
> > which isn't the case for e.g. EBCDIC.  So I think you'd need to verify
> > (once?) that the target charset has this property.
> Hi Jakub,
> Thanks for the suggestions, I wasn't aware about lang_hooks.to_target_charset.
> Would following be a correct check that target has ascii charset and only then
> set range to ~[a, z] ?
>
> target_a = lang_hooks.to_target_charset ('a');
> target_z = lang_hooks.to_target_charset('z');
> if (target_a == 'a' && target_z == 'z')
> {
> // set vr to ~['a', 'z']
> }

No.  E.g. if host == target charset is EBCDIC, then the condition is
true, but it isn't a consecutive range.  A rough check that would maybe
work (at least would be false for EBCDIC and true for ASCII) could be
something like:
  if (target_z > target_a && target_z - target_a == 25)
(which is not exact, some strange charset could have say '0'-'9' block
in the middle of 'a'-'z' block, and say 'h'-'r' outside of the range),
but perhaps good enough for real-world charsets.

In any case, you should probably investigate all the locales say in
glibc or some other big locale repository whether tolower/toupper have
the expected properties there.

Plus of course, the set vr to ~[ ] needs to use target_a/target_z.

	Jakub
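[Illustrative sketch only, not part of the patch under review: an exact
(if slightly more expensive) variant of the rough endpoint check above
would walk the 26 letters once and verify that their target images are
consecutive.  Here to_target () stands in for
lang_hooks.to_target_charset.]

/* Return true iff the target images of 'a' .. 'z' form one consecutive
   run: true for ASCII, false for EBCDIC, whose lowercase letters come
   in three separate blocks.  The same check can be repeated for
   'A' .. 'Z'.  */
static bool
target_lower_letters_consecutive (void)
{
  static const char letters[] = "abcdefghijklmnopqrstuvwxyz";
  unsigned int prev = to_target (letters[0]);
  for (int i = 1; i < 26; i++)
    {
      unsigned int cur = to_target (letters[i]);
      if (cur != prev + 1)
	return false;
      prev = cur;
    }
  return true;
}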
[nvptx, PR 81662, committed] Error out on nvptx for fpatchable-function-entry
[ was: Re: [testsuite, PR81662, committed] Skip fpatchable-function-entry
tests for nvptx ]

On 08/03/2017 09:17 AM, Tom de Vries wrote:
> Hi,
>
> fpatchable-function-entry requires nop, which nvptx does not have.  I've
> disabled the corresponding test for nvptx.

This patch errors out on nvptx for fpatchable-function-entry.

Committed.

Thanks,
- Tom

Error out on nvptx for fpatchable-function-entry

2017-08-03  Tom de Vries

	PR target/81662
	* config/nvptx/nvptx.c (nvptx_option_override): Emit sorry if
	function_entry_patch_area_size > 0.
	* gcc.target/nvptx/patchable_function_entry-default.c: New test.
---
 gcc/config/nvptx/nvptx.c                                 |  4
 .../gcc.target/nvptx/patchable_function_entry-default.c | 15 +++
 2 files changed, 19 insertions(+)

diff --git a/gcc/config/nvptx/nvptx.c b/gcc/config/nvptx/nvptx.c
index 208b115..9211d1a 100644
--- a/gcc/config/nvptx/nvptx.c
+++ b/gcc/config/nvptx/nvptx.c
@@ -180,6 +180,10 @@ nvptx_option_override (void)
   if (!global_options_set.x_flag_no_common)
     flag_no_common = 1;
 
+  /* The patch area requires nops, which we don't have. */
+  if (function_entry_patch_area_size > 0)
+    sorry ("not generating patch area, nops not supported");
+
   /* Assumes that it will see only hard registers.  */
   flag_var_tracking = 0;
 
diff --git a/gcc/testsuite/gcc.target/nvptx/patchable_function_entry-default.c b/gcc/testsuite/gcc.target/nvptx/patchable_function_entry-default.c
new file mode 100644
index 000..4254456
--- /dev/null
+++ b/gcc/testsuite/gcc.target/nvptx/patchable_function_entry-default.c
@@ -0,0 +1,15 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fpatchable-function-entry=3,1" } */
+
+extern int a;
+
+int f3 (void);
+
+int
+__attribute__((noinline))
+f3 (void)
+{
+  return 5*a;
+}
+
+/* { dg-excess-errors "sorry, unimplemented: not generating patch area, nops not supported" } */
Re: [PING^4][PATCH v2] Generate reproducible output independently of the build-path
Yury Gribov:
> On 03.08.2017 3:06, Ximin Luo wrote:
>> Jeff Law:
>>> On 07/21/2017 10:15 AM, Ximin Luo wrote:
>>>> (Please keep me on CC, I am not subscribed)
>>>>
>>>> Proposal
>>>>
>>>> This patch series adds a new environment variable
>>>> BUILD_PATH_PREFIX_MAP.  When this is set, GCC will treat this as extra
>>>> implicit "-fdebug-prefix-map=$value" command-line arguments that
>>>> precede any explicit ones.  This makes the final binary output
>>>> reproducible, and also hides the unreproducible value (the source path
>>>> prefixes) from CFLAGS et. al. which many build tools (understandably)
>>>> embed as-is into their build output.
>>>
>>> I'd *really* avoid doing this with magic environment variables.  Make it
>>> a first class option to the compiler.  Yes, it means projects that want
>>> this behavior have to arrange to pass that flag to their compiler, but
>>> IMHO that's much preferred over environment variables.
>>>
>>> Jeff
>>>
>>
>> Hi Jeff,
>>
>> If by "first class option" you meant a command-line flag, GCC *already has*
>> that (-fdebug-prefix-map) and it wasn't enough to achieve reproducibility
>> in many cases we tested.
>
> Shouldn't -fdebug-prefix-map be updated to use the same syntax as
> BUILD_PATH_PREFIX_MAP?

-fdebug-prefix-map is a CLI option and can be given multiple times, each
flag given is in the form of $from=$to where $from can't contain a '='.

BUILD_PATH_PREFIX_MAP is a single envvar that encodes a list-of-pairs of
the form $to=$from:$to=$from with some escaping for flexibility and to
support things like windows paths.  Since it's a new envvar, Ian Jackson
suggested $to=$from to emphasise the reproducible ($to) part.  I liked
the idea so I implemented it like that.

(We did a lot of bikeshedding over on the rb-general mailing list about
the exact format and this is what we settled on, I'd like to avoid
getting into that again but would nevertheless do it, if it's necessary
to get this patch accepted.)

Because -fdebug-prefix-map currently only encodes one $from=$to pair, it
would be a very disruptive and highly backward-incompatible change to
make it use the same syntax as B_P_P_M.  A slightly less disruptive but
still backward-incompatible change would be to make it encode a single
$to=$from pair, but I don't really see the advantage to doing so - what
were your thoughts on this?

X

-- 
GPG: ed25519/56034877E1F87C35
GPG: rsa4096/1318EFAC5FBBDBCE
https://github.com/infinity0/pubkeys.git
[PATCH] Fix PR81148
This tries to fix all the UBSAN issues in the reassoc case of fold_binary by never generating any NEGATE_EXPRs from split_tree or association. This means we have to track minus_var/minus_con and appropriately transform. Bootstrapped and tested on x86_64-unknown-linux-gnu, applied to trunk. This FAILs gcc.dg/tree-ssa/reassoc-23.c because we now optimize this during FRE1 (after folding in a way that allows CSE to catch this). Slightly uglifying the testcase exposes a missed optimization in reassoc so I'm going to fix that separately. Richard. 2017-08-03 Richard Biener PR middle-end/81148 * fold-const.c (split_tree): Add minus_var and minus_con arguments, remove unused loc arg. Never generate NEGATE_EXPRs here but always use minus_*. (associate_trees): Assert we never associate with MINUS_EXPR and NULL first operand. Do not recurse for PLUS_EXPR operands when associating as MINUS_EXPR either. (fold_binary_loc): Track minus_var and minus_con. * c-c++-common/ubsan/pr81148.c: New testcase. Index: gcc/fold-const.c === --- gcc/fold-const.c(revision 250818) +++ gcc/fold-const.c(working copy) @@ -108,8 +108,6 @@ enum comparison_code { static bool negate_expr_p (tree); static tree negate_expr (tree); -static tree split_tree (location_t, tree, tree, enum tree_code, - tree *, tree *, tree *, int); static tree associate_trees (location_t, tree, tree, enum tree_code, tree); static enum comparison_code comparison_to_compcode (enum tree_code); static enum tree_code compcode_to_comparison (enum comparison_code); @@ -775,12 +773,14 @@ negate_expr (tree t) same type as IN, but they will have the same signedness and mode. */ static tree -split_tree (location_t loc, tree in, tree type, enum tree_code code, - tree *conp, tree *litp, tree *minus_litp, int negate_p) +split_tree (tree in, tree type, enum tree_code code, + tree *minus_varp, tree *conp, tree *minus_conp, + tree *litp, tree *minus_litp, int negate_p) { tree var = 0; - + *minus_varp = 0; *conp = 0; + *minus_conp = 0; *litp = 0; *minus_litp = 0; @@ -834,27 +834,19 @@ split_tree (location_t loc, tree in, tre if (neg_litp_p) *minus_litp = *litp, *litp = 0; if (neg_conp_p && *conp) - { - /* Convert to TYPE before negating. */ - *conp = fold_convert_loc (loc, type, *conp); - *conp = negate_expr (*conp); - } + *minus_conp = *conp, *conp = 0; if (neg_var_p && var) - { - /* Convert to TYPE before negating. */ - var = fold_convert_loc (loc, type, var); - var = negate_expr (var); - } + *minus_varp = var, var = 0; } else if (TREE_CONSTANT (in)) *conp = in; else if (TREE_CODE (in) == BIT_NOT_EXPR && code == PLUS_EXPR) { - /* -X - 1 is folded to ~X, undo that here. Do _not_ do this - when IN is constant. Convert to TYPE before negating. */ - *minus_litp = build_one_cst (type); - var = negate_expr (fold_convert_loc (loc, type, TREE_OPERAND (in, 0))); + /* -1 - X is folded to ~X, undo that here. Do _not_ do this + when IN is constant. */ + *litp = build_minus_one_cst (type); + *minus_varp = TREE_OPERAND (in, 0); } else var = in; @@ -866,17 +858,13 @@ split_tree (location_t loc, tree in, tre else if (*minus_litp) *litp = *minus_litp, *minus_litp = 0; if (*conp) - { - /* Convert to TYPE before negating. */ - *conp = fold_convert_loc (loc, type, *conp); - *conp = negate_expr (*conp); - } + *minus_conp = *conp, *conp = 0; + else if (*minus_conp) + *conp = *minus_conp, *minus_conp = 0; if (var) - { - /* Convert to TYPE before negating. 
*/ - var = fold_convert_loc (loc, type, var); - var = negate_expr (var); - } + *minus_varp = var, var = 0; + else if (*minus_varp) + var = *minus_varp, *minus_varp = 0; } if (*litp @@ -898,7 +886,10 @@ static tree associate_trees (location_t loc, tree t1, tree t2, enum tree_code code, tree type) { if (t1 == 0) -return t2; +{ + gcc_assert (t2 == 0 || code != MINUS_EXPR); + return t2; +} else if (t2 == 0) return t1; @@ -906,6 +897,7 @@ associate_trees (location_t loc, tree t1 try to fold this since we will have infinite recursion. But do deal with any NEGATE_EXPRs. */ if (TREE_CODE (t1) == code || TREE_CODE (t2) == code + || TREE_CODE (t1) == PLUS_EXPR || TREE_CODE (t2) == PLUS_EXPR || TREE_CODE (t1) == MINUS_EXPR || TREE_CODE (t2) == MINUS_EXPR) { if (code == PLUS_EXPR) @@ -9560,8 +9552,8 @@ fold_binary_loc (location_t loc, if ((! FLOAT_TYPE_P (type) || flag_associative_math)
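For readers without the PR open, a hedged sketch of the class of problem being addressed (an illustration only, not the committed c-c++-common/ubsan/pr81148.c testcase): when fold_binary reassociates an additive chain by materialising NEGATE_EXPRs, the introduced negation can overflow at run time even though the source expression as written never does, and -fsanitize=undefined then reports that compiler-introduced overflow. Recording operands on the minus_var/minus_con/minus_lit side keeps the subtraction explicit instead.

  /* Illustration only; INT_MIN has no representable negation, so any
     folding that synthesises -INT_MIN (or -b for a signed b) risks
     introducing undefined behaviour the user never wrote.  */
  #include <limits.h>

  int
  foo (int a, int b)
  {
    return a - (b + INT_MIN);
  }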
[PATCH] Fix gcc.dg/tree-ssa/reassoc-23.c
The following makes reassoc also break up subtracts for values used on the positive side of a MINUS_EXPR. Bootstrap and regtest running on x86_64-unknown-linux-gnu. Richard. 2017-08-03 Richard Biener * tree-ssa-reassoc.c (should_break_up_subtract): Also break up if the use is in USE - X. * gcc.dg/tree-ssa/reassoc-23.c: Adjust to fool early folding and CSE. Index: gcc/tree-ssa-reassoc.c === --- gcc/tree-ssa-reassoc.c (revision 250818) +++ gcc/tree-ssa-reassoc.c (working copy) @@ -4718,7 +4718,9 @@ should_break_up_subtract (gimple *stmt) && (immusestmt = get_single_immediate_use (lhs)) && is_gimple_assign (immusestmt) && (gimple_assign_rhs_code (immusestmt) == PLUS_EXPR - || gimple_assign_rhs_code (immusestmt) == MULT_EXPR)) + || (gimple_assign_rhs_code (immusestmt) == MINUS_EXPR + && gimple_assign_rhs1 (immusestmt) == lhs) + || gimple_assign_rhs_code (immusestmt) == MULT_EXPR)) return true; return false; } Index: gcc/testsuite/gcc.dg/tree-ssa/reassoc-23.c === --- gcc/testsuite/gcc.dg/tree-ssa/reassoc-23.c (revision 250818) +++ gcc/testsuite/gcc.dg/tree-ssa/reassoc-23.c (working copy) @@ -6,9 +6,10 @@ foo(unsigned int a, unsigned int b, unsi unsigned int e, unsigned int f, unsigned int g, unsigned int h) { /* Should be transformed into e = 20 */ - unsigned int i = (a + 9) + (c + 8); - unsigned int j = (-c + 1) + (-a + 2); - + unsigned int i = (a + 9); + unsigned int j = (-c + 1); + i += (c + 8); + j += (-a + 2); e = i + j; return e; }
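As a hedged illustration of the new case (a hypothetical example, not part of the testsuite): the result of the subtract below has a single use, and that use is the first, positive operand of another MINUS_EXPR, which the patched should_break_up_subtract now accepts, so reassoc may rewrite t = a - b as t = a + (-b) and reassociate the whole chain.

  unsigned int
  foo (unsigned int a, unsigned int b, unsigned int c)
  {
    unsigned int t = a - b;   /* single use of t ...  */
    return t - c;             /* ... as rhs1 of a MINUS_EXPR */
  }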
[PATCH] Avoid messing up DECL_ABSTRACT_ORIGIN in lto_symtab_prevail_decl
$subject because if we put anything decl-ish there (set_decl_origin_self...) then we run into lto_fixup_prevailing_decls ICEing as the state is now no longer the same as during the time we called mentions_vars_p. Bootstrap / regtest running on x86_64-unknown-linux-gnu. Richard. 2017-08-03 Richard Biener lto/ * lto-symtab.h (lto_symtab_prevail_decl): Do not use DECL_ABSTRACT_ORIGIN as flag we can end up using that. Instead use DECL_LANG_FLAG_0. (lto_symtab_prevail_decl): Likewise. Index: gcc/lto/lto-symtab.h === --- gcc/lto/lto-symtab.h(revision 250817) +++ gcc/lto/lto-symtab.h(working copy) @@ -23,7 +23,7 @@ extern tree lto_symtab_prevailing_decl ( extern tree lto_symtab_prevailing_virtual_decl (tree decl); /* Mark DECL to be previailed by PREVAILING. - Use DECL_ABSTRACT_ORIGIN and DECL_CHAIN as special markers; those do not + Use DECL_LANG_FLAG_0 and DECL_CHAIN as special markers; those do not disturb debug_tree and diagnostics. We are safe to modify them as we wish, because the declarations disappear from the IL after the merging. */ @@ -31,10 +31,10 @@ extern tree lto_symtab_prevailing_virtua inline void lto_symtab_prevail_decl (tree prevailing, tree decl) { - gcc_checking_assert (DECL_ABSTRACT_ORIGIN (decl) != error_mark_node); + gcc_checking_assert (! DECL_LANG_FLAG_0 (decl)); gcc_assert (TREE_PUBLIC (decl) || DECL_EXTERNAL (decl)); DECL_CHAIN (decl) = prevailing; - DECL_ABSTRACT_ORIGIN (decl) = error_mark_node; + DECL_LANG_FLAG_0 (decl) = 1; } /* Given the decl DECL, return the prevailing decl with the same name. */ @@ -42,7 +42,7 @@ lto_symtab_prevail_decl (tree prevailing inline tree lto_symtab_prevailing_decl (tree decl) { - if (DECL_ABSTRACT_ORIGIN (decl) == error_mark_node) + if (DECL_LANG_FLAG_0 (decl)) return DECL_CHAIN (decl); else {
Re: [PATCH][RFC] Make expansion of balanced binary trees of switches on tree level.
On Wed, Aug 2, 2017 at 1:51 PM, Richard Biener wrote: > On Wed, Aug 2, 2017 at 1:20 PM, Martin Liška wrote: >> Hello. >> >> After some discussions with Honza, I've decided to convert current code in >> stmt.c that >> is responsible for switch expansion. More precisely, I would like to convert >> the code >> to expand gswitch statements on tree level. Currently the newly created pass >> is executed >> at the end of tree optimizations. Hah, something I promissed myself (and others) to do years ago! Thanks thanks thanks! :-) >> My plan for future is to inspire in [1] and come up with some more >> sophisticated switch >> expansions. For that I've been working on a paper where I'll summarize >> statistics based >> on what I've collected in openSUSE distribution with specially instrumented >> GCC. If I'll be >> happy I can also fit in to schedule of this year's Cauldron with a talk. Sayle's paper is a good starting point. Also interesting: >> Patch can bootstrap on ppc64le-redhat-linux and survives regression tests. >> >> Thoughts? > > First of all thanks. > > I think part of switch expansion moved to switch-conversion some time ago > (emit_case_bit_tests). So maybe the full lowering should be in at least > the same source file and it should maybe applied earlier for a subset of > cases (very low number of cases for example). > > Did you base the code on the RTL expansion code or did you re-write it from > scratch? > > No further comments sofar. > > Richard. > >> Martin >> >> [1] https://www.nextmovesoftware.com/technology/SwitchOptimization.pdf >> >> gcc/ChangeLog: >> >> 2017-07-31 Martin Liska >> >> * Makefile.in: Add gimple-switch-low.o. >> * passes.def: Include pass_lower_switch. >> * stmt.c (dump_case_nodes): Remove and move to >> gimple-switch-low.c. >> (case_values_threshold): Likewise. >> (expand_switch_as_decision_tree_p): Likewise. >> (emit_case_decision_tree): Likewise. >> (expand_case): Likewise. >> (balance_case_nodes): Likewise. >> (node_has_low_bound): Likewise. >> (node_has_high_bound): Likewise. >> (node_is_bounded): Likewise. >> (emit_case_nodes): Likewise. >> * timevar.def: Add TV_TREE_SWITCH_LOWERING. >> * tree-pass.h: Add make_pass_lower_switch. >> * gimple-switch-low.c: New file. >> >> gcc/testsuite/ChangeLog: >> >> 2017-07-31 Martin Liska >> >> * gcc.dg/tree-prof/update-loopch.c: Scan patterns in >> switchlower. >> * gcc.dg/tree-ssa/vrp104.c: Likewise. >> --- >> gcc/Makefile.in|1 + >> gcc/gimple-switch-low.c| 1226 >> >> gcc/passes.def |1 + >> gcc/stmt.c | 861 - >> gcc/testsuite/gcc.dg/tree-prof/update-loopch.c | 10 +- >> gcc/testsuite/gcc.dg/tree-ssa/vrp104.c |2 +- >> gcc/timevar.def|1 + >> gcc/tree-pass.h|1 + >> 8 files changed, 1236 insertions(+), 867 deletions(-) >> create mode 100644 gcc/gimple-switch-low.c >> >>
Re: [RFC][PR 67336][PING^6] Verify pointers during stack unwind
On Wed, Aug 2, 2017 at 9:29 PM, Yury Gribov wrote: > On 02.08.2017 23:04, H.J. Lu wrote: >> >> On Wed, Aug 2, 2017 at 1:56 PM, Yury Gribov >> wrote: >>> >>> On 02.08.2017 21:48, H.J. Lu wrote: On Wed, Aug 2, 2017 at 1:39 PM, Yury Gribov wrote: > > > On 02.08.2017 20:02, Jeff Law wrote: >> >> >> >> On 08/02/2017 12:47 PM, Jakub Jelinek wrote: >>> >>> >>> >>> On Wed, Aug 02, 2017 at 12:38:13PM -0600, Jeff Law wrote: On 07/17/2017 01:23 AM, Yuri Gribov wrote: > > > > I've rebased the previous patch to trunk per Andrew's suggestion. > Original patch description/motivation/questions are in > https://gcc.gnu.org/ml/gcc-patches/2017-06/msg01869.html Is his stuff used for exception handling? If so, doesn't that make the performance a significant concern (ie that msync call?) >>> >>> >>> >>> >>> For the performance issue, I wonder if it wouldn't be better to just >>> compile the unwinder twice, basically include everything needed for >>> the >>> verification backtrace in a single *.c file with special defines, and >>> not export anything from that *.o file except the single entrypoint. >>> That would also handle the ABI problems. >> >> >> >> Yea. >> >> Given that the vast majority of the time the addresses are valid, I >> wonder if we could take advantage of that property to keep the >> overhead >> lower in the most common cases. >> >>> For the concurrent calls of other threads unmapping stacks of running >>> threads (that is where most of the verification goes against), I'm >>> afraid >>> that is simply non-solveable, even if you don't cache anything, it >>> will >>> still be racy. >> >> >> >> Absolutely -- I think solving this stuff 100% reliably without races >> isn't possible. I think the question is can we make this >> significantly >> better. > > > > > All, > > First of all, thanks for the feedback. This is indeed not a 100% > robust > solution, just something to allow tools like Asan to produce more sane > backtraces on crash (currently when stack is corrupted they would crash > in > the unwinder without printing at least partial backtrace). > FWIW, glibc 2.26 is changed to abort as soon as possible when stack is corrupted: https://sourceware.org/bugzilla/show_bug.cgi?id=21752 https://sourceware.org/bugzilla/show_bug.cgi?id=12189 There is very little point to unwind stack when stack is corrupted. >>> >>> >>> >>> I'd argue that situation with __stack_chk_fail is very special. It's >>> used >>> in production code where you'd want to halt as soon as corruption is >>> detected to prevent further damage so even printing message is an >>> additional >>> risk. Debugging tools (e.g. sanitizers) are different and would prefer >>> to >>> print as many survived frames as possible. >>> >> >> But stack unwinder in libgcc is primarily for production use, not for >> debugging. > > > I can see it used in different projects to print backtraces on error (bind9, > SimGrid, sanitizers). backtrace(3) also sits on top of libgcc unwinder and > seems to be meant primarily for debugging (at least many projects use it for > this purpose: > https://codesearch.debian.net/search?q=backtrace_symbols&perpkg=1). Same for > libbacktrace which, as its README mentions, is meant to "print a detailed > backtrace when an error occurs or to gather detailed profiling information". But it shouldn't be performed on a corrupted stack. When a stack is corrupted, there is just no way to reliably unwind it. It isn't a problem for a debugger which is designed to analyze corrupted program. 
> Validation is also performed by libunwind (at least in some cases) which > libgcc unwinder mimics. > > >> Use it to unwind corrupted stack isn't appropriate. Santizer can try >> similar >> similar to valgrind to invoke a debugger to unwind corrupted stack. > > > -Y -- H.J.
Re: [RFC][PR 67336][PING^6] Verify pointers during stack unwind
On Thu, Aug 3, 2017 at 1:22 PM, H.J. Lu wrote: > On Wed, Aug 2, 2017 at 9:29 PM, Yury Gribov > wrote: >> On 02.08.2017 23:04, H.J. Lu wrote: >>> >>> On Wed, Aug 2, 2017 at 1:56 PM, Yury Gribov >>> wrote: On 02.08.2017 21:48, H.J. Lu wrote: > > > On Wed, Aug 2, 2017 at 1:39 PM, Yury Gribov > > wrote: >> >> >> On 02.08.2017 20:02, Jeff Law wrote: >>> >>> >>> >>> On 08/02/2017 12:47 PM, Jakub Jelinek wrote: On Wed, Aug 02, 2017 at 12:38:13PM -0600, Jeff Law wrote: > > > > On 07/17/2017 01:23 AM, Yuri Gribov wrote: >> >> >> >> I've rebased the previous patch to trunk per Andrew's suggestion. >> Original patch description/motivation/questions are in >> https://gcc.gnu.org/ml/gcc-patches/2017-06/msg01869.html > > > > Is his stuff used for exception handling? If so, doesn't that make > the > performance a significant concern (ie that msync call?) For the performance issue, I wonder if it wouldn't be better to just compile the unwinder twice, basically include everything needed for the verification backtrace in a single *.c file with special defines, and not export anything from that *.o file except the single entrypoint. That would also handle the ABI problems. >>> >>> >>> >>> Yea. >>> >>> Given that the vast majority of the time the addresses are valid, I >>> wonder if we could take advantage of that property to keep the >>> overhead >>> lower in the most common cases. >>> For the concurrent calls of other threads unmapping stacks of running threads (that is where most of the verification goes against), I'm afraid that is simply non-solveable, even if you don't cache anything, it will still be racy. >>> >>> >>> >>> Absolutely -- I think solving this stuff 100% reliably without races >>> isn't possible. I think the question is can we make this >>> significantly >>> better. >> >> >> >> >> All, >> >> First of all, thanks for the feedback. This is indeed not a 100% >> robust >> solution, just something to allow tools like Asan to produce more sane >> backtraces on crash (currently when stack is corrupted they would crash >> in >> the unwinder without printing at least partial backtrace). >> > > FWIW, glibc 2.26 is changed to abort as soon as possible when stack is > corrupted: > > https://sourceware.org/bugzilla/show_bug.cgi?id=21752 > https://sourceware.org/bugzilla/show_bug.cgi?id=12189 > > There is very little point to unwind stack when stack is corrupted. I'd argue that situation with __stack_chk_fail is very special. It's used in production code where you'd want to halt as soon as corruption is detected to prevent further damage so even printing message is an additional risk. Debugging tools (e.g. sanitizers) are different and would prefer to print as many survived frames as possible. >>> >>> But stack unwinder in libgcc is primarily for production use, not for >>> debugging. >> >> >> I can see it used in different projects to print backtraces on error (bind9, >> SimGrid, sanitizers). backtrace(3) also sits on top of libgcc unwinder and >> seems to be meant primarily for debugging (at least many projects use it for >> this purpose: >> https://codesearch.debian.net/search?q=backtrace_symbols&perpkg=1). Same for >> libbacktrace which, as its README mentions, is meant to "print a detailed >> backtrace when an error occurs or to gather detailed profiling information". > > But it shouldn't be performed on a corrupted stack. When a stack is > corrupted, there is just no way to reliably unwind it. Why not? Attached patch shows that it's possible (up to a point of corruption of course). 
> It isn't a problem >> for a debugger which is designed to analyze corrupted program. Yes but forcing all applications that would like to print a helpful message on stack corruption to employ gdb sounds less efficient than providing fail-safe APIs in the standard library... >> Validation is also performed by libunwind (at least in some cases) which >> libgcc unwinder mimics. >> >>> Use it to unwind corrupted stack isn't appropriate. Santizer can try >>> similar >>> similar to valgrind to invoke a debugger to unwind corrupted stack. >> >> -Y > > -- > H.J.
Re: [PATCH][RFC] Make expansion of balanced binary trees of switches on tree level.
On Wed, Aug 2, 2017 at 1:20 PM, Martin Liška wrote: > Hello. > > After some discussions with Honza, I've decided to convert current code in > stmt.c that > is responsible for switch expansion. More precisely, I would like to convert > the code > to expand gswitch statements on tree level. Currently the newly created pass > is executed > at the end of tree optimizations. Hah, something I promised myself (and others) to do years ago! Thanks thanks thanks! :-) The end of the gimple optimizations seems late for the lowering. Before there were gimple optimizations, switch lowering was part of the first compiler pass (to generate RTL from the front end) *before* all code transformation passes ("optimizations" ;-). Because the lowering of switch statements was rewritten as a gimple lowering pass, it now runs *after* optimizations. But to be honest, I can't think of any optimization opportunities exposed by lowering earlier than at the end of gimple optimizations. Years ago there was some RTL jump threading still done after lowering of small switch statements, but I can't find the related PRs anymore. > My plan for future is to inspire in [1] and come up with some more > sophisticated switch > expansions. For that I've been working on a paper where I'll summarize > statistics based > on what I've collected in openSUSE distribution with specially instrumented > GCC. If I'll be > happy I can also fit in to schedule of this year's Cauldron with a talk. Sayle's paper is a good starting point. Also interesting: http://llvm.org/devmtg/2017-02-04/Efficient-clustering-of-case-statements-for-indirect-branch-prediction.pdf About adjusting the size of jump tables to the capabilities of the CPU branch predictor. Mixed results but something to consider. Kannan & Proebsting, "CORRECTION TO ‘PRODUCING GOOD CODE FOR THE CASE STATEMENT’" About finding "clusters" of given density within the target values of the switch statement. The clustering algorithm as presented is N^2 in the number of case labels, but the idea is interesting and a simplified, cheaper approach may be possible. This is already used in LLVM, it seems. The real challenge will be to figure out how to pick from all these different approaches the right ones to lower a single switch statement. Ciao! Steven
Re: [PATCH][RFC] Make expansion of balanced binary trees of switches on tree level.
On Thu, Aug 3, 2017 at 2:52 PM, Steven Bosscher wrote: > On Wed, Aug 2, 2017 at 1:20 PM, Martin Liška wrote: >> Hello. >> >> After some discussions with Honza, I've decided to convert current code in >> stmt.c that >> is responsible for switch expansion. More precisely, I would like to convert >> the code >> to expand gswitch statements on tree level. Currently the newly created pass >> is executed >> at the end of tree optimizations. > > Hah, something I promissed myself (and others) to do years ago! Thanks > thanks thanks! :-) > > The end of the gimple optimizations is seems late for the lowering. > Before there were gimple optimizations, switch lowering was part of > the first compiler pass (to generate RTL from the front end) *before* > all code transformation passes ("optimizations" ;-). Because the > lowering of switch statements was rewritten as a gimple lowering pass, > it now runs *after* optimizations. But to be honest, I can't think of > any optimization opportunities exposed by lowering earlier than at the > end of gimple optimizations. Years ago there was some RTL jump > threading still done after lowering of small switch statements, but I > can't find the related PRs anymore. I think the main reason for not doing it early is the benefit is small (unless it is GIMPLE optimizations triggering) and we can't get rid of switches completely because we eventually have to support casei RTL expansion. (and no, computed goto with an array of label addresses at GIMPLE is really too ugly ;)) Richard. > >> My plan for future is to inspire in [1] and come up with some more >> sophisticated switch >> expansions. For that I've been working on a paper where I'll summarize >> statistics based >> on what I've collected in openSUSE distribution with specially instrumented >> GCC. If I'll be >> happy I can also fit in to schedule of this year's Cauldron with a talk. > > Sayle's paper is a good starting point. Also interesting: > > http://llvm.org/devmtg/2017-02-04/Efficient-clustering-of-case-statements-for-indirect-branch-prediction.pdf > About adjusting the size of jump tables to the capabilities of the CPU > branch predictor. Mixed results but something to consider. > > Kannan & Proebsting, "CORRECTION TO ‘PRODUCING GOOD CODE FOR THE CASE > STATEMENT’" > About finding "clusers" of given density within the target values of > the switch statement. The clustering algorithm as presented is N^2 in > the number of case labels, but the idea is interesting and a > simplified, cheaper approach may be possible. This is already used in > LLVM, it seems. > > The real challenge will be to figure out how to pick from all these > different approaches the right ones to lower a single switch > statement. > > Ciao! > Steven
Re: [RFC][PR 67336][PING^6] Verify pointers during stack unwind
On Thu, Aug 3, 2017 at 5:52 AM, Yury Gribov wrote: > On Thu, Aug 3, 2017 at 1:22 PM, H.J. Lu wrote: >> On Wed, Aug 2, 2017 at 9:29 PM, Yury Gribov >> wrote: >>> On 02.08.2017 23:04, H.J. Lu wrote: On Wed, Aug 2, 2017 at 1:56 PM, Yury Gribov wrote: > > On 02.08.2017 21:48, H.J. Lu wrote: >> >> >> On Wed, Aug 2, 2017 at 1:39 PM, Yury Gribov >> >> wrote: >>> >>> >>> On 02.08.2017 20:02, Jeff Law wrote: On 08/02/2017 12:47 PM, Jakub Jelinek wrote: > > > > On Wed, Aug 02, 2017 at 12:38:13PM -0600, Jeff Law wrote: >> >> >> >> On 07/17/2017 01:23 AM, Yuri Gribov wrote: >>> >>> >>> >>> I've rebased the previous patch to trunk per Andrew's suggestion. >>> Original patch description/motivation/questions are in >>> https://gcc.gnu.org/ml/gcc-patches/2017-06/msg01869.html >> >> >> >> Is his stuff used for exception handling? If so, doesn't that make >> the >> performance a significant concern (ie that msync call?) > > > > > For the performance issue, I wonder if it wouldn't be better to just > compile the unwinder twice, basically include everything needed for > the > verification backtrace in a single *.c file with special defines, and > not export anything from that *.o file except the single entrypoint. > That would also handle the ABI problems. Yea. Given that the vast majority of the time the addresses are valid, I wonder if we could take advantage of that property to keep the overhead lower in the most common cases. > For the concurrent calls of other threads unmapping stacks of running > threads (that is where most of the verification goes against), I'm > afraid > that is simply non-solveable, even if you don't cache anything, it > will > still be racy. Absolutely -- I think solving this stuff 100% reliably without races isn't possible. I think the question is can we make this significantly better. >>> >>> >>> >>> >>> All, >>> >>> First of all, thanks for the feedback. This is indeed not a 100% >>> robust >>> solution, just something to allow tools like Asan to produce more sane >>> backtraces on crash (currently when stack is corrupted they would crash >>> in >>> the unwinder without printing at least partial backtrace). >>> >> >> FWIW, glibc 2.26 is changed to abort as soon as possible when stack is >> corrupted: >> >> https://sourceware.org/bugzilla/show_bug.cgi?id=21752 >> https://sourceware.org/bugzilla/show_bug.cgi?id=12189 >> >> There is very little point to unwind stack when stack is corrupted. > > > > I'd argue that situation with __stack_chk_fail is very special. It's > used > in production code where you'd want to halt as soon as corruption is > detected to prevent further damage so even printing message is an > additional > risk. Debugging tools (e.g. sanitizers) are different and would prefer > to > print as many survived frames as possible. > But stack unwinder in libgcc is primarily for production use, not for debugging. >>> >>> >>> I can see it used in different projects to print backtraces on error (bind9, >>> SimGrid, sanitizers). backtrace(3) also sits on top of libgcc unwinder and >>> seems to be meant primarily for debugging (at least many projects use it for >>> this purpose: >>> https://codesearch.debian.net/search?q=backtrace_symbols&perpkg=1). Same for >>> libbacktrace which, as its README mentions, is meant to "print a detailed >>> backtrace when an error occurs or to gather detailed profiling information". >> >> But it shouldn't be performed on a corrupted stack. When a stack is >> corrupted, there is just no way to reliably unwind it. > > Why not? 
Attached patch shows that it's possible (up to a point of > corruption of course). > >> It isn't a problem >> for a debugger which is designed to analyze corrupted program. >> >> Yes but forcing all applications that would like to print helpful >> message on stack corruption to employ gdb sounds less efficient that >> providing fail-safe APIs in standard library... ^^^ There is no such thing as fail-safe on a corrupted stack. A bad, but valid, address can be put on a corrupted stack. When you unwind it, the unwinder may not crash, but it can give you a bogus stack backtrace. -- H.J.
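For readers following the thread, a rough sketch of the kind of address validation being debated (this is not the posted patch; the helper name and structure are made up): probe whether a pointer lies in currently mapped memory before the unwinder dereferences it. As noted earlier in the thread, this is both costly (a system call per probe) and inherently racy, since another thread can unmap the page between the check and the use; it also cannot tell a bad-but-mapped address from a good one, which is H.J.'s point above.

  #include <stdbool.h>
  #include <stdint.h>
  #include <unistd.h>
  #include <sys/mman.h>

  /* Hypothetical helper: return true if ADDR falls in a mapped page.
     msync fails with ENOMEM when the given range is not mapped.  */
  static bool
  address_is_mapped (const void *addr)
  {
    uintptr_t pagesize = (uintptr_t) sysconf (_SC_PAGESIZE);
    void *page = (void *) ((uintptr_t) addr & ~(pagesize - 1));
    return msync (page, pagesize, MS_ASYNC) == 0;
  }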
Re: [PATCH][RFC] Make expansion of balanced binary trees of switches on tree level.
> On Thu, Aug 3, 2017 at 2:52 PM, Steven Bosscher wrote: > > On Wed, Aug 2, 2017 at 1:20 PM, Martin Liška wrote: > >> Hello. > >> > >> After some discussions with Honza, I've decided to convert current code in > >> stmt.c that > >> is responsible for switch expansion. More precisely, I would like to > >> convert the code > >> to expand gswitch statements on tree level. Currently the newly created > >> pass is executed > >> at the end of tree optimizations. > > > > Hah, something I promissed myself (and others) to do years ago! Thanks > > thanks thanks! :-) > > > > The end of the gimple optimizations is seems late for the lowering. > > Before there were gimple optimizations, switch lowering was part of > > the first compiler pass (to generate RTL from the front end) *before* > > all code transformation passes ("optimizations" ;-). Because the > > lowering of switch statements was rewritten as a gimple lowering pass, > > it now runs *after* optimizations. But to be honest, I can't think of > > any optimization opportunities exposed by lowering earlier than at the > > end of gimple optimizations. Years ago there was some RTL jump > > threading still done after lowering of small switch statements, but I > > can't find the related PRs anymore. > > I think the main reason for not doing it early is the benefit is small > (unless it is GIMPLE optimizations triggering) and we can't get rid of > switches completely because we eventually have to support casei RTL expansion. > (and no, computed goto with an array of label addresses at GIMPLE is really > too ugly ;)) Also later in optimization you have better idea of value ranges and you can shape the decision tree better for given context. In theory at least ;) One thing that would benefit from earlier expansion is code size/runtime estimation for inlining. But perhaps we could just run the algorithm to see how hard the switch is when we decide on its size (what is there right now is bit lame) Honza > > Richard.
Re: [PATCH][RFC] Make expansion of balanced binary trees of switches on tree level.
On Thu, Aug 3, 2017 at 2:56 PM, Richard Biener wrote: > I think the main reason for not doing it early is the benefit is small > (unless it is GIMPLE optimizations triggering) Agree. > and we can't get rid of > switches completely because we eventually have to support casei RTL expansion. > (and no, computed goto with an array of label addresses at GIMPLE is really > too ugly ;)) What I would have done, is lower all gswitch statements that are to be lowered to something other than a tablejump. So by the time you get to RTL expansion, all remaining gswitch statements would be tablejump or casesi. Ciao! Steven
Re: [PATCH 1/1] sparc: support for -mmisalign in the SPARC M8
> On Aug 2, 2017, at 6:17 PM, David Miller wrote: > > From: Qing Zhao > Date: Wed, 2 Aug 2017 14:41:51 -0500 > >> so, could you please specify what kind of side effects will have >> when set STRICT_ALIGNMENT to true on TARGET_MISALIGN? > > Why don't you read the code rather than just relying upon what > high level description is given by the documentation instead? I read the code before making the change; that's why I'm asking you to specify clearly the bad side effects that I haven't considered yet. Thanks. Qing > > Thanks.
Re: [RFC][PR 67336][PING^6] Verify pointers during stack unwind
On Thu, Aug 3, 2017 at 2:01 PM, H.J. Lu wrote: > On Thu, Aug 3, 2017 at 5:52 AM, Yury Gribov > wrote: >> On Thu, Aug 3, 2017 at 1:22 PM, H.J. Lu wrote: >>> On Wed, Aug 2, 2017 at 9:29 PM, Yury Gribov >>> wrote: On 02.08.2017 23:04, H.J. Lu wrote: > > On Wed, Aug 2, 2017 at 1:56 PM, Yury Gribov > wrote: >> >> On 02.08.2017 21:48, H.J. Lu wrote: >>> >>> >>> On Wed, Aug 2, 2017 at 1:39 PM, Yury Gribov >>> >>> wrote: On 02.08.2017 20:02, Jeff Law wrote: > > > > On 08/02/2017 12:47 PM, Jakub Jelinek wrote: >> >> >> >> On Wed, Aug 02, 2017 at 12:38:13PM -0600, Jeff Law wrote: >>> >>> >>> >>> On 07/17/2017 01:23 AM, Yuri Gribov wrote: I've rebased the previous patch to trunk per Andrew's suggestion. Original patch description/motivation/questions are in https://gcc.gnu.org/ml/gcc-patches/2017-06/msg01869.html >>> >>> >>> >>> Is his stuff used for exception handling? If so, doesn't that make >>> the >>> performance a significant concern (ie that msync call?) >> >> >> >> >> For the performance issue, I wonder if it wouldn't be better to just >> compile the unwinder twice, basically include everything needed for >> the >> verification backtrace in a single *.c file with special defines, and >> not export anything from that *.o file except the single entrypoint. >> That would also handle the ABI problems. > > > > Yea. > > Given that the vast majority of the time the addresses are valid, I > wonder if we could take advantage of that property to keep the > overhead > lower in the most common cases. > >> For the concurrent calls of other threads unmapping stacks of running >> threads (that is where most of the verification goes against), I'm >> afraid >> that is simply non-solveable, even if you don't cache anything, it >> will >> still be racy. > > > > Absolutely -- I think solving this stuff 100% reliably without races > isn't possible. I think the question is can we make this > significantly > better. All, First of all, thanks for the feedback. This is indeed not a 100% robust solution, just something to allow tools like Asan to produce more sane backtraces on crash (currently when stack is corrupted they would crash in the unwinder without printing at least partial backtrace). >>> >>> FWIW, glibc 2.26 is changed to abort as soon as possible when stack is >>> corrupted: >>> >>> https://sourceware.org/bugzilla/show_bug.cgi?id=21752 >>> https://sourceware.org/bugzilla/show_bug.cgi?id=12189 >>> >>> There is very little point to unwind stack when stack is corrupted. >> >> >> >> I'd argue that situation with __stack_chk_fail is very special. It's >> used >> in production code where you'd want to halt as soon as corruption is >> detected to prevent further damage so even printing message is an >> additional >> risk. Debugging tools (e.g. sanitizers) are different and would prefer >> to >> print as many survived frames as possible. >> > > But stack unwinder in libgcc is primarily for production use, not for > debugging. I can see it used in different projects to print backtraces on error (bind9, SimGrid, sanitizers). backtrace(3) also sits on top of libgcc unwinder and seems to be meant primarily for debugging (at least many projects use it for this purpose: https://codesearch.debian.net/search?q=backtrace_symbols&perpkg=1). Same for libbacktrace which, as its README mentions, is meant to "print a detailed backtrace when an error occurs or to gather detailed profiling information". >>> >>> But it shouldn't be performed on a corrupted stack. 
When a stack is >>> corrupted, there is just no way to reliably unwind it. >> >> Why not? Attached patch shows that it's possible (up to a point of >> corruption of course). >> >>> It isn't a problem >>> for a debugger which is designed to analyze corrupted program. >> >> Yes but forcing all applications that would like to print helpful >> message on stack corruption to employ gdb sounds less efficient that >> providing fail-safe APIs in standard library... > ^^^ There is no such a thing of fail-safe on > corrupted > stack. A bad, but valid address, can be put on corrupted stack. When > you unwind it, the unwinder may not cr
Re: [PATCH 6/6] qsort comparator consistency checking
On Wed, 2 Aug 2017, Jeff Law wrote: > >> Well, there's not *that* many qsort calls. My quick grep shows 94 and > >> its a very mechanical change. Then a poison in system.h to ensure raw > >> calls to qsort don't return. Note that poisoning qsort outlaws vec::qsort too; it would need to be mass-renamed as well (to vec::sort, presumably). It seems there are 83 or more calls to vec::qsort. > > Any suggestion for the non-poisoned replacement? xqsort? gcc_qsort? > qsort_chk/qsort_nochk for checked and non-checked? I believe qsort_chk isn't appropriate: checking is not explicitly a part of the interface, and we never use _chk in potentially-checking tree and RTL accessors either. Thanks. Alexander
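A minimal sketch of the scheme being discussed (the wrapper name gcc_qsort and its placement are part of the proposal here, not an existing interface): route all sorts through one wrapper, which a checking build can extend with comparator-consistency verification, then poison the raw identifier so new direct calls fail to compile.

  #include <stdlib.h>

  /* Hypothetical wrapper: a checking build could verify here that CMP
     behaves consistently (anti-symmetric, transitive) before sorting.  */
  void
  gcc_qsort (void *base, size_t nmemb, size_t size,
             int (*cmp) (const void *, const void *))
  {
    qsort (base, nmemb, size, cmp);
  }

  /* In system.h, after the last legitimate use of the raw call:
       #pragma GCC poison qsort
     (vec::qsort would have to be renamed first, e.g. to vec::sort.)  */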
Re: PR78888 - add value range info for tolower/toupper
On Thu, 3 Aug 2017, Jakub Jelinek wrote: > In any case, you should probably investigate all the locales say in glibc or > some other big locale repository whether tolower/toupper have the expected > properties there. They don't. In tr_TR.UTF-8, toupper ('i') == 'i', because 'İ', the correct uppercase version (as returned in locale tr_TR.ISO-8859-9), is a multibyte character and toupper can only return single-byte characters. -- Joseph S. Myers jos...@codesourcery.com
[PATCH, gcc/Makefile.in] Fix typo on ref to multi_dir variable in install-mkheaders
Hello, The attached patch fixes a typo in a Makefile rule, which we noticed while working on fixincludes for a specific target. Bootstrapped and regression tested fine on x86_64-linux. OK to commit ? Thanks much in advance for your feedback, With Kind Regards, Olivier 2017-08-04 Douglas Rupp * Makefile.in (install-mkheaders): Fix typo, where the multi_dir variable was referenced as multidir in command. fixinc-multidir.diff Description: Binary data
Re: PR78888 - add value range info for tolower/toupper
On Thu, Aug 03, 2017 at 02:38:35PM +, Joseph Myers wrote: > On Thu, 3 Aug 2017, Jakub Jelinek wrote: > > > In any case, you should probably investigate all the locales say in glibc or > > some other big locale repository whether tolower/toupper have the expected > > properties there. > > They don't. In tr_TR.UTF-8, toupper ('i') == 'i', because 'İ', the > correct uppercase version (as returned in locale tr_TR.ISO-8859-9), is a > multibyte character and toupper can only return single-byte characters. Indeed,

  #include <locale.h>
  #include <ctype.h>

  int
  main ()
  {
    setlocale (LC_ALL, "");
    int i;
    for (i = -1000; i < 1000; i++)
      if (tolower (i) >= 'A' && tolower (i) <= 'Z')
        __builtin_abort ();
      else if (toupper (i) >= 'a' && toupper (i) <= 'z')
        __builtin_abort ();
    return 0;
  }

fails for LC_ALL=tr_TR.UTF-8, because tolower ('I') is 'I'. Not to mention that the result is unspecified if the functions are called with a value outside of the range of unsigned char or EOF. Therefore, this optimization is invalid. Jakub
Re: [RFC][PR 67336][PING^6] Verify pointers during stack unwind
On Thu, Aug 3, 2017 at 7:17 AM, Yury Gribov wrote: > On Thu, Aug 3, 2017 at 2:01 PM, H.J. Lu wrote: >> On Thu, Aug 3, 2017 at 5:52 AM, Yury Gribov >> wrote: >>> On Thu, Aug 3, 2017 at 1:22 PM, H.J. Lu wrote: On Wed, Aug 2, 2017 at 9:29 PM, Yury Gribov wrote: > On 02.08.2017 23:04, H.J. Lu wrote: >> >> On Wed, Aug 2, 2017 at 1:56 PM, Yury Gribov >> wrote: >>> >>> On 02.08.2017 21:48, H.J. Lu wrote: On Wed, Aug 2, 2017 at 1:39 PM, Yury Gribov wrote: > > > On 02.08.2017 20:02, Jeff Law wrote: >> >> >> >> On 08/02/2017 12:47 PM, Jakub Jelinek wrote: >>> >>> >>> >>> On Wed, Aug 02, 2017 at 12:38:13PM -0600, Jeff Law wrote: On 07/17/2017 01:23 AM, Yuri Gribov wrote: > > > > I've rebased the previous patch to trunk per Andrew's suggestion. > Original patch description/motivation/questions are in > https://gcc.gnu.org/ml/gcc-patches/2017-06/msg01869.html Is his stuff used for exception handling? If so, doesn't that make the performance a significant concern (ie that msync call?) >>> >>> >>> >>> >>> For the performance issue, I wonder if it wouldn't be better to just >>> compile the unwinder twice, basically include everything needed for >>> the >>> verification backtrace in a single *.c file with special defines, >>> and >>> not export anything from that *.o file except the single entrypoint. >>> That would also handle the ABI problems. >> >> >> >> Yea. >> >> Given that the vast majority of the time the addresses are valid, I >> wonder if we could take advantage of that property to keep the >> overhead >> lower in the most common cases. >> >>> For the concurrent calls of other threads unmapping stacks of >>> running >>> threads (that is where most of the verification goes against), I'm >>> afraid >>> that is simply non-solveable, even if you don't cache anything, it >>> will >>> still be racy. >> >> >> >> Absolutely -- I think solving this stuff 100% reliably without races >> isn't possible. I think the question is can we make this >> significantly >> better. > > > > > All, > > First of all, thanks for the feedback. This is indeed not a 100% > robust > solution, just something to allow tools like Asan to produce more sane > backtraces on crash (currently when stack is corrupted they would > crash > in > the unwinder without printing at least partial backtrace). > FWIW, glibc 2.26 is changed to abort as soon as possible when stack is corrupted: https://sourceware.org/bugzilla/show_bug.cgi?id=21752 https://sourceware.org/bugzilla/show_bug.cgi?id=12189 There is very little point to unwind stack when stack is corrupted. >>> >>> >>> >>> I'd argue that situation with __stack_chk_fail is very special. It's >>> used >>> in production code where you'd want to halt as soon as corruption is >>> detected to prevent further damage so even printing message is an >>> additional >>> risk. Debugging tools (e.g. sanitizers) are different and would prefer >>> to >>> print as many survived frames as possible. >>> >> >> But stack unwinder in libgcc is primarily for production use, not for >> debugging. > > > I can see it used in different projects to print backtraces on error > (bind9, > SimGrid, sanitizers). backtrace(3) also sits on top of libgcc unwinder > and > seems to be meant primarily for debugging (at least many projects use it > for > this purpose: > https://codesearch.debian.net/search?q=backtrace_symbols&perpkg=1). Same > for > libbacktrace which, as its README mentions, is meant to "print a detailed > backtrace when an error occurs or to gather detailed profiling > information". 
But it shouldn't be performed on a corrupted stack. When a stack is corrupted, there is just no way to reliably unwind it. >>> >>> Why not? Attached patch shows that it's possible (up to a point of >>> corruption of course). >>> It isn't a problem for a debugger which is designed to analyze corrupted program. >>> >>> Yes but forcing all applications that would like to print helpful >>> message on stack corruption to employ gdb sounds less efficient that >>> providing f
Re: [PATCH], PR target/81593, Optimize PowerPC vector sets coming from a vector extracts
Hi Mike, On Wed, Aug 02, 2017 at 10:28:55AM -0400, Michael Meissner wrote: > On Fri, Jul 28, 2017 at 04:08:50PM -0500, Segher Boessenkool wrote: > > I think calling this with the rtx elementN args makes this only more > > complicated (the function comment doesn't say what they are or what > > NULL means, btw). You didn't handle the first part of this as far as I see? It's the big complicating issue here. > + If ELEMENT1 is null, use the top 64-bit double word of ARG1. If it is > + non-NULL, it is a 0 or 1 constant that gives the vector element number to > + use for extracting the 64-bit double word from ARG1. > + > + If ELEMENT2 is null, use the top 64-bit double word of ARG2. If it is > + non-NULL, it is a 0 or 1 constant that gives the vector element number to > + use for extracting the 64-bit double word from ARG2. > + > + The element number is based on the user element ordering, set by the > + endianess and by the -maltivec={le,be} options. */ ("endianness", two n's). I don't like using NULL as a magic value at all; it does not simplify this interface, it complicates it instead. Can you move the "which half is high" decision to the callers? Segher
[PATCH,AIX] 2nd part of support for DWARF debug sections in XCOFF.
Description: * This patch provides the 2nd part of the support for DWARF debug sections in XCOFF. ImportedSymbols() and ImportedLibraries() functions are not implemented. Tests: * AIX: Build: SUCCESS - build made by means of gmake in trunk. - patch generated by: cd gcc-svn-trunk/ svn diff libgo ChangeLog: * libgo/Makefile.am * libgo/Makefile.in * libgo/go/cmd/cgo/gcc.go * libgo/go/cmd/cgo/out.go * libgo/go/debug/dwarf/open.go * libgo/go/debug/xcoff/file.go * libgo/go/debug/xcoff/xcoff.go * libgo/go/go/build/deps_test.go Cordialement, Tony Reix Bull - ATOS IBM Coop Architect & Technical Leader Office : +33 (0) 4 76 29 72 67 1 rue de Provence - 38432 Échirolles - France www.atos.netIndex: libgo/Makefile.am === --- ./libgo/Makefile.am (revision 250630) +++ ./libgo/Makefile.am (working copy) @@ -215,7 +215,8 @@ toolexeclibgodebug_DATA = \ debug/gosym.gox \ debug/macho.gox \ debug/pe.gox \ - debug/plan9obj.gox + debug/plan9obj.gox \ + debug/xcoff.gox toolexeclibgoencodingdir = $(toolexeclibgodir)/encoding @@ -698,6 +699,7 @@ PACKAGES = \ debug/macho \ debug/pe \ debug/plan9obj \ + debug/xcoff \ encoding \ encoding/ascii85 \ encoding/asn1 \ @@ -1209,6 +1211,7 @@ TEST_PACKAGES = \ debug/macho/check \ debug/pe/check \ debug/plan9obj/check \ + debug/xcoff/check \ encoding/ascii85/check \ encoding/asn1/check \ encoding/base32/check \ Index: libgo/Makefile.in === --- ./libgo/Makefile.in (revision 250630) +++ ./libgo/Makefile.in (working copy) @@ -608,7 +608,8 @@ toolexeclibgodebug_DATA = \ debug/gosym.gox \ debug/macho.gox \ debug/pe.gox \ - debug/plan9obj.gox + debug/plan9obj.gox \ + debug/xcoff.gox toolexeclibgoencodingdir = $(toolexeclibgodir)/encoding toolexeclibgoencoding_DATA = \ @@ -861,6 +862,7 @@ PACKAGES = \ debug/macho \ debug/pe \ debug/plan9obj \ + debug/xcoff \ encoding \ encoding/ascii85 \ encoding/asn1 \ @@ -1239,6 +1241,7 @@ TEST_PACKAGES = \ debug/macho/check \ debug/pe/check \ debug/plan9obj/check \ + debug/xcoff/check \ encoding/ascii85/check \ encoding/asn1/check \ encoding/base32/check \ Index: libgo/go/cmd/cgo/gcc.go === --- ./libgo/go/cmd/cgo/gcc.go (revision 250630) +++ ./libgo/go/cmd/cgo/gcc.go (working copy) @@ -13,6 +13,7 @@ import ( "debug/elf" "debug/macho" "debug/pe" + "debug/xcoff" "encoding/binary" "errors" "flag" @@ -1230,6 +1231,10 @@ func (p *Package) gccMachine() []string { return []string{"-mabi=64"} case "mips", "mipsle": return []string{"-mabi=32"} + case "ppc64": + if goos == "aix" { + return []string{"-maix64"} + } } return nil } @@ -1360,7 +1365,29 @@ func (p *Package) gccDebug(stdin []byte) (*dwarf.D return d, binary.LittleEndian, data } - fatalf("cannot parse gcc output %s as ELF, Mach-O, PE object", gccTmp()) + if f, err := xcoff.Open(gccTmp()); err == nil { + defer f.Close() + d, err := f.DWARF() + if err != nil { + fatalf("cannot load DWARF output from %s: %v", gccTmp(), err) + } + var data []byte + for _, s := range f.Symbols { + if isDebugData(s.Name) { +if i := int(s.SectionNumber) - 1; 0 <= i && i < len(f.Sections) { + sect := f.Sections[i] + if s.Value < sect.Size { + if sdat, err := sect.Data(); err == nil { + data = sdat[s.Value:] + } + } +} + } + } + return d, binary.BigEndian, data + } + + fatalf("cannot parse gcc output %s as ELF, Mach-O, PE, XCOFF object", gccTmp()) panic("not reached") } Index: libgo/go/cmd/cgo/out.go === --- ./libgo/go/cmd/cgo/out.go (revision 250630) +++ ./libgo/go/cmd/cgo/out.go (working copy) @@ -9,6 +9,7 @@ import ( "debug/elf" "debug/macho" "debug/pe" + "debug/xcoff" "fmt" "go/ast" "go/printer" @@ -324,7 +325,28 @@ 
func dynimport(obj string) { return } - fatalf("cannot parse %s as ELF, Mach-O or PE", obj) + if f, err := xcoff.Open(obj); err == nil { + sym, err := f.ImportedSymbols() + if err != nil { + fatalf("cannot load imported symbols from XCOFF file %s: %v", obj, err) + } + for _, s := range sym { + if len(s) > 0 && s[0] == '.' { +s = s[1:] + } + fmt.Fprintf(stdout, "//go:cgo_import_dynamic %s %s %q\n", s, s, "") + } + lib, err := f.ImportedLibraries() + if err != nil { + fatalf("cannot load imported libraries from XCOFF file %s: %v", obj, err) + } + for _, l := range lib { + fmt.Fprintf(stdout, "//go:cgo_import_dynamic _ _ %q\n", l) + } + return + } + + fatalf("cannot parse %s as ELF, Mach-O, PE or XCOFF", obj) } // Construct a gcc struct matching the gc argument frame. Index: libgo/go/debug/dwarf/open.go === --- ./libgo/go/debug/dwarf/open.go (revision 2506
Re: [RFC][PR 67336][PING^6] Verify pointers during stack unwind
On Thu, Aug 3, 2017 at 3:55 PM, H.J. Lu wrote: > On Thu, Aug 3, 2017 at 7:17 AM, Yury Gribov > wrote: >> On Thu, Aug 3, 2017 at 2:01 PM, H.J. Lu wrote: >>> On Thu, Aug 3, 2017 at 5:52 AM, Yury Gribov >>> wrote: On Thu, Aug 3, 2017 at 1:22 PM, H.J. Lu wrote: > On Wed, Aug 2, 2017 at 9:29 PM, Yury Gribov > wrote: >> On 02.08.2017 23:04, H.J. Lu wrote: >>> >>> On Wed, Aug 2, 2017 at 1:56 PM, Yury Gribov >>> >>> wrote: On 02.08.2017 21:48, H.J. Lu wrote: > > > On Wed, Aug 2, 2017 at 1:39 PM, Yury Gribov > > wrote: >> >> >> On 02.08.2017 20:02, Jeff Law wrote: >>> >>> >>> >>> On 08/02/2017 12:47 PM, Jakub Jelinek wrote: On Wed, Aug 02, 2017 at 12:38:13PM -0600, Jeff Law wrote: > > > > On 07/17/2017 01:23 AM, Yuri Gribov wrote: >> >> >> >> I've rebased the previous patch to trunk per Andrew's suggestion. >> Original patch description/motivation/questions are in >> https://gcc.gnu.org/ml/gcc-patches/2017-06/msg01869.html > > > > Is his stuff used for exception handling? If so, doesn't that > make > the > performance a significant concern (ie that msync call?) For the performance issue, I wonder if it wouldn't be better to just compile the unwinder twice, basically include everything needed for the verification backtrace in a single *.c file with special defines, and not export anything from that *.o file except the single entrypoint. That would also handle the ABI problems. >>> >>> >>> >>> Yea. >>> >>> Given that the vast majority of the time the addresses are valid, I >>> wonder if we could take advantage of that property to keep the >>> overhead >>> lower in the most common cases. >>> For the concurrent calls of other threads unmapping stacks of running threads (that is where most of the verification goes against), I'm afraid that is simply non-solveable, even if you don't cache anything, it will still be racy. >>> >>> >>> >>> Absolutely -- I think solving this stuff 100% reliably without races >>> isn't possible. I think the question is can we make this >>> significantly >>> better. >> >> >> >> >> All, >> >> First of all, thanks for the feedback. This is indeed not a 100% >> robust >> solution, just something to allow tools like Asan to produce more >> sane >> backtraces on crash (currently when stack is corrupted they would >> crash >> in >> the unwinder without printing at least partial backtrace). >> > > FWIW, glibc 2.26 is changed to abort as soon as possible when stack is > corrupted: > > https://sourceware.org/bugzilla/show_bug.cgi?id=21752 > https://sourceware.org/bugzilla/show_bug.cgi?id=12189 > > There is very little point to unwind stack when stack is corrupted. I'd argue that situation with __stack_chk_fail is very special. It's used in production code where you'd want to halt as soon as corruption is detected to prevent further damage so even printing message is an additional risk. Debugging tools (e.g. sanitizers) are different and would prefer to print as many survived frames as possible. >>> >>> But stack unwinder in libgcc is primarily for production use, not for >>> debugging. >> >> >> I can see it used in different projects to print backtraces on error >> (bind9, >> SimGrid, sanitizers). backtrace(3) also sits on top of libgcc unwinder >> and >> seems to be meant primarily for debugging (at least many projects use it >> for >> this purpose: >> https://codesearch.debian.net/search?q=backtrace_symbols&perpkg=1). 
Same >> for >> libbacktrace which, as its README mentions, is meant to "print a detailed >> backtrace when an error occurs or to gather detailed profiling >> information". > > But it shouldn't be performed on a corrupted stack. When a stack is > corrupted, there is just no way to reliably unwind it. Why not? Attached patch shows that it's possible (up to a point of corruption of course).
Re: [RFC][PR 67336][PING^6] Verify pointers during stack unwind
On Thu, Aug 3, 2017 at 8:07 AM, Yury Gribov wrote: > On Thu, Aug 3, 2017 at 3:55 PM, H.J. Lu wrote: >> On Thu, Aug 3, 2017 at 7:17 AM, Yury Gribov >> wrote: >>> On Thu, Aug 3, 2017 at 2:01 PM, H.J. Lu wrote: On Thu, Aug 3, 2017 at 5:52 AM, Yury Gribov wrote: > On Thu, Aug 3, 2017 at 1:22 PM, H.J. Lu wrote: >> On Wed, Aug 2, 2017 at 9:29 PM, Yury Gribov >> wrote: >>> On 02.08.2017 23:04, H.J. Lu wrote: On Wed, Aug 2, 2017 at 1:56 PM, Yury Gribov wrote: > > On 02.08.2017 21:48, H.J. Lu wrote: >> >> >> On Wed, Aug 2, 2017 at 1:39 PM, Yury Gribov >> >> wrote: >>> >>> >>> On 02.08.2017 20:02, Jeff Law wrote: On 08/02/2017 12:47 PM, Jakub Jelinek wrote: > > > > On Wed, Aug 02, 2017 at 12:38:13PM -0600, Jeff Law wrote: >> >> >> >> On 07/17/2017 01:23 AM, Yuri Gribov wrote: >>> >>> >>> >>> I've rebased the previous patch to trunk per Andrew's >>> suggestion. >>> Original patch description/motivation/questions are in >>> https://gcc.gnu.org/ml/gcc-patches/2017-06/msg01869.html >> >> >> >> Is his stuff used for exception handling? If so, doesn't that >> make >> the >> performance a significant concern (ie that msync call?) > > > > > For the performance issue, I wonder if it wouldn't be better to > just > compile the unwinder twice, basically include everything needed > for > the > verification backtrace in a single *.c file with special defines, > and > not export anything from that *.o file except the single > entrypoint. > That would also handle the ABI problems. Yea. Given that the vast majority of the time the addresses are valid, I wonder if we could take advantage of that property to keep the overhead lower in the most common cases. > For the concurrent calls of other threads unmapping stacks of > running > threads (that is where most of the verification goes against), I'm > afraid > that is simply non-solveable, even if you don't cache anything, it > will > still be racy. Absolutely -- I think solving this stuff 100% reliably without races isn't possible. I think the question is can we make this significantly better. >>> >>> >>> >>> >>> All, >>> >>> First of all, thanks for the feedback. This is indeed not a 100% >>> robust >>> solution, just something to allow tools like Asan to produce more >>> sane >>> backtraces on crash (currently when stack is corrupted they would >>> crash >>> in >>> the unwinder without printing at least partial backtrace). >>> >> >> FWIW, glibc 2.26 is changed to abort as soon as possible when stack >> is >> corrupted: >> >> https://sourceware.org/bugzilla/show_bug.cgi?id=21752 >> https://sourceware.org/bugzilla/show_bug.cgi?id=12189 >> >> There is very little point to unwind stack when stack is corrupted. > > > > I'd argue that situation with __stack_chk_fail is very special. It's > used > in production code where you'd want to halt as soon as corruption is > detected to prevent further damage so even printing message is an > additional > risk. Debugging tools (e.g. sanitizers) are different and would > prefer > to > print as many survived frames as possible. > But stack unwinder in libgcc is primarily for production use, not for debugging. >>> >>> >>> I can see it used in different projects to print backtraces on error >>> (bind9, >>> SimGrid, sanitizers). backtrace(3) also sits on top of libgcc unwinder >>> and >>> seems to be meant primarily for debugging (at least many projects use >>> it for >>> this purpose: >>> https://codesearch.debian.net/search?q=backtrace_symbols&perpkg=1). 
>>> Same for >>> libbacktrace which, as its README mentions, is meant to "print a >>> detailed >>> backtrace when an error occurs or to gather detailed profiling >
Re: [RFC][PR 67336][PING^6] Verify pointers during stack unwind
On 08/03/2017 08:17 AM, Yury Gribov wrote: > On Thu, Aug 3, 2017 at 2:01 PM, H.J. Lu wrote: >> On Thu, Aug 3, 2017 at 5:52 AM, Yury Gribov >> wrote: >>> On Thu, Aug 3, 2017 at 1:22 PM, H.J. Lu wrote: On Wed, Aug 2, 2017 at 9:29 PM, Yury Gribov wrote: > On 02.08.2017 23:04, H.J. Lu wrote: >> >> On Wed, Aug 2, 2017 at 1:56 PM, Yury Gribov >> wrote: >>> >>> On 02.08.2017 21:48, H.J. Lu wrote: On Wed, Aug 2, 2017 at 1:39 PM, Yury Gribov wrote: > > > On 02.08.2017 20:02, Jeff Law wrote: >> >> >> >> On 08/02/2017 12:47 PM, Jakub Jelinek wrote: >>> >>> >>> >>> On Wed, Aug 02, 2017 at 12:38:13PM -0600, Jeff Law wrote: On 07/17/2017 01:23 AM, Yuri Gribov wrote: > > > > I've rebased the previous patch to trunk per Andrew's suggestion. > Original patch description/motivation/questions are in > https://gcc.gnu.org/ml/gcc-patches/2017-06/msg01869.html Is his stuff used for exception handling? If so, doesn't that make the performance a significant concern (ie that msync call?) >>> >>> >>> >>> >>> For the performance issue, I wonder if it wouldn't be better to just >>> compile the unwinder twice, basically include everything needed for >>> the >>> verification backtrace in a single *.c file with special defines, >>> and >>> not export anything from that *.o file except the single entrypoint. >>> That would also handle the ABI problems. >> >> >> >> Yea. >> >> Given that the vast majority of the time the addresses are valid, I >> wonder if we could take advantage of that property to keep the >> overhead >> lower in the most common cases. >> >>> For the concurrent calls of other threads unmapping stacks of >>> running >>> threads (that is where most of the verification goes against), I'm >>> afraid >>> that is simply non-solveable, even if you don't cache anything, it >>> will >>> still be racy. >> >> >> >> Absolutely -- I think solving this stuff 100% reliably without races >> isn't possible. I think the question is can we make this >> significantly >> better. > > > > > All, > > First of all, thanks for the feedback. This is indeed not a 100% > robust > solution, just something to allow tools like Asan to produce more sane > backtraces on crash (currently when stack is corrupted they would > crash > in > the unwinder without printing at least partial backtrace). > FWIW, glibc 2.26 is changed to abort as soon as possible when stack is corrupted: https://sourceware.org/bugzilla/show_bug.cgi?id=21752 https://sourceware.org/bugzilla/show_bug.cgi?id=12189 There is very little point to unwind stack when stack is corrupted. >>> >>> >>> >>> I'd argue that situation with __stack_chk_fail is very special. It's >>> used >>> in production code where you'd want to halt as soon as corruption is >>> detected to prevent further damage so even printing message is an >>> additional >>> risk. Debugging tools (e.g. sanitizers) are different and would prefer >>> to >>> print as many survived frames as possible. >>> >> >> But stack unwinder in libgcc is primarily for production use, not for >> debugging. > > > I can see it used in different projects to print backtraces on error > (bind9, > SimGrid, sanitizers). backtrace(3) also sits on top of libgcc unwinder > and > seems to be meant primarily for debugging (at least many projects use it > for > this purpose: > https://codesearch.debian.net/search?q=backtrace_symbols&perpkg=1). Same > for > libbacktrace which, as its README mentions, is meant to "print a detailed > backtrace when an error occurs or to gather detailed profiling > information". 
But it shouldn't be performed on a corrupted stack. When a stack is corrupted, there is just no way to reliably unwind it. >>> >>> Why not? Attached patch shows that it's possible (up to a point of >>> corruption of course). >>> It isn't a problem for a debugger which is designed to analyze corrupted program. >>> >>> Yes but forcing all applications that would like to print helpful >>> message on stack corruption to employ gdb sounds less efficient that >>> providing fail-safe
Re: [RFC][PR 67336][PING^6] Verify pointers during stack unwind
On Thu, Aug 3, 2017 at 8:25 AM, Jeff Law wrote: > On 08/03/2017 08:17 AM, Yury Gribov wrote: >> On Thu, Aug 3, 2017 at 2:01 PM, H.J. Lu wrote: >>> On Thu, Aug 3, 2017 at 5:52 AM, Yury Gribov >>> wrote: On Thu, Aug 3, 2017 at 1:22 PM, H.J. Lu wrote: > On Wed, Aug 2, 2017 at 9:29 PM, Yury Gribov > wrote: >> On 02.08.2017 23:04, H.J. Lu wrote: >>> >>> On Wed, Aug 2, 2017 at 1:56 PM, Yury Gribov >>> >>> wrote: On 02.08.2017 21:48, H.J. Lu wrote: > > > On Wed, Aug 2, 2017 at 1:39 PM, Yury Gribov > > wrote: >> >> >> On 02.08.2017 20:02, Jeff Law wrote: >>> >>> >>> >>> On 08/02/2017 12:47 PM, Jakub Jelinek wrote: On Wed, Aug 02, 2017 at 12:38:13PM -0600, Jeff Law wrote: > > > > On 07/17/2017 01:23 AM, Yuri Gribov wrote: >> >> >> >> I've rebased the previous patch to trunk per Andrew's suggestion. >> Original patch description/motivation/questions are in >> https://gcc.gnu.org/ml/gcc-patches/2017-06/msg01869.html > > > > Is his stuff used for exception handling? If so, doesn't that > make > the > performance a significant concern (ie that msync call?) For the performance issue, I wonder if it wouldn't be better to just compile the unwinder twice, basically include everything needed for the verification backtrace in a single *.c file with special defines, and not export anything from that *.o file except the single entrypoint. That would also handle the ABI problems. >>> >>> >>> >>> Yea. >>> >>> Given that the vast majority of the time the addresses are valid, I >>> wonder if we could take advantage of that property to keep the >>> overhead >>> lower in the most common cases. >>> For the concurrent calls of other threads unmapping stacks of running threads (that is where most of the verification goes against), I'm afraid that is simply non-solveable, even if you don't cache anything, it will still be racy. >>> >>> >>> >>> Absolutely -- I think solving this stuff 100% reliably without races >>> isn't possible. I think the question is can we make this >>> significantly >>> better. >> >> >> >> >> All, >> >> First of all, thanks for the feedback. This is indeed not a 100% >> robust >> solution, just something to allow tools like Asan to produce more >> sane >> backtraces on crash (currently when stack is corrupted they would >> crash >> in >> the unwinder without printing at least partial backtrace). >> > > FWIW, glibc 2.26 is changed to abort as soon as possible when stack is > corrupted: > > https://sourceware.org/bugzilla/show_bug.cgi?id=21752 > https://sourceware.org/bugzilla/show_bug.cgi?id=12189 > > There is very little point to unwind stack when stack is corrupted. I'd argue that situation with __stack_chk_fail is very special. It's used in production code where you'd want to halt as soon as corruption is detected to prevent further damage so even printing message is an additional risk. Debugging tools (e.g. sanitizers) are different and would prefer to print as many survived frames as possible. >>> >>> But stack unwinder in libgcc is primarily for production use, not for >>> debugging. >> >> >> I can see it used in different projects to print backtraces on error >> (bind9, >> SimGrid, sanitizers). backtrace(3) also sits on top of libgcc unwinder >> and >> seems to be meant primarily for debugging (at least many projects use it >> for >> this purpose: >> https://codesearch.debian.net/search?q=backtrace_symbols&perpkg=1). Same >> for >> libbacktrace which, as its README mentions, is meant to "print a detailed >> backtrace when an error occurs or to gather detailed profiling >> information". 
> > But it shouldn't be performed on a corrupted stack. When a stack is > corrupted, there is just no way to reliably unwind it. Why not? Attached patch shows that it's possible (up to a point of corruption of course). > It is
Re: [RFC][PR 67336][PING^6] Verify pointers during stack unwind
On 08/03/2017 06:52 AM, Yury Gribov wrote: > On Thu, Aug 3, 2017 at 1:22 PM, H.J. Lu wrote: >> On Wed, Aug 2, 2017 at 9:29 PM, Yury Gribov >> wrote: >>> On 02.08.2017 23:04, H.J. Lu wrote: On Wed, Aug 2, 2017 at 1:56 PM, Yury Gribov wrote: > > On 02.08.2017 21:48, H.J. Lu wrote: >> >> >> On Wed, Aug 2, 2017 at 1:39 PM, Yury Gribov >> >> wrote: >>> >>> >>> On 02.08.2017 20:02, Jeff Law wrote: On 08/02/2017 12:47 PM, Jakub Jelinek wrote: > > > > On Wed, Aug 02, 2017 at 12:38:13PM -0600, Jeff Law wrote: >> >> >> >> On 07/17/2017 01:23 AM, Yuri Gribov wrote: >>> >>> >>> >>> I've rebased the previous patch to trunk per Andrew's suggestion. >>> Original patch description/motivation/questions are in >>> https://gcc.gnu.org/ml/gcc-patches/2017-06/msg01869.html >> >> >> >> Is his stuff used for exception handling? If so, doesn't that make >> the >> performance a significant concern (ie that msync call?) > > > > > For the performance issue, I wonder if it wouldn't be better to just > compile the unwinder twice, basically include everything needed for > the > verification backtrace in a single *.c file with special defines, and > not export anything from that *.o file except the single entrypoint. > That would also handle the ABI problems. Yea. Given that the vast majority of the time the addresses are valid, I wonder if we could take advantage of that property to keep the overhead lower in the most common cases. > For the concurrent calls of other threads unmapping stacks of running > threads (that is where most of the verification goes against), I'm > afraid > that is simply non-solveable, even if you don't cache anything, it > will > still be racy. Absolutely -- I think solving this stuff 100% reliably without races isn't possible. I think the question is can we make this significantly better. >>> >>> >>> >>> >>> All, >>> >>> First of all, thanks for the feedback. This is indeed not a 100% >>> robust >>> solution, just something to allow tools like Asan to produce more sane >>> backtraces on crash (currently when stack is corrupted they would crash >>> in >>> the unwinder without printing at least partial backtrace). >>> >> >> FWIW, glibc 2.26 is changed to abort as soon as possible when stack is >> corrupted: >> >> https://sourceware.org/bugzilla/show_bug.cgi?id=21752 >> https://sourceware.org/bugzilla/show_bug.cgi?id=12189 >> >> There is very little point to unwind stack when stack is corrupted. > > > > I'd argue that situation with __stack_chk_fail is very special. It's > used > in production code where you'd want to halt as soon as corruption is > detected to prevent further damage so even printing message is an > additional > risk. Debugging tools (e.g. sanitizers) are different and would prefer > to > print as many survived frames as possible. > But stack unwinder in libgcc is primarily for production use, not for debugging. >>> >>> >>> I can see it used in different projects to print backtraces on error (bind9, >>> SimGrid, sanitizers). backtrace(3) also sits on top of libgcc unwinder and >>> seems to be meant primarily for debugging (at least many projects use it for >>> this purpose: >>> https://codesearch.debian.net/search?q=backtrace_symbols&perpkg=1). Same for >>> libbacktrace which, as its README mentions, is meant to "print a detailed >>> backtrace when an error occurs or to gather detailed profiling information". >> >> But it shouldn't be performed on a corrupted stack. When a stack is >> corrupted, there is just no way to reliably unwind it. > > Why not? 
> Attached patch shows that it's possible (up to a point of corruption of course). Right. We should unwind up to the point where we detect something amiss, then stop. And that's what I'm most interested in here -- detecting the invalid frame data. The problem I see is doing it efficiently enough. jeff
Re: [PATCH, gcc/Makefile.in] Fix typo on ref to multi_dir variable in install-mkheaders
On 08/03/2017 08:41 AM, Olivier Hainque wrote: > Hello, > > The attached patch fixes a typo in a Makefile rule, which > we noticed while working on fixincludes for a specific target. > > Bootstrapped and regression tested fine on x86_64-linux. > > OK to commit ? > > Thanks much in advance for your feedback, > > With Kind Regards, > > Olivier > > 2017-08-04 Douglas Rupp > > * Makefile.in (install-mkheaders): Fix typo, where the multi_dir > variable was referenced as multidir in command. > OK. jeff
Re: [PATCH 6/6] qsort comparator consistency checking
On 08/03/2017 08:24 AM, Alexander Monakov wrote: > On Wed, 2 Aug 2017, Jeff Law wrote: Well, there's not *that* many qsort calls. My quick grep shows 94 and its a very mechanical change. Then a poison in system.h to ensure raw calls to qsort don't return. > > Note that poisoning qsort outlaws vec::qsort too; it would need to be mass- > renamed as well (to vec::sort, presumably). It seems there are 83 or more > calls to vec::qsort. Ugh :( That's an unfortunate implementation of poisoning :( Consider a patch to rename those too pre-approved. > >>> Any suggestion for the non-poisoned replacement? xqsort? gcc_qsort? >> qsort_chk/qsort_nochk for checked and non-checked? > > I believe qsort_chk isn't appropriate, checking is not explicitly a part > of interface, and we never use _chk in potentially-checking tree and RTL > accessors either. How about just "sort"? Jeff
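A rough sketch of the scheme being discussed, in C++ (GCC's implementation language); the names qsort_chk/qsort_nochk come from the thread itself, not from any committed patch:

#include <stddef.h>

/* Checked entry point: with checking enabled it would validate the
   comparator (e.g. anti-symmetry and transitivity spot checks) before
   delegating to the C library sort; qsort_nochk would skip that.  */
extern void qsort_chk (void *base, size_t n, size_t size,
                       int (*cmp) (const void *, const void *));
extern void qsort_nochk (void *base, size_t n, size_t size,
                         int (*cmp) (const void *, const void *));

/* Then, in system.h after all system headers are in, raw qsort calls
   are rejected at compile time.  Note this also rejects vec::qsort,
   which is what prompts the vec::qsort -> vec::sort rename discussed
   below.  */
#pragma GCC poison qsort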
Re: [PATCH 1/1] sparc: support for -mmisalign in the SPARC M8
To be more specific: when reading all the code corresponding to “STRICT_ALIGNMENT” and “SLOW_UNALIGNED_ACCESS” in GCC (note, SLOW_UNALIGNED_ACCESS is the same as STRICT_ALIGNMENT when it is NOT defined explicitly, which is the case for SPARC), we can draw the following summary: all the special handling of STRICT_ALIGNMENT or SLOW_UNALIGNED_ACCESS in this code shares a common logic: if a memory access is known at compile time not to be well aligned, and the targeted platform does NOT support fast unaligned memory access, the compiler will try to make the memory access well aligned. Otherwise, if the targeted platform supports fast unaligned memory access, it will leave the compile-time-known unaligned memory access as is, and the hardware support kicks in later for these unaligned memory accesses. This behavior is consistent with the high-level definition of STRICT_ALIGNMENT, and also consistent with the M8 misaligned support: if the target is NOT TARGET_MISALIGN, STRICT_ALIGNMENT is 1, and all compile-time-known misaligned memory accesses are adjusted to aligned memory accesses before RTL generation; on the other hand, if the target is TARGET_MISALIGN, STRICT_ALIGNMENT is 0, the compile-time-known misaligned memory accesses are NOT adjusted, and after RTL generation we can use the new misaligned ld/st hardware insns for these compile-time-known misaligned memory accesses. Hope this is clear. Thanks. Qing >> >> Why don't you read the code rather than just relying upon what >> high level description is given by the documentation instead? > > I read the code before making the change; that’s the reason I ask you to > specify clearly the bad side effect that I didn’t consider yet. > > thanks. > > Qing
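In other words, the target-macro relationship Qing describes boils down to something like the following hypothetical sketch (the actual sparc.h change in the submitted patch may be spelled differently):

/* sparc.h sketch: when the M8 misaligned ld/st support is enabled,
   stop forcing strict alignment, so compile-time-known misaligned
   accesses survive to RTL and can be implemented with the new
   instructions; otherwise keep the historical behavior.  */
#define STRICT_ALIGNMENT (!(TARGET_MISALIGN))

The pushback elsewhere in the thread is that clearing STRICT_ALIGNMENT also tells the rest of the compiler it no longer needs to try to align objects it could have aligned (stack objects in particular), which costs performance even though the generated code stays correct.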
Re: [RFC][PR 67336][PING^6] Verify pointers during stack unwind
On 08/02/2017 03:36 PM, Florian Weimer wrote: > * Jeff Law: > >> Something like setup a signal handler when we first start unwinding that >> flags the error and tear it down when we're done unwinding?Obviously >> we can't do setup/tear down each time for each address. Anyway, just >> thinking outloud here... > > Linux doesn't have per-thread signal handlers, so this doesn't work > reliably. Sigh. There's probably a multitude of reasons why this wouldn't work :-) > > If speed is not a concern, but reliability is, call fork (the system > call, not glibc's wrapper which calls fork handlers) and do the work > in a single-threaded copy of the process. There, you can set up > signal handlers as you see fit, and the VM layout won't change > unexpectedly. Speed is one of the concerns. Essentially we're pondering an efficient mechanism to know if a given code address is valid so that we know if we should even try to unwind through a particular stack frame. One could probably claim that in the EH path a call to msync shouldn't be a concern. I'm skeptical these days, but could be convinced that it's tolerable. > > To harden unwinding against corrupted tables or table locations, we'd > have to change ld.so to make all critical data read-only after loading > and remove the unwinder caches (with more help from ld.so instead). > It would make sense to move the unwinder implementation into ld.so. > With proper hardening, corrupted stacks would not be able to cause > crashes anymore, either. Yea. I'm not sure if we're ready to suggest that kind of change yet, but that may be the long term direction. jeff
Re: [RFC][PR 67336][PING^6] Verify pointers during stack unwind
On Thu, Aug 3, 2017 at 4:28 PM, H.J. Lu wrote: > On Thu, Aug 3, 2017 at 8:25 AM, Jeff Law wrote: >> On 08/03/2017 08:17 AM, Yury Gribov wrote: >>> On Thu, Aug 3, 2017 at 2:01 PM, H.J. Lu wrote: On Thu, Aug 3, 2017 at 5:52 AM, Yury Gribov wrote: > On Thu, Aug 3, 2017 at 1:22 PM, H.J. Lu wrote: >> On Wed, Aug 2, 2017 at 9:29 PM, Yury Gribov >> wrote: >>> On 02.08.2017 23:04, H.J. Lu wrote: On Wed, Aug 2, 2017 at 1:56 PM, Yury Gribov wrote: > > On 02.08.2017 21:48, H.J. Lu wrote: >> >> >> On Wed, Aug 2, 2017 at 1:39 PM, Yury Gribov >> >> wrote: >>> >>> >>> On 02.08.2017 20:02, Jeff Law wrote: On 08/02/2017 12:47 PM, Jakub Jelinek wrote: > > > > On Wed, Aug 02, 2017 at 12:38:13PM -0600, Jeff Law wrote: >> >> >> >> On 07/17/2017 01:23 AM, Yuri Gribov wrote: >>> >>> >>> >>> I've rebased the previous patch to trunk per Andrew's >>> suggestion. >>> Original patch description/motivation/questions are in >>> https://gcc.gnu.org/ml/gcc-patches/2017-06/msg01869.html >> >> >> >> Is his stuff used for exception handling? If so, doesn't that >> make >> the >> performance a significant concern (ie that msync call?) > > > > > For the performance issue, I wonder if it wouldn't be better to > just > compile the unwinder twice, basically include everything needed > for > the > verification backtrace in a single *.c file with special defines, > and > not export anything from that *.o file except the single > entrypoint. > That would also handle the ABI problems. Yea. Given that the vast majority of the time the addresses are valid, I wonder if we could take advantage of that property to keep the overhead lower in the most common cases. > For the concurrent calls of other threads unmapping stacks of > running > threads (that is where most of the verification goes against), I'm > afraid > that is simply non-solveable, even if you don't cache anything, it > will > still be racy. Absolutely -- I think solving this stuff 100% reliably without races isn't possible. I think the question is can we make this significantly better. >>> >>> >>> >>> >>> All, >>> >>> First of all, thanks for the feedback. This is indeed not a 100% >>> robust >>> solution, just something to allow tools like Asan to produce more >>> sane >>> backtraces on crash (currently when stack is corrupted they would >>> crash >>> in >>> the unwinder without printing at least partial backtrace). >>> >> >> FWIW, glibc 2.26 is changed to abort as soon as possible when stack >> is >> corrupted: >> >> https://sourceware.org/bugzilla/show_bug.cgi?id=21752 >> https://sourceware.org/bugzilla/show_bug.cgi?id=12189 >> >> There is very little point to unwind stack when stack is corrupted. > > > > I'd argue that situation with __stack_chk_fail is very special. It's > used > in production code where you'd want to halt as soon as corruption is > detected to prevent further damage so even printing message is an > additional > risk. Debugging tools (e.g. sanitizers) are different and would > prefer > to > print as many survived frames as possible. > But stack unwinder in libgcc is primarily for production use, not for debugging. >>> >>> >>> I can see it used in different projects to print backtraces on error >>> (bind9, >>> SimGrid, sanitizers). backtrace(3) also sits on top of libgcc unwinder >>> and >>> seems to be meant primarily for debugging (at least many projects use >>> it for >>> this purpose: >>> https://codesearch.debian.net/search?q=backtrace_symbols&perpkg=1). 
>>> Same for >>> libbacktrace which, as its README mentions, is meant to "print a >>> detailed >>> backtrace when an error occurs or to gather detailed profiling >>> information".
Re: [PATCH 6/6] qsort comparator consistency checking
On Thu, Aug 03, 2017 at 09:33:13AM -0600, Jeff Law wrote: > On 08/03/2017 08:24 AM, Alexander Monakov wrote: > > On Wed, 2 Aug 2017, Jeff Law wrote: > Well, there's not *that* many qsort calls. My quick grep shows 94 and > its a very mechanical change. Then a poison in system.h to ensure raw > calls to qsort don't return. > > > > Note that poisoning qsort outlaws vec::qsort too; it would need to be mass- > > renamed as well (to vec::sort, presumably). It seems there are 83 or more > > calls to vec::qsort. > Ugh :( That's an unfortunate implementation of poisoning :( Consider a > patch to rename those too pre-approved. Do we really need to rename and poison anything? qsort () in the source is something that is most obvious to developers, so trying to force them to use something different will just mean an extra thing to learn. I mean, isn't it better to just add a static inline qsort that in the checking case calls qsort_chk, or do the redirection using GNU asm redirection: typeof (qsort) __asm ("qsort_chk"); and define extern "C" qsort_chk somewhere? configure could test whether it works on the target (it wouldn't perhaps on targets which already use some inline for qsort in their headers or use qsort as a macro (but the latter wouldn't work anyway with GCC already)). The _5th macro isn't that bad either, apart from using reserved namespace identifiers (it really should be something like qsort_5th and the arguments shouldn't start with underscores). Jakub
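A minimal sketch of the asm-redirection variant Jakub outlines (treat the exact spelling as an assumption; as he notes, whether it works depends on how the host's <stdlib.h> declares qsort):

#include <stdlib.h>

extern "C" void qsort_chk (void *, size_t, size_t,
                           int (*) (const void *, const void *));

/* Redeclare qsort with an asm label so that every ordinary
   qsort (...) call in the sources is emitted against qsort_chk,
   without renaming anything at the call sites.  */
extern "C" __typeof (qsort) qsort __asm__ ("qsort_chk");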
Re: [RFC][PR 67336][PING^6] Verify pointers during stack unwind
On Thu, Aug 03, 2017 at 09:48:47AM -0600, Jeff Law wrote: > > If speed is not a concern, but reliability is, call fork (the system > > call, not glibc's wrapper which calls fork handlers) and do the work > > in a single-threaded copy of the process. There, you can set up > > signal handlers as you see fit, and the VM layout won't change > > unexpectedly. > Speed is one of the concerns. Essentially we're pondering an efficient > mechanism to know if a given code address is valid so that we know if we > should even try to unwind through a particular stack frame. > > One could probably claim that in the EH path a call to msync shouldn't > be a concern. I'm skeptical these days, but could be convinced that > it's tolerable. There are many projects that abuse exception handling for normal control flow and where the EH speed is very important. While the msync calls were not performed during normal EH or normal backtrace in the posted patch, just in sections guarded with __builtin_expect on some variable, it still makes RA worse because the compiler has to store and load all the state it could otherwise have kept in call-clobbered registers across that spot. Jakub
Re: [PING^4][PATCH v2] Generate reproducible output independently of the build-path
On 08/02/2017 10:47 PM, Yury Gribov wrote: > On 03.08.2017 3:06, Ximin Luo wrote: >> Jeff Law: >>> On 07/21/2017 10:15 AM, Ximin Luo wrote: (Please keep me on CC, I am not subscribed) Proposal This patch series adds a new environment variable BUILD_PATH_PREFIX_MAP. When this is set, GCC will treat this as extra implicit "-fdebug-prefix-map=$value" command-line arguments that precede any explicit ones. This makes the final binary output reproducible, and also hides the unreproducible value (the source path prefixes) from CFLAGS et. al. which many build tools (understandably) embed as-is into their build output. >>> I'd *really* avoid doing this with magic environment variables. Make it >>> a first class option to the compiler. Yes, it means projects that want >>> this behavior have to arrange to pass that flag to their compiler, but >>> IMHO that's much preferred over environment variables. >>> >>> Jeff >>> >> >> Hi Jeff, >> >> If by "first class option" you meant a command-line flag, GCC *already >> has* that (-fdebug-prefix-map) > and it wasn't enough to achieve >> reproducibility in many cases we tested. > > Shouldn't -fdebug-prefix-map be updated to use the same syntax as > BUILD_PATH_PREFIX_MAP? Or a different option to cover what Ximin is trying to do with BUILD_PATH_PREFIX_MAP. BUILD_PATH_PREFIX_MAP has to go, I'm not going to approve it in that form. But I would seriously consider the same functionality as a command line switch. jeff
Re: [PING^4][PATCH v2] Generate reproducible output independently of the build-path
On 08/02/2017 08:06 PM, Ximin Luo wrote: > Jeff Law: >> On 07/21/2017 10:15 AM, Ximin Luo wrote: >>> (Please keep me on CC, I am not subscribed) >>> >>> >>> Proposal >>> >>> >>> This patch series adds a new environment variable BUILD_PATH_PREFIX_MAP. >>> When >>> this is set, GCC will treat this as extra implicit >>> "-fdebug-prefix-map=$value" >>> command-line arguments that precede any explicit ones. This makes the final >>> binary output reproducible, and also hides the unreproducible value (the >>> source >>> path prefixes) from CFLAGS et. al. which many build tools (understandably) >>> embed as-is into their build output. >> I'd *really* avoid doing this with magic environment variables. Make it >> a first class option to the compiler. Yes, it means projects that want >> this behavior have to arrange to pass that flag to their compiler, but >> IMHO that's much preferred over environment variables. >> >> Jeff >> > > Hi Jeff, > > If by "first class option" you meant a command-line flag, GCC *already has* > that (-fdebug-prefix-map) and it wasn't enough to achieve reproducibility in > many cases we tested. dpkg-buildflags actually already adds these flags to > CFLAGS CXXFLAGS etc on Debian. However, with this patch using the environment > variable, we are able to reproduce 1800 more packages out of 26000.Then take > what you've done with the environment variable and instead implement it on top of a switch. An environment variable is absolutely the wrong thing to do here. > > GCC already supports a similar environment variable SOURCE_DATE_EPOCH, which > was accepted about 2 years ago in a patch written by one of our GSoC > students. We are not planning any more environment variables like this, and > are committed to fixing other sources of non-determinism by patching the > relevant build scripts. I would have rejected that as well :-) One of the few times I would have disagreed with Bernd. Jeff
[PATCH] Fix PR81503 (SLSR invalid fold)
Hi, https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81503 identifies a problem in SLSR where an invalid replacement is made because the desired value doesn't fit in the target type. This patch addresses that issue. The signed case is simple, as we require the value not to change when restricted to the precision of the target type. For the unsigned case, a negative value is expected to be converted to the congruent positive value, so we need to check for that. We have several test cases in the test suite that exercise the negative bump for an unsigned/pointer type. Per Jakub's suggestion I attempted to do all of this within the wide_int and widest_int framework. However, I was unable to find a way to generate maxval as a widest_int within that framework. I can use wide_int maxval = wi::max_value (prec, sign); to get a wide_int, but I can't find a way to convert this to a widest_int so that I can add it to directly to bump. I am probably missing something obvious here, so I welcome any suggestions. At any rate, using TYPE_MAX_VALUE works, it's just not as clean as I would like. BTW, I plan a follow-up patch to rewrite the complex condition for executing the main block of code, which is starting to get out of control. It will be simpler to read if the conditions are negated and used to do an immediate return. Bootstrapped and tested on powerpc64le-linux-gnu with no regressions. Is this ok for trunk? (Just noticed a tab/space problem on the comment lines, which I will fix.) Thanks, Bill [gcc] 2017-08-03 Bill Schmidt PR tree-optimization/81503 * gimple-ssa-strength-reduction.c (replace_mult_candidate): Ensure folded constant fits in the target type. [gcc/testsuite] 2017-08-03 Bill Schmidt PR tree-optimization/81503 * gcc.c-torture/execute/pr81503.c: New file. Index: gcc/gimple-ssa-strength-reduction.c === --- gcc/gimple-ssa-strength-reduction.c (revision 250791) +++ gcc/gimple-ssa-strength-reduction.c (working copy) @@ -2074,6 +2074,10 @@ replace_mult_candidate (slsr_cand_t c, tree basis_ { tree target_type = TREE_TYPE (gimple_assign_lhs (c->cand_stmt)); enum tree_code cand_code = gimple_assign_rhs_code (c->cand_stmt); + unsigned int prec = TYPE_PRECISION (target_type); + tree maxval = (POINTER_TYPE_P (target_type) +? TYPE_MAX_VALUE (sizetype) +: TYPE_MAX_VALUE (target_type)); /* It is highly unlikely, but possible, that the resulting bump doesn't fit in a HWI. Abandon the replacement @@ -2082,6 +2086,17 @@ replace_mult_candidate (slsr_cand_t c, tree basis_ types but allows for safe negation without twisted logic. */ if (wi::fits_shwi_p (bump) && bump.to_shwi () != HOST_WIDE_INT_MIN + /* It is more likely that the bump doesn't fit in the target +type, so check whether constraining it to that type changes +the value. For a signed type, the value mustn't change. + For an unsigned type, the value may only change to a + congruent value (for negative bumps). */ + && (TYPE_UNSIGNED (target_type) + || wi::eq_p (bump, wi::ext (bump, prec, SIGNED))) + && (!TYPE_UNSIGNED (target_type) + || wi::eq_p (bump, wi::ext (bump, prec, UNSIGNED)) + || wi::eq_p (bump + wi::to_widest (maxval) + 1, + wi::ext (bump, prec, UNSIGNED))) /* It is not useful to replace casts, copies, negates, or adds of an SSA name and a constant. 
*/ && cand_code != SSA_NAME Index: gcc/testsuite/gcc.c-torture/execute/pr81503.c === --- gcc/testsuite/gcc.c-torture/execute/pr81503.c (nonexistent) +++ gcc/testsuite/gcc.c-torture/execute/pr81503.c (working copy) @@ -0,0 +1,15 @@ +unsigned short a = 41461; +unsigned short b = 3419; +int c = 0; + +void foo() { + if (a + b * ~(0 != 5)) +c = -~(b * ~(0 != 5)) + 2147483647; +} + +int main() { + foo(); + if (c != 2147476810) +return -1; + return 0; +}
Re: [PATCH] Improve var_bound range test opt (PR tree-optimization/81655)
On 08/03/2017 03:32 AM, Jakub Jelinek wrote: > On Thu, Aug 03, 2017 at 10:58:07AM +0200, Richard Biener wrote: >>> The reason why we punt is the unexpected _4 != 1 condition, the code >>> is prepared to handle just _4 == 0 (or _4 != 0) where _4 == 0 is equivalent >>> to _4 != 1 for boolean type. >> >> Hmm, I thought we had code to canonicalize boolean compares (but I can't >> find that right now). Some is in > > I was looking for that too, but didn't find anything that would be done > always. I can recall looking for that kind of canonicalization as well, but not finding it. Furthermore instrumentation showed the different forms could show up in the IL, but forms such as != 1 were relatively rare. Jeff
Re: [PATCH][RFC] Make expansion of balanced binary trees of switches on tree level.
On 08/03/2017 03:02 AM, Jan Hubicka wrote: >> On 08/02/2017 01:51 PM, Richard Biener wrote: >>> On Wed, Aug 2, 2017 at 1:20 PM, Martin Liška wrote: Hello. After some discussions with Honza, I've decided to convert current code in stmt.c that is responsible for switch expansion. More precisely, I would like to convert the code to expand gswitch statements on tree level. Currently the newly created pass is executed at the end of tree optimizations. My plan for future is to inspire in [1] and come up with some more sophisticated switch expansions. For that I've been working on a paper where I'll summarize statistics based on what I've collected in openSUSE distribution with specially instrumented GCC. If I'll be happy I can also fit in to schedule of this year's Cauldron with a talk. Patch can bootstrap on ppc64le-redhat-linux and survives regression tests. Thoughts? >>> >>> First of all thanks. >>> >>> I think part of switch expansion moved to switch-conversion some time ago >>> (emit_case_bit_tests). So maybe the full lowering should be in at least >>> the same source file and it should maybe applied earlier for a subset of >>> cases (very low number of cases for example). >> >> Yep, good idea. I'll take a look. >> >>> >>> Did you base the code on the RTL expansion code or did you re-write it from >>> scratch? >> >> It's based, I've just changed the function that create CFG. > > I have talked Martin to do this in first step. Switch expansion is infinitely > difficult problem and I think changing representation first and keeping the > basic algorithm is easiest way to get something done. Algorithm will be > improved next as far as I know :) I can support that approach :-) jeff
Re: [PATCH 0/2] Python testcases to check DWARF output
On 08/03/2017 02:27 AM, Pierre-Marie de Rodat wrote: > On 08/02/2017 05:43 PM, Jeff Law wrote: >> I hate to throw in a wrench at this point, but has anyone looked at >> dwgrep from Petr Machata? He's not doing much with it anymore, but it >> might provide enough of a dwarf scanning framework to be useful for >> testing purposes. > > Sure, no problem: I first started talking publicly about this one week > ago, so it’s definitely not too late to mention alternatives. ;-) I > learned about dwgrep two years ago and forgot about it, so thank you for > the idea. I started to have a look at it, and for now I don’t think it’s > a good match in this context: > > 1. it’s an ELF only tool; > 2. it must be built, requiring external dependencies: cmake and > elfutils; > 3. in order to use it, one must learn a dedicated post-fix language > (Zwerg) > > For 1. I think this is a true problem, as it means for instance that we > could not test DWARF on Windows and Darwin setups. Unless we add PE and > Mach-O handling in dwgrep of course, but that does not sound easy and > will bring other external dependencies. > > For 3. I feel that, for someone who is comfortable with Python, it will > be easier to deal with a Python library (the dwarfutils in my patch) > than having to learn yet another DSL. I think that’s precisely why some > people would like to have a Python test framework rather than a TCL one. > Working with a “usual” imperative language looks easier than with > postfix expressions. Smaller cognitive load. > > Actually I see another problem: pattern will have to vary depending on > the target platform (for instance 32/64bit or depending on the DWARF > version). Of course we could duplicate whole patterns in testcases to > take this into account, but that’s like code duplication: I think we > should be able to include small “X if 32bit else Y” in patterns, and I > don’t think we can do that with Zwerg (no way to pass something like > environment variables). > > Of course, I have written a “competitor” tool: I guess my judgment is > biased. :-) So other opinions are welcome! Thanks. I just wanted to raise it as a possibility. It looks like it's not a good fit here, so let's not let it be a distraction. jeff
Re: [PATCH] Fix PR81503 (SLSR invalid fold)
On Thu, Aug 03, 2017 at 11:05:54AM -0500, Bill Schmidt wrote: > --- gcc/gimple-ssa-strength-reduction.c (revision 250791) > +++ gcc/gimple-ssa-strength-reduction.c (working copy) > @@ -2074,6 +2074,10 @@ replace_mult_candidate (slsr_cand_t c, tree basis_ > { >tree target_type = TREE_TYPE (gimple_assign_lhs (c->cand_stmt)); >enum tree_code cand_code = gimple_assign_rhs_code (c->cand_stmt); > + unsigned int prec = TYPE_PRECISION (target_type); > + tree maxval = (POINTER_TYPE_P (target_type) > + ? TYPE_MAX_VALUE (sizetype) > + : TYPE_MAX_VALUE (target_type)); > >/* It is highly unlikely, but possible, that the resulting > bump doesn't fit in a HWI. Abandon the replacement > @@ -2082,6 +2086,17 @@ replace_mult_candidate (slsr_cand_t c, tree basis_ > types but allows for safe negation without twisted logic. */ >if (wi::fits_shwi_p (bump) >&& bump.to_shwi () != HOST_WIDE_INT_MIN > + /* It is more likely that the bump doesn't fit in the target > + type, so check whether constraining it to that type changes > + the value. For a signed type, the value mustn't change. > + For an unsigned type, the value may only change to a > + congruent value (for negative bumps). */ > + && (TYPE_UNSIGNED (target_type) > + || wi::eq_p (bump, wi::ext (bump, prec, SIGNED))) > + && (!TYPE_UNSIGNED (target_type) > + || wi::eq_p (bump, wi::ext (bump, prec, UNSIGNED)) > + || wi::eq_p (bump + wi::to_widest (maxval) + 1, > +wi::ext (bump, prec, UNSIGNED))) wi::ext makes only sense if used with variable TYPE_SIGNED, when used with constant, it is better to use wi::sext or wi::zext. And, wouldn't it be more readable to use: && (TYPE_UNSIGNED (target_type) ? (wi::eq_p (bump, wi::zext (bump, prec)) || wi::eq_p (bump + wi::to_widest (maxval) + 1, wi::zext (bump, prec))) : wi::eq_p (bump, wi::sext (bump, prec))) ? For TYPE_UNSIGNED, do you actually need any restriction? What kind of bump values are wrong for unsigned types and why? Jakub
Re: [PATCH 6/6] qsort comparator consistency checking
On Thu, 3 Aug 2017, Jakub Jelinek wrote: > Do we really need to rename and poison anything? qsort () in the source is > something that is most obvious to developers, so trying to force them to use > something different will just mean extra thing to learn. Yep, I'd prefer to have a solution that keeps C-style qsort invocations as-is. Note that with vec::qsort -> vec::sort renaming (which should be less controversial, STL also has std::vector::sort), the argument counting trick won't be needed, the redirection will simply be: #undef qsort #define qsort(base, n, sz, cmp) qsort_chk (base, n, sz, cmp) > I mean, isn't it better to just add a static inline qsort that in the > checking case calls qsort_chk, (redefining qsort like that is formally UB, I'd like to avoid it) > or do the redirection using GNU asm redirection: > typeof (qsort) __asm ("qsort_chk"); > and define extern "C" qsort_chk somewhere? > configure could test whether it works on the target (it wouldn't perhaps > on targets which already use some inline for qsort in their headers or use > qsort as a macro (but the latter wouldn't work anyway with GCC already)). I'd rather not go this way. > The _5th macro isn't that bad either, appart from using reserved namespace > identifiers (it really should be something like qsort_5th and the arguments > shouldn't start with underscores). I didn't understand what Jeff found "ugly" about it; I wonder what epithets would be applied then to more, ahem, unusual parts of GCC. Thanks. Alexander
Re: [PATCH 6/6] qsort comparator consistency checking
On Thu, 2017-08-03 at 19:23 +0300, Alexander Monakov wrote: > > Note that with vec::qsort -> vec::sort renaming (which should be less > controversial, STL also has std::vector::sort) No it doesn't? One uses std::sort from <algorithm> on a pair of random access iterators to sort a std::vector. Cheers, Oleg
Re: [PATCH 6/6] qsort comparator consistency checking
On Thu, Aug 03, 2017 at 07:23:31PM +0300, Alexander Monakov wrote: > On Thu, 3 Aug 2017, Jakub Jelinek wrote: > > Do we really need to rename and poison anything? qsort () in the source is > > something that is most obvious to developers, so trying to force them to use > > something different will just mean extra thing to learn. > > Yep, I'd prefer to have a solution that keeps C-style qsort invocations as-is. > > Note that with vec::qsort -> vec::sort renaming (which should be less > controversial, STL also has std::vector::sort), the argument counting vec::qsort -> vec::sort renaming is fine with me. Jakub
Re: [PATCH] Fix PR81503 (SLSR invalid fold)
On Aug 3, 2017, at 11:20 AM, Jakub Jelinek wrote: > > On Thu, Aug 03, 2017 at 11:05:54AM -0500, Bill Schmidt wrote: >> --- gcc/gimple-ssa-strength-reduction.c (revision 250791) >> +++ gcc/gimple-ssa-strength-reduction.c (working copy) >> @@ -2074,6 +2074,10 @@ replace_mult_candidate (slsr_cand_t c, tree basis_ >> { >> tree target_type = TREE_TYPE (gimple_assign_lhs (c->cand_stmt)); >> enum tree_code cand_code = gimple_assign_rhs_code (c->cand_stmt); >> + unsigned int prec = TYPE_PRECISION (target_type); >> + tree maxval = (POINTER_TYPE_P (target_type) >> + ? TYPE_MAX_VALUE (sizetype) >> + : TYPE_MAX_VALUE (target_type)); >> >> /* It is highly unlikely, but possible, that the resulting >> bump doesn't fit in a HWI. Abandon the replacement >> @@ -2082,6 +2086,17 @@ replace_mult_candidate (slsr_cand_t c, tree basis_ >> types but allows for safe negation without twisted logic. */ >> if (wi::fits_shwi_p (bump) >> && bump.to_shwi () != HOST_WIDE_INT_MIN >> + /* It is more likely that the bump doesn't fit in the target >> + type, so check whether constraining it to that type changes >> + the value. For a signed type, the value mustn't change. >> + For an unsigned type, the value may only change to a >> + congruent value (for negative bumps). */ >> + && (TYPE_UNSIGNED (target_type) >> + || wi::eq_p (bump, wi::ext (bump, prec, SIGNED))) >> + && (!TYPE_UNSIGNED (target_type) >> + || wi::eq_p (bump, wi::ext (bump, prec, UNSIGNED)) >> + || wi::eq_p (bump + wi::to_widest (maxval) + 1, >> + wi::ext (bump, prec, UNSIGNED))) > > wi::ext makes only sense if used with variable TYPE_SIGNED, when > used with constant, it is better to use wi::sext or wi::zext. Okay. > And, wouldn't it be more readable to use: > && (TYPE_UNSIGNED (target_type) > ? (wi::eq_p (bump, wi::zext (bump, prec)) >|| wi::eq_p (bump + wi::to_widest (maxval) + 1, > wi::zext (bump, prec))) > : wi::eq_p (bump, wi::sext (bump, prec))) > ? Probably. As noted, it's all becoming a bit unreadable with too much negative logic in a long conditional, so I want to clean that up in a follow-up. > For TYPE_UNSIGNED, do you actually need any restriction? > What kind of bump values are wrong for unsigned types and why? If the value of the bump is actually larger than the precision of the type (not likely except for quite small types), say 2 * (maxval + 1) which is congruent to 0, the replacement is wrong. Bill > > Jakub >
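To make the congruence condition concrete, here is a tiny standalone illustration (the numbers are illustrative, not from the patch) for a 16-bit unsigned target type, where maxval is 65535:

#include <cstdio>

int main ()
{
  const long long maxval = 65535;           /* TYPE_MAX_VALUE for 16 bits.  */
  const long long bumps[] = { -1, 2 * 65536 };
  for (long long bump : bumps)
    {
      long long zext = bump & 0xffff;       /* wi::zext (bump, 16)  */
      bool ok = (bump == zext || bump + maxval + 1 == zext);
      std::printf ("bump %lld -> zext %lld: %s\n", bump, zext,
                   ok ? "congruent, replacement is safe"
                      : "not congruent, abandon the replacement");
    }
}

The first bump (-1) wraps to the congruent value 65535 and is fine; the second (2 * 65536) folds to 0 in the type, changing the value, which is exactly the case the patch now rejects.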
Re: [PATCH 6/6] qsort comparator consistency checking
On Fri, 4 Aug 2017, Oleg Endo wrote: > > Note that with vec::qsort -> vec::sort renaming (which should be less > > controversial, STL also has std::vector::sort) > > No it doesn't? One uses std::sort from on a pair of random > access iterators to sort a std::vector. My mistake, but the main point remains: STL uses 'sort' without the 'q'. Thanks. Alexander
Re: [PATCH 6/6] qsort comparator consistency checking
On Thu, 2017-08-03 at 19:31 +0300, Alexander Monakov wrote: > > My mistake, but the main point remains: STL uses 'sort' without the > 'q'. I think it'd be better if GCC's custom containers somewhat tried to follow C++ standard container idioms. Chopping off the 'q' is one step towards it. Cheers, Oleg
Re: [PATCH] Fix PR81503 (SLSR invalid fold)
On Thu, Aug 03, 2017 at 11:29:44AM -0500, Bill Schmidt wrote: > > And, wouldn't it be more readable to use: > > && (TYPE_UNSIGNED (target_type) > > ? (wi::eq_p (bump, wi::zext (bump, prec)) > > || wi::eq_p (bump + wi::to_widest (maxval) + 1, > > wi::zext (bump, prec))) > > : wi::eq_p (bump, wi::sext (bump, prec))) > > ? > > Probably. As noted, it's all becoming a bit unreadable with too > much negative logic in a long conditional, so I want to clean that > up in a follow-up. > > > For TYPE_UNSIGNED, do you actually need any restriction? > > What kind of bump values are wrong for unsigned types and why? > > If the value of the bump is actually larger than the precision of the > type (not likely except for quite small types), say 2 * (maxval + 1) > which is congruent to 0, the replacement is wrong. Ah, ok. Anyway, for unsigned type, perhaps it could be written as: && (TYPE_UNSIGNED (target_type) ? wi::eq_p (wi::neg_p (bump) ? bump + wi::to_widest (maxval) + 1 : bump, wi::zext (bump, prec)) : wi::eq_p (bump, wi::sext (bump, prec))) I mean, if bump >= 0, then the bump + wi::to_widest (maxval) + 1 value has no chance to be equal to zero extended bump, and for bump < 0 only that one has a chance. Jakub
Re: [PATCH 1/1] sparc: support for -mmisalign in the SPARC M8
From: Qing Zhao Date: Thu, 3 Aug 2017 10:37:15 -0500 > all the special handling on STRICT_ALIGNMENT or > SLOW_UNALIGNMENT_ACCESS in these codes have the following common > logic: > > if the memory access is known to be not-aligned well during > compilation time, if the targeted platform does NOT support faster > unaligned memory access, the compiler will try to make the memory > access aligned well. Otherwise, if the targeted platform supports > faster unaligned memory access, it will leave the compiler-time > known not-aligned memory access as it, later the hardware support > will kicked in for these unaligned memory access. > > this behavior is consistent with the high level definition of > STRICT_ALIGNMENT. That's exactly the problem. What you want with this M8 feature is simply to let the compiler know that if it is completely impossible to make some memory object aligned, then the cpu can handle this with special instructions. You still want the compiler to make the effort to align data when it can because the accesses will be faster than if it used the unaligned loads and stores. This is incredibly important for on-stack objects.
Re: [PATCH] Verify edge probability consistency in verify_flow_info
On 08/02/2017 10:07 AM, Tom de Vries wrote: > Hi, > > I. > > for target nvptx we recently ran into PR81442, an ICE in verify_flow_info: > ... > error: verify_flow_info: REG_BR_PROB is set but cfg probability is not > ... > > We start out with a jump instruction: > ... > (jump_insn 18 17 31 2 (set (pc) > (if_then_else (ne (reg:BI 83) > (const_int 0 [0])) > (label_ref 31) > (pc))) "reduction-5.c":52 104 {br_true} > (expr_list:REG_DEAD (reg:BI 83) > (int_list:REG_BR_PROB 1 (nil))) > -> 31) > ... > > The probability is set on the branch edge, but not on the fallthru edge. > > After fixup_reorder_chain, the condition is reversed, and the > probability reg-note update accordingly: > ... > (jump_insn 18 17 32 2 (set (pc) > (if_then_else (eq (reg:BI 83) > (const_int 0 [0])) > (label_ref:DI 33) > (pc))) "reduction-5.c":52 105 {br_false} > (expr_list:REG_DEAD (reg:BI 83) > (int_list:REG_BR_PROB 0 (nil))) > -> 33) > ... > > The branch and fallthru edge are also reversed, which means now the > probability is set on the fallthru edge, and not on the branch edge. > > This is the immediate cause for the verify_flow_info error. > > > II. > > We've fixed the ICE in the target by setting the probability on the > fallthru edge when we introduce it (r250422). > > We've noted other places where we were forgetting to set the probability > (fixed in r250823), but that did not trigger the ICE. > > > III. > > I've written this patch to check for the missing probability more > consistently. I'm not certain if we can require that the probability > should always be set, so I'm just requiring that if it is set on one > outgoing edge, it needs to be set on all outgoing edges. > > Sofar I've build a c-only x86_64 non-bootstrap compiler and ran dg.exp. > The only problem I ran into was in attr-simd{,-2,-4}.c. I've written a > tentative patch for that, and will submit it as follow-up. > > Is this check a good idea? I think the additional checking is a good idea. Ideally we'd verify that all edges have a probability. Until then I think you need some kind of rationale in a comment for why the checking is limited. > > OK for trunk if bootstrap and reg-test on x86_64 succeeds? Yea, but I'd like to see ongoing work towards full checking. Jeff
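A rough sketch of the consistency rule under discussion (not the actual patch; edge_probability_set_p is a placeholder for however "the cfg probability is set" is tested in the verifier):

/* If any outgoing edge of BB carries a probability, require that all
   of them do, reporting through the verifier's error machinery.  */
static bool
verify_outgoing_edge_probabilities (basic_block bb)
{
  edge e;
  edge_iterator ei;
  bool any_set = false, any_unset = false;

  FOR_EACH_EDGE (e, ei, bb->succs)
    {
      if (edge_probability_set_p (e))   /* placeholder predicate */
        any_set = true;
      else
        any_unset = true;
    }

  if (any_set && any_unset)
    {
      error ("verify_flow_info: probability set on some but not all "
             "outgoing edges of bb %i", bb->index);
      return true;   /* an error was found */
    }
  return false;
}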
Re: [PING^4][PATCH v2] Generate reproducible output independently of the build-path
Jeff Law: > On 08/02/2017 08:06 PM, Ximin Luo wrote: >> Jeff Law: >>> On 07/21/2017 10:15 AM, Ximin Luo wrote: (Please keep me on CC, I am not subscribed) Proposal This patch series adds a new environment variable BUILD_PATH_PREFIX_MAP. When this is set, GCC will treat this as extra implicit "-fdebug-prefix-map=$value" command-line arguments that precede any explicit ones. This makes the final binary output reproducible, and also hides the unreproducible value (the source path prefixes) from CFLAGS et. al. which many build tools (understandably) embed as-is into their build output. >>> I'd *really* avoid doing this with magic environment variables. Make it >>> a first class option to the compiler. Yes, it means projects that want >>> this behavior have to arrange to pass that flag to their compiler, but >>> IMHO that's much preferred over environment variables. >>> >>> Jeff >>> >> >> Hi Jeff, >> >> If by "first class option" you meant a command-line flag, GCC *already has* >> that (-fdebug-prefix-map) and it wasn't enough to achieve reproducibility in >> many cases we tested. dpkg-buildflags actually already adds these flags to >> CFLAGS CXXFLAGS etc on Debian. However, with this patch using the >> environment variable, we are able to reproduce 1800 more packages out of >> 26000. > Then take what you've done with the environment variable and instead > implement it on top of a switch. An environment variable is absolutely the > wrong thing to do here. > >> >> GCC already supports a similar environment variable SOURCE_DATE_EPOCH, which >> was accepted about 2 years ago in a patch written by one of our GSoC >> students. We are not planning any more environment variables like this, and >> are committed to fixing other sources of non-determinism by patching the >> relevant build scripts. > I would have rejected that as well :-) One of the few times I would > have disagreed with Bernd. > > Could you go into some more detail on why you think an envvar is absolutely the wrong thing to do here? X -- GPG: ed25519/56034877E1F87C35 GPG: rsa4096/1318EFAC5FBBDBCE https://github.com/infinity0/pubkeys.git
Re: [RFC][PR 67336][PING^6] Verify pointers during stack unwind
* Jeff Law: >> If speed is not a concern, but reliability is, call fork (the system >> call, not glibc's wrapper which calls fork handlers) and do the work >> in a single-threaded copy of the process. There, you can set up >> signal handlers as you see fit, and the VM layout won't change >> unexpectedly. > Speed is one of the concerns. I thought this was debugging-only, optional unwinder behavior. > Essentially we're pondering an efficient > mechanism to know if a given code address is valid so that we know if we > should even try to unwind through a particular stack frame. Efficient code address validation is possible using data structures maintained by ld.so. If those data structures are mostly read-only, they cannot be corrupted, so pointers and offsets in them don't have to be validated before derefencing them. But the main problem is the frame pointer chain, and the current lack of registration for stacks. The program can just malloc some area, switch the SP, and start running from there. Some libraries do exactly that. And without knowledge of the bottom address of the stack, it is impossible to verify that the frame pointer chain goes somewhere off the stack. > One could probably claim that in the EH path a call to msync shouldn't > be a concern. I'm skeptical these days, but could be convinced that > it's tolerable. Quite a few applications are sensitive to EH performance. The current freshness check for the cache is very inefficient, which is a significant problem for some users. If we can push out the flag check the top level (essentially compiling the unwinder twice, with and without checking), I don't think this would cause significant additional performance issues.
Re: [PATCH][2/2] early LTO debug, main part
On Thu, Aug 3, 2017 at 6:51 AM, Richard Biener wrote: > On Wed, 2 Aug 2017, Jason Merrill wrote: > >> On Wed, Aug 2, 2017 at 6:35 AM, Richard Biener wrote: >> > On Wed, 2 Aug 2017, Jason Merrill wrote: >> > >> >> On 05/19/2017 06:42 AM, Richard Biener wrote: >> >> > + /* ??? In some cases the C++ FE (at least) fails to >> >> > +set DECL_CONTEXT properly. Simply globalize stuff >> >> > +in this case. For example >> >> > +__dso_handle created via iostream line 74 col 25. */ >> >> > + parent = comp_unit_die (); >> >> >> >> I've now fixed __dso_handle, so that can be removed from the comment, but >> >> it >> >> still makes sense to have this fall-back for the (permitted) case of null >> >> DECL_CONTEXT. >> > >> > Adjusted the comment. >> > >> >> > + /* ??? LANG issue - DW_TAG_module for fortran. Either look >> >> > +at the input language (if we have enough DECL_CONTEXT to follow) >> >> > +or use a bit in tree_decl_with_vis to record the distinction. */ >> >> >> >> Sure, you should be able to look at TRANSLATION_UNIT_LANGUAGE. >> > >> > Yeah, the comment says we might be able to walk DECL_CONTEXT up to >> > a TRANSLATION_UNIT_DECL. I've amended is_fortran similar to as I >> > amended is_cxx, providing an overload for a decl, factoring out common >> > code. So this is now if (is_fortran (decl)) ... = new_die >> > (DW_TAG_module,...). >> > >> >> > ! /* ??? We cannot unconditionally output die_offset if >> >> > !non-zero - at least -feliminate-dwarf2-dups will >> >> > !create references to those DIEs via symbols. And we >> >> > !do not clear its DIE offset after outputting it >> >> > !(and the label refers to the actual DIEs, not the >> >> > !DWARF CU unit header which is when using label + offset >> >> > !would be the correct thing to do). >> >> >> >> As in our previous discussion, I think -feliminate-dwarf2-dups can go away >> >> now. But this is a more general issue: die_offset has meant the offset >> >> from >> >> the beginning of the CU, but if with_offset is set it means an offset from >> >> die_symbol. Since with_offset changes the meaning of die_symbol and >> >> die_offset, having different code here depending on that flag makes sense. >> >> >> >> It seems likely that when -fEDD goes away, we will never again want to do >> >> direct symbolic references to DIEs, in which case we could drop the >> >> current >> >> meaning of die_symbol, and so we wouldn't need the with_offset flag. >> > >> > Yeah, I've been playing with a patch to remove -fEDD but it has conflicts >> > with the early LTO debug work and thus I wanted to postpone it until >> > after that goes in to avoid further churn. I hope that's fine, it's >> > sth I'll tackle soon after this patch lands on trunk. >> >> Sure. >> >> >> > ! /* Don't output the symbol twice. For LTO we want the label >> >> > ! on the section beginning, not on the actual DIE. */ >> >> > ! && (!flag_generate_lto >> >> > ! || die->die_tag != DW_TAG_compile_unit)) >> >> >> >> I think this check should just be !with_offset; if that flag is set the >> >> DIE >> >> doesn't actually have its own symbol. >> > >> > with_offset is set only during LTRANS phase for the DIEs refering to >> > the early DIEs via the CU label. But the above is guarding the >> > early phase when we do not want to output that CU label twice. >> > >> > Can we revisit this check when -fEDD has gone away? >> >> Yes. >> >> >> > ! if (old_die >> >> > ! && (c = get_AT_ref (old_die, DW_AT_abstract_origin)) >> >> > ! /* ??? In LTO all origin DIEs still refer to the early >> >> > !debug copy. 
Detect that. */ >> >> > ! && get_AT (c, DW_AT_inline)) >> >> ... >> >> > ! /* "Unwrap" the decls DIE which we put in the imported unit >> >> > context. >> >> > ! ??? If we finish dwarf2out_function_decl refactoring we can >> >> > ! do this in a better way from the start and only lazily emit >> >> > ! the early DIE references. */ >> >> >> >> It seems like in gen_subprogram_die you deliberately avoid reusing the DIE >> >> from dwarf2out_register_external_die (since it doesn't have >> >> DW_AT_inline), and >> >> then in add_abstract_origin_attribute you need to look through that >> >> redundant >> >> die. Why not reuse it? >> > >> > What we're doing here is dealing with the case of an inlined >> > instance which is adjusted to point back to the early debug copy >> > directly than to the wrapping DIE (supposed to eventually contain >> > the concrete instance). >> >> But we enter this block when we're emitting the concrete out-of-line >> instance, and the DW_AT_inline check prevents us from using the >> wrapping DIE for the out-of-line instance. >> >> The comment should really change "inlined instance" to "concrete >> instance"; inlined instances are handled in >> gen_inlined_
Re: [RFC][PR 67336][PING^6] Verify pointers during stack unwind
* Yuri Gribov: > I've rebased the previous patch to trunk per Andrew's suggestion. > Original patch description/motivation/questions are in > https://gcc.gnu.org/ml/gcc-patches/2017-06/msg01869.html What about the remaining TODOs? > + if (__builtin_expect (msync ((void *)page, len, MS_ASYNC), 0)) > +return 0; Is there any reason not to probe using mincore? It won't trigger writeback.
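For comparison, a minimal sketch of what a mincore-based probe could look like (illustrative only, not from the posted patch):

#include <stdint.h>
#include <unistd.h>
#include <sys/mman.h>

/* Return nonzero if ADDR lies on a currently mapped page.  Unlike
   msync with MS_ASYNC, mincore cannot trigger writeback; it simply
   fails with ENOMEM for an unmapped range.  */
static int
addr_is_mapped (const void *addr)
{
  unsigned char vec;
  uintptr_t pagesize = (uintptr_t) sysconf (_SC_PAGESIZE);
  uintptr_t page = (uintptr_t) addr & ~(pagesize - 1);
  return mincore ((void *) page, pagesize, &vec) == 0;
}

It has the same inherent raciness as the msync probe (the mapping can disappear right after the check), a limitation already acknowledged earlier in the thread.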
Re: [PATCH, gcc/Makefile.in] Fix typo on ref to multi_dir variable in install-mkheaders
> On 03 Aug 2017, at 17:30, Jeff Law wrote: > >>* Makefile.in (install-mkheaders): Fix typo, where the multi_dir >>variable was referenced as multidir in command. >> > OK. Great, Thanks Jeff!
Re: [RFC][PR 67336][PING^6] Verify pointers during stack unwind
On 08/03/2017 11:03 AM, Florian Weimer wrote: > * Jeff Law: > >>> If speed is not a concern, but reliability is, call fork (the system >>> call, not glibc's wrapper which calls fork handlers) and do the work >>> in a single-threaded copy of the process. There, you can set up >>> signal handlers as you see fit, and the VM layout won't change >>> unexpectedly. > >> Speed is one of the concerns. > > I thought this was debugging-only, optional unwinder behavior. My understanding was this was in the EH path as well. If it's outside the EH path, then I don't care about the cost of msync and the like :-) > >> Essentially we're pondering an efficient >> mechanism to know if a given code address is valid so that we know if we >> should even try to unwind through a particular stack frame. > > Efficient code address validation is possible using data structures > maintained by ld.so. If those data structures are mostly read-only, > they cannot be corrupted, so pointers and offsets in them don't have > to be validated before derefencing them. Understood. > > But the main problem is the frame pointer chain, and the current lack > of registration for stacks. The program can just malloc some area, > switch the SP, and start running from there. Some libraries do > exactly that. And without knowledge of the bottom address of the > stack, it is impossible to verify that the frame pointer chain goes > somewhere off the stack. Yea. > >> One could probably claim that in the EH path a call to msync shouldn't >> be a concern. I'm skeptical these days, but could be convinced that >> it's tolerable. > > Quite a few applications are sensitive to EH performance. The current > freshness check for the cache is very inefficient, which is a > significant problem for some users. If we can push out the flag check > the top level (essentially compiling the unwinder twice, with and > without checking), I don't think this would cause significant > additional performance issues. Yea. I went back and reviewed the internal BZ. It's actually just the print-a-backtrace thing rather than EH unwinding. SO yea, if the print-a-backtrace path and EH path both use bits, we could just compile them twice, once with checking once without respectively. Jeff
Re: [PATCH 1/3] matching tokens: c-family parts
On 08/01/2017 02:21 PM, David Malcolm wrote: > (Unchanged since v1; already approved by Marek, assuming rest is approved) > > gcc/c-family/ChangeLog: > * c-common.c (c_parse_error): Add rich_location * param, using it > rather implicitly using input_location. > * c-common.h (c_parse_error): Add rich_location * param. > > gcc/testsuite/ChangeLog: > * c-c++-common/missing-close-symbol.c: New test case. > * c-c++-common/missing-symbol.c: New test case. LGTM. jeff
Re: [PATCH], PR target/81593, Optimize PowerPC vector sets coming from a vector extracts
On Thu, Aug 03, 2017 at 10:01:41AM -0500, Segher Boessenkool wrote: > Hi Mike, > > On Wed, Aug 02, 2017 at 10:28:55AM -0400, Michael Meissner wrote: > > On Fri, Jul 28, 2017 at 04:08:50PM -0500, Segher Boessenkool wrote: > > > I think calling this with the rtx elementN args makes this only more > > > complicated (the function comment doesn't say what they are or what > > > NULL means, btw). > > You didn't handle the first part of this as far as I see? It's the > big complicating issue here. I am not sure exactly what you are asking for. This is like the other output functions that take the rtx insns. > > + If ELEMENT1 is null, use the top 64-bit double word of ARG1. If it is > > + non-NULL, it is a 0 or 1 constant that gives the vector element number > > to > > + use for extracting the 64-bit double word from ARG1. > > + > > + If ELEMENT2 is null, use the top 64-bit double word of ARG2. If it is > > + non-NULL, it is a 0 or 1 constant that gives the vector element number > > to > > + use for extracting the 64-bit double word from ARG2. > > + > > + The element number is based on the user element ordering, set by the > > + endianess and by the -maltivec={le,be} options. */ > > ("endianness", two n's). > > I don't like using NULL as a magic value at all; it does not simplify > this interface, it complicates it instead. > > Can you move the "which half is high" decision to the callers? And then essentially there is no need for the function, since each of the 4 concat variants have to have the logic to support big endian, -maltivec=be, and -maltivec=le. Let me see what I can do about it. -- Michael Meissner, IBM IBM, M/S 2506R, 550 King Street, Littleton, MA 01460-6245, USA email: meiss...@linux.vnet.ibm.com, phone: +1 (978) 899-4797
Re: [PATCH 2/3] Matching tokens: C parts (v2)
On 08/01/2017 02:21 PM, David Malcolm wrote: > Changed in v2: > > * Renamed template argument to traits_t; eliminated subclasses, just > using traits struct. > * Moved enum constants into struct bodies (string constants can't be > without constexpr, which isn't available in C++98). > * Fixed typo. > > OK for trunk? > > gcc/c/ChangeLog: > * c-parser.c (c_parser_error): Rename to... > (c_parser_error_richloc): ...this, making static, and adding > "richloc" parameter, passing it to the c_parse_error call, > rather than calling c_parser_set_source_position_from_token. > (c_parser_error): Reintroduce, reimplementing in terms of the > above, converting return type from void to bool. > (class token_pair): New class. > (struct matching_paren_traits): New struct. > (matching_parens): New typedef. > (struct matching_brace_traits): New struct. > (matching_braces): New typedef. > (get_matching_symbol): New function. > (c_parser_require): Add param MATCHING_LOCATION, using it to > highlight matching "opening" tokens for missing "closing" tokens. > (c_parser_skip_until_found): Likewise. > (c_parser_static_assert_declaration_no_semi): Convert explicit > parsing of CPP_OPEN_PAREN and CPP_CLOSE_PAREN to use of > class matching_parens, so that the pertinent open parenthesis is > highlighted when there are problems locating the close > parenthesis. > (c_parser_struct_or_union_specifier): Likewise. > (c_parser_typeof_specifier): Likewise. > (c_parser_alignas_specifier): Likewise. > (c_parser_simple_asm_expr): Likewise. > (c_parser_braced_init): Likewise, for matching_braces. > (c_parser_paren_condition): Likewise, for matching_parens. > (c_parser_switch_statement): Likewise. > (c_parser_for_statement): Likewise. > (c_parser_asm_statement): Likewise. > (c_parser_asm_operands): Likewise. > (c_parser_cast_expression): Likewise. > (c_parser_sizeof_expression): Likewise. > (c_parser_alignof_expression): Likewise. > (c_parser_generic_selection): Likewise. > (c_parser_postfix_expression): Likewise for cases RID_VA_ARG, > RID_OFFSETOF, RID_TYPES_COMPATIBLE_P, RID_AT_SELECTOR, > RID_AT_PROTOCOL, RID_AT_ENCODE, reindenting as necessary. > In case CPP_OPEN_PAREN, pass loc_open_paren to the > c_parser_skip_until_found call. > (c_parser_objc_class_definition): Use class matching_parens as > above. > (c_parser_objc_method_decl): Likewise. > (c_parser_objc_try_catch_finally_statement): Likewise. > (c_parser_objc_synchronized_statement): Likewise. > (c_parser_objc_at_property_declaration): Likewise. > (c_parser_oacc_wait_list): Likewise. > (c_parser_omp_var_list_parens): Likewise. > (c_parser_omp_clause_collapse): Likewise. > (c_parser_omp_clause_default): Likewise. > (c_parser_omp_clause_if): Likewise. > (c_parser_omp_clause_num_threads): Likewise. > (c_parser_omp_clause_num_tasks): Likewise. > (c_parser_omp_clause_grainsize): Likewise. > (c_parser_omp_clause_priority): Likewise. > (c_parser_omp_clause_hint): Likewise. > (c_parser_omp_clause_defaultmap): Likewise. > (c_parser_oacc_single_int_clause): Likewise. > (c_parser_omp_clause_ordered): Likewise. > (c_parser_omp_clause_reduction): Likewise. > (c_parser_omp_clause_schedule): Likewise. > (c_parser_omp_clause_num_teams): Likewise. > (c_parser_omp_clause_thread_limit): Likewise. > (c_parser_omp_clause_aligned): Likewise. > (c_parser_omp_clause_linear): Likewise. > (c_parser_omp_clause_safelen): Likewise. > (c_parser_omp_clause_simdlen): Likewise. > (c_parser_omp_clause_depend): Likewise. > (c_parser_omp_clause_map): Likewise. > (c_parser_omp_clause_device): Likewise. 
> (c_parser_omp_clause_dist_schedule): Likewise. > (c_parser_omp_clause_proc_bind): Likewise. > (c_parser_omp_clause_uniform): Likewise. > (c_parser_omp_for_loop): Likewise. > (c_parser_cilk_clause_vectorlength): Likewise. > (c_parser_cilk_clause_linear): Likewise. > (c_parser_transaction_expression): Likewise. > * c-parser.h (c_parser_require): Add param matching_location with > default UNKNOWN_LOCATION. > (c_parser_error): Convert return type from void to bool. > (c_parser_skip_until_found): Add param matching_location with > default UNKNOWN_LOCATION. > > gcc/testsuite/ChangeLog: > * gcc.dg/unclosed-init.c: New test case. Phew. I only spot-checked most of the changes around the new API for requiring the open/close paren/brace/bracket or consuming parens/braces/brackets. They were very mechanical :-) Jeff
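For readers skimming the ChangeLog above, the usage pattern inside a converted parser routine is roughly the following sketch. The member names (require_open/require_close) are assumed from the description, not copied from the patch, so treat this as illustrative rather than the committed code:

  matching_parens parens;
  if (!parens.require_open (parser))
    return;
  /* ... parse the parenthesized contents ... */
  parens.require_close (parser);

The point of routing both calls through one object is that the open call records the location of the '(' so that, when the ')' turns out to be missing, c_parser_require can receive that location as its new MATCHING_LOCATION parameter and the diagnostic can highlight the unmatched opening token.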
Re: [PATCH 1/3] c-family: add name_hint/deferred_diagnostic
On 07/03/2017 11:32 AM, David Malcolm wrote: > On Mon, 2017-07-03 at 10:25 -0600, Jeff Law wrote: >> On 05/05/2017 11:51 AM, David Malcolm wrote: >>> In various places we use lookup_name_fuzzy to provide a hint, >>> and can report messages of the form: >>> error: unknown foo named 'bar' >>> or: >>> error: unknown foo named 'bar'; did you mean 'SUGGESTION? >>> >>> This patch provides a way for lookup_name_fuzzy to provide >>> both the suggestion above, and (optionally) additional hints >>> that can be printed e.g. >>> >>> note: did you forget to include ? >>> >>> This patch provides the mechanism and ports existing users >>> of lookup_name_fuzzy to the new return type. >>> There are no uses of such hints in this patch, but followup >>> patches provide various front-end specific uses of this. >>> >>> gcc/c-family/ChangeLog: >>> * c-common.h (class deferred_diagnostic): New class. >>> (class name_hint): New class. >>> (lookup_name_fuzzy): Convert return type from const char * >>> to name_hint. Add location_t param. >>> >>> gcc/c/ChangeLog: >>> * c-decl.c (implicit_decl_warning): Convert "hint" from >>> const char * to name_hint. Pass location to >>> lookup_name_fuzzy. Suppress any deferred diagnostic if the >>> warning was not printed. >>> (undeclared_variable): Likewise for "guessed_id". >>> (lookup_name_fuzzy): Convert return type from const char * >>> to name_hint. Add location_t param. >>> * c-parser.c (c_parser_declaration_or_fndef): Convert "hint" >>> from >>> const char * to name_hint. Pass location to lookup_name_fuzzy. >>> (c_parser_parameter_declaration): Pass location to >>> lookup_name_fuzzy. >>> >>> gcc/cp/ChangeLog: >>> * name-lookup.c (suggest_alternatives_for): Convert >>> "fuzzy_name" from >>> const char * to name_hint, and rename to "hint". Pass location >>> to >>> lookup_name_fuzzy. >>> (lookup_name_fuzzy): Convert return type from const char * >>> to name_hint. Add location_t param. >>> * parser.c (cp_parser_diagnose_invalid_type_name): Convert >>> "suggestion" from const char * to name_hint, and rename to >>> "hint". >>> Pass location to lookup_name_fuzzy. >> >>> --- >>> gcc/c-family/c-common.h | 121 >>> +++- >>> gcc/c/c-decl.c | 35 +++--- >>> gcc/c/c-parser.c| 16 --- >>> gcc/cp/name-lookup.c| 17 +++ >>> gcc/cp/parser.c | 12 ++--- >>> 5 files changed, 163 insertions(+), 38 deletions(-) >>> >>> diff --git a/gcc/c-family/c-common.h b/gcc/c-family/c-common.h >>> index 138a0a6..83c1a68 100644 >>> --- a/gcc/c-family/c-common.h >>> +++ b/gcc/c-family/c-common.h >>> @@ -1009,7 +1009,126 @@ enum lookup_name_fuzzy_kind { >>>/* Any name. */ >>>FUZZY_LOOKUP_NAME >>> }; >>> -extern const char *lookup_name_fuzzy (tree, enum >>> lookup_name_fuzzy_kind); >>> + >>> +/* A deferred_diagnostic is a wrapper around optional extra >>> diagnostics >>> + that we may want to bundle into a name_hint. >>> + >>> + The emit method is called when no name_hint instances reference >>> + the deferred_diagnostic. In the simple case this is when the >>> name_hint >>> + goes out of scope, but a reference-counting scheme is used to >>> allow >>> + name_hint instances to be copied. 
*/ >>> + >>> +class deferred_diagnostic >>> +{ >>> + public: >>> + virtual ~deferred_diagnostic () {} >>> + virtual void emit () = 0; >>> + >>> + void incref () { m_refcnt++; } >>> + void decref () >>> + { >>> +if (--m_refcnt == 0) >>> + { >>> + if (!m_suppress) >>> + emit (); >>> + delete this; >>> + } >>> + } >>> + >>> + location_t get_location () const { return m_loc; } >>> + >>> + /* Call this if the corresponding warning was not emitted, >>> + in which case we should also not emit the >>> deferred_diagnostic. */ >>> + void suppress () >>> + { >>> +m_suppress = true; >>> + } >>> + >>> + protected: >>> + deferred_diagnostic (location_t loc) >>> + : m_refcnt (0), m_loc (loc), m_suppress (false) {} >>> + >>> + private: >>> + int m_refcnt; >>> + location_t m_loc; >>> + bool m_suppress; >>> +}; >> So what stands out here is "delete this" and the need for explicit >> reference counting. Also doesn't that imply that deferred_diagnostic >> objects must be allocated on the heap? Is there another way to get >> the >> behavior you want without resorting to something like this? >> > > Thanks for looking at this. > > Yes: deferred_diagnostic instances are heap-allocated. This is because > it's an abstract base class; each concrete subclass is an > implementation detail within the frontends, for isolating the special > -case logic for each different kind of hint, and thus these concrete > subclasses are hidden within the FE code. > > My initial implementation of the above had the name_hint class directly > "own" the deferred_diagnostic ptr, with a: > delete m_deferred; > within name
Re: [PATCH] Fix PR81503 (SLSR invalid fold)
-- Bill Bill Schmidt, Ph.D. GCC for Linux on Power Linux on Power Toolchain IBM Linux Technology Center wschm...@linux.vnet.ibm.com > On Aug 3, 2017, at 11:39 AM, Jakub Jelinek wrote: > > On Thu, Aug 03, 2017 at 11:29:44AM -0500, Bill Schmidt wrote: >>> And, wouldn't it be more readable to use: >>> && (TYPE_UNSIGNED (target_type) >>> ? (wi::eq_p (bump, wi::zext (bump, prec)) >>> || wi::eq_p (bump + wi::to_widest (maxval) + 1, >>> wi::zext (bump, prec))) >>> : wi::eq_p (bump, wi::sext (bump, prec))) >>> ? >> >> Probably. As noted, it's all becoming a bit unreadable with too >> much negative logic in a long conditional, so I want to clean that >> up in a follow-up. >> >>> For TYPE_UNSIGNED, do you actually need any restriction? >>> What kind of bump values are wrong for unsigned types and why? >> >> If the value of the bump is actually larger than the precision of the >> type (not likely except for quite small types), say 2 * (maxval + 1) >> which is congruent to 0, the replacement is wrong. > > Ah, ok. Anyway, for unsigned type, perhaps it could be written as: > && (TYPE_UNSIGNED (target_type) > ? wi::eq_p (wi::neg_p (bump) ? bump + wi::to_widest (maxval) + 1 > : bump, wi::zext (bump, prec)) > : wi::eq_p (bump, wi::sext (bump, prec))) > I mean, if bump >= 0, then the bump + wi::to_widest (maxval) + 1 > value has no chance to be equal to zero extended bump, and > for bump < 0 only that one has a chance. Yeah, that's true. And arguably my case for the really large bump causing problems is kind of thin, because the program is probably already broken in that case anyway. But I think I will sleep better having the check in there, as somebody other than SLSR will catch the bug then. ;-) Thanks for all the help with this one. These corner cases are always tricky, and I appreciate the extra eyeballs. Bill > > Jakub >
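To make the congruence argument concrete, here is a small standalone model in plain C (not the wide-int code in the patch itself) of the fits-in-type test being discussed, for precisions up to 64 bits: a signed bump must equal its own sign-extension to the target precision, while for an unsigned target type a negative bump may instead be congruent to its zero-extension modulo 2^prec. A bump like 2 * (maxval + 1) fails because it collapses to 0.

#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/* Standalone model of the check; PREC plays the role of
   TYPE_PRECISION (target_type).  */
static bool
bump_fits_p (int64_t bump, unsigned prec, bool unsigned_p)
{
  uint64_t mask = prec >= 64 ? ~UINT64_C (0) : (UINT64_C (1) << prec) - 1;
  uint64_t zext = (uint64_t) bump & mask;

  if (!unsigned_p)
    {
      /* Sign-extend ZEXT from PREC bits; relies on GCC's arithmetic
	 right shift of signed values.  */
      int64_t sext = (int64_t) (zext << (64 - prec)) >> (64 - prec);
      return bump == sext;
    }

  uint64_t maxval = mask;	/* TYPE_MAX_VALUE for the unsigned type.  */
  return (uint64_t) bump == zext
	 || (bump < 0 && (uint64_t) (bump + (int64_t) maxval + 1) == zext);
}

int
main (void)
{
  /* -3419 is congruent to 62117 mod 2^16, so it is safe for unsigned short.  */
  printf ("%d\n", bump_fits_p (-3419, 16, true));	/* prints 1 */
  /* 2 * 65536 is congruent to 0 mod 2^16, so folding it would be wrong.  */
  printf ("%d\n", bump_fits_p (131072, 16, true));	/* prints 0 */
  return 0;
}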
libgo patch committed: Fix signal counting for glibc 2.26
This patch to libgo changes the mksigtab script to recognize the glibc 2.26 NSIG expression. Bootstrapped and ran Go testsuite on x86_64-pc-linux-gnu. Committed to mainline. Will commit to GCC 7 branch when it reopens. This fixes GCC PR 81617 and https://golang.org/issue/21147. Ian Index: gcc/go/gofrontend/MERGE === --- gcc/go/gofrontend/MERGE (revision 250832) +++ gcc/go/gofrontend/MERGE (working copy) @@ -1,4 +1,4 @@ -c1ac6bc99f988633c6bc68a5ca9ffad3487750ef +adac632f95d1cd3421c9c1df5204db10b6a92c44 The first line of this file holds the git revision number of the last merge done from the gofrontend repository. Index: libgo/mksigtab.sh === --- libgo/mksigtab.sh (revision 250406) +++ libgo/mksigtab.sh (working copy) @@ -107,6 +107,19 @@ if test "${GOOS}" = "aix"; then nsig=`expr $nsig + 1` else nsig=`grep 'const _*NSIG = [0-9]*$' gen-sysinfo.go | sed -e 's/.* = \([0-9]*\)/\1/'` +if test -z "$nsig"; then + if grep 'const _*NSIG = [ (]*_*SIGRTMAX + 1[ )]*' gen-sysinfo.go >/dev/null 2>&1; then + rtmax=`grep 'const _*SIGRTMAX = [0-9]*$' gen-sysinfo.go | sed -e 's/.* = \([0-9]*\)/\1/'` + if test -n "$rtmax"; then + nsig=`expr $rtmax + 1` + fi + fi +fi +fi + +if test -z "$nsig"; then +echo 1>&2 "could not determine number of signals" +exit 1 fi i=1
Re: [PATCH, rs6000] Clean up capitalized diagnostic messages
> On Aug 2, 2017, at 6:43 PM, Segher Boessenkool > wrote: > > Hi Bill, > > On Wed, Aug 02, 2017 at 10:29:20AM -0500, Bill Schmidt wrote: >> I don't anticipate backporting any of this. > > Good :-) > >> @@ -6802,7 +6802,7 @@ altivec_resolve_overloaded_builtin (location_t loc >> if (unsupported_builtin) >> { >> const char *name = rs6000_overloaded_builtin_name (fcode); >> -error ("Builtin function %s not supported in this compiler >> configuration", >> +error ("builtin function %s not supported in this compiler >> configuration", >> name); > > As Martin says, %qs for this and similar (see the documentation before > pp_format in pretty-print.c). Can be a separate patch of course, this > one is big enough already. > >> Index: gcc/config/rs6000/rs6000.c >> === >> --- gcc/config/rs6000/rs6000.c (revision 250791) >> +++ gcc/config/rs6000/rs6000.c (working copy) >> @@ -4132,7 +4132,7 @@ rs6000_option_override_internal (bool global_init_ >> || rs6000_cpu == PROCESSOR_PPCE5500) >> { >> if (TARGET_ALTIVEC) >> -error ("AltiVec not supported in this target"); >> +error ("altivec not supported in this target"); >> } >> >> /* If we are optimizing big endian systems for space, use the load/store > > Let's either keep AltiVec or say -maltivec. We only have this warning > because we allow -maltivec with CPUs that do not support it; and this > warning is only for some of the FSL CPUs. It isn't very consistent. Back to AltiVec it goes! Thanks. > >> @@ -4250,7 +4250,7 @@ rs6000_option_override_internal (bool global_init_ >>rs6000_isa_flags |= (ISA_3_0_MASKS_SERVER & ~ignore_masks); >> } >>else >> -error ("Power9 target option is incompatible with -mcpu= for " >> +error ("power9 target option is incompatible with -mcpu= for " >> " less than power9"); >> } >> else if ((ISA_3_0_MASKS_SERVER & rs6000_isa_flags_explicit) > > See also PR79477. Since many of these options are going away it is > probably not worth spending too much time on this, not until stage 3 > or so anyway. Yeah, let's address that later in the year after Mike finishes his cleanups. > >> @@ -11226,7 +11226,7 @@ rs6000_return_in_memory (const_tree type, const_tr >> static bool warned_for_return_big_vectors = false; >> if (!warned_for_return_big_vectors) >> { >> - warning (OPT_Wpsabi, "GCC vector returned by reference: " >> + warning (OPT_Wpsabi, "gcc vector returned by reference: " >> "non-standard ABI extension with no compatibility >> guarantee"); >>warned_for_return_big_vectors = true; >> } > > Maybe the warning should just say "big vector"? Or "generic vector"? > > (Vectors that fit in one VR, or in GPRs in 8 bytes or less, do not have > the problem this warns for. Kind of hard to express tersely and > precisely though). I looked in the GCC manual and couldn't find a better way of expressing this than just "GCC vector," so I will return it to the way it was. "GCC vector extension vector" is accurate but hardly trips lightly off the tongue... > > Approved for trunk with whichever of the suggested changes you think > are good. Thanks, Thanks much! Bill > > > Segher >
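For reference, the %qs form Segher and Martin are suggesting for a follow-up would look like the fragment below (a sketch of the one hunk quoted above, not a complete patch); %qs lets the diagnostic machinery do the quoting instead of leaving the name bare:

  const char *name = rs6000_overloaded_builtin_name (fcode);
  error ("builtin function %qs not supported in this compiler configuration",
	 name);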
Re: [PATCH 1/1] sparc: support for -mmisalign in the SPARC M8
> On Aug 3, 2017, at 11:40 AM, David Miller wrote: > > From: Qing Zhao > Date: Thu, 3 Aug 2017 10:37:15 -0500 > >> all the special handling of STRICT_ALIGNMENT or >> SLOW_UNALIGNED_ACCESS in this code has the following common >> logic: >> >> if the memory access is known at compile time to be misaligned, >> and the targeted platform does NOT support faster >> unaligned memory access, the compiler will try to make the memory >> access aligned. Otherwise, if the targeted platform supports >> faster unaligned memory access, it will leave the compile-time >> known misaligned memory access as is, and later the hardware support >> will kick in for these unaligned memory accesses. >> >> this behavior is consistent with the high-level definition of >> STRICT_ALIGNMENT. > > That's exactly the problem. > > What you want with this M8 feature is simply to let the compiler know > that if it is completely impossible to make some memory object > aligned, then the cpu can handle this with special instructions. > > You still want the compiler to make the effort to align data when it > can because the accesses will be faster than if it used the unaligned > loads and stores. I don't think the above is true. First, a compile-time-known misaligned memory access can always be emulated by aligned memory accesses (byte-sized loads/stores), so there would be no compile-time-known misaligned accesses left for the special misaligned ld/st insns. Second, there is always an overhead cost to the compile-time effort of turning a known-unaligned memory access into aligned accesses (adding extra padding, or splitting the unaligned multi-byte access into single-byte loads/stores), and all that overhead might be even bigger than the overhead of the special misaligned load/store itself. To decide which is better (software emulation or the hardware misaligned load/store insns), experiments are needed to justify the performance impact. This set of changes provides a way to use the misaligned load/store insns to implement compile-time-known unaligned memory accesses; -mno-misalign can be used to disable that behavior very easily if our performance data shows that the misaligned load/store insns are slower than the current software emulation. Qing > > This is incredibly important for on-stack objects.
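To make the trade-off concrete, here is a small target-neutral illustration (not code from the patch) of a compile-time-known misaligned access. On a STRICT_ALIGNMENT target without hardware support the compiler must expand the load into byte loads and shifts; with misaligned load/store insns, or on a target where unaligned access is not slow, it can emit a single load:

#include <stdint.h>
#include <string.h>

struct __attribute__ ((packed)) rec
{
  uint8_t tag;
  uint32_t value;	/* at offset 1, so a 4-byte load from it is misaligned */
};

uint32_t
read_value (const struct rec *r)
{
  uint32_t v;
  /* The portable way to express the access; the compiler decides how
     to expand it for the target.  */
  memcpy (&v, &r->value, sizeof v);
  return v;
}

Whether the byte-wise expansion or the hardware misaligned load wins is exactly the question that -mmisalign/-mno-misalign leaves to measurement.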
Re: [RFC][PR 67336][PING^6] Verify pointers during stack unwind
On 03.08.2017 18:04, Florian Weimer wrote: * Yuri Gribov: I've rebased the previous patch to trunk per Andrew's suggestion. Original patch description/motivation/questions are in https://gcc.gnu.org/ml/gcc-patches/2017-06/msg01869.html What about the remaining TODOs? Yes, need to be fixed. Note that patch is an RFC so I didn't want to polish it until I get enough feedback from others. + if (__builtin_expect (msync ((void *)page, len, MS_ASYNC), 0)) +return 0; Is there any reason not to probe using mincore? It won't trigger writeback. Nice, for some reason I didn't realize it provides necessary info. -Y
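A rough sketch of what the mincore-based probe could look like (Linux-specific and illustrative only, not the updated patch); unlike msync (..., MS_ASYNC), mincore only queries residency information, so it cannot trigger writeback:

#include <stdint.h>
#include <unistd.h>
#include <sys/mman.h>

/* Return nonzero if ADDR lies within a mapped page.  */
static int
addr_is_mapped (const void *addr)
{
  long pagesize = sysconf (_SC_PAGESIZE);
  uintptr_t page = (uintptr_t) addr & ~((uintptr_t) pagesize - 1);
  unsigned char vec[1];

  /* mincore fails with ENOMEM when the range is not mapped at all.  */
  return mincore ((void *) page, (size_t) pagesize, vec) == 0;
}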
Re: [PATCH] Fix PR81503 (SLSR invalid fold)
Hi, Here's v2 of the patch with Jakub's suggestions incorporated. Bootstrapped and tested on powerpc64le-linux-gnu with no regressions. Is this ok for trunk? Eventually this should be backported to all active releases as well. Ok for that after a week or so of burn-in? (And after 7.2, I imagine.) Thanks, Bill [gcc] 2017-08-03 Bill Schmidt Jakub Jelinek PR tree-optimization/81503 * gimple-ssa-strength-reduction.c (replace_mult_candidate): Ensure folded constant fits in the target type. [gcc/testsuite] 2017-08-03 Bill Schmidt Jakub Jelinek PR tree-optimization/81503 * gcc.c-torture/execute/pr81503.c: New file. Index: gcc/gimple-ssa-strength-reduction.c === --- gcc/gimple-ssa-strength-reduction.c (revision 250791) +++ gcc/gimple-ssa-strength-reduction.c (working copy) @@ -2074,6 +2074,10 @@ replace_mult_candidate (slsr_cand_t c, tree basis_ { tree target_type = TREE_TYPE (gimple_assign_lhs (c->cand_stmt)); enum tree_code cand_code = gimple_assign_rhs_code (c->cand_stmt); + unsigned int prec = TYPE_PRECISION (target_type); + tree maxval = (POINTER_TYPE_P (target_type) +? TYPE_MAX_VALUE (sizetype) +: TYPE_MAX_VALUE (target_type)); /* It is highly unlikely, but possible, that the resulting bump doesn't fit in a HWI. Abandon the replacement @@ -2082,6 +2086,17 @@ replace_mult_candidate (slsr_cand_t c, tree basis_ types but allows for safe negation without twisted logic. */ if (wi::fits_shwi_p (bump) && bump.to_shwi () != HOST_WIDE_INT_MIN + /* It is more likely that the bump doesn't fit in the target +type, so check whether constraining it to that type changes +the value. For a signed type, the value mustn't change. +For an unsigned type, the value may only change to a +congruent value (for negative bumps). */ + && (TYPE_UNSIGNED (target_type) + ? wi::eq_p (wi::neg_p (bump) + ? bump + wi::to_widest (maxval) + 1 + : bump, + wi::zext (bump, prec)) + : wi::eq_p (bump, wi::sext (bump, prec))) /* It is not useful to replace casts, copies, negates, or adds of an SSA name and a constant. */ && cand_code != SSA_NAME Index: gcc/testsuite/gcc.c-torture/execute/pr81503.c === --- gcc/testsuite/gcc.c-torture/execute/pr81503.c (nonexistent) +++ gcc/testsuite/gcc.c-torture/execute/pr81503.c (working copy) @@ -0,0 +1,15 @@ +unsigned short a = 41461; +unsigned short b = 3419; +int c = 0; + +void foo() { + if (a + b * ~(0 != 5)) +c = -~(b * ~(0 != 5)) + 2147483647; +} + +int main() { + foo(); + if (c != 2147476810) +return -1; + return 0; +} On 8/3/17 1:02 PM, Bill Schmidt wrote: >> On Aug 3, 2017, at 11:39 AM, Jakub Jelinek wrote: >> >> On Thu, Aug 03, 2017 at 11:29:44AM -0500, Bill Schmidt wrote: And, wouldn't it be more readable to use: && (TYPE_UNSIGNED (target_type) ? (wi::eq_p (bump, wi::zext (bump, prec)) || wi::eq_p (bump + wi::to_widest (maxval) + 1, wi::zext (bump, prec))) : wi::eq_p (bump, wi::sext (bump, prec))) ? >>> Probably. As noted, it's all becoming a bit unreadable with too >>> much negative logic in a long conditional, so I want to clean that >>> up in a follow-up. >>> For TYPE_UNSIGNED, do you actually need any restriction? What kind of bump values are wrong for unsigned types and why? >>> If the value of the bump is actually larger than the precision of the >>> type (not likely except for quite small types), say 2 * (maxval + 1) >>> which is congruent to 0, the replacement is wrong. >> Ah, ok. Anyway, for unsigned type, perhaps it could be written as: >> && (TYPE_UNSIGNED (target_type) >>? wi::eq_p (wi::neg_p (bump) ? 
bump + wi::to_widest (maxval) + 1 >> : bump, wi::zext (bump, prec)) >>: wi::eq_p (bump, wi::sext (bump, prec))) >> I mean, if bump >= 0, then the bump + wi::to_widest (maxval) + 1 >> value has no chance to be equal to zero extended bump, and >> for bump < 0 only that one has a chance. > Yeah, that's true. And arguably my case for the really large bump > causing problems is kind of thin, because the program is probably > already broken in that case anyway. But I think I will sleep better > having the check in there, as somebody other than SLSR will catch > the bug then. ;-) > > Thanks for all the help with this one. These corner cases are > always tricky, and I appreciate the extra eyeballs. > > Bill > >> Jakub >>
Re: Handle data dependence relations with different bases
Ping Richard Sandiford writes: > Richard Sandiford writes: >> Eric Botcazou writes: >>> [Sorry for missing the previous messages] >>> Thanks. Just been retesting, and I think I must have forgotten to include Ada last time. It turns out that the patch causes a dg-scan regression in gnat.dg/vect17.adb, because we now think that if the array RECORD_TYPEs *do* alias in: procedure Add (X, Y : aliased Sarray; R : aliased out Sarray) is begin for I in Sarray'Range loop R(I) := X(I) + Y(I); end loop; end; then the dependence distance must be zero. Eric, does that hold true for Ada? I.e. if X and R (or Y and R) alias, must it be the case that X(I) can only alias R(I) and not for example R(I-1) or R(I+1)? >>> >>> Yes, I'd think so (even without the artificial RECORD_TYPE around the >>> arrays). >> >> Good! >> 2017-06-07 Richard Sandiford gcc/testsuite/ * gnat.dg/vect17.ads (Sarray): Increase range to 1 .. 5. * gnat.dg/vect17.adb (Add): Create a dependence distance of 1 when X = R or Y = R. >>> >>> I think that you need to modify vect15 and vect16 the same way. >> >> Ah, yeah. And doing that shows that I'd not handled safelen for >> DDR_COULD_BE_INDEPENDENT_P. I've fixed that locally. >> >> How does this look? Tested on x86_64-linux-gnu both without the >> vectoriser changes and with the fixed vectoriser patch. > > Here's a version of the patch that handles safelen. I split the > handling out into a new function (vect_analyze_possibly_independent_ddr) > since it was getting too big to do inline. > > Tested on aarch64-linux-gnu and x86_64-linux-gnu. OK to install? > > Thanks, > Richard > > > 2017-07-27 Richard Sandiford > > gcc/ > * tree-data-ref.h (subscript): Add access_fn field. > (data_dependence_relation): Add could_be_independent_p. > (SUB_ACCESS_FN, DDR_COULD_BE_INDEPENDENT_P): New macros. > (same_access_functions): Move to tree-data-ref.c. > * tree-data-ref.c (ref_contains_union_access_p): New function. > (access_fn_component_p): Likewise. > (access_fn_components_comparable_p): Likewise. > (dr_analyze_indices): Add a reference to access_fn_component_p. > (dump_data_dependence_relation): Use SUB_ACCESS_FN instead of > DR_ACCESS_FN. > (constant_access_functions): Likewise. > (add_other_self_distances): Likewise. > (same_access_functions): Likewise. (Moved from tree-data-ref.h.) > (initialize_data_dependence_relation): Use XCNEW and remove > explicit zeroing of DDR_REVERSED_P. Look for a subsequence > of access functions that have the same type. Allow the > subsequence to end with different bases in some circumstances. > Record the chosen access functions in SUB_ACCESS_FN. > (build_classic_dist_vector_1): Replace ddr_a and ddr_b with > a_index and b_index. Use SUB_ACCESS_FN instead of DR_ACCESS_FN. > (subscript_dependence_tester_1): Likewise dra and drb. > (build_classic_dist_vector): Update calls accordingly. > (subscript_dependence_tester): Likewise. > * tree-ssa-loop-prefetch.c (determine_loop_nest_reuse): Check > DDR_COULD_BE_INDEPENDENT_P. > * tree-vectorizer.h (LOOP_REQUIRES_VERSIONING_FOR_ALIAS): Test > comp_alias_ddrs instead of may_alias_ddrs. > * tree-vect-data-refs.c (vect_analyze_possibly_independent_ddr): > New function. > (vect_analyze_data_ref_dependence): Use it if > DDR_COULD_BE_INDEPENDENT_P, but fall back to using the recorded > distance vectors if that fails. > (dependence_distance_ge_vf): New function. > (vect_prune_runtime_alias_test_list): Use it. Don't clear > LOOP_VINFO_MAY_ALIAS_DDRS. > > gcc/testsuite/ > * gcc.dg/vect/vect-alias-check-3.c: New test. 
> * gcc.dg/vect/vect-alias-check-4.c: Likewise. > * gcc.dg/vect/vect-alias-check-5.c: Likewise. > > Index: gcc/tree-data-ref.h > === > --- gcc/tree-data-ref.h 2017-07-27 13:10:29.620045506 +0100 > +++ gcc/tree-data-ref.h 2017-07-27 13:10:33.023912613 +0100 > @@ -260,6 +260,9 @@ struct conflict_function > > struct subscript > { > + /* The access functions of the two references. */ > + tree access_fn[2]; > + >/* A description of the iterations for which the elements are > accessed twice. */ >conflict_function *conflicting_iterations_in_a; > @@ -278,6 +281,7 @@ struct subscript > > typedef struct subscript *subscript_p; > > +#define SUB_ACCESS_FN(SUB, I) (SUB)->access_fn[I] > #define SUB_CONFLICTS_IN_A(SUB) (SUB)->conflicting_iterations_in_a > #define SUB_CONFLICTS_IN_B(SUB) (SUB)->conflicting_iterations_in_b > #define SUB_LAST_CONFLICT(SUB) (SUB)->last_conflict > @@ -333,6 +337,33 @@ struct data_dependence_relation >/* Set to true when the dep
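The Ada question above has a simple C analogue (illustrative only, not one of the new gcc.dg tests): when the written array may alias a read array, the loop below carries a dependence of distance 1 rather than 0, which is the situation the adjusted vect15/vect16/vect17 tests are meant to exercise when X = R or Y = R:

void
add (int *x, int *y, int *r, int n)
{
  for (int i = 1; i < n; i++)
    r[i] = x[i - 1] + y[i];	/* if r == x, a distance-1 dependence */
}

Such a loop must not be treated as independent even though the bases differ syntactically, which is exactly what DDR_COULD_BE_INDEPENDENT_P is meant to capture conservatively.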
Re: [PATCH 0/2] Python testcases to check DWARF output
On Jul 27, 2017, at 2:07 AM, Pierre-Marie de Rodat wrote: > On 07/27/2017 09:52 AM, Richard Biener wrote: >>> I'm fine with the direction if a reviewer wants to go in that >>> direction. I wish python didn't have a built-in speed penalty, >>> that's the only downside I don't like about it. Aside from that, >>> even switching all of the testsuite to be python based isn't a >>> terrible idea. >> But is it worse than TCL? Python is likely 2x faster than Tcl if you have one instance per testsuite run. The problem is that, for the work required, it's cheaper to do the work once to cut over to a new language. I'd rather switch to some other language that can come closer to the speed of compiled C code, yet in the scripting family. I don't have a pointer to a better solution than Python at this time. I'd not be opposed to switching to Python: it should be faster, just as safe, a bit easier for new people to code in, and more people know it and use it. I think Python is more fun to code in than Tcl. Cutting the entire testsuite over at once might well be more than any one person can contribute, but we could cut over subtrees as people donate time, if people want to go in that direction. This can't be a one-person decision, but rather a consensus-building decision. What do others think? > Good point. Actually for Python there are ways to make it faster. If we can > somehow manage to have a limited set of Python interpreter instances (instead > of one per test), we could use pypy, which I hear is very good at making long > running instances fast. Yes. One instance would help ensure the performance is good. I don't have a firm grasp of startup time to know just how critical it is. Also, I don't have a good grasp of the memory pressure that Python would create, say, compared to how we use Tcl.
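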
Re: [PATCH] Improve var_bound range test opt (PR tree-optimization/81655)
On Thu, 3 Aug 2017, Jeff Law wrote: > On 08/03/2017 03:32 AM, Jakub Jelinek wrote: > > On Thu, Aug 03, 2017 at 10:58:07AM +0200, Richard Biener wrote: > >>> The reason why we punt is the unexpected _4 != 1 condition, the code > >>> is prepared to handle just _4 == 0 (or _4 != 0) where _4 == 0 is > >>> equivalent > >>> to _4 != 1 for boolean type. > >> > >> Hmm, I thought we had code to canonicalize boolean compares (but I can't > >> find that right now). Some is in > > > > I was looking for that too, but didn't find anything that would be done > > always. > I can recall looking for that kind of canonicalization as well, but not > finding it. Furthermore instrumentation showed the different forms > could show up in the IL, but forms such as != 1 were relatively rare. We have foldings that canonicalize to some extent but further canonicalization requires switching edges which means even if we'd do that in CFG cleanup it would be delayed until the next such run. Richard.
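For reference, the equivalence the range-test code currently fails to exploit is trivial for a boolean SSA name; in C terms (illustrative only, not the PR testcase):

_Bool
equivalent (int a, int b)
{
  _Bool _4 = a < b;
  /* For a boolean value, _4 != 1 says exactly the same thing as _4 == 0,
     so this always returns 1.  */
  return (_4 != 1) == (_4 == 0);
}

so canonicalizing _4 != 1 to _4 == 0 would let the existing _4 == 0 / _4 != 0 handling fire without teaching it about the extra form.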