[x86][patch] Fix clwb for skylake

2017-12-11 Thread Koval, Julia
Hi Uros, Kirill,
According to the isa-extensions doc, CLWB first appeared in Skylake-avx512,
but it isn't in the PTA. This patch fixes it. Ok for trunk?

Thanks,
Julia


0001-i386.patch
Description: 0001-i386.patch


Re: [x86][patch] Fix clwb for skylake

2017-12-11 Thread Uros Bizjak
On Mon, Dec 11, 2017 at 9:34 AM, Koval, Julia  wrote:
> Hi Uros, Kirill,
> According to isa-extensions doc CLWB appeared first in Skylake-avx512, but it 
> isn't in the PTA. This patch fixes it. Ok for trunk?

Please also include a ChangeLog entry in your patch submission.

Uros.


Re: [PATCH][GCC][ARM] Fix failing testcase pragma_fpu_attribute.c

2017-12-11 Thread Christophe Lyon
On 8 December 2017 at 15:53, Tamar Christina  wrote:
> Hi All,
>
> My previous patch had two issues with the new test cases.
> It seems that depending on which DejaGnu version you have
> dg-additional-options will add the options before or after the
> ones added by the test suite. Which means I can't use it to override
> the default options.
>
> For this I use a pragma now and place the pragma before GCC needs to emit
> any code. Which in turn means it doesn't emit the .fpu directive for the first
> switching of fpus.
>
> Secondly, because of the usage of neon I also need to guard against 
> arm_neon_ok.
>
> Regtested on arm-none-eabi and no regressions.
>
> Ok for trunk?
>
>
> gcc/testsuite/
> 2017-12-08  Tamar Christina  
>
> PR target/82641
> * gcc.target/arm/pragma_fpu_attribute.c: New.
> * gcc.target/arm/pragma_fpu_attribute_2.c: New.
>
> --

Hi Tamar,

We must be testing/building differently, since your patch doesn't work for me.

The compiler complains when including arm_neon.h because:
"NEON intrinsics not available with the soft-float ABI."

I'm using a recent dejagnu (1.6+), and for instance on arm-none-eabi
the testcase is compiled with -std=gnu99, but no other ABI-related
option. Why does it work for you?

Christophe


[PATCH] [SPARC] Make sure that jump is to a label in errata workaround

2017-12-11 Thread Daniel Cederman
In some cases the jump could be to a return instruction and in those
cases the next_active_insn() function tries to follow an invalid pointer
which leads to a crash. This error did not manifest when using a 32-bit
version of GCC which is why I did not detect it before. Thanks to Sebastian
for reporting this to me.

gcc/ChangeLog:

2017-12-11  Daniel Cederman  

* config/sparc/sparc.c (sparc_do_work_around_errata): Make sure
the jump is to a label.
---
 gcc/config/sparc/sparc.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/gcc/config/sparc/sparc.c b/gcc/config/sparc/sparc.c
index 00b90e5..b9c8dcc 100644
--- a/gcc/config/sparc/sparc.c
+++ b/gcc/config/sparc/sparc.c
@@ -1032,6 +1032,7 @@ sparc_do_work_around_errata (void)
 floating-point operation or a floating-point branch.  */
   if (sparc_fix_gr712rc
  && jump
+ && jump_to_label_p (jump)
  && get_attr_branch_type (jump) == BRANCH_TYPE_ICC)
{
  rtx_insn *target = next_active_insn (JUMP_LABEL_AS_INSN (jump));
@@ -1051,7 +1052,7 @@ sparc_do_work_around_errata (void)
  && mem_ref (SET_SRC (set))
  && REG_P (SET_DEST (set)))
{
- if (jump)
+ if (jump && jump_to_label_p (jump))
{
  rtx_insn *target = next_active_insn (JUMP_LABEL_AS_INSN (jump));
  if (target && atomic_insn_for_leon3_p (target))
-- 
2.9.3
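
The guard added by the patch above can be illustrated outside of GCC with a small stand-alone model. This is only a sketch under stated assumptions: the mock_* types and functions are hypothetical stand-ins invented for illustration, not GCC's rtx_insn, jump_to_label_p or next_active_insn. It shows the shape of the fix: check what kind of target the jump has before following the label pointer.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for illustration; not GCC's real data types.  */
enum mock_target_kind { MOCK_TARGET_LABEL, MOCK_TARGET_RETURN };

struct mock_insn
{
  int is_atomic;
};

struct mock_jump
{
  enum mock_target_kind kind;
  /* Only meaningful when kind == MOCK_TARGET_LABEL.  */
  struct mock_insn *label_insn;
};

/* Models the role of the added check: true only for jumps whose
   target is a label, false for jumps to a return.  */
static int
mock_jump_to_label_p (const struct mock_jump *jump)
{
  return jump != NULL && jump->kind == MOCK_TARGET_LABEL;
}

/* Follows the jump target only after the guard has passed, so a jump
   to a return never has its label pointer dereferenced.  */
int
mock_target_is_atomic (const struct mock_jump *jump)
{
  if (!mock_jump_to_label_p (jump))
    return 0;

  assert (jump->label_insn != NULL);
  return jump->label_insn->is_atomic;
}
```

With a jump whose kind is MOCK_TARGET_RETURN the function returns 0 without touching label_insn, which is the behaviour the extra condition is meant to guarantee.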



Re: [PATCH] Fix cgraph_edge::redirect_call_stmt_to_callee noreturn call handling (PR c++/78692)

2017-12-11 Thread Thomas Schwinge
Hi!

On Wed, 7 Dec 2016 18:28:39 +0100, Jakub Jelinek  wrote:
> The code in this function assumes that lhs is the lhs of new_stmt (it tests
> that new_stmt is a noreturn call etc.), but that is only the case if
> new_stmt == e->call_stmt.  But in the function it can be set to various
> other stmts.  Nothing tests the lhs before this noreturn handling, so this
> patch fixes it by moving the initialization of lhs right before the use.
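
The problem described in the quoted paragraph can be modelled in a few lines of plain C. This is a minimal sketch, assuming a hypothetical struct mock_call in place of a GIMPLE call statement; none of these names exist in GCC. It contrasts reading the LHS early from the original statement with reading it from the replacement statement right before use.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a call statement; not GCC's real type.  */
struct mock_call
{
  const char *lhs;   /* NULL when the call has no LHS */
  bool noreturn;
};

/* Old shape: the LHS is read from the original statement, but the
   noreturn test is made on the (possibly different) new statement.  */
bool
mock_wants_lhs_removal_early (const struct mock_call *call_stmt,
                              const struct mock_call *new_stmt)
{
  assert (call_stmt != NULL && new_stmt != NULL);
  const char *lhs = call_stmt->lhs;
  return lhs != NULL && new_stmt->noreturn;
}

/* Patched shape: both the LHS and the noreturn flag come from the
   statement that is actually going to be modified.  */
bool
mock_wants_lhs_removal_late (const struct mock_call *new_stmt)
{
  assert (new_stmt != NULL);
  const char *lhs = new_stmt->lhs;
  return lhs != NULL && new_stmt->noreturn;
}
```

When the replacement statement has no LHS but the original did, the early-read variant asks for the removal of an LHS that the new statement does not have; the late-read variant does not.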

One year later, as discussed in , backported
to gcc-6-branch in r255538:

commit 57354e3c971f9a17f11a3fd28342eaea50ea0fd3
Author: tschwinge 
Date:   Mon Dec 11 09:49:25 2017 +

[PR c++/83301] cgraph.c segfault

Backport trunk r243377:

gcc/
2016-12-07  Jakub Jelinek  

PR c++/78692
* cgraph.c (cgraph_edge::redirect_call_stmt_to_callee): Set lhs
var to lhs of new_stmt right before noreturn handling rather than to
lhs of e->call_stmt early.

gcc/testsuite/
2016-12-07  Jakub Jelinek  

PR c++/78692
* g++.dg/torture/pr78692.C: New test.

git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/branches/gcc-6-branch@255538 138bc75d-0d04-0410-961f-82ee72b054a4
---
 gcc/ChangeLog  |   12 
 gcc/cgraph.c   |2 +-
 gcc/testsuite/ChangeLog|   10 ++
 gcc/testsuite/g++.dg/torture/pr78692.C |   26 ++
 4 files changed, 49 insertions(+), 1 deletion(-)

diff --git gcc/ChangeLog gcc/ChangeLog
index e586870..35a70d4 100644
--- gcc/ChangeLog
+++ gcc/ChangeLog
@@ -1,3 +1,15 @@
+2017-12-11  Thomas Schwinge  
+
+   PR c++/83301
+
+   Backport trunk r243377:
+   2016-12-07  Jakub Jelinek  
+
+   PR c++/78692
+   * cgraph.c (cgraph_edge::redirect_call_stmt_to_callee): Set lhs
+   var to lhs of new_stmt right before noreturn handling rather than to
+   lhs of e->call_stmt early.
+
 2017-12-04  Sebastian Peryt  
H.J. Lu  
 
diff --git gcc/cgraph.c gcc/cgraph.c
index 6ff8f26..0c9d969 100644
--- gcc/cgraph.c
+++ gcc/cgraph.c
@@ -1259,7 +1259,6 @@ cgraph_edge::redirect_call_stmt_to_callee (void)
   cgraph_edge *e = this;
 
   tree decl = gimple_call_fndecl (e->call_stmt);
-  tree lhs = gimple_call_lhs (e->call_stmt);
   gcall *new_stmt;
   gimple_stmt_iterator gsi;
   bool skip_bounds = false;
@@ -1529,6 +1528,7 @@ cgraph_edge::redirect_call_stmt_to_callee (void)
 gimple_call_set_fntype (new_stmt, TREE_TYPE (e->callee->decl));
 
   /* If the call becomes noreturn, remove the LHS if possible.  */
+  tree lhs = gimple_call_lhs (new_stmt);
   if (lhs
   && (gimple_call_flags (new_stmt) & ECF_NORETURN)
   && (VOID_TYPE_P (TREE_TYPE (gimple_call_fntype (new_stmt)))
diff --git gcc/testsuite/ChangeLog gcc/testsuite/ChangeLog
index 6a1b459..0fae4dc 100644
--- gcc/testsuite/ChangeLog
+++ gcc/testsuite/ChangeLog
@@ -1,3 +1,13 @@
+2017-12-11  Thomas Schwinge  
+
+   PR c++/83301
+
+   Backport trunk r243377:
+   2016-12-07  Jakub Jelinek  
+
+   PR c++/78692
+   * g++.dg/torture/pr78692.C: New test.
+
 2017-12-04  Sebastian Peryt  
H.J. Lu  
 
diff --git gcc/testsuite/g++.dg/torture/pr78692.C gcc/testsuite/g++.dg/torture/pr78692.C
new file mode 100644
index 000..57a0d2f
--- /dev/null
+++ gcc/testsuite/g++.dg/torture/pr78692.C
@@ -0,0 +1,26 @@
+// PR c++/78692
+
+int a;
+void *b;
+extern "C" {
+struct C {
+  virtual int d ();
+};
+struct E {
+  virtual int operator () (int, const void *, int) = 0;
+};
+class F {
+  int g ();
+  int h;
+  E &i;
+};
+struct : C, E {
+  int operator () (int, const void *, int) { throw int(); }
+} j;
+
+int
+F::g ()
+{
+  a = i (h, b, 0);
+}
+}


Regards
 Thomas


Re: [PATCH] Fix Bug 83237 - Values returned by std::poisson_distribution are not distributed correctly

2017-12-11 Thread Paolo Carlini

Hi,

On 10/12/2017 14:47, Michele Pezzutti wrote:
> Hi.
>
> This patch intends to fix Bug 83237 - Values returned by
> std::poisson_distribution are not distributed correctly.
> See https://gcc.gnu.org/bugzilla/show_bug.cgi?id=83237 for issue
> description and tests.

In any case, the fix should come with a testcase, which would also
validate the analysis. For this discrete distribution it should be pretty
easy to add one, because all the infrastructure is already in place:
essentially three lines added to
26_numerics/random/poisson_distribution/operators/values.cc.


Paolo.


RE: [PATCH][GCC][ARM] Fix failing testcase pragma_fpu_attribute.c

2017-12-11 Thread Tamar Christina
Hi Christophe,

> -Original Message-
> From: Christophe Lyon [mailto:christophe.l...@linaro.org]
> Sent: Monday, December 11, 2017 09:02
> To: Tamar Christina 
> Cc: gcc-patches@gcc.gnu.org; nd ; Ramana Radhakrishnan
> ; Richard Earnshaw
> ; ni...@redhat.com; Kyrylo Tkachov
> 
> Subject: Re: [PATCH][GCC][ARM] Fix failing testcase pragma_fpu_attribute.c
> 
> On 8 December 2017 at 15:53, Tamar Christina 
> wrote:
> > Hi All,
> >
> > My previous patch had two issues with the new test cases.
> > It seems that depending on which DejaGnu version you have
> > dg-additional-options will add the options before or after the ones
> > added by the test suite. Which means I can't use it to override the
> > default options.
> >
> > For this I use a pragma now and place the pragma before GCC needs to
> > emit any code. Which in turn means it doesn't emit the .fpu directive
> > for the first switching of fpus.
> >
> > Secondly, because of the usage of neon I also need to guard against
> arm_neon_ok.
> >
> > Regtested on arm-none-eabi and no regressions.
> >
> > Ok for trunk?
> >
> >
> > gcc/testsuite/
> > 2017-12-08  Tamar Christina  
> >
> > PR target/82641
> > * gcc.target/arm/pragma_fpu_attribute.c: New.
> > * gcc.target/arm/pragma_fpu_attribute_2.c: New.
> >
> > --
> 
> Hi Tamar,
> 
> We must be testing/building differently, since your patch doesn't work for
> me.
> 
> The compiler complains when including arm_neon.h because:
> "NEON intrinsics not available with the soft-float ABI."
> 
> I'm using a recent dejagnu (1.6+). and for instance on arm-none-eabi, the
> testcase is compiled with -std=gnu99, but no other ABI-related option. Why
> does it work for you?

This is a good question. It also works on our internal overnight testing
infrastructure, at least the NEON part; that was how I noticed the
discrepancy between the DejaGnu versions.

It also works when I build natively using just configure && make, so it
could be something in the configure flags. Looking back at it, if the
vanilla compiler doesn't support NEON I can see the test failing. But
fixing it means turning on NEON and then turning it off after the include,
which makes the test do too many things.

I will try to think of a testcase that doesn't require NEON; if I can't,
I'll just remove the tests. They weren't being tested before, and if
there's no way to reliably test changing FPU options on ARM then there's
no point having them.

Thanks,
Tamar

> 
> Christophe


Re: [PATCH] [SPARC] Make sure that jump is to a label in errata workaround

2017-12-11 Thread Eric Botcazou
> 2017-12-11  Daniel Cederman  
> 
>   * config/sparc/sparc.c (sparc_do_work_around_errata): Make sure
>   the jump is to a label.

OK for mainline and 7 branch, thanks.

-- 
Eric Botcazou


Re: [PATCH] Allow USE in PARALLELs in store_data_bypass_p

2017-12-11 Thread Eric Botcazou
> When looking at the rs6000_store_data_bypass_p stuff, I've noticed that
> it accepts PARALLELs containing not just SETs and CLOBBERs like
> store_data_bypass_p, but also USEs.  Given that it is something that
> single_set also ignores, I think fixing store_data_bypass_p is the
> right fix here.
> 
> Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?
> 
> Note, the patch is larger due to the formatting fixes, the actual changes
> are just 3x
> - if (GET_CODE (xxx) == CLOBBER)
> + if (GET_CODE (xxx) == CLOBBER || GET_CODE (xxx) == USE)

Couldn't the code be also re-factored?  Because 3x the same change is a lot...

> 2017-12-06  Jakub Jelinek  
> 
>   * recog.c (store_data_bypass_p): Handle USE in a PARALLEL
>   like CLOBBER.  Formatting fixes.

OK for mainline modulo the above remark.

-- 
Eric Botcazou


Re: [PATCH] Allow USE in PARALLELs in store_data_bypass_p

2017-12-11 Thread Jakub Jelinek
On Mon, Dec 11, 2017 at 11:45:25AM +0100, Eric Botcazou wrote:
> > When looking at the rs6000_store_data_bypass_p stuff, I've noticed that
> > it accepts PARALLELs containing not just SETs and CLOBBERs like
> > store_data_bypass_p, but also USEs.  Given that it is something that
> > single_set also ignores, I think fixing store_data_bypass_p is the
> > right fix here.
> > 
> > Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?
> > 
> > Note, the patch is larger due to the formatting fixes, the actual changes
> > are just 3x
> > - if (GET_CODE (xxx) == CLOBBER)
> > + if (GET_CODE (xxx) == CLOBBER || GET_CODE (xxx) == USE)
> 
> Couldn't the code be also re-factored?  Because 3x the same change is a lot...

Is that long enough to be worth it?  I mean, in all other places (rtlanal.c,
recog.c, ...) we use similar code in all spots where it is needed, adding
an inline would just mean yet another thing to remember.  Or do you mean
CLOBBER_OR_USE_P macro?  If so, we'd need to adjust all spots to use it, not
just these 3.

Jakub
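
For readers unfamiliar with the pattern under discussion, here is a stand-alone sketch of what a CLOBBER_OR_USE_P-style predicate would look like. It is an illustration only: no such macro is proposed in a patch in this thread, and the mock_code enum and MOCK_CLOBBER_OR_USE_P name below are hypothetical stand-ins, not GCC's rtx codes. The loop walks the elements of a mock PARALLEL, skips CLOBBER and USE, and expects everything else to be a SET.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for rtx codes; not GCC's real definitions.  */
enum mock_code { MOCK_SET, MOCK_CLOBBER, MOCK_USE };

/* Hypothetical predicate in the spirit of the macro named above.  */
#define MOCK_CLOBBER_OR_USE_P(code) \
  ((code) == MOCK_CLOBBER || (code) == MOCK_USE)

/* Counts the SETs in a mock PARALLEL, ignoring CLOBBER and USE
   elements and asserting that nothing else appears.  */
size_t
mock_count_sets (const enum mock_code *elts, size_t n)
{
  assert (elts != NULL || n == 0);

  size_t sets = 0;
  for (size_t i = 0; i < n; i++)
    {
      if (MOCK_CLOBBER_OR_USE_P (elts[i]))
        continue;

      assert (elts[i] == MOCK_SET);
      sets++;
    }
  return sets;
}
```

The trade-off raised in the message still applies: a shared predicate only pays off if every open-coded check is converted to use it.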


Re: [PATCH] Fix result for conditional reductions matching at index 0

2017-12-11 Thread Kilian Verhetsel

Hello,

Jakub Jelinek  writes:
> As it doesn't apply, I can't easily check what the patch generates
> on the PR80631 testcases vs. my thoughts on that; though, if it emits
> something more complicated for the simple cases, perhaps we could improve
> those (not handle it like COND_REDUCTION, but instead as former
> INTEGER_INDUC_COND_REDUCTION and just use a different constant instead of 0
> if 0 isn't usable for the condition never matched value.

While you could use values different from 0, I'm not sure this can be
done as efficiently.  0 is convenient because a single bitwise-and
between the index vector and the condition will set lanes that do not
contain a match to 0.
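
The point about 0 can be shown with a scalar model of the reduction. This is a sketch under stated assumptions, not the code GCC generates: MOCK_LANES and mock_last_match are hypothetical names, the "vector" is an ordinary array, and indices are stored 1-based so that 0 is free to mean "no match in this lane".

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical vector width, chosen only for illustration.  */
#define MOCK_LANES 4

/* Returns the last index i with a[i] < min_v, or dflt if there is none.
   Indices are kept 1-based per lane so that 0 marks "no match".  */
uint32_t
mock_last_match (const uint32_t *a, size_t n, uint32_t min_v, uint32_t dflt)
{
  assert (a != NULL || n == 0);

  uint32_t acc[MOCK_LANES] = { 0 };

  for (size_t base = 0; base < n; base += MOCK_LANES)
    for (size_t l = 0; l < MOCK_LANES && base + l < n; l++)
      {
        /* All-ones when the condition holds, all-zeros otherwise.  */
        uint32_t mask = a[base + l] < min_v ? UINT32_MAX : 0;
        /* A single AND zeroes the lanes that did not match.  */
        uint32_t idx = (uint32_t) (base + l + 1) & mask;
        if (idx > acc[l])
          acc[l] = idx;
      }

  /* Epilogue: maximum over all lanes.  */
  uint32_t max = 0;
  for (size_t l = 0; l < MOCK_LANES; l++)
    if (acc[l] > max)
      max = acc[l];

  return max != 0 ? max - 1 : dflt;
}
```

Because the stored indices start at 1, a match at index 0 yields 1 in its lane and stays distinguishable from a lane with no match at all. Any other "no match" marker would need a select instead of the single AND.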

Jakub Jelinek  writes:
> First of all, I fail to see why we don't handle negative step,
> that can be done with REDUC_MIN instead of REDUC_MAX.

That would not work without first using values different from 0 to
indicate the absence of matches (except in cases where all indices are
negative), which I assume is why the test was there in the first place.

Below is the patch with fixed formatting and changes to make it apply
cleanly to r255537 (as far as I can tell this was simply caused by some
variables and constants having been renamed).

2017-11-21  Kilian Verhetsel  

gcc/ChangeLog:
PR testsuite/81179
* tree-vect-loop.c (vect_create_epilog_for_reduction): Fix the
returned value for INTEGER_INDUC_COND_REDUCTION whose last match
occurred at index 0.
(vectorizable_reduction): For
INTEGER_INDUC_COND_REDUCTION, pass the PHI statement that sets
the induction variable to the code generating the epilogue and
check that no overflow will occur.

gcc/testsuite/ChangeLog:
* gcc.dg/vect/vect-iv-cond-reduc-overflow-1.c: New test for
overflows when compiling conditional reductions.
* gcc.dg/vect/vect-iv-cond-reduc-overflow-2.c: Likewise.

Index: gcc/testsuite/gcc.dg/vect/vect-iv-cond-reduc-overflow-1.c
===
--- gcc/testsuite/gcc.dg/vect/vect-iv-cond-reduc-overflow-1.c	(nonexistent)
+++ gcc/testsuite/gcc.dg/vect/vect-iv-cond-reduc-overflow-1.c	(working copy)
@@ -0,0 +1,27 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target vect_condition } */
+
+#include "tree-vect.h"
+#include <limits.h>
+
+extern void abort (void) __attribute__ ((noreturn));
+
+#define N UINT_MAX
+
+/* Condition reduction with maximum possible loop size.  Will fail to vectorize
+   because values in the index vector will overflow.  */
+unsigned int
+condition_reduction (unsigned int *a, unsigned int min_v)
+{
+  unsigned int last = -72;
+
+  for (unsigned int i = 0; i < N; i++)
+if (a[i] < min_v)
+  last = i;
+
+  return last;
+}
+
+/* { dg-final { scan-tree-dump-not "LOOP VECTORIZED" "vect" } } */
+/* { dg-final { scan-tree-dump "condition expression based on integer induction." "vect" } } */
+/* { dg-final { scan-tree-dump "loop size is greater than data size" "vect" } } */
Index: gcc/testsuite/gcc.dg/vect/vect-iv-cond-reduc-overflow-2.c
===
--- gcc/testsuite/gcc.dg/vect/vect-iv-cond-reduc-overflow-2.c	(nonexistent)
+++ gcc/testsuite/gcc.dg/vect/vect-iv-cond-reduc-overflow-2.c	(working copy)
@@ -0,0 +1,26 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target vect_condition } */
+
+#include "tree-vect.h"
+#include <limits.h>
+
+extern void abort (void) __attribute__ ((noreturn));
+
+#define N (UINT_MAX - 1)
+
+/* Condition reduction with maximum possible loop size, minus one.  This should
+   still be vectorized correctly.  */
+unsigned int
+condition_reduction (unsigned int *a, unsigned int min_v)
+{
+  unsigned int last = -72;
+
+  for (unsigned int i = 0; i < N; i++)
+if (a[i] < min_v)
+  last = i;
+
+  return last;
+}
+
+/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" } } */
+/* { dg-final { scan-tree-dump "condition expression based on integer induction." "vect" } } */
Index: gcc/tree-vect-loop.c
===
--- gcc/tree-vect-loop.c	(revision 255537)
+++ gcc/tree-vect-loop.c	(working copy)
@@ -4325,7 +4325,7 @@ get_initial_defs_for_reduction (slp_tree slp_node,
 
 static void
 vect_create_epilog_for_reduction (vec vect_defs, gimple *stmt,
-  gimple *reduc_def_stmt,
+  gimple *reduc_def_stmt, gimple *induct_stmt,
   int ncopies, internal_fn reduc_fn,
   vec reduction_phis,
   bool double_reduc, 
@@ -4486,7 +4486,9 @@ vect_create_epilog_for_reduction (vec vect_d
  The first match will be a 1 to allow 0 to be used for non-matching
  indexes.  If there are no matches at all then the vector will be all
  zeroes.  */
-  if (STMT_VINFO_VEC_REDUCTION_TYPE (stmt_info) == COND_REDUCTION)
+  if (STMT_VINFO_VEC_REDUCTION_TYPE (stmt_info) == COND_REDUCTION
+  || (STMT_VINFO_VEC_REDUCTION_TYPE (stmt_info)
+	  == INTEGER_INDUC_COND_REDUCTIO

Re: [PATCH] Allow USE in PARALLELs in store_data_bypass_p

2017-12-11 Thread Eric Botcazou
> Is that long enough to be worth it?  I mean, in all other places (rtlanal.c,
> recog.c, ...) we use similar code in all spots where it is needed, adding
> an inline would just mean yet another thing to remember.  Or do you mean
> CLOBBER_OR_USE_P macro?

No, the whole function, it seems to duplicate everything in the 2 main arms.

-- 
Eric Botcazou


Re: [PATCH][GCC][ARM] Fix failing testcase pragma_fpu_attribute.c

2017-12-11 Thread Christophe Lyon
On 11 December 2017 at 11:35, Tamar Christina  wrote:
> Hi Christoph,
>
>> -Original Message-
>> From: Christophe Lyon [mailto:christophe.l...@linaro.org]
>> Sent: Monday, December 11, 2017 09:02
>> To: Tamar Christina 
>> Cc: gcc-patches@gcc.gnu.org; nd ; Ramana Radhakrishnan
>> ; Richard Earnshaw
>> ; ni...@redhat.com; Kyrylo Tkachov
>> 
>> Subject: Re: [PATCH][GCC][ARM] Fix failing testcase pragma_fpu_attribute.c
>>
>> On 8 December 2017 at 15:53, Tamar Christina 
>> wrote:
>> > Hi All,
>> >
>> > My previous patch had two issues with the new test cases.
>> > It seems that depending on which DejaGnu version you have
>> > dg-additional-options will add the options before or after the ones
>> > added by the test suite. Which means I can't use it to override the
>> > default options.
>> >
>> > For this I use a pragma now and place the pragma before GCC needs to
>> > emit any code. Which in turn means it doesn't emit the .fpu directive
>> > for the first switching of fpus.
>> >
>> > Secondly, because of the usage of neon I also need to guard against
>> arm_neon_ok.
>> >
>> > Regtested on arm-none-eabi and no regressions.
>> >
>> > Ok for trunk?
>> >
>> >
>> > gcc/testsuite/
>> > 2017-12-08  Tamar Christina  
>> >
>> > PR target/82641
>> > * gcc.target/arm/pragma_fpu_attribute.c: New.
>> > * gcc.target/arm/pragma_fpu_attribute_2.c: New.
>> >
>> > --
>>
>> Hi Tamar,
>>
>> We must be testing/building differently, since your patch doesn't work for
>> me.
>>
>> The compiler complains when including arm_neon.h because:
>> "NEON intrinsics not available with the soft-float ABI."
>>
>> I'm using a recent dejagnu (1.6+). and for instance on arm-none-eabi, the
>> testcase is compiled with -std=gnu99, but no other ABI-related option. Why
>> does it work for you?
>
> This is a good question, it also works on our internal overnight testing 
> infrastructure.
> At least the neon bit, it was the reason I noticed the discrepancy with the 
> Dejagnu versions.
>
> It also works when I build natively using just configure && make. Could be 
> something in the configure flags.
> Looking back at it, if the vanilla compiler doesn't support neon I can see 
> the test failing. But fixing it means
> Turning on neon and then turning it off after the include. Which makes the 
> test do too many things.

What are your configure flags?
Can you copy&paste the command line used to compile the testcase (from gcc.log)?

Thanks

>
> I will try to think of  a testcase that doesn't require neon, if I can't I'll 
> just remove the tests.
> They weren't being tested before and if there's no way to reliably test 
> changing fpu options on ARM
> Then there's no point having them.
>

Yes, that's becoming way too complex for the purpose :(

> Thanks,
> Tamar
>
>>
>> Christophe


RE: [PATCH][GCC][ARM] Fix failing testcase pragma_fpu_attribute.c

2017-12-11 Thread Tamar Christina


> -Original Message-
> From: Christophe Lyon [mailto:christophe.l...@linaro.org]
> Sent: Monday, December 11, 2017 11:24
> To: Tamar Christina 
> Cc: gcc-patches@gcc.gnu.org; nd ; Ramana Radhakrishnan
> ; Richard Earnshaw
> ; ni...@redhat.com; Kyrylo Tkachov
> 
> Subject: Re: [PATCH][GCC][ARM] Fix failing testcase pragma_fpu_attribute.c
> 
> On 11 December 2017 at 11:35, Tamar Christina 
> wrote:
> > Hi Christoph,
> >
> >> -Original Message-
> >> From: Christophe Lyon [mailto:christophe.l...@linaro.org]
> >> Sent: Monday, December 11, 2017 09:02
> >> To: Tamar Christina 
> >> Cc: gcc-patches@gcc.gnu.org; nd ; Ramana Radhakrishnan
> >> ; Richard Earnshaw
> >> ; ni...@redhat.com; Kyrylo Tkachov
> >> 
> >> Subject: Re: [PATCH][GCC][ARM] Fix failing testcase
> >> pragma_fpu_attribute.c
> >>
> >> On 8 December 2017 at 15:53, Tamar Christina
> >> 
> >> wrote:
> >> > Hi All,
> >> >
> >> > My previous patch had two issues with the new test cases.
> >> > It seems that depending on which DejaGnu version you have
> >> > dg-additional-options will add the options before or after the ones
> >> > added by the test suite. Which means I can't use it to override the
> >> > default options.
> >> >
> >> > For this I use a pragma now and place the pragma before GCC needs
> >> > to emit any code. Which in turn means it doesn't emit the .fpu
> >> > directive for the first switching of fpus.
> >> >
> >> > Secondly, because of the usage of neon I also need to guard against
> >> arm_neon_ok.
> >> >
> >> > Regtested on arm-none-eabi and no regressions.
> >> >
> >> > Ok for trunk?
> >> >
> >> >
> >> > gcc/testsuite/
> >> > 2017-12-08  Tamar Christina  
> >> >
> >> > PR target/82641
> >> > * gcc.target/arm/pragma_fpu_attribute.c: New.
> >> > * gcc.target/arm/pragma_fpu_attribute_2.c: New.
> >> >
> >> > --
> >>
> >> Hi Tamar,
> >>
> >> We must be testing/building differently, since your patch doesn't
> >> work for me.
> >>
> >> The compiler complains when including arm_neon.h because:
> >> "NEON intrinsics not available with the soft-float ABI."
> >>
> >> I'm using a recent dejagnu (1.6+). and for instance on arm-none-eabi,
> >> the testcase is compiled with -std=gnu99, but no other ABI-related
> >> option. Why does it work for you?
> >
> > This is a good question, it also works on our internal overnight testing
> infrastructure.
> > At least the neon bit, it was the reason I noticed the discrepancy with the
> Dejagnu versions.
> >
> > It also works when I build natively using just configure && make. Could be
> something in the configure flags.
> > Looking back at it, if the vanilla compiler doesn't support neon I can
> > see the test failing. But fixing it means Turning on neon and then turning 
> > it
> off after the include. Which makes the test do too many things.
> 
> What are your configure flags?
> Can you can&paste the command line used to compile the testcase (from
> gcc.log) ?

They are:

Schedule of variations:
arm-eabi-aem/-marm/-march=armv7-a/-mfpu=vfpv3-d16/-mfloat-abi=softfp

arm-eabi-aem/-mthumb/-march=armv8-a/-mfpu=crypto-neon-fp-armv8/-mfloat-abi=hard


/build-arm-none-eabi/obj/gcc2/gcc/xgcc -B/build-arm-none-eabi/obj/gcc2/gcc/ 
/src/gcc/gcc/testsuite/gcc.target/arm/pragma_fpu_attribute_2.c -marm 
-march=armv7-a -mfpu=vfpv3-d16 -mfloat-abi=softfp -fno-diagnostics-show-caret 
-fdiagnostics-color=never -ansi -pedantic-errors -std=gnu99 -ffat-lto-objects 
-S -specs=aprofile-validation.specs -Wa,-mno-warn-deprecated -o 
pragma_fpu_attribute_2.s

/build-arm-none-eabi/obj/gcc2/gcc/xgcc -B/build-arm-none-eabi/obj/gcc2/gcc/ 
/src/gcc/gcc/testsuite/gcc.target/arm/pragma_fpu_attribute.c -marm 
-march=armv7-a -mfpu=vfpv3-d16 -mfloat-abi=softfp -fno-diagnostics-show-caret 
-fdiagnostics-color=never -ansi -pedantic-errors -std=gnu99 -ffat-lto-objects 
-S -specs=aprofile-validation.specs -Wa,-mno-warn-deprecated -o 
pragma_fpu_attribute.s

/build-arm-none-eabi/obj/gcc2/gcc/xgcc -B/build-arm-none-eabi/obj/gcc2/gcc/ 
/src/gcc/gcc/testsuite/gcc.target/arm/pragma_fpu_attribute_2.c  -mthumb 
-march=armv8-a -mfpu=crypto-neon-fp-armv8 -mfloat-abi=hard   
-fno-diagnostics-show-caret -fdiagnostics-color=never  -ansi -pedantic-errors 
-std=gnu99 -ffat-lto-objects -S -specs=aprofile-validation.specs 
-Wa,-mno-warn-deprecated   -o pragma_fpu_attribute_2.s

/build-arm-none-eabi/obj/gcc2/gcc/xgcc -B/build-arm-none-eabi/obj/gcc2/gcc/ 
/src/gcc/gcc/testsuite/gcc.target/arm/pragma_fpu_attribute.c -mthumb 
-march=armv8-a -mfpu=crypto-neon-fp-armv8 -mfloat-abi=hard 
-fno-diagnostics-show-caret -fdiagnostics-color=never -ansi -pedantic-errors 
-std=gnu99 -ffat-lto-objects -S -specs=aprofile-validation.specs 
-Wa,-mno-warn-deprecated -o pragma_fpu_attribute.s

It's also weird that you only see one of the testcases failing.
The pragma_fpu_attribute.c and pragma_fpu_attribute_2.c should have the
exact same issues.

> 
> Thanks
> 
> >
> > I will try to think of  a testcase that doesn't require neon, if I

[PATCH] Allow USE in PARALLELs in store_data_bypass_p (take 2)

2017-12-11 Thread Jakub Jelinek
On Mon, Dec 11, 2017 at 12:09:14PM +0100, Eric Botcazou wrote:
> > Is that long enough to be worth it?  I mean, in all other places (rtlanal.c,
> > recog.c, ...) we use similar code in all spots where it is needed, adding
> > an inline would just mean yet another thing to remember.  Or do you mean
> > CLOBBER_OR_USE_P macro?
> 
> No, the whole function, it seems to duplicate everything in the 2 main arms.

Ah, that makes a lot of sense.  So like this?

2017-12-11  Jakub Jelinek  

* recog.c (store_data_bypass_p_1): New function.
(store_data_bypass_p): Handle USE in a PARALLEL like CLOBBER.  Use
store_data_bypass_p_1 to avoid code duplication.  Formatting fixes.

--- gcc/recog.c.jj  2017-12-06 16:32:53.605660887 +0100
+++ gcc/recog.c 2017-12-11 12:49:05.350575530 +0100
@@ -3657,93 +3657,70 @@ peephole2_optimize (void)
 
 /* Common predicates for use with define_bypass.  */
 
-/* True if the dependency between OUT_INSN and IN_INSN is on the store
-   data not the address operand(s) of the store.  IN_INSN and OUT_INSN
-   must be either a single_set or a PARALLEL with SETs inside.  */
+/* Helper function for store_data_bypass_p, handle just a single SET
+   IN_SET.  */
 
-int
-store_data_bypass_p (rtx_insn *out_insn, rtx_insn *in_insn)
+static bool
+store_data_bypass_p_1 (rtx_insn *out_insn, rtx in_set)
 {
-  rtx out_set, in_set;
-  rtx out_pat, in_pat;
-  rtx out_exp, in_exp;
-  int i, j;
+  if (!MEM_P (SET_DEST (in_set)))
+return false;
 
-  in_set = single_set (in_insn);
-  if (in_set)
+  rtx out_set = single_set (out_insn);
+  if (out_set)
 {
-  if (!MEM_P (SET_DEST (in_set)))
+  if (reg_mentioned_p (SET_DEST (out_set), SET_DEST (in_set)))
return false;
-
-  out_set = single_set (out_insn);
-  if (out_set)
-{
-  if (reg_mentioned_p (SET_DEST (out_set), SET_DEST (in_set)))
-return false;
-}
-  else
-{
-  out_pat = PATTERN (out_insn);
-
- if (GET_CODE (out_pat) != PARALLEL)
-   return false;
-
-  for (i = 0; i < XVECLEN (out_pat, 0); i++)
-  {
-out_exp = XVECEXP (out_pat, 0, i);
-
-if (GET_CODE (out_exp) == CLOBBER)
-  continue;
-
-gcc_assert (GET_CODE (out_exp) == SET);
-
-if (reg_mentioned_p (SET_DEST (out_exp), SET_DEST (in_set)))
-  return false;
-  }
-  }
 }
   else
 {
-  in_pat = PATTERN (in_insn);
-  gcc_assert (GET_CODE (in_pat) == PARALLEL);
+  rtx out_pat = PATTERN (out_insn);
+
+  if (GET_CODE (out_pat) != PARALLEL)
+   return false;
 
-  for (i = 0; i < XVECLEN (in_pat, 0); i++)
+  for (int i = 0; i < XVECLEN (out_pat, 0); i++)
{
- in_exp = XVECEXP (in_pat, 0, i);
+ rtx out_exp = XVECEXP (out_pat, 0, i);
 
- if (GET_CODE (in_exp) == CLOBBER)
+ if (GET_CODE (out_exp) == CLOBBER || GET_CODE (out_exp) == USE)
continue;
 
- gcc_assert (GET_CODE (in_exp) == SET);
+ gcc_assert (GET_CODE (out_exp) == SET);
 
- if (!MEM_P (SET_DEST (in_exp)))
+ if (reg_mentioned_p (SET_DEST (out_exp), SET_DEST (in_set)))
return false;
+   }
+}
+
+  return true;
+}
 
-  out_set = single_set (out_insn);
-  if (out_set)
-{
-  if (reg_mentioned_p (SET_DEST (out_set), SET_DEST (in_exp)))
-return false;
-}
-  else
-{
-  out_pat = PATTERN (out_insn);
-  gcc_assert (GET_CODE (out_pat) == PARALLEL);
-
-  for (j = 0; j < XVECLEN (out_pat, 0); j++)
-{
-  out_exp = XVECEXP (out_pat, 0, j);
-
-  if (GET_CODE (out_exp) == CLOBBER)
-continue;
-
-  gcc_assert (GET_CODE (out_exp) == SET);
-
-  if (reg_mentioned_p (SET_DEST (out_exp), SET_DEST (in_exp)))
-return false;
-}
-}
-}
+/* True if the dependency between OUT_INSN and IN_INSN is on the store
+   data not the address operand(s) of the store.  IN_INSN and OUT_INSN
+   must be either a single_set or a PARALLEL with SETs inside.  */
+
+int
+store_data_bypass_p (rtx_insn *out_insn, rtx_insn *in_insn)
+{
+  rtx in_set = single_set (in_insn);
+  if (in_set)
+return store_data_bypass_p_1 (out_insn, in_set);
+
+  rtx in_pat = PATTERN (in_insn);
+  gcc_assert (GET_CODE (in_pat) == PARALLEL);
+
+  for (int i = 0; i < XVECLEN (in_pat, 0); i++)
+{
+  rtx in_exp = XVECEXP (in_pat, 0, i);
+
+  if (GET_CODE (in_exp) == CLOBBER || GET_CODE (in_exp) == USE)
+   continue;
+
+  gcc_assert (GET_CODE (in_exp) == SET);
+
+  if (!store_data_bypass_p_1 (out_insn, in_exp))
+   return false;
 }
 
   return true;


Jakub


RE: [PATCH][GCC][ARM] Fix failing testcase pragma_fpu_attribute.c

2017-12-11 Thread Tamar Christina
> > >
> > > It also works when I build natively using just configure && make.
> > > Could be
> > something in the configure flags.
> > > Looking back at it, if the vanilla compiler doesn't support neon I
> > > can see the test failing. But fixing it means Turning on neon and
> > > then turning it
> > off after the include. Which makes the test do too many things.
> >
> > What are your configure flags?
> > Can you can&paste the command line used to compile the testcase (from
> > gcc.log) ?
> 

Ah, Richard pointed out to me that the difference is in the "soft" ABI; I was
only testing softfp and hard. I'll write a new testcase that should work for all.

Thanks

> They are:
> 
> Schedule of variations:
> arm-eabi-aem/-marm/-march=armv7-a/-mfpu=vfpv3-d16/-mfloat-
> abi=softfp
> arm-eabi-aem/-mthumb/-march=armv8-a/-mfpu=crypto-neon-fp-armv8/-
> mfloat-abi=hard
> 
> 
> /build-arm-none-eabi/obj/gcc2/gcc/xgcc -B/build-arm-none-
> eabi/obj/gcc2/gcc/
> /src/gcc/gcc/testsuite/gcc.target/arm/pragma_fpu_attribute_2.c -marm -
> march=armv7-a -mfpu=vfpv3-d16 -mfloat-abi=softfp -fno-diagnostics-show-
> caret -fdiagnostics-color=never -ansi -pedantic-errors -std=gnu99 -ffat-lto-
> objects -S -specs=aprofile-validation.specs -Wa,-mno-warn-deprecated -o
> pragma_fpu_attribute_2.s
> 
> /build-arm-none-eabi/obj/gcc2/gcc/xgcc -B/build-arm-none-
> eabi/obj/gcc2/gcc/
> /src/gcc/gcc/testsuite/gcc.target/arm/pragma_fpu_attribute.c -marm -
> march=armv7-a -mfpu=vfpv3-d16 -mfloat-abi=softfp -fno-diagnostics-show-
> caret -fdiagnostics-color=never -ansi -pedantic-errors -std=gnu99 -ffat-lto-
> objects -S -specs=aprofile-validation.specs -Wa,-mno-warn-deprecated -o
> pragma_fpu_attribute.s
> 
> /build-arm-none-eabi/obj/gcc2/gcc/xgcc -B/build-arm-none-
> eabi/obj/gcc2/gcc/
> /src/gcc/gcc/testsuite/gcc.target/arm/pragma_fpu_attribute_2.c  -mthumb -
> march=armv8-a -mfpu=crypto-neon-fp-armv8 -mfloat-abi=hard   -fno-
> diagnostics-show-caret -fdiagnostics-color=never  -ansi -pedantic-errors -
> std=gnu99 -ffat-lto-objects -S -specs=aprofile-validation.specs -Wa,-mno-
> warn-deprecated   -o pragma_fpu_attribute_2.s
> 
> /build-arm-none-eabi/obj/gcc2/gcc/xgcc -B/build-arm-none-
> eabi/obj/gcc2/gcc/
> /src/gcc/gcc/testsuite/gcc.target/arm/pragma_fpu_attribute.c -mthumb -
> march=armv8-a -mfpu=crypto-neon-fp-armv8 -mfloat-abi=hard -fno-
> diagnostics-show-caret -fdiagnostics-color=never -ansi -pedantic-errors -
> std=gnu99 -ffat-lto-objects -S -specs=aprofile-validation.specs -Wa,-mno-
> warn-deprecated -o pragma_fpu_attribute.s
> 
> It's also weird that you only see one of the testcases failing.
> The pragma_fpu_attribute.c and pragma_fpu_attribute_2.c should have the
> exact same issues.
> 
> >
> > Thanks
> >
> > >
> > > I will try to think of a testcase that doesn't require neon; if I
> > > can't I'll just remove the tests.
> > > They weren't being tested before and if there's no way to reliably
> > > test changing fpu options on ARM then there's no point having them.
> > >
> >
> > Yes, that's becoming way too complex for the purpose :(
> 
> I think I can do one using the fmla instructions. So will try that next.
> 
> >
> > > Thanks,
> > > Tamar
> > >
> > >>
> > >> Christophe


Re: [PATCH][GCC][ARM] Fix failing testcase pragma_fpu_attribute.c

2017-12-11 Thread Christophe Lyon
On 11 December 2017 at 12:56, Tamar Christina  wrote:
>> > >
>> > > It also works when I build natively using just configure && make.
>> > > Could be something in the configure flags.
>> > > Looking back at it, if the vanilla compiler doesn't support neon I
>> > > can see the test failing. But fixing it means turning on neon and
>> > > then turning it off after the include, which makes the test do too
>> > > many things.
>> >
>> > What are your configure flags?
>> > Can you copy&paste the command line used to compile the testcase (from
>> > gcc.log)?
>>
>
> Ah, Richard pointed out to me that the difference is in the "soft" abi; I
> was only testing softfp and hard. I'll write a new testcase that should
> work for all.
>

Indeed, you override the float-abi flags in your RUNTESTFLAGS, which
I'm not doing.

I think your arm-none-eabi builds have soft, softfp and hard multilibs?

With arm-none-linux-gnueabi[hf], you cannot override float-abi as easily,
see for instance:
https://gcc.gnu.org/ml/gcc-patches/2017-11/msg02323.html

Thanks,

Christophe

> Thanks
>
>> They are:
>>
>> Schedule of variations:
>> arm-eabi-aem/-marm/-march=armv7-a/-mfpu=vfpv3-d16/-mfloat-
>> abi=softfp
>> arm-eabi-aem/-mthumb/-march=armv8-a/-mfpu=crypto-neon-fp-armv8/-
>> mfloat-abi=hard
>>
>>
>> /build-arm-none-eabi/obj/gcc2/gcc/xgcc -B/build-arm-none-
>> eabi/obj/gcc2/gcc/
>> /src/gcc/gcc/testsuite/gcc.target/arm/pragma_fpu_attribute_2.c -marm -
>> march=armv7-a -mfpu=vfpv3-d16 -mfloat-abi=softfp -fno-diagnostics-show-
>> caret -fdiagnostics-color=never -ansi -pedantic-errors -std=gnu99 -ffat-lto-
>> objects -S -specs=aprofile-validation.specs -Wa,-mno-warn-deprecated -o
>> pragma_fpu_attribute_2.s
>>
>> /build-arm-none-eabi/obj/gcc2/gcc/xgcc -B/build-arm-none-
>> eabi/obj/gcc2/gcc/
>> /src/gcc/gcc/testsuite/gcc.target/arm/pragma_fpu_attribute.c -marm -
>> march=armv7-a -mfpu=vfpv3-d16 -mfloat-abi=softfp -fno-diagnostics-show-
>> caret -fdiagnostics-color=never -ansi -pedantic-errors -std=gnu99 -ffat-lto-
>> objects -S -specs=aprofile-validation.specs -Wa,-mno-warn-deprecated -o
>> pragma_fpu_attribute.s
>>
>> /build-arm-none-eabi/obj/gcc2/gcc/xgcc -B/build-arm-none-
>> eabi/obj/gcc2/gcc/
>> /src/gcc/gcc/testsuite/gcc.target/arm/pragma_fpu_attribute_2.c  -mthumb -
>> march=armv8-a -mfpu=crypto-neon-fp-armv8 -mfloat-abi=hard   -fno-
>> diagnostics-show-caret -fdiagnostics-color=never  -ansi -pedantic-errors -
>> std=gnu99 -ffat-lto-objects -S -specs=aprofile-validation.specs -Wa,-mno-
>> warn-deprecated   -o pragma_fpu_attribute_2.s
>>
>> /build-arm-none-eabi/obj/gcc2/gcc/xgcc -B/build-arm-none-
>> eabi/obj/gcc2/gcc/
>> /src/gcc/gcc/testsuite/gcc.target/arm/pragma_fpu_attribute.c -mthumb -
>> march=armv8-a -mfpu=crypto-neon-fp-armv8 -mfloat-abi=hard -fno-
>> diagnostics-show-caret -fdiagnostics-color=never -ansi -pedantic-errors -
>> std=gnu99 -ffat-lto-objects -S -specs=aprofile-validation.specs -Wa,-mno-
>> warn-deprecated -o pragma_fpu_attribute.s
>>
>> It's also weird that you only see one of the testcases failing.
>> The pragma_fpu_attribute.c and pragma_fpu_attribute_2.c should have the
>> exact same issues.
>>
>> >
>> > Thanks
>> >
>> > >
>> > > I will try to think of a testcase that doesn't require neon; if I
>> > > can't I'll just remove the tests.
>> > > They weren't being tested before and if there's no way to reliably
>> > > test changing fpu options on ARM then there's no point having them.
>> > >
>> >
>> > Yes, that's becoming way too complex for the purpose :(
>>
>> I think I can do one using the fmla instructions. So will try that next.
>>
>> >
>> > > Thanks,
>> > > Tamar
>> > >
>> > >>
>> > >> Christophe


Re: [PATCH] Allow USE in PARALLELs in store_data_bypass_p (take 2)

2017-12-11 Thread Eric Botcazou
> Ah, that makes a lot of sense.  So like this?
> 
> 2017-12-11  Jakub Jelinek  
> 
>   * recog.c (store_data_bypass_p_1): New function.
>   (store_data_bypass_p): Handle USE in a PARALLEL like CLOBBER.  Use
>   store_data_bypass_p_1 to avoid code duplication.  Formatting fixes.

Yes, but I think that you can further simplify the first function:

  rtx out_set = single_set (out_insn);
  if (out_set)
    return !reg_mentioned_p (SET_DEST (out_set), SET_DEST (in_set));

I also wonder why we have a test on PARALLEL in the first one and an assertion 
on the same PARALLEL in the second one.

No big deal in either case so your call for the definitive version.

-- 
Eric Botcazou


Re: [PATCH] Allow USE in PARALLELs in store_data_bypass_p (take 2)

2017-12-11 Thread Jakub Jelinek
On Mon, Dec 11, 2017 at 01:26:42PM +0100, Eric Botcazou wrote:
> > Ah, that makes a lot of sense.  So like this?
> > 
> > 2017-12-11  Jakub Jelinek  
> > 
> > * recog.c (store_data_bypass_p_1): New function.
> > (store_data_bypass_p): Handle USE in a PARALLEL like CLOBBER.  Use
> > store_data_bypass_p_1 to avoid code duplication.  Formatting fixes.
> 
> Yes, but I think that you can further simplify the first function:
> 
>   rtx out_set = single_set (out_insn);
>   if (out_set)
>     return !reg_mentioned_p (SET_DEST (out_set), SET_DEST (in_set));

Ok.

> I also wonder why we have a test on PARALLEL in the first one and an
> assertion on the same PARALLEL in the second one.

The old code was inconsistent, had return false; in one case and assert in
the remaining two spots.  If you are not against it, I'd use return false; in
both cases if we want consistency.

> No big deal in either case so your call for the definitive version.

Jakub


Re: [PATCH] Fix result for conditional reductions matching at index 0

2017-12-11 Thread Jakub Jelinek
On Mon, Dec 11, 2017 at 11:56:55AM +0100, Kilian Verhetsel wrote:
> Jakub Jelinek  writes:
> > As it doesn't apply, I can't easily check what the patch generates
> > on the PR80631 testcases vs. my thoughts on that; though, if it emits
> > something more complicated for the simple cases, perhaps we could improve
> > those (not handle it like COND_REDUCTION, but instead as former
> > INTEGER_INDUC_COND_REDUCTION and just use a different constant instead of 0
> > if 0 isn't usable for the condition-never-matched value).
> 
> While you could use values different from 0, I'm not sure this can be
> done as efficiently.  0 is convenient because a single bitwise-and
> between the index vector and the condition will set lanes that do not
> contain a match to 0.

Of course it can be done efficiently; what we care about most is that the
body of the vectorized loop is efficient.  Whether we choose -1, 0 or 124 as
the value meaning the COND_EXPR never matched matters only before that loop
(when we need to load that into a register holding a vector of all those
constants) and then in a scalar comparison on the REDUC_* result.  A load of
a -1 vector is on some targets as expensive as a load of 0; for an arbitrary
value the worst case is one memory load compared to a specialized zero
register (or set all bits) instruction.  On the other side, by not using any
offsetted iteration var, one can reuse the vector register that holds the IV,
which can be used in some loops too, and thus decrease register pressure.
And while comparison against 0 is sometimes one scalar insn cheaper than
comparison against another value, if the insn producing it already sets the
flags, I doubt it is the case here, so it is exactly the same cost.  Not to
mention that in your patch you are instead subtracting one in the scalar
code.
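
To make the point about the "never matched" value concrete, here is a rough
scalar model of such a reduction.  It is only a sketch and not what the
vectorizer emits: the function name, the LANES constant and the
no_match_result parameter are made up for illustration, and n is assumed to
be a multiple of LANES.

```c
#include <stddef.h>

#define LANES 8

/* Scalar model of a vectorized "last index where v[k] == key" reduction.
   Each lane remembers the last index at which it matched, or SENTINEL if
   it never matched.  The final step plays the role of REDUC_MAX followed
   by one scalar comparison against the sentinel.  */
int
last_match_index (const int *v, size_t n, int key, int sentinel,
                  int no_match_result)
{
  int lane_idx[LANES];

  for (int l = 0; l < LANES; l++)
    lane_idx[l] = sentinel;

  /* "Vector" loop body: one compare and one select per lane.  */
  for (size_t base = 0; base + LANES <= n; base += LANES)
    for (int l = 0; l < LANES; l++)
      if (v[base + l] == key)
        lane_idx[l] = (int) (base + l);

  /* Epilogue: maximum over the lanes, then the scalar check.  */
  int best = lane_idx[0];
  for (int l = 1; l < LANES; l++)
    if (lane_idx[l] > best)
      best = lane_idx[l];

  return best == sentinel ? no_match_result : best;
}
```

With sentinel -1 a match at index 0 is reported as 0.  With sentinel 0 the
same match is indistinguishable from "no match", which is the wrong result
this thread is about.  The loop body is the same in both cases; only the
initial vector and the final comparison change.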

> Jakub Jelinek  writes:
> > First of all, I fail to see why we don't handle negative step,
> > that can be done with REDUC_MIN instead of REDUC_MAX.
> 
> That would not work without first using values different from 0 to
> indicate the absence of matches (except in cases where all indices are
> negative), which I assume is why the test was there in the first place.
> 
> Below is the patch with fixed formatting and changes to make it apply
> cleanly to r255537 (as far as I can tell this was simply caused by some
> variables and constants having been renamed).

Thanks, it applies cleanly now, but
> +  else if ((STMT_VINFO_VEC_REDUCTION_TYPE (stmt_info) == COND_REDUCTION
> + || (STMT_VINFO_VEC_REDUCTION_TYPE (stmt_info)
> + == INTEGER_INDUC_COND_REDUCTION))
> +&& reduc_fn == IFN_LAST)»

contains a character at the end of the line that prevents it from compiling.

Trying to understand your patch, here is the difference in the generated
code, with your patch applied, between additionally applying:
--- tree-vect-loop.c2017-12-11 13:39:35.619122907 +0100
+++ tree-vect-loop.c2017-12-11 13:35:27.0 +0100
@@ -6021,8 +6021,8 @@ vectorizable_reduction (gimple *stmt, gi
dump_printf_loc (MSG_NOTE, vect_location,
 "condition expression based on "
 "integer induction.\n");
- STMT_VINFO_VEC_REDUCTION_TYPE (stmt_info)
-   = INTEGER_INDUC_COND_REDUCTION;
+/*   STMT_VINFO_VEC_REDUCTION_TYPE (stmt_info)
+   = INTEGER_INDUC_COND_REDUCTION; */
}
so that COND_REDUCTION is used, and the INTEGER_INDUC_COND_REDUCTION case
with just your patch, on this testcase:
int v[256] = { 77, 1, 79, 3, 4, 5, 6, 7 };

__attribute__((noipa)) void
foo ()
{
  int k, r = -1;
  for (k = 0; k < 256; k++)
    if (v[k] == 77)
      r = k;
  if (r != 0)
    __builtin_abort ();
}

   vect_cst__21 = { 8, 8, 8, 8, 8, 8, 8, 8 };
   vect_cst__28 = { 77, 77, 77, 77, 77, 77, 77, 77 };
+  vect_cst__30 = { -1, -1, -1, -1, -1, -1, -1, -1 };
 
[local count: 139586436]:
   # k_12 = PHI 
   # r_13 = PHI 
   # ivtmp_11 = PHI 
   # vect_vec_iv_.0_22 = PHI 
-  # vect_r_3.1_24 = PHI 
+  # vect_r_3.1_24 = PHI 
   # vectp_v.2_25 = PHI 
-  # ivtmp_30 = PHI 
-  # _32 = PHI <_33(9), { 0, 0, 0, 0, 0, 0, 0, 0 }(2)>
-  # ivtmp_43 = PHI 
+  # ivtmp_31 = PHI 
+  # _33 = PHI <_34(9), { 0, 0, 0, 0, 0, 0, 0, 0 }(2)>
+  # ivtmp_41 = PHI 
   vect_vec_iv_.0_23 = vect_vec_iv_.0_22 + vect_cst__21;
   vect__1.4_27 = MEM[(int *)vectp_v.2_25];
   _1 = v[k_12];
   vect_r_3.6_29 = VEC_COND_EXPR ;
   r_3 = _1 == 77 ? k_12 : r_13;
   k_8 = k_12 + 1;
   ivtmp_2 = ivtmp_11 - 1;
   vectp_v.2_26 = vectp_v.2_25 + 32;
-  _33 = VEC_COND_EXPR ;
-  ivtmp_31 = ivtmp_30 + { 8, 8, 8, 8, 8, 8, 8, 8 };
-  ivtmp_44 = ivtmp_43 + 1;
-  if (ivtmp_44 < 32)
+  _34 = VEC_COND_EXPR ;
+  ivtmp_32 = ivtmp_31 + { 8, 8, 8, 8, 8, 8, 8, 8 };
+  ivtmp_42 = ivtmp_41 + 1;
+  if (ivtmp_42 < 32)
 goto ; [92.31%]
   else
 goto ; [7.69%]

...
[local count: 10737418]:
   # r_19 = PHI 
-  # vect_r_3.6_34 = PHI 
-  # _45 = PHI <_33(3)>
-  _35 = REDUC_MAX (_45);
-  _36 = {_35, _35, _35, _35, _35, _35, _35, _35};
-  _37 = { 0, 0, 0, 0, 0, 0, 0, 0 };
-  _38 = _45 == _36;
-  _39 = VEC_COND_E

[PATCH][AArch64] Specify fp16 support for Cortex-A55 and Cortex-A75

2017-12-11 Thread Kyrill Tkachov

Hi all,

The Cortex-A55 and Cortex-A75 processors support the fp16 extension.
We already specify them as such in the arm port.
This patch makes aarch64 consistent on this front.

Bootstrapped and tested on aarch64-none-linux-gnu.
Manually checked that compiling with
aarch64-none-linux-gnu-gcc -mcpu=cortex-a55 -dM -E - < /dev/null
shows __ARM_FEATURE_FP16_VECTOR_ARITHMETIC and
__ARM_FEATURE_FP16_SCALAR_ARITHMETIC being specified as expected, whereas
they were not before this patch.

Ok for trunk?

Thanks,
Kyrill

2017-12-11  Kyrylo Tkachov  

* config/aarch64/aarch64-cores.def (cortex-a55, cortex-a75,
cortex-a75.cortex-a55): Specify AARCH64_FL_F16 in the arch features.
commit e9148d4af145bcd094dddf1b23fdaa3b4c1a95b5
Author: Kyrylo Tkachov 
Date:   Wed Dec 6 16:07:05 2017 +

[AArch64] Specify fp16 support for Cortex-A55 and Cortex-A75

diff --git a/gcc/config/aarch64/aarch64-cores.def b/gcc/config/aarch64/aarch64-cores.def
index cdf047c..fa08cdf 100644
--- a/gcc/config/aarch64/aarch64-cores.def
+++ b/gcc/config/aarch64/aarch64-cores.def
@@ -83,8 +83,8 @@ AARCH64_CORE("thunderx2t99",  thunderx2t99,  thunderx2t99, 8_1A,  AARCH64_FL_FOR
 /* ARMv8.2-A Architecture Processors.  */
 
 /* ARM ('A') cores. */
-AARCH64_CORE("cortex-a55",  cortexa55, cortexa53, 8_2A,  AARCH64_FL_FOR_ARCH8_2 | AARCH64_FL_RCPC | AARCH64_FL_DOTPROD, cortexa53, 0x41, 0xd05, -1)
-AARCH64_CORE("cortex-a75",  cortexa75, cortexa57, 8_2A,  AARCH64_FL_FOR_ARCH8_2 | AARCH64_FL_RCPC | AARCH64_FL_DOTPROD, cortexa73, 0x41, 0xd0a, -1)
+AARCH64_CORE("cortex-a55",  cortexa55, cortexa53, 8_2A,  AARCH64_FL_FOR_ARCH8_2 | AARCH64_FL_F16 | AARCH64_FL_RCPC | AARCH64_FL_DOTPROD, cortexa53, 0x41, 0xd05, -1)
+AARCH64_CORE("cortex-a75",  cortexa75, cortexa57, 8_2A,  AARCH64_FL_FOR_ARCH8_2 | AARCH64_FL_F16 | AARCH64_FL_RCPC | AARCH64_FL_DOTPROD, cortexa73, 0x41, 0xd0a, -1)
 
 /* ARMv8.3-A Architecture Processors.  */
 
@@ -100,6 +100,6 @@ AARCH64_CORE("cortex-a73.cortex-a53",  cortexa73cortexa53, cortexa53, 8A,  AARCH
 
 /* ARM DynamIQ big.LITTLE configurations.  */
 
-AARCH64_CORE("cortex-a75.cortex-a55",  cortexa75cortexa55, cortexa53, 8_2A,  AARCH64_FL_FOR_ARCH8_2 | AARCH64_FL_RCPC | AARCH64_FL_DOTPROD, cortexa73, 0x41, AARCH64_BIG_LITTLE (0xd0a, 0xd05), -1)
+AARCH64_CORE("cortex-a75.cortex-a55",  cortexa75cortexa55, cortexa53, 8_2A,  AARCH64_FL_FOR_ARCH8_2 | AARCH64_FL_F16 | AARCH64_FL_RCPC | AARCH64_FL_DOTPROD, cortexa73, 0x41, AARCH64_BIG_LITTLE (0xd0a, 0xd05), -1)
 
 #undef AARCH64_CORE


Re: [PATCH] Fix result for conditional reductions matching at index 0

2017-12-11 Thread Jakub Jelinek
On Mon, Dec 11, 2017 at 02:11:34PM +0100, Jakub Jelinek wrote:
> Thanks, it applies cleanly now
> > +  else if ((STMT_VINFO_VEC_REDUCTION_TYPE (stmt_info) == COND_REDUCTION
> > +   || (STMT_VINFO_VEC_REDUCTION_TYPE (stmt_info)
> > +   == INTEGER_INDUC_COND_REDUCTION))
> > +  && reduc_fn == IFN_LAST)»
> 
> contains a character at the end of line that makes it not to compile.

Another thing is, as your patch is quite large, we need a copyright
assignment for the changes before we can accept it, see
https://gcc.gnu.org/contribute.html for details.

If you are already covered by an assignment of some company, please tell
us which one it is, otherwise contact us and we'll get you the needed
forms.

Jakub


Re: [patch, fortran] Implement maxval for characters

2017-12-11 Thread James Greenhalgh
On Wed, Dec 06, 2017 at 11:38:21AM +, Christophe Lyon wrote:
> Hi,
> 
> 
> On 28 November 2017 at 19:40, Thomas Koenig  wrote:
> > Hello world,
> >
> > the attached patch implements maxval for characters, an F2003 feature
> > that we were missing up to now.
> >
> > Regression-tested on x86_64-pc-linux-gnu.
> >
> > OK for trunk?
> >
> > Regards
> >
> > Thomas
> >
> > 2017-11-28  Thomas Koenig  
> >
> > PR fortran/36313
> > * check.c (gfc_check_minval_maxval): Use
> > int_or_real_or_char_check_f2003 for array argument.
> > * iresolve.c (gfc_resolve_maxval): Insert number in
> > function name for character arguments.
> > (gfc_resolve_minval): Likewise.
> > * trans-intrinsic.c (gfc_conv_intrinsic_minmaxloc):
> > Fix comment.
> > (gfc_conv_intrinsic_minmaxval): Resort arguments and call library
> > function if dealing with a character function.
> >
> > 2017-11-28  Thomas Koenig  
> >
> > PR fortran/36313
> > * Makefile.am: Add new files for character-valued
> > maxval and minval.
> > * Makefile.in: Regenerated.
> > * gfortran.map: Add new functions.
> > * m4/iforeach-s2.m4: New file.
> > * m4/ifunction-s2.m4: New file.
> > * m4/iparm.m4: Add intitval for minval and maxval.
> > * m4/maxval0s.m4: New file.
> > * m4/maxval1s.m4: New file.
> > * m4/minval0s.m4: New file.
> > * m4/minval1s.m4: New file.
> > * generated/maxval0_s1.c: New file.
> > * generated/maxval0_s4.c: New file.
> > * generated/maxval1_s1.c: New file.
> > * generated/maxval1_s4.c: New file.
> > * generated/minval0_s1.c: New file.
> > * generated/minval0_s4.c: New file.
> > * generated/minval1_s1.c: New file.
> > * generated/minval1_s4.c: New file.
> >
> > 2017-11-28  Thomas Koenig  
> >
> > PR fortran/36313
> > * gfortran.dg/maxval_char_1.f90: New test.
> > * gfortran.dg/maxval_char_2.f90: New test.
> > * gfortran.dg/maxval_char_3.f90: New test.
> > * gfortran.dg/maxval_char_4.f90: New test.
> > * gfortran.dg/minval_char_1.f90: New test.
> > * gfortran.dg/minval_char_2.f90: New test.
> > * gfortran.dg/minval_char_3.f90: New test.
> > * gfortran.dg/minval_char_4.f90: New test.
> 
> Hi,
> In my testing I'm seeing random results with at least some of these new tests
> (maxval_char_1, maxval_char_2, minval_char_2 at least).
> I'm cross-testing on arm targets using qemu.
> 
> Sorry, I don't really read fortran, so a first obvious question: is
> there anything
> undefined/random/race condition in these tests?
> My logs only show that the program aborted, so it doesn't seem the process
> was killed by a timeout or similar.

I'm also seeing these tests fail sporadically on x86_64-none-linux-gnu and
aarch64-none-linux-gnu.

I also struggle with reading Fortran, but one abort which triggers for me in
maxval_char_1.f90 is

  if (res /= maxval(b, mask)) call abort

I think we're getting into trouble when the mask comes out as all False;
that seems to give a different behaviour between this call:

  write (unit=res,fmt='(I5.5)') maxval(v,mask)

After which res = "*"

And this call:

  maxval(b, mask)

Which returns the empty string. Thus, res != maxval(b, mask) and we fail the
test.

I presume something similar is happening in the other tests from this patch.

Thanks,
James



[Patch combine] Don't create vector mode ZERO_EXTEND from subregs

2017-12-11 Thread James Greenhalgh

Hi,

In simplify_set we try transforming the paradoxical subreg expression:

  (set FOO (subreg:M (mem:N BAR) 0))

into:

  (set FOO (zero_extend:M (mem:N BAR)))

However, this code does not consider the case where M is a vector
mode, allowing it to construct (for example):

  (zero_extend:V4SI (mem:SI))

This would clearly have the wrong semantics, as we really don't want a
vector zero_extend of a scalar value, but fortunately we fail long before
then in expand_compound_operation.

We need to explicitly reject vector modes from this transformation.

This fixes a failure I'm seeing on a branch in which I'm trying to
tackle some performance regressions, so I have no live testcase for
this, but it is wrong by observation.

Tested on aarch64-none-elf and bootstrapped on aarch64-none-linux-gnu with
no issues.

OK?

Thanks,
James

---
2017-12-11  James Greenhalgh  

* combine.c (simplify_set): Do not transform subregs to zero_extends
if the destination mode is a vector mode.

diff --git a/gcc/combine.c b/gcc/combine.c
index 786a840..562eae6 100644
--- a/gcc/combine.c
+++ b/gcc/combine.c
@@ -6962,11 +6962,13 @@ simplify_set (rtx x)
 
   /* If we have (set FOO (subreg:M (mem:N BAR) 0)) with M wider than N, this
  would require a paradoxical subreg.  Replace the subreg with a
- zero_extend to avoid the reload that would otherwise be required.  */
+ zero_extend to avoid the reload that would otherwise be required.
+ Don't do this for vector modes, as the transformation is incorrect.  */
 
   enum rtx_code extend_op;
   if (paradoxical_subreg_p (src)
   && MEM_P (SUBREG_REG (src))
+  && !VECTOR_MODE_P (GET_MODE (src))
    && (extend_op = load_extend_op (GET_MODE (SUBREG_REG (src)))) != UNKNOWN)
 {
   SUBST (SET_SRC (x),


Re: [PATCH] annotate vector::_M_default_append for better codegen (PR 83229)

2017-12-11 Thread Jonathan Wakely

On 05/12/17 20:44 -0700, Martin Sebor wrote:

Bug 83239 - False positive from -Wstringop-overflow on simple
std::vector code, besides pointing out the warning, suggests
a missed optimization opportunity.  This is the second report
involving vector (the last one was pr79095) with the same
symptoms and a similar root cause.  Since the information GCC
keeps about pointers is limited, it's difficult to determine
that their relationship is p <= q <= r.  This is exacerbated
by the fact that GCC doesn't know that the difference between
any two pointers into the same object cannot be greater than
PTRDIFF_MAX (bug 79119).

To help GCC generate better code and avoid the false positive,
the attached patch adds simple instrumentation to
vector::_M_default_append() asserting the important pointer
relationships.
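
As an illustration of this kind of annotation, here is a minimal C sketch.
It is not the libstdc++ patch itself: struct int_vec and spare_capacity are
made-up names, and the only GCC feature used is __builtin_unreachable, which
lets the optimizer assume the guarded condition never holds.

```c
#include <stddef.h>

/* Hypothetical stand-in for the three pointers a vector keeps.  */
struct int_vec
{
  int *start;
  int *finish;
  int *end_of_storage;
};

/* Number of elements that fit without reallocating.  The guard tells the
   compiler that start <= finish <= end_of_storage always holds, so the
   subtraction below can be treated as non-negative.  Calling this with
   pointers that violate the relationship is undefined behavior.  */
size_t
spare_capacity (const struct int_vec *v)
{
  if (v->finish < v->start || v->end_of_storage < v->finish)
    __builtin_unreachable ();

  return (size_t) (v->end_of_storage - v->finish);
}
```

The guard generates no code of its own in an optimized build; it only
supplies the p <= q <= r relationship that is otherwise hard to derive.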


The vector annotation is OK for trunk, thanks.


[PATCH] Fix PR81889

2017-12-11 Thread Richard Biener

Unrolling often has only rudimentary info for the upper bound of a loop
even though VRP would compute reasonable bounds for the variables
participating in the loop exit test.  This causes excessive peeling
and thus warnings from array bound and uninit warning code.

The following mitigates missing range-info somewhat because range-info
from early often persists on IV computation statements.  We already
use "ranges" on them but only their natural range.  The following
makes us use range info properly.
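
A simplified model of the idea, in plain C rather than GCC's tree and
wide_int types.  The names struct iv_bounds, bounds_for_int_iv and max_steps
are invented for this sketch, which assumes a positive step and a base that
lies within the bounds.

```c
#include <limits.h>
#include <stdbool.h>

struct iv_bounds
{
  long long low;
  long long high;
};

/* Start from the natural range of the type (here: int) and narrow it
   when range information for the variable is available.  */
struct iv_bounds
bounds_for_int_iv (bool have_range, int range_min, int range_max)
{
  struct iv_bounds b = { INT_MIN, INT_MAX };

  if (have_range)
    {
      b.low = range_min;
      b.high = range_max;
    }
  return b;
}

/* Upper bound on how many times a non-wrapping IV starting at BASE can be
   advanced by STEP before it would leave the bounds.  */
long long
max_steps (long long base, long long step, struct iv_bounds b)
{
  return (b.high - base) / step;
}
```

For an IV starting at 1 with step 1, the type range alone allows about
INT_MAX steps, whereas a known range of [1, 3] allows only 2, which is the
sort of bound that keeps peeling in check.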

Bootstrapped on x86_64-unknown-linux-gnu, testing in progress.

Richard.

2017-12-11  Richard Biener  

PR tree-optimization/81889
* tree-ssa-loop-niter.c (infer_loop_bounds_from_signedness): Use
range info from the non-wrapping IV instead of just the range
of the type.

* gfortran.dg/pr81889.f90: New testcase.

Index: gcc/tree-ssa-loop-niter.c
===
--- gcc/tree-ssa-loop-niter.c   (revision 255539)
+++ gcc/tree-ssa-loop-niter.c   (working copy)
@@ -3510,6 +3510,12 @@ infer_loop_bounds_from_signedness (struc
 
   low = lower_bound_in_type (type, type);
   high = upper_bound_in_type (type, type);
+  wide_int minv, maxv;
+  if (get_range_info (def, &minv, &maxv) == VR_RANGE)
+{
+  low = wide_int_to_tree (type, minv);
+  high = wide_int_to_tree (type, maxv);
+}
 
   record_nonwrapping_iv (loop, base, step, stmt, low, high, false, true);
 }
Index: gcc/testsuite/gfortran.dg/pr81889.f90
===
--- gcc/testsuite/gfortran.dg/pr81889.f90   (nonexistent)
+++ gcc/testsuite/gfortran.dg/pr81889.f90   (working copy)
@@ -0,0 +1,29 @@
+! { dg-do compile }
+! { dg-options "-O3 -Wall" }
+
+module m
+
+   type t
+  integer, dimension(:), pointer :: list
+   end type
+
+contains
+
+   subroutine s(n, p, Y)
+  integer, intent(in) :: n
+  type(t) :: p
+  real, dimension(:) :: Y
+
+  real, dimension(1:16) :: xx
+
+  if (n > 3) then
+ xx(1:n) = 0.
+ print *, xx(1:n)
+  else
+ xx(1:n) = Y(p%list(1:n)) ! { dg-bogus "uninitialized" }
+ print *, sum(xx(1:n))
+  end if
+
+   end subroutine
+
+end module


[PR83370][AARCH64]Use tighter register constraints for sibcall patterns.

2017-12-11 Thread Renlin Li

Hi all,

In the aarch64 backend, the ip0/ip1 registers are used in the
prologue/epilogue as temporary registers.

When the compiler performs sibcall optimization, it may use the ip0/ip1
registers to hold the address for an indirect function call. However, those
two registers might be clobbered by the epilogue code, which makes the final
sibcall instruction invalid.

The following is an extreme example:
When built with -O2 -ffixed-x0 -ffixed-x1 -ffixed-x2 -ffixed-x3 -ffixed-x4 
-ffixed-x5 -ffixed-x6 -ffixed-x7
-ffixed-x8 -ffixed-x9 -ffixed-x10 -ffixed-x11 -ffixed-x12 -ffixed-x13 
-ffixed-x14 -ffixed-x15 -ffixed-x17 -ffixed-x18

void (*f)();
int xx;

void tailcall (int i)

{
   int arr[5000];
   xx = arr[i];
   f();
}


tailcall:
        mov     x16, 20016
        sub     sp, sp, x16
        adrp    x16, .LANCHOR0
        stp     x19, x30, [sp]
        add     x19, sp, 16
        ldr     s0, [x19, w0, sxtw 2]
        ldp     x19, x30, [sp]
        str     s0, [x16, #:lo12:.LANCHOR0]
        mov     x16, 20016
        add     sp, sp, x16
        br      x16   // oops


As we can see, x16 is used in the indirect sibcall instruction, but it is
also used as a temporary in the epilogue code, so the register allocation
is invalid.

With the change, the register allocator is only allowed to use r0-r15 and
r18 for the indirect sibcall instruction.
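
For reference, the set "r0-r15 and r18" can be written as a bitmask.  The
small C sketch below computes it, assuming bit N of the mask stands for
register N; the function name is made up for this illustration.

```c
#include <stdint.h>

/* Build a mask with bits 0..15 and bit 18 set, i.e. every register from
   r0 to r18 except r16 (ip0) and r17 (ip1).  */
uint32_t
tailcall_addr_mask (void)
{
  uint32_t mask = 0;

  for (int r = 0; r <= 15; r++)
    mask |= UINT32_C (1) << r;
  mask |= UINT32_C (1) << 18;

  return mask;
}
```

The result is 0x0004ffff; bits 16 and 17 stay clear, so neither register
that the epilogue may clobber is in the set.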

For the particular case above, the compiler will ICE as there is no register
that could be used for this sibcall instruction.
I think it is better to fail than to generate wrong code.

test.c:10:1: error: unable to generate reloads for:
 }
 ^
(call_insn/j 16 12 17 2 (parallel [
(call (mem:DI (reg/f:DI 84 [ f ]) [0 *f.0_2 S8 A8])
(const_int 0 [0]))
(return)
]) "test.c":9 42 {*sibcall_insn}
 (expr_list:REG_DEAD (reg/f:DI 84 [ f ])
(expr_list:REG_CALL_DECL (nil)
(nil)))
(expr_list (clobber (reg:DI 17 x17))
(expr_list (clobber (reg:DI 16 x16))
(nil

Tested on aarch64-none-elf without regressions. Okay to commit?
The same issue affects gcc-6 and gcc-7 as well, so backports are needed for
those branches.

Regards,
Renlin

gcc/ChangeLog:

2017-12-11  Renlin Li  

PR target/83370
* config/aarch64/aarch64.c (aarch64_class_max_nregs): Handle
TAILCALL_ADDR_REGS.
(aarch64_register_move_cost): Likewise.
* config/aarch64/aarch64.h (reg_class): Rename CALLER_SAVE_REGS to 
TAILCALL_ADDR_REGS.
* config/aarch64/constraints.md (Ucs): Update register constraint.
diff --git a/gcc/config/aarch64/aarch64.h b/gcc/config/aarch64/aarch64.h
index 93d29b84d47b7017661a2129d61e7d740bbf7c93..322b7f4628aa69cf331c12ff2c8df351890da9ef 100644
--- a/gcc/config/aarch64/aarch64.h
+++ b/gcc/config/aarch64/aarch64.h
@@ -446,7 +446,7 @@ extern unsigned aarch64_architecture_version;
 enum reg_class
 {
   NO_REGS,
-  CALLER_SAVE_REGS,
+  TAILCALL_ADDR_REGS,
   GENERAL_REGS,
   STACK_REG,
   POINTER_REGS,
@@ -462,7 +462,7 @@ enum reg_class
 #define REG_CLASS_NAMES\
 {		\
   "NO_REGS",	\
-  "CALLER_SAVE_REGS",\
+  "TAILCALL_ADDR_REGS",\
   "GENERAL_REGS",\
   "STACK_REG",	\
   "POINTER_REGS",\
@@ -475,7 +475,7 @@ enum reg_class
 #define REG_CLASS_CONTENTS		\
 {	\
   { 0x, 0x, 0x },	/* NO_REGS */		\
-  { 0x0007, 0x, 0x },	/* CALLER_SAVE_REGS */	\
+  { 0x0004, 0x, 0x },	/* TAILCALL_ADDR_REGS */\
   { 0x7fff, 0x, 0x0003 },	/* GENERAL_REGS */	\
   { 0x8000, 0x, 0x },	/* STACK_REG */		\
   { 0x, 0x, 0x0003 },	/* POINTER_REGS */	\
diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index 75a6c0d0421354d7c0759292947eb5d407f5b703..66d503ac6edf59a1ea2fa3675fbbe03d70769833 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -6060,7 +6060,7 @@ aarch64_class_max_nregs (reg_class_t regclass, machine_mode mode)
 {
   switch (regclass)
 {
-case CALLER_SAVE_REGS:
+case TAILCALL_ADDR_REGS:
 case POINTER_REGS:
 case GENERAL_REGS:
 case ALL_REGS:
@@ -8226,10 +8226,10 @@ aarch64_register_move_cost (machine_mode mode,
 = aarch64_tune_params.regmove_cost;
 
   /* Caller save and pointer regs are equivalent to GENERAL_REGS.  */
-  if (to == CALLER_SAVE_REGS || to == POINTER_REGS)
+  if (to == TAILCALL_ADDR_REGS || to == POINTER_REGS)
 to = GENERAL_REGS;
 
-  if (from == CALLER_SAVE_REGS || from == POINTER_REGS)
+  if (from == TAILCALL_ADDR_REGS || from == POINTER_REGS)
 from = GENERAL_REGS;
 
   /* Moving between GPR and stack cost is the same as GP2GP.  */
diff --git a/gcc/config/aarch64/constraints.md b/gcc/config/aarch64/constraints.md
index af4143ef756464afac29d17f124b436520f90451..c3791aa89562a5d5542098d2f7951afc57901150 100644
--- a/gcc/config/aarch64/constraints.md
+++ b/gcc/config/aarch64/constraints.md
@@ -21,8 +21,8 @@
 (define_register_constraint "k"

Re: [PATCH] Fix stack overflow with autofdo (PR83355)

2017-12-11 Thread Richard Biener
On Mon, Dec 11, 2017 at 8:01 AM, Andi Kleen  wrote:
> From: Andi Kleen 
>
> g++.dg/bprob* is failing currently with autofdo.
>
> Running in gdb shows that there is a very deep recursion in
> get_index_by_decl until it overflows the stack.
>
> This patch seems to fix it (but not sure why the abstract origin would
> point to itself)
>
> Passes bootstrap and testing on x86_64-linux
Ok.

Richard.

> gcc/:
> 2017-12-10  Andi Kleen  
>
> PR gcov-profile/83355
> * auto-profile.c (string_table::get_index_by_decl): Don't
> recurse when abstract origin points to itself.
> ---
>  gcc/auto-profile.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/gcc/auto-profile.c b/gcc/auto-profile.c
> index 5134a795331..403709bad6b 100644
> --- a/gcc/auto-profile.c
> +++ b/gcc/auto-profile.c
> @@ -477,7 +477,7 @@ string_table::get_index_by_decl (tree decl) const
>ret = get_index (lang_hooks.dwarf_name (decl, 0));
>if (ret != -1)
>  return ret;
> -  if (DECL_ABSTRACT_ORIGIN (decl))
> +  if (DECL_ABSTRACT_ORIGIN (decl) && DECL_ABSTRACT_ORIGIN (decl) != decl)
>  return get_index_by_decl (DECL_ABSTRACT_ORIGIN (decl));
>
>return -1;
> --
> 2.15.1
>
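
The guard in the patch above can be modelled in a few lines of C.  This is
only a sketch: struct decl and its fields are hypothetical stand-ins for
GCC's tree nodes and the string table lookup, not the real data structures.

```c
#include <stddef.h>

/* Hypothetical declaration node: INDEX is -1 when the name is not in the
   table, ABSTRACT_ORIGIN may be NULL or may point back at the node.  */
struct decl
{
  int index;
  struct decl *abstract_origin;
};

/* Look up the index for D, following the abstract origin when D itself
   is not found.  The extra comparison stops the recursion when a node is
   its own abstract origin.  */
int
get_index_by_decl (const struct decl *d)
{
  if (d->index != -1)
    return d->index;
  if (d->abstract_origin != NULL && d->abstract_origin != d)
    return get_index_by_decl (d->abstract_origin);

  return -1;
}
```

Without the second half of the condition, a node whose origin is itself
would recurse until the stack overflows.  The check only covers a direct
self-reference; a longer cycle through several nodes is not handled here.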


Re: [PATCH] Fix segfault in inliner with attribute flatten

2017-12-11 Thread Richard Biener
On Mon, Dec 11, 2017 at 8:01 AM, Andi Kleen  wrote:
> From: Andi Kleen 
>
> This fixes a segfault in gcc 7/8 when building turicreate.
>
> For some reason the node has no decl here, and there is a
> crash when checking for attribute flatten.

As said in the PR it looks like the order array is corrupted
(a freed entry is re-used with an inline clone).

Honza?

Richard.

> gcc/:
>
> 2017-12-10  Andi Kleen  
>
> PR ipa/83346
> * ipa-inline.c (ipa_inline): Check for NULL pointer.
>
> gcc/testsuite:
>
> 2017-12-10  Andi Kleen  
>
> * g++.dg/pr83346.C: Add.
> ---
>  gcc/ipa-inline.c   |  3 ++-
>  gcc/testsuite/g++.dg/pr83346.C | 32 
>  2 files changed, 34 insertions(+), 1 deletion(-)
>  create mode 100644 gcc/testsuite/g++.dg/pr83346.C
>
> diff --git a/gcc/ipa-inline.c b/gcc/ipa-inline.c
> index 7846e93d119..dcd8a3de1ac 100644
> --- a/gcc/ipa-inline.c
> +++ b/gcc/ipa-inline.c
> @@ -2391,7 +2391,8 @@ ipa_inline (void)
>  entry of cycles, possibly cloning that entry point and
>  try to flatten itself turning it into a self-recursive
>  function.  */
> -  if (lookup_attribute ("flatten",
> +  if (node->decl
> +&& lookup_attribute ("flatten",
> DECL_ATTRIBUTES (node->decl)) != NULL)
> {
>   if (dump_file)
> diff --git a/gcc/testsuite/g++.dg/pr83346.C b/gcc/testsuite/g++.dg/pr83346.C
> new file mode 100644
> index 000..2a916223dc9
> --- /dev/null
> +++ b/gcc/testsuite/g++.dg/pr83346.C
> @@ -0,0 +1,32 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O2" }  */
> +namespace {
> +template  struct b { a c; };
> +}
> +typedef int d;
> +namespace {
> +namespace {
> +template  class ac;
> +typedef ac ad;
> +template  class ac {
> +public:
> +  ~ac();
> +};
> +}
> +typedef ad f;
> +struct g {};
> +enum ag {};
> +class ae {
> +public:
> +  ~ae();
> +  template  ae(h);
> +  union aj {
> +b *ak;
> +struct {
> +  ag al;
> +};
> +  } am;
> +  __attribute__((always_inline)) void an(aj i, ag) { delete i.ak; }
> +} ao = g();
> +__attribute__((always_inline, flatten)) ae::~ae() { an(am, am.al); }
> +}
> --
> 2.15.1
>


RE: [PATCH] rl78 anddi3 improvement

2017-12-11 Thread Sebastian Perta
Hello Jeff,

Thank you for your comments.

>>So I think you're ultimately far better off determining why GCC does not
>>generate efficient code for 64bit logicals on the rl78 target.
I totally agree with you, this is why:
1. I have another patch: I define_expand movdi in which I instruct GCC to use 
16 bit movw instead of movw. With this patch applied on the latest revision I 
reduce the code size of this function (my_anddi3) from 323 bytes to 245 bytes. 
I'm just waiting for the regression run to finish, then I will post this patch.
2. I have been working very hard for quite some time to improve/replace the 
"devirtualization" pass (see pass_rl78_devirt in rl78.c). I am working on a 
solution which will generate code very similar to what I wrote in ___adddi3 and 
will also allow me to change the calling convention to be efficient 
(similar to what the RL78 commercial compilers use), but unfortunately I am 
still a long way from being finished (as it is quite difficult). I think DJ can 
explain much better why he needed to do things this way (add this pass and the 
*_virt and *_real variants for the instructions) in the first place.

However, if you look closely at the patch you will see that I put the 
following condition for the availability of the expand:
+  "optimize_size"
The idea behind this is the following:
Compared to the commercial RL78 compilers GCC is quite far behind (its output 
is even 2x-3x bigger). When comparing the output code I saw that the commercial 
compilers use code merging techniques quite extensively.
For GCC I found some work on this on a branch which didn't make it to master 
(https://www.gnu.org/software/gcc/projects/cfo.html). I ported this a 
while back to 4.9.2 but I didn't get a significant code size improvement (I 
think I will give this another try after I finish point 2 above).
So I decided then to continue doing things this way, which finally gave me some 
really good results (improved the code size by 30% on average, even more than 
50% in some cases).
So even if/when I finish with point 2 above I think I would still like to have 
things done this way as they improve code size significantly.

I hope this explanation is satisfactory to you, as I have other patches (not 
only for 64 bit operations) which make use of the same idea.

Best Regards,
Sebastian



[patch AArch64] Do not perform a vector splat for vector initialisation if it is not useful

2017-12-11 Thread James Greenhalgh

Hi,

In the testcase in this patch we create an SLP vector with only two
elements. Our current vector initialisation code will first duplicate
the first element to both lanes, then overwrite the top lane with a new
value.

This duplication can be clunky and wasteful.

Better would be to simply use the fact that we will always be overwriting
the remaining bits, and simply move the first element to the correct place
(implicitly zeroing all other bits).

This reduces the code generation for this case, and can allow more
efficient addressing modes, and other second order benefits for AArch64
code which has been vectorized to V2DI mode.

Note that the change is generic enough to catch the case for any vector
mode, but is expected to be most useful for 2x64-bit vectorization.
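
As a rough illustration of the input shape being discussed, here is a sketch
using the GNU vector extension. It is not the testcase from the patch, and the
names `v2di` and `make_pair_vector` are made up for this example.

```cpp
// A two-lane 64-bit vector built from two unrelated scalars.  Neither
// element repeats, so no lane value is worth splatting first.
// `v2di` and `make_pair_vector` are illustrative names only.
typedef long long v2di __attribute__ ((vector_size (16)));

v2di
make_pair_vector (long long x, long long y)
{
  v2di v = { x, y };
  return v;
}
```

Whether a particular compiler emits a duplicate followed by a lane insert for
this, or a plain move followed by a lane insert, is what the patch is about;
the sketch only shows the kind of source that reaches vector initialisation
with two distinct elements.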

Unfortunately, on its own, this would cause failures in
gcc.target/aarch64/load_v2vec_lanes_1.c and
gcc.target/aarch64/store_v2vec_lanes.c , which expect to see many more
vec_merge and vec_duplicate for their simplifications to apply. To fix this,
add a special case to the AArch64 code if we are loading from two memory
addresses, and use the load_pair_lanes patterns directly.

We also need a new pattern in simplify-rtx.c:simplify_ternary_operation , to
catch:

  (vec_merge:OUTER
 (vec_duplicate:OUTER x:INNER)
 (subreg:OUTER y:INNER 0)
 (const_int N))

And simplify it to:

  (vec_concat:OUTER x:INNER y:INNER) or (vec_concat y x)

This is similar to the existing patterns which are tested in this function,
without requiring the second operand to also be a vec_duplicate.

Bootstrapped and tested on aarch64-none-linux-gnu and tested on
aarch64-none-elf.

Note that this requires https://gcc.gnu.org/ml/gcc-patches/2017-12/msg00614.html
if we don't want to ICE creating broken vector zero extends.

Are the non-AArch64 parts OK?

Thanks,
James

---
2017-12-11  James Greenhalgh  

* config/aarch64/aarch64.c (aarch64_expand_vector_init): Modify code
generation for cases where splatting a value is not useful.
* simplify-rtx.c (simplify_ternary_operation): Simplify vec_merge
across a vec_duplicate and a paradoxical subreg forming a vector
mode to a vec_concat.

2017-12-11  James Greenhalgh  

* gcc.target/aarch64/vect-slp-dup.c: New.

diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index 83d8607..8abb8e4 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -12105,9 +12105,51 @@ aarch64_expand_vector_init (rtx target, rtx vals)
 	maxv = matches[i][1];
 	  }
 
-  /* Create a duplicate of the most common element.  */
-  rtx x = copy_to_mode_reg (inner_mode, XVECEXP (vals, 0, maxelement));
-  aarch64_emit_move (target, gen_vec_duplicate (mode, x));
+  /* Create a duplicate of the most common element, unless all elements
+	 are equally useless to us, in which case just immediately set the
+	 vector register using the first element.  */
+
+  if (maxv == 1)
+	{
+	  /* For vectors of two 64-bit elements, we can do even better.  */
+	  if (n_elts == 2
+	  && (inner_mode == E_DImode
+		  || inner_mode == E_DFmode))
+
+	{
+	  rtx x0 = XVECEXP (vals, 0, 0);
+	  rtx x1 = XVECEXP (vals, 0, 1);
+	  /* Combine can pick up this case, but handling it directly
+		 here leaves clearer RTL.
+
+		 This is load_pair_lanes, and also gives us a clean-up
+		 for store_pair_lanes.  */
+	  if (memory_operand (x0, inner_mode)
+		  && memory_operand (x1, inner_mode)
+		  && !STRICT_ALIGNMENT
+		  && rtx_equal_p (XEXP (x1, 0),
+  plus_constant (Pmode,
+		 XEXP (x0, 0),
+		 GET_MODE_SIZE (inner_mode))))
+		{
+		  rtx t;
+		  if (inner_mode == DFmode)
+		t = gen_load_pair_lanesdf (target, x0, x1);
+		  else
+		t = gen_load_pair_lanesdi (target, x0, x1);
+		  emit_insn (t);
+		  return;
+		}
+	}
+	  rtx x = copy_to_mode_reg (inner_mode, XVECEXP (vals, 0, 0));
+	  aarch64_emit_move (target, lowpart_subreg (mode, x, inner_mode));
+	  maxelement = 0;
+	}
+  else
+	{
+	  rtx x = copy_to_mode_reg (inner_mode, XVECEXP (vals, 0, maxelement));
+	  aarch64_emit_move (target, gen_vec_duplicate (mode, x));
+	}
 
   /* Insert the rest.  */
   for (int i = 0; i < n_elts; i++)
diff --git a/gcc/simplify-rtx.c b/gcc/simplify-rtx.c
index 806c309..ed16f70 100644
--- a/gcc/simplify-rtx.c
+++ b/gcc/simplify-rtx.c
@@ -5785,6 +5785,36 @@ simplify_ternary_operation (enum rtx_code code, machine_mode mode,
 		return simplify_gen_binary (VEC_CONCAT, mode, newop0, newop1);
 	}
 
+	  /* Replace:
+
+	  (vec_merge:outer (vec_duplicate:outer x:inner)
+			   (subreg:outer y:inner 0)
+			   (const_int N))
+
+	 with (vec_concat:outer x:inner y:inner) if N == 1,
+	 or (vec_concat:outer y:inner x:inner) if N == 2.
+
+	 Implicitly, this means we have a paradoxical subreg, but such
+	 a check is cheap, so make it anyway.
+
+	 Only applies for vectors of two elements.  */
+	  if (GET_CODE (op0) =

[PATCH] ifcvt: Call fixup_partitions (PR83361)

2017-12-11 Thread Segher Boessenkool
After converting a conditional branch to an unconditional trap to a
conditional trap, if the original trap is still reachable from another
path, it may be that it is in a hot basic block and only reachable from
cold blocks.  Fix that up.
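
For readers unfamiliar with this conversion, the source shape involved looks
roughly like the sketch below: a conditional branch whose only job is to reach
an unconditional trap. On targets with a conditional trap instruction, if-
conversion may fold the branch and the trap into one instruction.
`checked_div` is a hypothetical example, not code from the PR.

```cpp
// `checked_div` is a hypothetical example, not taken from PR83361.
int
checked_div (int num, int den)
{
  if (den == 0)
    __builtin_trap ();   // unconditional trap behind a conditional branch
  return num / den;
}
```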

This fixes PR83361.  Bootstrapping on powerpc64-linux {-m32,-m64}; okay
for trunk if it succeeds?


Segher


2017-12-11  Segher Boessenkool  

PR rtl-optimization/83361
* ifcvt.c (if_convert): Call fixup_partitions.

gcc/testsuite/
PR rtl-optimization/83361
* gcc.dg/pr83361.c: New testcase.

---
 gcc/ifcvt.c|  4 
 gcc/testsuite/gcc.dg/pr83361.c | 40 
 2 files changed, 44 insertions(+)
 create mode 100644 gcc/testsuite/gcc.dg/pr83361.c

diff --git a/gcc/ifcvt.c b/gcc/ifcvt.c
index 91360d8..eb3da68 100644
--- a/gcc/ifcvt.c
+++ b/gcc/ifcvt.c
@@ -5446,6 +5446,10 @@ if_convert (bool after_combine)
   if (optimize == 1)
 df_remove_problem (df_live);
 
+  /* Some non-cold blocks may now be only reachable from cold blocks.
+ Fix that up.  */
+  fixup_partitions ();
+
   checking_verify_flow_info ();
 }
 
diff --git a/gcc/testsuite/gcc.dg/pr83361.c b/gcc/testsuite/gcc.dg/pr83361.c
new file mode 100644
index 000..2a6f807
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/pr83361.c
@@ -0,0 +1,40 @@
+/* PR rtl-optimization/83361 */
+/* { dg-do compile } */
+/* { dg-options "-O2 -freorder-blocks-and-partition" } */
+
+#include <limits.h>
+
+int yz;
+
+void
+tq (int z3)
+{
+  unsigned long long int n8 = (unsigned long long int)INT_MAX + 1;
+  int *ey = &yz;
+
+  if (yz == 0)
+{
+  int bc;
+
+  yz = 1;
+  while (yz != 0)
+{
+  *ey *= bc;
+  n8 = !!(1 / ((unsigned long long int)yz == n8));
+  ey = &z3;
+}
+
+  while (z3 != 0)
+{
+}
+}
+
+  z3 = (n8 != 0) && (*ey != 0);
+  z3 = yz / z3;
+  if (z3 < 0)
+{
+  if (yz != 0)
+yz = 0;
+  yz /= 0;
+}
+}
-- 
1.8.3.1



[PATCH, rs6000] generate loop code for memcmp inline expansion

2017-12-11 Thread Aaron Sawdey
This patch builds on two previously posted patches:

https://gcc.gnu.org/ml/gcc-patches/2017-11/msg01216.html
https://gcc.gnu.org/ml/gcc-patches/2017-11/msg02599.html

The previous patches allow the use of bdnzt.

This patch allows the cmpstrsi pattern expansion to handle longer blocks without
increasing code size by using a loop:

.L13:
ldbrx 7,3,6
ldbrx 9,10,6
ldbrx 0,3,5
ldbrx 4,10,5
addi 6,6,16
addi 5,5,16
subfc. 9,9,7
bne 0,.L10
subfc. 9,4,0
bdnzt 2,.L13

Performance testing on P8 showed that unroll x2 with 2 IVs was the way to go,
and it is the same performance as the previous non-loop inline code for a
32 byte compare. Also on P8 the performance crossover with calling glibc memcmp
happens around 192 bytes length. The loop expansion code can also be generated
when the length is not known at compile time.
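
In portable terms, the loop above behaves roughly like the following sketch:
each iteration compares 16 bytes as two 8-byte big-endian values, so that an
unsigned integer comparison orders them the way memcmp orders bytes. This is
only a model of the idea; the names `load_be64` and `compare_loop16` are
invented here, and the byte-at-a-time tail is an assumption made for
simplicity, not a description of what the patch generates.

```cpp
#include <cstddef>
#include <cstdint>

// Load 8 bytes as a big-endian integer (the role ldbrx plays on a
// little-endian target), so integer order matches byte order.
static std::uint64_t
load_be64 (const unsigned char *p)
{
  std::uint64_t v = 0;
  for (int i = 0; i < 8; i++)
    v = (v << 8) | p[i];
  return v;
}

// Returns <0, 0 or >0 like memcmp.  Illustrative model only.
int
compare_loop16 (const void *a, const void *b, std::size_t n)
{
  const unsigned char *pa = static_cast<const unsigned char *> (a);
  const unsigned char *pb = static_cast<const unsigned char *> (b);
  std::size_t off = 0;

  // Main loop: 16 bytes per iteration as two 8-byte compares.
  for (; n - off >= 16; off += 16)
    {
      std::uint64_t a0 = load_be64 (pa + off);
      std::uint64_t b0 = load_be64 (pb + off);
      if (a0 != b0)
        return a0 < b0 ? -1 : 1;
      std::uint64_t a1 = load_be64 (pa + off + 8);
      std::uint64_t b1 = load_be64 (pb + off + 8);
      if (a1 != b1)
        return a1 < b1 ? -1 : 1;
    }

  // Tail: simple byte loop (an assumption for this sketch).
  for (; off < n; off++)
    if (pa[off] != pb[off])
      return pa[off] < pb[off] ? -1 : 1;
  return 0;
}
```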

The gcc.dg/memcmp-1.c and gcc.dg/strncmp-2.c test cases are updated by the
patch to check the additional memcmp expansion for known and unknown lengths.

The undocumented option -mblock-compare-inline-limit was changed slightly to
mean how many bytes will be compared by the non-loop inline code, the default
is 31 bytes. As previously, -mblock-compare-inline-limit=0 will disable all
inline expansion of memcmp(). The new -mblock-compare-inline-loop-limit option
sets the upper limit, for lengths above that no expansion is done if the length
is known at compile time. If the length is unknown, the generated code calls
glibc memcmp() after comparing that many bytes.
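
One way to read the interaction of the two limits is sketched below. This is
my paraphrase of the description above, not code from the patch; the
`strategy` enum and `choose_strategy` are hypothetical names, and the limit
values are supplied by the caller rather than taken from any default.

```cpp
#include <cstddef>

// Hypothetical summary of the choices described above.
enum class strategy
{
  library_call,           // no inline expansion, call memcmp
  inline_straight,        // non-loop inline compare
  inline_loop,            // loop expansion, length known
  inline_loop_then_call   // loop up to the limit, then call memcmp
};

strategy
choose_strategy (bool length_known, std::size_t length,
                 std::size_t inline_limit, std::size_t loop_limit)
{
  // A limit of 0 disables all inline expansion.
  if (inline_limit == 0)
    return strategy::library_call;
  // Unknown length: loop over at most loop_limit bytes, then fall back.
  if (!length_known)
    return strategy::inline_loop_then_call;
  if (length <= inline_limit)
    return strategy::inline_straight;
  if (length <= loop_limit)
    return strategy::inline_loop;
  // Known length above the loop limit: no expansion.
  return strategy::library_call;
}
```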

Bootstrap and regtest pass on P8 LE and P9 LE, testing in progress for
BE (default, P7, and P8). If tests pass, OK for trunk when the first of the
previously posted patches is approved?

Thanks,
Aaron


2017-12-11  Aaron Sawdey  

* config/rs6000/rs6000-string.c (do_load_for_compare_from_addr): New
function.
(do_ifelse): New function.
(do_isel): New function.
(do_sub3): New function.
(do_add3): New function.
(do_load_mask_compare): New function.
(do_overlap_load_compare): New function.
(expand_compare_loop): New function.
(expand_block_compare): Call expand_compare_loop() when appropriate.
* config/rs6000/rs6000.opt (-mblock-compare-inline-limit): Change
option description.
(-mblock-compare-inline-loop-limit): New option.

-- 
Aaron Sawdey, Ph.D.  acsaw...@linux.vnet.ibm.com
050-2/C113  (507) 253-7520 home: 507/263-0782
IBM Linux Technology Center - PPC Toolchain

Index: gcc/config/rs6000/rs6000-string.c
===
--- gcc/config/rs6000/rs6000-string.c	(revision 255516)
+++ gcc/config/rs6000/rs6000-string.c	(working copy)
@@ -301,6 +301,923 @@
   return MIN (base_align, offset & -offset);
 }
 
+/* Prepare address and then do a load.
+
+   MODE is the mode to use for the load.
+   DEST is the destination register for the data.
+   ADDR is the address to be loaded.
+   ORIG_ADDR is the original address expression.  */
+static void
+do_load_for_compare_from_addr (machine_mode mode, rtx dest, rtx addr,
+			   rtx orig_addr)
+{
+  rtx mem = gen_rtx_MEM (mode, addr);
+  MEM_COPY_ATTRIBUTES (mem, orig_addr);
+  set_mem_size (mem, GET_MODE_SIZE (mode));
+  do_load_for_compare (dest, mem, mode);
+  return;
+}
+
+/* Do a branch for an if/else decision.
+
+   CMPMODE is the mode to use for the comparison.
+   COMPARISON is the rtx code for the compare needed.
+   A is the first thing to be compared.
+   B is the second thing to be compared.
+   CR is the condition code reg input, or NULL_RTX.
+   TRUE_LABEL is the label to branch to if the condition is true.
+
+   The return value is the CR used for the comparison.
+   If CR is null_rtx, then a new register of CMPMODE is generated.
+   If A and B are both null_rtx, then CR must not be null, and the
+   compare is not generated so you can use this with a dot form insn.  */
+
+static rtx
+do_ifelse (machine_mode cmpmode, rtx_code comparison,
+	   rtx a, rtx b, rtx cr, rtx true_label)
+{
+  gcc_assert ((a == NULL_RTX && b == NULL_RTX && cr != NULL_RTX)
+	  || (a != NULL_RTX && b != NULL_RTX));
+
+  if (cr != NULL_RTX)
+gcc_assert (GET_MODE (cr) == cmpmode);
+  else
+cr = gen_reg_rtx (cmpmode);
+
+  rtx label_ref = gen_rtx_LABEL_REF (VOIDmode, true_label);
+
+  if (a != NULL_RTX)
+emit_move_insn (cr, gen_rtx_COMPARE (cmpmode, a, b));
+
+  rtx cmp_rtx = gen_rtx_fmt_ee (comparison, VOIDmode, cr, const0_rtx);
+
+  rtx ifelse = gen_rtx_IF_THEN_ELSE (VOIDmode, cmp_rtx, label_ref, pc_rtx);
+  rtx j = emit_jump_insn (gen_rtx_SET (pc_rtx, ifelse));
+  JUMP_LABEL (j) = true_label;
+  LABEL_NUSES (true_label) += 1;
+
+  return cr;
+}
+
+/* Emit an isel of the proper mode for DEST.
+
+   DEST is the isel destination register.
+   SRC1 is the isel source if CR is true.
+   SRC2 is the isel source if CR is false.
+   CR is the condition for the isel.  */
+static v

[PING ^ 2] Re: [PATCH v2: 00/14] Preserving locations for variable-uses and constants (PR 43486)

2017-12-11 Thread David Malcolm
Ping for this patch kit:
  https://gcc.gnu.org/ml/gcc-patches/2017-11/msg00880.html

(and thanks again for looking at patch 2 earlier)


On Thu, 2017-11-30 at 14:17 -0500, David Malcolm wrote:
> Ping for the rest of this kit:
>   https://gcc.gnu.org/ml/gcc-patches/2017-11/msg00880.html
> 
> (thanks for the review of patch 2 of the kit)
> 
> On Fri, 2017-11-10 at 16:45 -0500, David Malcolm wrote:
> > On Thu, 2017-11-02 at 10:46 -0400, Jason Merrill wrote:
> > > On Tue, Oct 31, 2017 at 5:09 PM, David Malcolm wrote:
> > > > On Tue, 2017-10-24 at 09:53 -0400, Jason Merrill wrote:
> > > > > On Fri, Oct 20, 2017 at 5:53 PM, David Malcolm wrote:
> > > > > > Design questions:
> > > > > > 
> > > > > > * The patch introduces a new kind of tree node, currently
> > > > > > called
> > > > > >   DECL_WRAPPER_EXPR (although it's used for wrapping
> > > > > > constants
> > > > > > as
> > > > > > well
> > > > > >   as decls).  Should wrappers be a new kind of tree node,
> > > > > > or
> > > > > > should
> > > > > > they
> > > > > >   reuse an existing TREE_CODE? (e.g. NOP_EXPR,
> > > > > > CONVERT_EXPR,
> > > > > > etc).
> > > > > > * NOP_EXPR: seems to be for use as an rvalue
> > > > > > * CONVERT_EXPR: for type conversions
> > > > > > * NON_LVALUE_EXPR: "Value is same as argument, but
> > > > > > guaranteed
> > > > > > not an
> > > > > >   lvalue"
> > > > > >   * but we *do* want to support lvalues here
> > > > > 
> > > > > I think using NON_LVALUE_EXPR for constants would be
> > > > > appropriate.
> > > > > 
> > > > > > * VIEW_CONVERT_EXPR: viewing one thing as of a
> > > > > > different
> > > > > > type
> > > > > >   * can it support lvalues?
> > > > > 
> > > > > Yes, the purpose of VIEW_CONVERT_EXPR is to support lvalues,
> > > > > it
> > > > > seems
> > > > > like the right choice.
> > > > > 
> > > > > Jason
> > > > 
> > > > Thanks.  I've been working on a new version of the patch using
> > > > those
> > > > tree codes, but have run into an issue.
> > > > 
> > > > In g++.dg/conversion/reinterpret1.C:
> > > > 
> > > >   // PR c++/15076
> > > > 
> > > >   struct Y { Y(int &); };
> > > > 
> > > >   int v;
> > > >   Y y1(reinterpret_cast<int>(v));  // { dg-error "" }
> > > > 
> > > > With trunk, this successfully generates an error:
> > > > 
> > > >   reinterpret1.C:6:6: error: cannot bind non-const lvalue
> > > > reference
> > > > of type ‘int&’ to an rvalue of type ‘int’
> > > >Y y1(reinterpret_cast(v));  // { dg-error "" }
> > > > ^~~~
> > > >   reinterpret1.C:3:12: note:   initializing argument 1 of
> > > > ‘Y::Y(int&)’
> > > >struct Y { Y(int &); };
> > > >   ^
> > > > 
> > > > where internally there's a NON_LVALUE_EXPR around a VAR_DECL,
> > > > where
> > > > both have the same type:
> > > > 
> > > > (gdb) call debug_tree (expr)
> > > >   > > > type  > > > size 
> > > > unit-size 
> > > > align:32 warn_if_not_align:0 symtab:0 alias-set -1
> > > > canonical-type 0x7132e5e8 precision:32 min  > > > 0x713310d8 -2147483648> max  > > > 2147483647>
> > > > pointer_to_this 
> > > > reference_to_this >
> > > > 
> > > > arg:0  > > > 0x7132e5e8 int>
> > > > used public static tree_1 read SI /home/david/coding-
> > > > 3/gcc-
> > > > git-expr-vs-
> > > > decl/src/gcc/testsuite/g++.dg/conversion/reinterpret1.C:5:5
> > > > size
> > > >  unit-size  > > > 0x71331138 4>
> > > > align:32 warn_if_not_align:0 context
> > > >  > > > 0x7131e168 /home/david/coding-3/gcc-git-expr-vs-
> > > > decl/src/gcc/testsuite/g++.dg/conversion/reinterpret1.C>
> > > > chain  > > > 0x7144c150 Y>
> > > > public decl_2 VOID /home/david/coding-3/gcc-git-
> > > > expr-
> > > > vs-decl/src/gcc/testsuite/g++.dg/conversion/reinterpret1.C:3:8
> > > > align:8 warn_if_not_align:0 context
> > > >  > > > git-
> > > > expr-vs-
> > > > decl/src/gcc/testsuite/g++.dg/conversion/reinterpret1.C>
> > > > chain >>
> > > > /home/david/coding-3/gcc-git-expr-vs-
> > > > decl/src/gcc/testsuite/g++.dg/conversion/reinterpret1.C:6:6
> > > > start:
> > > > /home/david/coding-3/gcc-git-expr-vs-
> > > > decl/src/gcc/testsuite/g++.dg/conversion/reinterpret1.C:6:6
> > > > finish:
> > > > /home/david/coding-3/gcc-git-expr-vs-
> > > > decl/src/gcc/testsuite/g++.dg/conversion/reinterpret1.C:6:29>
> > > > 
> > > > The problem is that this reinterpret cast "looks" just like one
> > > > of
> > > > my
> > > > location wrappers.
> > > 
> > > Your code shouldn't strip a NON_LVALUE_EXPR around a VAR_DECL.
> > > > I see a similar issue with constants, where with:
> > > > 
> > > >   struct Y { Y(int &); };
> > > >   Y y1(reinterpret_cast<int>(42));
> > > > 
> > > > trunk generates an error like the above, but my code handles
> > > > the
> > > >   NON_LVALUE_EXPR(INTEGER_CST(42))
> > > > as if it were a location wrapper around the INTEGER_C

PING: [PATCH] C++: avoid most reserved words as misspelling suggestions (PR c++/81610 and PR c++/80567)

2017-12-11 Thread David Malcolm
Ping: https://gcc.gnu.org/ml/gcc-patches/2017-11/msg02048.html

On Wed, 2017-11-22 at 10:36 -0500, David Malcolm wrote:
> lookup_name_fuzzy can offer some reserved words as suggestions for
> misspelled words, helping with "singed"/"signed" typos.
> 
> PR c++/81610 and PR c++/80567 report problems where the C++ frontend
> suggested "if", "for" and "else" as corrections for misspelled
> variable
> names.
> 
> The root cause is that in r247233
>   ("Fix spelling suggestions for reserved words (PR c++/80177)")
> I loosened the conditions on these reserved words, adding this
> condition:
>if (kind == FUZZY_LOOKUP_TYPENAME)
> to the logic for rejecting words that don't start decl-specifiers, to
> allow for "static_assert" to be offered.
> 
> This is too loose a condition: we don't want to suggest *any*
> reserved word
> when we're in a context where we don't know we expect a typename.
> 
> For the kinds of error-recovery situations where we're suggesting
> spelling corrections we don't have much contextual information, so it
> seems prudent to be stricter about which reserved words we offer
> as spelling suggestions; I don't think it makes sense for us to
> suggest e.g. "for".
> 
> This patch implements that by effectively reinstating the old logic,
> but special-casing RID_STATIC_ASSERT, moving the logic to a new
> subroutine (in case we want to allow for other special-cases).
> 
> I attempted to add suggestions for the various RID_*CAST, to cope
> with e.g. "reinterptet_cast" (I can never type that correctly on the
> first try), but the following '<' token confuses the error-recovery
> enough that the suggestion code isn't triggered.
> 
> Hence this more minimal fix.
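
A self-contained sketch of the filtering idea follows. It is a simplified
model: the word table, the `resword` struct and the plain Levenshtein distance
are assumptions made for this example and do not mirror GCC's
c_common_reswords or its spellcheck implementation.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical miniature reserved-word table.
struct resword
{
  const char *name;
  bool starts_decl_specifier;
  bool function_like;
};

static const resword reswords[] = {
  { "signed", true, false },
  { "static", true, false },
  { "static_assert", false, true },
  { "if", false, false },
  { "for", false, false },
  { "else", false, false },
  { "do", false, false },
};

// Only decl-specifier words and function-like keywords are offered.
static bool
worth_suggesting (const resword &w)
{
  return w.function_like || w.starts_decl_specifier;
}

// Plain Levenshtein distance (insert, delete, substitute).
static std::size_t
edit_distance (const std::string &a, const std::string &b)
{
  std::vector<std::size_t> prev (b.size () + 1), cur (b.size () + 1);
  for (std::size_t j = 0; j <= b.size (); j++)
    prev[j] = j;
  for (std::size_t i = 1; i <= a.size (); i++)
    {
      cur[0] = i;
      for (std::size_t j = 1; j <= b.size (); j++)
        {
          std::size_t subst = prev[j - 1] + (a[i - 1] != b[j - 1] ? 1 : 0);
          cur[j] = std::min ({ subst, prev[j] + 1, cur[j - 1] + 1 });
        }
      prev.swap (cur);
    }
  return prev[b.size ()];
}

// Return the closest acceptable reserved word within MAX_DISTANCE, or an
// empty string.  FILTER toggles the restriction, to show its effect.
std::string
suggest (const std::string &typo, std::size_t max_distance,
         bool filter = true)
{
  std::string best;
  std::size_t best_distance = max_distance + 1;
  for (const resword &w : reswords)
    {
      if (filter && !worth_suggesting (w))
        continue;
      std::size_t d = edit_distance (typo, w.name);
      if (d < best_distance)
        {
          best_distance = d;
          best = w.name;
        }
    }
  return best;
}
```

With the filter on, "singed" still finds "signed", while "forget" no longer
yields "for"; turning the filter off brings the unwanted suggestion back.
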
> 
> Successfully bootstrapped & regrtested on x86_64-pc-linux-gnu.
> 
> OK for trunk?
> 
> gcc/cp/ChangeLog:
>   PR c++/81610
>   PR c++/80567
>   * name-lookup.c (suggest_rid_p): New function.
>   (lookup_name_fuzzy): Replace enum-rid-filtering logic with call
> to
>   suggest_rid_p.
> 
> gcc/testsuite/ChangeLog:
>   PR c++/81610
>   PR c++/80567
>   * g++.dg/spellcheck-reswords.C: New test case.
>   * g++.dg/spellcheck-stdlib.C: Remove xfail from dg-bogus
>   suggestion of "if".
> ---
>  gcc/cp/name-lookup.c   | 31
> +++---
>  gcc/testsuite/g++.dg/spellcheck-reswords.C | 11 +++
>  gcc/testsuite/g++.dg/spellcheck-stdlib.C   |  2 +-
>  3 files changed, 40 insertions(+), 4 deletions(-)
>  create mode 100644 gcc/testsuite/g++.dg/spellcheck-reswords.C
> 
> diff --git a/gcc/cp/name-lookup.c b/gcc/cp/name-lookup.c
> index 7c363b0..a96be46 100644
> --- a/gcc/cp/name-lookup.c
> +++ b/gcc/cp/name-lookup.c
> @@ -5671,6 +5671,32 @@ class macro_use_before_def : public
> deferred_diagnostic
>cpp_hashnode *m_macro;
>  };
>  
> +/* Determine if it can ever make sense to offer RID as a suggestion
> for
> +   a misspelling.
> +
> +   Subroutine of lookup_name_fuzzy.  */
> +
> +static bool
> +suggest_rid_p  (enum rid rid)
> +{
> +  switch (rid)
> +{
> +/* Support suggesting function-like keywords.  */
> +case RID_STATIC_ASSERT:
> +  return true;
> +
> +default:
> +  /* Support suggesting the various decl-specifier words, to
> handle
> +  e.g. "singed" vs "signed" typos.  */
> +  if (cp_keyword_starts_decl_specifier_p (rid))
> + return true;
> +
> +  /* Otherwise, don't offer it.  This avoids suggesting e.g.
> "if"
> +  and "do" for short misspellings, which are likely to lead
> to
> +  nonsensical results.  */
> +  return false;
> +}
> +}
>  
>  /* Search for near-matches for NAME within the current bindings, and
> within
> macro names, returning the best match as a const char *, or NULL
> if
> @@ -5735,9 +5761,8 @@ lookup_name_fuzzy (tree name, enum
> lookup_name_fuzzy_kind kind, location_t loc)
>  {
>const c_common_resword *resword = &c_common_reswords[i];
>  
> -  if (kind == FUZZY_LOOKUP_TYPENAME)
> - if (!cp_keyword_starts_decl_specifier_p (resword->rid))
> -   continue;
> +  if (!suggest_rid_p (resword->rid))
> + continue;
>  
>tree resword_identifier = ridpointers [resword->rid];
>if (!resword_identifier)
> diff --git a/gcc/testsuite/g++.dg/spellcheck-reswords.C
> b/gcc/testsuite/g++.dg/spellcheck-reswords.C
> new file mode 100644
> index 000..db6104b
> --- /dev/null
> +++ b/gcc/testsuite/g++.dg/spellcheck-reswords.C
> @@ -0,0 +1,11 @@
> +void pr81610 (void *p)
> +{  
> +  forget (p); // { dg-error "not declared" }
> +  // { dg-bogus "'for'" "" { target *-*-*} .-1 }
> +}
> +
> +void pr80567 (void *p)
> +{
> +  memset (p, 0, 4); // { dg-error "not declared" }
> +  // { dg-bogus "'else'" "" { target *-*-*} .-1 }
> +}
> diff --git a/gcc/testsuite/g++.dg/spellcheck-stdlib.C
> b/gcc/testsuite/g++.dg/spellcheck-stdlib.C
> index 6e6ab1d..c7a6626 100644
> --- a/gcc/testsuite/g++.dg/spellcheck-stdlib.C
> +++ b/gcc/testsuite/g++.dg/spellcheck-stdlib.C
> @@ -1

Re: [PATCH #2], PR target/81959, Fix ++int to _Float128 conversion on power9

2017-12-11 Thread Segher Boessenkool
On Mon, Dec 04, 2017 at 04:31:55PM -0500, Michael Meissner wrote:
> On Fri, Dec 01, 2017 at 05:33:39PM -0600, Segher Boessenkool wrote:
> > Okay for trunk.  Further improvements welcome ;-)  Thanks!
> 
> Here is the patch for GCC 7 (the bug shows up in GCC 7).  It is slightly
> different due to the surrounding lines in rs6000.c being different.  Is it ok
> to apply?  There were no regressions in the build.

It looks fine...  Okay for 7.  Thanks!


Segher

> [gcc]
> 2017-12-04  Michael Meissner  
> 
>   Back port from trunk
>   2017-12-01  Michael Meissner  
> 
>   PR target/81959
>   * config/rs6000/rs6000.c (rs6000_address_for_fpconvert): Check for
>   whether we can allocate pseudos before trying to fix an address.
>   * config/rs6000/rs6000.md (float_si2_hw): Make sure the
>   memory address is indexed or indirect.
>   (floatuns_si2_hw2): Likewise.
> 
> [gcc/testsuite]
> 2017-12-04  Michael Meissner  
> 
>   Back port from trunk
>   2017-12-01  Michael Meissner  
> 
>   PR target/81959
>   * gcc.target/powerpc/pr81959.c: New test.


PING: Re: [PATCH] Expensive selftests: torture testing for fix-it boundary conditions (PR c/82050)

2017-12-11 Thread David Malcolm
Ping: https://gcc.gnu.org/ml/gcc-patches/2017-11/msg02459.html

On Tue, 2017-11-28 at 14:31 -0500, David Malcolm wrote:
> This patch adds selftest coverage for the fix for PR c/82050
> (r255214).
> 
> The selftest iterates over various "interesting" column and line-
> width
> values to try to shake out bugs in the fix-it printing routines, a
> kind
> of "torture" selftest.
> 
> Unfortunately this selftest is noticeably slower than the other
> selftests;
> adding it to diagnostic-show-locus.c led to:
>   -fself-test: 40218 pass(es) in 0.172000 seconds
> slowing down to:
>   -fself-test: 97315 pass(es) in 6.109000 seconds
> for an unoptimized build (e.g. when hacking with --disable-
> bootstrap).
> 
> Given that this affects the compile-edit-test cycle of the "gcc"
> subdirectory, this felt like an unacceptable amount of overhead to
> add.
> 
> I attempted to optimize the test by reducing the amount of coverage,
> but
> the test seems useful, and there seems to be a valid role for
> "torture"
> selftests.
> 
> Hence this patch adds a:
>   gcc.dg/plugin/expensive_selftests_plugin.c
> with the responsibility for running "expensive" selftests, and adds
> the
> expensive test there.  The patch moves a small amount of code from
> selftest::run_tests into a helper class so that the plugin can print
> a useful summary line (to reassure us that the tests are actually
> being
> run).
> 
> With that, the compile-edit-test cycle of the "gcc" subdir is
> unaffected;
> the plugin takes:
>   expensive_selftests_plugin: 26641 pass(es) in 3.127000 seconds
> which seems reasonable within the much longer time taken by "make
> check"
> (I optimized some of the overhead away, hence the reduction from 6
> seconds
> above down to 3 seconds).
> 
> Successfully bootstrapped & regrtested on x86_64-pc-linux-gnu.
> 
> OK for trunk?
> 
> gcc/ChangeLog:
>   PR c/82050
>   * selftest-run-tests.c (selftest::run_tests): Move start/finish
> code
>   to...
>   * selftest.c (selftest::test_runner::test_runner): New ctor.
>   (selftest::test_runner::~test_runner): New dtor.
>   * selftest.h (class selftest::test_runner): New class.
> 
> gcc/testsuite/ChangeLog:
>   PR c/82050
>   * gcc.dg/plugin/expensive-selftests-1.c: New file.
>   * gcc.dg/plugin/expensive_selftests_plugin.c: New file.
>   * gcc.dg/plugin/plugin.exp (plugin_test_list): Add the above.
> ---
>  gcc/selftest-run-tests.c   |  11 +-
>  gcc/selftest.c |  22 +++
>  gcc/selftest.h |  14 ++
>  .../gcc.dg/plugin/expensive-selftests-1.c  |   3 +
>  .../gcc.dg/plugin/expensive_selftests_plugin.c | 175
> +
>  gcc/testsuite/gcc.dg/plugin/plugin.exp |   2 +
>  6 files changed, 218 insertions(+), 9 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.dg/plugin/expensive-selftests-
> 1.c
>  create mode 100644
> gcc/testsuite/gcc.dg/plugin/expensive_selftests_plugin.c
> 
> diff --git a/gcc/selftest-run-tests.c b/gcc/selftest-run-tests.c
> index 6030d3b..f539c66 100644
> --- a/gcc/selftest-run-tests.c
> +++ b/gcc/selftest-run-tests.c
> @@ -46,7 +46,7 @@ selftest::run_tests ()
>   option-handling.  */
>path_to_selftest_files = flag_self_test;
>  
> -  long start_time = get_run_time ();
> +  test_runner r ("-fself-test");
>  
>/* Run all the tests, in hand-coded order of (approximate)
> dependencies:
>   run the tests for lowest-level code first.  */
> @@ -106,14 +106,7 @@ selftest::run_tests ()
>   failed to be finalized can be detected by valgrind.  */
>forcibly_ggc_collect ();
>  
> -  /* Finished running tests.  */
> -  long finish_time = get_run_time ();
> -  long elapsed_time = finish_time - start_time;
> -
> -  fprintf (stderr,
> -"-fself-test: %i pass(es) in %ld.%06ld seconds\n",
> -num_passes,
> -elapsed_time / 1000000, elapsed_time % 1000000);
> +  /* Finished running tests; the test_runner dtor will print a
> summary.  */
>  }
>  
>  #endif /* #if CHECKING_P */
> diff --git a/gcc/selftest.c b/gcc/selftest.c
> index b41b9f5..ca84bfa 100644
> --- a/gcc/selftest.c
> +++ b/gcc/selftest.c
> @@ -213,6 +213,28 @@ locate_file (const char *name)
>return concat (path_to_selftest_files, "/", name, NULL);
>  }
>  
> +/* selftest::test_runner's ctor.  */
> +
> +test_runner::test_runner (const char *name)
> +: m_name (name),
> +  m_start_time (get_run_time ())
> +{
> +}
> +
> +/* selftest::test_runner's dtor.  Print a summary line to
> stderr.  */
> +
> +test_runner::~test_runner ()
> +{
> +  /* Finished running tests.  */
> +  long finish_time = get_run_time ();
> +  long elapsed_time = finish_time - m_start_time;
> +
> +  fprintf (stderr,
> +"%s: %i pass(es) in %ld.%06ld seconds\n",
> +m_name, num_passes,
> +elapsed_time / 1000000, elapsed_time % 1000000);
> +}
> +
>  /* Selftests for libiberty.  */
>  
>  /* Verify that xstrndup generates EXPECTED w

Re: [PATCH, rs6000] (v2) Gimple folding of splat_uX

2017-12-11 Thread Segher Boessenkool
Hi!

On Fri, Dec 08, 2017 at 11:08:26AM -0600, Will Schmidt wrote:
> Add support for gimple folding of splat_u{8,16,32}.
> Testcase coverage is primarily handled by existing tests
> testsuite/gcc.target/powerpc/fold-vec-splat_*.c
> 
> One new test added to verify we continue to receive
> an 'invalid argument, must be a 5-bit immediate' error
> when we try to splat a non-constant value.
> 
> V2 updates include..
>   Use the gimple_convert() helper.
>   Use the build_vector_from_val() helper.
>   whitespace fix-ups.
> Those changes actually simplify the code here significantly, which is good. 
> :-)

:-)

> 2017-12-08  Will Schmidt  
> 
>   * config/rs6000/rs6000.c (rs6000_gimple_fold_builtin): Add support for
>   early folding of splat_u{8,16,32}.
> 
> [testsuite]
> 
> 2017-12-08  Will Schmidt  
> 
>   * gcc.target/powerpc/fold-vec-splat-misc-invalid.c: New.


> +/* flavors of vec_splat_[us]{8,16,32}.  */
> +case ALTIVEC_BUILTIN_VSPLTISB:
> +case ALTIVEC_BUILTIN_VSPLTISH:
> +case ALTIVEC_BUILTIN_VSPLTISW:
> +  {
> +  arg0 = gimple_call_arg (stmt, 0);

The indent here is wrong (should be two spaces, is three).

Looks fine otherwise.  Okay for trunk with that fixed.  Thanks!


Segher


[PATCH] have -Wnonnull print inlining stack (PR 83369)

2017-12-11 Thread Martin Sebor

Bug 83369 - Missing diagnostics during inlining, notes that when
-Wnonnull is issued for an inlined call to a built-in function,
GCC doesn't print the inlining stack, making it hard to debug
where the problem comes from.

When the -Wnonnull warning was introduced into the middle-end
the diagnostic machinery provided no way to print the inlining
stack (analogous to %K for trees).  Since then GCC has gained
support for the %G directive which does just that.  The attached
patch makes use of the directive to print the inlining context
for -Wnonnull.

The patch doesn't include a test because the DejaGnu framework
provides no mechanism to validate this part of GCC output (see
also bug 83336).

Tested on x86_64-linux with no regressions.

Martin
PR c/83369 - Missing diagnostics during inlining

gcc/ChangeLog:

	PR c/83369
	* tree-ssa-ccp.c (pass_post_ipa_warn::execute): Use %G to print
	inlining context.

diff --git a/gcc/tree-ssa-ccp.c b/gcc/tree-ssa-ccp.c
index 3acddf9..8a405fc 100644
--- a/gcc/tree-ssa-ccp.c
+++ b/gcc/tree-ssa-ccp.c
@@ -3457,8 +3457,8 @@ pass_post_ipa_warn::execute (function *fun)
 
 		  location_t loc = gimple_location (stmt);
 		  if (warning_at (loc, OPT_Wnonnull,
-  "argument %u null where non-null "
-  "expected", i + 1))
+  "%Gargument %u null where non-null "
+  "expected", as_a <gcall *> (stmt), i + 1))
 			{
 			  tree fndecl = gimple_call_fndecl (stmt);
 			  if (fndecl && DECL_IS_BUILTIN (fndecl))


Re: [PATCH] Fix result for conditional reductions matching at index 0

2017-12-11 Thread Kilian Verhetsel

Jakub Jelinek  writes:
> Of course it can be done efficiently, what we care most is that the body of
> the vectorized loop is efficient.

That's fair, I was looking at the x86 assembly being generated when a single
vectorized iteration was enough (because that is the context in which I
first encountered this bug):

int f(unsigned int *x, unsigned int k) {
  unsigned int result = 8;
  for (unsigned int i = 0; i < 8; i++) {
    if (x[i] == k) result = i;
  }
  return result;
}

where the vpand instruction this generates would have to be replaced
with a variable blend if the default value weren't 0 — although I had
not realized even SSE4.1 on x86 includes such an instruction, making
this point less relevant.
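
To illustrate why a match at index 0 needs care (this is my reading of the thread's subject, not the vectorizer's actual code): if a reduction keeps the maximum matching index and uses 0 to mean "no match", a match in element 0 cannot be told apart from no match at all. The sketch below compares the scalar loop above with such a scheme. f_sentinel_zero is a hypothetical name and a deliberately naive model.

```c
#include <assert.h>
#include <stddef.h>

/* Scalar reference: index of the last element equal to K, or 8 if
   no element matches.  */
unsigned int
f_scalar (const unsigned int *x, unsigned int k)
{
  assert (x != NULL);
  unsigned int result = 8;
  for (unsigned int i = 0; i < 8; i++)
    if (x[i] == k)
      result = i;
  return result;
}

/* Naive model (an assumption for illustration): each element
   contributes its index on a match and 0 otherwise, the maximum is
   taken, and a maximum of 0 is read as "no match".  */
unsigned int
f_sentinel_zero (const unsigned int *x, unsigned int k)
{
  assert (x != NULL);
  unsigned int max_index = 0;
  for (unsigned int i = 0; i < 8; i++)
    {
      unsigned int lane = (x[i] == k) ? i : 0;
      if (lane > max_index)
        max_index = lane;
    }
  return max_index == 0 ? 8 : max_index;
}
```

When the only match is in x[0], f_scalar returns 0 while the naive model returns the default 8; for matches at any other index the two agree.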

> Thanks, it applies cleanly now
> > +  else if ((STMT_VINFO_VEC_REDUCTION_TYPE (stmt_info) == COND_REDUCTION
> > +   || (STMT_VINFO_VEC_REDUCTION_TYPE (stmt_info)
> > +   == INTEGER_INDUC_COND_REDUCTION))
> > +  && reduc_fn == IFN_LAST)»
>
> contains a character at the end of line that makes it not to compile.

My bad, I must have added this when I opened the patch file itself to
inspect it...

> Another thing is, as your patch is quite large, we need a copyright
> assignment for the changes before we can accept it, see
> https://gcc.gnu.org/contribute.html for details.
>
> If you are already covered by an assignment of some company, please tell
> us which one it is, otherwise contact us and we'll get you the needed
> forms.

I am not covered by any copyright assignment yet. Do I need to send you
any additional information?


Re: [PATCH] Fix result for conditional reductions matching at index 0

2017-12-11 Thread Jakub Jelinek
On Mon, Dec 11, 2017 at 06:00:11PM +0100, Kilian Verhetsel wrote:
> Jakub Jelinek  writes:
> > Of course it can be done efficiently, what we care most is that the body of
> > the vectorized loop is efficient.
> 
> That's fair, I was looking at the x86 assembly being generated when a single
> vectorized iteration was enough (because that is the context in which I
> first encountered this bug):
> 
> int f(unsigned int *x, unsigned int k) {
>   unsigned int result = 8;
>   for (unsigned int i = 0; i < 8; i++) {
>     if (x[i] == k) result = i;
>   }
>   return result;
> }
> 
> where the vpand instruction this generates would have to be replaced
> with a variable blend if the default value weren't 0 — although I had
> not realized even SSE4.1 on x86 includes such an instruction, making
> this point less relevant.

See https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80631#c6 where I've
attached a so far untested prototype.  If it is added before your patch
makes it in, your patch would start by introducing another kind
(say SIMPLE_INTEGER_INDUC_COND_REDUCTION) and would use that for the
spots that are handled by the PR80631 patch as INTEGER_INDUC_COND_REDUCTION
right now, and your code for the rest.  E.g. the above testcase with my
patch, because i is unsigned and base is the minimum of the type, is emitted
as COND_REDUCTION, which is what your patch would improve.

> > Another thing is, as your patch is quite large, we need a copyright
> > assignment for the changes before we can accept it, see
> > https://gcc.gnu.org/contribute.html for details.
> >
> > If you are already covered by an assignment of some company, please tell
> > us which one it is, otherwise contact us and we'll get you the needed
> > forms.
> 
> I am not covered by any copyright assignment yet. Do I need to send you
> any additional information?

I'll send it offlist.

Jakub


Re: [PATCH][ARM][gcc-7] Fix wrong code by arm_final_prescan with fp16 move instructions

2017-12-11 Thread Sudakshina Das

On 30/11/17 16:01, Sudakshina Das wrote:

Hi

This patch is the fix for gcc-7 for the same issue as mentioned in:
https://gcc.gnu.org/ml/gcc-patches/2017-11/msg02209.html


For the following test case:
__fp16
test_select (__fp16 a, __fp16 b, __fp16 c)
{
   return (a < b) ? b : c;
}

when compiled with -O2 -mfpu=fp-armv8 -march=armv8.2-a+fp16 -marm 
-mfloat-abi=hard generates wrong code:


test_select:
 @ args = 0, pretend = 0, frame = 0
 @ frame_needed = 0, uses_anonymous_args = 0
 @ link register save eliminated.
 vcvtb.f32.f16    s0, s0
 vcvtb.f32.f16    s15, s1
 vmov.f16    r3, s2    @ __fp16
 vcmpe.f32    s0, s15
 vmrs    APSR_nzcv, FPSCR
     // <-- No conditional branch
 vmov.f16    r3, s1    @ __fp16
.L1:
 vmov.f16    s0, r3    @ __fp16
 bx    lr

There should have been a conditional branch there to skip one of the VMOVs.
This patch fixes this problem by making *movhf_vfp_fp16 unconditional.

Testing done: Add a new test case and checked for regressions on 
bootstrapped arm-none-linux-gnueabihf.


Is this ok for gcc-7?

Sudi

ChangeLog entries are as follows:

*** gcc/ChangeLog ***

2017-11-30  Sudakshina Das  

 * config/arm/vfp.md (*movhf_vfp_fp16): Add conds attribute.

*** gcc/testsuite/ChangeLog ***

2017-11-30  Sudakshina Das  

 * gcc.target/arm/armv8_2-fp16-move-2.c: New test.


As per the trunk thread for this 
(https://gcc.gnu.org/ml/gcc-patches/2017-11/msg02209.html) committed as 
r255536 on gcc-7-branch for the backport.


Thanks
Sudi


[PATCH] RL78 movdi improvement

2017-12-11 Thread Sebastian Perta
Hello,

The following patch improves 64 bit operations by instructing GCC to use 16
bit movw instead of 8 bit mov.
On the following test case the patch reduces the code size from 323 bytes to
245 bytes.
unsigned long long my_anddi3(unsigned long long x, unsigned long long y){ 
return x & y;
}
I did not add this to the regression tests as it is very simple and there are
many test cases in the regression suite, especially c-torture, which exercise
this patch.
Regression test is OK, tested with the following command:
make -k check-gcc RUNTESTFLAGS=--target_board=rl78-sim

Please let me know if this is OK, Thank you!
Sebastian

Index: ChangeLog
===
--- ChangeLog   (revision 255538)
+++ ChangeLog   (working copy)
@@ -1,3 +1,9 @@
+2017-12-12  Sebastian Perta  
+
+   * config/rl78/rl78-protos.h: New function declaration
rl78_split_movdi
+   * config/rl78/rl78.md: New define_expand "movdi"
+   * config/rl78/rl78.c: New function definition rl78_split_movdi
+   
 2017-12-10  Gerald Pfeifer  
 
* doc/install.texi (Specific): Tweak link to mkssoftware.com.
Index: config/rl78/rl78-protos.h
===
--- config/rl78/rl78-protos.h   (revision 255538)
+++ config/rl78/rl78-protos.h   (working copy)
@@ -23,6 +23,7 @@
 void   rl78_expand_compare (rtx *);
 void   rl78_expand_movsi (rtx *);
 void   rl78_split_movsi (rtx *, machine_mode);
+void   rl78_split_movdi (rtx *, enum machine_mode);
 int    rl78_force_nonfar_2 (rtx *, rtx (*gen)(rtx,rtx));
 int    rl78_force_nonfar_3 (rtx *, rtx (*gen)(rtx,rtx,rtx));
 void   rl78_expand_eh_epilogue (rtx);
Index: config/rl78/rl78.c
===
--- config/rl78/rl78.c  (revision 255538)
+++ config/rl78/rl78.c  (working copy)
@@ -596,6 +596,18 @@
 }
 }
 
+void
+rl78_split_movdi (rtx *operands, enum machine_mode omode)
+{
+rtx op00, op04, op10, op14;
+op00 = rl78_subreg (SImode, operands[0], omode, 0);
+op04 = rl78_subreg (SImode, operands[0], omode, 4);
+op10 = rl78_subreg (SImode, operands[1], omode, 0);
+op14 = rl78_subreg (SImode, operands[1], omode, 4);
+emit_insn (gen_movsi (op00, op10));
+emit_insn (gen_movsi (op04, op14));
+}
+
 /* Used by various two-operand expanders which cannot accept all
operands in the "far" namespace.  Force some such operands into
registers so that each pattern has at most one far operand.  */
Index: config/rl78/rl78.md
===
--- config/rl78/rl78.md (revision 255538)
+++ config/rl78/rl78.md (working copy)
@@ -718,3 +718,11 @@
   [(set_attr "valloc" "macax")
(set_attr "is_g13_muldiv_insn" "yes")]
 )
+
+(define_expand "movdi"
+  [(set (match_operand:DI 0 "nonimmediate_operand" "")
+(match_operand:DI 1 "general_operand"  ""))]
+  ""
+  "rl78_split_movdi(operands, DImode);
+  DONE;"
+)
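
The splitting done by rl78_split_movdi above can be pictured in plain C: one 64-bit move becomes two 32-bit moves on the halves at byte offsets 0 and 4. This is only a sketch under stated assumptions. move_si and split_move_di are hypothetical names standing in for gen_movsi and the new function, and memcpy stands in for the emitted instructions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Stand-in for a single 32-bit move (hypothetical).  */
void
move_si (void *dst, const void *src)
{
  memcpy (dst, src, 4);
}

/* Copy a 64-bit value as two 32-bit halves at byte offsets 0 and 4,
   the same offsets the patch passes to rl78_subreg.  */
void
split_move_di (uint64_t *dst, const uint64_t *src)
{
  assert (dst != NULL && src != NULL);
  move_si ((unsigned char *) dst + 0, (const unsigned char *) src + 0);
  move_si ((unsigned char *) dst + 4, (const unsigned char *) src + 4);
}
```

Because both halves are copied at matching offsets, the result does not depend on the host's byte order.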



[PATCH] Fix broken capitalization in aarch64 diagnostics

2017-12-11 Thread Jakub Jelinek
Hi!

Diagnostics should not start with capital letters unless the first word is
capitalized that way even in the middle of a sentence.

Fixed thusly, ok for trunk?

2017-12-11  Jakub Jelinek  

* config/aarch64/aarch64.c (aarch64_print_operand): Don't start
output_operand_lossage first argument with capital letter.
(aarch64_override_options): Don't start error and sorry first argument
with capital letter.

--- gcc/config/aarch64/aarch64.c.jj 2017-12-08 00:50:28.0 +0100
+++ gcc/config/aarch64/aarch64.c 2017-12-11 17:54:13.281833956 +0100
@@ -5258,7 +5258,7 @@ aarch64_print_operand (FILE *f, rtx x, i
  /* Fall through.  */
 
default:
- output_operand_lossage ("Unsupported operand for code '%c'", code);
+ output_operand_lossage ("unsupported operand for code '%c'", code);
}
   break;
 
@@ -9404,11 +9404,11 @@ aarch64_override_options (void)
   /* The compiler may have been configured with 2.23.* binutils, which does
  not have support for ILP32.  */
   if (TARGET_ILP32)
-error ("Assembler does not support -mabi=ilp32");
+error ("assembler does not support -mabi=ilp32");
 #endif
 
   if (aarch64_ra_sign_scope != AARCH64_FUNCTION_NONE && TARGET_ILP32)
-sorry ("Return address signing is only supported for -mabi=lp64");
+sorry ("return address signing is only supported for -mabi=lp64");
 
   /* Make sure we properly set up the explicit options.  */
   if ((aarch64_cpu_string && valid_cpu)

Jakub
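
As a rough aid for spotting messages like the three changed above, a heuristic check could look like the sketch below. It is an assumption for illustration only, not an existing GCC facility: it flags a message whose first character is upper case and whose second is lower case, so acronyms such as "ILP32" pass, but it cannot recognise proper nouns that are legitimately capitalized.

```c
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical heuristic: true if MSG looks like it starts with a
   sentence-style capitalized word ("Assembler ..."), false for
   lower-case starts and for all-caps words such as "ILP32".  */
bool
diag_looks_sentence_capitalized (const char *msg)
{
  assert (msg != NULL);
  /* The && short-circuits on an empty string, so msg[1] is only
     read when msg[0] is a letter.  */
  return isupper ((unsigned char) msg[0])
         && islower ((unsigned char) msg[1]);
}
```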


[AArch64] Fix ICEs in aarch64_print_operand_internal (PR target/83335)

2017-12-11 Thread Jakub Jelinek
Hi!

On Fri, Dec 08, 2017 at 08:10:08PM +0100, Christophe Lyon wrote:
> >> Can you check?
> >
> > I think that's a separate preexisting problem.  Could you file a PR?
> >
> 
> Sure, I filed:
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=83335
> 
> > Personally I'd just remove the assert, but I'm guessing that wouldn't
> > be acceptable...

So, I think either we can return false instead of dying on the assertion,
but then it will emit output_addr_const and often just silently emit it
without diagnosing it (first patch), or just call output_operand_lossage
there, which will ICE except for inline asm, where it will error.
It is true there is no code provided, but output_addr_const wouldn't
provide that either:
default:
  if (targetm.asm_out.output_addr_const_extra (file, x))
break;

  output_operand_lossage ("invalid expression as operand");
in final.c.

Jakub
2017-12-11  Jakub Jelinek  

PR target/83335
* config/aarch64/aarch64.c (aarch64_print_address_internal):
If x doesn't have Pmode, return false to let.

--- gcc/config/aarch64/aarch64.c.jj 2017-12-11 17:54:13.0 +0100
+++ gcc/config/aarch64/aarch64.c 2017-12-11 17:57:41.245261299 +0100
@@ -5633,7 +5633,8 @@ aarch64_print_address_internal (FILE *f,
   struct aarch64_address_info addr;
 
   /* Check all addresses are Pmode - including ILP32.  */
-  gcc_assert (GET_MODE (x) == Pmode);
+  if (GET_MODE (x) != Pmode)
+return false;
 
   if (aarch64_classify_address (&addr, x, mode, op, true))
 switch (addr.type)
2017-12-11  Jakub Jelinek  

PR target/83335
* config/aarch64/aarch64.c (aarch64_print_address_internal):
If x doesn't have Pmode, call output_operand_lossage and return true.

* gcc.target/aarch64/asm-2.c: Expect error if ilp32.

--- gcc/config/aarch64/aarch64.c.jj 2017-12-11 17:54:13.0 +0100
+++ gcc/config/aarch64/aarch64.c 2017-12-11 18:23:25.847181675 +0100
@@ -5633,7 +5633,11 @@ aarch64_print_address_internal (FILE *f,
   struct aarch64_address_info addr;
 
   /* Check all addresses are Pmode - including ILP32.  */
-  gcc_assert (GET_MODE (x) == Pmode);
+  if (GET_MODE (x) != Pmode)
+{
+  output_operand_lossage ("invalid expression as operand");
+  return true;
+}
 
   if (aarch64_classify_address (&addr, x, mode, op, true))
 switch (addr.type)
--- gcc/testsuite/gcc.target/aarch64/asm-2.c.jj 2017-12-08 00:50:23.0 
+0100
+++ gcc/testsuite/gcc.target/aarch64/asm-2.c2017-12-11 18:24:44.192215734 
+0100
@@ -6,5 +6,5 @@ int x;
 void
 f (void)
 {
-  asm volatile ("%a0" :: "X" (&x));
+  asm volatile ("%a0" :: "X" (&x)); /* { dg-error "invalid" "" { target ilp32 } } */
 }
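
The difference between the two alternatives can be shown with a toy model. Everything here is hypothetical (the enum, POINTER_MODE, report_lossage and both function names); it only mirrors the control flow of the two patches: the first declines and lets the caller fall back to generic output, the second reports the problem itself and tells the caller the operand was handled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

enum mode { MODE_SI, MODE_DI };
#define POINTER_MODE MODE_DI   /* stand-in for Pmode */

int lossage_count;             /* number of diagnostics reported */

/* Stand-in for output_operand_lossage.  */
void
report_lossage (const char *msg)
{
  assert (msg != NULL);
  lossage_count++;
  fprintf (stderr, "error: %s\n", msg);
}

/* First variant: return false on a mode mismatch so the caller can
   fall back to its generic printing path.  */
bool
print_address_decline (enum mode m)
{
  if (m != POINTER_MODE)
    return false;
  return true;   /* the address would be printed here */
}

/* Second variant: diagnose the mismatch and return true so the
   caller does not attempt the fallback.  */
bool
print_address_diagnose (enum mode m)
{
  if (m != POINTER_MODE)
    {
      report_lossage ("invalid expression as operand");
      return true;
    }
  return true;   /* the address would be printed here */
}
```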


Re: [patch, fortran] Implement maxval for characters

2017-12-11 Thread Thomas Koenig

Christophe and James (and everybody else),

I have created https://gcc.gnu.org/bugzilla/show_bug.cgi?id=83379
and assigned it to myself. This should be easy to fix.

Regards

Thomas


Re: [PATCH] Fix broken capitalization in aarch64 diagnostics

2017-12-11 Thread Richard Biener
On December 11, 2017 6:29:07 PM GMT+01:00, Jakub Jelinek  
wrote:
>Hi!
>
>Diagnostics should not start with capital letters unless the first word
>is
>capitalized that way even in the middle of a sentence.
>
>Fixed thusly, ok for trunk?

OK. 

Richard. 

>2017-12-11  Jakub Jelinek  
>
>   * config/aarch64/aarch64.c (aarch64_print_operand): Don't start
>   output_operand_lossage first argument with capital letter.
>   (aarch64_override_options): Don't start error and sorry first argument
>   with capital letter.
>
>--- gcc/config/aarch64/aarch64.c.jj2017-12-08 00:50:28.0 +0100
>+++ gcc/config/aarch64/aarch64.c   2017-12-11 17:54:13.281833956 +0100
>@@ -5258,7 +5258,7 @@ aarch64_print_operand (FILE *f, rtx x, i
> /* Fall through.  */
> 
>   default:
>-output_operand_lossage ("Unsupported operand for code '%c'", code);
>+output_operand_lossage ("unsupported operand for code '%c'", code);
>   }
>   break;
> 
>@@ -9404,11 +9404,11 @@ aarch64_override_options (void)
>/* The compiler may have been configured with 2.23.* binutils, which
>does
>  not have support for ILP32.  */
>   if (TARGET_ILP32)
>-error ("Assembler does not support -mabi=ilp32");
>+error ("assembler does not support -mabi=ilp32");
> #endif
> 
>   if (aarch64_ra_sign_scope != AARCH64_FUNCTION_NONE && TARGET_ILP32)
>-sorry ("Return address signing is only supported for -mabi=lp64");
>+sorry ("return address signing is only supported for -mabi=lp64");
> 
>   /* Make sure we properly set up the explicit options.  */
>   if ((aarch64_cpu_string && valid_cpu)
>
>   Jakub



Re: [patch, fortran] Implement maxval for characters

2017-12-11 Thread Thomas Koenig



I have created https://gcc.gnu.org/bugzilla/show_bug.cgi?id=83379
and assigned it to myself. This should be easy to fix.


OK, I have updated the test cases in question.  They pass for
me at least.

I'll keep the PR open for a couple of days to make sure this
is really fixed.

Regards

Thomas


Re: [PATCH 3/3] diagnose attribute aligned conflicts (PR 81566)

2017-12-11 Thread Martin Sebor

On 12/09/2017 04:40 AM, Andreas Schwab wrote:

That requires updates to gcc.dg/pr53037-4.c and g++.dg/pr53037-4.C.


I don't see these failures in my own test result or in those
reported for common targets.

Would you mind providing some details about where this output came
from?

Martin


FAIL: gcc.dg/pr53037-4.c (test for excess errors)
Excess errors:
/usr/local/gcc/gcc-20171208/gcc/testsuite/gcc.dg/pr53037-4.c:9:1: error: 
alignment for 'foo2' must be at least 16

FAIL: g++.dg/pr53037-4.C  -std=gnu++98 (test for excess errors)
Excess errors:
/usr/local/gcc/gcc-20171208/gcc/testsuite/g++.dg/pr53037-4.C:9:1: error: 
alignment for 'void foo2()' must be at least 16

Andreas.





Re: [PATCH 3/3] diagnose attribute aligned conflicts (PR 81566)

2017-12-11 Thread Andreas Schwab
http://gcc.gnu.org/ml/gcc-testresults/2017-12/msg00672.html

Andreas.

-- 
Andreas Schwab, sch...@linux-m68k.org
GPG Key fingerprint = 58CA 54C7 6D53 942B 1756  01D3 44D5 214B 8276 4ED5
"And now for something completely different."


Re: [PATCH][RFA][P1 PR tree-optimization/83298] Avoid over-optimistic result range for COND_EXPR

2017-12-11 Thread Jeff Law
On 12/08/2017 04:17 AM, Richard Biener wrote:
> On Fri, Dec 8, 2017 at 1:18 AM, Jeff Law  wrote:
>>
>> So the underlying issue here is quite simple.  Given something like
>>
>> x = (cond) ? res1 : res2;
>>
>> EVRP analysis will compute the resultant range using vrp_meet of the
>> ranges for res1 and res2.  Seems pretty natural.
>>
>> vrp_meet makes optimistic assumptions if either range is VR_UNDEFINED
>> and will set the resultant range to the range of the other operand.
>>
>> Some callers explicitly mention this is the desired behavior (PHI
>> processing).  Other callers avoid calling vrp_meet when one of the
>> ranges is VR_UNDEFINED and do something sensible
>> (extract_range_from_unary_expr, extract_range_from_binary_expr_1).
>>
>> extract_range_from_cond_expr neither mentions that it wants the
>> optimistic behavior nor does it avoid calling vrp_meet with a
>> VR_UNDEFINED range.  It naturally seems to fit in better with the other
>> extract_range_from_* routines.
>>
>> I'm not at all familiar with the ipa-cp bits, but from a quick look they
>> also seems to fall into the extract_* camp.
>>
>>
>> Anyway, normally in a domwalk the only place where we're going to see
>> VR_UNDEFINED would be in the PHI nodes.  It's one of the nice properties
>> of a domwalk :-)
>>
>> However, for jump threading we look past the dominance frontier;
>> furthermore, we do not currently record ranges for statements we process
>> as part of the jump threading.  But we do try to extract the range each
>> statement generates -- we're primarily looking for cases where the
>> statement generates a singleton range.
>>
>> While the plan does include recording ranges as we look past the
>> dominance frontier, I strongly believe some serious code cleanup in DOM
>> and jump threading needs to happen first.  So I don't want to go down
>> that path for gcc-8.
>>
>> So we're kind-of stuck with the fact that we might query for a resultant
>> range when one or more input operands may not have recorded range
>> information.  Thankfully that's easily resolved by making
>> extract_range_from_cond_expr work like the other range extraction
>> routines and avoid calling vrp_meet when one or more operands is
>> VR_UNDEFINED.
>>
>> Bootstrapped and regression tested on x86_64.  OK for the trunk?
> 
> But that does regress the case of either arm being an uninitialized
> variable.
> 
> I'm not convinced that when you look forward past the dominance frontier
> and do VRP analysis on stmts without analyzing all intermediate
> stmts on the path (or at least push all defs on that path temporarily
> to VR_VARYING) is fixed by this patch.  It merely looks like a wrong
> workaround for a fundamental issue in how DOM now uses the
> interface?

I'm going back and pulling together the bits to fix this in a more
consistent way.  Specifically, recording ranges as we process statements
outside the dominance frontier so we don't ever see a VR_UNDEFINED range.

It's not bad as long as I resist the urge to pull in cleanups along the
way :-)

The only thing really painful is that normally when we generate a range
for a statement's output, we can just update the VR and reflect the
updated range into the global table -- that range is globally
applicable.  There's no need to push stuff onto the unwind stack or
the like.  The update_value_range API is sufficient here as NEW_VR is
just a temporary -- we copy the relevant bits from NEW_VR into the
actual tables.


When threading we must add entries to the unwinding stack.  Worse yet,
we can't use the existing VR because it's a stack local and obviously
does not persist.  We have to allocate a new object and copy from the
stack temporary into the new object.  Ugh.

We don't see any of this complexity with the other tables (like
const_and_copies) because we always update the global state and always
unwind it.  The VRP bits are a bit more efficient because they don't
bother with unwinding entries in cases where the result is globally
applicable.
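
For readers following the quoted description of vrp_meet, here is a toy model of the optimistic behaviour and of a conservative alternative. It is a sketch only: the struct, both function names and the simple hull are my assumptions, the real vrp_meet handles more cases (anti-ranges, symbolic bounds), and the thread does not say that VARYING is what the conservative callers produce.

```c
#include <assert.h>

enum kind { VR_UNDEFINED, VR_RANGE, VR_VARYING };

struct vrange { enum kind kind; int lo, hi; };

/* Optimistic meet: an UNDEFINED operand is ignored and the other
   operand's range is returned unchanged.  */
struct vrange
meet_optimistic (struct vrange a, struct vrange b)
{
  if (a.kind == VR_UNDEFINED)
    return b;
  if (b.kind == VR_UNDEFINED)
    return a;
  if (a.kind == VR_VARYING || b.kind == VR_VARYING)
    return (struct vrange) { VR_VARYING, 0, 0 };
  assert (a.lo <= a.hi && b.lo <= b.hi);
  /* Both are plain ranges: take the smallest range covering both.  */
  return (struct vrange) { VR_RANGE,
                           a.lo < b.lo ? a.lo : b.lo,
                           a.hi > b.hi ? a.hi : b.hi };
}

/* Conservative alternative (assumed here): if either operand has no
   recorded range, give up instead of trusting the other operand.  */
struct vrange
meet_conservative (struct vrange a, struct vrange b)
{
  if (a.kind == VR_UNDEFINED || b.kind == VR_UNDEFINED)
    return (struct vrange) { VR_VARYING, 0, 0 };
  return meet_optimistic (a, b);
}
```

With one operand UNDEFINED and the other the singleton [1, 1], the optimistic meet yields the singleton, which is the over-optimistic result discussed for COND_EXPR; the conservative version yields VARYING.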

Jeff
jeff


Re: [RFA][PATCH] 8/n Pull evrp range analyzer into its own file

2017-12-11 Thread Jeff Law
On 12/09/2017 02:19 PM, Gerald Pfeifer wrote:
> On Wed, 22 Nov 2017, Jeff Law wrote:
>>>> * gimple-ssa-evrp-analyze.c: New file pulled from gimple-ssa-evrp.c.
>>>
>>> With the move to C++, wasn't there a policy to name new files *.cc
>>> instead of *.c?
>> I'm happy to use .cc if that's where we want to go.  It's trivial 
>> and we're not losing any significant history.
> 
> Unlike CVS (still remember the days? ;-), SVN features a mv command,
> so renames generally should not lose any history?
Hell, I remember RCS and SCCS :-)

I actually use GIT almost exclusively these days...

jeff


Re: [PATCH][RFA][P1 PR tree-optimization/83298] Avoid over-optimistic result range for COND_EXPR

2017-12-11 Thread Jeff Law
On 12/08/2017 04:17 AM, Richard Biener wrote:
> 
> I'm not convinced that when you look forward past the dominance frontier
> and do VRP analysis on stmts without analyzing all intermediate
> stmts on the path (or at least push all defs on that path temporarily
> to VR_VARYING) is fixed by this patch.  It merely looks like a wrong
> workaround for a fundamental issue in how DOM now uses the
> interface?
So here's the bits to record ranges (with unwind entries so we can roll
them back) as we process beyond the dominance frontier.  This was always
in the plan, but I wanted to do some cleanups prior to adding this
capability.

I've stayed away from doing any of the cleanups at this time.

At a high level we break out routines to push a marker and pop to a
marker on the unwinding stack.  We then define the enter/leave in terms
of those new routines.  This allows us to push/pop scopes as we process
thread paths which don't want the same processing we see in the enter
method.

We add a boolean to record_ranges_from_stmt to indicate if any generated
range is of a temporary nature.  The pre-existing calls all pass false
here to indicate the range is global.  When true (from threading) we
generate the necessary unwind entries and avoid changing any global state.

push_value_range becomes a public member function so it can be used when
threading to record a temporary range created by a PHI.  It's not
usually necessary, but could be for cases where we're unable to
propagate the PHI equivalences.
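
The marker scheme described above can be sketched as a small unwind stack. This is a toy model under stated assumptions, not the evrp_range_analyzer code: variables are plain integers, ranges are stored by value, and only the three function names are borrowed from the patch. Entries record the previous range so that pop_to_marker can restore it.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_VARS 8
#define MAX_STACK 32
#define MARKER (-1)

struct range { int lo, hi; };
struct unwind_entry { int var; struct range old; };

struct range range_table[MAX_VARS];          /* current range per variable */
struct unwind_entry unwind_stack[MAX_STACK]; /* saved previous ranges */
size_t unwind_len;

/* Open a scope: everything pushed after this can be rolled back.  */
void
push_marker (void)
{
  assert (unwind_len < MAX_STACK);
  unwind_stack[unwind_len++].var = MARKER;
}

/* Record a temporary range for VAR, remembering the old one.  */
void
push_value_range (int var, struct range r)
{
  assert (var >= 0 && var < MAX_VARS && unwind_len < MAX_STACK);
  unwind_stack[unwind_len].var = var;
  unwind_stack[unwind_len].old = range_table[var];
  unwind_len++;
  range_table[var] = r;
}

/* Close the innermost scope, restoring every range changed in it.  */
void
pop_to_marker (void)
{
  while (unwind_len > 0)
    {
      struct unwind_entry e = unwind_stack[--unwind_len];
      if (e.var == MARKER)
        break;
      range_table[e.var] = e.old;
    }
}
```

A threading path would call push_marker before walking a block, push_value_range for each temporary range it derives, and pop_to_marker when it is done with the path.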

--


We pass in the evrp_range_analyzer instance from DOM into the threader.
From VRP we just pass in a NULL pointer as VRP doesn't use the EVRP
analyzer and we have to check it in various places.  This is one of the
many cleanups that will occur as we drop threading from tree-vrp.c.

We pass that down to a couple functions.  Nothing significant.

Then it's just a matter of recording something for PHIs and statements
we encounter -- ensuring that in both cases we create unwind entries.
That obviously fixes 83298 and its duplicate (testcases for both are
included).  It probably enables more jump threading as well, but I
didn't specifically look for cases where that happened.

Bootstrapped and regression tested on x86.

OK for the trunk?


Jeff


* gimple-ssa-evrp-analyze.h (class evrp_range_analyzer): Make
push_value_range a public interface.  Add new argument to
record_ranges_from_stmt.
* gimple-ssa-evrp-analyze.c
(evrp_range_analyzer::record_ranges_from_stmt): Add new argument.
Update comments.  Handle recording temporary equivalences.
* tree-ssa-dom.c (dom_opt_dom_walker::before_dom_children): Add
new argument to call to evrp_range_analyzer::record_ranges_from_stmt.
* gimple-ssa-evrp.c (evrp_dom_walker::before_dom_children): Likewise.
* tree-ssa-threadedge.c: Include alloc-pool.h, vr-values.h and
gimple-ssa-evrp-analyze.h.
(record_temporary_equivalences_from_phis): Add new argument.  When
the PHI arg is an SSA_NAME, set the result's range to the range
of the PHI arg.
(record_temporary_equivalences_from_stmts_at_dest): Record ranges
from statements too.
(thread_through_normal_block): Accept new argument, evrp_range_analyzer.
Pass it down to children as needed.
(thread_outgoing_edges): Likewise.
(thread_across_edge): Likewise.   Push/pop range state as needed.
* tree-ssa-threadedge.h (thread_outgoing_edges): Update prototype.

* gcc.c-torture/execute/pr83298.c: New test.
* gcc.c-torture/execute/pr83362.c: New test.

diff --git a/gcc/gimple-ssa-evrp-analyze.h b/gcc/gimple-ssa-evrp-analyze.h
index 4783e6f772e..3968cfd805a 100644
--- a/gcc/gimple-ssa-evrp-analyze.h
+++ b/gcc/gimple-ssa-evrp-analyze.h
@@ -31,13 +31,18 @@ class evrp_range_analyzer
   }
 
   void enter (basic_block);
+  void push_marker (void);
+  void pop_to_marker (void);
   void leave (basic_block);
-  void record_ranges_from_stmt (gimple *);
+  void record_ranges_from_stmt (gimple *, bool);
 
   /* Main interface to retrieve range information.  */
   value_range *get_value_range (const_tree op)
 { return vr_values->get_value_range (op); }
 
+  /* Record a new unwindable range.  */
+  void push_value_range (tree var, value_range *vr);
+
   /* Dump all the current value ranges.  This is primarily
  a debugging interface.  */
   void dump_all_value_ranges (FILE *fp)
@@ -57,7 +62,6 @@ class evrp_range_analyzer
   DISABLE_COPY_AND_ASSIGN (evrp_range_analyzer);
   class vr_values *vr_values;
 
-  void push_value_range (tree var, value_range *vr);
   value_range *pop_value_range (tree var);
   value_range *try_find_new_range (tree, tree op, tree_code code, tree limit);
   void record_ranges_from_incoming_edge (basic_block);
diff --git a/gcc/gimple-ssa-evrp-analyze.c b/gcc/gimple-ssa-evrp-analyze.c
index fb3d3297a78..8e9881b6964 100644
--- a/gcc/gimple-ssa-evrp-analyze.c
+++ b/g

Re: [PR81165] discount killed stmts when sizing blocks for threading

2017-12-11 Thread Jeff Law
On 12/07/2017 05:04 AM, Alexandre Oliva wrote:
> We limit the amount of copying for jump threading based on counting
> stmts.  This counting is overly pessimistic, because we will very
> often delete stmts as a consequence of jump threading: when the final
> conditional jump of a block is removed, earlier SSA names computed
> exclusively for use in that conditional are killed.  Furthermore, PHI
> nodes in blocks with only two predecessors are trivially replaced with
> their now-single values after threading.
> 
> This patch scans blocks to be copied in the path constructed so far
> and estimates the number of stmts that will be removed in the copies,
> bumping up the stmt count limit.
> 
> Regstrapped on x86_64-linux-gnu and i686-linux-gnu.  Ok to install?
> 
> 
> for  gcc/ChangeLog
> 
>   * tree-ssa-threadedge.c (uses_in_bb): New.
>   (estimate_threading_killed_stmts): New.
>   (estimate_threading_killed_stmts): New overload.
>   (record_temporary_equivalences_from_stmts_at_dest): Add path
>   parameter; adjust caller.  Expand limit when it's hit.
> 
> for  gcc/testsuite/ChangeLog
> 
>   * gcc.dg/pr81165.c: New.
> ---
>  gcc/testsuite/gcc.dg/pr81165.c |   59 
>  gcc/tree-ssa-threadedge.c  |  189 
> +++-
>  2 files changed, 245 insertions(+), 3 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.dg/pr81165.c
> 
> diff --git a/gcc/testsuite/gcc.dg/pr81165.c b/gcc/testsuite/gcc.dg/pr81165.c
> new file mode 100644
> index 000000000000..8508d893bed6
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/pr81165.c
> @@ -0,0 +1,59 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O3 -fdump-tree-optimized" } */
> +/* { dg-final { scan-tree-dump-not " \[/%\] " "optimized" } } */
> +
> +/* Testcase submitted for PR81165, with its main function removed as
> +   it's turned into a compile test.  We want to make sure that all of
> +   the divide/remainder computations are removed by tree optimizers.
> +
> +   We can figure out that we don't need to compute at runtime even the
> +   condition to enter the loop: the initial i==0 would have to be
> +   greater than the sum of two small unsigned values: 1U>>t1 is in the
> +   range 0..1, whereas the char value is bounded by the range 0..127,
> +   being 128 % a positive number (zero would invoke undefined
> +   behavior, so we can assume it doesn't happen).  (We know it's
> +   nonnegative because it's 10 times a number that has no more than
> +   the bits for 16, 8 and 1 set.)
> +
> +   We don't realize that the loop is useless right away: jump
> +   threading helps remove some of the complexity, particularly of the
> +   computation within the loop: t1 is compared with 1, but it can
> +   never be 1.  (We could assume as much, since its being 1 would
> +   divide by zero, but we don't.)
> +
> +   If we don't enter the conditional block, t1 remains at 2; if we do,
> +   it's set to either -1.  If we jump thread at the end of the
> +   conditional block, we can figure out the ranges exclude 1 and the
> +   jump body is completely optimized out.  However, we used to fail to
> +   consider the block for jump threading due to the amount of
> +   computation in it, without realizing most of it would die in
> +   consequence of the threading.
> +
> +   We now take the dying code into account when deciding whether or
> +   not to try jump threading.  That might enable us to optimize the
> +   function into { if (x2 != 0 || (x1 & 1) == 0) abort (); }.  At the
> +   time of this writing, with the patch, we get close, but the test on
> +   x2 only gets as far as ((1 >> x2) == 0).  Without the patch, some
> +   of the loop remains.  */
> +
> +short x0 = 15;
> +
> +void func (){
> +  volatile int x1 = 1U;
> +  volatile char x2 = 0;
> +  char t0 = 0;
> +  unsigned long t1 = 2LU;
> +  int i = 0;
> +  
> +  if(1>>x2) {
> +t0 = -1;
> +t1 = (1&(short)(x1^8U))-1;
> +  }
> +
> +  while(i > (int)((1U>>t1)+(char)(128%(10*(25LU&(29%x0)))))) {
> +i += (int)(12L/(1!=(int)t1));
> +  }
> +
> +  if (t0 != -1) __builtin_abort();
> +  if (t1 != 0L) __builtin_abort();
> +}
> diff --git a/gcc/tree-ssa-threadedge.c b/gcc/tree-ssa-threadedge.c
> index 536c4717b725..25ccac2a3ecc 100644
> --- a/gcc/tree-ssa-threadedge.c
> +++ b/gcc/tree-ssa-threadedge.c

> +
> +/* Starting from the final control flow stmt in BB, assuming it will
> +   be removed, follow uses in to-be-removed stmts back to their defs
> +   and count how many defs are to become dead and be removed as
> +   well.  */
> +
> +static int
> +estimate_threading_killed_stmts (basic_block bb)
> +{
> +  int killed_stmts = 0;
> +  hash_map ssa_remaining_uses;
> +  auto_vec dead_worklist;
> +
> +  /* If the block has only two predecessors, threading will turn phi
> + dsts into either src, so count them as dead stmts.  */
> +  bool drop_all_phis = EDGE_COUNT (bb->preds) == 2;
> +
> +  if (drop_all_phis)
> +for (gphi_iterator gsi = gsi_start_phis (bb);
> +  !gsi_

[patch, fortran, doc, committed] Update description of MINLOC and MAXLOC

2017-12-11 Thread Thomas Koenig

Hi,

I have just committed the attached doc patch as obvious after "make dvi"
and "make pdf".

2017-12-11  Thomas Koenig  

* intrinsic.texi (MAXLOC): Update documentation for
character arrays and KIND argument.
(MINLOC): Likewise.

Regards

Thomas
Index: intrinsic.texi
===
--- intrinsic.texi	(Revision 255545)
+++ intrinsic.texi	(Arbeitskopie)
@@ -9994,10 +9994,13 @@ that of the first such element in array element or
 zero size, or all of the elements of @var{MASK} are @code{.FALSE.}, then
 the result is an array of zeroes.  Similarly, if @var{DIM} is supplied
 and all of the elements of @var{MASK} along a given row are zero, the
-result value for that row is zero.
+result value for that row is zero. If the optional argument @var{KIND}
+is present, the result is an integer of kind @var{KIND}, otherwise it is of
+default kind.
 
 @item @emph{Standard}:
-Fortran 95 and later
+Fortran 95 and later; @var{ARRAY} of @code{CHARACTER} and the
+@var{KIND} argument are available in Fortran 2003 and later.
 
 @item @emph{Class}:
 Transformational function
@@ -10004,8 +10007,8 @@ Transformational function
 
 @item @emph{Syntax}:
 @multitable @columnfractions .80
-@item @code{RESULT = MAXLOC(ARRAY, DIM [, MASK])}
-@item @code{RESULT = MAXLOC(ARRAY [, MASK])}
+@item @code{RESULT = MAXLOC(ARRAY, DIM [, MASK] [,KIND])}
+@item @code{RESULT = MAXLOC(ARRAY [, MASK] [,KIND])}
 @end multitable
 
 @item @emph{Arguments}:
@@ -10017,6 +10020,8 @@ Transformational function
 inclusive.  It may not be an optional dummy argument.
 @item @var{MASK}  @tab Shall be an array of type @code{LOGICAL},
 and conformable with @var{ARRAY}.
+@item @var{KIND} @tab (Optional) An @code{INTEGER} initialization
+expression indicating the kind parameter of the result.
 @end multitable
 
 @item @emph{Return value}:
@@ -10342,10 +10347,13 @@ that of the first such element in array element or
 zero size, or all of the elements of @var{MASK} are @code{.FALSE.}, then
 the result is an array of zeroes.  Similarly, if @var{DIM} is supplied
 and all of the elements of @var{MASK} along a given row are zero, the
-result value for that row is zero.
+result value for that row is zero. If the optional argument @var{KIND}
+is present, the result is an integer of kind @var{KIND}, otherwise it is of
+default kind.
 
 @item @emph{Standard}:
-Fortran 95 and later
+Fortran 95 and later; @var{ARRAY} of @code{CHARACTER} and the
+@var{KIND} argument are available in Fortran 2003 and later.
 
 @item @emph{Class}:
 Transformational function
@@ -10352,19 +10360,21 @@ Transformational function
 
 @item @emph{Syntax}:
 @multitable @columnfractions .80
-@item @code{RESULT = MINLOC(ARRAY, DIM [, MASK])}
-@item @code{RESULT = MINLOC(ARRAY [, MASK])}
+@item @code{RESULT = MINLOC(ARRAY, DIM [, MASK] [,KIND])}
+@item @code{RESULT = MINLOC(ARRAY [, MASK], [,KIND])}
 @end multitable
 
 @item @emph{Arguments}:
 @multitable @columnfractions .15 .70
-@item @var{ARRAY} @tab Shall be an array of type @code{INTEGER} or
-@code{REAL}.
+@item @var{ARRAY} @tab Shall be an array of type @code{INTEGER},
+@code{REAL} or @code{CHARACTER}.
 @item @var{DIM}   @tab (Optional) Shall be a scalar of type
 @code{INTEGER}, with a value between one and the rank of @var{ARRAY},
 inclusive.  It may not be an optional dummy argument.
 @item @var{MASK}  @tab Shall be an array of type @code{LOGICAL},
 and conformable with @var{ARRAY}.
+@item @var{KIND} @tab (Optional) An @code{INTEGER} initialization
+expression indicating the kind parameter of the result.
 @end multitable
 
 @item @emph{Return value}:


[PATCH] testsuite: add coverage for diagnostics relating to inlining (PR tree-optimization/83336)

2017-12-11 Thread David Malcolm
In theory, the diagnostics subsystem can print context information on
code inlining when diagnostics are emitted by the middle-end, describing
the chain of inlined callsites that led to a particular warning,
but PR tree-optimization/83336 describes various issues with this.

An underlying issue is that we have very little automated testing for
this code: gcc.dg/tm/pr52141.c has a test, but in general, prune.exp
filters out the various "inlined from" lines.

The following patch adds test coverage for it for C and C++ via a new
testsuite plugin, which emits a warning from the middle-end; the test
cases use dg-regexp to verify that the "inlined from" lines are
emitted correctly, with the correct function names and source locations.

Doing so requires a change to prune.exp: the dg-regexp lines have to
be handled *before* the "inlined from" lines are stripped.

(I have various followups, but establishing automated test coverage
seems like an important first step)

Successfully bootstrapped & regrtested on x86_64-pc-linux-gnu; adds
7 PASS results to g++.sum and 31 PASS results to gcc.sum.

OK for trunk?

gcc/testsuite/ChangeLog:
PR tree-optimization/83336
* g++.dg/cpp0x/missing-initializer_list-include.C: Update for
changes to prune.exp's handling of dg-regexp.
* g++.dg/plugin/diagnostic-test-inlining-1.C: New test case.
* g++.dg/plugin/plugin.exp (plugin_test_list): Add it, via
gcc.dg's plugin/diagnostic_plugin_test_inlining.c.
* gcc.dg/plugin/diagnostic-test-inlining-1.c: New test case.
* gcc.dg/plugin/diagnostic-test-inlining-2.c: Likewise.
* gcc.dg/plugin/diagnostic-test-inlining-3.c: Likewise.
* gcc.dg/plugin/diagnostic-test-inlining-4.c: Likewise.
* gcc.dg/plugin/diagnostic_plugin_test_inlining.c: New test
plugin.
* gcc.dg/plugin/plugin.exp (plugin_test_list): Add them.
* lib/prune.exp (prune_gcc_output): Move call to handle-dg-regexps
to before the various text stripping regsub invocations,
in particular, to before the stripping of "inlined from".
---
 .../cpp0x/missing-initializer_list-include.C   |   1 +
 .../g++.dg/plugin/diagnostic-test-inlining-1.C |  34 
 gcc/testsuite/g++.dg/plugin/plugin.exp |   2 +
 .../gcc.dg/plugin/diagnostic-test-inlining-1.c |  34 
 .../gcc.dg/plugin/diagnostic-test-inlining-2.c |  48 ++
 .../gcc.dg/plugin/diagnostic-test-inlining-3.c |  43 +
 .../gcc.dg/plugin/diagnostic-test-inlining-4.c |  56 +++
 .../plugin/diagnostic_plugin_test_inlining.c   | 180 +
 gcc/testsuite/gcc.dg/plugin/plugin.exp |   5 +
 gcc/testsuite/lib/prune.exp|   6 +-
 10 files changed, 406 insertions(+), 3 deletions(-)
 create mode 100644 gcc/testsuite/g++.dg/plugin/diagnostic-test-inlining-1.C
 create mode 100644 gcc/testsuite/gcc.dg/plugin/diagnostic-test-inlining-1.c
 create mode 100644 gcc/testsuite/gcc.dg/plugin/diagnostic-test-inlining-2.c
 create mode 100644 gcc/testsuite/gcc.dg/plugin/diagnostic-test-inlining-3.c
 create mode 100644 gcc/testsuite/gcc.dg/plugin/diagnostic-test-inlining-4.c
 create mode 100644 gcc/testsuite/gcc.dg/plugin/diagnostic_plugin_test_inlining.c

diff --git a/gcc/testsuite/g++.dg/cpp0x/missing-initializer_list-include.C b/gcc/testsuite/g++.dg/cpp0x/missing-initializer_list-include.C
index 7d72ec4..1010b0a 100644
--- a/gcc/testsuite/g++.dg/cpp0x/missing-initializer_list-include.C
+++ b/gcc/testsuite/g++.dg/cpp0x/missing-initializer_list-include.C
@@ -24,5 +24,6 @@ void test (int i)
 +#include 
  /* This is padding (to avoid the generated patch containing DejaGnu
 directives).  */
+ 
 { dg-end-multiline-output "" }
 #endif
diff --git a/gcc/testsuite/g++.dg/plugin/diagnostic-test-inlining-1.C b/gcc/testsuite/g++.dg/plugin/diagnostic-test-inlining-1.C
new file mode 100644
index 000..df7bb1f
--- /dev/null
+++ b/gcc/testsuite/g++.dg/plugin/diagnostic-test-inlining-1.C
@@ -0,0 +1,34 @@
+/* { dg-do compile } */
+/* { dg-options "-Wno-attributes -fdiagnostics-show-caret" } */
+
+extern void __emit_warning (const char *message);
+
+/* Verify that the diagnostic subsystem describes the chain of inlining
+   when reporting the warning.  */
+
+__attribute__((always_inline))
+static void foo (void)
+{
+  __emit_warning ("message");
+}
+
+__attribute__((always_inline))
+static void bar (void)
+{
+  foo ();
+}
+
+int main()
+{
+  bar ();
+  return 0;
+}
+
+/* { dg-regexp "In function 'void foo\\(\\)'," "" } */
+/* { dg-regexp "inlined from 'void bar\\(\\)' at .+/diagnostic-test-inlining-1.C:18:7," "" } */
+/* { dg-regexp "inlined from 'int main\\(\\)' at .+/diagnostic-test-inlining-1.C:23:7:" "" } */
+/* { dg-warning "18: message" "" { target *-*-* } 12 } */
+/* { dg-begin-multiline-output "" }
+   __emit_warning ("message");
+   ~~~^~~
+   { dg-end-multiline-output "" } */
diff --git a/gcc/testsuite/g++.dg/plugin/p

Re: [PATCH] PR libgcc/83112, Fix warnings on libgcc float128-ifunc.c

2017-12-11 Thread Michael Meissner
On Fri, Dec 01, 2017 at 05:53:55PM -0600, Segher Boessenkool wrote:
> On Fri, Dec 01, 2017 at 12:40:22AM -0500, Michael Meissner wrote:
> > After committing the previous patch, I noticed that it was now generating
> > warnings for __{mul,div}kc3_{sw,hw} not having a prototype that I hadn't
> > noticed during development of the patch.  This is due to the fact that 
> > before I
> > added the ifunc support, it was only compiling __{mul,div}kc3, and those 
> > have
> > built-in declarations.  I installed this patch as being obvious:
> > 
> > 2017-11-30  Michael Meissner  
> > 
> > * config/rs6000/_mulkc3.c (__mulkc3): Add forward declaration.
> > * config/rs6000/_divkc3.c (__divkc3): Likewise.
> > 
> > Index: libgcc/config/rs6000/_divkc3.c
> > ===
> > --- libgcc/config/rs6000/_divkc3.c  (revision 255288)
> > +++ libgcc/config/rs6000/_divkc3.c  (working copy)
> > @@ -37,6 +37,8 @@ typedef __complex float KCtype __attribu
> >  #define __divkc3 __divkc3_sw
> >  #endif
> >  
> > +extern KCtype __divkc3 (KFtype, KFtype, KFtype, KFtype);
> > +
> >  KCtype
> >  __divkc3 (KFtype a, KFtype b, KFtype c, KFtype d)
> >  {
> 
> How does this warn?  -Wmissing-declarations?  Should this declaration be
> in a header then?

The compiler creates the call to __mulkc3 and __divkc3, and internally it has
the appropriate prototype like it does for all built-in functions (in this
case, returning an _Float128 _Complex type, and taking 4 _Float128 arguments).

So before adding ifunc support, we never noticed it didn't have a prototype,
because the compiler already has a prototype.

With ifunc support, we now need to create two separate functions, __mulkc3_sw
and __mulkc3_hw, and make __mulkc3 the ifunc resolver.

So there really isn't an include file that is appropriate to put the
definitions in.  I could change it to use the soft-fp includes (including
quadmath-float128.h) if desired.

Did you want me to do that?
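
To make the situation concrete, here is a minimal sketch in plain C of why the explicit declaration is needed once the function is renamed.  It is an illustration only, not the libgcc code: my_mul, my_mul_sw, my_mul_hw and resolve_my_mul are hypothetical names, double stands in for the KFtype/KCtype types, and an ordinary function pointer stands in for the real ifunc mechanism.

```c
#include <assert.h>

/* Sketch only: every name below is a hypothetical stand-in.  */

/* In the software build of the file, the generic name is renamed.  */
#define my_mul my_mul_sw

/* After macro expansion this declares my_mul_sw.  The compiler has no
   built-in prototype under that name, so without this line a build with
   -Wmissing-prototypes would warn about the definition below.  */
extern double my_mul (double, double);

double
my_mul (double a, double b)
{
  return a * b;
}

#undef my_mul

/* A second variant, standing in for the hardware implementation.  */
extern double my_mul_hw (double, double);

double
my_mul_hw (double a, double b)
{
  return a * b;
}

typedef double (*my_mul_fn) (double, double);

/* Stand-in for the ifunc resolver: pick a variant based on a
   capability flag supplied by the caller.  */
extern my_mul_fn resolve_my_mul (int have_hw);

my_mul_fn
resolve_my_mul (int have_hw)
{
  assert (have_hw == 0 || have_hw == 1);
  return have_hw ? my_mul_hw : my_mul_sw;
}
```

The point of the sketch is only that the declaration and the definition look redundant in the source, but refer to a renamed symbol the compiler knows nothing about.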

> A code comment explaining why you do a declaration for exactly the same
> thing as there is two lines later would help; otherwise people will try
> to delete it again :-)

-- 
Michael Meissner, IBM
IBM, M/S 2506R, 550 King Street, Littleton, MA 01460-6245, USA
email: meiss...@linux.vnet.ibm.com, phone: +1 (978) 899-4797



Re: [PATCH] Fix Bug 83237 - Values returned by std::poisson_distribution are not distributed correctly

2017-12-11 Thread Michele Pezzutti

I apologize as I am unable to run the test suite in its full form.

Nevertheless, I locally tested the following patch for the test suite:

*26_numerics/random/poisson_distribution/operators/values.cc
    Add additional test to cover bin 'floor(mu) + 1'

by compiling locally values.cc and running it as main().


I had to increase N from the default value N = 10 to N = 400 to trigger the
fault.  I think that for smaller N, the deviation due to the bug is well
within the confidence interval set in testDiscreteDist().


The test case below is successful with the proposed patch.


diff --git a/values.cc b/values.cc
index 0039b7d..e12e54f 100644
--- a/values.cc
+++ b/values.cc
@@ -42,6 +42,10 @@ void test01()
   std::poisson_distribution<> pd3(30.0);
   auto bpd3 = std::bind(pd3, eng);
   testDiscreteDist(bpd3, [](int n) { return poisson_pdf(n, 30.0); } );
+
+  std::poisson_distribution<> pd4(37.17);
+  auto bpd4 = std::bind(pd4, eng);
+  testDiscreteDist<100, 400>(bpd4, [](int n) { return 
poisson_pdf(n, 37.17); } );

 }



On 12/11/2017 10:52 AM, Paolo Carlini wrote:

Hi,

On 10/12/2017 14:47, Michele Pezzutti wrote:

Hi.

This patch intends to fix Bug 83237 - Values returned by 
std::poisson_distribution are not distributed correctly.
See https://gcc.gnu.org/bugzilla/show_bug.cgi?id=83237 for issue 
description and tests.
In any case, the fix should come with a testcase, which would also 
validate the analysis. For this discrete distribution it should be pretty 
easy to add one, because all the infrastructure is already in place, 
essentially three lines added to 
26_numerics/random/poisson_distribution/operators/values.cc.


Paolo.




Re: [PATCH] have -Wnonnull print inlining stack (PR 83369)

2017-12-11 Thread David Malcolm
On Mon, 2017-12-11 at 09:51 -0700, Martin Sebor wrote:
> Bug 83369 - Missing diagnostics during inlining, notes that when
> -Wnonnull is issued for an inlined call to a built-in function,
> GCC doesn't print the inlining stack, making it hard to debug
> where the problem comes from.
> 
> When the -Wnonnull warning was introduced into the middle-end
> the diagnostic machinery provided no way to print the inlining
> stack (analogous to %K for trees).  Since then GCC has gained
> support for the %G directive which does just that.  The attached
> patch makes use of the directive to print the inlining context
> for -Wnonnull.
> 
> The patch doesn't include a test because the DejaGnu framework
> provides no mechanism to validate this part of GCC output (see
> also bug 83336).
> 
> Tested on x86_64-linux with no regressions.
> 
> Martin

I'm wondering if we should eliminate %K and %G altogether, and make
tree-diagnostic.c and friends automatically print the inlining stack
- they just need a location_t (the issue is with system headers, I
suppose, but maybe we can just make that smarter: perhaps only suppress
if every location in the chain is in a system header?).  I wonder if
that would be GCC 9 material at this point though?
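
As a rough sketch of the "only suppress if every location in the chain is in a system header" idea, under the assumption that the chain can be viewed as a flat array: struct inline_site and suppress_diagnostic_p are hypothetical names for illustration, and GCC's real representation (location_t plus the BLOCK tree) is different.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of one entry in an inlining chain.  */
struct inline_site
{
  const char *file;
  int line;
  bool in_system_header;
};

/* Return true if a diagnostic whose inlining chain is CHAIN (N entries)
   should be suppressed: every location in the chain must be in a system
   header.  An empty chain never suppresses anything.  */
bool
suppress_diagnostic_p (const struct inline_site *chain, size_t n)
{
  assert (chain != NULL || n == 0);
  for (size_t i = 0; i < n; i++)
    if (!chain[i].in_system_header)
      return false;
  return n > 0;
}
```

With a rule like this, a warning located inside a system header would still be shown as soon as one inlined-from site is in user code.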

Coming back to this patch: regarding tests, would you be able to use
the techniques of:
  https://gcc.gnu.org/ml/gcc-patches/2017-12/msg00646.html
to build a test case?

Dave


[PATCH] Handle LOOP_DIST_ALIAS ifns in move_sese_region_to_fn (PR tree-optimization/83359)

2017-12-11 Thread Jakub Jelinek
Hi!

Unlike LOOP_VECTORIZED ifns, LOOP_DIST_ALIAS is added by the ldist pass
and needs to be maintained until the vectorizer, and parloops in between
that.  Earlier I've added code to update or drop orig_loop_num during
move_sese_region_to_fn, but that is not sufficient.  If we move
the whole pair of loops with the associated LOOP_DIST_ALIAS call into
the outlined loopfn, we need to update the first argument, as orig_loop_num
is likely changing.  If the whole triplet (two loops with orig_loop_num
and LOOP_DIST_ALIAS with the same first argument) stays in parent function,
we don't need to adjust it.  In all other cases, this patch folds the
LOOP_DIST_ALIAS ifn to the second argument, like the vectorizer does if
it fails to vectorize it.
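
The three cases above can be summarized in a small decision function.  This is only a model of the description in this message, not the code in the patch: the enum, the function name and the boolean parameters are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* What to do with a LOOP_DIST_ALIAS call after outlining a region.  */
enum dist_alias_action
{
  DIST_ALIAS_KEEP,        /* Everything stays in the parent: no change.  */
  DIST_ALIAS_UPDATE_ARG,  /* Everything moved: renumber first argument.  */
  DIST_ALIAS_FOLD         /* Mixed: fold the call to its second argument.  */
};

/* Classify based on whether each member of the triplet (the two loops
   sharing orig_loop_num and the LOOP_DIST_ALIAS call) was moved into
   the outlined function.  */
enum dist_alias_action
classify_dist_alias (bool loop1_moved, bool loop2_moved, bool call_moved)
{
  if (loop1_moved && loop2_moved && call_moved)
    return DIST_ALIAS_UPDATE_ARG;
  if (!loop1_moved && !loop2_moved && !call_moved)
    return DIST_ALIAS_KEEP;
  return DIST_ALIAS_FOLD;
}
```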

Bootstrapped/regtested on x86_64-linux, i686-linux, powerpc64le-linux,
bootstrapped on powerpc64-linux, regtest there pending.  Ok for trunk?

2017-12-11  Jakub Jelinek  

PR tree-optimization/83359
* tree-cfg.h (fold_loop_internal_call): Declare.
* tree-vectorizer.c (fold_loop_internal_call): Moved to ...
* tree-cfg.c (fold_loop_internal_call): ... here.  No longer static.
(find_loop_dist_alias): New function.
(move_sese_region_to_fn): If any dloop->orig_loop_num value is
updated, also adjust any corresponding LOOP_DIST_ALIAS internal
calls.

* gcc.dg/graphite/pr83359.c: New test.

--- gcc/tree-cfg.h.jj   2017-09-05 23:28:14.0 +0200
+++ gcc/tree-cfg.h  2017-12-11 12:35:24.284777550 +0100
@@ -77,6 +77,7 @@ extern void gather_blocks_in_sese_region
  vec *bbs_p);
 extern void verify_sese (basic_block, basic_block, vec *);
 extern bool gather_ssa_name_hash_map_from (tree const &, tree const &, void *);
+extern void fold_loop_internal_call (gimple *, tree);
 extern basic_block move_sese_region_to_fn (struct function *, basic_block,
   basic_block, tree);
 extern void dump_function_to_file (tree, FILE *, dump_flags_t);
--- gcc/tree-vectorizer.c.jj2017-09-01 09:26:37.0 +0200
+++ gcc/tree-vectorizer.c   2017-12-11 12:33:41.436055580 +0100
@@ -464,27 +464,6 @@ vect_loop_vectorized_call (struct loop *
   return NULL;
 }
 
-/* Fold loop internal call G like IFN_LOOP_VECTORIZED/IFN_LOOP_DIST_ALIAS
-   to VALUE and update any immediate uses of it's LHS.  */
-
-static void
-fold_loop_internal_call (gimple *g, tree value)
-{
-  tree lhs = gimple_call_lhs (g);
-  use_operand_p use_p;
-  imm_use_iterator iter;
-  gimple *use_stmt;
-  gimple_stmt_iterator gsi = gsi_for_stmt (g);
-
-  update_call_from_tree (&gsi, value);
-  FOR_EACH_IMM_USE_STMT (use_stmt, iter, lhs)
-{
-  FOR_EACH_IMM_USE_ON_STMT (use_p, iter)
-   SET_USE (use_p, value);
-  update_stmt (use_stmt);
-}
-}
-
 /* If LOOP has been versioned during loop distribution, return the gurading
internal call.  */
 
--- gcc/tree-cfg.c.jj   2017-12-07 18:05:30.0 +0100
+++ gcc/tree-cfg.c  2017-12-11 12:34:55.054140750 +0100
@@ -7337,6 +7337,47 @@ gather_ssa_name_hash_map_from (tree cons
   return true;
 }
 
+/* Return LOOP_DIST_ALIAS call if present in BB.  */
+
+static gimple *
+find_loop_dist_alias (basic_block bb)
+{
+  gimple *g = last_stmt (bb);
+  if (g == NULL || gimple_code (g) != GIMPLE_COND)
+return NULL;
+
+  gimple_stmt_iterator gsi = gsi_for_stmt (g);
+  gsi_prev (&gsi);
+  if (gsi_end_p (gsi))
+return NULL;
+
+  g = gsi_stmt (gsi);
+  if (gimple_call_internal_p (g, IFN_LOOP_DIST_ALIAS))
+return g;
+  return NULL;
+}
+
+/* Fold loop internal call G like IFN_LOOP_VECTORIZED/IFN_LOOP_DIST_ALIAS
+   to VALUE and update any immediate uses of it's LHS.  */
+
+void
+fold_loop_internal_call (gimple *g, tree value)
+{
+  tree lhs = gimple_call_lhs (g);
+  use_operand_p use_p;
+  imm_use_iterator iter;
+  gimple *use_stmt;
+  gimple_stmt_iterator gsi = gsi_for_stmt (g);
+
+  update_call_from_tree (&gsi, value);
+  FOR_EACH_IMM_USE_STMT (use_stmt, iter, lhs)
+{
+  FOR_EACH_IMM_USE_ON_STMT (use_p, iter)
+   SET_USE (use_p, value);
+  update_stmt (use_stmt);
+}
+}
+
 /* Move a single-entry, single-exit region delimited by ENTRY_BB and
EXIT_BB to function DEST_CFUN.  The whole region is replaced by a
single basic block in the original CFG and the new basic block is
@@ -7510,7 +7551,6 @@ move_sese_region_to_fn (struct function
  }
 }
 
-
   /* Adjust the number of blocks in the tree root of the outlined part.  */
   get_loop (dest_cfun, 0)->num_nodes = bbs.length () + 2;
 
@@ -7521,19 +7561,77 @@ move_sese_region_to_fn (struct function
   /* Fix up orig_loop_num.  If the block referenced in it has been moved
  to dest_cfun, update orig_loop_num field, otherwise clear it.  */
   struct loop *dloop;
+  signed char *moved_orig_loop_num = NULL;
   FOR_EACH_LOOP_FN (dest_cfun, dloop, 0)
 if (dloop->orig_loop_num)
   {
+   if (moved_orig_loop_num == NULL)
+ mov

Re: [PATCH] ifcvt: Call fixup_partitions (PR83361)

2017-12-11 Thread Jeff Law
On 12/11/2017 08:49 AM, Segher Boessenkool wrote:
> After converting a conditional branch to an unconditional trap to a
> conditional trap, if the original trap is still reachable from another
> path, it may be that it is in a hot basic block  and only reachable from
> cold blocks.  Fix that up.
> 
> This fixes PR83361.  Bootstrapping on powerpc64-linux {-m32,-m64}; okay
> for trunk if it succeeds?
> 
> 
> Segher
> 
> 
> 2017-12-11  Segher Boessenkool  
> 
>   PR rtl-optimization/83361
>   * ifcvt.c (if_convert): Call fixup_partitions.
> 
> gcc/testsuite/
>   PR rtl-optimization/83361
>   * gcc.dg/pr83361.c: New testcase.
OK.
jeff




Re: Ping ^2 [PATCH], Add rounding built-ins to the _Float and _FloatX built-in functions

2017-12-11 Thread Michael Meissner
On Fri, Oct 27, 2017 at 06:39:21PM -0400, Michael Meissner wrote:
> The power9 (running PowerPC ISA 3.0) has a round to integer instruction
> (XSRQPI) that does various flavors of round an IEEE 128-bit floating point to
> integral values.  This patch adds the support to the machine independent
> portion of the compiler, and adds the necessary support for ceilf128,
> floorf128, truncf128, and roundf128 to the PowerPC backend when you use
> -mcpu=power9.
> 
> I have done bootstrap builds on both x86-64 and a little endian power8 system.
> Can I install these patches to the trunk?
> 
> [gcc]
> 2017-10-27  Michael Meissner  
> 
>   * builtins.def: (_Float and _FloatX BUILT_IN_CEIL): Add
>   _Float and _FloatX variants for rounding built-in
>   functions.
>   (_Float and _FloatX BUILT_IN_FLOOR): Likewise.
>   (_Float and _FloatX BUILT_IN_NEARBYINT): Likewise.
>   (_Float and _FloatX BUILT_IN_RINT): Likewise.
>   (_Float and _FloatX BUILT_IN_ROUND): Likewise.
>   (_Float and _FloatX BUILT_IN_TRUNC): Likewise.
>   * builtins.c (mathfn_built_in_2): Likewise.
>   * internal-fn.def (CEIL): Likewise.
>   (FLOOR): Likewise.
>   (NEARBYINT): Likewise.
>   (RINT): Likewise.
>   (ROUND): Likewise.
>   (TRUNC): Likewise.
>   * fold-const.c (tree_call_nonnegative_warnv_p): Likewise.
>   (integer_valued_real_call_p): Likewise.
>   * fold-const-call.c (fold_const_call_ss): Likewise.
>   * config/rs6000/rs6000.md (floor2): Add support for IEEE
>   128-bit round to integer instructions.
>   (ceil2): Likewise.
>   (btrunc2): Likewise.
>   (round2): Likewise.
> 
> [gcc/c]
> 2017-10-27  Michael Meissner  
> 
>   * c-decl.c (header_for_builtin_fn): Add integer rounding _Float
>   and _FloatX built-in functions.
> 
> [gcc/testsuite]
> 2017-10-27  Michael Meissner  
> 
>   * gcc.target/powerpc/float128-hw2.c: Add tests for ceilf128,
>   floorf128, truncf128, and roundf128.


Originally posted as:
https://gcc.gnu.org/ml/gcc-patches/2017-11/msg01421.html

I posted this in October and ping'ed it the first time in November.  Could a
global or gimple maintainer look at the patch and either approve it or tell me
what I need to do to improve it?  In theory it should be similar to my
previous patch to add square root, fma, and absolute value _Float and
_FloatX support to the infrastructure.

-- 
Michael Meissner, IBM
IBM, M/S 2506R, 550 King Street, Littleton, MA 01460-6245, USA
email: meiss...@linux.vnet.ibm.com, phone: +1 (978) 899-4797



Re: [PATCH] Allow USE in PARALLELs in store_data_bypass_p (take 2)

2017-12-11 Thread Eric Botcazou
> The old code was inconsistent, had return false; in one case and assert in
> the remaining two spots.  If you are not against it, I'd use return false;
> in both cases if we want consistency.

Sure, thanks.

-- 
Eric Botcazou


[PATCH] Fix result for conditional reductions matching at index 0 (PR tree-optimization/80631)

2017-12-11 Thread Jakub Jelinek
On Mon, Dec 11, 2017 at 06:00:11PM +0100, Kilian Verhetsel wrote:
> Jakub Jelinek  writes:
> > Of course it can be done efficiently, what we care most is that the body of
> > the vectorized loop is efficient.
> 
> That's fair, I was looking at the x86 assembly being generated when a single
> vectorized iteration was enough (because that is the context in which I
> first encountered this bug):
> 
> int f(unsigned int *x, unsigned int k) {
>   unsigned int result = 8;
>   for (unsigned int i = 0; i < 8; i++) {
> if (x[i] == k) result = i;
>   }
>   return result;
> }
> 
> where the vpand instruction this generates would have to be replaced
> with a variable blend if the default value weren't 0 — although I had
> not realized even SSE4.1 on x86 includes such an instruction, making
> this point less relevant.

So, here is my version of the patch, independent from your change.
As I said, your patch is still highly valuable if it will be another
STMT_VINFO_VEC_REDUCTION_TYPE kind to be used for the cases like the
above testcase, where base is equal to TYPE_MIN_VALUE, or future improvement
of base being variable, but TYPE_OVERFLOW_UNDEFINED iterator, where all we
need is that the maximum number of iterations is smaller than the maximum
of the type we use for the reduction phi.

The patch handles also negative steps, though for now only on signed type
(for unsigned it can't be really negative, but perhaps we could treat
unsigned values with the msb set as if they were negative and consider
overflows in that direction).

Bootstrapped/regtested on x86_64-linux, i686-linux, powerpc64le-linux,
bootstrapped on powerpc64-linux, regtest there ongoing.  Ok for trunk?

The patch prefers to emit what we were emitting if possible (i.e. zero
value for the COND_EXPR never hit) - building a zero vector is usually
cheaper than any other; if that is not possible, checks if initial_def
can be used for that value - then we can avoid the
res == induc_val ? initial_def : res;
conditional move; if even that is not possible, attempts to use any other
value.  If no value can be found, it for now uses COND_REDUCTION, which is
more expensive, but correct.
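
For readers following along, here is a scalar model of the problem and of the role of the "never hit" value.  It is a sketch under assumptions, not the vectorizer's code: f_scalar is the loop from the message quoted above, while f_modelled and its sentinel parameter are hypothetical names that imitate the MAX_EXPR reduction followed by the final res == induc_val ? initial_def : res step.

```c
#include <assert.h>

/* Scalar reference: the loop from the quoted message.  */
unsigned int
f_scalar (const unsigned int *x, unsigned int k)
{
  unsigned int result = 8;
  for (unsigned int i = 0; i < 8; i++)
    if (x[i] == k)
      result = i;
  return result;
}

/* Model of the reduction: each iteration contributes either its index
   or SENTINEL, contributions are combined with MAX, and the epilogue
   maps SENTINEL back to the initial value.  SENTINEL must not be larger
   than the smallest index (0).  */
unsigned int
f_modelled (const unsigned int *x, unsigned int k, int sentinel)
{
  const int initial = 8;
  int res = sentinel;

  assert (sentinel <= 0);
  for (int i = 0; i < 8; i++)
    {
      int contribution = (x[i] == k) ? i : sentinel;
      if (contribution > res)
	res = contribution;
    }
  return (unsigned int) (res == sentinel ? initial : res);
}
```

With a sentinel of 0, a match at index 0 cannot be told apart from no match at all, so the model returns 8 where the scalar loop returns 0.  A sentinel strictly smaller than every index (for example -1) avoids that.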

2017-12-11  Jakub Jelinek  

PR tree-optimization/80631
* tree-vect-loop.c (get_initial_def_for_reduction): Fix comment typo.
(vect_create_epilog_for_reduction): Add INDUC_VAL and INDUC_CODE
arguments, for INTEGER_INDUC_COND_REDUCTION use INDUC_VAL instead of
hardcoding zero as the value if COND_EXPR is never true.  For
INTEGER_INDUC_COND_REDUCTION don't emit the final COND_EXPR if
INDUC_VAL is equal to INITIAL_DEF, and use INDUC_CODE instead of
hardcoding MAX_EXPR as the reduction operation.
(is_nonwrapping_integer_induction): Allow negative step.
(vectorizable_reduction): Compute INDUC_VAL and INDUC_CODE for
vect_create_epilog_for_reduction, if no value is suitable, don't
use INTEGER_INDUC_COND_REDUCTION for now.  Formatting fixes.

* gcc.dg/vect/pr80631-1.c: New test.
* gcc.dg/vect/pr80631-2.c: New test.
* gcc.dg/vect/pr65947-13.c: Expect integer induc cond reduction
vectorization.

--- gcc/tree-vect-loop.c.jj 2017-12-11 14:57:38.0 +0100
+++ gcc/tree-vect-loop.c2017-12-11 16:59:06.930720928 +0100
@@ -4034,7 +4034,7 @@ get_initial_def_for_reduction (gimple *s
 case MULT_EXPR:
 case BIT_AND_EXPR:
   {
-/* ADJUSMENT_DEF is NULL when called from
+/* ADJUSTMENT_DEF is NULL when called from
vect_create_epilog_for_reduction to vectorize double reduction.  */
 if (adjustment_def)
  *adjustment_def = init_val;
@@ -4283,6 +4283,11 @@ get_initial_defs_for_reduction (slp_tree
DOUBLE_REDUC is TRUE if double reduction phi nodes should be handled.
SLP_NODE is an SLP node containing a group of reduction statements. The 
  first one in this group is STMT.
+   INDUC_VAL is for INTEGER_INDUC_COND_REDUCTION the value to use for the case
+ when the COND_EXPR is never true in the loop.  For MAX_EXPR, it needs to
+ be smaller than any value of the IV in the loop, for MIN_EXPR larger than
+ any value of the IV in the loop.
+   INDUC_CODE is the code for epilog reduction if INTEGER_INDUC_COND_REDUCTION.
 
This function:
1. Creates the reduction def-use cycles: sets the arguments for 
@@ -4330,7 +4335,8 @@ vect_create_epilog_for_reduction (vec reduction_phis,
   bool double_reduc, 
  slp_tree slp_node,
- slp_instance slp_node_instance)
+ slp_instance slp_node_instance,
+ tree induc_val, enum tree_code induc_code)
 {
   stmt_vec_info stmt_info = vinfo_for_stmt (stmt);
   stmt_vec_info prev_phi_info;
@@ -4419,6 +4425,18 @@ vect_create_epilog_for_reduction (vec (phi), zero_vec,
+

Re: [Patch combine] Don't create vector mode ZERO_EXTEND from subregs

2017-12-11 Thread Jeff Law
On 12/11/2017 07:18 AM, James Greenhalgh wrote:
> 
> Hi,
> 
> In simplify_set we try transforming the paradoxical subreg expression:
> 
>   (set FOO (subreg:M (mem:N BAR) 0))
> 
> in to:
> 
>   (set FOO (zero_extend:M (mem:N BAR)))
> 
> However, this code does not consider the case where M is a vector
> mode, allowing it to construct (for example):
> 
>   (zero_extend:V4SI (mem:SI))
> 
> This would clearly have the wrong semantics, but fortunately we fail long
> before then in expand_compound_operation, as we really don't want a vector
> zero_extend of a scalar value.
> 
> We need to explicitly reject vector modes from this transformation.
> 
> This fixes a failure I'm seeing on a branch in which I'm trying to
> tackle some performance regressions, so I have no live testcase for
> this, but it is wrong by observation.
> 
> Tested on aarch64-none-elf and bootstrapped on aarch64-none-linux-gnu with
> no issues.
> 
> OK?
> 
> Thanks,
> James
> 
> ---
> 2017-12-11  James Greenhalgh  
> 
>   * combine.c (simplify_set): Do not transform subregs to zero_extends
>   if the destination mode is a vector mode.
> 
OK.  Ideally you'd have a test for the testsuite as well, but I won't
stress without it :-)


jeff


[PATCH, rs6000] Fix PR83332 (missing vcond patterns)

2017-12-11 Thread Bill Schmidt
Hi,

A new test case introduced for PR81303 failed on powerpc64 (BE, LE).  This
turns out to be due to a missing standard pattern (vcondv2div2df).  This
and a couple of other patterns are easy to support with existing logic
by just adding new patterns with appropriate modes.  That's all this patch
does.  That's sufficient to cause the failing test to pass.

Bootstrapped and tested on powerpc64le-linux-gnu with no regressions.  Is
this okay for trunk?

Thanks,
Bill


2017-12-11  Bill Schmidt  

PR target/83332
* config/rs6000/vector.md (vcondv2dfv2di): New define_expand.
(vcondv2div2df): Likewise.
(vconduv2dfv2di): Likewise.

Index: gcc/config/rs6000/vector.md
===
--- gcc/config/rs6000/vector.md (revision 255539)
+++ gcc/config/rs6000/vector.md (working copy)
@@ -455,6 +455,44 @@
 FAIL;
 }")
 
+(define_expand "vcondv2dfv2di"
+  [(set (match_operand:V2DF 0 "vfloat_operand" "")
+   (if_then_else:V2DF
+(match_operator 3 "comparison_operator"
+[(match_operand:V2DI 4 "vint_operand" "")
+ (match_operand:V2DI 5 "vint_operand" "")])
+(match_operand:V2DF 1 "vfloat_operand" "")
+(match_operand:V2DF 2 "vfloat_operand" "")))]
+  "VECTOR_UNIT_ALTIVEC_OR_VSX_P (V2DFmode)
+   && VECTOR_UNIT_ALTIVEC_OR_VSX_P (V2DImode)"
+  "
+{
+  if (rs6000_emit_vector_cond_expr (operands[0], operands[1], operands[2],
+   operands[3], operands[4], operands[5]))
+DONE;
+  else
+FAIL;
+}")
+
+(define_expand "vcondv2div2df"
+  [(set (match_operand:V2DI 0 "vint_operand" "")
+   (if_then_else:V2DI
+(match_operator 3 "comparison_operator"
+[(match_operand:V2DF 4 "vfloat_operand" "")
+ (match_operand:V2DF 5 "vfloat_operand" "")])
+(match_operand:V2DI 1 "vint_operand" "")
+(match_operand:V2DI 2 "vint_operand" "")))]
+  "VECTOR_UNIT_ALTIVEC_OR_VSX_P (V2DFmode)
+   && VECTOR_UNIT_ALTIVEC_OR_VSX_P (V2DImode)"
+  "
+{
+  if (rs6000_emit_vector_cond_expr (operands[0], operands[1], operands[2],
+   operands[3], operands[4], operands[5]))
+DONE;
+  else
+FAIL;
+}")
+
 (define_expand "vcondu"
   [(set (match_operand:VEC_I 0 "vint_operand")
(if_then_else:VEC_I
@@ -492,6 +530,25 @@
 FAIL;
 }")
 
+(define_expand "vconduv2dfv2di"
+  [(set (match_operand:V2DF 0 "vfloat_operand" "")
+   (if_then_else:V2DF
+(match_operator 3 "comparison_operator"
+[(match_operand:V2DI 4 "vint_operand" "")
+ (match_operand:V2DI 5 "vint_operand" "")])
+(match_operand:V2DF 1 "vfloat_operand" "")
+(match_operand:V2DF 2 "vfloat_operand" "")))]
+  "VECTOR_UNIT_ALTIVEC_OR_VSX_P (V2DFmode)
+   && VECTOR_UNIT_ALTIVEC_OR_VSX_P (V2DImode)"
+  "
+{
+  if (rs6000_emit_vector_cond_expr (operands[0], operands[1], operands[2],
+   operands[3], operands[4], operands[5]))
+DONE;
+  else
+FAIL;
+}")
+
 (define_expand "vector_eq"
   [(set (match_operand:VEC_C 0 "vlogical_operand" "")
(eq:VEC_C (match_operand:VEC_C 1 "vlogical_operand" "")



Re: [PATCH] [GOLD] Add plugin API for processing plugin-added input files

2017-12-11 Thread Sriraman Tallam via gcc-patches
On Thu, Nov 9, 2017 at 9:04 PM, Cary Coutant  wrote:
>> include/ChangeLog:
>> 2017-11-09  Stephen Crane  
>>
>> * plugin-api.h: Add new plugin hook to allow processing of input
>> files added by a plugin.
>> (ld_plugin_new_input_handler): New function hook type.
>> (ld_plugin_register_new_input): New interface.
>> (LDPT_REGISTER_NEW_INPUT_HOOK): New enum val.
>> (tv_register_new_input): New member.
>>
>>
>> gold/ChangeLog:
>> 2017-11-09  Stephen Crane  
>>
>> * plugin.cc (Plugin::load): Include hooks for register_new_input
>> in transfer vector.
>> (Plugin::new_input): New function.
>> (register_new_input): New function.
>> (Plugin_manager::claim_file): Call Plugin::new_input if in
>> replacement phase.
>> * plugin.h (Plugin::set_new_input_handler): New function.
>> * testsuite/plugin_new_section_layout.c: New plugin to test
>> new_input plugin API.
>> * testsuite/plugin_final_layout.sh: Add new input test.
>> * testsuite/Makefile.am (plugin_layout_new_file): New test case.
>> * testsuite/Makefile.in: Regenerate.
>
> These are OK. Thanks!
>
> Sri, I'm out of town through 11/18, and won't be able to commit the
> include/ patch to GCC before Stage 1 ends. Can you take care of it?
> (If not, I'll take care of it when I get back -- it was approved
> during Stage 1, so I think it's OK to commit early in Stage 3,
> especially since it's nothing but new declarations.)

Stephen, I was looking at binutils and realized this patch has not
been committed yet.  I only committed the GCC portion, plugin-api.h.

Thanks
Sri

>
> -cary


Re: [PATCH] [GOLD] Add plugin API for processing plugin-added input files

2017-12-11 Thread Stephen Crane
Thanks for committing the GCC portion and following up on this. I had
been meaning to write and ask. I don't have commit privs for binutils,
so either you or Cary will have to commit the binutils patch as well,
if it's not too much trouble. I think much has changed to need a
rebase?

Thanks,
Stephen

On Mon, Dec 11, 2017 at 2:10 PM, Sriraman Tallam  wrote:
> On Thu, Nov 9, 2017 at 9:04 PM, Cary Coutant  wrote:
>>> include/ChangeLog:
>>> 2017-11-09  Stephen Crane  
>>>
>>> * plugin-api.h: Add new plugin hook to allow processing of input
>>> files added by a plugin.
>>> (ld_plugin_new_input_handler): New function hook type.
>>> (ld_plugin_register_new_input): New interface.
>>> (LDPT_REGISTER_NEW_INPUT_HOOK): New enum val.
>>> (tv_register_new_input): New member.
>>>
>>>
>>> gold/ChangeLog:
>>> 2017-11-09  Stephen Crane  
>>>
>>> * plugin.cc (Plugin::load): Include hooks for register_new_input
>>> in transfer vector.
>>> (Plugin::new_input): New function.
>>> (register_new_input): New function.
>>> (Plugin_manager::claim_file): Call Plugin::new_input if in
>>> replacement phase.
>>> * plugin.h (Plugin::set_new_input_handler): New function.
>>> * testsuite/plugin_new_section_layout.c: New plugin to test
>>> new_input plugin API.
>>> * testsuite/plugin_final_layout.sh: Add new input test.
>>> * testsuite/Makefile.am (plugin_layout_new_file): New test case.
>>> * testsuite/Makefile.in: Regenerate.
>>
>> These are OK. Thanks!
>>
>> Sri, I'm out of town through 11/18, and won't be able to commit the
>> include/ patch to GCC before Stage 1 ends. Can you take care of it?
>> (If not, I'll take care of it when I get back -- it was approved
>> during Stage 1, so I think it's OK to commit early in Stage 3,
>> especially since it's nothing but new declarations.)
>
> Stephen, I was looking at binutils and realized this patch has not
> been committed yet.  I only committed the GCC portion, plugin-api.h.
>
> Thanks
> Sri
>
>>
>> -cary


Re: [PATCH] Fix Bug 83237 - Values returned by std::poisson_distribution are not distributed correctly

2017-12-11 Thread Michele Pezzutti

I lowered N to 250 and the test still fails by a good margin.


On 12/11/2017 09:58 PM, Michele Pezzutti wrote:

I apologize as I am unable to run the test suite in its full form.

Nevertheless, I locally tested the following patch for the test suite:

* 26_numerics/random/poisson_distribution/operators/values.cc:
    Add additional test to cover bin 'floor(mu) + 1'.

by compiling locally values.cc and running it as main().


I had to increase N from the default value N = 10 to N = 400 to trigger
the fault. I think that for smaller N, the deviation due to the bug is
well within the confidence interval set in testDiscreteDist().


The test case below is successful with the proposed patch.


diff --git a/values.cc b/values.cc
index 0039b7d..e12e54f 100644
--- a/values.cc
+++ b/values.cc
@@ -42,6 +42,10 @@ void test01()
   std::poisson_distribution<> pd3(30.0);
   auto bpd3 = std::bind(pd3, eng);
   testDiscreteDist(bpd3, [](int n) { return poisson_pdf(n, 30.0); } );
+
+  std::poisson_distribution<> pd4(37.17);
+  auto bpd4 = std::bind(pd4, eng);
+  testDiscreteDist<100, 400>(bpd4, [](int n) { return poisson_pdf(n, 37.17); } );

 }
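
As a rough cross-check of why the bin 'floor(mu) + 1' is the interesting
one for mu = 37.17, the expected Poisson probabilities can be sketched in
plain C. This is only an illustration under stated assumptions:
poisson_pmf below is a hypothetical helper, not the testsuite's
poisson_pdf, and it normalizes over a truncated range instead of calling
exp(), so it is only suitable for moderate mu.

```c
/* Hypothetical helper (an assumption for illustration, not the
   testsuite's poisson_pdf): Poisson probability P(X = n) for mean MU.

   The unnormalized weights w(k) = mu^k / k! are built with the
   recurrence w(k) = w(k-1) * mu / k and divided by their sum over
   0..kmax.  This assumes MU is moderate (well below ~700) so that the
   sum, which approximates exp(mu), fits in a double.  */
double
poisson_pmf (int n, double mu)
{
  if (n < 0 || mu <= 0.0)
    return 0.0;

  /* Truncation point: far enough into the tail that the remaining
     weights are negligible, and never below N itself.  */
  int kmax = (int) (4.0 * mu) + 64;
  if (kmax < n)
    kmax = n;

  double w = 1.0;                    /* w(0) */
  double sum = 1.0;
  double wn = (n == 0) ? 1.0 : 0.0;  /* w(n), once reached */
  for (int k = 1; k <= kmax; k++)
    {
      w *= mu / k;
      sum += w;
      if (k == n)
        wn = w;
    }
  return wn / sum;
}
```

With mu = 37.17 the ratio of consecutive probabilities is mu / (n + 1),
so the values rise up to n = 37 = floor(mu) and fall from n = 38 on.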



On 12/11/2017 10:52 AM, Paolo Carlini wrote:

Hi,

On 10/12/2017 14:47, Michele Pezzutti wrote:

Hi.

This patch intends to fix Bug 83237 - Values returned by 
std::poisson_distribution are not distributed correctly.
See https://gcc.gnu.org/bugzilla/show_bug.cgi?id=83237 for issue 
description and tests.

In any case, the fix should come with a testcase, which would also 
validate the analysis. For this discrete distribution it should be 
pretty easy to add one, because all the infrastructure is already in 
place: essentially three lines added to 
26_numerics/random/poisson_distribution/operators/values.cc.


Paolo.






Re: [PATCH] have -Wnonnull print inlining stack (PR 83369)

2017-12-11 Thread Martin Sebor

On 12/11/2017 02:08 PM, David Malcolm wrote:

On Mon, 2017-12-11 at 09:51 -0700, Martin Sebor wrote:

Bug 83369 - Missing diagnostics during inlining, notes that when
-Wnonnull is issued for an inlined call to a built-in function,
GCC doesn't print the inlining stack, making it hard to debug
where the problem comes from.

When the -Wnonnull warning was introduced into the middle-end
the diagnostic machinery provided no way to print the inlining
stack (analogous to %K for trees).  Since then GCC has gained
support for the %G directive which does just that.  The attached
patch makes use of the directive to print the inlining context
for -Wnonnull.

The patch doesn't include a test because the DejaGnu framework
provides no mechanism to validate this part of GCC output (see
also bug 83336).

Tested on x86_64-linux with no regressions.

Martin


I'm wondering if we should eliminate %K and %G altogether, and make
tree-diagnostic.c and friends automatically print the inlining stack
- they just need a location_t (the issue is with system headers, I
suppose, but maybe we can just make that smarter: perhaps only suppress
if every location in the chain is in a system header?).  I wonder if
that would be GCC 9 material at this point though?


Getting rid of %G and %K sounds fine to me.  I can't think of
a use case for suppressing middle end diagnostics in system
headers so unless someone else can it might be a non-issue.
Since the change would fix a known bug it seems to me that it
should be acceptable even at this stage.


Coming back to this patch: regarding tests, would you be able to use
the techniques of:
  https://gcc.gnu.org/ml/gcc-patches/2017-12/msg00646.html
to build a test case?


I think so.  If I'm reading it right, it depends on the prune.exp
changes and could be as simple as
testsuite/gcc.dg/plugin/diagnostic-test-inlining-4.c, right?

Does it need to be in the plugin directory or can any test
use this approach?

Martin


Re: [PATCH] C++: avoid most reserved words as misspelling suggestions (PR c++/81610 and PR c++/80567)

2017-12-11 Thread Jason Merrill
On Wed, Nov 22, 2017 at 10:36 AM, David Malcolm  wrote:
> PR c++/81610 and PR c++/80567 report problems where the C++ frontend
> suggested "if", "for" and "else" as corrections for misspelled variable
> names.

Hmm, what about cases where people are actually misspelling keywords?
Don't we want to handle that?

fi (true) { }
retrun 42;

In the PRs you mention, the actual identifiers are 1) missing
includes, which we should check first, and 2) pretty far from the
suggested keywords.

Jason
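
To make "pretty far from the suggested keywords" concrete, here is a
minimal sketch of a plain Levenshtein edit distance. It is an
illustration only: edit_distance and MAX_LEN are hypothetical names, and
the metric and cutoff GCC's spell checker actually uses may differ.

```c
#include <stddef.h>
#include <string.h>

enum { MAX_LEN = 64 };  /* illustrative limit on the second string */

/* Plain Levenshtein distance between A and B: the minimum number of
   single-character insertions, deletions and substitutions needed to
   turn A into B.  Returns (size_t) -1 if B is too long for the
   fixed-size row buffer.  */
size_t
edit_distance (const char *a, const char *b)
{
  size_t la = strlen (a);
  size_t lb = strlen (b);
  size_t row[MAX_LEN + 1];
  size_t i, j;

  if (lb > MAX_LEN)
    return (size_t) -1;

  /* Distance from the empty prefix of A to each prefix of B.  */
  for (j = 0; j <= lb; j++)
    row[j] = j;

  for (i = 1; i <= la; i++)
    {
      size_t diag = row[0];     /* value from the previous row, column j-1 */
      row[0] = i;
      for (j = 1; j <= lb; j++)
        {
          size_t up = row[j];   /* value from the previous row, column j */
          size_t best = diag + (a[i - 1] == b[j - 1] ? 0 : 1);
          if (up + 1 < best)
            best = up + 1;          /* delete a[i-1] */
          if (row[j - 1] + 1 < best)
            best = row[j - 1] + 1;  /* insert b[j-1] */
          row[j] = best;
          diag = up;
        }
    }
  return row[lb];
}
```

Under this metric "fi" is 2 edits from "if" and "retrun" is 2 edits from
"return", since a transposition counts as two substitutions.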


Re: [PING 2][PATCH] enhance -Wrestrict to handle string built-ins (PR 78918)

2017-12-11 Thread Jeff Law
On 12/08/2017 12:19 PM, Martin Sebor wrote:
> Attached is revision 8 of the patch with the changes suggested
> and/or requested below.

[ Big snip. ]

> 
> 
> gcc-78918.diff
> 
> 
> PR tree-optimization/78918 - missing -Wrestrict on memcpy copying over self
> 
> gcc/c-family/ChangeLog:
> 
>   PR tree-optimization/78918
>   * c-common.c (check_function_restrict): Avoid checking built-ins.
>   * c.opt (-Wrestrict): Include in -Wall.
> 
> gcc/ChangeLog:
> 
>   PR tree-optimization/78918
>   * Makefile.in (OBJS): Add gimple-ssa-warn-restrict.o.
>   * builtins.c (check_sizes): Rename...
>   (check_access): ...to this.  Rename function arguments for clarity.
>   (check_memop_sizes): Adjust names.
>   (expand_builtin_memchr, expand_builtin_memcpy): Same.
>   (expand_builtin_memmove, expand_builtin_mempcpy): Same.
>   (expand_builtin_strcat, expand_builtin_stpncpy): Same.
>   (check_strncat_sizes, expand_builtin_strncat): Same.
>   (expand_builtin_strncpy, expand_builtin_memset): Same.
>   (expand_builtin_bzero, expand_builtin_memcmp): Same.
>   (expand_builtin_memory_chk, maybe_emit_chk_warning): Same.
>   (maybe_emit_sprintf_chk_warning): Same.
>   (expand_builtin_strcpy): Adjust.
>   (expand_builtin_stpcpy): Same.
>   (expand_builtin_with_bounds): Detect out-of-bounds accesses
>   in pointer-checking forms of memcpy, memmove, and mempcpy.
>   (gcall_to_tree_minimal, max_object_size): Define new functions.
>   * builtins.h (max_object_size): Declare.
>   * calls.c (alloc_max_size): Call max_object_size instead of
>   hardcoding ssizetype limit.
>   (get_size_range): Handle new argument.
>   * calls.h (get_size_range): Add a new argument.
>   * cfgexpand.c (expand_call_stmt): Propagate no-warning bit.
>   * doc/invoke.texi (-Wrestrict): Adjust, add example.
>   * gimple-fold.c (gimple_fold_builtin_memory_op): Detect overlapping
>   operations.
>   (gimple_fold_builtin_memory_chk): Same.
>   (gimple_fold_builtin_stxcpy_chk): New function.
>   * gimple-ssa-warn-restrict.c: New source.
>   * gimple-ssa-warn-restrict.h: New header.
>   * gimple.c (gimple_build_call_from_tree): Propagate location.
>   * passes.def (pass_warn_restrict): Add new pass.
>   * tree-pass.h (make_pass_warn_restrict): Declare.
>   * tree-ssa-strlen.c (handle_builtin_strcpy): Detect overlapping
>   operations.
>   (handle_builtin_strcat): Same.
>   (strlen_optimize_stmt): Rename...
>   (strlen_check_and_optimize_stmt): ...to this.  Handle strncat,
>   stpncpy, strncpy, and their checking forms.
> 
> gcc/testsuite/ChangeLog:
> 
>   PR tree-optimization/78918
>   * c-c++-common/Warray-bounds.c: New test.
>   * c-c++-common/Warray-bounds-2.c: New test.
>   * c-c++-common/Warray-bounds-3.c: New test.
>   * c-c++-common/Wrestrict-2.c: New test.
>   * c-c++-common/Wrestrict.c: New test.
>   * c-c++-common/Wrestrict.s: New test.
>   * c-c++-common/Wsizeof-pointer-memaccess1.c: Adjust
>   * c-c++-common/Wsizeof-pointer-memaccess2.c: Same.
>   * g++.dg/torture/Wsizeof-pointer-memaccess1.C: Same.
>   * g++.dg/torture/Wsizeof-pointer-memaccess2.C: Same.
>   * gcc.dg/memcpy-6.c: New test.
>   * gcc.dg/pr69172.c: Adjust.
>   * gcc.dg/pr79223.c: Same.
>   * gcc.dg/Wrestrict-2.c: New test.
>   * gcc.dg/Wrestrict.c: New test.
>   * gcc.dg/Wsizeof-pointer-memaccess1.c
>   * gcc.target/i386/chkp-stropt-17.c: New test.
>   * gcc.dg/torture/Wsizeof-pointer-memaccess1.c: Adjust.
OK.  Thanks for your patience.  I know this was a ton of work and even
more waiting.

jeff
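
For readers who have not followed the PR, the kind of call the title
refers to can be sketched as follows. This is an illustration, not code
from the patch; shift_left is a hypothetical function.

```c
#include <stddef.h>
#include <string.h>

/* Shift the first N bytes of BUF left by one position, in place, and
   return BUF.  The last byte keeps its old value.

   Source and destination overlap here, so memcpy (buf, buf + 1, n - 1)
   would be undefined; that is the sort of call a -Wrestrict check on
   string built-ins is meant to flag.  memmove is defined for
   overlapping regions.  */
char *
shift_left (char *buf, size_t n)
{
  if (n > 1)
    memmove (buf, buf + 1, n - 1);
  return buf;
}
```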


Re: [PATCH] [GOLD] Add plugin API for processing plugin-added input files

2017-12-11 Thread Sriraman Tallam via gcc-patches
On Mon, Dec 11, 2017 at 2:16 PM, Stephen Crane  wrote:
> Thanks for committing the GCC portion and following up on this. I had
> been meaning to write and ask. I don't have commit privs for binutils,
> so either you or Cary will have to commit the binutils patch as well,
> if it's not too much trouble. I think much has changed to need a
> rebase?

Ok, let me apply your patch.  I will get back if there are inconsistencies.

Thanks
Sri

>
> Thanks,
> Stephen
>
> On Mon, Dec 11, 2017 at 2:10 PM, Sriraman Tallam  wrote:
>> On Thu, Nov 9, 2017 at 9:04 PM, Cary Coutant  wrote:
>>>> include/ChangeLog:
>>>> 2017-11-09  Stephen Crane
>>>>
>>>> * plugin-api.h: Add new plugin hook to allow processing of input
>>>> files added by a plugin.
>>>> (ld_plugin_new_input_handler): New function hook type.
>>>> (ld_plugin_register_new_input): New interface.
>>>> (LDPT_REGISTER_NEW_INPUT_HOOK): New enum val.
>>>> (tv_register_new_input): New member.
>>>>
>>>>
>>>> gold/ChangeLog:
>>>> 2017-11-09  Stephen Crane
>>>>
>>>> * plugin.cc (Plugin::load): Include hooks for register_new_input
>>>> in transfer vector.
>>>> (Plugin::new_input): New function.
>>>> (register_new_input): New function.
>>>> (Plugin_manager::claim_file): Call Plugin::new_input if in
>>>> replacement phase.
>>>> * plugin.h (Plugin::set_new_input_handler): New function.
>>>> * testsuite/plugin_new_section_layout.c: New plugin to test
>>>> new_input plugin API.
>>>> * testsuite/plugin_final_layout.sh: Add new input test.
>>>> * testsuite/Makefile.am (plugin_layout_new_file): New test case.
>>>> * testsuite/Makefile.in: Regenerate.
>>>
>>> These are OK. Thanks!
>>>
>>> Sri, I'm out of town through 11/18, and won't be able to commit the
>>> include/ patch to GCC before Stage 1 ends. Can you take care of it?
>>> (If not, I'll take care of it when I get back -- it was approved
>>> during Stage 1, so I think it's OK to commit early in Stage 3,
>>> especially since it's nothing but new declarations.)
>>
>> Stephen, I was looking at binutils and realized this patch has not
>> been committed yet.  I only committed the GCC portion, plugin-api.h.
>>
>> Thanks
>> Sri
>>
>>>
>>> -cary


Re: C PATCH for c/82679 (rejects-valid with _Atomic and arrays)

2017-12-11 Thread Jeff Law
On 12/10/2017 10:48 AM, Marek Polacek wrote:
> We were wrongly rejecting code in the attached test because the check
> in grokdeclarator is wrong: we only want to check whether the user is
> trying to apply _Atomic to an array type, i.e. this:
> 
> typedef int T[10];
> _Atomic T a;
> 
> Bootstrapped/regtested on x86_64-linux, ok for trunk?
> 
> 2017-12-10  Marek Polacek  
> 
>   PR c/82679
>   * c-decl.c (grokdeclarator): Check declspecs instead of atomicp.
> 
>   * gcc.dg/c11-atomic-5.c: New test.
OK.
jeff
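
As a small illustration of the distinction (not the test from the patch):
C11 does not allow _Atomic to qualify an array type, but an array whose
elements are atomic is fine. The names counters and bump below are
hypothetical.

```c
/* Valid: an array of four atomic ints.  */
static _Atomic int counters[4];

/* Not valid C11, because _Atomic would qualify the array type itself:

     typedef int T[10];
     _Atomic T a;
*/

/* Atomically increment slot I and return its new value.  */
int
bump (int i)
{
  return ++counters[i];
}
```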


[PATCH] avoid false negatives in attr-nonstring-3.c (PR 83131)

2017-12-11 Thread Martin Sebor

The attr-nonstring-3.c test fails on targets that expand
the calls to some of the tested string functions in builtins.c,
before they reach the checker in calls.c.  The failures were
reported on powerpc64le but tests can be constructed that fail
even on other targets (including x86_64).

To fix these failures the checker needs to be invoked both
in builtins.c when the expansion takes place and in calls.c
otherwise.

The attached patch does that.  Since it also adjusts
the indentation in the changed functions, I used diff -w
to leave the whitespace changes out of it.

Bootstrapped and tested on x86_64-linux.  I verified the tests
pass using a powerpc64le-linux cross-compiler.

Martin
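
For context, a minimal sketch of what the nonstring attribute is for.
This is an illustration only, not part of the patch or the new test;
struct record and tag_length are hypothetical, and the attribute is
assumed to be supported by the compiler in use.

```c
#include <stddef.h>
#include <string.h>

/* A fixed-width field that is not necessarily nul-terminated.  */
struct record
{
  char tag[4] __attribute__ ((nonstring));
};

/* Length of the tag, bounded by the size of the field.

   Passing r->tag to strlen or strcmp instead would read past the
   array when all four bytes are in use; such calls are what the
   nonstring checker is meant to diagnose.  */
size_t
tag_length (const struct record *r)
{
  const char *nul = memchr (r->tag, '\0', sizeof r->tag);
  return nul ? (size_t) (nul - r->tag) : sizeof r->tag;
}
```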
PR testsuite/83131 - c-c++/common/attr-nonstring-3 failure for strcmp tests on 
PowerPC

gcc/ChangeLog:

PR testsuite/83131
* builtins.c (expand_builtin_strlen): Use get_callee_fndecl.
(expand_builtin_strcmp): Call maybe_warn_nonstring_arg. 
(expand_builtin_strncmp): Same.

gcc/testsuite/ChangeLog:

PR testsuite/83131
* c-c++-common/attr-nonstring-4.c: New test.

diff --git a/gcc/builtins.c b/gcc/builtins.c
index 6b25253..79616df 100644
--- a/gcc/builtins.c
+++ b/gcc/builtins.c
@@ -2819,8 +2819,7 @@ expand_builtin_strlen (tree exp, rtx target,
 {
   if (!validate_arglist (exp, POINTER_TYPE, VOID_TYPE))
 return NULL_RTX;
-  else
-{
+
   struct expand_operand ops[4];
   rtx pat;
   tree len;
@@ -2883,7 +2882,7 @@ expand_builtin_strlen (tree exp, rtx target,
   /* Check to see if the argument was declared attribute nonstring
  and if so, issue a warning since at this point it's not known
  to be nul-terminated.  */
-  maybe_warn_nonstring_arg (TREE_OPERAND (CALL_EXPR_FN (exp), 0), exp);
+  maybe_warn_nonstring_arg (get_callee_fndecl (exp), exp);
 
   /* Now that we are assured of success, expand the source.  */
   start_sequence ();
@@ -2915,7 +2914,6 @@ expand_builtin_strlen (tree exp, rtx target,
 
   return target;
 }
-}
 
 /* Callback routine for store_by_pieces.  Read GET_MODE_BITSIZE (MODE)
bytes from constant string DATA + OFFSET and return it as target
@@ -4426,13 +4424,11 @@ expand_builtin_strcmp (tree exp, ATTRIBUTE_UNUSED rtx target)
 
   insn_code cmpstr_icode = direct_optab_handler (cmpstr_optab, SImode);
   insn_code cmpstrn_icode = direct_optab_handler (cmpstrn_optab, SImode);
-  if (cmpstr_icode != CODE_FOR_nothing || cmpstrn_icode != CODE_FOR_nothing)
-{
-  rtx arg1_rtx, arg2_rtx;
-  tree fndecl, fn;
+  if (cmpstr_icode == CODE_FOR_nothing && cmpstrn_icode == CODE_FOR_nothing)
+return NULL_RTX;
+
   tree arg1 = CALL_EXPR_ARG (exp, 0);
   tree arg2 = CALL_EXPR_ARG (exp, 1);
-  rtx result = NULL_RTX;
 
   unsigned int arg1_align = get_pointer_alignment (arg1) / BITS_PER_UNIT;
   unsigned int arg2_align = get_pointer_alignment (arg2) / BITS_PER_UNIT;
@@ -4445,9 +4441,10 @@ expand_builtin_strcmp (tree exp, ATTRIBUTE_UNUSED rtx target)
   arg1 = builtin_save_expr (arg1);
   arg2 = builtin_save_expr (arg2);
 
-  arg1_rtx = get_memory_rtx (arg1, NULL);
-  arg2_rtx = get_memory_rtx (arg2, NULL);
+  rtx arg1_rtx = get_memory_rtx (arg1, NULL);
+  rtx arg2_rtx = get_memory_rtx (arg2, NULL);
 
+  rtx result = NULL_RTX;
   /* Try to call cmpstrsi.  */
   if (cmpstr_icode != CODE_FOR_nothing)
 result = expand_cmpstr (cmpstr_icode, target, arg1_rtx, arg2_rtx,
@@ -4501,6 +4498,12 @@ expand_builtin_strcmp (tree exp, ATTRIBUTE_UNUSED rtx target)
}
 }
 
+  /* Check to see if the argument was declared attribute nonstring
+ and if so, issue a warning since at this point it's not known
+ to be nul-terminated.  */
+  tree fndecl = get_callee_fndecl (exp);
+  maybe_warn_nonstring_arg (fndecl, exp);
+
   if (result)
 {
   /* Return the value in the proper mode for this function.  */
@@ -4515,14 +4518,11 @@ expand_builtin_strcmp (tree exp, ATTRIBUTE_UNUSED rtx target)
 
   /* Expand the library call ourselves using a stabilized argument
  list to avoid re-evaluating the function's arguments twice.  */
-  fndecl = get_callee_fndecl (exp);
-  fn = build_call_nofold_loc (EXPR_LOCATION (exp), fndecl, 2, arg1, arg2);
+  tree fn = build_call_nofold_loc (EXPR_LOCATION (exp), fndecl, 2, arg1, arg2);
   gcc_assert (TREE_CODE (fn) == CALL_EXPR);
   CALL_EXPR_TAILCALL (fn) = CALL_EXPR_TAILCALL (exp);
   return expand_call (fn, target, target == const0_rtx);
 }
-  return NULL_RTX;
-}
 
 /* Expand expression EXP, which is a call to the strncmp builtin. Return
NULL_RTX if we failed the caller should emit a normal call, otherwise try to get
@@ -4532,8 +4532,6 @@ static rtx
 expand_builtin_strncmp (tree exp, ATTRIBUTE_UNUSED rtx target,
ATTRIBUTE_UNUSED machine_mode mode)
 {
-  location_t loc ATTRIBUTE_UNUSED = EXPR_LOCATION (exp);
-
   if (!validate_arglist (exp,
 POINTER_TYPE, POINTER_TYPE, INTEGER_TYPE, VOID_TYPE))
 return NULL_RTX;
@@ -4542,12 +4540,

Re: [PATCH] [GOLD] Add plugin API for processing plugin-added input files

2017-12-11 Thread Sriraman Tallam via gcc-patches
On Mon, Dec 11, 2017 at 2:16 PM, Stephen Crane  wrote:
> Thanks for committing the GCC portion and following up on this. I had
> been meaning to write and ask. I don't have commit privs for binutils,
> so either you or Cary will have to commit the binutils patch as well,
> if it's not too much trouble. I think much has changed to need a
> rebase?

I just committed your patch.  I had to make one very minor change to
plugin_new_section_layout.c to get it to compile: I moved the loop
initialization declaration outside the loop, as that is not allowed in C
before C99.  I tested the patch.

Thanks
Sri
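
The kind of change described above can be sketched like this. It is a
generic illustration, not the actual code in
plugin_new_section_layout.c; sum_first is a hypothetical function.

```c
/* Sum the first N elements of V.

   Writing "for (int i = 0; ...)" needs C99 or later; declaring the
   loop variable before the loop also compiles in C90 mode.  */
int
sum_first (const int *v, int n)
{
  int i;          /* declared outside the for statement */
  int total = 0;

  for (i = 0; i < n; i++)
    total += v[i];
  return total;
}
```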

>
> Thanks,
> Stephen
>
> On Mon, Dec 11, 2017 at 2:10 PM, Sriraman Tallam  wrote:
>> On Thu, Nov 9, 2017 at 9:04 PM, Cary Coutant  wrote:
>>>> include/ChangeLog:
>>>> 2017-11-09  Stephen Crane
>>>>
>>>> * plugin-api.h: Add new plugin hook to allow processing of input
>>>> files added by a plugin.
>>>> (ld_plugin_new_input_handler): New function hook type.
>>>> (ld_plugin_register_new_input): New interface.
>>>> (LDPT_REGISTER_NEW_INPUT_HOOK): New enum val.
>>>> (tv_register_new_input): New member.
>>>>
>>>>
>>>> gold/ChangeLog:
>>>> 2017-11-09  Stephen Crane
>>>>
>>>> * plugin.cc (Plugin::load): Include hooks for register_new_input
>>>> in transfer vector.
>>>> (Plugin::new_input): New function.
>>>> (register_new_input): New function.
>>>> (Plugin_manager::claim_file): Call Plugin::new_input if in
>>>> replacement phase.
>>>> * plugin.h (Plugin::set_new_input_handler): New function.
>>>> * testsuite/plugin_new_section_layout.c: New plugin to test
>>>> new_input plugin API.
>>>> * testsuite/plugin_final_layout.sh: Add new input test.
>>>> * testsuite/Makefile.am (plugin_layout_new_file): New test case.
>>>> * testsuite/Makefile.in: Regenerate.
>>>
>>> These are OK. Thanks!
>>>
>>> Sri, I'm out of town through 11/18, and won't be able to commit the
>>> include/ patch to GCC before Stage 1 ends. Can you take care of it?
>>> (If not, I'll take care of it when I get back -- it was approved
>>> during Stage 1, so I think it's OK to commit early in Stage 3,
>>> especially since it's nothing but new declarations.)
>>
>> Stephen, I was looking at binutils and realized this patch has not
>> been committed yet.  I only committed the GCC portion, plugin-api.h.
>>
>> Thanks
>> Sri
>>
>>>
>>> -cary


Re: Handle more SLP constant and extern definitions for variable VF

2017-12-11 Thread Jeff Law
On 11/09/2017 07:20 AM, Richard Sandiford wrote:
> This patch adds support for vectorising SLP definitions that are
> constant or external (i.e. from outside the loop) when the vectorisation
> factor isn't known at compile time.  It can only handle cases where the
> number of SLP statements is a power of 2.
> 
> Tested on aarch64-linux-gnu (with and without SVE), x86_64-linux-gnu
> and powerpc64le-linux-gnu.  OK to install?
> 
> Richard
> 
> 
> 2017-11-09  Richard Sandiford  
>   Alan Hayward  
>   David Sherwood  
> 
> gcc/
>   * tree-vect-slp.c: Include gimple-fold.h and internal-fn.h
>   (can_duplicate_and_interleave_p): New function.
>   (vect_get_and_check_slp_defs): Take the vector of statements
>   rather than just the current one.  Remove excess parentheses.
>   Restrict rejection of vect_constant_def and vect_external_def
>   for variable-length vectors to boolean types, or types for which
>   can_duplicate_and_interleave_p is false.
>   (vect_build_slp_tree_2): Update call to vect_get_and_check_slp_defs.
>   (duplicate_and_interleave): New function.
>   (vect_get_constant_vectors): Use gimple_build_vector for
>   constant-length vectors and duplicate_and_interleave for
>   variable-length vectors.  Don't defer the update when
>   inserting new statements.
> 
> gcc/testsuite/
>   * gcc.dg/vect/no-scevccp-slp-30.c: Don't XFAIL for vect_variable_length
>   && vect_load_lanes
>   * gcc.dg/vect/slp-1.c: Likewise.
>   * gcc.dg/vect/slp-10.c: Likewise.
>   * gcc.dg/vect/slp-12b.c: Likewise.
>   * gcc.dg/vect/slp-12c.c: Likewise.
>   * gcc.dg/vect/slp-17.c: Likewise.
>   * gcc.dg/vect/slp-19b.c: Likewise.
>   * gcc.dg/vect/slp-20.c: Likewise.
>   * gcc.dg/vect/slp-21.c: Likewise.
>   * gcc.dg/vect/slp-22.c: Likewise.
>   * gcc.dg/vect/slp-24-big-array.c: Likewise.
>   * gcc.dg/vect/slp-24.c: Likewise.
>   * gcc.dg/vect/slp-28.c: Likewise.
>   * gcc.dg/vect/slp-39.c: Likewise.
>   * gcc.dg/vect/slp-6.c: Likewise.
>   * gcc.dg/vect/slp-7.c: Likewise.
>   * gcc.dg/vect/slp-cond-1.c: Likewise.
>   * gcc.dg/vect/slp-cond-2-big-array.c: Likewise.
>   * gcc.dg/vect/slp-cond-2.c: Likewise.
>   * gcc.dg/vect/slp-multitypes-1.c: Likewise.
>   * gcc.dg/vect/slp-multitypes-8.c: Likewise.
>   * gcc.dg/vect/slp-multitypes-9.c: Likewise.
>   * gcc.dg/vect/slp-multitypes-10.c: Likewise.
>   * gcc.dg/vect/slp-multitypes-12.c: Likewise.
>   * gcc.dg/vect/slp-perm-6.c: Likewise.
>   * gcc.dg/vect/slp-widen-mult-half.c: Likewise.
>   * gcc.dg/vect/vect-live-slp-1.c: Likewise.
>   * gcc.dg/vect/vect-live-slp-2.c: Likewise.
>   * gcc.dg/vect/pr33953.c: Don't XFAIL for vect_variable_length.
>   * gcc.dg/vect/slp-12a.c: Likewise.
>   * gcc.dg/vect/slp-14.c: Likewise.
>   * gcc.dg/vect/slp-15.c: Likewise.
>   * gcc.dg/vect/slp-multitypes-2.c: Likewise.
>   * gcc.dg/vect/slp-multitypes-4.c: Likewise.
>   * gcc.dg/vect/slp-multitypes-5.c: Likewise.
>   * gcc.target/aarch64/sve_slp_1.c: New test.
>   * gcc.target/aarch64/sve_slp_1_run.c: Likewise.
>   * gcc.target/aarch64/sve_slp_2.c: Likewise.
>   * gcc.target/aarch64/sve_slp_2_run.c: Likewise.
>   * gcc.target/aarch64/sve_slp_3.c: Likewise.
>   * gcc.target/aarch64/sve_slp_3_run.c: Likewise.
>   * gcc.target/aarch64/sve_slp_4.c: Likewise.
>   * gcc.target/aarch64/sve_slp_4_run.c: Likewise.
OK.
jeff


Re: [SFN+LVU+IEPM v4 7/9] [LVU] Introduce location views

2017-12-11 Thread Jeff Law
On 11/09/2017 07:34 PM, Alexandre Oliva wrote:
> This patch introduces an option to enable the generation of location
> views along with location lists.  The exact format depends on the
> DWARF version: it can be a separate attribute (DW_AT_GNU_locviews) or
> (DW_LLE_view_pair) entries in DWARF5+ loclists.
> 
> Line number tables are also affected.  If the assembler is found, at
> compiler build time, to support .loc views, we use them and
> assembler-computed view labels, otherwise we output compiler-generated
> line number programs with conservatively-computed view labels.  In
> either case, we output view information next to line number changes
> when verbose assembly output is requested.
> 
> This patch requires an LVU patch that modifies the exported API of
> final_scan_insn.  It also expects the entire SFN patchset to be
> installed first, although SFN is not a requirement for LVU.
> 
> for  include/ChangeLog
> 
>   * dwarf2.def (DW_AT_GNU_locviews): New.
>   * dwarf2.h (enum dwarf_location_list_entry_type): Add
>   DW_LLE_GNU_view_pair.
>   (DW_LLE_view_pair): Define.
> 
> for  gcc/ChangeLog
> 
>   * common.opt (gvariable-location-views): New.
>   * config.in: Rebuilt.
>   * configure: Rebuilt.
>   * configure.ac: Test assembler for view support.
>   * dwarf2asm.c (dw2_asm_output_symname_uleb128): New.
>   * dwarf2asm.h (dw2_asm_output_symname_uleb128): Declare.
>   * dwarf2out.c (var_loc_view): New typedef.
>   (struct dw_loc_list_struct): Add vl_symbol, vbegin, vend.
>   (dwarf2out_locviews_in_attribute): New.
>   (dwarf2out_locviews_in_loclist): New.
>   (dw_val_equal_p): Compare val_view_list of dw_val_class_view_lists.
>   (enum dw_line_info_opcode): Add LI_adv_address.
>   (struct dw_line_info_table): Add view.
>   (RESET_NEXT_VIEW, RESETTING_VIEW_P): New macros.
>   (DWARF2_ASM_VIEW_DEBUG_INFO): Define default.
>   (zero_view_p): New variable.
>   (ZERO_VIEW_P): New macro.
>   (output_asm_line_debug_info): New.
>   (struct var_loc_node): Add view.
>   (add_AT_view_list, AT_loc_list): New.
>   (add_var_loc_to_decl): Add view param.  Test it against last.
>   (new_loc_list): Add view params.  Record them.
>   (AT_loc_list_ptr): Handle loc and view lists.
>   (view_list_to_loc_list_val_node): New.
>   (print_dw_val): Handle dw_val_class_view_list.
>   (size_of_die): Likewise.
>   (value_format): Likewise.
>   (loc_list_has_views): New.
>   (gen_llsym): Set vl_symbol too.
>   (maybe_gen_llsym, skip_loc_list_entry): New.
>   (dwarf2out_maybe_output_loclist_view_pair): New.
>   (output_loc_list): Output view list or entries too.
>   (output_view_list_offset): New.
>   (output_die): Handle dw_val_class_view_list.
>   (output_dwarf_version): New.
>   (output_compilation_unit_header): Use it.
>   (output_skeleton_debug_sections): Likewise.
>   (output_rnglists, output_line_info): Likewise.
>   (output_pubnames, output_aranges): Update version comments.
>   (output_one_line_info_table): Output view numbers in asm comments.
>   (dw_loc_list): Determine current endview, pass it to new_loc_list.
>   Call maybe_gen_llsym.
>   (loc_list_from_tree_1): Adjust.
>   (add_AT_location_description): Create view list attribute if
>   needed, check it's absent otherwise.
>   (convert_cfa_to_fb_loc_list): Adjust.
>   (maybe_emit_file): Call output_asm_line_debug_info for test.
>   (dwarf2out_var_location): Reset views as needed.  Precompute
>   add_var_loc_to_decl args.  Call get_attr_min_length only if we have the
>   attribute.  Set view.
>   (new_line_info_table): Reset next view.
>   (set_cur_line_info_table): Call output_asm_line_debug_info for test.
>   (dwarf2out_source_line): Likewise.  Output view resets and labels to
>   the assembler, or select appropriate line info opcodes.
>   (prune_unused_types_walk_attribs): Handle dw_val_class_view_list.
>   (optimize_string_length): Catch it.  Adjust.
>   (resolve_addr): Copy vl_symbol along with ll_symbol.  Handle
>   dw_val_class_view_list, and remove it if no longer needed.
>   (hash_loc_list): Hash view numbers.
>   (loc_list_hasher::equal): Compare them.
>   (optimize_location_lists): Check whether a view list symbol is
>   needed, and whether the locview attribute is present, and
>   whether they match.  Remove the locview attribute if no longer
>   needed.
>   (index_location_lists): Call skip_loc_list_entry for test.
>   (dwarf2out_finish): Call output_asm_line_debug_info for test.
>   Use output_dwarf_version.
>   * dwarf2out.h (enum dw_val_class): Add dw_val_class_view_list.
>   (struct dw_val_node): Add val_view_list.
>   * final.c: Include langhooks.h.
>   (SEEN_NEXT_VIEW): New.
>   (set_next_view_needed): New.
>   (clear_next_view_needed): New.
> 

Re: [PATCH] Expensive selftests: torture testing for fix-it boundary conditions (PR c/82050)

2017-12-11 Thread Jeff Law
On 11/28/2017 12:31 PM, David Malcolm wrote:
> This patch adds selftest coverage for the fix for PR c/82050 (r255214).
> 
> The selftest iterates over various "interesting" column and line-width
> values to try to shake out bugs in the fix-it printing routines, a kind
> of "torture" selftest.
> 
> Unfortunately this selftest is noticeably slower than the other selftests;
> adding it to diagnostic-show-locus.c led to:
>   -fself-test: 40218 pass(es) in 0.172000 seconds
> slowing down to:
>   -fself-test: 97315 pass(es) in 6.109000 seconds
> for an unoptimized build (e.g. when hacking with --disable-bootstrap).
> 
> Given that this affects the compile-edit-test cycle of the "gcc"
> subdirectory, this felt like an unacceptable amount of overhead to add.
> 
> I attempted to optimize the test by reducing the amount of coverage, but
> the test seems useful, and there seems to be a valid role for "torture"
> selftests.
> 
> Hence this patch adds a:
>   gcc.dg/plugin/expensive_selftests_plugin.c
> with the responsibility for running "expensive" selftests, and adds the
> expensive test there.  The patch moves a small amount of code from
> selftest::run_tests into a helper class so that the plugin can print
> a useful summary line (to reassure us that the tests are actually being
> run).
> 
> With that, the compile-edit-test cycle of the "gcc" subdir is unaffected;
> the plugin takes:
>   expensive_selftests_plugin: 26641 pass(es) in 3.127000 seconds
> which seems reasonable within the much longer time taken by "make check"
> (I optimized some of the overhead away, hence the reduction from 6 seconds
> above down to 3 seconds).
> 
> Successfully bootstrapped & regrtested on x86_64-pc-linux-gnu.
> 
> OK for trunk?
> 
> gcc/ChangeLog:
>   PR c/82050
>   * selftest-run-tests.c (selftest::run_tests): Move start/finish code
>   to...
>   * selftest.c (selftest::test_runner::test_runner): New ctor.
>   (selftest::test_runner::~test_runner): New dtor.
>   * selftest.h (class selftest::test_runner): New class.
> 
> gcc/testsuite/ChangeLog:
>   PR c/82050
>   * gcc.dg/plugin/expensive-selftests-1.c: New file.
>   * gcc.dg/plugin/expensive_selftests_plugin.c: New file.
>   * gcc.dg/plugin/plugin.exp (plugin_test_list): Add the above.
OK.
jeff


Re: [PATCH 01/14] C++: preserve locations within build_address

2017-12-11 Thread Jason Merrill

On 11/10/2017 04:45 PM, David Malcolm wrote:

This is needed for the locations of string literals to be usable,
otherwise the ADDR_EXPR has UNKNOWN_LOCATION, despite wrapping
a node with a correct location_t.

gcc/cp/ChangeLog:
* typeck.c (build_address): Use location of operand when building
address expression.


OK; this one seems obvious.

Jason



[PATCH] Fix the new pr83361.c testcase

2017-12-11 Thread Segher Boessenkool
It needs the following.  Sorry for that.  Committing.


Segher


2017-12-11  Segher Boessenkool  

gcc/testsuite/
* gcc.dg/pr83361.c: Add -Wno-div-by-zero to dg-options.

---
 gcc/testsuite/gcc.dg/pr83361.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/testsuite/gcc.dg/pr83361.c b/gcc/testsuite/gcc.dg/pr83361.c
index 2a6f807..815b055 100644
--- a/gcc/testsuite/gcc.dg/pr83361.c
+++ b/gcc/testsuite/gcc.dg/pr83361.c
@@ -1,6 +1,6 @@
 /* PR rtl-optimization/83361 */
 /* { dg-do compile } */
-/* { dg-options "-O2 -freorder-blocks-and-partition" } */
+/* { dg-options "-O2 -freorder-blocks-and-partition -Wno-div-by-zero" } */
 
 #include 
 
-- 
1.8.3.1



Re: [PATCH 05/14] tree.c: strip location wrappers from integer_zerop etc

2017-12-11 Thread Jason Merrill

On 11/10/2017 04:45 PM, David Malcolm wrote:

We need to strip away location wrappers in tree.c predicates like
integer_zerop, otherwise they fail when they're called on
wrapped INTEGER_CST; an example can be seen for
   c-c++-common/Wmemset-transposed-args1.c
in g++.sum, where the warn_for_memset fails to detect integer zero
if the location wrappers aren't stripped.


These shouldn't be needed; callers should have folded away location 
wrappers.  I would hope for STRIP_ANY_LOCATION_WRAPPER to be almost 
never needed.


warn_for_memset may be missing some calls to fold_for_warn.
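
The failure mode under discussion can be sketched with a toy model.  This
is not GCC code: the toy_* names, the three node kinds and the struct
layout are all assumptions made for illustration.  It only shows why a
predicate that looks at the outermost node misses a wrapped constant, and
how stripping first (in the predicate, as the patch does, or in the caller
before the predicate is reached, as the review prefers) restores the answer.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-ins for tree codes; NOT the real GCC definitions.  */
enum toy_code { TOY_INTEGER_CST, TOY_VAR_DECL, TOY_LOCATION_WRAPPER };

struct toy_tree
{
  enum toy_code code;
  long value;                     /* Meaningful for TOY_INTEGER_CST only.  */
  const struct toy_tree *operand; /* Wrapped node for TOY_LOCATION_WRAPPER.  */
};

/* Hypothetical analogue of STRIP_ANY_LOCATION_WRAPPER: peel wrappers
   until a non-wrapper node is reached.  */
const struct toy_tree *
toy_strip_location_wrappers (const struct toy_tree *t)
{
  while (t != NULL && t->code == TOY_LOCATION_WRAPPER)
    {
      /* In this model a wrapper always wraps something.  */
      assert (t->operand != NULL);
      t = t->operand;
    }
  return t;
}

/* Predicate that inspects only the outermost node, i.e. what happens
   when nobody removed the wrapper first.  */
bool
toy_integer_zerop_raw (const struct toy_tree *t)
{
  return t != NULL && t->code == TOY_INTEGER_CST && t->value == 0;
}

/* The same predicate with the wrappers stripped up front.  */
bool
toy_integer_zerop (const struct toy_tree *t)
{
  return toy_integer_zerop_raw (toy_strip_location_wrappers (t));
}
```

In this model a caller that strips (or folds) before calling
toy_integer_zerop_raw gets the same result as toy_integer_zerop, which is
the trade-off the review is weighing.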


  int
  really_constant_p (const_tree exp)
  {
+  STRIP_ANY_LOCATION_WRAPPER (exp);
+
/* This is not quite the same as STRIP_NOPS.  It does more.  */
while (CONVERT_EXPR_P (exp)
 || TREE_CODE (exp) == NON_LVALUE_EXPR)


Here we should be able to add VIEW_CONVERT_EXPR to the condition.

Similarly, tree_nop_conversion should treat a VIEW_CONVERT_EXPR to the 
same type as a nop.


Jason


Re: [PATCH 06/14] Fix Wsizeof-pointer-memaccess*.c

2017-12-11 Thread Jason Merrill

On 11/10/2017 04:45 PM, David Malcolm wrote:

gcc/c-family/ChangeLog:
* c-warn.c (sizeof_pointer_memaccess_warning): Strip any location
wrappers from src and dest.


Here the existing calls to tree_strip_nop_conversions ought to handle 
the wrappers.


Jason



Re: [PATCH 07/14] reject_gcc_builtin: strip any location wrappers

2017-12-11 Thread Jason Merrill

On 11/10/2017 04:45 PM, David Malcolm wrote:

Otherwise pr70144-1.c breaks.

gcc/c-family/ChangeLog:
* c-common.c (reject_gcc_builtin): Strip any location from EXPR.
---
  gcc/c-family/c-common.c | 2 ++
  1 file changed, 2 insertions(+)

diff --git a/gcc/c-family/c-common.c b/gcc/c-family/c-common.c
index 24077c7..739c54e 100644
--- a/gcc/c-family/c-common.c
+++ b/gcc/c-family/c-common.c
@@ -7774,6 +7774,8 @@ pointer_to_zero_sized_aggr_p (tree t)
  bool
  reject_gcc_builtin (const_tree expr, location_t loc /* = UNKNOWN_LOCATION */)
  {
+  STRIP_ANY_LOCATION_WRAPPER (expr);
+
if (TREE_CODE (expr) == ADDR_EXPR)
  expr = TREE_OPERAND (expr, 0);


Don't you want to strip the wrapper after we strip the ADDR_EXPR?  OK 
with that change.


Jason


Re: [PATCH 09/14] Strip location wrappers in null_ptr_cst_p

2017-12-11 Thread Jason Merrill

On 11/10/2017 04:45 PM, David Malcolm wrote:

Without this, "NULL" fails to be usable in C++11 onwards.

gcc/cp/ChangeLog:
* call.c (null_ptr_cst_p): Strip location wrappers when
converting from '0' to a pointer type in C++11 onwards.


OK.

Jason



Re: [PATCH 08/14] cp/tree.c: strip location wrappers in lvalue_kind

2017-12-11 Thread Jason Merrill

On 11/10/2017 04:45 PM, David Malcolm wrote:

Without this, then lvalue_p returns false for decls, and hence
e.g. uses of them for references fail.

Stripping location wrappers in lvalue_kind restores the correct
behavior of lvalue_p etc.

gcc/cp/ChangeLog:
* tree.c (lvalue_kind): Strip any location wrapper.


Rather, lvalue_kind should learn to handle VIEW_CONVERT_EXPR.

Jason


Re: [PATCH 10/14] warn_for_memset: handle location wrappers

2017-12-11 Thread Jason Merrill

On 11/10/2017 04:45 PM, David Malcolm wrote:

gcc/c-family/ChangeLog:
* c-warn.c (warn_for_memset): Strip any location wrappers
from arg0 and arg2.

gcc/cp/ChangeLog:
* parser.c (cp_parser_postfix_expression): Before warn_for_memset,
strip any wrapper around "arg2" before testing for CONST_DECL.


Despite my earlier comment about fold_for_warn, I guess this is 
specifically interested in literals, so this is OK.


Jason



Re: [PATCH 11/14] Handle location wrappers in string_conv_p

2017-12-11 Thread Jason Merrill

On 11/10/2017 04:45 PM, David Malcolm wrote:

gcc/cp/ChangeLog:
* typeck.c (string_conv_p): Strip any location wrapper from "exp".


OK.

Jason



Re: [PATCH 12/14] C++: introduce null_node_p

2017-12-11 Thread Jason Merrill

On 11/10/2017 04:45 PM, David Malcolm wrote:

Eschew comparison with null_node in favor of a new null_node_p
function, which strips any location wrappers.


OK.

Jason



Re: [PATCH 13/14] c-format.c: handle location wrappers

2017-12-11 Thread Jason Merrill

On 11/10/2017 04:45 PM, David Malcolm wrote:

gcc/c-family/ChangeLog:
* c-format.c (check_format_arg): Strip any location wrapper around
format_tree.
---
  gcc/c-family/c-format.c | 9 -
  1 file changed, 8 insertions(+), 1 deletion(-)

diff --git a/gcc/c-family/c-format.c b/gcc/c-family/c-format.c
index 164d035..6b436ec 100644
--- a/gcc/c-family/c-format.c
+++ b/gcc/c-family/c-format.c
@@ -1536,6 +1536,8 @@ check_format_arg (void *ctx, tree format_tree,
  
location_t fmt_param_loc = EXPR_LOC_OR_LOC (format_tree, input_location);
  
+  STRIP_ANY_LOCATION_WRAPPER (format_tree);

+
if (VAR_P (format_tree))
  {
/* Pull out a constant value if the front end didn't.  */


It seems like we want fold_for_warn here instead of the special variable 
handling.  That probably makes sense for the other places you change in 
this patch, too.


Jason


Re: [PATCH 14/14] pp_c_cast_expression: don't print casts for location wrappers

2017-12-11 Thread Jason Merrill

On 11/10/2017 04:45 PM, David Malcolm wrote:

This patch suppresses the user-visible printing of location wrappers
for "%E" (and "%qE"), adding test coverage via selftests.


OK.

Jason



Re: [PATCH][Middle-end]79538 missing -Wformat-overflow with %s and non-member array arguments

2017-12-11 Thread Martin Sebor

On 12/04/2017 08:34 AM, Qing Zhao wrote:

Hi,

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=79538 

missing -Wformat-overflow with %s and non-member array arguments

-Wformat-overflow uses the routine "get_range_strlen" to decide the
maximum string length, however, currently "get_range_strlen" misses
the handling of non-member arrays.

Adding the handling of non-member array resolves the issue.
Adding test case pr79538.c into gcc.dg.

During gcc bootstrap, -Wformat-overflow issued new warnings for 2 source
files (c-family/c-cppbuiltin.c, fortran/class.c) because of the new
handling of non-member arrays in "get_range_strlen".  To avoid these
warnings and let bootstrap continue, this patch also updates those
2 files.


Just for context, before posting it Qing discussed the patch with
me in private.  The fixes look good to me.

Martin



In c-family/c-cppbuiltin.c, the warning is as follows:

../../latest_gcc_2/gcc/c-family/c-cppbuiltin.c:1627:15: note:
‘sprintf’ output 2 or more bytes (assuming 257) into a destination
of size 256
  sprintf (buf1, "%s=%s", macro, buf2);
  ^~~~
In the above, buf1 and buf2 are declared as:
char buf1[256], buf2[256];

i.e., buf1 and buf2 have the same size.  Adjusting the sizes of buf1 and
buf2 resolves the warning.

fortran/class.c has a similar issue.  Instead of adjusting the sizes
of the buffers, the patch replaces sprintf with xasprintf.
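
The arithmetic behind the warning, and the allocate-to-fit alternative,
can be sketched in plain C.  This is an illustration only:
bytes_needed_for_define and toy_xasprintf are hypothetical names, and
toy_xasprintf is a simplified stand-in written with the standard vsnprintf
and malloc, not libiberty's actual xasprintf.  One plausible reading of
the note is that "2 or more bytes" covers the '=' plus the terminating NUL
when both strings are empty, and "assuming 257" is what a 255-character
buf2 yields.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Return the number of bytes, including the terminating NUL, that
   sprintf (dest, "%s=%s", macro, value) would write.  */
size_t
bytes_needed_for_define (const char *macro, const char *value)
{
  /* With a null buffer and size 0, snprintf only measures.  */
  int n = snprintf (NULL, 0, "%s=%s", macro, value);
  assert (n >= 0);
  return (size_t) n + 1;
}

/* Hypothetical stand-in for an xasprintf-style helper: format into a
   freshly allocated buffer of exactly the right size.  The caller
   frees the result.  Aborts on formatting or allocation failure.  */
char *
toy_xasprintf (const char *fmt, ...)
{
  va_list ap, ap2;
  va_start (ap, fmt);
  va_copy (ap2, ap);

  /* First pass: measure the output.  */
  int n = vsnprintf (NULL, 0, fmt, ap);
  va_end (ap);
  if (n < 0)
    abort ();

  char *buf = malloc ((size_t) n + 1);
  if (buf == NULL)
    abort ();

  /* Second pass: write into the exactly sized buffer.  */
  vsnprintf (buf, (size_t) n + 1, fmt, ap2);
  va_end (ap2);
  return buf;
}
```

Because the destination is sized from the measured length, no bound has
to be guessed at the call site, which is presumably why the class.c hunks
switch to an allocating helper rather than resizing fixed arrays.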

Bootstrapped and tested on both x86 and aarch64.  No regressions.

Okay for trunk?

thanks.

Qing

===

gcc/ChangeLog

2017-11-30  Qing Zhao  <qing.z...@oracle.com>

  PR middle_end/79538
  * gimple-fold.c (get_range_strlen): Add the handling of non-member array.

gcc/fortran/ChangeLog

2017-11-30  Qing Zhao  <qing.z...@oracle.com>

   PR middle_end/79538
   * class.c (gfc_build_class_symbol): Replace call to
   sprintf with xasprintf to avoid format-overflow warning.
   (generate_finalization_wrapper): Likewise.
   (gfc_find_derived_vtab): Likewise.
   (find_intrinsic_vtab): Likewise.


gcc/c-family/ChangeLog

2017-11-30  Qing Zhao  <qing.z...@oracle.com>

  PR middle_end/79538
* c-cppbuiltin.c (builtin_define_with_hex_fp_value):
  Adjust the size of buf1 and buf2, add a new buf to avoid
  format-overflow warning.

gcc/testsuite/ChangeLog

2017-11-30  Qing Zhao  <qing.z...@oracle.com>

  PR middle_end/79538
  * gcc.dg/pr79538.c: New test.

---
gcc/c-family/c-cppbuiltin.c| 10 -
gcc/fortran/class.c| 49 --
gcc/gimple-fold.c  | 13 +++
gcc/testsuite/gcc.dg/pr79538.c | 23 
4 files changed, 69 insertions(+), 26 deletions(-)
create mode 100644 gcc/testsuite/gcc.dg/pr79538.c

diff --git a/gcc/c-family/c-cppbuiltin.c b/gcc/c-family/c-cppbuiltin.c
index 2ac9616..9e33aed 100644
--- a/gcc/c-family/c-cppbuiltin.c
+++ b/gcc/c-family/c-cppbuiltin.c
@@ -1613,7 +1613,7 @@ builtin_define_with_hex_fp_value (const char *macro,
  const char *fp_cast)
{
  REAL_VALUE_TYPE real;
-  char dec_str[64], buf1[256], buf2[256];
+  char dec_str[64], buf[256], buf1[128], buf2[64];

  /* This is very expensive, so if possible expand them lazily.  */
  if (lazy_hex_fp_value_count < 12
@@ -1656,11 +1656,11 @@ builtin_define_with_hex_fp_value (const char *macro,

  /* Assemble the macro in the following fashion
 macro = fp_cast [dec_str fp_suffix] */
-  sprintf (buf1, "%s%s", dec_str, fp_suffix);
-  sprintf (buf2, fp_cast, buf1);
-  sprintf (buf1, "%s=%s", macro, buf2);
+  sprintf (buf2, "%s%s", dec_str, fp_suffix);
+  sprintf (buf1, fp_cast, buf2);
+  sprintf (buf, "%s=%s", macro, buf1);

-  cpp_define (parse_in, buf1);
+  cpp_define (parse_in, buf);
}

/* Return a string constant for the suffix for a value of type TYPE
diff --git a/gcc/fortran/class.c b/gcc/fortran/class.c
index ebbd41b..a08fb8d 100644
--- a/gcc/fortran/class.c
+++ b/gcc/fortran/class.c
@@ -602,7 +602,8 @@ bool
gfc_build_class_symbol (gfc_typespec *ts, symbol_attribute *attr,
gfc_array_spec **as)
{
-  char name[GFC_MAX_SYMBOL_LEN+1], tname[GFC_MAX_SYMBOL_LEN+1];
+  char tname[GFC_MAX_SYMBOL_LEN+1];
+  char *name;
  gfc_symbol *fclass;
  gfc_symbol *vtab;
  gfc_component *c;
@@ -633,17 +634,17 @@ gfc_build_class_symbol (gfc_typespec *ts, 
symbol_attribute *attr,
  rank = !(*as) || (*as)->rank == -1 ? GFC_MAX_DIMENSIONS : (*as)->rank;
  get_unique_hashed_string (tname, ts->u.derived);
  if ((*as) && attr->allocatable)
-sprintf (name, "__class_%s_%d_%da", tname, rank, (*as)->corank);
+name = xasprintf ("__class_%s_%d_%da", tname, rank, (*as)->corank);
  else if ((*as) && attr->pointer)
-sprintf (name, "__class_%s_%d_%dp", tname, rank, (*as)->corank);
+name = xasprintf ("__class_%s_%d_%dp", tname

Re: [PATCH] Fix Bug 83237 - Values returned by std::poisson_distribution are not distributed correctly

2017-12-11 Thread Paolo Carlini

Hi,

On 11/12/2017 23:16, Michele Pezzutti wrote:

I lowered to N = 250 and still fails with a good margin.

Good. At the moment however, I think we need a bit of rationale for the
change that you are proposing, what would you put in a comment in the 
code? It's been a while since the last time I looked into these 
algorithms, is there a simple way to explain why the change is needed 
within the basic rejection method proposed by Devroye? Devroye's book is 
freely available, have you been able to study the relevant bits already? 
(http://www.nrbook.com/devroye/). He is also very approachable in 
private email, if I remember correctly.


Eventually, we could also agree on a good way to extend the coverage of 
the testing, maybe for gcc8 simply add the testcase, but then, for gcc9 
I think we could extend it quite a bit in a consistent way, something 
like a grid from 1.0 to 50 step 1.0 with an increased N. Better if we 
figure out something that looks generic but would also have caught 
anyway 83237, if you see what I mean. I can take care of that. For the 
other discrete distributions too of course.


Thanks a lot for your help!

Paolo.



[PATCH] avoid alignment error for attribute warn_if_not_aligned

2017-12-11 Thread Martin Sebor

My enhancement to improve the detection of attribute collisions
introduced a regression of sorts in the g++.dg/pr53037-4.C test
on ia64.

https://gcc.gnu.org/ml/gcc-testresults/2017-12/msg00672.html

The regression hasn't been noticed anywhere else only because
all the other targets apparently have no low limit on function
alignment, whereas ia64 has a limit of 32.  As a result of
the limit the ia64 compiler would issue not one but two errors
for a function declaration with attribute warn_if_not_aligned
that specifies a value less than 32:
one for the attribute itself (because it's not allowed on
functions),
and another for the lower value.

The attached patch fixes this by issuing the error only for
attribute aligned with a lower requirement but not attribute
warn_if_not_aligned.

I tried to come up with a test for this that would fail on
other targets besides ia64 (those with no low bound on
function alignment) but couldn't find a way to do it.  If
someone knows of a way I'd be glad to add a better test
than pr53037-4.C.

Martin
gcc/c-family/ChangeLog:

	* c-attribs.c (common_handle_aligned_attribute): Avoid issuing
	an error for attribute warn_if_not_aligned.

diff --git a/gcc/c-family/c-attribs.c b/gcc/c-family/c-attribs.c
index 186df05..74c971d 100644
--- a/gcc/c-family/c-attribs.c
+++ b/gcc/c-family/c-attribs.c
@@ -1886,14 +1886,17 @@ common_handle_aligned_attribute (tree *node, tree name, tree args, int flags,
   curalign /= BITS_PER_UNIT;
   bitalign /= BITS_PER_UNIT;
 
+  bool diagd = true;
   if (DECL_USER_ALIGN (decl) || DECL_USER_ALIGN (last_decl))
-	warning (OPT_Wattributes,
-		 "ignoring attribute %<%E (%u)%> because it conflicts with "
-		 "attribute %<%E (%u)%>", name, bitalign, name, curalign);
-  else
+	diagd = warning (OPT_Wattributes,
+			  "ignoring attribute %<%E (%u)%> because it conflicts "
+			  "with attribute %<%E (%u)%>",
+			  name, bitalign, name, curalign);
+  else if (!warn_if_not_aligned_p)
+	/* Do not error out for attribute warn_if_not_aligned.  */
 	error ("alignment for %q+D must be at least %d", decl, curalign);
 
-  if (note)
+  if (diagd && note)
 	inform (DECL_SOURCE_LOCATION (last_decl), "previous declaration here");
 
   *no_add_attrs = true;


Re: Add support for masked load/store_lanes

2017-12-11 Thread Jeff Law
On 11/17/2017 02:36 AM, Richard Sandiford wrote:
> Richard Sandiford  writes:
>> This patch adds support for vectorising groups of IFN_MASK_LOADs
>> and IFN_MASK_STOREs using conditional load/store-lanes instructions.
>> This requires new internal functions to represent the result
>> (IFN_MASK_{LOAD,STORE}_LANES), as well as associated optabs.
>>
>> The normal IFN_{LOAD,STORE}_LANES functions are const operations
>> that logically just perform the permute: the load or store is
>> encoded as a MEM operand to the call statement.  In contrast,
>> the IFN_MASK_{LOAD,STORE}_LANES functions use the same kind of
>> interface as IFN_MASK_{LOAD,STORE}, since the memory is only
>> conditionally accessed.
>>
>> The AArch64 patterns were added as part of the main LD[234]/ST[234] patch.
>>
>> Tested on aarch64-linux-gnu (both with and without SVE), x86_64-linux-gnu
>> and powerpc64le-linux-gnu.  OK to install?
> 
> Here's an updated (and much simpler) version that applies on top of the
> series I just posted to remove vectorizable_mask_load_store.  Tested as
> before.
> 
> Thanks,
> Richard
> 
> 
> 2017-11-17  Richard Sandiford  
>   Alan Hayward  
>   David Sherwood  
> 
> gcc/
>   * doc/md.texi (vec_mask_load_lanes@var{m}@var{n}): Document.
>   (vec_mask_store_lanes@var{m}@var{n}): Likewise.
>   * optabs.def (vec_mask_load_lanes_optab): New optab.
>   (vec_mask_store_lanes_optab): Likewise.
>   * internal-fn.def (MASK_LOAD_LANES): New internal function.
>   (MASK_STORE_LANES): Likewise.
>   * internal-fn.c (mask_load_lanes_direct): New macro.
>   (mask_store_lanes_direct): Likewise.
>   (expand_mask_load_optab_fn): Handle masked operations.
>   (expand_mask_load_lanes_optab_fn): New macro.
>   (expand_mask_store_optab_fn): Handle masked operations.
>   (expand_mask_store_lanes_optab_fn): New macro.
>   (direct_mask_load_lanes_optab_supported_p): Likewise.
>   (direct_mask_store_lanes_optab_supported_p): Likewise.
>   * tree-vectorizer.h (vect_store_lanes_supported): Take a masked_p
>   parameter.
>   (vect_load_lanes_supported): Likewise.
>   * tree-vect-data-refs.c (strip_conversion): New function.
>   (can_group_stmts_p): Likewise.
>   (vect_analyze_data_ref_accesses): Use it instead of checking
>   for a pair of assignments.
>   (vect_store_lanes_supported): Take a masked_p parameter.
>   (vect_load_lanes_supported): Likewise.
>   * tree-vect-loop.c (vect_analyze_loop_2): Update calls to
>   vect_store_lanes_supported and vect_load_lanes_supported.
>   * tree-vect-slp.c (vect_analyze_slp_instance): Likewise.
>   * tree-vect-stmts.c (get_group_load_store_type): Take a masked_p
>   parameter.  Don't allow gaps for masked accesses.
>   Use vect_get_store_rhs.  Update calls to vect_store_lanes_supported
>   and vect_load_lanes_supported.
>   (get_load_store_type): Take a masked_p parameter and update
>   call to get_group_load_store_type.
>   (vectorizable_store): Update call to get_load_store_type.
>   Handle IFN_MASK_STORE_LANES.
>   (vectorizable_load): Update call to get_load_store_type.
>   Handle IFN_MASK_LOAD_LANES.
> 
> gcc/testsuite/
>   * gcc.dg/vect/vect-ooo-group-1.c: New test.
>   * gcc.target/aarch64/sve_mask_struct_load_1.c: Likewise.
>   * gcc.target/aarch64/sve_mask_struct_load_1_run.c: Likewise.
>   * gcc.target/aarch64/sve_mask_struct_load_2.c: Likewise.
>   * gcc.target/aarch64/sve_mask_struct_load_2_run.c: Likewise.
>   * gcc.target/aarch64/sve_mask_struct_load_3.c: Likewise.
>   * gcc.target/aarch64/sve_mask_struct_load_3_run.c: Likewise.
>   * gcc.target/aarch64/sve_mask_struct_load_4.c: Likewise.
>   * gcc.target/aarch64/sve_mask_struct_load_5.c: Likewise.
>   * gcc.target/aarch64/sve_mask_struct_load_6.c: Likewise.
>   * gcc.target/aarch64/sve_mask_struct_load_7.c: Likewise.
>   * gcc.target/aarch64/sve_mask_struct_load_8.c: Likewise.
>   * gcc.target/aarch64/sve_mask_struct_store_1.c: Likewise.
>   * gcc.target/aarch64/sve_mask_struct_store_1_run.c: Likewise.
>   * gcc.target/aarch64/sve_mask_struct_store_2.c: Likewise.
>   * gcc.target/aarch64/sve_mask_struct_store_2_run.c: Likewise.
>   * gcc.target/aarch64/sve_mask_struct_store_3.c: Likewise.
>   * gcc.target/aarch64/sve_mask_struct_store_3_run.c: Likewise.
>   * gcc.target/aarch64/sve_mask_struct_store_4.c: Likewise.
> 
> Index: gcc/doc/md.texi
> ===
> --- gcc/doc/md.texi   2017-11-17 09:06:19.783260344 +
> +++ gcc/doc/md.texi   2017-11-17 09:35:23.400133274 +
> @@ -4855,6 +4855,23 @@ loads for vectors of mode @var{n}.
>  
>  This pattern is not allowed to @code{FAIL}.
>  
> +@cindex @code{vec_mask_load_lanes@var{m}@var{n}} instruction pattern
> +@item @samp{vec_mask_load_lanes@var{m}@var{n}}
> +Like @samp{vec_load_lanes@var{m}@var{n}},

Re: [PATCH] have -Wnonnull print inlining stack (PR 83369)

2017-12-11 Thread David Malcolm
On Mon, 2017-12-11 at 15:18 -0700, Martin Sebor wrote:
> On 12/11/2017 02:08 PM, David Malcolm wrote:
> > On Mon, 2017-12-11 at 09:51 -0700, Martin Sebor wrote:
> > > Bug 83369 - Missing diagnostics during inlining, notes that when
> > > -Wnonnull is issued for an inlined call to a built-in function,
> > > GCC doesn't print the inlining stack, making it hard to debug
> > > where the problem comes from.
> > > 
> > > When the -Wnonnull warning was introduced into the middle-end
> > > the diagnostic machinery provided no way to print the inlining
> > > stack (analogous to %K for trees).  Since then GCC has gained
> > > support for the %G directive which does just that.  The attached
> > > patch makes use of the directive to print the inlining context
> > > for -Wnonnull.
> > > 
> > > The patch doesn't include a test because the DejaGnu framework
> > > provides no mechanism to validate this part of GCC output (see
> > > also bug 83336).
> > > 
> > > Tested on x86_64-linux with no regressions.
> > > 
> > > Martin
> > 
> > I'm wondering if we should eliminate %K and %G altogether, and make
> > tree-diagnostic.c and friends automatically print the inlining
> > stack
> > -they just need a location_t (the issue is with system headers, I
> > suppose, but maybe we can just make that smarter: perhaps only
> > suppress
> > if every location in the chain is in a system header?).  I wonder
> > if
> > that would be GCC 9 material at this point though?
> 
> Getting rid of %G and %K sounds fine to me.  I can't think of
> a use case for suppressing middle end diagnostics in system
> headers so unless someone else can it might be a non-issue.
> Since the change would fix a known bug it seems to me that it
> should be acceptable even at this stage.

There's the "artificial" attribute, which as far as I can tell was
introduced as a kind of workaround for the "inlined code now looks like
it's in a system header" issue (but may also be related to debuginfo???)

> > Coming back to this patch: regarding tests, would you be able to
> > use
> > the techniques of:
> >   https://gcc.gnu.org/ml/gcc-patches/2017-12/msg00646.html
> > to build a test case?
> 
> I think so.  If I'm reading it right, it depends on the prune.exp
> changes and could be as simple as
> testsuite/gcc.dg/plugin/diagnostic-test-inlining-4.c, right?
> 
> Does it need to be  in the plugin directory or can any test
> use this approach?

Any test could use that approach, without needing to use the plugin;
that plugin exists purely to separate the testing of the "print
inlining information for middle-end warnings" out from any particular
middle-end warning.

Now that I think about it a bit more, that approach has an issue, in
that it hardcodes a lot of what we expect the output format to be.

That's OK for the patch I linked to, since that's the purpose of that
test.

But for your purposes, I think that all you need is the approach Aldy
used in gcc.dg/tm/pr52141.c, which has just a:

  /* { dg-message "inlined from \'f\'" "" { target *-*-* } 0 } */

to verify that *something* was printed about the inlining, without
overspecifying exactly what.

Dave


Re: [PATCH 03/14] C++: add location_t wrapper nodes during parsing (minimal impl)

2017-12-11 Thread Jason Merrill

On 11/10/2017 04:45 PM, David Malcolm wrote:

The initial version of the patch kit added location wrapper nodes
around constants and uses-of-declarations, along with some other
places in the parser (typeid, alignof, sizeof, offsetof).

This version takes a much more minimal approach: it only adds
location wrapper nodes around the arguments at callsites, thus
not adding wrapper nodes around uses of constants and decls in other
locations.

It keeps them for the other places in the parser (typeid, alignof,
sizeof, offsetof).

In addition, for now, each site that adds wrapper nodes is guarded
with !processing_template_decl, suppressing the creation of wrapper
nodes when processing template declarations.  This is to simplify
the patch kit so that we don't have to support wrapper nodes during
template expansion.


Hmm, it should be easy to support them, since NON_LVALUE_EXPR and 
VIEW_CONVERT_EXPR don't otherwise appear in template trees.


Jason


Re: [SFN+LVU+IEPM v4 3/9] [SFN] not-quite-boilerplate changes in preparation to introduce nonbind markers

2017-12-11 Thread Alexandre Oliva
On Dec  7, 2017, Jeff Law  wrote:

> On 11/09/2017 07:34 PM, Alexandre Oliva wrote:
>> This patch adjusts numerous parts of the compiler that would
> OK.


> I'll note you may need minor tweaks due to the Cilk+ removal.  Those
> changes are pre-approved.

Thanks, here's what I've just installed:

>From bce107d7e689174f402f931b34071ab12b0262cb Mon Sep 17 00:00:00 2001
From: aoliva 
Date: Tue, 12 Dec 2017 02:15:30 +
Subject: [PATCH 3/7] [SFN] not-quite-boilerplate changes in preparation to
 introduce nonbind markers

This patch adjusts numerous parts of the compiler that would
malfunction should they find debug markers at points where they may be
introduced.  The changes purport to allow the compiler to pass
bootstrap-debug-lean (-fcompare-debug in stage3) at various
optimization levels, as well as bootstrap-debug-lib (-fcompare-debug
for target libraries), even after the compiler is changed so that
debug markers are introduced in code streams at spots where earlier
debug stmts, insns and notes wouldn't normally appear.

This patch depends on an earlier SFN boilerplate patch, and on another
SFN patch that introduces new RTL insn-walking functions.

for  gcc/ChangeLog

* cfgcleanup.c (delete_unreachable_blocks): Use alternate
block removal order if MAY_HAVE_DEBUG_BIND_INSNS.
* cfgexpand.c (label_rtx_for_bb): Skip debug insns.
* cfgrtl.c (try_redirect_by_replacing_jump): Skip debug insns.
(rtl_tidy_fallthru_edge): Likewise.
(rtl_verify_fallthru): Likewise.
(rtl_verify_bb_layout): Likewise.
(skip_insns_after_block): Likewise.
(duplicate_insn_chain): Use DEBUG_BIND_INSN_P.
* dwarf2out.c: Include print-rtl.h.
(dwarf2out_next_real_insn): New.
(dwarf2out_var_location): Call it.  Disregard begin stmt markers.
Dump debug binds in asm comments.
* gimple-iterator.c (gimple_find_edge_insert_loc): Skip debug stmts.
* gimple-iterator.h (gsi_start_bb_nondebug): Remove; adjust
callers to use gsi_start_nondebug_bb instead.
(gsi_after_labels): Skip gimple debug stmts.
(gsi_start_nondebug): New.
* gimple-loop-interchange.c (find_deps_in_bb_for_stmt): Adjust.
(proper_loop_form_for_interchange): Adjust.
* gimple-low.c (gimple_seq_may_fallthru): Take last nondebug stmt.
* gimple.h (gimple_seq_last_nondebug_stmt): New.
* gimplify.c (last_stmt_in_scope): Skip debug stmts.
(collect_fallthrough_labels): Likewise.
(should_warn_for_implicit_fallthrough): Likewise.
(warn_implicit_fallthrough_r): Likewise.
(expand_FALLTHROUGH_r): Likewise.
* graphite-isl-ast-to-gimple.c (gsi_insert_earliest): Adjust.
(graphite_copy_stmts_from_block): Skip nonbind markers.
* haifa-sched.c (sched_extend_bb): Skip debug insns.
* ipa-icf-gimple.c (func_checker::compare_bb): Adjust.
* jump.c (clean_barriers): Skip debug insns.
* omp-expand.c (expand_parallel_call): Skip debug insns.
(expand_task_call): Likewise.
(remove_exit_barrier): Likewise.
(expand_omp_taskreg): Likewise.
(expand_omp_for_init_counts): Likewise.
(expand_omp_for_generic): Likewise.
(expand_omp_for_static_nochunk): Likewise.
(expand_omp_for_static_chunk): Likewise.
(expand_omp_simd): Likewise.
(expand_omp_taskloop_for_outer): Likewise.
(expand_omp_taskloop_for_inner): Likewise.
(expand_oacc_for): Likewise.
(expand_omp_sections): Likewise.
(expand_omp_single): Likewise.
(expand_omp_synch): Likewise.
(expand_omp_atomic_load): Likewise.
(expand_omp_atomic_store): Likewise.
(expand_omp_atomic_fetch_op): Likewise.
(expand_omp_atomic_pipeline): Likewise.
(expand_omp_atomic_mutex): Likewise.
(expand_omp_target): Likewise.
(grid_expand_omp_for_loop): Likewise.
(grid_expand_target_grid_body): Likewise.
(build_omp_regions_1): Likewise.
* omp-low.c (check_combined_parallel): Skip debug stmts.
* postreload.c (fixup_debug_insns): Skip nonbind debug insns.
* regcprop.c (find_oldest_value_reg): Ensure REGNO is not a pseudo.
* sese.c (sese_trivially_empty_bb_p): Call is_gimple_debug in
test.
* tree-cfg.c (make_blocks_1): Skip debug stmts.
(make_edges): Likewise.
(cleanup_dead_labels): Likewise.
(gimple_can_merge_blocks_p): Likewise.
(stmt_starts_bb_p): Likewise.
(gimple_block_label): Likewise.
(gimple_redirect_edge_and_branch): Likewise.
* tree-cfgcleanup.c (remove_forwarder_block): Rearrange skipping
of debug stmts.
(execute_cleanup_cfg_post_optimizing): Dump enumerated decls with
TDF_SLIM.
* tree-pretty-print.c (print_declaration): Omit initializer in slim
dumps.
* tree-ssa-dce.c (mark_stmt_if_obviously_necessary): 

Re: [SFN+LVU+IEPM v4 4/9] [SFN] stabilize find_bb_boundaries

2017-12-11 Thread Alexandre Oliva
On Dec  7, 2017, Jeff Law  wrote:

> On 11/09/2017 07:34 PM, Alexandre Oliva wrote:
>> * cfgbuild.c (find_bb_boundaries): Don't purge dead edges if,
>> without debug insns, we wouldn't, but clean up debug insns
>> after a control flow insn nevertheless.
> OK.  Seems to me like it's independent of the rest of the work and
> should go in immediately.

Yeah, I wasn't even sure whether to submit it as part of the SFN
patchset.  IIRC it was a build regression of the patchset, that only
came up after I'd first posted it, and I could only trigger it with the
patchset, IIRC because it involved debug insns at places where only
these patches would insert them.  So I kept it in the set, but as a
separate patch.

Anyway, thanks, here it is as installed, FTR:

>From 116cfb8c5abdcd64333be8fa105a4af2dfd13823 Mon Sep 17 00:00:00 2001
From: aoliva 
Date: Tue, 12 Dec 2017 02:15:44 +
Subject: [PATCH 4/7] [SFN] stabilize find_bb_boundaries

If find_bb_boundaries is given a block with zero or one nondebug insn
beside debug insns, it shouldn't purge dead edges, because without
debug insns we wouldn't purge them at that point.  Doing so may change
the order in which edges are processed, and ultimately lead to
different transformations to the CFG and then to different
optimizations.

We shouldn't, however, retain debug insns after control flow insns, so
if we find debug insns after a single insn that happens to be a
control flow insn, do the debug insn cleanups, but still refrain from
purging dead edges at that point.
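
The decision described above can be modelled outside the compiler.  The
sketch below is an assumption-laden toy, not the cfgbuild.c code: a block
is an array of toy_kind values rather than an rtx_insn chain, and
toy_classify_block with its three result names is invented for
illustration.  It skips debug entries from both ends and reports which of
the three behaviours the description calls for.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy instruction kinds; NOT GCC's rtx codes.  */
enum toy_kind { TOY_DEBUG, TOY_PLAIN, TOY_JUMP };

/* Possible outcomes of the early check.  */
enum toy_action
{
  TOY_RETURN_EARLY,     /* Zero or one nondebug insn: leave untouched.  */
  TOY_CLEAN_DEBUG_ONLY, /* Lone control flow insn followed by debug
                           insns: tidy them, but do not purge edges.  */
  TOY_FULL_SCAN         /* Two or more nondebug insns: normal path.  */
};

enum toy_action
toy_classify_block (const enum toy_kind *insns, size_t n)
{
  /* Mirrors the pre-existing "insn == end" early return.  */
  if (n <= 1)
    return TOY_RETURN_EARLY;
  assert (insns != NULL);

  size_t first = 0, last = n - 1;

  /* Skip leading debug insns.  */
  while (first != last && insns[first] == TOY_DEBUG)
    first++;
  /* Skip trailing debug insns.  */
  while (first != last && insns[last] == TOY_DEBUG)
    last--;

  /* More than one nondebug insn remains: behave as before.  */
  if (first != last)
    return TOY_FULL_SCAN;

  /* Exactly one candidate remains.  If it is a control flow insn and
     debug insns trail it, those must be cleaned up, but dead edges
     are still not purged.  */
  if (last != n - 1 && insns[last] == TOY_JUMP)
    return TOY_CLEAN_DEBUG_ONLY;

  return TOY_RETURN_EARLY;
}
```

The property worth checking in such a model is that deleting every
TOY_DEBUG entry never turns a TOY_RETURN_EARLY or TOY_CLEAN_DEBUG_ONLY
block into a TOY_FULL_SCAN one, which is the -fcompare-debug stability
the patch is after.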


for  gcc/ChangeLog

* cfgbuild.c (find_bb_boundaries): Don't purge dead edges if,
without debug insns, we wouldn't, but clean up debug insns
after a control flow insn nevertheless.

git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/trunk@255567 
138bc75d-0d04-0410-961f-82ee72b054a4
---
 gcc/ChangeLog  |  4 
 gcc/cfgbuild.c | 40 +++-
 2 files changed, 43 insertions(+), 1 deletion(-)

diff --git a/gcc/ChangeLog b/gcc/ChangeLog
index 03ad41c3e27a..3c1add62c944 100644
--- a/gcc/ChangeLog
+++ b/gcc/ChangeLog
@@ -1,5 +1,9 @@
 2017-12-12  Alexandre Oliva 
 
+   * cfgbuild.c (find_bb_boundaries): Don't purge dead edges if,
+   without debug insns, we wouldn't, but clean up debug insns
+   after a control flow insn nevertheless.
+
* cfgcleanup.c (delete_unreachable_blocks): Use alternate
block removal order if MAY_HAVE_DEBUG_BIND_INSNS.
* cfgexpand.c (label_rtx_for_bb): Skip debug insns.
diff --git a/gcc/cfgbuild.c b/gcc/cfgbuild.c
index 8fa15fec45e4..7b57589f1dfa 100644
--- a/gcc/cfgbuild.c
+++ b/gcc/cfgbuild.c
@@ -444,10 +444,43 @@ find_bb_boundaries (basic_block bb)
   rtx_insn *flow_transfer_insn = NULL;
   rtx_insn *debug_insn = NULL;
   edge fallthru = NULL;
+  bool skip_purge;
 
   if (insn == end)
 return;
 
+  if (DEBUG_INSN_P (insn) || DEBUG_INSN_P (end))
+{
+  /* Check whether, without debug insns, the insn==end test above
+would have caused us to return immediately, and behave the
+same way even with debug insns.  If we don't do this, debug
+insns could cause us to purge dead edges at different times,
+which could in turn change the cfg and affect codegen
+decisions in subtle but undesirable ways.  */
+  while (insn != end && DEBUG_INSN_P (insn))
+   insn = NEXT_INSN (insn);
+  rtx_insn *e = end;
+  while (insn != e && DEBUG_INSN_P (e))
+   e = PREV_INSN (e);
+  if (insn == e)
+   {
+ /* If there are debug insns after a single insn that is a
+control flow insn in the block, we'd have left right
+away, but we should clean up the debug insns after the
+control flow insn, because they can't remain in the same
+block.  So, do the debug insn cleaning up, but then bail
+out without purging dead edges as we would if the debug
+insns hadn't been there.  */
+ if (e != end && !DEBUG_INSN_P (e) && control_flow_insn_p (e))
+   {
+ skip_purge = true;
+ flow_transfer_insn = e;
+ goto clean_up_debug_after_control_flow;
+   }
+ return;
+   }
+}
+
   if (LABEL_P (insn))
 insn = NEXT_INSN (insn);
 
@@ -475,7 +508,6 @@ find_bb_boundaries (basic_block bb)
  if (debug_insn && code != CODE_LABEL && code != BARRIER)
prev = PREV_INSN (debug_insn);
  fallthru = split_block (bb, prev);
-
  if (flow_transfer_insn)
{
  BB_END (bb) = flow_transfer_insn;
@@ -527,6 +559,9 @@ find_bb_boundaries (basic_block bb)
  ordinary jump, we need to take care and move basic block boundary.  */
   if (flow_transfer_insn && flow_transfer_insn != end)
 {
+  skip_purge = false;
+
+clean_up_debug_after_control_flow:
   BB_END (bb) = flow_transfer_insn;
 
   /* Clean up the bb field for the insns that do not belong to BB.  */
@@ -543,6 +578,9 @@ f
