Re: [PATCH] Change PRED_LOOP_EXIT from 85 to 92
On 01/02/2018 12:57 PM, Martin Liška wrote: > From what I've measured, suggested adjustment for this particular predictor > would be > increase to 89. Hello. Note that I've installed the patch as r256888. Martin
[PATCH][RFC] Pass -m -jN to gcc_build from gcc_release
Currently gcc_release builds GCC (for generating in-tree generated files) serially - that's prohibitively expensive and takes ages (>4h for me). I'm using (when I remember to edit gcc_release ...) -j6 without a problem for some years, thus the following proposal. Any recommendation for the default N? 4 might work just fine as well and no release manager should do without at least 4 cores... Thanks, Richard. 2018-01-22 Richard Biener maintainer-scripts/ * gcc_release (build_sources): Pass -m "-j1" to gcc_build. Index: maintainer-scripts/gcc_release === --- maintainer-scripts/gcc_release (revision 256939) +++ maintainer-scripts/gcc_release (working copy) @@ -210,7 +210,8 @@ EOF inform "Building compiler" OBJECT_DIRECTORY=../objdir contrib/gcc_build -d ${SOURCE_DIRECTORY} -o ${OBJECT_DIRECTORY} \ - -c "--enable-generated-files-in-srcdir --disable-multilib" build || \ + -c "--enable-generated-files-in-srcdir --disable-multilib" \ + -m "-j1" build || \ error "Could not rebuild GCC" fi
Re: [PATCH][RFC] Pass -m -jN to gcc_build from gcc_release
On Mon, Jan 22, 2018 at 09:37:05AM +0100, Richard Biener wrote: > > Currently gcc_release builds GCC (for generating in-tree generated > files) serially - that's prohibitively expensive and takes ages (>4h for > me). I'm using (when I remember to edit gcc_release ...) -j6 without > a problem for some years, thus the following proposal. > > Any recommendation for the default N? 4 might work just fine as well > and no release manager should do without at least 4 cores... Perhaps num_cpus=1 if type -p getconf 2>/dev/null; then num_cpus=`getconf _NPROCESSORS_ONLN 2>/dev/null` case "$num_cpus" in '' | 0* | *[!0-9]*) num_cpus=1;; esac fi and "-j$num_cpus" ? > 2018-01-22 Richard Biener > > maintainer-scripts/ > * gcc_release (build_sources): Pass -m "-j1" to gcc_build. > > Index: maintainer-scripts/gcc_release > === > --- maintainer-scripts/gcc_release (revision 256939) > +++ maintainer-scripts/gcc_release (working copy) > @@ -210,7 +210,8 @@ EOF > inform "Building compiler" > OBJECT_DIRECTORY=../objdir > contrib/gcc_build -d ${SOURCE_DIRECTORY} -o ${OBJECT_DIRECTORY} \ > - -c "--enable-generated-files-in-srcdir --disable-multilib" build || \ > + -c "--enable-generated-files-in-srcdir --disable-multilib" \ > + -m "-j1" build || \ > error "Could not rebuild GCC" > fi > Jakub
[PATCH] Minor warning option sync with clang
Hi, a mail in the gcc-list https://gcc.gnu.org/ml/gcc/2018-01/msg00144.html reminded me about this minor stuff I have (so it can be controlled by -Werror), but never managed to write testcases. I'm posting it here in case it's good enough or someone wants to take over from here. Franz c/ChangeLog: 2018-01-22 Franz Sirl * c-decl.c (grokdeclarator): Use OPT_Wextern_initializer. * c-typeck.c (build_binary_op): Use OPT_Wcompare_distinct_pointer_types. c-family/ChangeLog: 2018-01-22 Franz Sirl * c.opt (-Wcompare-distinct-pointer-types): New option. (-Wextern-initializer): Likewise. cp/ChangeLog: 2018-01-22 Franz Sirl * decl.c (grokdeclarator): Use OPT_Wextern_initializer. * typeck.c (composite_pointer_error): Use OPT_Wcompare_distinct_pointer_types. Index: gcc-trunk/gcc/c-family/c.opt === --- gcc-trunk/gcc/c-family/c.opt (revision 256939) +++ gcc-trunk/gcc/c-family/c.opt (working copy) @@ -318,10 +318,6 @@ alloca, and on bounded uses of alloca whose bound can be larger than bytes. -Warray-bounds -LangEnabledBy(C ObjC C++ ObjC++,Wall) -; in common.opt - Warray-bounds= LangEnabledBy(C ObjC C++ ObjC++,Wall,1,0) ; in common.opt @@ -420,6 +416,10 @@ C ObjC C++ ObjC++ Warning Alias(Wcomment) Synonym for -Wcomment. +Wcompare-distinct-pointer-types +C ObjC C++ ObjC++ Var(warn_distinct_pointer_types) Init(1) Warning +Warn for comparison of distinct pointer types. + Wconditionally-supported C++ ObjC++ Var(warn_conditionally_supported) Warning Warn for conditionally-supported constructs. @@ -512,6 +512,10 @@ C ObjC RejectNegative Warning Alias(Werror=, implicit-function-declaration) This switch is deprecated; use -Werror=implicit-function-declaration instead. +Wextern-initializer +C ObjC C++ ObjC++ Var(warn_extern_initializer) Init(1) Warning +Warn about extern declarations with an initializer. 
+ Wextra C ObjC C++ ObjC++ Warning ; in common.opt @@ -694,6 +698,10 @@ C C++ Common Var(warn_misleading_indentation) Warning LangEnabledBy(C C++,Wall) Warn when the indentation of the code does not reflect the block structure. +Wmisleading-indentation-ltb +C C++ Common Var(warn_misleading_indentation_ltb) Warning +Warn when the indentation of the code does not reflect the block structure. + Wmissing-braces C ObjC C++ ObjC++ Var(warn_missing_braces) Warning LangEnabledBy(C ObjC,Wall) Warn about possibly missing braces around initializers. Index: gcc-trunk/gcc/c/c-decl.c === --- gcc-trunk/gcc/c/c-decl.c (revision 256939) +++ gcc-trunk/gcc/c/c-decl.c (working copy) @@ -5888,8 +5888,8 @@ /* It is fine to have 'extern const' when compiling at C and C++ intersection. */ if (!(warn_cxx_compat && constp)) - warning_at (loc, 0, "%qE initialized and declared %<extern%>", -name); + warning_at (loc, OPT_Wextern_initializer, +"%qE initialized and declared %<extern%>", name); } else error_at (loc, "%qE has both %<extern%> and initializer", name); Index: gcc-trunk/gcc/cp/decl.c === --- gcc-trunk/gcc/cp/decl.c (revision 256939) +++ gcc-trunk/gcc/cp/decl.c (working copy) @@ -12408,7 +12408,8 @@ /* It's common practice (and completely valid) to have a const be initialized and declared extern. */ if (!(type_quals & TYPE_QUAL_CONST)) - warning (0, "%qs initialized and declared %<extern%>", name); + warning_at (input_location, OPT_Wextern_initializer, + "%qs initialized and declared %<extern%>", name); } else { Index: gcc-trunk/gcc/c/c-typeck.c === --- gcc-trunk/gcc/c/c-typeck.c (revision 256939) +++ gcc-trunk/gcc/c/c-typeck.c (working copy) @@ -11599,8 +11599,9 @@ else /* Avoid warning about the volatile ObjC EH puts on decls. 
*/ if (!objc_ok) - pedwarn (location, 0, - "comparison of distinct pointer types lacks a cast"); + pedwarn (location, OPT_Wcompare_distinct_pointer_types, + "comparison of distinct pointer types " + "%qT and %qT lacks a cast", type0, type1); if (result_type == NULL_TREE) { @@ -11711,8 +11712,9 @@ int qual = ENCODE_QUAL_ADDR_SPACE (as_common); result_type = build_pointer_type (build_qualified_type (void_type_node, qual)); - pedwarn (location, 0, - "comparison of distinct pointer types lacks a cast"); + pedwarn (location, OPT_Wcompare_distinct_pointer_types, + "comparison of distinct pointer type
Re: [C++ Patch] PR 83921 ("[7/8 Regression] GCC rejects constexpr initialization of empty aggregate")
Hi, On 20/01/2018 02:59, Paolo Carlini wrote: Hi again, On 19/01/2018 23:55, Paolo Carlini wrote: ...Therefore it seems to me that a way to more cleanly solve the bug would be moving something like || !DECL_NONTRIVIALLY_INITIALIZED_P to the above check in check_for_uninitialized_const_var, and therefore remove completely the uninitialized case from potential_constant_expression_1, ... Of course this doesn't work. check_for_uninitialized_const would really need to know that the VAR_DECL appears in a statement expression which is initializing a constexpr variable, nothing to do with DECL_NONTRIVIALLY_INITIALIZED_P. I'll give the issue more thought over the weekend, but removing completely the check from potential_constant_expression_1 seems very tough to me, assuming of course we really care about diagnosing the uninitialized inner in stmtexpr20.C, which is my invention and isn't in the testsuite yet ;) Thus the below, carefully tested over the weekend, would be a change completely removing the problematic error_at call, plus some testcases checking that we would still do the right thing in a few cases (bug submitter added constexpr-83921-2.C). The updated stmtexpr20.C shows that we would reject anyway a statement expression using an uninitialized inner, simply because the whole initializer would still be flagged as non-constant. ICC does the same. Clang diagnoses the uninitialized inner but then actually accepts something using 'static const int inner = 1' whereas we don't anyway, we (and ICC) don't accept anything similar to stmtexpr20.C. So, I would say this kind of change may actually make sense... Thanks, Paolo. //. 
Index: cp/constexpr.c === --- cp/constexpr.c (revision 256939) +++ cp/constexpr.c (working copy) @@ -5707,13 +5707,6 @@ potential_constant_expression_1 (tree t, bool want "%<static%> in %<constexpr%> context", tmp); return false; } - else if (!DECL_NONTRIVIALLY_INITIALIZED_P (tmp)) - { - if (flags & tf_error) - error_at (DECL_SOURCE_LOCATION (tmp), "uninitialized " - "variable %qD in %<constexpr%> context", tmp); - return false; - } } return RECUR (tmp, want_rval); Index: testsuite/g++.dg/cpp1y/constexpr-83921-1.C === --- testsuite/g++.dg/cpp1y/constexpr-83921-1.C (nonexistent) +++ testsuite/g++.dg/cpp1y/constexpr-83921-1.C (working copy) @@ -0,0 +1,5 @@ +// PR c++/83921 +// { dg-do compile { target c++14 } } + +struct Foo { }; +constexpr void test() { Foo f; } Index: testsuite/g++.dg/cpp1y/constexpr-83921-2.C === --- testsuite/g++.dg/cpp1y/constexpr-83921-2.C (nonexistent) +++ testsuite/g++.dg/cpp1y/constexpr-83921-2.C (working copy) @@ -0,0 +1,5 @@ +// PR c++/83921 +// { dg-do compile { target c++14 } } + +struct Foo { Foo() = default; }; +constexpr void test() { Foo f; } Index: testsuite/g++.dg/cpp1y/constexpr-83921-3.C === --- testsuite/g++.dg/cpp1y/constexpr-83921-3.C (nonexistent) +++ testsuite/g++.dg/cpp1y/constexpr-83921-3.C (working copy) @@ -0,0 +1,5 @@ +// PR c++/83921 +// { dg-do compile { target c++14 } } + +struct Foo { int m; }; +constexpr void test() { Foo f; } // { dg-error "uninitialized" } Index: testsuite/g++.dg/ext/stmtexpr20.C === --- testsuite/g++.dg/ext/stmtexpr20.C (nonexistent) +++ testsuite/g++.dg/ext/stmtexpr20.C (working copy) @@ -0,0 +1,13 @@ +// PR c++/83921 +// { dg-options "" } +// { dg-do compile { target c++11 } } + +struct test { const int *addr; }; + +const test* setup() +{ + static constexpr test atest = +{ ({ int inner; &inner; }) }; // { dg-error "not a constant" } + + return &atest; +}
[PATCH][arm] Make gcc.target/arm/copysign_softfloat_1.c more robust
Hi all, This test has needlessly restrictive requirements. It tries to force a soft-float target and tries to run. This makes it unsupportable for any non-soft-float variant. In fact, the test can be a run-time test for any target, and only the scan-assembler tests are specific to -mfloat-abi=soft. So this patch makes the test always runnable and makes the scan-assembler checks predicated on the new arm_softfloat effective target check. Committing to trunk. Thanks, Kyrill 2018-01-22 Kyrylo Tkachov * doc/sourcebuild.texi (arm_softfloat): Document. 2018-01-22 Kyrylo Tkachov * lib/target-supports.exp (check_effective_target_arm_softfloat): New procedure. * gcc.target/arm/copysign_softfloat_1.c: Allow running everywhere. Adjust scan-assembler checks for soft-float. diff --git a/gcc/doc/sourcebuild.texi b/gcc/doc/sourcebuild.texi index f0233c9cca4ca32248db407bf00fb4c97eb740b1..69fbf6ac76842ae90fcf4f07638755c18ecc23c9 100644 --- a/gcc/doc/sourcebuild.texi +++ b/gcc/doc/sourcebuild.texi @@ -1650,6 +1650,10 @@ ARM target adheres to the VFP and Advanced SIMD Register Arguments variant of the ABI for the ARM Architecture (as selected with @code{-mfloat-abi=hard}). +@item arm_softfloat +ARM target uses the soft-float ABI with no floating-point instructions +used whatsoever (as selected with @code{-mfloat-abi=soft}). + @item arm_hard_vfp_ok ARM target supports @code{-mfpu=vfp -mfloat-abi=hard}. Some multilibs may be incompatible with these options. 
diff --git a/gcc/testsuite/gcc.target/arm/copysign_softfloat_1.c b/gcc/testsuite/gcc.target/arm/copysign_softfloat_1.c index d79d014e27cf445cd741504c6b256a3a51ace6cd..fdbeeadc01e1c9b9a7810a8ff8b23c58f6c429a5 100644 --- a/gcc/testsuite/gcc.target/arm/copysign_softfloat_1.c +++ b/gcc/testsuite/gcc.target/arm/copysign_softfloat_1.c @@ -1,8 +1,8 @@ /* { dg-do run } */ /* { dg-require-effective-target arm_thumb2_ok } */ -/* { dg-require-effective-target arm_soft_ok } */ -/* { dg-skip-if "skip override" { *-*-* } { "-mfloat-abi=softfp" "-mfloat-abi=hard" } { "" } } */ -/* { dg-options "-O2 -mfloat-abi=soft --save-temps" } */ +/* { dg-add-options arm_arch_v6t2 } */ +/* { dg-additional-options "-O2 --save-temps" } */ + extern void abort (void); #define N 16 @@ -42,8 +42,8 @@ main (int argc, char **argv) { int index = 0; -/* { dg-final { scan-assembler-times "bfi" 2 } } */ -/* { dg-final { scan-assembler-times "lsr" 1 } } */ +/* { dg-final { scan-assembler-times "bfi" 2 { target arm_softfloat } } } */ +/* { dg-final { scan-assembler-times "lsr" 1 { target arm_softfloat } } } */ for (index; index < N; index++) { if (__builtin_copysignf (a_f[index], b_f[index]) != c_f[index]) diff --git a/gcc/testsuite/lib/target-supports.exp b/gcc/testsuite/lib/target-supports.exp index 4095f6386b19d601ffd345922b14e015565a2462..50ec1adbf8158b5f29130d3919fcd86de8be5248 100644 --- a/gcc/testsuite/lib/target-supports.exp +++ b/gcc/testsuite/lib/target-supports.exp @@ -4879,6 +4879,19 @@ proc check_effective_target_arm_hf_eabi { } { }] } +# Return 1 if this is an ARM target that uses the soft float ABI +# with no floating-point instructions at all (e.g. -mfloat-abi=soft). + +proc check_effective_target_arm_softfloat { } { +return [check_no_compiler_messages arm_softfloat object { + #if !defined(__SOFTFP__) + #error not soft-float EABI + #else + int dummy; + #endif +}] +} + # Return 1 if this is an ARM target supporting -mcpu=iwmmxt. # Some multilibs may be incompatible with this option.
Re: [PATCH][ARM] Fix test fail with conflicting -mfloat-abi
Hi Kyrill On 19/01/18 18:00, Kyrill Tkachov wrote: On 16/01/18 10:31, Sudakshina Das wrote: Hi Christophe On 12/01/18 18:32, Christophe Lyon wrote: Le 12 janv. 2018 15:26, "Sudakshina Das" a écrit : Hi This patch fixes my earlier test case that fails for arm-none-eabi with explicit user option for -mfloat-abi which conflict with the test case options. I have added a guard to skip the test on those cases. @Christophe: Sorry about this. I think this should fix the test case. Can you please confirm if this works for you? Yes it does thanks Thanks for checking that. I have added one more directive for armv5t as well to avoid any conflicts for mcpu options. I agree with what Sudi said in https://gcc.gnu.org/ml/gcc-patches/2018-01/msg01422.html I'd rather keep the test in the generic torture suite as long as we get the directives right. So this is ok for trunk (as the changes are arm-specific directives) with one change below: Thanks, Kyrill Sudi Thanks Sudi gcc/testsuite/ChangeLog 2018-01-12 Sudakshina Das * gcc.c-torture/compile/pr82096.c: Add dg-skip-if directive. diff --git a/gcc/testsuite/gcc.c-torture/compile/pr82096.c b/gcc/testsuite/gcc.c-torture/compile/pr82096.c index 9fed28c..35551f5 100644 --- a/gcc/testsuite/gcc.c-torture/compile/pr82096.c +++ b/gcc/testsuite/gcc.c-torture/compile/pr82096.c @@ -1,3 +1,5 @@ +/* { dg-require-effective-target arm_arch_v5t_ok } */ Please also guard this on { target arm*-*-* } That way this test will be run on other targets as well so that they can benefit from it. +/* { dg-skip-if "Do not combine float-abi values" { arm*-*-* } { "-mfloat-abi=*" } { "-mfloat-abi=soft" } } */ /* { dg-additional-options "-march=armv5t -mthumb -mfloat-abi=soft" { target arm*-*-* } } */ Thanks committed with the change as r256941 Sudi
RE: [patch][x86] -march=icelake
Hi, I tried omp_clause_mask and it looks ok. But it lacks a check for whether any bit is set or none. With the addition of it (as proposed, or in some other way) it should work. What do you think about this approach (patch attached)? Thanks, Julia > -----Original Message----- > From: Jakub Jelinek [mailto:ja...@redhat.com] > Sent: Tuesday, December 19, 2017 2:50 PM > To: Koval, Julia > Cc: Richard Biener ; Uros Bizjak > ; GCC Patches ; Kirill Yukhin > > Subject: Re: [patch][x86] -march=icelake > > On Tue, Dec 19, 2017 at 12:34:03PM +, Koval, Julia wrote: > > >> Maybe [] operator could be used instead of a dynamic handling here. > > I had another solution in mind, with enums, which then addresses elements > using its index, please see the attached patch. > > You can also have a look at the omp_clause_mask class in c-common.h, that is > also something that has been added to handle the case where we run out of > 64-bits for a particular bitmask, wanted to keep using pretty much the same > interfaces and be able to handle it fast. Using 2 enums for the two halves > and treating it accordingly is also an option. > > I agree sbitmap is too heavy for this. > > Jakub Attachment: 0001-test.patch
Re: [patch][x86] -march=icelake
On Mon, Jan 22, 2018 at 11:30:10AM +, Koval, Julia wrote: > Hi, I tried omp_clause_mask and it looks ok. But it lacks a check for whether > any bit is set or none. With the addition of it (as proposed, or in some other way) > it should work. What do you think about this approach (patch attached)? Well, I certainly didn't mean to use omp_clause_mask for something completely unrelated to OpenMP, the reason I've mentioned it is that it is a class that deals with a similar problem. So, if you want to use the same class, it would need to be moved to some generic header, renamed and then c-common.h would typedef that_class omp_clause_mask. I'm surprised you need any, doesn't (mask & (...)) != 0 already handle that? Jakub
[PATCH] Fix PR83963
Bootstrapped on x86_64-unknown-linux-gnu, testing in progress. Richard. 2018-01-22 Richard Biener PR tree-optimization/83963 * graphite-scop-detection.c (scop_detection::get_sese): Delay including the loop exit block. (scop_detection::merge_sese): Likewise. (scop_detection::add_scop): Do it here instead. * gcc.dg/graphite/pr83963.c: New testcase. Index: gcc/graphite-scop-detection.c === --- gcc/graphite-scop-detection.c (revision 256939) +++ gcc/graphite-scop-detection.c (working copy) @@ -420,13 +420,6 @@ scop_detection::get_sese (loop_p loop) edge scop_end = single_exit (loop); if (!scop_end || (scop_end->flags & (EDGE_COMPLEX|EDGE_FAKE))) return invalid_sese; - /* Include the BB with the loop-closed SSA PHI nodes. - canonicalize_loop_closed_ssa makes sure that is in proper shape. */ - if (! single_pred_p (scop_end->dest) - || ! single_succ_p (scop_end->dest) - || ! sese_trivially_empty_bb_p (scop_end->dest)) -gcc_unreachable (); - scop_end = single_succ_edge (scop_end->dest); return sese_l (scop_begin, scop_end); } @@ -507,17 +500,6 @@ scop_detection::merge_sese (sese_l first } while (! bitmap_empty_p (worklist)); - /* Include the BB with the loop-closed SSA PHI nodes. - canonicalize_loop_closed_ssa makes sure that is in proper shape. */ - if (exit->dest != EXIT_BLOCK_PTR_FOR_FN (cfun) - && loop_exit_edge_p (exit->src->loop_father, exit)) -{ - gcc_assert (single_pred_p (exit->dest) - && single_succ_p (exit->dest) - && sese_trivially_empty_bb_p (exit->dest)); - exit = single_succ_edge (exit->dest); -} - sese_l combined (entry, exit); DEBUG_PRINT (dp << "[merged-sese] s1: "; print_sese (dump_file, combined)); @@ -608,6 +590,18 @@ scop_detection::add_scop (sese_l s) { gcc_assert (s); + /* Include the BB with the loop-closed SSA PHI nodes, we need this + block in the region for code-generating out-of-SSA copies. + canonicalize_loop_closed_ssa makes sure that is in proper shape. 
*/ + if (s.exit->dest != EXIT_BLOCK_PTR_FOR_FN (cfun) + && loop_exit_edge_p (s.exit->src->loop_father, s.exit)) +{ + gcc_assert (single_pred_p (s.exit->dest) + && single_succ_p (s.exit->dest) + && sese_trivially_empty_bb_p (s.exit->dest)); + s.exit = single_succ_edge (s.exit->dest); +} + /* Do not add scops with only one loop. */ if (region_has_one_loop (s)) { Index: gcc/testsuite/gcc.dg/graphite/pr83963.c === --- gcc/testsuite/gcc.dg/graphite/pr83963.c (nonexistent) +++ gcc/testsuite/gcc.dg/graphite/pr83963.c (working copy) @@ -0,0 +1,31 @@ +/* { dg-do compile } */ +/* { dg-options "-O -floop-nest-optimize -fno-tree-loop-im" } */ + +int mg, et; + +void +s5 (int is) +{ + if (is == 0) +{ +g6: + ++is; +} + + while (mg < 1) +{ + while (et < 1) + { + if (is == 0) + return; + + ++et; + } + + while (mg < 1) + ++mg; +} + + goto g6; +} +
Re: [PATCH] Fix branch probability computation in do_compare_rtx_and_jump (PR tree-optimization/83081)
On 19 January 2018 at 22:45, Jakub Jelinek wrote: > Hi! > > This PR is about a certain test FAILing on arm, because it scans for > "Invalid sum ..." message in the *.ira dump, but recent GCC versions have > that message in there; not introduced by IRA though, but all the way from > expansion. We are expanding: >[local count: 1073741825]: > _1 = *x_3(D); > if (_1 u>= 0.0) > goto ; [99.95%] > else > goto ; [0.05%] > >[local count: 536864]: > sqrtf (_1); > and do_compare_rtx_and_jump decides to split the single _1 u>= 0.0 > comparison into two. The expectation is that the basic block counts stay > the same, so if bb 3's count is 0.05% times bb 2's count, the probabilities > need to be computed on both jumps so that this is preserved. > We want to expand essentially to: >[local count: 1073741825]: > ... > if (cond1) > goto ; [first_prob] > else > goto ; [first_prob.invert ()] > > : > if (cond2) > goto ; [prob] > else > goto ; [prob.invert ()] > >[local count: 536864]: > sqrtf (_1); > and compute first_prob and prob from the original prob such that the bb > counts match. The code used to hardcode first_prob to 1% or 99% depending > on condition, and leave the second probability the original one. > > That IMHO can't work and the Invalid sum message verifies that. If we want > the first jump to hit 99times more often than the second one or vice versa, > I believe first_prob should be .99 * prob or .01 * prob respectively, and > the second probability then should be (0.01 * prob) / (1 - 0.99 * prob) > or (0.99 * prob) / (1 - 0.01 * prob) respectively. > > With this change the Invalid sum message disappears. > predict-8.c testcase was apparently trying to match the hardcoded 0.99 > probability rather than .99 * 65% emitted now. > > Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk? > Hi, Did you notice any performance impact? 
Thanks, Christophe > If this patch is right, I think do_jump_by_parts* are buggy similarly too, > there we emit prob or prob.invert () for all the N jumps we emit instead of > the original single conditional jump with probability prob. I think we'd > need to decide what relative probabilities we want to use for the different > jumps, e.g. all have even relative likelihood and then adjust the > probability of each branch and from what we compute the following > probabilities similarly to this patch. > > 2018-01-19 Jakub Jelinek > > PR tree-optimization/83081 > * dojump.c (do_compare_rtx_and_jump): Fix adjustment of probabilities > when splitting a single conditional jump into 2. > > * gcc.dg/predict-8.c: Adjust expected probability. > > --- gcc/dojump.c.jj 2018-01-03 10:19:55.0 +0100 > +++ gcc/dojump.c 2018-01-19 17:07:25.238927314 +0100 > @@ -1122,11 +1122,30 @@ do_compare_rtx_and_jump (rtx op0, rtx op > { > profile_probability first_prob = prob; > if (first_code == UNORDERED) > - first_prob = profile_probability::guessed_always > ().apply_scale > -(1, 100); > + { > + /* We want to split: > +if (x) goto t; // prob; > +into > +if (a) goto t; // first_prob; > +if (b) goto t; // prob; > +such that the overall probability of jumping to t > +remains the same, but the first jump jumps > +much less often than the second jump. */ > + first_prob = prob.guessed ().apply_scale (1, 100); > + prob = (prob.guessed () - first_prob) / first_prob.invert > (); > + } > else if (first_code == ORDERED) > - first_prob = profile_probability::guessed_always > ().apply_scale > -(99, 100); > + { > + /* See above, except the first jump should jump much more > +often than the second one. 
*/ > + first_prob = prob.guessed ().apply_scale (99, 100); > + prob = (prob.guessed () - first_prob) / first_prob.invert > (); > + } > + else > + { > + first_prob = prob.guessed ().apply_scale (50, 100); > + prob = first_prob; > + } > if (and_them) > { > rtx_code_label *dest_label; > --- gcc/testsuite/gcc.dg/predict-8.c.jj 2017-11-21 23:17:43.149093787 +0100 > +++ gcc/testsuite/gcc.dg/predict-8.c2018-01-19 22:24:09.949249810 +0100 > @@ -8,4 +8,4 @@ int foo(float a, float b) { > return 2; > } > > -/* { dg-final { scan-rtl-dump-times "99.0. .guessed" 2 "expand"} } */ > +/* { dg-final { scan-rtl-dump-times "64.[34]. .guessed" 2 "expand"} } */ > > Jakub
Re: [PATCH] Fix branch probability computation in do_compare_rtx_and_jump (PR tree-optimization/83081)
On Mon, Jan 22, 2018 at 01:11:34PM +0100, Christophe Lyon wrote: > On 19 January 2018 at 22:45, Jakub Jelinek wrote: > > That IMHO can't work and the Invalid sum message verifies that. If we want > > the first jump to hit 99times more often than the second one or vice versa, > > I believe first_prob should be .99 * prob or .01 * prob respectively, and > > the second probability then should be (0.01 * prob) / (1 - 0.99 * prob) > > or (0.99 * prob) / (1 - 0.01 * prob) respectively. > > > > With this change the Invalid sum message disappears. > > predict-8.c testcase was apparently trying to match the hardcoded 0.99 > > probability rather than .99 * 65% emitted now. > > > > Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk? > > > Hi, > > Did you notice any performance impact? Haven't been looking for that, this is a bugfix, the current code is broken. It changes (fixes) probabilities of some edges, sure, so it could affect something both ways. Jakub
Re: [C++ PATCH] PR c++/83895
On 01/21/2018 09:21 AM, Ville Voutilainen wrote: Finishing testing the attached, OK for trunk? ok (I had thought typedefs would be covered by the decl_context == TYPENAME case, but I see that is not true.) nathan -- Nathan Sidwell
Re: [PATCH] Fix branch probability computation in do_compare_rtx_and_jump (PR tree-optimization/83081)
On 2018-01-19 22:45, Jakub Jelinek wrote: Hi! This PR is about a certain test FAILing on arm, because it scans for "Invalid sum ..." message in the *.ira dump, but recent GCC versions have that message in there; not introduced by IRA though, but all the way from expansion. We are expanding: [local count: 1073741825]: _1 = *x_3(D); if (_1 u>= 0.0) goto ; [99.95%] else goto ; [0.05%] [local count: 536864]: sqrtf (_1); and do_compare_rtx_and_jump decides to split the single _1 u>= 0.0 comparison into two. The expectation is that the basic block counts stay the same, so if bb 3's count is 0.05% times bb 2's count, the probabilities need to be computed on both jumps so that this is preserved. We want to expand essentially to: [local count: 1073741825]: ... if (cond1) goto ; [first_prob] else goto ; [first_prob.invert ()] : if (cond2) goto ; [prob] else goto ; [prob.invert ()] [local count: 536864]: sqrtf (_1); and compute first_prob and prob from the original prob such that the bb counts match. The code used to hardcode first_prob to 1% or 99% depending on condition, and leave the second probability the original one. OK, so if we end up with first_prob 1% we increase the probability to dispatch to bb 4 by a little bit, which is acceptable noise, but when first_prob is 99% then we obviously get it wrong. That IMHO can't work and the Invalid sum message verifies that. If we want the first jump to hit 99 times more often than the second one or vice versa, I believe first_prob should be .99 * prob or .01 * prob respectively, and the second probability then should be (0.01 * prob) / (1 - 0.99 * prob) or (0.99 * prob) / (1 - 0.01 * prob) respectively. In new code bb4 is reached by first_prob + (1-first_prob)*second_prob which should be equal to prob. 
Say your chosen constant to obtain first_prob=c*prob is c=0.99, and you want to know c2 to obtain second_prob=c2*prob. You want the combined probability of the branch, (c*prob)+(1-c*prob)*(prob*c2), to be the same prob. This should give c2=(1-c)/(1-c*prob), so (c*prob)+(1-c*prob)*prob*(1-c)/(1-c*prob) cancels out to c*prob+(1-c)*prob, that is profile_probability cprob = profile_probability::guessed_always ().apply_scale (1, 100); first_prob = prob * cprob; prob = cprob.inverse () / (first_prob).inverse (); unless I missed something. No need to do .guessed conversion on prob. It will be propagated in as cprob which is already constructed as a guess. I would say that there are multiple cases where we want to redistribute probabilities so perhaps factor it out into a helper. Perhaps something like void profile_probability::split (profile_probability cprob, profile_probability *first_prob, profile_probability *second_prob) I would bet we already have something like that but don't see it. Other cases seem to choose c as 1/2, which simplifies the formula a bit, but it is also easy to misunderstand, such as in: profile_probability false_prob = prob.invert (); profile_probability op0_false_prob = false_prob.apply_scale (1, 2); profile_probability op1_false_prob = false_prob.apply_scale (1, 2) / op0_false_prob.invert (); So I would convert those into profile_probability::split. OK with this change. Thanks for looking into this! Honza With this change the Invalid sum message disappears. predict-8.c testcase was apparently trying to match the hardcoded 0.99 probability rather than .99 * 65% emitted now. Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk? If this patch is right, I think do_jump_by_parts* are buggy similarly too, there we emit prob or prob.invert () for all the N jumps we emit instead of the original single conditional jump with probability prob. I think we'd need to decide what relative probabilities we want to use for the different jumps, e.g. 
all have even relative likelihood and then adjust the probability of each branch and from what we compute the following probabilities similarly to this patch. 2018-01-19 Jakub Jelinek PR tree-optimization/83081 * dojump.c (do_compare_rtx_and_jump): Fix adjustment of probabilities when splitting a single conditional jump into 2. * gcc.dg/predict-8.c: Adjust expected probability. --- gcc/dojump.c.jj 2018-01-03 10:19:55.0 +0100 +++ gcc/dojump.c 2018-01-19 17:07:25.238927314 +0100 @@ -1122,11 +1122,30 @@ do_compare_rtx_and_jump (rtx op0, rtx op { profile_probability first_prob = prob; if (first_code == UNORDERED) - first_prob = profile_probability::guessed_always ().apply_scale -(1, 100); + { + /* We want to split: +if (x) goto t; // prob; +into +if (a) go
Re: [PATCH] Fix branch probability computation in do_compare_rtx_and_jump (PR tree-optimization/83081)
On Mon, Jan 22, 2018 at 01:52:11PM +0100, Jan Hubicka wrote: > In new code bb4 is reached by first_prob + (1-first_prob)*second_prob > which should be equal to prob. > > Say your chosen constant to obtain first_prob=c*prob is c=0.99 and you > want to know c2 to obtain second_prob=c2*prob > > You want the combined probability of the branch (c*prob)+(1-c*prob)*(prob*c2) to > be the same prob > > This should give c2=(1-c)/(1-c*prob) so > (c*prob)+(1-c*prob)*prob*(1-c)/(1-c*prob) cancels out to c*prob+(1-c)*prob > > that is > profile_probability cprob = > profile_probability::guessed_always ().apply_scale (1, 100); > first_prob = prob * cprob; Yes, if we want it to be done with arbitrary cprob, we can do it the above way and move the latter into the helper. > prob = cprob.inverse () / (first_prob).inverse (); But this looks incorrect, because that computes only the above c2 in prob, not second_prob. It needs to be prob = cprob.invert () * prob / first_prob.invert (); or prob *= cprob.invert () / first_prob.invert (); or that prob = (prob - first_prob) / first_prob.invert (); I had in the patch. The last one looked to me like cheapest to compute. > void profile_probability::split (profile_probability cprob, > profile_probability *first_prob, profile_probability *second_prob) This doesn't take prob as an argument, or should it be a method and take prob as *this? Couldn't it simply return first_prob and adjust itself, so profile_probability split (profile_probability cprob) { profile_probability ret = *this * cprob; *this = (*this - ret) / ret.invert (); // or *this = *this * cprob.invert () / ret.invert (); return ret; } ? > > I would bet we already have something like that but don't see it. 
> > Other cases seem to choose c as 1/2, which simplifies the formula a bit, but it > is also easy to misunderstand, such as in: > profile_probability false_prob = prob.invert (); > profile_probability op0_false_prob = false_prob.apply_scale (1, > 2); > profile_probability op1_false_prob = false_prob.apply_scale (1, > 2) > / op0_false_prob.invert (); Or profile_probability op1_false_prob = op0_false_prob / op0_false_prob.invert (); Jakub
Re: [PATCH] Fix branch probability computation in do_compare_rtx_and_jump (PR tree-optimization/83081)
Dne 2018-01-22 14:29, Jakub Jelinek napsal: On Mon, Jan 22, 2018 at 01:52:11PM +0100, Jan Hubicka wrote: In new code bb4 is reached by first_prob + (1-first_prob)*second_prob which should be equal to prob. Say your chosen constant to obtain first_prob=c*prob is c=0.99 and you want to know c2 to obtain second_prob=c2*prob You want the combined probability of branch (c*prob)+(1-c*prob)*(prob*c2) to be the same prob This should give c2=(1-c)/(1-c*prob) so (c*prob)+(1-c*prob)*prob*(1-c)/(1-c*prob) cancels out to c*prob+(1-c)*prob that is profile_probability cprob = profile_probability::guessed_always ().apply_scale (1, 100); first_prob = prob * cprob; Yes, if we want it to be done with arbitrary cprob, we can do it the above way and move the latter into the helper. prob = cprob.inverse () / (first_prob).inverse (); But this looks incorrect, because that computes only the above c2 in prob, not second_prob. It needs to be prob = cprob.invert () * prob / first_prob.invert (); or prob *= cprob.invert () / first_prob.invert (); or prob = (prob - first_prob) / first_prob.invert (); which I had in the patch. The last one looked to me like the cheapest to compute. Aha, I see. Ok, that makes sense to me now :) void profile_probability::split (profile_probability cprob, profile_probability *first_prob, profile_probability *second_prob) This doesn't take prob as an argument, or should it be a method and take prob as *this? Couldn't it simply return first_prob and adjust itself, so profile_probability split (profile_probability cprob) { profile_probability ret = *this * cprob; *this = (*this - ret) / ret.invert (); // or *this = *this * cprob.invert () / ret.invert (); return ret; } ? Yep, that is fine. I would bet we already have something like that but don't see it. 
Other cases seem to choose c as 1/2, which simplifies the formula a bit, but it is also easy to misunderstand, such as in: profile_probability false_prob = prob.invert (); profile_probability op0_false_prob = false_prob.apply_scale (1, 2); profile_probability op1_false_prob = false_prob.apply_scale (1, 2) / op0_false_prob.invert (); Or profile_probability op1_false_prob = op0_false_prob / op0_false_prob.invert (); I would use the splitting function here so once someone decides to change the weights the code will not produce a corrupt profile. Thanks, Honza Jakub
[PATCH 2/2] Introduce prefetch-dynamic-strides option.
The following patch adds an option to control software prefetching of memory references with non-constant/unknown strides. Currently we prefetch these references if the pass thinks there is benefit to doing so. But, since this is all based on heuristics, it's not always the case that we end up with better performance. For Falkor there is also the problem of conflicts with the hardware prefetcher, so we need to be more conservative in terms of what we issue software prefetch hints for. This also aligns GCC with what LLVM does for Falkor. Similarly to the previous patch, the defaults guarantee no change in behavior for other targets and architectures. I've regression-tested and bootstrapped it on aarch64-linux. No problems found. Ok? 2018-01-22 Luis Machado Introduce option to control whether the software prefetch pass should issue prefetch hints for non-constant strides. gcc/ * config/aarch64/aarch64-protos.h (cpu_prefetch_tune) <prefetch_dynamic_strides>: New const unsigned int field. * config/aarch64/aarch64.c (generic_prefetch_tune): Update to include prefetch_dynamic_strides. (exynosm1_prefetch_tune): Likewise. (thunderxt88_prefetch_tune): Likewise. (thunderx_prefetch_tune): Likewise. (thunderx2t99_prefetch_tune): Likewise. (qdf24xx_prefetch_tune): Likewise. Set prefetch_dynamic_strides to 0. (aarch64_override_options_internal): Update to set PARAM_PREFETCH_DYNAMIC_STRIDES. * doc/invoke.texi (prefetch-dynamic-strides): Document new option. * params.def (PARAM_PREFETCH_DYNAMIC_STRIDES): New. * params.h (PARAM_PREFETCH_DYNAMIC_STRIDES): Define. * tree-ssa-loop-prefetch.c (should_issue_prefetch_p): Account for prefetch-dynamic-strides setting. 
--- gcc/config/aarch64/aarch64-protos.h | 3 +++ gcc/config/aarch64/aarch64.c| 11 +++ gcc/doc/invoke.texi | 10 ++ gcc/params.def | 9 + gcc/params.h| 2 ++ gcc/tree-ssa-loop-prefetch.c| 10 ++ 6 files changed, 45 insertions(+) diff --git a/gcc/config/aarch64/aarch64-protos.h b/gcc/config/aarch64/aarch64-protos.h index 8736bd9..22bd9ae 100644 --- a/gcc/config/aarch64/aarch64-protos.h +++ b/gcc/config/aarch64/aarch64-protos.h @@ -230,6 +230,9 @@ struct cpu_prefetch_tune const int l1_cache_size; const int l1_cache_line_size; const int l2_cache_size; + /* Whether software prefetch hints should be issued for non-constant + strides. */ + const unsigned int prefetch_dynamic_strides; /* The minimum constant stride beyond which we should use prefetch hints for. */ const int minimum_stride; diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c index 0ed9f14..713b230 100644 --- a/gcc/config/aarch64/aarch64.c +++ b/gcc/config/aarch64/aarch64.c @@ -547,6 +547,7 @@ static const cpu_prefetch_tune generic_prefetch_tune = -1, /* l1_cache_size */ -1, /* l1_cache_line_size */ -1, /* l2_cache_size */ + 1, /* prefetch_dynamic_strides */ -1, /* minimum_stride */ -1 /* default_opt_level */ }; @@ -557,6 +558,7 @@ static const cpu_prefetch_tune exynosm1_prefetch_tune = -1, /* l1_cache_size */ 64, /* l1_cache_line_size */ -1, /* l2_cache_size */ + 1, /* prefetch_dynamic_strides */ -1, /* minimum_stride */ -1 /* default_opt_level */ }; @@ -567,6 +569,7 @@ static const cpu_prefetch_tune qdf24xx_prefetch_tune = 32, /* l1_cache_size */ 64, /* l1_cache_line_size */ 1024,/* l2_cache_size */ + 0, /* prefetch_dynamic_strides */ 2048,/* minimum_stride */ 3/* default_opt_level */ }; @@ -577,6 +580,7 @@ static const cpu_prefetch_tune thunderxt88_prefetch_tune = 32, /* l1_cache_size */ 128, /* l1_cache_line_size */ 16*1024, /* l2_cache_size */ + 1, /* prefetch_dynamic_strides */ -1, /* minimum_stride */ 3/* default_opt_level */ }; @@ -587,6 +591,7 @@ static const cpu_prefetch_tune 
thunderx_prefetch_tune = 32, /* l1_cache_size */ 128, /* l1_cache_line_size */ -1, /* l2_cache_size */ + 1, /* prefetch_dynamic_strides */ -1, /* minimum_stride */ -1 /* default_opt_level */ }; @@ -597,6 +602,7 @@ static const cpu_prefetch_tune thunderx2t99_prefetch_tune = 32, /* l1_cache_size */ 64, /* l1_cache_line_size */ 256, /* l2_cache_size
[PATCH 0/2] Add a couple new options to control loop prefetch pass
The following couple patches add options to better control the loop prefetch pass. With the current settings the pass tends to be very aggressive, issuing a lot of prefetch hints. Most of these don't translate to better performance. Some of these issued hints may even cause more cache evictions. Luis Machado (2): Introduce prefetch-minimum stride option Introduce prefetch-dynamic-strides option. gcc/config/aarch64/aarch64-protos.h | 6 ++ gcc/config/aarch64/aarch64.c| 24 +++- gcc/doc/invoke.texi | 25 + gcc/params.def | 18 ++ gcc/params.h| 4 gcc/tree-ssa-loop-prefetch.c| 26 ++ 6 files changed, 102 insertions(+), 1 deletion(-) -- 2.7.4
[PATCH 1/2] Introduce prefetch-minimum stride option
This patch adds a new option to control the minimum stride for a memory reference above which the loop prefetch pass may issue software prefetch hints. There are two motivations: * Make the pass less aggressive, only issuing prefetch hints for bigger strides that are more likely to benefit from prefetching. I've noticed a case in cpu2017 where we were issuing thousands of hints, for example. * For processors that have a hardware prefetcher, like Falkor, it allows the loop prefetch pass to defer prefetching of smaller (less than the threshold) strides to the hardware prefetcher instead. This prevents conflicts between the software prefetcher and the hardware prefetcher. I've noticed a considerable reduction in the number of prefetch hints and slightly positive performance numbers. This aligns GCC and LLVM in terms of prefetch behavior for Falkor. The default settings should guarantee no changes for existing targets. Those are free to tweak the settings as necessary. No regressions in the testsuite and bootstrapped ok on aarch64-linux. Ok? 2018-01-22 Luis Machado Introduce option to limit software prefetching to known constant strides above a specific threshold with the goal of preventing conflicts with a hardware prefetcher. gcc/ * config/aarch64/aarch64-protos.h (cpu_prefetch_tune) <minimum_stride>: New const int field. * config/aarch64/aarch64.c (generic_prefetch_tune): Update to include minimum_stride field. (exynosm1_prefetch_tune): Likewise. (thunderxt88_prefetch_tune): Likewise. (thunderx_prefetch_tune): Likewise. (thunderx2t99_prefetch_tune): Likewise. (qdf24xx_prefetch_tune): Likewise. Set minimum_stride to 2048. (aarch64_override_options_internal): Update to set PARAM_PREFETCH_MINIMUM_STRIDE. * doc/invoke.texi (prefetch-minimum-stride): Document new option. * params.def (PARAM_PREFETCH_MINIMUM_STRIDE): New. * params.h (PARAM_PREFETCH_MINIMUM_STRIDE): Define. 
* tree-ssa-loop-prefetch.c (should_issue_prefetch_p): Return false if stride is constant and is below the minimum stride threshold. --- gcc/config/aarch64/aarch64-protos.h | 3 +++ gcc/config/aarch64/aarch64.c| 13 - gcc/doc/invoke.texi | 15 +++ gcc/params.def | 9 + gcc/params.h| 2 ++ gcc/tree-ssa-loop-prefetch.c| 16 6 files changed, 57 insertions(+), 1 deletion(-) diff --git a/gcc/config/aarch64/aarch64-protos.h b/gcc/config/aarch64/aarch64-protos.h index ef1b0bc..8736bd9 100644 --- a/gcc/config/aarch64/aarch64-protos.h +++ b/gcc/config/aarch64/aarch64-protos.h @@ -230,6 +230,9 @@ struct cpu_prefetch_tune const int l1_cache_size; const int l1_cache_line_size; const int l2_cache_size; + /* The minimum constant stride beyond which we should use prefetch + hints for. */ + const int minimum_stride; const int default_opt_level; }; diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c index 174310c..0ed9f14 100644 --- a/gcc/config/aarch64/aarch64.c +++ b/gcc/config/aarch64/aarch64.c @@ -547,6 +547,7 @@ static const cpu_prefetch_tune generic_prefetch_tune = -1, /* l1_cache_size */ -1, /* l1_cache_line_size */ -1, /* l2_cache_size */ + -1, /* minimum_stride */ -1 /* default_opt_level */ }; @@ -556,6 +557,7 @@ static const cpu_prefetch_tune exynosm1_prefetch_tune = -1, /* l1_cache_size */ 64, /* l1_cache_line_size */ -1, /* l2_cache_size */ + -1, /* minimum_stride */ -1 /* default_opt_level */ }; @@ -565,7 +567,8 @@ static const cpu_prefetch_tune qdf24xx_prefetch_tune = 32, /* l1_cache_size */ 64, /* l1_cache_line_size */ 1024,/* l2_cache_size */ - -1 /* default_opt_level */ + 2048,/* minimum_stride */ + 3/* default_opt_level */ }; static const cpu_prefetch_tune thunderxt88_prefetch_tune = @@ -574,6 +577,7 @@ static const cpu_prefetch_tune thunderxt88_prefetch_tune = 32, /* l1_cache_size */ 128, /* l1_cache_line_size */ 16*1024, /* l2_cache_size */ + -1, /* minimum_stride */ 3/* default_opt_level */ }; @@ -583,6 +587,7 @@ static const cpu_prefetch_tune 
thunderx_prefetch_tune = 32, /* l1_cache_size */ 128, /* l1_cache_line_size */ -1, /* l2_cache_size */ + -1, /* minimum_stride */ -1 /* default_opt_level */ }; @@ -592,6 +597,7 @@ static const cpu_prefetch_tune thunderx2t99_
Re: [PATCH v3] Ability to remap file names in __FILE__, etc (PR other/70268)
Boris Kolpackov: > Ximin Luo writes: > >> -I to an absolute path is not that common for system / distro-built >> stuff. > > Ok, thanks for clarifying. > > >> In the cases that it occurs, indeed it could and should be fixed >> by the package buildsystem, e.g. by stripping a prefix when they >> add -I flags to CFLAGS. But that's a separate issue from what >> we're talking about here. > > I believe it is the same issue: any package that blindly embeds > information about how it was built into the result of the build > does not care about reproducible builds. > I don't think that's a realistic position to take, and it seems like a fairly arbitrary assertion made merely to "brush this issue under the carpet". I think this because of the "implicit contract" of how CFLAGS is generally used: CFLAGS (and other flags) is a way to pass information from multiple layers of programs down to GCC. It is necessary to "blindly embed" it if you want to save it *at all*, otherwise you are forced to violate the separation-of-concerns between layers in your buildsystem stack. - Program A sets CFLAGS += --flags_A, then calls program B - Program B sets CFLAGS += --flags_B, then calls program C - Program C sets CFLAGS += --flags_C, then calls Make - Make reads CFLAGS then calls $(CC) $(CFLAGS) [etc] In general, you are not supposed to edit what was already in CFLAGS, you are supposed to append to it. That is why it's good practice to add "--no" options for various command-line flags in gcc (and other tools). So each program X has its own flags_X that it is "responsible for". If all programs in the stack make sure their own flags_X contain reproducible values, then the whole build will be reproducible even when recording CFLAGS. However, this breaks in the case of -f*-prefix-map. If Program A sets this flag in flags_A, then this interferes with Program C, even though flags_C was fully reproducible. 
Note that programs B and C don't even know that they are being used with GCC, and so really have no business filtering -f*-prefix-map out of the CFLAGS that program A already set. My proposal would allow all the flags_{A,B,C} to be reproducible, and avoid different programs having to embed layer-violating logic into themselves. > If others do agree that this should be "fixed" in GCC, then I > would suggest that you add a separate option for the environment > variable case (e.g., -ffile-prefix-map-env) and sidestep the > whole "how to portably distinguish a path from an environment > variable" issue. > That would also be fine for me, sure. X -- GPG: ed25519/56034877E1F87C35 GPG: rsa4096/1318EFAC5FBBDBCE https://github.com/infinity0/pubkeys.git
Re: [PATCH 4/5] Remove predictors that are unreliable.
On 01/19/2018 12:57 PM, Martin Liška wrote: > Yes, there's a huge difference between CPU 2006 and 2017. The former has 63% > w/ dominant edges, > and the latter only 11%. It's caused by these 2 benchmarks with a high > coverage: Hi. I'm sending details about the 2 edges that influence the statistics significantly: > 500.perlbench_r: regexec.c.065i.profile: > negative return heuristics of edge 1368->1370: 2.0% exec 2477714850 hit > 2429863555 (98.1%) 2477714850: 3484:S_regtry(pTHX_ regmatch_info *reginfo, char **startposp) -: 3485:{ 2477714850: 3486:CHECKPOINT lastcp; 2477714850: 3487:REGEXP *const rx = reginfo->prog; 2477714850: 3488:regexp *const prog = ReANY(rx); 2477714850: 3489:SSize_t result; [snip] 2477714850: 8046:assert(!result || locinput - reginfo->strbeg >= 0); 2477714850: 8047:return result ? locinput - reginfo->strbeg : -1; -: 8048:} As seen, it returns -1 if a regex is not found, which in the case of perlbench is a very likely branch. > > and 523.xalancbmk_r: > build/build_peak_gcc7-m64./NameDatatypeValidator.cpp.065i.profile: > negative return heuristics of edge 3->4: 2.0% exec 1221735072 hit 1221522453 > (100.0%) 1221735072: 74:int NameDatatypeValidator::compare(const XMLCh* const lValue -: 75: , const XMLCh* const rValue -: 76: , MemoryManager* const) -: 77:{ 1221735072: 78:return ( XMLString::equals(lValue, rValue)? 0 : -1); -: 79:} -: 80: IP profile dump file: xercesc_2_7::NameDatatypeValidator::compare (struct NameDatatypeValidator * const this, const XMLCh * const lValue, const XMLCh * const rValue, struct MemoryManager * const D.17157) { bool _1; int iftmp.0_2; <bb 2> [count: 1221735072]: # DEBUG BEGIN_STMT _1 = xercesc_2_7::XMLString::equals (lValue_4(D), rValue_5(D)); if (_1 != 0) goto <bb 4>; [0.02%] else goto <bb 3>; [99.98%] <bb 3> [count: 1221522453]: <bb 4> [count: 1221735072]: # iftmp.0_2 = PHI <0(2), -1(3)> return iftmp.0_2; } Likewise, XML values are more commonly non-equal. Ok, so may I also mark negative return with PROB_EVEN to track it? 
Thanks, Martin > > Ideas what to do with the predictor for GCC 8 release? > Martin
Re: C++ PATCH to fix bogus error with constexpr and empty class (PR c++/81933)
On Thu, Jan 18, 2018 at 5:39 PM, Marek Polacek wrote: > The problem in this PR is that we get > error: constexpr call flows off the end of the function > for a valid testcase, but not in C++17 mode. That's because in C++17 we > execute: > > 4321 if (cxx_dialect >= cxx17 && !BINFO_VIRTUAL_P (binfo)) > 4322 { > 4323 tree decl = build_base_field_1 (t, basetype, next_field); > 4324 DECL_FIELD_OFFSET (decl) = BINFO_OFFSET (binfo); > 4325 DECL_FIELD_BIT_OFFSET (decl) = bitsize_zero_node; > 4326 SET_DECL_OFFSET_ALIGN (decl, BITS_PER_UNIT); > 4327 } > > which puts some additional fields into some classes. This makes a difference > for split_nonconstant_init: > > 751 if (split_nonconstant_init_1 (dest, init)) > 752 init = NULL_TREE; > > where split_nonconstant_init_1 is true so that means that the DECL_INITIAL > will > be null. Why is split_nonconstant_init_1 true? Because it ends up calling > complete_ctor_at_level_p with > > 6114 return count_type_elements (type, true) == num_elts; > > but the count_type_elements result differs between c++14/c++17 precisely > because of the fields added in build_base_field. But we should only add those > fields when initializing aggregates with bases, which is not what's happening > in this testcase. > > With DECL_INITIAL being null, we error here (result is NULL): > > 1702 result = *ctx->values->get (slot ? slot : res); > 1703 if (result == NULL_TREE && !*non_constant_p) > 1704 { > 1705 if (!ctx->quiet) > 1706 error ("%<constexpr%> call flows off the end " > 1707"of the function"); > 1708 *non_constant_p = true; > 1709 } > > I think the problem is that we're dealing with an empty class here, for which > we should just generate an empty ctor, as in a few other spots. This seems like a workaround rather than a fix; we should fix split_nonconstant_init to work properly rather than override it. Specifically, it seems odd to return true if num_elts is 0; we shouldn't override DECL_INITIAL if we didn't actually split anything out. Jason
RE: [patch][x86] -march=icelake
Yes, you are right, any() is not required. Here is the patch. Thanks, Julia > -Original Message- > From: Jakub Jelinek [mailto:ja...@redhat.com] > Sent: Monday, January 22, 2018 12:36 PM > To: Koval, Julia > Cc: Richard Biener ; Uros Bizjak > ; GCC Patches ; Kirill Yukhin > > Subject: Re: [patch][x86] -march=icelake > > On Mon, Jan 22, 2018 at 11:30:10AM +, Koval, Julia wrote: > > Hi, I tried omp_clause_mask and it looks ok. But it lacks a check if there > > is any bit set or none. With the addition of it (as proposed, or in some other > > way) it should work. What do you think about this approach (patch attached)? > > Well, I certainly didn't mean to use omp_clause_mask for something > completely unrelated to OpenMP, the reason I've mentioned it is that it is a > class that deals with a similar problem. > > So, if you want to use the same class, it would need to be moved to some > generic header, renamed and then c-common.h would typedef that_class > omp_clause_mask. > > I'm surprised you need any, doesn't (mask & (...)) != 0 already handle > that? > > Jakub 0001-test.patch
[PATCH][AArch64] PR79262: Adjust vector cost
PR79262 has been fixed for almost all AArch64 cpus, however the example is still vectorized in a few cases, resulting in lower performance. Increase the cost of vector-to-scalar moves so it is more similar to the other vector costs. As a result -mcpu=cortex-a53 no longer vectorizes the testcase - libquantum and SPECv6 performance improves. OK for commit? ChangeLog: 2018-01-22 Wilco Dijkstra PR target/79262 * config/aarch64/aarch64.c (generic_vector_cost): Adjust vec_to_scalar_cost. -- diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c index c6a83c881038873d8b68e36f906783be63ddde56..43f5b7162152ca92a916f4febee01f624c375202 100644 --- a/gcc/config/aarch64/aarch64.c +++ b/gcc/config/aarch64/aarch64.c @@ -403,7 +403,7 @@ static const struct cpu_vector_cost generic_vector_cost = 1, /* vec_int_stmt_cost */ 1, /* vec_fp_stmt_cost */ 2, /* vec_permute_cost */ - 1, /* vec_to_scalar_cost */ + 2, /* vec_to_scalar_cost */ 1, /* scalar_to_vec_cost */ 1, /* vec_align_load_cost */ 1, /* vec_unalign_load_cost */
[PATCH PR82096][gcc-7] Backport: Fix ICE in int_mode_for_mode, at stor-layout.c:403 with arm-linux-gnueabi
Hi This patch backports r256526 and its followup fix r256941 from trunk; it changes emit_store_flag_force () to fix the ICE. The original discussion is at https://gcc.gnu.org/ml/gcc-patches/2018-01/msg00219.html and https://gcc.gnu.org/ml/gcc-patches/2018-01/msg01058.html Is this ok for gcc-7-branch? Testing: ran regression testing with a bootstrapped arm-none-linux-gnueabihf. Thanks Sudi ChangeLog entries: *** gcc/ChangeLog *** 2018-01-22 Sudakshina Das Backport from mainline: 2018-01-10 Sudakshina Das PR target/82096 * expmed.c (emit_store_flag_force): Swap if const op0 and change VOIDmode to mode of op0. *** gcc/testsuite/ChangeLog *** 2018-01-22 Sudakshina Das Backport from mainline: 2018-01-10 Sudakshina Das PR target/82096 * gcc.c-torture/compile/pr82096.c: New test. diff --git a/gcc/expmed.c b/gcc/expmed.c index e9f634a..30001ac 100644 --- a/gcc/expmed.c +++ b/gcc/expmed.c @@ -5886,6 +5886,18 @@ emit_store_flag_force (rtx target, enum rtx_code code, rtx op0, rtx op1, if (tem != 0) return tem; + /* If one operand is constant, make it the second one. Only do this + if the other operand is not constant as well. */ + + if (swap_commutative_operands_p (op0, op1)) +{ + std::swap (op0, op1); + code = swap_condition (code); +} + + if (mode == VOIDmode) +mode = GET_MODE (op0); + if (!target) target = gen_reg_rtx (word_mode); diff --git a/gcc/testsuite/gcc.c-torture/compile/pr82096.c b/gcc/testsuite/gcc.c-torture/compile/pr82096.c new file mode 100644 index 000..d144b70 --- /dev/null +++ b/gcc/testsuite/gcc.c-torture/compile/pr82096.c @@ -0,0 +1,11 @@ +/* { dg-require-effective-target arm_arch_v5t_ok { target arm*-*-* } } */ +/* { dg-skip-if "Do not combine float-abi values" { arm*-*-* } { "-mfloat-abi=*" } { "-mfloat-abi=soft" } } */ +/* { dg-additional-options "-march=armv5t -mthumb -mfloat-abi=soft" { target arm*-*-* } } */ + +static long long AL[24]; + +int +check_ok (void) +{ + return (__sync_bool_compare_and_swap (AL+1, 0x20003ll, 0x1234567890ll)); +}
[PATCH] Fix followup for PR83963
GRAPHITE has little testing coverage so only SPEC uncovered a latent issue in scop_detection::harmful_loop_in_region. Gets us some more testcases. Bootstrap and regtest running on x86_64-unknown-linux-gnu. Richard. 2018-01-22 Richard Biener PR tree-optimization/83963 * graphite-scop-detection.c (scop_detection::harmful_loop_in_region): Properly terminate dominator walk when crossing the exit edge not when visiting its source block. * gfortran.dg/graphite/pr83963.f: New testcase. * gcc.dg/graphite/pr83963-2.c: Likewise. Index: gcc/graphite-scop-detection.c === --- gcc/graphite-scop-detection.c (revision 256947) +++ gcc/graphite-scop-detection.c (working copy) @@ -677,10 +677,10 @@ scop_detection::harmful_loop_in_region ( if (!stmt_simple_for_scop_p (scop, gsi_stmt (gsi), bb)) return true; - if (bb != exit_bb) - for (basic_block dom = first_dom_son (CDI_DOMINATORS, bb); -dom; -dom = next_dom_son (CDI_DOMINATORS, dom)) + for (basic_block dom = first_dom_son (CDI_DOMINATORS, bb); + dom; + dom = next_dom_son (CDI_DOMINATORS, dom)) + if (dom != scop.exit->dest) worklist.safe_push (dom); } Index: gcc/testsuite/gfortran.dg/graphite/pr83963.f === --- gcc/testsuite/gfortran.dg/graphite/pr83963.f(nonexistent) +++ gcc/testsuite/gfortran.dg/graphite/pr83963.f(working copy) @@ -0,0 +1,18 @@ +! { dg-do compile } +! 
{ dg-options "-O -floop-nest-optimize" } + + SUBROUTINE DAVCI(NORB,NCOR,NCI,NA,NB, + *CI,MAXP,MAXW1, + * IHMCON,ISTRB,ISTRP,ISTAR,II) + DIMENSION EC(MAXP,MAXP),IWRK1(2*MAXW1) + EC(II,II) = 1.0D+00 + DO 1396 II=1,MAXP +DO 1398 JJ=1,II-1 + EC(II,JJ) = 0.0D+00 + 1398 CONTINUE + 1396CONTINUE + IF (NA.EQ.NB) THEN + CALL RINAB0(SI1,SI2,NORB,NCOR,NCI,NA,NB,CI(1,IP),IACON1,IBCON1, + * IWRK1,IHMCON,ISTRB,ISTRP,ISTAR) + ENDIF + END Index: gcc/testsuite/gcc.dg/graphite/pr83963-2.c === --- gcc/testsuite/gcc.dg/graphite/pr83963-2.c (nonexistent) +++ gcc/testsuite/gcc.dg/graphite/pr83963-2.c (working copy) @@ -0,0 +1,26 @@ +/* { dg-do compile } */ +/* { dg-options "-O -floop-nest-optimize" } */ + +int Chv_countBigEntries (int npivot, int pivotsizes[], int countflag, +double droptol, int nD) +{ + double absval ; + double *entries ; + int count; + int ii, jj, kinc, kk, kstart, stride ; + for ( ii = 0 ; ii < nD ; ii++ ) +{ + kk = kstart ; + kinc = stride ; + for ( jj = 0 ; jj < ii ; jj++ ) + { + absval = __builtin_fabs(entries[kk]) ; + if ( absval >= droptol ) + count++ ; + kk += kinc ; + kinc -= 2 ; + } + kstart-- ; +} + return count; +}
Re: [PATCH][AArch64] PR79262: Adjust vector cost
On Mon, Jan 22, 2018 at 4:01 PM, Wilco Dijkstra wrote: > PR79262 has been fixed for almost all AArch64 cpus, however the example is > still > vectorized in a few cases, resulting in lower performance. Increase the cost > of > vector-to-scalar moves so it is more similar to the other vector costs. As a > result > -mcpu=cortex-a53 no longer vectorizes the testcase - libquantum and SPECv6 > performance improves. > > OK for commit? It would be better to dissect this cost into vec_to_scalar and vec_extract where vec_to_scalar really means getting at the scalar value of a vector of uniform values which most targets can do without any instruction (just use a subreg). I suppose we could also make vec_to_scalar equal to vector extraction and remove the uses for the other case (reduction vector result to scalar reg). Richard. > ChangeLog: > 2018-01-22 Wilco Dijkstra > > PR target/79262 > * config/aarch64/aarch64.c (generic_vector_cost): Adjust > vec_to_scalar_cost. > -- > > diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c > index > c6a83c881038873d8b68e36f906783be63ddde56..43f5b7162152ca92a916f4febee01f624c375202 > 100644 > --- a/gcc/config/aarch64/aarch64.c > +++ b/gcc/config/aarch64/aarch64.c > @@ -403,7 +403,7 @@ static const struct cpu_vector_cost generic_vector_cost = >1, /* vec_int_stmt_cost */ >1, /* vec_fp_stmt_cost */ >2, /* vec_permute_cost */ > - 1, /* vec_to_scalar_cost */ > + 2, /* vec_to_scalar_cost */ >1, /* scalar_to_vec_cost */ >1, /* vec_align_load_cost */ >1, /* vec_unalign_load_cost */
Re: [PATCH PR82096][gcc-7] Backport: Fix ICE in int_mode_for_mode, at stor-layout.c:403 with arm-linux-gnueabi
On Mon, Jan 22, 2018 at 4:10 PM, Sudakshina Das wrote: > Hi > > This is a patch to backport r256526 and r256941 (Fix case fix) of trunk to > fix emit_store_flag_force () function to fix the ICE. The original > discussion is at https://gcc.gnu.org/ml/gcc-patches/2018-01/msg00219.html > and https://gcc.gnu.org/ml/gcc-patches/2018-01/msg01058.html > > Is this ok for gcc-7-branch? > Testing : Ran regression testing with bootstrapped arm-none-linux-gnueabihf. The branch is currently frozen so please wait until after the GCC 7.3 release. Thanks, Richard. > Thanks > Sudi > > ChangeLog entries: > > *** gcc/ChangeLog *** > > 2018-01-22 Sudakshina Das > > Backport from mainline: > 2018-01-10 Sudakshina Das > > PR target/82096 > * expmed.c (emit_store_flag_force): Swap if const op0 > and change VOIDmode to mode of op0. > > *** gcc/testsuite/ChangeLog *** > > 2018-01-22 Sudakshina Das > > Backport from mainline: > 2018-01-10 Sudakshina Das > > PR target/82096 > * gcc.c-torture/compile/pr82096.c: New test.
Re: C++ PATCH to fix bogus error with constexpr and empty class (PR c++/81933)
On Mon, Jan 22, 2018 at 09:40:50AM -0500, Jason Merrill wrote: > This seems like a workaround rather than a fix; we should fix > split_nonconstant_init to work properly rather than override it. > Specifically, it seems odd to return true if num_elts is 0; we > shouldn't override DECL_INITIAL if we didn't actually split anything > out. Ah, ok, here's a patch for that, then. Thanks. Bootstrapped/regtested on x86_64-linux, ok for trunk? 2018-01-22 Marek Polacek PR c++/81933 * typeck2.c (split_nonconstant_init_1): Return false if we didn't split out anything. * g++.dg/cpp1y/constexpr-empty4.C: New test. diff --git gcc/cp/typeck2.c gcc/cp/typeck2.c index 8d933257f5f..8cb0e8811d3 100644 --- gcc/cp/typeck2.c +++ gcc/cp/typeck2.c @@ -725,6 +725,11 @@ split_nonconstant_init_1 (tree dest, tree init) /* The rest of the initializer is now a constant. */ TREE_CONSTANT (init) = 1; + + /* We didn't split out anything. */ + if (num_split_elts == 0) +return false; + return complete_p && complete_ctor_at_level_p (TREE_TYPE (init), num_split_elts, inner_type); } diff --git gcc/testsuite/g++.dg/cpp1y/constexpr-empty4.C gcc/testsuite/g++.dg/cpp1y/constexpr-empty4.C index e69de29bb2d..059220f8268 100644 --- gcc/testsuite/g++.dg/cpp1y/constexpr-empty4.C +++ gcc/testsuite/g++.dg/cpp1y/constexpr-empty4.C @@ -0,0 +1,51 @@ +// PR c++/81933 +// { dg-do compile { target c++14 } } + +namespace std { +template struct __decay_and_strip { typedef _Tp __type; }; +template struct enable_if { typedef int type; }; +template struct _Head_base { + constexpr _Head_base(_Head) {} +}; +template struct _Tuple_impl; +template +struct _Tuple_impl<_Idx, _Head, _Tail...> : _Tuple_impl<1, _Tail...>, // { dg-warning "direct base" } +_Head_base<_Head> { + typedef _Tuple_impl<1, _Tail...> _Inherited; + typedef _Head_base<_Head> _Base; + constexpr _Tuple_impl(_Head __head, _Tail... 
__tail) + : _Inherited(__tail...), _Base(__head) {} + _Tuple_impl(const _Tuple_impl &) = default; + _Tuple_impl(_Tuple_impl &&); +}; +template +struct _Tuple_impl<_Idx, _Head> : _Head_base<_Head> { + typedef _Head_base<_Head> _Base; + constexpr _Tuple_impl(_Head __head) : _Base(__head) {} +}; +template struct _TC { + static constexpr bool _NotSameTuple() { return true; } +}; +template class tuple : _Tuple_impl<0, _Elements...> { + typedef _Tuple_impl<0, _Elements...> _Inherited; + +public: + template ::_NotSameTuple()>::type = false> + constexpr tuple(_UElements... __elements) : _Inherited(__elements...) {} + tuple(const tuple &) = default; +}; +template +constexpr tuple::__type...> +make_tuple(_Elements... __args) { + typedef tuple::__type...> __result_type; + return __result_type(__args...); +} +} +struct any_udt {}; +template constexpr auto flatten(Tuples... tuples) { + auto all = std::make_tuple(tuples...); + auto flat(all); + return flat; +} +constexpr auto fail = flatten(any_udt{}, any_udt{}); Marek
Re: [C++ Patch] PR 83921 ("[7/8 Regression] GCC rejects constexpr initialization of empty aggregate")
On Mon, Jan 22, 2018 at 5:08 AM, Paolo Carlini wrote: > On 20/01/2018 02:59, Paolo Carlini wrote: >> >> Hi again, >> >> On 19/01/2018 23:55, Paolo Carlini wrote: >>> >>> ...Therefore it seems to me that a way to more cleanly solve the bug >>> would be moving something like || !DECL_NONTRIVIALLY_INITIALIZED_P to >>> the above check in check_for_uninitialized_const_var, and therefore remove >>> completely the uninitialized case from potential_constant_expression_1, ... >> >> Of course this doesn't work. check_for_uninitialized_const would really >> need to know that the VAR_DECL appears in a statement expression which is >> initializing a constexpr variable, nothing to do with >> DECL_NONTRIVIALLY_INITIALIZED_P. Ah, that makes sense. And the static/thread_local checks are similarly duplicates of checks in start_decl. It might be nice to factor those three checks out into another function called in both places, but perhaps that's not necessary given how small they are. > Thus the below, carefully tested over the weekend, would be a change completely > removing the problematic error_at call, plus some testcases checking that we > would still do the right thing in a few cases (bug submitter added > constexpr-83921-2.C). The updated stmtexpr20.C shows that we would reject > anyway a statement expression using an uninitialized inner, simply because > the whole initializer would still be flagged as non const. Do we still diagnose it without the use of the variable? If not, how about adjusting check_for_uninitialized_const_var to support calling it from potential_constant_expression_1? I suppose we'd need to add a couple of parameters, one complain parm and another to indicate that the variable is in a constexpr context, and then return something. Jason
Re: C++ PATCH to fix bogus error with constexpr and empty class (PR c++/81933)
OK, thanks. On Mon, Jan 22, 2018 at 10:44 AM, Marek Polacek wrote: > On Mon, Jan 22, 2018 at 09:40:50AM -0500, Jason Merrill wrote: >> This seems like a workaround rather than a fix; we should fix >> split_nonconstant_init to work properly rather than override it. >> Specifically, it seems odd to return true if num_elts is 0; we >> shouldn't override DECL_INITIAL if we didn't actually split anything >> out. > > Ah, ok, here's a patch for that, then. Thanks. > > Bootstrapped/regtested on x86_64-linux, ok for trunk? > > 2018-01-22 Marek Polacek > > PR c++/81933 > * typeck2.c (split_nonconstant_init_1): Return false if we didn't > split out anything. > > * g++.dg/cpp1y/constexpr-empty4.C: New test. > > diff --git gcc/cp/typeck2.c gcc/cp/typeck2.c > index 8d933257f5f..8cb0e8811d3 100644 > --- gcc/cp/typeck2.c > +++ gcc/cp/typeck2.c > @@ -725,6 +725,11 @@ split_nonconstant_init_1 (tree dest, tree init) > >/* The rest of the initializer is now a constant. */ >TREE_CONSTANT (init) = 1; > + > + /* We didn't split out anything. 
*/ > + if (num_split_elts == 0) > +return false; > + >return complete_p && complete_ctor_at_level_p (TREE_TYPE (init), > num_split_elts, inner_type); > } > diff --git gcc/testsuite/g++.dg/cpp1y/constexpr-empty4.C > gcc/testsuite/g++.dg/cpp1y/constexpr-empty4.C > index e69de29bb2d..059220f8268 100644 > --- gcc/testsuite/g++.dg/cpp1y/constexpr-empty4.C > +++ gcc/testsuite/g++.dg/cpp1y/constexpr-empty4.C > @@ -0,0 +1,51 @@ > +// PR c++/81933 > +// { dg-do compile { target c++14 } } > + > +namespace std { > +template struct __decay_and_strip { typedef _Tp __type; }; > +template struct enable_if { typedef int type; }; > +template struct _Head_base { > + constexpr _Head_base(_Head) {} > +}; > +template struct _Tuple_impl; > +template > +struct _Tuple_impl<_Idx, _Head, _Tail...> : _Tuple_impl<1, _Tail...>, // { > dg-warning "direct base" } > +_Head_base<_Head> { > + typedef _Tuple_impl<1, _Tail...> _Inherited; > + typedef _Head_base<_Head> _Base; > + constexpr _Tuple_impl(_Head __head, _Tail... __tail) > + : _Inherited(__tail...), _Base(__head) {} > + _Tuple_impl(const _Tuple_impl &) = default; > + _Tuple_impl(_Tuple_impl &&); > +}; > +template > +struct _Tuple_impl<_Idx, _Head> : _Head_base<_Head> { > + typedef _Head_base<_Head> _Base; > + constexpr _Tuple_impl(_Head __head) : _Base(__head) {} > +}; > +template struct _TC { > + static constexpr bool _NotSameTuple() { return true; } > +}; > +template class tuple : _Tuple_impl<0, _Elements...> { > + typedef _Tuple_impl<0, _Elements...> _Inherited; > + > +public: > + template +enable_if<_TC<1>::_NotSameTuple()>::type = false> > + constexpr tuple(_UElements... __elements) : _Inherited(__elements...) {} > + tuple(const tuple &) = default; > +}; > +template > +constexpr tuple::__type...> > +make_tuple(_Elements... __args) { > + typedef tuple::__type...> > __result_type; > + return __result_type(__args...); > +} > +} > +struct any_udt {}; > +template constexpr auto flatten(Tuples... 
tuples) { > + auto all = std::make_tuple(tuples...); > + auto flat(all); > + return flat; > +} > +constexpr auto fail = flatten(any_udt{}, any_udt{}); > > Marek
Re: [C++] Deprecate old for-scope handling
On Sun, Jan 21, 2018 at 1:40 AM, Eric Gallager wrote: > On 1/19/18, Nathan Sidwell wrote: >> Jason, >> what do you think about deprecating the ARM-era for-scope handling that >> allows: >>void f () >>{ >> for (int i = 0;;); >> i = 2; >>} >> >> we noisily accept that in c++98 mode with -fpermissive. It wasn't even >> well formed then. Implementing this has some unique requirements in the >> name-lookup machinery, which I ran into again today. >> >> Option A: rip out now because it's a c++98 ARM-compatibility crutch >> Option B: deprecate in gcc-8 and remove in gcc-9. > > I support Option B because it's good to let people know ahead of time > about impending removals of things. I'd just remove it in GCC 9. I think the existing permerror already qualifies as deprecation, but I wouldn't object to adjusting the diagnostic to mention this plan. I wouldn't want to tear it out at this stage of GCC 8. Jason
Re: [PATCH] Fix branch probability computation in do_compare_rtx_and_jump (PR tree-optimization/83081)
On Mon, Jan 22, 2018 at 02:43:50PM +0100, Jan Hubicka wrote: > > But this looks incorrect, because that computes only the above c2 in > > prob, not > > second_prob. It needs to be > > prob = cprob.invert () * prob / first_prob.invert (); > > or > > prob *= cprob.invert () / first_prob.invert (); > > or that > > prob = (prob - first_prob) / first_prob.invert (); > > I had in the patch. The last one looked to me like cheapest to compute. > > Aha, I see. Ok that makes sense to me now :) > > > > > void profile_probability::split (profile_probability cprob, > > > profile_probability *first_prob, profile_probability *second_prob) > > > > This doesn't take prob as an argument, or should it be a method and take > > prob as *this? Couldn't it simply return first_prob and adjust itself, > > so > > profile_probability split (profile_probability cprob) > > { > > profile_probability ret = *this * cprob; > > *this = (*this - prob) / ret.invert (); > > // or *this = *this * cprob.invert () / ret.invert (); > > return ret; > > } > > ? > > Yep, that is fine. So like this if it passes bootstrap/regtest on {x86_64,i686}-linux? Besides adding the new helper method and using it in the spots I've changed plus the two you've mentioned, I had to do another fix - looking at how TRUTH_ANDIF_EXPR is handled in dojump_1 and seeing Invalid sum messages in predict-8.c's *.expand jump, I've realized that it also works as is only for the discussed, i.e. if (x) goto t; // prob transformed into: if (a) goto t; // first_prob if (b) goto t; // prob but not for the case where and_them is true, where we have: if (c) goto dummy; // first_prob.invert () if (d) goto t; // prob dummy: which needs to be handled similarly to the TRUTH_ANDIF_EXPR case. At least when trying in gdb on predict-8.c various values of cprob (1%, 15%, 50%, 62%, 98%) none of them generate the Invalid sum messages. 2018-01-22 Jakub Jelinek PR tree-optimization/83081 * profile-count.h (profile_probability::split): New method. 
* dojump.c (do_jump_1) : Use profile_probability::split. (do_compare_rtx_and_jump): Fix adjustment of probabilities when splitting a single conditional jump into 2. * gcc.dg/predict-8.c: Adjust expected probability. --- gcc/profile-count.h.jj 2018-01-19 19:41:52.910549618 +0100 +++ gcc/profile-count.h 2018-01-22 16:20:27.073096515 +0100 @@ -410,6 +410,19 @@ public: return *this; } + /* Split *THIS (ORIG) probability into 2 probabilities, such that + the returned one (FIRST) is *THIS * CPROB and *THIS is + adjusted (SECOND) so that FIRST + FIRST.invert () * SECOND + == ORIG. */ + profile_probability split (const profile_probability &cprob) +{ + profile_probability ret = *this * cprob; + /* The following is equivalent to: +*this = cprob.invert () * *this / ret.invert (); */ + *this = (*this - ret) / ret.invert (); + return ret; +} + gcov_type apply (gcov_type val) const { if (*this == profile_probability::uninitialized ()) --- gcc/dojump.c.jj 2018-01-19 19:41:49.978548984 +0100 +++ gcc/dojump.c2018-01-22 16:31:43.867185428 +0100 @@ -347,13 +347,11 @@ do_jump_1 (enum tree_code code, tree op0 profile_probability op1_prob = profile_probability::uninitialized (); if (prob.initialized_p ()) { -profile_probability false_prob = prob.invert (); -profile_probability op0_false_prob = false_prob.apply_scale (1, 2); - profile_probability op1_false_prob = false_prob.apply_scale (1, 2) - / op0_false_prob.invert (); + op1_prob = prob.invert (); + op0_prob = op1_prob.split (profile_probability::even ()); /* Get the probability that each jump below is true. 
*/ -op0_prob = op0_false_prob.invert (); -op1_prob = op1_false_prob.invert (); + op0_prob = op0_prob.invert (); + op1_prob = op1_prob.invert (); } if (if_false_label == NULL) { @@ -380,8 +378,8 @@ do_jump_1 (enum tree_code code, tree op0 profile_probability op1_prob = profile_probability::uninitialized (); if (prob.initialized_p ()) { -op0_prob = prob.apply_scale (1, 2); -op1_prob = prob.apply_scale (1, 2) / op0_prob.invert (); + op1_prob = prob; + op0_prob = op1_prob.split (profile_probability::even ()); } if (if_true_label == NULL) { @@ -1120,16 +1118,27 @@ do_compare_rtx_and_jump (rtx op0, rtx op else { - profile_probability first_prob = prob; + profile_probability cprob + = profile_probability::guessed_always (); if (first_code == UNORDERED) - first_prob = profile_probability::guessed_always ().apply_scale -
Re: [PACH][RFC] Pass -m -jN to gcc_build from gcc_release
On 01/22/2018 01:57 AM, Jakub Jelinek wrote: > On Mon, Jan 22, 2018 at 09:37:05AM +0100, Richard Biener wrote: >> >> Currently gcc_release builds GCC (for generating in-tree generated >> files) serially - that's prohibitively expensive and takes ages (>4h for >> me). I'm using (when I remember to edit gcc_release ...) -j6 without >> a problem for some years, thus the following proposal. >> >> Any recommendation for the default N? 4 might work just fine as well >> and no release manager should do without at least 4 cores... > > Perhaps > num_cpus=1 > if type -p getconf 2>/dev/null; then > num_cpus=`getconf _NPROCESSORS_ONLN 2>/dev/null` > case "$num_cpus" in > '' | 0* | *[!0-9]*) num_cpus=1;; > esac > fi > > and "-j$num_cpus" ? Seems reasonable to me -- I believe we do something similar on our various internal scripts ({redhat,fedora}-rpm-config). jeff
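Jakub's CPU-count detection can be exercised stand-alone; this sketch also redirects the stdout of the `type` probe (which otherwise prints the path of getconf) and falls back to 1 when getconf is missing or prints something that is not a positive integer:

```shell
# Detect the number of online CPUs for a -jN argument, defaulting to 1.
num_cpus=1
if type -p getconf >/dev/null 2>&1; then
  num_cpus=`getconf _NPROCESSORS_ONLN 2>/dev/null`
  case "$num_cpus" in
    '' | 0* | *[!0-9]*) num_cpus=1;;
  esac
fi
printf '%s\n' "-j$num_cpus"
```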
Re: [PATCH] avoid assuming known string length is constant (PR 83896)
On Tue, Jan 16, 2018 at 03:20:24PM -0700, Martin Sebor wrote: > gcc/ChangeLog: > > PR tree-optimization/83896 > * tree-ssa-strlen.c (get_string_len): Rename... > (get_string_cst_length): ...to this. Return HOST_WIDE_INT. > Avoid assuming length is constant. > (handle_char_store): Use HOST_WIDE_INT for string length. > > gcc/testsuite/ChangeLog: > > PR tree-optimization/83896 > * gcc.dg/strlenopt-43.c: New test. ... > >if (TREE_CODE (rhs) == MEM_REF >&& integer_zerop (TREE_OPERAND (rhs, 1))) For the future, there is no reason it couldn't handle also non-zero offsets if the to be returned length is bigger or equal than that offset, where the offset can be then subtracted. But let's defer that for GCC9. > @@ -2789,7 +2793,8 @@ handle_pointer_plus (gimple_stmt_iterator *gsi) > if (idx > 0) > { > strinfo *si = get_strinfo (idx); > - if (si && si->full_string_p) > + if (si && si->full_string_p > + && tree_fits_shwi_p (si->nonzero_chars)) If a && or || using condition doesn't fit onto a single line, it should be split into one condition per line, so please change the above into: if (si && si->full_string_p && tree_fits_shwi_p (si->nonzero_chars)) > @@ -2822,7 +2824,7 @@ handle_char_store (gimple_stmt_iterator *gsi) >unsigned HOST_WIDE_INT offset = 0; > >/* Set to the length of the string being assigned if known. */ > - int rhslen; > + HOST_WIDE_INT rhslen; Please remove the empty line above the above comment, the function starts with definitions of multiple variables, rhslen isn't even initialized and there is nothing special on it compared to others, so they should be in one block. Ok for trunk with those changes. 
> --- gcc/testsuite/gcc.dg/strlenopt-43.c (nonexistent) > +++ gcc/testsuite/gcc.dg/strlenopt-43.c (working copy) > @@ -0,0 +1,13 @@ > +/* PR tree-optimization/83896 - ice in get_string_len on a call to strlen > + with non-constant length > + { dg-do compile } > + { dg-options "-O2 -Wall -fdump-tree-optimized" } */ > + > +extern char a[5]; > +extern char b[]; > + > +void f (void) > +{ > + if (__builtin_strlen (b) != 4) > +__builtin_memcpy (a, b, sizeof a); > +} Most of the strlenopt*.c tests use the strlenopt.h header and then use the non-__builtin_ function names, if you want to change that too, please do so, but not a requirement. Jakub
Re: [PATCH, rs6000] Add 128-bit support for vec_xl(), vec_xl_be(), vec_xst(), vec_xst_be() builtins.
Segher: I put back the xxpermdi,2 stuff. Per our private discussion about the parallel [(const_int 0)], I found that I could get GCC to compile without the parallel. GCC worked with a -O0 on the test case but I got an ICE when using -O1. So, I had to put the parallel back in. The patch now works for -O0, -O1, -O2 and -O3. I have completed the regression testing again for powerpc64le-unknown-linux-gnu (Power 8 LE) powerpc64le-unknown-linux-gnu (Power 8 BE) powerpc64le-unknown-linux-gnu (Power 9 LE) and all looks good. So, just for the record, here is the final patch that will get committed. Carl Love gcc/ChangeLog: 2018-01-22 Carl Love * config/rs6000/rs6000-builtin.def (ST_ELEMREV_V1TI, LD_ELEMREV_V1TI, LVX_V1TI): Add macro expansion. * config/rs6000/rs6000-c.c (altivec_builtin_types): Add argument definitions for VSX_BUILTIN_VEC_XST_BE, VSX_BUILTIN_VEC_ST, VSX_BUILTIN_VEC_XL, LD_ELEMREV_V1TI builtins. * config/rs6000/rs6000-p8swap.c (insn_is_swappable_p): Change check to determine if the instruction is a byte reversing entry. Fix typo in comment. * config/rs6000/rs6000.c (altivec_expand_builtin): Add case entry for VSX_BUILTIN_ST_ELEMREV_V1TI and VSX_BUILTIN_LD_ELEMREV_V1TI. Add def_builtin calls for new builtins. * config/rs6000/vsx.md (vsx_st_elemrev_v1ti, vsx_ld_elemrev_v1ti): Add define_insn expansion. gcc/testsuite/ChangeLog: 2018-01-22 Carl Love * gcc.target/powerpc/powerpc.exp: Add torture tests for builtins-4-runnable.c, builtins-6-runnable.c, builtins-5-p9-runnable.c, builtins-6-p9-runnable.c. * gcc.target/powerpc/builtins-6-runnable.c: New test file. * gcc.target/powerpc/builtins-4-runnable.c: Add additional tests for signed/unsigned 128-bit and long long int loads.
--- gcc/config/rs6000/rs6000-builtin.def |3 + gcc/config/rs6000/rs6000-c.c | 39 + gcc/config/rs6000/rs6000-p8swap.c |5 +- gcc/config/rs6000/rs6000.c | 20 + gcc/config/rs6000/vsx.md | 42 +- .../gcc.target/powerpc/builtins-4-runnable.c | 494 +- .../gcc.target/powerpc/builtins-6-runnable.c | 1001 gcc/testsuite/gcc.target/powerpc/powerpc.exp | 12 + 8 files changed, 1581 insertions(+), 35 deletions(-) create mode 100644 gcc/testsuite/gcc.target/powerpc/builtins-6-runnable.c diff --git a/gcc/config/rs6000/rs6000-builtin.def b/gcc/config/rs6000/rs6000-builtin.def index b17036c5a..757fd6d50 100644 --- a/gcc/config/rs6000/rs6000-builtin.def +++ b/gcc/config/rs6000/rs6000-builtin.def @@ -1234,6 +1234,7 @@ BU_ALTIVEC_X (LVXL_V8HI, "lvxl_v8hi",MEM) BU_ALTIVEC_X (LVXL_V16QI, "lvxl_v16qi", MEM) BU_ALTIVEC_X (LVX, "lvx", MEM) BU_ALTIVEC_X (LVX_V2DF,"lvx_v2df", MEM) +BU_ALTIVEC_X (LVX_V1TI,"lvx_v1ti", MEM) BU_ALTIVEC_X (LVX_V2DI,"lvx_v2di", MEM) BU_ALTIVEC_X (LVX_V4SF,"lvx_v4sf", MEM) BU_ALTIVEC_X (LVX_V4SI,"lvx_v4si", MEM) @@ -1783,12 +1784,14 @@ BU_VSX_X (STXVW4X_V4SF, "stxvw4x_v4sf", MEM) BU_VSX_X (STXVW4X_V4SI, "stxvw4x_v4si", MEM) BU_VSX_X (STXVW4X_V8HI, "stxvw4x_v8hi", MEM) BU_VSX_X (STXVW4X_V16QI, "stxvw4x_v16qi", MEM) +BU_VSX_X (LD_ELEMREV_V1TI,"ld_elemrev_v1ti", MEM) BU_VSX_X (LD_ELEMREV_V2DF,"ld_elemrev_v2df", MEM) BU_VSX_X (LD_ELEMREV_V2DI,"ld_elemrev_v2di", MEM) BU_VSX_X (LD_ELEMREV_V4SF,"ld_elemrev_v4sf", MEM) BU_VSX_X (LD_ELEMREV_V4SI,"ld_elemrev_v4si", MEM) BU_VSX_X (LD_ELEMREV_V8HI,"ld_elemrev_v8hi", MEM) BU_VSX_X (LD_ELEMREV_V16QI, "ld_elemrev_v16qi", MEM) +BU_VSX_X (ST_ELEMREV_V1TI,"st_elemrev_v1ti", MEM) BU_VSX_X (ST_ELEMREV_V2DF,"st_elemrev_v2df", MEM) BU_VSX_X (ST_ELEMREV_V2DI,"st_elemrev_v2di", MEM) BU_VSX_X (ST_ELEMREV_V4SF,"st_elemrev_v4sf", MEM) diff --git a/gcc/config/rs6000/rs6000-c.c b/gcc/config/rs6000/rs6000-c.c index 123e46aa1..a0f790d39 100644 --- a/gcc/config/rs6000/rs6000-c.c +++ b/gcc/config/rs6000/rs6000-c.c @@ -3149,16 +3149,27 @@ const struct 
altivec_builtin_types altivec_overloaded_builtins[] = { RS6000_BTI_V2DF, RS6000_BTI_INTSI, ~RS6000_BTI_V2DF, 0 }, { VSX_BUILTIN_VEC_XL, VSX_BUILTIN_LXVD2X_V2DF, RS6000_BTI_V2DF, RS6000_BTI_INTSI, ~RS6000_BTI_double, 0 }, + { VSX_BUILTIN_VEC_XL, VSX_BUILTIN_LXVD2X_V1TI, +RS6000_BTI_V1TI, RS6000_BTI_INTSI, ~RS6000_BTI_INTTI, 0 }, + { VSX_BUILTIN_VEC_XL, VSX_BUILTIN_LXVD2X_V1TI, +RS6000_BTI_V1TI, RS6000_BTI_INTSI, ~RS6000_BTI_V1TI, 0 }, + { VSX_BUILTIN_VEC_XL, VSX_BUILTIN_LXVD2X_V1TI, +RS6000_BTI_unsigned_V1TI, RS6000_BTI_
Re: [PATCH][ARM] Use utxb rN, rM, ror #8 to implement zero_extract on armv6.
Hi Roger, On 15/01/18 14:58, Roger Sayle wrote: I was hoping I could ask an ARM backend maintainer to look over the following patch. I was examining the code generated for the following C snippet on a raspberry pi, static inline int popcount_lut8(unsigned *buf, int n) { int cnt=0; unsigned int i; do { i = *buf; cnt += lut[i&255]; cnt += lut[i>>8&255]; cnt += lut[i>>16&255]; cnt += lut[i>>24]; buf++; } while(--n); return cnt; } and was surprised to see the following instruction sequence generated by the compiler: mov r5, r2, lsr #8 uxtb r5, r5 This sequence can be performed by a single ARM instruction: uxtb r5, r2, ror #8 I agree. The attached patch allows GCC's combine pass to take advantage of the ARM's uxtb with rotate functionality to implement the above zero_extract, and likewise to use the sxtb with rotate to implement sign_extract. ARM's uxtb and sxtb can only be used with rotates of 0, 8, 16 and 24, and of these only the 8 and 16 are useful [ror #0 is a nop, and extends with ror #24 can be implemented using regular shifts], so the approach here is to add the six missing but useful instructions as 6 different define_insn in arm.md, rather than try to be clever with new predicates. Alas, later ARM hardware has advanced bit field instructions, and earlier ARM cores didn't support extend-with-rotate, so this appears to only benefit armv6 era CPUs. Right, the deal with these instructions is that for ARM state they appear from ARMv6 onwards and for Thumb they appear from ARMv6T2. However, in ARMv6T2 we also get the more advanced UBFX instructions and we might as well use those (and we do). So indeed the UXTB+ROR instructions are only useful for ARMv6 ARM-state, which is what your patch correctly specifies. The following patch has been minimally tested by building cc1 of a cross-compiler and confirming the desired instructions appear in the assembly output for the test case.
Alas, my minimal raspberry pi hardware is unlikely to be able to bootstrap gcc or run the testsuite, so I'm hoping a ARM expert can check (and confirm) whether this change is safe and suitable. [Thanks in advance and apologies for any inconvenience]. I've bootstrapped and tested your patch on arm-none-linux-gnueabihf so the patch looks good testing-wise. I only have a few minor comments inline. With those addressed the patch is ok. That being said, GCC is now in stage 4: regression fixes only. You'll have to wait until GCC 8 is released and stage 1 development is reopened before committing this patch. Thanks, Kyrill 2018-01-14 Roger Sayle * config/arm/arm.md (*arm_zeroextractsi2_8_8, *arm_signextractsi2_8_8, *arm_zeroextractsi2_8_16, *arm_signextractsi2_8_16, *arm_zeroextractsi2_16_8, *arm_signextractsi2_16_8): New. 2018-01-14 Roger Sayle * gcc.target/arm/extend-ror.c: New test. Index: gcc/gcc/config/arm/arm.md === --- gcc/gcc/config/arm/arm.md (revision 256667) +++ gcc/gcc/config/arm/arm.md (working copy) @@ -11684,6 +11684,72 @@ "" ) + Implement zero_extract using uxtb/uxth instruction with + the ror #N qualifier when applicable. Please use only two ";;" as comments. /* { dg-do compile } */ /* { dg-options "-O -march=armv6" } */ /* { dg-prune-output "switch .* conflicts with" } */ These directives can be tricky to get right for all the multilib variations that folks test. I would recommend you use something like: /* { dg-do compile } */ /* { dg-skip-if "avoid conflicting multilib options" { *-*-* } { "-march=*" } { "-march=armv6" } } */ /* { dg-require-effective-target arm_arm_ok } */ /* { dg-add-options arm_arch_v6 } */ /* { dg-additional-options "-O -marm" } */ That way it properly adds -march=armv6 -marm and skips the test gracefully if the user tries to force an incompatible set of options.
Re: [PATCH] Fix branch probability computation in do_compare_rtx_and_jump (PR tree-optimization/83081)
On 2018-01-22 16:57, Jakub Jelinek wrote: On Mon, Jan 22, 2018 at 02:43:50PM +0100, Jan Hubicka wrote: > > But this looks incorrect, because that computes only the above c2 in > > prob, not > > second_prob. It needs to be > > prob = cprob.invert () * prob / first_prob.invert (); > > or > > prob *= cprob.invert () / first_prob.invert (); > > or that > > prob = (prob - first_prob) / first_prob.invert (); > > I had in the patch. The last one looked to me like cheapest to compute. Aha, I see. Ok that makes sense to me now :) > > > void profile_probability::split (profile_probability cprob, > > profile_probability *first_prob, profile_probability *second_prob) > > This doesn't take prob as an argument, or should it be a method and take > prob as *this? Couldn't it simply return first_prob and adjust itself, > so > profile_probability split (profile_probability cprob) > { > profile_probability ret = *this * cprob; > *this = (*this - prob) / ret.invert (); > // or *this = *this * cprob.invert () / ret.invert (); > return ret; > } > ? Yep, that is fine. So like this if it passes bootstrap/regtest on {x86_64,i686}-linux? Besides adding the new helper method and using it in the spots I've changed plus the two you've mentioned, I had to do another fix - looking at how TRUTH_ANDIF_EXPR is handled in dojump_1 and seeing Invalid sum messages in predict-8.c's *.expand jump, I've realized that it also works as is only for the discussed, i.e. if (x) goto t; // prob transformed into: if (a) goto t; // first_prob if (b) goto t; // prob but not for the case where and_them is true, where we have: if (c) goto dummy; // first_prob.invert () if (d) goto t; // prob dummy: which needs to be handled similarly to the TRUTH_ANDIF_EXPR case. At least when trying in gdb on predict-8.c various values of cprob (1%, 15%, 50%, 62%, 98%) none of them generate the Invalid sum messages. 2018-01-22 Jakub Jelinek PR tree-optimization/83081 * profile-count.h (profile_probability::split): New method.
* dojump.c (do_jump_1) : Use profile_probability::split. (do_compare_rtx_and_jump): Fix adjustment of probabilities when splitting a single conditional jump into 2. * gcc.dg/predict-8.c: Adjust expected probability. --- gcc/profile-count.h.jj 2018-01-19 19:41:52.910549618 +0100 +++ gcc/profile-count.h 2018-01-22 16:20:27.073096515 +0100 @@ -410,6 +410,19 @@ public: return *this; } + /* Split *THIS (ORIG) probability into 2 probabilities, such that + the returned one (FIRST) is *THIS * CPROB and *THIS is + adjusted (SECOND) so that FIRST + FIRST.invert () * SECOND + == ORIG. */ To make documentation clear, i would include the pseudocode with conditionals and individual probabilities. It is bit non-obvious transformation and it would be nice to reduce level of apparent magicness in profiling code :) OK, Honza + profile_probability split (const profile_probability &cprob) +{ + profile_probability ret = *this * cprob; + /* The following is equivalent to: +*this = cprob.invert () * *this / ret.invert (); */ + *this = (*this - ret) / ret.invert (); + return ret; +} + gcov_type apply (gcov_type val) const { if (*this == profile_probability::uninitialized ()) --- gcc/dojump.c.jj 2018-01-19 19:41:49.978548984 +0100 +++ gcc/dojump.c2018-01-22 16:31:43.867185428 +0100 @@ -347,13 +347,11 @@ do_jump_1 (enum tree_code code, tree op0 profile_probability op1_prob = profile_probability::uninitialized (); if (prob.initialized_p ()) { -profile_probability false_prob = prob.invert (); -profile_probability op0_false_prob = false_prob.apply_scale (1, 2); - profile_probability op1_false_prob = false_prob.apply_scale (1, 2) - / op0_false_prob.invert (); + op1_prob = prob.invert (); + op0_prob = op1_prob.split (profile_probability::even ()); /* Get the probability that each jump below is true. 
*/ -op0_prob = op0_false_prob.invert (); -op1_prob = op1_false_prob.invert (); + op0_prob = op0_prob.invert (); + op1_prob = op1_prob.invert (); } if (if_false_label == NULL) { @@ -380,8 +378,8 @@ do_jump_1 (enum tree_code code, tree op0 profile_probability op1_prob = profile_probability::uninitialized (); if (prob.initialized_p ()) { -op0_prob = prob.apply_scale (1, 2); -op1_prob = prob.apply_scale (1, 2) / op0_prob.invert (); + op1_prob = prob; + op0_prob = op1_prob.split (profile_probability::even ()); } if (if_true_label == NULL) { @@ -1120,16 +1118,27 @@ do_compare_rtx_and_jump (rtx op0, rtx op else { - profile_probability first_prob = prob;
Disable some patterns for fold-left reductions (PR 83965)
In this PR we recognised a PLUS_EXPR as a fold-left reduction, then applied pattern matching to convert it to a WIDEN_SUM_EXPR. We need to keep the original code in this case since we implement the reduction using scalar rather than vector operations. Tested on aarch64-linux-gnu, x86_64-linux-gnu and powerpc64le-linux-gnu. OK to install? Richard 2018-01-22 Richard Sandiford gcc/ PR tree-optimization/83965 * tree-vect-patterns.c (vect_reassociating_reduction_p): New function. (vect_recog_dot_prod_pattern, vect_recog_sad_pattern): Use it instead of checking only for a reduction. (vect_recog_widen_sum_pattern): Likewise. gcc/testsuite/ PR tree-optimization/83965 * gcc.dg/vect/pr83965.c: New test. Index: gcc/tree-vect-patterns.c === --- gcc/tree-vect-patterns.c2018-01-13 18:01:51.240735098 + +++ gcc/tree-vect-patterns.c2018-01-22 17:51:06.751462689 + @@ -217,6 +217,16 @@ vect_recog_temp_ssa_var (tree type, gimp return make_temp_ssa_name (type, stmt, "patt"); } +/* Return true if STMT_VINFO describes a reduction for which reassociation + is allowed. */ + +static bool +vect_reassociating_reduction_p (stmt_vec_info stmt_vinfo) +{ + return (STMT_VINFO_DEF_TYPE (stmt_vinfo) == vect_reduction_def + && STMT_VINFO_REDUC_TYPE (stmt_vinfo) != FOLD_LEFT_REDUCTION); +} + /* Function vect_recog_dot_prod_pattern Try to find the following pattern: @@ -336,7 +346,7 @@ vect_recog_dot_prod_pattern (vec *s { gimple *def_stmt; - if (STMT_VINFO_DEF_TYPE (stmt_vinfo) != vect_reduction_def + if (!vect_reassociating_reduction_p (stmt_vinfo) && ! STMT_VINFO_GROUP_FIRST_ELEMENT (stmt_vinfo)) return NULL; plus_oprnd0 = gimple_assign_rhs1 (last_stmt); @@ -1181,7 +1191,7 @@ vect_recog_widen_sum_pattern (vec
Fix vect_float markup for a couple of tests (PR 83888)
vect_float is true for arm*-*-* targets, but the support is only available when -funsafe-math-optimizations is on. This caused failures in two tests that disable fast-math. The easiest fix seemed to be to add a new target selector for "vect_float without special options". Tested on aarch64-linux-gnu, x86_64-linux-gnu and powerpc64le-linux-gnu. OK to install? Richard 2018-01-22 Richard Sandiford gcc/ PR testsuite/83888 * doc/sourcebuild.texi (vect_float): Say that the selector only describes the situation when -funsafe-math-optimizations is on. (vect_float_strict): Document. gcc/testsuite/ PR testsuite/83888 * lib/target-supports.exp (check_effective_target_vect_float): Say that the result only holds when -funsafe-math-optimizations is on. (check_effective_target_vect_float_strict): New procedure. * gcc.dg/vect/no-fast-math-vect16.c: Use vect_float_strict instead of vect_float. * gcc.dg/vect/vect-reduc-6.c: Likewise. Index: gcc/doc/sourcebuild.texi === --- gcc/doc/sourcebuild.texi2018-01-22 17:51:03.860579049 + +++ gcc/doc/sourcebuild.texi2018-01-22 17:54:02.172848564 + @@ -1403,7 +1403,13 @@ The target's preferred vector alignment alignment. @item vect_float -Target supports hardware vectors of @code{float}. +Target supports hardware vectors of @code{float} when +@option{-funsafe-math-optimizations} is in effect. + +@item vect_float_strict +Target supports hardware vectors of @code{float} when +@option{-funsafe-math-optimizations} is not in effect. +This implies @code{vect_float}. @item vect_int Target supports hardware vectors of @code{int}. Index: gcc/testsuite/lib/target-supports.exp === --- gcc/testsuite/lib/target-supports.exp 2018-01-22 17:51:03.817580787 + +++ gcc/testsuite/lib/target-supports.exp 2018-01-22 17:54:02.173848531 + @@ -5492,7 +5492,8 @@ proc check_effective_target_vect_long { return $answer } -# Return 1 if the target supports hardware vectors of float, 0 otherwise. 
+# Return 1 if the target supports hardware vectors of float when +# -funsafe-math-optimizations is enabled, 0 otherwise. # # This won't change for different subtargets so cache the result. @@ -5525,6 +5526,14 @@ proc check_effective_target_vect_float { return $et_vect_float_saved($et_index) } +# Return 1 if the target supports hardware vectors of float without +# -funsafe-math-optimizations being enabled, 0 otherwise. + +proc check_effective_target_vect_float_strict { } { +return [expr { [check_effective_target_vect_float] + && ![istarget arm*-*-*] }] +} + # Return 1 if the target supports hardware vectors of double, 0 otherwise. # # This won't change for different subtargets so cache the result. Index: gcc/testsuite/gcc.dg/vect/no-fast-math-vect16.c === --- gcc/testsuite/gcc.dg/vect/no-fast-math-vect16.c 2018-01-13 18:01:15.293116922 + +++ gcc/testsuite/gcc.dg/vect/no-fast-math-vect16.c 2018-01-22 17:54:02.172848564 + @@ -1,4 +1,4 @@ -/* { dg-require-effective-target vect_float } */ +/* { dg-require-effective-target vect_float_strict } */ #include #include "tree-vect.h" Index: gcc/testsuite/gcc.dg/vect/vect-reduc-6.c === --- gcc/testsuite/gcc.dg/vect/vect-reduc-6.c2018-01-13 18:01:15.294116882 + +++ gcc/testsuite/gcc.dg/vect/vect-reduc-6.c2018-01-22 17:54:02.172848564 + @@ -1,4 +1,4 @@ -/* { dg-require-effective-target vect_float } */ +/* { dg-require-effective-target vect_float_strict } */ /* { dg-additional-options "-fno-fast-math" } */ #include
Re: [PATCH, rev2], Fix PR target/pr83862: Fix PowerPC long double signbit with -mabi=ieeelongdouble
Hi Mike, Thanks for doing this! On Sun, Jan 21, 2018 at 07:03:58PM -0500, Michael Meissner wrote: > I've reworked the patch for PR83864. This bug is due to the compiler issuing > an internal error for code of the form on a little endian system: > > int sb (_Float128 *p, unsigned long n) { return __builtin_signbit (p[n]); > } > > The problem is that the memory optimization wants to load the high double word > from memory to isolate the signbit. In little endian, the high word is at > offset 8 instead of 0. > > The bug will also show up if you use the long double type, and you use the > -mabi=ieeelongdouble option. > > Previously the code did one UNSPEC (signbit2_dm) that was done before > register allocation, and combined the value being in vector register, memory, > and GPR register along with the shift to isolate the signbit. Then after the > split after register allocation, it created a second UNSPEC > (signbit2_dm2) that was just the direct move, or it did the memory and > GPR code path separately. > > With these patches, the generator code issues an UNSPEC (signbit2_dm) > just to get the high double word, and the shift right is then emitted. The > UNSPEC only handles the value being in either vector or GPR register. There > is > a second UNSPEC that is created by the combiner if the value is in memory. On > little endian systems, the first split pass (before register allocation) will > allocate a pseudo register to hold the initial ADD of the base and index > registers for indexed loads, and then forms a LD reg,8(tmp) to load the high > word. Just in case, some code after register allocation reforms the UNSPEC, > it > uses a base register for the load, and it can use the base register as needed > for the temporary. But does it have to do an unspec at all? Can't it just immediately take the relevant 64-bit half (during expand), no unspec in sight, and that is that? 
> I have run this code on both little endian and big endian power8 systems, > doing > bootstrap and regression test. There were no regressions. Can I install this > bug on the trunk? > > I don't believe the bug shows up in gcc 6/7 branches, but I will build these > and test to see if the code is needed to be backported. If it is needed on the branches, okay for there too (once 7 reopens). > [gcc] > 2018-01-21 Michael Meissner > > PR target/83862 > * config/rs6000/rs6000-protos.h (rs6000_split_signbit): Delete, > no longer used. > * config/rs6000/rs6000.c (rs6000_split_signbit): Likewise. > * config/rs6000/rs6000.md (signbit2): Change code for IEEE > 128-bit to produce an UNSPEC move to get the double word with the > signbit and then a shift directly to do signbit. > (signbit2_dm): Replace old IEEE 128-bit signbit > implementation with a new version that just does either a direct > move or a regular move. Move memory interface to separate insns. > Move insns so they are next to the expander. > (signbit2_dm_mem_be): New combiner insns to combine load > with signbit move. Split big and little endian case. > (signbit2_dm_mem_le): Likewise. > (signbit2_dm_ext): Delete, no longer used. > (signbit2_dm2): Likewise. > > [gcc/testsuite] > 2018-01-22 Michael Meissner > > PR target/83862 > * gcc.target/powerpc/pr83862.c: New test. The patch is okay for trunk, but I still think this can be a lot simpler. > +;; We can't use SUBREG:DI of the IEEE 128-bit value before register > +;; allocation, because KF/TFmode aren't tieable with DImode. Also, having > the > +;; 64-bit scalar part in the high end of the register instead of the low end > +;; also complicates things. So instead, we use an UNSPEC to do the move of > the > +;; high double word to isolate the signbit. I'll think about this. Either way, thank you for your patience! And the patch :-) Segher
Re: [PATCH] Fix branch probability computation in do_compare_rtx_and_jump (PR tree-optimization/83081)
On Mon, Jan 22, 2018 at 06:44:17PM +0100, Jan Hubicka wrote: > > + /* Split *THIS (ORIG) probability into 2 probabilities, such that > > + the returned one (FIRST) is *THIS * CPROB and *THIS is > > + adjusted (SECOND) so that FIRST + FIRST.invert () * SECOND > > + == ORIG. */ > > To make documentation clear, i would include the pseudocode with > conditionals and individual > probabilities. > It is bit non-obvious transformation and it would be nice to reduce level of > apparent > magicness in profiling code :) So like this? 2018-01-22 Jakub Jelinek PR tree-optimization/83081 * profile-count.h (profile_probability::split): New method. * dojump.c (do_jump_1) : Use profile_probability::split. (do_compare_rtx_and_jump): Fix adjustment of probabilities when splitting a single conditional jump into 2. * gcc.dg/predict-8.c: Adjust expected probability. --- gcc/profile-count.h.jj 2018-01-19 19:41:52.910549618 +0100 +++ gcc/profile-count.h 2018-01-22 16:20:27.073096515 +0100 @@ -410,6 +410,30 @@ public: return *this; } + /* Split *THIS (ORIG) probability into 2 probabilities, such that + the returned one (FIRST) is *THIS * CPROB and *THIS is + adjusted (SECOND) so that FIRST + FIRST.invert () * SECOND + == ORIG. This is useful e.g. when splitting a conditional + branch like: + if (cond) + goto lab; // ORIG probability + into + if (cond1) + goto lab; // FIRST = ORIG * CPROB probability + if (cond2) + goto lab; // SECOND probability + such that the overall probability of jumping to lab remains + the same. CPROB gives the relative probability between the + branches. 
*/ + profile_probability split (const profile_probability &cprob) +{ + profile_probability ret = *this * cprob; + /* The following is equivalent to: + *this = cprob.invert () * *this / ret.invert (); */ + *this = (*this - ret) / ret.invert (); + return ret; +} + gcov_type apply (gcov_type val) const { if (*this == profile_probability::uninitialized ()) --- gcc/dojump.c.jj 2018-01-19 19:41:49.978548984 +0100 +++ gcc/dojump.c2018-01-22 16:31:43.867185428 +0100 @@ -347,13 +347,11 @@ do_jump_1 (enum tree_code code, tree op0 profile_probability op1_prob = profile_probability::uninitialized (); if (prob.initialized_p ()) { -profile_probability false_prob = prob.invert (); -profile_probability op0_false_prob = false_prob.apply_scale (1, 2); - profile_probability op1_false_prob = false_prob.apply_scale (1, 2) - / op0_false_prob.invert (); + op1_prob = prob.invert (); + op0_prob = op1_prob.split (profile_probability::even ()); /* Get the probability that each jump below is true. */ -op0_prob = op0_false_prob.invert (); -op1_prob = op1_false_prob.invert (); + op0_prob = op0_prob.invert (); + op1_prob = op1_prob.invert (); } if (if_false_label == NULL) { @@ -380,8 +378,8 @@ do_jump_1 (enum tree_code code, tree op0 profile_probability op1_prob = profile_probability::uninitialized (); if (prob.initialized_p ()) { -op0_prob = prob.apply_scale (1, 2); -op1_prob = prob.apply_scale (1, 2) / op0_prob.invert (); + op1_prob = prob; + op0_prob = op1_prob.split (profile_probability::even ()); } if (if_true_label == NULL) { @@ -1120,16 +1118,27 @@ do_compare_rtx_and_jump (rtx op0, rtx op else { - profile_probability first_prob = prob; + profile_probability cprob + = profile_probability::guessed_always (); if (first_code == UNORDERED) - first_prob = profile_probability::guessed_always ().apply_scale -(1, 100); + cprob = cprob.apply_scale (1, 100); else if (first_code == ORDERED) - first_prob = profile_probability::guessed_always ().apply_scale -(99, 100); + cprob = cprob.apply_scale (99, 
100); + else + cprob = profile_probability::even (); + /* We want to split: +if (x) goto t; // prob; +into +if (a) goto t; // first_prob; +if (b) goto t; // prob; +such that the overall probability of jumping to t +remains the same and first_prob is prob * cprob. */ if (and_them) { rtx_code_label *dest_label; + prob = prob.invert (); + profile_probability first_prob = prob.split (cprob).invert (); + prob = prob.invert (); /*
[Patch, fortran] PR37577 - [meta-bug] change internal array descriptor format for better syntax, C interop TR, rank 15
This patch has been triggered by Thomas's recent message to the list. Not only did I start work late relative to stage 3 but debugging took somewhat longer than anticipated. Therefore, to get this committed asap, we will have to beg the indulgence of the release managers and prompt review and/or testing by fortran maintainers. (Dominique has already undertaken to test -m32.)

The patch delivers rank up to 15 for F2008, the descriptor information needed to enact the F2018 C descriptor macros and an attribute field to store such information as pointer/allocatable, contiguous etc. Only the first has been enabled so far, but it was necessary to submit the array descriptor changes now to avoid any further ABI breakage in 9.0.0.

I took the design choice to replace the dtype with a structure:

typedef struct dtype_type
{
  size_t elem_len;
  int version;
  int rank;
  int type;
  int attribute;
}
dtype_type;

This choice was intended to reduce the changes to a minimum, since in most references to the dtype, one dtype is assigned to another. The F2018 interop defines the 'type' and 'attribute' fields to be signed char types. I used this initially but found that using int was the best way to silence the warnings about padding, since it also allows for more attribute information to be carried.

Some parts of the patch (e.g. in get_scalar_to_descriptor_type) look as if latent bugs were uncovered by the change to the descriptor. If so, the time spent debugging was well worthwhile.

It should be noted that some of the intrinsics, which use switch/case for the type/kind selection, limit the effective element size that they handle to the maximum value of size_t, less 7 bits. A bit of straightforward work there would fix this limitation and would allow the GFC_DTYPE shifts and masks to be eliminated.

Bootstraps and regtests on FC23/x86_64 - OK for trunk?
Paul

2018-01-22  Paul Thomas

        PR fortran/37577
        * array.c (gfc_match_array_ref): If standard earlier than F2008
        it is an error if the reference dimension is greater than 7.
        * libgfortran.h : Increase GFC_MAX_DIMENSIONS to 15. Change the
        dtype masks and shifts accordingly.
        * trans-array.c (gfc_conv_descriptor_dtype): Use the dtype type
        node to check the field.
        (gfc_conv_descriptor_dtype): Access the rank field of dtype.
        (duplicate_allocatable_coarray): Access the rank field of the
        dtype descriptor rather than the dtype itself.
        * trans-expr.c (get_scalar_to_descriptor_type): Store the type
        of 'scalar' on entry and use its TREE_TYPE if it is ARRAY_TYPE
        (i.e. a character).
        (gfc_conv_procedure_call): Pass TREE_OPERAND (tmp,0) to
        get_scalar_to_descriptor_type if the actual expression is a
        constant.
        (gfc_trans_structure_assign): Assign the rank directly to the
        dtype rank field.
        (gfc_conv_intrinsic_sizeof): Obtain the element size from the
        'elem_len' field of the dtype.
        * trans-io.c (gfc_build_io_library_fndecls): Replace
        gfc_int4_type_node with dtype_type_node where necessary.
        (transfer_namelist_element): Use gfc_get_dtype_rank_type for
        scalars.
        * trans-types.c : Provide 'get_dtype_type_node' to access the
        dtype_type_node and, if necessary, build it. The maximum size
        of an array element is now determined by the maximum value of
        size_t. Update the description of the array descriptor,
        including the type def for the dtype_type.
        (gfc_get_dtype_rank_type): Build a constructor for the dtype.
        Distinguish RECORD_TYPEs that are BT_DERIVED or BT_CLASS.
        (gfc_get_array_descriptor_base): Change the type of the dtype
        field to dtype_type_node.
        (gfc_get_array_descr_info): Get the offset to the rank field of
        the dtype.
        * trans-types.h : Add a prototype for 'get_dtype_type_node ()'.
        * trans.h : Define the indices of the dtype fields.

2018-01-22  Paul Thomas

        PR fortran/37577
        * gfortran.dg/coarray_18.f90: Allow dimension 15 for F2008.
        * gfortran.dg/coarray_lib_this_image_2.f90: Change 'array1' to
        'array01' in the tree dump comparison.
        * gfortran.dg/coarray_lib_token_4.f90: Likewise.
        * gfortran.dg/inline_sum_1.f90: Similar - allow two digits.
        * gfortran.dg/rank_1.f90: Allow dimension 15 for F2008.

2018-01-22  Paul Thomas

        PR fortran/37577
        * caf/single.c (_gfortran_caf_failed_images): Access the 'type'
        and 'elem_len' fields of the dtype instead of the shifts.
        (_gfortran_caf_stopped_images): Likewise.
        * intrinsics/associated.c (associated): Compare the 'type' and
        'elem_len' fields instead of the dtype.
        * caf/date_and_time.c : Access the dtype fields rather than
        using shifts and masks.
        * io/transfer.c (transfer_array): Comment on item count.
        (set_nml_var,st_set_nml_var): Change dtype type and use fields.
        (st_set_nml_dtio_var): Likewise.
        * libgfortran.h : Change definition of GFC_ARRAY_DESCRIPTOR and
        add a typedef for the dtype_type. Change the GFC_DTYPE_* macros to
Re: [PATCH, rev2], Fix PR target/pr83862: Fix PowerPC long double signbit with -mabi=ieeelongdouble
On Mon, Jan 22, 2018 at 12:32:13PM -0600, Segher Boessenkool wrote: > Hi Mike, > > Thanks for doing this! > > On Sun, Jan 21, 2018 at 07:03:58PM -0500, Michael Meissner wrote: > > With these patches, the generator code issues an UNSPEC (signbit2_dm) > > just to get the high double word, and the shift right is then emitted. The > > UNSPEC only handles the value being in either vector or GPR register. > > There is > > a second UNSPEC that is created by the combiner if the value is in memory. > > On > > little endian systems, the first split pass (before register allocation) > > will > > allocate a pseudo register to hold the initial ADD of the base and index > > registers for indexed loads, and then forms a LD reg,8(tmp) to load the high > > word. Just in case, some code after register allocation reforms the > > UNSPEC, it > > uses a base register for the load, and it can use the base register as > > needed > > for the temporary. > > But does it have to do an unspec at all? Can't it just immediately take > the relevant 64-bit half (during expand), no unspec in sight, and that > is that? Yes it needs the UNSPEC. As I said, you can't use SUBREG due to a combination of MODES_TIEABLE_P and the way things are represented in vector registers (with the scalar part being in the upper part of the register). I tried it, and could not get it to work. -- Michael Meissner, IBM IBM, M/S 2506R, 550 King Street, Littleton, MA 01460-6245, USA email: meiss...@linux.vnet.ibm.com, phone: +1 (978) 899-4797
[PATCH] Finish removing class move_computations_dom_walker
r232820 (aka 2c7b2f8860794cc9b9cf5eeea9d7dc109c0de3be) removed the implementation of class move_computations_dom_walker, but kept the decl.

This patch removes the stray decl.

Successfully bootstrapped and regression-tested on x86_64-pc-linux-gnu. OK for trunk?

gcc/ChangeLog:
        PR tree-optimization/69452
        * tree-ssa-loop-im.c (class move_computations_dom_walker): Remove
        decl.
---
 gcc/tree-ssa-loop-im.c | 11 -----------
 1 file changed, 11 deletions(-)

diff --git a/gcc/tree-ssa-loop-im.c b/gcc/tree-ssa-loop-im.c
index 7864fbd..4655766 100644
--- a/gcc/tree-ssa-loop-im.c
+++ b/gcc/tree-ssa-loop-im.c
@@ -1081,17 +1081,6 @@ invariantness_dom_walker::before_dom_children (basic_block bb)
   return NULL;
 }
 
-class move_computations_dom_walker : public dom_walker
-{
-public:
-  move_computations_dom_walker (cdi_direction direction)
-    : dom_walker (direction), todo_ (0) {}
-
-  virtual edge before_dom_children (basic_block);
-
-  unsigned int todo_;
-};
-
 /* Hoist the statements in basic block BB out of the loops prescribed by
    data stored in LIM_DATA structures associated with each statement.  Callback
    for walk_dominator_tree.  */
-- 
1.8.5.3
[PATCH] Fix memory leaks in sbitmap.c selftests
"make selftest-valgrind" shows a few leaks in sbitmap.c's selftests; this patch fixes them.

Successfully bootstrapped and regression-tested on x86_64-pc-linux-gnu. OK for trunk?

gcc/ChangeLog:
        * sbitmap.c (selftest::test_set_range): Fix memory leaks.
        (selftest::test_bit_in_range): Likewise.
---
 gcc/sbitmap.c | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/gcc/sbitmap.c b/gcc/sbitmap.c
index cf46cb2..967868a 100644
--- a/gcc/sbitmap.c
+++ b/gcc/sbitmap.c
@@ -897,6 +897,7 @@ test_set_range ()
   bitmap_set_range (s, 15, 1);
   ASSERT_FALSE (bitmap_bit_in_range_p_checking (s, 1, 14));
   ASSERT_TRUE (bitmap_bit_in_range_p_checking (s, 15, 15));
+  sbitmap_free (s);
 
   s = sbitmap_alloc (1024);
   bitmap_clear (s);
@@ -914,6 +915,7 @@ test_set_range ()
   ASSERT_FALSE (bitmap_bit_in_range_p_checking (s, 512 + 64, 1023));
   ASSERT_TRUE (bitmap_bit_in_range_p_checking (s, 512, 512));
   ASSERT_TRUE (bitmap_bit_in_range_p_checking (s, 512 + 63, 512 + 63));
+  sbitmap_free (s);
 }
 
 /* Verify bitmap_bit_in_range_p functions for sbitmap.  */
@@ -935,6 +937,8 @@ test_bit_in_range ()
   ASSERT_TRUE (bitmap_bit_in_range_p (s, 100, 100));
   ASSERT_TRUE (bitmap_bit_p (s, 100));
 
+  sbitmap_free (s);
+
   s = sbitmap_alloc (64);
   bitmap_clear (s);
   bitmap_set_bit (s, 63);
@@ -942,6 +946,7 @@ test_bit_in_range ()
   ASSERT_TRUE (bitmap_bit_in_range_p (s, 1, 63));
   ASSERT_TRUE (bitmap_bit_in_range_p (s, 63, 63));
   ASSERT_TRUE (bitmap_bit_p (s, 63));
+  sbitmap_free (s);
 
   s = sbitmap_alloc (1024);
   bitmap_clear (s);
@@ -985,6 +990,7 @@ test_bit_in_range ()
   ASSERT_FALSE (bitmap_bit_in_range_p (s, 17, 31));
   ASSERT_FALSE (bitmap_bit_in_range_p (s, 49, 63));
   ASSERT_FALSE (bitmap_bit_in_range_p (s, 65, 1023));
+  sbitmap_free (s);
 }
 
 /* Run all of the selftests within this file.  */
-- 
1.8.5.3
[PATCH] Add more test coverage to selftest::test_location_wrappers
This patch adds a few extra assertions to selftest::test_location_wrappers.

Successfully bootstrapped and regression-tested on x86_64-pc-linux-gnu. OK for trunk?

gcc/ChangeLog:
        * tree.c (selftest::test_location_wrappers): Add more test
        coverage.
---
 gcc/tree.c | 10 ++++++++++
 1 file changed, 10 insertions(+)

diff --git a/gcc/tree.c b/gcc/tree.c
index b3e93b8..c5baf08 100644
--- a/gcc/tree.c
+++ b/gcc/tree.c
@@ -14490,6 +14490,8 @@ test_location_wrappers ()
 {
   location_t loc = BUILTINS_LOCATION;
 
+  ASSERT_EQ (NULL_TREE, maybe_wrap_with_location (NULL_TREE, loc));
+
   /* Wrapping a constant.  */
   tree int_cst = build_int_cst (integer_type_node, 42);
   ASSERT_FALSE (CAN_HAVE_LOCATION_P (int_cst));
@@ -14500,6 +14502,14 @@ test_location_wrappers ()
   ASSERT_EQ (loc, EXPR_LOCATION (wrapped_int_cst));
   ASSERT_EQ (int_cst, tree_strip_any_location_wrapper (wrapped_int_cst));
 
+  /* We shouldn't add wrapper nodes for UNKNOWN_LOCATION.  */
+  ASSERT_EQ (int_cst, maybe_wrap_with_location (int_cst, UNKNOWN_LOCATION));
+
+  /* We shouldn't add wrapper nodes for nodes that CAN_HAVE_LOCATION_P.  */
+  tree cast = build1 (NOP_EXPR, char_type_node, int_cst);
+  ASSERT_TRUE (CAN_HAVE_LOCATION_P (cast));
+  ASSERT_EQ (cast, maybe_wrap_with_location (cast, loc));
+
   /* Wrapping a STRING_CST.  */
   tree string_cst = build_string (4, "foo");
   ASSERT_FALSE (CAN_HAVE_LOCATION_P (string_cst));
-- 
1.8.5.3
Re: [PATCH] Fix branch probability computation in do_compare_rtx_and_jump (PR tree-optimization/83081)
Dne 2018-01-22 19:36, Jakub Jelinek napsal: On Mon, Jan 22, 2018 at 06:44:17PM +0100, Jan Hubicka wrote: > + /* Split *THIS (ORIG) probability into 2 probabilities, such that > + the returned one (FIRST) is *THIS * CPROB and *THIS is > + adjusted (SECOND) so that FIRST + FIRST.invert () * SECOND > + == ORIG. */ To make documentation clear, i would include the pseudocode with conditionals and individual probabilities. It is bit non-obvious transformation and it would be nice to reduce level of apparent magicness in profiling code :) So like this? 2018-01-22 Jakub Jelinek PR tree-optimization/83081 * profile-count.h (profile_probability::split): New method. * dojump.c (do_jump_1) : Use profile_probability::split. (do_compare_rtx_and_jump): Fix adjustment of probabilities when splitting a single conditional jump into 2. * gcc.dg/predict-8.c: Adjust expected probability. Looks great. Thanks a lot! Honza --- gcc/profile-count.h.jj 2018-01-19 19:41:52.910549618 +0100 +++ gcc/profile-count.h 2018-01-22 16:20:27.073096515 +0100 @@ -410,6 +410,30 @@ public: return *this; } + /* Split *THIS (ORIG) probability into 2 probabilities, such that + the returned one (FIRST) is *THIS * CPROB and *THIS is + adjusted (SECOND) so that FIRST + FIRST.invert () * SECOND + == ORIG. This is useful e.g. when splitting a conditional + branch like: + if (cond) + goto lab; // ORIG probability + into + if (cond1) + goto lab; // FIRST = ORIG * CPROB probability + if (cond2) + goto lab; // SECOND probability + such that the overall probability of jumping to lab remains + the same. CPROB gives the relative probability between the + branches. 
*/ + profile_probability split (const profile_probability &cprob) +{ + profile_probability ret = *this * cprob; + /* The following is equivalent to: + *this = cprob.invert () * *this / ret.invert (); */ + *this = (*this - ret) / ret.invert (); + return ret; +} + gcov_type apply (gcov_type val) const { if (*this == profile_probability::uninitialized ()) --- gcc/dojump.c.jj 2018-01-19 19:41:49.978548984 +0100 +++ gcc/dojump.c2018-01-22 16:31:43.867185428 +0100 @@ -347,13 +347,11 @@ do_jump_1 (enum tree_code code, tree op0 profile_probability op1_prob = profile_probability::uninitialized (); if (prob.initialized_p ()) { -profile_probability false_prob = prob.invert (); -profile_probability op0_false_prob = false_prob.apply_scale (1, 2); - profile_probability op1_false_prob = false_prob.apply_scale (1, 2) - / op0_false_prob.invert (); + op1_prob = prob.invert (); + op0_prob = op1_prob.split (profile_probability::even ()); /* Get the probability that each jump below is true. */ -op0_prob = op0_false_prob.invert (); -op1_prob = op1_false_prob.invert (); + op0_prob = op0_prob.invert (); + op1_prob = op1_prob.invert (); } if (if_false_label == NULL) { @@ -380,8 +378,8 @@ do_jump_1 (enum tree_code code, tree op0 profile_probability op1_prob = profile_probability::uninitialized (); if (prob.initialized_p ()) { -op0_prob = prob.apply_scale (1, 2); -op1_prob = prob.apply_scale (1, 2) / op0_prob.invert (); + op1_prob = prob; + op0_prob = op1_prob.split (profile_probability::even ()); } if (if_true_label == NULL) { @@ -1120,16 +1118,27 @@ do_compare_rtx_and_jump (rtx op0, rtx op else { - profile_probability first_prob = prob; + profile_probability cprob + = profile_probability::guessed_always (); if (first_code == UNORDERED) - first_prob = profile_probability::guessed_always ().apply_scale -(1, 100); + cprob = cprob.apply_scale (1, 100); else if (first_code == ORDERED) - first_prob = profile_probability::guessed_always ().apply_scale -(99, 100); + cprob = cprob.apply_scale (99, 
100); + else + cprob = profile_probability::even (); + /* We want to split: +if (x) goto t; // prob; +into +if (a) goto t; // first_prob; +if (b) goto t; // prob; +such that the overall probability of jumping to t +remains the same and first_prob is prob * cprob. */ if (and_them) { rtx_code_label *dest_label; + prob = prob.invert (); + profile_probability first_prob = prob.split (cprob).invert ();
Re: [Patch, fortran] PR37577 - [meta-bug] change internal array descriptor format for better syntax, C interop TR, rank 15
On Mon, Jan 22, 2018 at 9:12 PM, Paul Richard Thomas wrote:
> This patch has been triggered by Thomas's recent message to the list.
> Not only did I start work late relative to stage 3 but debugging took
> somewhat longer than anticipated. Therefore, to get this committed
> asap, we will have to beg the indulgence of the release managers and
> prompt review and/or testing by fortran maintainers. (Dominique has
> already undertaken to test -m32.)

I think that if we can "guarantee" that we're happy with the current ABI for GCC 9 (and hopefully 10, 11, ...?) we have a quite strong case for committing it now. But if anybody is planning on doing some ABI-breaking work in the foreseeable future then maybe we should wait until GCC 9 stage1 opens. Anybody with such plans?

> The patch delivers rank up to 15 for F2008, the descriptor information
> needed to enact the F2018 C descriptor macros and an attribute field
> to store such information as pointer/allocatable, contiguous etc.
> Only the first has been enabled so far but it was necessary to submit
> the array descriptor changes now to avoid any further ABI breakage in
> 9.0.0.
>
> I took the design choice to replace the dtype with a structure:
> typedef struct dtype_type
> {
> size_t elem_len;
> int version;
> int rank;
> int type;
> int attribute;
> }
> dtype_type;
>
> This choice was intended to reduce the changes to a minimum, since in
> most references to the dtype, one dtype is assigned to another.

The only potential objection I can come up with is that if we at some point want to make the F2018 C interoperability descriptor the native descriptor type, then we have to mash those fields into the parent descriptor anyway. But I don't really think this is a very strong objection, as I don't think it's a huge difference in the amount of work whichever path is taken. If this is the easiest path for now, so be it.

> The F2018 interop defines the 'type' and 'attribute' fields to be signed
> char types. I used this initially but found that using int was the best
> way to silence the warnings about padding since it also allows for
> more attribute information to be carried.

To be pedantic, F2018 doesn't require that they be signed char, it's just what I wrote on https://gcc.gnu.org/wiki/ArrayDescriptorUpdate as *one* potential implementation of ISO_Fortran_binding.h. What the (draft) F2018 standard says is:

  CFI_attribute_t is a typedef name for a standard integer type capable of representing the values of the attribute codes.

  CFI_type_t is a typedef name for a standard integer type capable of representing the values for the supported type specifiers.

  CFI_rank_t is a typedef name for a standard integer type capable of representing the largest supported rank.

So using int is standard conforming. (elem_len and version are explicitly specified to be size_t and int, respectively)

> It should be noted that some of the intrinsics, which use switch/case
> for the type/kind selection, limit the effective element size that
> they handle to the maximum value of size_t, less 7 bits. A bit of
> straightforward work there would fix this limitation and would allow
> the GFC_DTYPE shifts and masks to be eliminated.

Hmm, is this a hidden ABI break, then? Are you saying that even after this patch we encode the type/kind into the elem_len field? If at some point we stop doing that and use the type field, it's an ABI break even though the descriptor layout doesn't change?

(I haven't had time to look into the patch itself, yet. If we decide to do this for GCC 8 I'll try to find some time.)

--
Janne Blomqvist
[PATCH] v2 -Warray-bounds: Fix false positive in some "switch" stmts (PR tree-optimization/83510)
On Fri, 2018-01-19 at 09:45 +0100, Richard Biener wrote: > On Fri, Jan 19, 2018 at 12:36 AM, David Malcolm > wrote: > > PR tree-optimization/83510 reports that r255649 (for > > PR tree-optimization/83312) introduced a false positive for > > -Warray-bounds for array accesses within certain switch statements: > > those for which value-ranges allow more than one case to be > > reachable, > > but for which one or more of the VR-unreachable cases contain > > out-of-range array accesses. > > > > In the reproducer, after the switch in f is inlined into g, we have > > 3 cases > > for the switch (case 9, case 10-19, and default), within a loop > > that > > ranges from 0..9. > > > > With both the old and new code, > > vr_values::simplify_switch_using_ranges clears > > the EDGE_EXECUTABLE flag on the edge to the "case 10-19" > > block. This > > happens during the dom walk within the substitute_and_fold_engine. > > > > With the old code, the clearing of that EDGE_EXECUTABLE flag led to > > the > > /* Skip blocks that were found to be unreachable. */ > > code in the old implementation of vrp_prop::check_all_array_refs > > skipping > > the "case 10-19" block. > > > > With the new code, we have a second dom walk, and that dom_walker's > > ctor > > sets all edges to be EDGE_EXECUTABLE, losing that information. > > > > Then, dom_walker::before_dom_children (here, the subclass' > > check_array_bounds_dom_walker::before_dom_children) can return one > > edge, if > > there's a unique successor edge, and dom_walker::walk filters the > > dom walk > > to just that edge. > > > > Here we have two VR-valid edges (case 9 and default), and an VR- > > invalid > > successor edge (case 10-19). There's no *unique* valid successor > > edge, > > and hence taken_edge is NULL, and the filtering in dom_walker::walk > > doesn't fire. > > > > Hence we've lost the filtering of the "case 10-19" BB, hence the > > false > > positive. 
> >
> > The issue is that we have two dom walks: first within vr_values'
> > substitute_and_fold_dom_walker (which has skip_unreachable_blocks == false),
> > then another within vrp_prop::check_all_array_refs (with
> > skip_unreachable_blocks == true).
> >
> > Each has different "knowledge" about ruling out edges due to value-ranges,
> > but we aren't combining that information. The former "knows" about
> > out-edges at a particular control construct (e.g. at a switch), the latter
> > "knows" about dominance, but only about unique successors (hence the
> > problem when two out of three switch cases are valid).
> >
> > This patch combines the information by preserving the EDGE_EXECUTABLE
> > flags from the first dom walk, and using it in the second dom walk,
> > potentially rejecting additional edges.
> >
> > Doing so fixes the false positive.
> >
> > I attempted an alternative fix, merging the two dom walks into one, but
> > that led to crashes in identify_jump_threads, so I went with this, as
> > a less invasive fix.
> >
> > Successfully bootstrapped and regression-tested on x86_64-pc-linux-gnu.
> > OK for trunk?
>
> Ok, but I think you need to update the domwalk construction in
> graphite-scop-detection.c as well - did you test w/o graphite?

Thanks. I did test with graphite enabled; the use of default args meant I didn't have to touch that code.

That said, I've become unhappy with the two bool params (both defaulting to false) for dom_walker's ctor, especially given that they're really a tristate.

So here's an alternative version of the patch, which, rather than adding a new bool, instead introduces a 3-valued enum to explicitly capture the valid possibilities. Also, having it as an enum rather than booleans makes the meaning clearer, and makes the places that override the default more obvious. (This version of the patch *did* require touching that graphite file).

> grep might be your friend...
(and indeed, with an enum, it's even more grep-friendly)

Successfully bootstrapped and regression-tested on x86_64-pc-linux-gnu (with graphite). OK for trunk? (or did you prefer the earlier patch?)

> Thanks,
> Richard.

[...snip...]

gcc/ChangeLog:
        PR tree-optimization/83510
        * domwalk.c (set_all_edges_as_executable): New function.
        (dom_walker::dom_walker): Convert bool param
        "skip_unreachable_blocks" to enum reachability. Move setup of
        edge flags to set_all_edges_as_executable and only do it when
        reachability is REACHABLE_BLOCKS.
        * domwalk.h (enum dom_walker::reachability): New enum.
        (dom_walker::dom_walker): Convert bool param
        "skip_unreachable_blocks" to enum reachability.
        (set_all_edges_as_executable): New decl.
        * graphite-scop-detection.c (gather_bbs::gather_bbs): Convert
        from false for "skip_unreachable_blocks" to ALL_BLOCKS for
        "reachability".
        * tree-ssa-dom.c (dom_opt_dom_walker::dom_opt_dom_walker):
        Likewise, but converting true to REACHABLE_BLOCKS.
        * tree-ssa-
Re: [Patch, fortran] PR37577 - [meta-bug] change internal array descriptor format for better syntax, C interop TR, rank 15
On 22.01.2018 at 20:59, Janne Blomqvist wrote:
> On Mon, Jan 22, 2018 at 9:12 PM, Paul Richard Thomas wrote:
>> This patch has been triggered by Thomas's recent message to the list.
>> Not only did I start work late relative to stage 3 but debugging took
>> somewhat longer than anticipated. Therefore, to get this committed
>> asap, we will have to beg the indulgence of the release managers and
>> prompt review and/or testing by fortran maintainers. (Dominique has
>> already undertaken to test -m32.)
>
> I think that if we can "guarantee" that we're happy with the current
> ABI for GCC 9 (and hopefully 10, 11, ...?) we have a quite strong case
> for committing it now. But if anybody is planning on doing some
> ABI-breaking work in the foreseeable future then maybe we should wait
> until GCC 9 stage1 opens. Anybody with such plans?

For asynchronous I/O, we could add a pointer to void (unused at present) for later use. That's all from my side.

>> It should be noted that some of the intrinsics, which use switch/case
>> for the type/kind selection, limit the effective element size that
>> they handle to the maximum value of size_t, less 7 bits. A bit of
>> straightforward work there would fix this limitation and would allow
>> the GFC_DTYPE shifts and masks to be eliminated.
>
> Hmm, is this a hidden ABI break, then?

No. This concerns code like

  type_size = GFC_DTYPE_TYPE_SIZE(array);
  switch(type_size)
    {
    case GFC_DTYPE_LOGICAL_1:
    case GFC_DTYPE_INTEGER_1:
    case GFC_DTYPE_DERIVED_1:
      pack_i1 ((gfc_array_i1 *) ret, (gfc_array_i1 *) array,
               (gfc_array_l1 *) mask, (gfc_array_i1 *) vector);
      return;

for example in pack_generic.c. I think that, if we commit Paul's patch now, we can then fix these cases before gcc 8 is released. This is rather straightforward.

Regards

        Thomas
Re: [PATCH] Finish removing class move_computations_dom_walker
On January 22, 2018 8:50:44 PM GMT+01:00, David Malcolm wrote: >r232820 (aka 2c7b2f8860794cc9b9cf5eeea9d7dc109c0de3be) removed the >implementation of class move_computations_dom_walker, but kept the >decl. > >This patch removes the stray decl. > >Successfully bootstrapped®rtested on x86_64-pc-linux-gnu. >OK for trunk? OK. Richard. >gcc/ChangeLog: > PR tree-optimization/69452 > * tree-ssa-loop-im.c (class move_computations_dom_walker): Remove >decl. >--- > gcc/tree-ssa-loop-im.c | 11 --- > 1 file changed, 11 deletions(-) > >diff --git a/gcc/tree-ssa-loop-im.c b/gcc/tree-ssa-loop-im.c >index 7864fbd..4655766 100644 >--- a/gcc/tree-ssa-loop-im.c >+++ b/gcc/tree-ssa-loop-im.c >@@ -1081,17 +1081,6 @@ invariantness_dom_walker::before_dom_children >(basic_block bb) > return NULL; > } > >-class move_computations_dom_walker : public dom_walker >-{ >-public: >- move_computations_dom_walker (cdi_direction direction) >-: dom_walker (direction), todo_ (0) {} >- >- virtual edge before_dom_children (basic_block); >- >- unsigned int todo_; >-}; >- >/* Hoist the statements in basic block BB out of the loops prescribed >by >data stored in LIM_DATA structures associated with each statement. >Callback >for walk_dominator_tree. */
Re: [wwwdocs] Add GCC 7.3 section
On Mon, 22 Jan 2018, Sebastian Huber wrote: >> +GCC 7.3 >> + >> +This is the > href="https://gcc.gnu.org/bugzilla/buglist.cgi?bug_status=RESOLVED&resolution=FIXED&target_milestone=7.3">list >> +of problem reports (PRs) from GCC's bug tracking system that are >> +known to be fixed in the 7.3 release. This list might not be >> +complete (that is, it is possible that some PRs that have been fixed >> +are not listed here). Yes, thanks. (I would have been fine for you to declare this obvious and commit. ;-) >> +Operating Systems >> + >> +RTEMS >> + >> + Support for EPIPHANY has been added. >> + Now, this doesn't look like adding a GCC 7.3 section? ;-) Fine, too, though perhaps explain what EPIPHANY is? As in "Support for the EPIPHANY whatever-it-is has been added." Thanks, Gerald
Re: [C++ PATCH] Fix ICE in potential_constant_expression_1 (PR c++/83918)
On Thu, Jan 18, 2018 at 6:13 PM, Jakub Jelinek wrote: > Before location wrappers were introduced, the potential_constant_expression_1 > assumption that a VIEW_CONVERT_EXPR must have non-NULL type was right, > if it is a type dependent reinterpret cast, it would be > REINTERPRET_CAST_EXPR instead. > > Location wrappers around decls can have NULL type, if the decl they wrap > also has NULL type. Hmm, why would a decl have NULL type? Are we wrapping a CONST_DECL in VIEW_CONVERT_EXPR? They should get NON_LVALUE_EXPR, since they aren't lvalues (except the objc++ ones that have TREE_STATIC set). > Fixed thusly, bootstrapped/regtested on x86_64-linux and i686-linux, ok for > trunk? Alternatively it could if (TREE_TYPE (t) && ... > > On a related note, I wonder if the > return (RECUR (from, TREE_CODE (t) != VIEW_CONVERT_EXPR)); > a few lines below is correct, VCE isn't always an lvalue, only if it is used > in non-rvalue contexts. So, shouldn't it be instead: > return (RECUR (from, (TREE_CODE (t) == VIEW_CONVERT_EXPR > ? want_rval : rval))); A VCE should always be an lvalue. If it's used in an rvalue context, then it's an lvalue used in an rvalue context and so the larger expression is an rvalue, but the VCE is still an lvalue. Jason
Re: [C++ Patch] PR 83921 ("[7/8 Regression] GCC rejects constexpr initialization of empty aggregate")
Hi, On 22/01/2018 16:52, Jason Merrill wrote: >> Thus the below, carefully tested over the weekend, would be a change completely removing the problematic error_at call, plus some testcases checking that we would still do the right thing in a few cases (the bug submitter added constexpr-83921-2.C). The updated stmtexpr20.C shows that we would reject anyway a statement expression using an uninitialized inner, simply because the whole initializer would still be flagged as non const. > Do we still diagnose it without the use of the variable? Ah, I missed that case. The updated stmtexpr20.C and the additional stmtexpr21.C reflect that. > If not, how about adjusting check_for_uninitialized_const_var to support calling it from potential_constant_expression_1? I suppose we'd need to add a couple of parameters, one complain parm and another to indicate that the variable is in a constexpr context, and then return something. Ok. The below passes the C++ testsuite and I'm finishing testing it. Therefore, as you already hinted to, we can now say that what was *really* missing from potential_constant_expression_1 was the use of default_init_uninitialized_part, which does all the non-trivial work besides the later !DECL_NONTRIVIALLY_INITIALIZED_P check. check_for_uninitialized_const_var also provides the informs, which were completely missing. Thanks! Paolo.
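For reference, the regression being fixed reduces to something like the following (an illustrative sketch, not the submitter's exact testcase): a local of empty class type inside a constexpr function leaves nothing uninitialized, so it must be accepted.

```cpp
#include <cassert>

struct Empty {};

// An empty aggregate has no data members, so a declaration without an
// initializer leaves nothing that could later be read uninitialized --
// the case default_init_uninitialized_part reports by returning null.
constexpr int f ()
{
  Empty e;   // PR c++/83921: GCC wrongly rejected this
  return 0;
}

static_assert (f () == 0, "empty aggregate is usable in constexpr");
```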
Index: cp/constexpr.c === --- cp/constexpr.c (revision 256961) +++ cp/constexpr.c (working copy) @@ -5707,13 +5707,9 @@ potential_constant_expression_1 (tree t, bool want "% in % context", tmp); return false; } - else if (!DECL_NONTRIVIALLY_INITIALIZED_P (tmp)) - { - if (flags & tf_error) - error_at (DECL_SOURCE_LOCATION (tmp), "uninitialized " - "variable %qD in % context", tmp); - return false; - } + else if (!check_for_uninitialized_const_var + (tmp, /*constexpr_context_p=*/true, flags)) + return false; } return RECUR (tmp, want_rval); Index: cp/cp-tree.h === --- cp/cp-tree.h(revision 256961) +++ cp/cp-tree.h(working copy) @@ -6221,6 +6221,7 @@ extern tree finish_case_label (location_t, tree, extern tree cxx_maybe_build_cleanup(tree, tsubst_flags_t); extern bool check_array_designated_initializer (constructor_elt *, unsigned HOST_WIDE_INT); +extern bool check_for_uninitialized_const_var (tree, bool, tsubst_flags_t); /* in decl2.c */ extern void record_mangling(tree, bool); Index: cp/decl.c === --- cp/decl.c (revision 256961) +++ cp/decl.c (working copy) @@ -72,7 +72,6 @@ static int check_static_variable_definition (tree, static void record_unknown_type (tree, const char *); static tree builtin_function_1 (tree, tree, bool); static int member_function_or_else (tree, tree, enum overload_flags); -static void check_for_uninitialized_const_var (tree); static tree local_variable_p_walkfn (tree *, int *, void *); static const char *tag_name (enum tag_types); static tree lookup_and_check_tag (enum tag_types, tree, tag_scope, bool); @@ -5543,10 +5542,14 @@ maybe_commonize_var (tree decl) } } -/* Issue an error message if DECL is an uninitialized const variable. */ +/* Issue an error message if DECL is an uninitialized const variable. + CONSTEXPR_CONTEXT_P is true when the function is called in a constexpr + context from potential_constant_expression. Returns true if all is well, + false otherwise. 
*/ -static void -check_for_uninitialized_const_var (tree decl) +bool +check_for_uninitialized_const_var (tree decl, bool constexpr_context_p, + tsubst_flags_t complain) { tree type = strip_array_types (TREE_TYPE (decl)); @@ -,26 +5558,36 @@ maybe_commonize_var (tree decl) 7.1.6 */ if (VAR_P (decl) && TREE_CODE (type) != REFERENCE_TYPE - && (CP_TYPE_CONST_P (type) || var_in_constexpr_fn (decl)) + && (CP_TYPE_CONST_P (type) + || (!constexpr_context_p && var_in_constexpr_fn (decl)) + || (constexpr_context_p && !DECL_NONTRIVIALLY_INITIALIZED_P (decl))) && !DECL_INITIAL (decl)) { tree field = default_init_uninitialized_part (type); if (!field) - return; + return true; if (CP_TYPE_CONST_P (type)) - permerror (DECL_SOURCE_LOCATION (decl), - "uninitialized const %qD", decl); - else { - if (!is_instantiation_of_constexpr (current_function_decl)) + if (complain & tf_error) + permerror (DECL_SOURCE_LOCATION (decl), +
Re: [C++ Patch] PR 83921 ("[7/8 Regression] GCC rejects constexpr initialization of empty aggregate")
Hi again, On 22/01/2018 22:50, Paolo Carlini wrote: > Ok. The below passes the C++ testsuite and I'm finishing testing it. Therefore, as you already hinted to, we can now say that what was *really* missing from potential_constant_expression_1 was the use of default_init_uninitialized_part, which does all the non-trivial work besides the later !DECL_NONTRIVIALLY_INITIALIZED_P check. check_for_uninitialized_const_var also provides the informs, which were completely missing. Grrr. Testing the library immediately revealed a failure in 18_support/byte/ops.cc: with constexpr_context_p == true, i.e. when called from potential_constant_expression_1, the CP_TYPE_CONST_P case triggers. I guess we really want to keep the existing constexpr_context_p == false cases separate. I'm therefore restarting testing with the below. Paolo. /// Index: cp/constexpr.c === --- cp/constexpr.c (revision 256961) +++ cp/constexpr.c (working copy) @@ -5707,13 +5707,9 @@ potential_constant_expression_1 (tree t, bool want "% in % context", tmp); return false; } - else if (!DECL_NONTRIVIALLY_INITIALIZED_P (tmp)) - { - if (flags & tf_error) - error_at (DECL_SOURCE_LOCATION (tmp), "uninitialized " - "variable %qD in % context", tmp); - return false; - } + else if (!check_for_uninitialized_const_var + (tmp, /*constexpr_context_p=*/true, flags)) + return false; } return RECUR (tmp, want_rval); Index: cp/cp-tree.h === --- cp/cp-tree.h(revision 256961) +++ cp/cp-tree.h(working copy) @@ -6221,6 +6221,7 @@ extern tree finish_case_label (location_t, tree, extern tree cxx_maybe_build_cleanup(tree, tsubst_flags_t); extern bool check_array_designated_initializer (constructor_elt *, unsigned HOST_WIDE_INT); +extern bool check_for_uninitialized_const_var (tree, bool, tsubst_flags_t); /* in decl2.c */ extern void record_mangling(tree, bool); Index: cp/decl.c === --- cp/decl.c (revision 256961) +++ cp/decl.c (working copy) @@ -72,7 +72,6 @@ static int check_static_variable_definition (tree, static void
record_unknown_type (tree, const char *); static tree builtin_function_1 (tree, tree, bool); static int member_function_or_else (tree, tree, enum overload_flags); -static void check_for_uninitialized_const_var (tree); static tree local_variable_p_walkfn (tree *, int *, void *); static const char *tag_name (enum tag_types); static tree lookup_and_check_tag (enum tag_types, tree, tag_scope, bool); @@ -5543,10 +5542,14 @@ maybe_commonize_var (tree decl) } } -/* Issue an error message if DECL is an uninitialized const variable. */ +/* Issue an error message if DECL is an uninitialized const variable. + CONSTEXPR_CONTEXT_P is true when the function is called in a constexpr + context from potential_constant_expression. Returns true if all is well, + false otherwise. */ -static void -check_for_uninitialized_const_var (tree decl) +bool +check_for_uninitialized_const_var (tree decl, bool constexpr_context_p, + tsubst_flags_t complain) { tree type = strip_array_types (TREE_TYPE (decl)); @@ -,26 +5558,39 @@ maybe_commonize_var (tree decl) 7.1.6 */ if (VAR_P (decl) && TREE_CODE (type) != REFERENCE_TYPE - && (CP_TYPE_CONST_P (type) || var_in_constexpr_fn (decl)) + && ((!constexpr_context_p + && (CP_TYPE_CONST_P (type) || var_in_constexpr_fn (decl))) + || (constexpr_context_p && !DECL_NONTRIVIALLY_INITIALIZED_P (decl))) && !DECL_INITIAL (decl)) { tree field = default_init_uninitialized_part (type); if (!field) - return; + return true; - if (CP_TYPE_CONST_P (type)) - permerror (DECL_SOURCE_LOCATION (decl), - "uninitialized const %qD", decl); - else + if (!constexpr_context_p) { - if (!is_instantiation_of_constexpr (current_function_decl)) - error_at (DECL_SOURCE_LOCATION (decl), - "uninitialized variable %qD in % function", - decl); - cp_function_chain->invalid_constexpr = true; + if (CP_TYPE_CONST_P (type)) + { + if (complain & tf_error) + permerror (DECL_SOURCE_LOCATION (decl), + "uninitialized const %qD", decl); + } + else + { + if (!is_instantiation_of_constexpr 
(current_function_decl) + && (complain & tf_error)) + error_at (DECL_SOUR
Re: [Patch, fortran] PR37577 - [meta-bug] change internal array descriptor format for better syntax, C interop TR, rank 15
On Mon, Jan 22, 2018 at 09:59:41PM +0200, Janne Blomqvist wrote: > On Mon, Jan 22, 2018 at 9:12 PM, Paul Richard Thomas > wrote: > > This patch has been triggered by Thomas's recent message to the list. > > Not only did I start work late relative to stage 3 but debugging took > > somewhat longer than anticipated. Therefore, to get this committed > > asap, we will have to beg the indulgence of the release managers and > > prompt review and/or testing by fortran maintainers. (Dominique has > > already undertaken to test -m32.) > > I think that if we can "guarantee" that we're happy with the current > ABI for GCC 9 (and hopefully 10, 11, ...?) we have a quite strong case > for committing it now. But if anybody if planning on doing some > ABI-breaking work in the foreseeable future then maybe we should wait > until GCC 9 stage1 opens. Anybody with such plans? > I don't have any plans to break the ABI in 9.0. OTOH, if the patch isn't included in 8.0, then we know that the ABI will be broken for 9.0. I think we should shoot for stabilizing the ABI by including this in 8.0. -- Steve
C++ PATCH for c++/83720, ICE with lambda and LTO
In this testcase, determine_visibility was relying on the template argument handling to constrain the lambda in h to internal linkage. In the new lambda model, the local lambda has no template info, so that was breaking. So we shouldn't rely on the decl itself to trigger checking the containing function template arguments. I initially just took the internal linkage from the containing function and cleared template_decl, but that seems like it could break -fvisibility-inlines-hidden. Tested x86_64-pc-linux-gnu, applying to trunk. 2018-01-22 Jason Merrill PR c++/83720 * decl2.c (determine_visibility): Fix template_decl handling for local lambda. diff --git a/gcc/cp/decl2.c b/gcc/cp/decl2.c index a2b2e2892b8..ef7e6de41c3 100644 --- a/gcc/cp/decl2.c +++ b/gcc/cp/decl2.c @@ -2416,7 +2416,7 @@ determine_visibility (tree decl) but have no TEMPLATE_INFO. Their containing template function does, and the local class could be constrained by that. */ - if (template_decl) + if (DECL_LANG_SPECIFIC (fn) && DECL_USE_TEMPLATE (fn)) template_decl = fn; } else if (VAR_P (decl) && DECL_TINFO_P (decl) diff --git a/gcc/testsuite/g++.dg/lto/pr83720_0.C b/gcc/testsuite/g++.dg/lto/pr83720_0.C new file mode 100644 index 000..4e63c9be7cd --- /dev/null +++ b/gcc/testsuite/g++.dg/lto/pr83720_0.C @@ -0,0 +1,55 @@ +// PR c++/83720 +// { dg-lto-do assemble } + +#pragma GCC diagnostic ignored "-Wreturn-type" + +namespace b { +class h { +public: + template h(ae af::*...) { +[] {}; + } +}; +class ai {}; +template class c { +public: + template void aj(const char *, ag f) { h(f, int()); } +}; +} +template class al; +template class i { +protected: + static e g(const int) { } +}; +template class j; +template +class j : i { + typedef i ap; + +public: + static an aq(const int &ar, ao... 
as) { ap::g(ar)(as...); } +}; +template class al { + template using ax = a; + +public: + template , typename = ax> + al(e); + using ay = an (*)(const int &, ao...); + ay az; +}; +template +template +al::al(e) { + az = j::aq; +} +class k { +public: + k(al); +} d([](b::ai) { + struct be { +virtual void f(); + }; + struct bf; + b::c().aj("", &be::f); +});
[committed] Fix ICE with -ftree-parallelize-loops=2 --param parloops-schedule=dynamic (PR tree-optimization/83957)
Hi! The ompexpssa pass will do TODO_update_ssa_only_virtuals, so we don't need to handle vops in detail. In this case the inner_phi is NULL for the vop and we crash on it. Bootstrapped/regtested on x86_64-linux and i686-linux, committed to trunk. 2018-01-22 Jakub Jelinek PR tree-optimization/83957 * omp-expand.c (expand_omp_for_generic): Ignore virtual PHIs. Remove semicolon after for body surrounded by braces. * gcc.dg/autopar/pr83957.c: New test. --- gcc/omp-expand.c.jj 2018-01-15 22:46:52.058226631 +0100 +++ gcc/omp-expand.c2018-01-22 13:20:50.614656137 +0100 @@ -3156,6 +3156,9 @@ expand_omp_for_generic (struct omp_regio gphi *nphi; gphi *exit_phi = psi.phi (); + if (virtual_operand_p (gimple_phi_result (exit_phi))) + continue; + edge l2_to_l3 = find_edge (l2_bb, l3_bb); tree exit_res = PHI_ARG_DEF_FROM_EDGE (exit_phi, l2_to_l3); @@ -3178,7 +3181,7 @@ expand_omp_for_generic (struct omp_regio add_phi_arg (nphi, exit_res, l2_to_l0, UNKNOWN_LOCATION); add_phi_arg (inner_phi, new_res, l0_to_l1, UNKNOWN_LOCATION); - }; + } } set_immediate_dominator (CDI_DOMINATORS, l2_bb, --- gcc/testsuite/gcc.dg/autopar/pr83957.c.jj 2018-01-22 13:32:32.841783616 +0100 +++ gcc/testsuite/gcc.dg/autopar/pr83957.c 2018-01-22 13:22:49.525114497 +0100 @@ -0,0 +1,11 @@ +/* PR tree-optimization/83957 */ +/* { dg-do compile } */ +/* { dg-options "-O1 -ftree-parallelize-loops=2 -fno-tree-dce --param parloops-schedule=dynamic" } */ + +void +foo (int *x, int y) +{ + if (y < 0) +for (; y < 1; ++y) + x = &y; +} Jakub
[C++ PATCH] Fix a structured binding ICE (PR c++/83958)
Hi! I've recently added the complete_type call in case the structured binding is a reference to a type that needs to be instantiated. This testcase shows a problem where it can't be instantiated and we ICE because we don't expect an incomplete type. If decl isn't a reference, cp_finish_decl for it should handle everything. I've considered using complete_type_or_else, but we aren't checking that decl's type is complete, but rather that the type it refers to is complete, and furthermore not sure if it is very nice to print in the diagnostics. Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk? 2018-01-22 Jakub Jelinek PR c++/83958 * decl.c (cp_finish_decomp): Diagnose if reference structure binding refers to incomplete type. * g++.dg/cpp1z/decomp35.C: New test. --- gcc/cp/decl.c.jj2018-01-19 23:34:36.634278658 +0100 +++ gcc/cp/decl.c 2018-01-22 12:16:39.316524652 +0100 @@ -7433,6 +7433,12 @@ cp_finish_decomp (tree decl, tree first, type = complete_type (TREE_TYPE (type)); if (type == error_mark_node) goto error_out; + if (!COMPLETE_TYPE_P (type)) + { + error_at (loc, "structured binding refers to incomplete type %qT", + type); + goto error_out; + } } tree eltype = NULL_TREE; --- gcc/testsuite/g++.dg/cpp1z/decomp35.C.jj2018-01-22 12:18:19.421399364 +0100 +++ gcc/testsuite/g++.dg/cpp1z/decomp35.C 2018-01-22 12:22:28.243046089 +0100 @@ -0,0 +1,35 @@ +// PR c++/83958 +// { dg-do compile { target c++11 } } +// { dg-options "" } + +template struct A; +class B; +template > class C; +template struct D; +template +struct E { + using X = W; + X operator* (); + T operator++ (); + template + bool operator!= (E); +}; +template +struct F { + class G; + using H = D; + using I = E; + class G : public I {}; + G begin (); + G end (); +}; +template struct C : F { + using J = F; + using J::begin; + using J::end; +}; +using K = class L; +struct M { + void foo () { for (auto & [ a ] : m) {} }// { dg-error "incomplete type" } + C m; // { dg-warning "only available with" "" 
{ target c++14_down } .-1 } +}; Jakub
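For readers unfamiliar with the construct, a minimal well-formed reference structured binding looks like this (an illustrative sketch; the patch above diagnoses the variant where the type being decomposed cannot be completed):

```cpp
#include <cassert>

struct S { int a; int b; };

// auto& [x, y] binds x and y as names for s.a and s.b; this requires S
// to be a complete type, which is what cp_finish_decomp now checks.
int demo ()
{
  S s { 1, 2 };
  auto & [x, y] = s;
  x = 42;            // writes through the binding into s.a
  return s.a;
}
```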
Re: [C++ PATCH] Fix ICE in potential_constant_expression_1 (PR c++/83918)
On Mon, Jan 22, 2018 at 04:49:34PM -0500, Jason Merrill wrote: > On Thu, Jan 18, 2018 at 6:13 PM, Jakub Jelinek wrote: > > Before location wrappers were introduced, the > > potential_constant_expression_1 > > assumption that a VIEW_CONVERT_EXPR must have non-NULL type was right, > > if it is a type dependent reinterpret cast, it would be > > REINTERPRET_CAST_EXPR instead. > > > > Location wrappers around decls can have NULL type, if the decl they wrap > > also has NULL type. > > Hmm, why would a decl have NULL type? Are we wrapping a CONST_DECL in > VIEW_CONVERT_EXPR? They should get NON_LVALUE_EXPR, since they aren't > lvalues (except the objc++ ones that have TREE_STATIC set). So, do you want following instead, if it passes bootstrap/regtest? 2018-01-22 Jakub Jelinek PR c++/83918 * tree.c (maybe_wrap_with_location): Use NON_LVALUE_EXPR rather than VIEW_CONVERT_EXPR to wrap CONST_DECLs. * g++.dg/cpp1z/pr83918.C: New test. --- gcc/tree.c.jj 2018-01-16 16:16:37.124694238 +0100 +++ gcc/tree.c 2018-01-22 23:53:58.171837029 +0100 @@ -14085,8 +14085,10 @@ maybe_wrap_with_location (tree expr, loc if (EXCEPTIONAL_CLASS_P (expr)) return expr; - tree_code code = (CONSTANT_CLASS_P (expr) && TREE_CODE (expr) != STRING_CST - ? NON_LVALUE_EXPR : VIEW_CONVERT_EXPR); + tree_code code += (((CONSTANT_CLASS_P (expr) && TREE_CODE (expr) != STRING_CST) + || (TREE_CODE (expr) == CONST_DECL && !TREE_STATIC (expr))) + ? NON_LVALUE_EXPR : VIEW_CONVERT_EXPR); tree wrapper = build1_loc (loc, code, TREE_TYPE (expr), expr); /* Mark this node as being a wrapper. */ EXPR_LOCATION_WRAPPER_P (wrapper) = 1; --- gcc/testsuite/g++.dg/cpp1z/pr83918.C.jj 2018-01-22 23:54:40.824824960 +0100 +++ gcc/testsuite/g++.dg/cpp1z/pr83918.C2018-01-22 23:54:40.824824960 +0100 @@ -0,0 +1,32 @@ +// PR c++/83918 +// { dg-do compile } +// { dg-options "-std=c++17" } + +constexpr unsigned +foo (unsigned x, unsigned y) +{ + return x > y ? 
x : y; +} + +template struct A; +template struct B; +template +struct A , B > +{ + enum : unsigned + { +u = foo (sizeof (S), sizeof (U)), +v = A , B >::w, +w = foo (u, v) + }; +}; + +template <> +struct A , B <>> +{ + enum : unsigned { w = 0 }; +}; + +constexpr static const auto v { A , + B <9,8,7,6,5,4,3,2,1>>::w }; +static_assert (v == sizeof (int)); Jakub
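The distinction the patch encodes exists at the language level too: an enumerator (a CONST_DECL in GCC's IR) is a prvalue, so its location wrapper must not present it as an lvalue. A standalone illustration (ordinary C++, not GCC-internal code):

```cpp
#include <cassert>
#include <type_traits>

enum E { X = 3 };

// An enumerator has no storage and no address; using it yields a prvalue,
// so decltype((X)) is plain E, not E&.  Taking &X would be ill-formed.
static_assert (!std::is_lvalue_reference<decltype((X))>::value,
               "enumerators are prvalues");
static_assert (X == 3, "but they are usable as constant expressions");
```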
Re: [PATCH, rev2], Fix PR target/pr83862: Fix PowerPC long double signbit with -mabi=ieeelongdouble
On Mon, Jan 22, 2018 at 12:32:13PM -0600, Segher Boessenkool wrote: > Hi Mike, > > If it is needed on the branches, okay for there too (once 7 reopens). It is needed on GCC 7 if you compile the test for -mcpu=power9 and use an explicit __float128 instead of long double (-mabi=ieeelongdouble is not really supported in GCC 7). I'll check GCC 6 as well. -- Michael Meissner, IBM IBM, M/S 2506R, 550 King Street, Littleton, MA 01460-6245, USA email: meiss...@linux.vnet.ibm.com, phone: +1 (978) 899-4797
Re: [C++ PATCH] Fix ICE in potential_constant_expression_1 (PR c++/83918)
On Mon, Jan 22, 2018 at 6:14 PM, Jakub Jelinek wrote: > On Mon, Jan 22, 2018 at 04:49:34PM -0500, Jason Merrill wrote: >> On Thu, Jan 18, 2018 at 6:13 PM, Jakub Jelinek wrote: >> > Before location wrappers were introduced, the >> > potential_constant_expression_1 >> > assumption that a VIEW_CONVERT_EXPR must have non-NULL type was right, >> > if it is a type dependent reinterpret cast, it would be >> > REINTERPRET_CAST_EXPR instead. >> > >> > Location wrappers around decls can have NULL type, if the decl they wrap >> > also has NULL type. >> >> Hmm, why would a decl have NULL type? Are we wrapping a CONST_DECL in >> VIEW_CONVERT_EXPR? They should get NON_LVALUE_EXPR, since they aren't >> lvalues (except the objc++ ones that have TREE_STATIC set). > > So, do you want following instead, if it passes bootstrap/regtest? Looks good. Jason
Re: [Patch, fortran] PR37577 - [meta-bug] change internal array descriptor format for better syntax, C interop TR, rank 15
On 01/22/2018 12:24 PM, Thomas Koenig wrote: > Am 22.01.2018 um 20:59 schrieb Janne Blomqvist: >> On Mon, Jan 22, 2018 at 9:12 PM, Paul Richard Thomas >> wrote: >>> This patch has been triggered by Thomas's recent message to the list. >>> Not only did I start work late relative to stage 3 but debugging took >>> somewhat longer than anticipated. Therefore, to get this committed >>> asap, we will have to beg the indulgence of the release managers and >>> prompt review and/or testing by fortran maintainers. (Dominique has >>> already undertaken to test -m32.) >> >> I think that if we can "guarantee" that we're happy with the current >> ABI for GCC 9 (and hopefully 10, 11, ...?) we have a quite strong case >> for committing it now. But if anybody if planning on doing some >> ABI-breaking work in the foreseeable future then maybe we should wait >> until GCC 9 stage1 opens. Anybody with such plans? > > For asynchronous I/O, we could add a pointer to void (unused at > present) for later use. That's all from my side. > Do you mean adding something like this: diff --git a/libgfortran/libgfortran.h b/libgfortran/libgfortran.h index 4c643b7e17b..c86e0b45e1d 100644 --- a/libgfortran/libgfortran.h +++ b/libgfortran/libgfortran.h @@ -600,6 +600,7 @@ typedef struct st_parameter_common GFC_INTEGER_4 line; CHARACTER2 (iomsg); GFC_INTEGER_4 *iostat; + void *reserved; } st_parameter_common; and the corresponding front end side. > snip -- Regarding Paul's patch, assuming the -m32 tests can be confirmed I think the patch should go in now. Jerry
Re: [PATCH, fortran] Support Fortran 2018 teams
Is Fortran 2018 teams patch ok for trunk? Damian On January 19, 2018 at 2:47:39 PM, Alessandro Fanfarillo (elfa...@ucar.edu) wrote: I can confirm that the little change suggested by Steve passes the regtests (on x86_64-pc-linux-gnu) and the regular tests using OpenCoarrays. On Fri, Jan 19, 2018 at 10:33 AM, Steve Kargl wrote: > On Fri, Jan 19, 2018 at 09:18:14AM -0800, Damian Rouson wrote: >> Thanks for catching that, Steve, and for responding, Alessandro. >> >> Anything else? >> > > I've only just started to look at the patch. Unfortunately, > I know zero about teams, so need to read the patch and F2018 > standard simultaneously. > > -- > Steve -- Alessandro Fanfarillo, Ph.D. Postdoctoral Researcher National Center for Atmospheric Research Mesa Lab, Boulder, CO, USA 303-497-1229
libgo patch committed: Add buildid support for AIX
This patch by Tony Reix adds buildid support to the go tool for AIX. This supports caching of build results. Bootstrapped and ran Go testsuite on x86_64-pc-linux-gnu. Committed to mainline. Ian Index: gcc/go/gofrontend/MERGE === --- gcc/go/gofrontend/MERGE (revision 256877) +++ gcc/go/gofrontend/MERGE (working copy) @@ -1,4 +1,4 @@ -87525458bcd5ab4beb5b95e7d58e3dfdbc1bd478 +3488a401e50835de5de5c4f153772ac2798d0e71 The first line of this file holds the git revision number of the last merge done from the gofrontend repository. Index: libgo/go/cmd/go/internal/work/buildid.go === --- libgo/go/cmd/go/internal/work/buildid.go(revision 256794) +++ libgo/go/cmd/go/internal/work/buildid.go(working copy) @@ -330,6 +330,43 @@ func (b *Builder) gccgoBuildIDELFFile(a return sfile, nil } +// gccgoBuildIDXCOFFFile creates an assembler file that records the +// action's build ID in a CSECT (AIX linker deletes CSECTs that are +// not referenced in the output file). +func (b *Builder) gccgoBuildIDXCOFFFile(a *Action) (string, error) { + sfile := a.Objdir + "_buildid.s" + + var buf bytes.Buffer + fmt.Fprintf(&buf, "\t.csect .go.buildid[XO]\n") + fmt.Fprintf(&buf, "\t.byte ") + for i := 0; i < len(a.buildID); i++ { + if i > 0 { + if i%8 == 0 { + fmt.Fprintf(&buf, "\n\t.byte ") + } else { + fmt.Fprintf(&buf, ",") + } + } + fmt.Fprintf(&buf, "%#02x", a.buildID[i]) + } + fmt.Fprintf(&buf, "\n") + + if cfg.BuildN || cfg.BuildX { + for _, line := range bytes.Split(buf.Bytes(), []byte("\n")) { + b.Showcmd("", "echo '%s' >> %s", line, sfile) + } + if cfg.BuildN { + return sfile, nil + } + } + + if err := ioutil.WriteFile(sfile, buf.Bytes(), 0666); err != nil { + return "", err + } + + return sfile, nil +} + // buildID returns the build ID found in the given file. // If no build ID is found, buildID returns the content hash of the file. 
func (b *Builder) buildID(file string) string { Index: libgo/go/cmd/go/internal/work/exec.go === --- libgo/go/cmd/go/internal/work/exec.go (revision 256873) +++ libgo/go/cmd/go/internal/work/exec.go (working copy) @@ -637,6 +637,16 @@ func (b *Builder) build(a *Action) (err return err } objects = append(objects, ofiles...) + case "aix": + asmfile, err := b.gccgoBuildIDXCOFFFile(a) + if err != nil { + return err + } + ofiles, err := BuildToolchain.asm(b, a, []string{asmfile}) + if err != nil { + return err + } + objects = append(objects, ofiles...) } } Index: libgo/go/cmd/internal/buildid/buildid.go === --- libgo/go/cmd/internal/buildid/buildid.go(revision 256593) +++ libgo/go/cmd/internal/buildid/buildid.go(working copy) @@ -7,6 +7,7 @@ package buildid import ( "bytes" "debug/elf" + "debug/xcoff" "fmt" "io" "os" @@ -40,6 +41,9 @@ func ReadFile(name string) (id string, e return "", err } if string(buf) != "!\n" { + if string(buf) == "\n" { + return readGccgoBigArchive(name, f) + } return readBinary(name, f) } @@ -156,6 +160,85 @@ func readGccgoArchive(name string, f *os } } } + +// readGccgoBigArchive tries to parse the archive as an AIX big +// archive file, and fetch the build ID from the _buildid.o entry. +// The _buildid.o entry is written by (*Builder).gccgoBuildIDXCOFFFile +// in cmd/go/internal/work/exec.go. +func readGccgoBigArchive(name string, f *os.File) (string, error) { + bad := func() (string, error) { + return "", &os.PathError{Op: "parse", Path: name, Err: errBuildIDMalformed} + } + + // Read fixed-length header. + if _, err := f.Seek(0, io.SeekStart); err != nil { + return "", err + } + var flhdr [128]byte + if _, err := io.ReadFull(f, flhdr[:]); err != nil { + return "", err + } + // Read first member offset. + offStr := strings.TrimSpace(string(flhdr[68:88])) + off, err := strconv.ParseInt(offStr, 10, 64) + if e
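The assembler-emission loop in gccgoBuildIDXCOFFFile is simple enough to sketch standalone: a `.csect` header, then `.byte` directives with up to eight hex values per line. A sketch in C++ for illustration (`buildid_directives` is a hypothetical helper, not part of the go tool; the exact hex formatting of Go's `%#02x` may differ slightly from C's):

```cpp
#include <algorithm>
#include <cassert>
#include <cstdio>
#include <string>
#include <vector>

// Mirrors the formatting loop: start a fresh ".byte" line every eighth
// value, otherwise separate values with commas.
std::string buildid_directives (const std::vector<unsigned char> &id)
{
  std::string out = "\t.csect .go.buildid[XO]\n\t.byte ";
  char tmp[8];
  for (size_t i = 0; i < id.size (); i++)
    {
      if (i > 0)
        out += (i % 8 == 0) ? "\n\t.byte " : ",";
      std::snprintf (tmp, sizeof tmp, "%#02x", (unsigned) id[i]);
      out += tmp;
    }
  out += "\n";
  return out;
}
```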
Re: [PATCH, gotools]: Fix TestCrashDumpsAllThreads testcase failure
On Sun, Jan 21, 2018 at 3:13 PM, Uros Bizjak wrote: > > The default "go build" compile options over-optimize the auxiliary > executable, built in TestCrashDumpsAllThreads testcase > (libgo/go/runtime/crash_unix_test.go). This over-optimization results > in removal of the trivial summing loop and in the inlining of the > main.loop function into main.$thunk0. > > The testcase expects backtrace that shows several instances of > main.loop in the backtrace dump. When main.loop gets inlined into > thunk, its name is not dumped anymore. > > The solution is to pass "-O0" to gccgo, which inhibits unwanted inlining. > > Patch was bootstrapped and regression tested on x86_64-linux-gnu and > alphev68-linux-gnu, where for the later target the patch fixed the > mentioned failure. That sounds like a bug somewhere. Even when one function gets inlined into another, its name should still be dumped. This is implemented by report_inlined_functions in libbacktrace/dwarf.c. While something like your patch may be needed, I'd like to more clearly understand why libbacktrace isn't working. Ian
Re: [PATCH, fortran] Support Fortran 2018 teams
I'm heading out of town for a meeting at the end of week, so gfortran patches/reviews are on hold at the moment. If someone else wants to step up to review the patch, I won't object. -- steve On Mon, Jan 22, 2018 at 08:29:41PM -0800, Damian Rouson wrote: > Is Fortran 2018 teams patch ok for trunk? > > Damian > > On January 19, 2018 at 2:47:39 PM, Alessandro Fanfarillo (elfa...@ucar.edu) > wrote: > > I can confirm that the little change suggested by Steve passes the > regtests (on x86_64-pc-linux-gnu) and the regular tests using > OpenCoarrays. > > On Fri, Jan 19, 2018 at 10:33 AM, Steve Kargl > wrote: > > On Fri, Jan 19, 2018 at 09:18:14AM -0800, Damian Rouson wrote: > >> Thanks for catching that, Steve, and for responding, Alessandro. > >> > >> Anything else? > >> > > > > I've only just started to look at the patch. Unfortunately, > > I know zero about teams, so need to read the patch and F2018 > > standard simultaneously. > > > > -- > > Steve > > > > -- > > Alessandro Fanfarillo, Ph.D. > Postdoctoral Researcher > National Center for Atmospheric Research > Mesa Lab, Boulder, CO, USA > 303-497-1229 -- Steve 20170425 https://www.youtube.com/watch?v=VWUpyCsUKR4 20161221 https://www.youtube.com/watch?v=IbCHE-hONow
Re: [Patch, fortran] PR37577 - [meta-bug] change internal array descriptor format for better syntax, C interop TR, rank 15
Dear Paul, The test suite passed without new regressions with both -m32 and -m64. Thanks for the work, Dominique
Re: [PATCH, gotools]: Fix TestCrashDumpsAllThreads testcase failure
On Tue, Jan 23, 2018 at 5:49 AM, Ian Lance Taylor wrote: > On Sun, Jan 21, 2018 at 3:13 PM, Uros Bizjak wrote: >> >> The default "go build" compile options over-optimize the auxiliary >> executable, built in TestCrashDumpsAllThreads testcase >> (libgo/go/runtime/crash_unix_test.go). This over-optimization results >> in removal of the trivial summing loop and in the inlining of the >> main.loop function into main.$thunk0. >> >> The testcase expects backtrace that shows several instances of >> main.loop in the backtrace dump. When main.loop gets inlined into >> thunk, its name is not dumped anymore. >> >> The solution is to pass "-O0" to gccgo, which inhibits unwanted inlining. >> >> Patch was bootstrapped and regression tested on x86_64-linux-gnu and >> alphev68-linux-gnu, where for the later target the patch fixed the >> mentioned failure. > > That sounds like a bug somewhere. Even when one function gets inlined > into another, its name should still be dumped. This is implemented by > report_inlined_functions in libbacktrace/dwarf.c. While something > like your patch may be needed, I'd like to more clearly understand why > libbacktrace isn't working. If I manually compile the testcase from crash_unix_test.go wth gccgo -O2 (the asm is attached), the main.loop function is not even present in the dump. Even gdb session shows: (gdb) b main.loop Breakpoint 1 at 0x1200021f8: file main.go, line 11. (gdb) r Starting program: /space/homedirs/uros/a.out [Thread debugging using libthread_db enabled] Using host libthread_db library "/lib/libthread_db.so.1". 
[New Thread 0x200028831e0 (LWP 23072)] [New Thread 0x2000308b1e0 (LWP 23073)] [New Thread 0x20003c931e0 (LWP 23074)] [New Thread 0x20008fff1e0 (LWP 23075)] [New Thread 0x2000a7ff1e0 (LWP 23076)] [Switching to Thread 0x20003c931e0 (LWP 23074)] Thread 4 "a.out" hit Breakpoint 1, main.$thunk0 () at main.go:25 25 go loop(i, chans[i]) (gdb) bt #0 main.$thunk0 () at main.go:25 #1 0x02000100bf54 in runtime.kickoff () at /space/homedirs/uros/gcc-svn/trunk/libgo/go/runtime/proc.go:1161 #2 0x020001a4f934 in ?? () from /lib/libc.so.6.1 While it is possible to put a breakpoint at main.loop, the gdb backtrace omits the function name and displays only the thunk. When -O0 is used, the gdb backtrace shows: (gdb) bt #0 main.loop () at main.go:11 #1 0x00012000237c in main.$thunk0 () at main.go:25 #2 0x02000100bf54 in runtime.kickoff () at /space/homedirs/uros/gcc-svn/trunk/libgo/go/runtime/proc.go:1161 #3 0x020001a4f934 in ?? () from /lib/libc.so.6.1 I don't think this is a bug in libbacktrace, the function simply isn't there. Uros.