[PATCH] S/390: libatomic: Fix 16 byte atomic exchange

2018-03-05 Thread Andreas Krebbel
The compiler builtin will use the hardware instruction cdsg if the
memory operand is properly aligned and will fall back to the
library call otherwise.
In case the compiler for one part is able to detect that the
location is aligned and fails to do so for another usage of the hw
instruction and the sw fall back would be mixed on the same memory
location.  To avoid this the library fall back also has to use the
hardware instruction if possible.

Bootstrapped and regression tested on s390x.

I'll wait a few days for comments before committing it.


libatomic/ChangeLog:

2018-03-05  Andreas Krebbel  

* config/s390/exch_n.c: New file.
* configure.tgt: Add the config directory for s390.
---
 libatomic/config/s390/exch_n.c | 69 ++
 libatomic/configure.tgt|  5 +++
 2 files changed, 74 insertions(+)
 create mode 100644 libatomic/config/s390/exch_n.c

diff --git a/libatomic/config/s390/exch_n.c b/libatomic/config/s390/exch_n.c
new file mode 100644
index 000..b2340b4
--- /dev/null
+++ b/libatomic/config/s390/exch_n.c
@@ -0,0 +1,69 @@
+/* Copyright (C) 2018 Free Software Foundation, Inc.
+   Contributed by Andreas Krebbel 
+
+   This file is part of the GNU Atomic Library (libatomic).
+
+   Libatomic is free software; you can redistribute it and/or modify it
+   under the terms of the GNU General Public License as published by
+   the Free Software Foundation; either version 3 of the License, or
+   (at your option) any later version.
+
+   Libatomic is distributed in the hope that it will be useful, but WITHOUT ANY
+   WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS
+   FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
+   more details.
+
+   Under Section 7 of GPL version 3, you are granted additional
+   permissions described in the GCC Runtime Library Exception, version
+   3.1, as published by the Free Software Foundation.
+
+   You should have received a copy of the GNU General Public License and
+   a copy of the GCC Runtime Library Exception along with this program;
+   see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
+   .  */
+
+#include 
+
+
+/* The compiler builtin will use the hardware instruction cdsg if the
+   memory operand is properly aligned and will fall back to the
+   library call otherwise.
+
+   In case the compiler for one part is able to detect that the
+   location is aligned and fails to do so for another usage of the hw
+   instruction and the sw fall back would be mixed on the same memory
+   location.  To avoid this the library fall back also has to use the
+   hardware instruction if possible.  */
+
+#if !DONE && N == 16
+UTYPE
+SIZE(libat_exchange) (UTYPE *mptr, UTYPE newval, int smodel UNUSED)
+{
+  if (!((uintptr_t)mptr & 0xf))
+{
+  /* Use the builtin only if the memory operand is 16 byte
+aligned.  */
+  return __atomic_exchange_n ((UTYPE *)__builtin_assume_aligned (mptr, 16),
+ newval, __ATOMIC_SEQ_CST);
+}
+  else
+{
+  UTYPE oldval;
+  UWORD magic;
+
+  pre_seq_barrier (smodel);
+  magic = protect_start (mptr);
+
+  oldval = *mptr;
+  *mptr = newval;
+
+  protect_end (mptr, magic);
+  post_seq_barrier (smodel);
+
+  return oldval;
+}
+}
+#define DONE 1
+#endif /* N == 16 */
+
+#include "../../exch_n.c"
diff --git a/libatomic/configure.tgt b/libatomic/configure.tgt
index 807ef10..ea8c34f 100644
--- a/libatomic/configure.tgt
+++ b/libatomic/configure.tgt
@@ -129,6 +129,11 @@ case "${target}" in
config_path="${config_path} linux/arm posix"
;;
 
+  s390*-*-linux*)
+   # OS support for atomic primitives.
+   config_path="${config_path} s390 posix"
+   ;;
+
   *-*-linux* | *-*-gnu* | *-*-k*bsd*-gnu \
   | *-*-netbsd* | *-*-freebsd* | *-*-openbsd* | *-*-dragonfly* \
   | *-*-solaris2* | *-*-sysv4* | *-*-irix6* | *-*-osf* | *-*-hpux11* \
-- 
2.9.1



[PATCH][1/2] Fix PR84670

2018-03-05 Thread Richard Biener

The following fixes a thinko in the assert I added.  It's true that
the value-set of ANTIC_IN shouldn't grow during iteration but there's
an exception for when the successors we intersect are not all visited
(recursively).  For example when the latch wasn't visited and thus
its ANTIC_IN is first computed without the finally antic i + 1
expression it won't get a new value for i + 1 in the next iteration
when translating over the backedge.  That only happens the second time.
It's enough if the loop exit block computes i + 1 with a different
expression (and VN was smart enough to discover the equivalence) that
the value won't be pruned by the intersection and thus its expression
reaches the backedge (once in untranslatable form and once in translatable
form).

So instead of nuking the assert completely I'm adjusting it as follows.

Bootstrap and regtest on x86_64-unknown-linux-gnu in progress.

Sorry for the inconvenience.

Richard.

2018-03-05  Richard Biener  

PR tree-optimization/84670
* tree-ssa-pre.c (struct bb_bitmap_sets): Add visited_with_visited_succs
member.
(BB_VISITED_WITH_VISITED_SUCCS): New define.
(compute_antic): Initialize BB_VISITED_WITH_VISITED_SUCCS.
(compute_antic_aux): Only assert the number of values in ANTIC_IN
doesn't grow if all successors (recursively) were visited at least
once.

* gcc.dg/pr84670-3.c: New testcase.
* gcc.dg/pr84670-4.c: Likewise.

Index: gcc/tree-ssa-pre.c
===
--- gcc/tree-ssa-pre.c  (revision 258124)
+++ gcc/tree-ssa-pre.c  (working copy)
@@ -484,6 +484,10 @@ typedef struct bb_bitmap_sets
   /* True if we have visited this block during ANTIC calculation.  */
   unsigned int visited : 1;
 
+  /* True if we have visited this block after all successors have been
+ visited this way.  */
+  unsigned int visited_with_visited_succs : 1;
+
   /* True when the block contains a call that might not return.  */
   unsigned int contains_may_not_return_call : 1;
 } *bb_value_sets_t;
@@ -497,6 +501,8 @@ typedef struct bb_bitmap_sets
 #define NEW_SETS(BB)   ((bb_value_sets_t) ((BB)->aux))->new_sets
 #define EXPR_DIES(BB)  ((bb_value_sets_t) ((BB)->aux))->expr_dies
 #define BB_VISITED(BB) ((bb_value_sets_t) ((BB)->aux))->visited
+#define BB_VISITED_WITH_VISITED_SUCCS(BB) \
+((bb_value_sets_t) ((BB)->aux))->visited_with_visited_succs
 #define BB_MAY_NOTRETURN(BB) ((bb_value_sets_t) 
((BB)->aux))->contains_may_not_return_call
 #define BB_LIVE_VOP_ON_EXIT(BB) ((bb_value_sets_t) ((BB)->aux))->vop_on_exit
 
@@ -2032,6 +2047,8 @@ compute_antic_aux (basic_block block, bo
 {
   e = single_succ_edge (block);
   gcc_assert (BB_VISITED (e->dest));
+  BB_VISITED_WITH_VISITED_SUCCS (block)
+   = BB_VISITED_WITH_VISITED_SUCCS (e->dest);
   phi_translate_set (ANTIC_OUT, ANTIC_IN (e->dest), e);
 }
   /* If we have multiple successors, we take the intersection of all of
@@ -2042,6 +2059,7 @@ compute_antic_aux (basic_block block, bo
   size_t i;
   edge first = NULL;
 
+  BB_VISITED_WITH_VISITED_SUCCS (block) = true;
   auto_vec worklist (EDGE_COUNT (block->succs));
   FOR_EACH_EDGE (e, ei, block->succs)
{
@@ -2060,6 +2078,8 @@ compute_antic_aux (basic_block block, bo
fprintf (dump_file, "ANTIC_IN is MAX on %d->%d\n",
 e->src->index, e->dest->index);
}
+ BB_VISITED_WITH_VISITED_SUCCS (block)
+   &= BB_VISITED_WITH_VISITED_SUCCS (e->dest);
}
 
   /* Of multiple successors we have to have visited one already
@@ -2139,7 +2159,7 @@ compute_antic_aux (basic_block block, bo
   changed = true;
   /* After the initial value set computation the value set may
  only shrink during the iteration.  */
-  if (was_visited && flag_checking)
+  if (was_visited && BB_VISITED_WITH_VISITED_SUCCS (block) && 
flag_checking)
{
  bitmap_iterator bi;
  unsigned int i;
@@ -2318,6 +2342,7 @@ compute_antic (void)
   FOR_ALL_BB_FN (block, cfun)
 {
   BB_VISITED (block) = 0;
+  BB_VISITED_WITH_VISITED_SUCCS (block) = 0;
 
   FOR_EACH_EDGE (e, ei, block->preds)
if (e->flags & EDGE_ABNORMAL)
@@ -2334,6 +2359,7 @@ compute_antic (void)
 
   /* At the exit block we anticipate nothing.  */
   BB_VISITED (EXIT_BLOCK_PTR_FOR_FN (cfun)) = 1;
+  BB_VISITED_WITH_VISITED_SUCCS (EXIT_BLOCK_PTR_FOR_FN (cfun)) = 1;
 
   /* For ANTIC computation we need a postorder that also guarantees that
  a block with a single successor is visited after its successor.

Index: gcc/testsuite/gcc.dg/pr84670-3.c
===
--- gcc/testsuite/gcc.dg/pr84670-3.c(nonexistent)
+++ gcc/testsuite/gcc.dg/pr84670-3.c(working copy)
@@ -0,0 +1,38 @@
+/* { dg-do compile } */
+/* { dg-options "-Os -fno-strict-overflow" } */
+
+typedef unsigned char u

PING: [PATCH] i386: Don't generate alias for function return thunk

2018-03-05 Thread H.J. Lu
On Mon, Feb 26, 2018 at 12:48 PM, H.J. Lu  wrote:
> Function return thunks shouldn't be aliased to indirect branch thunks
> since indirect branch thunks are placed in COMDAT section and a COMDAT
> section with indirect branch may not have function return thunk.  This
> patch generates function return thunks directly.
>
> Tested on i686 and x86-64.  OK for trunk?
>
> H.J.
> ---
> gcc/
>
> PR target/84574
> * config/i386/i386.c (indirect_thunk_needed): Update comments.
> (indirect_thunk_bnd_needed): Likewise.
> (indirect_thunks_used): Likewise.
> (indirect_thunks_bnd_used): Likewise.
> (indirect_return_needed): New.
> (indirect_return_bnd_needed): Likewise.
> (output_indirect_thunk_function): Add a bool argument for
> function return.
> (output_indirect_thunk_function): Don't generate alias for
> function return thunk.
> (ix86_code_end): Call output_indirect_thunk_function to generate
> function return thunks.
> (ix86_output_function_return): Set indirect_return_bnd_needed
> and indirect_return_needed instead of indirect_thunk_bnd_needed
> and indirect_thunk_needed.
>
> gcc/testsuite/
>
> PR target/84574
> * gcc.target/i386/ret-thunk-9.c: Expect __x86_return_thunk
> label instead of __x86_indirect_thunk label.

PING:

https://gcc.gnu.org/ml/gcc-patches/2018-02/msg01460.html

-- 
H.J.


PING: [PATCH] x86: Force __x86_indirect_thunk_reg for function call via GOT

2018-03-05 Thread H.J. Lu
On Tue, Feb 27, 2018 at 11:39 AM, H.J. Lu  wrote:
> For x86 targets, when -fno-plt is used, external functions are called
> via GOT slot, in 64-bit mode:
>
> [bnd] call/jmp *foo@GOTPCREL(%rip)
>
> and in 32-bit mode:
>
> [bnd] call/jmp *foo@GOT[(%reg)]
>
> With -mindirect-branch=, they are converted to, in 64-bit mode:
>
> pushq  foo@GOTPCREL(%rip)
> [bnd] jmp  __x86_indirect_thunk[_bnd]
>
> and in 32-bit mode:
>
> pushl  foo@GOT[(%reg)]
> [bnd] jmp  __x86_indirect_thunk[_bnd]
>
> which were incompatible with CFI.  In 64-bit mode, since R11 is a scratch
> register, we generate:
>
> movq   foo@GOTPCREL(%rip), %r11
> [bnd] call/jmp __x86_indirect_thunk_[bnd_]r11
>
> instead.  We do it in ix86_output_indirect_branch so that we can use
> the newly proposed R_X86_64_THUNK_GOTPCRELX relocation:
>
> https://groups.google.com/forum/#!topic/x86-64-abi/eED5lzn3_Mg
>
> movq   foo@OTPCREL_THUNK(%rip), %r11
> [bnd] call/jmp __x86_indirect_thunk_[bnd_]r11
>
> to load GOT slot into R11.  If foo is defined locally, linker can can
> convert
>
> movq   foo@GOTPCREL_THUNK(%rip), %reg
> call/jmp   __x86_indirect_thunk_reg
>
> to
>
> call/jmp   foo
> nop0L(%rax)
>
> In 32-bit mode, since all caller-saved registers, EAX, EDX and ECX, may
> used to function parameters, there is no scratch register available.  For
> -fno-plt -fno-pic -mindirect-branch=, we expand external function call
> to:
>
> movl   foo@GOT, %reg
> [bnd] call/jmp *%reg
>
> so that it can be converted to
>
> movl   foo@GOT, %reg
> [bnd] call/jmp __x86_indirect_thunk_[bnd_]reg
>
> in ix86_output_indirect_branch.  Since this is performed during RTL
> expansion, other instructions may be inserted between movl and call/jmp.
> Linker optimization isn't always possible.
>
> Tested on i686 and x86-64.  OK for trunk?
>
>
> H.J.
> ---
> gcc/
>
> PR target/83970
> * config/i386/constraints.md (Bs): Allow GOT_memory_operand
> for TARGET_LP64 with indirect branch conversion.
> (Bw): Likewise.
> * config/i386/i386.c (ix86_expand_call): Handle -fno-plt with
> -mindirect-branch=.
> (ix86_nopic_noplt_attribute_p): Likewise.
> (ix86_output_indirect_branch): In 64-bit mode, convert function
> call via GOT with R11 as a scratch register using
> __x86_indirect_thunk_r11.
> (ix86_output_call_insn): In 64-bit mode, set xasm to NULL when
> calling ix86_output_indirect_branch with function call via GOT.
> * config/i386/i386.md (*call_got_thunk): New call pattern for
> TARGET_LP64 with indirect branch conversion.
> (*call_value_got_thunk): Likewise.
>
> gcc/testsuite/
>
> PR target/83970
> * gcc.target/i386/indirect-thunk-5.c: Updated.
> * gcc.target/i386/indirect-thunk-6.c: Likewise.
> * gcc.target/i386/indirect-thunk-bnd-3.c: Likewise.
> * gcc.target/i386/indirect-thunk-bnd-4.c: Likewise.
> * gcc.target/i386/indirect-thunk-extern-5.c: Likewise.
> * gcc.target/i386/indirect-thunk-extern-6.c: Likewise.
> * gcc.target/i386/indirect-thunk-inline-5.c: Likewise.
> * gcc.target/i386/indirect-thunk-inline-6.c: Likewise.
> * gcc.target/i386/indirect-thunk-13.c: New test.
> * gcc.target/i386/indirect-thunk-14.c: Likewise.
> * gcc.target/i386/indirect-thunk-bnd-5.c: Likewise.
> * gcc.target/i386/indirect-thunk-bnd-6.c: Likewise.
> * gcc.target/i386/indirect-thunk-extern-11.c: Likewise.
> * gcc.target/i386/indirect-thunk-extern-12.c: Likewise.
> * gcc.target/i386/indirect-thunk-inline-8.c: Likewise.
> * gcc.target/i386/indirect-thunk-inline-9.c: Likewise.

PING:

https://gcc.gnu.org/ml/gcc-patches/2018-02/msg01527.html


-- 
H.J.


[PATCH] Fix PR84650

2018-03-05 Thread Richard Biener

Bootstrapped and tested on x86_64-unknown-linux-gnu, applied.

Richard.

2018-03-05  Richard Biener  

PR tree-optimization/84650
* tree-ssa-loop-im.c (pass_lim::execute): Reset the SCEV cache
if executed in the loop pipeline.

* gcc.dg/graphite/pr84650.c: New testcase.

Index: gcc/tree-ssa-loop-im.c
===
--- gcc/tree-ssa-loop-im.c  (revision 258124)
+++ gcc/tree-ssa-loop-im.c  (working copy)
@@ -2616,6 +2616,8 @@ pass_lim::execute (function *fun)
 
   if (!in_loop_pipeline)
 loop_optimizer_finalize ();
+  else
+scev_reset ();
   return todo;
 }
 
Index: gcc/testsuite/gcc.dg/graphite/pr84650.c
===
--- gcc/testsuite/gcc.dg/graphite/pr84650.c (nonexistent)
+++ gcc/testsuite/gcc.dg/graphite/pr84650.c (working copy)
@@ -0,0 +1,18 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fgraphite-identity -fno-tree-copy-prop --param 
lim-expensive=3" } */
+
+unsigned int dj;
+
+void
+np (void)
+{
+  const unsigned int uw = 2;
+  unsigned int eu;
+
+  for (eu = 0; eu < uw; ++eu)
+{
+  for (dj = 0; dj < uw; ++dj)
+   ;
+  eu -= !!(dj - uw - 1);
+}
+}


Re: [PR c++/84497] ref to undefined tls init

2018-03-05 Thread Nathan Sidwell

On 03/02/2018 01:43 PM, Jason Merrill wrote:

On Fri, Mar 2, 2018 at 11:07 AM, Nathan Sidwell  wrote:



NEEDS_CONSTRUCTING && !HAS_CONSTEXPR_CTOR && !HAS_DEFAULT_CONSTRUCTOR seems
like it would be sufficient. and indeed that works in this case.


Do you mean !HAS_TRIVIAL_DFLT rather than !HAS_DEFAULT_CONSTRUCTOR?
That makes sense to me.


Even better!

nathan

--
Nathan Sidwell
2018-03-05  Pádraig Brady  
	Jason  Merrill  
	Nathan Sidwell  

	PR c++/84497
	* decl2.c (get_tls_init_fn): Check TYPE_HAS_TRIVIAL_DFLT too.

	PR c++/84497
	* g++.dg/cpp0x/pr84497.C: New.

Index: cp/decl2.c
===
--- cp/decl2.c	(revision 258114)
+++ cp/decl2.c	(working copy)
@@ -3337,7 +3337,8 @@ get_tls_init_fn (tree var)
 	  /* If the variable is defined somewhere else and might have static
 	 initialization, make the init function a weak reference.  */
 	  if ((!TYPE_NEEDS_CONSTRUCTING (obtype)
-	   || TYPE_HAS_CONSTEXPR_CTOR (obtype))
+	   || TYPE_HAS_CONSTEXPR_CTOR (obtype)
+	   || TYPE_HAS_TRIVIAL_DFLT (obtype))
 	  && TYPE_HAS_TRIVIAL_DESTRUCTOR (obtype)
 	  && DECL_EXTERNAL (var))
 	declare_weak (fn);
Index: testsuite/g++.dg/cpp0x/pr84497.C
===
--- testsuite/g++.dg/cpp0x/pr84497.C	(revision 0)
+++ testsuite/g++.dg/cpp0x/pr84497.C	(working copy)
@@ -0,0 +1,37 @@
+// PR 84497 mismatch with thread constructor fn weakness
+// { dg-do compile { target c++11 } }
+// { dg-require-weak "" }
+
+struct Base
+{
+  int m;
+
+  Base() noexcept = default;  // trivial but not constexpr
+  ~Base() noexcept = default;
+};
+
+struct Derived : Base {};
+struct Container {
+  Base m;
+};
+
+#ifdef DEF
+// This bit for exposition only.
+// All items placed in .tbss
+// __tls_init simply sets __tls_guard
+// no aliases to __tls_init generated
+thread_local Base base_obj;
+thread_local Derived derived_obj;
+thread_local Container container_obj;
+#else
+// Erroneously created strong undef refs to
+// _ZTH11derived_obj, _ZTH13container_obj, _ZTH8base_obj
+extern thread_local Base base_obj;
+extern thread_local Derived derived_obj;
+extern thread_local Container container_obj;
+int main() { return !(&base_obj && &derived_obj && &container_obj);}
+#endif
+
+// { dg-final { scan-assembler ".weak\[ \t\]*_ZTH8base_obj" } }
+// { dg-final { scan-assembler ".weak\[ \t\]*_ZTH11derived_obj" } }
+// { dg-final { scan-assembler ".weak\[ \t\]*_ZTH13container_obj" } }


Re: [RFA][PATCH][PR middle-end/61118] Improve tree CFG accuracy for setjmp/longjmp

2018-03-05 Thread Richard Biener
On Fri, Mar 2, 2018 at 11:18 PM, Jeff Law  wrote:
> On 02/28/2018 03:43 AM, Richard Biener wrote:
> [ More snipping ]
>
>>
>>> It's actually pretty easy to fix the CFG.  We  just need to recognize
>>> that a "returns twice" function returns not to the call, but to the
>>> point immediately after the call.  So if we have a call to a returns
>>> twice function that ends a block with a single successor, when we wire
>>> up the abnormal dispatcher, we target the single successor rather than
>>> the block containing the returns-twice call.
>>
>> Hmm, I think you need to check whether the successor has a single
>> predecessor, not whether we have a single successor (we always have
>> that unless setjmp also throws).  If you fix that you keep the CFG
>> "incorrect" if there are multiple predecessors so I think in addition
>> to properly creating the edges you have to work on the BB building
>> part to ensure that there's a single-predecessor block after
>> returns-twice function calls.  Note that currently we force returns-twice
>> to be the first (and only) stmt of a block -- your fix would relax this,
>> returns-twice no longer needs to start a new BB.
> So I found the code which makes the setjmp start a new block. But I
> haven't found the code which makes setjmp end a block.  I'm going to
> have to throw things into the debugger  to find the latter.

stmt_starts_bb_p

>
> We ought to remove the code that makes the setjmp start a new block.
> That's just unnecessary.   setjmp certainly needs to end the block though.

yes, after your change, of course.  The code in stmt_starts_bb_p
uses ECF_RETURNS_TWICE, so ...

>
>
>
>>
>> -   handle_abnormal_edges (dispatcher_bbs, bb, bb_to_omp_idx,
>> -  &ab_edge_call, false);
>> +   {
>> + bool target_after_setjmp = false;
>> +
>> + /* If the returns twice statement looks like a setjmp
>> +call at the end of a block with a single successor
>> +then we want the edge from the dispatcher to target
>> +that single successor.  That more accurately reflects
>> +actual control flow.  The more accurate CFG also
>> +results in fewer false positive warnings.  */
>> + if (gsi_stmt (gsi_last_nondebug_bb (bb)) == call_stmt
>> + && gimple_call_fndecl (call_stmt)
>> + && setjmp_call_p (gimple_call_fndecl (call_stmt))
>> + && single_succ_p (bb))
>> +   target_after_setjmp = true;
>> + handle_abnormal_edges (dispatcher_bbs, bb, bb_to_omp_idx,
>> +&ab_edge_call, false,
>> +target_after_setjmp);
>> +   }
>>
>> I don't exactly get the hops you jump through here -- I think it's
>> better to split the returns-twice (always last stmt of a block after
>> the fixing) and the setjmp-receiver (always first stmt of a block) cases.
>> So, remove the handling of returns-twice from the above case and
>> handle returns-twice via
> Just wanted to verify the setjmp was the last statement in the block and
> the block passed control to a single successor.  If the setjmp is not
> the last statement, then having the longjmp pass control to the
> successor block potentially skips over statements between the setjmp and
> the end of the block.  That obviously would be bad.
>
> As I mentioned before the single_succ_p test was just my paranoia.
>
> Note that GSI can point to a setjmp receiver at this point.  We don't
> want to treat that like a setjmp.

True.

>
>>
>>   gimple *last = last_stmt (bb);
>>   if (last && ...)
>>
>> also handle all returns-twice calls this way, not only setjmp_call_p.
> Note that setjmp_call_p returns true for any returns-twice function.  So
> we are handling those.

... that's intended as well I think.

>
> So I think the open issue with this patch is removal of making the
> setjmp start a block and verification that we always have it end the
> block.  The latter should allow some simplifications to the code I added
> in make_edges and provide a level of consistency that is desirable.

We've abstracted that bit into GF_CALL_CTRL_ALTERING which we
compute during CFG build and only ever clear afterwards (so an
indirect call to setjmp via a type not having returns_twice will not
end up ending a BB and will not have abnormal edges associated).

So I don't think anything besides fixing CFG build is necessary.
Well - the whole RTL transition business of course.

Richard.

> Jeff
>


Re: [C++ Patch] PR 84618 ("[8 Regression] ICE in build_capture_proxy, at cp/lambda.c:460")

2018-03-05 Thread Jason Merrill
OK.

On Sun, Mar 4, 2018 at 8:55 PM, Paolo Carlini  wrote:
> Hi,
>
> a rather simple ice on invalid (not sure why only P4 given that no
> meaningful diagnostic is emitted before ICEing). What happens is that the
> ill-formed capture naming b, an OVERLOAD, escapes the early check in
> cp_parser_lambda_introducer to eventually trigger a recently added
> gcc_assert in build_capture_proxy. Adjusting the check means we also provide
> slightly clearer (IMO, matching clang, anyway) error messages for two
> testcases we already have.
>
> Tested x86_64-linux.
>
> Thanks, Paolo.
>
> ///
>


Re: Don't vectorise zero-step rmw operations (PR 84485)

2018-03-05 Thread Richard Biener
On Fri, Mar 2, 2018 at 3:12 PM, Richard Sandiford
 wrote:
> Richard Biener  writes:
>> On Thu, Mar 1, 2018 at 12:38 PM, Richard Sandiford
>>  wrote:
>>> Richard Biener  writes:
 On Wed, Feb 28, 2018 at 3:20 PM, Richard Sandiford
  wrote:
> GCC 6 and 7 would vectorise:
>
> void
> f (unsigned long incx, unsigned long incy,
>float *restrict dx, float *restrict dy)
> {
>   unsigned long ix = 0, iy = 0;
>   for (unsigned long i = 0; i < 512; ++i)
> {
>   dy[iy] += dx[ix];
>   ix += incx;
>   iy += incy;
> }
> }
>
> without first proving that incy is nonzero.  This is a regression from
> GCC 5.  It was fixed on trunk in r223486, which versioned the loop based
> on whether incy is zero, but that's obviously too invasive to backport.
> This patch instead bails out for non-constant steps in the place that
> trunk would try a check for zeroness.
>
> I did wonder about trying to use range information to prove nonzeroness
> for SSA_NAMEs, but that would be entirely new code and didn't seem
> suitable for a release branch.
>
> Tested on aarch64-linux-gnu.  OK for GCC 7 and 6?  I'll add the testcase
> to trunk too.

 Given dist == 0 isn't it enough to test either DR_STEP (dra) or
 DR_STEP (drb)?
 That seems what trunk is doing (just look at dr_zero_step_indicator of 
 dra).
 Even when not using range-info I think that using !
 tree_expr_nonzero_p (DR_STEP (dra))
 is more to the point of the issue we're fixing -- that also would catch
 integer_zerop (DR_STEP (dra)) which trunk handles by failing as well but 
 your
 patch wouldn't so it looks like a more "complete" fix.
>>>
>>> OK.
>>>
 Last but not least trunk and your patch guards all this by
 !loop->force_vectorize.
 But force_vectorize doesn't give any such guarantee that step isn't
 zero so I wonder
 why we deliberately choose to possibly miscompile stuff here?
>>>
>>> This was based on the pre-existing:
>>>
>>>   if (loop_vinfo && integer_zerop (step))
>>> {
>>>   GROUP_FIRST_ELEMENT (vinfo_for_stmt (stmt)) = NULL;
>>>   if (!nested_in_vect_loop_p (loop, stmt))
>>> return DR_IS_READ (dr);
>>>   /* Allow references with zero step for outer loops marked
>>>  with pragma omp simd only - it guarantees absence of
>>>  loop-carried dependencies between inner loop iterations.  */
>>>   if (!loop->force_vectorize)
>>> {
>>>   if (dump_enabled_p ())
>>> dump_printf_loc (MSG_NOTE, vect_location,
>>>  "zero step in inner loop of nest\n");
>>>   return false;
>>> }
>>> }
>>>
>>> AIUI #pragma omp simd really does guarantee that iterations of
>>> the loop can be executed concurrently (up to the limit given by
>>> safelen if present).  So something like:
>>>
>>>   #pragma omp simd
>>>   for (int i = 0; i < n; ++i)
>>> a[i * step] += 1;
>>>
>>> would be incorrect for step==0.  (#pragma ordered simd forces
>>> things to be executed in order where necessary.)
>>
>> But then we should check safelen, not force_vectorize...
>
> OK.  Following on from irc, this version uses expr_not_equal_to
> instead of tree_expr_nonzero_p.  It also adds safelen to the existing
> use of force_vectorize (which seemed safer than replacing it).
>
> Tested on aarch64-linux-gnu.  OK to install?

Ok.

Richard.

> Richard
>
>
> 2018-02-28  Richard Sandiford  
>
> gcc/
> PR tree-optimization/84485
> * tree-vect-data-refs.c (vect_analyze_data_ref_dependence): Return
> true for zero dependence distances if the step might be zero,
> and if there is no metadata that guarantees correctness.
> (vect_analyze_data_ref_access): Check safelen as well as
> force_vectorize.
>
> gcc/testsuite/
> PR tree-optimization/84485
> * gcc.dg/vect/pr84485.c: New test.
>
> Index: gcc/tree-vect-data-refs.c
> ===
> --- gcc/tree-vect-data-refs.c   2018-02-28 14:19:22.326678337 +
> +++ gcc/tree-vect-data-refs.c   2018-03-02 13:28:06.339321095 +
> @@ -394,6 +394,16 @@ vect_analyze_data_ref_dependence (struct
> }
> }
>
> + unsigned int step_prec = TYPE_PRECISION (TREE_TYPE (DR_STEP (dra)));
> + if (loop->safelen < 2
> + && !expr_not_equal_to (DR_STEP (dra), wi::zero (step_prec)))
> +   {
> + if (dump_enabled_p ())
> +   dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
> +"step could be zero.\n");
> + return true;
> +   }
> +
>   continue;
> }
>
> @@ -2515,7 +2525,7 @@ vect_analyze_data_ref_access (struct dat
>/* Allow references with zero step for outer loops marked
>  with pragma omp simd only - it guarantee

C++ PATCH for c++/84707, ICE with nested anonymous namespace

2018-03-05 Thread Marek Polacek
Since Nathan's r253489 we seem to not use anon_identifier anymore; rather, the
DECL_NAME is simply NULL.  This crashed in duplicate_decls on this invalid code
because UDLIT_OPER_P was blithely used on a possibly null tree.  Other spots in
this function check this, too.

Bootstrapped/regtested on x86_64-linux, ok for trunk?

2018-03-05  Marek Polacek  

PR c++/84707
* decl.c (duplicate_decls): Check DECL_NAME before accessing
UDLIT_OPER_P.

* g++.dg/cpp0x/inline-ns10.C: New test.

diff --git gcc/cp/decl.c gcc/cp/decl.c
index 1866e8f3574..b2e19a6549d 100644
--- gcc/cp/decl.c
+++ gcc/cp/decl.c
@@ -1410,7 +1410,9 @@ duplicate_decls (tree newdecl, tree olddecl, bool 
newdecl_is_friend)
   || TREE_TYPE (olddecl) == error_mark_node)
 return error_mark_node;
 
-  if (UDLIT_OPER_P (DECL_NAME (newdecl))
+  if (DECL_NAME (newdecl)
+  && DECL_NAME (olddecl)
+  && UDLIT_OPER_P (DECL_NAME (newdecl))
   && UDLIT_OPER_P (DECL_NAME (olddecl)))
 {
   if (TREE_CODE (newdecl) == TEMPLATE_DECL
diff --git gcc/testsuite/g++.dg/cpp0x/inline-ns10.C 
gcc/testsuite/g++.dg/cpp0x/inline-ns10.C
index e69de29bb2d..3ab425f7be4 100644
--- gcc/testsuite/g++.dg/cpp0x/inline-ns10.C
+++ gcc/testsuite/g++.dg/cpp0x/inline-ns10.C
@@ -0,0 +1,8 @@
+// PR c++/84707
+// { dg-do compile { target c++11 } }
+
+inline namespace {
+  namespace {}
+}
+
+namespace {} // { dg-error "conflicts" }

Marek


Re: [PATCH, rs6000] Fix PR84264: ICE in rs6000_emit_le_vsx_store

2018-03-05 Thread Segher Boessenkool
Hi Peter,

On Sat, Mar 03, 2018 at 10:55:28PM -0600, Peter Bergner wrote:
> In PR84264, we hit an assert in rs6000_emit_le_vsx_store causing an ICE
> in LRA.  We get there, because LRA called the movv4si expander to generate
> a spill and the mov pattern calls rs6000_emit_le_vsx_move which in turn
> calls rs6000_emit_le_vsx_store.  The rs6000_emit_le_vsx_{load,store}
> routines are used at expand time when targeting P8 on an LE system to
> generate vsx load/store insns along with their associated byte swaps.
> After expand, we shouldn't call them, hence the asserts.
> 
> The problem in this case is that LRA calls the movv4si expander to
> generate a spill store and we satisfy all the conditions that lead
> to calling rs6000_emit_le_vsx_move().  What is different here, is that
> with GCC 8, we now generate altivec lvx/stvx insns which are LE friendly
> and do not need byte swaps.  In this specific case, LRA is generating
> a store to an altivec mem, so we shouldn't call rs6000_emit_le_vsx_move(),
> but rather we should just emit the default RTL from the expander.
> 
> The simple fix here is to just verify the memory_operand is not an altivec
> mem operand before calling rs6000_emit_le_vsx_move.
> 
> This passed bootstrap and regtesting on powerpc64le-linux with no
> regressions.  Ok for trunk?

Yes this looks correct, okay for trunk.  Thanks.  But it is very
non-obvious; maybe a comment will help, or the code can be restructured
a bit?


Segher


[PATCH] Fix PR84486

2018-03-05 Thread Richard Biener

The following fixes a missed optimization caused by code hoisting.
When PRE inserts expressions somewhere the SSA names created in that
process do not have any associated info like alignment.

Fixed for the very specific case of __builtin_assume_aligned.

Bootstrapped and tested on x86_64-unknown-linux-gnu, desired effect
verified by reporter, applied to trunk.

Richard.

2018-03-05  Richard Biener  

PR tree-optimization/84486
* tree-ssa-pre.c (create_expression_by_pieces): Remove dead code.
When inserting a __builtin_assume_aligned call set the LHS
SSA name alignment info accordingly.

Index: gcc/tree-ssa-pre.c
===
--- gcc/tree-ssa-pre.c  (revision 258124)
+++ gcc/tree-ssa-pre.c  (working copy)
@@ -2736,11 +2736,7 @@ create_expression_by_pieces (basic_block
  unsigned int operand = 1;
  vn_reference_op_t currop = &ref->operands[0];
  tree sc = NULL_TREE;
- tree fn;
- if (TREE_CODE (currop->op0) == FUNCTION_DECL)
-   fn = currop->op0;
- else
-   fn = find_or_generate_expression (block, currop->op0, stmts);
+ tree fn  = find_or_generate_expression (block, currop->op0, stmts);
  if (!fn)
return NULL_TREE;
  if (currop->op1)
@@ -2758,14 +2754,27 @@ create_expression_by_pieces (basic_block
return NULL_TREE;
  args.quick_push (arg);
}
- gcall *call
-   = gimple_build_call_vec ((TREE_CODE (fn) == FUNCTION_DECL
- ? build_fold_addr_expr (fn) : fn), args);
+ gcall *call = gimple_build_call_vec (fn, args);
  gimple_call_set_with_bounds (call, currop->with_bounds);
  if (sc)
gimple_call_set_chain (call, sc);
  tree forcedname = make_ssa_name (currop->type);
  gimple_call_set_lhs (call, forcedname);
+ /* There's no CCP pass after PRE which would re-compute alignment
+information so make sure we re-materialize this here.  */
+ if (gimple_call_builtin_p (call, BUILT_IN_ASSUME_ALIGNED)
+ && args.length () - 2 <= 1
+ && tree_fits_uhwi_p (args[1])
+ && (args.length () != 3 || tree_fits_uhwi_p (args[2])))
+   {
+ unsigned HOST_WIDE_INT halign = tree_to_uhwi (args[1]);
+ unsigned HOST_WIDE_INT hmisalign
+   = args.length () == 3 ? tree_to_uhwi (args[2]) : 0;
+ if ((halign & (halign - 1)) == 0
+ && (hmisalign & ~(halign - 1)) == 0)
+   set_ptr_info_alignment (get_ptr_info (forcedname),
+   halign, hmisalign);
+   }
  gimple_set_vuse (call, BB_LIVE_VOP_ON_EXIT (block));
  gimple_seq_add_stmt_without_update (&forced_stmts, call);
  folded = forcedname;


Re: Rename __builtin_rs6000_speculation_barrier

2018-03-05 Thread Segher Boessenkool
Hi!

On Sun, Mar 04, 2018 at 01:52:20PM -0600, Bill Schmidt wrote:
> We realized recently that the use of "rs6000" in a builtin name doesn't agree
> with our normal naming conventions.  Thus this patch changes such a builtin
> to __builtin_powerpc_speculation_barrier instead.
> 
> Bootstrapped and tested on powerpc64le-linux-gnu with no regressions.  Is
> this okay for trunk?

Most of our builtins have *no* powerpc or similar in the name, and the
two that do have "ppc".

Should we use __builtin_speculation_barrier?  It sounds more likely
that other archs will want that as well, than that it will conflict.

For the .md patterns we already have various that are named rs6000_*
(and none ppc* or power*).


Segher


Re: [RFC][PR82479] missing popcount builtin detection

2018-03-05 Thread Richard Biener
On Thu, Feb 8, 2018 at 1:41 AM, Kugan Vivekanandarajah
 wrote:
> Hi Richard,
>
> On 1 February 2018 at 23:21, Richard Biener  
> wrote:
>> On Thu, Feb 1, 2018 at 5:07 AM, Kugan Vivekanandarajah
>>  wrote:
>>> Hi Richard,
>>>
>>> On 31 January 2018 at 21:39, Richard Biener  
>>> wrote:
 On Wed, Jan 31, 2018 at 11:28 AM, Kugan Vivekanandarajah
  wrote:
> Hi Richard,
>
> Thanks for the review.
> On 25 January 2018 at 20:04, Richard Biener  
> wrote:
>> On Wed, Jan 24, 2018 at 10:56 PM, Kugan Vivekanandarajah
>>  wrote:
>>> Hi All,
>>>
>>> Here is a patch for popcount builtin detection similar to LLVM. I
>>> would like to queue this for review for next stage 1.
>>>
>>> 1. This is done part of loop-distribution and effective for -O3 and 
>>> above.
>>> 2. This does not distribute loop to detect popcount (like
>>> memcpy/memmove). I dont think that happens in practice. Please correct
>>> me if I am wrong.
>>
>> But then it has no business inside loop distribution but instead is
>> doing final value
>> replacement, right?  You are pattern-matching the whole loop after all.  
>> I think
>> final value replacement would already do the correct thing if you
>> teached number of
>> iteration analysis that niter for
>>
>>[local count: 955630224]:
>>   # b_11 = PHI 
>>   _1 = b_11 + -1;
>>   b_8 = _1 & b_11;
>>   if (b_8 != 0)
>> goto ; [89.00%]
>>   else
>> goto ; [11.00%]
>>
>>[local count: 850510900]:
>>   goto ; [100.00%]
>
> I am looking into this approach. What should be the scalar evolution
> for b_8 (i.e. b & (b -1) in a loop) should be? This is not clear to me
> and can this be represented with the scev?

 No, it's not affine and thus cannot be represented.  You only need the
 scalar evolution of the counting IV which is already handled and
 the number of iteration analysis needs to handle the above IV - this
 is the missing part.
>>> Thanks for the clarification. I am now matching this loop pattern in
>>> number_of_iterations_exit when number_of_iterations_exit_assumptions
>>> fails. If the pattern matches, I am inserting the _builtin_popcount in
>>> the loop preheater and setting the loop niter with this. This will be
>>> used by the final value replacement. Is this what you wanted?
>>
>> No, you shouldn't insert a popcount stmt but instead the niter
>> GENERIC tree should be a CALL_EXPR to popcount with the
>> appropriate argument.
>
> Thats what I tried earlier but ran into some ICEs. I wasn't sure if
> niter in tree_niter_desc can take such.
>
> Attached patch now does this. Also had to add support for CALL_EXPR in
> few places to handle niter with CALL_EXPR. Does this look OK?

Overall this looks ok - the patch includes changes in places that I don't think
need changes such as chrec_convert_1 or extract_ops_from_tree.
The expression_expensive_p change should be more specific than making
all calls inexpensive as well.

The verify_ssa change looks bogus, you do

+  dest = gimple_phi_result (count_phi);
+  tree var = make_ssa_name (TREE_TYPE (dest), NULL);
+  tree fn = builtin_decl_implicit (BUILT_IN_POPCOUNT);
+
+  var = build_call_expr (fn, 1, src);
+  *niter = fold_build2 (MINUS_EXPR, TREE_TYPE (dest), var,
+   build_int_cst (TREE_TYPE (dest), 1));

why do you allocate a new SSA name here?  It seems unused
as you overwrive 'var' with the CALL_EXPR immediately.

I didn't review the pattern matching thoroughly nor the exact place you
call it.  But

+  if (check_popcount_pattern (loop, &count))
+   {
+ niter->assumptions = boolean_false_node;
+ niter->control.base = NULL_TREE;
+ niter->control.step = NULL_TREE;
+ niter->control.no_overflow = false;
+ niter->niter = count;
+ niter->assumptions = boolean_true_node;
+ niter->may_be_zero = boolean_false_node;
+ niter->max = -1;
+ niter->bound = NULL_TREE;
+ niter->cmp = ERROR_MARK;
+ return true;
+   }

simply setting may_be_zero to false looks fishy.  Try
with -fno-tree-loop-ch.  Also max should not be negative,
it should be the number of bits in the IV type?

A related testcase could be that we can completely peel
a loop like the following which iterates at most 8 times:

int a[8];
void foo (unsigned char ctrl)
{
  int c = 0;
  while (ctrl)
{
   ctrl = ctrl & (ctrl - 1);
   a[c++] = ctrl;
}
}

This is now stage1 material so please update and re-post.  Maybe Bin has
further suggestions as well.

Thanks,
Richard.

> Thanks,
> Kugan
>
>
> gcc/ChangeLog:
>
> 2018-02-08  Kugan Vivekanandarajah  
>
> * gimple-expr.c (extract_ops_from_tree): Handle CALL_EXPR.
> * tree-chrec.c (chrec_convert_1): Likewise.
> * tree-scalar-evolution.c (expression_expensive_p): Likewise.
> * tree-ssa-loop-ivopts.c (contains_abnormal_s

Re: Patch ping

2018-03-05 Thread Kirill Yukhin
Hello Jakub,

On Friday, March 2, 2018, Jakub Jelinek  wrote:

> Hi!
>
> I'd like to ping 2 patches:
>
> http://gcc.gnu.org/ml/gcc-patches/2018-02/msg01340.html
>   - PR target/84524 avx512* wrong-code bug

Patch is OK.


>
> http://gcc.gnu.org/ml/gcc-patches/2018-02/msg01337.html
>   - fix c-c++-common/Warray-bounds-2.c testcase for -fpic
>
> Thanks
>
> Jakub
>

—
Thanks, K


Re: Rename __builtin_rs6000_speculation_barrier

2018-03-05 Thread Bill Schmidt

> On Mar 5, 2018, at 9:04 AM, Segher Boessenkool  
> wrote:
> 
> Hi!
> 
> On Sun, Mar 04, 2018 at 01:52:20PM -0600, Bill Schmidt wrote:
>> We realized recently that the use of "rs6000" in a builtin name doesn't agree
>> with our normal naming conventions.  Thus this patch changes such a builtin
>> to __builtin_powerpc_speculation_barrier instead.
>> 
>> Bootstrapped and tested on powerpc64le-linux-gnu with no regressions.  Is
>> this okay for trunk?
> 
> Most of our builtins have *no* powerpc or similar in the name, and the
> two that do have "ppc".
> 
> Should we use __builtin_speculation_barrier?  It sounds more likely
> that other archs will want that as well, than that it will conflict.

That was my initial proposal.  Richard asked me to insert the "rs6000_" to keep 
this
in a separate namespace for the time being.  There is still discussion about a
more general speculation barrier builtin for all targets, though that has 
languished
for over a month now.
> 
> For the .md patterns we already have various that are named rs6000_*
> (and none ppc* or power*).
> 
Would it be reasonable to go with __builtin_ppc_speculation_barrier for the
builtin name and revert to rs6000_* in the .md patterns, etc.?

Thanks,
Bill

> 
> Segher
> 



Re: C++ PATCH for c++/84707, ICE with nested anonymous namespace

2018-03-05 Thread Nathan Sidwell

On 03/05/2018 09:47 AM, Marek Polacek wrote:

Since Nathan's r253489 we seem to not use anon_identifier anymore; rather, the
DECL_NAME is simply NULL.  This crashed in duplicate_decls on this invalid code
because UDLIT_OPER_P was blithely used on a possibly null tree.  Other spots in
this function check this, too.

Bootstrapped/regtested on x86_64-linux, ok for trunk?


OK.


2018-03-05  Marek Polacek  

PR c++/84707
* decl.c (duplicate_decls): Check DECL_NAME before accessing
UDLIT_OPER_P.

* g++.dg/cpp0x/inline-ns10.C: New test.




--
Nathan Sidwell


[PR c++/84694] ICE on template friend decl

2018-03-05 Thread Nathan Sidwell
This fixes an ICE with a friend decl.  Although the reported source is 
invalid, we can turn it into valid, but useless, source.  ISTM that we 
should be generating a raw identifier_node here, but finding the 
TEMPLATE_DECL at parse time.  But that's a change for another day.  This 
restores the previous non-crashing behaviour.


nathan
--
Nathan Sidwell
2018-03-05  Nathan Sidwell  

	PR c++/84694
	* friend.c (do_friend): Restore check for identifier_p inside
	TEMPLATE_ID_EXPR.

	PR c++/84694
	* g++.dg/template/pr84694.C: New.

Index: cp/friend.c
===
--- cp/friend.c	(revision 258244)
+++ cp/friend.c	(working copy)
@@ -495,7 +495,8 @@ do_friend (tree ctype, tree declarator,
   if (TREE_CODE (declarator) == TEMPLATE_ID_EXPR)
 {
   declarator = TREE_OPERAND (declarator, 0);
-  declarator = OVL_NAME (declarator);
+  if (!identifier_p (declarator))
+	declarator = OVL_NAME (declarator);
 }
 
   if (ctype)
Index: testsuite/g++.dg/template/pr84694.C
===
--- testsuite/g++.dg/template/pr84694.C	(revision 0)
+++ testsuite/g++.dg/template/pr84694.C	(working copy)
@@ -0,0 +1,6 @@
+// PR c++84694 ICE on friend decl
+template
+struct a {
+  template void b();
+  friend void b<0>(); // ICEd with useless friend decl
+};


Re: Patch ping

2018-03-05 Thread Jan Hubicka

Dne 2018-03-05 17:13, Jakub Jelinek napsal:

Hi!

I'd like to ping following patch:

http://gcc.gnu.org/ml/gcc-patches/2018-02/msg01461.html
  PR84564 - fix ICE with -mforce-indirect-call


OK,
thanks!
Honza


Thanks

Jakub




Re: Rename __builtin_rs6000_speculation_barrier

2018-03-05 Thread Segher Boessenkool
On Mon, Mar 05, 2018 at 09:58:50AM -0600, Bill Schmidt wrote:
> 
> > On Mar 5, 2018, at 9:04 AM, Segher Boessenkool  
> > wrote:
> > 
> > Hi!
> > 
> > On Sun, Mar 04, 2018 at 01:52:20PM -0600, Bill Schmidt wrote:
> >> We realized recently that the use of "rs6000" in a builtin name doesn't 
> >> agree
> >> with our normal naming conventions.  Thus this patch changes such a builtin
> >> to __builtin_powerpc_speculation_barrier instead.
> >> 
> >> Bootstrapped and tested on powerpc64le-linux-gnu with no regressions.  Is
> >> this okay for trunk?
> > 
> > Most of our builtins have *no* powerpc or similar in the name, and the
> > two that do have "ppc".
> > 
> > Should we use __builtin_speculation_barrier?  It sounds more likely
> > that other archs will want that as well, than that it will conflict.
> 
> That was my initial proposal.  Richard asked me to insert the "rs6000_" to 
> keep this
> in a separate namespace for the time being.  There is still discussion about a
> more general speculation barrier builtin for all targets, though that has 
> languished
> for over a month now.
> > 
> > For the .md patterns we already have various that are named rs6000_*
> > (and none ppc* or power*).
> > 
> Would it be reasonable to go with __builtin_ppc_speculation_barrier for the
> builtin name and revert to rs6000_* in the .md patterns, etc.?

Yeah that sounds fine.  Okay for trunk with such changes.  Thanks!


Segher


[PATCH][GCC][ARM] Fix can_change_mode_class for big-endian

2018-03-05 Thread Tamar Christina
Hi All,

Taking the subreg of a vector mode on big-endian may result in an infinite
recursion and eventually a segfault once we run out of stack space.

As an example, taking a subreg of V4HF to SImode we end up in the following
loop on big-endian:

#861 0x008462e9 in operand_subword_force src/gcc/gcc/emit-rtl.c:1787
#862 0x00882a90 in emit_move_multi_word src/gcc/gcc/expr.c:3621
#863 0x0087eea1 in emit_move_insn_1 src/gcc/gcc/expr.c:3698
#864 0x0087f350 in emit_move_insn src/gcc/gcc/expr.c:3757
#865 0x0085e326 in copy_to_reg src/gcc/gcc/explow.c:603
#866 0x008462e9 in operand_subword_force src/gcc/gcc/emit-rtl.c:1787

The reason is that operand_subword_force will always fail. When the value is in
a register that can't be accessed as a multi word the code tries to create a new
psuedo register and emit the value to it. Eventually you end up in 
simplify_gen_subreg
which calls validate_subreg.

validate_subreg will however always fail because of the REG_CAN_CHANGE_MODE_P 
check.

On little endian this check always returns true. On big-endian this check is 
supposed
to prevent values that have a size larger than word size, due to those being 
stored in
VFP registers.

However we are only interested in a subreg of the vector mode, so we should be 
checking
the unit size, not the size of the entire mode. Doing this fixes the problem.

Regtested on armeb-none-eabi and no regressions.
Bootstrapped on arm-none-linux-gnueabihf and no issues.

Ok for trunk? and for backport to GCC 7?

Thanks,
Tamar

gcc/
2018-03-05  Tamar Christina  

PR target/84711
* config/arm/arm.c (arm_can_change_mode_class): Use GET_MODE_UNIT_SIZE
instead of GET_MODE_SIZE when comparing Units.

gcc/testsuite/
2018-03-05  Tamar Christina  

PR target/84711
* gcc.target/arm/big-endian-subreg.c: New.

-- 
diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
index 90d62e699bce9594879be2e3016c9b36c7e064c8..703632240822e762a906570964b949c783df56f3 100644
--- a/gcc/config/arm/arm.c
+++ b/gcc/config/arm/arm.c
@@ -31508,8 +31508,8 @@ arm_can_change_mode_class (machine_mode from, machine_mode to,
 {
   if (TARGET_BIG_END
   && !(GET_MODE_SIZE (from) == 16 && GET_MODE_SIZE (to) == 8)
-  && (GET_MODE_SIZE (from) > UNITS_PER_WORD
-	  || GET_MODE_SIZE (to) > UNITS_PER_WORD)
+  && (GET_MODE_UNIT_SIZE (from) > UNITS_PER_WORD
+	  || GET_MODE_UNIT_SIZE (to) > UNITS_PER_WORD)
   && reg_classes_intersect_p (VFP_REGS, rclass))
 return false;
   return true;
diff --git a/gcc/testsuite/gcc.target/arm/big-endian-subreg.c b/gcc/testsuite/gcc.target/arm/big-endian-subreg.c
new file mode 100644
index ..4b1ab122f349e61e296fe3f76030a7a258e55f62
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/big-endian-subreg.c
@@ -0,0 +1,13 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target arm_neon_ok } */
+/* { dg-require-effective-target arm_hf_eabi } */
+/* { dg-add-options arm_neon } */
+/* { dg-additional-options "-mfp16-format=ieee -mfloat-abi=hard" } */
+
+typedef __fp16 v4f16
+  __attribute__ ((vector_size (8)));
+
+v4f16 fn1 (v4f16 p)
+{
+  return p;
+}



Re: [RFA][PATCH][PR middle-end/61118] Improve tree CFG accuracy for setjmp/longjmp

2018-03-05 Thread Michael Matz
Hi,

On Wed, 28 Feb 2018, Jeff Law wrote:

> The single successor test was strictly my paranoia WRT abnormal/EH edges.
> 
> I don't immediately see why the CFG would be incorrect if the successor
> of the setjmp block has multiple preds.

Actually, without further conditions I don't see how it would be safe for 
the successor to have multiple preds.  We might have this situation:

bb1: ret = setjmp
bb2: x0 = phi 

As you noted the second "return" from setjmp is precisely after the setjmp 
call itself, i.e. on the edge bb1->bb2.  Simply regarding it as landing at 
the start of bb2 it becomes unclear from which edge bb2 was entered and 
hence the runtime model of PHI nodes breaks down.

So, the mental model would be that a hypothetical setjmp_receiver (which 
isn't hypothetical for SJLJ) would have to be inserted on the bb1->bb2 
edge.  From then normal edge insertion routines take over: because the 
bb1-setjmp block only has a single successor (I know it currently doesn't, 
bear with me) it's actually inserted at the end of bb1 (so that edge 
doesn't need splitting), which is indeed what you want.  Except that to 
have a target for the abnormal edges the setjmp_receiver needs to start 
its own basic block, so you'd need to create a new edge, which then 
implicitely creates the situation that the successor of the setjmp block 
only has a single predecessor (until you add the abnormal edges to the 
receiver of course):

bb1 : setjmp; /* fallthrough */
bb1': ret = setjmp_receiver /* fallthrough */
bg2 : x0 = phi

While this models what actually happens with setjmp quite fine it poses 
the problem that nothing must be moved between setjmp and setjmp_receiver, 
so it seems indeed better to leave them as just one instruction.  But the 
edge modelling needs to ensure that the above runtime model is followed, 
and so needs to ensure that the actual target of the abnormal edge doesn't 
have multiple predecessors (before adding the abnormal edge that is), so 
the edge must be split I think.

Actually I wonder if it weren't better to regard return-twice functions 
like COND_EXPR, and always create two outgoing edges of such blocks, one 
the normal first-return, the other the (abnormal) second-return.  The 
target block of that second-return would then also be the target of all 
function calls potentially calling longjmp, and so on.  That target block 
would then only have abnormal predecessors, and your problem would have 
also gone away.  In addition a setjmp call would then be normally 
eliminable by unreachable-code analysis (and the second-return target 
block as well eventually)  This would also naturally deal with the 
problem, that while the real second return is after the setjmp call it 
happens before the setting of the return value.


Ciao,
Michael.


Patch ping

2018-03-05 Thread Jakub Jelinek
Hi!

I'd like to ping following patch:

http://gcc.gnu.org/ml/gcc-patches/2018-02/msg01461.html
  PR84564 - fix ICE with -mforce-indirect-call

Thanks

Jakub


[PR c++/84702] ICE with default tmpl arg of overload set

2018-03-05 Thread Nathan Sidwell
This fixes an ICE where we were not correctly marking a lookup as kept, 
resulting in an assertion failure later on.


nathan
--
Nathan Sidwell
2018-03-05  Nathan Sidwell  

	PR c++/84702
	* pt.c (process_template_arg): Mark lookup_keep on a default arg.

	PR c++/84702
	* g++.dg/lookup/pr84702.C: New.

Index: cp/pt.c
===
--- cp/pt.c	(revision 258254)
+++ cp/pt.c	(working copy)
@@ -4425,6 +4425,9 @@ process_template_parm (tree list, locati
 
   pushdecl (decl);
 
+  if (defval && TREE_CODE (defval) == OVERLOAD)
+lookup_keep (defval, true);
+  
   /* Build the parameter node linking the parameter declaration,
  its default argument (if any), and its constraints (if any). */
   parm = build_tree_list (defval, parm);
Index: testsuite/g++.dg/lookup/pr84702.C
===
--- testsuite/g++.dg/lookup/pr84702.C	(revision 0)
+++ testsuite/g++.dg/lookup/pr84702.C	(working copy)
@@ -0,0 +1,13 @@
+// PR c++/84702 failure to mark overload to keep
+// { dg-do compile { target c++11 } }
+
+void a ();
+
+namespace {
+  void a (int);
+}
+
+template
+void c () {
+  c ();
+}


libgo patch committed: Fix typo in mksysinfo.sh script

2018-03-05 Thread Ian Lance Taylor
This libgo patch by Than McIntosh fixes a small typo in the
mksysinfo.sh script (incorrect input file for a grep command).
Bootstrapped and ran Go testsuite on x86_64-pc-linux-gnu.  Committed
to mainline.

Ian
Index: gcc/go/gofrontend/MERGE
===
--- gcc/go/gofrontend/MERGE (revision 258112)
+++ gcc/go/gofrontend/MERGE (working copy)
@@ -1,4 +1,4 @@
-262d629b1592f681fef396166a671e46cdb31230
+3287064c24cbf0c50776cdb87a720d29130b4363
 
 The first line of this file holds the git revision number of the last
 merge done from the gofrontend repository.
Index: libgo/mksysinfo.sh
===
--- libgo/mksysinfo.sh  (revision 257914)
+++ libgo/mksysinfo.sh  (working copy)
@@ -1142,7 +1142,7 @@ grep '^const _RLIM_' gen-sysinfo.go |
 sed -e 's/^\(const \)_\(RLIM_[^= ]*\)\(.*\)$/\1\2 = _\2/' >> ${OUT}
 if test "${rlimit}" = "_rlimit64" && grep '^const _RLIM64_INFINITY ' 
gen-sysinfo.go > /dev/null 2>&1; then
   echo 'const RLIM_INFINITY = _RLIM64_INFINITY' >> ${OUT}
-elif grep '^const _RLIM_INFINITY ' gen-sysinfo-go; then
+elif grep '^const _RLIM_INFINITY ' gen-sysinfo.go > /dev/null 2>&1; then
   echo 'const RLIM_INFINITY = _RLIM_INFINITY' >> ${OUT}
 fi
 


Re: [config patch] Fwd from binutils: add ax_pthread.m4 to config/ directory

2018-03-05 Thread Joshua Watt
On Fri, 2018-02-23 at 08:54 -0600, Joshua Watt wrote:
> On Mon, 2018-02-19 at 17:24 -0800, Cary Coutant wrote:
> > Please see this patch posted to the binutils list:
> > 
> >https://sourceware.org/ml/binutils/2018-02/msg00260.html
> > 
> > where Joshua proposes to add the ax_pthread.m4 script, from the
> > autoconf macro archive, to the config/ directory in order to
> > improve
> > gold's configure-time detection of thread support.
> > 
> > Is the config/ part of this patch OK?
> > 
> > config/
> > * ax_pthread.m4: New file.
> > 
> > -cary
> 
> Ping? I can repost this as a patch here instead of linking to the
> binutils thread if necessary.

Ping?

> 
> Thanks,
> Joshua Watt


Re: [RFA][PATCH][PR middle-end/61118] Improve tree CFG accuracy for setjmp/longjmp

2018-03-05 Thread Jeff Law
On 03/05/2018 11:30 AM, Michael Matz wrote:
> Hi,
> 
> On Wed, 28 Feb 2018, Jeff Law wrote:
> 
>> The single successor test was strictly my paranoia WRT abnormal/EH edges.
>>
>> I don't immediately see why the CFG would be incorrect if the successor
>> of the setjmp block has multiple preds.
> 
> Actually, without further conditions I don't see how it would be safe for 
> the successor to have multiple preds.  We might have this situation:
> 
> bb1: ret = setjmp
> bb2: x0 = phi 
No.  Can't happen -- we're still building the initial CFG.  There are no
PHI nodes, there are no SSA_NAMEs.

We have two choices we can either target the setjmp itself, which is
what we've been doing and is inaccurate.  Or we can target the point
immediately after the setjmp, which is accurate.

After we have created the CFG, we'll proceed to build dominance
frontiers, compute lifetimes, etc that are nececessary to place the PHI
nodes.




> 
> As you noted the second "return" from setjmp is precisely after the setjmp 
> call itself, i.e. on the edge bb1->bb2.  Simply regarding it as landing at 
> the start of bb2 it becomes unclear from which edge bb2 was entered and 
> hence the runtime model of PHI nodes breaks down.
?!?Again, we don't have PHIs and we're not simply regarding the
setjmp as landing at the start of BB2.  We are creating an edge from the
dispatcher to BB2.


Jeff


[PATCH] PR fortran/56667 -- Issue a sane error message

2018-03-05 Thread Steve Kargl
For the mangled code in the new testcase, gfortran issues
a somewhat obtuse error message.  The problem is the matcher
for a complex entity runs prior to the matcher for an implied
do-loop.  Errors are queued in that order so an error messagei
for a mangled complex constant is emitted.

Regression tested on x86_64-*-freebsd.

2018-03-07  Steven G. Kargl  

PR fortran/56667
* primary.c (match_sym_complex_part): Give the matcher for an implied
do-loop a chance to run.

2018-03-07  Steven G. Kargl  

PR fortran/56667
* gfortran.dg/implied_do_2.f90: New test.
* gfortran.dg/coarray_8.f90: Update for new error message.

-- 
Steve
Index: gcc/fortran/primary.c
===
--- gcc/fortran/primary.c	(revision 258227)
+++ gcc/fortran/primary.c	(working copy)
@@ -1248,8 +1248,22 @@ match_sym_complex_part (gfc_expr **result)
 
   if (sym->attr.flavor != FL_PARAMETER)
 {
-  gfc_error ("Expected PARAMETER symbol in complex constant at %C");
-  return MATCH_ERROR;
+  /* Give the matcher for implied do-loops a chance to run.  This yields
+	 a much saner error message for "write(*,*) (i, i=1, 6" where the 
+	 right parenthesis is missing.  */
+  char c;
+  gfc_gobble_whitespace ();
+  c = gfc_peek_ascii_char ();
+  if (c == '=' || c == ',')
+	{
+	  m = MATCH_NO;
+	}
+  else
+	{
+	  gfc_error ("Expected PARAMETER symbol in complex constant at %C");
+	  m = MATCH_ERROR;
+	}
+  return m;
 }
 
   if (!sym->value)
Index: gcc/testsuite/gfortran.dg/implied_do_2.f90
===
--- gcc/testsuite/gfortran.dg/implied_do_2.f90	(nonexistent)
+++ gcc/testsuite/gfortran.dg/implied_do_2.f90	(working copy)
@@ -0,0 +1,7 @@
+! { dg-do compile }
+! PR fortran/56667
+program error_message
+   implicit none
+   integer :: ir
+   write(*,*) ( ir, ir = 1,10! { dg-error "Expected a right parenthesis" }
+end program error_message 
Index: gcc/testsuite/gfortran.dg/coarray_8.f90
===
--- gcc/testsuite/gfortran.dg/coarray_8.f90	(revision 258227)
+++ gcc/testsuite/gfortran.dg/coarray_8.f90	(working copy)
@@ -145,7 +145,7 @@ end module mmm4
 
 subroutine tfgh()
   integer :: i(2)
-  DATA i/(i, i=1,2)/ ! { dg-error "Expected PARAMETER symbol" }
+  DATA i/(i, i=1,2)/ ! { dg-error "Syntax error in DATA" }
   do i = 1, 5 ! { dg-error "cannot be an array" }
   end do ! { dg-error "Expecting END SUBROUTINE" }
 end subroutine tfgh
@@ -153,7 +153,7 @@ end subroutine tfgh
 subroutine tfgh2()
   integer, save :: x[*]
   integer :: i(2)
-  DATA i/(x, x=1,2)/ ! { dg-error "Expected PARAMETER symbol" }
+  DATA i/(x, x=1,2)/ ! { dg-error "Syntax error in DATA" }
   do x = 1, 5 ! { dg-error "cannot be a coarray" }
   end do ! { dg-error "Expecting END SUBROUTINE" }
 end subroutine tfgh2


[PATCH] rs6000: Don't align tiny loops to 32 bytes for POWER9

2018-03-05 Thread Segher Boessenkool
For POWER4..POWER8 we align loops of 5..8 instructions to 32 bytes
(instead of to 16 bytes) because that executes faster.  This is no
longer the case on POWER9, so we can just as well only align to 16
bytes.

Bootstrapped and tested on a p9 powerpc64le-linux.  Committing to trunk.


Segher


2018-03-05  Segher Boessenkool  

* config/rs6000/rs6000.c (rs6000_loop_align): Don't align tiny loops
to 32 bytes when compiling for POWER9.

---
 gcc/config/rs6000/rs6000.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/gcc/config/rs6000/rs6000.c b/gcc/config/rs6000/rs6000.c
index dbc1d79..5cc116f 100644
--- a/gcc/config/rs6000/rs6000.c
+++ b/gcc/config/rs6000/rs6000.c
@@ -5285,8 +5285,7 @@ rs6000_loop_align (rtx label)
  || rs6000_tune == PROCESSOR_POWER5
  || rs6000_tune == PROCESSOR_POWER6
  || rs6000_tune == PROCESSOR_POWER7
- || rs6000_tune == PROCESSOR_POWER8
- || rs6000_tune == PROCESSOR_POWER9))
+ || rs6000_tune == PROCESSOR_POWER8))
 return 5;
   else
 return align_loops_log;
-- 
1.8.3.1



Re: [PATCH] [Microblaze]: PIC Data Text Relative

2018-03-05 Thread Michael Eager

On 03/02/2018 08:12 AM, Andrew Sadek wrote:

Hello Michael,

I tried running the whole GCC test suite on the current head (without my 
patch) along with 'microblaze-qemu' but I have the following problems:


1) The test is hanging at 'gcc.c-torture/string-large-1.c' , the gcc is 
making a 100% CPU usage and the machine stucks.
I tried compiling the file alone, it generated a couple of warnings than 
it hangs.
  warning: '__builtin_memchr' specified size 4294967295 exceeds maximum 
object size 2147483647 [-Wstringop-overflow=]

    vp1 = __builtin_memchr (a, b, SIZE1);

Is it a bug? Is there something wrong with my configuration ?
GCC configured with options :  --with-newlib --enable-threads=no 
--disable-shared --with-pkgversion='crosstool-NG 
crosstool-ng-1.23.0-280-g01e3290' --enable-__cxa_atexit 
--disable-libgomp --disable-libmudflap --disable-libmpx --disable-libssp 
--disable-libquadmath --disable-libquadmath-support --enable-lto 
--with-host-libstdcxx='-static-libgcc -Wl,-Bstatic,-lstdc++,-Bdynamic 
-lm' --enable-target-optspace --disable-nls --enable-multiarch 
--enable-languages=c,c++


Your configuration is more complex than my hard-metal target version,
but it looks reasonable.

The problem with string-large-1.c does appear to be a bug.  You can
add a line to the test case which will mark it as known failure for MB:

  /* { dg-xfail-if "exceeds maximum" { microblaze-*-* } } */

(I have not tested this, but it should work.  Compare with other
xfail's.)

2) For running QEMU, I have no problem with elf execution. But I do not 
know how to auto terminate the QEMU itself  as it remains up even after 
program execution.
Is there some command to be passed to QEMU in order make system shut 
down after program termination with its exit code ?


Yes, this is a problem.  For other targets using QEMU I have added dummy
HLT instructions to terminate QEMU, or used a wrapper around QEMU which
sets breakpoints at exit (or _exit) and stops the simulator when hit.

If you are running Linux on QEMU, instead of using QEMU's built-in gdb
interface you might use the Linux system as the target for the test
suite, running gdbserver on the target.




On Tue, Feb 27, 2018 at 10:13 AM, Andrew Sadek 
mailto:andrew.sadek...@gmail.com>> wrote:


Thanks Micheal for your response.
I shall re-submit patches separately after re-running the whole GCC
Test suite and re-checking code conventions.
For sending to gdb-patches, it was a conflict from my side as
actually I thought it is also for binutils.

On Tue, Feb 27, 2018 at 2:07 AM, Michael Eager mailto:ea...@eagerm.com>> wrote:

On 02/25/2018 11:44 PM, Andrew Guirguis wrote:

Dears,

Kindly find attached the patch bundle for Microblaze
'-mpic-data-text-relative' feature.

Description of the feature in the following link:

https://github.com/andrewsadek/microblaze-pic-data-text-rel/blob/pic_data_text_rel/README.md



>

Bundle includes:
1) Change logs for GCC, binutils
2) GCC Test results and comparison with the original.
3) New Test case (picdtr.c)
4) The Patches (against current heads)


Hi Andrew --

Thanks for the submission.  I have the following recommendations:

Submit each patch to the appropriate project mailing list.  Only
submit
the patch for the specific project, without patches for other
projects.

Include a description of the changes with each patch as well as the
changelog.  Include the patch in your email or as an attachment.

It isn't clear why you sent your submission to the gdb-patches
mailing
list, since there don't appear to be any GDB changes. 
Conversely, it is

not clear why you did not include the binutils mailing list,
since you
include a patch to that project.

Be sure to follow GNU coding conventions,  Check brace placement,
indent, maximum line length, if statements, etc.  I noticed a number
of places where these conventions are not followed in your patches.

GCC regression tests should include all tests (e.g., gcc.dg),
not just the limited number of MicroBlaze-specific tests.

-- 
Michael Eager ea...@eagerm.com 

1960 Park Blvd., Palo Alto, CA 94306




-- 


Andrew




--

Andrew


--
Michael Eagerea...@eagerm.com
1960 Park Blvd., Palo Alto, CA 94306


Re: [RFA][PATCH][PR middle-end/61118] Improve tree CFG accuracy for setjmp/longjmp

2018-03-05 Thread Michael Matz
Hi,

On Mon, 5 Mar 2018, Jeff Law wrote:

> >> The single successor test was strictly my paranoia WRT abnormal/EH 
> >> edges.
> >>
> >> I don't immediately see why the CFG would be incorrect if the 
> >> successor of the setjmp block has multiple preds.
> > 
> > Actually, without further conditions I don't see how it would be safe 
> > for the successor to have multiple preds.  We might have this 
> > situation:
> > 
> > bb1: ret = setjmp
> > bb2: x0 = phi 
> No.  Can't happen -- we're still building the initial CFG.  There are no
> PHI nodes, there are no SSA_NAMEs.

While that is currently true I think it's short-sighted.  Thinking about 
the situation in terms of SSA names and PHI nodes clears up the mind.  In 
addition there is already code which builds (sub-)cfgs when SSA form 
exists (gimple_find_sub_bbs).  Currently that code can't ever generate 
setjmp calls, so it's not an issue.

> We have two choices we can either target the setjmp itself, which is
> what we've been doing and is inaccurate.  Or we can target the point
> immediately after the setjmp, which is accurate.

Not precisely, because the setting of the return value of setjmp does 
happen after both returns.  So moving the whole second-return edge target 
to after the setjmp call (when it includes an LHS) is not correct 
(irrespective how the situation in the successor BBs like like).

> > As you noted the second "return" from setjmp is precisely after the setjmp 
> > call itself, i.e. on the edge bb1->bb2.  Simply regarding it as landing at 
> > the start of bb2 it becomes unclear from which edge bb2 was entered and 
> > hence the runtime model of PHI nodes breaks down.
> ?!?Again, we don't have PHIs and we're not simply regarding the
> setjmp as landing at the start of BB2.  We are creating an edge from the
> dispatcher to BB2.

Sure, the dispatcher is in between, but I don't regard it as material for 
the issue at hand: it's really
 ret=setjmp --(ab)-> dispatch --(ab)-> XXX
 any-other-call --(ab)-> dispatch
and the question is what XXX should be.  It should be after setjmp for 
precision, but must be before 'ret='.  I was ignoring the dispatcher and 
just said that setjmp and all calls directly transfer to XXX (and then 
discussed what the XXX may be).

So, even if you chose to ignore SSA names and PHI nodes (which probably is 
fine for gcc8) you still have a problem of ignoring the effect on the LHS.


Ciao,
Michael.


[C++ PATCH] Small performance improvement for constexpr_call_hasher::equal (PR c++/84684)

2018-03-05 Thread Jakub Jelinek
Hi!

This doesn't actually fix this PR (Marek is working on that), but
just something I've noticed while analyzing the PR.
We have the hashes saved in the structure (to speed up hash table
expansion), so it is a waste not to test those also in the equal hook,
by giving up cheaply in cases of hash table collisions.

Additionally, the method returns bool, so this patch uses true/false
instead of 1/0.

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

2018-03-05  Jakub Jelinek  

PR c++/84684
* constexpr.c (constexpr_call_hasher::equal): Return false if
lhs->hash != rhs->hash.  Change return 1 to return true and
return 0 to return false.

--- gcc/cp/constexpr.c.jj   2018-03-05 16:11:08.510165108 +0100
+++ gcc/cp/constexpr.c  2018-03-05 16:14:06.130229884 +0100
@@ -1033,9 +1033,11 @@ constexpr_call_hasher::equal (constexpr_
   tree lhs_bindings;
   tree rhs_bindings;
   if (lhs == rhs)
-return 1;
+return true;
+  if (lhs->hash != rhs->hash)
+return false;
   if (!constexpr_fundef_hasher::equal (lhs->fundef, rhs->fundef))
-return 0;
+return false;
   lhs_bindings = lhs->bindings;
   rhs_bindings = rhs->bindings;
   while (lhs_bindings != NULL && rhs_bindings != NULL)
@@ -1044,7 +1046,7 @@ constexpr_call_hasher::equal (constexpr_
   tree rhs_arg = TREE_VALUE (rhs_bindings);
   gcc_assert (TREE_TYPE (lhs_arg) == TREE_TYPE (rhs_arg));
   if (!cp_tree_equal (lhs_arg, rhs_arg))
-return 0;
+return false;
   lhs_bindings = TREE_CHAIN (lhs_bindings);
   rhs_bindings = TREE_CHAIN (rhs_bindings);
 }

Jakub


[PATCH] Fix reg-stack error-recovery ICE (PR inline-asm/84683)

2018-03-05 Thread Jakub Jelinek
Hi!

If we discover some bad inline-asm during reg-stack processing and we
error on those, we replace that inline-asm with a (use (const_int 0))
and therefore the various assumptions of reg-stack pass may not hold.
Seems we already have a couple of spots which are more permissive if
any_malformed_asm is true, this patch just adds another one.

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

2018-03-05  Jakub Jelinek  

PR inline-asm/84683
* reg-stack.c (move_for_stack_reg): If any_malformed_asm, avoid
assertion failure.

* g++.dg/ext/pr84683.C: New test.

--- gcc/reg-stack.c.jj  2018-01-03 10:19:55.0 +0100
+++ gcc/reg-stack.c 2018-03-05 17:41:15.558415480 +0100
@@ -1170,7 +1170,8 @@ move_for_stack_reg (rtx_insn *insn, stac
  && XINT (SET_SRC (XVECEXP (pat, 0, 1)), 1) == UNSPEC_TAN)
emit_swap_insn (insn, regstack, dest);
   else
-   gcc_assert (get_hard_regnum (regstack, dest) < FIRST_STACK_REG);
+   gcc_assert (get_hard_regnum (regstack, dest) < FIRST_STACK_REG
+   || any_malformed_asm);
 
   gcc_assert (regstack->top < REG_STACK_SIZE);
 
--- gcc/testsuite/g++.dg/ext/pr84683.C.jj   2018-03-05 17:45:32.901475529 
+0100
+++ gcc/testsuite/g++.dg/ext/pr84683.C  2018-03-05 17:44:52.527467872 +0100
@@ -0,0 +1,13 @@
+// PR inline-asm/84683
+// { dg-do compile { target i?86-*-* x86_64-*-* } }
+// { dg-options "-O2" }
+
+void
+foo (float b, double c)
+{
+  for (int e = 0; e < 2; e++)
+{
+  asm volatile ("" : "+f" (c));// { dg-error "must specify a single 
register" }
+  asm ("" : "+rm" (c = b));
+}
+}

Jakub


[PATCH] accept attribute nonstring on all narrow characters (PR 84725)

2018-03-05 Thread Martin Sebor

Attribute nonstring is currently only allowed on arrays and
pointers to plain char, but -Wstringop-truncation triggers
even for strncpy calls whose arguments are arrays of signed
or unsigned char (with or without a cast to char*).

To help deal with -Wstringop-truncation in the Linux kernel
it was suggested to me that it would be useful to be able
to make use of the attribute on all three narrow char types.
Apparently, there are enough calls to strncpy in the Linux
kernel with arguments of the other char types that trigger
the new warning, and the warning is considered sufficiently
useful that making use of the attribute to suppress
the warning rather than changing it ignore the other
two char types is preferable.

The attached patch relaxes the restriction and lets GCC
accept attribute nonstring on all three narrow character
types as well as their qualified forms.

Tested on x86_64-linux.

Martin
PR tree-optimization/84725 - enable attribute nonstring for all narrow character types

gcc/c-family/ChangeLog:

	PR tree-optimization/84725
	* c-attribs.c (handle_nonstring_attribute): Allow attribute nonstring
	with all three narrow character types, including their qualified forms.

gcc/testsuite/ChangeLog:

	PR tree-optimization/84725
	* c-c++-common/Wstringop-truncation-4.c: New test.
	* c-c++-common/attr-nonstring-5.c: New test.

Index: gcc/c-family/c-attribs.c
===
--- gcc/c-family/c-attribs.c	(revision 258259)
+++ gcc/c-family/c-attribs.c	(working copy)
@@ -3194,8 +3194,13 @@ handle_nonstring_attribute (tree *node, tree name,
 
   if (POINTER_TYPE_P (type) || TREE_CODE (type) == ARRAY_TYPE)
 	{
+	  /* Accept the attribute on arrays and pointers to all three
+	 narrow character types.  */
 	  tree eltype = TREE_TYPE (type);
-	  if (eltype == char_type_node)
+	  eltype = TYPE_MAIN_VARIANT (eltype);
+	  if (eltype == char_type_node
+	  || eltype == signed_char_type_node
+	  || eltype == unsigned_char_type_node)
 	return NULL_TREE;
 	}
 
Index: gcc/testsuite/c-c++-common/Wstringop-truncation-4.c
===
--- gcc/testsuite/c-c++-common/Wstringop-truncation-4.c	(nonexistent)
+++ gcc/testsuite/c-c++-common/Wstringop-truncation-4.c	(working copy)
@@ -0,0 +1,125 @@
+/* PR middle-end/84725 - enable attribute nonstring for all narrow character
+   types
+   Verify that -Wstringop-truncation is issued for uses of arrays and
+   pointers to qualified forms of characters of all three types.
+   { dg-do compile }
+   { dg-options "-O -Wall -Wstringop-truncation" } */
+
+#if __cplusplus
+extern "C"
+#endif
+char* strncpy (char*, const char*, __SIZE_TYPE__);
+
+#define S "1234"
+
+struct Arrays
+{
+  char a[4];
+  signed char b[4];
+  unsigned char c[4];
+};
+
+void test_arrays (struct Arrays *p, const char *s)
+{
+  strncpy (p->a, s, sizeof p->a);
+  strncpy ((char*)p->b, s, sizeof p->b);
+  strncpy ((char*)p->c, s, sizeof p->c);
+}
+
+struct Pointers
+{
+  char *p;
+  signed char *q;
+  unsigned char *r;
+};
+
+void test_pointers (struct Pointers *p)
+{
+  strncpy (p->p, S, sizeof S - 1);
+  strncpy ((char*)p->q, S, sizeof S - 1);
+  strncpy ((char*)p->r, S, sizeof S - 1);
+}
+
+struct ConstArrays
+{
+  const char a[4];
+  const signed char b[4];
+  const unsigned char c[4];
+};
+
+void test_const_arrays (struct ConstArrays *p, const char *s)
+{
+  strncpy (p->a, s, sizeof p->a);
+  strncpy ((char*)p->b, s, sizeof p->b);
+  strncpy ((char*)p->c, s, sizeof p->c);
+}
+
+struct ConstPointers
+{
+  const char *p;
+  const signed char *q;
+  const unsigned char *r;
+};
+
+void test_const_pointers (struct ConstPointers *p)
+{
+  strncpy (p->p, S, sizeof S - 1);  /* { dg-warning [\\\[-Wstringop-truncation" } */
+  strncpy ((char*)p->q, S, sizeof S - 1);   /* { dg-warning [\\\[-Wstringop-truncation" } */
+  strncpy ((char*)p->r, S, sizeof S - 1);   /* { dg-warning [\\\[-Wstringop-truncation" } */
+}
+
+struct VolatileArrays
+{
+  volatile char a[4];
+  volatile signed char b[4];
+  volatile unsigned char c[4];
+};
+
+void test_volatile_arrays (struct VolatileArrays *p, const char *s)
+{
+  strncpy (p->a, s, sizeof p->a);   /* { dg-warning [\\\[-Wstringop-truncation" } */
+  strncpy ((char*)p->b, s, sizeof p->b);/* { dg-warning [\\\[-Wstringop-truncation" } */
+  strncpy ((char*)p->c, s, sizeof p->c);/* { dg-warning [\\\[-Wstringop-truncation" } */
+}
+
+struct VolatilePointers
+{
+  volatile char *p;
+  volatile signed char *q;
+  volatile unsigned char *r;
+};
+
+void test_volatile_pointers (struct VolatilePointers *p)
+{
+  strncpy (p->p, S, sizeof S - 1);  /* { dg-warning [\\\[-Wstringop-truncation" } */
+  strncpy ((char*)p->q, S, sizeof S - 1);   /* { dg-warning [\\\[-Wstringop-truncation" } */
+  strncpy ((char*)p->r, S, sizeof S - 1);   /* { dg-warning [\\\[-Wstringop-truncation" } */
+}
+
+struct ConstVolatileArrays
+{
+  const volatile char a[4];
+

[PATCH] Fix C++ ref op= ICE in the gimplifier with -g (PR c++/84704)

2018-03-05 Thread Jakub Jelinek
Hi!

The following testcase ICEs, because ARRAY_REF on lhs of op=
has a statement list with no side-effects as index (containing
DEBUG_BEGIN_STMT and integer_zero_node), stabilize_reference
does nothing to it (as it has no side-effects), then the array ref
is unshared in unshare_body (mostly_copy_tree_r, which copies
ARRAY_REF, but doesn't copy STATEMENT_LIST) and finally gimplifier
when processing the first ARRAY_REF destroys the STATEMENT_LIST
(moves the 0; out of it and voidifies the rest) and finally when gimplifying
the second ARRAY_REF we ICE because the index has void type.

The following patch fixes it by forcing SAVE_EXPR for STATEMENT_LIST
even when it doesn't have side-effects, that way even the DEBUG_BEGIN_STMTs
are emitted just once rather than multiple times and we don't really need
to unshare it specially.

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

2018-03-05  Jakub Jelinek  

PR c++/84704
* tree.c (stabilize_reference_1): Return save_expr (e) for
STATEMENT_LIST even if it doesn't have side-effects.

* g++.dg/debug/pr84704.C: New test.

--- gcc/tree.c.jj   2018-02-22 12:37:02.634387690 +0100
+++ gcc/tree.c  2018-03-05 10:50:54.37537 +0100
@@ -4352,6 +4352,11 @@ stabilize_reference_1 (tree e)
   switch (TREE_CODE_CLASS (code))
 {
 case tcc_exceptional:
+  /* Always wrap STATEMENT_LIST into SAVE_EXPR, even if it doesn't
+have side-effects.  */
+  if (code == STATEMENT_LIST)
+   return save_expr (e);
+  /* FALLTHRU */
 case tcc_type:
 case tcc_declaration:
 case tcc_comparison:
--- gcc/testsuite/g++.dg/debug/pr84704.C.jj 2018-03-05 11:52:37.558715635 
+0100
+++ gcc/testsuite/g++.dg/debug/pr84704.C2018-03-05 11:53:17.794720038 
+0100
@@ -0,0 +1,11 @@
+// PR c++/84704
+// { dg-do compile }
+// { dg-options "-g -fcompare-debug -O2" }
+
+int a[1] = { 0 };
+
+void
+foo ()
+{
+  a[({ 0; })] %= 5;
+}

Jakub


Re: [PATCH] [ARC] Cleanup unused functions.

2018-03-05 Thread Andrew Burgess
* Claudiu Zissulescu  [2018-02-20 15:38:32 +0200]:

> From: Claudiu Zissulescu 
> 
> Cleanup unsed functions and macros.
> 
> OK to apply?
> Claudiu

Looks good.  Nice cleanup.

Thanks,
Andrew


> 
> gcc/
> 2018-01-26  Claudiu Zissulescu  
> 
>   * config/arc/arc.c (arc_finalize_pic): Remove function.
>   (arc_must_save_register): We use single base PIC register, remove
>   checks to save/restore the PIC register.
>   (arc_expand_prologue): Likewise.
>   * config/arc/arc-protos.h (arc_set_default_type_attributes):
>   Remove.
>   (arc_verify_short): Likewise.
>   (arc_attr_type): Likewise.
>   * config/arc/arc.c (arc_set_default_type_attributes): Remove.
>   (walk_stores): Likewise.
>   (arc_address_cost): Make it static.
>   (arc_verify_short): Likewise.
>   (branch_dest): Likewise.
>   (arc_attr_type): Likewise.
>   * config/arc/arc.c (TARGET_ADJUST_INSN_LENGTH): Remove.
>   (TARGET_INSN_LENGTH_PARAMETERS): Likewise.
>   (arc_final_prescan_insn): Remove inserting the nops due to
>   hardware hazards.  It is done in reorg step.
>   (insn_length_variant_t): Remove.
>   (insn_length_parameters_t): Likewise.
>   (arc_insn_length_parameters): Likewise.
>   (arc_get_insn_variants): Likewise.
>   * config/arc/arc.h (TARGET_UPSIZE_DBR): Remove.
> ---
>  gcc/config/arc/arc-protos.h |   3 -
>  gcc/config/arc/arc.c| 471 
> +---
>  gcc/config/arc/arc.h|   3 -
>  3 files changed, 4 insertions(+), 473 deletions(-)
> 
> diff --git a/gcc/config/arc/arc-protos.h b/gcc/config/arc/arc-protos.h
> index b469cfc4c2c..75cfedab7f1 100644
> --- a/gcc/config/arc/arc-protos.h
> +++ b/gcc/config/arc/arc-protos.h
> @@ -32,7 +32,6 @@ extern bool arc_double_limm_p (rtx);
>  extern void arc_print_operand (FILE *, rtx, int);
>  extern void arc_print_operand_address (FILE *, rtx);
>  extern void arc_final_prescan_insn (rtx_insn *, rtx *, int);
> -extern void arc_set_default_type_attributes(tree type);
>  extern const char *arc_output_libcall (const char *);
>  extern bool prepare_extend_operands (rtx *operands, enum rtx_code code,
>machine_mode omode);
> @@ -97,10 +96,8 @@ extern void split_addsi (rtx *);
>  extern void split_subsi (rtx *);
>  extern void arc_pad_return (void);
>  extern void arc_split_move (rtx *);
> -extern int arc_verify_short (rtx_insn *insn, int unalign, int);
>  extern const char *arc_short_long (rtx_insn *insn, const char *, const char 
> *);
>  extern rtx arc_regno_use_in (unsigned int, rtx);
> -extern int arc_attr_type (rtx_insn *);
>  extern bool arc_scheduling_not_expected (void);
>  extern bool arc_sets_cc_p (rtx_insn *insn);
>  extern int arc_label_align (rtx_insn *label);
> diff --git a/gcc/config/arc/arc.c b/gcc/config/arc/arc.c
> index f572d6f4883..74007d8aba3 100644
> --- a/gcc/config/arc/arc.c
> +++ b/gcc/config/arc/arc.c
> @@ -473,8 +473,6 @@ static void arc_function_arg_advance (cumulative_args_t, 
> machine_mode,
> const_tree, bool);
>  static rtx arc_legitimize_address_0 (rtx, rtx, machine_mode mode);
>  
> -static void arc_finalize_pic (void);
> -
>  /* initialize the GCC target structure.  */
>  #undef  TARGET_COMP_TYPE_ATTRIBUTES
>  #define TARGET_COMP_TYPE_ATTRIBUTES arc_comp_type_attributes
> @@ -612,10 +610,6 @@ static void arc_finalize_pic (void);
>  
>  #define TARGET_LEGITIMIZE_ADDRESS arc_legitimize_address
>  
> -#define TARGET_ADJUST_INSN_LENGTH arc_adjust_insn_length
> -
> -#define TARGET_INSN_LENGTH_PARAMETERS arc_insn_length_parameters
> -
>  #undef TARGET_NO_SPECULATION_IN_DELAY_SLOTS_P
>  #define TARGET_NO_SPECULATION_IN_DELAY_SLOTS_P   \
>arc_no_speculation_in_delay_slots_p
> @@ -2131,14 +2125,6 @@ arc_comp_type_attributes (const_tree type1,
>return 1;
>  }
>  
> -/* Set the default attributes for TYPE.  */
> -
> -void
> -arc_set_default_type_attributes (tree type ATTRIBUTE_UNUSED)
> -{
> -  gcc_unreachable();
> -}
> -
>  /* Misc. utilities.  */
>  
>  /* X and Y are two things to compare using CODE.  Emit the compare insn and
> @@ -2368,7 +2354,7 @@ arc_setup_incoming_varargs (cumulative_args_t 
> args_so_far,
>  /* Provide the costs of an addressing mode that contains ADDR.
> If ADDR is not a valid address, its cost is irrelevant.  */
>  
> -int
> +static int
>  arc_address_cost (rtx addr, machine_mode, addr_space_t, bool speed)
>  {
>switch (GET_CODE (addr))
> @@ -2703,10 +2689,6 @@ arc_must_save_register (int regno, struct function 
> *func)
>&& !firq_auto_save_p)
>  return true;
>  
> -  if (flag_pic && crtl->uses_pic_offset_table
> -  && regno == PIC_OFFSET_TABLE_REGNUM)
> -return true;
> -
>return false;
>  }
>  
> @@ -3296,10 +3278,6 @@ arc_expand_prologue (void)
>   emit_insn (gen_stack_tie (stack_pointer_rtx,
> hard_frame_pointer_rtx));
>  }
> -
> -  /* Setup the gp registe

Re: [C++ PATCH] Small performance improvement for constexpr_call_hasher::equal (PR c++/84684)

2018-03-05 Thread Jason Merrill
OK.

On Mon, Mar 5, 2018 at 3:31 PM, Jakub Jelinek  wrote:
> Hi!
>
> This doesn't actually fix this PR (Marek is working on that), but
> just something I've noticed while analyzing the PR.
> We have the hashes saved in the structure (to speed up hash table
> expansion), so it is a waste not to test those also in the equal hook,
> by giving up cheaply in cases of hash table collisions.
>
> Additionally, the method returns bool, so this patch uses true/false
> instead of 1/0.
>
> Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?
>
> 2018-03-05  Jakub Jelinek  
>
> PR c++/84684
> * constexpr.c (constexpr_call_hasher::equal): Return false if
> lhs->hash != rhs->hash.  Change return 1 to return true and
> return 0 to return false.
>
> --- gcc/cp/constexpr.c.jj   2018-03-05 16:11:08.510165108 +0100
> +++ gcc/cp/constexpr.c  2018-03-05 16:14:06.130229884 +0100
> @@ -1033,9 +1033,11 @@ constexpr_call_hasher::equal (constexpr_
>tree lhs_bindings;
>tree rhs_bindings;
>if (lhs == rhs)
> -return 1;
> +return true;
> +  if (lhs->hash != rhs->hash)
> +return false;
>if (!constexpr_fundef_hasher::equal (lhs->fundef, rhs->fundef))
> -return 0;
> +return false;
>lhs_bindings = lhs->bindings;
>rhs_bindings = rhs->bindings;
>while (lhs_bindings != NULL && rhs_bindings != NULL)
> @@ -1044,7 +1046,7 @@ constexpr_call_hasher::equal (constexpr_
>tree rhs_arg = TREE_VALUE (rhs_bindings);
>gcc_assert (TREE_TYPE (lhs_arg) == TREE_TYPE (rhs_arg));
>if (!cp_tree_equal (lhs_arg, rhs_arg))
> -return 0;
> +return false;
>lhs_bindings = TREE_CHAIN (lhs_bindings);
>rhs_bindings = TREE_CHAIN (rhs_bindings);
>  }
>
> Jakub


C++ PATCH for c++/84708, ICE with lambda in local class NSDMI

2018-03-05 Thread Jason Merrill
Looking for fake NSDMI 'this' in scope_chain only works for non-local
classes.  But if we can't find it, we can make our own fake.

Tested x86_64-pc-linux-gnu, applying to trunk.
commit 8fd062c984befc3419f8a0a61224d0f4ecf02495
Author: Jason Merrill 
Date:   Mon Mar 5 16:45:36 2018 -0500

PR c++/84708 - ICE with lambda in local class NSDMI.

* lambda.c (lambda_expr_this_capture): Handle local class NSDMI
context.

diff --git a/gcc/cp/lambda.c b/gcc/cp/lambda.c
index 3f77df037a2..345b210e89c 100644
--- a/gcc/cp/lambda.c
+++ b/gcc/cp/lambda.c
@@ -757,25 +757,31 @@ lambda_expr_this_capture (tree lambda, bool add_capture_p)
 tlambda,
 lambda_stack);
 
-	  if (LAMBDA_EXPR_EXTRA_SCOPE (tlambda)
-	  && !COMPLETE_TYPE_P (LAMBDA_EXPR_CLOSURE (tlambda))
-	  && TREE_CODE (LAMBDA_EXPR_EXTRA_SCOPE (tlambda)) == FIELD_DECL)
+	  tree closure = LAMBDA_EXPR_CLOSURE (tlambda);
+	  tree containing_function
+	= decl_function_context (TYPE_NAME (closure));
+
+	  tree ex = LAMBDA_EXPR_EXTRA_SCOPE (tlambda);
+	  if (ex && TREE_CODE (ex) == FIELD_DECL)
 	{
-	  /* In an NSDMI, we don't have a function to look up the decl in,
-		 but the fake 'this' pointer that we're using for parsing is
-		 in scope_chain.  But if the closure is already complete, we're
-	 in an instantiation of a generic lambda, and the fake 'this'
-	 is gone.  */
-	  init = scope_chain->x_current_class_ptr;
+	  /* Lambda in an NSDMI.  We don't have a function to look up
+		 'this' in, but we can find (or rebuild) the fake one from
+		 inject_this_parameter.  */
+	  if (!containing_function && !COMPLETE_TYPE_P (closure))
+		/* If we're parsing a lambda in a non-local class,
+		   we can find the fake 'this' in scope_chain.  */
+		init = scope_chain->x_current_class_ptr;
+	  else
+		/* Otherwise it's either gone or buried in
+		   function_context_stack, so make another.  */
+		init = build_this_parm (NULL_TREE, DECL_CONTEXT (ex),
+	TYPE_UNQUALIFIED);
 	  gcc_checking_assert
 		(init && (TREE_TYPE (TREE_TYPE (init))
 			  == current_nonlambda_class_type ()));
 	  break;
 	}
 
-	  tree closure_decl = TYPE_NAME (LAMBDA_EXPR_CLOSURE (tlambda));
-	  tree containing_function = decl_function_context (closure_decl);
-
 	  if (containing_function == NULL_TREE)
 	/* We ran out of scopes; there's no 'this' to capture.  */
 	break;
diff --git a/gcc/testsuite/g++.dg/cpp0x/lambda/lambda-nsdmi9.C b/gcc/testsuite/g++.dg/cpp0x/lambda/lambda-nsdmi9.C
new file mode 100644
index 000..fe64f459479
--- /dev/null
+++ b/gcc/testsuite/g++.dg/cpp0x/lambda/lambda-nsdmi9.C
@@ -0,0 +1,15 @@
+// PR c++/84708
+// { dg-do run { target c++14 } }
+
+int main()
+{
+  struct Z
+  {
+int i;
+int b = ([&] { return i; }());
+Z(int i): i(i) {}
+  } z (42);
+
+  if (z.b != 42)
+__builtin_abort ();
+}


[PATCH] Fix ICE with exp.simdclone.0 (PR tree-optimization/84687)

2018-03-05 Thread Jakub Jelinek
Hi!

This patch clears DECL_BUILT_IN on simd clones, similarly how cgraphclones.c
does:
  /* When signature changes, we need to clear builtin info.  */
  if (DECL_BUILT_IN (new_decl)
  && args_to_skip
  && !bitmap_empty_p (args_to_skip))
{
  DECL_BUILT_IN_CLASS (new_decl) = NOT_BUILT_IN;
  DECL_FUNCTION_CODE (new_decl) = (enum built_in_function) 0;
}
because simd clones are always signature changes (would be nice if we had
some way to optimize these later, but seems it wouldn't be easy),
and in order not to regress optimization-wise, also an early optimization
in match.pd - we have this now deferred till late folding of pow(C,x)
to exp(log(C)*x), and also exp(x)*exp(y) folding to exp(x+y), if we
already have one exp (or exp2 or exp10 etc.) call in the multiplication,
there is no reason to defer it any longer, all we do is trade the one
pow call and one exp{,2,10} call and one multiplication for that
exp{,2,10} call plus one multiplication and one addition, the addition
surely will be less expensive than pow and I think precision should not be
that bad either (and it is -ffast-math guarded anyway).

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

2018-03-05  Jakub Jelinek  

PR tree-optimization/84687
* omp-simd-clone.c (simd_clone_create): Clear DECL_BUILT_IN_CLASS
on new_node->decl.
* match.pd (pow(C,x)*expN(y) -> expN(logN(C)*x+y)): New optimization.

* gcc.dg/pr84687.c: New test.

--- gcc/omp-simd-clone.c.jj 2018-02-13 09:33:31.107560174 +0100
+++ gcc/omp-simd-clone.c2018-03-05 16:47:56.943365091 +0100
@@ -456,6 +456,8 @@ simd_clone_create (struct cgraph_node *o
   if (new_node == NULL)
 return new_node;
 
+  DECL_BUILT_IN_CLASS (new_node->decl) = NOT_BUILT_IN;
+  DECL_FUNCTION_CODE (new_node->decl) = (enum built_in_function) 0;
   TREE_PUBLIC (new_node->decl) = TREE_PUBLIC (old_node->decl);
   DECL_COMDAT (new_node->decl) = DECL_COMDAT (old_node->decl);
   DECL_WEAK (new_node->decl) = DECL_WEAK (old_node->decl);
--- gcc/match.pd.jj 2018-02-20 14:55:28.988215213 +0100
+++ gcc/match.pd2018-03-05 16:52:11.486487079 +0100
@@ -4030,6 +4030,16 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
   (exps (mult (logs @0) @1))
   (exp2s (mult (log2s @0) @1)))
 
+ /* pow(C,x)*expN(y) -> expN(logN(C)*x+y) if C > 0.  */
+ (for pows (POW)
+  exps (EXP EXP2 EXP10 POW10)
+  logs (LOG LOG2 LOG10 LOG10)
+  (simplify
+   (mult:c (pows:s REAL_CST@0 @1) (exps:s @2))
+   (if (real_compare (GT_EXPR, TREE_REAL_CST_PTR (@0), &dconst0)
+   && real_isfinite (TREE_REAL_CST_PTR (@0)))
+(exps (plus (mult (logs @0) @1) @2)
+
  (for sqrts (SQRT)
   cbrts (CBRT)
   pows (POW)
--- gcc/testsuite/gcc.dg/pr84687.c.jj   2018-03-05 16:45:57.020307612 +0100
+++ gcc/testsuite/gcc.dg/pr84687.c  2018-03-05 16:45:41.977300398 +0100
@@ -0,0 +1,19 @@
+/* PR tree-optimization/84687 */
+/* { dg-do compile } */
+/* { dg-options "-Ofast" } */
+
+int a[64], b;
+double pow (double, double);
+__attribute__((__simd__)) double exp (double);
+
+void
+foo (double x)
+{
+  int i;
+  double c = exp (x);
+  for (i = 0; i < 64; i++)
+{
+  b = i;
+  a[i] = pow (12.0, b) * pow (c, i);
+}
+}

Jakub


Re: [PATCH] replace ICE with error for failed template deduction (PR 84355)

2018-03-05 Thread Martin Sebor

On 02/23/2018 07:32 PM, Jason Merrill wrote:

On Sun, Feb 18, 2018 at 11:39 PM, Jason Merrill  wrote:

On Fri, Feb 16, 2018 at 4:33 PM, Martin Sebor  wrote:

On 02/16/2018 07:04 AM, Jason Merrill wrote:


On Thu, Feb 15, 2018 at 6:36 PM, Martin Sebor  wrote:


A failed template deduction in template member of a template
triggers an ICE with -std=c++17 due to what seems like
a missing handling of invalid input.  Replacing
the gcc_unreachable() call that causes the ICE with a return
statement indicating the deduction failure eliminates the ICE
and restores sane diagnostics.



Hmm, we really shouldn't have gotten there; that assert is checking
that when we see a TEMPLATE_*_PARM node in the template signature, it
corresponds to one of the actual parms of the template.  Sounds like
something is going wrong in build_deduction_guide.



Are you suggesting that build_deduction_guide should fail somehow
(it's not expected to fail right now) or that the guide it creates
is wrong?


The latter.  Maybe we're handling T wrong somehow?  We shouldn't be
trying to deduce it.  In fact, we probably shouldn't be trying to
deduce arguments for 'b' until we instantiate A.


Looks like the problem is that when we substitute into the
TEMPLATE_TYPE_PARM representing 'B' in the function, we don't touch
CLASS_PLACEHOLDER_TEMPLATE:

else if (tree pl = CLASS_PLACEHOLDER_TEMPLATE (t))
  {
if (DECL_TEMPLATE_TEMPLATE_PARM_P (pl))
  pl = tsubst (pl, args, complain, in_decl);
CLASS_PLACEHOLDER_TEMPLATE (r) = pl;
  }

This code is failing to replace A::B with A::B.


I don't know what to do here/what you're suggesting.  Before
the call to tsubst() pl is a TEMPLATE_DECL of struct A::B.
Calling tsubst() works and replaces the ICE with a reasonable
error but then causes an ICE in cpp1z/class-deduction19.C.
There, pl is also TEMPLATE_DECL, and I'm not sure how to
differentiate between the two at this level.

I was hoping I could fix this ICE quickly but I've spent too
much time on it with no progress so unless I'm just being
dense by missing your hint I think I'm going to have to give
up on this bug.

Martin


[committed] Fix infinite recursion in combine_simplify_rtx (PR target/84700)

2018-03-05 Thread Jakub Jelinek
Hi!

On the following testcase on powerpc64-linux -m32 we end up calling
combine_simplify_rtx on
(plus:SI (ltu:SI (plus:SI (if_then_else:SI (eq (reg:CC 147)
(const_int 0 [0]))
(subreg:SI (reg:DI 126) 4)
(reg:SI 146))
(subreg:SI (reg:DI 126) 4))
(if_then_else:SI (eq (reg:CC 147)
(const_int 0 [0]))
(subreg:SI (reg:DI 126) 4)
(reg:SI 146)))
(reg:SI 133 [ iftmp.0_2 ]))
and if_then_else_cond returns non-NULL on it, but neither false_rtx nor
true_rtx are comparisons and false_rtx is identical to x, so we subst
the false_rtx again with pc_rtx, pc_rtx and call combine_simplify_rtx with
different copy of the same expression and recurse that way forever.

Fixed thusly, bootstrapped/regtested on {x86_64,i686,powerpc64{,le}}-linux,
on powerpc64-linux including -m32, preapproved by Segher in the PR,
committed to trunk.

2018-03-05  Jakub Jelinek  

PR target/84700
* combine.c (combine_simplify_rtx): Don't try to simplify if
if_then_else_cond returned non-NULL, but either true_rtx or false_rtx
are equal to x.

* gcc.target/powerpc/pr84700.c: New test.

--- gcc/combine.c.jj2018-03-02 10:15:05.0 +0100
+++ gcc/combine.c   2018-03-05 18:34:31.135061249 +0100
@@ -5734,7 +5734,11 @@ combine_simplify_rtx (rtx x, machine_mod
  /* If everything is a comparison, what we have is highly unlikely
 to be simpler, so don't use it.  */
  && ! (COMPARISON_P (x)
-   && (COMPARISON_P (true_rtx) || COMPARISON_P (false_rtx
+   && (COMPARISON_P (true_rtx) || COMPARISON_P (false_rtx)))
+ /* Similarly, if we end up with one of the expressions the same
+as the original, it is certainly not simpler.  */
+ && ! rtx_equal_p (x, true_rtx)
+ && ! rtx_equal_p (x, false_rtx))
{
  rtx cop1 = const0_rtx;
  enum rtx_code cond_code = simplify_comparison (NE, &cond, &cop1);
--- gcc/testsuite/gcc.target/powerpc/pr84700.c.jj   2018-03-05 
18:39:48.075184332 +0100
+++ gcc/testsuite/gcc.target/powerpc/pr84700.c  2018-03-05 18:40:37.065205545 
+0100
@@ -0,0 +1,12 @@
+/* PR target/84700 */
+/* { dg-do compile } */
+/* { dg-options "-O1 -misel" } */
+
+long long int
+foo (long long int x)
+{
+  long long int a = x < 2;
+  int b = a >= 0;
+
+  return a + ((x == 0) ? a : b);
+}

Jakub


Re: [RFA][PATCH][PR middle-end/61118] Improve tree CFG accuracy for setjmp/longjmp

2018-03-05 Thread Jeff Law
On 03/05/2018 12:30 PM, Michael Matz wrote:
> Hi,
> 
> On Mon, 5 Mar 2018, Jeff Law wrote:
> 
 The single successor test was strictly my paranoia WRT abnormal/EH 
 edges.

 I don't immediately see why the CFG would be incorrect if the 
 successor of the setjmp block has multiple preds.
>>>
>>> Actually, without further conditions I don't see how it would be safe 
>>> for the successor to have multiple preds.  We might have this 
>>> situation:
>>>
>>> bb1: ret = setjmp
>>> bb2: x0 = phi 
>> No.  Can't happen -- we're still building the initial CFG.  There are no
>> PHI nodes, there are no SSA_NAMEs.
> 
> While that is currently true I think it's short-sighted.  Thinking about 
> the situation in terms of SSA names and PHI nodes clears up the mind.  In 
> addition there is already code which builds (sub-)cfgs when SSA form 
> exists (gimple_find_sub_bbs).  Currently that code can't ever generate 
> setjmp calls, so it's not an issue.
It's not clearing up anything for me.  Clearly you're onto something
that I'm missing, but still trying to figure out.

Certainly we have to be careful WRT the implicit set of the return value
of the setjmp call that occurs on the longjmp path.  That's worth
investigating.  I suspect that works today more by accident of having an
incorrect CFG than by design.


> 
>> We have two choices we can either target the setjmp itself, which is
>> what we've been doing and is inaccurate.  Or we can target the point
>> immediately after the setjmp, which is accurate.
> 
> Not precisely, because the setting of the return value of setjmp does 
> happen after both returns.  So moving the whole second-return edge target 
> to after the setjmp call (when it includes an LHS) is not correct 
> (irrespective how the situation in the successor BBs like like).
But it does or at least it should.  It's implicitly set on the longjmp
side.  If we get this wrong I'd expect we'll see uninit uses in the PHI.
 That's easy enough to instrument and check for.

This aspect of setjmp/longjmp is, in some ways, easier to see in RTL
because the call returns its value in a hard reg which is implicitly set
by the longjmp and we immediately copy it into a pseudo.   Which would
magically DTRT if we had the longjmp edge target the point just after
the setjmp in RTL.




Jeff


Re: [PATCH] Fix C++ ref op= ICE in the gimplifier with -g (PR c++/84704)

2018-03-05 Thread Alexandre Oliva
On Mar  5, 2018, Jakub Jelinek  wrote:

>   * tree.c (stabilize_reference_1): Return save_expr (e) for
>   STATEMENT_LIST even if it doesn't have side-effects.

I'm concerned about codegen differences in case the statement list is
created because of debug stmts not present in a nondebug compile.
Wouldn't this make the nondebug compile not have a save_expr at all,
which would then likely cause different code to be generated with the
save_expr in one case and no save_expr in the other?  I guess the
compare-debug would have flagged any such differences, so I suppose my
concerns are unfounded, and in case they aren't, we'll soon find out ;-)
So, no objections from me; thanks for looking into this!

-- 
Alexandre Oliva, freedom fighterhttp://FSFLA.org/~lxoliva/
You must be the change you wish to see in the world. -- Gandhi
Be Free! -- http://FSFLA.org/   FSF Latin America board member
Free Software Evangelist|Red Hat Brasil GNU Toolchain Engineer


Re: [PR c++/84593] ice on braced init with uninit ref field

2018-03-05 Thread Alexandre Oliva
On Mar  2, 2018, Jason Merrill  wrote:

> On Fri, Mar 2, 2018 at 2:57 AM, Alexandre Oliva  wrote:
>> +  gcc_assert (TREE_CODE (type) == REFERENCE_TYPE);
>> +  init = fold (convert (type, integer_zero_node));

> Maybe build_zero_cst?

Sure.


I wonder, is there any reason to not change any of these to use
build_zero_cst, too?

  else if (TYPE_PTR_OR_PTRMEM_P (type))
init = fold (convert (type, nullptr_node));
  else if (SCALAR_TYPE_P (type))
init = fold (convert (type, integer_zero_node));

I suppose pointers to members might need an explicit conversion, which
build_zero_cst might fallback to if it doesn't consider them aggregate
types.  As for scalar types, are there any C++-specific scalar types
that build_zero_cst wouldn't know how to deal with?  Anyway, it's
probably not the time to change these, if it doesn't fix a regression.
Just curious.

-- 
Alexandre Oliva, freedom fighterhttp://FSFLA.org/~lxoliva/
You must be the change you wish to see in the world. -- Gandhi
Be Free! -- http://FSFLA.org/   FSF Latin America board member
Free Software Evangelist|Red Hat Brasil GNU Toolchain Engineer


Re: [PR84620] output symbolic entry_view as data2, not addr

2018-03-05 Thread Alexandre Oliva
On Mar  2, 2018, Alexandre Oliva  wrote:

> Mark Wielaard is implementing support for LVU and IEPM in elfutils, and
> he was surprised by the encoding of DW_AT_GNU_entry_view; so was I!
> When GCC computes and outputs views internally (broken without internal
> view resets), it outputs entry_view attributes as data: it knows the
> view numbers.  However, when it uses the assembler to compute the views
> symbolically, it has to output the view symbolically.

> I'd chosen lbl_id to that end, without giving it much thought, but that
> was no good: it's not just space-inefficient, it uses an addr encoding,
> indirect in some cases, for something that's not a real label, but
> rather a a small assembler data constant.  Oops.

Jakub wrote (in the BZ):

> I meant to say:
> The char * GTY ((tag ("dw_val_class_symview"))) val_symbolic_view; line 
> should come at the and of the union, not before the other classes.

I've moved the union field down in the revised patch below, but I don't
see the point, and I thought it would be better to keep it close to
logically-similar entries.  If the point is just to make it parallel to
the order of the enum (which manh other entries don't seem to have cared
about), maybe moving the enum would be better?


> What guarantees the symview symbols always fit into 16 bits?  Does 
> somethingpunt if it needs more than 65536 views?

There's no guarantee it will fit in 16 bits; the assembler will warn if
it truncates a view number.  There's no real upper limit, so uleb128
would be ideal, but since that's not viable ATM, I had to pick
something, and 16 bits would cover all really useful cases than 32 or
even 64 bits would, while being more compact.  I was even tempted to go
with 8 bits, but thought that was pushing it.  I can make it 32 if you
consider it better.  Or maybe a param?


> The FIXMEs don't really look helpful, we are not going to change the 
> offset computation from compile time to assemble time, that would be 
> terribly expensive.

Why do you say it would be terribly expensive to let the assembler compute
offsets in .debug_info?


[IEPM] [PR debug/84620] use constant form for DW_AT_GNU_entry_view

When outputting entry views in symbolic mode, we used to use a lbl_id,
but that outputs the view as an addr, perhaps even in an indirect one,
which is all excessive and undesirable for a small assembler-computed
constant.

Introduce a new value class for symbolic views, so that we can output
the labels as constant data, using small forms that should suffice for
any symbolic views.

Ideally, we'd use uleb128, but then the compiler would have to defer
.debug_info offset computation to the assembler.  I'm not going there
for now, so a symbolic uleb128 assembler constant in an attribute is
not something GCC can deal with ATM.

for  gcc/ChangeLog

PR debug/84620
* dwarf2out.h (dw_val_class): Add dw_val_class_symview.
(dw_val_node): Add val_symbolic_view.
* dwarf2out.c (dw_val_equal_p): Handle symview.
(add_AT_symview): New.
(print_dw_val): Handle symview.
(attr_checksum, attr_checksum_ordered): Likewise.
(same_dw_val_p, size_of_die): Likewise.
(value_format, output_die): Likewise.
(add_high_low_attributes): Use add_AT_symview for entry_view.
---
 gcc/dwarf2out.c |   40 +++-
 gcc/dwarf2out.h |4 +++-
 2 files changed, 42 insertions(+), 2 deletions(-)

diff --git a/gcc/dwarf2out.c b/gcc/dwarf2out.c
index 41bb11558f69..b52edee845f2 100644
--- a/gcc/dwarf2out.c
+++ b/gcc/dwarf2out.c
@@ -1434,6 +1434,8 @@ dw_val_equal_p (dw_val_node *a, dw_val_node *b)
   return a->v.val_die_ref.die == b->v.val_die_ref.die;
 case dw_val_class_fde_ref:
   return a->v.val_fde_index == b->v.val_fde_index;
+case dw_val_class_symview:
+  return strcmp (a->v.val_symbolic_view, b->v.val_symbolic_view) == 0;
 case dw_val_class_lbl_id:
 case dw_val_class_lineptr:
 case dw_val_class_macptr:
@@ -3600,6 +3602,7 @@ static addr_table_entry *add_addr_table_entry (void *, 
enum ate_kind);
 static void remove_addr_table_entry (addr_table_entry *);
 static void add_AT_addr (dw_die_ref, enum dwarf_attribute, rtx, bool);
 static inline rtx AT_addr (dw_attr_node *);
+static void add_AT_symview (dw_die_ref, enum dwarf_attribute, const char *);
 static void add_AT_lbl_id (dw_die_ref, enum dwarf_attribute, const char *);
 static void add_AT_lineptr (dw_die_ref, enum dwarf_attribute, const char *);
 static void add_AT_macptr (dw_die_ref, enum dwarf_attribute, const char *);
@@ -5114,6 +5117,21 @@ add_AT_vms_delta (dw_die_ref die, enum dwarf_attribute 
attr_kind,
   add_dwarf_attr (die, &attr);
 }
 
+/* Add a symbolic view identifier attribute value to a DIE.  */
+
+static inline void
+add_AT_symview (dw_die_ref die, enum dwarf_attribute attr_kind,
+   const char *view_label)
+{
+  dw_attr_node attr;
+
+  attr.dw_attr = attr_kind;
+  attr.dw_attr_val.val_class = dw_v

Re: [C++] [PR84231] overload on cond_expr in template

2018-03-05 Thread Alexandre Oliva
On Mar  2, 2018, Jason Merrill  wrote:

> Let's add a comment along the lines of

> /* Let lvalue_kind know this was a glvalue.  */

> OK with that change.

Thanks, here's what I'm about to check in.

[C++] [PR84231] overload on cond_expr in template

A non-type-dependent COND_EXPR within a template is reconstructed with
the original operands, after one with non-dependent proxies is built to
determine its result type.  This is problematic because the operands of
a COND_EXPR determined to be an rvalue may have been converted to denote
their rvalue nature.  The reconstructed one, however, won't have such
conversions, so lvalue_kind may not recognize it as an rvalue, which may
lead to e.g. incorrect overload resolution decisions.

If we mistake such a COND_EXPR for an lvalue, overload resolution might
regard a conversion sequence that binds it to a non-const reference as
viable, and then select that over one that binds it to a const
reference.  Only after template substitution would we rebuild the
COND_EXPR, realize it is an rvalue, and conclude the reference binding
is ill-formed, but at that point we'd have long discarded any alternate
candidates we could have used.

This patch modifies the logic that determines whether a
(non-type-dependent) COND_EXPR in a template is an lvalue, to rely on
its type, more specifically, on the presence of a REFERENCE_TYPE
wrapper.  In order to avoid a type bootstrapping problem, the
REFERENCE_TYPE that wraps the type of some such COND_EXPRs is
introduced earlier, so that we don't have to test for lvalueness of
the expression using the very code that we wish to change.


for  gcc/cp/ChangeLog

PR c++/84231
* tree.c (lvalue_kind): Use presence/absence of REFERENCE_TYPE
only while processing template decls.
* typeck.c (build_x_conditional_expr): Move wrapping of
reference type around type...
* call.c (build_conditional_expr_1): ... here.  Rename
is_lvalue to is_glvalue.
* parser.c (cp_parser_fold_expression): Catch REFERENCE_REF_P
INDIRECT_REF of COND_EXPR too.

for  gcc/testsuite/ChangeLog

PR c++/84231
* g++.dg/pr84231.C: New.
---
 gcc/cp/call.c  |   12 
 gcc/cp/parser.c|4 +++-
 gcc/cp/tree.c  |   15 +++
 gcc/cp/typeck.c|4 
 gcc/testsuite/g++.dg/pr84231.C |   29 +
 5 files changed, 55 insertions(+), 9 deletions(-)
 create mode 100644 gcc/testsuite/g++.dg/pr84231.C

diff --git a/gcc/cp/call.c b/gcc/cp/call.c
index 11fe28292fb1..f83d51f3457e 100644
--- a/gcc/cp/call.c
+++ b/gcc/cp/call.c
@@ -4782,7 +4782,7 @@ build_conditional_expr_1 (location_t loc, tree arg1, tree 
arg2, tree arg3,
   tree arg3_type;
   tree result = NULL_TREE;
   tree result_type = NULL_TREE;
-  bool is_lvalue = true;
+  bool is_glvalue = true;
   struct z_candidate *candidates = 0;
   struct z_candidate *cand;
   void *p;
@@ -5037,7 +5037,7 @@ build_conditional_expr_1 (location_t loc, tree arg1, tree 
arg2, tree arg3,
  return error_mark_node;
}
 
-  is_lvalue = false;
+  is_glvalue = false;
   goto valid_operands;
 }
   /* [expr.cond]
@@ -5155,6 +5155,10 @@ build_conditional_expr_1 (location_t loc, tree arg1, 
tree arg2, tree arg3,
   && same_type_p (arg2_type, arg3_type))
 {
   result_type = arg2_type;
+  if (processing_template_decl)
+   /* Let lvalue_kind know this was a glvalue.  */
+   result_type = cp_build_reference_type (result_type, xvalue_p (arg2));
+
   arg2 = mark_lvalue_use (arg2);
   arg3 = mark_lvalue_use (arg3);
   goto valid_operands;
@@ -5167,7 +5171,7 @@ build_conditional_expr_1 (location_t loc, tree arg1, tree 
arg2, tree arg3,
  cv-qualified) class type, overload resolution is used to
  determine the conversions (if any) to be applied to the operands
  (_over.match.oper_, _over.built_).  */
-  is_lvalue = false;
+  is_glvalue = false;
   if (!same_type_p (arg2_type, arg3_type)
   && (CLASS_TYPE_P (arg2_type) || CLASS_TYPE_P (arg3_type)))
 {
@@ -5361,7 +5365,7 @@ build_conditional_expr_1 (location_t loc, tree arg1, tree 
arg2, tree arg3,
   /* We can't use result_type below, as fold might have returned a
  throw_expr.  */
 
-  if (!is_lvalue)
+  if (!is_glvalue)
 {
   /* Expand both sides into the same slot, hopefully the target of
 the ?: expression.  We used to check for TARGET_EXPRs here,
diff --git a/gcc/cp/parser.c b/gcc/cp/parser.c
index e1acb07d29ef..f21257f41e7b 100644
--- a/gcc/cp/parser.c
+++ b/gcc/cp/parser.c
@@ -4963,7 +4963,9 @@ cp_parser_fold_expression (cp_parser *parser, tree expr1)
   else if (is_binary_op (TREE_CODE (expr1)))
 error_at (location_of (expr1),
  "binary expression in operand of fold-expression");
-  else if (TREE_CODE (expr1) == COND_EXPR)
+  else if (TREE_CODE (expr1) == COND_EXPR
+  || (REFERENCE_REF_P (expr1)
+

Re: [PR c++/84593] ice on braced init with uninit ref field

2018-03-05 Thread Alexandre Oliva
On Mar  2, 2018, Jason Merrill  wrote:

>> +  gcc_assert (TREE_CODE (type) == REFERENCE_TYPE);
>> +  init = fold (convert (type, integer_zero_node));

> Maybe build_zero_cst?

> OK either way.

Here's what I'm installing:

[PR c++/84593] ice on braced init with uninit ref field

If an initializer expr is to be NULL in a ctor initializer list, we
ICE in picflag_from_initializer and elsewhere.

If we're missing an initializer for a reference field, we report the
error, but then build a zero initializer to avoid the ICE.

for  gcc/cp/ChangeLog

PR c++/84593
* init.c (build_zero_init_1): Zero-initialize references.

for  gcc/testsuite/ChangeLog

PR c++/84593
* g++.dg/cpp1y/pr84593.C: New.
---
 gcc/cp/init.c|5 -
 gcc/testsuite/g++.dg/cpp1y/pr84593.C |8 
 2 files changed, 12 insertions(+), 1 deletion(-)
 create mode 100644 gcc/testsuite/g++.dg/cpp1y/pr84593.C

diff --git a/gcc/cp/init.c b/gcc/cp/init.c
index d0d14abdc9fa..15cee17c780c 100644
--- a/gcc/cp/init.c
+++ b/gcc/cp/init.c
@@ -284,7 +284,10 @@ build_zero_init_1 (tree type, tree nelts, bool 
static_storage_p,
   else if (VECTOR_TYPE_P (type))
 init = build_zero_cst (type);
   else
-gcc_assert (TREE_CODE (type) == REFERENCE_TYPE);
+{
+  gcc_assert (TREE_CODE (type) == REFERENCE_TYPE);
+  init = build_zero_cst (type);
+}
 
   /* In all cases, the initializer is a constant.  */
   if (init)
diff --git a/gcc/testsuite/g++.dg/cpp1y/pr84593.C 
b/gcc/testsuite/g++.dg/cpp1y/pr84593.C
new file mode 100644
index ..8aa869f19193
--- /dev/null
+++ b/gcc/testsuite/g++.dg/cpp1y/pr84593.C
@@ -0,0 +1,8 @@
+// PR c++/84593
+// { dg-do compile { target c++14 } }
+
+struct a {
+  int x;
+  int c = 0;
+  int &b;
+} c = {}; // { dg-error "uninitialized reference" }


-- 
Alexandre Oliva, freedom fighterhttp://FSFLA.org/~lxoliva/
You must be the change you wish to see in the world. -- Gandhi
Be Free! -- http://FSFLA.org/   FSF Latin America board member
Free Software Evangelist|Red Hat Brasil GNU Toolchain Engineer


Re: [PATCH] Fix ICE with exp.simdclone.0 (PR tree-optimization/84687)

2018-03-05 Thread Richard Biener
On March 5, 2018 9:40:28 PM GMT+01:00, Jakub Jelinek  wrote:
>Hi!
>
>This patch clears DECL_BUILT_IN on simd clones, similarly how
>cgraphclones.c
>does:
>  /* When signature changes, we need to clear builtin info.  */
>  if (DECL_BUILT_IN (new_decl)
>  && args_to_skip
>  && !bitmap_empty_p (args_to_skip))
>{
>  DECL_BUILT_IN_CLASS (new_decl) = NOT_BUILT_IN;
>  DECL_FUNCTION_CODE (new_decl) = (enum built_in_function) 0;
>}
>because simd clones are always signature changes (would be nice if we
>had
>some way to optimize these later, but seems it wouldn't be easy),
>and in order not to regress optimization-wise, also an early
>optimization
>in match.pd - we have this now deferred till late folding of pow(C,x)
>to exp(log(C)*x), and also exp(x)*exp(y) folding to exp(x+y), if we
>already have one exp (or exp2 or exp10 etc.) call in the
>multiplication,
>there is no reason to defer it any longer, all we do is trade the one
>pow call and one exp{,2,10} call and one multiplication for that
>exp{,2,10} call plus one multiplication and one addition, the addition
>surely will be less expensive than pow and I think precision should not
>be
>that bad either (and it is -ffast-math guarded anyway).
>
>Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

OK. 

Thanks, 
Richard. 

>2018-03-05  Jakub Jelinek  
>
>   PR tree-optimization/84687
>   * omp-simd-clone.c (simd_clone_create): Clear DECL_BUILT_IN_CLASS
>   on new_node->decl.
>   * match.pd (pow(C,x)*expN(y) -> expN(logN(C)*x+y)): New optimization.
>
>   * gcc.dg/pr84687.c: New test.
>
>--- gcc/omp-simd-clone.c.jj2018-02-13 09:33:31.107560174 +0100
>+++ gcc/omp-simd-clone.c   2018-03-05 16:47:56.943365091 +0100
>@@ -456,6 +456,8 @@ simd_clone_create (struct cgraph_node *o
>   if (new_node == NULL)
> return new_node;
> 
>+  DECL_BUILT_IN_CLASS (new_node->decl) = NOT_BUILT_IN;
>+  DECL_FUNCTION_CODE (new_node->decl) = (enum built_in_function) 0;
>   TREE_PUBLIC (new_node->decl) = TREE_PUBLIC (old_node->decl);
>   DECL_COMDAT (new_node->decl) = DECL_COMDAT (old_node->decl);
>   DECL_WEAK (new_node->decl) = DECL_WEAK (old_node->decl);
>--- gcc/match.pd.jj2018-02-20 14:55:28.988215213 +0100
>+++ gcc/match.pd   2018-03-05 16:52:11.486487079 +0100
>@@ -4030,6 +4030,16 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
>   (exps (mult (logs @0) @1))
>   (exp2s (mult (log2s @0) @1)))
> 
>+ /* pow(C,x)*expN(y) -> expN(logN(C)*x+y) if C > 0.  */
>+ (for pows (POW)
>+  exps (EXP EXP2 EXP10 POW10)
>+  logs (LOG LOG2 LOG10 LOG10)
>+  (simplify
>+   (mult:c (pows:s REAL_CST@0 @1) (exps:s @2))
>+   (if (real_compare (GT_EXPR, TREE_REAL_CST_PTR (@0), &dconst0)
>+  && real_isfinite (TREE_REAL_CST_PTR (@0)))
>+(exps (plus (mult (logs @0) @1) @2)
>+
>  (for sqrts (SQRT)
>   cbrts (CBRT)
>   pows (POW)
>--- gcc/testsuite/gcc.dg/pr84687.c.jj  2018-03-05 16:45:57.020307612
>+0100
>+++ gcc/testsuite/gcc.dg/pr84687.c 2018-03-05 16:45:41.977300398 +0100
>@@ -0,0 +1,19 @@
>+/* PR tree-optimization/84687 */
>+/* { dg-do compile } */
>+/* { dg-options "-Ofast" } */
>+
>+int a[64], b;
>+double pow (double, double);
>+__attribute__((__simd__)) double exp (double);
>+
>+void
>+foo (double x)
>+{
>+  int i;
>+  double c = exp (x);
>+  for (i = 0; i < 64; i++)
>+{
>+  b = i;
>+  a[i] = pow (12.0, b) * pow (c, i);
>+}
>+}
>
>   Jakub



Re: [PATCH] Fix C++ ref op= ICE in the gimplifier with -g (PR c++/84704)

2018-03-05 Thread Jakub Jelinek
On Tue, Mar 06, 2018 at 02:28:40AM -0300, Alexandre Oliva wrote:
> On Mar  5, 2018, Jakub Jelinek  wrote:
> 
> > * tree.c (stabilize_reference_1): Return save_expr (e) for
> > STATEMENT_LIST even if it doesn't have side-effects.
> 
> I'm concerned about codegen differences in case the statement list is
> created because of debug stmts not present in a nondebug compile.
> Wouldn't this make the nondebug compile not have a save_expr at all,
> which would then likely cause different code to be generated with the
> save_expr in one case and no save_expr in the other?  I guess the
> compare-debug would have flagged any such differences, so I suppose my
> concerns are unfounded, and in case they aren't, we'll soon find out ;-)
> So, no objections from me; thanks for looking into this!

Yes, I was concerned, but couldn't create a testcase that would fail with
-fcompare-debug nor the bootstrap has found anything.

And note that the concerns weren't primarily about the SAVE_EXPRs, but about
the preexisting gimplification of the STATEMENT_LIST vs. gimplification of
just the last statement of it, where the former creates retval.N temporary
and thus if we managed to preserve some retval.M temporary till the final
dumps, it would be different for -fcompare-debug.  If we ever hit that
issue, one way out of it would be to gimplify STATEMENT_LIST without
TREE_SIDE_EFFECTS that contain only some optional DEBUG_BEGIN_STMTs and
a single other stmt differently, either but not generating the temporary
at all, or by forcing it to be nameless.

And SAVE_EXPRs can in theory have the same issue, they use
get_initialized_tmp_var with !allow_ssa, for ({ 0; }) and similar that
creates a nameless temporary decl, which is what we want and shouldn't
affect anything, but if we have say ({ i; }) or ({ &j; }), then those can
get a name.  So, we'd need to force creation of non-cached (lookup_tmp_var)
nameless temporary for SAVE_EXPRs that contain the above described
STATEMENT_LISTs.

Maybe a better fix if we run into this would be to make create_tmp_var_raw
create DECL_NAMELESS decls and make sure we don't print names of
DECL_NAMELESS variables in -fdump-final-insns= dumps?
I mean the fancy names are causing us trouble all the time, e.g. the SRA
fancy names that later get concatenated to other fancy names and sometimes
contains uids...

Jakub


Re: [PR84620] output symbolic entry_view as data2, not addr

2018-03-05 Thread Jakub Jelinek
On Tue, Mar 06, 2018 at 03:13:11AM -0300, Alexandre Oliva wrote:
> On Mar  2, 2018, Alexandre Oliva  wrote:
> 
> > Mark Wielaard is implementing support for LVU and IEPM in elfutils, and
> > he was surprised by the encoding of DW_AT_GNU_entry_view; so was I!
> > When GCC computes and outputs views internally (broken without internal
> > view resets), it outputs entry_view attributes as data: it knows the
> > view numbers.  However, when it uses the assembler to compute the views
> > symbolically, it has to output the view symbolically.
> 
> > I'd chosen lbl_id to that end, without giving it much thought, but that
> > was no good: it's not just space-inefficient, it uses an addr encoding,
> > indirect in some cases, for something that's not a real label, but
> > rather a a small assembler data constant.  Oops.
> 
> Jakub wrote (in the BZ):
> 
> > I meant to say:
> > The char * GTY ((tag ("dw_val_class_symview"))) val_symbolic_view; line 
> > should come at the and of the union, not before the other classes.
> 
> I've moved the union field down in the revised patch below, but I don't
> see the point, and I thought it would be better to keep it close to
> logically-similar entries.  If the point is just to make it parallel to
> the order of the enum (which manh other entries don't seem to have cared
> about), maybe moving the enum would be better?

I think the order should match the order of the dw_val_class entries and
should be sorted from the most commonly used ones (ones used by most
different attributes etc.), so that somebody trying to learn dwarf2out
stuff can learn it more easily (say the dw_val_class_const,
dw_val_class_const_unsigned, dw_val_class_flag first etc.), but apparently
it is completely random already, I'll defer to Jason what he wants.

> > What guarantees the symview symbols always fit into 16 bits?  Does 
> > somethingpunt if it needs more than 65536 views?
> 
> There's no guarantee it will fit in 16 bits; the assembler will warn if
> it truncates a view number.  There's no real upper limit, so uleb128
> would be ideal, but since that's not viable ATM, I had to pick
> something, and 16 bits would cover all really useful cases than 32 or
> even 64 bits would, while being more compact.  I was even tempted to go
> with 8 bits, but thought that was pushing it.  I can make it 32 if you
> consider it better.  Or maybe a param?

I'm afraid I haven't understood yet why we need labels instead of compiler
assigned numbers for the views, so not really sure I can answer this.
Can the compiler count how many different view labels it has produced or
will produce, or at least compute easily an upper bound, and decide on the
form based on that?  Or what will exactly happen if you use too small form?
Silent wrong-debug, assembler error, something else?

> > The FIXMEs don't really look helpful, we are not going to change the 
> > offset computation from compile time to assemble time, that would be 
> > terribly expensive.
> 
> Why do you say it would be terribly expensive to let the assembler compute
> offsets in .debug_info?

Because people are already complaining about slow assembly times of debug
info, and offsets are used pretty much everywhere in .debug_info.
With some assemblers that don't allow .uleb128/.sleb128 that is impossible
to do, with others you'd need to emit a new label before every DIE and all
DW_AT_type and many others would need to use .LD123456-.LD1092456 kind of
expressions which assembler would need to parse and resolve
(DW_FORM_ref{1,2,4,8,_udata}).

Jakub


Re: [PATCH] PR fortran/56667 -- Issue a sane error message

2018-03-05 Thread Thomas Koenig

Hi Steve,


For the mangled code in the new testcase, gfortran issues
a somewhat obtuse error message.  The problem is the matcher
for a complex entity runs prior to the matcher for an implied
do-loop.  Errors are queued in that order so an error messagei
for a mangled complex constant is emitted.

Regression tested on x86_64-*-freebsd.


OK for trunk.

Thanks for the patch!

Regards

Thomas


Re: [PATCH] deprecate -finline-limit and make it an optimization option (PR 84603)

2018-03-05 Thread Richard Biener
On Sat, 3 Mar 2018, Martin Sebor wrote:

> On 03/02/2018 01:05 AM, Richard Biener wrote:
> > On Thu, 1 Mar 2018, Martin Sebor wrote:
> > 
> > > While testing my recent changes to the handling of attributes
> > > on C++ templates I noticed that the -finline-limit=N option
> > > is not recognized in attribute or pragma optimize.  In response
> > > to the bug I raised, Richard you said the option is deprecated,
> > > so I went ahead and documented the deprecation in the manual
> > > and also added a deprecation warning.  I also added it to
> > > Optimization options.  The attached patch reflects these
> > > changes.
> > > 
> > > I also tried to put together a test to verify that the option
> > > works as expected, both in attributes and in pragmas.  I wasn't
> > > able to get it to work reliably or sensibly.
> > > 
> > > In C, #pragma optimize ("inline-limit=1") has a global effect
> > > on all functions in the file, even those lexically before the
> > > pragma.  In C++, the pragma has no effect.  Both of these
> > > effects are contrary to my reading of the manual.
> > > 
> > > Attribute optimize ("inline-limit") behaves similarly, which
> > > seems even more unexpected and even more contrary to the manual
> > > which by referring to Function Specific Option Pragmas suggests
> > > that both the pragma and especially the attribute apply
> > > optimizations to just the functions they're specified for.
> > 
> > It can't really work so please do _not_ add the option to
> > the set of Optimize options.  Inlining is an IPA task so
> > "per function" parameters do not make sense at all.
> 
> It does work as I expect for other inlining options (e.g., for
> -finline-functions or -finline-functions-called-once) so either
> there's something special/different about -finline-limit or that
> it works for the others is an accident.  Can you please clarify
> which it is?

It works for the others because those flags can be properly
tested per function.  Since -finline-limit just maps to
some --params and those are not per function it cannot work
similarly for that.  Adding 'Optimization' to -finline-limit
will just cause that flag to be kept per function without
any effect given nothing tests that flag but only the params.

Consider

#pragma GCC optimize("-finline-limit=10")

void foo () { }

#pragma GCC optimize("-finline-limit=100")

where we'll likely just have the 100 limit at the end.

I'd rather _not_ expose this kind of brokeness to the user.

> Either way, wherever they come into play, the limitations you
> refer to should be mentioned in the GCC warning when an option
> that can't or may not work is used in attribute or #pragma
> optimize, and it should also be explained in the manual.  How
> else are users supposed to know?

They know because the option doesn't work there but only
on the command line?

> As it is, since -finline-limit is listed among optimization
> options, and since attribute and #pragma optimize say they
> accept optimization options, it's a bug that GCC rejects it
> as if it were invalid.

That's a documentation bug then.  Specifially it says

"This pragma allows you to set global optimization options for functions
defined later in the source file."

that's not very specific and I don't read into it that you
can add all and only those options listed in the
"Optimization Options" list.  After all there's no cross-reference.
Specifically it says "global" optimization options -- which is
a subset of "optimization options".  I'd have expected
"local" optimization options here but that's besides the point.

I'm fine with documenting for -finline-limit that it cannot be used
in the pragma or attribute but I expect others to be in the same
category.

Richard.


Re: [PATCH] Fix C++ ref op= ICE in the gimplifier with -g (PR c++/84704)

2018-03-05 Thread Richard Biener
On Tue, 6 Mar 2018, Jakub Jelinek wrote:

> On Tue, Mar 06, 2018 at 02:28:40AM -0300, Alexandre Oliva wrote:
> > On Mar  5, 2018, Jakub Jelinek  wrote:
> > 
> > >   * tree.c (stabilize_reference_1): Return save_expr (e) for
> > >   STATEMENT_LIST even if it doesn't have side-effects.
> > 
> > I'm concerned about codegen differences in case the statement list is
> > created because of debug stmts not present in a nondebug compile.
> > Wouldn't this make the nondebug compile not have a save_expr at all,
> > which would then likely cause different code to be generated with the
> > save_expr in one case and no save_expr in the other?  I guess the
> > compare-debug would have flagged any such differences, so I suppose my
> > concerns are unfounded, and in case they aren't, we'll soon find out ;-)
> > So, no objections from me; thanks for looking into this!
> 
> Yes, I was concerned, but couldn't create a testcase that would fail with
> -fcompare-debug nor the bootstrap has found anything.
> 
> And note that the concerns weren't primarily about the SAVE_EXPRs, but about
> the preexisting gimplification of the STATEMENT_LIST vs. gimplification of
> just the last statement of it, where the former creates retval.N temporary
> and thus if we managed to preserve some retval.M temporary till the final
> dumps, it would be different for -fcompare-debug.  If we ever hit that
> issue, one way out of it would be to gimplify STATEMENT_LIST without
> TREE_SIDE_EFFECTS that contain only some optional DEBUG_BEGIN_STMTs and
> a single other stmt differently, either but not generating the temporary
> at all, or by forcing it to be nameless.
> 
> And SAVE_EXPRs can in theory have the same issue, they use
> get_initialized_tmp_var with !allow_ssa, for ({ 0; }) and similar that
> creates a nameless temporary decl, which is what we want and shouldn't
> affect anything, but if we have say ({ i; }) or ({ &j; }), then those can
> get a name.  So, we'd need to force creation of non-cached (lookup_tmp_var)
> nameless temporary for SAVE_EXPRs that contain the above described
> STATEMENT_LISTs.
> 
> Maybe a better fix if we run into this would be to make create_tmp_var_raw
> create DECL_NAMELESS decls and make sure we don't print names of
> DECL_NAMELESS variables in -fdump-final-insns= dumps?

They are already DECL_IGNORED_P, that should include the effects of
DECL_NAMELESS if I read both documentations in tree.h correctly.

> I mean the fancy names are causing us trouble all the time, e.g. the SRA
> fancy names that later get concatenated to other fancy names and sometimes
> contains uids...

Yeah...  so lets not dump names of DECL_IGNORED_P | DECL_NAMELESS decls.

Richard.