date:20190308

Re: [PATCH] Add missing avx512fintrin.h intrinsics (PR target/89602)

2019-03-08 Thread H.J. Lu

On Thu, Mar 7, 2019 at 4:09 PM Jakub Jelinek  wrote:
>
> On Thu, Mar 07, 2019 at 08:11:53AM +0100, Uros Bizjak wrote:
> > > +(define_insn "*avx512f_load_mask"
> > > +  [(set (match_operand: 0 "register_operand" "=v")
> > > +   (vec_merge:
> > > + (vec_merge:
> > > +   (vec_duplicate:
> > > + (match_operand:MODEF 1 "memory_operand" "m"))
> > > +   (match_operand: 2 "nonimm_or_0_operand" "0C")
> > > +   (match_operand:QI 3 "nonmemory_operand" "Yk"))
> >
> > Is there a reason to have nonmemory_operand predicate here instead of
> > register_operand?
>
> Thanks for catching that up, that was from my earlier attempt to have
> Yk,n constraints and deal with that during output.  For store it was
> possible, for others there were some cases it couldn't handle but further
> testing revealed that the combiner already handles most of the constant
> mask cases right.
>
> Here is updated patch, I've changed this in two spots.  It even improves the
> constant 1 case (the only one that is still not optimized as much as it
> should):
>  f4:
> -   movzbl  .LC0(%rip), %eax
> +   movl$1, %eax
> kmovw   %eax, %k1
> vmovsd  (%rsi), %xmm0{%k1}{z}
> ret
> Tested so far with make check-gcc RUNTESTFLAGS=i386.exp=avx512f-vmovs*.c
> and compiling/eyeballing differences on the short testcase I've posted
> in the description with also the u, -> 1, and u, -> 0, changes, appart
> from the above f4 no differences.
>
> Ok for trunk if it passes another full bootstrap/regtest?
>
> 2019-03-07  Jakub Jelinek  
>
> PR target/89602
> * config/i386/sse.md (avx512f_mov_mask,
> *avx512f_load_mask, avx512f_store_mask): New define_insns.
> (avx512f_load_mask): New define_expand.
> * config/i386/i386-builtin.def (__builtin_ia32_loadsd_mask,
> __builtin_ia32_loadss_mask, __builtin_ia32_storesd_mask,
> __builtin_ia32_storess_mask, __builtin_ia32_movesd_mask,
> __builtin_ia32_movess_mask): New builtins.
> * config/i386/avx512fintrin.h (_mm_mask_load_ss, _mm_maskz_load_ss,
> _mm_mask_load_sd, _mm_maskz_load_sd, _mm_mask_move_ss,
> _mm_maskz_move_ss, _mm_mask_move_sd, _mm_maskz_move_sd,
> _mm_mask_store_ss, _mm_mask_store_sd): New intrinsics.
>

This caused:

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=89630

This looks very strange since this patch only touched backend.

-- 
H.J.

V5 [PATCH] Optimize vector constructor

2019-03-08 Thread H.J. Lu

On Thu, Mar 7, 2019 at 9:51 AM H.J. Lu  wrote:
>
> On Wed, Mar 6, 2019 at 8:33 PM Richard Biener
>  wrote:
> >
> > On Wed, Mar 6, 2019 at 8:46 AM H.J. Lu  wrote:
> > >
> > > On Tue, Mar 5, 2019 at 1:46 AM H.J. Lu  wrote:
> > > >
> > > > On Mon, Mar 04, 2019 at 12:55:04PM +0100, Richard Biener wrote:
> > > > > On Sun, Mar 3, 2019 at 10:13 PM H.J. Lu  wrote:
> > > > > >
> > > > > > On Sun, Mar 03, 2019 at 06:40:09AM -0800, Andrew Pinski wrote:
> > > > > > > )
> > > > > > > ,On Sun, Mar 3, 2019 at 6:32 AM H.J. Lu  
> > > > > > > wrote:
> > > > > > > >
> > > > > > > > For vector init constructor:
> > > > > > > >
> > > > > > > > ---
> > > > > > > > typedef float __v4sf __attribute__ ((__vector_size__ (16)));
> > > > > > > >
> > > > > > > > __v4sf
> > > > > > > > foo (__v4sf x, float f)
> > > > > > > > {
> > > > > > > >   __v4sf y = { f, x[1], x[2], x[3] };
> > > > > > > >   return y;
> > > > > > > > }
> > > > > > > > ---
> > > > > > > >
> > > > > > > > we can optimize vector init constructor with vector copy or 
> > > > > > > > permute
> > > > > > > > followed by a single scalar insert:
> > > >
> > > > > and you want to advance to the _1 = BIT_INSERT_EXPR here.  The 
> > > > > easiest way
> > > > > is to emit a new stmt for _2 = copy ...; and do the set_rhs with the
> > > > > BIT_INSERT_EXPR.
> > > >
> > > > Thanks for BIT_INSERT_EXPR suggestion.  I am testing this patch.
> > > >
> > > >
> > > > H.J.
> > > > ---
> > > > We can optimize vector constructor with vector copy or permute followed
> > > > by a single scalar insert:
> > > >
> > > >   __v4sf y;
> > > >   __v4sf D.1930;
> > > >   float _1;
> > > >   float _2;
> > > >   float _3;
> > > >
> > > >:
> > > >   _1 = BIT_FIELD_REF ;
> > > >   _2 = BIT_FIELD_REF ;
> > > >   _3 = BIT_FIELD_REF ;
> > > >   y_6 = {f_5(D), _3, _2, _1};
> > > >   return y_6;
> > > >
> > > > with
> > > >
> > > >  __v4sf y;
> > > >   __v4sf D.1930;
> > > >   float _1;
> > > >   float _2;
> > > >   float _3;
> > > >   vector(4) float _8;
> > > >
> > > >:
> > > >   _1 = BIT_FIELD_REF ;
> > > >   _2 = BIT_FIELD_REF ;
> > > >   _3 = BIT_FIELD_REF ;
> > > >   _8 = x_9(D);
> > > >   y_6 = BIT_INSERT_EXPR ;
> > > >   return y_6;
> > > >
> > > > gcc/
> > > >
> > > > PR tree-optimization/88828
> > > > * tree-ssa-forwprop.c (simplify_vector_constructor): Optimize
> > > > vector init constructor with vector copy or permute followed
> > > > by a single scalar insert.
> > > >
> > > > gcc/testsuite/
> > > >
> > > > PR tree-optimization/88828
> > > > * gcc.target/i386/pr88828-1a.c: New test.
> > > > * gcc.target/i386/pr88828-2b.c: Likewise.
> > > > * gcc.target/i386/pr88828-2.c: Likewise.
> > > > * gcc.target/i386/pr88828-3a.c: Likewise.
> > > > * gcc.target/i386/pr88828-3b.c: Likewise.
> > > > * gcc.target/i386/pr88828-3c.c: Likewise.
> > > > * gcc.target/i386/pr88828-3d.c: Likewise.
> > > > * gcc.target/i386/pr88828-4a.c: Likewise.
> > > > * gcc.target/i386/pr88828-4b.c: Likewise.
> > > > * gcc.target/i386/pr88828-5a.c: Likewise.
> > > > * gcc.target/i386/pr88828-5b.c: Likewise.
> > > > * gcc.target/i386/pr88828-6a.c: Likewise.
> > > > * gcc.target/i386/pr88828-6b.c: Likewise.
> > >
> > > Here is the updated patch with run-time tests.
> >
> > -  if (TREE_CODE (elt->value) != SSA_NAME)
> > +  if (TREE_CODE (ce->value) != SSA_NAME)
> > return false;
> >
> > hmm, so it doesn't allow { 0, v[1], v[2], v[3] }?  I think the single
> > scalar value can be a constant as well.
>
> Fixed.
>
> >if (!def_stmt)
> > -   return false;
> > +   {
> > + if (gimple_nop_p (SSA_NAME_DEF_STMT (ce->value)))
> >
> > if (SSA_NAME_IS_DEFAULT_DEF (ce->value))
> >
> > +   {
> >
> > also you seem to disallow
> >
> >   { i + 1, v[1], v[2], v[3] }
>
> Fixed by
>
>  if (code != BIT_FIELD_REF)
> {
>   /* Only allow one scalar insert.  */
>   if (nscalars != 0)
> return false;
>
>   nscalars = 1;
>   insert = true;
>   scalar_idx = i;
>   sel.quick_push (i);
>   scalar_element = ce->value;
>   continue;
> }
>
> > because get_prop_source_stmt will return the definition computing
> > i + 1 in this case and your code will be skipped?
> >
> > I think you can simplify the code by treating scalar_element != NULL
> > as nscalars == 1 and eliding nscalars.
>
> It works only if
>
> TYPE_VECTOR_SUBPARTS (type).to_constant ()  == (nscalars + nvectors)
>
> We need to check both nscalars and nvectors.  Elide nscalar
> check doesn't help much here.
>
> > -  if (conv_code == ERROR_MARK)
> > +  if (conv_code == ERROR_MARK && !insert)
> > gimple_assign_set_rhs_with_ops (gsi, VEC_PERM_EXPR, orig[0],
> > orig[1], op2);
> >else
> > @@ -2148,10 +2198,25 @@ simplify_vector_constructor (gimple_stmt_ite

Re: [PATCH] bring netbsd/arm support up to speed. eabi, etc.

2019-03-08 Thread coypu

Ping.

Link for possible convenience :-)
https://gcc.gnu.org/ml/gcc-patches/2019-02/msg01899.html

[PATCH, i386]: Add _mm_loadu_si64 and _mm_storeu_si64 intrinsics.

2019-03-08 Thread Uros Bizjak

Attached patch adds missing _mm_loadu_si64 and _mm_storeu_si64 intrinsics.

2019-03-08  Uroš Bizjak  

PR target/68924
PR target/78782
PR target/87558
* config/i386/emmintrin.h (_mm_loadu_si64): New intrinsic.
(_mm_storeu_si64): Ditto.

testsuite/ChangeLog:

2019-03-08  Uroš Bizjak  

PR target/68924
PR target/78782
PR target/87558
* gcc.target/i386/pr78782.c: New test.
* gcc.target/i386/pr87558.c: Ditto.

Bootstrapped and regression tested on x86_64-linux-gnu {,-m32}.

HJ, are these changes OK?

Uros.
From 3441130f8c6115cada8fe0e235deb2860d4f0088 Mon Sep 17 00:00:00 2001
From: Uros Bizjak 
Date: Fri, 8 Mar 2019 10:55:36 +0100
Subject: [PATCH] x86: Add _mm_loadu_si64 and _mm_storeu_si64 intrinsics.
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

2019-03-08  Uroš Bizjak  

	PR target/68924
	PR target/78782
	PR target/87558
	* config/i386/emmintrin.h (_mm_loadu_si64): New intrinsic.
	(_mm_storeu_si64): Ditto.

testsuite/ChangeLog:

2019-03-08  Uroš Bizjak  

	PR target/68924
	PR target/78782
	PR target/87558
	* gcc.target/i386/pr78782.c: New test.
	* gcc.target/i386/pr87558.c: Ditto.
---
 gcc/config/i386/emmintrin.h | 12 
 gcc/testsuite/gcc.target/i386/pr78782.c |  9 +
 gcc/testsuite/gcc.target/i386/pr87558.c |  9 +
 3 files changed, 30 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/i386/pr78782.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr87558.c

diff --git a/gcc/config/i386/emmintrin.h b/gcc/config/i386/emmintrin.h
index d9bc3f7f28f6..f9e7b33b0ddf 100644
--- a/gcc/config/i386/emmintrin.h
+++ b/gcc/config/i386/emmintrin.h
@@ -709,6 +709,12 @@ _mm_loadl_epi64 (__m128i_u const *__P)
   return _mm_set_epi64 ((__m64)0LL, *(__m64_u *)__P);
 }
 
+extern __inline __m128i __attribute__((__gnu_inline__, __always_inline__, __artificial__))
+_mm_loadu_si64 (void const *__P)
+{
+  return _mm_loadl_epi64 ((__m128i_u *)__P);
+}
+
 extern __inline void __attribute__((__gnu_inline__, __always_inline__, __artificial__))
 _mm_store_si128 (__m128i *__P, __m128i __B)
 {
@@ -727,6 +733,12 @@ _mm_storel_epi64 (__m128i_u *__P, __m128i __B)
   *(__m64_u *)__P = (__m64) ((__v2di)__B)[0];
 }
 
+extern __inline void __attribute__((__gnu_inline__, __always_inline__, __artificial__))
+_mm_storeu_si64 (void *__P, __m128i __B)
+{
+  _mm_storel_epi64 ((__m128i_u *)__P, __B);
+}
+
 extern __inline __m64 __attribute__((__gnu_inline__, __always_inline__, __artificial__))
 _mm_movepi64_pi64 (__m128i __B)
 {
diff --git a/gcc/testsuite/gcc.target/i386/pr78782.c b/gcc/testsuite/gcc.target/i386/pr78782.c
new file mode 100644
index ..e91d953dba92
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr78782.c
@@ -0,0 +1,9 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -msse2" } */
+
+#include 
+
+__m128i foo (unsigned char *p)
+{
+  return _mm_loadu_si64 ((void *)p);
+}
diff --git a/gcc/testsuite/gcc.target/i386/pr87558.c b/gcc/testsuite/gcc.target/i386/pr87558.c
new file mode 100644
index ..c210507d04ff
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr87558.c
@@ -0,0 +1,9 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -msse2" } */
+
+#include 
+
+void foo (unsigned char *p, __m128i x)
+{
+  _mm_storeu_si64 ((void *)p, x);
+}
-- 
2.20.1

Re: [PATCH] Fix up diagnostics in gimple-ssa-warn-alloca.c

2019-03-08 Thread Richard Biener

On Thu, 7 Mar 2019, Jakub Jelinek wrote:

> Hi!
> 
> When looking at the diagnostics PRs, I've noticed that several diagnostic 
> calls
> in gimple-ssa-warn-alloca.c use G_(...) uselessly, it is only needed if the
> argument is not a string literal.
> 
> Fixed thusly, bootstrapped/regtested on x86_64-linux and i686-linux,
> verified the messages are unmodified in gcc.pot, ok for trunk?

OK.

Richard.

> 2019-03-07  Jakub Jelinek  
> 
>   * gimple-ssa-warn-alloca.c (pass_walloca::execute): Don't wrap
>   warning_at or inform messages in G_() if there is no ?:.
> 
> --- gcc/gimple-ssa-warn-alloca.c.jj   2019-01-01 12:37:18.193957952 +0100
> +++ gcc/gimple-ssa-warn-alloca.c  2019-03-07 16:43:30.308166042 +0100
> @@ -528,7 +528,7 @@ pass_walloca::execute (function *fun)
>   }
> else if (warn_alloca)
>   {
> -   warning_at (loc, OPT_Walloca, G_("use of %"));
> +   warning_at (loc, OPT_Walloca, "use of %");
> continue;
>   }
> else if (warn_alloca_limit < 0)
> @@ -571,8 +571,8 @@ pass_walloca::execute (function *fun)
>   && t.limit != 0)
> {
>   print_decu (t.limit, buff);
> - inform (loc, G_("limit is %wu bytes, but argument "
> - "may be as large as %s"),
> + inform (loc, "limit is %wu bytes, but argument "
> +  "may be as large as %s",
>   is_vla ? warn_vla_limit : adjusted_alloca_limit,
>   buff);
> }
> @@ -588,7 +588,7 @@ pass_walloca::execute (function *fun)
>   && t.limit != 0)
> {
>   print_decu (t.limit, buff);
> - inform (loc, G_("limit is %wu bytes, but argument is %s"),
> + inform (loc, "limit is %wu bytes, but argument is %s",
> is_vla ? warn_vla_limit : adjusted_alloca_limit,
> buff);
> }
> @@ -606,7 +606,7 @@ pass_walloca::execute (function *fun)
> break;
>   case ALLOCA_IN_LOOP:
> gcc_assert (!is_vla);
> -   warning_at (loc, wcode, G_("use of % within a loop"));
> +   warning_at (loc, wcode, "use of % within a loop");
> break;
>   case ALLOCA_CAST_FROM_SIGNED:
> gcc_assert (invalid_casted_type != NULL_TREE);
> 
>   Jakub
> 
> 

-- 
Richard Biener 
SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Graham Norton, HRB 
21284 (AG Nuernberg)

Re: [PATCH] Only set TREE_NO_WARNING after warning if warning returned true (PR tree-optimization/89550)

2019-03-08 Thread Richard Biener

On Thu, 7 Mar 2019, Jakub Jelinek wrote:

> Hi!
> 
> While looking at why we didn't warn before Richi's commit, I've discovered
> is that the change is that in a different pass we used to call c_strlen too,
> but at that point with a location from system headers, so no warning has
> been printed, but TREE_NO_WARNING has been set anyway and later on when
> c_strlen was called again with a location not from system header, we
> wouldn't warn because TREE_NO_WARNING is set.
> 
> Apparently there are quite a few spots that have similar bug (and quite a
> few spots that do it right).
> 
> The following patch fixes what I found.  Bootstrapped/regtested on
> x86_64-linux and i686-linux, ok for trunk?

OK.

Thanks,
Richard.

> 2019-03-07  Jakub Jelinek  
> 
>   PR tree-optimization/89550
>   * builtins.c (c_strlen): Only set TREE_NO_WARNING if warning_at
>   returned true.  Formatting fixes.
>   (expand_builtin_strnlen): Formatting fixes.
>   * tree-vrp.c (vrp_prop::check_mem_ref): Only set TREE_NO_WARNING
>   if warning_at returned true.
>   * tree-cfg.c (pass_warn_function_return::execute): Likewise.
> c-family/
>   * c-common.c (c_common_truthvalue_conversion): Only set
>   TREE_NO_WARNING if warning_at returned true.
>   * c-warn.c (overflow_warning, warn_logical_operator): Likewise.
> c/
>   * c-decl.c (finish_function): Only set TREE_NO_WARNING if warning_at
>   returned true.
>   (c_write_global_declarations_1): Only set TREE_NO_WARNING if pedwarn
>   or warning returned true.
> cp/
>   * semantics.c (maybe_convert_cond): Only set TREE_NO_WARNING if
>   warning_at returned true.
>   * decl2.c (c_parse_final_cleanups): Likewise.
>   * typeck.c (convert_for_assignment): Likewise.
>   * decl.c (finish_function): Likewise.
> 
> --- gcc/builtins.c.jj 2019-03-05 17:21:42.536993265 +0100
> +++ gcc/builtins.c2019-03-07 12:01:23.114479128 +0100
> @@ -760,15 +760,13 @@ c_strlen (tree src, int only_value, c_st
>   runtime.  */
>if (eltoff < 0 || eltoff >= maxelts)
>  {
> - /* Suppress multiple warnings for propagated constant strings.  */
> +  /* Suppress multiple warnings for propagated constant strings.  */
>if (only_value != 2
> -   && !TREE_NO_WARNING (src))
> -{
> -   warning_at (loc, OPT_Warray_bounds,
> -   "offset %qwi outside bounds of constant string",
> -   eltoff);
> -  TREE_NO_WARNING (src) = 1;
> -}
> +   && !TREE_NO_WARNING (src)
> +   && warning_at (loc, OPT_Warray_bounds,
> +  "offset %qwi outside bounds of constant string",
> +  eltoff))
> + TREE_NO_WARNING (src) = 1;
>return NULL_TREE;
>  }
>  
> @@ -3099,7 +3097,7 @@ expand_builtin_strnlen (tree exp, rtx ta
>"%K%qD specified bound %E "
>"exceeds maximum object size %E",
>exp, func, bound, maxobjsize))
> -   TREE_NO_WARNING (exp) = true;
> + TREE_NO_WARNING (exp) = true;
>  
>bool exact = true;
>if (!len || TREE_CODE (len) != INTEGER_CST)
> @@ -3158,7 +3156,7 @@ expand_builtin_strnlen (tree exp, rtx ta
>"%K%qD specified bound [%wu, %wu] "
>"exceeds maximum object size %E",
>exp, func, min.to_uhwi (), max.to_uhwi (), maxobjsize))
> -  TREE_NO_WARNING (exp) = true;
> +TREE_NO_WARNING (exp) = true;
>  
>bool exact = true;
>if (!len || TREE_CODE (len) != INTEGER_CST)
> --- gcc/tree-vrp.c.jj 2019-02-01 09:43:40.264757408 +0100
> +++ gcc/tree-vrp.c2019-03-07 12:05:05.959845856 +0100
> @@ -4749,7 +4749,8 @@ vrp_prop::check_mem_ref (location_t loca
>if (warned && DECL_P (arg))
>   inform (DECL_SOURCE_LOCATION (arg), "while referencing %qD", arg);
>  
> -  TREE_NO_WARNING (ref) = 1;
> +  if (warned)
> + TREE_NO_WARNING (ref) = 1;
>return;
>  }
>  
> @@ -4762,11 +4763,10 @@ vrp_prop::check_mem_ref (location_t loca
>  {
>HOST_WIDE_INT tmpidx = extrema[i].to_shwi () / eltsize.to_shwi ();
>  
> -  warning_at (location, OPT_Warray_bounds,
> -   "intermediate array offset %wi is outside array bounds "
> -   "of %qT",
> -   tmpidx,  reftype);
> -  TREE_NO_WARNING (ref) = 1;
> +  if (warning_at (location, OPT_Warray_bounds,
> +   "intermediate array offset %wi is outside array bounds "
> +   "of %qT", tmpidx, reftype))
> + TREE_NO_WARNING (ref) = 1;
>  }
>  }
>  
> --- gcc/tree-cfg.c.jj 2019-02-22 23:02:47.768118220 +0100
> +++ gcc/tree-cfg.c2019-03-07 12:02:52.662019138 +0100
> @@ -9328,9 +9328,9 @@ pass_warn_function_return::execute (func
> location = gimple_location (last);
> if (LOCATION_LOCUS (location) == UNKNOWN_LOCATION)
>   location = fun->function_end_locus;
> -   warning_

Re: [PATCH] Fix some cases of "whatever " on one line and " something" on the next one in diagnostics (PR other/80058)

2019-03-08 Thread Richard Biener

On Thu, 7 Mar 2019, Jakub Jelinek wrote:

> Hi!
> 
> The following patch fixes a couple of cases where two or more spaces
> are introduced in a middle of diagnostic message because we've split source
> line and left a space both at the end of one line and at the start of next
> one.
> 
> Fixed thusly, bootstrapped/regtested on x86_64-linux and i686-linux, ok for
> trunk?

OK.

Richard.

> 2019-03-07  Jakub Jelinek  
> 
>   PR other/80058
>   * lra-constraints.c (process_alt_operands): Avoid one space before
>   " at the end of line and another after " on another line in a string
>   literal.
>   * attribs.c (handle_dll_attribute): Likewise.
>   * config/avr/avr-devices.c (avr_texinfo): Likewise.
> cp/
>   * parser.c (cp_parser_template_declaration_after_parameters): Avoid
>   one space before " at the end of line and another after " on another
>   line in a string literal.
> fortran/
>   * arith.c (gfc_complex2complex): Avoid two spaces in the middle of
>   diagnostics.
>   * resolve.c (resolve_allocate_expr): Likewise.
> 
> --- gcc/lra-constraints.c.jj  2019-02-20 22:14:42.288643681 +0100
> +++ gcc/lra-constraints.c 2019-03-07 16:58:59.121039574 +0100
> @@ -2681,7 +2681,7 @@ process_alt_operands (int only_alternati
> if (lra_dump_file != NULL)
>   fprintf (lra_dump_file,
>"alt=%d: reload pseudo for op %d "
> -  " cannot hold the mode value -- refuse\n",
> +  "cannot hold the mode value -- refuse\n",
>nalt, nop);
> goto fail;
>   }
> --- gcc/attribs.c.jj  2019-03-05 14:38:14.447414660 +0100
> +++ gcc/attribs.c 2019-03-07 16:57:36.600383417 +0100
> @@ -1664,7 +1664,7 @@ handle_dll_attribute (tree * pnode, tree
> && DECL_DECLARED_INLINE_P (node))
>   {
> warning (OPT_Wattributes, "inline function %q+D declared as "
> -   " dllimport: attribute ignored", node);
> +   "dllimport: attribute ignored", node);
> *no_add_attrs = true;
>   }
>/* Like MS, treat definition of dllimported variables and
> --- gcc/config/avr/avr-devices.c.jj   2019-01-01 12:37:28.987780853 +0100
> +++ gcc/config/avr/avr-devices.c  2019-03-07 17:07:31.992688170 +0100
> @@ -76,7 +76,7 @@ avr_texinfo[] =
>  "the @code{MOVW} instruction." },
>{ ARCH_AVR3,
>  "``Classic'' devices with 16@tie{}KiB up to 64@tie{}KiB of "
> -" program memory." },
> +"program memory." },
>{ ARCH_AVR31,
>  "``Classic'' devices with 128@tie{}KiB of program memory." },
>{ ARCH_AVR35,
> --- gcc/cp/parser.c.jj2019-03-07 10:06:49.0 +0100
> +++ gcc/cp/parser.c   2019-03-07 17:02:12.514890165 +0100
> @@ -27865,7 +27865,7 @@ cp_parser_template_declaration_after_par
> if (cxx_dialect > cxx17)
>   error ("literal operator template %qD has invalid parameter list;"
>  "  Expected non-type template parameter pack  "
> -"  or single non-type parameter of class type",
> +"or single non-type parameter of class type",
>  decl);
> else
>   error ("literal operator template %qD has invalid parameter list."
> --- gcc/fortran/arith.c.jj2019-02-25 10:12:55.454061762 +0100
> +++ gcc/fortran/arith.c   2019-03-07 17:06:09.267034995 +0100
> @@ -2472,7 +2472,7 @@ gfc_complex2complex (gfc_expr *src, int
>int w = warn_conversion ? OPT_Wconversion : OPT_Wconversion_extra;
>  
>gfc_warning_now (w, "Change of value in conversion from "
> -" %qs to %qs at %L",
> +"%qs to %qs at %L",
>  gfc_typename (&src->ts), gfc_typename (&result->ts),
>  &src->where);
>did_warn = true;
> --- gcc/fortran/resolve.c.jj  2019-03-04 10:22:33.985168769 +0100
> +++ gcc/fortran/resolve.c 2019-03-07 17:06:38.781554480 +0100
> @@ -7798,7 +7798,7 @@ resolve_allocate_expr (gfc_expr *e, gfc_
>   if (mpz_cmp_si (ar->start[i]->value.integer, 1) < 0)
> {
>   gfc_error ("Upper cobound is less than lower cobound "
> -" of 1 at %L", &ar->start[i]->where);
> +"of 1 at %L", &ar->start[i]->where);
>   goto failure;
> }
> }
> 
>   Jakub
> 
> 

-- 
Richard Biener 
SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Graham Norton, HRB 
21284 (AG Nuernberg)

Re: [PATCH, i386]: Add _mm_loadu_si64 and _mm_storeu_si64 intrinsics.

2019-03-08 Thread H.J. Lu

Looks good to me.

Thanks.

On Fri, Mar 8, 2019, 6:04 PM Uros Bizjak  wrote:

> Attached patch adds missing _mm_loadu_si64 and _mm_storeu_si64 intrinsics.
>
> 2019-03-08  Uroš Bizjak  
>
> PR target/68924
> PR target/78782
> PR target/87558
> * config/i386/emmintrin.h (_mm_loadu_si64): New intrinsic.
> (_mm_storeu_si64): Ditto.
>
> testsuite/ChangeLog:
>
> 2019-03-08  Uroš Bizjak  
>
> PR target/68924
> PR target/78782
> PR target/87558
> * gcc.target/i386/pr78782.c: New test.
> * gcc.target/i386/pr87558.c: Ditto.
>
> Bootstrapped and regression tested on x86_64-linux-gnu {,-m32}.
>
> HJ, are these changes OK?
>
> Uros.
>

Re: [C++ PATCH] Add -fconstexpr-loop-nest-limit= option (PR c++/87481)

2019-03-08 Thread Richard Biener

On Thu, Mar 7, 2019 at 8:22 PM Jakub Jelinek  wrote:
>
> Hi!
>
> We have -fconstexpr-loop-limit= option to have an upper bound for constexpr
> evaluation of a single loop.  Even with that limit in place, if we have
> several nested loops during constexpr evaluation, even when each could have
> a few hundred iterations, the whole loop nest could take years to evaluate.
> And the loops can be split across several constexpr function calls even.
>
> The following patch adds another, slightly larger, limit on total number of
> iterations in the whole loop nest.
>
> Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

Shouldn't we somehow limit the number of stmt evaluations instead?
I can imagine using constexpr templates you can do "loops" easily
without actually writing loops ...

Richard.

> 2019-03-07  Jakub Jelinek  
>
> PR c++/87481
> * doc/invoke.texi (-fconstexpr-loop-nest-limit=): Document.
> (-fconstexpr-loop-limit=): Slightly adjust wording.
>
> * c.opt (-fconstexpr-loop-nest-limit=): New option.
> (-fconstexpr-loop-limit=): Slightly adjust description.
>
> * constexpr.c (cxx_eval_loop_expr): Count also number of iterations
> of all nested loops and punt if it is above constexpr_loop_nest_limit.
>
> * g++.dg/cpp1y/constexpr-87481.C: New test.
>
> --- gcc/doc/invoke.texi.jj  2019-03-02 09:03:25.822755663 +0100
> +++ gcc/doc/invoke.texi 2019-03-07 14:25:34.881197149 +0100
> @@ -210,7 +210,7 @@ in the following sections.
>  @gccoptlist{-fabi-version=@var{n}  -fno-access-control @gol
>  -faligned-new=@var{n}  -fargs-in-order=@var{n}  -fchar8_t  -fcheck-new @gol
>  -fconstexpr-depth=@var{n}  -fconstexpr-loop-limit=@var{n} @gol
> --fno-elide-constructors @gol
> +-fconstexpr-loop-nest-limit=@var{n} -fno-elide-constructors @gol
>  -fno-enforce-eh-specs @gol
>  -fno-gnu-keywords @gol
>  -fno-implicit-templates @gol
> @@ -2522,10 +2522,19 @@ is 512.
>
>  @item -fconstexpr-loop-limit=@var{n}
>  @opindex fconstexpr-loop-limit
> -Set the maximum number of iterations for a loop in C++14 constexpr functions
> -to @var{n}.  A limit is needed to detect infinite loops during
> +Set the maximum number of iterations for a single loop in C++14 constexpr
> +functions to @var{n}.  A limit is needed to detect infinite loops during
>  constant expression evaluation.  The default is 262144 (1<<18).
>
> +@item -fconstexpr-loop-nest-limit=@var{n}
> +@opindex fconstexpr-loop-nest-limit
> +Set the maximum number of iterations for all loops in a loop nest in C++14
> +constexpr functions to @var{n}.  Even when number of iterations of a single
> +loop is limited with the above limit, if there are several nested loops and
> +each of them has many iterations but still smaller than the above limit,
> +the resulting evaluation of the loop nest might take too long.
> +The default is 1048576 (1<<20).
> +
>  @item -fdeduce-init-list
>  @opindex fdeduce-init-list
>  Enable deduction of a template type parameter as
> --- gcc/c-family/c.opt.jj   2019-02-25 23:56:58.149055806 +0100
> +++ gcc/c-family/c.opt  2019-03-07 14:25:07.782639790 +0100
> @@ -1414,7 +1414,11 @@ C++ ObjC++ Joined RejectNegative UIntege
>
>  fconstexpr-loop-limit=
>  C++ ObjC++ Joined RejectNegative UInteger Var(constexpr_loop_limit) 
> Init(262144)
> --fconstexpr-loop-limit=Specify maximum constexpr loop 
> iteration count.
> +-fconstexpr-loop-limit=Specify maximum constexpr loop 
> iteration count of a single loop.
> +
> +fconstexpr-loop-nest-limit=
> +C++ ObjC++ Joined RejectNegative UInteger Var(constexpr_loop_nest_limit) 
> Init(1048576)
> +-fconstexpr-loop-nest-limit=   Specify maximum constexpr loop 
> iteration count for all nested loops.
>
>  fdebug-cpp
>  C ObjC C++ ObjC++
> --- gcc/cp/constexpr.c.jj   2019-03-06 19:45:40.360751756 +0100
> +++ gcc/cp/constexpr.c  2019-03-07 14:03:00.040329107 +0100
> @@ -4190,8 +4190,20 @@ cxx_eval_loop_expr (const constexpr_ctx
>  default:
>gcc_unreachable ();
>  }
> +
> +  /* True on entry to cxx_eval_loop_expr if called indirectly from another
> + cxx_eval_loop_expr.  */
> +  static bool constexpr_loop_nested;
> +
> +  /* Number of evaluated loop iterations in the whole current loop nest.  */
> +  static int constexpr_loop_nest_count;
> +
> +  bool save_constexpr_loop_nested = constexpr_loop_nested;
> +  constexpr_loop_nested = true;
> +
>hash_set save_exprs;
>new_ctx.save_exprs = &save_exprs;
> +
>do
>  {
>if (count != -1)
> @@ -4248,6 +4260,24 @@ cxx_eval_loop_expr (const constexpr_ctx
>   *non_constant_p = true;
>   break;
> }
> +
> +  /* In nested loops, don't count the first iteration, as it has been
> +counted in the parent loop already.  */
> +  if (count > (save_constexpr_loop_nested ? 1 : 0))
> +   {
> + if (++constexpr_loop_nest_count >= constexpr_loop_nest_limit)
> +   {
> + if (!ctx->quiet)
> +

Re: [PATCH] Two diagnostics fixes for ipa-devirt.c (PR ipa/80000, PR target/85665)

2019-03-08 Thread Richard Biener

On Thu, Mar 7, 2019 at 8:32 PM Jakub Jelinek  wrote:
>
> Hi!
>
> The following patch fixes two diagnostics issues in ipa-devirt.c, one
> is trailing space in one warning, the other is lack of articles in another
> warning, both reported by the translation team.
>
> Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

OK.

RIchard.

> 2019-03-07  Jakub Jelinek  
>
> PR ipa/8
> * ipa-devirt.c (compare_virtual_tables): Remove two trailing spaces
> from diagnostics.  Formatting fixes.
>
> PR target/85665
> * ipa-devirt.c (odr_types_equivalent_p): Fix grammar in
> warn_odr diagnostics.
>
> --- gcc/ipa-devirt.c.jj 2019-01-10 11:43:14.380377810 +0100
> +++ gcc/ipa-devirt.c2019-03-07 16:21:14.641937211 +0100
> @@ -842,17 +842,16 @@ compare_virtual_tables (varpool_node *pr
> {
>   class_type->odr_violated = true;
>   auto_diagnostic_group d;
> - if (warning_at (DECL_SOURCE_LOCATION
> -   (TYPE_NAME (DECL_CONTEXT (vtable->decl))),
> - OPT_Wodr,
> + tree ctx = TYPE_NAME (DECL_CONTEXT (vtable->decl));
> + if (warning_at (DECL_SOURCE_LOCATION (ctx), OPT_Wodr,
>   "virtual table of type %qD violates "
> - "one definition rule  ",
> + "one definition rule",
>   DECL_CONTEXT (vtable->decl)))
> {
> - inform (DECL_SOURCE_LOCATION
> -   (TYPE_NAME (DECL_CONTEXT (prevailing->decl))),
> - "the conflicting type defined in another 
> translation "
> - "unit has virtual table of different size");
> + ctx = TYPE_NAME (DECL_CONTEXT (prevailing->decl));
> + inform (DECL_SOURCE_LOCATION (ctx),
> + "the conflicting type defined in another 
> translation"
> + " unit has virtual table of different size");
> }
> }
>   return;
> @@ -1607,7 +1606,8 @@ odr_types_equivalent_p (tree t1, tree t2
> if (DECL_BIT_FIELD (f1) != DECL_BIT_FIELD (f2))
>   {
> warn_odr (t1, t2, f1, f2, warn, warned,
> - G_("one field is bitfield while other is not"));
> + G_("one field is a bitfield while the other "
> +"is not"));
> return false;
>   }
> else
>
> Jakub

Re: [PATCH] Fix a config/s390/s390.c diagnostics bug (PR target/79846)

2019-03-08 Thread Richard Biener

On Thu, Mar 7, 2019 at 8:53 PM Jakub Jelinek  wrote:
>
> Hi!
>
> As mentioned in the PR, using HOST_WIDE_INT_PRINT_* in the middle of
> translatable message is highly undesirable, we end up with:
> #: config/s390/s390.c:737
> #, gcc-internal-format
> msgid "constant argument %d for builtin %qF is out of range (0.."
> msgstr ""
>
> #: config/s390/s390.c:754
> #, gcc-internal-format
> msgid "constant argument %d for builtin %qF is out of range ("
> msgstr ""
> in gcc.pot that way and nothing is translated.
>
> The following patch should fix that by using proper %wu/%wd.
> Tested by building a cross-compiler to s390x-linux, ok for trunk?

OK.

Richard.

> 2019-03-07  Jakub Jelinek  
>
> PR target/79846
> * config/s390/s390.c (s390_const_operand_ok): Use %wu instead of
> HOST_WIDE_INT_PRINT_UNSIGNED and %wd instead of
> HOST_WIDE_INT_PRINT_DEC.  Formatting fixes.
>
> --- gcc/config/s390/s390.c.jj   2019-02-18 20:48:32.873728534 +0100
> +++ gcc/config/s390/s390.c  2019-03-07 18:13:44.757949114 +0100
> @@ -734,10 +734,9 @@ s390_const_operand_ok (tree arg, int arg
>if (!tree_fits_uhwi_p (arg)
>   || tree_to_uhwi (arg) > (HOST_WIDE_INT_1U << bitwidth) - 1)
> {
> - error("constant argument %d for builtin %qF is out of range (0.."
> -   HOST_WIDE_INT_PRINT_UNSIGNED ")",
> -   argnum, decl,
> -   (HOST_WIDE_INT_1U << bitwidth) - 1);
> + error ("constant argument %d for builtin %qF is out of range "
> +"(0..%wu)", argnum, decl,
> +(HOST_WIDE_INT_1U << bitwidth) - 1);
>   return false;
> }
>  }
> @@ -751,12 +750,10 @@ s390_const_operand_ok (tree arg, int arg
>   || tree_to_shwi (arg) < -(HOST_WIDE_INT_1 << (bitwidth - 1))
>   || tree_to_shwi (arg) > ((HOST_WIDE_INT_1 << (bitwidth - 1)) - 1))
> {
> - error("constant argument %d for builtin %qF is out of range ("
> -   HOST_WIDE_INT_PRINT_DEC ".."
> -   HOST_WIDE_INT_PRINT_DEC ")",
> -   argnum, decl,
> -   -(HOST_WIDE_INT_1 << (bitwidth - 1)),
> -   (HOST_WIDE_INT_1 << (bitwidth - 1)) - 1);
> + error ("constant argument %d for builtin %qF is out of range "
> +"(%wd..%wd)", argnum, decl,
> +-(HOST_WIDE_INT_1 << (bitwidth - 1)),
> +(HOST_WIDE_INT_1 << (bitwidth - 1)) - 1);
>   return false;
> }
>  }
>
> Jakub

Re: [PATCH] rs6000: Fix lost ud chains in swap optimization

2019-03-08 Thread Richard Biener

On Fri, Mar 8, 2019 at 1:34 AM Bill Schmidt  wrote:
>
> Hi,
>
> We recently discovered a problem in swap optimization where the du- and 
> ud-chains
> were getting corrupted after a preliminary modification phase and prior to the
> main body of the pass.  The fix for this is to rebuild the chains between 
> phases.

It looks expensive - is it possible to keep them up-to-date instead?

> Bootstrapped and tested on powerpc64le-unknown-linux-gnu with no regressions.
> I've not included a test case because the problem tends to get lost in 
> reduction,
> and may shift over time anyway.  Is this okay for trunk, and eventual backport
> to 8 and 7?
>
> Thanks!
>
> Bill
>
>
> 2019-03-07  Bill Schmidt  
>
> * config/rs6000/rs6000-p8swap.c (rs6000_analyze_swaps): Rebuild
> ud- and du-chains between phases.
>
>
> Index: gcc/config/rs6000/rs6000-p8swap.c
> ===
> --- gcc/config/rs6000/rs6000-p8swap.c   (revision 269471)
> +++ gcc/config/rs6000/rs6000-p8swap.c   (working copy)
> @@ -2316,7 +2316,14 @@ rs6000_analyze_swaps (function *fun)
>
>/* Pre-pass to recombine lvx and stvx patterns so we don't lose info.  */
>recombine_lvx_stvx_patterns (fun);
> +
> +  /* Rebuild ud- and du-chains.  */
> +  df_remove_problem (df_chain);
>df_process_deferred_rescans ();
> +  df_set_flags (DF_RD_PRUNE_DEAD_DEFS);
> +  df_chain_add_problem (DF_DU_CHAIN | DF_UD_CHAIN);
> +  df_analyze ();
> +  df_set_flags (DF_DEFER_INSN_RESCAN);
>
>/* Allocate structure to represent webs of insns.  */
>insn_entry = XCNEWVEC (swap_web_entry, get_max_uid ());
>

Re: [PATCH] PR88497 - Extend reassoc for vector bit_field_ref

2019-03-08 Thread Richard Biener

On Fri, Mar 8, 2019 at 2:40 AM Kewen.Lin  wrote:
>
> Hi,
>
> As PR88497 (https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88497),
> when we meet some code pattern like:
>
>V1[0] + V1[1] + ... + V1[k] + V2[0] + ... + V2[k] + ... Vn[k]
>// V1...Vn of VECTOR_TYPE
>
> We can teach reassoc to transform it to:
>
>Vs = (V1 + V2 + ... + Vn)
>Vs[0] + Vs[1] + ... + Vs[k]
>
> It saves addition and bit_field_ref operations and exposes more
> opportunities for downstream passes, I notice that even on one
> target doesn't support vector type and vector type gets expanded
> in veclower, it's still better to have it, since the generated
> sequence is more friendly for widening_mul.  （If one more time
> DCE after forwprop, it should be the same.  )
>
> Bootstrapped/regtested on powerpc64le-linux-gnu, ok for trunk?

Hmm.  You are basically vectorizing the reduction partially here (note
basic-block vectorization doesn't handle reductions yet).  So I'm not sure
whether we should eventually just change op sort order to sort
v1[1] after v2[0] to sort v1[0] and v2[0] together.  That would be also maybe
an intermediate step that makes implementing the vectorization part
"easier"?  I also imagine that the "vectorization" would be profitable even
if there's just two elements of the vectors summed and the real vectors are
larger.

Note that the code you transform contains no vector operations (just
element extracts) and thus you need to check for target support before
creating vector add.

This is for GCC 10.  Also it doesn't fix PR88497, does it?

Richard.

> Thanks in advance!
>
>
> gcc/ChangeLog
>
> 2019-03-08  Kewen Lin  
>
> PR target/88497
> * tree-ssa-reassoc.c (reassociate_bb): Swap the positions of
> GIMPLE_BINARY_RHS check and gimple_visited_p check, call new
> function undistribute_bitref_for_vector.
> (undistribute_bitref_for_vector): New function.
> (cleanup_vinfo_map): Likewise.
> (unsigned_cmp): Likewise.
>
> gcc/testsuite/ChangeLog
>
> 2019-03-08  Kewen Lin  
>
> * gcc.dg/tree-ssa/pr88497.c: New test.
>
> ---
>  gcc/testsuite/gcc.dg/tree-ssa/pr88497.c |  18 +++
>  gcc/tree-ssa-reassoc.c  | 274 
> +++-
>  2 files changed, 287 insertions(+), 5 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/pr88497.c
>
> diff --git a/gcc/testsuite/gcc.dg/tree-ssa/pr88497.c 
> b/gcc/testsuite/gcc.dg/tree-ssa/pr88497.c
> new file mode 100644
> index 000..4d9ac67
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/tree-ssa/pr88497.c
> @@ -0,0 +1,18 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O2 -ffast-math -fdump-tree-reassoc1" } */
> +typedef double v2df __attribute__ ((vector_size (16)));
> +double
> +test (double accumulator, v2df arg1[], v2df arg2[])
> +{
> +  v2df temp;
> +  temp = arg1[0] * arg2[0];
> +  accumulator += temp[0] + temp[1];
> +  temp = arg1[1] * arg2[1];
> +  accumulator += temp[0] + temp[1];
> +  temp = arg1[2] * arg2[2];
> +  accumulator += temp[0] + temp[1];
> +  temp = arg1[3] * arg2[3];
> +  accumulator += temp[0] + temp[1];
> +  return accumulator;
> +}
> +/* { dg-final { scan-tree-dump-times "BIT_FIELD_REF" 2 "reassoc1" } } */
> diff --git a/gcc/tree-ssa-reassoc.c b/gcc/tree-ssa-reassoc.c
> index e1c4dfe..fc0e297 100644
> --- a/gcc/tree-ssa-reassoc.c
> +++ b/gcc/tree-ssa-reassoc.c
> @@ -1772,6 +1772,263 @@ undistribute_ops_list (enum tree_code opcode,
>return changed;
>  }
>
> +/* Hold the information of one specific VECTOR_TYPE SSA_NAME.
> +- offsets: for different BIT_FIELD_REF offsets accessing same VECTOR.
> +- ops_indexes: the index of vec ops* for each relavant BIT_FIELD_REF.  */
> +struct v_info
> +{
> +  auto_vec offsets;
> +  auto_vec ops_indexes;
> +};
> +
> +typedef struct v_info *v_info_ptr;
> +
> +/* Comparison function for qsort on unsigned BIT_FIELD_REF offsets.  */
> +static int
> +unsigned_cmp (const void *p_i, const void *p_j)
> +{
> +  if (*(const unsigned *) p_i >= *(const unsigned *) p_j)
> +return 1;
> +  else
> +return -1;
> +}
> +
> +/* Cleanup hash map for VECTOR information.  */
> +static void
> +cleanup_vinfo_map (hash_map &info_map)
> +{
> +  for (hash_map::iterator it = info_map.begin ();
> +   it != info_map.end (); ++it)
> +{
> +  v_info_ptr info = (*it).second;
> +  delete info;
> +  (*it).second = NULL;
> +}
> +}
> +
> +/* Perform un-distribution of BIT_FIELD_REF on VECTOR_TYPE.
> + V1[0] + V1[1] + ... + V1[k] + V2[0] + V2[1] + ... + V2[k] + ... Vn[k]
> +   is transformed to
> + Vs = (V1 + V2 + ... + Vn)
> + Vs[0] + Vs[1] + ... + Vs[k]
> +
> +   The basic steps are listed below:
> +
> +1) Check the addition chain *OPS by looking those summands coming from
> +   VECTOR bit_field_ref on VECTOR type. Put the information into
> +   v_info_map for each satisfied summand, using VECTOR SSA_NAME as key.
> +
> +2) For each key (VECTOR SSA_NAME), validate all its BIT_FIELD_REFs ar

Re: V5 [PATCH] Optimize vector constructor

2019-03-08 Thread Richard Biener

On Fri, Mar 8, 2019 at 9:49 AM H.J. Lu  wrote:
>
> On Thu, Mar 7, 2019 at 9:51 AM H.J. Lu  wrote:
> >
> > On Wed, Mar 6, 2019 at 8:33 PM Richard Biener
> >  wrote:
> > >
> > > On Wed, Mar 6, 2019 at 8:46 AM H.J. Lu  wrote:
> > > >
> > > > On Tue, Mar 5, 2019 at 1:46 AM H.J. Lu  wrote:
> > > > >
> > > > > On Mon, Mar 04, 2019 at 12:55:04PM +0100, Richard Biener wrote:
> > > > > > On Sun, Mar 3, 2019 at 10:13 PM H.J. Lu  wrote:
> > > > > > >
> > > > > > > On Sun, Mar 03, 2019 at 06:40:09AM -0800, Andrew Pinski wrote:
> > > > > > > > )
> > > > > > > > ,On Sun, Mar 3, 2019 at 6:32 AM H.J. Lu  
> > > > > > > > wrote:
> > > > > > > > >
> > > > > > > > > For vector init constructor:
> > > > > > > > >
> > > > > > > > > ---
> > > > > > > > > typedef float __v4sf __attribute__ ((__vector_size__ (16)));
> > > > > > > > >
> > > > > > > > > __v4sf
> > > > > > > > > foo (__v4sf x, float f)
> > > > > > > > > {
> > > > > > > > >   __v4sf y = { f, x[1], x[2], x[3] };
> > > > > > > > >   return y;
> > > > > > > > > }
> > > > > > > > > ---
> > > > > > > > >
> > > > > > > > > we can optimize vector init constructor with vector copy or 
> > > > > > > > > permute
> > > > > > > > > followed by a single scalar insert:
> > > > >
> > > > > > and you want to advance to the _1 = BIT_INSERT_EXPR here.  The 
> > > > > > easiest way
> > > > > > is to emit a new stmt for _2 = copy ...; and do the set_rhs with the
> > > > > > BIT_INSERT_EXPR.
> > > > >
> > > > > Thanks for BIT_INSERT_EXPR suggestion.  I am testing this patch.
> > > > >
> > > > >
> > > > > H.J.
> > > > > ---
> > > > > We can optimize vector constructor with vector copy or permute 
> > > > > followed
> > > > > by a single scalar insert:
> > > > >
> > > > >   __v4sf y;
> > > > >   __v4sf D.1930;
> > > > >   float _1;
> > > > >   float _2;
> > > > >   float _3;
> > > > >
> > > > >:
> > > > >   _1 = BIT_FIELD_REF ;
> > > > >   _2 = BIT_FIELD_REF ;
> > > > >   _3 = BIT_FIELD_REF ;
> > > > >   y_6 = {f_5(D), _3, _2, _1};
> > > > >   return y_6;
> > > > >
> > > > > with
> > > > >
> > > > >  __v4sf y;
> > > > >   __v4sf D.1930;
> > > > >   float _1;
> > > > >   float _2;
> > > > >   float _3;
> > > > >   vector(4) float _8;
> > > > >
> > > > >:
> > > > >   _1 = BIT_FIELD_REF ;
> > > > >   _2 = BIT_FIELD_REF ;
> > > > >   _3 = BIT_FIELD_REF ;
> > > > >   _8 = x_9(D);
> > > > >   y_6 = BIT_INSERT_EXPR ;
> > > > >   return y_6;
> > > > >
> > > > > gcc/
> > > > >
> > > > > PR tree-optimization/88828
> > > > > * tree-ssa-forwprop.c (simplify_vector_constructor): Optimize
> > > > > vector init constructor with vector copy or permute followed
> > > > > by a single scalar insert.
> > > > >
> > > > > gcc/testsuite/
> > > > >
> > > > > PR tree-optimization/88828
> > > > > * gcc.target/i386/pr88828-1a.c: New test.
> > > > > * gcc.target/i386/pr88828-2b.c: Likewise.
> > > > > * gcc.target/i386/pr88828-2.c: Likewise.
> > > > > * gcc.target/i386/pr88828-3a.c: Likewise.
> > > > > * gcc.target/i386/pr88828-3b.c: Likewise.
> > > > > * gcc.target/i386/pr88828-3c.c: Likewise.
> > > > > * gcc.target/i386/pr88828-3d.c: Likewise.
> > > > > * gcc.target/i386/pr88828-4a.c: Likewise.
> > > > > * gcc.target/i386/pr88828-4b.c: Likewise.
> > > > > * gcc.target/i386/pr88828-5a.c: Likewise.
> > > > > * gcc.target/i386/pr88828-5b.c: Likewise.
> > > > > * gcc.target/i386/pr88828-6a.c: Likewise.
> > > > > * gcc.target/i386/pr88828-6b.c: Likewise.
> > > >
> > > > Here is the updated patch with run-time tests.
> > >
> > > -  if (TREE_CODE (elt->value) != SSA_NAME)
> > > +  if (TREE_CODE (ce->value) != SSA_NAME)
> > > return false;
> > >
> > > hmm, so it doesn't allow { 0, v[1], v[2], v[3] }?  I think the single
> > > scalar value can be a constant as well.
> >
> > Fixed.
> >
> > >if (!def_stmt)
> > > -   return false;
> > > +   {
> > > + if (gimple_nop_p (SSA_NAME_DEF_STMT (ce->value)))
> > >
> > > if (SSA_NAME_IS_DEFAULT_DEF (ce->value))
> > >
> > > +   {
> > >
> > > also you seem to disallow
> > >
> > >   { i + 1, v[1], v[2], v[3] }
> >
> > Fixed by
> >
> >  if (code != BIT_FIELD_REF)
> > {
> >   /* Only allow one scalar insert.  */
> >   if (nscalars != 0)
> > return false;
> >
> >   nscalars = 1;
> >   insert = true;
> >   scalar_idx = i;
> >   sel.quick_push (i);
> >   scalar_element = ce->value;
> >   continue;
> > }
> >
> > > because get_prop_source_stmt will return the definition computing
> > > i + 1 in this case and your code will be skipped?
> > >
> > > I think you can simplify the code by treating scalar_element != NULL
> > > as nscalars == 1 and eliding nscalars.
> >
> > It works only if
> >
> > TYPE_VECTOR_SUBPARTS (type).to_constant ()  == (nscalars + nvectors)
> >
> > We need to check both nscalars and nvecto

[PATCH] Restrict IPA split (PR go/63560).

2019-03-08 Thread Martin Liška

Hi.

This is quite old hunk that Honza forgot to install:
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63560#c1

Patch can bootstrap on x86_64-linux-gnu and survives regression tests.
Ready to be installed? Or should I wait for next stage1?

Thanks,
Martin

gcc/ChangeLog:

2019-03-08  Jan Hubicka  

PR go/63560
* ipa-split.c (execute_split_functions): Do not split
'noinline' or 'section' function.
---
 gcc/ipa-split.c | 15 +++
 1 file changed, 15 insertions(+)


diff --git a/gcc/ipa-split.c b/gcc/ipa-split.c
index cfd03abb07d..5eaf8257f98 100644
--- a/gcc/ipa-split.c
+++ b/gcc/ipa-split.c
@@ -104,6 +104,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "gimple-pretty-print.h"
 #include "ipa-fnsummary.h"
 #include "cfgloop.h"
+#include "attribs.h"
 
 /* Per basic block info.  */
 
@@ -1751,6 +1752,20 @@ execute_split_functions (void)
   return 0;
 }
 
+  if (lookup_attribute ("noinline", DECL_ATTRIBUTES (current_function_decl)))
+{
+  if (dump_file)
+	fprintf (dump_file, "Not splitting: function is noinline.\n");
+  return 0;
+}
+  if (lookup_attribute ("section", DECL_ATTRIBUTES (current_function_decl)))
+{
+  if (dump_file)
+	fprintf (dump_file, "Not splitting: function is in user defined "
+		 "section.\n");
+  return 0;
+}
+
   /* We enforce splitting after loop headers when profile info is not
  available.  */
   if (profile_status_for_fn (cfun) != PROFILE_READ)

Re: [C++ PATCH] Add -fconstexpr-loop-nest-limit= option (PR c++/87481)

2019-03-08 Thread Jakub Jelinek

On Fri, Mar 08, 2019 at 11:36:36AM +0100, Richard Biener wrote:
> Shouldn't we somehow limit the number of stmt evaluations instead?
> I can imagine using constexpr templates you can do "loops" easily
> without actually writing loops ...

Maybe.  The question is what exactly should we count.  We could count only
in the cxx_eval_statement_list loop individual non-debug statements,
but that would miss statements that aren't in STATEMENT_LISTs.
Or we could count number of cxx_eval_constant_expression calls
(perhaps after the initial if (jump_target && *jump_target) skipping)
within a single cxx_eval_outermost_constant_expr, but we couldn't call that
number of statements even when it would be much more accurate on how long
actually something will take to evaluate, because one can have a single
statement with hundreds of thousands of operations in it.
So -fconstexpr-ops-limit= ?  Probably should use Host_Wide_Int in that case
rather than UInteger though, so users can allow more than 4G ops.

Looking at what clang++ does, they have -fconstexpr-steps= limit which is
declared to count statements with a quite low default 1048576 and it rejects
already
constexpr unsigned
foo ()
{
  unsigned int r = 0;
  for (int i = 0; i < 65536; i++)
for (int j = 0; j < 14; j++)
  r += (i + j);
  return r;
}
constexpr auto x = foo ();
(but not 13 instead of 14).  13 * 65536 is 851968, so that would be count of
r += (i + j); statements, plus the inner for as whole probably 65536 times
and with 14 * 65536 + 65536 we are still below the limit, so it must count
something else, but e.g. can't count j++ already.

> > 2019-03-07  Jakub Jelinek  
> >
> > PR c++/87481
> > * doc/invoke.texi (-fconstexpr-loop-nest-limit=): Document.
> > (-fconstexpr-loop-limit=): Slightly adjust wording.
> >
> > * c.opt (-fconstexpr-loop-nest-limit=): New option.
> > (-fconstexpr-loop-limit=): Slightly adjust description.
> >
> > * constexpr.c (cxx_eval_loop_expr): Count also number of iterations
> > of all nested loops and punt if it is above 
> > constexpr_loop_nest_limit.
> >
> > * g++.dg/cpp1y/constexpr-87481.C: New test.

Jakub

Re: [C++ PATCH] Add -fconstexpr-loop-nest-limit= option (PR c++/87481)

2019-03-08 Thread Jakub Jelinek

On Fri, Mar 08, 2019 at 12:25:31PM +0100, Jakub Jelinek wrote:
> Maybe.  The question is what exactly should we count.  We could count only
> in the cxx_eval_statement_list loop individual non-debug statements,
> but that would miss statements that aren't in STATEMENT_LISTs.
> Or we could count number of cxx_eval_constant_expression calls
> (perhaps after the initial if (jump_target && *jump_target) skipping)
> within a single cxx_eval_outermost_constant_expr, but we couldn't call that
> number of statements even when it would be much more accurate on how long
> actually something will take to evaluate, because one can have a single
> statement with hundreds of thousands of operations in it.
> So -fconstexpr-ops-limit= ?  Probably should use Host_Wide_Int in that case
> rather than UInteger though, so users can allow more than 4G ops.

Or, if we wanted to count statements, perhaps count just specific subset
of tree codes in cxx_eval_statement_list, like *_STMT, RETURN_EXPR,
SWITCH_EXPR, anything else that must be a statement rather than some
subexpression and is usually not wrapped with EXPR_STMT?

Or count both statements and ops and have two limits.

With the ops limit I'm afraid even if we have quite high default, it might
trigger if people use huge multi-megabyte constexpr arrays or similar.

Jakub

Re: [PATCH] Restrict IPA split (PR go/63560).

2019-03-08 Thread Jan Hubicka

> Hi.
> 
> This is quite old hunk that Honza forgot to install:
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63560#c1
> 
> Patch can bootstrap on x86_64-linux-gnu and survives regression tests.
> Ready to be installed? Or should I wait for next stage1?
> 
> Thanks,
> Martin
> 
> gcc/ChangeLog:
> 
> 2019-03-08  Jan Hubicka  
> 
>   PR go/63560
>   * ipa-split.c (execute_split_functions): Do not split
>   'noinline' or 'section' function.

It is OK now. It only makes us more conservative and moving code out of
user sections may lead to surprises.

Honza

[PATCH] Fix -Wenum-compare-switch warning in i386.c.

2019-03-08 Thread Martin Liška

A small patch that deals with:
gcc/config/i386/i386.c:39427:11:Semantic Issue: comparison of two values with 
different enumeration types in switch statement ('enum built_in_function' and 
'ix86_builtins'): -Wenum-compare-switch

Is it fine to install it?
Thanks,
Martin

gcc/ChangeLog:

2019-03-08  Martin Liska  

* config/i386/i386.c (ix86_builtin_reciprocal): Cast
DECL_FUNCTION_CODE into ix86_builtins enum before
the switch statement.
---
 gcc/config/i386/i386.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)


diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
index 2d6a993238b..f170180304b 100644
--- a/gcc/config/i386/i386.c
+++ b/gcc/config/i386/i386.c
@@ -39424,7 +39424,9 @@ use_rsqrt_p ()
 static tree
 ix86_builtin_reciprocal (tree fndecl)
 {
-  switch (DECL_FUNCTION_CODE (fndecl))
+  enum ix86_builtins fn_code
+= (enum ix86_builtins) DECL_FUNCTION_CODE (fndecl);
+  switch (fn_code)
 {
   /* Vectorized version of sqrt to rsqrt conversion.  */
 case IX86_BUILTIN_SQRTPS_NR:

Re: [PATCH] rs6000: Fix lost ud chains in swap optimization

2019-03-08 Thread Bill Schmidt

On 3/8/19 4:40 AM, Richard Biener wrote:
> On Fri, Mar 8, 2019 at 1:34 AM Bill Schmidt  wrote:
>> Hi,
>>
>> We recently discovered a problem in swap optimization where the du- and 
>> ud-chains
>> were getting corrupted after a preliminary modification phase and prior to 
>> the
>> main body of the pass.  The fix for this is to rebuild the chains between 
>> phases.
> It looks expensive - is it possible to keep them up-to-date instead?

That's what I've been doing up till now.  There appears to be a problem with
rescanning and the DF_DU_CHAIN and DF_UD_CHAIN dataflow problems.  Whether I
use df_process_deferred_rescans or df_insn_rescan_all, I see a problem where
now and then a use-def chain gets lost.

As I looked into it further, I found this comment on df_insn_rescan_all:

/* Rescan all of the insns in the function.  Note that the artificial   
   uses and defs are not touched.  This function will destroy def-use   
   or use-def chains.  */

So if that's accurate, it appears that the rescan machinery is not compatible
with du-/ud-chains, which is consistent with my experience.  This is the
second time we've encountered this issue.  It doesn't come up often, but
when it does, it's a real head-scratcher to debug.

If an expert in the df code thinks the above comment is erroneous, then there
is at least a subtle bug somewhere in the df code that would be good to fix.
But seeing the comment, I felt warned off completely from rescan.  This at
least allows us to fix the problem that the OpenCV community found.

Thanks,
Bill

>
>> Bootstrapped and tested on powerpc64le-unknown-linux-gnu with no regressions.
>> I've not included a test case because the problem tends to get lost in 
>> reduction,
>> and may shift over time anyway.  Is this okay for trunk, and eventual 
>> backport
>> to 8 and 7?
>>
>> Thanks!
>>
>> Bill
>>
>>
>> 2019-03-07  Bill Schmidt  
>>
>> * config/rs6000/rs6000-p8swap.c (rs6000_analyze_swaps): Rebuild
>> ud- and du-chains between phases.
>>
>>
>> Index: gcc/config/rs6000/rs6000-p8swap.c
>> ===
>> --- gcc/config/rs6000/rs6000-p8swap.c   (revision 269471)
>> +++ gcc/config/rs6000/rs6000-p8swap.c   (working copy)
>> @@ -2316,7 +2316,14 @@ rs6000_analyze_swaps (function *fun)
>>
>>/* Pre-pass to recombine lvx and stvx patterns so we don't lose info.  */
>>recombine_lvx_stvx_patterns (fun);
>> +
>> +  /* Rebuild ud- and du-chains.  */
>> +  df_remove_problem (df_chain);
>>df_process_deferred_rescans ();
>> +  df_set_flags (DF_RD_PRUNE_DEAD_DEFS);
>> +  df_chain_add_problem (DF_DU_CHAIN | DF_UD_CHAIN);
>> +  df_analyze ();
>> +  df_set_flags (DF_DEFER_INSN_RESCAN);
>>
>>/* Allocate structure to represent webs of insns.  */
>>insn_entry = XCNEWVEC (swap_web_entry, get_max_uid ());
>>

Re: [PATCH] x86: Disable jump tables when retpolines are used (PR target/86952).

2019-03-08 Thread Jan Hubicka

> Hi.
> 
> Thanks to Intel guys, we've done some re-measurement in PR86952
> about usage of jump tables when retpolines are used.
> Numbers prove that disabling of JT should be the best for now.
> 
> Patch can bootstrap on x86_64-linux-gnu and survives regression tests.
> 
> Ready to be installed?

OK, thanks!
I wonder if there is some threshold for extremely large jumptables where
branchy sequence will loose, but I think it is better to disable them
than what we have right now.

Honza

Re: [C++ PATCH] PR c++/88123 - lambda and using-directive.

2019-03-08 Thread Nathan Sidwell


On 3/7/19 9:54 PM, Jason Merrill wrote:

For named function calls in a template, the result of unqualified lookup is
safed in CALL_EXPR_FN.  But for operator expressions, no unqualified lookup
is performed until we know whether the operands have class type.  So when we
see in a lambda a use of an operator that might be overloaded, we need to do
that lookup then and save it away somewhere.  One possibility would be in
the expression, but we can't really add extra conditional operands to
standard tree codes.  I mostly implemented another approach using a new
WITH_LOOKUP_EXPR code, but teaching everywhere how to handle a new tree code
is always complicated.  Then it occurred to me that we could associate the
lookups with the function, which is both simpler and smaller.  So this patch
stores any operator bindings needed by a lambda function in an internal
attribute on the lambda call operator.

Tested x86_64-pc-linux-gnu, applying to trunk.  Nathan, does slipping these
bindings into the sk_function_parms binding level this way make sense to you?


oo sneaky, I like it.  Perhaps a comment explaining the surprising scope 
these are being pushed into?


nathan

--
Nathan Sidwell

Re: [PATCH] x86: Disable jump tables when retpolines are used (PR target/86952).

2019-03-08 Thread Martin Liška

On 3/8/19 1:44 PM, Jan Hubicka wrote:
>> Hi.
>>
>> Thanks to Intel guys, we've done some re-measurement in PR86952
>> about usage of jump tables when retpolines are used.
>> Numbers prove that disabling of JT should be the best for now.
>>
>> Patch can bootstrap on x86_64-linux-gnu and survives regression tests.
>>
>> Ready to be installed?
> 
> OK, thanks!
> I wonder if there is some threshold for extremely large jumptables where
> branchy sequence will loose, but I think it is better to disable them
> than what we have right now.

I tested switch statements up to 4096 and it was still slower ;)

Martin

> 
> Honza
>

Re: [C++ PATCH] Add -fconstexpr-loop-nest-limit= option (PR c++/87481)

2019-03-08 Thread Richard Biener

On Fri, Mar 8, 2019 at 12:45 PM Jakub Jelinek  wrote:
>
> On Fri, Mar 08, 2019 at 12:25:31PM +0100, Jakub Jelinek wrote:
> > Maybe.  The question is what exactly should we count.  We could count only
> > in the cxx_eval_statement_list loop individual non-debug statements,
> > but that would miss statements that aren't in STATEMENT_LISTs.
> > Or we could count number of cxx_eval_constant_expression calls
> > (perhaps after the initial if (jump_target && *jump_target) skipping)
> > within a single cxx_eval_outermost_constant_expr, but we couldn't call that
> > number of statements even when it would be much more accurate on how long
> > actually something will take to evaluate, because one can have a single
> > statement with hundreds of thousands of operations in it.
> > So -fconstexpr-ops-limit= ?  Probably should use Host_Wide_Int in that case
> > rather than UInteger though, so users can allow more than 4G ops.
>
> Or, if we wanted to count statements, perhaps count just specific subset
> of tree codes in cxx_eval_statement_list, like *_STMT, RETURN_EXPR,
> SWITCH_EXPR, anything else that must be a statement rather than some
> subexpression and is usually not wrapped with EXPR_STMT?
>
> Or count both statements and ops and have two limits.
>
> With the ops limit I'm afraid even if we have quite high default, it might
> trigger if people use huge multi-megabyte constexpr arrays or similar.

Well, I think having only a single limit is desirable which means we have to
count something resembling the overall work done.  We don't have to document
that it matches "statements" (whatever that exactly would be).

I don't see multi-megabyte constexpr arrays as an issue - do you think we'd
reject them already when parsing them?  We could always decide to elide
counting things we didn't do any real work on (like a literal '1' initializer).

Richard.

> Jakub

Re: [PATCH] rs6000: Fix lost ud chains in swap optimization

2019-03-08 Thread Richard Biener

On Fri, Mar 8, 2019 at 1:35 PM Bill Schmidt  wrote:
>
> On 3/8/19 4:40 AM, Richard Biener wrote:
>
> On Fri, Mar 8, 2019 at 1:34 AM Bill Schmidt  wrote:
>
> Hi,
>
> We recently discovered a problem in swap optimization where the du- and 
> ud-chains
> were getting corrupted after a preliminary modification phase and prior to the
> main body of the pass.  The fix for this is to rebuild the chains between 
> phases.
>
> It looks expensive - is it possible to keep them up-to-date instead?
>
> That's what I've been doing up till now.  There appears to be a problem with
> rescanning and the DF_DU_CHAIN and DF_UD_CHAIN dataflow problems.  Whether I
> use df_process_deferred_rescans or df_insn_rescan_all, I see a problem where
> now and then a use-def chain gets lost.
>
> As I looked into it further, I found this comment on df_insn_rescan_all:
>
> /* Rescan all of the insns in the function.  Note that the artificial
>uses and defs are not touched.  This function will destroy def-use
>or use-def chains.  */
>
> So if that's accurate, it appears that the rescan machinery is not compatible
> with du-/ud-chains, which is consistent with my experience.  This is the
> second time we've encountered this issue.  It doesn't come up often, but
> when it does, it's a real head-scratcher to debug.
>
> If an expert in the df code thinks the above comment is erroneous, then there
> is at least a subtle bug somewhere in the df code that would be good to fix.
> But seeing the comment, I felt warned off completely from rescan.  This at
> least allows us to fix the problem that the OpenCV community found.

OK, I see - I'm not very familiar with DF.

> Thanks,
> Bill
>
> Bootstrapped and tested on powerpc64le-unknown-linux-gnu with no regressions.
> I've not included a test case because the problem tends to get lost in 
> reduction,
> and may shift over time anyway.  Is this okay for trunk, and eventual backport
> to 8 and 7?
>
> Thanks!
>
> Bill
>
>
> 2019-03-07  Bill Schmidt  
>
> * config/rs6000/rs6000-p8swap.c (rs6000_analyze_swaps): Rebuild
> ud- and du-chains between phases.
>
>
> Index: gcc/config/rs6000/rs6000-p8swap.c
> ===
> --- gcc/config/rs6000/rs6000-p8swap.c   (revision 269471)
> +++ gcc/config/rs6000/rs6000-p8swap.c   (working copy)
> @@ -2316,7 +2316,14 @@ rs6000_analyze_swaps (function *fun)
>
>/* Pre-pass to recombine lvx and stvx patterns so we don't lose info.  */
>recombine_lvx_stvx_patterns (fun);
> +
> +  /* Rebuild ud- and du-chains.  */
> +  df_remove_problem (df_chain);
>df_process_deferred_rescans ();
> +  df_set_flags (DF_RD_PRUNE_DEAD_DEFS);
> +  df_chain_add_problem (DF_DU_CHAIN | DF_UD_CHAIN);
> +  df_analyze ();
> +  df_set_flags (DF_DEFER_INSN_RESCAN);
>
>/* Allocate structure to represent webs of insns.  */
>insn_entry = XCNEWVEC (swap_web_entry, get_max_uid ());
>
>

[PATCH] Add fixed underlying type to enum path::format

2019-03-08 Thread Jonathan Wakely


* include/bits/fs_path.h (path::format): Add fixed underlying type.

Tested powerpc64le-linux, committed to trunk.

commit 9bb37d0b7196d4a97d65ff597af884e2fa4ebb52
Author: Jonathan Wakely 
Date:   Thu Mar 7 15:00:56 2019 +

Add fixed underlying type to enum path::format

* include/bits/fs_path.h (path::format): Add fixed underlying type.

diff --git a/libstdc++-v3/include/bits/fs_path.h 
b/libstdc++-v3/include/bits/fs_path.h
index 077045e6c78..96033f68c36 100644
--- a/libstdc++-v3/include/bits/fs_path.h
+++ b/libstdc++-v3/include/bits/fs_path.h
@@ -166,7 +166,7 @@ _GLIBCXX_BEGIN_NAMESPACE_CXX11
 #endif
 typedef std::basic_string  string_type;
 
-enum format { native_format, generic_format, auto_format };
+enum format : unsigned char { native_format, generic_format, auto_format };
 
 // constructors and destructor

[PATCH] Fix text of hyperlink in manual

2019-03-08 Thread Jonathan Wakely


* doc/xml/manual/using.xml: Use link element instead of xref.
* doc/html/*: Regenerate.

Committed to trunk.

commit 756ccc4dcdcc318d811b67e027202251d7dac57f
Author: Jonathan Wakely 
Date:   Fri Mar 8 13:38:31 2019 +

Fix text of hyperlink in manual

* doc/xml/manual/using.xml: Use link element instead of xref.
* doc/html/*: Regenerate.

diff --git a/libstdc++-v3/doc/xml/manual/using.xml 
b/libstdc++-v3/doc/xml/manual/using.xml
index 2d44a739406..7647e9b8dad 100644
--- a/libstdc++-v3/doc/xml/manual/using.xml
+++ b/libstdc++-v3/doc/xml/manual/using.xml
@@ -1192,8 +1192,8 @@ g++ -Winvalid-pch -I. -include stdc++.h -H -g -O2 
hello.cc -o test.exe
 enabled for
 std::vector>
 and only when std::allocator is derived from
-new_allocator
-or malloc_allocator. The annotations
+new_allocator
+or malloc_allocator. The annotations
 must be present on all vector operations or none, so this macro must
 be defined to the same value for all translation units that create,
 destroy or modify vectors.

Re: [PATCH] x86: Disable jump tables when retpolines are used (PR target/86952).

2019-03-08 Thread Jan Hubicka

> On 3/8/19 1:44 PM, Jan Hubicka wrote:
> >> Hi.
> >>
> >> Thanks to Intel guys, we've done some re-measurement in PR86952
> >> about usage of jump tables when retpolines are used.
> >> Numbers prove that disabling of JT should be the best for now.
> >>
> >> Patch can bootstrap on x86_64-linux-gnu and survives regression tests.
> >>
> >> Ready to be installed?
> > 
> > OK, thanks!
> > I wonder if there is some threshold for extremely large jumptables where
> > branchy sequence will loose, but I think it is better to disable them
> > than what we have right now.
> 
> I tested switch statements up to 4096 and it was still slower ;)

Well, we have switch statements with this many cases in
insn-attrtab/latencytab tables.  I suppose what kind of code you compile
with retpolines, but I would expect even bigger switch statements to
appear in real code (large DFA automatons and such)

Honza
> 
> Martin
> 
> > 
> > Honza
> > 
>

[PATCH] Add tests for resolved PR (PR c/85870).

2019-03-08 Thread Martin Liška

Hi.

I'm going to install Jakub's test-case for a PR that's fixed
on trunk.

Martin

gcc/testsuite/ChangeLog:

2019-03-08  Jakub Jelinek  

PR c/85870
* gcc.dg/lto/pr85870_0.c: New test.
* gcc.dg/lto/pr85870_1.c: New test.
---
 gcc/testsuite/gcc.dg/lto/pr85870_0.c | 34 
 gcc/testsuite/gcc.dg/lto/pr85870_1.c | 27 ++
 2 files changed, 61 insertions(+)
 create mode 100644 gcc/testsuite/gcc.dg/lto/pr85870_0.c
 create mode 100644 gcc/testsuite/gcc.dg/lto/pr85870_1.c


diff --git a/gcc/testsuite/gcc.dg/lto/pr85870_0.c b/gcc/testsuite/gcc.dg/lto/pr85870_0.c
new file mode 100644
index 000..4b59f607ebd
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/lto/pr85870_0.c
@@ -0,0 +1,34 @@
+/* PR c/85870 */
+/* { dg-lto-do link } */
+/* { dg-lto-options { { -flto -O2 } } } */
+
+typedef struct abc_s {
+  char a1;
+  short a2;
+  unsigned int a3;
+  unsigned int a4;
+} abc;
+
+typedef struct xyz_s {
+ unsigned x1;
+ unsigned x2;
+ abc *x3;
+} xyz;
+
+extern xyz XYZ[3];
+static const abc Arr1[]={
+   {0,0,0xdeadbeaf,0xbeefdead} ,
+#line 1040
+{0,0,0xdeadbeaf,0xbeefdead} };
+
+void init_xyz_0() {
+  XYZ[0].x1=975753;
+  XYZ[0].x2=1024;
+  XYZ[0].x3=(abc *)Arr1;
+
+}
+
+int
+main ()
+{
+}
diff --git a/gcc/testsuite/gcc.dg/lto/pr85870_1.c b/gcc/testsuite/gcc.dg/lto/pr85870_1.c
new file mode 100644
index 000..cd1cd310164
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/lto/pr85870_1.c
@@ -0,0 +1,27 @@
+typedef struct abc_s {
+  char a1;
+  short a2;
+  unsigned int a3;
+  unsigned int a4;
+} abc;
+
+
+typedef struct xyz_s {
+ unsigned int x1;
+ unsigned int x2;
+ abc *x3;
+} xyz;
+
+
+extern xyz XYZ[3];
+static const abc Arr2[]={
+   {0,0,0xbeafdead,0xdeadbeef} ,
+#line 1048594
+   {0,0,0xbeafdead,0xdeadbeef} };
+
+void init_xyz_1() {
+  XYZ[1].x1=425753;
+  XYZ[1].x2=1048576;
+  XYZ[1].x3=(abc *)Arr2;
+
+}

Re: [PATCH] x86: Disable jump tables when retpolines are used (PR target/86952).

2019-03-08 Thread Martin Liška

On 3/8/19 2:58 PM, Jan Hubicka wrote:
>> On 3/8/19 1:44 PM, Jan Hubicka wrote:
 Hi.

 Thanks to Intel guys, we've done some re-measurement in PR86952
 about usage of jump tables when retpolines are used.
 Numbers prove that disabling of JT should be the best for now.

 Patch can bootstrap on x86_64-linux-gnu and survives regression tests.

 Ready to be installed?
>>>
>>> OK, thanks!
>>> I wonder if there is some threshold for extremely large jumptables where
>>> branchy sequence will loose, but I think it is better to disable them
>>> than what we have right now.
>>
>> I tested switch statements up to 4096 and it was still slower ;)
> 
> Well, we have switch statements with this many cases in
> insn-attrtab/latencytab tables.  I suppose what kind of code you compile
> with retpolines, but I would expect even bigger switch statements to
> appear in real code (large DFA automatons and such)

Question is whether you'll meet such huge switch statements in Linux kernel?

Martin

> 
> Honza
>>
>> Martin
>>
>>>
>>> Honza
>>>
>>

Re: [PATCH] x86: Disable jump tables when retpolines are used (PR target/86952).

2019-03-08 Thread Jakub Jelinek

On Fri, Mar 08, 2019 at 02:50:26PM +0100, Martin Liška wrote:
> On 3/8/19 1:44 PM, Jan Hubicka wrote:
> >> Hi.
> >>
> >> Thanks to Intel guys, we've done some re-measurement in PR86952
> >> about usage of jump tables when retpolines are used.
> >> Numbers prove that disabling of JT should be the best for now.
> >>
> >> Patch can bootstrap on x86_64-linux-gnu and survives regression tests.
> >>
> >> Ready to be installed?
> > 
> > OK, thanks!
> > I wonder if there is some threshold for extremely large jumptables where
> > branchy sequence will loose, but I think it is better to disable them
> > than what we have right now.
> 
> I tested switch statements up to 4096 and it was still slower ;)

Try one with 1000 of entries ;) and also compare code size and data
segment size.

Jakub

[PATCH] Cleanup TODO handling for CFG clenaup & SSA update

2019-03-08 Thread Richard Biener



There's an old comment

  /* When cleanup_tree_cfg merges consecutive blocks, it may
 perform some simplistic propagation when removing single
 valued PHI nodes.  This propagation may, in turn, cause the
 SSA form to become out-of-date (see PR 22037).  So, even
 if the parent pass had not scheduled an SSA update, we may
 still need to do one.  */
  if (!(flags & TODO_update_ssa_any) && need_ssa_update_p (cfun))
flags |= TODO_update_ssa;

which is from times we've had multiple virtual operands.  After
those went away we could still run into this for example when
propagating a non-const function address into an indirect call
through a const function type.  This has been fixed as well
(we retain the const-ness of the call).  Thus the above is
no longer necessary and we can simplify the code.

Bootstrapped on x86_64-unknown-linux-gnu, testing in progress.

I'm not really nervous about this change but if you think it
should wait for GCC 10 speak up.

Richard.

2019-03-08  Richard Biener  

* passes.c (execute_function_todo): Remove dead code.

Index: gcc/passes.c
===
--- gcc/passes.c(revision 269302)
+++ gcc/passes.c(working copy)
@@ -1924,26 +1924,12 @@ execute_function_todo (function *fn, voi
 
   push_cfun (fn);
 
-  /* Always cleanup the CFG before trying to update SSA.  */
+  /* If we need to cleanup the CFG let it perform a needed SSA update.  */
   if (flags & TODO_cleanup_cfg)
-{
-  cleanup_tree_cfg (flags & TODO_update_ssa_any);
-
-  /* When cleanup_tree_cfg merges consecutive blocks, it may
-perform some simplistic propagation when removing single
-valued PHI nodes.  This propagation may, in turn, cause the
-SSA form to become out-of-date (see PR 22037).  So, even
-if the parent pass had not scheduled an SSA update, we may
-still need to do one.  */
-  if (!(flags & TODO_update_ssa_any) && need_ssa_update_p (cfun))
-   flags |= TODO_update_ssa;
-}
-
-  if (flags & TODO_update_ssa_any)
-{
-  unsigned update_flags = flags & TODO_update_ssa_any;
-  update_ssa (update_flags);
-}
+cleanup_tree_cfg (flags & TODO_update_ssa_any);
+  else if (flags & TODO_update_ssa_any)
+update_ssa (flags & TODO_update_ssa_any);
+  gcc_assert (!need_ssa_update_p (fn));
 
   if (flag_tree_pta && (flags & TODO_rebuild_alias))
 compute_may_aliases ();

Re: [PATCH] Cleanup TODO handling for CFG clenaup & SSA update

2019-03-08 Thread Jeff Law

On 3/8/19 7:23 AM, Richard Biener wrote:
> 
> There's an old comment
> 
>   /* When cleanup_tree_cfg merges consecutive blocks, it may
>  perform some simplistic propagation when removing single
>  valued PHI nodes.  This propagation may, in turn, cause the
>  SSA form to become out-of-date (see PR 22037).  So, even
>  if the parent pass had not scheduled an SSA update, we may
>  still need to do one.  */
>   if (!(flags & TODO_update_ssa_any) && need_ssa_update_p (cfun))
> flags |= TODO_update_ssa;
> 
> which is from times we've had multiple virtual operands.  After
> those went away we could still run into this for example when
> propagating a non-const function address into an indirect call
> through a const function type.  This has been fixed as well
> (we retain the const-ness of the call).  Thus the above is
> no longer necessary and we can simplify the code.
> 
> Bootstrapped on x86_64-unknown-linux-gnu, testing in progress.
> 
> I'm not really nervous about this change but if you think it
> should wait for GCC 10 speak up.
> 
> Richard.
> 
> 2019-03-08  Richard Biener  
> 
>   * passes.c (execute_function_todo): Remove dead code.
What's driving the desire to change this for gcc-9?  I think it's a fine
cleanup for gcc-10, but it's not clear to me we want to push it into gcc-9.

jeff

Re: [C++ PATCH] PR c++/88123 - lambda and using-directive.

2019-03-08 Thread Jason Merrill

On Fri, Mar 8, 2019 at 8:04 AM Nathan Sidwell  wrote:
>
> On 3/7/19 9:54 PM, Jason Merrill wrote:
> > For named function calls in a template, the result of unqualified lookup is
> > safed in CALL_EXPR_FN.  But for operator expressions, no unqualified lookup
> > is performed until we know whether the operands have class type.  So when we
> > see in a lambda a use of an operator that might be overloaded, we need to do
> > that lookup then and save it away somewhere.  One possibility would be in
> > the expression, but we can't really add extra conditional operands to
> > standard tree codes.  I mostly implemented another approach using a new
> > WITH_LOOKUP_EXPR code, but teaching everywhere how to handle a new tree code
> > is always complicated.  Then it occurred to me that we could associate the
> > lookups with the function, which is both simpler and smaller.  So this patch
> > stores any operator bindings needed by a lambda function in an internal
> > attribute on the lambda call operator.
> >
> > Tested x86_64-pc-linux-gnu, applying to trunk.  Nathan, does slipping these
> > bindings into the sk_function_parms binding level this way make sense to 
> > you?
>
> oo sneaky, I like it.  Perhaps a comment explaining the surprising scope
> these are being pushed into?

The comment for push_operator_bindings already says "push the bindings
we saved away in maybe_save_op_lookup into the function parameter
binding level", are you thinking of something else/more detailed?

Jason

Re: [PATCH] Cleanup TODO handling for CFG clenaup & SSA update

2019-03-08 Thread Richard Biener

On March 8, 2019 3:52:36 PM GMT+01:00, Jeff Law  wrote:
>On 3/8/19 7:23 AM, Richard Biener wrote:
>> 
>> There's an old comment
>> 
>>   /* When cleanup_tree_cfg merges consecutive blocks, it may
>>  perform some simplistic propagation when removing single
>>  valued PHI nodes.  This propagation may, in turn, cause the
>>  SSA form to become out-of-date (see PR 22037).  So, even
>>  if the parent pass had not scheduled an SSA update, we may
>>  still need to do one.  */
>>   if (!(flags & TODO_update_ssa_any) && need_ssa_update_p (cfun))
>> flags |= TODO_update_ssa;
>> 
>> which is from times we've had multiple virtual operands.  After
>> those went away we could still run into this for example when
>> propagating a non-const function address into an indirect call
>> through a const function type.  This has been fixed as well
>> (we retain the const-ness of the call).  Thus the above is
>> no longer necessary and we can simplify the code.
>> 
>> Bootstrapped on x86_64-unknown-linux-gnu, testing in progress.
>> 
>> I'm not really nervous about this change but if you think it
>> should wait for GCC 10 speak up.
>> 
>> Richard.
>> 
>> 2019-03-08  Richard Biener  
>> 
>>  * passes.c (execute_function_todo): Remove dead code.
>What's driving the desire to change this for gcc-9?  I think it's a
>fine
>cleanup for gcc-10, but it's not clear to me we want to push it into
>gcc-9.

Just that I came along this with the previous related CFG cleanup fix and got 
the time to test it. 

Queued for GCC 10 instead. 

Richard. 

>jeff

Re: [C++ PATCH] PR c++/88123 - lambda and using-directive.

2019-03-08 Thread Nathan Sidwell


On 3/8/19 10:30 AM, Jason Merrill wrote:


The comment for push_operator_bindings already says "push the bindings
we saved away in maybe_save_op_lookup into the function parameter
binding level", are you thinking of something else/more detailed?


good enough, thanks


--
Nathan Sidwell

Re: [PATCH] PR88497 - Extend reassoc for vector bit_field_ref

2019-03-08 Thread Segher Boessenkool

Hi Kewen,

Just one quick note:

On Fri, Mar 08, 2019 at 09:40:30AM +0800, Kewen.Lin wrote:
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/tree-ssa/pr88497.c
> @@ -0,0 +1,18 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O2 -ffast-math -fdump-tree-reassoc1" } */
> +typedef double v2df __attribute__ ((vector_size (16)));
> +double
> +test (double accumulator, v2df arg1[], v2df arg2[])
> +{
> +  v2df temp;
> +  temp = arg1[0] * arg2[0];
> +  accumulator += temp[0] + temp[1];
> +  temp = arg1[1] * arg2[1];
> +  accumulator += temp[0] + temp[1];
> +  temp = arg1[2] * arg2[2];
> +  accumulator += temp[0] + temp[1];
> +  temp = arg1[3] * arg2[3];
> +  accumulator += temp[0] + temp[1];
> +  return accumulator;
> +}
> +/* { dg-final { scan-tree-dump-times "BIT_FIELD_REF" 2 "reassoc1" } } */

Please write a word or two about what that test is testing?  This helps
a lot when a test starts failing.


Segher

Re: [PATCH] rs6000: Fix lost ud chains in swap optimization

2019-03-08 Thread Segher Boessenkool

Hi Bill,

On Thu, Mar 07, 2019 at 06:33:48PM -0600, Bill Schmidt wrote:
> We recently discovered a problem in swap optimization where the du- and 
> ud-chains
> were getting corrupted after a preliminary modification phase and prior to the
> main body of the pass.  The fix for this is to rebuild the chains between 
> phases.
> 
> Bootstrapped and tested on powerpc64le-unknown-linux-gnu with no regressions.
> I've not included a test case because the problem tends to get lost in 
> reduction,
> and may shift over time anyway.  Is this okay for trunk, and eventual backport
> to 8 and 7?

Like Richard I think this is expensive.  But I don't think you can do better
currently, so, okay for trunk and backports.  Thanks!


Segher

[PATCH, rs6000] PR87532: Bad Results from vec_extract(unsigned char, foo) dependent upon function inline

2019-03-08 Thread Kelvin Nilsen

This problem report, though initially motivated by differences in behavior 
between constant and non-constant selector arguments, uncovered a number of 
inconsistencies in the implementation of vec_extract.

This patch provides several fixes to make handling of constant selector 
expressions the same as the handling of non-constant selector expressions.  In 
the process of testing, it was observed that certain existing regression tests 
were looking for the wrong instructions to be emitted and those tests have been 
updated.

This has bootstrapped and tested without regressions on 
powerpc64le-unknown-linux (both P8 and P9) and on powerpc-linux (P7 big-endian, 
with both -m32 and -m64 target options).

Is this ok for trunk?

gcc/ChangeLog:

2019-03-08  Kelvin Nilsen  

PR target/87532
* config/rs6000/rs6000-c.c (altivec_resolve_overloaded_builtin):
When handling vec-extract, use modular arithmetic to allow
constant selectors greater than vector length.
* config/rs6000/rs6000.c (rs6000_expand_vector_extract): Allow
V1TImode vectors to have constant selector values greater than 0.
Use modular arithmetic to compute vector index.
(rs6000_split_vec_extract_var): Use modular arithmetic to compute
index for in-memory vectors.  Correct code generation for
in-register vectors.
(altivec_expand_vec_ext_builtin): Use modular arithmetic to
compute index.

gcc/testsuite/ChangeLog:

2019-03-08  Kelvin Nilsen  

PR target/87532
* gcc.target/powerpc/vsx-builtin-10a.c: New test.
* gcc.target/powerpc/vsx-builtin-20a.c: New test.
* gcc.target/powerpc/vsx-builtin-11b.c: New test.
* gcc.target/powerpc/vsx-builtin-9b.c: New test.
* gcc.target/powerpc/vsx-builtin-12a.c: New test.
* gcc.target/powerpc/vsx-builtin-13b.c: New test.
* gcc.target/powerpc/vsx-builtin-14a.c: New test.
* gcc.target/powerpc/vec-extract-v16qiu-v2a.c: New test.
* gcc.target/powerpc/vsx-builtin-15b.c: New test.
* gcc.target/powerpc/vsx-builtin-16a.c: New test.
* gcc.target/powerpc/vsx-builtin-17b.c: New test.
* gcc.target/powerpc/vsx-builtin-18a.c: New test.
* gcc.target/powerpc/pr87532-mc.c: New test.
* gcc.target/powerpc/vsx-builtin-19b.c: New test.
* gcc.target/powerpc/vsx-builtin-10b.c: New test.
* gcc.target/powerpc/vsx-builtin-11a.c: New test.
* gcc.target/powerpc/vsx-builtin-9a.c: New test.
* gcc.target/powerpc/vsx-builtin-20b.c: New test.
* gcc.target/powerpc/vsx-builtin-12b.c: New test.
* gcc.target/powerpc/vsx-builtin-13a.c: New test.
* gcc.target/powerpc/vsx-builtin-14b.c: New test.
* gcc.target/powerpc/vsx-builtin-15a.c: New test.
* gcc.target/powerpc/vec-extract-v16qiu-v2b.c: New test.
* gcc.target/powerpc/pr87532.c: New test.
* gcc.target/powerpc/vsx-builtin-16b.c: New test.
* gcc.target/powerpc/vec-extract-v16qiu-v2.h: New test.
* gcc.target/powerpc/vsx-builtin-17a.c: New test.
* gcc.target/powerpc/vsx-builtin-18b.c: New test.
* gcc.target/powerpc/vsx-builtin-19a.c: New test.
* gcc.target/powerpc/fold-vec-extract-char.p8.c: Modify expected
instruction selection.
* gcc.target/powerpc/fold-vec-extract-short.p8.c: Likewise.
* gcc.target/powerpc/fold-vec-extract-int.p8.c: Likewise.

Index: gcc/testsuite/gcc.target/powerpc/vsx-builtin-10a.c
===
--- gcc/testsuite/gcc.target/powerpc/vsx-builtin-10a.c  (revision 0)
+++ gcc/testsuite/gcc.target/powerpc/vsx-builtin-10a.c  (revision 0)
@@ -0,0 +1,157 @@
+/* { dg-do run { target { powerpc*-*-* } } } */
+/* { dg-skip-if "" { powerpc*-*-darwin* } } */
+/* { dg-require-effective-target dfp_hw } */
+/* { dg-skip-if "do not override -mcpu" { powerpc*-*-* } { "-mcpu=*" } { } } */
+/* { dg-options "-maltivec" } */
+
+/* This test should run the same on any target that supports altivec/dfp
+   instructions.  Intentionally not specifying cpu in order to test
+   all code generation paths.  */
+
+#include 
+
+extern void abort (void);
+
+#define CONST0 (0)
+#define CONST1 (1)
+#define CONST2 (2)
+#define CONST3 (3)
+#define CONST4 (4)
+#define CONST5 (5)
+#define CONST6 (6)
+#define CONST7 (7)
+
+
+/* Test that indices > length of vector are applied modulo the vector
+   length.  */
+
+/* Test for vector residing in register.  */
+short s3 (vector short v)
+{
+  return __builtin_vec_ext_v8hi (v, 3);
+}
+
+short s7 (vector short v)
+{
+  return __builtin_vec_ext_v8hi (v, 7);
+}
+
+short s21 (vector short v)
+{
+  return __builtin_vec_ext_v8hi (v, 21);
+}
+
+short s30 (vector short v)
+{
+  return __builtin_vec_ext_v8hi (v, 30);
+}
+
+/* Test for vector residing in memory.  */
+short ms3 (vector short *vp)
+{
+  return __builtin_v

Re: [PATCH 6/7 v2] i386.c: make "sorry" message more amenable to translation (PR target/79926)

2019-03-08 Thread Jakub Jelinek

On Fri, Mar 10, 2017 at 09:21:53PM -0500, David Malcolm wrote:
> Martin pointed out that the wording of these messages could be improved
> by adding an article, and by adding quotes to "no_caller_saved_registers"
> 
> Here's a revised version that makes those changes (in addition to
> the ones you suggested).
> 
> Successfully bootstrapped®rtested on x86_64-pc-linux-gnu.
> 
> OK for trunk?
> 
> gcc/ChangeLog:
>   PR target/79926
>   * config/i386/i386.c (ix86_set_current_function): Make "sorry"
>   messages more amenable to translation, and improve wording.
> 
> gcc/testsuite/ChangeLog:
>   PR target/79926
>   * gcc.target/i386/interrupt-387-err-1.c: Update expected message.
>   * gcc.target/i386/interrupt-387-err-2.c: Likewise.
>   * gcc.target/i386/interrupt-bnd-err-1.c: Likewise.
>   * gcc.target/i386/interrupt-bnd-err-2.c: Likewise.
>   * gcc.target/i386/interrupt-mmx-err-1.c: Likewise.
>   * gcc.target/i386/interrupt-mmx-err-2.c: Likewise.

Sorry for the delay, I can't find this mail in my mailbox (grabbed it from
the archives).  This is ok for trunk.

> ---
>  gcc/config/i386/i386.c  | 13 -
>  gcc/testsuite/gcc.target/i386/interrupt-387-err-1.c |  4 ++--
>  gcc/testsuite/gcc.target/i386/interrupt-387-err-2.c |  2 +-
>  gcc/testsuite/gcc.target/i386/interrupt-bnd-err-1.c |  4 ++--
>  gcc/testsuite/gcc.target/i386/interrupt-bnd-err-2.c |  2 +-
>  gcc/testsuite/gcc.target/i386/interrupt-mmx-err-1.c |  4 ++--
>  gcc/testsuite/gcc.target/i386/interrupt-mmx-err-2.c |  2 +-
>  7 files changed, 17 insertions(+), 14 deletions(-)
> 
> diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
> index e705a3e..9fbf8d0 100644
> --- a/gcc/config/i386/i386.c
> +++ b/gcc/config/i386/i386.c
> @@ -7271,12 +7271,15 @@ ix86_set_current_function (tree fndecl)
>if (isa != NULL)
>   {
> if (cfun->machine->func_type != TYPE_NORMAL)
> - sorry ("%s instructions aren't allowed in %s service routine",
> -isa, (cfun->machine->func_type == TYPE_EXCEPTION
> -  ? "exception" : "interrupt"));
> + sorry (cfun->machine->func_type == TYPE_EXCEPTION
> +? G_("%s instructions aren%'t allowed in an"
> + " exception service routine")
> +: G_("%s instructions aren%'t allowed in an"
> + " interrupt service routine"),
> +isa);
> else
> - sorry ("%s instructions aren't allowed in function with "
> -"no_caller_saved_registers attribute", isa);
> + sorry ("%s instructions aren%'t allowed in a function with "
> +"the % attribute", isa);
> /* Don't issue the same error twice.  */
> cfun->machine->func_type = TYPE_NORMAL;
> cfun->machine->no_caller_saved_registers = false;
> diff --git a/gcc/testsuite/gcc.target/i386/interrupt-387-err-1.c 
> b/gcc/testsuite/gcc.target/i386/interrupt-387-err-1.c
> index 3fbdc88..8561a3c 100644
> --- a/gcc/testsuite/gcc.target/i386/interrupt-387-err-1.c
> +++ b/gcc/testsuite/gcc.target/i386/interrupt-387-err-1.c
> @@ -6,11 +6,11 @@ typedef unsigned int uword_t __attribute__ ((mode 
> (__word__)));
>  void
>  __attribute__((interrupt))
>  fn1 (void *frame, uword_t error)
> -{ /* { dg-message "80387 instructions aren't allowed in exception service 
> routine" } */
> +{ /* { dg-message "80387 instructions aren't allowed in an exception service 
> routine" } */
>  }
>  
>  void
>  __attribute__((interrupt))
>  fn2 (void *frame)
> -{ /* { dg-message "80387 instructions aren't allowed in interrupt service 
> routine" } */
> +{ /* { dg-message "80387 instructions aren't allowed in an interrupt service 
> routine" } */
>  }
> diff --git a/gcc/testsuite/gcc.target/i386/interrupt-387-err-2.c 
> b/gcc/testsuite/gcc.target/i386/interrupt-387-err-2.c
> index 3203d64..9810f18 100644
> --- a/gcc/testsuite/gcc.target/i386/interrupt-387-err-2.c
> +++ b/gcc/testsuite/gcc.target/i386/interrupt-387-err-2.c
> @@ -4,5 +4,5 @@
>  void
>  __attribute__((no_caller_saved_registers))
>  fn1 (void)
> -{ /* { dg-message "80387 instructions aren't allowed in function with 
> no_caller_saved_registers attribute" } */
> +{ /* { dg-message "80387 instructions aren't allowed in a function with the 
> 'no_caller_saved_registers' attribute" } */
>  }
> diff --git a/gcc/testsuite/gcc.target/i386/interrupt-bnd-err-1.c 
> b/gcc/testsuite/gcc.target/i386/interrupt-bnd-err-1.c
> index e07688e..1126fca 100644
> --- a/gcc/testsuite/gcc.target/i386/interrupt-bnd-err-1.c
> +++ b/gcc/testsuite/gcc.target/i386/interrupt-bnd-err-1.c
> @@ -6,11 +6,11 @@ typedef unsigned int uword_t __attribute__ ((mode 
> (__word__)));
>  void
>  __attribute__((interrupt))
>  fn1 (void *frame)
> -{ /* { dg-message "MPX instructions aren't allowed in interrupt service 
> routine" } */
> +{ /* { dg-message "MPX instructions aren't allowed in an interrupt service 
> ro

[GCC, Arm, committed] Fix availability of FP16-FP64 conversion instructions

2019-03-08 Thread Andre Vieira (lists)


Hi,

vcvtb.f16.f64 and vcvtb.f64.f16 were being made available even for FPUs 
that do not support double precision.  This patch fixes that.


Regression tested for arm-none-eabi.

Committed in r269499.

Cheers,
Andre

gcc/ChangeLog:
2019-03-08  Andre Vieira  

* config/arm/arm.h (TARGET_FP16_TO_DOUBLE): Add TARGET_VFP_DOUBLE
requirement.

gcc/testsuite/ChangeLog:

2019-03-08  Andre Vieira  

* gcc.target/arm/f16_f64_conv_no_dp.c: New test.
>From 870d88c1d2cf9f1e11ab85b4048739c0c98e9a06 Mon Sep 17 00:00:00 2001
From: Andre Vieira 
Date: Fri, 8 Mar 2019 16:11:10 +
Subject: [PATCH] [GCC, Arm] Fix availability of FP16-FP64 conversion
 instructions

vcvtb.f16.f64 and vcvtb.f64.f16 were being made available even for FPUs that do
not support double precision.  This patch fixes that.
---
 gcc/config/arm/arm.h  |  2 +-
 gcc/testsuite/gcc.target/arm/f16_f64_conv_no_dp.c | 15 +++
 2 files changed, 16 insertions(+), 1 deletion(-)
 create mode 100644 gcc/testsuite/gcc.target/arm/f16_f64_conv_no_dp.c

diff --git a/gcc/config/arm/arm.h b/gcc/config/arm/arm.h
index 103d390dd17273e947bbd4d605be2c1ef70fb137..7adafead0f20832467b873f53d298a2e1a25ab0a 100644
--- a/gcc/config/arm/arm.h
+++ b/gcc/config/arm/arm.h
@@ -195,7 +195,7 @@ extern tree arm_fp16_type_node;
 /* FPU supports converting between HFmode and DFmode in a single hardware
step.  */
 #define TARGET_FP16_TO_DOUBLE		\
-  (TARGET_HARD_FLOAT && (TARGET_FP16 && TARGET_VFP5))
+  (TARGET_HARD_FLOAT && TARGET_FP16 && TARGET_VFP5 && TARGET_VFP_DOUBLE)
 
 /* FPU supports fused-multiply-add operations.  */
 #define TARGET_FMA (bitmap_bit_p (arm_active_target.isa, isa_bit_vfpv4))
diff --git a/gcc/testsuite/gcc.target/arm/f16_f64_conv_no_dp.c b/gcc/testsuite/gcc.target/arm/f16_f64_conv_no_dp.c
new file mode 100644
index ..99b62a8ffd5de5c61c08980a067d8fc08cb24b11
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/f16_f64_conv_no_dp.c
@@ -0,0 +1,15 @@
+/* { dg-do compile } */
+/* { dg-skip-if "do not override fpu" { *-*-* } { "-mfpu=*" } { "-mfpu=fpv5-sp-d16" } } */
+/* { dg-skip-if "do not disable fpu" { *-*-* } { "-mfloat-abi=soft" } { * } } */
+/* { dg-skip-if "do not override fp16-format" { *-*-* } { "-mfp16-format=*" } { "-mfp16-format=ieee" } } */
+/* { dg-options "-O1 -mfpu=fpv5-sp-d16 -mfloat-abi=hard -mfp16-format=ieee" } */
+
+__fp16 foo (double a)
+{
+  return a;
+}
+
+double bar (__fp16 a)
+{
+  return a;
+}
-- 
2.17.1

Re: [patch, fortran] Fix PR 66089, ICE (plus wrong code) in dependency handling

2019-03-08 Thread Thomas Koenig


Am 08.03.19 um 08:04 schrieb Bernhard Reutner-Fischer:

Please change call abort to stop N in the test?


Done.

Anything else?  OK for trunk?

Regards

Thomas

[committed] Fix POLY_INT_CST/CONST_POLY_INT typo (PR 89631)

2019-03-08 Thread Richard Sandiford

Tested on aarch64-linux-gnu (with SVE) and x86_64-linux-gnu.
Applied as obvious.

Richard


2019-03-08  Richard Sandiford  

gcc/
PR debug/89631
* dwarf2cfi.c (dwarf2out_frame_debug_expr): Use CONST_POLY_INT
instead of POLY_INT_CST.
--

Index: gcc/dwarf2cfi.c
===
--- gcc/dwarf2cfi.c 2019-03-08 18:15:36.700740351 +
+++ gcc/dwarf2cfi.c 2019-03-08 18:17:35.892287413 +
@@ -1778,7 +1778,7 @@ dwarf2out_frame_debug_expr (rtx expr)
 
  /* Rule 6 */
case CONST_INT:
-   case POLY_INT_CST:
+   case CONST_POLY_INT:
  cur_trace->cfa_temp.reg = dwf_regno (dest);
  cur_trace->cfa_temp.offset = rtx_to_poly_int64 (src);
  break;

[committed] Add testcase for already fixed PR c++/82075

2019-03-08 Thread Jakub Jelinek

Hi!

This got fixed with r257057 aka PR84031 fix, but the committed testcase
doesn't resemble anything close to this one.

Tested on x86_64-linux, committed to trunk as obvious.

2019-03-08  Jakub Jelinek  

PR c++/82075
* g++.dg/cpp1z/decomp49.C: New test.

--- gcc/testsuite/g++.dg/cpp1z/decomp49.C.jj2019-03-08 19:25:32.680069446 
+0100
+++ gcc/testsuite/g++.dg/cpp1z/decomp49.C   2019-03-08 19:26:06.192521962 
+0100
@@ -0,0 +1,14 @@
+// PR c++/82075
+// { dg-do run { target c++11 } }
+// { dg-options "" }
+
+struct B { };
+struct D : B { int i; };
+
+int
+main ()
+{
+  auto [i] = D{};  // { dg-warning "only available with" "" { target 
c++14_down } }
+  if (i != 0)
+__builtin_abort ();
+}

Jakub

Re: [PATCH, rs6000] PR87532: Bad Results from vec_extract(unsigned char, foo) dependent upon function inline

2019-03-08 Thread Segher Boessenkool

Hi Kelvin,

On Fri, Mar 08, 2019 at 10:59:35AM -0600, Kelvin Nilsen wrote:
> This problem report, though initially motivated by differences in behavior 
> between constant and non-constant selector arguments, uncovered a number of 
> inconsistencies in the implementation of vec_extract.
> 
> This patch provides several fixes to make handling of constant selector 
> expressions the same as the handling of non-constant selector expressions.  
> In the process of testing, it was observed that certain existing regression 
> tests were looking for the wrong instructions to be emitted and those tests 
> have been updated.
> 
> This has bootstrapped and tested without regressions on 
> powerpc64le-unknown-linux (both P8 and P9) and on powerpc-linux (P7 
> big-endian, with both -m32 and -m64 target options).

Thanks for the patch.  I have lots of comments/questions, mostly about the
testcases.  Sorry :-/  The actual compiler code part looks good though.


> --- gcc/testsuite/gcc.target/powerpc/vsx-builtin-10a.c(revision 0)
> +++ gcc/testsuite/gcc.target/powerpc/vsx-builtin-10a.c(revision 0)
> @@ -0,0 +1,157 @@
> +/* { dg-do run { target { powerpc*-*-* } } } */
> +/* { dg-skip-if "" { powerpc*-*-darwin* } } */

Is there any reason to think this testcase will fail on Darwin?  I mean, it
requires VSX, and that should skip the testcase already on Darwin?

> +/* { dg-require-effective-target dfp_hw } */

Why is that?  I don't see any dfp used here?

> +/* { dg-skip-if "do not override -mcpu" { powerpc*-*-* } { "-mcpu=*" } { } } 
> */

You do not use -mcpu=, so why this dg-skip-if?  And please set
-mdejagnu-cpu= instead.

> --- gcc/testsuite/gcc.target/powerpc/fold-vec-extract-char.p8.c   
> (revision 269370)
> +++ gcc/testsuite/gcc.target/powerpc/fold-vec-extract-char.p8.c   
> (working copy)
> @@ -18,7 +18,7 @@
>  /* { dg-final { scan-assembler-times {\mxxpermdi\M} 3 { target lp64 } } } */
>  /* { dg-final { scan-assembler-times {\mvslo\M} 3 { target lp64 } } } */
>  /* { dg-final { scan-assembler-times {\mmfvsrd\M} 6 { target lp64 } } } */
> -/* { dg-final { scan-assembler-times {\msradi\M} 3 { target lp64 } } } */
> +/* { dg-final { scan-assembler-times {\msrdi\M} 3 { target lp64 } } } */

Maybe allow both?  So {\msra?di\M}?  Or does sradi make no sense here?

> --- gcc/testsuite/gcc.target/powerpc/vsx-builtin-14a.c(revision 0)
> +++ gcc/testsuite/gcc.target/powerpc/vsx-builtin-14a.c(revision 0)
> @@ -0,0 +1,128 @@
> +/* { dg-do run { target { powerpc*-*-* } } } */
> +/* { dg-skip-if "" { powerpc*-*-darwin* } } */
> +/* { dg-require-effective-target dfp_hw } */
> +/* { dg-skip-if "do not override -mcpu" { powerpc*-*-* } { "-mcpu=*" } { } } 
> */
> +/* { dg-options "-maltivec" } */
> +
> +/* This test should run the same on any target that supports altivec/dfp
> +   instructions.  Intentionally not specifying cpu in order to test
> +   all code generation paths.  */

I don't see where dfp comes in?  For server CPUs it is p6 and up, the same
as AltiVec, but that is coincidental.

> --- gcc/testsuite/gcc.target/powerpc/pr87532-mc.c (revision 0)
> +++ gcc/testsuite/gcc.target/powerpc/pr87532-mc.c (revision 0)
> @@ -0,0 +1,260 @@
> +/* { dg-do run { target { powerpc*-*-* && lp64 } } } */

Check for int128, instead?  Or is there another reason to want lp64?

> +  __asm__ (" # %x0" : "+wa" (a));

"wa" requires VSX.

> --- gcc/testsuite/gcc.target/powerpc/vsx-builtin-14b.c(revision 0)
> +++ gcc/testsuite/gcc.target/powerpc/vsx-builtin-14b.c(revision 0)
> @@ -0,0 +1,128 @@
> +/* { dg-do run { target { powerpc*-*-* } } } */

Btw, you can just say { dg-do run }: everything in gcc.target/powerpc is
only run for powerpc*-*-* targets.

> --- gcc/testsuite/gcc.target/powerpc/vec-extract-v16qiu-v2b.c (revision 0)
> +++ gcc/testsuite/gcc.target/powerpc/vec-extract-v16qiu-v2b.c (revision 0)
> @@ -0,0 +1,16 @@
> +/* { dg-do run { target { powerpc*-*-linux* } } } */
> +/* { dg-skip-if "" { powerpc*-*-darwin* } } */
> +/* { dg-require-effective-target vsx_hw } */
> +/* { dg-skip-if "do not override -mcpu" { powerpc*-*-* } { "-mcpu=*" } { } } 
> */
> +/* { dg-options "-O2 -mvsx -save-temps -dp -g" } */

I think that is debugging code left over?  -dp -g should be harmless, but
-save-temps is littering ;-)

> --- gcc/testsuite/gcc.target/powerpc/vec-extract-v16qiu-v2.h  (revision 0)
> +++ gcc/testsuite/gcc.target/powerpc/vec-extract-v16qiu-v2.h  (revision 0)

> +static vector TYPE
> +deoptimize (vector TYPE a)
> +{
> +  __asm__ (" # %x0" : "+wa" (a));
> +  return a;
> +}

Is this only ever compiled with VSX?  If not, use "v" instead?

> --- gcc/config/rs6000/rs6000-c.c  (revision 269370)
> +++ gcc/config/rs6000/rs6000-c.c  (working copy)
> @@ -6568,9 +6568,15 @@
> /* If the second argument is an integer constant, if the value is in
>the expected range, generate the built-in code if we can.  We need
>64-bit and direct move to extract the

Re: [PR 85762, 87008, 85459] Relax MEM_REF check in contains_vce_or_bfcref_p

2019-03-08 Thread Richard Biener

On March 8, 2019 5:57:35 PM GMT+01:00, Martin Jambor  wrote:
>Hi,
>
>BTW, we have dropped the gcc-patches from the CC quite some emails ago
>:-/
>
>On Fri, Mar 08 2019, Richard Biener wrote:
>> yOn Thu, 7 Mar 2019, Martin Jambor wrote:
>>
>>> Hello again,
>>> 
>>> On Thu, Mar 07 2019, Martin Jambor wrote:
>>> > Hi,
>>> >
>>> > On Thu, Mar 07 2019, Richard Biener wrote:
>>> >>
>>> >> So possibly we should restrict total scalarization to the case
>>> >> where all accesses are layout compatible (that is,
>>> >> types_compatible_p)?
>>> >>
>>> >
>>> > That would be the patch below (currently undergoing testing).  It
>is
>>> > slightly more strict than mine but on the three testcases in
>question,
>>> > it gives the same results too.  People memcpying their aggregates
>>> > around, even when not doing anything silly like storing stuff in
>>> > padding, will find their code slower but arguably closer to what
>they
>>> > wanted.
>>> 
>>> So, testing revealed that (unlike my previous version) this breaks
>two
>>> guality tests where we depend on total scalarization to convey to
>the
>>> debugger what is in bits of an aggregate.
>>> 
>>> In gcc.dg/guality/pr54970.c, the array a is no longer totally
>scalarized
>>> because it is memcpied into and gdb no longer knows what constant
>lurks
>>> in a[0], which is otherwise unused.  Probably not a big deal.
>>> 
>>> In gcc.dg/guality/pr59776.c, where the ain type and function are:
>>> 
>>>  struct S { float f, g; };
>>> 
>>>  __attribute__((noinline, noclone)) void
>>>  foo (struct S *p)
>>>  {
>>>struct S s1, s2;  /* { dg-final {
>gdb-test pr59776.c:17 "s1.f" "5.0" } } */
>>>s1 = *p;  /* { dg-final {
>gdb-test pr59776.c:17 "s1.g" "6.0" } } */
>>>s2 = s1;  /* { dg-final {
>gdb-test pr59776.c:17 "s2.f" "0.0" } } */
>>>*(int *) &s2.f = 0;   /* { dg-final {
>gdb-test pr59776.c:17 "s2.g" "6.0" } } */
>>>asm volatile (NOP : : : "memory");/* { dg-final {
>gdb-test pr59776.c:20 "s1.f" "5.0" } } */
>>>asm volatile (NOP : : : "memory");/* { dg-final {
>gdb-test pr59776.c:20 "s1.g" "6.0" } } */
>>>s2 = s1;  /* { dg-final {
>gdb-test pr59776.c:20 "s2.f" "5.0" } } */
>>>asm volatile (NOP : : : "memory");/* { dg-final {
>gdb-test pr59776.c:20 "s2.g" "6.0" } } */
>>>asm volatile (NOP : : : "memory");
>>>  }
>>>   
>>> the zeroing through integer type caused that s2 variable is no
>longer
>>> totally scalarized and thus somehow opaque for the debugger.  I
>don't
>>> think this particular test is a big concern either, but such scalar
>type
>>> casted accesses were exactly the reason why I initially decided
>against
>>> completely disabling total scalarization when I see a
>types-incompatible
>>> access.
>>> 
>>> In any event, although I would still slightly prefer my patch from
>>> yesterday (or at least tolerating scalar type-incompatible
>accesses), I
>>> am also happy with the one from today, provided I have permission to
>>> XFAIL the guality tests.
>>
>> I guess I'd play safe at this point and do less adjustments.  The
>> second patch looks nicer, with
>>
>>  static bool
>> -contains_vce_or_bfcref_p (const_tree ref)
>> +contains_vce_or_bfcref_p (const_tree ref, bool *type_changing_p =
>NULL)
>
>OK
>
>>
>> (a default for type_chagning_p) there would be even less hunks.  Is
>> it functionally equivalent to the first variant when you change
>> the types_compatible_p back to the TYPE_MAIN_VARIANT compare?
>> That is, is this the only real difference?
>
>I probably don't understand the question because I'd say that the
>second
>guality failure shows there is a difference between being careful we
>just don't loose any data in padding and disabling total scalarization
>altogether.
>
>To give a non-guality example, the following will regress with the
>yesterday's variant - compared to current trunk:
>
>  struct S {unsigned a,b;};
>
>  void foo (struct S *in, struct S *out)
>  {
>struct S s;
>s = *in;
>*(int *)&s.b = -1;
>*out = s;
>  }
>
>Currently s is totally scalarized away, but if we "restrict total
>scalarization to the case where all accesses are layout compatible," it
>of course isn't and s survives all the way to optimized dump, whereas
>currently we eliminate it even in early SRA.
>
>Even the assembly output changes from:
>
>   movl(%rdi), %eax
>   movl%eax, (%rsi)
>   movl$-1, 4(%rsi)
>   ret
>
>to a bit unexpected:
>
>   movabsq $-4294967296, %rax
>   orq (%rdi), %rax
>   movq%rax, (%rsi)
>   ret
>
>So I guess now I am even more convinced about what I wrote yesterday,
>i.e. that if we should at least allow the scalar incompatible types.
>The reason why this is safe is that when a scalar access is not a leaf
>of an access tree, scalarization of that portion of the aggregate fails
>(and thus

[PATCH] Fix various description text issues in *.opt files (PR target/79645)

2019-03-08 Thread Jakub Jelinek

Hi!

The PR complained about lack of terminating dot at the end of one option
description, but I've found many others.
The patch ensures that normal option descriptions start with a capital
letter (unless it starts with a keyword, number, or - with tab in the middle of
description, etc.) and ends with a dot, for the Enum descriptions that it
starts with a capital letter and ends with a colon and I've tried to verify
the Warn etc. messages don't start with a capital letter (unless it would be
written that way in the middle of sentence) and don't end with a dot or
period.

Tested on x86_64-linux, ok for trunk?

2019-03-08  Jakub Jelinek  

PR target/79645
* common.opt (fdiagnostics-show-labels,
fdiagnostics-show-line-numbers, fdiagnostics-format=,
fdiagnostics-minimum-margin-width=, fgnat-encodings=, gas-loc-support,
gas-locview-support, ginline-points, ginternal-reset-location-views):
Terminate description text with a dot.
* config/microblaze/microblaze.opt (mxl-prefetch): Likewise.
* config/mcore/mcore.opt (m210, m340): Likewise.
* config/epiphany/epiphany.opt (mprefer-short-insn-regs, mcmove,
mnops=): Start description text with a capital letter.
* config/arc/arc.opt (msize-level=): Likewise.
* config/sh/sh.opt (minline-ic_invalidate): Likewise.
* config/rs6000/sysv4.opt (mno-toc, mtoc, mno-traceback, mshlib,
mnewlib): Likewise.
* config/ft32/ft32.opt (msim): Likewise.
(mft32b, mcompress): Likewise.  Terminate description text with a dot.
(mnodiv, mnopm): Terminate description text with a dot.
* config/c6x/c6x.opt (c6x_sdata): Terminate Enum description with
a colon.
* config/i386/i386.opt (prefer_vector_width, instrument_return):
Likewise.
* config/rx/rx.opt (nofpu): Remove trailing spaces from description
text.
lto/
* lang.opt: Terminate description text with a dot.

--- gcc/common.opt.jj   2019-02-25 23:57:00.971009713 +0100
+++ gcc/common.opt  2019-03-08 20:47:17.258098651 +0100
@@ -1251,11 +1251,11 @@ Show the source line with a caret indica
 
 fdiagnostics-show-labels
 Common Var(flag_diagnostics_show_labels) Init(1)
-Show labels annotating ranges of source code when showing source
+Show labels annotating ranges of source code when showing source.
 
 fdiagnostics-show-line-numbers
 Common Var(flag_diagnostics_show_line_numbers) Init(1)
-Show line numbers in the left margin when showing source
+Show line numbers in the left margin when showing source.
 
 fdiagnostics-color
 Common Alias(fdiagnostics-color=,always,never)
@@ -1283,7 +1283,7 @@ Enum(diagnostic_color_rule) String(auto)
 
 fdiagnostics-format=
 Common Joined RejectNegative Enum(diagnostics_output_format)
--fdiagnostics-format=[text|json] Select output format
+-fdiagnostics-format=[text|json] Select output format.
 
 ; Required for these enum values.
 SourceInclude
@@ -1312,7 +1312,7 @@ Amend appropriate diagnostic messages wi
 
 fdiagnostics-minimum-margin-width=
 Common Joined UInteger Var(diagnostics_minimum_margin_width) Init(6)
-Set minimum width of left margin of source code when showing source
+Set minimum width of left margin of source code when showing source.
 
 fdisable-
 Common Joined RejectNegative Var(common_deferred_options) Defer
@@ -1521,7 +1521,7 @@ Enum(dwarf_gnat_encodings) String(minima
 
 fgnat-encodings=
 Common Enum(dwarf_gnat_encodings) Joined RejectNegative Report Undocumented 
Var(gnat_encodings)
--fgnat-encodings=[all|gdb|minimal] Select the balance between GNAT 
encodings and standard DWARF emitted in the debug information
+-fgnat-encodings=[all|gdb|minimal] Select the balance between GNAT 
encodings and standard DWARF emitted in the debug information.
 
 ; This option is not documented yet as its semantics will change.
 fgraphite
@@ -2969,11 +2969,11 @@ Generate debug information in default fo
 
 gas-loc-support
 Common Driver Var(dwarf2out_as_loc_support) Init(2)
-Assume assembler support for (DWARF2+) .loc directives
+Assume assembler support for (DWARF2+) .loc directives.
 
 gas-locview-support
 Common Driver Var(dwarf2out_as_locview_support) Init(2)
-Assume assembler support for view in (DWARF2+) .loc directives
+Assume assembler support for view in (DWARF2+) .loc directives.
 
 gcoff
 Common Driver Deprecated
@@ -3009,11 +3009,11 @@ Generate debug information in default ex
 
 ginline-points
 Common Driver Var(debug_inline_points) Init(2)
-Generate extended entry point information for inlined functions
+Generate extended entry point information for inlined functions.
 
 ginternal-reset-location-views
 Common Driver Var(debug_internal_reset_location_views) Init(2)
-Compute locview reset points based on insn length estimates
+Compute locview reset points based on insn length estimates.
 
 gno-
 RejectNegative Joined Undocumented
--- gcc/config/epiphany/epiphany.opt.jj 2019-01-01 12:37:23.560869895 +0100
+++

[Patch, aarch64] PR 89628 - fix register allocation in SIMD functions

2019-03-08 Thread Steve Ellcey

This is a patch to fix the register allocation in SIMD functions.  In
normal functions registers V16 to V31 are all caller saved.  In SIMD
functions V16 to V23 are callee saved and V24 to V31 are caller saved.
This means that SIMD functions should use V24 to V31 before V16 to V23
in order to avoid some saves and restores.

My fix for this is to define REG_ALLOC_ORDER.  Right now it is not
defined on Aarch64, so I just defined it as all the registers in order
except for putting V24 to V31 before V16 to V23.  This fixes the
register allocation in SIMD functions.  It also changes the register
allocation order in regular functions but since all the registers (V16
to V31) are caller saved in that case, it doesn't matter.  We could use
ADJUST_REG_ALLOC_ORDER to only affect SIMD functions but I did not see
any reason to do that.

Tested on ThunderX2 Aarch64 with no regressions.

OK to checkin?


2018-03-08  Steve Ellcey  

PR target/89628
* config/aarch64/aarch64.h (V16-V31): Update comment.
(REG_ALLOC_ORDER): New macro.


diff --git a/gcc/config/aarch64/aarch64.h b/gcc/config/aarch64/aarch64.h
index 7bd3bf5..d3723ff 100644
--- a/gcc/config/aarch64/aarch64.h
+++ b/gcc/config/aarch64/aarch64.h
@@ -328,7 +328,9 @@ extern unsigned aarch64_architecture_version;
ZR		zero register, encoded as X/R31 elsewhere
 
32 x 128-bit floating-point/vector registers
-   V16-V31	Caller-saved (temporary) registers
+   V24-V31	Caller-saved (temporary) registers
+   V16-V23	Caller-saved (temporary) registers in most functions;
+		Callee-saved in SIMD functions.
V8-V15	Callee-saved registers
V0-V7	Parameter/result registers
 
@@ -431,6 +433,26 @@ extern unsigned aarch64_architecture_version;
 V_ALIASES(28), V_ALIASES(29), V_ALIASES(30), V_ALIASES(31)  \
   }
 
+/* This is the default order for everything except V16-V23.  These
+   are caller saved in most functions but callee saved in SIMD
+   functions so we want to use V24-V31 which are always
+   caller saved before using these registers. */
+
+#define REG_ALLOC_ORDER		\
+  {\
+ 0,  1,  2,  3,4,  5,  6,  7,	/* R0 - R7 */		\
+ 8,  9, 10, 11,   12, 13, 14, 15,	/* R8 - R15 */		\
+16, 17, 18, 19,   20, 21, 22, 23,	/* R16 - R23 */		\
+24, 25, 26, 27,   28, 29, 30, 31,	/* R24 - R30, SP */	\
+32, 33, 34, 35,   36, 37, 38, 39,	/* V0 - V7 */		\
+40, 41, 42, 43,   44, 45, 46, 47,	/* V8 - V15 */		\
+56, 57, 58, 59,   60, 61, 62, 63,	/* V24 - V31 */		\
+48, 49, 50, 51,   52, 53, 54, 55,	/* V16 - V23 */		\
+64, 65, 66, 57,			/* SFP, AP, CC, VG */	\
+68, 69, 70, 71,   72, 73, 74, 75,	/* P0 - P7 */		\
+76, 77, 78, 79,   80, 81, 82, 83,	/* P8 - P15 */		\
+  }
+
 #define EPILOGUE_USES(REGNO) (aarch64_epilogue_uses (REGNO))
 
 /* EXIT_IGNORE_STACK should be nonzero if, when returning from a function,

Re: [PATCH] Add tests for resolved PR (PR c/85870).

2019-03-08 Thread Jakub Jelinek

On Fri, Mar 08, 2019 at 03:03:07PM +0100, Martin Liška wrote:
> I'm going to install Jakub's test-case for a PR that's fixed
> on trunk.

This fails on i686-linux regtest when linker plugin is disabled (I use that
in my setup, because my ld is 64-bit and 32-bit plugins don't really work
very well with 64-bit linker).

The problem is that XYZ is extern in both sources, no idea why it actually
works with linker plugin.  That said, dropping extern from one of the two
doesn't really work, the ICE when the fix is reverted goes away.

But with the following change the ICE is still in there when
r268789 and r268981 is reverted (both in this i686-linux setup and normal
x86_64-linux with linker plugin), and with vanilla cc1/lto1 works.

So, I've committed this to trunk as obvious.

2019-03-08  Jakub Jelinek  

PR c/85870
* gcc.dg/lto/pr85870_0.c: Add dg-extra-ld-options with
-r -nostdlib -flinker-output=nolto-rel.

--- gcc/testsuite/gcc.dg/lto/pr85870_0.c.jj 2019-03-08 21:43:52.032655832 
+0100
+++ gcc/testsuite/gcc.dg/lto/pr85870_0.c2019-03-08 23:25:15.150784557 
+0100
@@ -1,6 +1,7 @@
 /* PR c/85870 */
 /* { dg-lto-do link } */
 /* { dg-lto-options { { -flto -O2 } } } */
+/* { dg-extra-ld-options { -r -nostdlib -flinker-output=nolto-rel } } */

 typedef struct abc_s {
   char a1;

Jakub

libgo patch committed: Use more largefile functions

2019-03-08 Thread Ian Lance Taylor

This libgo patch, based on one by Rainer Orth, uses more largefile
functions.  With this patch we consistently call __go_openat for
openat and we use fstatat64, creat64,  sendfile64, and getdents64
where needed.  This is for PR 89447.  Bootstrapped and ran Go
testsuite on x86_64-pc-linux-gnu.  Committed to mainline.

Ian
Index: gcc/go/gofrontend/MERGE
===
--- gcc/go/gofrontend/MERGE (revision 269443)
+++ gcc/go/gofrontend/MERGE (working copy)
@@ -1,4 +1,4 @@
-959260238817af3205fb9907dd92319291e6a893
+3106ec19626d75d8275be16c86421132548fa13e
 
 The first line of this file holds the git revision number of the last
 merge done from the gofrontend repository.
Index: libgo/go/internal/syscall/unix/at.go
===
--- libgo/go/internal/syscall/unix/at.go(revision 269196)
+++ libgo/go/internal/syscall/unix/at.go(working copy)
@@ -13,12 +13,9 @@ import (
 //extern unlinkat
 func unlinkat(int32, *byte, int32) int32
 
-//extern openat
+//extern __go_openat
 func openat(int32, *byte, int32, syscall.Mode_t) int32
 
-//extern fstatat
-func fstatat(int32, *byte, *syscall.Stat_t, int32) int32
-
 func Unlinkat(dirfd int, path string, flags int) error {
var p *byte
p, err := syscall.BytePtrFromString(path)
Index: libgo/go/internal/syscall/unix/at_largefile.go
===
--- libgo/go/internal/syscall/unix/at_largefile.go  (nonexistent)
+++ libgo/go/internal/syscall/unix/at_largefile.go  (working copy)
@@ -0,0 +1,14 @@
+// Copyright 2019 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+// +build aix hurd linux solaris,386 solaris,sparc
+
+package unix
+
+import (
+   "syscall"
+)
+
+//extern fstatat64
+func fstatat(int32, *byte, *syscall.Stat_t, int32) int32
Index: libgo/go/internal/syscall/unix/at_regfile.go
===
--- libgo/go/internal/syscall/unix/at_regfile.go(nonexistent)
+++ libgo/go/internal/syscall/unix/at_regfile.go(working copy)
@@ -0,0 +1,18 @@
+// Copyright 2019 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+// +build !aix
+// +build !hurd
+// +build !linux
+// +build !solaris !386
+// +build !solaris !sparc
+
+package unix
+
+import (
+   "syscall"
+)
+
+//extern fstatat
+func fstatat(int32, *byte, *syscall.Stat_t, int32) int32
Index: libgo/go/syscall/libcall_bsd.go
===
--- libgo/go/syscall/libcall_bsd.go (revision 269196)
+++ libgo/go/syscall/libcall_bsd.go (working copy)
@@ -13,8 +13,6 @@ import (
"unsafe"
 )
 
-//sys  sendfile(outfd int, infd int, offset *Offset_t, count int) (written 
int, err error)
-//sendfile(outfd _C_int, infd _C_int, offset *Offset_t, count Size_t) Ssize_t
 func Sendfile(outfd int, infd int, offset *int64, count int) (written int, err 
error) {
if race.Enabled {
race.ReleaseMerge(unsafe.Pointer(&ioSync))
Index: libgo/go/syscall/libcall_bsd_largefile.go
===
--- libgo/go/syscall/libcall_bsd_largefile.go   (nonexistent)
+++ libgo/go/syscall/libcall_bsd_largefile.go   (working copy)
@@ -0,0 +1,10 @@
+// Copyright 2019 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+// +build solaris,386 solaris,sparc
+
+package syscall
+
+//sys  sendfile(outfd int, infd int, offset *Offset_t, count int) (written 
int, err error)
+//sendfile64(outfd _C_int, infd _C_int, offset *Offset_t, count Size_t) Ssize_t
Index: libgo/go/syscall/libcall_bsd_regfile.go
===
--- libgo/go/syscall/libcall_bsd_regfile.go (nonexistent)
+++ libgo/go/syscall/libcall_bsd_regfile.go (working copy)
@@ -0,0 +1,10 @@
+// Copyright 2019 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+// +build darwin dragonfly freebsd netbsd openbsd solaris,amd64 solaris,sparc64
+
+package syscall
+
+//sys  sendfile(outfd int, infd int, offset *Offset_t, count int) (written 
int, err error)
+//sendfile(outfd _C_int, infd _C_int, offset *Offset_t, count Size_t) Ssize_t
Index: libgo/go/syscall/libcall_posix.go
===
--- libgo/go/syscall/libcall_posix.go   (revision 269196)
+++ libgo/go/syscall/libcall_posix.go   (working copy)
@@ -184,9 +184,6 @@ func FDZero(set *FdSet) {
 //sys  Close(fd int) (err error)
 //close(fd _C_int) _C_int
 
-//sys  Creat(path string, mode ui

Re: [PATCH] Fix -gdwarf-5 -gsplit-dwarf ICEs (PR debug/89498)

2019-03-08 Thread Alexandre Oliva

On Mar  6, 2019, Jakub Jelinek  wrote:

> On Tue, Mar 05, 2019 at 04:06:59PM -0500, Jason Merrill wrote:
>> > Assuming output_view_list_offset is correct, the following patch adjusts
>> > size_of_die/value_format accordingly.
>> 
>> I would guess that omitting the handling from output_view_list_offset was an
>> oversight in the view work.  Alex, which is right? :)

I guess both are :-)

I don't recall having dealt with this particular combination of flags.
Indeed, I'm pretty sure I wrote the bulk of the locview output support
before DWARFv5 support was there, and AFAICT I failed to pick up the
more compact DWARFv5 representation as I updated the patchset, proposed
the locview extension for DWARFv6 in line with DWARFv5 locviews, and
implemented it as the incompatv5 extension for v5 loclists with locviews
that Jakub noticed.

> In order to use DW_FORM_loclistx .uleb128 for the DW_AT_GNU_locviews,
> we'd need to emit .long   .LVUS0-.Ldebug_loc0 etc. next to the
> .LLST0-.Ldebug_loc0 in the section offsets table at .Ldebug_loc0,
> but I'd expect debug info consumers to assume that anything listed
> in the offset table is actually a valid DW_LLE_* sequence, which is not the
> case for the .LVUS* labels.

*nod*

> Note the
> .long   .LVUS0-.Ldebug_loc0 # DW_AT_GNU_locviews
> looks weird, at least with the patch it is given DW_FORM_sec_offset, but
> because it is a difference between .LVUS0-.Ldebug_loc0, it is actually not a
> sec offset.  Shall it use a different form in that case?

Ugh.  That's more fallout from the concurrent implementations of v5 and
LVU, ISTM.  My bad for not catching, no doubt, since LVU landed later; I
don't mean to excuse it, just to explain how it came about.

I'm afraid I hadn't thought of it to the point of saying what would be
correct, let alone more desirable.  We're in non-strict dwarf territory,
trying to extend something that was not designed to be extended.
Unfortunately, the extensions designed for v4 do not work for v5, and
the extensions proposed for v6 don't either, so we're in a bit of a
limbo, so I guess anything that makes sense and doesn't break
unsuspecting consumers will do.

I'm thinking the most reasonable way forward in this non-standard
minefield is to document that the section base address used for
DW_AT_GNU_locviews is the same base address used for the corresponding
loclistx table.  That would make it trivially correct for v4-, and
not-so-trivially correct, but mostly sensible IMHO, for v5.

I'd argue there are even backward-compatibility reasons to document it
that way, since it is not hard to figure out that this was (by inaction)
the intent in the, erhm, reference implementation, even though, as you
noticed, it didn't work for split v5.  Of course, since it was broken,
it wouldn't be unreasonable to disregard backward compatibility
altogether and go for something entirely different; I don't know of any
other implementations that might have been based on this accidental
implementation detail, and I'd assume that if anyone even noticed the
problem while collecting data from the reference implementation, we'd
have got a bug report about it shortly thereafter.

Another possibility of encoding that comes to mind would be to output an
offset between the LLST and the LVUS labels as DW_AT_GNU_locviews, but
since it's not a compile-time constant, we won't want to make it
uleb128.  It would still be resolved to a constant and avoid
relocations, as preferred by split-dwarf, but it wouldn't save space or
relocations WRT what we'd get after your corrective patch.

Yet another, more far-out possibilities, are to introduce a list of
locview offsets right after the end of the loclist offsets, or even a
separate section with a separate section header for them, so that we
could use uleb128 compiler-computed indices to reference locview
extensions to loclists.  That would be a significant amount of work for
little to no benefit AFAICT, and only on the short term.  In the long
term, I expect/hope some way to represent locviews to be adopted in
standard DWARF.

-- 
Alexandre Oliva, freedom fighter   https://FSFLA.org/blogs/lxo
Be the change, be Free! FSF Latin America board member
GNU Toolchain EngineerFree Software Evangelist
Hay que enGNUrecerse, pero sin perder la terGNUra jamás-GNUChe

Re: [PATCH] Fix -gdwarf-5 -gsplit-dwarf ICEs (PR debug/89498)

2019-03-08 Thread Jakub Jelinek

On Sat, Mar 09, 2019 at 03:12:28AM -0300, Alexandre Oliva wrote:
> Ugh.  That's more fallout from the concurrent implementations of v5 and
> LVU, ISTM.  My bad for not catching, no doubt, since LVU landed later; I
> don't mean to excuse it, just to explain how it came about.
> 
> 
> I'm afraid I hadn't thought of it to the point of saying what would be
> correct, let alone more desirable.  We're in non-strict dwarf territory,
> trying to extend something that was not designed to be extended.
> Unfortunately, the extensions designed for v4 do not work for v5, and
> the extensions proposed for v6 don't either, so we're in a bit of a
> limbo, so I guess anything that makes sense and doesn't break
> unsuspecting consumers will do.
> 
> 
> I'm thinking the most reasonable way forward in this non-standard
> minefield is to document that the section base address used for
> DW_AT_GNU_locviews is the same base address used for the corresponding
> loclistx table.  That would make it trivially correct for v4-, and
> not-so-trivially correct, but mostly sensible IMHO, for v5.
> 
> I'd argue there are even backward-compatibility reasons to document it
> that way, since it is not hard to figure out that this was (by inaction)
> the intent in the, erhm, reference implementation, even though, as you
> noticed, it didn't work for split v5.  Of course, since it was broken,
> it wouldn't be unreasonable to disregard backward compatibility
> altogether and go for something entirely different; I don't know of any
> other implementations that might have been based on this accidental
> implementation detail, and I'd assume that if anyone even noticed the
> problem while collecting data from the reference implementation, we'd
> have got a bug report about it shortly thereafter.
> 
> 
> Another possibility of encoding that comes to mind would be to output an
> offset between the LLST and the LVUS labels as DW_AT_GNU_locviews, but
> since it's not a compile-time constant, we won't want to make it
> uleb128.  It would still be resolved to a constant and avoid
> relocations, as preferred by split-dwarf, but it wouldn't save space or
> relocations WRT what we'd get after your corrective patch.
> 
> Yet another, more far-out possibilities, are to introduce a list of
> locview offsets right after the end of the loclist offsets, or even a
> separate section with a separate section header for them, so that we
> could use uleb128 compiler-computed indices to reference locview
> extensions to loclists.  That would be a significant amount of work for
> little to no benefit AFAICT, and only on the short term.  In the long
> term, I expect/hope some way to represent locviews to be adopted in
> standard DWARF.

Whatever we choose, IMHO: 1) we can't introduce new DW_FORM_*
2) we must avoid changing anything on how it was represented for the
non-split case, we've already shipped GCC 8.[123] with it
3) the consumer must be able to figure out easily according to clear rules
how to interpret the attribute and find the views.  So, either it needs
to use different form from what is used normally if it needs to be
interpreted differently, or it needs to use a different attribute,
and from that and possibly version number in .debug_info header it needs
to figure out if it should use absolute or relative offset to .debug_loc,
or absolute or relative offset to .debug_loclist, in split or non-split
case etc.  We need to look at what we emit with
-gdwarf-{2,3,4,5} {,-gsplit-dwarf}.

If we added offsets to LVU entries to the table at the start of
.debug_loclists, they shouldn't be counted towards the limit and guess some
locviews unaware consumers could be upset, because the uleb128s would be
above that limit.  If we include them in the limit, then even more consumers
would be upset because they don't find valid DW_LLE_* at those spots when
they just try to dump the .debug_loclists section without accessing
.debug_info.

Jakub

[PATCH] Fix RTL thread_jump for doloop branches (PR rtl-optimization/89634)

2019-03-08 Thread Jakub Jelinek

Hi!

The following testcase used to be miscompiled by RTL jump threading
on s390x-linux with -march=zEC12 at -O2 before r269302 which made it latent.
The problem is we had:
(jump_insn 26 25 87 4 (parallel [
(set (pc)
(if_then_else (eq (reg/v:DI 5 %r5 [orig:74 e ] [74])
(const_int 1 [0x1]))
(label_ref 43)
(pc)))
(clobber (reg:CC 33 %cc))
]) "rh1686696.c":16:7 1263 {*cmp_and_br_signed_di}
 (int_list:REG_BR_PROB 118111604 (nil))
 -> 43)
as the sole non-note non-code_label insn in basic block 4, and
(jump_insn 68 67 134 12 (parallel [
(set (pc)
(if_then_else (ne (reg/v:DI 5 %r5 [orig:74 e ] [74])
(const_int 1 [0x1]))
(label_ref:DI 23)
(pc)))
(set (reg/v:DI 5 %r5 [orig:74 e ] [74])
(plus:DI (reg/v:DI 5 %r5 [orig:74 e ] [74])
(const_int -1 [0x])))
(clobber (scratch:DI))
(clobber (reg:CC 33 %cc))
]) "rh1686696.c":13:3 1922 {doloop_di}
 (int_list:REG_BR_PROB 904381916 (nil))
 -> 23)
as the sole non-note non-code_label insn in basic block 12, the doloop
conditional jump branching to the bb with the above jump_insn 26.
thread_jump sees one condition being %r5 == 1 and the other %r5 != 1,
performs the cselib assisted checks if the b sequence is really a noop
(but that starts with the original complete bb as a base and only does
nonequal processing in the e->src bb) and decided to change insn 68
to jump after the jump_insn 26 because the comparison of %r5 against 1
has been done already.

It didn't take into account that %r5 has been modified in the doloop_di
insn though, so the conditions are comparing different values.

I have thought about different approaches, below is the simplest one,
just punt if modified_in_p.  Another possibility would be to handle the
BB_END (e->src) insn in the second rather than first loop and do the
nonequal handling in there, but I couldn't figure out what kind of insns
it would handle better than this more simple approach.  It wouldn't handle
hypothetical condjump instruction which compares one register and modifies
another one, as it would add that register into nonequal and unless that
register has been unconditionally set to something else, would keep it
in nonequal and bail out.

Bootstrapped/regtested on {x86_64,i686}-linux (where it didn't check much
because those targets don't have doloops) and on r269253 trunk (i.e. before
the r269302 change) on {powerpc64le,s390x}-linux (which both have doloop
insns).  Ok for trunk?

2019-03-09  Jakub Jelinek  

PR rtl-optimization/89634
* cfgcleanup.c (thread_jump): Punt if registers mentioned in cond1
are modified in BB_END (e->src) instruction.

* gcc.c-torture/execute/pr89634.c: New test.

--- gcc/cfgcleanup.c.jj 2019-01-22 10:11:59.652429658 +0100
+++ gcc/cfgcleanup.c2019-03-08 17:47:11.997954352 +0100
@@ -308,6 +308,11 @@ thread_jump (edge e, basic_block b)
   || !rtx_equal_p (XEXP (cond1, 1), XEXP (cond2, 1)))
 return NULL;
 
+  /* Punt if BB_END (e->src) is doloop-like conditional jump that modifies
+ the registers used in cond1.  */
+  if (modified_in_p (cond1, BB_END (e->src)))
+return NULL;
+
   /* Short circuit cases where block B contains some side effects, as we can't
  safely bypass it.  */
   for (insn = NEXT_INSN (BB_HEAD (b)); insn != NEXT_INSN (BB_END (b));
--- gcc/testsuite/gcc.c-torture/execute/pr89634.c.jj2019-03-08 
17:48:08.161045920 +0100
+++ gcc/testsuite/gcc.c-torture/execute/pr89634.c   2019-03-08 
17:47:52.151304876 +0100
@@ -0,0 +1,40 @@
+/* PR rtl-optimization/89634 */
+
+static unsigned long *
+foo (unsigned long *x)
+{
+  return x + (1 + *x);
+}
+
+__attribute__((noipa)) unsigned long
+bar (unsigned long *x)
+{
+  unsigned long c, d = 1, e, *f, g, h = 0, i;
+  for (e = *x - 1; e > 0; e--)
+{
+  f = foo (x + 1);
+  for (i = 1; i < e; i++)
+   f = foo (f);
+  c = *f;
+  if (c == 2)
+   d *= 2;
+  else
+   {
+ i = (c - 1) / 2 - 1;
+ g = (2 * i + 1) * (d + 1) + (2 * d + 1);
+ if (g > h)
+   h = g;
+ d *= c;
+   }
+}
+  return h;
+}
+
+int
+main ()
+{
+  unsigned long a[18] = { 4, 2, -200, 200, 2, -400, 400, 3, -600, 0, 600, 5, 
-100, -66, 0, 66, 100, __LONG_MAX__ / 8 + 1 };
+  if (bar (a) != 17)
+__builtin_abort ();
+  return 0;
+}

Jakub

53 matches

Mail list logo