Re: [PATCH] Fix and improve tree-object-size.c pass_through_call

2017-11-24 Thread Richard Biener
On Thu, 23 Nov 2017, Jakub Jelinek wrote:

> Hi!
> 
> As mentioned on IRC, we can actually now use ERF_RETURN*_ARG* to find out
> what functions are pass through (except for __builtin_assume_aligned, which
> is pass through but we never want to propagate the arg to the result early, as
> the result contains additional information, so it isn't RET1).
> 
> While cleaning this up, I run into a bug in the function, BUILT_IN_STPNCPY_CHK
> obviously isn't a pass-through function, so I've also added a testcase that
> FAILs without this patch and succeeds with it.
> 
> Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

Ok.

Thanks,
Richard.

> For release branches I'd just remove the BUILT_IN_STPNCPY_CHK case and
> add testcase.
> 
> 2017-11-23  Jakub Jelinek  
> 
>   * tree-object-size.c (pass_through_call): Use gimple_call_return_flags
>   ERF_RETURN*ARG* for builtins other than BUILT_IN_ASSUME_ALIGNED,
>   check for the latter with gimple_call_builtin_p.  Do not handle
>   BUILT_IN_STPNCPY_CHK which is not a pass through call.
> 
>   * gcc.dg/builtin-object-size-18.c: New test.
> 
> --- gcc/tree-object-size.c.jj 2017-10-20 09:16:10.0 +0200
> +++ gcc/tree-object-size.c2017-11-23 16:23:24.544741945 +0100
> @@ -464,34 +464,17 @@ alloc_object_size (const gcall *call, in
>  static tree
>  pass_through_call (const gcall *call)
>  {
> -  tree callee = gimple_call_fndecl (call);
> +  unsigned rf = gimple_call_return_flags (call);
> +  if (rf & ERF_RETURNS_ARG)
> +{
> +  unsigned argnum = rf & ERF_RETURN_ARG_MASK;
> +  if (argnum < gimple_call_num_args (call))
> + return gimple_call_arg (call, argnum);
> +}
>  
> -  if (callee
> -  && DECL_BUILT_IN_CLASS (callee) == BUILT_IN_NORMAL)
> -switch (DECL_FUNCTION_CODE (callee))
> -  {
> -  case BUILT_IN_MEMCPY:
> -  case BUILT_IN_MEMMOVE:
> -  case BUILT_IN_MEMSET:
> -  case BUILT_IN_STRCPY:
> -  case BUILT_IN_STRNCPY:
> -  case BUILT_IN_STRCAT:
> -  case BUILT_IN_STRNCAT:
> -  case BUILT_IN_MEMCPY_CHK:
> -  case BUILT_IN_MEMMOVE_CHK:
> -  case BUILT_IN_MEMSET_CHK:
> -  case BUILT_IN_STRCPY_CHK:
> -  case BUILT_IN_STRNCPY_CHK:
> -  case BUILT_IN_STPNCPY_CHK:
> -  case BUILT_IN_STRCAT_CHK:
> -  case BUILT_IN_STRNCAT_CHK:
> -  case BUILT_IN_ASSUME_ALIGNED:
> - if (gimple_call_num_args (call) >= 1)
> -   return gimple_call_arg (call, 0);
> - break;
> -  default:
> - break;
> -  }
> +  /* __builtin_assume_aligned is intentionally not marked RET1.  */
> +  if (gimple_call_builtin_p (call, BUILT_IN_ASSUME_ALIGNED))
> +return gimple_call_arg (call, 0);
>  
>return NULL_TREE;
>  }
> --- gcc/testsuite/gcc.dg/builtin-object-size-18.c.jj  2017-11-23 
> 16:33:25.723389685 +0100
> +++ gcc/testsuite/gcc.dg/builtin-object-size-18.c 2017-11-23 
> 16:34:02.930934647 +0100
> @@ -0,0 +1,15 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O2 -fdump-tree-optimized" } */
> +/* __stpncpy_chk could return buf up to buf + 64, so
> +   the minimum object size might be far smaller than 64.  */
> +/* { dg-final { scan-tree-dump-not "return 64;" "optimized" } } */
> +
> +typedef __SIZE_TYPE__ size_t;
> +
> +size_t
> +foo (const char *p, size_t s, size_t t)
> +{
> +  char buf[64];
> +  char *q = __builtin___stpncpy_chk (buf, p, s, t);
> +  return __builtin_object_size (q, 2);
> +}
> 
>   Jakub
> 
> 

-- 
Richard Biener 
SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Graham Norton, HRB 
21284 (AG Nuernberg)


Re: [PATCH] Fix ubsan on mingw hosts (PR sanitizer/83014)

2017-11-24 Thread Richard Biener
On Thu, 23 Nov 2017, Jakub Jelinek wrote:

> Hi!
> 
> From my reading of the PR and looking around, I believe the reason
> ubsan ICEs on mingw hosts is that it is the only host that redefines
> HOST_LONG_LONG_FORMAT to "I64" instead of the usual "ll", but "%I64"
> is nothing pp_format can handle, and indeed this spot in ubsan.c is
> the only place that calls pp_printf with HOST_WIDE_INT_PRINT_DEC
> or similar format specifier.  Instead, other spots use
> pp_unsigned_wide_integer or pp_scalar on top of which it is defined
> that does:
>   sprintf (pp_buffer (PP)->digit_buffer, FORMAT, SCALAR); \
>   pp_string (PP, pp_buffer (PP)->digit_buffer);   \
> and thus uses the host *printf which handles "I64" and appends that
> as a string.
> 
> Bootstrapped/regtested on x86_64-linux and i686-linux, no way to test
> on mingw.  Ok for trunk? 

Ok.

Richard.

> 2017-11-23  Jakub Jelinek  
> 
>   PR sanitizer/83014
>   * ubsan.c (ubsan_type_descriptor): Use pp_unsigned_wide_integer
>   instead of pp_printf with HOST_WIDE_INT_PRINT_DEC.  Avoid calling
>   tree_to_uhwi twice.
> 
>   * gcc.dg/ubsan/pr83014.c: New test.
> 
> --- gcc/ubsan.c.jj2017-11-13 09:31:33.0 +0100
> +++ gcc/ubsan.c   2017-11-23 11:35:20.452162632 +0100
> @@ -436,10 +436,10 @@ ubsan_type_descriptor (tree type, enum u
> && TYPE_MAX_VALUE (dom) != NULL_TREE
> && TREE_CODE (TYPE_MAX_VALUE (dom)) == INTEGER_CST)
>   {
> +   unsigned HOST_WIDE_INT m;
> if (tree_fits_uhwi_p (TYPE_MAX_VALUE (dom))
> -   && tree_to_uhwi (TYPE_MAX_VALUE (dom)) + 1 != 0)
> - pp_printf (&pretty_name, HOST_WIDE_INT_PRINT_DEC,
> - tree_to_uhwi (TYPE_MAX_VALUE (dom)) + 1);
> +   && (m = tree_to_uhwi (TYPE_MAX_VALUE (dom))) + 1 != 0)
> + pp_unsigned_wide_integer (&pretty_name, m + 1);
> else
>   pp_wide_int (&pretty_name,
>wi::add (wi::to_widest (TYPE_MAX_VALUE (dom)), 1),
> --- gcc/testsuite/gcc.dg/ubsan/pr83014.c.jj   2017-11-23 11:52:59.613932074 
> +0100
> +++ gcc/testsuite/gcc.dg/ubsan/pr83014.c  2017-11-23 11:53:30.867542456 
> +0100
> @@ -0,0 +1,12 @@
> +/* PR sanitizer/83014 */
> +/* { dg-do compile } */
> +/* { dg-options "-fsanitize=undefined" } */
> +
> +int
> +foo (void)
> +{
> +  int data[5];
> +  data[0] = 0;
> +  data[5] = 0;
> +  return data[0];
> +}
> 
>   Jakub
> 
> 

-- 
Richard Biener 
SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Graham Norton, HRB 
21284 (AG Nuernberg)


Re: [PATCH][i386,AVX] Enable VBMI2 support [2/7]

2017-11-24 Thread Eric Botcazou
> This seems to break the build for me:
> 
> In file included from
> /home/glisse/repos/gcc/trunk/gcc/config/i386/i386.c:21:0:
> /home/glisse/repos/gcc/trunk/gcc/config/i386/i386.c:30165:16: error:
> ‘IX86_BUILTIN__BDESC_SPECIAL_ARGS2_FIRST’ was not declared in this scope
>   BDESC_VERIFYS (IX86_BUILTIN__BDESC_SPECIAL_ARGS2_FIRST,

Confirmed:

/home/eric/gnat/gnat-head/src/gcc/config/i386/i386.c: In function ‘rtx_def* 
ix86_expand_builtin(tree, rtx, rtx, machine_mode, int)’:
/home/eric/gnat/gnat-head/src/gcc/config/i386/i386.c:37439:16: error: 
‘IX86_BUILTIN__BDESC_SPECIAL_ARGS2_FIRST’ was not declared in this scope
   if (fcode >= IX86_BUILTIN__BDESC_SPECIAL_ARGS2_FIRST
^
/home/eric/gnat/gnat-head/src/gcc/config/i386/i386.c:37440:19: error: 
‘IX86_BUILTIN__BDESC_SPECIAL_ARGS2_LAST’ was not declared in this scope
   && fcode <= IX86_BUILTIN__BDESC_SPECIAL_ARGS2_LAST)
   ^
/home/eric/gnat/gnat-head/src/gcc/config/i386/i386.c:37443:48: error: 
‘bdesc_special_args2’ was not declared in this scope
   return ix86_expand_special_args_builtin (bdesc_special_args2 + i, exp,

-- 
Eric Botcazou


Re: [PATCH][i386,AVX] Enable VBMI2 support [2/7]

2017-11-24 Thread Jakub Jelinek
On Fri, Nov 24, 2017 at 09:34:07AM +0100, Eric Botcazou wrote:
> > This seems to break the build for me:
> > 
> > In file included from
> > /home/glisse/repos/gcc/trunk/gcc/config/i386/i386.c:21:0:
> > /home/glisse/repos/gcc/trunk/gcc/config/i386/i386.c:30165:16: error:
> > ‘IX86_BUILTIN__BDESC_SPECIAL_ARGS2_FIRST’ was not declared in this scope
> >   BDESC_VERIFYS (IX86_BUILTIN__BDESC_SPECIAL_ARGS2_FIRST,
> 
> Confirmed:
> 
> /home/eric/gnat/gnat-head/src/gcc/config/i386/i386.c: In function ‘rtx_def* 
> ix86_expand_builtin(tree, rtx, rtx, machine_mode, int)’:
> /home/eric/gnat/gnat-head/src/gcc/config/i386/i386.c:37439:16: error: 
> ‘IX86_BUILTIN__BDESC_SPECIAL_ARGS2_FIRST’ was not declared in this scope
>if (fcode >= IX86_BUILTIN__BDESC_SPECIAL_ARGS2_FIRST
> ^
> /home/eric/gnat/gnat-head/src/gcc/config/i386/i386.c:37440:19: error: 
> ‘IX86_BUILTIN__BDESC_SPECIAL_ARGS2_LAST’ was not declared in this scope
>&& fcode <= IX86_BUILTIN__BDESC_SPECIAL_ARGS2_LAST)
>^
> /home/eric/gnat/gnat-head/src/gcc/config/i386/i386.c:37443:48: error: 
> ‘bdesc_special_args2’ was not declared in this scope
>return ix86_expand_special_args_builtin (bdesc_special_args2 + i, exp,

It works now, Kirill has committed r255131 and r255132.

Jakub


Re: [patch, fortran] Implement maxloc and minloc for character

2017-11-24 Thread Janne Blomqvist
On Thu, Nov 23, 2017 at 7:58 PM, Thomas Koenig  wrote:
> Hi Janne,
>
>> However, to continue my nitpicking (sorry!), it seems that in many
>> cases compare_fcn still takes an integer length argument. Could you
>> make that gfc_charlen_type as well? Or maybe size_t, since the
>> argument is passed straight to memcmp{_char4} anyway? Please consider
>> such a patch pre-approved. Thanks!
>
>
> Committed as r255109.
>
> I had missed out on the non-inlined maxval and maxloc versions...

You still missed two places. Committed r255135 with the fixes.

> The fun with max* and min* intrinsics is not yet over. Maxval and
> Minval have yet to be implemented for character arguments, and then
> there is the BACK argument to MAXLOC.
>
> Maybe (while we are breaking compatibility) we should just add BACK
> to the front end, reject it whith a "not yet implemented" message,
> add the argument to the library and worry about implementation
> later.
>
> What do you think?

Yeah, makes sense I suppose.


-- 
Janne Blomqvist


Re: Simplify ptr - 0

2017-11-24 Thread Marc Glisse

On Thu, 23 Nov 2017, Richard Biener wrote:


What about 0 - ptr?  (ok, that's even more weird)


* match.pd (0-ptr): New transformation.

Regtested on gcc112.

--
Marc GlisseIndex: gcc/match.pd
===
--- gcc/match.pd	(revision 255118)
+++ gcc/match.pd	(working copy)
@@ -2449,23 +2449,27 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
 (minus @0 { tem; })
 
 /* Convert x+x into x*2.  */
 (simplify
  (plus @0 @0)
  (if (SCALAR_FLOAT_TYPE_P (type))
   (mult @0 { build_real (type, dconst2); })
   (if (INTEGRAL_TYPE_P (type))
(mult @0 { build_int_cst (type, 2); }
 
+/* 0 - X  ->  -X.  */
 (simplify
  (minus integer_zerop @1)
  (negate @1))
+(simplify
+ (pointer_diff integer_zerop @1)
+ (negate (convert @1)))
 
 /* (ARG0 - ARG1) is the same as (-ARG1 + ARG0).  So check whether
ARG0 is zero and X + ARG0 reduces to X, since that would mean
(-ARG1 + ARG0) reduces to -ARG1.  */
 (simplify
  (minus real_zerop@0 @1)
  (if (fold_real_zero_addition_p (type, @0, 0))
   (negate @1)))
 
 /* Transform x * -1 into -x.  */


Re: [PATCH][GCC][ARM] Dot Product NEON intrinsics [Patch (3/8)]

2017-11-24 Thread Kyrill Tkachov

Hi Christophe,

On 23/11/17 23:26, Christophe Lyon wrote:

On 22 November 2017 at 12:26, Kyrill  Tkachov
 wrote:

Hi Tamar,

On 06/11/17 16:53, Tamar Christina wrote:

Hi All,

This patch adds the NEON intrinsics for Dot product.

Dot product is available from ARMv8.2-a and onwards.

Regtested on arm-none-eabi, armeb-none-eabi,
aarch64-none-elf and aarch64_be-none-elf with no issues found.

Ok for trunk?

gcc/
2017-11-06  Tamar Christina  

 * config/aarch64/arm_neon.h (vdot_u32, vdotq_u32)


This should be config/arm/arm_neon.h


 (vdot_s32, vdotq_s32): New.
 (vdot_lane_u32, vdotq_lane_u32): New.
 (vdot_lane_s32, vdotq_lane_s32): New.


gcc/testsuite/
2017-11-06  Tamar Christina  

 * gcc.target/arm/simd/vdot-compile.c: New.
 * gcc.target/arm/simd/vect-dot-qi.h: New.
 * gcc.target/arm/simd/vect-dot-s8.c: New.
 * gcc.target/arm/simd/vect-dot-u8.c: New

--


diff --git a/gcc/config/arm/arm_neon.h b/gcc/config/arm/arm_neon.h
index
0d436e83d0f01f0c86f8d6a25f84466c841c7e11..419080417901f343737741e334cbff818bb1e70a
100644
--- a/gcc/config/arm/arm_neon.h
+++ b/gcc/config/arm/arm_neon.h
@@ -18034,6 +18034,72 @@ vzipq_f16 (float16x8_t __a, float16x8_t __b)
   #endif
  +/* Adv.SIMD Dot Product intrinsics.  */

Please no full stop: "AdvSIMD".

+
+#pragma GCC push_options
+#if __ARM_ARCH >= 8
+#pragma GCC target ("arch=armv8.2-a+dotprod")




Not sure if Kyrill actually meant to comment about the three lines
above, but they have a bug:
#if should be before #pragma GCC push_options.


You're right, sorry for missing this :(



Indeed, after this patch was committed (r255064), I've noticed many
regressions, for instance
p64_p128 is now unsupported. This is because the arm_crypto_ok
effective target now fails
with this message:
XXX/arm_neon.h:16911:1: error: inlining failed in call to
always_inline 'vaeseq_u8': target specific option mismatch

Not sure why this wasn't noticed in validations earlier?

Fixed as obvious (r255126).


Thank you for fixing this up.

Kyrill

Christophe


diff --git a/gcc/testsuite/gcc.target/arm/simd/vect-dot-qi.h
b/gcc/testsuite/gcc.target/arm/simd/vect-dot-qi.h
new file mode 100644
index
..90b00aff95cfef96d1963be17673dc191cc71169
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/simd/vect-dot-qi.h
@@ -0,0 +1,15 @@
+TYPE char X[N] __attribute__ ((__aligned__(__BIGGEST_ALIGNMENT__)));
+TYPE char Y[N] __attribute__ ((__aligned__(__BIGGEST_ALIGNMENT__)));
+
+__attribute__ ((noinline)) int
+foo1(int len) {
+  int i;
+  TYPE int result = 0;
+  TYPE short prod;
+
+  for (i=0; i



[PATCH] Fix PR83128

2017-11-24 Thread Richard Biener

We somehow forgot to handle STRING_CSTs when looking through memcpy
like functions.

Fixed as follows, bootstrapped and tested on x86_64-unknown-linux-gnu, 
applied.

Richard.

2017-11-24  Richard Biener  

PR tree-optimization/83128
* tree-ssa-sccvn.c (fully_constant_vn_reference_p): Handle STRING_CSTs.
(vn_reference_lookup_3): Likewise.

* gcc.dg/tree-ssa/ssa-fre-62.c: New testcase.

Index: gcc/tree-ssa-sccvn.c
===
--- gcc/tree-ssa-sccvn.c(revision 255093)
+++ gcc/tree-ssa-sccvn.c(working copy)
@@ -1388,10 +1388,14 @@ fully_constant_vn_reference_p (vn_refere
   else if (base->opcode == MEM_REF
   && base[1].opcode == ADDR_EXPR
   && (TREE_CODE (TREE_OPERAND (base[1].op0, 0)) == VAR_DECL
-  || TREE_CODE (TREE_OPERAND (base[1].op0, 0)) == CONST_DECL))
+  || TREE_CODE (TREE_OPERAND (base[1].op0, 0)) == CONST_DECL
+  || TREE_CODE (TREE_OPERAND (base[1].op0, 0)) == STRING_CST))
{
  decl = TREE_OPERAND (base[1].op0, 0);
- ctor = ctor_for_folding (decl);
+ if (TREE_CODE (decl) == STRING_CST)
+   ctor = decl;
+ else
+   ctor = ctor_for_folding (decl);
}
   if (ctor == NULL_TREE)
return build_zero_cst (ref->type);
@@ -2331,7 +2335,8 @@ vn_reference_lookup_3 (ao_ref *ref, tree
  rhs = TREE_OPERAND (tem, 0);
  rhs_offset += tree_to_uhwi (TREE_OPERAND (tem, 1));
}
- else if (DECL_P (tem))
+ else if (DECL_P (tem)
+  || TREE_CODE (tem) == STRING_CST)
rhs = build_fold_addr_expr (tem);
  else
return (void *)-1;
Index: gcc/testsuite/gcc.dg/tree-ssa/ssa-fre-62.c
===
--- gcc/testsuite/gcc.dg/tree-ssa/ssa-fre-62.c  (nonexistent)
+++ gcc/testsuite/gcc.dg/tree-ssa/ssa-fre-62.c  (working copy)
@@ -0,0 +1,11 @@
+/* { dg-do compile } */
+/* { dg-options "-O -fdump-tree-fre1" } */
+
+char f()
+{
+  char *i = (char*)__builtin_malloc(100);
+  __builtin_memcpy(i, "a", 1);
+  return i[0];
+}
+
+/* { dg-final { scan-tree-dump "return 97;" "fre1" } } */


Re: [PATCH][i386,AVX] Enable VBMI2 support [2/7]

2017-11-24 Thread Kirill Yukhin
On 24 Nov 09:35, Jakub Jelinek wrote:
> On Fri, Nov 24, 2017 at 09:34:07AM +0100, Eric Botcazou wrote:
> > > This seems to break the build for me:
> > > 
> > > In file included from
> > > /home/glisse/repos/gcc/trunk/gcc/config/i386/i386.c:21:0:
> > > /home/glisse/repos/gcc/trunk/gcc/config/i386/i386.c:30165:16: error:
> > > ‘IX86_BUILTIN__BDESC_SPECIAL_ARGS2_FIRST’ was not declared in this scope
> > >   BDESC_VERIFYS (IX86_BUILTIN__BDESC_SPECIAL_ARGS2_FIRST,
> > 
> > Confirmed:
> > 
> > /home/eric/gnat/gnat-head/src/gcc/config/i386/i386.c: In function ‘rtx_def* 
> > ix86_expand_builtin(tree, rtx, rtx, machine_mode, int)’:
> > /home/eric/gnat/gnat-head/src/gcc/config/i386/i386.c:37439:16: error: 
> > ‘IX86_BUILTIN__BDESC_SPECIAL_ARGS2_FIRST’ was not declared in this scope
> >if (fcode >= IX86_BUILTIN__BDESC_SPECIAL_ARGS2_FIRST
> > ^
> > /home/eric/gnat/gnat-head/src/gcc/config/i386/i386.c:37440:19: error: 
> > ‘IX86_BUILTIN__BDESC_SPECIAL_ARGS2_LAST’ was not declared in this scope
> >&& fcode <= IX86_BUILTIN__BDESC_SPECIAL_ARGS2_LAST)
> >^
> > /home/eric/gnat/gnat-head/src/gcc/config/i386/i386.c:37443:48: error: 
> > ‘bdesc_special_args2’ was not declared in this scope
> >return ix86_expand_special_args_builtin (bdesc_special_args2 + i, 
> > exp,
> 
> It works now, Kirill has committed r255131 and r255132.
Sorry about that, folks. I've missed a conflict during patch application.

--
Thanks, K
> 
>   Jakub


RE: [PATCH][GCC][ARM] Dot Product NEON intrinsics [Patch (3/8)]

2017-11-24 Thread Tamar Christina
> >
> Not sure if Kyrill actually meant to comment about the three lines above, but
> they have a bug:
> #if should be before #pragma GCC push_options.
> 
> Indeed, after this patch was committed (r255064), I've noticed many
> regressions, for instance
> p64_p128 is now unsupported. This is because the arm_crypto_ok effective
> target now fails with this message:
> XXX/arm_neon.h:16911:1: error: inlining failed in call to always_inline
> 'vaeseq_u8': target specific option mismatch
> 
> Not sure why this wasn't noticed in validations earlier?

I still have the log files for these runs:

It seems that I was comparing the log files instead of the sum files, which do 
not show this difference.

/d/t/g/s/gcc (dot-product-arm ↩☡=) contrib/dg-cmp-results.sh -v -v "" 
../../build-arm-none-eabi/results.clean/vanilla/gcc.log 
../../build-arm-none-eabi/results.dotprod/vanilla/gcc.log  | grep p64_p128
/d/t/g/s/gcc (dot-product-arm ↩☡=) contrib/dg-cmp-results.sh -v -v "" 
../../build-arm-none-eabi/results.clean/vanilla/gcc.sum 
../../build-arm-none-eabi/results.dotprod/vanilla/gcc.sum  | grep p64_p128
NA->UNSUPPORTED: gcc.target/aarch64/advsimd-intrinsics/p64_p128.c   -O0 
PASS->NA: gcc.target/aarch64/advsimd-intrinsics/p64_p128.c   -O0  execution test
PASS->NA: gcc.target/aarch64/advsimd-intrinsics/p64_p128.c   -O0  (test for 
excess errors)
NA->UNSUPPORTED: gcc.target/aarch64/advsimd-intrinsics/p64_p128.c   -O1 
PASS->NA: gcc.target/aarch64/advsimd-intrinsics/p64_p128.c   -O1  execution test
PASS->NA: gcc.target/aarch64/advsimd-intrinsics/p64_p128.c   -O1  (test for 
excess errors)
NA->UNSUPPORTED: gcc.target/aarch64/advsimd-intrinsics/p64_p128.c   -O2

Sorry for missing this, I don't even know why these scripts accept the log 
files if they're always going to do the wrong thing.

Anyway thanks for fixing this and I'll make sure I'm using the sum files in the 
future.

Tamar

> 
> Fixed as obvious (r255126).
> 
> Christophe
> 
> > diff --git a/gcc/testsuite/gcc.target/arm/simd/vect-dot-qi.h
> > b/gcc/testsuite/gcc.target/arm/simd/vect-dot-qi.h
> > new file mode 100644
> > index
> >
> ..90b00aff95cfef96d1963be176
> 73
> > dc191cc71169
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.target/arm/simd/vect-dot-qi.h
> > @@ -0,0 +1,15 @@
> > +TYPE char X[N] __attribute__
> ((__aligned__(__BIGGEST_ALIGNMENT__)));
> > +TYPE char Y[N] __attribute__
> ((__aligned__(__BIGGEST_ALIGNMENT__)));
> > +
> > +__attribute__ ((noinline)) int
> > +foo1(int len) {
> > +  int i;
> > +  TYPE int result = 0;
> > +  TYPE short prod;
> > +
> > +  for (i=0; i > +prod = X[i] * Y[i];
> > +result += prod;
> > +  }
> > +  return result;
> > +}
> > \ No newline at end of file
> >
> > Please add new lines at the end of the new test files.
> > This applies to a few more new files in this patch.
> >
> > Ok with these nits fixed.
> >
> > Thanks,
> > Kyrill
> >


Re: [RFC, PR 80689] Copy small aggregates element-wise

2017-11-24 Thread Richard Biener
On Thu, Nov 23, 2017 at 4:32 PM, Martin Jambor  wrote:
> Hi,
>
> On Mon, Nov 13 2017, Richard Biener wrote:
>> The main concern here is that GIMPLE is not very well defined for
>> aggregate copies and that gimple-fold.c happily optimizes
>> memcpy (&a, &b, sizeof (a)) into a = b;
>>
>> struct A { short s; long i; long j; };
>> struct A a, b;
>> void foo ()
>> {
>>   __builtin_memcpy (&a, &b, sizeof (struct A));
>> }
>>
>> gets folded to
>>
>>   MEM[(char * {ref-all})&a] = MEM[(char * {ref-all})&b];
>>   return;
>>
>> you see we're careful about TBAA but (don't see that above but
>> can be verified by for example debugging expand_assignment)
>> TREE_TYPE (MEM[...]) is actually 'struct A'.
>>
>> And yes, I've been worried about SRA as well here...  it _does_
>> have some early outs when seeing VIEW_CONVERT_EXPR but
>> appearantly not for the above.  Testcase that aborts with SRA but
>> not without:
>>
>> struct A { short s; long i; long j; };
>> struct A a, b;
>> void foo ()
>> {
>>   struct A c;
>>   __builtin_memcpy (&c, &b, sizeof (struct A));
>>   __builtin_memcpy (&a, &c, sizeof (struct A));
>> }
>> int main()
>> {
>>   __builtin_memset (&b, 0, sizeof (struct A));
>>   b.s = 1;
>>   __builtin_memcpy ((char *)&b+2, &b, 2);
>>   foo ();
>>   __builtin_memcpy (&a, (char *)&a+2, 2);
>>   if (a.s != 1)
>> __builtin_abort ();
>>   return 0;
>> }
>
> Thanks for the testcase, I agree that is a fairly big problem.  Do you
> think that the following (untested) patch is an appropriate way of
> fixing it and generally of extending gimple to capture that a statement
> is a bit-copy?

I think the place to fix is the memcpy folding.  That is, we'd say that
aggregate assignments are not bit-copies but do element-wise assignments.
For memcpy folding we'd then need to use a type that doesn't contain
padding.  Which effectively means char[].

Of course we need to stop SRA from decomposing that copy to
individual characters then ;)

So iff we decide that all aggregate copies are element copies,
maybe only those where TYPE_MAIN_VARIANT of lhs and rhs match
(currently we allow TYPE_CANONICAL compatibility and thus there
might be some mismatches), then we have to fix nothign but
the memcpy folding.

> If so, I'll add the testcase, bootstrap it and formally propose it.
> Subsequently I will of course make sure that any element-wise copying
> patch would test the predicate.

I don't think the alias-set should determine whether a copy is
bit-wise or not.

Richard.

> Thanks,
>
> Martin
>
>
> 2017-11-23  Martin Jambor  
>
> * gimple.c (gimple_bit_copy_p): New function.
> * gimple.h (gimple_bit_copy_p): Declare it.
> * tree-sra.c (sra_modify_assign): Use it.
> ---
>  gcc/gimple.c   | 20 
>  gcc/gimple.h   |  1 +
>  gcc/tree-sra.c |  1 +
>  3 files changed, 22 insertions(+)
>
> diff --git a/gcc/gimple.c b/gcc/gimple.c
> index c986a732004..e1b428d91bb 100644
> --- a/gcc/gimple.c
> +++ b/gcc/gimple.c
> @@ -3087,6 +3087,26 @@ gimple_inexpensive_call_p (gcall *stmt)
>return false;
>  }
>
> +/* Return true if STMT is an assignment performing bit copy and so is also
> +   expected to copy any padding.  */
> +
> +bool
> +gimple_bit_copy_p (gassign *stmt)
> +{
> +  if (!gimple_assign_single_p (stmt))
> +return false;
> +
> +  tree lhs = gimple_assign_lhs (stmt);
> +  if (TREE_CODE (lhs) == MEM_REF
> +  && TYPE_REF_CAN_ALIAS_ALL (reference_alias_ptr_type (lhs)))
> +return true;
> +  tree rhs = gimple_assign_rhs1 (stmt);
> +  if (TREE_CODE (rhs) == MEM_REF
> +  && TYPE_REF_CAN_ALIAS_ALL (reference_alias_ptr_type (rhs)))
> +return true;
> +  return false;
> +}
> +
>  #if CHECKING_P
>
>  namespace selftest {
> diff --git a/gcc/gimple.h b/gcc/gimple.h
> index 334def89398..60929473361 100644
> --- a/gcc/gimple.h
> +++ b/gcc/gimple.h
> @@ -1531,6 +1531,7 @@ extern void gimple_seq_discard (gimple_seq);
>  extern void maybe_remove_unused_call_args (struct function *, gimple *);
>  extern bool gimple_inexpensive_call_p (gcall *);
>  extern bool stmt_can_terminate_bb_p (gimple *);
> +extern bool gimple_bit_copy_p (gassign *);
>
>  /* Formal (expression) temporary table handling: multiple occurrences of
> the same scalar expression are evaluated into the same temporary.  */
> diff --git a/gcc/tree-sra.c b/gcc/tree-sra.c
> index db490b20c3e..fc0a8fe60bf 100644
> --- a/gcc/tree-sra.c
> +++ b/gcc/tree-sra.c
> @@ -3591,6 +3591,7 @@ sra_modify_assign (gimple *stmt, gimple_stmt_iterator 
> *gsi)
>|| gimple_has_volatile_ops (stmt)
>|| contains_vce_or_bfcref_p (rhs)
>|| contains_vce_or_bfcref_p (lhs)
> +  || gimple_bit_copy_p (as_a  (stmt))
>|| stmt_ends_bb_p (stmt))
>  {
>/* No need to copy into a constant-pool, it comes pre-initialized.  */
> --
> 2.15.0
>


Re: [PATCH 1/4] [SPARC] Errata workaround for GRLIB-TN-0012

2017-11-24 Thread Eric Botcazou
> 2017-11-17  Daniel Cederman  
> 
>   * config/sparc/sparc.c (fpop_insn_p): New function.
>   (sparc_do_work_around_errata): Insert NOP instructions to
>   prevent sequences that could trigger the TN-0012 errata for
>   GR712RC.
>   (pass_work_around_errata::gate): Also test sparc_fix_gr712rc.
>   * config/sparc/sparc.md (fix_gr712rc): New attribute.
>   (in_branch_annul_delay): Prevent floating-point instructions
>   in delay slot of annulled integer branch.

OK for mainline and 7 branch modulo the following nits:

> +/* True if floating-point instruction.  */
> +
> +static int
> +fpop_insn_p (rtx_insn *insn)

'bool' instead of 'int'.  "True if INSN is a floating-point instruction."

> +{
> +  if ( GET_CODE (PATTERN (insn)) != SET)
> +return false;

No space before GET_CODE.

>  /* We use a machine specific pass to enable workarounds for errata.
> 
> We need to have the (essentially) final form of the insn stream in order
> @@ -970,11 +995,31 @@ sparc_do_work_around_errata (void)
>  {
>bool insert_nop = false;
>rtx set;
> +  rtx_insn *jump = 0;
> 
>/* Look into the instruction in a delay slot.  */
>if (NONJUMP_INSN_P (insn))
>   if (rtx_sequence *seq = dyn_cast  (PATTERN (insn)))
> -   insn = seq->insn (1);
> +   {
> + jump = seq->insn (0);
> + insn = seq->insn (1);
> +   }

This should be changed into:

  rtx_insn jump;

  /* Look into the instruction in a delay slot.  */
  if (NONJUMP_INSN_P (insn)
  && (rtx_sequence *seq = dyn_cast  (PATTERN (insn)))
{
  jump = seq->insn (0);
  insn = seq->insn (1);
}
  else if (JUMP_P (insn))
jump = insn
  else
jump = NULL_RTX;

and the block below simplified accordingly.

> +  /* Place a NOP at the branch target of an integer branch if it is
> +  a floating-point operation or a floating-point branch.  */
> +  if (sparc_fix_gr712rc
> +   && (JUMP_P (insn) || jump)
> +   && get_attr_branch_type (jump ? jump : insn) == BRANCH_TYPE_ICC)
> + {
> +   rtx_insn *target;
> +
> +   target = next_active_insn (JUMP_LABEL_AS_INSN (jump ? jump : insn));

  rtx_insn *target = next_active_insn (JUMP_LABEL_AS_INSN (jump));

>/* Look for either of these two sequences:
> 
> @@ -1303,7 +1348,8 @@ public:
>/* opt_pass methods: */
>virtual bool gate (function *)
>  {
> -  return sparc_fix_at697f || sparc_fix_ut699 || sparc_fix_b2bst;
> +  return sparc_fix_at697f || sparc_fix_ut699 || sparc_fix_b2bst
> + || sparc_fix_gr712rc;
>  }

"|| sparc_fix_gr712rc" lined up under sparc_fix_at697f.

> @@ -602,6 +615,10 @@
>  (define_delay (eq_attr "type" "branch")
>[(eq_attr "in_branch_delay" "true") (nil) (eq_attr "in_branch_delay"
> "true")])
> 
> +(define_delay (and (eq_attr "type" "branch") (eq_attr "branch_type" "icc"))
> +  [(eq_attr "in_branch_delay" "true") (nil)
> +  (eq_attr "in_branch_annul_delay" "true")])
> +

I think that we'd better keep the various define_delay's mutually exclusive.

-- 
Eric Botcazou


Re: [PATCH 2/4] [SPARC] Errata workaround for GRLIB-TN-0011

2017-11-24 Thread Eric Botcazou
> 2017-11-17  Daniel Cederman  
> 
>   * config/sparc/sync.md (swapsi): 16-byte align if sparc_fix_gr712rc.
>   (atomic_compare_and_swap_leon3_1): Likewise.
>   (ldstub): Likewise.

You need to adjust the "length" attribute accordingly (and conservatively), 
see atomic_compare_and_swapdi_v8plus for reference.

-- 
Eric Botcazou


Re: [PATCH 3/4] [SPARC] Errata workaround for GRLIB-TN-0010

2017-11-24 Thread Eric Botcazou
> 2017-11-17  Daniel Cederman  
> 
>   * config/sparc/sparc.c (atomic_insn_p): New function.
>   (sparc_do_work_around_errata): Insert NOP instructions to
>   prevent sequences that could trigger the TN-0010 errata for
>   UT700.
>   * config/sparc/sync.md (atomic_compare_and_swap_leon3_1): Make
>   instruction referable in atomic_insns_p.

OK for mainline and 7 branch modulo the following nits:

> +/* True if complex atomic instruction.  */
> +
> +static int
> +atomic_insn_p (rtx_insn *insn)

'bool' instead of 'int'.  "True if INSN is an atomic instruction."

I'd rename the function into "atomic_insn_for_leon3_p" since you're handling 
only the LEON3 instructions.

> @@ -1021,6 +1037,33 @@ sparc_do_work_around_errata (void
>   emit_insn_before (gen_nop (), target);
>   }
> 
> +  /* Insert a NOP between load instruction and atomic
> +  instruction. Insert a NOP at branch target if load
> +  in delay slot and atomic instruction at branch target.  */

Double space after colon.

> +  if (sparc_fix_ut700
> +   && NONJUMP_INSN_P (insn)
> +   && (set = single_set (insn)) != NULL_RTX
> +   && MEM_P (SET_SRC (set))
> +   && REG_P (SET_DEST (set)))
> + {
> +   if (jump)
> + {
> +   rtx_insn *target;
> +
> +   target = next_active_insn (JUMP_LABEL_AS_INSN (jump));

  rtx_insn *target
= next_active_insn (JUMP_LABEL_AS_INSN (jump));


> +   if (target
> +   && atomic_insn_p (target))

On a single line.

-- 
Eric Botcazou


Re: [RFC, PR 80689] Copy small aggregates element-wise

2017-11-24 Thread Richard Biener
On Fri, Nov 24, 2017 at 11:31 AM, Richard Biener
 wrote:
> On Thu, Nov 23, 2017 at 4:32 PM, Martin Jambor  wrote:
>> Hi,
>>
>> On Mon, Nov 13 2017, Richard Biener wrote:
>>> The main concern here is that GIMPLE is not very well defined for
>>> aggregate copies and that gimple-fold.c happily optimizes
>>> memcpy (&a, &b, sizeof (a)) into a = b;
>>>
>>> struct A { short s; long i; long j; };
>>> struct A a, b;
>>> void foo ()
>>> {
>>>   __builtin_memcpy (&a, &b, sizeof (struct A));
>>> }
>>>
>>> gets folded to
>>>
>>>   MEM[(char * {ref-all})&a] = MEM[(char * {ref-all})&b];
>>>   return;
>>>
>>> you see we're careful about TBAA but (don't see that above but
>>> can be verified by for example debugging expand_assignment)
>>> TREE_TYPE (MEM[...]) is actually 'struct A'.
>>>
>>> And yes, I've been worried about SRA as well here...  it _does_
>>> have some early outs when seeing VIEW_CONVERT_EXPR but
>>> appearantly not for the above.  Testcase that aborts with SRA but
>>> not without:
>>>
>>> struct A { short s; long i; long j; };
>>> struct A a, b;
>>> void foo ()
>>> {
>>>   struct A c;
>>>   __builtin_memcpy (&c, &b, sizeof (struct A));
>>>   __builtin_memcpy (&a, &c, sizeof (struct A));
>>> }
>>> int main()
>>> {
>>>   __builtin_memset (&b, 0, sizeof (struct A));
>>>   b.s = 1;
>>>   __builtin_memcpy ((char *)&b+2, &b, 2);
>>>   foo ();
>>>   __builtin_memcpy (&a, (char *)&a+2, 2);
>>>   if (a.s != 1)
>>> __builtin_abort ();
>>>   return 0;
>>> }
>>
>> Thanks for the testcase, I agree that is a fairly big problem.  Do you
>> think that the following (untested) patch is an appropriate way of
>> fixing it and generally of extending gimple to capture that a statement
>> is a bit-copy?
>
> I think the place to fix is the memcpy folding.  That is, we'd say that
> aggregate assignments are not bit-copies but do element-wise assignments.
> For memcpy folding we'd then need to use a type that doesn't contain
> padding.  Which effectively means char[].
>
> Of course we need to stop SRA from decomposing that copy to
> individual characters then ;)
>
> So iff we decide that all aggregate copies are element copies,
> maybe only those where TYPE_MAIN_VARIANT of lhs and rhs match
> (currently we allow TYPE_CANONICAL compatibility and thus there
> might be some mismatches), then we have to fix nothign but
> the memcpy folding.
>
>> If so, I'll add the testcase, bootstrap it and formally propose it.
>> Subsequently I will of course make sure that any element-wise copying
>> patch would test the predicate.
>
> I don't think the alias-set should determine whether a copy is
> bit-wise or not.

Like the attached.  At least FAILs

FAIL: gcc.dg/tree-ssa/ssa-ccp-27.c scan-tree-dump-times ccp1
"memcpy[^\n]*123456" 2 (found 0 times)

not sure why we have this test.

Richard.


p3-2
Description: Binary data


Re: [RFC, PR 80689] Copy small aggregates element-wise

2017-11-24 Thread Richard Biener
On Fri, Nov 24, 2017 at 11:57 AM, Richard Biener
 wrote:
> On Fri, Nov 24, 2017 at 11:31 AM, Richard Biener
>  wrote:
>> On Thu, Nov 23, 2017 at 4:32 PM, Martin Jambor  wrote:
>>> Hi,
>>>
>>> On Mon, Nov 13 2017, Richard Biener wrote:
 The main concern here is that GIMPLE is not very well defined for
 aggregate copies and that gimple-fold.c happily optimizes
 memcpy (&a, &b, sizeof (a)) into a = b;

 struct A { short s; long i; long j; };
 struct A a, b;
 void foo ()
 {
   __builtin_memcpy (&a, &b, sizeof (struct A));
 }

 gets folded to

   MEM[(char * {ref-all})&a] = MEM[(char * {ref-all})&b];
   return;

 you see we're careful about TBAA but (don't see that above but
 can be verified by for example debugging expand_assignment)
 TREE_TYPE (MEM[...]) is actually 'struct A'.

 And yes, I've been worried about SRA as well here...  it _does_
 have some early outs when seeing VIEW_CONVERT_EXPR but
 appearantly not for the above.  Testcase that aborts with SRA but
 not without:

 struct A { short s; long i; long j; };
 struct A a, b;
 void foo ()
 {
   struct A c;
   __builtin_memcpy (&c, &b, sizeof (struct A));
   __builtin_memcpy (&a, &c, sizeof (struct A));
 }
 int main()
 {
   __builtin_memset (&b, 0, sizeof (struct A));
   b.s = 1;
   __builtin_memcpy ((char *)&b+2, &b, 2);
   foo ();
   __builtin_memcpy (&a, (char *)&a+2, 2);
   if (a.s != 1)
 __builtin_abort ();
   return 0;
 }
>>>
>>> Thanks for the testcase, I agree that is a fairly big problem.  Do you
>>> think that the following (untested) patch is an appropriate way of
>>> fixing it and generally of extending gimple to capture that a statement
>>> is a bit-copy?
>>
>> I think the place to fix is the memcpy folding.  That is, we'd say that
>> aggregate assignments are not bit-copies but do element-wise assignments.
>> For memcpy folding we'd then need to use a type that doesn't contain
>> padding.  Which effectively means char[].
>>
>> Of course we need to stop SRA from decomposing that copy to
>> individual characters then ;)
>>
>> So iff we decide that all aggregate copies are element copies,
>> maybe only those where TYPE_MAIN_VARIANT of lhs and rhs match
>> (currently we allow TYPE_CANONICAL compatibility and thus there
>> might be some mismatches), then we have to fix nothign but
>> the memcpy folding.
>>
>>> If so, I'll add the testcase, bootstrap it and formally propose it.
>>> Subsequently I will of course make sure that any element-wise copying
>>> patch would test the predicate.
>>
>> I don't think the alias-set should determine whether a copy is
>> bit-wise or not.
>
> Like the attached.  At least FAILs
>
> FAIL: gcc.dg/tree-ssa/ssa-ccp-27.c scan-tree-dump-times ccp1
> "memcpy[^\n]*123456" 2 (found 0 times)
>
> not sure why we have this test.

Hum.  And SRA still decomposes the copy to struct elements w/o padding
even though the access is done using char[].  So somehow it ignores
VIEW_CONVERT_EXPRs
(well, those implicitely present on MEM_REFs).

Looks like this is because total scalarization is done on the decls
and not at all
honoring how the variable is accessed?  The following seems to fix
that, otherwise untested.

Index: gcc/tree-sra.c
===
--- gcc/tree-sra.c  (revision 255137)
+++ gcc/tree-sra.c  (working copy)
@@ -1338,7 +1338,9 @@ build_accesses_from_assign (gimple *stmt
 {
   racc->grp_assignment_read = 1;
   if (should_scalarize_away_bitmap && !gimple_has_volatile_ops (stmt)
- && !is_gimple_reg_type (racc->type))
+ && !is_gimple_reg_type (racc->type)
+ && (TYPE_MAIN_VARIANT (racc->type)
+ == TYPE_MAIN_VARIANT (TREE_TYPE (racc->base
bitmap_set_bit (should_scalarize_away_bitmap, DECL_UID (racc->base));
   if (storage_order_barrier_p (lhs))
racc->grp_unscalarizable_region = 1;


I'm giving this full testing with the folding fix.

Richard.

> Richard.


Re: [PATCH 4/4] [SPARC] Errata workaround for GRLIB-TN-0013

2017-11-24 Thread Eric Botcazou
> 2017-11-17  Daniel Cederman  
> 
>   * config/sparc/sparc.c (fpop_reg_depend_p): New function.
>   (div_sqrt_insn_p): New function.
>   (sparc_do_work_around_errata): Insert NOP instructions to
>   prevent sequences that could trigger the TN-0013 errata for
>   certain LEON3 processors.
>   (pass_work_around_errata::gate): Also test sparc_fix_tn0013.
>   (sparc_option_override): Set sparc_fix_tn0013 appropriately.
>   * config/sparc/sparc.md (fix_tn0013): New attribute.
>   (in_branch_delay): Prevent div and sqrt in delay slot if fix_tn0013.
>   * config/sparc/sparc.opt (sparc_fix_tn0013: New variable.

OK for mainline and 7 branch modulo the following nits:

> +/* True if any of the floating-point instruction's source
> +   registers is the same as the provided register.  */
> +
> +static int
> fpop_reg_depend_p (rtx_insn *insn, unsigned int reg)

'bool' instead of 'int'.  "True if any of INSN's source register(s) is REG".

The function's name is awkward.  insn_uses_reg_p?  insn_reads_reg_p?

> +/* True if the instruction is floating-point division or
> +   floating-point square-root.  */
> +
> +static int
> +div_sqrt_insn_p (rtx_insn *insn)

'bool' instead of 'int'. "True if INSN is a floating-point ..."

> +  if ( GET_CODE (PATTERN (insn)) != SET)
> +return false;

No space before GET_CODE.

> @@ -1064,6 +1097,80 @@ sparc_do_work_around_errata (void)
>   insert_nop = true;
>   }
> 
> +  /* Look for sequences that could trigger the TN-0013 errata.  */
> +  if (sparc_fix_tn0013
> +   && NONJUMP_INSN_P (insn)
> +   && div_sqrt_insn_p (insn))
> + {
> +   int i;
> +   int fp_found = 0;
> +   unsigned int dest_reg;
> +   rtx_insn *after;
> +
> +   dest_reg = REGNO (SET_DEST (single_set (insn)));

  const unsigned int dest_reg = REGNO (SET_DEST (single_set (insn)));

> +   next = next_active_insn (insn);
> +   if (!next)
> + break;
> +
> +   for (after = next, i = 0; i < 4; i++)
> + {
> +   /* Count floating-point operations.  */
> +   if (i != 3
> +   && fpop_insn_p (after))

On a single line.

> +   /* If the insn uses the destination register of
> +  the div/sqrt, then it cannot be problematic.  */
> +   if (fpop_reg_depend_p (after, dest_reg))
> + break;
> +   fp_found++;
> + }
> +
> +   /* Count floating-point loads.  */
> +   if (i != 3
> +   && (set = single_set (after)) != NULL_RTX
> +   && REG_P (SET_DEST (set))
> +   && REGNO (SET_DEST (set)) > 31)
> + {
> +   /* If the insn uses the destination register of
> +  the div/sqrt, then it cannot be problematic.  */
> +   if (REGNO (SET_DEST (set)) == dest_reg)
> + break;
> +   fp_found++;
> + }
> +
> +   /* Check if this is a problematic sequence.  */
> +   if (i > 1
> +   && fp_found >= 2
> +   && div_sqrt_insn_p (after))
> + {
> +   /* Add extra NOP to prevent second version of
> +  problematic sequence.  */
> +   if (i == 2)
> + emit_insn_before (gen_nop (), next);
> +   insert_nop = true;
> +   break;
> + }

So you emit 2 NOPs in a row if i == 2?

> diff --git a/gcc/config/sparc/sparc.opt b/gcc/config/sparc/sparc.opt
> index 22267f5..43cd964 100644
> --- a/gcc/config/sparc/sparc.opt
> +++ b/gcc/config/sparc/sparc.opt
> @@ -253,6 +253,10 @@ Enable workarounds for the errata of the GR712RC
> processor. TargetVariable
>  unsigned int sparc_fix_b2bst
> 
> +;; Enable workaround for TN-0013 errata
> +TargetVariable
> +unsigned int sparc_fix_tn0013

Can we find a more evocative name for the variable?

-- 
Eric Botcazou


Fix bootstrap miscompare when stage1 is built with different chekcing setting

2017-11-24 Thread Jan Hubicka
Hi,
I have disabled inliner cache sanity check for profile feedback because profile
count scaling now can cause small roundoff deviations (a cost we had to pay for
dropping separate frequencies I guess). While doing so I missed that the code
block must set current_badness even when not doing any checks.

This fixes the IA64 bootstrap issue reported by Andreas (which is triggered
by his setup using release checking for stage1 cc1).

Bootstrapped/regtested x86_64-linux, comitted.

Honza

PR bootstrap/83015
* ipa-inline.c (inline_small_functions): Set current badnes correctly
when skipping checking.
Index: ipa-inline.c
===
--- ipa-inline.c(revision 255103)
+++ ipa-inline.c(working copy)
@@ -1865,6 +1865,8 @@ inline_small_functions (void)
  gcc_assert (cached_badness == current_badness);
  gcc_assert (current_badness >= badness);
}
+  else
+current_badness = edge_badness (edge, false);
 #else
   current_badness = edge_badness (edge, false);
 #endif


Re: [RFC, PR 80689] Copy small aggregates element-wise

2017-11-24 Thread Martin Jambor
Hi Richi,

On Fri, Nov 24 2017, Richard Biener wrote:
> On Fri, Nov 24, 2017 at 11:57 AM, Richard Biener
>  wrote:
>> On Fri, Nov 24, 2017 at 11:31 AM, Richard Biener

..

> And yes, I've been worried about SRA as well here...  it _does_
> have some early outs when seeing VIEW_CONVERT_EXPR but
> appearantly not for the above.  Testcase that aborts with SRA but
> not without:
>
> struct A { short s; long i; long j; };
> struct A a, b;
> void foo ()
> {
>   struct A c;
>   __builtin_memcpy (&c, &b, sizeof (struct A));
>   __builtin_memcpy (&a, &c, sizeof (struct A));
> }
> int main()
> {
>   __builtin_memset (&b, 0, sizeof (struct A));
>   b.s = 1;
>   __builtin_memcpy ((char *)&b+2, &b, 2);
>   foo ();
>   __builtin_memcpy (&a, (char *)&a+2, 2);
>   if (a.s != 1)
> __builtin_abort ();
>   return 0;
> }

 Thanks for the testcase, I agree that is a fairly big problem.  Do you
 think that the following (untested) patch is an appropriate way of
 fixing it and generally of extending gimple to capture that a statement
 is a bit-copy?
>>>
>>> I think the place to fix is the memcpy folding.  That is, we'd say that
>>> aggregate assignments are not bit-copies but do element-wise assignments.
>>> For memcpy folding we'd then need to use a type that doesn't contain
>>> padding.  Which effectively means char[].
>>>
>>> Of course we need to stop SRA from decomposing that copy to
>>> individual characters then ;)
>>>
>>> So iff we decide that all aggregate copies are element copies,
>>> maybe only those where TYPE_MAIN_VARIANT of lhs and rhs match
>>> (currently we allow TYPE_CANONICAL compatibility and thus there
>>> might be some mismatches), then we have to fix nothign but
>>> the memcpy folding.
>>>
 If so, I'll add the testcase, bootstrap it and formally propose it.
 Subsequently I will of course make sure that any element-wise copying
 patch would test the predicate.
>>>
>>> I don't think the alias-set should determine whether a copy is
>>> bit-wise or not.
>>
>> Like the attached.  At least FAILs
>>
>> FAIL: gcc.dg/tree-ssa/ssa-ccp-27.c scan-tree-dump-times ccp1
>> "memcpy[^\n]*123456" 2 (found 0 times)
>>
>> not sure why we have this test.
>
> Hum.  And SRA still decomposes the copy to struct elements w/o padding
> even though the access is done using char[].  So somehow it ignores
> VIEW_CONVERT_EXPRs
> (well, those implicitely present on MEM_REFs).

Yes.  SRA is not even too afraid of top-level V_C_Es.  It really bails
out only if they are buried under a handled_component.  And it does not
remove aggregate assignments containing them.

>
> Looks like this is because total scalarization is done on the decls
> and not at all
> honoring how the variable is accessed?  The following seems to fix
> that, otherwise untested.


>
> Index: gcc/tree-sra.c
> ===
> --- gcc/tree-sra.c  (revision 255137)
> +++ gcc/tree-sra.c  (working copy)
> @@ -1338,7 +1338,9 @@ build_accesses_from_assign (gimple *stmt
>  {
>racc->grp_assignment_read = 1;
>if (should_scalarize_away_bitmap && !gimple_has_volatile_ops (stmt)
> - && !is_gimple_reg_type (racc->type))
> + && !is_gimple_reg_type (racc->type)
> + && (TYPE_MAIN_VARIANT (racc->type)
> + == TYPE_MAIN_VARIANT (TREE_TYPE (racc->base
> bitmap_set_bit (should_scalarize_away_bitmap, DECL_UID (racc->base));
>if (storage_order_barrier_p (lhs))
> racc->grp_unscalarizable_region = 1;

I believe that the added condition is not what you want, this seems
to trigger also for ordinary:

  s1 = s2.field

Where racc->type is type of the field but racc->base is s2 and its type
is type of the structure.   

I also think you want to be setting a bit in
cannot_scalarize_away_bitmap in order to guarantee that total
scalarization will not happen for the given candidate.  Otherwise some
other regular assignment might trigger it ...except if we then also
checked the statement for bit-copying types in sra_modify_assign (in the
condition after the big comment), which I suppose is actually the
correct thing to do.

Thanks a lot for the folding patch, I can take over the SRA bits if you
want to.

Martin



Re: [RFC, PR 80689] Copy small aggregates element-wise

2017-11-24 Thread Richard Biener
On Fri, Nov 24, 2017 at 12:53 PM, Martin Jambor  wrote:
> Hi Richi,
>
> On Fri, Nov 24 2017, Richard Biener wrote:
>> On Fri, Nov 24, 2017 at 11:57 AM, Richard Biener
>>  wrote:
>>> On Fri, Nov 24, 2017 at 11:31 AM, Richard Biener
>
> ..
>
>> And yes, I've been worried about SRA as well here...  it _does_
>> have some early outs when seeing VIEW_CONVERT_EXPR but
>> appearantly not for the above.  Testcase that aborts with SRA but
>> not without:
>>
>> struct A { short s; long i; long j; };
>> struct A a, b;
>> void foo ()
>> {
>>   struct A c;
>>   __builtin_memcpy (&c, &b, sizeof (struct A));
>>   __builtin_memcpy (&a, &c, sizeof (struct A));
>> }
>> int main()
>> {
>>   __builtin_memset (&b, 0, sizeof (struct A));
>>   b.s = 1;
>>   __builtin_memcpy ((char *)&b+2, &b, 2);
>>   foo ();
>>   __builtin_memcpy (&a, (char *)&a+2, 2);
>>   if (a.s != 1)
>> __builtin_abort ();
>>   return 0;
>> }
>
> Thanks for the testcase, I agree that is a fairly big problem.  Do you
> think that the following (untested) patch is an appropriate way of
> fixing it and generally of extending gimple to capture that a statement
> is a bit-copy?

 I think the place to fix is the memcpy folding.  That is, we'd say that
 aggregate assignments are not bit-copies but do element-wise assignments.
 For memcpy folding we'd then need to use a type that doesn't contain
 padding.  Which effectively means char[].

 Of course we need to stop SRA from decomposing that copy to
 individual characters then ;)

 So iff we decide that all aggregate copies are element copies,
 maybe only those where TYPE_MAIN_VARIANT of lhs and rhs match
 (currently we allow TYPE_CANONICAL compatibility and thus there
 might be some mismatches), then we have to fix nothign but
 the memcpy folding.

> If so, I'll add the testcase, bootstrap it and formally propose it.
> Subsequently I will of course make sure that any element-wise copying
> patch would test the predicate.

 I don't think the alias-set should determine whether a copy is
 bit-wise or not.
>>>
>>> Like the attached.  At least FAILs
>>>
>>> FAIL: gcc.dg/tree-ssa/ssa-ccp-27.c scan-tree-dump-times ccp1
>>> "memcpy[^\n]*123456" 2 (found 0 times)
>>>
>>> not sure why we have this test.
>>
>> Hum.  And SRA still decomposes the copy to struct elements w/o padding
>> even though the access is done using char[].  So somehow it ignores
>> VIEW_CONVERT_EXPRs
>> (well, those implicitely present on MEM_REFs).
>
> Yes.  SRA is not even too afraid of top-level V_C_Es.  It really bails
> out only if they are buried under a handled_component.  And it does not
> remove aggregate assignments containing them.
>
>>
>> Looks like this is because total scalarization is done on the decls
>> and not at all
>> honoring how the variable is accessed?  The following seems to fix
>> that, otherwise untested.
>
>
>>
>> Index: gcc/tree-sra.c
>> ===
>> --- gcc/tree-sra.c  (revision 255137)
>> +++ gcc/tree-sra.c  (working copy)
>> @@ -1338,7 +1338,9 @@ build_accesses_from_assign (gimple *stmt
>>  {
>>racc->grp_assignment_read = 1;
>>if (should_scalarize_away_bitmap && !gimple_has_volatile_ops (stmt)
>> - && !is_gimple_reg_type (racc->type))
>> + && !is_gimple_reg_type (racc->type)
>> + && (TYPE_MAIN_VARIANT (racc->type)
>> + == TYPE_MAIN_VARIANT (TREE_TYPE (racc->base
>> bitmap_set_bit (should_scalarize_away_bitmap, DECL_UID (racc->base));
>>if (storage_order_barrier_p (lhs))
>> racc->grp_unscalarizable_region = 1;
>
> I believe that the added condition is not what you want, this seems
> to trigger also for ordinary:
>
>   s1 = s2.field
>
> Where racc->type is type of the field but racc->base is s2 and its type
> is type of the structure.

Yes.  But do we want to totally scalarize s2 in this case?  We only
access parts of it.  We don't seem to have a testcase that fails
(well, full testing still in progress).

> I also think you want to be setting a bit in
> cannot_scalarize_away_bitmap in order to guarantee that total
> scalarization will not happen for the given candidate.  Otherwise some
> other regular assignment might trigger it ...

Yeah, figured that out myself.

> except if we then also
> checked the statement for bit-copying types in sra_modify_assign (in the
> condition after the big comment), which I suppose is actually the
> correct thing to do.

But modification is too late, no?

> Thanks a lot for the folding patch, I can take over the SRA bits if you
> want to.

For reference below is the full patch.

Bootstrapped on x86_64-unknown-linux-gnu, testing in progress.

Ok for the SRA parts?

Thanks,
Richard.

2017-11-24  Richard Biener  

PR tree-optimization/83141

Re: Simplify ptr - 0

2017-11-24 Thread Richard Biener
On Fri, Nov 24, 2017 at 10:18 AM, Marc Glisse  wrote:
> On Thu, 23 Nov 2017, Richard Biener wrote:
>
>> What about 0 - ptr?  (ok, that's even more weird)
>
>
> * match.pd (0-ptr): New transformation.
>
> Regtested on gcc112.

Ok.

Thanks,
Richard.

> --
> Marc Glisse
> Index: gcc/match.pd
> ===
> --- gcc/match.pd(revision 255118)
> +++ gcc/match.pd(working copy)
> @@ -2449,23 +2449,27 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
>  (minus @0 { tem; })
>
>  /* Convert x+x into x*2.  */
>  (simplify
>   (plus @0 @0)
>   (if (SCALAR_FLOAT_TYPE_P (type))
>(mult @0 { build_real (type, dconst2); })
>(if (INTEGRAL_TYPE_P (type))
> (mult @0 { build_int_cst (type, 2); }
>
> +/* 0 - X  ->  -X.  */
>  (simplify
>   (minus integer_zerop @1)
>   (negate @1))
> +(simplify
> + (pointer_diff integer_zerop @1)
> + (negate (convert @1)))
>
>  /* (ARG0 - ARG1) is the same as (-ARG1 + ARG0).  So check whether
> ARG0 is zero and X + ARG0 reduces to X, since that would mean
> (-ARG1 + ARG0) reduces to -ARG1.  */
>  (simplify
>   (minus real_zerop@0 @1)
>   (if (fold_real_zero_addition_p (type, @0, 0))
>(negate @1)))
>
>  /* Transform x * -1 into -x.  */
>


[PATCH] [pr#83069] Keep profile_count for bb under real_bb_freq_max

2017-11-24 Thread Siddhesh Poyarekar
freq_max < 1, i.e. highest frequency among bbs in the function being
higher than real_bb_freq_max means that the bb ends up with a profile
count larger than real_bb_freq_max and then can go all the way up to
and beyond profile_count::max_count.

Bootstrapped on aarch64, testsuite in progress.

* gcc/predict.c (estimate_bb_frequencies): Don't reset freq_max.
* gcc/testsuite/gcc.dg/pr83069.c: New test case.

---
 gcc/predict.c  |  2 --
 gcc/testsuite/gcc.dg/pr83069.c | 15 +++
 2 files changed, 15 insertions(+), 2 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/pr83069.c

diff --git a/gcc/predict.c b/gcc/predict.c
index 0f34956..ff9b5a9 100644
--- a/gcc/predict.c
+++ b/gcc/predict.c
@@ -3613,8 +3613,6 @@ estimate_bb_frequencies (bool force)
  freq_max = BLOCK_INFO (bb)->frequency;
 
   freq_max = real_bb_freq_max / freq_max;
-  if (freq_max < 16)
-   freq_max = 16;
   profile_count ipa_count = ENTRY_BLOCK_PTR_FOR_FN (cfun)->count.ipa ();
   cfun->cfg->count_max = profile_count::uninitialized ();
   FOR_BB_BETWEEN (bb, ENTRY_BLOCK_PTR_FOR_FN (cfun), NULL, next_bb)
diff --git a/gcc/testsuite/gcc.dg/pr83069.c b/gcc/testsuite/gcc.dg/pr83069.c
new file mode 100644
index 000..d43d78d
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/pr83069.c
@@ -0,0 +1,15 @@
+/* { dg-do compile } */
+/* { dg-options "-O1" } */
+
+void
+foo (unsigned long *res, unsigned long in)
+{
+  for (unsigned long a = 0; a < 98; a++)
+for (unsigned long b = 0; b < 98; b++)
+  for (unsigned long c = 0; c < 98; c++)
+   for (unsigned long d = 0; d < 98; d++)
+ for (unsigned long e = 0; e < 98; e++)
+   for (unsigned long f = 0; f < 98; f++)
+ for (unsigned long g = 0; g < 98; g++)
+   *res += a * in;
+}
-- 
2.7.4



[PATCH] Fix PR82402

2017-11-24 Thread Richard Biener

The following fixes a missing SSA_NAME_OCCURS_IN_ABNORMAL_PHI.

Tested on x86_64-unknown-linux-gnu, applied.

Richard.

2017-11-24  Richard Biener  

PR tree-optimization/82402
* tree-vect-loop-manip.c (create_lcssa_for_virtual_phi): Properly
set SSA_NAME_OCCURS_IN_ABNORMAL_PHI.

* gcc.dg/torture/pr82402.c: New testcase.

Index: gcc/tree-vect-loop-manip.c
===
--- gcc/tree-vect-loop-manip.c  (revision 255137)
+++ gcc/tree-vect-loop-manip.c  (working copy)
@@ -639,6 +639,8 @@ create_lcssa_for_virtual_phi (struct loo
gimple *stmt;
use_operand_p use_p;
 
+   SSA_NAME_OCCURS_IN_ABNORMAL_PHI (new_vop)
+ = SSA_NAME_OCCURS_IN_ABNORMAL_PHI (vop);
add_phi_arg (new_phi, vop, exit_e, UNKNOWN_LOCATION);
gimple_phi_set_result (new_phi, new_vop);
FOR_EACH_IMM_USE_STMT (stmt, imm_iter, vop)
Index: gcc/testsuite/gcc.dg/torture/pr82402.c
===
--- gcc/testsuite/gcc.dg/torture/pr82402.c  (nonexistent)
+++ gcc/testsuite/gcc.dg/torture/pr82402.c  (working copy)
@@ -0,0 +1,20 @@
+/* { dg-do compile } */
+
+typedef int jmp_buf[1];
+
+extern void exit(int) __attribute__((__noreturn__));
+extern int setjmpx(jmp_buf) __attribute__((__returns_twice__));
+
+jmp_buf jbAnagram;
+int a[6];
+int d;
+int b () { exit (1); }
+int c () { b (); }
+int e ()
+{
+  int f = 0;
+  for (; f < 6; f++)
+a[f] = d;
+  c ();
+  setjmpx (jbAnagram);
+}


Re: [RFC, PR 80689] Copy small aggregates element-wise

2017-11-24 Thread Martin Jambor
On Fri, Nov 24 2017, Richard Biener wrote:
> On Fri, Nov 24, 2017 at 12:53 PM, Martin Jambor  wrote:
>> Hi Richi,
>>
>> On Fri, Nov 24 2017, Richard Biener wrote:
>>> On Fri, Nov 24, 2017 at 11:57 AM, Richard Biener
>>>  wrote:
 On Fri, Nov 24, 2017 at 11:31 AM, Richard Biener
>>
>> ..
>>
>>> And yes, I've been worried about SRA as well here...  it _does_
>>> have some early outs when seeing VIEW_CONVERT_EXPR but
>>> appearantly not for the above.  Testcase that aborts with SRA but
>>> not without:
>>>
>>> struct A { short s; long i; long j; };
>>> struct A a, b;
>>> void foo ()
>>> {
>>>   struct A c;
>>>   __builtin_memcpy (&c, &b, sizeof (struct A));
>>>   __builtin_memcpy (&a, &c, sizeof (struct A));
>>> }
>>> int main()
>>> {
>>>   __builtin_memset (&b, 0, sizeof (struct A));
>>>   b.s = 1;
>>>   __builtin_memcpy ((char *)&b+2, &b, 2);
>>>   foo ();
>>>   __builtin_memcpy (&a, (char *)&a+2, 2);
>>>   if (a.s != 1)
>>> __builtin_abort ();
>>>   return 0;
>>> }
>>
>> Thanks for the testcase, I agree that is a fairly big problem.  Do you
>> think that the following (untested) patch is an appropriate way of
>> fixing it and generally of extending gimple to capture that a statement
>> is a bit-copy?
>
> I think the place to fix is the memcpy folding.  That is, we'd say that
> aggregate assignments are not bit-copies but do element-wise assignments.
> For memcpy folding we'd then need to use a type that doesn't contain
> padding.  Which effectively means char[].
>
> Of course we need to stop SRA from decomposing that copy to
> individual characters then ;)
>
> So iff we decide that all aggregate copies are element copies,
> maybe only those where TYPE_MAIN_VARIANT of lhs and rhs match
> (currently we allow TYPE_CANONICAL compatibility and thus there
> might be some mismatches), then we have to fix nothign but
> the memcpy folding.
>
>> If so, I'll add the testcase, bootstrap it and formally propose it.
>> Subsequently I will of course make sure that any element-wise copying
>> patch would test the predicate.
>
> I don't think the alias-set should determine whether a copy is
> bit-wise or not.

 Like the attached.  At least FAILs

 FAIL: gcc.dg/tree-ssa/ssa-ccp-27.c scan-tree-dump-times ccp1
 "memcpy[^\n]*123456" 2 (found 0 times)

 not sure why we have this test.
>>>
>>> Hum.  And SRA still decomposes the copy to struct elements w/o padding
>>> even though the access is done using char[].  So somehow it ignores
>>> VIEW_CONVERT_EXPRs
>>> (well, those implicitely present on MEM_REFs).
>>
>> Yes.  SRA is not even too afraid of top-level V_C_Es.  It really bails
>> out only if they are buried under a handled_component.  And it does not
>> remove aggregate assignments containing them.
>>
>>>
>>> Looks like this is because total scalarization is done on the decls
>>> and not at all
>>> honoring how the variable is accessed?  The following seems to fix
>>> that, otherwise untested.
>>
>>
>>>
>>> Index: gcc/tree-sra.c
>>> ===
>>> --- gcc/tree-sra.c  (revision 255137)
>>> +++ gcc/tree-sra.c  (working copy)
>>> @@ -1338,7 +1338,9 @@ build_accesses_from_assign (gimple *stmt
>>>  {
>>>racc->grp_assignment_read = 1;
>>>if (should_scalarize_away_bitmap && !gimple_has_volatile_ops (stmt)
>>> - && !is_gimple_reg_type (racc->type))
>>> + && !is_gimple_reg_type (racc->type)
>>> + && (TYPE_MAIN_VARIANT (racc->type)
>>> + == TYPE_MAIN_VARIANT (TREE_TYPE (racc->base
>>> bitmap_set_bit (should_scalarize_away_bitmap, DECL_UID 
>>> (racc->base));
>>>if (storage_order_barrier_p (lhs))
>>> racc->grp_unscalarizable_region = 1;
>>
>> I believe that the added condition is not what you want, this seems
>> to trigger also for ordinary:
>>
>>   s1 = s2.field
>>
>> Where racc->type is type of the field but racc->base is s2 and its type
>> is type of the structure.
>
> Yes.  But do we want to totally scalarize s2 in this case?  We only
> access parts of it.  We don't seem to have a testcase that fails
> (well, full testing still in progress).

If we start with

small_s1 = src;
dst = small_s1->even_smaller_struct_field;

and small_s1 is otherwise unused, I think that we want to facilitate
copy propagation with total scalarization too.

>
>> I also think you want to be setting a bit in
>> cannot_scalarize_away_bitmap in order to guarantee that total
>> scalarization will not happen for the given candidate.  Otherwise some
>> other regular assignment might trigger it ...
>
> Yeah, figured that out myself.
>
>> except if we then also
>> checked the statement for bit-copying types in sra_modify_assign (in the
>> condition after the big comment), which I suppos

Re: [RFC, PR 80689] Copy small aggregates element-wise

2017-11-24 Thread Richard Biener
On Fri, Nov 24, 2017 at 2:00 PM, Martin Jambor  wrote:
> On Fri, Nov 24 2017, Richard Biener wrote:
>> On Fri, Nov 24, 2017 at 12:53 PM, Martin Jambor  wrote:
>>> Hi Richi,
>>>
>>> On Fri, Nov 24 2017, Richard Biener wrote:
 On Fri, Nov 24, 2017 at 11:57 AM, Richard Biener
  wrote:
> On Fri, Nov 24, 2017 at 11:31 AM, Richard Biener
>>>
>>> ..
>>>
 And yes, I've been worried about SRA as well here...  it _does_
 have some early outs when seeing VIEW_CONVERT_EXPR but
 appearantly not for the above.  Testcase that aborts with SRA but
 not without:

 struct A { short s; long i; long j; };
 struct A a, b;
 void foo ()
 {
   struct A c;
   __builtin_memcpy (&c, &b, sizeof (struct A));
   __builtin_memcpy (&a, &c, sizeof (struct A));
 }
 int main()
 {
   __builtin_memset (&b, 0, sizeof (struct A));
   b.s = 1;
   __builtin_memcpy ((char *)&b+2, &b, 2);
   foo ();
   __builtin_memcpy (&a, (char *)&a+2, 2);
   if (a.s != 1)
 __builtin_abort ();
   return 0;
 }
>>>
>>> Thanks for the testcase, I agree that is a fairly big problem.  Do you
>>> think that the following (untested) patch is an appropriate way of
>>> fixing it and generally of extending gimple to capture that a statement
>>> is a bit-copy?
>>
>> I think the place to fix is the memcpy folding.  That is, we'd say that
>> aggregate assignments are not bit-copies but do element-wise assignments.
>> For memcpy folding we'd then need to use a type that doesn't contain
>> padding.  Which effectively means char[].
>>
>> Of course we need to stop SRA from decomposing that copy to
>> individual characters then ;)
>>
>> So iff we decide that all aggregate copies are element copies,
>> maybe only those where TYPE_MAIN_VARIANT of lhs and rhs match
>> (currently we allow TYPE_CANONICAL compatibility and thus there
>> might be some mismatches), then we have to fix nothign but
>> the memcpy folding.
>>
>>> If so, I'll add the testcase, bootstrap it and formally propose it.
>>> Subsequently I will of course make sure that any element-wise copying
>>> patch would test the predicate.
>>
>> I don't think the alias-set should determine whether a copy is
>> bit-wise or not.
>
> Like the attached.  At least FAILs
>
> FAIL: gcc.dg/tree-ssa/ssa-ccp-27.c scan-tree-dump-times ccp1
> "memcpy[^\n]*123456" 2 (found 0 times)
>
> not sure why we have this test.

 Hum.  And SRA still decomposes the copy to struct elements w/o padding
 even though the access is done using char[].  So somehow it ignores
 VIEW_CONVERT_EXPRs
 (well, those implicitely present on MEM_REFs).
>>>
>>> Yes.  SRA is not even too afraid of top-level V_C_Es.  It really bails
>>> out only if they are buried under a handled_component.  And it does not
>>> remove aggregate assignments containing them.
>>>

 Looks like this is because total scalarization is done on the decls
 and not at all
 honoring how the variable is accessed?  The following seems to fix
 that, otherwise untested.
>>>
>>>

 Index: gcc/tree-sra.c
 ===
 --- gcc/tree-sra.c  (revision 255137)
 +++ gcc/tree-sra.c  (working copy)
 @@ -1338,7 +1338,9 @@ build_accesses_from_assign (gimple *stmt
  {
racc->grp_assignment_read = 1;
if (should_scalarize_away_bitmap && !gimple_has_volatile_ops (stmt)
 - && !is_gimple_reg_type (racc->type))
 + && !is_gimple_reg_type (racc->type)
 + && (TYPE_MAIN_VARIANT (racc->type)
 + == TYPE_MAIN_VARIANT (TREE_TYPE (racc->base
 bitmap_set_bit (should_scalarize_away_bitmap, DECL_UID 
 (racc->base));
if (storage_order_barrier_p (lhs))
 racc->grp_unscalarizable_region = 1;
>>>
>>> I believe that the added condition is not what you want, this seems
>>> to trigger also for ordinary:
>>>
>>>   s1 = s2.field
>>>
>>> Where racc->type is type of the field but racc->base is s2 and its type
>>> is type of the structure.
>>
>> Yes.  But do we want to totally scalarize s2 in this case?  We only
>> access parts of it.  We don't seem to have a testcase that fails
>> (well, full testing still in progress).
>
> If we start with
>
> small_s1 = src;
> dst = small_s1->even_smaller_struct_field;
>
> and small_s1 is otherwise unused, I think that we want to facilitate
> copy propagation with total scalarization too.
>
>>
>>> I also think you want to be setting a bit in
>>> cannot_scalarize_away_bitmap in order to guarantee that total
>>> scalarization will not happen for the given candidate.  Otherwise some
>>> other regular assignment might trigger it ...
>

Re: RFA (libstdc++): PATCH to implement C++17 over-aligned new

2017-11-24 Thread Marc Glisse

Hello,

@@ -4247,9 +4248,20 @@ build_operator_new_call (tree fnname, vec 
**args,
  we disregard block-scope declarations of "operator new".  */
   fns = lookup_function_nonclass (fnname, *args, /*block_p=*/false);

+  if (align_arg)
+{
+  vec* align_args
+   = vec_copy_and_insert (*args, align_arg, 1);
+  cand = perform_overload_resolution (fns, align_args, &candidates,
+ &any_viable_p, tf_none);
+  /* If no aligned allocation function matches, try again without the
+alignment.  */
+}
+

Code further in the function expects to be able to adjust args, which is 
defeated by copying them in align_args, see PR 82760.


--
Marc Glisse


Re: [C++ PATCH] Fix -fsanitize={null,alignment} of references (PR c++/79572)

2017-11-24 Thread Maxim Kuvyrkov
On Thu, Mar 23, 2017 at 11:37 PM, Jakub Jelinek  wrote:
> Hi!
>
> Since late C++ folding has been committed, we don't sanitize some reference
> bindings to NULL.  Earlier we had always NOP_EXPR to REFERENCE_TYPE say from
> INTEGER_CST or whatever else, but cp_fold can now turn that right into
> INTEGER_CST with REFERENCE_TYPE.  The following patch sanitizes even those.
>
> Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?
>
> 2017-03-23  Jakub Jelinek  
>
> PR c++/79572
> * c-ubsan.h (ubsan_maybe_instrument_reference): Change argument to
> tree *.
> * c-ubsan.c (ubsan_maybe_instrument_reference): Likewise.  Handle
> not just NOP_EXPR to REFERENCE_TYPE, but also INTEGER_CST with
> REFERENCE_TYPE.
>
> * cp-gimplify.c (cp_genericize_r): Sanitize INTEGER_CSTs with
> REFERENCE_TYPE.  Adjust ubsan_maybe_instrument_reference caller
> for NOP_EXPR to REFERENCE_TYPE.
>
> * g++.dg/ubsan/null-8.C: New test.
>
...
> --- gcc/testsuite/g++.dg/ubsan/null-8.C.jj  2017-03-23 09:42:31.664696676 
> +0100
> +++ gcc/testsuite/g++.dg/ubsan/null-8.C 2017-03-23 09:43:31.501908802 +0100
> @@ -0,0 +1,19 @@
> +// PR c++/79572
> +// { dg-do run }
> +// { dg-options "-fsanitize=null -std=c++14" }
> +// { dg-output "reference binding to null pointer of type 'const int'" }
> +
> +void
> +foo (const int &iref)
> +{
> +  if (&iref)
> +__builtin_printf ("iref %d\n", iref);
> +  else
> +__builtin_printf ("iref is NULL\n");

Hi Jakub,

Using __builtin_printf causes this test to fail sporadically when
cross-testing.  Stdout and stderr output can get mixed in
cross-testing, so dejagnu might see
==
g++.dg/ubsan/null-8.C:18:7: runtime error: reference binding to null
pointer of type iref is NULL
'const int'
==
instead of
==
g++.dg/ubsan/null-8.C:18:7: runtime error: reference binding to null
pointer of type 'const int'
iref is NULL
==

Is it essential for this testcase to use __builtin_printf or simple
"fprintf (stderr, ...)" would do just fine?

> +}
> +
> +int
> +main ()
> +{
> +  foo (*((int*) __null));
> +}

Regards,

-- 
Maxim Kuvyrkov


Re: [C++ PATCH] Fix -fsanitize={null,alignment} of references (PR c++/79572)

2017-11-24 Thread Jakub Jelinek
On Fri, Nov 24, 2017 at 05:16:27PM +0300, Maxim Kuvyrkov wrote:
> Using __builtin_printf causes this test to fail sporadically when
> cross-testing.  Stdout and stderr output can get mixed in
> cross-testing, so dejagnu might see
> ==
> g++.dg/ubsan/null-8.C:18:7: runtime error: reference binding to null
> pointer of type iref is NULL
> 'const int'
> ==
> instead of
> ==
> g++.dg/ubsan/null-8.C:18:7: runtime error: reference binding to null
> pointer of type 'const int'
> iref is NULL
> ==
> 
> Is it essential for this testcase to use __builtin_printf or simple
> "fprintf (stderr, ...)" would do just fine?

That would mean bringing in stdio.h, which is very much undesirable.

If you want, just revert the patch, verify the testcase FAILs,
and then tweak it to say:
__attribute__((noinline, noclone))
void
bar (const *x, int y)
{
  asm volatile ("" : : "g" (x), "g" (y) : "memory");
}

and change __builtin_printf ("iref %d\n", iref);
to bar ("iref %d\n", iref);
and __builtin_printf ("iref is NULL\n");
to bar ("iref is NULL\n", 0);
If the test still FAILs and is fixed after you reapply the patch,
the change is preapproved.

Jakub


[PATCH, i386] Fix registers type for MODE_TI

2017-11-24 Thread Shalnov, Sergey
Hi,
I found wrong ymm registers are generated in case of 
"-march=skylake-avx512 -mprefer-vector-width=128" options set

The code looks like:
movq%r11, 64(%rbx) 
vpxord  %ymm0, %ymm0, %ymm0 
vmovdqa64   %xmm0, 32(%rbx)
movq%r11, 15584(%rbx)

where MODE_TI used in standard_sse_constant_opcode();

This patch fix that to "xmm" registers generation because we don't have to touch
upper part of registers in case of 128bit type.

2017-11-24  Sergey Shalnov  
gcc/
* config/i386/i386.c (standard_sse_constant_opcode): Fix
registers type for 128bit mode.



0007-Fix-registers-type-for-MODE_TI.patch
Description: 0007-Fix-registers-type-for-MODE_TI.patch


[PATCH][ARM] Fix wrong code by arm_final_prescan with fp16 move instructions

2017-11-24 Thread Sudakshina Das

Hi

For the following test case:
__fp16
test_select (__fp16 a, __fp16 b, __fp16 c)
{
  return (a < b) ? b : c;
}

when compiled with -mfpu=fp-armv8 -march=armv8.2-a+fp16 -marm 
-mfloat-abi=hard trunk generates wrong code:


test_select:
@ args = 0, pretend = 0, frame = 0
@ frame_needed = 0, uses_anonymous_args = 0
@ link register save eliminated.
vcvtb.f32.f16   s0, s0
vcvtb.f32.f16   s15, s1
vcmpe.f32   s0, s15
vmrsAPSR_nzcv, FPSCR
// <-- No conditional branch!
vmovs1, s2  @ __fp16
.L2:
vmovs0, s1  @ __fp16
bx  lr

There should have been a conditional branch there to skip one of the VMOVs.
This patch fixes this problem by making *movhf_vfp_fp16 unconditional
wherever needed.

Testing done: Add a new test case and checked for regressions 
arm-none-linux-gnueabihf.


Is this ok for trunk?

Sudi

ChangeLog entry are as follow:

*** gcc/ChangeLog ***

2017-11-24  Sudakshina Das  

* config/arm/vfp.md (*movhf_vfp_fp16): Add conds attribute.

*** gcc/testsuite/ChangeLog ***

2017-11-24  Sudakshina Das  

* gcc.target/arm/armv8_2-fp16-move-2.c: New test.
diff --git a/gcc/config/arm/vfp.md b/gcc/config/arm/vfp.md
index 075a938..61b6477 100644
--- a/gcc/config/arm/vfp.md
+++ b/gcc/config/arm/vfp.md
@@ -410,7 +410,10 @@
   gcc_unreachable ();
 }
  }
-  [(set_attr "predicable" "yes, yes, no, yes, no, no, no, no, no, no")
+  [(set_attr "conds" "*, *, unconditional, *, unconditional, unconditional,\
+		  unconditional, unconditional, unconditional,\
+		  unconditional")
+   (set_attr "predicable" "yes, yes, no, yes, no, no, no, no, no, no")
(set_attr "predicable_short_it" "no, no, no, yes,\
 no, no, no, no,\
 no, no")
diff --git a/gcc/testsuite/gcc.target/arm/armv8_2-fp16-move-2.c b/gcc/testsuite/gcc.target/arm/armv8_2-fp16-move-2.c
new file mode 100644
index 000..fcb857f
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/armv8_2-fp16-move-2.c
@@ -0,0 +1,11 @@
+/* { dg-do compile }  */
+/* { dg-require-effective-target arm_v8_2a_fp16_scalar_ok }  */
+/* { dg-options "-O2 -marm" }  */
+/* { dg-add-options arm_v8_2a_fp16_scalar }  */
+
+__fp16
+test_select (__fp16 a, __fp16 b, __fp16 c)
+{
+  return (a < b) ? b : c;
+}
+/* { dg-final { scan-assembler "bmi" } } */


[RFC] Generate sub-tests from libgomp.c/for-{3,5,6}.c

2017-11-24 Thread Tom de Vries

[ add missing cc gcc-patches ]

On 11/24/2017 04:31 PM, Tom de Vries wrote:

Hi,

The libgomp offloading tests libgomp.c/for-{3,5,6}.c are monolithic 
tests which test lots of subtests, and if any of them fail, it would be 
good to be able to quickly known which subtest(s) fail. [ And if you fix 
or break one of the subtests during accel target bringup, it would be 
nice if that shows up in the PASS/FAIL status. ]


I've written a patch to generate standalone subtests from the original 
ones, and add them to the test list in c.exp. This adds a total of 84 
tests.


I've scaled the timeout for each subtest with factor 0.1. If all new 
subtests time out, and we have a standard timeout of 300s, this adds 7 
hrs testing time). But AFAIK, it's an open issue that dg-timeout and 
dg-timeout-factor don't work on remote setups, so that problem may still 
present itself.


I'm not sure if it makes sense to enable this by default on a normal 
host-only setup, but perhaps this approach is acceptable if we require 
effective target offload_device for the sub-test testcases?


If not, I think it would already be good to have this in the sources 
disabled by default, and allow it to be enabled by editing a variable in 
c.exp or some such.


WDYT?

Tested on x86_64 without offloading device (compile+run times listed in 
patch).


Tested on x86_64 with gcn device.

TODO:
- generate only one copy of the subtests, to be used by each parallel
   invocation.
- improve tcl, f.i. extracting the DO_TEST arguments could be done
   better.

Thanks,
- Tom

0001-Generate-sub-tests-from-libgomp.c-for-3-5-6-.c.patch


Generate sub-tests from libgomp.c/for-{3,5,6}.c

With compiler build at -O0 --enable-checking=yes,rtl:
for-3.c: 33s
for-3-*.c: 2m31s
for-5.c: 59s
for-5-*.c: 5m5s
for-6.c: 43s
for-6-*.c: 3m39s

With compiler build at -O2 --enable-checking=release:
for-3.c: 10s
for-3-*.c: 22s
for-5.c: 8s
for-5-*.c: 42s
for-6.c: 6s
for-6-*.c: 29s

2017-11-24  Tom de Vries  

* testsuite/libgomp.c/c.exp: Generate tests from .list files, and add
them to test to be run.
(generate_tests): New proc.
* testsuite/libgomp.c/for-3.list: New test list, factored out of ...
* testsuite/libgomp.c/for-3.c (main): ... here.  Use for-3.list to
iterate over tests, or just run ONE_TEST.
(TEST_ALL): Define.
* testsuite/libgomp.c/for-5.list: New test list, factored out of ...
* testsuite/libgomp.c/for-5.c (main): ... here.  Use for-5.list to
iterate over tests, or just run ONE_TEST.
(TEST_ALL): Define.
* testsuite/libgomp.c/for-6.list: New test list, factored out of ...
* testsuite/libgomp.c/for-6.c (main): ... here.  Use for-6.list to
iterate over tests, or just run ONE_TEST.
(TEST_ALL): Define.

---
  libgomp/testsuite/libgomp.c/c.exp  | 74 ++-
  libgomp/testsuite/libgomp.c/for-3.c| 65 +---
  libgomp/testsuite/libgomp.c/for-3.list | 24 +++
  libgomp/testsuite/libgomp.c/for-5.c| 79 ++
  libgomp/testsuite/libgomp.c/for-5.list | 35 +++
  libgomp/testsuite/libgomp.c/for-6.c| 62 +++---
  libgomp/testsuite/libgomp.c/for-6.list | 24 +++
  7 files changed, 275 insertions(+), 88 deletions(-)

diff --git a/libgomp/testsuite/libgomp.c/c.exp 
b/libgomp/testsuite/libgomp.c/c.exp
index 31bdd57..791788b 100644
--- a/libgomp/testsuite/libgomp.c/c.exp
+++ b/libgomp/testsuite/libgomp.c/c.exp
@@ -23,10 +23,82 @@ dg-init
  # Turn on OpenMP.
  lappend ALWAYS_CFLAGS "additional_flags=-fopenmp"
  
+# Generate new tests for each DO_TEST entry in TEST_LIST.

+proc generate_tests { test_list } {
+global srcdir
+global subdir
+
+# Get corresponding source file.
+set base_file [regsub "\.list" $test_list ""]
+set base_file [regsub "$srcdir/$subdir/" $base_file ""]
+set c_file $base_file.c
+
+# Get dg directives from c file.
+set dg_directives ""
+set fp [open "$srcdir/$subdir/$c_file" r]
+while {[gets $fp line] >= 0} {
+   if {[regexp -line -- "^/\\* \{ dg-" $line]} {
+   if { "$dg_directives" == "" } {
+   set sep ""
+   } else {
+   set sep "\n"
+   }
+   set dg_directives "$dg_directives$sep$line"
+   }
+}
+close $fp
+
+# Get list of tests.
+set fp [open "$test_list" r]
+set file_data [read $fp]
+close $fp
+set file_data [regsub -all "DO_TEST" $file_data ""]
+set file_data [regsub -all "\\(" $file_data ""]
+set file_data [regsub -all "\\)" $file_data ""]
+set file_data [regsub -all \[\n\] $file_data ""]
+set file_data [string trimleft $file_data " "]
+set tests [split $file_data]
+
+# Create directory to generate files.
+set test_dir [pwd]
+set generated_dir $test_dir/generated/libgomp.c
+file mkdir $generated_dir
+
+# Generate tests.
+set new_fil

[PATCH, i386] Fix wrong instruction vpcmpeqd generation

2017-11-24 Thread Shalnov, Sergey
Hi,
I found wrong vpcmpeqd instruction form generated in case of
"-march=skylake-avx512 -mprefer-vector-width=128" options set

The compiler emits following error at compile stage:
Error: invalid register operand for `vpcmpeqd'

Because following was generated:
vpcmpeqd %xmm16, %xmm16, %xmm16

In this form the instruction can use zmm0-xmm15 registers only.


2017-11-24  Sergey Shalnov  
gcc/
* config/i386/i386.c (standard_sse_constant_opcode):
Fix wrong form for vpcmpeqd instruction



0008-Wrong-instruction-vpcmpeqd-generation.patch
Description: 0008-Wrong-instruction-vpcmpeqd-generation.patch


Re: [0/4] [AArch64] Add SVE support

2017-11-24 Thread Richard Sandiford
Richard Sandiford  writes:
> This series adds support for ARM's Scalable Vector Extension.
> More details on SVE can be found here:
>
>   
> https://developer.arm.com/products/architecture/a-profile/docs/arm-architecture-reference-manual-supplement-armv8-a
>
> There are four parts for ease of review, but it probably makes
> sense to commit them as one patch.
>
> The series plugs SVE into the current vectorisation framework without
> adding any new features to the framework itself.  This means for example
> that vector loops still handle full vectors, with a scalar epilogue loop
> being needed for the rest.  Later patches add support for other features
> like fully-predicated loops.
>
> The patches build on top of the various series that I've already posted.
> Sorry that there were so many, and thanks again for all the reviews.
>
> Tested on aarch64-linux-gnu without SVE and aarch64-linux-gnu with SVE
> (in the default vector-length agnostic mode).  Also tested with
> -msve-vector-bits=256 and -msve-vector-bits=512 to select 256-bit
> and 512-bit SVE registers.

Here's an update based on an off-list discussion with the maintainers.
Changes since v1:

- Changed the names of the modes from 256-bit vectors to "VNx"
  + a 128-bit mode name, e.g. V32QI -> VNx16QI.

- Added an "sve" attribute and used it in the "enabled" attribute.
  This allows generic aarch64.md patterns to disable things related
  to SVE on non-SVE targets; previously this was implicit through the
  constraints.

- Improved the consistency of the constraint names, specifically:

  Ua?: addition contraints (already used for Uaa)
  Us?: general scalar constraints (already used for various other scalars)
  Ut?: memory constraints (unchanged from v1)
  vs?: vector SVE constraints (mostly unchanged, but now includes FP
   as well as integer constraints)

  There's still the general "Dm" (minus one) constraint, for consistency
  with "Dz" (zero).

- Added missing register descriptions above FIXED_REGISTERS.

- "should"/"is expected to" -> "must".

- Added more commentary to things like regmode_natural_size.

I also did a before and after comparison of the testsuite output
for base AArch64 (but using the new FIRST_PSEUDO_REGISTER definition
to avoid changes to hash values).  There were no differences.

Thanks,
Richard


2017-11-24  Richard Sandiford  
Alan Hayward  
David Sherwood  

gcc/
* doc/invoke.texi (-msve-vector-bits=): Document new option.
(sve): Document new AArch64 extension.
* doc/md.texi (w): Extend the description of the AArch64
constraint to include SVE vectors.
(Upl, Upa): Document new AArch64 predicate constraints.
* config/aarch64/aarch64-opts.h (aarch64_sve_vector_bits_enum): New
enum.
* config/aarch64/aarch64.opt (sve_vector_bits): New enum.
(msve-vector-bits=): New option.
* config/aarch64/aarch64-option-extensions.def (fp, simd): Disable
SVE when these are disabled.
(sve): New extension.
* config/aarch64/aarch64-modes.def: Define SVE vector and predicate
modes.  Adjust their number of units based on aarch64_sve_vg.
(MAX_BITSIZE_MODE_ANY_MODE): Define.
* config/aarch64/aarch64-protos.h (ADDR_QUERY_ANY): New
aarch64_addr_query_type.
(aarch64_const_vec_all_same_in_range_p, aarch64_sve_pred_mode)
(aarch64_sve_cnt_immediate_p, aarch64_sve_addvl_addpl_immediate_p)
(aarch64_sve_inc_dec_immediate_p, aarch64_add_offset_temporaries)
(aarch64_split_add_offset, aarch64_output_sve_cnt_immediate)
(aarch64_output_sve_addvl_addpl, aarch64_output_sve_inc_dec_immediate)
(aarch64_output_sve_mov_immediate, aarch64_output_ptrue): Declare.
(aarch64_simd_imm_zero_p): Delete.
(aarch64_check_zero_based_sve_index_immediate): Declare.
(aarch64_sve_index_immediate_p, aarch64_sve_arith_immediate_p)
(aarch64_sve_bitmask_immediate_p, aarch64_sve_dup_immediate_p)
(aarch64_sve_cmp_immediate_p, aarch64_sve_float_arith_immediate_p)
(aarch64_sve_float_mul_immediate_p): Likewise.
(aarch64_classify_symbol): Take the offset as a HOST_WIDE_INT
rather than an rtx.
(aarch64_sve_ld1r_operand_p, aarch64_sve_ldr_operand_p): Declare.
(aarch64_expand_mov_immediate): Take a gen_vec_duplicate callback.
(aarch64_emit_sve_pred_move, aarch64_expand_sve_mem_move): Declare.
(aarch64_expand_sve_vec_cmp_int, aarch64_expand_sve_vec_cmp_float)
(aarch64_expand_sve_vcond, aarch64_expand_sve_vec_perm): Declare.
(aarch64_regmode_natural_size): Likewise.
* config/aarch64/aarch64.h (AARCH64_FL_SVE): New macro.
(AARCH64_FL_V8_3, AARCH64_FL_RCPC, AARCH64_FL_DOTPROD): Shift
left one place.
(AARCH64_ISA_SVE, TARGET_SVE): New macros.
(FIXED_REGISTERS, CALL_USED_REGISTERS, REGISTER_NAMES): Add entries
for VG and the SVE predicate re

Re: [PATCH][GCC][ARM] Dot Product NEON intrinsics [Patch (3/8)]

2017-11-24 Thread Christophe Lyon
On 24 November 2017 at 11:31, Tamar Christina  wrote:
>> >
>> Not sure if Kyrill actually meant to comment about the three lines above, but
>> they have a bug:
>> #if should be before #pragma GCC push_options.
>>
>> Indeed, after this patch was committed (r255064), I've noticed many
>> regressions, for instance
>> p64_p128 is now unsupported. This is because the arm_crypto_ok effective
>> target now fails with this message:
>> XXX/arm_neon.h:16911:1: error: inlining failed in call to always_inline
>> 'vaeseq_u8': target specific option mismatch
>>
>> Not sure why this wasn't noticed in validations earlier?
>
> I still have the log files for these runs:
>
> It seems that I was comparing the log files instead of the sum files, which 
> do not show this difference.
>
> /d/t/g/s/gcc (dot-product-arm ↩☡=) contrib/dg-cmp-results.sh -v -v "" 
> ../../build-arm-none-eabi/results.clean/vanilla/gcc.log 
> ../../build-arm-none-eabi/results.dotprod/vanilla/gcc.log  | grep p64_p128
> /d/t/g/s/gcc (dot-product-arm ↩☡=) contrib/dg-cmp-results.sh -v -v "" 
> ../../build-arm-none-eabi/results.clean/vanilla/gcc.sum 
> ../../build-arm-none-eabi/results.dotprod/vanilla/gcc.sum  | grep p64_p128
> NA->UNSUPPORTED: gcc.target/aarch64/advsimd-intrinsics/p64_p128.c   -O0
> PASS->NA: gcc.target/aarch64/advsimd-intrinsics/p64_p128.c   -O0  execution 
> test
> PASS->NA: gcc.target/aarch64/advsimd-intrinsics/p64_p128.c   -O0  (test for 
> excess errors)
> NA->UNSUPPORTED: gcc.target/aarch64/advsimd-intrinsics/p64_p128.c   -O1
> PASS->NA: gcc.target/aarch64/advsimd-intrinsics/p64_p128.c   -O1  execution 
> test
> PASS->NA: gcc.target/aarch64/advsimd-intrinsics/p64_p128.c   -O1  (test for 
> excess errors)
> NA->UNSUPPORTED: gcc.target/aarch64/advsimd-intrinsics/p64_p128.c   -O2
>
> Sorry for missing this, I don't even know why these scripts accept the log 
> files if they're always going to do the wrong thing.
>
> Anyway thanks for fixing this and I'll make sure I'm using the sum files in 
> the future.
>

Thanks for checking why you missed it.

That being said, I think there are a few more problems with your
patch, but there is a lot of "noise" in the reports.

After your commit, I have these reports:
http://people.linaro.org/~christophe.lyon/cross-validation/gcc/trunk/255064/report-build-info.html

After my commit, I have these reports:
http://people.linaro.org/~christophe.lyon/cross-validation/gcc/trunk/255126/report-build-info.html

I haven't fully checked that my patch fixes all the regressions
reported at r255064, but I don't see
why my patch would introduce regressions So I think your patch is
causing problems:
* on armeb --with-fpu=neon-fp16: (the 2 "REGRESSED" entries):
gcc.target/arm/attr-neon3.c scan-assembler-times vld1 1 (found 2 times)
gcc.target/arm/neon-vfma-1.c scan-assembler vfma\\.f32[\t]+[dDqQ]
gcc.target/arm/neon-vfms-1.c scan-assembler vfms\\.f32[\t]+[dDqQ]

* on arm-none-linux-gnueabihf --with-cpu cortex-a5 --with-fpu vfpv3-d16-fp16
and armeb-none-linux-gnueabihf --with-cpu cortex-a9 --with-fpu vfpv3-d16-fp16
(the 2 "BIG-REGR" entries)

where a few tests fail:
(arm-none-linux-gnueabihf cortex-a5 vfpv3-d16-fp16):
gcc.dg/vect/pr65947-14.c -flto -ffat-lto-objects execution test
gcc.dg/vect/pr65947-14.c execution test

(armeb-none-linux-gnueabihf cortex-a9 vfpv3-d16-fp16):
  Executed from: gcc.dg/vect/vect.exp
gcc.dg/vect/pr51074.c -flto -ffat-lto-objects execution test
gcc.dg/vect/pr51074.c execution test
gcc.dg/vect/pr64252.c -flto -ffat-lto-objects execution test
gcc.dg/vect/pr65947-14.c -flto -ffat-lto-objects execution test
gcc.dg/vect/pr65947-14.c execution test
gcc.dg/vect/vect-cond-4.c -flto -ffat-lto-objects execution test
gcc.dg/vect/vect-nb-iter-ub-2.c execution test
gcc.dg/vect/vect-nb-iter-ub-3.c -flto -ffat-lto-objects execution test
gcc.dg/vect/vect-nb-iter-ub-3.c execution test
gcc.dg/vect/vect-strided-shift-1.c -flto -ffat-lto-objects execution test
gcc.dg/vect/vect-strided-shift-1.c execution test
gcc.dg/vect/vect-strided-u16-i3.c -flto -ffat-lto-objects execution test
gcc.dg/vect/vect-strided-u16-i3.c execution test
  Executed from: gcc.target/arm/arm.exp
gcc.target/arm/attr-neon3.c scan-assembler-times vld1 1 (found 2 times)
gcc.target/arm/neon-vfma-1.c scan-assembler vfma\\.f32[\t]+[dDqQ]
gcc.target/arm/neon-vfms-1.c scan-assembler vfms\\.f32[\t]+[dDqQ]
gcc.target/arm/neon-vmla-1.c scan-assembler vmla\\.i32
gcc.target/arm/neon-vmls-1.c scan-assembler vmls\\.i32
gcc.target/arm/vect-copysignf.c scan-tree-dump-times vect
"vectorized 1 loops" 1 (found 0 times)

I haven't checked whether this tests were already failing before your
patch, and are just reported as new failures because they failed to
compile in the mean time.

Not sure I am clear :-)

Sorry for the delay and potentially hard to parse reports, I'm
struggling with infrastructure problems.

Thanks,

Christophe

> Tamar
>
>>
>> Fixed as obviou

[PR 81248] Fix ipa-sra size check

2017-11-24 Thread Martin Jambor
Hi,

PR 81248 is a missed-optimization bug when SRA refuses to replace a
pointer to a one-field structure to just passing the field by value.
The problem is a bogus check which compares the size of the replacement
with the size of the aggregate, even when it is passed by reference,
which was not the intention.

The check was also performed in two places.  This patch moves it only to
one and changes it as it was intended.  In the process I changed the
meaning of PARAM_IPA_SRA_PTR_GROWTH_FACTOR a bit and now it also limits
the number of new parameters as well as their total size, so that (by
default) we never create more than two replacements for an aggregate
passed by reference (because without it I have seen replacements by four
32-bit floats, for example).

I had to disable IPA-SRA for two testcases.  The SSA-PRE is testing
hoisting of loads that are is now done by IPA-SRA.  ipcp-cstagg-2.c now
unfortunately exhibits PR 80735.  But the situation is worse than
without the patch only for structures containing nothing but a function
pointer which I hope is not an interesting case.  I am still in the
process of rewriting IPA-SRA to a real IPA pass, now hope to have it
ready in early stage 1, and that should fix this issue, among others.

The patch has passed bootstrap and testing on x86_64-linux, OK for
trunk?

Thanks,

Martin


2017-11-23  Martin Jambor  

PR tree-optimization/81248
* tree-sra.c (splice_param_accesses): Remove size check.
(decide_one_param_reduction): Fix size check.
* gimple-pretty-print.c (dump_profile): Silence warning.
* params.def (PARAM_IPA_SRA_PTR_GROWTH_FACTOR): Adjust description.

testsuite/
* g++.dg/ipa/pr81248.C: New test.
* gcc.dg/tree-ssa/ssa-pre-31.c: Disable IPA-SRA.
* gcc/testsuite/gcc.dg/ipa/ipcp-cstagg-2.c: Likewise.
---
 gcc/gimple-pretty-print.c  |  2 +-
 gcc/params.def |  4 +--
 gcc/testsuite/g++.dg/ipa/pr81248.C | 40 ++
 gcc/testsuite/gcc.dg/ipa/ipcp-cstagg-2.c   |  2 +-
 gcc/testsuite/gcc.dg/tree-ssa/ssa-pre-31.c |  2 +-
 gcc/tree-sra.c | 55 +-
 6 files changed, 69 insertions(+), 36 deletions(-)
 create mode 100644 gcc/testsuite/g++.dg/ipa/pr81248.C

diff --git a/gcc/gimple-pretty-print.c b/gcc/gimple-pretty-print.c
index 55c623e37bb..8bcc4e31bfb 100644
--- a/gcc/gimple-pretty-print.c
+++ b/gcc/gimple-pretty-print.c
@@ -84,7 +84,7 @@ debug_gimple_stmt (gimple *gs)
 static const char *
 dump_profile (profile_count &count)
 {
-  char *buf;
+  char *buf = NULL;
   if (!count.initialized_p ())
 return "";
   if (count.ipa_p ())
diff --git a/gcc/params.def b/gcc/params.def
index 89915d4fc7f..93bd2cf75fe 100644
--- a/gcc/params.def
+++ b/gcc/params.def
@@ -971,8 +971,8 @@ DEFPARAM (PARAM_MIN_NONDEBUG_INSN_UID,
 
 DEFPARAM (PARAM_IPA_SRA_PTR_GROWTH_FACTOR,
  "ipa-sra-ptr-growth-factor",
- "Maximum allowed growth of size of new parameters ipa-sra replaces "
- "a pointer to an aggregate with.",
+ "Maximum allowed growth of number and total size of new parameters "
+ "that ipa-sra replaces a pointer to an aggregate with.",
  2, 0, 0)
 
 DEFPARAM (PARAM_TM_MAX_AGGREGATE_SIZE,
diff --git a/gcc/testsuite/g++.dg/ipa/pr81248.C 
b/gcc/testsuite/g++.dg/ipa/pr81248.C
new file mode 100644
index 000..d55d2e751e8
--- /dev/null
+++ b/gcc/testsuite/g++.dg/ipa/pr81248.C
@@ -0,0 +1,40 @@
+//  { dg-do compile }
+// { dg-options "-O2 -std=c++17 -fdump-tree-eipa_sra" }
+
+
+#include 
+
+typedef unsigned char __uint8_t;
+typedef __uint8_t uint8_t;
+
+
+struct A {
+A() = default;
+A(const A& o) = default;
+A(const volatile A& o) : m1(o.m1) {}
+uint8_t m1{0};
+};
+
+volatile uint8_t v;
+
+template
+void f(const T& x) __attribute__((noinline));
+template
+void f(const T& x) {
+if constexpr(std::is_same, A>::value) {
+v = x.m1;
+}
+else {
+v = x;
+}
+}
+
+uint8_t n1;
+A n2;
+
+int main() {
+f(n1);
+f(n2);
+}
+
+// { dg-final { scan-tree-dump-times "Adjusting call" 2 "eipa_sra" } }
diff --git a/gcc/testsuite/gcc.dg/ipa/ipcp-cstagg-2.c 
b/gcc/testsuite/gcc.dg/ipa/ipcp-cstagg-2.c
index f82014024d4..c1b6f0f73a3 100644
--- a/gcc/testsuite/gcc.dg/ipa/ipcp-cstagg-2.c
+++ b/gcc/testsuite/gcc.dg/ipa/ipcp-cstagg-2.c
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-options "-O3 -fdump-ipa-cp-details" } */
+/* { dg-options "-O3 -fdump-ipa-cp-details -fno-ipa-sra" } */
 
 typedef struct S
 {
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/ssa-pre-31.c 
b/gcc/testsuite/gcc.dg/tree-ssa/ssa-pre-31.c
index 0dface557be..6a33b942ad5 100644
--- a/gcc/testsuite/gcc.dg/tree-ssa/ssa-pre-31.c
+++ b/gcc/testsuite/gcc.dg/tree-ssa/ssa-pre-31.c
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-options "-O2 -fdump-tree-pre" } */
+/* { dg-options "-O2 -fdump-tree-pre -fno-ipa-sra" } */
 
 typedef struct {
 unsigned int 

[PATCH] Fix typo in unordered_map deduction guide, missing "typename ="

2017-11-24 Thread Stephan Bergmann

Otherwise, at least recent Clang trunk with -std=gnu++17 complains


In file included from 
lib/gcc/x86_64-pc-linux-gnu/8.0.0/../../../../include/c++/8.0.0/functional:60:
In file included from 
lib/gcc/x86_64-pc-linux-gnu/8.0.0/../../../../include/c++/8.0.0/unordered_map:47:
lib/gcc/x86_64-pc-linux-gnu/8.0.0/../../../../include/c++/8.0.0/bits/unordered_map.h:1197:5:
 error: deduction guide template contains a template parameter that cannot be 
deduced
unordered_map(initializer_list>, _Allocator)
^
lib/gcc/x86_64-pc-linux-gnu/8.0.0/../../../../include/c++/8.0.0/bits/unordered_map.h:1196:34:
 note: non-deducible template parameter (anonymous)
   _RequireAllocator<_Allocator>>
^
1 error generated.


>From 0ecb03c59b9318b3f47e49daa2f45dd513035d55 Mon Sep 17 00:00:00 2001
From: Stephan Bergmann 
Date: Fri, 24 Nov 2017 17:55:38 +0100
Subject: [PATCH] Fix typo in unordered_map deduction guide, missing "typename
 ="

...as correctly appears in all the other deduction guides in these files.
---
 libstdc++-v3/include/bits/unordered_map.h | 2 +-
 libstdc++-v3/include/debug/unordered_map  | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/libstdc++-v3/include/bits/unordered_map.h b/libstdc++-v3/include/bits/unordered_map.h
index cb5bcb89a16..27c89f172e1 100644
--- a/libstdc++-v3/include/bits/unordered_map.h
+++ b/libstdc++-v3/include/bits/unordered_map.h
@@ -1193,7 +1193,7 @@ _GLIBCXX_BEGIN_NAMESPACE_CONTAINER
 -> unordered_map<_Key, _Tp, hash<_Key>, equal_to<_Key>, _Allocator>;
 
   template>
+	   typename = _RequireAllocator<_Allocator>>
 unordered_map(initializer_list>, _Allocator)
 -> unordered_map<_Key, _Tp, hash<_Key>, equal_to<_Key>, _Allocator>;
 
diff --git a/libstdc++-v3/include/debug/unordered_map b/libstdc++-v3/include/debug/unordered_map
index c5734304846..c68ccaa3372 100644
--- a/libstdc++-v3/include/debug/unordered_map
+++ b/libstdc++-v3/include/debug/unordered_map
@@ -679,7 +679,7 @@ namespace __debug
 -> unordered_map<_Key, _Tp, hash<_Key>, equal_to<_Key>, _Allocator>;
 
   template>
+	   typename = _RequireAllocator<_Allocator>>
 unordered_map(initializer_list>, _Allocator)
 -> unordered_map<_Key, _Tp, hash<_Key>, equal_to<_Key>, _Allocator>;
 
-- 
2.13.6



[PATCH] Missing noexcept in string_view::find_first_of declaration

2017-11-24 Thread Stephan Bergmann
 >From 0a75fbb0c16e216892d16f1ba3448471e33f61bb Mon Sep 17 00:00:00 2001
From: Stephan Bergmann 
Date: Fri, 24 Nov 2017 18:04:30 +0100
Subject: [PATCH] Missing noexcept in string_view::find_first_of declaration

cf. definition in libstdc++-v3/include/bits/string_view.tcc that does have the
noexcept (like all the other overloads defined inline in
libstdc++-v3/include/std/string_view do).
---
 libstdc++-v3/include/std/string_view | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/libstdc++-v3/include/std/string_view b/libstdc++-v3/include/std/string_view
index fa834002726..68b4b08f8f4 100644
--- a/libstdc++-v3/include/std/string_view
+++ b/libstdc++-v3/include/std/string_view
@@ -323,7 +323,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
   { return this->find(__c, __pos); }
 
   constexpr size_type
-  find_first_of(const _CharT* __str, size_type __pos, size_type __n) const;
+  find_first_of(const _CharT* __str, size_type __pos, size_type __n) const noexcept;
 
   constexpr size_type
   find_first_of(const _CharT* __str, size_type __pos = 0) const noexcept
-- 
2.13.6



Re: [PR 81248] Fix ipa-sra size check

2017-11-24 Thread Richard Biener
On November 24, 2017 5:34:17 PM GMT+01:00, Martin Jambor  
wrote:
>Hi,
>
>PR 81248 is a missed-optimization bug when SRA refuses to replace a
>pointer to a one-field structure to just passing the field by value.
>The problem is a bogus check which compares the size of the replacement
>with the size of the aggregate, even when it is passed by reference,
>which was not the intention.
>
>The check was also performed in two places.  This patch moves it only
>to
>one and changes it as it was intended.  In the process I changed the
>meaning of PARAM_IPA_SRA_PTR_GROWTH_FACTOR a bit and now it also limits
>the number of new parameters as well as their total size, so that (by
>default) we never create more than two replacements for an aggregate
>passed by reference (because without it I have seen replacements by
>four
>32-bit floats, for example).
>
>I had to disable IPA-SRA for two testcases.  The SSA-PRE is testing
>hoisting of loads that are is now done by IPA-SRA.  ipcp-cstagg-2.c now
>unfortunately exhibits PR 80735.  But the situation is worse than
>without the patch only for structures containing nothing but a function
>pointer which I hope is not an interesting case.  I am still in the
>process of rewriting IPA-SRA to a real IPA pass, now hope to have it
>ready in early stage 1, and that should fix this issue, among others.
>
>The patch has passed bootstrap and testing on x86_64-linux, OK for
>trunk?

OK. 

Richard. 

>Thanks,
>
>Martin
>
>
>2017-11-23  Martin Jambor  
>
>   PR tree-optimization/81248
>   * tree-sra.c (splice_param_accesses): Remove size check.
>   (decide_one_param_reduction): Fix size check.
>   * gimple-pretty-print.c (dump_profile): Silence warning.
>   * params.def (PARAM_IPA_SRA_PTR_GROWTH_FACTOR): Adjust description.
>
>   testsuite/
>   * g++.dg/ipa/pr81248.C: New test.
>   * gcc.dg/tree-ssa/ssa-pre-31.c: Disable IPA-SRA.
>   * gcc/testsuite/gcc.dg/ipa/ipcp-cstagg-2.c: Likewise.
>---
> gcc/gimple-pretty-print.c  |  2 +-
> gcc/params.def |  4 +--
> gcc/testsuite/g++.dg/ipa/pr81248.C | 40 ++
> gcc/testsuite/gcc.dg/ipa/ipcp-cstagg-2.c   |  2 +-
> gcc/testsuite/gcc.dg/tree-ssa/ssa-pre-31.c |  2 +-
>gcc/tree-sra.c | 55
>+-
> 6 files changed, 69 insertions(+), 36 deletions(-)
> create mode 100644 gcc/testsuite/g++.dg/ipa/pr81248.C
>
>diff --git a/gcc/gimple-pretty-print.c b/gcc/gimple-pretty-print.c
>index 55c623e37bb..8bcc4e31bfb 100644
>--- a/gcc/gimple-pretty-print.c
>+++ b/gcc/gimple-pretty-print.c
>@@ -84,7 +84,7 @@ debug_gimple_stmt (gimple *gs)
> static const char *
> dump_profile (profile_count &count)
> {
>-  char *buf;
>+  char *buf = NULL;
>   if (!count.initialized_p ())
> return "";
>   if (count.ipa_p ())
>diff --git a/gcc/params.def b/gcc/params.def
>index 89915d4fc7f..93bd2cf75fe 100644
>--- a/gcc/params.def
>+++ b/gcc/params.def
>@@ -971,8 +971,8 @@ DEFPARAM (PARAM_MIN_NONDEBUG_INSN_UID,
> 
> DEFPARAM (PARAM_IPA_SRA_PTR_GROWTH_FACTOR,
> "ipa-sra-ptr-growth-factor",
>-"Maximum allowed growth of size of new parameters ipa-sra replaces
>"
>-"a pointer to an aggregate with.",
>+"Maximum allowed growth of number and total size of new parameters
>"
>+"that ipa-sra replaces a pointer to an aggregate with.",
> 2, 0, 0)
> 
> DEFPARAM (PARAM_TM_MAX_AGGREGATE_SIZE,
>diff --git a/gcc/testsuite/g++.dg/ipa/pr81248.C
>b/gcc/testsuite/g++.dg/ipa/pr81248.C
>new file mode 100644
>index 000..d55d2e751e8
>--- /dev/null
>+++ b/gcc/testsuite/g++.dg/ipa/pr81248.C
>@@ -0,0 +1,40 @@
>+//  { dg-do compile }
>+// { dg-options "-O2 -std=c++17 -fdump-tree-eipa_sra" }
>+
>+
>+#include 
>+
>+typedef unsigned char __uint8_t;
>+typedef __uint8_t uint8_t;
>+
>+
>+struct A {
>+A() = default;
>+A(const A& o) = default;
>+A(const volatile A& o) : m1(o.m1) {}
>+uint8_t m1{0};
>+};
>+
>+volatile uint8_t v;
>+
>+template
>+void f(const T& x) __attribute__((noinline));
>+template
>+void f(const T& x) {
>+if constexpr(std::is_same, A>::value) {
>+v = x.m1;
>+}
>+else {
>+v = x;
>+}
>+}
>+
>+uint8_t n1;
>+A n2;
>+
>+int main() {
>+f(n1);
>+f(n2);
>+}
>+
>+// { dg-final { scan-tree-dump-times "Adjusting call" 2 "eipa_sra" } }
>diff --git a/gcc/testsuite/gcc.dg/ipa/ipcp-cstagg-2.c
>b/gcc/testsuite/gcc.dg/ipa/ipcp-cstagg-2.c
>index f82014024d4..c1b6f0f73a3 100644
>--- a/gcc/testsuite/gcc.dg/ipa/ipcp-cstagg-2.c
>+++ b/gcc/testsuite/gcc.dg/ipa/ipcp-cstagg-2.c
>@@ -1,5 +1,5 @@
> /* { dg-do compile } */
>-/* { dg-options "-O3 -fdump-ipa-cp-details" } */
>+/* { dg-options "-O3 -fdump-ipa-cp-details -fno-ipa-sra" } */
> 
> typedef struct S
> {
>diff --git a/gcc/testsuite/gcc.dg/tree-ssa/ssa-pre-31.c
>b/gcc/testsuite/gcc.dg/tree-ssa/ssa-pre-31.c
>index 0dface557be..6a33b942ad5 100644
>--- a/gcc/testsuite/gcc.dg/tree-ssa/ssa-pre-31.c
>+++ b/gcc/tests

Re: [PATCH] Missing noexcept in string_view::find_first_of declaration

2017-11-24 Thread Jonathan Wakely

On 24/11/17 18:09 +0100, Stephan Bergmann wrote:

cf. definition in libstdc++-v3/include/bits/string_view.tcc that does have the
noexcept (like all the other overloads defined inline in
libstdc++-v3/include/std/string_view do).


I thought Jason already fixed this in
https://gcc.gnu.org/ml/gcc-patches/2017-11/msg01106.html but it
doesn't seem to have been committed yet.

I'll take care of it. Thanks for the patch.



---
libstdc++-v3/include/std/string_view | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/libstdc++-v3/include/std/string_view 
b/libstdc++-v3/include/std/string_view
index fa834002726..68b4b08f8f4 100644
--- a/libstdc++-v3/include/std/string_view
+++ b/libstdc++-v3/include/std/string_view
@@ -323,7 +323,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
  { return this->find(__c, __pos); }

  constexpr size_type
-  find_first_of(const _CharT* __str, size_type __pos, size_type __n) const;
+  find_first_of(const _CharT* __str, size_type __pos, size_type __n) const 
noexcept;

  constexpr size_type
  find_first_of(const _CharT* __str, size_type __pos = 0) const noexcept
--
2.13.6





RE: [PATCH][GCC][ARM] Dot Product NEON intrinsics [Patch (3/8)]

2017-11-24 Thread Tamar Christina
Hi Christophe,

> 
> After your commit, I have these reports:
> http://people.linaro.org/~christophe.lyon/cross-
> validation/gcc/trunk/255064/report-build-info.html
> 
> After my commit, I have these reports:
> http://people.linaro.org/~christophe.lyon/cross-
> validation/gcc/trunk/255126/report-build-info.html
> 
> I haven't fully checked that my patch fixes all the regressions reported at
> r255064, but I don't see why my patch would introduce regressions So I
> think your patch is causing problems:
> * on armeb --with-fpu=neon-fp16: (the 2 "REGRESSED" entries):
> gcc.target/arm/attr-neon3.c scan-assembler-times vld1 1 (found 2 times)
> gcc.target/arm/neon-vfma-1.c scan-assembler vfma\\.f32[\t]+[dDqQ]
> gcc.target/arm/neon-vfms-1.c scan-assembler vfms\\.f32[\t]+[dDqQ]
> 
> * on arm-none-linux-gnueabihf --with-cpu cortex-a5 --with-fpu vfpv3-d16-
> fp16 and armeb-none-linux-gnueabihf --with-cpu cortex-a9 --with-fpu vfpv3-
> d16-fp16 (the 2 "BIG-REGR" entries)

This patch only introduced a few neon instrinsics in arm_neon.h, and most of 
these files don't use the header.

gcc.dg/vect/pr65947-14.c doesn't exist in my tree so it's a relatively new test.

I will run some regressions over the weekend on an updated tree, but I can't 
understand how a not included header it can cause execution failures 😊
However most of those are vectorizer tests. It seems much more likely to me 
that vectorization is broken rather.

Thanks,
Tamar

> where a few tests fail:
> (arm-none-linux-gnueabihf cortex-a5 vfpv3-d16-fp16):
> gcc.dg/vect/pr65947-14.c -flto -ffat-lto-objects execution test
> gcc.dg/vect/pr65947-14.c execution test
> 
> (armeb-none-linux-gnueabihf cortex-a9 vfpv3-d16-fp16):
>   Executed from: gcc.dg/vect/vect.exp
> gcc.dg/vect/pr51074.c -flto -ffat-lto-objects execution test
> gcc.dg/vect/pr51074.c execution test
> gcc.dg/vect/pr64252.c -flto -ffat-lto-objects execution test
> gcc.dg/vect/pr65947-14.c -flto -ffat-lto-objects execution test
> gcc.dg/vect/pr65947-14.c execution test
> gcc.dg/vect/vect-cond-4.c -flto -ffat-lto-objects execution test
> gcc.dg/vect/vect-nb-iter-ub-2.c execution test
> gcc.dg/vect/vect-nb-iter-ub-3.c -flto -ffat-lto-objects execution test
> gcc.dg/vect/vect-nb-iter-ub-3.c execution test
> gcc.dg/vect/vect-strided-shift-1.c -flto -ffat-lto-objects execution test
> gcc.dg/vect/vect-strided-shift-1.c execution test
> gcc.dg/vect/vect-strided-u16-i3.c -flto -ffat-lto-objects execution test
> gcc.dg/vect/vect-strided-u16-i3.c execution test
>   Executed from: gcc.target/arm/arm.exp
> gcc.target/arm/attr-neon3.c scan-assembler-times vld1 1 (found 2 times)
> gcc.target/arm/neon-vfma-1.c scan-assembler vfma\\.f32[\t]+[dDqQ]
> gcc.target/arm/neon-vfms-1.c scan-assembler vfms\\.f32[\t]+[dDqQ]
> gcc.target/arm/neon-vmla-1.c scan-assembler vmla\\.i32
> gcc.target/arm/neon-vmls-1.c scan-assembler vmls\\.i32
> gcc.target/arm/vect-copysignf.c scan-tree-dump-times vect "vectorized 1
> loops" 1 (found 0 times)
> 
> I haven't checked whether this tests were already failing before your patch,
> and are just reported as new failures because they failed to compile in the
> mean time.
> 
> Not sure I am clear :-)
> 
> Sorry for the delay and potentially hard to parse reports, I'm struggling with
> infrastructure problems.
> 
> Thanks,
> 
> Christophe
> 
> > Tamar
> >
> >>
> >> Fixed as obvious (r255126).
> >>
> >> Christophe
> >>
> >> > diff --git a/gcc/testsuite/gcc.target/arm/simd/vect-dot-qi.h
> >> > b/gcc/testsuite/gcc.target/arm/simd/vect-dot-qi.h
> >> > new file mode 100644
> >> > index
> >> >
> >>
> ..90b00aff95cfef96d1963be176
> >> 73
> >> > dc191cc71169
> >> > --- /dev/null
> >> > +++ b/gcc/testsuite/gcc.target/arm/simd/vect-dot-qi.h
> >> > @@ -0,0 +1,15 @@
> >> > +TYPE char X[N] __attribute__
> >> ((__aligned__(__BIGGEST_ALIGNMENT__)));
> >> > +TYPE char Y[N] __attribute__
> >> ((__aligned__(__BIGGEST_ALIGNMENT__)));
> >> > +
> >> > +__attribute__ ((noinline)) int
> >> > +foo1(int len) {
> >> > +  int i;
> >> > +  TYPE int result = 0;
> >> > +  TYPE short prod;
> >> > +
> >> > +  for (i=0; i >> > +prod = X[i] * Y[i];
> >> > +result += prod;
> >> > +  }
> >> > +  return result;
> >> > +}
> >> > \ No newline at end of file
> >> >
> >> > Please add new lines at the end of the new test files.
> >> > This applies to a few more new files in this patch.
> >> >
> >> > Ok with these nits fixed.
> >> >
> >> > Thanks,
> >> > Kyrill
> >> >


[PATCH, Makefile.in] refine selftest recipes to restore mingw bootstrap

2017-11-24 Thread Olivier Hainque
Hello,

Bootstrap for mingw target configurations is failing today, with complaints
about an inexistant /dev/null.

The failing commands are issued by the selftest recipes in gcc/Makefile, using:

SELFTEST_FLAGS = -nostdinc /dev/null -S -o /dev/null \
-fself-test=$(srcdir)/testsuite/selftests

The attached patch is a proposal to fix this by resorting to "nul" instead of
"/dev/null" when the build machine name indicates a mingw setup.

It fixes the gcc-7 bootstrap in our Windows setup (i686-pc-mingw32), and passes
bootstrap and regression tests on x86_64-linux.

OK to commit ?

Thanks much in advance,

With Kind Regards,

Olivier

2017-11-24  Olivier Hainque  

* Makefile.in (SELFTEST_FLAGS): Use nul instead of /dev/null
on mingw build hosts.



dev_null.diff
Description: Binary data


Re: [PATCH][GCC][ARM] Dot Product NEON intrinsics [Patch (3/8)]

2017-11-24 Thread Christophe Lyon
On 24 November 2017 at 19:05, Tamar Christina  wrote:
> Hi Christophe,
>
>>
>> After your commit, I have these reports:
>> http://people.linaro.org/~christophe.lyon/cross-
>> validation/gcc/trunk/255064/report-build-info.html
>>
>> After my commit, I have these reports:
>> http://people.linaro.org/~christophe.lyon/cross-
>> validation/gcc/trunk/255126/report-build-info.html
>>
>> I haven't fully checked that my patch fixes all the regressions reported at
>> r255064, but I don't see why my patch would introduce regressions So I
>> think your patch is causing problems:
>> * on armeb --with-fpu=neon-fp16: (the 2 "REGRESSED" entries):
>> gcc.target/arm/attr-neon3.c scan-assembler-times vld1 1 (found 2 times)
>> gcc.target/arm/neon-vfma-1.c scan-assembler vfma\\.f32[\t]+[dDqQ]
>> gcc.target/arm/neon-vfms-1.c scan-assembler vfms\\.f32[\t]+[dDqQ]
>>
>> * on arm-none-linux-gnueabihf --with-cpu cortex-a5 --with-fpu vfpv3-d16-
>> fp16 and armeb-none-linux-gnueabihf --with-cpu cortex-a9 --with-fpu vfpv3-
>> d16-fp16 (the 2 "BIG-REGR" entries)
>
> This patch only introduced a few neon instrinsics in arm_neon.h, and most of 
> these files don't use the header.
>
> gcc.dg/vect/pr65947-14.c doesn't exist in my tree so it's a relatively new 
> test.
>
> I will run some regressions over the weekend on an updated tree, but I can't 
> understand how a not included header it can cause execution failures 😊
> However most of those are vectorizer tests. It seems much more likely to me 
> that vectorization is broken rather.

Agreed. But note that many regressions are reported for the
configurations --with-fpu vfpv3-d16-fp16
at: 
http://people.linaro.org/~christophe.lyon/cross-validation/gcc/trunk/255064/report-build-info.html
Maybe that's just a matter of arm_neon.h being included by some
effective-target tests?


>
> Thanks,
> Tamar
>
>> where a few tests fail:
>> (arm-none-linux-gnueabihf cortex-a5 vfpv3-d16-fp16):
>> gcc.dg/vect/pr65947-14.c -flto -ffat-lto-objects execution test
>> gcc.dg/vect/pr65947-14.c execution test
>>
>> (armeb-none-linux-gnueabihf cortex-a9 vfpv3-d16-fp16):
>>   Executed from: gcc.dg/vect/vect.exp
>> gcc.dg/vect/pr51074.c -flto -ffat-lto-objects execution test
>> gcc.dg/vect/pr51074.c execution test
>> gcc.dg/vect/pr64252.c -flto -ffat-lto-objects execution test
>> gcc.dg/vect/pr65947-14.c -flto -ffat-lto-objects execution test
>> gcc.dg/vect/pr65947-14.c execution test
>> gcc.dg/vect/vect-cond-4.c -flto -ffat-lto-objects execution test
>> gcc.dg/vect/vect-nb-iter-ub-2.c execution test
>> gcc.dg/vect/vect-nb-iter-ub-3.c -flto -ffat-lto-objects execution test
>> gcc.dg/vect/vect-nb-iter-ub-3.c execution test
>> gcc.dg/vect/vect-strided-shift-1.c -flto -ffat-lto-objects execution test
>> gcc.dg/vect/vect-strided-shift-1.c execution test
>> gcc.dg/vect/vect-strided-u16-i3.c -flto -ffat-lto-objects execution test
>> gcc.dg/vect/vect-strided-u16-i3.c execution test
>>   Executed from: gcc.target/arm/arm.exp
>> gcc.target/arm/attr-neon3.c scan-assembler-times vld1 1 (found 2 times)
>> gcc.target/arm/neon-vfma-1.c scan-assembler vfma\\.f32[\t]+[dDqQ]
>> gcc.target/arm/neon-vfms-1.c scan-assembler vfms\\.f32[\t]+[dDqQ]
>> gcc.target/arm/neon-vmla-1.c scan-assembler vmla\\.i32
>> gcc.target/arm/neon-vmls-1.c scan-assembler vmls\\.i32
>> gcc.target/arm/vect-copysignf.c scan-tree-dump-times vect "vectorized 1
>> loops" 1 (found 0 times)
>>
>> I haven't checked whether this tests were already failing before your patch,
>> and are just reported as new failures because they failed to compile in the
>> mean time.
>>
>> Not sure I am clear :-)
>>
>> Sorry for the delay and potentially hard to parse reports, I'm struggling 
>> with
>> infrastructure problems.
>>
>> Thanks,
>>
>> Christophe
>>
>> > Tamar
>> >
>> >>
>> >> Fixed as obvious (r255126).
>> >>
>> >> Christophe
>> >>
>> >> > diff --git a/gcc/testsuite/gcc.target/arm/simd/vect-dot-qi.h
>> >> > b/gcc/testsuite/gcc.target/arm/simd/vect-dot-qi.h
>> >> > new file mode 100644
>> >> > index
>> >> >
>> >>
>> ..90b00aff95cfef96d1963be176
>> >> 73
>> >> > dc191cc71169
>> >> > --- /dev/null
>> >> > +++ b/gcc/testsuite/gcc.target/arm/simd/vect-dot-qi.h
>> >> > @@ -0,0 +1,15 @@
>> >> > +TYPE char X[N] __attribute__
>> >> ((__aligned__(__BIGGEST_ALIGNMENT__)));
>> >> > +TYPE char Y[N] __attribute__
>> >> ((__aligned__(__BIGGEST_ALIGNMENT__)));
>> >> > +
>> >> > +__attribute__ ((noinline)) int
>> >> > +foo1(int len) {
>> >> > +  int i;
>> >> > +  TYPE int result = 0;
>> >> > +  TYPE short prod;
>> >> > +
>> >> > +  for (i=0; i> >> > +prod = X[i] * Y[i];
>> >> > +result += prod;
>> >> > +  }
>> >> > +  return result;
>> >> > +}
>> >> > \ No newline at end of file
>> >> >
>> >> > Please add new lines at the end of the new test files.
>> >> > This applies to a few more new files in this patch.
>> >> >
>> >> > Ok with these nits fix

[C++ PATCH] Optimize void COND_EXPRs during cp_fold (PR c++/81675)

2017-11-24 Thread Jakub Jelinek
Hi!

The comment says that trying to fold VOID_TYPE_P COND_EXPRs is not worth
bothering, but as the following testcase shows, that is not the case.
fold_ternary can optimize COND_EXPRs where the condition is constant and
the unused branch doesn't have any labels, and not folding it early means
bogus warnings afterwards.

Fixed thusly, bootstrapped/regtested on x86_64-linux and i686-linux, ok for
trunk?

2017-11-24  Jakub Jelinek  

PR c++/81675
* cp-gimplify.c (cp_fold) : Don't return immediately
for VOID_TYPE_P COND_EXPRs, instead fold the operands and if op0 is
INTEGER_CST, ensure that both op1 and op2 are non-NULL and fall
through into normal folding, otherwise just rebuild x if any op
changed.

* g++.dg/warn/pr81675.C: New test.

--- gcc/cp/cp-gimplify.c.jj 2017-11-21 20:23:01.0 +0100
+++ gcc/cp/cp-gimplify.c2017-11-24 11:09:51.879647584 +0100
@@ -2299,13 +2299,6 @@ cp_fold (tree x)
 
 case VEC_COND_EXPR:
 case COND_EXPR:
-
-  /* Don't bother folding a void condition, since it can't produce a
-constant value.  Also, some statement-level uses of COND_EXPR leave
-one of the branches NULL, so folding would crash.  */
-  if (VOID_TYPE_P (TREE_TYPE (x)))
-   return x;
-
   loc = EXPR_LOCATION (x);
   op0 = cp_fold_rvalue (TREE_OPERAND (x, 0));
   op1 = cp_fold (TREE_OPERAND (x, 1));
@@ -2319,6 +2312,29 @@ cp_fold (tree x)
  if (!VOID_TYPE_P (TREE_TYPE (op2)))
op2 = cp_truthvalue_conversion (op2);
}
+  else if (VOID_TYPE_P (TREE_TYPE (x)))
+   {
+ if (TREE_CODE (op0) == INTEGER_CST)
+   {
+ /* If the condition is constant, fold can fold away
+the COND_EXPR.  If some statement-level uses of COND_EXPR
+have one of the branches NULL, avoid folding crash.  */
+ if (!op1)
+   op1 = build_empty_stmt (loc);
+ if (!op2)
+   op2 = build_empty_stmt (loc);
+   }
+ else
+   {
+ /* Otherwise, don't bother folding a void condition, since
+it can't produce a constant value.  */
+ if (op0 != TREE_OPERAND (x, 0)
+ || op1 != TREE_OPERAND (x, 1)
+ || op2 != TREE_OPERAND (x, 2))
+   x = build3_loc (loc, code, TREE_TYPE (x), op0, op1, op2);
+ break;
+   }
+   }
 
   if (op0 != TREE_OPERAND (x, 0)
  || op1 != TREE_OPERAND (x, 1)
--- gcc/testsuite/g++.dg/warn/pr81675.C.jj  2017-11-24 11:12:13.962912829 
+0100
+++ gcc/testsuite/g++.dg/warn/pr81675.C 2017-11-24 11:11:56.0 +0100
@@ -0,0 +1,15 @@
+// PR c++/81675
+// { dg-do compile }
+// { dg-options "-Wall" }
+
+struct S
+{
+  ~S () __attribute__((noreturn));
+  int a;
+};
+
+int
+foo ()
+{
+  false ? 5 : S ().a;
+}

Jakub


[PATCH] Fix combine's simplify_if_then_else (PR rtl-optimization/81553)

2017-11-24 Thread Jakub Jelinek
Hi!

The following testcase ICEs in wide-int*, but the reason is a mode mismatch
(we build a SImode MULT with one QImode argument and one VOIDmode argument,
then it is folded into SImode NEG with QImode argument, ...).
The bug is in assuming that the mode of c1 must be m, that is usually the
case, but shifts are special, the second argument can have a different mode.

The following patch makes sure we perform the computation of the new shift
count in the right mode.

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

2017-11-24  Jakub Jelinek  

PR rtl-optimization/81553
* combine.c (simplify_if_then_else): In (if_then_else COND (OP Z C1) Z)
to (OP Z (mult COND (C1 * STORE_FLAG_VALUE))) optimization, if OP
is a shift where C1 has different mode than the whole shift, use C1's
mode for MULT rather than the shift's mode.

* gcc.c-torture/compile/pr81553.c: New test.

--- gcc/combine.c.jj2017-11-19 18:08:08.0 +0100
+++ gcc/combine.c   2017-11-24 12:07:25.701480794 +0100
@@ -6639,11 +6639,15 @@ simplify_if_then_else (rtx x)
 
   if (z)
{
- temp = subst (simplify_gen_relational (true_code, m, VOIDmode,
+ machine_mode cm = m;
+ if ((op == ASHIFT || op == LSHIFTRT || op == ASHIFTRT)
+ && GET_MODE (c1) != VOIDmode)
+   cm = GET_MODE (c1);
+ temp = subst (simplify_gen_relational (true_code, cm, VOIDmode,
 cond_op0, cond_op1),
pc_rtx, pc_rtx, 0, 0, 0);
- temp = simplify_gen_binary (MULT, m, temp,
- simplify_gen_binary (MULT, m, c1,
+ temp = simplify_gen_binary (MULT, cm, temp,
+ simplify_gen_binary (MULT, cm, c1,
   const_true_rtx));
  temp = subst (temp, pc_rtx, pc_rtx, 0, 0, 0);
  temp = simplify_gen_binary (op, m, gen_lowpart (m, z), temp);
--- gcc/testsuite/gcc.c-torture/compile/pr81553.c.jj2017-11-24 
12:11:25.681551110 +0100
+++ gcc/testsuite/gcc.c-torture/compile/pr81553.c   2017-11-24 
12:10:55.0 +0100
@@ -0,0 +1,10 @@
+/* PR rtl-optimization/81553 */
+
+int a, b, c, d;
+
+void
+foo (void)
+{
+  d = 1 >> c >> 1;
+  b = ~(209883449764912897ULL & d) << (0 >= a) | ~d;
+}

Jakub


[committed] Avoid -Wsurprising warning on OpenMP min/max array reductions (PR fortran/81304)

2017-11-24 Thread Jakub Jelinek
Hi!

gfc_trans_omp_array_reduction_or_udr creates artificial intrinsics symbol
just for the purpose of expansion of array min/max reductions.
That is something we certainly don't want a -Wsurprising warning on.

Fixed thusly, bootstrapped/regtested on x86_64-linux and i686-linux,
committed to trunk.

2017-11-24  Jakub Jelinek  

PR fortran/81304
* trans-openmp.c (gfc_trans_omp_array_reduction_or_udr): Set
attr.implicit_type in intrinsic_sym to avoid undesirable warning.

* testsuite/libgomp.fortran/pr81304.f90: New test.

--- gcc/fortran/trans-openmp.c.jj   2017-11-08 16:19:36.0 +0100
+++ gcc/fortran/trans-openmp.c  2017-11-24 13:41:41.995610897 +0100
@@ -1623,6 +1623,7 @@ gfc_trans_omp_array_reduction_or_udr (tr
   intrinsic_sym.attr.referenced = 1;
   intrinsic_sym.attr.intrinsic = 1;
   intrinsic_sym.attr.function = 1;
+  intrinsic_sym.attr.implicit_type = 1;
   intrinsic_sym.result = &intrinsic_sym;
   intrinsic_sym.declared_at = where;
 
--- libgomp/testsuite/libgomp.fortran/pr81304.f90.jj2017-11-24 
13:50:39.668043753 +0100
+++ libgomp/testsuite/libgomp.fortran/pr81304.f90   2017-11-24 
13:50:11.0 +0100
@@ -0,0 +1,17 @@
+! PR fortran/81304
+! { dg-do run }
+! { dg-options "-Wsurprising" }
+
+program pr81304
+   integer :: i
+   real, dimension(1:3) :: a, b, c
+   a = 128
+   b = 0
+!$omp parallel do reduction(min: a) reduction(max: b) private (c)  ! { 
dg-bogus "Type specified for intrinsic function" }
+   do i = 1, 16
+ c = (/ i, i - 5, i + 5 /)
+ a = min (a, c)
+ b = max (b, c)
+   end do
+   if (any (a /= (/ 1, -4, 6 /)) .or. any (b /= (/ 16, 11, 21 /))) call abort
+end

Jakub


[PATCH] Fix hot/cold partitioning with -gstabs{,+} (PR debug/81307)

2017-11-24 Thread Jakub Jelinek
Hi!

On most targets, N_SLINE has addresses relative to the start of
the function, which means -gstabs{,+} is completely broken with
hot/cold partitioning (fails to assemble almost anything that
has both partitions).  This used to be bearable when it wasn't
the default, but now that we hot/cold partition by default it means
STABS is completely unusable.

Because STABS should die soon, I'm not trying to propose any extension,
just emit something that assemble and be tolerable for debugging purposes
(after all, debugging optimized code with stabs isn't really a good idea
because it doesn't handle block fragments anyway).

What the patch does is that it treats hot/cold partitioned functions
basically as two functions, say main and main.cold.1, in the hot partition
everything should be normal, except that lexical scopes that only appear in
cold partition are not emitted in the [lr]brac tree for the hot partition.
Then the cold partition is yet another N_FUN, set of N_SLINE relative to
the start of the cold partition, and finally another [lr]brac block tree.
This one doesn't include any lexical scopes that are solely in the hot
partition, but we can have scopes that are in both, those are duplicated,
sometimes with merged vars from multiple scopes.

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

Could anybody from the debugger folks test it a little bit (say gdb
testsuite if it has any -O2 -gstabs/-O2 -gstabs+ tests)?

2017-11-24  Jakub Jelinek  

PR debug/81307
* dbxout.c (lastlineno): New variable.
(dbx_debug_hooks): Use dbxout_switch_text_section as
switch_text_section debug hook.
(dbxout_function_end): Switch to current_function_section
rather than function_section.  If crtl->has_bb_partition,
output just one N_FUN, depending on in_cold_section_p.
(dbxout_source_line): Remember last lineno in lastlineno.
(dbxout_switch_text_section): New function.
(dbxout_function_decl): Adjust dbxout_block caller.
(dbx_block_with_cold_children): New function.
(dbxout_block): Return true if any LBRAC/RBRAC have been
emitted.  Use dbx_block_with_cold_children at depth == 0
in second partition.  Add PARENT_BLOCKNUM argument, pass
it optionally adjusted to children.  Output LBRAC/RBRAC
around recursive call only if the block is in the current
partition, if not and anything was output, emit empty
range LBRAC/RBRAC.
* final.c (final_scan_insn): Compute cold_function_name
before calling switch_text_section debug hook.  Call
that hook even if dwarf2out_do_frame if not emitting
dwarf debug info.

--- gcc/dbxout.c.jj 2017-11-01 22:49:18.0 +0100
+++ gcc/dbxout.c2017-11-24 18:04:24.053081853 +0100
@@ -244,6 +244,10 @@ static GTY(()) int source_label_number =
 
 static GTY(()) const char *lastfile;
 
+/* Last line number mentioned in a NOTE insn.  */
+
+static GTY(()) unsigned int lastlineno;
+
 /* Used by PCH machinery to detect if 'lastfile' should be reset to
base_input_file.  */
 static GTY(()) int lastfile_is_base;
@@ -334,6 +338,7 @@ static void debug_free_queue (void);
 
 static void dbxout_source_line (unsigned int, unsigned int, const char *,
int, bool);
+static void dbxout_switch_text_section (void);
 static void dbxout_begin_prologue (unsigned int, unsigned int, const char *);
 static void dbxout_source_file (const char *);
 static void dbxout_function_end (tree);
@@ -380,7 +385,7 @@ const struct gcc_debug_hooks dbx_debug_h
   dbxout_handle_pch,/* handle_pch */
   debug_nothing_rtx_insn,   /* var_location */
   debug_nothing_tree,   /* size_function */
-  debug_nothing_void,/* switch_text_section */
+  dbxout_switch_text_section,/* switch_text_section */
   debug_nothing_tree_tree,  /* set_name */
   0, /* start_end_main_source_file */
   TYPE_SYMTAB_IS_ADDRESS /* tree_type_symtab_field */
@@ -902,7 +907,7 @@ dbxout_function_end (tree decl ATTRIBUTE
 
   /* The Lscope label must be emitted even if we aren't doing anything
  else; dbxout_block needs it.  */
-  switch_to_section (function_section (current_function_decl));
+  switch_to_section (current_function_section ());
 
   /* Convert Lscope into the appropriate format for local labels in case
  the system doesn't insert underscores in front of user generated
@@ -923,11 +928,12 @@ dbxout_function_end (tree decl ATTRIBUTE
   if (crtl->has_bb_partition)
 {
   dbxout_begin_empty_stabs (N_FUN);
-  dbxout_stab_value_label_diff (crtl->subsections.hot_section_end_label,
-   crtl->subsections.hot_section_label);
-  dbxout_begin_empty_stabs (N_FUN);
-  dbxout_stab_value_label_diff (crtl->subsections.cold_section_end_label,
-   

[C++ PATCH] Avoid -Wreturn-type warnings if a switch has default label, no breaks inside of it, but is followed by a break (PR sanitizer/81275)

2017-11-24 Thread Jakub Jelinek
Hi!

The testcase below has a useless break; that causes a bogus -Wreturn-type
warning.  The C++ FE already has code to avoid adding a BREAK_STMT
after a return or similar sequence that is known not to return.
The following patch extends block_may_fallthrough to also return false
for SWITCH_STMTs that can't fall through.

Those are ones with non-empty body where the whole body can't fallthrough,
additionally they need to have a default: case label (or cover the whole
range of values, but that is not what this patch can compute, that would
be too big duplication of the gimplification processing) and no BREAK_STMT.

For the default: case label we need to look in all SWITCH_BODY children
except for nested SWITCH_STMTs, for BREAK_STMTs also not in
{FOR,DO,WHILE}_BODY.

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

2017-11-24  Jakub Jelinek  

PR sanitizer/81275
* cp-objcp-common.c (struct find_default_and_break_s): New type.
(find_default_and_break): New function.
(cxx_block_may_fallthru): Return false for SWITCH_STMT which
contains no BREAK_STMTs and contains a default: CASE_LABEL_EXPR.

* g++.dg/warn/pr81275.C: New test.

--- gcc/cp/cp-objcp-common.c.jj 2017-11-15 09:38:25.0 +0100
+++ gcc/cp/cp-objcp-common.c2017-11-24 20:08:35.047132170 +0100
@@ -335,6 +335,96 @@ init_shadowed_var_for_decl (void)
 = hash_table::create_ggc (512);
 }
 
+/* Data structure for find_default_and_break.  */
+
+struct find_default_and_break_s
+{
+  hash_set *pset;
+  bool found_default;
+  bool look_for_break;
+};
+
+/* Helper function for cxx_block_may_fallthru SWITCH_STMT processing.
+   Determine if SWITCH_STMT body contains a BREAK_STMT and also record
+   if it contains default: CASE_LABEL_EXPR.  */
+
+static tree
+find_default_and_break (tree *stmt_p, int *walk_subtrees, void *d)
+{
+  find_default_and_break_s *data = (find_default_and_break_s *) d;
+  tree stmt = *stmt_p;
+  switch (TREE_CODE (stmt))
+{
+case FOR_STMT:
+  if (cp_walk_tree (&FOR_INIT_STMT (stmt), find_default_and_break,
+   d, data->pset)
+ || cp_walk_tree (&FOR_COND (stmt), find_default_and_break,
+  d, data->pset)
+ || cp_walk_tree (&FOR_EXPR (stmt), find_default_and_break,
+  d, data->pset))
+   return stmt;
+  stmt = FOR_BODY (stmt);
+  goto loop_body;
+case WHILE_STMT:
+  if (cp_walk_tree (&WHILE_COND (stmt), find_default_and_break,
+   d, data->pset))
+   return stmt;
+  stmt = WHILE_BODY (stmt);
+  goto loop_body;
+case DO_STMT:
+  if (cp_walk_tree (&DO_COND (stmt), find_default_and_break,
+   d, data->pset))
+   return stmt;
+  stmt = DO_BODY (stmt);
+loop_body:
+  *walk_subtrees = 0;
+  /* Inside of {FOR,WHILE,DO}_BODY we can ignore BREAK_STMTs that
+are related to the corresponding loop, not the outer SWITCH_STMT.
+If we haven't still found a default:, we need to look for it
+though.  */
+  if (!data->found_default)
+   {
+ if (data->look_for_break)
+   {
+ if (cp_walk_tree (&stmt, find_default_and_break, d, data->pset))
+   return *stmt_p;
+   }
+ else
+   {
+ data->look_for_break = true;
+ if (cp_walk_tree (&stmt, find_default_and_break, d, data->pset))
+   {
+ data->look_for_break = false;
+ return *stmt_p;
+   }
+ data->look_for_break = false;
+   }
+   }
+  return NULL_TREE;
+case CASE_LABEL_EXPR:
+  if (CASE_LOW (stmt) == NULL_TREE && CASE_HIGH (stmt) == NULL_TREE)
+   {
+ /* Found default:.  */
+ data->found_default = true;
+ if (!data->look_for_break)
+   return stmt;
+   }
+  return NULL_TREE;
+case BREAK_STMT:
+  /* Found a break that can fall through out of the SWITCH_STMT.  */
+  return data->look_for_break ? stmt : NULL_TREE;
+case SWITCH_STMT:
+  /* Nested switch can't contain a break; nor default: for the outer
+switch.  */
+  *walk_subtrees = 0;
+  return NULL_TREE;
+default:
+  if (TYPE_P (stmt))
+   *walk_subtrees = 0;
+  return NULL_TREE;
+}
+}
+
 /* Return true if stmt can fall through.  Used by block_may_fallthru
default case.  */
 
@@ -349,6 +439,25 @@ cxx_block_may_fallthru (const_tree stmt)
 case THROW_EXPR:
   return false;
 
+case SWITCH_STMT:
+  if (SWITCH_STMT_BODY (stmt) == NULL_TREE
+ || block_may_fallthru (SWITCH_STMT_BODY (stmt)))
+   return true;
+  else
+   {
+ /* If the switch stmt body can't fall through, it can still
+fall through if there is no default: label (checking whether
+without default: case labels cover the whole range might be
+too exp

[PATCH] Fix bss_initializer_p (PR target/83100)

2017-11-24 Thread Jakub Jelinek
Hi!

As the testcases show, it is essential that bss_initializer_p returns
true for DECL_COMMON vars without initializer, even when they are
TREE_READONLY, otherwise they aren't really common, e.g.
  if (DECL_COMMON (decl))
{
  /* If the decl has been given an explicit section name, or it resides
 in a non-generic address space, then it isn't common, and shouldn't
 be handled as such.  */
  gcc_assert (DECL_SECTION_NAME (decl) == NULL
  && ADDR_SPACE_GENERIC_P (as));
  if (DECL_THREAD_LOCAL_P (decl))
return tls_comm_section;
  else if (TREE_PUBLIC (decl) && bss_initializer_p (decl))
return comm_section;
}
falls through into non-common code, and on section anchors target sometimes
ICE.

Fixed thusly, bootstrapped/regtested on powerpc64le-linux, bootstrapped
also on {powerpc64,x86_64,i686}-linux, ok for trunk if the pending
regtests on the above 3 targets succeed?

2017-11-24  Jakub Jelinek  

PR target/83100
* varasm.c (bss_initializer_p): Return true for DECL_COMMON
TREE_READONLY decls.

* gcc.dg/pr83100-1.c: New test.
* gcc.dg/pr83100-2.c: New test.
* gcc.dg/pr83100-3.c: New test.
* gcc.dg/pr83100-4.c: New test.

--- gcc/varasm.c.jj 2017-11-21 20:23:02.0 +0100
+++ gcc/varasm.c2017-11-24 21:43:55.616951823 +0100
@@ -986,9 +986,9 @@ decode_reg_name (const char *name)
 bool
 bss_initializer_p (const_tree decl)
 {
-  /* Do not put constants into the .bss section, they belong in a readonly
- section.  */
-  return (!TREE_READONLY (decl)
+  /* Do not put non-common constants into the .bss section, they belong in
+ a readonly section.  */
+  return ((!TREE_READONLY (decl) || DECL_COMMON (decl))
  && (DECL_INITIAL (decl) == NULL
  /* In LTO we have no errors in program; error_mark_node is used
 to mark offlined constructors.  */
--- gcc/testsuite/gcc.dg/pr83100-1.c.jj 2017-11-24 21:56:52.957497438 +0100
+++ gcc/testsuite/gcc.dg/pr83100-1.c2017-11-24 22:01:03.433437167 +0100
@@ -0,0 +1,7 @@
+/* PR target/83100 */
+/* { dg-do compile { target *-*-linux* *-*-gnu* } } */
+/* { dg-options "-O2 -fcommon -fdata-sections" } */
+
+const int a;
+
+/* { dg-final { scan-assembler "comm" } } */
--- gcc/testsuite/gcc.dg/pr83100-2.c.jj 2017-11-24 21:58:54.475012758 +0100
+++ gcc/testsuite/gcc.dg/pr83100-2.c2017-11-24 22:01:09.108367832 +0100
@@ -0,0 +1,15 @@
+/* PR target/83100 */
+/* { dg-do run } */
+/* { dg-options "-O2 -fcommon -fdata-sections" } */
+/* { dg-additional-sources pr83100-3.c } */
+/* { dg-skip-if "-fdata-sections not supported" { hppa*-*-hpux* nvptx-*-* } } 
*/
+
+const int a;
+
+int
+main ()
+{
+  if (a != 7)
+__builtin_abort ();
+  return 0;
+}
--- gcc/testsuite/gcc.dg/pr83100-3.c.jj 2017-11-24 21:59:57.072247956 +0100
+++ gcc/testsuite/gcc.dg/pr83100-3.c2017-11-24 22:01:14.344303860 +0100
@@ -0,0 +1,6 @@
+/* PR target/83100 */
+/* { dg-do compile } */
+/* { dg-options "-O2 -fcommon -fdata-sections" } */
+/* { dg-skip-if "-fdata-sections not supported" { hppa*-*-hpux* nvptx-*-* } } 
*/
+
+const int a = 7;
--- gcc/testsuite/gcc.dg/pr83100-4.c.jj 2017-11-24 22:00:13.405048405 +0100
+++ gcc/testsuite/gcc.dg/pr83100-4.c2017-11-24 22:01:20.281231323 +0100
@@ -0,0 +1,7 @@
+/* PR target/83100 */
+/* { dg-do compile { target *-*-linux* *-*-gnu* } } */
+/* { dg-options "-O2 -fno-common -fdata-sections" } */
+
+const int a;
+
+/* { dg-final { scan-assembler "rodata.a" } } */

Jakub


Re: [PATCH] Fix powerpc* ICE with vec builtins with -mno-altivec (PR target/82848)

2017-11-24 Thread Segher Boessenkool
Hi Jakub,

On Thu, Nov 23, 2017 at 10:02:24PM +0100, Jakub Jelinek wrote:
> With -mno-altivec -mno-vsx -mno-fold-gimple we error on __builtin_vec*
> builtins used when corresponding ISA is not enabled.
> When folding gimple, in some cases we get away with it (e.g. when folding
> the builtin to PLUS_EXPR on the generic vectors), because
> tree-vect-generic.c lowers those into scalar piecewise operations,
> but e.g. in case of FMA_EXPR tree-vect-generic.c isn't doing anything,
> expects that FMA_EXPR is actually used only if supported, I think similarly
> for various vector widening operations etc.
> 
> One option is not to fold into gimple only those builtins where we know it
> will not work on generic vectors; my preference is to not fold any builtin
> which would be rejected at expansion time, so user get at least consistent
> diagnostics.  Usually people are using the altivec.h etc. headers that error
> if the ISA is not enabled, so this will only affect people that use the
> builtins directly.

Yeah I agree.

Okay for trunk, thanks!


Segher


> 2017-11-23  Jakub Jelinek  
> 
>   PR target/82848
>   * config/rs6000/rs6000.c (rs6000_gimple_fold_builtin): Don't fold
>   builtins not enabled in the currently selected ISA.

(You forgot changelog for the added testcase).


Re: [PATCH] Fix combine's simplify_if_then_else (PR rtl-optimization/81553)

2017-11-24 Thread Segher Boessenkool
Hi,

On Fri, Nov 24, 2017 at 10:38:13PM +0100, Jakub Jelinek wrote:
> The following testcase ICEs in wide-int*, but the reason is a mode mismatch
> (we build a SImode MULT with one QImode argument and one VOIDmode argument,
> then it is folded into SImode NEG with QImode argument, ...).
> The bug is in assuming that the mode of c1 must be m, that is usually the
> case, but shifts are special, the second argument can have a different mode.
> 
> The following patch makes sure we perform the computation of the new shift
> count in the right mode.

Okay, thanks!  Is this better than bailing out though, do you have
an example?


Segher


> 2017-11-24  Jakub Jelinek  
> 
>   PR rtl-optimization/81553
>   * combine.c (simplify_if_then_else): In (if_then_else COND (OP Z C1) Z)
>   to (OP Z (mult COND (C1 * STORE_FLAG_VALUE))) optimization, if OP
>   is a shift where C1 has different mode than the whole shift, use C1's
>   mode for MULT rather than the shift's mode.
> 
>   * gcc.c-torture/compile/pr81553.c: New test.