Re: Add more subreg offset helpers

2016-11-22 Thread Richard Sandiford
Eric Botcazou  writes:
>> subreg_offset_from_lsb was supposed to be the inverse operation of
>> subreg_lsb, which also returns a bit number.
>
> It would have helped the reviewer to state it in the function comment. ;-)
>
>> Should I change that to return a byte number as well?
>
> Both functions are fine as-is, but mention that the new one is the inverse of 
> the old one in the comment.

OK, how about the following version?  Tested as before.

Thanks,
Richard


gcc/
2016-11-15  Richard Sandiford  
Alan Hayward  
David Sherwood  

* rtl.h (subreg_size_offset_from_lsb): Declare.
(subreg_offset_from_lsb): New function.
(subreg_size_lowpart_offset): Declare.
(subreg_lowpart_offset): Turn into an inline function.
(subreg_size_highpart_offset): Declare.
(subreg_highpart_offset): Turn into an inline function.
* emit-rtl.c (subreg_size_lowpart_offset): New function.
(subreg_size_highpart_offset): Likewise.
* rtlanal.c (subreg_size_offset_from_lsb): Likewise.

diff --git a/gcc/emit-rtl.c b/gcc/emit-rtl.c
index 9ea0c8f..04ce2d1 100644
--- a/gcc/emit-rtl.c
+++ b/gcc/emit-rtl.c
@@ -1478,44 +1478,41 @@ gen_highpart_mode (machine_mode outermode, machine_mode innermode, rtx exp)
  subreg_highpart_offset (outermode, innermode));
 }
 
-/* Return the SUBREG_BYTE for an OUTERMODE lowpart of an INNERMODE value.  */
+/* Return the SUBREG_BYTE for a lowpart subreg whose outer mode has
+   OUTER_BYTES bytes and whose inner mode has INNER_BYTES bytes.  */
 
 unsigned int
-subreg_lowpart_offset (machine_mode outermode, machine_mode innermode)
+subreg_size_lowpart_offset (unsigned int outer_bytes, unsigned int inner_bytes)
 {
-  unsigned int offset = 0;
-  int difference = (GET_MODE_SIZE (innermode) - GET_MODE_SIZE (outermode));
-
-  if (difference > 0)
-    {
-      if (WORDS_BIG_ENDIAN)
-        offset += (difference / UNITS_PER_WORD) * UNITS_PER_WORD;
-      if (BYTES_BIG_ENDIAN)
-        offset += difference % UNITS_PER_WORD;
-    }
+  if (outer_bytes > inner_bytes)
+    /* Paradoxical subregs always have a SUBREG_BYTE of 0.  */
+    return 0;
 
-  return offset;
+  if (BYTES_BIG_ENDIAN && WORDS_BIG_ENDIAN)
+    return inner_bytes - outer_bytes;
+  else if (!BYTES_BIG_ENDIAN && !WORDS_BIG_ENDIAN)
+    return 0;
+  else
+    return subreg_size_offset_from_lsb (outer_bytes, inner_bytes, 0);
 }
 
-/* Return offset in bytes to get OUTERMODE high part
-   of the value in mode INNERMODE stored in memory in target format.  */
+/* Return the SUBREG_BYTE for a highpart subreg whose outer mode has
+   OUTER_BYTES bytes and whose inner mode has INNER_BYTES bytes.  */
+
 unsigned int
-subreg_highpart_offset (machine_mode outermode, machine_mode innermode)
+subreg_size_highpart_offset (unsigned int outer_bytes,
+                             unsigned int inner_bytes)
 {
-  unsigned int offset = 0;
-  int difference = (GET_MODE_SIZE (innermode) - GET_MODE_SIZE (outermode));
+  gcc_assert (inner_bytes >= outer_bytes);
 
-  gcc_assert (GET_MODE_SIZE (innermode) >= GET_MODE_SIZE (outermode));
-
-  if (difference > 0)
-    {
-      if (! WORDS_BIG_ENDIAN)
-        offset += (difference / UNITS_PER_WORD) * UNITS_PER_WORD;
-      if (! BYTES_BIG_ENDIAN)
-        offset += difference % UNITS_PER_WORD;
-    }
-
-  return offset;
+  if (BYTES_BIG_ENDIAN && WORDS_BIG_ENDIAN)
+    return 0;
+  else if (!BYTES_BIG_ENDIAN && !WORDS_BIG_ENDIAN)
+    return inner_bytes - outer_bytes;
+  else
+    return subreg_size_offset_from_lsb (outer_bytes, inner_bytes,
+                                        (inner_bytes - outer_bytes)
+                                        * BITS_PER_UNIT);
 }
 
 /* Return 1 iff X, assumed to be a SUBREG,
diff --git a/gcc/rtl.h b/gcc/rtl.h
index 21f4860..660d381 100644
--- a/gcc/rtl.h
+++ b/gcc/rtl.h
@@ -2178,6 +2178,24 @@ extern void get_full_rtx_cost (rtx, machine_mode, enum rtx_code, int,
 extern unsigned int subreg_lsb (const_rtx);
 extern unsigned int subreg_lsb_1 (machine_mode, machine_mode,
  unsigned int);
+extern unsigned int subreg_size_offset_from_lsb (unsigned int, unsigned int,
+unsigned int);
+
+/* Return the subreg byte offset for a subreg whose outer mode is
+   OUTER_MODE, whose inner mode is INNER_MODE, and where there are
+   LSB_SHIFT *bits* between the lsb of the outer value and the lsb of
+   the inner value.  This is the inverse of subreg_lsb_1 (which converts
+   byte offsets to bit shifts).  */
+
+inline unsigned int
+subreg_offset_from_lsb (machine_mode outer_mode,
+                        machine_mode inner_mode,
+                        unsigned int lsb_shift)
+{
+  return subreg_size_offset_from_lsb (GET_MODE_SIZE (outer_mode),
+                                      GET_MODE_SIZE (inner_mode), lsb_shift);
+}
+
 extern unsigned int subreg_regno_offset(unsigned int, machine_mode,
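
For readers following the emit-rtl.c hunk above, here is a minimal standalone sketch (not part of the patch) that compares the removed lowpart/highpart formulas with the new shortcuts for the two regular-endian cases.  The struct endian_model and every function name below are hypothetical stand-ins for the target macros; the mixed-endian path, which goes through subreg_size_offset_from_lsb, is deliberately left out because its body is not shown in this message.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for BYTES_BIG_ENDIAN, WORDS_BIG_ENDIAN and
   UNITS_PER_WORD, which GCC takes from the target configuration.  */
struct endian_model
{
  bool bytes_big_endian;
  bool words_big_endian;
  unsigned int units_per_word;
};

/* The pre-patch lowpart formula, transcribed from the removed lines.  */
unsigned int
old_lowpart_offset (const struct endian_model *t,
                    unsigned int outer_bytes, unsigned int inner_bytes)
{
  unsigned int offset = 0;
  if (inner_bytes > outer_bytes)
    {
      unsigned int difference = inner_bytes - outer_bytes;
      if (t->words_big_endian)
        offset += (difference / t->units_per_word) * t->units_per_word;
      if (t->bytes_big_endian)
        offset += difference % t->units_per_word;
    }
  return offset;
}

/* The pre-patch highpart formula, transcribed from the removed lines.  */
unsigned int
old_highpart_offset (const struct endian_model *t,
                     unsigned int outer_bytes, unsigned int inner_bytes)
{
  unsigned int offset = 0;
  assert (inner_bytes >= outer_bytes);
  unsigned int difference = inner_bytes - outer_bytes;
  if (!t->words_big_endian)
    offset += (difference / t->units_per_word) * t->units_per_word;
  if (!t->bytes_big_endian)
    offset += difference % t->units_per_word;
  return offset;
}

/* The post-patch shortcut, restricted to regular endianness
   (all big or all little).  */
unsigned int
regular_lowpart_offset (const struct endian_model *t,
                        unsigned int outer_bytes, unsigned int inner_bytes)
{
  assert (t->bytes_big_endian == t->words_big_endian);
  if (outer_bytes > inner_bytes)
    return 0;  /* Paradoxical subreg.  */
  return t->bytes_big_endian ? inner_bytes - outer_bytes : 0;
}

/* Likewise for the highpart.  */
unsigned int
regular_highpart_offset (const struct endian_model *t,
                         unsigned int outer_bytes, unsigned int inner_bytes)
{
  assert (t->bytes_big_endian == t->words_big_endian);
  assert (inner_bytes >= outer_bytes);
  return t->bytes_big_endian ? 0 : inner_bytes - outer_bytes;
}
```

With regular endianness the word and subword terms of the old formula add back up to the full size difference (or to zero), which is why the new code can skip UNITS_PER_WORD entirely in those cases.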

Re: [PATCH] Fix divmod expansion (PR middle-end/78416, take 2)

2016-11-22 Thread Richard Biener
On Mon, Nov 21, 2016 at 8:09 PM, Jakub Jelinek  wrote:
> Hi!
>
> On Fri, Nov 18, 2016 at 11:10:58PM +0100, Richard Biener wrote:
>> I wonder if transforming the const-int to wide int makes this all easier to read?
>
> Here is updated patch that does that.
> Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

Ok.

Richard.

> 2016-11-21  Jakub Jelinek  
>
> PR middle-end/78416
> * expmed.c (expand_divmod): Use wide_int for computation of
> op1_is_pow2.  Don't set it if op1 is 0.  Formatting fixes.
> Use size <= HOST_BITS_PER_WIDE_INT instead of
> HOST_BITS_PER_WIDE_INT >= size.
>
> * gcc.dg/torture/pr78416.c: New test.
>
> --- gcc/expmed.c.jj 2016-11-19 18:02:45.431380371 +0100
> +++ gcc/expmed.c 2016-11-21 16:13:32.980271174 +0100
> @@ -3994,11 +3994,10 @@ expand_divmod (int rem_flag, enum tree_c
>op1_is_constant = CONST_INT_P (op1);
>if (op1_is_constant)
>  {
> -  unsigned HOST_WIDE_INT ext_op1 = UINTVAL (op1);
> -  if (unsignedp)
> -   ext_op1 &= GET_MODE_MASK (mode);
> -  op1_is_pow2 = ((EXACT_POWER_OF_2_OR_ZERO_P (ext_op1)
> -|| (! unsignedp && EXACT_POWER_OF_2_OR_ZERO_P (-ext_op1))));
> +  wide_int ext_op1 = rtx_mode_t (op1, mode);
> +  op1_is_pow2 = (wi::popcount (ext_op1) == 1
> +|| (! unsignedp
> +&& wi::popcount (wi::neg (ext_op1)) == 1));
>  }
>
>/*
> @@ -4079,11 +4078,10 @@ expand_divmod (int rem_flag, enum tree_c
>   not straightforward to generalize this.  Maybe we should make an array
>   of possible modes in init_expmed?  Save this for GCC 2.7.  */
>
> -  optab1 = ((op1_is_pow2 && op1 != const0_rtx)
> +  optab1 = (op1_is_pow2
> ? (unsignedp ? lshr_optab : ashr_optab)
> : (unsignedp ? udiv_optab : sdiv_optab));
> -  optab2 = ((op1_is_pow2 && op1 != const0_rtx)
> -   ? optab1
> +  optab2 = (op1_is_pow2 ? optab1
> : (unsignedp ? udivmod_optab : sdivmod_optab));
>
>for (compute_mode = mode; compute_mode != VOIDmode;
> @@ -4139,10 +4137,15 @@ expand_divmod (int rem_flag, enum tree_c
>/* convert_modes may have placed op1 into a register, so we
>  must recompute the following.  */
>op1_is_constant = CONST_INT_P (op1);
> -  op1_is_pow2 = (op1_is_constant
> -&& ((EXACT_POWER_OF_2_OR_ZERO_P (INTVAL (op1))
> - || (! unsignedp
> - && EXACT_POWER_OF_2_OR_ZERO_P (-UINTVAL (op1))))));
> +  if (op1_is_constant)
> +   {
> + wide_int ext_op1 = rtx_mode_t (op1, compute_mode);
> + op1_is_pow2 = (wi::popcount (ext_op1) == 1
> +|| (! unsignedp
> +&& wi::popcount (wi::neg (ext_op1)) == 1));
> +   }
> +  else
> +   op1_is_pow2 = 0;
>  }
>
>/* If one of the operands is a volatile MEM, copy it into a register.  */
> @@ -4182,10 +4185,10 @@ expand_divmod (int rem_flag, enum tree_c
> unsigned HOST_WIDE_INT mh, ml;
> int pre_shift, post_shift;
> int dummy;
> -   unsigned HOST_WIDE_INT d = (INTVAL (op1)
> -   & GET_MODE_MASK (compute_mode));
> +   wide_int wd = rtx_mode_t (op1, compute_mode);
> +   unsigned HOST_WIDE_INT d = wd.to_uhwi ();
>
> -   if (EXACT_POWER_OF_2_OR_ZERO_P (d))
> +   if (wi::popcount (wd) == 1)
>   {
> pre_shift = floor_log2 (d);
> if (rem_flag)
> @@ -4325,7 +4328,7 @@ expand_divmod (int rem_flag, enum tree_c
> else if (d == -1)
>   quotient = expand_unop (compute_mode, neg_optab, op0,
>   tquotient, 0);
> -   else if (HOST_BITS_PER_WIDE_INT >= size
> +   else if (size <= HOST_BITS_PER_WIDE_INT
>  && abs_d == HOST_WIDE_INT_1U << (size - 1))
>   {
> /* This case is not handled correctly below.  */
> @@ -4335,6 +4338,7 @@ expand_divmod (int rem_flag, enum tree_c
>   goto fail1;
>   }
> else if (EXACT_POWER_OF_2_OR_ZERO_P (d)
> +&& (size <= HOST_BITS_PER_WIDE_INT || d >= 0)
>  && (rem_flag
>  ? smod_pow2_cheap (speed, compute_mode)
>  : sdiv_pow2_cheap (speed, compute_mode))
> @@ -4348,7 +4352,9 @@ expand_divmod (int rem_flag, enum tree_c
> compute_mode)
>  != CODE_FOR_nothing)))
>   ;
> -   else if (EXACT_POWER_OF_2_OR_ZERO_P (abs_d))
> +   else if (EXACT_POWER_OF_2_OR_ZERO_P (abs_d)
> +&& (size <= HOST_BITS_PER_WID
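
As an aside on the ChangeLog entry "Don't set it if op1 is 0": the sketch below is a rough plain-C model, not GCC code, of why switching to a popcount test lets the later "op1 != const0_rtx" checks go away.  It assumes the old macro is a test of the x & (x - 1) == 0 kind, which also accepts zero; the helper names, the 64-bit limit and the hand-rolled popcount are all assumptions made for illustration, with the mask playing the role of the mode precision that the wide_int carries.

```c
#include <stdbool.h>
#include <stdint.h>

/* Count set bits portably (clears the lowest set bit each iteration).  */
unsigned int
popcount64 (uint64_t x)
{
  unsigned int n = 0;
  while (x != 0)
    {
      x &= x - 1;
      n++;
    }
  return n;
}

/* Mask covering the low BITS bits; BITS is assumed to be in 1..64.  */
uint64_t
mode_mask (unsigned int bits)
{
  return bits >= 64 ? UINT64_MAX : (UINT64_C (1) << bits) - 1;
}

/* Model of the old-style test: true for powers of two AND for zero.  */
bool
pow2_or_zero_p (uint64_t x)
{
  return (x & (x - 1)) == 0;
}

/* Model of the new test: exactly one bit set in the value, or (for
   signed division) in its negation, both truncated to BITS bits.  */
bool
divisor_is_pow2 (uint64_t value, unsigned int bits, bool unsignedp)
{
  uint64_t v = value & mode_mask (bits);
  if (popcount64 (v) == 1)
    return true;
  if (!unsignedp)
    {
      uint64_t neg = (0 - v) & mode_mask (bits);
      return popcount64 (neg) == 1;
    }
  return false;
}
```

In this model a zero divisor is never classified as a power of two, while -8 in a 32-bit signed division still is, since its negation has a single bit set.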

Re: [PATCH] Fix ICE with masked stores (PR tree-optimization/78445)

2016-11-22 Thread Richard Biener
On Mon, Nov 21, 2016 at 8:25 PM, Jakub Jelinek  wrote:
> On Wed, Nov 16, 2016 at 09:14:57PM -0600, Bill Schmidt wrote:
>> 2016-11-16  Bill Schmidt  
>> Richard Biener  
>>
>>   PR tree-optimization/77848
>>   * tree-if-conv.c (tree_if_conversion): Always version loops unless
>>   the user specified -ftree-loop-if-convert.
>
> This broke the attached testcase.
>
>> --- gcc/tree-if-conv.c(revision 242521)
>> +++ gcc/tree-if-conv.c(working copy)
>> @@ -2803,10 +2803,12 @@ tree_if_conversion (struct loop *loop)
>> || loop->dont_vectorize))
>>  goto cleanup;
>>
>> -  /* Either version this loop, or if the pattern is right for outer-loop
>> - vectorization, version the outer loop.  In the latter case we will
>> - still if-convert the original inner loop.  */
>> -  if ((any_pred_load_store || any_complicated_phi)
>> +  /* Since we have no cost model, always version loops unless the user
>> + specified -ftree-loop-if-convert.  Either version this loop, or if
>> + the pattern is right for outer-loop vectorization, version the
>> + outer loop.  In the latter case we will still if-convert the
>> + original inner loop.  */
>> +  if (flag_tree_loop_if_convert != 1
>>&& !version_loop_for_if_conversion
>>(versionable_outer_loop_p (loop_outer (loop))
>> ? loop_outer (loop) : loop))
>
> If there are masked loads/stores (and I assume also the complicated phi
> stuff, but haven't verified), then it isn't just some kind of optimization
> to version the loop based on LOOP_VECTORIZED ifn, it is a requirement
> - MASK_LOAD/MASK_STORE aren't supported for scalar code, so they can only
> appear in the vectorized version.  Fixed by reverting that - if
> -ftree-loop-if-convert we'll do what we used to do before, without it
> we do what you've added, i.e. version always.
>
> The rest is just formatting fix, the too large argument that forces
> call's ( on the next line and even misindented is IMHO much cleaner
> if a temporary is used.
>
> Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

Ok.

Thanks,
Richard.

> 2016-11-21  Jakub Jelinek  
>
> PR tree-optimization/78445
> * tree-if-conv.c (tree_if_conversion): If any_pred_load_store or
> any_complicated_phi, version loop even if flag_tree_loop_if_convert is
> 1.  Formatting fix.
>
> * gcc.dg/pr78445.c: New test.
>
> --- gcc/tree-if-conv.c.jj   2016-11-17 18:08:12.0 +0100
> +++ gcc/tree-if-conv.c  2016-11-21 17:28:30.807242395 +0100
> @@ -2804,15 +2804,20 @@ tree_if_conversion (struct loop *loop)
>  goto cleanup;
>
>/* Since we have no cost model, always version loops unless the user
> - specified -ftree-loop-if-convert.  Either version this loop, or if
> - the pattern is right for outer-loop vectorization, version the
> - outer loop.  In the latter case we will still if-convert the
> - original inner loop.  */
> -  if (flag_tree_loop_if_convert != 1
> -  && !version_loop_for_if_conversion
> -  (versionable_outer_loop_p (loop_outer (loop))
> -   ? loop_outer (loop) : loop))
> -goto cleanup;
> + specified -ftree-loop-if-convert or unless versioning is required.
> + Either version this loop, or if the pattern is right for outer-loop
> + vectorization, version the outer loop.  In the latter case we will
> + still if-convert the original inner loop.  */
> +  if (any_pred_load_store
> +  || any_complicated_phi
> +  || flag_tree_loop_if_convert != 1)
> +{
> +  struct loop *vloop
> +   = (versionable_outer_loop_p (loop_outer (loop))
> +  ? loop_outer (loop) : loop);
> +  if (!version_loop_for_if_conversion (vloop))
> +   goto cleanup;
> +}
>
>/* Now all statements are if-convertible.  Combine all the basic
>   blocks into one huge basic block doing the if-conversion
> --- gcc/testsuite/gcc.dg/pr78445.c.jj   2016-11-21 17:30:58.534400256 +0100
> +++ gcc/testsuite/gcc.dg/pr78445.c  2016-11-21 17:30:41.0 +0100
> @@ -0,0 +1,19 @@
> +/* PR tree-optimization/78445 */
> +/* { dg-do compile } */
> +/* { dg-options "-O2 -ftree-loop-if-convert -ftree-vectorize" } */
> +/* { dg-additional-options "-mavx2" { target { i?86-*-* x86_64-*-* } } } */
> +
> +int a;
> +
> +void
> +foo (int x, int *y)
> +{
> +  while (a != 0)
> +if (x != 0)
> +  {
> +   *y = a;
> +   x = *y;
> +  }
> +else
> +  x = a;
> +}
>
>
> Jakub
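
To make the three versions of the versioning condition easier to compare, here is a small truth-table style sketch.  It is only a model of the "if" conditions quoted in this thread, not code from tree-if-conv.c; the function names are made up, and flag stands for flag_tree_loop_if_convert, where 1 is taken to mean that the user passed -ftree-loop-if-convert.

```c
#include <stdbool.h>

/* Original condition: version only when masked loads/stores or
   complicated PHIs make it necessary.  */
bool
versions_loop_original (bool any_pred_load_store, bool any_complicated_phi,
                        int flag)
{
  (void) flag;
  return any_pred_load_store || any_complicated_phi;
}

/* After the PR77848 change: version unless the user asked for
   if-conversion explicitly.  This drops the required case.  */
bool
versions_loop_pr77848 (bool any_pred_load_store, bool any_complicated_phi,
                       int flag)
{
  (void) any_pred_load_store;
  (void) any_complicated_phi;
  return flag != 1;
}

/* After the PR78445 fix: version when required, or when the user
   did not ask for if-conversion explicitly.  */
bool
versions_loop_pr78445 (bool any_pred_load_store, bool any_complicated_phi,
                       int flag)
{
  return any_pred_load_store || any_complicated_phi || flag != 1;
}
```

The failing combination is a predicated load/store together with an explicit -ftree-loop-if-convert: the middle version skips versioning there, which leaves MASK_LOAD/MASK_STORE in scalar code.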


Re: Rework subreg_get_info

2016-11-22 Thread Richard Sandiford
Eric Botcazou  writes:
>> Well, I think it's probably grammatical, but how about:
>> 
>>   If the register representation of a non-scalar mode has holes in it,
>>   we expect the scalar units to be concatenated together, with the holes
>>   distributed evenly among the scalar units.  Each scalar unit must occupy
>>   at least one register.
>
> Fine with me, thanks.
>
>> This actually was one of the more important changes :-)  In combination
>> with later patches, the idea is to move away from UNITS_PER_WORD tests
>> when endianness is regular (all big or all little) and only do them
>> when the distinction between bytes and words makes a real difference.
>> 
>> The specific motivating examples were SVE predicate registers, which
>> occupy VL*2 bytes for some runtime VL.  They are smaller than a word
>> when VL<4, word-sized when VL==4, and bigger than a word when VL>4.
>> We therefore can't calculate:
>> 
>>   GET_MODE_SIZE (ymode) > UNITS_PER_WORD
>> 
>> at compile time.  This is one of the patches that avoids forcing the
>> issue unless the answer really matters.
>
> So you plan to modify it again to remove the ysize > UNITS_PER_WORD
> test?  Or you don't really care about the REG_WORDS_BIG_ENDIAN !=
> WORDS_BIG_ENDIAN case?

Yeah, a later patch changed the form of the comparison so that it
forced the ordering wrt UNITS_PER_WORD to be known at compile time
(if the comparison was reached).  The idea is to do that kind of thing
only as a last resort, when there's a known reason for it to be safe.

> I understand that this formulation is intended to hide various combinations
> of WORDS_BIG_ENDIAN and BYTES_BIG_ENDIAN, but it's a net loss for most targets
> where the old code boils down to an unconditional assignment to info->offset.
> Would it be doable to use the same handling as in subreg_size_lowpart_offset?

Actually, thinking more about it: the assumption we're making in the
WORDS_BIG_ENDIAN != REG_WORDS_BIG_ENDIAN condition discussed below is
really:

  /* We assume that the ordering of registers within a multi-register
 value has a consistent endianness: if bytes and register words
 have different endianness, the hard registers that make up a
 multi-register value must be at least word-sized.  */

(quoted from the revised patch below).  And with that assumption this
check is simply REG_WORDS_BIG_ENDIAN vs. !REG_WORDS_BIG_ENDIAN.

(I've checked ports for mixed byte/word endianness (think that's
just pdp11) and word/reg-word endianness (think that's just c6x),
and the assumption does seem to hold.  I'd be very surprised if we
coped correctly with more exotic combinations, including for the
reasons quoted below.)

>> For WORDS_BIG_ENDIAN != REG_WORDS_BIG_ENDIAN targets?  In practice
>> the old code didn't handle the case in which a single word spans more
>> than one register: if xmode was bigger than a word, ymode was smaller
>> than a word, and the number of registers in a ymode was smaller than
>> the number of registers in a word, we would need to take "normal"
>> endianness into account to resolve the subword register offset while
>> using REG_WORDS_BIG_ENDIAN for the word component.  Instead the old
>> code reversed the endianness relative to the size of ymode, regardless of
>> whether ymode was bigger than a word or smaller than a word.  In other
>> words, the assumption seems to have been that REG_WORDS_BIG_ENDIAN is
>> effectively "endianness across multiple registers" and there is no need
>> to subdivide register offsets into words and subwords.
>> 
>> In practice that was OK, since AFAICT no target with WORDS_BIG_ENDIAN !=
>> REG_WORDS_BIG_ENDIAN had subword-sized registers.  This in turn means
>> that "block endianness" is always word endianness for these targets.
>
> Since you tested on c6x-elf, that's OK, but I think that a comment
> before the if (WORDS_BIG_ENDIAN != REG_WORDS_BIG_ENDIAN) test would be
> in order, stating the implicit assumption made at this point.

OK, how does this look?

Thanks,
Richard


gcc/
2016-11-15  Richard Sandiford  
Alan Hayward  
David Sherwood  

* rtlanal.c (subreg_get_info): Use more local variables.
Remark that for HARD_REGNO_NREGS_HAS_PADDING, each scalar unit
occupies at least one register.  Assume that full hard registers
have consistent endianness.  Share previously-duplicated if block.
Rework the main handling so that it operates on independently-
addressable YMODE-sized blocks.  Use subreg_size_lowpart_offset
to check lowpart offsets, without trying to find an equivalent
integer mode first.  Handle WORDS_BIG_ENDIAN != REG_WORDS_BIG_ENDIAN
as a final register-endianness correction.

diff --git a/gcc/rtlanal.c b/gcc/rtlanal.c
index d29a3fe..17dbb1e 100644
--- a/gcc/rtlanal.c
+++ b/gcc/rtlanal.c
@@ -3588,31 +3588,29 @@ subreg_get_info (unsigned int xregno, machine_mode xmode,
 unsigned int offset, machine_mode ymode,
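
A tiny illustration of the SVE predicate example quoted earlier in this message, showing why GET_MODE_SIZE (ymode) > UNITS_PER_WORD has no single compile-time answer there.  This is a sketch under assumptions: the function names are invented, and the word size of 8 bytes used in the checks is inferred from the statement that the predicate is word-sized when VL==4.

```c
/* Result of comparing a size against the word size.  */
enum size_order { SIZE_SMALLER = -1, SIZE_EQUAL = 0, SIZE_BIGGER = 1 };

/* Size in bytes of a predicate register for a given runtime VL,
   following the "VL*2 bytes" description above.  */
unsigned int
predicate_bytes (unsigned int vl)
{
  return vl * 2;
}

/* Order BYTES relative to UNITS_PER_WORD.  */
enum size_order
compare_with_word (unsigned int bytes, unsigned int units_per_word)
{
  if (bytes < units_per_word)
    return SIZE_SMALLER;
  if (bytes > units_per_word)
    return SIZE_BIGGER;
  return SIZE_EQUAL;
}
```

All three orderings occur for legitimate values of VL, so any code path that needs the ordering has to either avoid the comparison or have a reason why the answer is known.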
   

Tighten check for whether a sibcall references local variables

2016-11-22 Thread Richard Sandiford
This loop:

  /* Make sure the tail invocation of this function does not refer
 to local variables.  */
  FOR_EACH_LOCAL_DECL (cfun, idx, var)
{
  if (TREE_CODE (var) != PARM_DECL
  && auto_var_in_fn_p (var, cfun->decl)
  && (ref_maybe_used_by_stmt_p (call, var)
  || call_may_clobber_ref_p (call, var)))
return;
}

triggered even for local variables that are passed by value.
This meant that we didn't allow local aggregates to be passed
to a sibling call but did (for example) allow global aggregates
to be passed.

I think the loop is really checking for indirect references,
so should be able to skip any variables that never have their
address taken.

Tested on aarch64-linux-gnu and x86_64-linux-gnu.  OK to install?

Thanks,
Richard


gcc/
* tree-tailcall.c (find_tail_calls): Allow calls to reference
local variables if all references are known to be direct.

gcc/testsuite/
* gcc.dg/tree-ssa/tailcall-8.c: New test.

diff --git a/gcc/testsuite/gcc.dg/tree-ssa/tailcall-8.c b/gcc/testsuite/gcc.dg/tree-ssa/tailcall-8.c
new file mode 100644
index 000..ffeabe5
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/tailcall-8.c
@@ -0,0 +1,80 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-tree-tailc-details" } */
+
+struct s { int x; };
+void f_direct (struct s);
+void f_indirect (struct s *);
+void f_void (void);
+
+/* Tail call.  */
+void
+g1 (struct s param)
+{
+  f_direct (param);
+}
+
+/* Tail call.  */
+void
+g2 (struct s *param_ptr)
+{
+  f_direct (*param_ptr);
+}
+
+/* Tail call.  */
+void
+g3 (struct s *param_ptr)
+{
+  f_indirect (param_ptr);
+}
+
+/* Tail call.  */
+void
+g4 (struct s *param_ptr)
+{
+  f_indirect (param_ptr);
+  f_void ();
+}
+
+/* Tail call.  */
+void
+g5 (struct s param)
+{
+  struct s local = param;
+  f_direct (local);
+}
+
+/* Tail call.  */
+void
+g6 (struct s param)
+{
+  struct s local = param;
+  f_direct (local);
+  f_void ();
+}
+
+/* Not a tail call.  */
+void
+g7 (struct s param)
+{
+  struct s local = param;
+  f_indirect (&local);
+}
+
+/* Not a tail call.  */
+void
+g8 (struct s *param_ptr)
+{
+  struct s local = *param_ptr;
+  f_indirect (&local);
+}
+
+/* Not a tail call.  */
+void
+g9 (struct s *param_ptr)
+{
+  struct s local = *param_ptr;
+  f_indirect (&local);
+  f_void ();
+}
+
+/* { dg-final { scan-tree-dump-times "Found tail call" 6 "tailc" } } */
diff --git a/gcc/tree-tailcall.c b/gcc/tree-tailcall.c
index f97541d..66a0a4c 100644
--- a/gcc/tree-tailcall.c
+++ b/gcc/tree-tailcall.c
@@ -504,12 +504,14 @@ find_tail_calls (basic_block bb, struct tailcall **ret)
tail_recursion = true;
 }
 
-  /* Make sure the tail invocation of this function does not refer
- to local variables.  */
+  /* Make sure the tail invocation of this function does not indirectly
+ refer to local variables.  (Passing variables directly by value
+ is OK.)  */
   FOR_EACH_LOCAL_DECL (cfun, idx, var)
 {
   if (TREE_CODE (var) != PARM_DECL
  && auto_var_in_fn_p (var, cfun->decl)
+ && may_be_aliased (var)
  && (ref_maybe_used_by_stmt_p (call, var)
  || call_may_clobber_ref_p (call, var)))
return;
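
The effect of the extra may_be_aliased test can be modelled with a few booleans.  The sketch below is not GCC code: struct local_var_model and both functions are hypothetical, and call_may_touch stands for the combined result of ref_maybe_used_by_stmt_p and call_may_clobber_ref_p for the call in question.

```c
#include <stdbool.h>

/* Hypothetical summary of what the loop asks about one local decl.  */
struct local_var_model
{
  bool is_parm;         /* TREE_CODE (var) == PARM_DECL  */
  bool is_auto;         /* auto_var_in_fn_p (var, cfun->decl)  */
  bool may_be_aliased;  /* may_be_aliased (var), e.g. address taken  */
  bool call_may_touch;  /* call may use or clobber the variable  */
};

/* Condition before the patch: any touched automatic variable
   prevents the tail call.  */
bool
blocks_tail_call_old (const struct local_var_model *v)
{
  return !v->is_parm && v->is_auto && v->call_may_touch;
}

/* Condition after the patch: only variables that can be referenced
   indirectly prevent the tail call.  */
bool
blocks_tail_call_new (const struct local_var_model *v)
{
  return (!v->is_parm && v->is_auto && v->may_be_aliased
          && v->call_may_touch);
}
```

In terms of the new test, the local in g5 (passed by value, address never taken) blocks the tail call only under the old condition, while the local in g7 (passed as &local) blocks it under both.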



Re: Add more subreg offset helpers

2016-11-22 Thread Eric Botcazou
> 2016-11-15  Richard Sandiford  
>   Alan Hayward  
>   David Sherwood  
> 
>   * rtl.h (subreg_size_offset_from_lsb): Declare.
>   (subreg_offset_from_lsb): New function.
>   (subreg_size_lowpart_offset): Declare.
>   (subreg_lowpart_offset): Turn into an inline function.
>   (subreg_size_highpart_offset): Declare.
>   (subreg_highpart_offset): Turn into an inline function.
>   * emit-rtl.c (subreg_size_lowpart_offset): New function.
>   (subreg_size_highpart_offset): Likewise.
>   * rtlanal.c (subreg_size_offset_from_lsb): Likewise.

OK, thanks.

-- 
Eric Botcazou


Re: Rework subreg_get_info

2016-11-22 Thread Eric Botcazou
> Actually, thinking more about it: the assumption we're making in the
> WORDS_BIG_ENDIAN != REG_WORDS_BIG_ENDIAN condition discussed below is
> really:
> 
>   /* We assume that the ordering of registers within a multi-register
>  value has a consistent endianness: if bytes and register words
>  have different endianness, the hard registers that make up a
>  multi-register value must be at least word-sized.  */
> 
> (quoted from the revised patch below).  And with that assumption this
> check is simply REG_WORDS_BIG_ENDIAN vs. !REG_WORDS_BIG_ENDIAN.

In other words, this would break for an architecture with subword-sized 
registers and different byte endianness and register word endianness.

> (I've checked ports for mixed byte/word endianness (think that's
> just pdp11) and word/reg-word endianness (think that's just c6x),
> and the assumption does seem to hold.  I'd be very surprised if we
> coped correctly with more exotic combinations, including for the
> reasons quoted below.)

That seems sensible to me.

> 2016-11-15  Richard Sandiford  
>   Alan Hayward  
>   David Sherwood  
> 
>   * rtlanal.c (subreg_get_info): Use more local variables.
>   Remark that for HARD_REGNO_NREGS_HAS_PADDING, each scalar unit
>   occupies at least one register.  Assume that full hard registers
>   have consistent endianness.  Share previously-duplicated if block.
>   Rework the main handling so that it operates on independently-
>   addressable YMODE-sized blocks.  Use subreg_size_lowpart_offset
>   to check lowpart offsets, without trying to find an equivalent
>   integer mode first.  Handle WORDS_BIG_ENDIAN != REG_WORDS_BIG_ENDIAN
>   as a final register-endianness correction.

OK, thanks.

-- 
Eric Botcazou


[PATCH][ARM] PR target/78439: Update movdi constraints for Cortex-A8 tuning to handle LDRD/STRD

2016-11-22 Thread Kyrill Tkachov

Hi all,

This PR is an ICE while bootstrapping GCC with Cortex-A8 tuning, which we
also get from the default ARMv7-A tuning.
The ldrd/strd peepholes were recently made more aggressive and in this case
they transform:
(insn 13 33 40 2 (set (mem/c:SI (plus:SI (reg/f:SI 11 fp)
(const_int -28 [0xffe4])) [3 d.num_comps+0 S4 A64])
(reg:SI 12 ip [orig:117 _20 ] [117])) "cp-demangle.c":32 632 {*arm_movsi_vfp}
 (expr_list:REG_DEAD (reg:SI 12 ip [orig:117 _20 ] [117])
(nil)))
(insn 40 13 39 2 (set (mem/f/c:SI (plus:SI (reg/f:SI 11 fp)
(const_int -24 [0xffe8])) [2 d.subs+0 S4 A32])
(reg/f:SI 13 sp)) "cp-demangle.c":51 632 {*arm_movsi_vfp}
 (nil))

into:
(insn 68 33 39 2 (set (mem/c:DI (plus:SI (reg/f:SI 11 fp)
(const_int -28 [0xffe4])) [3 d.num_comps+0 S8 A64])
(reg:DI 12 ip)) "cp-demangle.c":51 -1
 (nil))

This is okay, but the *movdi_vfp_cortexa8 pattern doesn't deal with the IP
being the source of the store.  The reason is that when the LDRD/STRD
peepholes and machinery were introduced back in r197530, that revision
created the 'q' constraint, which should be used for the register operands
of the DImode stores and loads ('q' means CORE_REGS when LDRD/STRD is
enabled in ARM mode and GENERAL_REGS otherwise).  It updated the movdi_vfp
pattern to use 'q' in alternatives 4,5,6 but neglected to update the
Cortex-A8-specific pattern.  This is a sign that we should perhaps get rid
of this special-cased pattern at some point, but for now this simple patch
updates the appropriate alternatives to use the 'q' constraint so that
output_move_double can output the correct LDRD/STRD instruction.

Bootstrapped on arm-none-linux-gnueabihf with --with-arch=armv7-a, which
exercises this code (bootstrap currently fails without this patch), and
tested with /-mtune=cortex-a8.

Ok for trunk?

Thanks,
Kyrill

2016-11-22  Kyrylo Tkachov  

PR target/78439
* config/arm/vfp.md (*movdi_vfp_cortexa8): Use 'q' constraints for the
register operand in alternatives 4,5,6.

2016-11-22  Kyrylo Tkachov  

PR target/78439
* gcc.c-torture/compile/pr78439.c: New test.
commit 600526ea992fa58f87e6b0b4f821f4a2dfd0fa7a
Author: Kyrylo Tkachov 
Date:   Mon Nov 21 12:00:20 2016 +

[ARM] PR target/78439: Update movdi constraints for Cortex-A8 tuning to handle LDRD/STRD

diff --git a/gcc/config/arm/vfp.md b/gcc/config/arm/vfp.md
index 2051f10..ce56e16 100644
--- a/gcc/config/arm/vfp.md
+++ b/gcc/config/arm/vfp.md
@@ -355,8 +355,8 @@ (define_insn "*movdi_vfp"
 )
 
 (define_insn "*movdi_vfp_cortexa8"
-  [(set (match_operand:DI 0 "nonimmediate_di_operand" "=r,r,r,r,r,r,m,w,!r,w,w, Uv")
-   (match_operand:DI 1 "di_operand"  "r,rDa,Db,Dc,mi,mi,r,r,w,w,Uvi,w"))]
+  [(set (match_operand:DI 0 "nonimmediate_di_operand" "=r,r,r,r,q,q,m,w,!r,w,w, Uv")
+	(match_operand:DI 1 "di_operand"		"r,rDa,Db,Dc,mi,mi,q,r,w,w,Uvi,w"))]
   "TARGET_32BIT && TARGET_HARD_FLOAT && arm_tune == TARGET_CPU_cortexa8
 && (   register_operand (operands[0], DImode)
 || register_operand (operands[1], DImode))
diff --git a/gcc/testsuite/gcc.c-torture/compile/pr78439.c b/gcc/testsuite/gcc.c-torture/compile/pr78439.c
new file mode 100644
index 000..a8af86b
--- /dev/null
+++ b/gcc/testsuite/gcc.c-torture/compile/pr78439.c
@@ -0,0 +1,56 @@
+/* PR target/78439.  */
+
+enum demangle_component_type
+{
+  DEMANGLE_COMPONENT_THROW_SPEC
+};
+struct demangle_component
+{
+  enum demangle_component_type type;
+  struct
+  {
+struct
+{
+  struct demangle_component *left;
+  struct demangle_component *right;
+};
+  };
+};
+
+int a, b;
+
+struct d_info
+{
+  struct demangle_component *comps;
+  int next_comp;
+  int num_comps;
+  struct demangle_component *subs;
+  int num_subs;
+  int is_conversion;
+};
+
+void
+fn1 (int p1, struct d_info *p2)
+{
+  p2->num_comps = 2 * p1;
+  p2->next_comp = p2->num_subs = p1;
+  p2->is_conversion = 0;
+}
+
+int fn3 (int *);
+void fn4 (struct d_info *, int);
+
+void
+fn2 ()
+{
+  int c;
+  struct d_info d;
+  b = 0;
+  c = fn3 (&a);
+  fn1 (c, &d);
+  struct demangle_component e[d.num_comps];
+  struct demangle_component *f[d.num_subs];
+  d.comps = e;
+  d.subs = (struct demangle_component *) f;
+  fn4 (&d, 1);
+}


Re: [PATCH v2][PR libgfortran/78314] Fix ieee_support_halting

2016-11-22 Thread Szabolcs Nagy
On 21/11/16 14:16, FX wrote:
> Can you XFAIL the test on your platform, open a PR and assign it to me?

OK. Committed.

ARM and AArch64 may not support trapping so runtime and
compile time check can differ.

gcc/testsuite/
2016-11-22  Szabolcs Nagy  

PR libgfortran/78449
* gfortran.dg/ieee/ieee_8.f90 (aarch64*gnu, arm*gnu*): Mark xfail.

diff --git a/gcc/testsuite/gfortran.dg/ieee/ieee_8.f90 b/gcc/testsuite/gfortran.dg/ieee/ieee_8.f90
index 9806bcf..7d0cdfd 100644
--- a/gcc/testsuite/gfortran.dg/ieee/ieee_8.f90
+++ b/gcc/testsuite/gfortran.dg/ieee/ieee_8.f90
@@ -1,4 +1,5 @@
-! { dg-do run }
+! { dg-do run { xfail aarch64*-*-gnu arm*-*-gnueabi arm*-*-gnueabihf } }
+! XFAIL because of PR libfortran/78449.
 
 module foo
   use :: ieee_exceptions


Re: [PATCH 01/11] use rtx_insn * more places where it is obvious

2016-11-22 Thread Andreas Schwab
../../gcc/config/ia64/ia64.c:7141:13: error: 'void ia64_emit_insn_before(rtx, rtx)' declared 'static' but never defined [-Werror=unused-function]
 static void ia64_emit_insn_before (rtx, rtx);
 ^

Andreas.

-- 
Andreas Schwab, SUSE Labs, sch...@suse.de
GPG Key fingerprint = 0196 BAD8 1CE9 1970 F4BE  1748 E4D4 88E3 0EEA B9D7
"And now for something completely different."


Re: Fix PR78154

2016-11-22 Thread Prathamesh Kulkarni
On 21 November 2016 at 15:34, Richard Biener  wrote:
> On Fri, 18 Nov 2016, Prathamesh Kulkarni wrote:
>
>> On 17 November 2016 at 15:24, Richard Biener  wrote:
>> > On Thu, 17 Nov 2016, Prathamesh Kulkarni wrote:
>> >
>> >> On 17 November 2016 at 14:21, Richard Biener  wrote:
>> >> > On Thu, 17 Nov 2016, Prathamesh Kulkarni wrote:
>> >> >
>> >> >> Hi Richard,
>> >> >> Following your suggestion in PR78154, the patch checks if stmt
>> >> >> contains call to memmove (and friends) in gimple_stmt_nonzero_warnv_p
>> >> >> and returns true in that case.
>> >> >>
>> >> >> Bootstrapped+tested on x86_64-unknown-linux-gnu.
>> >> >> Cross-testing on arm*-*-*, aarch64*-*-* in progress.
>> >> >> Would it be OK to commit this patch in stage-3 ?
>> >> >
>> >> > As people noted we have returns_nonnull for this and that is already
>> >> > checked.  So please make sure the builtins get this attribute instead.
>> >> OK thanks, I will add the returns_nonnull attribute to the required
>> >> string builtins.
>> >> I noticed some of the string builtins don't have RET1 in builtins.def:
>> >> strcat, strncpy, strncat have ATTR_NOTHROW_NONNULL_LEAF.
>> >> Should they instead be having ATTR_RET1_NOTHROW_NONNULL_LEAF similar
>> >> to entries for memmove, strcpy ?
>> >
>> > Yes, I think so.
>> Hi,
>> In the attached patch I added returns_nonnull attribute to
>> ATTR_RET1_NOTHROW_NONNULL_LEAF,
>> and changed few builtins like strcat, strncpy, strncat and
>> corresponding _chk builtins to use ATTR_RET1_NOTHROW_NONNULL_LEAF.
>> Does the patch look correct ?
>
> Hmm, given you only change ATTR_RET1_NOTHROW_NONNULL_LEAF means that
> the gimple_stmt_nonzero_warnv_p code is incomplete -- it should
> infer returns_nonnull itself from RET1 (which is fnspec("1") basically)
> and the nonnull attribute on the argument.  So
>
>   unsigned rf = gimple_call_return_flags (stmt);
>   if (rf & ERF_RETURNS_ARG)
>{
>  tree arg = gimple_call_arg (stmt, rf & ERF_RETURN_ARG_MASK);
>  if (range of arg is ! VARYING)
>use range of arg;
>  else if (infer_nonnull_range_by_attribute (stmt, arg))
> ... nonnull ...
>
Hi,
Thanks for the suggestions, modified gimple_stmt_nonzero_warnv_p
accordingly in this version.
For functions like stpcpy that return nonnull but not one of their
arguments, I added new enum ATTR_RETNONNULL_NOTHROW_LEAF.
Is that OK ?
Bootstrapped+tested on x86_64-unknown-linux-gnu.
Cross-testing on arm*-*-*, aarch64*-*-* in progress.

Thanks,
Prathamesh
> Richard.
>
>> Thanks,
>> Prathamesh
>> >
>> > Richard.
>>
>
> --
> Richard Biener 
> SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Graham Norton, HRB 21284 (AG Nuernberg)
2016-11-22  Richard Biener  
Prathamesh Kulkarni  

* tree-vrp.c (gimple_stmt_nonzero_warnv_p): Return true if function
returns its argument and the argument is nonnull.
* builtin-attrs.def: Define ATTR_RETURNS_NONNULL,
ATTR_RETNONNULL_NOTHROW_LEAF.
* builtins.def (BUILT_IN_MEMPCPY): Change attribute to
ATTR_RETNONNULL_NOTHROW_LEAF.
(BUILT_IN_STPCPY): Likewise.
(BUILT_IN_STPNCPY): Likewise.
(BUILT_IN_MEMPCPY_CHK): Likewise.
(BUILT_IN_STPCPY_CHK): Likewise.
(BUILT_IN_STPNCPY_CHK): Likewise.
(BUILT_IN_STRCAT): Change attribute to ATTR_RET1_NOTHROW_NONNULL_LEAF.
(BUILT_IN_STRNCAT): Likewise.
(BUILT_IN_STRNCPY): Likewise.
(BUILT_IN_MEMSET_CHK): Likewise.
(BUILT_IN_STRCAT_CHK): Likewise.
(BUILT_IN_STRCPY_CHK): Likewise.
(BUILT_IN_STRNCAT_CHK): Likewise.
(BUILT_IN_STRNCPY_CHK): Likewise.

testsuite/
* gcc.dg/tree-ssa/pr78154.c: New test.
diff --git a/gcc/builtin-attrs.def b/gcc/builtin-attrs.def
index 8dc59c9..94d0c62 100644
--- a/gcc/builtin-attrs.def
+++ b/gcc/builtin-attrs.def
@@ -108,6 +108,7 @@ DEF_ATTR_IDENT (ATTR_TYPEGENERIC, "type generic")
 DEF_ATTR_IDENT (ATTR_TM_REGPARM, "*tm regparm")
 DEF_ATTR_IDENT (ATTR_TM_TMPURE, "transaction_pure")
 DEF_ATTR_IDENT (ATTR_RETURNS_TWICE, "returns_twice")
+DEF_ATTR_IDENT (ATTR_RETURNS_NONNULL, "returns_nonnull")
 
 DEF_ATTR_TREE_LIST (ATTR_NOVOPS_LIST, ATTR_NOVOPS, ATTR_NULL, ATTR_NULL)
 
@@ -197,6 +198,9 @@ DEF_ATTR_TREE_LIST (ATTR_CONST_NOTHROW_NONNULL, ATTR_CONST, 
ATTR_NULL, \
and which return their first argument.  */
 DEF_ATTR_TREE_LIST (ATTR_RET1_NOTHROW_NONNULL_LEAF, ATTR_FNSPEC, 
ATTR_LIST_STR1, \
ATTR_NOTHROW_NONNULL_LEAF)
+/* Nothrow leaf functions whose return value is nonnull.  */
+DEF_ATTR_TREE_LIST (ATTR_RETNONNULL_NOTHROW_LEAF, ATTR_RETURNS_NONNULL, 
ATTR_NULL, \
+   ATTR_NOTHROW_LEAF_LIST)
 /* Nothrow const leaf functions whose pointer parameter(s) are all nonnull.  */
 DEF_ATTR_TREE_LIST (ATTR_CONST_NOTHROW_NONNULL_LEAF, ATTR_CONST, ATTR_NULL, \
ATTR_NOTHROW_NONNULL_LEAF)

diff --git a/gcc/builtins.def b/gcc/builtins.def
index 219feeb..82c987d 100644
--- a/gcc/builtins.def
+++ b/gcc/builtins

Re: [www-patch] Document new -Wshadow= variants in gcc-7/changes.html

2016-11-22 Thread Mark Wielaard
On Mon, 2016-11-21 at 15:10 +0100, Gerald Pfeifer wrote:
> On Mon, 21 Nov 2016, Mark Wielaard wrote:
> Index: htdocs/gcc-7/changes.html
> ===
> +The -Wshadow warning has been split into 3
> 
> I believe for small numbers one usually spells them out ("three"
> instead of "3").

Fixed.

> +type is compatible (in C++ compatible means that the type of the
> +shadowing variable can be converted to that of the shadowed variable).
> +
> +The following example shows the different kinds of shadow
> +warnings:
> 
> Take care, what looks like a paragraph between the two above will
> just show up as a blank when rendered as HTML.  If you want to
> retain the paragraph, use 

Added ... around the paragraphs.

Pushed with those changes.

Thanks,

Mark


Re: [PATCH, GCC/ARM] Fix PR77904: callee-saved register trashed when clobbering sp

2016-11-22 Thread Thomas Preudhomme

On 17/11/16 09:11, Kyrill Tkachov wrote:


On 17/11/16 08:56, Thomas Preudhomme wrote:

On 16/11/16 10:30, Kyrill Tkachov wrote:

Hi Thomas,

On 03/11/16 16:52, Thomas Preudhomme wrote:

Hi,

When using a callee-saved register to save the frame pointer the Thumb-1
prologue fails to save the callee-saved register before that. For ARM and
Thumb-2 targets the frame pointer is handled as a special case but nothing is
done for Thumb-1 targets. This patch adds the same logic for Thumb-1 targets.
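
As an illustration of the logic described above, here is a minimal sketch of
a save-mask computation in plain C. It is not the arm.c code: the toy_* names
are hypothetical, and it assumes that r7 is the Thumb-1 hard frame pointer and
that r4-r11 are the callee-saved core registers.

```c
#include <assert.h>
#include <stdbool.h>

/* Assumption: r7 is the Thumb-1 hard frame pointer.  */
#define TOY_HARD_FRAME_POINTER_REGNUM 7

/* Assumption: r4-r11 are callee-saved.  */
static bool
toy_callee_saved_p (int reg)
{
  return reg >= 4 && reg <= 11;
}

/* LIVE_REGS has bit N set if register N is ever live in the function.
   Return the mask of registers the prologue has to save.  */
static unsigned
toy_compute_save_reg_mask (unsigned live_regs, bool frame_pointer_needed)
{
  unsigned mask = 0;

  /* Only r0-r15 exist in this model.  */
  assert (live_regs < (1u << 16));

  for (int reg = 0; reg < 12; reg++)
    if ((live_regs & (1u << reg)) && toy_callee_saved_p (reg))
      mask |= 1u << reg;

  /* The special case: save the frame pointer register even if the
     liveness information does not report it as live.  */
  if (frame_pointer_needed)
    mask |= 1u << TOY_HARD_FRAME_POINTER_REGNUM;

  return mask;
}
```

Without the last step, a function that needs a frame pointer but never marks
r7 as live would leave r7 out of the mask, which is the corruption the PR
describes.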

ChangeLog entries are as follows:

*** gcc/ChangeLog ***

2016-11-02  Thomas Preud'homme 

PR target/77904
* config/arm/arm.c (thumb1_compute_save_reg_mask): mark frame pointer
in save register mask if it is needed.



s/mark/Mark/



*** gcc/testsuite/ChangeLog ***

2016-11-02  Thomas Preud'homme 

PR target/77904
* gcc.target/arm/pr77904.c: New test.


Testing: Testsuite shows no regression when run with arm-none-eabi GCC
cross-compiler for Cortex-M0 target.

Is this ok for trunk?



I'd ask for a bootstrap, but this code is Thumb-1 only so it wouldn't affect
anything.


I can bootstrap for armv4t with --with-mode=thumb which would at least
exercise the path. I'll try such a bootstrap on qemu.



If you can get it to work, then yes please.


Bootstrap came back clean so I've committed the patch (r242693). Thanks!

Best regards,

Thomas


Re: Tighten check for whether a sibcall references local variables

2016-11-22 Thread Richard Biener
On Tue, Nov 22, 2016 at 10:00 AM, Richard Sandiford
 wrote:
> This loop:
>
>   /* Make sure the tail invocation of this function does not refer
>  to local variables.  */
>   FOR_EACH_LOCAL_DECL (cfun, idx, var)
> {
>   if (TREE_CODE (var) != PARM_DECL
>   && auto_var_in_fn_p (var, cfun->decl)
>   && (ref_maybe_used_by_stmt_p (call, var)
>   || call_may_clobber_ref_p (call, var)))
> return;
> }
>
> triggered even for local variables that are passed by value.
> This meant that we didn't allow local aggregates to be passed
> to a sibling call but did (for example) allow global aggregates
> to be passed.
>
> I think the loop is really checking for indirect references,
> so should be able to skip any variables that never have their
> address taken.
>
> Tested on aarch64-linux-gnu and x86_64-linux-gnu.  OK to install?

Ok.  We've had various correctness issues in this area in the past, so
I'd prefer that you rewrite your dg-do compile tests as dg-do run tests
that verify the code works correctly.  I suggest using dg-additional-sources
with a separate TU for the execution driver.

Thanks,
Richard.

> Thanks,
> Richard
>
>
> gcc/
> * tree-tailcall.c (find_tail_calls): Allow calls to reference
> local variables if all references are known to be direct.
>
> gcc/testsuite/
> * gcc.dg/tree-ssa/tailcall-8.c: New test.
>
> diff --git a/gcc/testsuite/gcc.dg/tree-ssa/tailcall-8.c 
> b/gcc/testsuite/gcc.dg/tree-ssa/tailcall-8.c
> new file mode 100644
> index 000..ffeabe5
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/tree-ssa/tailcall-8.c
> @@ -0,0 +1,80 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O2 -fdump-tree-tailc-details" } */
> +
> +struct s { int x; };
> +void f_direct (struct s);
> +void f_indirect (struct s *);
> +void f_void (void);
> +
> +/* Tail call.  */
> +void
> +g1 (struct s param)
> +{
> +  f_direct (param);
> +}
> +
> +/* Tail call.  */
> +void
> +g2 (struct s *param_ptr)
> +{
> +  f_direct (*param_ptr);
> +}
> +
> +/* Tail call.  */
> +void
> +g3 (struct s *param_ptr)
> +{
> +  f_indirect (param_ptr);
> +}
> +
> +/* Tail call.  */
> +void
> +g4 (struct s *param_ptr)
> +{
> +  f_indirect (param_ptr);
> +  f_void ();
> +}
> +
> +/* Tail call.  */
> +void
> +g5 (struct s param)
> +{
> +  struct s local = param;
> +  f_direct (local);
> +}
> +
> +/* Tail call.  */
> +void
> +g6 (struct s param)
> +{
> +  struct s local = param;
> +  f_direct (local);
> +  f_void ();
> +}
> +
> +/* Not a tail call.  */
> +void
> +g7 (struct s param)
> +{
> +  struct s local = param;
> +  f_indirect (&local);
> +}
> +
> +/* Not a tail call.  */
> +void
> +g8 (struct s *param_ptr)
> +{
> +  struct s local = *param_ptr;
> +  f_indirect (&local);
> +}
> +
> +/* Not a tail call.  */
> +void
> +g9 (struct s *param_ptr)
> +{
> +  struct s local = *param_ptr;
> +  f_indirect (&local);
> +  f_void ();
> +}
> +
> +/* { dg-final { scan-tree-dump-times "Found tail call" 6 "tailc" } } */
> diff --git a/gcc/tree-tailcall.c b/gcc/tree-tailcall.c
> index f97541d..66a0a4c 100644
> --- a/gcc/tree-tailcall.c
> +++ b/gcc/tree-tailcall.c
> @@ -504,12 +504,14 @@ find_tail_calls (basic_block bb, struct tailcall **ret)
> tail_recursion = true;
>  }
>
> -  /* Make sure the tail invocation of this function does not refer
> - to local variables.  */
> +  /* Make sure the tail invocation of this function does not indirectly
> + refer to local variables.  (Passing variables directly by value
> + is OK.)  */
>FOR_EACH_LOCAL_DECL (cfun, idx, var)
>  {
>if (TREE_CODE (var) != PARM_DECL
>   && auto_var_in_fn_p (var, cfun->decl)
> + && may_be_aliased (var)
>   && (ref_maybe_used_by_stmt_p (call, var)
>   || call_may_clobber_ref_p (call, var)))
> return;
>


[arm-embedded] [PATCH, GCC/ARM, ping] Fix PR77904: callee-saved register trashed when clobbering sp

2016-11-22 Thread Thomas Preudhomme

Hi,

We have decided to backport this patch to fix callee-saved register corruption 
when clobbering sp to our embedded-6-branch.


*** gcc/ChangeLog.arm ***

PR target/77904
* config/arm/arm.c (thumb1_compute_save_reg_mask): Mark frame pointer
in save register mask if it is needed.


*** gcc/testsuite/ChangeLog.arm ***

PR target/77904
* gcc.target/arm/pr77904.c: New test.


Best regards,

Thomas
--- Begin Message ---

Ping?

Best regards,

Thomas

On 03/11/16 16:52, Thomas Preudhomme wrote:

Hi,

When using a callee-saved register to save the frame pointer the Thumb-1
prologue fails to save the callee-saved register before that. For ARM and
Thumb-2 targets the frame pointer is handled as a special case but nothing is
done for Thumb-1 targets. This patch adds the same logic for Thumb-1 targets.

ChangeLog entries are as follows:

*** gcc/ChangeLog ***

2016-11-02  Thomas Preud'homme  

PR target/77904
* config/arm/arm.c (thumb1_compute_save_reg_mask): mark frame pointer
in save register mask if it is needed.


*** gcc/testsuite/ChangeLog ***

2016-11-02  Thomas Preud'homme  

PR target/77904
* gcc.target/arm/pr77904.c: New test.


Testing: Testsuite shows no regression when run with arm-none-eabi GCC
cross-compiler for Cortex-M0 target.

Is this ok for trunk?

Best regards,

Thomas
diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
index dd8d5e5db8ca50daab648e58df290969aa794862..c7bf3320a3db5dfc4f33ae145ff2e5f239d6c0f9 100644
--- a/gcc/config/arm/arm.c
+++ b/gcc/config/arm/arm.c
@@ -19495,6 +19495,10 @@ thumb1_compute_save_reg_mask (void)
 if (df_regs_ever_live_p (reg) && callee_saved_reg_p (reg))
   mask |= 1 << reg;
 
+  /* Handle the frame pointer as a special case.  */
+  if (frame_pointer_needed)
+mask |= 1 << HARD_FRAME_POINTER_REGNUM;
+
   if (flag_pic
   && !TARGET_SINGLE_PIC_BASE
   && arm_pic_register != INVALID_REGNUM
diff --git a/gcc/testsuite/gcc.target/arm/pr77904.c b/gcc/testsuite/gcc.target/arm/pr77904.c
new file mode 100644
index ..76728c07e73350ce44160cabff3dd2fa7a6ef021
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/pr77904.c
@@ -0,0 +1,45 @@
+/* { dg-do run } */
+/* { dg-options "-O2" } */
+
+__attribute__ ((noinline, noclone)) void
+clobber_sp (void)
+{
+  __asm volatile ("" : : : "sp");
+}
+
+int
+main (void)
+{
+  int ret;
+
+  __asm volatile ("mov\tr4, #0xf4\n\t"
+		  "mov\tr5, #0xf5\n\t"
+		  "mov\tr6, #0xf6\n\t"
+		  "mov\tr7, #0xf7\n\t"
+		  "mov\tr0, #0xf8\n\t"
+		  "mov\tr8, r0\n\t"
+		  "mov\tr0, #0xfa\n\t"
+		  "mov\tr10, r0"
+		  : : : "r0", "r4", "r5", "r6", "r7", "r8", "r10");
+  clobber_sp ();
+
+  __asm volatile ("cmp\tr4, #0xf4\n\t"
+		  "bne\tfail\n\t"
+		  "cmp\tr5, #0xf5\n\t"
+		  "bne\tfail\n\t"
+		  "cmp\tr6, #0xf6\n\t"
+		  "bne\tfail\n\t"
+		  "cmp\tr7, #0xf7\n\t"
+		  "bne\tfail\n\t"
+		  "mov\tr0, r8\n\t"
+		  "cmp\tr0, #0xf8\n\t"
+		  "bne\tfail\n\t"
+		  "mov\tr0, r10\n\t"
+		  "cmp\tr0, #0xfa\n\t"
+		  "bne\tfail\n\t"
+		  "mov\t%0, #1\n"
+		  "fail:\n\t"
+		  "sub\tr0, #1"
+		  : "=r" (ret) : :);
+  return ret;
+}
--- End Message ---


Re: [RFC][PATCH] Speed-up use-after-scope (re-writing to SSA)

2016-11-22 Thread Martin Liška
On 11/16/2016 05:28 PM, Jakub Jelinek wrote:
> On Wed, Nov 16, 2016 at 05:01:31PM +0100, Martin Liška wrote:
>> +  use_operand_p use_p;
>> +  imm_use_iterator imm_iter;
>> +  FOR_EACH_IMM_USE_FAST (use_p, imm_iter, poisoned_var)
>> +{
>> +  gimple *use = USE_STMT (use_p);
>> +  if (is_gimple_debug (use))
>> +continue;
>> +
>> +  built_in_function b = (recover_p
>> + ? BUILT_IN_ASAN_REPORT_USE_AFTER_SCOPE_NOABORT
>> + : BUILT_IN_ASAN_REPORT_USE_AFTER_SCOPE);
>> +  tree fun = builtin_decl_implicit (b);
>> +  pretty_printer pp;
>> +  pp_tree_identifier (&pp, DECL_NAME (var_decl));
>> +
>> +  gcall *call = gimple_build_call (fun, 2, asan_pp_string (&pp),
>> +   DECL_SIZE_UNIT (var_decl));
>> +  gimple_set_location (call, gimple_location (use));
>> +
>> +  /* The USE can be a gimple PHI node.  If so, insert the call on
>> + all edges leading to the PHI node.  */
>> +  if (is_a <gphi *> (use))
>> +{
>> +  gphi * phi = dyn_cast<gphi *> (use);
> 
> No space after *.

Done.

> 
>> +  for (unsigned i = 0; i < gimple_phi_num_args (phi); ++i)
>> +if (gimple_phi_arg_def (phi, i) == poisoned_var)
>> +  {
>> +edge e = gimple_phi_arg_edge (phi, i);
>> +gsi_insert_seq_on_edge (e, call);
>> +*need_commit_edge_insert = true;
> 
> You clearly don't have a sufficient testsuite coverage for this,
> because this won't really work if you have more than one phi
> argument equal to poisoned_var.  Inserting the same gimple stmt
> into multiple places can't really work.  I bet you want to set
> call to NULL after the gsi_insert_seq_on_edge and before that
> call if (call == NULL) { call = gimple_build_call (...); gimple_set_location 
> (...); }
> Or maybe gimple_copy for the 2nd etc. would work too, dunno.

I see, fixed by using gimple_copy functionality.
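
To illustrate why the same statement cannot be inserted in several places,
here is a minimal sketch with an intrusive list. The toy_* types and functions
are hypothetical stand-ins, not GCC's gimple API; the copy step plays the role
that gimple_copy plays in the fix.

```c
#include <assert.h>
#include <stdlib.h>

struct toy_seq;

/* A statement is linked into at most one sequence through NEXT.  */
struct toy_stmt
{
  int code;
  struct toy_seq *owner;
  struct toy_stmt *next;
};

struct toy_seq
{
  struct toy_stmt *head;
};

static void
toy_seq_push (struct toy_seq *seq, struct toy_stmt *stmt)
{
  /* Pushing a statement that is already in a sequence would corrupt
     the links of the sequence it is in.  */
  assert (stmt->owner == NULL);
  stmt->owner = seq;
  stmt->next = seq->head;
  seq->head = stmt;
}

/* Return a fresh, unlinked copy of STMT.  The caller owns the copy.  */
static struct toy_stmt *
toy_stmt_copy (const struct toy_stmt *stmt)
{
  struct toy_stmt *copy = malloc (sizeof *copy);
  assert (copy != NULL);
  copy->code = stmt->code;
  copy->owner = NULL;
  copy->next = NULL;
  return copy;
}

/* Insert CALL on each of the N_EDGES edge sequences: the original on
   the first one, a fresh copy on every later one.  */
static void
toy_insert_on_edges (struct toy_seq *edges, int n_edges,
                     struct toy_stmt *call)
{
  for (int i = 0; i < n_edges; i++)
    toy_seq_push (&edges[i], i == 0 ? call : toy_stmt_copy (call));
}
```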

> 
>> +  }
>> +}
>> +  else
>> +{
>> +  gimple_stmt_iterator gsi = gsi_for_stmt (use);
>> +  gsi_insert_before (&gsi, call, GSI_NEW_STMT);
>> +}
>> +}
>> +
>> +  gimple *nop = gimple_build_nop ();
>> +  SSA_NAME_IS_DEFAULT_DEF (poisoned_var) = true;
>> +  SSA_NAME_DEF_STMT (poisoned_var) = nop;
>> +  gsi_replace (iter, nop, GSI_NEW_STMT);
> 
> The last argument of gsi_replace is a bool, not GSI_*.
> But not sure how this will work anyway, I think SSA_NAME_IS_DEFAULT_DEF
> are supposed to have SSA_NAME_DEF_STMT a GIMPLE_NOP that doesn't
> have bb set, while you are putting it into the stmt sequence.
> Shouldn't you just gsi_remove iter instead?

gsi_remove alone does not work, as the SSA name would lose its defining
statement. However, setting SSA_NAME_DEF_STMT (poisoned_var) =
gimple_build_nop () and removing the stmt works fine. I didn't know that
the nop can't belong to a BB.

Maybe we can add a verifier for that?

diff --git a/gcc/tree-ssa.c b/gcc/tree-ssa.c
index 2d9c62d..8fd4e91 100644
--- a/gcc/tree-ssa.c
+++ b/gcc/tree-ssa.c
@@ -767,6 +767,15 @@ verify_ssa_name (tree ssa_name, bool is_virtual)
   return true;
 }
 
+  if (SSA_NAME_IS_DEFAULT_DEF (ssa_name)
+  && gimple_nop_p (SSA_NAME_DEF_STMT (ssa_name))
+  && gimple_bb (SSA_NAME_DEF_STMT (ssa_name)) != NULL)
+{
+  error ("defining statement of a default name can't belong to a basic "
+"block");
+  return true;
+}
+
   return false;
 }

That can be eventually done independently.

> 
> Otherwise LGTM, but please post the asan patch to llvm-commits
> or through their web review interface.

Good, I'm going to insert the patch to the tool.

Thanks,
Martin

> 
>   Jakub
> 

From da190584d091eaaa509067918de4f1f77e887484 Mon Sep 17 00:00:00 2001
From: marxin 
Date: Mon, 14 Nov 2016 16:49:05 +0100
Subject: [PATCH] use-after-scope: introduce ASAN_POISON internal fn

gcc/ChangeLog:

2016-11-16  Martin Liska  

	* asan.c (asan_expand_poison_ifn): New function.
	* asan.h (asan_expand_poison_ifn): Declare the function.
	* internal-fn.c (expand_ASAN_POISON): New function.
	* internal-fn.def (ASAN_POISON): New internal fn.
	* sanitizer.def (BUILT_IN_ASAN_REPORT_USE_AFTER_SCOPE_NOABORT):
	New built-in.
	(BUILT_IN_ASAN_REPORT_USE_AFTER_SCOPE): Likewise.
	* sanopt.c (pass_sanopt::execute): Expand IFN_ASAN_POISON.
	* tree-ssa.c (is_asan_mark_p): New function.
	(execute_update_addresses_taken): Mark local variables as not
	addressable if the address of these variables is only taken by
	ASAN_MARK.

gcc/testsuite/ChangeLog:

2016-11-16  Martin Liska  

	* gcc.dg/asan/use-after-scope-3.c: Run just with -O0.
	* gcc.dg/asan/use-after-scope-9.c: Run just with -O2 and
	change expected output.
---
 gcc/asan.c| 77 ++-
 gcc/asan.h|  1 +
 gcc/internal-fn.c |  7 +++
 gcc/internal-fn.def   |  1 +
 gcc/sanitizer.def |  8 +++
 gcc/sanopt.c  

Re: [PATCH PR78114]Refine gfortran.dg/vect/fast-math-mgrid-resid.f

2016-11-22 Thread Michael Matz
Hi,

On Mon, 21 Nov 2016, Richard Biener wrote:

> >> Btw, this probably means that on haswell (or other archs with vf==4) mgrid
> >> is slower than necessary.  On mgrid you really really want predictive
> >> commoning to happen.  Vectorization isn't that interesting there.
> > Interesting, I will check if there is difference between 2/4 vf.  we
> > do have cases that smaller vf is better and should be chosen, though
> > for different reasons.
> 
> At some time in the past we had predictive commoning done before 
> vectorization (GCC 4.3 at least).

Not relevant here.  The testcase and patch associated with it came for 4.5 
where predcom was after vectorization already and the latter disabled the 
former (and mgrid had a 40% performance hit).  PR41783 and 
https://gcc.gnu.org/ml/gcc-patches/2010-01/msg00855.html FWIW.


Ciao,
Michael.


Re: [PATCH][ARM] PR target/78439: Update movdi constraints for Cortex-A8 tuning to handle LDRD/STRD

2016-11-22 Thread Ramana Radhakrishnan
On Tue, Nov 22, 2016 at 9:57 AM, Kyrill Tkachov
 wrote:
> Hi all,
>
> This PR is an ICE while bootstrapping GCC with Cortex-A8 tuning, which we
> also get from the default ARMv7-A tuning.
> The ldrd/strd peepholes were recently made more aggressive and in this case
> they transform:
> (insn 13 33 40 2 (set (mem/c:SI (plus:SI (reg/f:SI 11 fp)
> (const_int -28 [0xffe4])) [3 d.num_comps+0 S4
> A64])
> (reg:SI 12 ip [orig:117 _20 ] [117])) "cp-demangle.c":32 632
> {*arm_movsi_vfp}
>  (expr_list:REG_DEAD (reg:SI 12 ip [orig:117 _20 ] [117])
> (nil)))
> (insn 40 13 39 2 (set (mem/f/c:SI (plus:SI (reg/f:SI 11 fp)
> (const_int -24 [0xffe8])) [2 d.subs+0 S4 A32])
> (reg/f:SI 13 sp)) "cp-demangle.c":51 632 {*arm_movsi_vfp}
>  (nil))
>
> into:
> (insn 68 33 39 2 (set (mem/c:DI (plus:SI (reg/f:SI 11 fp)
> (const_int -28 [0xffe4])) [3 d.num_comps+0 S8
> A64])
> (reg:DI 12 ip)) "cp-demangle.c":51 -1
>  (nil))
>
> This is okay, but the *movdi_vfp_cortexa8 pattern doesn't deal with the IP
> being the source
> of the store. The reason is that when the LDRD/STRD peepholes and machinery
> was introduced back in r197530
> it created the 'q' constraint which should be used for the register operands
> of the DImode stores and loads
> ('q' means CORE_REGS when LDRD/STRD is enabled in ARM mode and GENERAL_REGS
> otherwise). That revision
> updated the movdi_vfp pattern to use it in alternatives 4,5,6 but neglected
> to update the Cortex-A8-specific
> pattern. This is a sign that we should perhaps get rid of this special-cased
> pattern at some point, but for now
> this simple patch updates the appropriate alternatives to use the 'q'
> constraint so that output_move_double
> can output the correct LDRD/STRD instruction.
>
> Bootstrapped on arm-none-linux-gnueabihf with --with-arch=armv7-a that
> exercises this code (bootstrap currently fails
> without this patch) and tested with /-mtune=cortex-a8.
>
> Ok for trunk?

Ok.

Ramana
>
> Thanks,
> Kyrill
>
> 2016-11-22  Kyrylo Tkachov  
>
> PR target/78439
> * config/arm/vfp.md (*movdi_vfp_cortexa8): Use 'q' constraints for the
> register operand in alternatives 4,5,6.
>
> 2016-11-22  Kyrylo Tkachov  
>
> PR target/78439
> * gcc.c-torture/compile/pr78439.c: New test.


[Patch, Fortran, OOP] PR 78443: Incorrect behavior with non_overridable keyword

2016-11-22 Thread Janus Weil
Hi all,

here is a patch for a wrong-code problem with non_overridable
type-bound procedures. For details see the PR. Regtests cleanly. Ok
for trunk?

Since the patch is very simple and it fixes wrong code which can
silently give bad runtime results, I think backporting to the release
branches might be a good idea as well. Ok?

Cheers,
Janus


2016-11-22  Janus Weil  

PR fortran/78443
* class.c (add_proc_comp): Add a vtype component for non-overridable
procedures that are overriding.

2016-11-22  Janus Weil  

PR fortran/78443
* gfortran.dg/typebound_proc_35.f90: New test case.
Index: gcc/fortran/class.c
===
--- gcc/fortran/class.c (Revision 242657)
+++ gcc/fortran/class.c (Arbeitskopie)
@@ -751,7 +751,7 @@ add_proc_comp (gfc_symbol *vtype, const char *name
 {
   gfc_component *c;
 
-  if (tb->non_overridable)
+  if (tb->non_overridable && !tb->overridden)
 return;
 
   c = gfc_find_component (vtype, name, true, true, NULL);
! { dg-do run }
!
! PR 78443: [OOP] Incorrect behavior with non_overridable keyword
!
! Contributed by federico 

module types
implicit none


! Abstract parent class and its child type
type, abstract :: P1
contains
procedure :: test => test1
procedure (square_interface), deferred :: square
endtype

! Deferred procedure interface
abstract interface
function square_interface( this, x ) result( y )
   import P1
   class(P1) :: this
   real :: x, y
end function square_interface
end interface

type, extends(P1) :: C1
contains
   procedure, non_overridable :: square => C1_square
endtype

! Non-abstract parent class and its child type
type :: P2
contains
procedure :: test => test2
procedure :: square => P2_square
endtype

type, extends(P2) :: C2
contains
   procedure, non_overridable :: square => C2_square
endtype

contains

real function test1( this, x )
class(P1) :: this
real :: x
test1 = this % square( x )
end function

real function test2( this, x )
class(P2) :: this
real :: x
test2 = this % square( x )
end function

function P2_square( this, x ) result( y )
   class(P2) :: this
   real :: x, y
   y = -100.  ! dummy
end function

function C1_square( this, x ) result( y )
   class(C1) :: this
   real :: x, y
   y = x**2
end function

function C2_square( this, x ) result( y )
   class(C2) :: this
   real :: x, y
   y = x**2
end function

end module

program main
use types
implicit none
type(P2) :: t1
type(C2) :: t2
type(C1) :: t3

if ( t1 % test( 2.0 ) /= -100) call abort()
if ( t2 % test( 2.0 ) /= 4) call abort()
if ( t3 % test( 2.0 ) /= 4) call abort()
end program


Re: [PATCH] Fix ICE with -Wuninitialized (PR tree-optimization/78455)

2016-11-22 Thread Aldy Hernandez

On 11/22/2016 02:49 AM, Jakub Jelinek wrote:

On Mon, Nov 21, 2016 at 04:02:40PM -0800, Marek Polacek wrote:

What seems like a typo caused an ICE here.  We've got a vector of vectors here
and we're trying to walk all the elements, so the second loop oughta use 'j'.

Bootstrapped/regtested on x86_64-linux, ok for trunk?

2016-11-21  Marek Polacek  

PR tree-optimization/78455
* tree-ssa-uninit.c (can_chain_union_be_invalidated_p): Fix typo.

* gcc.dg/uninit-23.c: New.


I'd say this is even obvious.  Ok.

Jakub



Thank you Marek.

Aldy


[PATCH, testsuite]: Fix detection of -j make argument

2016-11-22 Thread Uros Bizjak
Hello!

New makes (e.g. GNU Make 4.2.1) pass the -j argument in MFLAGS in a
different way. While older makes pass only "-j", newer makes pass e.g.
"-j4" when -j is specified on the command line. The detection of the "-j"
make argument doesn't work in the latter case.

Attached patch reworks this functionality to detect -j correctly in all cases.
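
The difference between the old and the new check can be sketched in C. This is
only a model of what $(filter -j, ...) and $(filter -j%, ...) do on a list of
words, and any_word_matches is a hypothetical helper name.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Return true if some whitespace-separated word in FLAGS matches.
   With EXACT set, the word must be exactly "-j" (the old check);
   otherwise any word starting with "-j" matches (the new check,
   where % stands for zero or more characters).  */
static bool
any_word_matches (const char *flags, bool exact)
{
  assert (flags != NULL);

  const char *p = flags;
  while (*p)
    {
      p += strspn (p, " \t");          /* skip separators */
      size_t len = strcspn (p, " \t"); /* length of the current word */
      if (len >= 2 && strncmp (p, "-j", 2) == 0 && (!exact || len == 2))
        return true;
      p += len;
    }
  return false;
}
```

For the string "-j4 -k" the exact check fails while the prefix check
succeeds; for "-k -j" both succeed.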

gcc/ChangeLog

2016-11-22  Uros Bizjak  

* Makefile.in ($(lang_checks_parallelized)): Fix detection
of -j argument.

gcc/ada/ChangeLog

2016-11-22  Uros Bizjak  

* gcc-interface/Make-lang.in (check-acats): Fix detection
of -j argument.

libstdc++-v3/ChangeLog

2016-11-22  Uros Bizjak  

* testsuite/Makefile.am
(check-DEJAGNU $(check_DEJAGNU_normal_targets)): Fix detection
of -j argument.
* testsuite/Makefile.in: Regenerate.

Patch was bootstrapped and regression tested on x86_64-linux-gnu with
"GNU Make 4.2.1" and "GNU Make 3.81". Ada was not checked, but the
change is consistent with other changes.

OK for mainline SVN and release branches?

Uros.
diff --git a/gcc/Makefile.in b/gcc/Makefile.in
index 7ecd1e4..d1acede 100644
--- a/gcc/Makefile.in
+++ b/gcc/Makefile.in
@@ -3914,7 +3914,7 @@ check_p_subdirs=$(wordlist 1,$(check_p_count),$(wordlist 
1, \
 # testsuites like objc or go.
 $(lang_checks_parallelized): check-% : site.exp
-rm -rf $(TESTSUITEDIR)/$*-parallel
-   @if [ "$(filter -j, $(MFLAGS))" = "-j" ]; then \
+   @if [ -n "$(filter -j%, $(MFLAGS))" ]; then \
  test -d $(TESTSUITEDIR) || mkdir $(TESTSUITEDIR) || true; \
  test -d $(TESTSUITEDIR)/$*-parallel || mkdir 
$(TESTSUITEDIR)/$*-parallel || true; \
  
GCC_RUNTEST_PARALLELIZE_DIR=`${PWD_COMMAND}`/$(TESTSUITEDIR)/$(check_p_tool)-parallel
 ; \
diff --git a/gcc/ada/gcc-interface/Make-lang.in 
b/gcc/ada/gcc-interface/Make-lang.in
index b5d1f0e..eb0489b 100644
--- a/gcc/ada/gcc-interface/Make-lang.in
+++ b/gcc/ada/gcc-interface/Make-lang.in
@@ -890,7 +890,7 @@ check-acats:
@test -d $(ACATSDIR) || mkdir -p $(ACATSDIR); \
rootme=`${PWD_COMMAND}`; export rootme; \
EXPECT=$(EXPECT); export EXPECT; \
-   if [ -z "$(CHAPTERS)" ] && [ "$(filter -j, $(MFLAGS))" = "-j" ]; \
+   if [ -z "$(CHAPTERS)" ] && [ -n "$(filter -j%, $(MFLAGS))" ]; \
then \
  rm -rf $(ACATSDIR)-parallel; \
  mkdir $(ACATSDIR)-parallel; \
diff --git a/libstdc++-v3/testsuite/Makefile.am 
b/libstdc++-v3/testsuite/Makefile.am
index af57d0b..65848f0 100644
--- a/libstdc++-v3/testsuite/Makefile.am
+++ b/libstdc++-v3/testsuite/Makefile.am
@@ -117,7 +117,7 @@ $(check_DEJAGNU_normal_targets): check-DEJAGNUnormal%: 
normal%/site.exp
 check-DEJAGNU $(check_DEJAGNU_normal_targets): check-DEJAGNU%: site.exp
$(if $*,@)AR="$(AR)"; export AR; \
RANLIB="$(RANLIB)"; export RANLIB; \
-   if [ -z "$*" ] && [ "$(filter -j, $(MFLAGS))" = "-j" ]; then \
+   if [ -z "$*" ] && [ -n "$(filter -j%, $(MFLAGS))" ]; then \
  rm -rf normal-parallel || true; \
  mkdir normal-parallel; \
  $(MAKE) $(AM_MAKEFLAGS) $(check_DEJAGNU_normal_targets); \
diff --git a/libstdc++-v3/testsuite/Makefile.in 
b/libstdc++-v3/testsuite/Makefile.in
index b37758b..1cdf4b8 100644
--- a/libstdc++-v3/testsuite/Makefile.in
+++ b/libstdc++-v3/testsuite/Makefile.in
@@ -598,7 +598,7 @@ $(check_DEJAGNU_normal_targets): check-DEJAGNUnormal%: 
normal%/site.exp
 check-DEJAGNU $(check_DEJAGNU_normal_targets): check-DEJAGNU%: site.exp
$(if $*,@)AR="$(AR)"; export AR; \
RANLIB="$(RANLIB)"; export RANLIB; \
-   if [ -z "$*" ] && [ "$(filter -j, $(MFLAGS))" = "-j" ]; then \
+   if [ -z "$*" ] && [ -n "$(filter -j%, $(MFLAGS))" ]; then \
  rm -rf normal-parallel || true; \
  mkdir normal-parallel; \
  $(MAKE) $(AM_MAKEFLAGS) $(check_DEJAGNU_normal_targets); \


[patch,avr] Fix PR60300: Minor prologue improvement.

2016-11-22 Thread Georg-Johann Lay
This patch is a minor improvement of prologue length.  It now allows 
frame sizes of up to 11 to be allocated by RCALL + PUSH 0 sequences but 
limits the number of RCALLs to 3.


The PR has some discussion of size vs. speed considerations with respect to
using RCALL in prologues, and following that I picked the rather arbitrary
upper bound of 3 RCALLs.  Previously the maximal frame size allocated by such
sequences was 6, which also never produced more than 3 RCALLs.
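
The effect of the new bound can be checked with a little arithmetic. The
sketch below is a hypothetical model of the condition, not the avr.c code:
toy_use_rcall_push is an invented name, and pc_bytes stands for the 3 or 2
selected by AVR_3_BYTE_PC.

```c
#include <assert.h>
#include <stdbool.h>

/* SIZE is the frame size in bytes; PC_BYTES is 3 on devices with a
   3-byte PC and 2 otherwise.  Return true if the frame would be set
   up by an RCALL/PUSH sequence under the new rules.  */
static bool
toy_use_rcall_push (int size, int pc_bytes)
{
  assert (size >= 0);
  assert (pc_bytes == 2 || pc_bytes == 3);

  /* Csp now accepts -11 ... 6, and the prologue tests -SIZE.  */
  bool in_csp_range = -size >= -11 && -size <= 6;

  /* Each RCALL allocates PC_BYTES bytes of stack.  */
  int n_rcall = size / pc_bytes;

  return in_csp_range && n_rcall <= 3;
}
```

In this model a frame of 11 bytes qualifies with a 3-byte PC (11 / 3 = 3
RCALLs) but not with a 2-byte PC (11 / 2 = 5 RCALLs); with a 2-byte PC the
largest qualifying frame is 7 bytes.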


Ok for trunk?


Johann

gcc/
PR target/60300
* config/avr/constraints.md (Csp): Widen range to [-11..6].
* config/avr/avr.c (avr_prologue_setup_frame): Limit number
of RCALLs in prologue to 3.
Index: config/avr/avr.c
===
--- config/avr/avr.c	(revision 242672)
+++ config/avr/avr.c	(working copy)
@@ -1687,7 +1687,11 @@ avr_prologue_setup_frame (HOST_WIDE_INT
   /* Stack adjustment by means of RCALL . and/or PUSH __TMP_REG__
  can only handle specific offsets.  */
 
-  if (avr_sp_immediate_operand (gen_int_mode (-size, HImode), HImode))
+  int n_rcall = size / (AVR_3_BYTE_PC ? 3 : 2);
+
+  if (avr_sp_immediate_operand (gen_int_mode (-size, HImode), HImode)
+  // Don't use more than 3 RCALLs.
+  && n_rcall <= 3)
 {
   rtx_insn *sp_plus_insns;
 
Index: config/avr/constraints.md
===
--- config/avr/constraints.md	(revision 242671)
+++ config/avr/constraints.md	(working copy)
@@ -189,9 +189,9 @@ (define_constraint "Cx4"
(match_test "avr_popcount_each_byte (op, 4, (1<<0) | (1<<8))")))
 
 (define_constraint "Csp"
-  "Integer constant in the range -6 @dots{} 6."
+  "Integer constant in the range -11 @dots{} 6."
   (and (match_code "const_int")
-   (match_test "IN_RANGE (ival, -6, 6)")))
+   (match_test "IN_RANGE (ival, -11, 6)")))
 
 (define_constraint "Cxf"
   "32-bit integer constant where at least one nibble is 0xf."


Re: PR78153

2016-11-22 Thread Prathamesh Kulkarni
On 21 November 2016 at 15:10, Richard Biener  wrote:
> On Sun, 20 Nov 2016, Prathamesh Kulkarni wrote:
>
>> Hi,
>> As suggested by Martin in PR78153 strlen's return value cannot exceed
>> PTRDIFF_MAX.
>> So I set its range to [0, PTRDIFF_MAX - 1] in extract_range_basic()
>> in the attached patch.
>>
>> However it regressed strlenopt-3.c:
>>
>> Consider fn1() from strlenopt-3.c:
>>
>> __attribute__((noinline, noclone)) size_t
>> fn1 (char *p, char *q)
>> {
>>   size_t s = strlen (q);
>>   strcpy (p, q);
>>   return s - strlen (p);
>> }
>>
>> The optimized dump shows the following:
>>
>> __attribute__((noclone, noinline))
>> fn1 (char * p, char * q)
>> {
>>   size_t s;
>>   size_t _7;
>>   long unsigned int _9;
>>
>>   :
>>   s_4 = strlen (q_3(D));
>>   _9 = s_4 + 1;
>>   __builtin_memcpy (p_5(D), q_3(D), _9);
>>   _7 = 0;
>>   return _7;
>>
>> }
>>
>> which introduces the regression, because the test expects "return 0;" in 
>> fn1().
>>
>> The issue seems to be in vrp2:
>>
>> Before the patch:
>> Visiting statement:
>> s_4 = strlen (q_3(D));
>> Found new range for s_4: VARYING
>>
>> Visiting statement:
>> _1 = s_4;
>> Found new range for _1: [s_4, s_4]
>> marking stmt to be not simulated again
>>
>> Visiting statement:
>> _7 = s_4 - _1;
>> Applying pattern match.pd:111, gimple-match.c:27997
>> Match-and-simplified s_4 - _1 to 0
>> Intersecting
>>   [0, 0]
>> and
>>   [0, +INF]
>> to
>>   [0, 0]
>> Found new range for _7: [0, 0]
>>
>> __attribute__((noclone, noinline))
>> fn1 (char * p, char * q)
>> {
>>   size_t s;
>>   long unsigned int _1;
>>   long unsigned int _9;
>>
>>   :
>>   s_4 = strlen (q_3(D));
>>   _9 = s_4 + 1;
>>   __builtin_memcpy (p_5(D), q_3(D), _9);
>>   _1 = s_4;
>>   return 0;
>>
>> }
>>
>>
>> After the patch:
>> Visiting statement:
>> s_4 = strlen (q_3(D));
>> Intersecting
>>   [0, 9223372036854775806]
>> and
>>   [0, 9223372036854775806]
>> to
>>   [0, 9223372036854775806]
>> Found new range for s_4: [0, 9223372036854775806]
>> marking stmt to be not simulated again
>>
>> Visiting statement:
>> _1 = s_4;
>> Intersecting
>>   [0, 9223372036854775806]  EQUIVALENCES: { s_4 } (1 elements)
>> and
>>   [0, 9223372036854775806]
>> to
>>   [0, 9223372036854775806]  EQUIVALENCES: { s_4 } (1 elements)
>> Found new range for _1: [0, 9223372036854775806]
>> marking stmt to be not simulated again
>>
>> Visiting statement:
>> _7 = s_4 - _1;
>> Intersecting
>>   ~[9223372036854775807, 9223372036854775809]
>> and
>>   ~[9223372036854775807, 9223372036854775809]
>> to
>>   ~[9223372036854775807, 9223372036854775809]
>> Found new range for _7: ~[9223372036854775807, 9223372036854775809]
>> marking stmt to be not simulated again
>>
>> __attribute__((noclone, noinline))
>> fn1 (char * p, char * q)
>> {
>>   size_t s;
>>   long unsigned int _1;
>>   size_t _7;
>>   long unsigned int _9;
>>
>>   :
>>   s_4 = strlen (q_3(D));
>>   _9 = s_4 + 1;
>>   __builtin_memcpy (p_5(D), q_3(D), _9);
>>   _1 = s_4;
>>   _7 = s_4 - _1;
>>   return _7;
>>
>> }
>>
>> Then forwprop4 turns
>> _1 = s_4
>> _7 = s_4 - _1
>> into
>> _7 = 0
>>
>> and we end up with:
>> _7 = 0
>> return _7
>> in optimized dump.
>>
>> Running ccp again after forwprop4 trivially solves the issue, however
>> I am not sure if we want to run ccp again ?
>>
>> The issue is probably with extract_range_from_ssa_name():
>> For _1 = s_4
>>
>> Before patch:
>> VR for s_4 is set to varying.
>> So VR for _1 is set to [s_4, s_4] by extract_range_from_ssa_name.
>> Since VR for _1 is [s_4, s_4] it implicitly implies that _1 is equal to s_4,
>> and vrp is able to transform _7 = s_4 - _1 to _7 = 0 (by using
>> match.pd pattern x - x -> 0).
>>
>> After patch:
>> VR for s_4 is set to [0, PTRDIFF_MAX - 1]
>> And correspondingly VR for _1 is set to [0, PTRDIFF_MAX - 1]
>> so IIUC, we then lose the information that _1 is equal to s_4,
>
> We don't lose it, it's in its set of equivalencies.
Ah, I missed that, thanks. For some reason I had the misconception that
the equivalences store variables which have the same value ranges but are
not necessarily equal.
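
To make the role of the equivalence set concrete, here is a toy model in C.
It is only a sketch under simplifying assumptions: toy_range, toy_valueize and
toy_minus_is_zero are hypothetical names, the equivalence set is a plain
bitmask of SSA name indices, and none of this is the tree-vrp.c code.

```c
#include <assert.h>
#include <stdbool.h>

#define TOY_NAMES 8

/* Hypothetical value range: [min, max] plus a bitmask of the names
   known to be equal to this name.  */
struct toy_range
{
  long min, max;
  unsigned equiv;
};

/* Return a representative for NAME: the lowest-numbered equivalent
   name if there is one, otherwise NAME itself.  */
static int
toy_valueize (const struct toy_range *vr, int name)
{
  assert (name >= 0 && name < TOY_NAMES);

  for (int i = 0; i < TOY_NAMES; i++)
    if (vr[name].equiv & (1u << i))
      return i;
  return name;
}

/* A - B folds to zero when both operands share a representative,
   regardless of how wide their numeric ranges are.  */
static bool
toy_minus_is_zero (const struct toy_range *vr, int a, int b)
{
  return toy_valueize (vr, a) == toy_valueize (vr, b);
}
```

With name 4 standing for s_4 and name 1 for _1, giving _1 the same numeric
range as s_4 plus the equivalence { s_4 } is enough for the subtraction to
fold in this model.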
>
>> and vrp doesn't transform _7 = s_4 - _1 to _7 = 0.
>> forwprop4 does that because it sees that s_4 and _1 are equivalent.
>> Does this sound correct ?
>
> Yes.  So the issue is really that vrp_visit_assignment_or_call calls
> gimple_fold_stmt_to_constant_1 with vrp_valueize[_1] which when
> we do not have a singleton VR_RANGE does not fall back to looking
> at equivalences (there's not a good cheap way to do that currently because
> VRP doesn't keep a proper copy lattice but simply IORs equivalences
> from all equivalences).  In theory simply using the first set bit
> might work.  Thus sth like
>
> @@ -7057,6 +7030,12 @@ vrp_valueize (tree name)
>   || is_gimple_min_invariant (vr->min))
>   && vrp_operand_equal_p (vr->min, vr->max))
> return vr->min;
> +  else if (vr->equiv && ! bitmap_empty_p (vr->equiv))
> +   {
> + unsigned num = bitmap_first_set_bit (vr->equiv);
> + if

Re: [PATCH, GCC/ARM 1/2] Add multilib support for embedded bare-metal targets

2016-11-22 Thread Ramana Radhakrishnan
On Thu, Oct 13, 2016 at 4:35 PM, Thomas Preudhomme
 wrote:
> Hi ARM maintainers,
>
> This patchset aims at adding multilib support for R and M profile ARM
> architectures and allowing it to be built alongside multilib for A profile
> ARM architectures. This specific patch adds the t-rmprofile multilib
> Makefile fragment for the former objective. Multilibs are built for all M
> profile architectures involved: ARMv6S-M, ARMv7-M and ARMv7E-M as well as
> ARMv7. ARMv7 multilib is used for R profile architectures but also A profile
> architectures.
>
> ChangeLog entry is as follows:
>
>
> *** gcc/ChangeLog ***
>
> 2016-10-03  Thomas Preud'homme  
>
> * config.gcc: Allow new rmprofile value for configure option
> --with-multilib-list.
> * config/arm/t-rmprofile: New file.
> * doc/install.texi (--with-multilib-list): Document new rmprofile
> value
> for ARM.
>
>
> Testing:
>
> == aprofile ==
> * "tree install/lib/gcc/arm-none-eabi/7.0.0" is the same before and after
> the patchset for both aprofile and rmprofile
> * default spec (gcc -dumpspecs) is the same before and after the patchset
> for aprofile
> * No difference in --print-multi-directory between before and after the
> patchset for aprofile for all combination of ISA (ARM/Thumb), architecture,
> CPU, FPU and float ABI
>
> == rmprofile ==
> * aprofile and rmprofile use similar directory structure (ISA/arch/FPU/float
> ABI) and directory naming
> * Difference in --print-multi-directory between before [1] and after the
> patchset for rmprofile for all combination of ISA (ARM/Thumb), architecture,
> CPU, FPU and float ABI modulo the name and directory structure changes
>
> [1] as per patch applied in ARM embedded branches
> https://gcc.gnu.org/viewcvs/gcc/branches/ARM/embedded-5-branch/gcc/config/arm/t-baremetal?view=markup
>
> == aprofile + rmprofile ==
> * aprofile,rmprofile and rmprofile,aprofile builds give an error saying it
> is not supported
>
>
> Is this ok for master branch?

This is OK, thanks.

Ramana

>
> Best regards,
>
> Thomas


Re: [PATCH] (v2) Add a "compact" mode to print_rtx_function

2016-11-22 Thread Dominik Vogt
On Wed, Oct 12, 2016 at 04:37:26PM -0400, David Malcolm wrote:
> On Wed, 2016-10-12 at 19:31 +0200, Bernd Schmidt wrote:
> > On 10/12/2016 07:48 PM, David Malcolm wrote:
> > > This patch implements a "compact" mode for print_rtx_function,
> > > implementing most of the ideas above.
> > > 
> > > > Example of output can be seen here:
> > > >   https://dmalcolm.fedorapeople.org/gcc/2016-10-12/test-switch-compact.rtl
> > > > which can be contrasted with the non-compact output here:
> > > >   https://dmalcolm.fedorapeople.org/gcc/2016-10-12/test-switch-noncompact.rtl
> > > 
> > > > It adds the "c" prefix to the insn names, so we get "cinsn", etc.
> > > > However, it does lead to things like this:
> > > > 
> > > >(ccode_label 56 8 "")
> > > > 
> > > > which gives me pause: would the "ccode" in "ccode_label" be
> > > > confusing? (compared to "ccmode").  An alternative might be to have a
> > > > "compact-insn-chain" vs "insn-chain" wrapper element, expressing that
> > > > this is a compact dump.

> --- a/gcc/print-rtl.c
> +++ b/gcc/print-rtl.c
...
> @@ -284,7 +292,7 @@ print_rtx_operand_code_i (const_rtx in_rtx, int idx)
>if (INSN_HAS_LOCATION (in_insn))
>   {
> expanded_location xloc = insn_location (in_insn);
> -   fprintf (outfile, " %s:%i", xloc.file, xloc.line);
> +   fprintf (outfile, " \"%s\":%i", xloc.file, xloc.line);

Was this change intentional?  We've got to update a scan-assembler
statement in an s390 test to reflect the additional double quotes
in the output string.  Not a big deal, just wanted to make sure
this is not an accident.

Ciao

Dominik ^_^  ^_^

-- 

Dominik Vogt
IBM Germany



Re: [PATCH 1/4] Remove build dependence on HSA run-time

2016-11-22 Thread Martin Jambor
Hi,

On Fri, Nov 18, 2016 at 11:23:10AM +0100, Jakub Jelinek wrote:
> On Sun, Nov 13, 2016 at 08:02:41PM +0100, Martin Jambor wrote:
> > @@ -143,6 +240,12 @@ init_enviroment_variables (void)
> >  suppress_host_fallback = true;
> >else
> >  suppress_host_fallback = false;
> > +
> > +  hsa_runtime_lib = getenv ("HSA_RUNTIME_LIB");
> > +  if (hsa_runtime_lib == NULL)
> > +hsa_runtime_lib = HSA_RUNTIME_LIB "libhsa-runtime64.so";
> 
> libgomp is very much env var driven, but the above one is IMHO just
> too dangerous in suid/sgid apps, allowing one to select a library
> of their own choice to dlopen is an instant exploit possibility,
> so such env var should be only considered in non-privileged processes.
> It is possible to try dlopen (hsa_runtime_lib) and if that fails, try
> dlopen ("libhsa-runtime64.so"), where it would search the library only
> in the system paths (note, the dynamic linker handles LD_LIBRARY_PATH,
> LD_PRELOAD etc. safely in privileged processes).
> 
> So I'd recommend to use secure_getenv instead.  E.g. see how libgfortran
> checks for it in configure and even provides a fallback version for it.
> In the HSA plugin case, I think the fallback should be static function
> in the plugin.
> Otherwise it looks reasonable, thanks for working on that.
> 

I have basically copied what libgfortran did, with additional checking
for HAVE_UNISTD_H when attempting to implement secure_getenv in its
absence (which is maybe unnecessary but should not do any harm) and I
also needed to add -D_GNU_SOURCE to plugin compilation flags.
Finally, I have changed all getenv users in the plugin to use
secure_getenv.
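
As a rough illustration of the fallback described above, here is a minimal sketch of a secure_getenv replacement that tests the real against the effective UID and GID. The name fallback_secure_getenv is hypothetical and this is not the code from the patch, which also handles the cases where secure_getenv or __secure_getenv is available.

```c
#include <stdlib.h>
#include <unistd.h>

/* Hypothetical fallback: ignore the environment when the process runs
   with elevated privileges (suid/sgid), otherwise behave like getenv.  */
char *
fallback_secure_getenv (const char *name)
{
  if (getuid () != geteuid () || getgid () != getegid ())
    return NULL;
  return getenv (name);
}
```

In an ordinary, non-privileged process this returns exactly what getenv returns, so switching callers over should only change behaviour for suid/sgid programs.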

So far I have only bootstrapped (and lto-bootstrapped) and tested this
on x86_64-linux without any issues.  I'm about to play with it a bit
on gcc111, i.e. ppc64le-aix, but the machine is very slow and I mainly
want to make sure I do not break it for people not interested in hsa.

So, is this version OK for trunk?

Thanks a lot,

Martin


2016-11-21  Martin Liska  
Martin Jambor  

gcc/
* doc/install.texi: Remove entry about --with-hsa-kmt-lib.

libgomp/
* plugin/hsa.h: New file.
* plugin/hsa_ext_finalize.h: New file.
* plugin/configfrag.ac: Remove hsa-kmt-lib test.  Added checks for
header file unistd.h, and functions secure_getenv, __secure_getenv,
getuid, geteuid, getgid and getegid.
* plugin/Makefrag.am (libgomp_plugin_hsa_la_CPPFLAGS): Added
-D_GNU_SOURCE.
* plugin/plugin-hsa.c: Include config.h, inttypes.h and stdbool.h.
Handle various cases of secure_getenv presence, add an implementation
when we can test effective UID and GID.
(struct hsa_runtime_fn_info): New structure.
(hsa_runtime_fn_info hsa_fns): New variable.
(hsa_runtime_lib): Likewise.
(support_cpu_devices): Likewise.
(init_enviroment_variables): Load newly introduced ENV
variables.
(hsa_warn): Call hsa run-time functions via hsa_fns structure.
(hsa_fatal): Likewise.
(DLSYM_FN): New macro.
(init_hsa_runtime_functions): New function.
(suitable_hsa_agent_p): Call hsa run-time functions via hsa_fns
structure.  Depending on environment, also allow CPU devices.
(init_hsa_context): Call hsa run-time functions via hsa_fns structure.
(get_kernarg_memory_region): Likewise.
(GOMP_OFFLOAD_init_device): Likewise.
(destroy_hsa_program): Likewise.
(init_basic_kernel_info): New function.
(GOMP_OFFLOAD_load_image): Use it.
(create_and_finalize_hsa_program): Call hsa run-time functions via
hsa_fns structure.
(create_single_kernel_dispatch): Likewise.
(release_kernel_dispatch): Likewise.
(init_single_kernel): Likewise.
(parse_target_attributes): Allow up multiple HSA grid dimensions.
(get_group_size): New function.
(run_kernel): Likewise.
(GOMP_OFFLOAD_run): Outline most functionality to run_kernel.
(GOMP_OFFLOAD_fini_device): Call hsa run-time functions via hsa_fns
structure.
* testsuite/lib/libgomp.exp: Remove hsa_kmt_lib support.
* testsuite/libgomp-test-support.exp.in: Likewise.
* Makefile.in: Regenerated.
* aclocal.m4: Likewise.
* config.h.in: Likewise.
* configure: Likewise.
* testsuite/Makefile.in: Likewise.
---
 gcc/doc/install.texi  |   6 -
 libgomp/Makefile.in   | 138 ++
 libgomp/aclocal.m4|  74 ++-
 libgomp/config.h.in   |  21 +
 libgomp/configure | 129 --
 libgomp/plugin/Makefrag.am|   3 +-
 libgomp/plugin/configfrag.ac  |  35 +-
 libgomp/plugin/hsa.h  | 630 ++
 libgomp/plugin/hsa_ext_finalize.h | 265 +++
 lib

Re: [PATCH 2/4] HSA specific built-ins

2016-11-22 Thread Martin Jambor
On Fri, Nov 18, 2016 at 11:27:24AM +0100, Jakub Jelinek wrote:
> On Sun, Nov 13, 2016 at 08:39:35PM +0100, Martin Jambor wrote:
> > Hello,
> > 
> > this patch adds a small file hsa-builtins.def which defines a few
> > builtins that I then use in OpenMP lowering and expansion.
> > 
> > After we split gridification stuff in omp-low.c to a separate file, we
> > should be able to only conditionally include the file and remove the
> > weird conditional ifdef.
> > 
> > OK for trunk?
> 
> Does this work well even with lto and jit FEs?  Ok for trunk if it does.

I have enabled jit, ran its testsuite and compared the results to ones
from unpatched trunk and found no new failures.  I have also
lto-bootstrapped the patch with both hsa enabled and disabled so that
should be fine too.  Thus, I consider the patch approved.

Thank you very much,

Martin


Re: [PATCH, testsuite]: Fix detection of -j make argument

2016-11-22 Thread Jonathan Wakely

On 22/11/16 13:25 +0100, Uros Bizjak wrote:

OK for mainline SVN and release branches?


Yes, the libstdc++ parts are ok, thanks.



Re: [PATCH] (v2) Add a "compact" mode to print_rtx_function

2016-11-22 Thread Bernd Schmidt

On 11/22/2016 02:18 PM, Dominik Vogt wrote:


@@ -284,7 +292,7 @@ print_rtx_operand_code_i (const_rtx in_rtx, int idx)
   if (INSN_HAS_LOCATION (in_insn))
{
  expanded_location xloc = insn_location (in_insn);
- fprintf (outfile, " %s:%i", xloc.file, xloc.line);
+ fprintf (outfile, " \"%s\":%i", xloc.file, xloc.line);


Was this change intentional?  We've got to update a scan-assembler
statement in an s390 test to reflect the additional double quotes
in the output string.  Not a big deal, just wanted to make sure
this is not an accident.


The idea was to make the output less ambiguous for file names with spaces.


Bernd



Re: [PATCH] Introduce emit_status::ensure_regno_capacity (v5)

2016-11-22 Thread Bernd Schmidt

On 11/18/2016 10:19 PM, David Malcolm wrote:


-  regno_reg_rtx = ggc_vec_alloc (crtl->emit.regno_pointer_align_length);
+  regno_reg_rtx =
+ggc_cleared_vec_alloc (crtl->emit.regno_pointer_align_length);


When wrapping lines, put the operator first.

Otherwise ok.
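
A small sketch of the wrapping rule mentioned above, using a hypothetical helper rather than the code from the patch: when an expression has to be split, the line break goes before the operator, so the continuation line begins with "=" or "+".

```c
/* Hypothetical helper, only to show GNU-style line wrapping.  */
long
weighted_sum (long first_value, long second_value, long weight)
{
  /* The break comes before each operator, not after it.  */
  long result
    = (first_value * weight
       + second_value * weight);
  return result;
}
```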


Bernd



Re: [v3 PATCH] LWG 2766, LWG 2749

2016-11-22 Thread Jonathan Wakely

On 17/11/16 23:38 +0200, Ville Voutilainen wrote:

@@ -478,6 +478,14 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
swap(pair<_T1, _T2>& __x, pair<_T1, _T2>& __y)
noexcept(noexcept(__x.swap(__y)))
{ __x.swap(__y); }
+
+#if __cplusplus > 201402L || !defined(__STRICT_ANSI__) // c++1z or gnu++11
+  template<typename _T1, typename _T2>
+inline
+typename enable_if<__not_<__and_<__is_swappable<_T1>,
+__is_swappable<_T2>>>::value>::type
+swap(pair<_T1, _T2>&, pair<_T1, _T2>&) = delete;


Is there any advantage to using __not_ here, rather than just:

   typename enable_if<!__and_<__is_swappable<_T1>,
                             __is_swappable<_T2>>::value>::type

?

__not_ is useful as a sub-expression of an __and_ / __or_ expression,
but at the top level doesn't seem to buy anything, and is more typing,
and requires indenting the code further.



Re: [Patch, fortran, RFC] Add warning for missing location information

2016-11-22 Thread Janus Weil
Hi Thomas,

>> the attached patch runs through gfortran's AST to check for missing
>> location information.

one small comment: Is it necessary to introduce the extra CHECK_LOCUS
macro? Couldn't you just use CHECKING_P alone? In your patch
CHECK_LOCUS is basically just replicating CHECKING_P.

And: Why not use DK_WARNING instead of DK_NOTE?

Apart from that I don't see anything wrong with your patch (but I'm
probably not the best person to review it) ...

Cheers,
Janus



> 2016-11-14  Thomas Koenig  
>
> PR fortran/78226
> * error.c (gfc_warning_internal):  New function.
> * frontend-passes.c (CHECK_LOCUS):  New macro.
> (gfc_run_passes):  Call check_locus if CHECK_LOCUS
> is defined.
> (check_locus_code):  New function.
> (check_locus_expr):  New function.
> (check_locus):  New function.
> * gfortran.h:  Add prototype for gfc_warning_internal.
>
>


Re: [PATCH] (v2) Add a "compact" mode to print_rtx_function

2016-11-22 Thread Jakub Jelinek
On Tue, Nov 22, 2016 at 02:32:39PM +0100, Bernd Schmidt wrote:
> On 11/22/2016 02:18 PM, Dominik Vogt wrote:
> 
> >>@@ -284,7 +292,7 @@ print_rtx_operand_code_i (const_rtx in_rtx, int idx)
> >>   if (INSN_HAS_LOCATION (in_insn))
> >>{
> >>  expanded_location xloc = insn_location (in_insn);
> >>- fprintf (outfile, " %s:%i", xloc.file, xloc.line);
> >>+ fprintf (outfile, " \"%s\":%i", xloc.file, xloc.line);
> >
> >Was this change intentional?  We've got to update a scan-assembler
> >statement in an s390 test to reflect the additional double quotes
> >in the output string.  Not a big deal, just wanted to make sure
> >this is not an accident.
> 
> The idea was to make the output less ambiguous for file names with spaces.

Can't it be done only if xloc.file contains any fancy characters?
If it does (where fancy should be anything other than [a-zA-Z/_0-9.-] or
some other reasonable definition, certainly space, quotes, backslash, etc.
would count), shouldn't we adjust it (e.g. use \" instead of ", handle
control characters etc.)?
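
A minimal sketch of the conditional quoting suggested above, under stated assumptions: the function names are hypothetical, the output goes to a buffer rather than to outfile, and only quotes and backslashes are escaped (control characters are left alone). It is not code from print-rtl.c.

```c
#include <stdio.h>
#include <string.h>

/* Return nonzero if FILE contains a character outside [a-zA-Z/_0-9.-].  */
int
filename_needs_quoting (const char *file)
{
  static const char plain[]
    = "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789/_.-";
  return file[strspn (file, plain)] != '\0';
}

/* Store C at BUF[*POS] if there is still room for it and a terminator.  */
void
append_char (char *buf, size_t size, size_t *pos, char c)
{
  if (*pos + 1 < size)
    buf[(*pos)++] = c;
}

/* Write FILE:LINE into BUF, quoting FILE only when it needs it and
   escaping embedded quotes and backslashes with a backslash.  */
void
format_location (char *buf, size_t size, const char *file, int line)
{
  size_t pos = 0;

  if (!filename_needs_quoting (file))
    {
      snprintf (buf, size, "%s:%i", file, line);
      return;
    }

  append_char (buf, size, &pos, '"');
  for (const char *p = file; *p; p++)
    {
      if (*p == '"' || *p == '\\')
        append_char (buf, size, &pos, '\\');
      append_char (buf, size, &pos, *p);
    }
  append_char (buf, size, &pos, '"');
  if (size > 0)
    snprintf (buf + pos, size - pos, ":%i", line);
}
```

With this shape, common file names print exactly as before, so existing scan-assembler patterns would keep matching, while names with spaces or quotes stay unambiguous.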

Jakub


Re: [PATCH] Propagate cv qualifications in variant_alternative

2016-11-22 Thread Jonathan Wakely

On 21/11/16 22:46 -0800, Tim Shen wrote:

PR libstdc++/78441
* include/std/variant: Propagate cv qualifications to types returned
by variant_alternative.
* testsuite/20_util/variant/compile.cc: Tests.


OK for trunk, thanks.



Re: [PATCH 3/4] OpenMP lowering changes from the hsa branch

2016-11-22 Thread Martin Jambor
Hi,

On Fri, Nov 18, 2016 at 11:38:56AM +0100, Jakub Jelinek wrote:
> On Sun, Nov 13, 2016 at 10:42:01PM +0100, Martin Jambor wrote:
> > +  size_t collapse = gimple_omp_for_collapse (for_stmt);
> > +  struct omp_for_data_loop *loops
> > += (struct omp_for_data_loop *)
> > +alloca (gimple_omp_for_collapse (for_stmt)
> > +   * sizeof (struct omp_for_data_loop));
> 
> Use
>   struct omp_for_data_loop *loops
> = XALLOCAVEC (struct omp_for_data_loop,
> gimple_omp_for_collapse (for_stmt));
> instead?

I have changed it as you suggested.
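
For context, a self-contained sketch of what the suggested form amounts to. The macro definition below mirrors the one in libiberty.h as far as I know; struct loop_bounds and total_iterations are hypothetical stand-ins for omp_for_data_loop and its user.

```c
#include <alloca.h>
#include <stddef.h>

/* Local copy for illustration; GCC gets this from libiberty.h.  */
#ifndef XALLOCAVEC
#define XALLOCAVEC(T, N) ((T *) alloca (sizeof (T) * (N)))
#endif

/* Hypothetical stand-in for struct omp_for_data_loop.  */
struct loop_bounds
{
  long lower;
  long upper;
};

/* Multiply the iteration counts of COLLAPSE nested loops, using a
   stack-allocated scratch array obtained through XALLOCAVEC.  */
long
total_iterations (const long *lower, const long *upper, size_t collapse)
{
  struct loop_bounds *loops = XALLOCAVEC (struct loop_bounds, collapse);
  long total = 1;

  for (size_t i = 0; i < collapse; i++)
    {
      loops[i].lower = lower[i];
      loops[i].upper = upper[i];
    }
  for (size_t i = 0; i < collapse; i++)
    total *= loops[i].upper - loops[i].lower;
  return total;
}
```

The macro keeps the element type and the count in one place, which avoids the hand-written cast and sizeof multiplication of the original hunk.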

> 
> > @@ -14133,7 +14183,7 @@ const pass_data pass_data_expand_omp =
> >  {
> >GIMPLE_PASS, /* type */
> >"ompexp", /* name */
> > -  OPTGROUP_NONE, /* optinfo_flags */
> > +  OPTGROUP_OPENMP, /* optinfo_flags */
> >TV_NONE, /* tv_id */
> >PROP_gimple_any, /* properties_required */
> >PROP_gimple_eomp, /* properties_provided */
> 
> What about the simdclone, omptargetlink, diagnose_omp_blocks passes?  What
> about openacc specific passes (oaccdevlow)?  And Alex is hopefully going to
> add ompdevlow pass soon.

I was not sure about those at first, but I suppose all of them should
also be in the same group (though I hope the name is still fine), so I
added them.  I will make sure that ompdevlow pass will be in it as
well, whether it gets in before or after this.

> 
> Otherwise LGTM.

Thanks,  the updated patch is below.  I have tested the whole patch
set by bootstrapping, lto-bootstrapping and testing on x86_64-linux
and bootstrapping and testing on aarch64-linux.  I will commit it when
the first patch is approved.

Thank you very much for the review,

Martin


2016-11-21  Martin Jambor  

gcc/
* dumpfile.h (OPTGROUP_OPENMP): Define.
* dumpfile.c (optgroup_options): Added OPTGROUP_OPENMP.
* gimple.h (gf_mask): Added elements GF_OMP_FOR_GRID_INTRA_GROUP and
GF_OMP_FOR_GRID_GROUP_ITER.
(gimple_omp_for_grid_phony): Added checking assert.
(gimple_omp_for_set_grid_phony): Likewise.
(gimple_omp_for_grid_intra_group): New function.
(gimple_omp_for_set_grid_intra_group): Likewise.
(gimple_omp_for_grid_group_iter): Likewise.
(gimple_omp_for_set_grid_group_iter): Likewise.
* omp-low.c (check_omp_nesting_restrictions): Allow GRID loop where
previously only distribute loop was permitted.
(lower_lastprivate_clauses): Allow non tcc_comparison predicates.
(grid_get_kernel_launch_attributes): Support multiple HSA grid
dimensions.
(grid_expand_omp_for_loop): Likewise and also support standalone
distribute constructs.  New parameter INTRA_GROUP, updated both users.
(grid_expand_target_grid_body): Support standalone distribute
constructs.
(pass_data_expand_omp): Changed optinfo_flags to OPTGROUP_OPENMP.
(pass_data_expand_omp_ssa): Likewise.
(pass_data_lower_omp): Likewise.
(pass_data_diagnose_omp_blocks): Likewise.
(pass_data_oacc_device_lower): Likewise.
(pass_data_omp_target_link): Likewise.
(grid_lastprivate_predicate): New function.
(lower_omp_for_lastprivate): Call grid_lastprivate_predicate for
gridified loops.
(lower_omp_for): Support standalone distribute constructs.
(grid_prop): New type.
(grid_safe_assignment_p): Check for assignments to group_sizes, new
parameter GRID.
(grid_seq_only_contains_local_assignments): New parameter GRID, pass
it to callee.
(grid_find_single_omp_among_assignments_1): Likewise, improve missed
optimization info messages.
(grid_find_single_omp_among_assignments): Likewise.
(grid_find_ungridifiable_statement): Do not bail out for SIMDs.
(grid_parallel_clauses_gridifiable): New function.
(grid_inner_loop_gridifiable_p): Likewise.
(grid_dist_follows_simple_pattern): Likewise.
(grid_gfor_follows_tiling_pattern): Likewise.
(grid_call_permissible_in_distribute_p): Likewise.
(grid_handle_call_in_distribute): Likewise.
(grid_dist_follows_tiling_pattern): Likewise.
(grid_target_follows_gridifiable_pattern): Support standalone distribute
constructs.
(grid_var_segment): New enum.
(grid_mark_variable_segment): New function.
(grid_copy_leading_local_assignments): Call grid_mark_variable_segment
if a new argument says so.
(grid_process_grid_body): New function.
(grid_eliminate_combined_simd_part): Likewise.
(grid_mark_tiling_loops): Likewise.
(grid_mark_tiling_parallels_and_loops): Likewise.
(grid_process_kernel_body_copy): Support standalone distribute
constructs.
(grid_attempt_target_gridification): New grid variable holding overall
gridification state.  Support standalone distribute constructs and
collapse clauses.
* doc/optinfo.texi (Opt

[testsuite,committed]: Fix a test that assumed int is 32 bits.

2016-11-22 Thread Georg-Johann Lay
Committed as obvious because the test case is clearly about a vector of 
4 * int.
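
A short sketch of why the fix below works, assuming a compiler with the GCC vector extensions: sizing the vector as 4 * __SIZEOF_INT__ bytes gives four int lanes whatever the width of int, whereas a fixed 16 bytes gives four lanes only when int is 32 bits. The helper names are hypothetical.

```c
/* Four int lanes regardless of the target's int width.  */
typedef int V __attribute__((vector_size (4 * __SIZEOF_INT__)));

/* Number of int lanes in V.  */
int
lane_count (void)
{
  return (int) (sizeof (V) / sizeof (int));
}

/* Add up the four lanes using vector subscripting.  */
int
sum_lanes (V v)
{
  return v[0] + v[1] + v[2] + v[3];
}
```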


Johann

gcc/testsuite/
* c-c++-common/builtin-shuffle-1.c (V): Use 4 * int in vector.


Index: c-c++-common/builtin-shuffle-1.c
===
--- c-c++-common/builtin-shuffle-1.c(revision 242541)
+++ c-c++-common/builtin-shuffle-1.c(working copy)
@@ -1,7 +1,7 @@
 /* PR c++/78089 */
 /* { dg-do run } */

-typedef int V __attribute__((vector_size (16)));
+typedef int V __attribute__((vector_size (4 * __SIZEOF_INT__)));
 V a, b, c;

 int


Re: [PATCH 1/4] Remove build dependence on HSA run-time

2016-11-22 Thread Jakub Jelinek
On Tue, Nov 22, 2016 at 02:27:44PM +0100, Martin Jambor wrote:
> I have basically copied what libgfortran did, with additional checking
> for HAVE_UNISTD_H when attempting to implement secure_getenv in its
> absence (which is maybe unnecessary but should not do any harm) and I
> also needed to add -D_GNU_SOURCE to plugin compilation flags.
> Finally, I have changed all getenv users in the plugin to use
> secure_getenv.

I'm not sure about switching all getenv users to secure_getenv; for the
specification of the library to dlopen it is essential, for the rest it
is debatable, but it is your choice.

> +hsa_status_t hsa_executable_validate(hsa_executable_t executable,
> + uint32_t *result);
> +uint64_t hsa_queue_add_write_index_acq_rel(const hsa_queue_t *queue,
> +   uint64_t value);
...
> +hsa_status_t hsa_executable_readonly_variable_define(
> +hsa_executable_t executable, hsa_agent_t agent, const char *variable_name,
> +void *address);

If hsa.h is our header rather than one imported from somewhere else,
can you tweak the formatting (space before (, and in the last case above,
wrap after the return type to allow more arguments on a line)?
If it is just imported from somewhere else, please disregard.

Otherwise LGTM.

Jakub


Re: [PATCH] (v2) Add a "compact" mode to print_rtx_function

2016-11-22 Thread David Malcolm
On Tue, 2016-11-22 at 14:37 +0100, Jakub Jelinek wrote:
> On Tue, Nov 22, 2016 at 02:32:39PM +0100, Bernd Schmidt wrote:
> > On 11/22/2016 02:18 PM, Dominik Vogt wrote:
> > 
> > > > @@ -284,7 +292,7 @@ print_rtx_operand_code_i (const_rtx in_rtx, int idx)
> > > >   if (INSN_HAS_LOCATION (in_insn))
> > > > {
> > > >   expanded_location xloc = insn_location (in_insn);
> > > > - fprintf (outfile, " %s:%i", xloc.file, xloc.line);
> > > > + fprintf (outfile, " \"%s\":%i", xloc.file, xloc.line);
> > > 
> > > Was this change intentional?  We've got to update a scan-assembler
> > > statement in an s390 test to reflect the additional double quotes
> > > in the output string.  Not a big deal, just wanted to make sure
> > > this is not an accident.

Sorry about the breakage.

How widespread is the problem?

> > The idea was to make the output less ambiguous for file names with
> > spaces.
> 
> Can't it be done only if xloc.file contains any fancy characters?
> If it does (where fancy should be anything other than [a-zA-Z/_0-9.-] or
> some other reasonable definition, certainly space, quotes, backslash, etc.
> would count), shouldn't we adjust it (e.g. use \" instead of ", handle
> control characters etc.)?

The idea was that quotes also make the output somewhat easier for the
RTL frontend to parse, though reading the latest version of the RTL
frontend patches, it looks like I don't make use of them yet.

Another approach would be to only use the quotes when the dump is in
"compact" mode, since compact mode is the format that the RTL frontend
parses: the RTL dumps emitted by DejaGnu don't use it, instead using
the older style.



Re: [PATCH] (v2) Add a "compact" mode to print_rtx_function

2016-11-22 Thread Bernd Schmidt

On 11/22/2016 02:37 PM, Jakub Jelinek wrote:

Can't it be done only if xloc.file contains any fancy characters?


Sure, but why? Strings generally get emitted with quotes around them, I 
don't see a good reason for filenames to be different, especially if it 
makes the output easier to parse.



If it does (where fancy should be anything other than [a-zA-Z/_0-9.-] or
some other reasonable definition, certainly space, quotes, backslash, etc.
would count), shouldn't we adjust it (e.g. use \" instead of ", handle
control characters etc.)?


The way I see it, spaces in filenames are regrettably somewhat common. 
Backslashes and quotes rather less so, to the point I really don't see a 
need to worry about them at the moment, and the necessary quoting could 
be added later if really necessary.



Bernd



Re: [PATCH] (v2) Add a "compact" mode to print_rtx_function

2016-11-22 Thread Dominik Vogt
On Tue, Nov 22, 2016 at 09:25:03AM -0500, David Malcolm wrote:
> On Tue, 2016-11-22 at 14:37 +0100, Jakub Jelinek wrote:
> > On Tue, Nov 22, 2016 at 02:32:39PM +0100, Bernd Schmidt wrote:
> > > On 11/22/2016 02:18 PM, Dominik Vogt wrote:
> > > 
> > > > > @@ -284,7 +292,7 @@ print_rtx_operand_code_i (const_rtx in_rtx, int idx)
> > > > >   if (INSN_HAS_LOCATION (in_insn))
> > > > >   {
> > > > > expanded_location xloc = insn_location (in_insn);
> > > > > -   fprintf (outfile, " %s:%i", xloc.file, xloc.line);
> > > > > +   fprintf (outfile, " \"%s\":%i", xloc.file, xloc.line);
> > > > 
> > > > Was this change intentional?  We've got to update a scan-assembler
> > > > statement in an s390 test to reflect the additional double quotes
> > > > in the output string.  Not a big deal, just wanted to make sure
> > > > this is not an accident.
> 
> Sorry about the breakage.
> 
> How widespread is the problem?

In the s390 tests, it is only a single scan-assembler.  Not sure
whether these are affected or not:

gcc.dg/debug/dwarf2/pr29609-1.c:/* { dg-final { scan-assembler "pr29609-1.c:18" } } */
gcc.dg/debug/dwarf2/pr29609-2.c:/* { dg-final { scan-assembler "pr29609-2.c:27" } } */
...
gcc.dg/debug/dwarf2/pr36690-1.c:/* { dg-final { scan-assembler "pr36690-1.c:11" } } */
gcc.dg/debug/dwarf2/pr36690-2.c:/* { dg-final { scan-assembler "pr36690-2.c:24" } } */
gcc.dg/debug/dwarf2/pr36690-3.c:/* { dg-final { scan-assembler "pr36690-3.c:19" } } */
...
gcc.dg/debug/dwarf2/pr37616.c:/* { dg-final { scan-assembler "pr37616.c:17" } } */
...
gcc.dg/debug/dwarf2/short-circuit.c:/* { dg-final { scan-assembler "short-circuit.c:11" } } */
...

(List generated with

  $ cd testsuite
  $ grep -r "scan-assembler.*[.]c.\?.\?.\?:" .
)

Ciao

Dominik ^_^  ^_^

-- 

Dominik Vogt
IBM Germany



Re: [arm-embedded][PATCH, GCC/ARM, 2/3] Error out for incompatible ARM multilibs

2016-11-22 Thread Thomas Preudhomme

Hi,

We decided to also apply this patch to the ARM embedded 6 branch.

Best regards,

Thomas

On 17/12/15 09:32, Thomas Preud'homme wrote:

Hi,

We decided to apply the following patch to the ARM embedded 5 branch.

Best regards,

Thomas


-Original Message-
From: gcc-patches-ow...@gcc.gnu.org [mailto:gcc-patches-
ow...@gcc.gnu.org] On Behalf Of Thomas Preud'homme
Sent: Wednesday, December 16, 2015 7:59 PM
To: gcc-patches@gcc.gnu.org; Richard Earnshaw; Ramana Radhakrishnan;
Kyrylo Tkachov
Subject: [PATCH, GCC/ARM, 2/3] Error out for incompatible ARM
multilibs

Currently in config.gcc, only the first multilib in a multilib list is checked
for validity; the following elements are ignored due to the break, which in
shell breaks out of the enclosing loop. A loop is also done over the multilib
list elements despite no combination being legal. This patch reworks the code
to address both issues.

ChangeLog entry is as follows:


2015-11-24  Thomas Preud'homme  

* config.gcc: Error out when conflicting multilib is detected.  Do not
loop over multilibs since no combination is legal.


diff --git a/gcc/config.gcc b/gcc/config.gcc
index 59aee2c..be3c720 100644
--- a/gcc/config.gcc
+++ b/gcc/config.gcc
@@ -3772,38 +3772,40 @@ case "${target}" in
# Add extra multilibs
if test "x$with_multilib_list" != x; then
arm_multilibs=`echo $with_multilib_list | sed -e 's/,/ /g'`
-   for arm_multilib in ${arm_multilibs}; do
-   case ${arm_multilib} in
-   aprofile)
+   case ${arm_multilibs} in
+   aprofile)
# Note that arm/t-aprofile is a
# stand-alone make file fragment to be
# used only with itself.  We do not
# specifically use the
# TM_MULTILIB_OPTION framework because
# this shorthand is more
-   # pragmatic. Additionally it is only
-   # designed to work without any
-   # with-cpu, with-arch with-mode
+   # pragmatic.
+   tmake_profile_file="arm/t-aprofile"
+   ;;
+   default)
+   ;;
+   *)
+   echo "Error: --with-multilib-list=${with_multilib_list} not supported." 1>&2
+   exit 1
+   ;;
+   esac
+
+   if test "x${tmake_profile_file}" != x ; then
+   # arm/t-aprofile is only designed to work
+   # without any with-cpu, with-arch, with-mode,
# with-fpu or with-float options.
-   if test "x$with_arch" != x \
-   || test "x$with_cpu" != x \
-   || test "x$with_float" != x \
-   || test "x$with_fpu" != x \
-   || test "x$with_mode" != x ; then
-   echo "Error: You cannot use any of --with-arch/cpu/fpu/float/mode with --with-multilib-list=aprofile" 1>&2
-   exit 1
-   fi
-   tmake_file="${tmake_file} arm/t-aprofile"
-   break
-   ;;
-   default)
-   ;;
-   *)
-   echo "Error: --with-multilib-list=${with_multilib_list} not supported." 1>&2
-   exit 1
-   ;;
-   esac
-   done
+   if test "x$with_arch" != x \
+   || test "x$with_cpu" != x \
+   || test "x$with_float" != x \
+   || test "x$with_fpu" != x \
+   || test "x$with_mode" != x ; then
+   echo "Error: You cannot use any of --with-arch/cpu/fpu/float/mode with --with-multilib-list=${arm_multilib}" 1>&2
+   exit 1
+   fi
+
+   tmake_file="${tmake_file} ${tmake_profile_file}"
+   fi
fi
;;


Tested with the following multilib lists:
  + foo -> "Error: --with-multilib-list=foo not supported" as expected
  + default,aprofile -

Re: [PATCH, ARM] Enable ldrd/strd peephole rules unconditionally

2016-11-22 Thread Kyrill Tkachov


On 22/11/16 14:42, Bernd Edlinger wrote:

Hi,

does this follow-up patch look reasonable?
See: https://gcc.gnu.org/ml/gcc-patches/2016-11/msg01945.html


Is it OK for trunk?



Ah yes, this one slipped my attention.
This is ok.
Thanks,
Kyrill


Thanks
Bernd.

On 11/21/16 21:46, Christophe Lyon wrote:

On 18 November 2016 at 16:50, Bernd Edlinger  wrote:

On 11/18/16 12:58, Christophe Lyon wrote:

On 17 November 2016 at 10:23, Kyrill Tkachov
 wrote:

On 09/11/16 12:58, Bernd Edlinger wrote:

Hi!


This patch enables the ldrd/strd peephole rules unconditionally.

It is meant to fix cases, where the patch to reduce the sha512
stack usage splits ldrd/strd instructions into separate ldr/str insns,
but is technically independent from the other patch:

See https://gcc.gnu.org/ml/gcc-patches/2016-11/msg00523.html

It was necessary to change check_effective_target_arm_prefer_ldrd_strd
to retain the true prefer_ldrd_strd tuning flag.


Bootstrapped and reg-tested on arm-linux-gnueabihf.
Is it OK for trunk?


This is ok.
Thanks,
Kyrill


Hi Bernd,

Since you committed this patch (r242549), I'm seeing the new test
failing on some arm*-linux-gnueabihf configurations:

FAIL:  gcc.target/arm/pr53447-5.c scan-assembler-times ldrd 10
FAIL:  gcc.target/arm/pr53447-5.c scan-assembler-times strd 9

See 
http://people.linaro.org/~christophe.lyon/cross-validation/gcc/trunk/242549/report-build-info.html
for a map of failures.

Am I missing something?

Hi Christophe,

as always many thanks for your testing...

I have apparently only looked at the case -mfloat-abi=soft here, which
is what my other patch is going to address.  But all targets with
-mfpu=neon -mfloat-abi=hard can also use vldr.64 instead of ldrd
and vstr.64 instead of strd, which should be accepted as well.

So the attached patch should fix at least most of the fallout.


I've tested it, and indeed it fixes the failures I've reported.

Thanks


Is it OK for trunk?


Thanks
Bernd.

  >> 2016-11-18  Bernd Edlinger  
  >>
  >>  * gcc.target/arm/pr53447-5.c: Fix test expectations for neon-fpu.
  >>
  >>Index: gcc/testsuite/gcc.target/arm/pr53447-5.c
  >>===
  >>--- gcc/testsuite/gcc.target/arm/pr53447-5.c  (revision 242588)
  >>+++ gcc/testsuite/gcc.target/arm/pr53447-5.c  (working copy)
  >>@@ -15,5 +15,8 @@ void foo(long long* p)
  >>   p[9] -= p[10];
  >> }
  >>
  >>-/* { dg-final { scan-assembler-times "ldrd" 10 } } */
  >>-/* { dg-final { scan-assembler-times "strd" 9 } } */
  >>+/* We accept neon instructions vldr.64 and vstr.64 as well.
  >>+   Note: DejaGnu counts patterns with alternatives twice,
  >>+   so actually there are only 10 loads and 9 stores.  */
  >>+/* { dg-final { scan-assembler-times "(ldrd|vldr\\.64)" 20 } } */
  >>+/* { dg-final { scan-assembler-times "(strd|vstr\\.64)" 18 } } */




Re: [PATCH] (v2) Add a "compact" mode to print_rtx_function

2016-11-22 Thread Jakub Jelinek
On Tue, Nov 22, 2016 at 03:38:04PM +0100, Bernd Schmidt wrote:
> On 11/22/2016 02:37 PM, Jakub Jelinek wrote:
> >Can't it be done only if xloc.file contains any fancy characters?
> 
> Sure, but why? Strings generally get emitted with quotes around them, I
> don't see a good reason for filenames to be different, especially if it
> makes the output easier to parse.

Because printing common filenames matches what we emit in diagnostics,
what e.g. sanitizers emit at runtime diagnostics, what we emit as locations
in gimple dumps etc.

Jakub


[arm-embedded] [PATCH, GCC/ARM 1/2] Add multilib support for embedded bare-metal targets

2016-11-22 Thread Thomas Preudhomme

Hi,

We have decided to backport this patch to add support for multilib for embedded 
bare-metal targets to our embedded-6-branch.


*** gcc/ChangeLog.arm ***

2016-11-22  Thomas Preud'homme  

Backport from mainline
2016-11-22 Thomas Preud'homme  

* config.gcc: Allow new rmprofile value for configure option
--with-multilib-list.
* config/arm/t-rmprofile: New file.
* doc/install.texi (--with-multilib-list): Document new rmprofile value
for ARM.


Best regards,

Thomas
--- Begin Message ---

Ping?

Best regards,

Thomas

On 08/11/16 13:36, Thomas Preudhomme wrote:

Ping?

Best regards,

Thomas

On 02/11/16 10:05, Thomas Preudhomme wrote:

Ping?

Best regards,

Thomas

On 27/10/16 15:26, Thomas Preudhomme wrote:

Hi Kyrill,

On 27/10/16 10:45, Kyrill Tkachov wrote:

Hi Thomas,

On 24/10/16 09:06, Thomas Preudhomme wrote:

Ping?

Best regards,

Thomas

On 13/10/16 16:35, Thomas Preudhomme wrote:

Hi ARM maintainers,

This patchset aims at adding multilib support for R and M profile ARM
architectures and allowing it to be built alongside multilib for A profile
ARM architectures. This specific patch adds the t-rmprofile multilib Makefile
fragment for the former objective. Multilibs are built for all M profile
architectures involved: ARMv6S-M, ARMv7-M and ARMv7E-M as well as ARMv7. The
ARMv7 multilib is used for R profile architectures but also A profile
architectures.

ChangeLog entry is as follows:


*** gcc/ChangeLog ***

2016-10-03  Thomas Preud'homme 

* config.gcc: Allow new rmprofile value for configure option
--with-multilib-list.
* config/arm/t-rmprofile: New file.
* doc/install.texi (--with-multilib-list): Document new rmprofile value
for ARM.


Testing:

== aprofile ==
* "tree install/lib/gcc/arm-none-eabi/7.0.0" is the same before and after the
patchset for both aprofile and rmprofile
* default spec (gcc -dumpspecs) is the same before and after the patchset for
aprofile
* No difference in --print-multi-directory between before and after the
patchset for aprofile for all combinations of ISA (ARM/Thumb), architecture,
CPU, FPU and float ABI

== rmprofile ==
* aprofile and rmprofile use similar directory structure (ISA/arch/FPU/float
ABI) and directory naming
* Difference in --print-multi-directory between before [1] and after the
patchset for rmprofile for all combinations of ISA (ARM/Thumb), architecture,
CPU, FPU and float ABI modulo the name and directory structure changes

[1] as per patch applied in ARM embedded branches
https://gcc.gnu.org/viewcvs/gcc/branches/ARM/embedded-5-branch/gcc/config/arm/t-baremetal?view=markup






== aprofile + rmprofile ==
* aprofile,rmprofile and rmprofile,aprofile builds give an error saying it is
not supported


Is this ok for master branch?

Best regards,

Thomas


+# Arch Matches
+MULTILIB_MATCHES   += march?armv6s-m=march?armv6-m
+MULTILIB_MATCHES   += march?armv8-m.main=march?armv8-m.main+dsp
+MULTILIB_MATCHES   += march?armv7=march?armv7-r
+ifeq (,$(HAS_APROFILE))
+MULTILIB_MATCHES   += march?armv7=march?armv7-a
+MULTILIB_MATCHES   += march?armv7=march?armv7ve
+MULTILIB_MATCHES   += march?armv7=march?armv8-a
+MULTILIB_MATCHES   += march?armv7=march?armv8-a+crc
+MULTILIB_MATCHES   += march?armv7=march?armv8.1-a
+MULTILIB_MATCHES   += march?armv7=march?armv8.1-a+crc
+endif

I think you want to update the patch to handle -march=armv8.2-a and
armv8.2-a+fp16.
Thanks,
Kyrill


Indeed. Please find updated ChangeLog and patch (attached):

*** gcc/ChangeLog ***

2016-10-03  Thomas Preud'homme  

* config.gcc: Allow new rmprofile value for configure option
--with-multilib-list.
* config/arm/t-rmprofile: New file.
* doc/install.texi (--with-multilib-list): Document new rmprofile value
for ARM.

Ok for trunk?

Best regards,

Thomas
diff --git a/gcc/config.gcc b/gcc/config.gcc
index d956da22ad60abfe9c6b4be0882f9e7dd64ac39f..15b662ad5449f8b91eb760b7fbe45f33d8cecb4b 100644
--- a/gcc/config.gcc
+++ b/gcc/config.gcc
@@ -3739,6 +3739,16 @@ case "${target}" in
 # pragmatic.
 tmake_profile_file="arm/t-aprofile"
 ;;
+			rmprofile)
+# Note that arm/t-rmprofile is a
+# stand-alone make file fragment to be
+# used only with itself.  We do not
+# specifically use the
+# TM_MULTILIB_OPTION framework because
+# this shorthand is more
+# pragmatic.
+tmake_profile_file="arm/t-rmprofile"
+;;
 			default)
 ;;
 			*)
@@ -3748,9 +3758,10 @@ case "${target}" in
 			esac
 
 			if test "x${tmake_profile_file}" != x ; then
-# arm/t-aprofile is only designed to work
-# without any with-cpu, with-arch, with-mode,
-# with-fpu or with-float options.
+# arm/t-aprofile and arm/t-rmprofile are only
+# designed to work without any with-cpu,
+# with-arch, with-mode, with-fpu or with-float
+# options.
 if test "x$with_arch" != x \
 || test "x$with_cpu" 

Re: Fix PR78154

2016-11-22 Thread Richard Biener
On Tue, 22 Nov 2016, Prathamesh Kulkarni wrote:

> On 21 November 2016 at 15:34, Richard Biener  wrote:
> > On Fri, 18 Nov 2016, Prathamesh Kulkarni wrote:
> >
> >> On 17 November 2016 at 15:24, Richard Biener  wrote:
> >> > On Thu, 17 Nov 2016, Prathamesh Kulkarni wrote:
> >> >
> >> >> On 17 November 2016 at 14:21, Richard Biener  wrote:
> >> >> > On Thu, 17 Nov 2016, Prathamesh Kulkarni wrote:
> >> >> >
> >> >> >> Hi Richard,
> >> >> >> Following your suggestion in PR78154, the patch checks if stmt
> >> >> >> contains call to memmove (and friends) in gimple_stmt_nonzero_warnv_p
> >> >> >> and returns true in that case.
> >> >> >>
> >> >> >> Bootstrapped+tested on x86_64-unknown-linux-gnu.
> >> >> >> Cross-testing on arm*-*-*, aarch64*-*-* in progress.
> >> >> >> Would it be OK to commit this patch in stage-3 ?
> >> >> >
> >> >> > As people noted we have returns_nonnull for this and that is already
> >> >> > checked.  So please make sure the builtins get this attribute instead.
> >> >> OK thanks, I will add the returns_nonnull attribute to the required
> >> >> string builtins.
> >> >> I noticed some of the string builtins don't have RET1 in builtins.def:
> >> >> strcat, strncpy, strncat have ATTR_NOTHROW_NONNULL_LEAF.
> >> >> Should they instead be having ATTR_RET1_NOTHROW_NONNULL_LEAF similar
> >> >> to entries for memmove, strcpy ?
> >> >
> >> > Yes, I think so.
> >> Hi,
> >> In the attached patch I added returns_nonnull attribute to
> >> ATTR_RET1_NOTHROW_NONNULL_LEAF,
> >> and changed few builtins like strcat, strncpy, strncat and
> >> corresponding _chk builtins to use ATTR_RET1_NOTHROW_NONNULL_LEAF.
> >> Does the patch look correct ?
> >
> > Hmm, given you only change ATTR_RET1_NOTHROW_NONNULL_LEAF means that
> > the gimple_stmt_nonzero_warnv_p code is incomplete -- it should
> > infer returns_nonnull itself from RET1 (which is fnspec("1") basically)
> > and the nonnull attribute on the argument.  So
> >
> >   unsigned rf = gimple_call_return_flags (stmt);
> >   if (rf & ERF_RETURNS_ARG)
> >{
> >  tree arg = gimple_call_arg (stmt, rf & ERF_RETURN_ARG_MASK);
> >  if (range of arg is ! VARYING)
> >use range of arg;
> >  else if (infer_nonnull_range_by_attribute (stmt, arg))
> > ... nonnull ...
> >
> Hi,
> Thanks for the suggestions, modified gimple_stmt_nonzero_warnv_p
> accordingly in this version.
> For functions like stpcpy that return nonnull but not one of its
> arguments, I added a new enum, ATTR_RETNONNULL_NOTHROW_LEAF.
> Is that OK ?
> Bootstrapped+tested on x86_64-unknown-linux-gnu.
> Cross-testing on arm*-*-*, aarch64*-*-* in progress.

+   value_range *vr = get_value_range (arg);
+   if ((vr && vr->type != VR_VARYING)
+   || infer_nonnull_range_by_attribute (stmt, arg))
+ return true;
+ }

actually that's not quite correct (failed to notice the function
doesn't return a range but whether the range is nonnull).  For
nonnull it's just

  if (infer_nonnull_range_by_attribute (stmt, arg))
    return true;

in the extract_range_basic call handling we could handle
ERF_RETURNS_ARG by returning the range of the argument (if not varying).

Thus the patch is ok with the above condition changed.  Please refer
to the recently opened PR from the ChangeLog.

Thanks,
Richard.
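
A minimal source-level sketch of what the review above is about: when a call is known to return one of its arguments, and that argument carries a nonnull attribute, the result can be treated as nonnull. The function names below are hypothetical and the example only shows the pattern; it makes no claim about which GCC versions actually fold the comparison.

```cpp
#include <cstring>

// strcpy returns its first argument, and that argument must not be null,
// so the comparison below can only ever be true.  An optimizer that
// combines "returns argument 0" with "argument 0 is nonnull" may
// therefore replace it with a constant.
bool copy_returns_nonnull(char* dst, const char* src)
{
  return std::strcpy(dst, src) != nullptr;
}

// Small driver with a buffer large enough for the literal.
bool demo()
{
  char buf[8];
  return copy_returns_nonnull(buf, "abc");
}
```

stpcpy is the contrasting case mentioned in the thread: its result is nonnull but is not one of the arguments, so the "returns argument" reasoning does not apply to it.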


Re: PR78153

2016-11-22 Thread Richard Biener
On Tue, 22 Nov 2016, Prathamesh Kulkarni wrote:

> On 21 November 2016 at 15:10, Richard Biener  wrote:
> > On Sun, 20 Nov 2016, Prathamesh Kulkarni wrote:
> >
> >> Hi,
> >> As suggested by Martin in PR78153 strlen's return value cannot exceed
> >> PTRDIFF_MAX.
> >> So I set its range to [0, PTRDIFF_MAX - 1] in extract_range_basic()
> >> in the attached patch.
> >>
> >> However it regressed strlenopt-3.c:
> >>
> >> Consider fn1() from strlenopt-3.c:
> >>
> >> __attribute__((noinline, noclone)) size_t
> >> fn1 (char *p, char *q)
> >> {
> >>   size_t s = strlen (q);
> >>   strcpy (p, q);
> >>   return s - strlen (p);
> >> }
> >>
> >> The optimized dump shows the following:
> >>
> >> __attribute__((noclone, noinline))
> >> fn1 (char * p, char * q)
> >> {
> >>   size_t s;
> >>   size_t _7;
> >>   long unsigned int _9;
> >>
> >>   :
> >>   s_4 = strlen (q_3(D));
> >>   _9 = s_4 + 1;
> >>   __builtin_memcpy (p_5(D), q_3(D), _9);
> >>   _7 = 0;
> >>   return _7;
> >>
> >> }
> >>
> >> which introduces the regression, because the test expects "return 0;" in 
> >> fn1().
> >>
> >> The issue seems to be in vrp2:
> >>
> >> Before the patch:
> >> Visiting statement:
> >> s_4 = strlen (q_3(D));
> >> Found new range for s_4: VARYING
> >>
> >> Visiting statement:
> >> _1 = s_4;
> >> Found new range for _1: [s_4, s_4]
> >> marking stmt to be not simulated again
> >>
> >> Visiting statement:
> >> _7 = s_4 - _1;
> >> Applying pattern match.pd:111, gimple-match.c:27997
> >> Match-and-simplified s_4 - _1 to 0
> >> Intersecting
> >>   [0, 0]
> >> and
> >>   [0, +INF]
> >> to
> >>   [0, 0]
> >> Found new range for _7: [0, 0]
> >>
> >> __attribute__((noclone, noinline))
> >> fn1 (char * p, char * q)
> >> {
> >>   size_t s;
> >>   long unsigned int _1;
> >>   long unsigned int _9;
> >>
> >>   :
> >>   s_4 = strlen (q_3(D));
> >>   _9 = s_4 + 1;
> >>   __builtin_memcpy (p_5(D), q_3(D), _9);
> >>   _1 = s_4;
> >>   return 0;
> >>
> >> }
> >>
> >>
> >> After the patch:
> >> Visiting statement:
> >> s_4 = strlen (q_3(D));
> >> Intersecting
> >>   [0, 9223372036854775806]
> >> and
> >>   [0, 9223372036854775806]
> >> to
> >>   [0, 9223372036854775806]
> >> Found new range for s_4: [0, 9223372036854775806]
> >> marking stmt to be not simulated again
> >>
> >> Visiting statement:
> >> _1 = s_4;
> >> Intersecting
> >>   [0, 9223372036854775806]  EQUIVALENCES: { s_4 } (1 elements)
> >> and
> >>   [0, 9223372036854775806]
> >> to
> >>   [0, 9223372036854775806]  EQUIVALENCES: { s_4 } (1 elements)
> >> Found new range for _1: [0, 9223372036854775806]
> >> marking stmt to be not simulated again
> >>
> >> Visiting statement:
> >> _7 = s_4 - _1;
> >> Intersecting
> >>   ~[9223372036854775807, 9223372036854775809]
> >> and
> >>   ~[9223372036854775807, 9223372036854775809]
> >> to
> >>   ~[9223372036854775807, 9223372036854775809]
> >> Found new range for _7: ~[9223372036854775807, 9223372036854775809]
> >> marking stmt to be not simulated again
> >>
> >> __attribute__((noclone, noinline))
> >> fn1 (char * p, char * q)
> >> {
> >>   size_t s;
> >>   long unsigned int _1;
> >>   size_t _7;
> >>   long unsigned int _9;
> >>
> >>   :
> >>   s_4 = strlen (q_3(D));
> >>   _9 = s_4 + 1;
> >>   __builtin_memcpy (p_5(D), q_3(D), _9);
> >>   _1 = s_4;
> >>   _7 = s_4 - _1;
> >>   return _7;
> >>
> >> }
> >>
> >> Then forwprop4 turns
> >> _1 = s_4
> >> _7 = s_4 - _1
> >> into
> >> _7 = 0
> >>
> >> and we end up with:
> >> _7 = 0
> >> return _7
> >> in optimized dump.
> >>
> >> Running ccp again after forwprop4 trivially solves the issue, however
> >> I am not sure if we want to run ccp again ?
> >>
> >> The issue is probably with extract_range_from_ssa_name():
> >> For _1 = s_4
> >>
> >> Before patch:
> >> VR for s_4 is set to varying.
> >> So VR for _1 is set to [s_4, s_4] by extract_range_from_ssa_name.
> >> Since VR for _1 is [s_4, s_4] it implicitly implies that _1 is equal to 
> >> s_4,
> >> and vrp is able to transform _7 = s_4 - _1 to _7 = 0 (by using
> >> match.pd pattern x - x -> 0).
> >>
> >> After patch:
> >> VR for s_4 is set to [0, PTRDIFF_MAX - 1]
> >> And correspondingly VR for _1 is set to [0, PTRDIFF_MAX - 1]
> >> so IIUC, we then lose the information that _1 is equal to s_4,
> >
> > We don't lose it, it's in its set of equivalencies.
> Ah, I missed that, thanks. For some reason I had the misconception that
> equivalences store variables which have the same value ranges but are
> not necessarily equal.
> >
> >> and vrp doesn't transform _7 = s_4 - _1 to _7 = 0.
> >> forwprop4 does that because it sees that s_4 and _1 are equivalent.
> >> Does this sound correct ?
> >
> > Yes.  So the issue is really that vrp_visit_assignment_or_call calls
> > gimple_fold_stmt_to_constant_1 with vrp_valueize[_1] which when
> > we do not have a singleton VR_RANGE does not fall back to looking
> > at equivalences (there's not a good cheap way to do that currently because
> > VRP doesn't keep a proper copy lattice but simply IORs equivalences
> > from all equivale
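
The anti-range ~[9223372036854775807, 9223372036854775809] in the dump above follows from unsigned wraparound. The arithmetic can be sketched as below, assuming a 64-bit size_t and ptrdiff_t as in the quoted dump; the constant names are hypothetical.

```cpp
#include <cstdint>

// Upper bound of the strlen range from the patch: PTRDIFF_MAX - 1.
// A 64-bit ptrdiff_t is assumed, so INT64_MAX stands in for PTRDIFF_MAX.
constexpr std::uint64_t max_len =
    static_cast<std::uint64_t>(INT64_MAX) - 1;

// For a - b with both operands in [0, max_len]:
// the largest result without wraparound is max_len - 0 ...
constexpr std::uint64_t largest_positive = max_len;

// ... and the smallest wrapped result is 0 - max_len, computed modulo
// 2^64 because the type is unsigned.
constexpr std::uint64_t smallest_wrapped = std::uint64_t{0} - max_len;

// Every value strictly between the two is unreachable; those are the
// three values the anti-range excludes.
constexpr std::uint64_t excluded_low = largest_positive + 1;
constexpr std::uint64_t excluded_high = smallest_wrapped - 1;
```

So the difference of two such lengths can be any 64-bit value except 2^63 - 1, 2^63 and 2^63 + 1, which is a much weaker fact than "equal to 0".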

Re: [v3 PATCH] LWG 2766, LWG 2749

2016-11-22 Thread Ville Voutilainen
On 22 November 2016 at 15:36, Jonathan Wakely  wrote:
>> +#if __cplusplus > 201402L || !defined(__STRICT_ANSI__) // c++1z or gnu++11
>> +  template<typename _T1, typename _T2>
>> +inline
>> +typename enable_if<__not_<__and_<__is_swappable<_T1>,
>> +__is_swappable<_T2>>>::value>::type
>> +swap(pair<_T1, _T2>&, pair<_T1, _T2>&) = delete;
>
>
> Is there any advantage to using __not_ here, rather than just:
>
>    typename enable_if<!__and_<__is_swappable<_T1>,
>                               __is_swappable<_T2>>::value>::type
>
> ?
>
> __not_ is useful as a sub-expression of an __and_ / __or_ expression,
> but at the top level doesn't seem to buy anything, and is more typing,
> and requires indenting the code further.


There's no particular advantage, it's just a habitual way to write a mixture of
__and_s and __not_s that I suffer from, whichever way the nesting is.
I'm also not consistent:

+inline enable_if_t && is_swappable_v<_Tp>)>
+swap(optional<_Tp>&, optional<_Tp>&) = delete;

so I can certainly change all these swaps to use operator! rather than
__not_. Is the
patch otherwise ok for trunk? What about the tuple part?


Re: [PATCH, ARM] Enable ldrd/strd peephole rules unconditionally

2016-11-22 Thread Bernd Edlinger
Hi,

does this follow-up patch look reasonable?
See: https://gcc.gnu.org/ml/gcc-patches/2016-11/msg01945.html


Is it OK for trunk?


Thanks
Bernd.

On 11/21/16 21:46, Christophe Lyon wrote:
> On 18 November 2016 at 16:50, Bernd Edlinger  
> wrote:
>> On 11/18/16 12:58, Christophe Lyon wrote:
>>> On 17 November 2016 at 10:23, Kyrill Tkachov
>>>  wrote:

 On 09/11/16 12:58, Bernd Edlinger wrote:
>
> Hi!
>
>
> This patch enables the ldrd/strd peephole rules unconditionally.
>
> It is meant to fix cases, where the patch to reduce the sha512
> stack usage splits ldrd/strd instructions into separate ldr/str insns,
> but is technically independent from the other patch:
>
> See https://gcc.gnu.org/ml/gcc-patches/2016-11/msg00523.html
>
> It was necessary to change check_effective_target_arm_prefer_ldrd_strd
> to retain the true prefer_ldrd_strd tuning flag.
>
>
> Bootstrapped and reg-tested on arm-linux-gnueabihf.
> Is it OK for trunk?


 This is ok.
 Thanks,
 Kyrill

>>>
>>> Hi Bernd,
>>>
>>> Since you committed this patch (r242549), I'm seeing the new test
>>> failing on some arm*-linux-gnueabihf configurations:
>>>
>>> FAIL:  gcc.target/arm/pr53447-5.c scan-assembler-times ldrd 10
>>> FAIL:  gcc.target/arm/pr53447-5.c scan-assembler-times strd 9
>>>
>>> See 
>>> http://people.linaro.org/~christophe.lyon/cross-validation/gcc/trunk/242549/report-build-info.html
>>> for a map of failures.
>>>
>>> Am I missing something?
>>
>> Hi Christophe,
>>
>> as always many thanks for your testing...
>>
>> I have apparently only looked at the case -mfloat-abi=soft here, which
>> is what my other patch is going to address.  But all targets with
>> -mfpu=neon -mfloat-abi=hard can also use vldr.64 instead of ldrd
>> and vstr.64 instead of strd, which should be accepted as well.
>>
>> So the attached patch should fix at least most of the fallout.
>>
>
> I've tested it, and indeed it fixes the failures I've reported.
>
> Thanks
>
>> Is it OK for trunk?
>>
>>
>> Thanks
>> Bernd.
 >> 2016-11-18  Bernd Edlinger  
 >>
 >> * gcc.target/arm/pr53447-5.c: Fix test expectations for neon-fpu.
 >>
 >>Index: gcc/testsuite/gcc.target/arm/pr53447-5.c
 >>===
 >>--- gcc/testsuite/gcc.target/arm/pr53447-5.c (revision 242588)
 >>+++ gcc/testsuite/gcc.target/arm/pr53447-5.c (working copy)
 >>@@ -15,5 +15,8 @@ void foo(long long* p)
 >>   p[9] -= p[10];
 >> }
 >>
 >>-/* { dg-final { scan-assembler-times "ldrd" 10 } } */
 >>-/* { dg-final { scan-assembler-times "strd" 9 } } */
 >>+/* We accept neon instructions vldr.64 and vstr.64 as well.
 >>+   Note: DejaGnu counts patterns with alternatives twice,
 >>+   so actually there are only 10 loads and 9 stores.  */
 >>+/* { dg-final { scan-assembler-times "(ldrd|vldr\\.64)" 20 } } */
 >>+/* { dg-final { scan-assembler-times "(strd|vstr\\.64)" 18 } } */


Re: [v3 PATCH] LWG 2766, LWG 2749

2016-11-22 Thread Jonathan Wakely

On 22/11/16 16:59 +0200, Ville Voutilainen wrote:

On 22 November 2016 at 15:36, Jonathan Wakely  wrote:

+#if __cplusplus > 201402L || !defined(__STRICT_ANSI__) // c++1z or gnu++11
+  template<typename _T1, typename _T2>
+inline
+typename enable_if<__not_<__and_<__is_swappable<_T1>,
+__is_swappable<_T2>>>::value>::type
+swap(pair<_T1, _T2>&, pair<_T1, _T2>&) = delete;



Is there any advantage to using __not_ here, rather than just:

   typename enable_if<!__and_<__is_swappable<_T1>,
                              __is_swappable<_T2>>::value>::type

?

__not_ is useful as a sub-expression of an __and_ / __or_ expression,
but at the top level doesn't seem to buy anything, and is more typing,
and requires indenting the code further.



There's no particular advantage, it's just a habitual way to write a mixture of
__and_s and __not_s that I suffer from, whichever way the nesting is.
I'm also not consistent:

+inline enable_if_t && is_swappable_v<_Tp>)>
+swap(optional<_Tp>&, optional<_Tp>&) = delete;



Yes, I noticed that :-)


so I can certainly change all these swaps to use operator! rather than
__not_. Is the
patch otherwise ok for trunk? What about the tuple part?


Yes, OK changing the top-level __not_s to operator!

I haven't reviewed the tuple part fully yet.
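
The equivalence of a top-level __not_ and a plain operator! can be checked in isolation. This is a sketch only: and_ and not_ are hypothetical stand-ins for the libstdc++ internal helpers, std::is_move_constructible replaces the swappable traits, and Pinned is a made-up type.

```cpp
#include <type_traits>

// Hypothetical stand-ins for libstdc++'s internal __and_ / __not_.
template<typename A, typename B>
struct and_ : std::integral_constant<bool, A::value && B::value> {};

template<typename T>
struct not_ : std::integral_constant<bool, !T::value> {};

// Form 1: the negation is written as a trait wrapper.
template<typename T1, typename T2>
constexpr bool with_not_trait()
{
  return not_<and_<std::is_move_constructible<T1>,
                   std::is_move_constructible<T2>>>::value;
}

// Form 2: operator! applied to ::value at the top level.
template<typename T1, typename T2>
constexpr bool with_operator_not()
{
  return !and_<std::is_move_constructible<T1>,
               std::is_move_constructible<T2>>::value;
}

// A type that cannot be moved or copied.
struct Pinned
{
  Pinned() = default;
  Pinned(Pinned&&) = delete;
};
```

Both forms produce the same boolean for any pair of types, so at the top level the choice only affects how much has to be typed and indented. The wrapper form earns its keep when the negation is itself an operand of another trait combinator.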



Re: [PATCH, testsuite]: Fix detection of -j make argument

2016-11-22 Thread Jeff Law

On 11/22/2016 05:25 AM, Uros Bizjak wrote:

Hello!

New makes (e.g. GNU Make 4.2.1) pass the -j argument in MFLAGS in a
different way. While older makes pass only "-j", newer makes pass e.g.
"-j4" when -j is specified on the command line. The detection of the "-j"
make argument doesn't work in the latter case.

Attached patch reworks this functionality to detect -j correctly in all cases.

gcc/ChangeLog

2016-11-22  Uros Bizjak  

* Makefile.in ($(lang_checks_parallelized)): Fix detection
of -j argument.

gcc/ada/ChangeLog

2016-11-22  Uros Bizjak  

* gcc-interface/Make-lang.in (check-acats): Fix detection
of -j argument.

libstdc++-v3/ChangeLog

2016-11-22  Uros Bizjak  

* testsuite/Makefile.am
(check-DEJAGNU $(check_DEJAGNU_normal_targets)): Fix detection
of -j argument.
* testsuite/Makefile.in: Regenerate.

Patch was bootstrapped and regression tested on x86_64-linux-gnu with
"GNU Make 4.2.1" and "GNU Make 3.81". Ada was not checked, but the
change is consistent with other changes.

OK for mainline SVN and release branches?

OK on the rest of the bits, for the trunk and any release branches.

jeff
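
The two spellings described in the quoted message ("-j" from older makes, "-j4" from newer ones) can be covered by matching on the prefix rather than on the exact word. The sketch below only illustrates that idea; it is not the Makefile change from the patch, has_jobs_flag is a hypothetical name, and it deliberately ignores combined short options.

```cpp
#include <sstream>
#include <string>

// True if any whitespace-separated word of `mflags` starts with "-j",
// which covers both the bare "-j" and the "-jN" spellings.  Long options
// such as "--jobserver-auth=..." start with "--" and do not match.
bool has_jobs_flag(const std::string& mflags)
{
  std::istringstream words(mflags);
  std::string word;
  while (words >> word)
    if (word.compare(0, 2, "-j") == 0)
      return true;
  return false;
}
```

An exact comparison against "-j" would accept only the first spelling, which matches the failure mode the message describes.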


[testsuite,committed]: Restrict 2 test cases to big targets.

2016-11-22 Thread Georg-Johann Lay

This adds requirements for 2 test cases:

loop-split.c needs 32-bit int at least.  Use int32plus as I didn't 
intend to change the very test case.


gcc.dg/stack-layout-dynamic-1.c aligns the stack to 16 bits so ptr32plus 
seems reasonable.


Committed  to trunk.

Johann



gcc/testsuite/
* gcc.dg/loop-split.c: Require int32plus.
* gcc.dg/stack-layout-dynamic-1.c: Require ptr32plus.

Index: gcc.dg/loop-split.c
===
--- gcc.dg/loop-split.c (revision 242541)
+++ gcc.dg/loop-split.c (working copy)
@@ -1,5 +1,6 @@
 /* { dg-do run } */
 /* { dg-options "-O2 -fsplit-loops -fdump-tree-lsplit-details" } */
+/* { dg-require-effective-target int32plus } */

 #ifdef __cplusplus
 extern "C" int printf (const char *, ...);
Index: gcc.dg/stack-layout-dynamic-1.c
===
--- gcc.dg/stack-layout-dynamic-1.c (revision 242541)
+++ gcc.dg/stack-layout-dynamic-1.c (working copy)
@@ -2,6 +2,7 @@
in one pass together with normal local variables.  */
 /* { dg-do compile } */
 /* { dg-options "-O0 -fomit-frame-pointer" } */
+/* { dg-require-effective-target ptr32plus } */

 extern void bar (void *, void *, void *);
 void foo (void)


Re: [Patch, Fortran, OOP] PR 78443: Incorrect behavior with non_overridable keyword

2016-11-22 Thread Steve Kargl
On Tue, Nov 22, 2016 at 01:14:46PM +0100, Janus Weil wrote:
> 
> here is a patch for a wrong-code problem with non_overridable
> type-bound procedures. For details see the PR. Regtests cleanly. Ok
> for trunk?

OK.

> Since the patch is very simple and it fixes wrong code which can
> silently give bad runtime results, I think backporting to the release
> branches might be a good idea as well. Ok?

OK.

-- 
Steve


Re: PR78153

2016-11-22 Thread Prathamesh Kulkarni
On 22 November 2016 at 20:18, Richard Biener  wrote:
> On Tue, 22 Nov 2016, Prathamesh Kulkarni wrote:
>
>> On 21 November 2016 at 15:10, Richard Biener  wrote:
>> > On Sun, 20 Nov 2016, Prathamesh Kulkarni wrote:
>> >
>> >> Hi,
>> >> As suggested by Martin in PR78153 strlen's return value cannot exceed
>> >> PTRDIFF_MAX.
>> >> So I set its range to [0, PTRDIFF_MAX - 1] in extract_range_basic()
>> >> in the attached patch.
>> >>
>> >> However it regressed strlenopt-3.c:
>> >>
>> >> Consider fn1() from strlenopt-3.c:
>> >>
>> >> __attribute__((noinline, noclone)) size_t
>> >> fn1 (char *p, char *q)
>> >> {
>> >>   size_t s = strlen (q);
>> >>   strcpy (p, q);
>> >>   return s - strlen (p);
>> >> }
>> >>
>> >> The optimized dump shows the following:
>> >>
>> >> __attribute__((noclone, noinline))
>> >> fn1 (char * p, char * q)
>> >> {
>> >>   size_t s;
>> >>   size_t _7;
>> >>   long unsigned int _9;
>> >>
>> >>   :
>> >>   s_4 = strlen (q_3(D));
>> >>   _9 = s_4 + 1;
>> >>   __builtin_memcpy (p_5(D), q_3(D), _9);
>> >>   _7 = 0;
>> >>   return _7;
>> >>
>> >> }
>> >>
>> >> which introduces the regression, because the test expects "return 0;" in 
>> >> fn1().
>> >>
>> >> The issue seems to be in vrp2:
>> >>
>> >> Before the patch:
>> >> Visiting statement:
>> >> s_4 = strlen (q_3(D));
>> >> Found new range for s_4: VARYING
>> >>
>> >> Visiting statement:
>> >> _1 = s_4;
>> >> Found new range for _1: [s_4, s_4]
>> >> marking stmt to be not simulated again
>> >>
>> >> Visiting statement:
>> >> _7 = s_4 - _1;
>> >> Applying pattern match.pd:111, gimple-match.c:27997
>> >> Match-and-simplified s_4 - _1 to 0
>> >> Intersecting
>> >>   [0, 0]
>> >> and
>> >>   [0, +INF]
>> >> to
>> >>   [0, 0]
>> >> Found new range for _7: [0, 0]
>> >>
>> >> __attribute__((noclone, noinline))
>> >> fn1 (char * p, char * q)
>> >> {
>> >>   size_t s;
>> >>   long unsigned int _1;
>> >>   long unsigned int _9;
>> >>
>> >>   :
>> >>   s_4 = strlen (q_3(D));
>> >>   _9 = s_4 + 1;
>> >>   __builtin_memcpy (p_5(D), q_3(D), _9);
>> >>   _1 = s_4;
>> >>   return 0;
>> >>
>> >> }
>> >>
>> >>
>> >> After the patch:
>> >> Visiting statement:
>> >> s_4 = strlen (q_3(D));
>> >> Intersecting
>> >>   [0, 9223372036854775806]
>> >> and
>> >>   [0, 9223372036854775806]
>> >> to
>> >>   [0, 9223372036854775806]
>> >> Found new range for s_4: [0, 9223372036854775806]
>> >> marking stmt to be not simulated again
>> >>
>> >> Visiting statement:
>> >> _1 = s_4;
>> >> Intersecting
>> >>   [0, 9223372036854775806]  EQUIVALENCES: { s_4 } (1 elements)
>> >> and
>> >>   [0, 9223372036854775806]
>> >> to
>> >>   [0, 9223372036854775806]  EQUIVALENCES: { s_4 } (1 elements)
>> >> Found new range for _1: [0, 9223372036854775806]
>> >> marking stmt to be not simulated again
>> >>
>> >> Visiting statement:
>> >> _7 = s_4 - _1;
>> >> Intersecting
>> >>   ~[9223372036854775807, 9223372036854775809]
>> >> and
>> >>   ~[9223372036854775807, 9223372036854775809]
>> >> to
>> >>   ~[9223372036854775807, 9223372036854775809]
>> >> Found new range for _7: ~[9223372036854775807, 9223372036854775809]
>> >> marking stmt to be not simulated again
>> >>
>> >> __attribute__((noclone, noinline))
>> >> fn1 (char * p, char * q)
>> >> {
>> >>   size_t s;
>> >>   long unsigned int _1;
>> >>   size_t _7;
>> >>   long unsigned int _9;
>> >>
>> >>   :
>> >>   s_4 = strlen (q_3(D));
>> >>   _9 = s_4 + 1;
>> >>   __builtin_memcpy (p_5(D), q_3(D), _9);
>> >>   _1 = s_4;
>> >>   _7 = s_4 - _1;
>> >>   return _7;
>> >>
>> >> }
>> >>
>> >> Then forwprop4 turns
>> >> _1 = s_4
>> >> _7 = s_4 - _1
>> >> into
>> >> _7 = 0
>> >>
>> >> and we end up with:
>> >> _7 = 0
>> >> return _7
>> >> in optimized dump.
>> >>
>> >> Running ccp again after forwprop4 trivially solves the issue, however
>> >> I am not sure if we want to run ccp again ?
>> >>
>> >> The issue is probably with extract_range_from_ssa_name():
>> >> For _1 = s_4
>> >>
>> >> Before patch:
>> >> VR for s_4 is set to varying.
>> >> So VR for _1 is set to [s_4, s_4] by extract_range_from_ssa_name.
>> >> Since VR for _1 is [s_4, s_4] it implicitly implies that _1 is equal to 
>> >> s_4,
>> >> and vrp is able to transform _7 = s_4 - _1 to _7 = 0 (by using
>> >> match.pd pattern x - x -> 0).
>> >>
>> >> After patch:
>> >> VR for s_4 is set to [0, PTRDIFF_MAX - 1]
>> >> And correspondingly VR for _1 is set to [0, PTRDIFF_MAX - 1]
>> >> so IIUC, we then lose the information that _1 is equal to s_4,
>> >
>> > We don't lose it, it's in its set of equivalencies.
>> Ah, I missed that, thanks. For some reason I had the misconception that
>> equivalences store variables which have the same value ranges but are
>> not necessarily equal.
>> >
>> >> and vrp doesn't transform _7 = s_4 - _1 to _7 = 0.
>> >> forwprop4 does that because it sees that s_4 and _1 are equivalent.
>> >> Does this sound correct ?
>> >
>> > Yes.  So the issue is really that vrp_visit_assignment_or_call calls
>> > gimple_fold_stmt_to_constant_1 with vrp_valueize[_1] which when
>> > we do

Re: PR78153

2016-11-22 Thread Richard Biener
On Tue, 22 Nov 2016, Prathamesh Kulkarni wrote:

> On 22 November 2016 at 20:18, Richard Biener  wrote:
> > On Tue, 22 Nov 2016, Prathamesh Kulkarni wrote:
> >
> >> On 21 November 2016 at 15:10, Richard Biener  wrote:
> >> > On Sun, 20 Nov 2016, Prathamesh Kulkarni wrote:
> >> >
> >> >> Hi,
> >> >> As suggested by Martin in PR78153 strlen's return value cannot exceed
> >> >> PTRDIFF_MAX.
> >> >> So I set its range to [0, PTRDIFF_MAX - 1] in extract_range_basic()
> >> >> in the attached patch.
> >> >>
> >> >> However it regressed strlenopt-3.c:
> >> >>
> >> >> Consider fn1() from strlenopt-3.c:
> >> >>
> >> >> __attribute__((noinline, noclone)) size_t
> >> >> fn1 (char *p, char *q)
> >> >> {
> >> >>   size_t s = strlen (q);
> >> >>   strcpy (p, q);
> >> >>   return s - strlen (p);
> >> >> }
> >> >>
> >> >> The optimized dump shows the following:
> >> >>
> >> >> __attribute__((noclone, noinline))
> >> >> fn1 (char * p, char * q)
> >> >> {
> >> >>   size_t s;
> >> >>   size_t _7;
> >> >>   long unsigned int _9;
> >> >>
> >> >>   :
> >> >>   s_4 = strlen (q_3(D));
> >> >>   _9 = s_4 + 1;
> >> >>   __builtin_memcpy (p_5(D), q_3(D), _9);
> >> >>   _7 = 0;
> >> >>   return _7;
> >> >>
> >> >> }
> >> >>
> >> >> which introduces the regression, because the test expects "return 0;" 
> >> >> in fn1().
> >> >>
> >> >> The issue seems to be in vrp2:
> >> >>
> >> >> Before the patch:
> >> >> Visiting statement:
> >> >> s_4 = strlen (q_3(D));
> >> >> Found new range for s_4: VARYING
> >> >>
> >> >> Visiting statement:
> >> >> _1 = s_4;
> >> >> Found new range for _1: [s_4, s_4]
> >> >> marking stmt to be not simulated again
> >> >>
> >> >> Visiting statement:
> >> >> _7 = s_4 - _1;
> >> >> Applying pattern match.pd:111, gimple-match.c:27997
> >> >> Match-and-simplified s_4 - _1 to 0
> >> >> Intersecting
> >> >>   [0, 0]
> >> >> and
> >> >>   [0, +INF]
> >> >> to
> >> >>   [0, 0]
> >> >> Found new range for _7: [0, 0]
> >> >>
> >> >> __attribute__((noclone, noinline))
> >> >> fn1 (char * p, char * q)
> >> >> {
> >> >>   size_t s;
> >> >>   long unsigned int _1;
> >> >>   long unsigned int _9;
> >> >>
> >> >>   :
> >> >>   s_4 = strlen (q_3(D));
> >> >>   _9 = s_4 + 1;
> >> >>   __builtin_memcpy (p_5(D), q_3(D), _9);
> >> >>   _1 = s_4;
> >> >>   return 0;
> >> >>
> >> >> }
> >> >>
> >> >>
> >> >> After the patch:
> >> >> Visiting statement:
> >> >> s_4 = strlen (q_3(D));
> >> >> Intersecting
> >> >>   [0, 9223372036854775806]
> >> >> and
> >> >>   [0, 9223372036854775806]
> >> >> to
> >> >>   [0, 9223372036854775806]
> >> >> Found new range for s_4: [0, 9223372036854775806]
> >> >> marking stmt to be not simulated again
> >> >>
> >> >> Visiting statement:
> >> >> _1 = s_4;
> >> >> Intersecting
> >> >>   [0, 9223372036854775806]  EQUIVALENCES: { s_4 } (1 elements)
> >> >> and
> >> >>   [0, 9223372036854775806]
> >> >> to
> >> >>   [0, 9223372036854775806]  EQUIVALENCES: { s_4 } (1 elements)
> >> >> Found new range for _1: [0, 9223372036854775806]
> >> >> marking stmt to be not simulated again
> >> >>
> >> >> Visiting statement:
> >> >> _7 = s_4 - _1;
> >> >> Intersecting
> >> >>   ~[9223372036854775807, 9223372036854775809]
> >> >> and
> >> >>   ~[9223372036854775807, 9223372036854775809]
> >> >> to
> >> >>   ~[9223372036854775807, 9223372036854775809]
> >> >> Found new range for _7: ~[9223372036854775807, 9223372036854775809]
> >> >> marking stmt to be not simulated again
> >> >>
> >> >> __attribute__((noclone, noinline))
> >> >> fn1 (char * p, char * q)
> >> >> {
> >> >>   size_t s;
> >> >>   long unsigned int _1;
> >> >>   size_t _7;
> >> >>   long unsigned int _9;
> >> >>
> >> >>   :
> >> >>   s_4 = strlen (q_3(D));
> >> >>   _9 = s_4 + 1;
> >> >>   __builtin_memcpy (p_5(D), q_3(D), _9);
> >> >>   _1 = s_4;
> >> >>   _7 = s_4 - _1;
> >> >>   return _7;
> >> >>
> >> >> }
> >> >>
> >> >> Then forwprop4 turns
> >> >> _1 = s_4
> >> >> _7 = s_4 - _1
> >> >> into
> >> >> _7 = 0
> >> >>
> >> >> and we end up with:
> >> >> _7 = 0
> >> >> return _7
> >> >> in optimized dump.
> >> >>
> >> >> Running ccp again after forwprop4 trivially solves the issue, however
> >> >> I am not sure if we want to run ccp again ?
> >> >>
> >> >> The issue is probably with extract_range_from_ssa_name():
> >> >> For _1 = s_4
> >> >>
> >> >> Before patch:
> >> >> VR for s_4 is set to varying.
> >> >> So VR for _1 is set to [s_4, s_4] by extract_range_from_ssa_name.
> >> >> Since VR for _1 is [s_4, s_4] it implicitly implies that _1 is equal to 
> >> >> s_4,
> >> >> and vrp is able to transform _7 = s_4 - _1 to _7 = 0 (by using
> >> >> match.pd pattern x - x -> 0).
> >> >>
> >> >> After patch:
> >> >> VR for s_4 is set to [0, PTRDIFF_MAX - 1]
> >> >> And correspondingly VR for _1 is set to [0, PTRDIFF_MAX - 1]
> >> >> so IIUC, we then lose the information that _1 is equal to s_4,
> >> >
> >> > We don't lose it, it's in its set of equivalencies.
> >> Ah, I missed that, thanks. For some reason I had mis-conception that
> >> equivalences stores
> >> var

[PATCH] Fix PR78472

2016-11-22 Thread Richard Biener

The following fixes a C/C++ interoperability issue with LTO when
zero-sized fields appear in one variant of a struct but not in another.
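
To illustrate the idea, here is a minimal standalone sketch of such a
field walk.  It is not the GCC code: struct field, skip_ignored,
fields_compatible and demo_zero_sized_field_ignored are hypothetical
names, and the integer type_id merely stands in for the recursive
canonical-type comparison.

```c
#include <stddef.h>

/* Hypothetical stand-in for a chain of declarations in a record type.  */
struct field
{
  int is_field_decl;        /* Nonzero for a FIELD_DECL-like entry.  */
  unsigned size_bits;       /* 0 models e.g. "unsigned :0;".  */
  int type_id;              /* Stand-in for the field's canonical type.  */
  const struct field *next;
};

/* Skip non-fields and zero-sized fields.  */
static const struct field *
skip_ignored (const struct field *f)
{
  while (f && (!f->is_field_decl || f->size_bits == 0))
    f = f->next;
  return f;
}

/* Return 1 if both chains have the same sequence of non-empty fields.  */
int
fields_compatible (const struct field *f1, const struct field *f2)
{
  for (;;)
    {
      f1 = skip_ignored (f1);
      f2 = skip_ignored (f2);
      if (!f1 || !f2)
        break;
      if (f1->type_id != f2->type_id)
        return 0;
      f1 = f1->next;
      f2 = f2->next;
    }
  /* Compatible only if both chains ran out together.  */
  return !f1 && !f2;
}

/* One variant keeps the zero-width bit-field, the other drops it.  */
int
demo_zero_sized_field_ignored (void)
{
  static const struct field pad = { 1, 0, 1, NULL };
  static const struct field with_pad = { 1, 4, 1, &pad };
  static const struct field without_pad = { 1, 4, 1, NULL };
  return fields_compatible (&with_pad, &without_pad);
}
```

With the zero-sized entry skipped on both sides, the two variants
compare as compatible even though only one chain contains it.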

Bootstrap & regtest in progress on x86_64-unknown-linux-gnu.

Richard.

2016-11-22  Richard Biener  

PR lto/78472
* tree.c (gimple_canonical_types_compatible_p): Ignore zero-sized
fields.

lto/
* lto.c (hash_canonical_type): Ignore zero-sized fields.

* g++.dg/lto/pr78472_0.c: New testcase.
* g++.dg/lto/pr78472_1.C: Likewise.

Index: gcc/tree.c
===
--- gcc/tree.c  (revision 242657)
+++ gcc/tree.c  (working copy)
@@ -13506,10 +13506,12 @@ gimple_canonical_types_compatible_p (con
 f1 || f2;
 f1 = TREE_CHAIN (f1), f2 = TREE_CHAIN (f2))
  {
-   /* Skip non-fields.  */
-   while (f1 && TREE_CODE (f1) != FIELD_DECL)
+   /* Skip non-fields and zero-sized fields.  */
+   while (f1 && (TREE_CODE (f1) != FIELD_DECL
+ || integer_zerop (DECL_SIZE (f1))))
  f1 = TREE_CHAIN (f1);
-   while (f2 && TREE_CODE (f2) != FIELD_DECL)
+   while (f2 && (TREE_CODE (f2) != FIELD_DECL
+ || integer_zerop (DECL_SIZE (f2))))
  f2 = TREE_CHAIN (f2);
if (!f1 || !f2)
  break;
Index: gcc/lto/lto.c
===
--- gcc/lto/lto.c   (revision 242657)
+++ gcc/lto/lto.c   (working copy)
@@ -372,7 +372,8 @@ hash_canonical_type (tree type)
   tree f;
 
   for (f = TYPE_FIELDS (type), nf = 0; f; f = TREE_CHAIN (f))
-   if (TREE_CODE (f) == FIELD_DECL)
+   if (TREE_CODE (f) == FIELD_DECL
+   && ! integer_zerop (DECL_SIZE (f)))
  {
iterative_hash_canonical_type (TREE_TYPE (f), hstate);
nf++;
Index: gcc/testsuite/g++.dg/lto/pr78472_0.c
===
--- gcc/testsuite/g++.dg/lto/pr78472_0.c(revision 0)
+++ gcc/testsuite/g++.dg/lto/pr78472_0.c(working copy)
@@ -0,0 +1,12 @@
+// { dg-lto-do link }
+
+extern struct S
+{
+  unsigned i:4;
+  unsigned :0;
+} s;
+static void *f(void)
+{
+  return &s;
+}
+int main() {}
Index: gcc/testsuite/g++.dg/lto/pr78472_1.C
===
--- gcc/testsuite/g++.dg/lto/pr78472_1.C(revision 0)
+++ gcc/testsuite/g++.dg/lto/pr78472_1.C(working copy)
@@ -0,0 +1,9 @@
+struct S
+{
+  unsigned i:4;
+  unsigned :0;
+} s;
+static void *f(void)
+{
+  return &s;
+}


[testsuite,committed] Fix prototype of memset in a test case.

2016-11-22 Thread Georg-Johann Lay
One test case used unsigned long for the 3rd parameter of memset, which 
should be size_t.  This made the test crash for targets where correct 
parameter passing depends on correct prototypes.
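
For illustration, a minimal sketch of the portable declaration.  It
assumes GCC or a compatible compiler, where __SIZE_TYPE__ is predefined
as the underlying type of size_t; demo_clear is a hypothetical helper
and not part of the test case.

```c
#include <stddef.h>

/* Declare memset without <string.h>, as the test case does.  */
extern void *memset (void *, int, __SIZE_TYPE__);

/* __SIZE_TYPE__ is the type the compiler uses for size_t, which need
   not be unsigned long on every target.  */
_Static_assert (sizeof (__SIZE_TYPE__) == sizeof (size_t),
                "__SIZE_TYPE__ must match size_t");

/* Hypothetical helper: clear a small buffer and return the OR of its
   bytes, which is 0 if memset received the right length.  */
int
demo_clear (void)
{
  unsigned char buf[4] = { 1, 2, 3, 4 };
  memset (buf, 0, sizeof buf);
  return buf[0] | buf[1] | buf[2] | buf[3];
}
```

On a target where size_t is narrower than unsigned long, the old
declaration would make the caller pass the length differently from
what memset expects.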


Fixed and committed as obvious.

Johann


gcc/testsuite/
* gcc.c-torture/execute/pr30778.c (memset): Use size_t for 3rd
parameter in declaration.

Index: gcc.c-torture/execute/pr30778.c
===
--- gcc.c-torture/execute/pr30778.c (revision 242541)
+++ gcc.c-torture/execute/pr30778.c (working copy)
@@ -1,4 +1,4 @@
-extern void *memset (void *, int, unsigned long);
+extern void *memset (void *, int, __SIZE_TYPE__);
 extern void abort (void);

 struct reg_stat {


Re: [PATCH] (v2) Add a "compact" mode to print_rtx_function

2016-11-22 Thread David Malcolm
On Tue, 2016-11-22 at 15:45 +0100, Jakub Jelinek wrote:
> On Tue, Nov 22, 2016 at 03:38:04PM +0100, Bernd Schmidt wrote:
> > On 11/22/2016 02:37 PM, Jakub Jelinek wrote:
> > > Can't it be done only if xloc.file contains any fancy characters?
> > 
> > Sure, but why? Strings generally get emitted with quotes around
> > them, I
> > don't see a good reason for filenames to be different, especially
> > if it
> > makes the output easier to parse.
> 
> Because printing common filenames matches what we emit in
> diagnostics,
> what e.g. sanitizers emit at runtime diagnostics, what we emit as
> locations
> in gimple dumps etc.

It sounds like a distinction between human-readable vs
machine-readable.

How about something like the following, which only adds the quotes if
outputting the RTL FE's input format?

Does this fix the failing tests?
From 642d511fdba3a33fb18ce46c549f7c972ed6b14e Mon Sep 17 00:00:00 2001
From: David Malcolm 
Date: Tue, 22 Nov 2016 11:06:41 -0500
Subject: [PATCH] print-rtl.c: conditionalize quotes for filenames

gcc/ChangeLog:
	* print-rtl.c (rtx_writer::print_rtx_operand_code_i): Only use
	quotes for filenames when in compact mode.
---
 gcc/print-rtl.c | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/gcc/print-rtl.c b/gcc/print-rtl.c
index 77e6b05..5370602 100644
--- a/gcc/print-rtl.c
+++ b/gcc/print-rtl.c
@@ -371,7 +371,10 @@ rtx_writer::print_rtx_operand_code_i (const_rtx in_rtx, int idx)
   if (INSN_HAS_LOCATION (in_insn))
 	{
 	  expanded_location xloc = insn_location (in_insn);
-	  fprintf (m_outfile, " \"%s\":%i", xloc.file, xloc.line);
+	  if (m_compact)
+	fprintf (m_outfile, " \"%s\":%i", xloc.file, xloc.line);
+	  else
+	fprintf (m_outfile, " %s:%i", xloc.file, xloc.line);
 	}
 #endif
 }
-- 
1.8.5.3



[PATCH] Fix up handle_pragma_target (PR target/78451)

2016-11-22 Thread Jakub Jelinek
Hi!

#pragma GCC target when used more than once without being
undone through #pragma GCC pop_options in between seems to act weirdly
and is the reason why sse-22a.c testcase now fails on x86_64/i686-linux.
The problem is that to some extent
#pragma GCC target ("f1", "f2,f3")
#pragma GCC target ("f4,f5", "f6")
acts as
#pragma GCC target ("f1", "f2,f3", "f4,f5", "f6")
(when computing the current set of global options e.g.), but
when a target node is being created for a function, we don't use the
current global options at the point of declaration, but instead use
current_target_pragma TREE_LIST with the current target pragma options;
that list is properly saved/restored on push_options/pop_options pragma,
but a new GCC target pragma overwrites the previous list rather than
appending to it, so to some other extent the above two pragmas act as
just #pragma GCC target ("f4,f5", "f6").
In particular, in sse-22a.c test we start with #pragma GCC target
containing huge list of ISAs, then #include  header, and
there most inlines are wrapped in #pragma GCC push_options/#pragma GCC
target (someisa) and #pragma GCC pop_options, but there are some inlines
that aren't wrapped at all.  The effect of that is that those wrapped
routines get their target attribute solely from the innermost target option,
while those not wrapped ones get one from their innermost GCC target,
which is the huge list of ISAs.  I think this is undesirable, the
pragmas should stack (append to each other).  If users want to override
completely to something different, they can push_options/pop_options around
the former, or #pragma GCC target ("no-isa1,no-isa2,isa3").
Note that sse-22a.c fails because of this with -save-temps even in GCC 6.

The patch just treats these consistently as appending to current set of
options.  So if one does:
#pragma GCC push_options
#pragma GCC target ("isa1")
#pragma GCC push_options
#pragma GCC target ("isa2")
void foo () { ... }
#pragma GCC pop_options
#pragma GCC pop_options
the foo function gets both isa1 and isa2 target attributes (in that order).
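
The difference between overwriting and appending can be modelled
outside the compiler.  The following is only a sketch with hypothetical
names (struct opt, append_opts, count_opts, demo_replace, demo_append);
it models the option list as a plain singly linked list and is not the
c-pragma.c code.

```c
#include <stddef.h>

/* Hypothetical model of one target option string in the pragma list.  */
struct opt
{
  const char *name;
  struct opt *next;
};

/* Append ARGS to the end of LIST and return the head of the result.  */
static struct opt *
append_opts (struct opt *list, struct opt *args)
{
  struct opt *p;
  if (!list)
    return args;
  for (p = list; p->next; p = p->next)
    ;
  p->next = args;
  return list;
}

/* Count the entries in LIST.  */
static int
count_opts (const struct opt *list)
{
  int n = 0;
  for (; list; list = list->next)
    n++;
  return n;
}

/* Old behaviour: a second "#pragma GCC target" replaces the list.  */
int
demo_replace (void)
{
  struct opt isa1 = { "isa1", NULL };
  struct opt isa2 = { "isa2", NULL };
  struct opt *current = &isa1;
  current = &isa2;              /* isa1 is forgotten.  */
  return count_opts (current);  /* 1 */
}

/* New behaviour: the second pragma is chained onto the first.  */
int
demo_append (void)
{
  struct opt isa1 = { "isa1", NULL };
  struct opt isa2 = { "isa2", NULL };
  struct opt *current = append_opts (NULL, &isa1);
  current = append_opts (current, &isa2);
  return count_opts (current);  /* 2: isa1, then isa2.  */
}
```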

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

2016-11-22  Jakub Jelinek  

PR target/78451
* c-pragma.c (handle_pragma_target): Don't replace
current_target_pragma, but chainon the new args to the current one.

* gcc.target/i386/pr78451.c: New test.
* gcc.target/i386/pr69255-1.c: Use #pragma GCC push_options
and #pragma GCC pop_options around the first #pragma GCC target.
* gcc.target/i386/pr69255-2.c: Likewise.
* gcc.target/i386/pr69255-3.c: Likewise.

--- gcc/c-family/c-pragma.c.jj  2016-10-31 13:28:06.0 +0100
+++ gcc/c-family/c-pragma.c 2016-11-22 11:34:34.535159762 +0100
@@ -893,7 +893,7 @@ handle_pragma_target(cpp_reader *ARG_UNU
   args = nreverse (args);
 
   if (targetm.target_option.pragma_parse (args, NULL_TREE))
-   current_target_pragma = args;
+   current_target_pragma = chainon (current_target_pragma, args);
 }
 }
 
--- gcc/testsuite/gcc.target/i386/pr78451.c.jj  2016-11-22 11:57:24.743002256 +0100
+++ gcc/testsuite/gcc.target/i386/pr78451.c 2016-11-22 11:56:51.0 +0100
@@ -0,0 +1,35 @@
+/* PR target/78451 */
+/* { dg-options "-O2 -mno-avx512f" } */
+
+#pragma GCC push_options
+#pragma GCC target ("avx512bw")
+
+static inline int __attribute__ ((__always_inline__))
+bar (void)
+{
+  return 0;
+}
+
+#pragma GCC push_options
+#pragma GCC target ("avx512vl")
+
+int
+foo (void)
+{
+  return bar ();
+}
+
+#pragma GCC pop_options
+#pragma GCC pop_options
+
+#pragma GCC push_options
+#pragma GCC target ("avx512vl")
+#pragma GCC target ("avx512bw")
+
+int
+baz (void)
+{
+  return bar ();
+}
+
+#pragma GCC pop_options
--- gcc/testsuite/gcc.target/i386/pr69255-1.c.jj 2016-09-06 22:29:59.0 +0200
+++ gcc/testsuite/gcc.target/i386/pr69255-1.c   2016-11-22 16:20:32.790498858 +0100
@@ -2,7 +2,9 @@
 /* { dg-do compile } */
 /* { dg-options "-msse4 -mno-avx" } */
 
+#pragma GCC push_options
 #pragma GCC target "avx512vl"
+#pragma GCC pop_options
 #pragma GCC target "no-avx512vl"
 __attribute__ ((__vector_size__ (32))) long long a;
 __attribute__ ((__vector_size__ (16))) int b;
@@ -13,5 +15,5 @@ foo (const long long *p)
   a = __builtin_ia32_gather3siv4di (a, p, b, 1, 1);/* { dg-error "needs isa option -m32 -mavx512vl" } */
 }
 
-/* { dg-warning "AVX vector return without AVX enabled changes the ABI" "" { target *-*-* } 13 } */
-/* { dg-warning "AVX vector argument without AVX enabled changes the ABI" "" { target *-*-* } 13 } */
+/* { dg-warning "AVX vector return without AVX enabled changes the ABI" "" { target *-*-* } 15 } */
+/* { dg-warning "AVX vector argument without AVX enabled changes the ABI" "" { target *-*-* } 15 } */
--- gcc/testsuite/gcc.target/i386/pr69255-2.c.jj 2016-09-06 22:29:59.0 +0200
+++ gcc/testsuite/gcc.target/i386/pr69255-2.c   2016-11-22 16:20:44.760346741 +0100
@@ -2,7 +2,9 @@

Re: [PATCH, testsuite]: Fix detection of -j make argument

2016-11-22 Thread Marc Glisse

On Tue, 22 Nov 2016, Uros Bizjak wrote:


New makes (e.g. GNU Make 4.2.1) pass -j argument in MFLAGS in a
different way. While older makes pass only "-j", newer makes pass e.g.
"-j4" when -j is specified on the command line. The detection of "-j"
make argument doesn't work in the latter case.

Attached patch reworks this functionality to detect -j correctly in all cases.


Hello,

I didn't read the patch, but do you think this also fixes PR 53155 ?

--
Marc Glisse


Re: [PATCH PR68030/PR69710][RFC]Introduce a simple local CSE interface and use it in vectorizer

2016-11-22 Thread Bin.Cheng
On Mon, Nov 21, 2016 at 9:34 PM, Doug Gilmore  wrote:
> I haven't seen any followups to this discussion of Bin's patch to
> PR68303 and PR69710, the patch submission:
> http://gcc.gnu.org/ml/gcc-patches/2016-05/msg02000.html
>
> Discussion:
> http://gcc.gnu.org/ml/gcc-patches/2016-07/msg00761.html
> http://gcc.gnu.org/ml/gcc-patches/2016-06/msg01551.html
> http://gcc.gnu.org/ml/gcc-patches/2016-06/msg00372.html
> http://gcc.gnu.org/ml/gcc-patches/2016-06/msg01550.html
> http://gcc.gnu.org/ml/gcc-patches/2016-05/msg02162.html
> http://gcc.gnu.org/ml/gcc-patches/2016-05/msg02155.html
> http://gcc.gnu.org/ml/gcc-patches/2016-05/msg02154.html
>
>
> so I did some investigation to get a better understanding of the
> issues involved.
Hi Doug,
Thanks for looking into this problem.
>
> On 07/13/2016 01:59 PM, Jeff Law wrote:
>> On 05/25/2016 05:22 AM, Bin Cheng wrote:
>>> Hi, As analyzed in PR68303 and PR69710, vectorizer generates
>>> duplicated computations in loop's pre-header basic block when
>>> creating base address for vector reference to the same memory object.
>> Not a huge surprise.  Loop optimizations generally have a tendency
>> to create and/or expose CSE opportunities.  Unrolling is a common
>> culprit, there's certainly the possibility for header duplication,
>> code motions and IV rewriting to also expose/create redundant code.
>>
>> ...
>>
>>  But, 1) It
>>> doesn't fix all the problem on x86_64.  Root cause is computation for
>>> base address of the first reference is somehow moved outside of
>>> loop's pre-header, local CSE can't help in this case.
>> That's a bit odd -- have you investigated why this is outside the loop 
>> header?
>> ...
> I didn't look at this issue per se, but I did try running DOM between
> autovectorization and IVS.  Just running DOM had little effect, what
> was crucial was adding the change Bin mentioned in his original
> message:
>
> Besides CSE issue, this patch also re-associates address
> expressions in vect_create_addr_base_for_vector_ref, specifically,
> it splits constant offset and adds it back near the expression
> root in IR.  This is necessary because GCC only handles
> re-association for commutative operators in CSE.
>
> I attached a patch for these changes only.  These are the important
> modifications that address some of the IVS related issues exposed
> by PR68303. I found that adding the CSE change (or calling DOM between
> autovectorization and IVOPTS) is not needed, and from what I have
I checked the code again.  As you said, the re-association part is important
to enable CSE opportunities, no matter when and which pass handles it.
After re-association, the computation of the base addresses looks like:

//preheader
b_1 = g_Input + var_offset_1;
vectp_1 = b_1 + cst_offset_1;
b_2 = g_Input + var_offset_2;
vectp_2 = b_2 + cst_offset_2;
...
b_n = g_Input + var_offset_n;
vectp_n = b_n + cst_offset_n;

//loop
MEM[vectp_1];
MEM[vectp_2];
...
MEM[vectp_n];

In fact, var_offset_1, var_offset_2, ..., var_offset_n are all equal to each
other.  So the addresses are of the form "g_Input + var_offset + cst_offset_x",
differing from each other only in the constant offset.  The purpose of CSE is
to propagate all parts of this address to IVOPTs; otherwise IVOPTs only knows
the IVs as below:

iv_use_1: {b_1 + cst_offset_1, step}_loop
iv_use_2: {b_2 + cst_offset_2, step}_loop
...
iv_use_n: {b_n + cst_offset_n, step}_loop

> seen, actually makes the code worse.
>
> Applying only the modifications to
> vect_create_addr_base_for_vector_ref, additional simplifications will
> be done when induction variables are found (function
> find_induction_variables).  These simplifications are indicated by the
> appearance of lines:
>
> Applying pattern match.pd:1056, generic-match.c:11865
This doesn't look related to this problem to me.  The simplification this
problem needs is CSE, which is not what match.pd does.

>
> in the IVOPTS dump file.  Now IVOPTs transforms the code so that
> constants now appear in the computation of the effective addresses for
> the memory OPs.  However the code generated by IVOPTS still uses a
> separate base register for each memory reference.  Later DOM3
> transforms the code to use just one base register, which is the form

Indeed, CSE now looks unnecessary for fixing the problem; we can rely on the
DOM pass to discover the equality among the new bases (b_1, b_2, ..., b_n).  This
actually echoes my humble opinion: we shouldn't rely on IVOPTs to fix all bad
code issues.  On the other hand, for cases in which these bases
(b_1, b_2, ..., b_n)
are not equal to each other, there is not much to lose this way either.

> the code needs to be in for the preliminary phase of IVOPTs where
> "IV uses" associated with memory OPs are placed into groups.  At the
> time of this grouping, checks are done to ensure that for each member
> of a group the constant offsets don't overflow the immediate fields in
> actual machi

Re: [Patch, Fortran, OOP] PR 78443: Incorrect behavior with non_overridable keyword

2016-11-22 Thread Janus Weil
2016-11-22 16:16 GMT+01:00 Steve Kargl :
>> here is a patch for a wrong-code problem with non_overridable
>> type-bound procedures. For details see the PR. Regtests cleanly. Ok
>> for trunk?
>
> OK.

Thanks, Steve. Committed as r242703.


>> Since the patch is very simple and it fixes wrong code which can
>> silently give bad runtime results, I think backporting to the release
>> branches might be a good idea as well. Ok?
>
> OK.

Will do soon (within a week or so).

Cheers,
Janus


[PATCH] Replace _mm_setzero_[hd]i with _mm_setzero_si128 (PR target/78451)

2016-11-22 Thread Jakub Jelinek
Hi!

_mm_setzero_di is problematic, because it is outside of AVX512* guarded
area, but it actually requires SSE2 which might not be enabled.
As discussed in the PR, I can find the _mm_setzero_[dh]i routines neither
in ICC headers nor in AVX/AVX512 manuals, and fail to see what the
difference is between those and the standard _mm_setzero_si128.
All these functions return __m128i containing all zeros, how exactly
it is constructed should be irrelevant after folding during gimplification
(all 3 routines gimplify to the same return stmt, __m128i is
typedef long long __m128i __attribute__ ((__vector_size__ (16), __may_alias__));
and therefore all 3 are return (__m128i) { 0, 0 }, it doesn't matter
how those 0s were constructed).
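
As a rough illustration of that last point, here is a sketch that uses
GCC's generic vector extensions rather than the real intrinsics headers.
The types v2di, v4si and v8hi and the functions below are hypothetical
stand-ins, and which element type each of the removed routines actually
used is an assumption here.

```c
#include <string.h>

/* Hypothetical stand-ins for __m128i and its element-typed variants.  */
typedef long long v2di __attribute__ ((__vector_size__ (16), __may_alias__));
typedef int v4si __attribute__ ((__vector_size__ (16)));
typedef short v8hi __attribute__ ((__vector_size__ (16)));

/* Three ways of building an all-zero 128-bit value.  */
static v2di
zero_from_di (void)
{
  return (v2di) { 0, 0 };
}

static v2di
zero_from_si (void)
{
  return (v2di) (v4si) { 0, 0, 0, 0 };
}

static v2di
zero_from_hi (void)
{
  return (v2di) (v8hi) { 0, 0, 0, 0, 0, 0, 0, 0 };
}

/* Return 1 if all three results are bitwise identical.  */
int
zero_constructors_agree (void)
{
  v2di a = zero_from_di ();
  v2di b = zero_from_si ();
  v2di c = zero_from_hi ();
  return memcmp (&a, &b, sizeof a) == 0 && memcmp (&a, &c, sizeof a) == 0;
}
```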

This patch removes those two routines, uses _mm_setzero_si128 instead,
and I've also done some limited formatting fixes (mainly I tried to
fix up calls with no space before ( ).

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

Note that there is still _mm512_setzero_qi and _mm512_setzero_hi,
shall those be replaced with _mm512_setzero_si512 too?
Those 2 aren't mentioned in ICC headers or AVX512 manuals either.

2016-11-22  Jakub Jelinek  

PR target/78451
* config/i386/avx512vlintrin.h (_mm_setzero_di): Removed.
(_mm_maskz_mov_epi64): Use _mm_setzero_si128 instead of
_mm_setzero_di.
(_mm_maskz_load_epi64): Likewise.
(_mm_setzero_hi): Removed.
(_mm_maskz_loadu_epi64): Use _mm_setzero_si128 instead of
_mm_setzero_di.
(_mm_abs_epi64, _mm_maskz_abs_epi64, _mm_maskz_srl_epi64,
_mm_maskz_unpackhi_epi64, _mm_maskz_unpacklo_epi64,
_mm_maskz_compress_epi64, _mm_srav_epi64, _mm_maskz_srav_epi64,
_mm_maskz_sllv_epi64, _mm_maskz_srlv_epi64, _mm_rolv_epi64,
_mm_maskz_rolv_epi64, _mm_rorv_epi64, _mm_maskz_rorv_epi64,
_mm_min_epi64, _mm_max_epi64, _mm_max_epu64, _mm_min_epu64,
_mm_lzcnt_epi64, _mm_maskz_lzcnt_epi64, _mm_conflict_epi64,
_mm_maskz_conflict_epi64, _mm_sra_epi64, _mm_maskz_sra_epi64,
_mm_maskz_sll_epi64, _mm_rol_epi64, _mm_maskz_rol_epi64,
_mm_ror_epi64, _mm_maskz_ror_epi64, _mm_alignr_epi64,
_mm_maskz_alignr_epi64, _mm_srai_epi64, _mm_maskz_slli_epi64):
Likewise.
(_mm_cvtepi32_epi8, _mm256_cvtepi32_epi8, _mm_cvtsepi32_epi8,
_mm256_cvtsepi32_epi8, _mm_cvtusepi32_epi8, _mm256_cvtusepi32_epi8,
_mm_cvtepi32_epi16, _mm256_cvtepi32_epi16, _mm_cvtsepi32_epi16,
_mm256_cvtsepi32_epi16, _mm_cvtusepi32_epi16, _mm256_cvtusepi32_epi16,
_mm_cvtepi64_epi8, _mm256_cvtepi64_epi8, _mm_cvtsepi64_epi8,
_mm256_cvtsepi64_epi8, _mm_cvtusepi64_epi8, _mm256_cvtusepi64_epi8,
_mm_cvtepi64_epi16, _mm256_cvtepi64_epi16, _mm_cvtsepi64_epi16,
_mm256_cvtsepi64_epi16, _mm_cvtusepi64_epi16, _mm256_cvtusepi64_epi16,
_mm_cvtepi64_epi32, _mm256_cvtepi64_epi32, _mm_cvtsepi64_epi32,
_mm256_cvtsepi64_epi32, _mm_cvtusepi64_epi32, _mm256_cvtusepi64_epi32,
_mm_maskz_set1_epi32, _mm_maskz_set1_epi64): Formatting fixes.
(_mm_maskz_cvtps_ph, _mm256_maskz_cvtps_ph): Use _mm_setzero_si128
instead of _mm_setzero_hi.
(_mm256_permutex_pd, _mm256_maskz_permutex_epi64, _mm256_insertf32x4,
_mm256_maskz_insertf32x4, _mm256_inserti32x4, _mm256_maskz_inserti32x4,
_mm256_extractf32x4_ps, _mm256_maskz_extractf32x4_ps,
_mm256_shuffle_i32x4, _mm256_maskz_shuffle_i32x4, _mm256_shuffle_f64x2,
_mm256_maskz_shuffle_f64x2, _mm256_shuffle_f32x4,
_mm256_maskz_shuffle_f32x4, _mm256_maskz_shuffle_pd,
_mm_maskz_shuffle_pd, _mm256_maskz_shuffle_ps, _mm_maskz_shuffle_ps,
_mm256_maskz_srli_epi32, _mm_maskz_srli_epi32, _mm_maskz_srli_epi64,
_mm256_mask_slli_epi32, _mm256_maskz_slli_epi32, _mm256_mask_slli_epi64,
_mm256_maskz_slli_epi64, _mm256_roundscale_ps,
_mm256_maskz_roundscale_ps, _mm256_roundscale_pd,
_mm256_maskz_roundscale_pd, _mm_roundscale_ps, _mm_maskz_roundscale_ps,
_mm_roundscale_pd, _mm_maskz_roundscale_pd, _mm256_getmant_ps,
_mm256_maskz_getmant_ps, _mm_getmant_ps, _mm_maskz_getmant_ps,
_mm256_getmant_pd, _mm256_maskz_getmant_pd, _mm_getmant_pd,
_mm_maskz_getmant_pd, _mm256_maskz_shuffle_epi32,
_mm_maskz_shuffle_epi32, _mm256_rol_epi32, _mm256_maskz_rol_epi32,
_mm_rol_epi32, _mm_maskz_rol_epi32, _mm256_ror_epi32,
_mm256_maskz_ror_epi32, _mm_ror_epi32, _mm_maskz_ror_epi32,
_mm_maskz_alignr_epi32, _mm_maskz_alignr_epi64,
_mm256_maskz_srai_epi32, _mm_maskz_srai_epi32, _mm_srai_epi64,
_mm_maskz_srai_epi64, _mm256_maskz_permutex_pd,
_mm256_maskz_permute_pd, _mm256_maskz_permute_ps, _mm_maskz_permute_pd,
_mm_maskz_permute_ps, _mm256_permutexvar_ps): Formatting fixes.
(_mm_maskz_slli_epi64, _mm_rol_epi64, _mm_maskz_rol_epi64,
_mm_ror_epi64, _mm_maskz_ror_epi64): Use _mm_se

Re: PR78153

2016-11-22 Thread Prathamesh Kulkarni
On 22 November 2016 at 20:53, Richard Biener  wrote:
> On Tue, 22 Nov 2016, Prathamesh Kulkarni wrote:
>
>> On 22 November 2016 at 20:18, Richard Biener  wrote:
>> > On Tue, 22 Nov 2016, Prathamesh Kulkarni wrote:
>> >
>> >> On 21 November 2016 at 15:10, Richard Biener  wrote:
>> >> > On Sun, 20 Nov 2016, Prathamesh Kulkarni wrote:
>> >> >
>> >> >> Hi,
>> >> >> As suggested by Martin in PR78153 strlen's return value cannot exceed
>> >> >> PTRDIFF_MAX.
>> >> >> So I set its range to [0, PTRDIFF_MAX - 1] in extract_range_basic()
>> >> >> in the attached patch.
>> >> >>
>> >> >> However it regressed strlenopt-3.c:
>> >> >>
>> >> >> Consider fn1() from strlenopt-3.c:
>> >> >>
>> >> >> __attribute__((noinline, noclone)) size_t
>> >> >> fn1 (char *p, char *q)
>> >> >> {
>> >> >>   size_t s = strlen (q);
>> >> >>   strcpy (p, q);
>> >> >>   return s - strlen (p);
>> >> >> }
>> >> >>
>> >> >> The optimized dump shows the following:
>> >> >>
>> >> >> __attribute__((noclone, noinline))
>> >> >> fn1 (char * p, char * q)
>> >> >> {
>> >> >>   size_t s;
>> >> >>   size_t _7;
>> >> >>   long unsigned int _9;
>> >> >>
>> >> >>   :
>> >> >>   s_4 = strlen (q_3(D));
>> >> >>   _9 = s_4 + 1;
>> >> >>   __builtin_memcpy (p_5(D), q_3(D), _9);
>> >> >>   _7 = 0;
>> >> >>   return _7;
>> >> >>
>> >> >> }
>> >> >>
>> >> >> which introduces the regression, because the test expects "return 0;" 
>> >> >> in fn1().
>> >> >>
>> >> >> The issue seems to be in vrp2:
>> >> >>
>> >> >> Before the patch:
>> >> >> Visiting statement:
>> >> >> s_4 = strlen (q_3(D));
>> >> >> Found new range for s_4: VARYING
>> >> >>
>> >> >> Visiting statement:
>> >> >> _1 = s_4;
>> >> >> Found new range for _1: [s_4, s_4]
>> >> >> marking stmt to be not simulated again
>> >> >>
>> >> >> Visiting statement:
>> >> >> _7 = s_4 - _1;
>> >> >> Applying pattern match.pd:111, gimple-match.c:27997
>> >> >> Match-and-simplified s_4 - _1 to 0
>> >> >> Intersecting
>> >> >>   [0, 0]
>> >> >> and
>> >> >>   [0, +INF]
>> >> >> to
>> >> >>   [0, 0]
>> >> >> Found new range for _7: [0, 0]
>> >> >>
>> >> >> __attribute__((noclone, noinline))
>> >> >> fn1 (char * p, char * q)
>> >> >> {
>> >> >>   size_t s;
>> >> >>   long unsigned int _1;
>> >> >>   long unsigned int _9;
>> >> >>
>> >> >>   :
>> >> >>   s_4 = strlen (q_3(D));
>> >> >>   _9 = s_4 + 1;
>> >> >>   __builtin_memcpy (p_5(D), q_3(D), _9);
>> >> >>   _1 = s_4;
>> >> >>   return 0;
>> >> >>
>> >> >> }
>> >> >>
>> >> >>
>> >> >> After the patch:
>> >> >> Visiting statement:
>> >> >> s_4 = strlen (q_3(D));
>> >> >> Intersecting
>> >> >>   [0, 9223372036854775806]
>> >> >> and
>> >> >>   [0, 9223372036854775806]
>> >> >> to
>> >> >>   [0, 9223372036854775806]
>> >> >> Found new range for s_4: [0, 9223372036854775806]
>> >> >> marking stmt to be not simulated again
>> >> >>
>> >> >> Visiting statement:
>> >> >> _1 = s_4;
>> >> >> Intersecting
>> >> >>   [0, 9223372036854775806]  EQUIVALENCES: { s_4 } (1 elements)
>> >> >> and
>> >> >>   [0, 9223372036854775806]
>> >> >> to
>> >> >>   [0, 9223372036854775806]  EQUIVALENCES: { s_4 } (1 elements)
>> >> >> Found new range for _1: [0, 9223372036854775806]
>> >> >> marking stmt to be not simulated again
>> >> >>
>> >> >> Visiting statement:
>> >> >> _7 = s_4 - _1;
>> >> >> Intersecting
>> >> >>   ~[9223372036854775807, 9223372036854775809]
>> >> >> and
>> >> >>   ~[9223372036854775807, 9223372036854775809]
>> >> >> to
>> >> >>   ~[9223372036854775807, 9223372036854775809]
>> >> >> Found new range for _7: ~[9223372036854775807, 9223372036854775809]
>> >> >> marking stmt to be not simulated again
>> >> >>
>> >> >> __attribute__((noclone, noinline))
>> >> >> fn1 (char * p, char * q)
>> >> >> {
>> >> >>   size_t s;
>> >> >>   long unsigned int _1;
>> >> >>   size_t _7;
>> >> >>   long unsigned int _9;
>> >> >>
>> >> >>   :
>> >> >>   s_4 = strlen (q_3(D));
>> >> >>   _9 = s_4 + 1;
>> >> >>   __builtin_memcpy (p_5(D), q_3(D), _9);
>> >> >>   _1 = s_4;
>> >> >>   _7 = s_4 - _1;
>> >> >>   return _7;
>> >> >>
>> >> >> }
>> >> >>
>> >> >> Then forwprop4 turns
>> >> >> _1 = s_4
>> >> >> _7 = s_4 - _1
>> >> >> into
>> >> >> _7 = 0
>> >> >>
>> >> >> and we end up with:
>> >> >> _7 = 0
>> >> >> return _7
>> >> >> in optimized dump.
>> >> >>
>> >> >> Running ccp again after forwprop4 trivially solves the issue, however
>> >> >> I am not sure if we want to run ccp again ?
>> >> >>
>> >> >> The issue is probably with extract_range_from_ssa_name():
>> >> >> For _1 = s_4
>> >> >>
>> >> >> Before patch:
>> >> >> VR for s_4 is set to varying.
>> >> >> So VR for _1 is set to [s_4, s_4] by extract_range_from_ssa_name.
>> >> >> Since VR for _1 is [s_4, s_4] it implicitly implies that _1 is equal 
>> >> >> to s_4,
>> >> >> and vrp is able to transform _7 = s_4 - _1 to _7 = 0 (by using
>> >> >> match.pd pattern x - x -> 0).
>> >> >>
>> >> >> After patch:
>> >> >> VR for s_4 is set to [0, PTRDIFF_MAX - 1]
>> >> >> And correspondingly VR for _1 is set to [0, PTRDIFF_MAX - 1]
>> >> >> so IIUC, we t

[PATCH] Add avx5124fmaps,avx5124vnniw to sse-22.c target pragma (PR target/78451)

2016-11-22 Thread Jakub Jelinek
Hi!

As mentioned in the PR, these 2 ISAs were added to just the first of the two
Intel specific target pragmas (the first one is used in the sse-22.c test
itself, the second one when it is included from sse-22a.c).

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

2016-11-22  Jakub Jelinek  

PR target/78451
* gcc.target/i386/sse-22.c: Add avx5124fmaps,avx5124vnniw to
GCC target pragma before including immintrin.h.

--- gcc/testsuite/gcc.target/i386/sse-22.c.jj   2016-11-18 20:04:24.0 +0100
+++ gcc/testsuite/gcc.target/i386/sse-22.c  2016-11-22 12:31:43.721234017 +0100
@@ -218,7 +218,7 @@ test_4 (_mm_cmpestrz, int, __m128i, int,
 
 /* immintrin.h (AVX/AVX2/RDRND/FSGSBASE/F16C/RTM/AVX512F/SHA) */
 #ifdef DIFFERENT_PRAGMAS
-#pragma GCC target ("avx,avx2,rdrnd,fsgsbase,f16c,rtm,avx512f,avx512er,avx512cd,avx512pf,sha,avx512vl,avx512bw,avx512dq,avx512ifma,avx512vbmi")
+#pragma GCC target ("avx,avx2,rdrnd,fsgsbase,f16c,rtm,avx512f,avx512er,avx512cd,avx512pf,sha,avx512vl,avx512bw,avx512dq,avx512ifma,avx512vbmi,avx5124fmaps,avx5124vnniw")
 #endif
 #include <immintrin.h>
 test_1 (_cvtss_sh, unsigned short, float, 1)


Jakub


Re: [PATCH, testsuite]: Fix detection of -j make argument

2016-11-22 Thread Jonathan Wakely

On 22/11/16 16:54 +0100, Marc Glisse wrote:

On Tue, 22 Nov 2016, Uros Bizjak wrote:


New makes (e.g. GNU Make 4.2.1) pass -j argument in MFLAGS in a
different way. While older makes pass only "-j", newer makes pass e.g.
"-j4" when -j is specified on the command line. The detection of "-j"
make argument doesn't work in the latter case.

Attached patch reworks this functionality to detect -j correctly in all cases.


Hello,

I didn't read the patch, but do you think this also fixes PR 53155 ?


No, probably not, as it only changes the "-j N" case, not the "-j"
case in your PR, which doesn't match because the -j gets combined with
other make flags.



[PATCH] PR78465 Remove runtime tests for macros

2016-11-22 Thread Jonathan Wakely

Andrew MacLeod did some digging and found that this test was changed
from using #if to using a runtime if and abort() because the LOCK_FREE
macros resolved to runtime calls at one point. However, they later got
changed to predefined macros, and so can be changed back to using #if.

This should fix the regression on Solaris, where the mismatched
abort() declaration causes it to FAIL.

PR libstdc++/78465
* testsuite/29_atomics/headers/atomic/macros.cc: Replace runtime tests
with preprocessor conditions.

Tested x86_64-linux, committed to trunk.

commit e6dcd511c9d9641e15f637cf1337149abf97c1e4
Author: Jonathan Wakely 
Date:   Tue Nov 22 16:17:52 2016 +

PR78465 Remove runtime tests for  macros

PR libstdc++/78465
* testsuite/29_atomics/headers/atomic/macros.cc: Replace runtime tests
with preprocessor conditions.

diff --git a/libstdc++-v3/testsuite/29_atomics/headers/atomic/macros.cc b/libstdc++-v3/testsuite/29_atomics/headers/atomic/macros.cc
index 9ef8c78..4cb3e1a 100644
--- a/libstdc++-v3/testsuite/29_atomics/headers/atomic/macros.cc
+++ b/libstdc++-v3/testsuite/29_atomics/headers/atomic/macros.cc
@@ -1,4 +1,4 @@
-// { dg-do compile { target c++11 } }
+// { dg-do preprocess { target c++11 } }
 
 // Copyright (C) 2008-2016 Free Software Foundation, Inc.
 //
@@ -21,42 +21,61 @@
 
 #ifndef ATOMIC_BOOL_LOCK_FREE 
 # error "ATOMIC_BOOL_LOCK_FREE must be a macro"
+#elif ATOMIC_BOOL_LOCK_FREE != 1 && ATOMIC_BOOL_LOCK_FREE != 2
+# error "ATOMIC_BOOL_LOCK_FREE must be 1 or 2"
 #endif
 
 #ifndef ATOMIC_CHAR_LOCK_FREE 
 # error "ATOMIC_CHAR_LOCK_FREE must be a macro"
+#elif ATOMIC_CHAR_LOCK_FREE != 1 && ATOMIC_CHAR_LOCK_FREE != 2
+# error "ATOMIC_CHAR_LOCK_FREE must be 1 or 2"
 #endif
 
 #ifndef ATOMIC_CHAR16_T_LOCK_FREE 
 # error "ATOMIC_CHAR16_T_LOCK_FREE must be a macro"
+#elif ATOMIC_CHAR16_T_LOCK_FREE != 1 && ATOMIC_CHAR16_T_LOCK_FREE != 2
 #endif
 
 #ifndef ATOMIC_CHAR32_T_LOCK_FREE 
 # error "ATOMIC_CHAR32_T_LOCK_FREE must be a macro"
+#elif ATOMIC_CHAR32_T_LOCK_FREE != 1 && ATOMIC_CHAR32_T_LOCK_FREE != 2
+# error "ATOMIC_CHAR32_T_LOCK_FREE must be 1 or 2"
 #endif
 
 #ifndef ATOMIC_WCHAR_T_LOCK_FREE 
 # error "ATOMIC_WCHAR_T_LOCK_FREE must be a macro"
+#elif ATOMIC_WCHAR_T_LOCK_FREE != 1 && ATOMIC_WCHAR_T_LOCK_FREE != 2
+# error "ATOMIC_WCHAR_T_LOCK_FREE must be 1 or 2"
 #endif
 
 #ifndef ATOMIC_SHORT_LOCK_FREE 
 # error "ATOMIC_SHORT_LOCK_FREE must be a macro"
+#elif ATOMIC_SHORT_LOCK_FREE != 1 && ATOMIC_SHORT_LOCK_FREE != 2
+# error "ATOMIC_SHORT_LOCK_FREE must be 1 or 2"
 #endif
 
 #ifndef ATOMIC_INT_LOCK_FREE 
 # error "ATOMIC_INT_LOCK_FREE must be a macro"
+#elif ATOMIC_INT_LOCK_FREE != 1 && ATOMIC_INT_LOCK_FREE != 2
+# error "ATOMIC_INT_LOCK_FREE must be 1 or 2"
 #endif
 
 #ifndef ATOMIC_LONG_LOCK_FREE 
 # error "ATOMIC_LONG_LOCK_FREE must be a macro"
+#elif ATOMIC_LONG_LOCK_FREE != 1 && ATOMIC_LONG_LOCK_FREE != 2
+# error "ATOMIC_LONG_LOCK_FREE must be 1 or 2"
 #endif
 
 #ifndef ATOMIC_LLONG_LOCK_FREE 
 # error "ATOMIC_LLONG_LOCK_FREE must be a macro"
+#elif ATOMIC_LLONG_LOCK_FREE != 1 && ATOMIC_LLONG_LOCK_FREE != 2
+# error "ATOMIC_LLONG_LOCK_FREE must be 1 or 2"
 #endif
 
 #ifndef ATOMIC_POINTER_LOCK_FREE 
 # error "ATOMIC_POINTER_LOCK_FREE must be a macro"
+#elif ATOMIC_POINTER_LOCK_FREE != 1 && ATOMIC_POINTER_LOCK_FREE != 2
+# error "ATOMIC_POINTER_LOCK_FREE must be 1 or 2"
 #endif
 
 #ifndef ATOMIC_FLAG_INIT
@@ -66,49 +85,3 @@
 #ifndef ATOMIC_VAR_INIT
 #error "ATOMIC_VAR_INIT_must_be_a_macro"
 #endif
-
-
-extern void abort(void);
-
-int main ()
-{
-#if (ATOMIC_BOOL_LOCK_FREE != 1 && ATOMIC_BOOL_LOCK_FREE != 2)
-   abort ();
-#endif
-
-#if (ATOMIC_CHAR_LOCK_FREE != 1 && ATOMIC_CHAR_LOCK_FREE != 2)
-   abort ();
-#endif
-
-#if (ATOMIC_CHAR16_T_LOCK_FREE != 1 && ATOMIC_CHAR16_T_LOCK_FREE != 2)
-   abort ();
-#endif
-
-#if (ATOMIC_CHAR32_T_LOCK_FREE != 1 && ATOMIC_CHAR32_T_LOCK_FREE != 2)
-   abort ();
-#endif
-
-#if (ATOMIC_WCHAR_T_LOCK_FREE != 1 && ATOMIC_WCHAR_T_LOCK_FREE != 2)
-   abort ();
-#endif
-
-#if (ATOMIC_SHORT_LOCK_FREE != 1 && ATOMIC_SHORT_LOCK_FREE != 2)
-   abort ();
-#endif
-
-#if (ATOMIC_INT_LOCK_FREE != 1 && ATOMIC_INT_LOCK_FREE != 2)
-   abort ();
-#endif
-
-#if (ATOMIC_LONG_LOCK_FREE != 1 && ATOMIC_LONG_LOCK_FREE != 2)
-   abort ();
-#endif
-
-#if (ATOMIC_LLONG_LOCK_FREE != 1 && ATOMIC_LLONG_LOCK_FREE != 2)
-   abort ();
-#endif
-
-#if (ATOMIC_POINTER_LOCK_FREE != 1 && ATOMIC_POINTER_LOCK_FREE != 2)
-   abort ();
-#endif
-}


Re: [PATCH] Replace _mm_setzero_[hd]i with _mm_setzero_si128 (PR target/78451)

2016-11-22 Thread Uros Bizjak
On Tue, Nov 22, 2016 at 5:09 PM, Jakub Jelinek  wrote:
> Hi!
>
> _mm_setzero_di is problematic, because it is outside of AVX512* guarded
> area, but it actually requires SSE2 which might not be enabled.
> As discussed in the PR, I see neither _mm_setzero_[dh]i routines
> in ICC headers nor in AVX/AVX512 manuals, and fail to see what the
> difference is between those and the standard _mm_setzero_si128.
> All these functions return __m128i containing all zeros, how exactly
> it is constructed should be irrelevant after folding during gimplification
> (all 3 routines gimplify to the same return stmt, __m128i is
> typedef long long __m128i __attribute__ ((__vector_size__ (16), 
> __may_alias__));
> and therefore all 3 are return (__m128i) { 0, 0 }, it doesn't matter
> how those 0s were constructed).
>
> This patch removes those two routines, uses _mm_setzero_si128 instead,
> and I've also done some limited formatting fixes (mainly I tried to
> fix up calls with no space before ( ).
>
> Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?
>
> Note that there is still _mm512_setzero_qi and _mm512_setzero_hi,
> shall those be replaced with _mm512_setzero_si512 too?
> Even those 2 aren't mentioned in ICC headers nor AVX512 manuals.

Yes, please also remove these two.

Patch to replace them with _mm512_setzero_si512 is pre-approved.
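
The equivalence argument quoted above (every routine returns a __m128i of
all zeros, however the zeros were written) can be illustrated with a small
sketch.  It assumes a compiler that supports the GNU vector_size
extension (GCC or Clang); the typedefs v2di, v4si and v8hi and the
zero_from_* helpers are hypothetical stand-ins and are not the definitions
from the intrinsics headers.

```cpp
#include <cassert>
#include <cstring>

// v2di mirrors the __m128i typedef quoted in the message; v4si and v8hi are
// hypothetical helper types used only to spell the zero value differently.
typedef long long v2di __attribute__ ((__vector_size__ (16), __may_alias__));
typedef int v4si __attribute__ ((__vector_size__ (16)));
typedef short v8hi __attribute__ ((__vector_size__ (16)));

// Zero written as two 64-bit lanes.
v2di zero_from_di () { v2di z = { 0, 0 }; return z; }

// Zero written as four 32-bit lanes, then reinterpreted as v2di.
v2di zero_from_si () { v4si z = { 0, 0, 0, 0 }; return (v2di) z; }

// Zero written as eight 16-bit lanes, then reinterpreted as v2di.
v2di zero_from_hi () { v8hi z = { 0, 0, 0, 0, 0, 0, 0, 0 }; return (v2di) z; }

// True when the two vectors have identical bytes.
bool same_bits (v2di a, v2di b)
{
  return std::memcmp (&a, &b, sizeof a) == 0;
}

// All three spellings produce the same 16 zero bytes.
void check_zero_forms_agree ()
{
  assert (same_bits (zero_from_di (), zero_from_si ()));
  assert (same_bits (zero_from_di (), zero_from_hi ()));
}
```

Since the result is bit-for-bit the same in each case, callers cannot
observe which spelling was used, which is the reason a single
_mm_setzero_si128 suffices.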

> 2016-11-22  Jakub Jelinek  
>
> PR target/78451
> * config/i386/avx512vlintrin.h (_mm_setzero_di): Removed.
> (_mm_maskz_mov_epi64): Use _mm_setzero_si128 instead of
> _mm_setzero_di.
> (_mm_maskz_load_epi64): Likewise.
> (_mm_setzero_hi): Removed.
> (_mm_maskz_loadu_epi64): Use _mm_setzero_si128 instead of
> _mm_setzero_di.
> (_mm_abs_epi64, _mm_maskz_abs_epi64, _mm_maskz_srl_epi64,
> _mm_maskz_unpackhi_epi64, _mm_maskz_unpacklo_epi64,
> _mm_maskz_compress_epi64, _mm_srav_epi64, _mm_maskz_srav_epi64,
> _mm_maskz_sllv_epi64, _mm_maskz_srlv_epi64, _mm_rolv_epi64,
> _mm_maskz_rolv_epi64, _mm_rorv_epi64, _mm_maskz_rorv_epi64,
> _mm_min_epi64, _mm_max_epi64, _mm_max_epu64, _mm_min_epu64,
> _mm_lzcnt_epi64, _mm_maskz_lzcnt_epi64, _mm_conflict_epi64,
> _mm_maskz_conflict_epi64, _mm_sra_epi64, _mm_maskz_sra_epi64,
> _mm_maskz_sll_epi64, _mm_rol_epi64, _mm_maskz_rol_epi64,
> _mm_ror_epi64, _mm_maskz_ror_epi64, _mm_alignr_epi64,
> _mm_maskz_alignr_epi64, _mm_srai_epi64, _mm_maskz_slli_epi64):
> Likewise.
> (_mm_cvtepi32_epi8, _mm256_cvtepi32_epi8, _mm_cvtsepi32_epi8,
> _mm256_cvtsepi32_epi8, _mm_cvtusepi32_epi8, _mm256_cvtusepi32_epi8,
> _mm_cvtepi32_epi16, _mm256_cvtepi32_epi16, _mm_cvtsepi32_epi16,
> _mm256_cvtsepi32_epi16, _mm_cvtusepi32_epi16, _mm256_cvtusepi32_epi16,
> _mm_cvtepi64_epi8, _mm256_cvtepi64_epi8, _mm_cvtsepi64_epi8,
> _mm256_cvtsepi64_epi8, _mm_cvtusepi64_epi8, _mm256_cvtusepi64_epi8,
> _mm_cvtepi64_epi16, _mm256_cvtepi64_epi16, _mm_cvtsepi64_epi16,
> _mm256_cvtsepi64_epi16, _mm_cvtusepi64_epi16, _mm256_cvtusepi64_epi16,
> _mm_cvtepi64_epi32, _mm256_cvtepi64_epi32, _mm_cvtsepi64_epi32,
> _mm256_cvtsepi64_epi32, _mm_cvtusepi64_epi32, _mm256_cvtusepi64_epi32,
> _mm_maskz_set1_epi32, _mm_maskz_set1_epi64): Formatting fixes.
> (_mm_maskz_cvtps_ph, _mm256_maskz_cvtps_ph): Use _mm_setzero_si128
> instead of _mm_setzero_hi.
> (_mm256_permutex_pd, _mm256_maskz_permutex_epi64, _mm256_insertf32x4,
> _mm256_maskz_insertf32x4, _mm256_inserti32x4, 
> _mm256_maskz_inserti32x4,
> _mm256_extractf32x4_ps, _mm256_maskz_extractf32x4_ps,
> _mm256_shuffle_i32x4, _mm256_maskz_shuffle_i32x4, 
> _mm256_shuffle_f64x2,
> _mm256_maskz_shuffle_f64x2, _mm256_shuffle_f32x4,
> _mm256_maskz_shuffle_f32x4, _mm256_maskz_shuffle_pd,
> _mm_maskz_shuffle_pd, _mm256_maskz_shuffle_ps, _mm_maskz_shuffle_ps,
> _mm256_maskz_srli_epi32, _mm_maskz_srli_epi32, _mm_maskz_srli_epi64,
> _mm256_mask_slli_epi32, _mm256_maskz_slli_epi32, 
> _mm256_mask_slli_epi64,
> _mm256_maskz_slli_epi64, _mm256_roundscale_ps,
> _mm256_maskz_roundscale_ps, _mm256_roundscale_pd,
> _mm256_maskz_roundscale_pd, _mm_roundscale_ps, 
> _mm_maskz_roundscale_ps,
> _mm_roundscale_pd, _mm_maskz_roundscale_pd, _mm256_getmant_ps,
> _mm256_maskz_getmant_ps, _mm_getmant_ps, _mm_maskz_getmant_ps,
> _mm256_getmant_pd, _mm256_maskz_getmant_pd, _mm_getmant_pd,
> _mm_maskz_getmant_pd, _mm256_maskz_shuffle_epi32,
> _mm_maskz_shuffle_epi32, _mm256_rol_epi32, _mm256_maskz_rol_epi32,
> _mm_rol_epi32, _mm_maskz_rol_epi32, _mm256_ror_epi32,
> _mm256_maskz_ror_epi32, _mm_ror_epi32, _mm_maskz_ror_epi32,
> _mm_maskz_alignr_epi32, _mm_maskz_alignr_epi64,
> _mm256_maskz_srai_epi32, _mm_maskz_srai_epi32, _mm_srai_epi64,
>

Re: [PATCH] Add avx5124fmaps,avx5124vnniw to sse-22.c target pragma (PR target/78451)

2016-11-22 Thread Uros Bizjak
On Tue, Nov 22, 2016 at 5:12 PM, Jakub Jelinek  wrote:
> Hi!
>
> As mentioned in the PR, these 2 ISAs were added to just the first of the two
> Intel specific target pragmas (the first one is used in the sse-22.c test
> itself, the second one when it is included from sse-22a.c).
>
> Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?
>
> 2016-11-22  Jakub Jelinek  
>
> PR target/78451
> * gcc.target/i386/sse-22.c: Add avx5124fmaps,avx5124vnniw to
> GCC target pragma before including immintrin.h.

OK.

Thanks,
Uros.


Re: [PATCH][ARM] PR target/78439: Update movdi constraints for Cortex-A8 tuning to handle LDRD/STRD

2016-11-22 Thread Ramana Radhakrishnan
On Tue, Nov 22, 2016 at 9:57 AM, Kyrill Tkachov
 wrote:
> Hi all,
>
> This PR is an ICE while bootstrapping GCC with Cortex-A8 tuning, which we
> also get from the default ARMv7-A tuning.
> The ldrd/strd peepholes were recently made more aggressive and in this case
> they transform:
> (insn 13 33 40 2 (set (mem/c:SI (plus:SI (reg/f:SI 11 fp)
> (const_int -28 [0xffe4])) [3 d.num_comps+0 S4
> A64])
> (reg:SI 12 ip [orig:117 _20 ] [117])) "cp-demangle.c":32 632
> {*arm_movsi_vfp}
>  (expr_list:REG_DEAD (reg:SI 12 ip [orig:117 _20 ] [117])
> (nil)))
> (insn 40 13 39 2 (set (mem/f/c:SI (plus:SI (reg/f:SI 11 fp)
> (const_int -24 [0xffe8])) [2 d.subs+0 S4 A32])
> (reg/f:SI 13 sp)) "cp-demangle.c":51 632 {*arm_movsi_vfp}
>  (nil))
>
> into:
> (insn 68 33 39 2 (set (mem/c:DI (plus:SI (reg/f:SI 11 fp)
> (const_int -28 [0xffe4])) [3 d.num_comps+0 S8
> A64])
> (reg:DI 12 ip)) "cp-demangle.c":51 -1
>  (nil))
>
> This is okay, but the *movdi_vfp_cortexa8 pattern doesn't deal with the IP
> being the source of the store. The reason is that r197530, which
> introduced the LDRD/STRD peepholes and machinery, created the 'q'
> constraint that should be used for the register operands of the DImode
> stores and loads ('q' means CORE_REGS when LDRD/STRD is enabled in ARM mode
> and GENERAL_REGS otherwise). That revision updated the movdi_vfp pattern to
> use it in alternatives 4,5,6 but neglected to update the
> Cortex-A8-specific pattern. This is a sign that we should perhaps get rid
> of this special-cased pattern at some point, but for now

I would expect any patch that does this (i.e. removes the pattern) to
be tested to see the impact of the difference in constraints. AFAIR
the pattern was added to distinguish between Neon for DImode
operations and non-Neon for DImode variations many moons ago. So
please do the archeology and measurements (look at output from crafty
for a variety of options and a variety of cores before cleaning all
this up).

Ramana


> this simple patch updates the appropriate alternatives to use the 'q'
> constraint so that output_move_double can output the correct LDRD/STRD
> instruction.
>
> Bootstrapped on arm-none-linux-gnueabihf with --with-arch=armv7-a that
> exercises this code (bootstrap currently fails without this patch) and
> tested with /-mtune=cortex-a8.
>
> Ok for trunk?
>
> Thanks,
> Kyrill
>
> 2016-11-22  Kyrylo Tkachov  
>
> PR target/78439
> * config/arm/vfp.md (*movdi_vfp_cortexa8): Use 'q' constraints for the
> register operand in alternatives 4,5,6.
>
> 2016-11-22  Kyrylo Tkachov  
>
> PR target/78439
> * gcc.c-torture/compile/pr78439.c: New test.


Re: [PR target/78213] Do not ICE on non-empty -fself-test

2016-11-22 Thread Bernd Schmidt

On 11/16/2016 11:45 AM, Aldy Hernandez wrote:


I would prefer Jakub's suggestion of running in finish_options().


I suspect we'll want both. Selftests should really run in an environment 
that's as close as possible to what would normally be going on in the 
compiler.



I assume there are other places throughout the self-tests that depend on
NOT continuing the compilation process, and I'd hate to plug each one.

Would the attached patch be acceptable to both of you?


Good enough for now.


Bernd


Re: [PATCH][AArch64] Separate shrink wrapping hooks implementation

2016-11-22 Thread Kyrill Tkachov


On 18/11/16 12:50, Segher Boessenkool wrote:

On Fri, Nov 18, 2016 at 09:29:13AM +, Kyrill Tkachov wrote:

So your COMPONENTS_FOR_BB returns both components in a pair whenever one
of those is needed?  That should work afaics.

I mean I still want to have one component per register and since
emit_{prologue,epilogue}_components knows how to form pairs from the
components passed down to it I just need to restrict the number of
components in any particular basic block to an even number.
So say a function can wrap 5 registers: x22,x23,x24,x25,x26.
I want get_separate_components to return 5 components since in that hook
we don't know how these registers are distributed across each basic block.
components_for_bb has that information.
In components_for_bb I want to restrict the components for a basic block to
an even number, so if normally all 5 registers would be valid for wrapping
in that bb I'd only choose 4 so I could form 2 pairs. But selecting only 4
of the 5 registers, say only x22,x23,x24,x25 leads to x26 not being saved
or restored at all, even during the normal prologue and epilogue because
x26 was marked as a component in components_for_bb and therefore omitted
from the prologue and epilogue.
So I'm thinking x26 should be removed from the wrappable components of
a basic block by disqualify_components. I'm trying that approach now.

My suggestion was, in components_for_bb, whenever you mark x22 as needed
you also mark x23 as needed, and whenever you mark x23 as needed you also
mark x22.  I think this is a lot simpler?


But then we'd have cases where we're saving and restoring x23
even when it's not necessary.
In any case, I tried it out and it didn't fix the gobmk issue, though it did
reduce the code size increase somewhat.
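
For readers following the pairing discussion, here is a minimal sketch of
the "mark the partner too" idea, together with the cost mentioned in the
reply.  Everything in it is an assumption made for illustration: registers
are modelled as bits in a plain mask, partners are taken to be the
even/odd neighbours (x22 with x23, x24 with x25), and the name
close_under_pairs does not exist in the AArch64 backend.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical model: bit N of a RegMask stands for register xN.
using RegMask = std::uint32_t;

// For every register in NEEDED, also mark its even/odd partner, provided the
// partner is itself in WRAPPABLE.  Assumes NEEDED is a subset of WRAPPABLE.
RegMask close_under_pairs (RegMask needed, RegMask wrappable)
{
  assert ((needed & ~wrappable) == 0);
  RegMask result = needed;
  for (unsigned reg = 0; reg < 32; ++reg)
    if (needed & (RegMask{1} << reg))
      {
        // Partner of an even register is the next odd one and vice versa.
        RegMask partner_bit = RegMask{1} << (reg ^ 1u);
        if (wrappable & partner_bit)
          result |= partner_bit;
      }
  return result;
}
```

With the five wrappable registers x22..x26 from the example, a block that
needs only x23 ends up with both x22 and x23 marked, which is the extra
save/restore being objected to, while x26 stays on its own because its
would-be partner x27 is not wrappable.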

With the patch already posted at [1] the net result is still positive on
both SPECINT and SPECFP.

I also ran the numbers on a Cortex-A57. The changes are less pronounced
with SPECINT being neutral (gobmk shows only a 0.8% regression) and SPECFP
having a small improvement, due to povray improving by 2.9%.

Thanks,
Kyrill

[1] https://gcc.gnu.org/ml/gcc-patches/2016-11/msg01352.html



Segher




Re: [PATCH, testsuite]: Fix detection of -j make argument

2016-11-22 Thread Uros Bizjak
On Tue, Nov 22, 2016 at 4:54 PM, Marc Glisse  wrote:
> On Tue, 22 Nov 2016, Uros Bizjak wrote:
>
>> New makes (e.g. GNU Make 4.2.1) pass the -j argument in MFLAGS in a
>> different way. While older makes pass only "-j", newer makes pass e.g.
>> "-j4" when -j is specified on the command line. The detection of the "-j"
>> make argument doesn't work in the latter case.
>>
>> Attached patch reworks this functionality to detect -j correctly in all
>> cases.
>
>
> Hello,
>
> I didn't read the patch, but do you think this also fixes PR 53155 ?

Looking at the PR, I don't think so - but I did test my patch with
CentOS 5.11 (with make 3.81) and detection worked there without
problems.

Maybe MAKEFLAGS should be used instead of MFLAGS, since the docs [1] mention
that MFLAGS is intended for historical compatibility?

[1] https://www.gnu.org/software/make/manual/html_node/Options_002fRecursion.html

Uros.
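
The two MFLAGS shapes described in this thread (a bare "-j" from older
makes, "-j4" from newer ones) can be recognised with one check.  The
following is only a sketch of that idea and not the testsuite patch
itself: has_jobs_flag is a hypothetical helper, it looks only at
whitespace-separated words, and it deliberately ignores bundled
single-letter options.

```cpp
#include <cassert>
#include <cctype>
#include <sstream>
#include <string>

// Return true if FLAGS contains a word that is exactly "-j" or "-j"
// followed only by digits (e.g. "-j4").
bool has_jobs_flag (const std::string &flags)
{
  std::istringstream in (flags);
  std::string word;
  while (in >> word)
    {
      if (word.compare (0, 2, "-j") != 0)
        continue;
      bool digits_only = true;
      for (std::string::size_type i = 2; i < word.size (); ++i)
        if (!std::isdigit (static_cast<unsigned char> (word[i])))
          {
            digits_only = false;
            break;
          }
      if (digits_only)
        return true;
    }
  return false;
}

// Both spellings mentioned in the thread are accepted; long options that
// merely start with "--j" are not.
void check_has_jobs_flag ()
{
  assert (has_jobs_flag ("-j"));
  assert (has_jobs_flag ("-j4 --jobserver-auth=3,4"));
  assert (!has_jobs_flag ("--jobserver-fds=3,4"));
  assert (!has_jobs_flag ("-k"));
}
```

Matching whole words rather than the substring "-j" avoids a false hit on
options such as --jobserver-fds.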


Re: [PATCH] enable -Wformat-length for dynamically allocated buffers (pr 78245)

2016-11-22 Thread Jeff Law

On 11/08/2016 05:09 PM, Martin Sebor wrote:

The -Wformat-length checker relies on the compute_builtin_object_size
function to determine the size of the buffer it checks for overflow.
The function returns either a size computed by the tree-object-size
pass for objects referenced by the __builtin_object_size intrinsic
(if it's used in the program) or it tries to compute it for a small
subset of expressions otherwise.  This subset doesn't include objects
allocated by either malloc or alloca, and so for those the function
returns "unknown" or (size_t)-1 in the case of -Wformat-length.  As
a consequence, -Wformat-length is unable to detect overflows
involving such objects.

The attached patch adds a new function, compute_object_size, that
uses the existing algorithms to compute and return the sizes of
allocated objects as well, as if they were referenced by
__builtin_object_size in the program source, enabling the
-Wformat-length checker to detect more buffer overflows.

Martin

PS The function makes use of the init_function_sizes API that is
otherwise unused outside the tree-object-size pass to initialize
the internal structures, but then calls fini_object_sizes to
release them before returning.  That seems wasteful because
the size of the same object or one related to it might need
to be computed again in the context of the same function.  I
experimented with allocating and releasing the structures only
when current_function_decl changes but that led to crashes.
I suspect I'm missing something about the management of memory
allocated for these structures.  Does anyone have any suggestions
how to make this work?  (Do I perhaps need to allocate them using
a special allocator so they don't get garbage collected?)

gcc-78245.diff


PR middle-end/78245 - missing -Wformat-length on an overflow of a dynamically 
allocated buffer

gcc/testsuite/ChangeLog:

PR middle-end/78245
* gcc.dg/tree-ssa/builtin-sprintf-warn-3.c: Add tests.

gcc/ChangeLog:

PR middle-end/78245
* gimple-ssa-sprintf.c (get_destination_size): Call compute_object_size.
* tree-object-size.c (addr_object_size): Adjust.
(pass_through_call): Adjust.
(compute_object_size, internal_object_size): New functions.
(compute_builtin_object_size): Call internal_object_size.
(pass_object_sizes::execute): Adjust.
* tree-object-size.h (compute_object_size): Declare.

Sorry.  Just not getting to many of the pre-stage1 close patches as fast 
as I'd like.


My only real concern here is that if we call compute_builtin_object_size 
without having initialized the passes, then we initialize, compute, then 
finalize.  Subsequent calls will go through the same process -- the key 
being each one re-computes the internal state which might get expensive.


Wouldn't it just make more sense to pull up the init/fini calls, either 
explicitly (which likely means externalizing the init/fini routines) or 
by wrapping all this stuff in a class and instantiating a suitable object?
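
One way to picture the "wrap it in a class" option is the sketch below.
It is not based on the actual tree-object-size.c code: init_sizes,
fini_sizes, object_size_state and the integer "objects" are hypothetical
stand-ins, chosen only to show that a caller asking for several sizes
pays for set-up and tear-down once.

```cpp
#include <cassert>

// Hypothetical stand-ins for the pass-internal state and its set-up and
// tear-down routines; none of these names come from GCC.
int state_live = 0;   // nonzero while the internal tables exist
int init_count = 0;   // how many times the tables were built

void init_sizes () { ++init_count; state_live = 1; }
void fini_sizes () { state_live = 0; }

// The tables live exactly as long as an object of this class.
class object_size_state
{
public:
  object_size_state () { init_sizes (); }
  ~object_size_state () { fini_sizes (); }
  object_size_state (const object_size_state &) = delete;
  object_size_state &operator= (const object_size_state &) = delete;

  // Placeholder query; a real one would consult the tables.
  int size_of (int object) const
  {
    assert (state_live);
    return object * 8;
  }
};

// A caller that needs several sizes instantiates the state once.
int total_size (const int *objects, int n)
{
  object_size_state state;
  int total = 0;
  for (int i = 0; i < n; ++i)
    total += state.size_of (objects[i]);
  return total;
}
```

Whether the real state can safely outlive a single query is a separate
question, and it depends on the garbage-collection issue discussed next.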


I think the answer to your memory management question is that internal
state is likely not marked as a GC root and thus, if you get a GC call,
pieces of internal state are not seen as reachable, but you still can
reference them.  I.e., you end up with dangling pointers.


Usually all you'd have to do is mark them so that gengtype will see 
them.  Bitmaps, trees, rtl, are all good examples.  So marking the 
bitmap would look like:


static GTY (()) bitmap computed[4];

Or something like that.

You might try --enable-checking=yes,extra,gc,gcac

That will be slow, but is often helpful for tracking down cases where 
someone has an object expected to be live across passes, but it isn't 
reachable because someone failed to register a GC root.


Jeff


Re: [PATCH] Fix PR78230

2016-11-22 Thread Jeff Law

On 11/08/2016 07:43 PM, Kito Cheng wrote:

gcc/testsuite/ChangeLog:

2016-11-09  Kito Cheng 

PR target/78230
* gcc.dg/torture/pr66178.c (test): Use uintptr_t instead of int.
(test2): Ditto.

OK.
jeff


Re: [PATCH] Delete GCJ

2016-11-22 Thread Sandra Loosemore

On 11/21/2016 04:23 PM, Matthias Klose wrote:

On 21.11.2016 18:16, Rainer Orth wrote:

Hi Matthias,


ahh, didn't see that :-/ Now fixed, is this clearer now?

The options @option{--with-target-bdw-gc-include} and
@option{--with-target-bdw-gc-lib} must always specified together for

^ be


thanks to all for sorting out the documentation issues. Now attaching the updated
diff. Ok to commit?


The documentation part is OK now.

-Sandra



Re: gomp-nvptx branch - middle-end changes

2016-11-22 Thread Alexander Monakov
On Fri, 11 Nov 2016, Jakub Jelinek wrote:
> Ok for trunk, once the needed corresponding config/nvptx bits are committed,
> with one nit below that needs immediate action and the rest can be resolved
> incrementally.  I'd like to check in afterwards the attached patch, at least
> for now, so that non-offloaded SIMD code is less affected.

Testing your patch revealed an issue in Fortran offloaded code; types of
boolean_type_node in f951 and boolean_false_node in lto1 (when omp_device_lower
runs) don't match.  I'm attaching a revised patch that addresses it by simply
using an integer type (there are also two other minor issues, below).

> Please change this into
> (ENABLE_OFFLOADING && (flag_openmp || in_lto))
> for now, so that we don't waste compile time even when clearly it
> isn't needed, and incrementally change the inliner to propagate
> the property.

As ENABLE_OFFLOADING is not set in the offloading compiler, this additionally
needs to accept ACCEL_COMPILER.  Applied like this:

+  virtual bool gate (function *ARG_UNUSED (fun))
+{
+  /* FIXME: this should use PROP_gimple_lomp_dev.  */
+#ifdef ACCEL_COMPILER
+  return true;
+#else
+  return ENABLE_OFFLOADING && (flag_openmp || in_lto_p);
+#endif
+}


In your GOMP_USE_SIMT() patch,

> @@ -4314,6 +4364,12 @@ lower_rec_simd_input_clauses (tree new_v
>if (max_vf == 0)
>  {
>max_vf = omp_max_vf ();
> +  if (find_omp_clause (gimple_omp_for_clauses (ctx->stmt),
> +OMP_CLAUSE__SIMT_))
> + {
> +   int max_simt = omp_max_simt_vf ();
> +   max_vf = MAX (max_vf, max_simt);
> + }

I don't believe here there's a need to take a maximum.  Cloning the loop upfront
means that SIMD+SIMT styles are not going to mix within a single loop.  I've
simplified it to an if-then-else in the revised patch.

> @@ -10601,7 +10656,11 @@ expand_omp_simd (struct omp_region *regi
>bool offloaded = cgraph_node::get (current_function_decl)->offloadable;
>for (struct omp_region *rgn = region; !offloaded && rgn; rgn = rgn->outer)
>  offloaded = rgn->type == GIMPLE_OMP_TARGET;
> -  bool is_simt = offloaded && omp_max_simt_vf () > 1 && safelen_int > 1;
> +  bool is_simt
> += (offloaded
> +   && find_omp_clause (gimple_omp_for_clauses (fd->for_stmt),
> +OMP_CLAUSE__SIMT_)
> +   && safelen_int > 1);

Here computation of 'offloaded' is no longer needed, because presence of
OMP_CLAUSE__SIMT_ would imply that.  Removed in the revised patch.

I've noticed that your patch doesn't adjust 'maybe_simt' in "ordered" lowering.
Not sure if that's intentional -- as I understand it's possible to look at the
enclosing context's clauses because 'omp ordered' must be closely nested with
the corresponding loop.  I've added a FIXME in the patch.

Alexander

	* internal-fn.c (expand_GOMP_USE_SIMT): New function.
	* tree.c (omp_clause_num_ops): OMP_CLAUSE__SIMT_ has 0 operands.
	(omp_clause_code_name): Add _simt_ name.
	(walk_tree_1): Handle OMP_CLAUSE__SIMT_.
	* tree-core.h (enum omp_clause_code): Add OMP_CLAUSE__SIMT_.
	* omp-low.c (scan_sharing_clauses): Handle OMP_CLAUSE__SIMT_.
	(scan_omp_simd): New function.
	(scan_omp_1_stmt): Use it in target regions if needed.
	(omp_max_vf): Don't max with omp_max_simt_vf.
	(lower_rec_simd_input_clauses): Use omp_max_simt_vf if
	OMP_CLAUSE__SIMT_ is present.
	(lower_rec_input_clauses): Compute maybe_simt from presence of
	OMP_CLAUSE__SIMT_.
	(lower_lastprivate_clauses): Likewise.
	(expand_omp_simd): Likewise.
	(execute_omp_device_lower): Lower IFN_GOMP_USE_SIMT.
	* internal-fn.def (GOMP_USE_SIMT): New internal function.
	* tree-pretty-print.c (dump_omp_clause): Handle OMP_CLAUSE__SIMT_.

diff --git a/gcc/internal-fn.c b/gcc/internal-fn.c
index 6cd8522..b1dbc98 100644
--- a/gcc/internal-fn.c
+++ b/gcc/internal-fn.c
@@ -158,6 +158,14 @@ expand_ANNOTATE (internal_fn, gcall *)
   gcc_unreachable ();
 }
 
+/* This should get expanded in omp_device_lower pass.  */
+
+static void
+expand_GOMP_USE_SIMT (internal_fn, gcall *)
+{
+  gcc_unreachable ();
+}
+
 /* Lane index on SIMT targets: thread index in the warp on NVPTX.  On targets
without SIMT execution this should be expanded in omp_device_lower pass.  */
 
diff --git a/gcc/internal-fn.def b/gcc/internal-fn.def
index f055230..9a03e17 100644
--- a/gcc/internal-fn.def
+++ b/gcc/internal-fn.def
@@ -141,6 +141,7 @@ DEF_INTERNAL_INT_FN (FFS, ECF_CONST, ffs, unary)
 DEF_INTERNAL_INT_FN (PARITY, ECF_CONST, parity, unary)
 DEF_INTERNAL_INT_FN (POPCOUNT, ECF_CONST, popcount, unary)
 
+DEF_INTERNAL_FN (GOMP_USE_SIMT, ECF_NOVOPS | ECF_LEAF | ECF_NOTHROW, NULL)
 DEF_INTERNAL_FN (GOMP_SIMT_LANE, ECF_NOVOPS | ECF_LEAF | ECF_NOTHROW, NULL)
 DEF_INTERNAL_FN (GOMP_SIMT_VF, ECF_NOVOPS | ECF_LEAF | ECF_NOTHROW, NULL)
 DEF_INTERNAL_FN (GOMP_SIMT_LAST_LANE, ECF_NOVOPS | ECF_LEAF | ECF_NOTHROW, NULL)
diff --git a/gcc/omp-low.c b/gcc/omp-low.c
index 6c52bff..eab0af5 100644
--- a/gcc/omp-low.c
+++ b/gcc/omp-low.c
@@ -278,6 

Re: [PATCH] Add map clauses to libgomp test device-3.f90

2016-11-22 Thread Alexander Monakov
On Tue, 15 Nov 2016, Alexander Monakov wrote:
> On Tue, 15 Nov 2016, Alexander Monakov wrote:
> > Yep, I do see new test execution failures with both Intel MIC and PTX
> > offloading on device-1.f90, device-3.f90 and target2.f90.  Here's an
> > actually-tested patch for the first two (on target2.f90 there's a
> > different problem).
> 
> And here's a patch for target2.f90.  I don't have a perfect understanding of
> mapping clauses, but the test appears to need to explicitly map pointer
> variables, at a minimum.  Also, 'map (from: r)' is missing on the last target
> region.
> 
>   * testsuite/libgomp.fortran/target2.f90 (foo): Add mapping clauses to
>   target construct.

Ping.

> diff --git a/libgomp/testsuite/libgomp.fortran/target2.f90 
> b/libgomp/testsuite/libgomp.fortran/target2.f90
> index 42f704f..7119774 100644
> --- a/libgomp/testsuite/libgomp.fortran/target2.f90
> +++ b/libgomp/testsuite/libgomp.fortran/target2.f90
> @@ -63,7 +63,7 @@ contains
>r = r .or. (any (k(5:n-5) /= 17)) .or. (lbound (k, 1) /= 4) .or. 
> (ubound (k, 1) /= n)
>  !$omp end target
>  if (r) call abort
> -!$omp target map (to: d(2:n+1), n)
> +!$omp target map (to: d(2:n+1), f, j) map (from: r)
>r = a /= 7
>r = r .or. (any (b /= 8)) .or. (lbound (b, 1) /= 3) .or. (ubound (b, 
> 1) /= n)
>r = r .or. (any (c /= 9)) .or. (lbound (c, 1) /= 5) .or. (ubound (c, 
> 1) /= n + 4)
> 
> 


Re: gomp-nvptx branch - middle-end changes

2016-11-22 Thread Jakub Jelinek
On Tue, Nov 22, 2016 at 08:25:45PM +0300, Alexander Monakov wrote:
> On Fri, 11 Nov 2016, Jakub Jelinek wrote:
> > Ok for trunk, once the needed corresponding config/nvptx bits are committed,
> > with one nit below that needs immediate action and the rest can be resolved
> > incrementally.  I'd like to check in afterwards the attached patch, at least
> > for now, so that non-offloaded SIMD code is less affected.
> 
> Testing your patch revealed an issue in Fortran offloaded code; types of
> boolean_type_node in f951 and boolean_false_node in lto1 (when
> omp_device_lower runs) don't match.  I'm attaching a revised patch that
> addresses it by simply using an integer type (there are also two other
> minor issues, below).

Ok.

> > Please change this into
> > (ENABLE_OFFLOADING && (flag_openmp || in_lto))
> > for now, so that we don't waste compile time even when clearly it
> > isn't needed, and incrementally change the inliner to propagate
> > the property.
> 
> As ENABLE_OFFLOADING is not set in the offloading compiler, this additionally
> needs to accept ACCEL_COMPILER.  Applied like this:
> 
> +  virtual bool gate (function *ARG_UNUSED (fun))
> +{
> +  /* FIXME: this should use PROP_gimple_lomp_dev.  */
> +#ifdef ACCEL_COMPILER
> +  return true;
> +#else
> +  return ENABLE_OFFLOADING && (flag_openmp || in_lto_p);
> +#endif
> +}

Makes sense.

> > @@ -4314,6 +4364,12 @@ lower_rec_simd_input_clauses (tree new_v
> >if (max_vf == 0)
> >  {
> >max_vf = omp_max_vf ();
> > +  if (find_omp_clause (gimple_omp_for_clauses (ctx->stmt),
> > +  OMP_CLAUSE__SIMT_))
> > +   {
> > + int max_simt = omp_max_simt_vf ();
> > + max_vf = MAX (max_vf, max_simt);
> > +   }
> 
> I don't believe here there's a need to take a maximum.  Cloning the loop
> upfront means that SIMD+SIMT styles are not going to mix within a single
> loop.  I've simplified it to an if-then-else in the revised patch.

Ok.

> > @@ -10601,7 +10656,11 @@ expand_omp_simd (struct omp_region *regi
> >bool offloaded = cgraph_node::get (current_function_decl)->offloadable;
> >for (struct omp_region *rgn = region; !offloaded && rgn; rgn = 
> > rgn->outer)
> >  offloaded = rgn->type == GIMPLE_OMP_TARGET;
> > -  bool is_simt = offloaded && omp_max_simt_vf () > 1 && safelen_int > 1;
> > +  bool is_simt
> > += (offloaded
> > +   && find_omp_clause (gimple_omp_for_clauses (fd->for_stmt),
> > +  OMP_CLAUSE__SIMT_)
> > +   && safelen_int > 1);
> 
> Here computation of 'offloaded' is no longer needed, because presence of
> OMP_CLAUSE__SIMT_ would imply that.  Removed in the revised patch.
> 
> I've noticed that your patch doesn't adjust 'maybe_simt' in "ordered"
> lowering.  Not sure if that's intentional -- as I understand it's possible
> to look at the enclosing context's clauses because 'omp ordered' must be
> closely nested with

Right now omp ordered simd for non-simt basically causes vf 1, because the
vectorizer isn't ready for having non-vectorized portions of code within
a vectorized loop.

> the corresponding loop.  I've added a FIXME in the patch.

Ok for trunk, thanks.

Jakub


Go patch committed: Rewrite panic/defer code from C to Go

2016-11-22 Thread Ian Lance Taylor
This patch to the Go frontend and libgo rewrites the panic/defer code
from C to Go.  The actual stack unwind code is still in C, but the
rest of the code, notably all the memory allocation, is now in Go.
The names are changed to the names used in the Go 1.7 runtime, but the
code is necessarily somewhat different.

The __go_makefunc_can_recover function is dropped, as the uses of it
were removed by Richard Henderson's work in
https://golang.org/cl/198770044.

This moves more memory allocation from C to Go, which will simplify
the move to the new concurrent garbage collector.

Bootstrapped and ran Go testsuite on x86_64-pc-linux-gnu.  Committed
to mainline.

Ian

2016-11-22  Ian Lance Taylor  

* go-gcc.cc (Gcc_backend::Gcc_backend): Add builtin function
__builtin_frame_address.
Index: gcc/go/go-gcc.cc
===
--- gcc/go/go-gcc.cc(revision 242581)
+++ gcc/go/go-gcc.cc(working copy)
@@ -828,6 +828,15 @@ Gcc_backend::Gcc_backend()
   this->define_builtin(BUILT_IN_FRAME_ADDRESS, "__builtin_frame_address",
   NULL, t, false, false);
 
+  // The runtime calls __builtin_extract_return_addr when recording
+  // the address to which a function returns.
+  this->define_builtin(BUILT_IN_EXTRACT_RETURN_ADDR,
+  "__builtin_extract_return_addr", NULL,
+  build_function_type_list(ptr_type_node,
+   ptr_type_node,
+   NULL_TREE),
+  false, false);
+
   // The compiler uses __builtin_trap for some exception handling
   // cases.
   this->define_builtin(BUILT_IN_TRAP, "__builtin_trap", NULL,
Index: gcc/go/gofrontend/MERGE
===
--- gcc/go/gofrontend/MERGE (revision 242600)
+++ gcc/go/gofrontend/MERGE (working copy)
@@ -1,4 +1,4 @@
-bf4762823c4543229867436399be3ae30b4d13bb
+7593cc83a03999331c5e2dc65a9306c5fe57dfd0
 
 The first line of this file holds the git revision number of the last
 merge done from the gofrontend repository.
Index: gcc/go/gofrontend/backend.h
===
--- gcc/go/gofrontend/backend.h (revision 242581)
+++ gcc/go/gofrontend/backend.h (working copy)
@@ -707,7 +707,7 @@ class Backend
   // Create a statement that runs all deferred calls for FUNCTION.  This should
   // be a statement that looks like this in C++:
   //   finish:
-  // try { UNDEFER; } catch { CHECK_DEFER; goto finish; }
+  // try { DEFER_RETURN; } catch { CHECK_DEFER; goto finish; }
   virtual Bstatement*
   function_defer_statement(Bfunction* function, Bexpression* undefer,
Bexpression* check_defer, Location) = 0;
Index: gcc/go/gofrontend/escape.cc
===
--- gcc/go/gofrontend/escape.cc (revision 242581)
+++ gcc/go/gofrontend/escape.cc (working copy)
@@ -280,7 +280,7 @@ Node::op_format() const
{
  switch (e->func_expression()->runtime_code())
{
-   case Runtime::PANIC:
+   case Runtime::GOPANIC:
  op << "panic";
  break;
 
@@ -300,11 +300,11 @@ Node::op_format() const
  op << "make";
  break;
 
-   case Runtime::DEFER:
+   case Runtime::DEFERPROC:
  op << "defer";
  break;
 
-   case Runtime::RECOVER:
+   case Runtime::GORECOVER:
  op << "recover";
  break;
 
@@ -1189,7 +1189,7 @@ Escape_analysis_assign::expression(Expre
  {
switch (fe->runtime_code())
  {
- case Runtime::PANIC:
+ case Runtime::GOPANIC:
{
  // Argument could leak through recover.
  Node* panic_arg = Node::make_node(call->args()->front());
Index: gcc/go/gofrontend/expressions.cc
===
--- gcc/go/gofrontend/expressions.cc(revision 242581)
+++ gcc/go/gofrontend/expressions.cc(working copy)
@@ -8951,7 +8951,7 @@ Builtin_call_expression::do_get_backend(
 arg = Expression::convert_for_assignment(gogo, empty, arg, location);
 
 Expression* panic =
-Runtime::make_call(Runtime::PANIC, location, 1, arg);
+Runtime::make_call(Runtime::GOPANIC, location, 1, arg);
 return panic->get_backend(context);
   }
 
@@ -8972,8 +8972,8 @@ Builtin_call_expression::do_get_backend(
// because it changes whether it can recover a panic or not.
// See test7 in test/recover1.go.
 Expression* recover = Runtime::make_call((this->is_deferred()
-  ? Runtime::DEFERRED_RECOVER
-  : Ru

[PATCH, Fortran, cosmetics] Use convenience functions and constants

2016-11-22 Thread Andre Vehreschild
Hi all,

during more hacking on the allocatable components in derived type coarrays, I
encountered the improvable code fragments in the patch attached.

Bootstraps and regtests ok on x86_64-linux/F23. Ok for trunk?

Regards,
Andre

PS: The patch that motivated these changes follows as soon as its regtesting
has finished.
-- 
Andre Vehreschild * Email: vehre ad gmx dot de 


cosmetics_4.clog
diff --git a/gcc/fortran/trans-array.c b/gcc/fortran/trans-array.c
index 1708f7c..45e1369 100644
--- a/gcc/fortran/trans-array.c
+++ b/gcc/fortran/trans-array.c
@@ -7855,9 +7859,7 @@ duplicate_allocatable (tree dest, tree src, tree type, int rank,

   if (!GFC_DESCRIPTOR_TYPE_P (TREE_TYPE (dest)))
 {
-  tmp = null_pointer_node;
-  tmp = fold_build2_loc (input_location, MODIFY_EXPR, type, dest, tmp);
-  gfc_add_expr_to_block (&block, tmp);
+  gfc_add_modify (&block, dest, fold_convert (type, null_pointer_node));
   null_data = gfc_finish_block (&block);

   gfc_init_block (&block);
@@ -7869,9 +7871,7 @@ duplicate_allocatable (tree dest, tree src, tree type, int rank,
   if (!no_malloc)
{
  tmp = gfc_call_malloc (&block, type, size);
- tmp = fold_build2_loc (input_location, MODIFY_EXPR, void_type_node,
-dest, fold_convert (type, tmp));
- gfc_add_expr_to_block (&block, tmp);
+ gfc_add_modify (&block, dest, fold_convert (type, tmp));
}

   if (!no_memcpy)
diff --git a/gcc/fortran/trans-decl.c b/gcc/fortran/trans-decl.c
index ba71a21..2e6ef2a 100644
--- a/gcc/fortran/trans-decl.c
+++ b/gcc/fortran/trans-decl.c
@@ -5093,8 +5103,8 @@ generate_coarray_sym_init (gfc_symbol *sym)
 build_int_cst (integer_type_node, reg_type),
 token, gfc_build_addr_expr (pvoid_type_node, desc),
 null_pointer_node, /* stat.  */
-null_pointer_node, /* errgmsg, errmsg_len.  */
-build_int_cst (integer_type_node, 0));
+null_pointer_node, /* errgmsg.  */
+integer_zero_node); /* errmsg_len.  */
   gfc_add_expr_to_block (&caf_init_block, tmp);
   gfc_add_modify (&caf_init_block, decl, fold_convert (TREE_TYPE (decl),
  gfc_conv_descriptor_data_get (desc)));


Re: [patch,avr] Fix PR60300: Minor prologue improvement.

2016-11-22 Thread Denis Chertykov
2016-11-22 15:41 GMT+03:00 Georg-Johann Lay :
> This patch is a minor improvement of prologue length.  It now allows frame
> sizes of up to 11 to be allocated by RCALL + PUSH 0 sequences but limits the
> number of RCALLs to 3.
>
> The PR has some discussion on size vs. speed considerations with regard to
> using RCALL in prologues, and following that I picked the rather arbitrary
> upper bound of 3 RCALLs.  The previous maximal frame size optimized to such
> sequences was 6, which also never produced more than 3 RCALLs.
>
> Ok for trunk?
>
>
> Johann
>
> gcc/
> PR target/60300
> * config/avr/constraints.md (Csp): Widen range to [-11..6].
> * config/avr/avr.c (avr_prologue_setup_frame): Limit number
> of RCALLs in prologue to 3.

Approved.
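
As a rough illustration of the arithmetic in the patch description quoted above, here is a minimal sketch. It assumes a device with a 2-byte program counter, so that each RCALL reserves 2 bytes of stack and each PUSH reserves 1. The helper name split_frame and the greedy split are assumptions made for illustration; this is not the code in avr.c.

```c
#include <assert.h>

/* Hypothetical helper, not taken from avr.c: split a frame of SIZE
   bytes into RCALLs (PC_BYTES each) and PUSHes (1 byte each), using
   at most MAX_RCALLS RCALLs.  */
struct frame_split
{
  int n_rcall;
  int n_push;
};

struct frame_split
split_frame (int size, int pc_bytes, int max_rcalls)
{
  struct frame_split s;

  assert (size >= 0 && pc_bytes > 0 && max_rcalls >= 0);

  /* Take as many RCALLs as fit, capped at the limit.  */
  s.n_rcall = size / pc_bytes;
  if (s.n_rcall > max_rcalls)
    s.n_rcall = max_rcalls;

  /* Whatever is left is covered one byte at a time.  */
  s.n_push = size - s.n_rcall * pc_bytes;
  return s;
}
```

Under these assumptions a frame of 6 bytes needs 3 RCALLs and no PUSH, which matches the remark that the old limit of 6 never produced more than 3 RCALLs, and a frame of 11 bytes needs 3 RCALLs plus 5 PUSHes.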


Re: Ping: Re: [patch, avr] Add flash size to device info and make wrap around default

2016-11-22 Thread Denis Chertykov
Do you have any objections, Georg-Johann?

2016-11-22 8:05 GMT+03:00 Pitchumani Sivanupandi:
> Ping!
>
> On Monday 14 November 2016 07:03 PM, Pitchumani Sivanupandi wrote:
>>
>> Ping!
>>
>> On Thursday 10 November 2016 01:53 PM, Pitchumani Sivanupandi wrote:
>>>
>>> On Wednesday 09 November 2016 08:05 PM, Georg-Johann Lay wrote:

 On 09.11.2016 10:14, Pitchumani Sivanupandi wrote:
>
> On Tuesday 08 November 2016 02:57 PM, Georg-Johann Lay wrote:
>>
>> On 08.11.2016 08:08, Pitchumani Sivanupandi wrote:
>>>
>>> I have updated patch to include the flash size as well. Took that
>>> info from
>>> device headers (it was fed into crt's device information note section
>>> also).


 The new option would render -mn-flash superfluous, but we should
 keep it for
 backward compatibility.
>>>
>>> Ok.

 Shouldn't link_pmem_wrap then be removed from link_relax, i.e. from
 LINK_RELAX_SPEC?  And what happens if relaxation is off?
>>>
>>> Yes. Removed link_pmem_wrap from link_relax.
>>> Disabling relaxation doesn't change -mpmem-wrap-around behavior.
>>> 
>>> flashsize-and-wrap-around.patch
>>
>>
>>> diff --git a/gcc/config/avr/avr-mcus.def
>>> b/gcc/config/avr/avr-mcus.def
>>> index 6bcc6ff..9d4aa1a 100644
>>
>>
>>>  /*
>>
>> 
>
>  /* Classic, > 8K, <= 64K.  */
> -AVR_MCU ("avr3", ARCH_AVR3, AVR_ISA_NONE, NULL,
> 0x0060, 0x0, 1)
> -AVR_MCU ("at43usb355",   ARCH_AVR3, AVR_ISA_NONE,
> "__AVR_AT43USB355__",0x0060, 0x0, 1)
> -AVR_MCU ("at76c711", ARCH_AVR3, AVR_ISA_NONE,
> "__AVR_AT76C711__",  0x0060, 0x0, 1)
> +AVR_MCU ("avr3", ARCH_AVR3, AVR_ISA_NONE, NULL,
> 0x0060, 0x0, 1, 0x6000)
> +AVR_MCU ("at43usb355",   ARCH_AVR3, AVR_ISA_NONE,
> "__AVR_AT43USB355__",0x0060, 0x0, 1, 0x6000)
> +AVR_MCU ("at76c711", ARCH_AVR3, AVR_ISA_NONE,
> "__AVR_AT76C711__",  0x0060, 0x0, 1, 0x4000)
> +AVR_MCU ("at43usb320",   ARCH_AVR3, AVR_ISA_NONE,
> "__AVR_AT43USB320__",0x0060, 0x0, 1, 0x1)
>  /* Classic, == 128K.  */
> -AVR_MCU ("avr31",ARCH_AVR31, AVR_ERRATA_SKIP, NULL,
> 0x0060, 0x0, 2)
> -AVR_MCU ("atmega103",ARCH_AVR31, AVR_ERRATA_SKIP,
> "__AVR_ATmega103__", 0x0060, 0x0, 2)
> -AVR_MCU ("at43usb320",   ARCH_AVR31, AVR_ISA_NONE,
> "__AVR_AT43USB320__",   0x0060, 0x0, 2)
> +AVR_MCU ("avr31",ARCH_AVR31, AVR_ERRATA_SKIP, NULL,
> 0x0060, 0x0, 2, 0x2)
> +AVR_MCU ("atmega103",ARCH_AVR31, AVR_ERRATA_SKIP,
> "__AVR_ATmega103__", 0x0060, 0x0, 2, 0x2)
>  /* Classic + MOVW + JMP/CALL.  */


 If at43usb320 is in the wrong multilib, then this should be handled as
 separate issue / patch together with its own PR. Sorry for the confusion.  
 I
 just noticed that some fields don't match...

 It is not even clear to me from the data sheet if avr3 is the correct
 multilib or perhaps avr35 (if it supports MOVW) or even avr5 (if it also 
 has
 MUL) as there is no reference to the exact instruction set -- Atmochip will
 know.

 Moreover, such a change should be sync'ed with avr-libc as all multilib
 stuff is hand-wired there: no use of --print-foo meta information retrieval
 by avr-libc :-((

 I filed PR78275 and https://savannah.nongnu.org/bugs/index.php?49565 for
 this one.

>>> That's better. I've attached the updated patch. If OK, could someone
>>> commit please?
>>>
>>> I'll try if I could find some more info for AT43USB320.
>>>
>>> Regards,
>>> Pitchumani
>>>
>>
>


formatting cleanups

2016-11-22 Thread Nathan Sidwell

I noticed some wonky formatting.  Fixed as obvious.

nathan
--
Nathan Sidwell
2016-11-22  Nathan Sidwell  

	gcc/
	* gcc-ar.c (main): Fix indentation.
	* gcov-io.c (gcov_write_summary): Remove extraneous {...}
	* ggc-page.c (move_ptes_to_front): Fix formatting.
	* hsa-dump.c (dump_hsa_cfun): Fix indentation.
	* sel-sched-ir.h: Remove trailing blank lines.

	gcc/c-family/
	* array-notation-common.c (cilkplus_extract_an_triplets): Fix
	indentation.

Index: gcc-ar.c
===
--- gcc-ar.c	(revision 242695)
+++ gcc-ar.c	(working copy)
@@ -162,7 +162,7 @@ main (int ac, char **av)
 
 	  len = strlen (arg);
 	  if (len > 0)
-		  len--;
+	len--;
 	  end = arg + len;
 
 	  /* Always add a dir separator for the prefix list.  */
Index: gcov-io.c
===
--- gcov-io.c	(revision 242695)
+++ gcov-io.c	(working copy)
@@ -421,13 +421,11 @@ gcov_write_summary (gcov_unsigned_t tag,
 histo_bitvector[bv_ix] = 0;
   csum = &summary->ctrs[GCOV_COUNTER_ARCS];
   for (h_ix = 0; h_ix < GCOV_HISTOGRAM_SIZE; h_ix++)
-{
-  if (csum->histogram[h_ix].num_counters > 0)
-{
-  histo_bitvector[h_ix / 32] |= 1 << (h_ix % 32);
-  h_cnt++;
-}
-}
+if (csum->histogram[h_ix].num_counters)
+  {
+	histo_bitvector[h_ix / 32] |= 1 << (h_ix % 32);
+	h_cnt++;
+  }
   gcov_write_tag_length (tag, GCOV_TAG_SUMMARY_LENGTH (h_cnt));
   gcov_write_unsigned (summary->checksum);
   for (csum = summary->ctrs, ix = GCOV_COUNTERS_SUMMABLE; ix--; csum++)
Index: hsa-dump.c
===
--- hsa-dump.c	(revision 242695)
+++ hsa-dump.c	(working copy)
@@ -1130,10 +1130,10 @@ dump_hsa_cfun (FILE *f)
 }
 
   FOR_ALL_BB_FN (bb, cfun)
-  {
-hsa_bb *hbb = (struct hsa_bb *) bb->aux;
-dump_hsa_bb (f, hbb);
-  }
+{
+  hsa_bb *hbb = (struct hsa_bb *) bb->aux;
+  dump_hsa_bb (f, hbb);
+}
 }
 
 /* Dump textual representation of HSA IL instruction INSN to stderr.  */
Index: sel-sched-ir.h
===
--- sel-sched-ir.h	(revision 242695)
+++ sel-sched-ir.h	(working copy)
@@ -1669,11 +1669,3 @@ extern void alloc_sched_pools (void);
 extern void free_sched_pools (void);
 
 #endif /* GCC_SEL_SCHED_IR_H */
-
-
-
-
-
-
-
-
Index: c-family/array-notation-common.c
===
--- c-family/array-notation-common.c	(revision 242695)
+++ c-family/array-notation-common.c	(working copy)
@@ -621,21 +621,21 @@ cilkplus_extract_an_triplets (vec

Re: formatting cleanups

2016-11-22 Thread Jakub Jelinek
On Tue, Nov 22, 2016 at 01:45:07PM -0500, Nathan Sidwell wrote:
> - tree ii_tree = array_exprs[ii][jj];
> - (*node)[ii][jj].is_vector = true;
> - (*node)[ii][jj].value = ARRAY_NOTATION_ARRAY (ii_tree);
> - (*node)[ii][jj].start = ARRAY_NOTATION_START (ii_tree);
> - (*node)[ii][jj].length =
> -   fold_build1 (CONVERT_EXPR, integer_type_node,
> -ARRAY_NOTATION_LENGTH (ii_tree));
> - (*node)[ii][jj].stride =
> -   fold_build1 (CONVERT_EXPR, integer_type_node,
> -ARRAY_NOTATION_STRIDE (ii_tree));
> -   }
> +  for (size_t ii = 0; ii < size; ii++)
> +if (TREE_CODE ((*list)[ii]) == ARRAY_NOTATION_REF)
> +  for (size_t jj = 0; jj < rank; jj++)
> + {
> +   tree ii_tree = array_exprs[ii][jj];
> +   (*node)[ii][jj].is_vector = true;
> +   (*node)[ii][jj].value = ARRAY_NOTATION_ARRAY (ii_tree);
> +   (*node)[ii][jj].start = ARRAY_NOTATION_START (ii_tree);
> +   (*node)[ii][jj].length =
> + fold_build1 (CONVERT_EXPR, integer_type_node,
> +  ARRAY_NOTATION_LENGTH (ii_tree));
> +   (*node)[ii][jj].stride =
> + fold_build1 (CONVERT_EXPR, integer_type_node,
> +  ARRAY_NOTATION_STRIDE (ii_tree));

When you are already changing this, the = should be on the next line.
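
For readers unfamiliar with the convention being asked for: the GNU coding style splits a long expression before an operator, so the `=` starts the continuation line. The sketch below shows the layout on a made-up assignment; the struct and function names are hypothetical stand-ins so that the fragment compiles on its own.

```c
#include <assert.h>

/* Hypothetical stand-ins, not GCC code.  */
struct triplet
{
  int length;
};

int
convert_length (int raw_length_with_a_long_name)
{
  return raw_length_with_a_long_name;
}

void
set_length (struct triplet *node, int raw_length_with_a_long_name)
{
  assert (node != 0);

  /* The statement is split before the '=', which begins the
     continuation line instead of ending the first one.  */
  node->length
    = convert_length (raw_length_with_a_long_name);
}
```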

Jakub


[PATCH, Fortran, accaf, v1] Add caf-API-calls to asynchronously handle allocatable components in derived type coarrays.

2016-11-22 Thread Andre Vehreschild
Hi all,

the attached patch extends the API of the caf-libs to enable asynchronous
allocation of allocatable components. Allocatable components in derived type
coarrays differ from regular coarrays or coarrayed components. The latter
have to be allocated on all images or on none. Furthermore, their allocation
is a point of synchronisation.

For allocatable components, F2008 allows some to be allocated on some images
and not on others. Furthermore, registering with the caf-lib that an
allocatable component is present in a derived type coarray is no longer a
synchronisation point. To implement these features, two new types of coarray
registration have been introduced: the first only registers the component
with the caf-lib, and the second does the allocation. In addition, the caf-API
has been extended with a query function to learn about the allocation status
of a component on a remote image.
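
The two-step scheme described above can be pictured with a toy model. This is only a sketch: every name in it (reg_mode, comp_token, comp_register, comp_is_present) is hypothetical and does not come from libcaf.h, and it models a single local token rather than a remote image.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical names throughout; a toy model, not libcaf.h.  */
enum reg_mode
{
  REG_REGISTER_ONLY,  /* Make the component known, allocate nothing.  */
  REG_ALLOCATE_ONLY   /* Allocate a component registered earlier.  */
};

struct comp_token
{
  int registered;
  void *data;
};

/* Return 0 on success, nonzero on misuse or allocation failure.  */
int
comp_register (struct comp_token *tok, enum reg_mode mode, size_t size)
{
  assert (tok != NULL);

  if (mode == REG_REGISTER_ONLY)
    {
      tok->registered = 1;
      tok->data = NULL;
      return 0;
    }

  /* Allocation requires a prior registration and no live data.  */
  if (!tok->registered || tok->data != NULL)
    return 1;

  tok->data = malloc (size ? size : 1);
  return tok->data == NULL;
}

/* Model of the query: is the component currently allocated?  */
int
comp_is_present (const struct comp_token *tok)
{
  return tok->registered && tok->data != NULL;
}
```

In this model a component can be registered without being allocated, so the query returns false until the second step has run. That is the state the description says may differ from image to image.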

Sorry that the patch is rather lengthy. Most of this is due to the signature
change of structure_alloc_comps. The routine and its wrappers are used rather
often, and all of those call sites needed the appropriate changes.

I know I left two or three TODOs in the patch to remind me of things I have to
investigate further. In its current state these TODOs are no reason to hold
back the patch. The third-party library opencoarrays implements the mpi-part of
the caf-model and will change in sync. It would of course be advantageous to
just be able to say: With gcc-7 gfortran implements allocatable components in
derived coarrays nearly completely.

I know we are in stage 3. But the patch bootstraps and regtests ok on
x86_64-linux/F23. So, is it ok for trunk or shall it go to 7.2?

Regards,
Andre
-- 
Andre Vehreschild * Email: vehre ad gmx dot de 
libgfortran/ChangeLog:

2016-11-22  Andre Vehreschild  

* caf/libcaf.h: Add new action types for (de-)registration of
allocatable components in derived type coarrays.  Add _caf_is_present
prototype.
* caf/single.c (_gfortran_caf_register): Add support for registration
only and allocation of already registered allocatable components in
derived type coarrays.
(_gfortran_caf_deregister): Add mode to deallocate but not deregister
an allocatable component in a derived type coarray.
(_gfortran_caf_is_present): New function.  Query whether an
allocatable component in a derived type coarray on a remote image is
allocated.


gcc/testsuite/ChangeLog:

2016-11-22  Andre Vehreschild  

* gfortran.dg/coarray/alloc_comp_1.f90: Fix tree-dump scans to adhere
to the changed interfaces.
* gfortran.dg/coarray_alloc_comp_1.f08: Likewise.
* gfortran.dg/coarray_allocate_7.f08: Likewise.
* gfortran.dg/coarray_lib_alloc_1.f90: Likewise.
* gfortran.dg/coarray_lib_alloc_2.f90: Likewise.
* gfortran.dg/coarray_lib_alloc_3.f90: Likewise.
* gfortran.dg/coarray_lib_comm_1.f90: Likewise.
* gfortran.dg/coarray_lib_alloc_4.f90: New test.

gcc/fortran/ChangeLog:

2016-11-22  Andre Vehreschild  

* check.c (gfc_check_allocated): Bypass the caf_get call and check on
the array.
* gfortran.h: Add optional flag to gfc_caf_attr.
* gfortran.texi: Document new enum values and _caf_is_present function.
* primary.c (caf_variable_attr): Add optional flag to indicate that the
expression is reffing a component.
(gfc_caf_attr): Likewise.
* trans-array.c (gfc_array_deallocate): Handle deallocation mode for
coarray deregistration.
(gfc_trans_dealloc_allocated): Likewise.
(duplicate_allocatable_coarray): This function is similar to
duplicate_allocatable but tailored to handle coarrays.
(structure_alloc_comps): A mode for handling coarrays, that is no
longer encoded in the purpose.  This makes the use cases of the
routine more flexible without repeating.  Allocatable components in
derived type coarrays are now registered only when nullifying an
object and allocated before copying data into them.
(gfc_nullify_alloc_comp): Use the caf_mode of structure_alloc_comps
now.
(gfc_deallocate_alloc_comp): Likewise.
(gfc_deallocate_alloc_comp_no_caf): Likewise.
(gfc_reassign_alloc_comp_caf): Likewise.
(gfc_copy_alloc_comp): Likewise.
(gfc_copy_only_alloc_comp): Likewise.
(gfc_alloc_allocatable_for_assignment): Make use of the cheaper way of
reallocating a coarray without deregistering and reregistering it.
(gfc_trans_deferred_array): Initialize the coarray token correctly for
deferred variables and tear them down on exit.
* trans-array.h: Change some prototypes to add the coarray (de-)
registration modes.
* trans-decl.c (gfc_build_builtin_function_decls): Generate the
declarations for the changed/new caf-lib routine

Re: [PATCH] OpenACC routines -- middle end

2016-11-22 Thread Cesar Philippidis
On 11/18/2016 04:14 AM, Jakub Jelinek wrote:
> On Fri, Nov 11, 2016 at 03:43:02PM -0800, Cesar Philippidis wrote:
>> +error_at (OMP_CLAUSE_LOCATION (c),
>> +  "%qs specifies a conflicting level of parallelism",
>> +  omp_clause_code_name[OMP_CLAUSE_CODE (c)]);
>> +inform (OMP_CLAUSE_LOCATION (c_level),
>> +"... to the previous %qs clause here",
> 
> I think the '... ' part is unnecessary.
> Perhaps word it better like we word errors/warnings for mismatched
> attributes etc.?
> 
>> +incompatible:
>> +  if (c_diag != NULL_TREE)
>> +error_at (OMP_CLAUSE_LOCATION (c_diag),
>> +  "incompatible %qs clause when applying"
>> +  " %<%s%> to %qD, which has already been"
>> +  " marked as an accelerator routine",
>> +  omp_clause_code_name[OMP_CLAUSE_CODE (c_diag)],
>> +  routine_str, fndecl);
>> +  else if (c_diag_p != NULL_TREE)
>> +error_at (loc,
>> +  "missing %qs clause when applying"
>> +  " %<%s%> to %qD, which has already been"
>> +  " marked as an accelerator routine",
>> +  omp_clause_code_name[OMP_CLAUSE_CODE (c_diag_p)],
>> +  routine_str, fndecl);
>> +  else
>> +gcc_unreachable ();
>> +  if (c_diag_p != NULL_TREE)
>> +inform (OMP_CLAUSE_LOCATION (c_diag_p),
>> +"... with %qs clause here",
>> +omp_clause_code_name[OMP_CLAUSE_CODE (c_diag_p)]);
> 
> Again, I think this usually would be something like "previous %qs clause"
> or similar in the inform.  Generally, I think the error message should
> be self-contained and inform should be just extra information, rather than
> error containing first half of the diagnostic message and inform the second
> one.  E.g. for translations, while such a sentence crossing the two
> diagnostic routines might make sense in english, it might look terrible in
> other languages.
> 
>> +  else
>> +{
>> +  /* In the front ends, we don't preserve location information for the
>> + OpenACC routine directive itself.  However, that of c_level_p
>> + should be close.  */
>> +  location_t loc_routine = OMP_CLAUSE_LOCATION (c_level_p);
>> +  inform (loc_routine, "... without %qs clause near to here",
>> +  omp_clause_code_name[OMP_CLAUSE_CODE (c_diag)]);
>> +}
>> +  /* Incompatible.  */
>> +  return -1;
>> +}
>> +
>> +  return 0;

I've incorporated those changes in this patch. Is it ok for trunk?
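
The review point quoted above, that the error should read as a complete message and the note should only add a location, can be sketched as follows. The two buffers and report_conflict are hypothetical stand-ins for GCC's error_at and inform. They only format text so the wording can be inspected, and the plain quotes stand in for the %qs directive.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-ins for the output of error_at and inform.  */
char last_error[128];
char last_note[128];

void
report_conflict (const char *clause, const char *previous)
{
  assert (clause != NULL && previous != NULL);

  /* The error is a full sentence that makes sense on its own.  */
  snprintf (last_error, sizeof last_error,
            "'%s' specifies a conflicting level of parallelism", clause);

  /* The note is separate, optional information and does not finish
     a sentence begun by the error.  */
  snprintf (last_note, sizeof last_note,
            "previous '%s' clause here", previous);
}
```

With this split, a translator can handle each string independently, which is the concern raised in the review.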

Cesar

2016-11-22  Cesar Philippidis  
	Thomas Schwinge  

	gcc/c-family/
	* c-attribs.c (c_common_attribute_table): Adjust "omp declare target".
	* c-pragma.h (enum pragma_omp_clause): Add PRAGMA_OACC_CLAUSE_BIND
	and PRAGMA_OACC_CLAUSE_NOHOST.

	gcc/
	* gimplify.c (gimplify_scan_omp_clauses): Handle OMP_CLAUSE_BIND and
	OMP_CLAUSE_NOHOST.
	(gimplify_adjust_omp_clauses): Likewise.
	* omp-low.c (scan_sharing_clauses): Likewise.
	(verify_oacc_routine_clauses): New function.
	(maybe_discard_oacc_function): New function.
	(execute_oacc_device_lower): Don't generate code for NOHOST.
	* omp-low.h (verify_oacc_routine_clauses): Declare.
	* tree-core.h (enum omp_clause_code): Add OMP_CLAUSE_BIND and
	OMP_CLAUSE_NOHOST.
	* tree-pretty-print.c (dump_omp_clause): Likewise.
	* tree.c (omp_clause_num_ops): Likewise.
	(omp_clause_code_name): Likewise.
	(walk_tree_1): Handle OMP_CLAUSE_BIND, OMP_CLAUSE_NOHOST.
	* tree.h (OMP_CLAUSE_BIND_NAME): Define.

diff --git a/gcc/c-family/c-attribs.c b/gcc/c-family/c-attribs.c
index 964efe9..4b8 100644
--- a/gcc/c-family/c-attribs.c
+++ b/gcc/c-family/c-attribs.c
@@ -322,7 +322,7 @@ const struct attribute_spec c_common_attribute_table[] =
 			  handle_omp_declare_simd_attribute, false },
   { "simd",		  0, 1, true,  false, false,
 			  handle_simd_attribute, false },
-  { "omp declare target", 0, 0, true, false, false,
+  { "omp declare target", 0, -1, true, false, false,
 			  handle_omp_declare_target_attribute, false },
   { "omp declare target link", 0, 0, true, false, false,
 			  handle_omp_declare_target_attribute, false },
diff --git a/gcc/c-family/c-pragma.h b/gcc/c-family/c-pragma.h
index 6d9cb08..dd2722a 100644
--- a/gcc/c-family/c-pragma.h
+++ b/gcc/c-family/c-pragma.h
@@ -149,6 +149,7 @@ enum pragma_omp_clause {
   /* Clauses for OpenACC.  */
   PRAGMA_OACC_CLAUSE_ASYNC = PRAGMA_CILK_CLAUSE_VECTORLENGTH + 1,
   PRAGMA_OACC_CLAUSE_AUTO,
+  PRAGMA_OACC_CLAUSE_BIND,
   PRAGMA_OACC_CLAUSE_COPY,
   PRAGMA_OACC_CLAUSE_COPYOUT,
   PRAGMA_OACC_CLAUSE_CREATE,
@@ -158,6 +159,7 @@ enum pragma_omp_clause {
   PRAGMA_OACC_CLAUSE_GANG,
   PRAGMA_OACC_CLAUSE_HOST,
   PRAGMA_OACC_CLAUSE_INDEPENDENT,
+  PRAGMA_OACC_CLAUSE_NOHOST,
   PRAGMA_OACC_CLAUSE_NUM_GANGS,
   PRAGMA_OACC_CLAUSE_NUM_WORKERS,
   PRAGMA_OACC_CLAUSE_PRESENT,
diff --git a/gcc/gimplify.c b/gcc/gimplify.c
index 8611060..04b591e 100644
--- a/gcc/gimplify.c
+++ b/gcc/gimp

Re: [PATCH] OpenACC routines -- c front end

2016-11-22 Thread Cesar Philippidis
On 11/18/2016 04:21 AM, Jakub Jelinek wrote:
> On Fri, Nov 11, 2016 at 03:43:23PM -0800, Cesar Philippidis wrote:
>> @@ -11801,12 +11807,11 @@ c_parser_oacc_shape_clause (c_parser *parser, 
>> omp_clause_code kind,
>>  }
>>  
>>location_t expr_loc = c_parser_peek_token (parser)->location;
>> -  c_expr cexpr = c_parser_expr_no_commas (parser, NULL);
>> -  cexpr = convert_lvalue_to_rvalue (expr_loc, cexpr, false, true);
>> -  tree expr = cexpr.value;
>> +  tree expr = c_parser_expr_no_commas (parser, NULL).value;
>>if (expr == error_mark_node)
>>  goto cleanup_error;
>>  
>> +  mark_exp_read (expr);
>>expr = c_fully_fold (expr, false, NULL);
>>  
>>/* Attempt to statically determine when the number isn't a
> 
> Why?  Are the arguments of the clauses lvalues?

The spec is unclear if those args must be constants or not. The only
time it explicitly mentions constant int-expr is for the tile clause,
which was added late. Gang, worker and vector were added early in the
1.0 spec, where things were defined somewhat loosely.

>> @@ -11867,12 +11872,12 @@ c_parser_oacc_shape_clause (c_parser *parser, 
>> omp_clause_code kind,
>> seq */
>>  
>>  static tree
>> -c_parser_oacc_simple_clause (c_parser *parser, enum omp_clause_code code,
>> - tree list)
>> +c_parser_oacc_simple_clause (c_parser * /* parser */, location_t loc,
> 
> Just leave it as c_parser *, or better yet remove the argument if you don't
> need it.

I removed that argument.

>> +  else
>> +{
>> +  //TODO? TREE_USED (decl) = 1;
> 
> This would be /* FIXME: TREE_USED (decl) = 1;  */
> but wouldn't it be better to figure out if you want to do that or not?

Thomas has more state on that, but it seems unneeded. The c++ FE doesn't
do that either, so I removed that comment.

Is this patch ok for trunk?

Cesar

2016-11-22  Cesar Philippidis  
	Thomas Schwinge  

	gcc/c/
	* c-parser.c (c_parser_omp_clause_name): Handle OpenACC bind and
	nohost.
	(c_parser_oacc_shape_clause): New location_t loc argument.  Use it
	to report more accurate diagnostics.
	(c_parser_oacc_simple_clause): Likewise.
	(c_parser_oacc_clause_bind): New function.
	(c_parser_oacc_all_clauses): Handle OpenACC bind and nohost clauses.
	Update calls to c_parser_oacc_{simple,shape}_clause.
	(OACC_ROUTINE_CLAUSE_MASK): Add PRAGMA_OACC_CLAUSE_{BIND,NOHOST}.
	(c_parser_oacc_routine): Update diagnostics.
	(c_finish_oacc_routine): Likewise.
	* c-typeck.c (c_finish_omp_clauses): Handle OMP_CLAUSE_{BIND,NOHOST}.


diff --git a/gcc/c/c-parser.c b/gcc/c/c-parser.c
index 00fe731..fd87b54 100644
--- a/gcc/c/c-parser.c
+++ b/gcc/c/c-parser.c
@@ -10408,6 +10408,10 @@ c_parser_omp_clause_name (c_parser *parser)
 	  else if (!strcmp ("async", p))
 	result = PRAGMA_OACC_CLAUSE_ASYNC;
 	  break;
+	case 'b':
+	  if (!strcmp ("bind", p))
+	result = PRAGMA_OACC_CLAUSE_BIND;
+	  break;
 	case 'c':
 	  if (!strcmp ("collapse", p))
 	result = PRAGMA_OMP_CLAUSE_COLLAPSE;
@@ -10489,6 +10493,8 @@ c_parser_omp_clause_name (c_parser *parser)
 	result = PRAGMA_OMP_CLAUSE_NOTINBRANCH;
 	  else if (!strcmp ("nowait", p))
 	result = PRAGMA_OMP_CLAUSE_NOWAIT;
+	  else if (!strcmp ("nohost", p))
+	result = PRAGMA_OACC_CLAUSE_NOHOST;
 	  else if (!strcmp ("num_gangs", p))
 	result = PRAGMA_OACC_CLAUSE_NUM_GANGS;
 	  else if (!strcmp ("num_tasks", p))
@@ -11676,12 +11682,12 @@ c_parser_omp_clause_num_workers (c_parser *parser, tree list)
 */
 
 static tree
-c_parser_oacc_shape_clause (c_parser *parser, omp_clause_code kind,
+c_parser_oacc_shape_clause (c_parser *parser, location_t loc,
+			omp_clause_code kind,
 			const char *str, tree list)
 {
   const char *id = "num";
   tree ops[2] = { NULL_TREE, NULL_TREE }, c;
-  location_t loc = c_parser_peek_token (parser)->location;
 
   if (kind == OMP_CLAUSE_VECTOR)
 id = "length";
@@ -11746,12 +11752,11 @@ c_parser_oacc_shape_clause (c_parser *parser, omp_clause_code kind,
 	}
 
 	  location_t expr_loc = c_parser_peek_token (parser)->location;
-	  c_expr cexpr = c_parser_expr_no_commas (parser, NULL);
-	  cexpr = convert_lvalue_to_rvalue (expr_loc, cexpr, false, true);
-	  tree expr = cexpr.value;
+	  tree expr = c_parser_expr_no_commas (parser, NULL).value;
 	  if (expr == error_mark_node)
 	goto cleanup_error;
 
+	  mark_exp_read (expr);
 	  expr = c_fully_fold (expr, false, NULL);
 
 	  /* Attempt to statically determine when the number isn't a
@@ -11812,12 +11817,12 @@ c_parser_oacc_shape_clause (c_parser *parser, omp_clause_code kind,
seq */
 
 static tree
-c_parser_oacc_simple_clause (c_parser *parser, enum omp_clause_code code,
+c_parser_oacc_simple_clause (location_t loc, enum omp_clause_code code,
 			 tree list)
 {
   check_no_duplicate_clause (list, code, omp_clause_code_name[code]);
 
-  tree c = build_omp_clause (c_parser_peek_token (parser)->location, code);
+  tree c = build_omp_clause (loc, code);
   OMP_CL

Re: [PATCH] OpenACC routines -- middle end

2016-11-22 Thread Jakub Jelinek
On Tue, Nov 22, 2016 at 11:53:50AM -0800, Cesar Philippidis wrote:
> I've incorporated those changes in this patch. Is it ok for trunk?

The ChangeLog mentions omp-low.[ch] changes, but the patch doesn't include
them.
Have they been dropped, or moved to another patch?

> 2016-11-22  Cesar Philippidis  
>   Thomas Schwinge  
> 
>   gcc/c-family/
>   * c-attribs.c (c_common_attribute_table): Adjust "omp declare target".
>   * c-pragma.h (enum pragma_omp_clause): Add PRAGMA_OACC_CLAUSE_BIND
>   and PRAGMA_OACC_CLAUSE_NOHOST.
> 
>   gcc/
>   * gimplify.c (gimplify_scan_omp_clauses): Handle OMP_CLAUSE_BIND and
>   OMP_CLAUSE_NOHOST.
>   (gimplify_adjust_omp_clauses): Likewise.
>   * omp-low.c (scan_sharing_clauses): Likewise.
>   (verify_oacc_routine_clauses): New function.
>   (maybe_discard_oacc_function): New function.
>   (execute_oacc_device_lower): Don't generate code for NOHOST.
>   * omp-low.h (verify_oacc_routine_clauses): Declare.
>   * tree-core.h (enum omp_clause_code): Add OMP_CLAUSE_BIND and
>   OMP_CLAUSE_NOHOST.
>   * tree-pretty-print.c (dump_omp_clause): Likewise.
>   * tree.c (omp_clause_num_ops): Likewise.
>   (omp_clause_code_name): Likewise.
>   (walk_tree_1): Handle OMP_CLAUSE_BIND, OMP_CLAUSE_NOHOST.
>   * tree.h (OMP_CLAUSE_BIND_NAME): Define.

Jakub


Re: [PATCH] OpenACC routines -- fortran front end

2016-11-22 Thread Cesar Philippidis
On 11/18/2016 04:29 AM, Jakub Jelinek wrote:
> On Fri, Nov 11, 2016 at 03:44:07PM -0800, Cesar Philippidis wrote:
>> --- a/gcc/fortran/gfortran.h
>> +++ b/gcc/fortran/gfortran.h
>> @@ -314,6 +314,15 @@ enum save_state
>>  { SAVE_NONE = 0, SAVE_EXPLICIT, SAVE_IMPLICIT
>>  };
>>  
>> +/* Flags to keep track of ACC routine states.  */
>> +enum oacc_function
>> +{ OACC_FUNCTION_NONE = 0,
> 
> Please add a newline after {.
> 
>>if (clauses)
>>  {
>>unsigned mask = 0;
>>  
>>if (clauses->gang)
>> -level = GOMP_DIM_GANG, mask |= GOMP_DIM_MASK (level);
>> +{
>> +  level = GOMP_DIM_GANG, mask |= GOMP_DIM_MASK (level);
>> +  ret = OACC_FUNCTION_GANG;
>> +}
>>if (clauses->worker)
>> -level = GOMP_DIM_WORKER, mask |= GOMP_DIM_MASK (level);
>> +{
>> +  level = GOMP_DIM_WORKER, mask |= GOMP_DIM_MASK (level);
>> +  ret = OACC_FUNCTION_WORKER;
>> +}
>>if (clauses->vector)
>> -level = GOMP_DIM_VECTOR, mask |= GOMP_DIM_MASK (level);
>> +{
>> +  level = GOMP_DIM_VECTOR, mask |= GOMP_DIM_MASK (level);
>> +  ret = OACC_FUNCTION_VECTOR;
>> +}
> 
> As you have {}s around, please use
>   level = GOMP_DIM_*;
>   mask |= GOMP_DIM_MASK (level);
>   ret = OACC_FUNCTION_*;
> 
>>if (clauses->seq)
>>  level = GOMP_DIM_MAX, mask |= GOMP_DIM_MASK (level);
>>  
>>if (mask != (mask & -mask))
>> -gfc_error ("Multiple loop axes specified for routine");
>> +ret = OACC_FUNCTION_NONE;
>>  }
>>  
>> -  if (level < 0)
>> -level = GOMP_DIM_MAX;
>> -
>> -  return level;
>> +  return ret;
>>  }
>>  
>>  match
>>  gfc_match_oacc_routine (void)
>>  {
>>locus old_loc;
>> -  gfc_symbol *sym = NULL;
>>match m;
>> +  gfc_intrinsic_sym *isym = NULL;
>> +  gfc_symbol *sym = NULL;
>>gfc_omp_clauses *c = NULL;
>>gfc_oacc_routine_name *n = NULL;
>> +  oacc_function dims = OACC_FUNCTION_NONE;
>> +  bool seen_error = false;
>>  
>>old_loc = gfc_current_locus;
>>  
>> @@ -2287,45 +2314,52 @@ gfc_match_oacc_routine (void)
>>if (m == MATCH_YES)
>>  {
>>char buffer[GFC_MAX_SYMBOL_LEN + 1];
>> -  gfc_symtree *st;
>> +  gfc_symtree *st = NULL;
>>  
>>m = gfc_match_name (buffer);
>>if (m == MATCH_YES)
>>  {
>> -  st = gfc_find_symtree (gfc_current_ns->sym_root, buffer);
>> +  if ((isym = gfc_find_function (buffer)) == NULL
>> +  && (isym = gfc_find_subroutine (buffer)) == NULL)
>> +{
>> +  st = gfc_find_symtree (gfc_current_ns->sym_root, buffer);
>> +  if (st == NULL && gfc_current_ns->proc_name->attr.contained
> 
> Please add a newline before &&.
> 
>> +  && gfc_current_ns->parent)
>> +st = gfc_find_symtree (gfc_current_ns->parent->sym_root,
>> +   buffer);
>> +}
> 
>> @@ -5934,6 +6033,21 @@ gfc_resolve_oacc_blocks (gfc_code *code, 
>> gfc_namespace *ns)
>>ctx.private_iterators = new hash_set;
>>ctx.previous = omp_current_ctx;
>>ctx.is_openmp = false;
>> +
>> +  if (code->ext.omp_clauses->gang)
>> +dims = OACC_FUNCTION_GANG;
>> +  if (code->ext.omp_clauses->worker)
>> +dims = OACC_FUNCTION_WORKER;
>> +  if (code->ext.omp_clauses->vector)
>> +dims = OACC_FUNCTION_VECTOR;
>> +  if (code->ext.omp_clauses->seq)
>> +dims = OACC_FUNCTION_SEQ;
> 
> Shouldn't these be else if ?
>> +
>> +  if (dims == OACC_FUNCTION_NONE && ctx.previous != NULL
> 
> Again, as the whole condition doesn't fit on one line, please
> put && on a new line.
>> +  && !ctx.previous->is_openmp)
>> +dims = ctx.previous->dims;

I've addressed those issues in this patch. Is it ok for trunk?
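
On the "else if" question quoted above, the two forms only differ when more than one flag is set. The sketch below uses a hypothetical miniature of the clause flags (the enum and struct names are made up) to show that difference. Whether several flags can actually be set at that point in gfc_resolve_oacc_blocks is not settled in this thread.

```c
#include <assert.h>

/* Hypothetical miniature of the clause flags and the result enum.  */
enum level
{
  LEVEL_NONE,
  LEVEL_GANG,
  LEVEL_WORKER,
  LEVEL_VECTOR,
  LEVEL_SEQ
};

struct clauses
{
  int gang, worker, vector, seq;
};

/* Independent ifs: the last flag that is set wins.  */
enum level
pick_last (const struct clauses *c)
{
  enum level dims = LEVEL_NONE;

  assert (c != 0);
  if (c->gang)
    dims = LEVEL_GANG;
  if (c->worker)
    dims = LEVEL_WORKER;
  if (c->vector)
    dims = LEVEL_VECTOR;
  if (c->seq)
    dims = LEVEL_SEQ;
  return dims;
}

/* else-if chain: the first flag that is set wins.  */
enum level
pick_first (const struct clauses *c)
{
  assert (c != 0);
  if (c->gang)
    return LEVEL_GANG;
  else if (c->worker)
    return LEVEL_WORKER;
  else if (c->vector)
    return LEVEL_VECTOR;
  else if (c->seq)
    return LEVEL_SEQ;
  return LEVEL_NONE;
}
```

With both gang and vector set, pick_last yields the vector level and pick_first yields the gang level. With a single flag set the two agree.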

Cesar

2016-11-22  Cesar Philippidis  

	gcc/fortran/
	* gfortran.h (enum oacc_function): Make OACC_FUNCTION_SEQ the last
	entry in the enum.
	(oacc_function_types): Declare.
	(symbol_attribute): Add oacc_function, oacc_function_nohost members.
	(gfc_omp_clauses): Add routine_bind, nohost, bind members.
	(gfc_oacc_routine_name): Add loc.
	(gfc_resolve_oacc_routine_call): Declare.
	(gfc_resolve_oacc_routines): Declare.
	* module.c (oacc_function): New DECL_MIO_NAME.
	(mio_symbol_attribute): Set the oacc_function attribute.
	* openmp.c (enum omp_mask2): Add OMP_CLAUSE_BIND and OMP_CLAUSE_NOHOST.
	(gfc_match_omp_clauses): Likewise.
	(OACC_ROUTINE_CLAUSES): Add OMP_CLAUSE_BIND and OMP_CLAUSE_NOHOST.
	(gfc_oacc_routine_dims): Change the type of oacc_function from unsigned
	to an ENUM_BITFIELD.  Move gfc_error to gfc_match_oacc_routine.  Return
	OACC_FUNCTION_NONE on error.
	(gfc_match_oacc_routine): Make error reporting more
	precise.  Defer rejection of non-function and subroutine symbols
	until gfc_resolve_oacc_routines.
	(struct fortran_omp_context): Add a dims member.
	(gfc_resolve_oacc_blocks): Update ctx->dims.
	(gfc_resolve_oacc_routine_call): New function.
	(gfc_resolve_oacc_routines): New function.
	* resolve.c (resolve_function): Call gfc_resolve_oacc_routine_call.
	(resolve_c

Re: [PATCH] OpenACC routines -- middle end

2016-11-22 Thread Cesar Philippidis
On 11/22/2016 11:58 AM, Jakub Jelinek wrote:
> On Tue, Nov 22, 2016 at 11:53:50AM -0800, Cesar Philippidis wrote:
>> I've incorporated those changes in this patch. Is it ok for trunk?
> 
> The ChangeLog mentions omp-low.[ch] changes, but the patch doesn't include
> them.
> Have they been dropped, or moved to another patch?

No, sorry I forgot to include them in the diff. This patch should
contain all of the middle end changes.

Cesar

>> 2016-11-22  Cesar Philippidis  
>>  Thomas Schwinge  
>>
>>  gcc/c-family/
>>  * c-attribs.c (c_common_attribute_table): Adjust "omp declare target".
>>  * c-pragma.h (enum pragma_omp_clause): Add PRAGMA_OACC_CLAUSE_BIND
>>  and PRAGMA_OACC_CLAUSE_NOHOST.
>>
>>  gcc/
>>  * gimplify.c (gimplify_scan_omp_clauses): Handle OMP_CLAUSE_BIND and
>>  OMP_CLAUSE_NOHOST.
>>  (gimplify_adjust_omp_clauses): Likewise.
>>  * omp-low.c (scan_sharing_clauses): Likewise.
>>  (verify_oacc_routine_clauses): New function.
>>  (maybe_discard_oacc_function): New function.
>>  (execute_oacc_device_lower): Don't generate code for NOHOST.
>>  * omp-low.h (verify_oacc_routine_clauses): Declare.
>>  * tree-core.h (enum omp_clause_code): Add OMP_CLAUSE_BIND and
>>  OMP_CLAUSE_NOHOST.
>>  * tree-pretty-print.c (dump_omp_clause): Likewise.
>>  * tree.c (omp_clause_num_ops): Likewise.
>>  (omp_clause_code_name): Likewise.
>>  (walk_tree_1): Handle OMP_CLAUSE_BIND, OMP_CLAUSE_NOHOST.
>>  * tree.h (OMP_CLAUSE_BIND_NAME): Define.
> 
>   Jakub
> 

2016-11-22  Cesar Philippidis  
	Thomas Schwinge  

	gcc/c-family/
	* c-attribs.c (c_common_attribute_table): Adjust "omp declare target".
	* c-pragma.h (enum pragma_omp_clause): Add PRAGMA_OACC_CLAUSE_BIND
	and PRAGMA_OACC_CLAUSE_NOHOST.

	gcc/
	* gimplify.c (gimplify_scan_omp_clauses): Handle OMP_CLAUSE_BIND and
	OMP_CLAUSE_NOHOST.
	(gimplify_adjust_omp_clauses): Likewise.
	* omp-low.c (scan_sharing_clauses): Likewise.
	(verify_oacc_routine_clauses): New function.
	(maybe_discard_oacc_function): New function.
	(execute_oacc_device_lower): Don't generate code for NOHOST.
	* omp-low.h (verify_oacc_routine_clauses): Declare.
	* tree-core.h (enum omp_clause_code): Add OMP_CLAUSE_BIND and
	OMP_CLAUSE_NOHOST.
	* tree-pretty-print.c (dump_omp_clause): Likewise.
	* tree.c (omp_clause_num_ops): Likewise.
	(omp_clause_code_name): Likewise.
	(walk_tree_1): Handle OMP_CLAUSE_BIND, OMP_CLAUSE_NOHOST.
	* tree.h (OMP_CLAUSE_BIND_NAME): Define.


diff --git a/gcc/c-family/c-attribs.c b/gcc/c-family/c-attribs.c
index 964efe9..4b8 100644
--- a/gcc/c-family/c-attribs.c
+++ b/gcc/c-family/c-attribs.c
@@ -322,7 +322,7 @@ const struct attribute_spec c_common_attribute_table[] =
 			  handle_omp_declare_simd_attribute, false },
   { "simd",		  0, 1, true,  false, false,
 			  handle_simd_attribute, false },
-  { "omp declare target", 0, 0, true, false, false,
+  { "omp declare target", 0, -1, true, false, false,
 			  handle_omp_declare_target_attribute, false },
   { "omp declare target link", 0, 0, true, false, false,
 			  handle_omp_declare_target_attribute, false },
diff --git a/gcc/c-family/c-pragma.h b/gcc/c-family/c-pragma.h
index 6d9cb08..dd2722a 100644
--- a/gcc/c-family/c-pragma.h
+++ b/gcc/c-family/c-pragma.h
@@ -149,6 +149,7 @@ enum pragma_omp_clause {
   /* Clauses for OpenACC.  */
   PRAGMA_OACC_CLAUSE_ASYNC = PRAGMA_CILK_CLAUSE_VECTORLENGTH + 1,
   PRAGMA_OACC_CLAUSE_AUTO,
+  PRAGMA_OACC_CLAUSE_BIND,
   PRAGMA_OACC_CLAUSE_COPY,
   PRAGMA_OACC_CLAUSE_COPYOUT,
   PRAGMA_OACC_CLAUSE_CREATE,
@@ -158,6 +159,7 @@ enum pragma_omp_clause {
   PRAGMA_OACC_CLAUSE_GANG,
   PRAGMA_OACC_CLAUSE_HOST,
   PRAGMA_OACC_CLAUSE_INDEPENDENT,
+  PRAGMA_OACC_CLAUSE_NOHOST,
   PRAGMA_OACC_CLAUSE_NUM_GANGS,
   PRAGMA_OACC_CLAUSE_NUM_WORKERS,
   PRAGMA_OACC_CLAUSE_PRESENT,
diff --git a/gcc/gimplify.c b/gcc/gimplify.c
index 8611060..04b591e 100644
--- a/gcc/gimplify.c
+++ b/gcc/gimplify.c
@@ -8373,6 +8373,8 @@ gimplify_scan_omp_clauses (tree *list_p, gimple_seq *pre_p,
 	  ctx->default_kind = OMP_CLAUSE_DEFAULT_KIND (c);
 	  break;
 
+	case OMP_CLAUSE_BIND:
+	case OMP_CLAUSE_NOHOST:
 	default:
 	  gcc_unreachable ();
 	}
@@ -9112,6 +9114,8 @@ gimplify_adjust_omp_clauses (gimple_seq *pre_p, gimple_seq body, tree *list_p,
 	  remove = true;
 	  break;
 
+	case OMP_CLAUSE_BIND:
+	case OMP_CLAUSE_NOHOST:
 	default:
 	  gcc_unreachable ();
 	}
diff --git a/gcc/omp-low.c b/gcc/omp-low.c
index 7c58c03..b8a414b 100644
--- a/gcc/omp-low.c
+++ b/gcc/omp-low.c
@@ -2201,6 +2201,8 @@ scan_sharing_clauses (tree clauses, omp_context *ctx,
 	install_var_local (decl, ctx);
 	  break;
 
+	case OMP_CLAUSE_BIND:
+	case OMP_CLAUSE_NOHOST:
 	case OMP_CLAUSE_TILE:
 	case OMP_CLAUSE__CACHE_:
 	default:
@@ -2365,6 +2367,8 @@ scan_sharing_clauses (tree clauses, omp_context *ctx,
 	case OMP_CLAUSE__GRIDDIM_:
 	  break;
 
+	case OMP_CLAUSE_BIND:
+	case OMP_CLAUSE_NOHOST:
 	case OMP_CLAUSE_TILE

Re: formatting cleanups

2016-11-22 Thread Nathan Sidwell

On 11/22/2016 01:48 PM, Jakub Jelinek wrote:


When you are already changing this, the = should be on the next line.


done


--
Nathan Sidwell
2016-11-22  Nathan Sidwell  

	* array-notation-common.c (cilkplus_extract_an_triplets): Fix
	indentation and formatting.

Index: c-family/array-notation-common.c
===
--- c-family/array-notation-common.c	(revision 242719)
+++ c-family/array-notation-common.c	(working copy)
@@ -629,12 +629,12 @@ cilkplus_extract_an_triplets (vec

Re: [PATCH] OpenACC routines -- c++ front end

2016-11-22 Thread Cesar Philippidis
On 11/11/2016 03:43 PM, Cesar Philippidis wrote:
> Like its C FE counterpart, this contains the following changes:
> 
>  * Updates cp_parser_oacc_shape_clause to accept a location_t
>    argument in order to make the diagnostics more precise.
> 
>  * Adds support for the bind and nohost clauses.
> 
>  * Adds more diagnostics for invalid acc routines.
> 
> Is this patch OK for trunk?

Here is the updated version of the C++ OpenACC routine patch. It's
mostly the same as before, but now cp_parser_oacc_shape_clause no longer
has a dummy cp_parser argument, like its C FE counterpart.

Is this patch ok for trunk?

Cesar

2016-11-22  Cesar Philippidis  
	Thomas Schwinge  

	gcc/cp/
	* cp-tree.h (bind_decls_match): Declare.
	* decl.c (bind_decls_match): New function.
	* parser.c (cp_parser_omp_clause_name): 
	(cp_parser_oacc_simple_clause): Remove unused cp_parser argument.
	(cp_parser_oacc_shape_clause): New location_t loc argument.  Use it
	to report more accurate diagnostics.  Remove parser argument.
	(cp_parser_oacc_clause_bind): New function.
	(cp_parser_oacc_all_clauses): Handle OpenACC bind and nohost clauses.
	Update calls to cp_parser_oacc_{simple,shape}_clause.
	(OACC_ROUTINE_CLAUSE_MASK): Add PRAGMA_OACC_CLAUSE_{BIND,NOHOST}.
	(cp_parser_oacc_routine): Update diagnostics.
	(cp_parser_late_parsing_oacc_routine): Likewise.
	(cp_finalize_oacc_routine): Likewise.
	* semantics.c (finish_omp_clauses): Handle OMP_CLAUSE_{BIND,NOHOST}.


diff --git a/gcc/cp/cp-tree.h b/gcc/cp/cp-tree.h
index 5674886..c9dbc4f 100644
--- a/gcc/cp/cp-tree.h
+++ b/gcc/cp/cp-tree.h
@@ -5785,6 +5785,7 @@ extern void finish_scope			(void);
 extern void push_switch(tree);
 extern void pop_switch(void);
 extern tree make_lambda_name			(void);
+extern int bind_decls_match			(tree, tree);
 extern int decls_match(tree, tree);
 extern tree duplicate_decls			(tree, tree, bool);
 extern tree declare_local_label			(tree);
diff --git a/gcc/cp/decl.c b/gcc/cp/decl.c
index 6893eae..09f9ffc 100644
--- a/gcc/cp/decl.c
+++ b/gcc/cp/decl.c
@@ -1198,6 +1198,138 @@ decls_match (tree newdecl, tree olddecl)
   return types_match;
 }
 
+/* Similar to decls_match, but only applies to FUNCTION_DECLS.  Functions
+   in separate namespaces may match.
+*/
+
+int
+bind_decls_match (tree newdecl, tree olddecl)
+{
+  int types_match;
+
+  if (newdecl == olddecl)
+return 1;
+
+  if (TREE_CODE (newdecl) != TREE_CODE (olddecl))
+/* If the two DECLs are not even the same kind of thing, we're not
+   interested in their types.  */
+return 0;
+
+  gcc_assert (DECL_P (newdecl));
+  gcc_assert (TREE_CODE (newdecl) == FUNCTION_DECL);
+
+  tree f1 = TREE_TYPE (newdecl);
+  tree f2 = TREE_TYPE (olddecl);
+  tree p1 = TYPE_ARG_TYPES (f1);
+  tree p2 = TYPE_ARG_TYPES (f2);
+  tree r2;
+
+  /* Specializations of different templates are different functions
+ even if they have the same type.  */
+  tree t1 = (DECL_USE_TEMPLATE (newdecl)
+	 ? DECL_TI_TEMPLATE (newdecl)
+	 : NULL_TREE);
+  tree t2 = (DECL_USE_TEMPLATE (olddecl)
+	 ? DECL_TI_TEMPLATE (olddecl)
+	 : NULL_TREE);
+  if (t1 != t2)
+return 0;
+
+  if (CP_DECL_CONTEXT (newdecl) != CP_DECL_CONTEXT (olddecl)
+  && TREE_CODE (CP_DECL_CONTEXT (newdecl)) != NAMESPACE_DECL
+  && TREE_CODE (CP_DECL_CONTEXT (olddecl)) != NAMESPACE_DECL
+  && ! (DECL_EXTERN_C_P (newdecl)
+	&& DECL_EXTERN_C_P (olddecl)))
+return 0;
+
+  /* A new declaration doesn't match a built-in one unless it
+ is also extern "C".  */
+  if (DECL_IS_BUILTIN (olddecl)
+  && DECL_EXTERN_C_P (olddecl) && !DECL_EXTERN_C_P (newdecl))
+return 0;
+
+  if (TREE_CODE (f1) != TREE_CODE (f2))
+return 0;
+
+  /* A declaration with deduced return type should use its pre-deduction
+ type for declaration matching.  */
+  r2 = fndecl_declared_return_type (olddecl);
+
+  if (same_type_p (TREE_TYPE (f1), r2))
+{
+  if (!prototype_p (f2) && DECL_EXTERN_C_P (olddecl)
+	  && (DECL_BUILT_IN (olddecl)
+#ifndef NO_IMPLICIT_EXTERN_C
+	  || (DECL_IN_SYSTEM_HEADER (newdecl) && !DECL_CLASS_SCOPE_P (newdecl))
+	  || (DECL_IN_SYSTEM_HEADER (olddecl) && !DECL_CLASS_SCOPE_P (olddecl))
+#endif
+	  ))
+	{
+	  types_match = self_promoting_args_p (p1);
+	  if (p1 == void_list_node)
+	TREE_TYPE (newdecl) = TREE_TYPE (olddecl);
+	}
+#ifndef NO_IMPLICIT_EXTERN_C
+  else if (!prototype_p (f1)
+	   && (DECL_EXTERN_C_P (olddecl)
+		   && DECL_IN_SYSTEM_HEADER (olddecl)
+		   && !DECL_CLASS_SCOPE_P (olddecl))
+	   && (DECL_EXTERN_C_P (newdecl)
+		   && DECL_IN_SYSTEM_HEADER (newdecl)
+		   && !DECL_CLASS_SCOPE_P (newdecl)))
+	{
+	  types_match = self_promoting_args_p (p2);
+	  TREE_TYPE (newdecl) = TREE_TYPE (olddecl);
+	}
+#endif
+  else
+	types_match =
+	  compparms (p1, p2)
+	  && type_memfn_rqual (f1) == type_memfn_rqual (f2)
+	  && (TYPE_ATTRIBUTES (TREE_TYPE (newdecl)) == NULL_TREE
+	  || comp_type_attributes (TREE_TYPE (newdecl),
+   TREE_TYPE (olddecl)) != 0

Re: Ping: Re: [patch, avr] Add flash size to device info and make wrap around default

2016-11-22 Thread Georg-Johann Lay

Denis Chertykov wrote:

Do you have any objections, George?


No, the last delta rev3 from 2016-11-10 looks fine to me.



2016-11-22 8:05 GMT+03:00 Pitchumani Sivanupandi:

Ping!

On Monday 14 November 2016 07:03 PM, Pitchumani Sivanupandi wrote:

Ping!

On Thursday 10 November 2016 01:53 PM, Pitchumani Sivanupandi wrote:

On Wednesday 09 November 2016 08:05 PM, Georg-Johann Lay wrote:

On 09.11.2016 10:14, Pitchumani Sivanupandi wrote:

On Tuesday 08 November 2016 02:57 PM, Georg-Johann Lay wrote:

On 08.11.2016 08:08, Pitchumani Sivanupandi wrote:

I have updated the patch to include the flash size as well. I took that
info from the device headers (it is also fed into the crt's device
information note section).


The new option would render -mn-flash superfluous, but we should
keep it for
backward compatibility.

Ok.

Shouldn't link_pmem_wrap then be removed from link_relax, i.e. from
LINK_RELAX_SPEC?  And what happens if relaxation is off?

Yes. Removed link_pmem_wrap from link_relax.
Disabling relaxation doesn't change -mpmem-wrap-around behavior.

flashsize-and-wrap-around.patch



diff --git a/gcc/config/avr/avr-mcus.def
b/gcc/config/avr/avr-mcus.def
index 6bcc6ff..9d4aa1a 100644



 /*



 /* Classic, > 8K, <= 64K.  */
-AVR_MCU ("avr3", ARCH_AVR3, AVR_ISA_NONE, NULL,
0x0060, 0x0, 1)
-AVR_MCU ("at43usb355",   ARCH_AVR3, AVR_ISA_NONE,
"__AVR_AT43USB355__",0x0060, 0x0, 1)
-AVR_MCU ("at76c711", ARCH_AVR3, AVR_ISA_NONE,
"__AVR_AT76C711__",  0x0060, 0x0, 1)
+AVR_MCU ("avr3", ARCH_AVR3, AVR_ISA_NONE, NULL,
0x0060, 0x0, 1, 0x6000)
+AVR_MCU ("at43usb355",   ARCH_AVR3, AVR_ISA_NONE,
"__AVR_AT43USB355__",0x0060, 0x0, 1, 0x6000)
+AVR_MCU ("at76c711", ARCH_AVR3, AVR_ISA_NONE,
"__AVR_AT76C711__",  0x0060, 0x0, 1, 0x4000)
+AVR_MCU ("at43usb320",   ARCH_AVR3, AVR_ISA_NONE,
"__AVR_AT43USB320__",0x0060, 0x0, 1, 0x1)
 /* Classic, == 128K.  */
-AVR_MCU ("avr31",ARCH_AVR31, AVR_ERRATA_SKIP, NULL,
0x0060, 0x0, 2)
-AVR_MCU ("atmega103",ARCH_AVR31, AVR_ERRATA_SKIP,
"__AVR_ATmega103__", 0x0060, 0x0, 2)
-AVR_MCU ("at43usb320",   ARCH_AVR31, AVR_ISA_NONE,
"__AVR_AT43USB320__",   0x0060, 0x0, 2)
+AVR_MCU ("avr31",ARCH_AVR31, AVR_ERRATA_SKIP, NULL,
0x0060, 0x0, 2, 0x2)
+AVR_MCU ("atmega103",ARCH_AVR31, AVR_ERRATA_SKIP,
"__AVR_ATmega103__", 0x0060, 0x0, 2, 0x2)
 /* Classic + MOVW + JMP/CALL.  */


If at43usb320 is in the wrong multilib, then this should be handled as
separate issue / patch together with its own PR. Sorry for the confusion.  I
just noticed that some fields don't match...

It is not even clear to me from the data sheet if avr3 is the correct
multilib or perhaps avr35 (if it supports MOVW) or even avr5 (if it also has
MUL) as there is no reference to the exact instruction set -- Atmochip will
know.

Moreover, such a change should be sync'ed with avr-libc as all multilib
stuff is hand-wired there: no use of --print-foo meta information retrieval
by avr-libc :-((

I filed PR78275 and https://savannah.nongnu.org/bugs/index.php?49565 for
this one.


That's better. I've attached the updated patch. If OK, could someone
commit please?

I'll see if I can find some more info for AT43USB320.

Regards,
Pitchumani







Re: [PATCH] Replace _mm_setzero_[hd]i with _mm_setzero_si128 (PR target/78451)

2016-11-22 Thread Jakub Jelinek
On Tue, Nov 22, 2016 at 05:36:38PM +0100, Uros Bizjak wrote:
> > Note that there is still _mm512_setzero_qi and _mm512_setzero_hi,
> > shall those be replaced with _mm512_setzero_si512 too?
> > Even those 2 aren't mentioned in ICC headers nor AVX512 manuals.
> 
> Yes, please also remove these two.
> 
> Patch to replace them with _mm512_setzero_si512 is pre-approved.

Ok, here is what I've committed after another bootstrap/regtest on
x86_64-linux and i686-linux:

2016-11-22  Jakub Jelinek  

PR target/78451
* config/i386/avx512bwintrin.h (_mm512_setzero_qi,
_mm512_setzero_hi): Removed.
(_mm512_maskz_mov_epi16, _mm512_maskz_loadu_epi16,
_mm512_maskz_mov_epi8, _mm512_maskz_loadu_epi8,
_mm512_maskz_broadcastb_epi8, _mm512_maskz_set1_epi8,
_mm512_maskz_broadcastw_epi16, _mm512_maskz_set1_epi16,
_mm512_mulhrs_epi16, _mm512_maskz_mulhrs_epi16, _mm512_mulhi_epi16,
_mm512_maskz_mulhi_epi16, _mm512_mulhi_epu16,
_mm512_maskz_mulhi_epu16, _mm512_maskz_mullo_epi16,
_mm512_cvtepi8_epi16, _mm512_maskz_cvtepi8_epi16, _mm512_cvtepu8_epi16,
_mm512_maskz_cvtepu8_epi16, _mm512_permutexvar_epi16,
_mm512_maskz_permutexvar_epi16, _mm512_avg_epu8, _mm512_maskz_avg_epu8,
_mm512_maskz_add_epi8, _mm512_maskz_sub_epi8, _mm512_avg_epu16,
_mm512_maskz_avg_epu16, _mm512_subs_epi8, _mm512_maskz_subs_epi8,
_mm512_subs_epu8, _mm512_maskz_subs_epu8, _mm512_adds_epi8,
_mm512_maskz_adds_epi8, _mm512_adds_epu8, _mm512_maskz_adds_epu8,
_mm512_maskz_sub_epi16, _mm512_subs_epi16, _mm512_maskz_subs_epi16,
_mm512_subs_epu16, _mm512_maskz_subs_epu16, _mm512_maskz_add_epi16,
_mm512_adds_epi16, _mm512_maskz_adds_epi16, _mm512_adds_epu16,
_mm512_maskz_adds_epu16, _mm512_srl_epi16, _mm512_maskz_srl_epi16,
_mm512_packs_epi16, _mm512_sll_epi16, _mm512_maskz_sll_epi16,
_mm512_maddubs_epi16, _mm512_maskz_maddubs_epi16, _mm512_unpackhi_epi8,
_mm512_maskz_unpackhi_epi8, _mm512_unpackhi_epi16,
_mm512_maskz_unpackhi_epi16, _mm512_unpacklo_epi8,
_mm512_maskz_unpacklo_epi8, _mm512_unpacklo_epi16,
_mm512_maskz_unpacklo_epi16, _mm512_shuffle_epi8,
_mm512_maskz_shuffle_epi8, _mm512_min_epu16, _mm512_maskz_min_epu16,
_mm512_min_epi16, _mm512_maskz_min_epi16, _mm512_max_epu8,
_mm512_maskz_max_epu8, _mm512_max_epi8, _mm512_maskz_max_epi8,
_mm512_min_epu8, _mm512_maskz_min_epu8, _mm512_min_epi8,
_mm512_maskz_min_epi8, _mm512_max_epi16, _mm512_maskz_max_epi16,
_mm512_max_epu16, _mm512_maskz_max_epu16, _mm512_sra_epi16,
_mm512_maskz_sra_epi16, _mm512_srav_epi16, _mm512_maskz_srav_epi16,
_mm512_srlv_epi16, _mm512_maskz_srlv_epi16, _mm512_sllv_epi16,
_mm512_maskz_sllv_epi16, _mm512_maskz_packs_epi16, _mm512_packus_epi16,
_mm512_maskz_packus_epi16, _mm512_abs_epi8, _mm512_maskz_abs_epi8,
_mm512_abs_epi16, _mm512_maskz_abs_epi16, _mm512_dbsad_epu8,
_mm512_maskz_dbsad_epu8, _mm512_srli_epi16, _mm512_maskz_srli_epi16,
_mm512_slli_epi16, _mm512_maskz_slli_epi16, _mm512_shufflehi_epi16,
_mm512_maskz_shufflehi_epi16, _mm512_shufflelo_epi16,
_mm512_maskz_shufflelo_epi16, _mm512_srai_epi16,
_mm512_maskz_srai_epi16, _mm512_packs_epi32,
_mm512_maskz_packs_epi32, _mm512_packus_epi32,
_mm512_maskz_packus_epi32): Use _mm512_setzero_si512 instead of
_mm512_setzero_qi or _mm512_setzero_hi.
(_mm512_maskz_alignr_epi8, _mm512_dbsad_epu8,
_mm512_maskz_dbsad_epu8): Formatting fixes.
(_mm512_srli_epi16, _mm512_maskz_srli_epi16, _mm512_slli_epi16,
_mm512_maskz_slli_epi16, _mm512_shufflehi_epi16,
_mm512_maskz_shufflehi_epi16, _mm512_shufflelo_epi16,
_mm512_maskz_shufflelo_epi16, _mm512_srai_epi16,
_mm512_maskz_srai_epi16): Use _mm512_setzero_si512 instead of
_mm512_setzero_qi or _mm512_setzero_hi.

--- gcc/config/i386/avx512bwintrin.h.jj 2016-08-15 10:13:27.0 +0200
+++ gcc/config/i386/avx512bwintrin.h 2016-11-22 18:18:04.664913960 +0100
@@ -42,30 +42,6 @@ typedef unsigned long long __mmask64;
 
 extern __inline __m512i
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_setzero_qi (void)
-{
-  return __extension__ (__m512i)(__v64qi){ 0, 0, 0, 0, 0, 0, 0, 0,
-  0, 0, 0, 0, 0, 0, 0, 0,
-  0, 0, 0, 0, 0, 0, 0, 0,
-  0, 0, 0, 0, 0, 0, 0, 0,
-  0, 0, 0, 0, 0, 0, 0, 0,
-  0, 0, 0, 0, 0, 0, 0, 0,
-  0, 0, 0, 0, 0, 0, 0, 0,
-  0, 0, 0, 0, 0, 0, 0, 0 };
-}
-
-extern __inline __m512i
-__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_setzero_hi (vo

[patch] boehm-gc removal and libobjc changes to build with an external bdw-gc

2016-11-22 Thread Matthias Klose
Re-posting this at top level; discussion and review happened in the GCJ
removal thread:

 - https://gcc.gnu.org/ml/gcc-patches/2016-11/msg02069.html (last patch
   review).
 - https://gcc.gnu.org/ml/gcc-patches/2016-10/msg00387.html (first patch
   sent)
 - https://gcc.gnu.org/ml/gcc-patches/2016-10/msg00290.html (OK by Jeff
   Law to remove boehm-gc)

AFAIU, it needs an OK from a global reviewer, and maybe a libobjc maintainer (?).

Matthias



2016-11-19  Matthias Klose  

* Makefile.def: Remove reference to boehm-gc target module.
* configure.ac: Include pkg.m4, check for --with-target-bdw-gc
options and for the bdw-gc pkg-config module.
* configure: Regenerate.
* Makefile.in: Regenerate.

gcc/

2016-11-19  Matthias Klose  

* doc/install.texi: Document configure options --enable-objc-gc
and --with-target-bdw-gc.

config/

2016-11-19  Matthias Klose  

* pkg.m4: New file.

libobjc/

2016-11-19  Matthias Klose  

* configure.ac (--enable-objc-gc): Allow to configure with a
system provided boehm-gc.
* configure: Regenerate.
* Makefile.in (OBJC_BOEHM_GC_LIBS): Get value from configure.
* gc.c: Include system bdw-gc headers.
* memory.c: Likewise
* objects.c: Likewise

boehm-gc/

2016-11-19  Matthias Klose  

Remove




Index: Makefile.def
===
--- Makefile.def	(revision 242721)
+++ Makefile.def	(working copy)
@@ -166,7 +166,6 @@
 target_modules = { module= libgloss; no_check=true; };
 target_modules = { module= libffi; no_install=true; };
 target_modules = { module= zlib; };
-target_modules = { module= boehm-gc; };
 target_modules = { module= rda; };
 target_modules = { module= libada; };
 target_modules = { module= libgomp; bootstrap= true; lib_path=.libs; };
@@ -543,7 +542,6 @@
 // a dependency on libgcc for native targets to configure.
 lang_env_dependencies = { module=libiberty; no_c=true; };
 
-dependencies = { module=configure-target-boehm-gc; on=all-target-libstdc++-v3; };
 dependencies = { module=configure-target-fastjar; on=configure-target-zlib; };
 dependencies = { module=all-target-fastjar; on=all-target-zlib; };
 dependencies = { module=configure-target-libgo; on=configure-target-libffi; };
@@ -551,8 +549,6 @@
 dependencies = { module=all-target-libgo; on=all-target-libbacktrace; };
 dependencies = { module=all-target-libgo; on=all-target-libffi; };
 dependencies = { module=all-target-libgo; on=all-target-libatomic; };
-dependencies = { module=configure-target-libobjc; on=configure-target-boehm-gc; };
-dependencies = { module=all-target-libobjc; on=all-target-boehm-gc; };
 dependencies = { module=configure-target-libstdc++-v3; on=configure-target-libgomp; };
 dependencies = { module=configure-target-liboffloadmic; on=configure-target-libgomp; };
 dependencies = { module=configure-target-libsanitizer; on=all-target-libstdc++-v3; };
Index: config/pkg.m4
===
--- config/pkg.m4	(nonexistent)
+++ config/pkg.m4	(working copy)
@@ -0,0 +1,825 @@
+dnl pkg.m4 - Macros to locate and utilise pkg-config.   -*- Autoconf -*-
+dnl serial 11 (pkg-config-0.29)
+dnl
+dnl Copyright © 2004 Scott James Remnant .
+dnl Copyright © 2012-2015 Dan Nicholson 
+dnl
+dnl This program is free software; you can redistribute it and/or modify
+dnl it under the terms of the GNU General Public License as published by
+dnl the Free Software Foundation; either version 2 of the License, or
+dnl (at your option) any later version.
+dnl
+dnl This program is distributed in the hope that it will be useful, but
+dnl WITHOUT ANY WARRANTY; without even the implied warranty of
+dnl MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
+dnl General Public License for more details.
+dnl
+dnl You should have received a copy of the GNU General Public License
+dnl along with this program; if not, write to the Free Software
+dnl Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA
+dnl 02111-1307, USA.
+dnl
+dnl As a special exception to the GNU General Public License, if you
+dnl distribute thi

[C++ PATCH] Fix ICE during VEC_INIT_EXPR gimplification (PR c++/77739)

2016-11-22 Thread Jakub Jelinek
Hi!

As mentioned in the PR, we ICE because part of the body is genericized
twice, and each time an is_invisiref_parm RESULT_DECL (in this case; it
could also be a PARM_DECL) gets wrapped in a REFERENCE_REF_P INDIRECT_REF.
The first time this is desirable, but when it is done again during
VEC_INIT_EXPR gimplification, which calls cp_genericize_tree again, it is
not.

The following patch fixes it by only wrapping the invisiref parms/result
during the first cp_genericize_tree when the whole function is genericized.
I'd expect that any references to invisiref parms/result should only be
present in the VEC_INIT_EXPR arguments (which should be genericized already)
and that build_vec_init shouldn't create new ones out of thin air.
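
To make the failure mode concrete, here is a toy model of it. This is not
GCC code: Node, make_decl, make_deref, genericize and deref_depth are names
invented for this illustration, and the boolean parameter merely plays the
role of handle_invisiref_parm_p. Running the walk twice with the flag set
wraps the invisiref decl twice; passing false on the second walk leaves the
first wrapping alone.

```cpp
#include <cassert>
#include <memory>
#include <string>

// Toy expression node: a named declaration, or a dereference of an operand.
// (Hypothetical stand-in for GCC trees; nothing here is a real GCC type.)
struct Node
{
  enum Kind { DECL, DEREF };
  Kind kind = DECL;
  std::string name;          // meaningful for DECL
  bool invisiref = false;    // DECL is passed/returned by invisible reference
  std::shared_ptr<Node> op;  // meaningful for DEREF
};

using NodePtr = std::shared_ptr<Node>;

NodePtr
make_decl (const std::string &name, bool invisiref)
{
  NodePtr n = std::make_shared<Node> ();
  n->name = name;
  n->invisiref = invisiref;
  return n;
}

NodePtr
make_deref (const NodePtr &op)
{
  NodePtr n = std::make_shared<Node> ();
  n->kind = Node::DEREF;
  n->op = op;
  return n;
}

// One "genericization" walk.  An invisiref DECL is wrapped in a DEREF only
// when HANDLE_INVISIREF_PARM_P is true; a freshly created wrapper is not
// walked into again during the same pass.
NodePtr
genericize (const NodePtr &n, bool handle_invisiref_parm_p)
{
  assert (n);
  if (n->kind == Node::DECL)
    {
      if (handle_invisiref_parm_p && n->invisiref)
        return make_deref (n);
      return n;
    }
  // Existing DEREF: rebuild it around the walked operand.
  return make_deref (genericize (n->op, handle_invisiref_parm_p));
}

// Number of DEREF wrappers stacked on top of the innermost node.
int
deref_depth (const NodePtr &n)
{
  int depth = 0;
  for (const Node *p = n.get (); p && p->kind == Node::DEREF; p = p->op.get ())
    ++depth;
  return depth;
}
```

In this model, walking make_decl ("result", true) once with the flag set
gives one level of dereference. Walking that result again with the flag set
gives two levels (the analogue of the bug), while walking it again with the
flag clear keeps a single level.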

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

As mentioned in the PR, another option I see is to special-case
REFERENCE_REF_P INDIRECT_REFs, and the MEM_REFs into which they are
gimplified, in cp_genericize_r by not changing is_invisiref_parm decls
if they are already wrapped in those.

2016-11-22  Jakub Jelinek  

PR c++/77739
* cp-gimplify.c (cp_gimplify_tree) : Pass
false as handle_invisiref_parm_p to cp_genericize_tree.
(struct cp_genericize_data): Add handle_invisiref_parm_p field.
(cp_genericize_r): Don't wrap is_invisiref_parm into references
if !wtd->handle_invisiref_parm_p.
(cp_genericize_tree): Add handle_invisiref_parm_p argument,
set wtd.handle_invisiref_parm_p to it.
(cp_genericize): Pass true as handle_invisiref_parm_p to
cp_genericize_tree.  Formatting fix.

* g++.dg/cpp1y/pr77739.C: New test.

--- gcc/cp/cp-gimplify.c.jj 2016-11-15 16:18:49.0 +0100
+++ gcc/cp/cp-gimplify.c 2016-11-22 19:12:07.606813783 +0100
@@ -38,7 +38,7 @@ along with GCC; see the file COPYING3.
 
 static tree cp_genericize_r (tree *, int *, void *);
 static tree cp_fold_r (tree *, int *, void *);
-static void cp_genericize_tree (tree*);
+static void cp_genericize_tree (tree*, bool);
 static tree cp_fold (tree);
 
 /* Local declarations.  */
@@ -623,7 +623,7 @@ cp_gimplify_expr (tree *expr_p, gimple_s
  tf_warning_or_error);
hash_set<tree> pset;
cp_walk_tree (expr_p, cp_fold_r, &pset, NULL);
-   cp_genericize_tree (expr_p);
+   cp_genericize_tree (expr_p, false);
ret = GS_OK;
input_location = loc;
   }
@@ -995,6 +995,7 @@ struct cp_genericize_data
   struct cp_genericize_omp_taskreg *omp_ctx;
   tree try_block;
   bool no_sanitize_p;
+  bool handle_invisiref_parm_p;
 };
 
 /* Perform any pre-gimplification folding of C++ front end trees to
@@ -1111,7 +1112,7 @@ cp_genericize_r (tree *stmt_p, int *walk
 }
 
   /* Otherwise, do dereference invisible reference parms.  */
-  if (is_invisiref_parm (stmt))
+  if (wtd->handle_invisiref_parm_p && is_invisiref_parm (stmt))
 {
   *stmt_p = convert_from_reference (stmt);
   *walk_subtrees = 0;
@@ -1511,7 +1512,7 @@ cp_genericize_r (tree *stmt_p, int *walk
 /* Lower C++ front end trees to GENERIC in T_P.  */
 
 static void
-cp_genericize_tree (tree* t_p)
+cp_genericize_tree (tree* t_p, bool handle_invisiref_parm_p)
 {
   struct cp_genericize_data wtd;
 
@@ -1520,6 +1521,7 @@ cp_genericize_tree (tree* t_p)
   wtd.omp_ctx = NULL;
   wtd.try_block = NULL_TREE;
   wtd.no_sanitize_p = false;
+  wtd.handle_invisiref_parm_p = handle_invisiref_parm_p;
   cp_walk_tree (t_p, cp_genericize_r, &wtd, NULL);
   delete wtd.p_set;
   wtd.bind_expr_stack.release ();
@@ -1639,12 +1641,12 @@ cp_genericize (tree fndecl)
   /* Expand all the array notations here.  */
   if (flag_cilkplus 
   && contains_array_notation_expr (DECL_SAVED_TREE (fndecl)))
-DECL_SAVED_TREE (fndecl) = 
-  expand_array_notation_exprs (DECL_SAVED_TREE (fndecl));
+DECL_SAVED_TREE (fndecl)
+  = expand_array_notation_exprs (DECL_SAVED_TREE (fndecl));
 
   /* We do want to see every occurrence of the parms, so we can't just use
  walk_tree's hash functionality.  */
-  cp_genericize_tree (&DECL_SAVED_TREE (fndecl));
+  cp_genericize_tree (&DECL_SAVED_TREE (fndecl), true);
 
   if (flag_sanitize & SANITIZE_RETURN
   && do_ubsan_in_current_function ())
--- gcc/testsuite/g++.dg/cpp1y/pr77739.C.jj 2016-11-22 19:15:02.182659407 +0100
+++ gcc/testsuite/g++.dg/cpp1y/pr77739.C 2016-11-22 19:13:37.0 +0100
@@ -0,0 +1,15 @@
+// PR c++/77739
+// { dg-do compile { target c++14 } }
+
+struct A {
+  A();
+  A(const A &);
+};
+struct B {
+  B();
+  template <typename... Args> auto g(Args &&... p1) {
+return [=] { f(p1...); };
+  }
+  void f(A, const char *);
+};
+B::B() { g(A(), ""); }

Jakub


[PATCH] PR fortran/78479 -- allocate a charlen

2016-11-22 Thread Steve Kargl
The patch and ChangeLog should be sufficient to explain the change.
Regression tested on x86_64-*-freebsd.  OK to commit?
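
For readers who don't want to dig through expr.c, the shape of the fix is
roughly the following. This is a stand-alone C++ sketch, not gfortran code:
CharLen, TypeSpec and apply_char_length are hypothetical stand-ins for
gfc_charlen, gfc_typespec and the patched statement in gfc_apply_init. The
point is only that the constructor element's charlen may be missing, so it
has to be allocated before its length can be set.

```cpp
#include <cassert>
#include <memory>

// Hypothetical stand-in for gfc_charlen: just a length.
struct CharLen
{
  int length = 0;
};

// Hypothetical stand-in for gfc_typespec: may or may not own a charlen.
struct TypeSpec
{
  std::unique_ptr<CharLen> cl;
};

// Give ELEM the character length of DECLARED.  If ELEM has no charlen yet,
// allocate one as a copy of DECLARED's; otherwise only update the length.
// Dereferencing elem.cl unconditionally is the analogue of the old code.
void
apply_char_length (TypeSpec &elem, const TypeSpec &declared)
{
  assert (declared.cl && "declared type must carry a length");
  if (!elem.cl)
    elem.cl.reset (new CharLen (*declared.cl));
  else
    elem.cl->length = declared.cl->length;
}
```

Calling apply_char_length on an element whose cl is null now allocates a
charlen with the declared length instead of dereferencing a null pointer; an
element that already has one simply gets its length overwritten.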

2016-11-22  Steven G. Kargl  

PR fortran/78479
* expr.c (gfc_apply_init):  Allocate a charlen if needed.

2016-11-22  Steven G. Kargl  

PR fortran/78479
* gfortran.dg/pr78479.f90: New test.

-- 
Steve
Index: gcc/fortran/expr.c
===
--- gcc/fortran/expr.c	(revision 242638)
+++ gcc/fortran/expr.c	(working copy)
@@ -4132,7 +4132,12 @@ gfc_apply_init (gfc_typespec *ts, symbol
 {
   gfc_set_constant_character_len (len, ctor->expr,
   has_ts ? -1 : first_len);
-  ctor->expr->ts.u.cl->length = gfc_copy_expr (ts->u.cl->length);
+		  if (!ctor->expr->ts.u.cl)
+		ctor->expr->ts.u.cl
+		  = gfc_new_charlen (gfc_current_ns, ts->u.cl);
+		  else
+ctor->expr->ts.u.cl->length
+		  = gfc_copy_expr (ts->u.cl->length);
 }
 }
 }
Index: gcc/testsuite/gfortran.dg/pr78479.f90
===
--- gcc/testsuite/gfortran.dg/pr78479.f90	(nonexistent)
+++ gcc/testsuite/gfortran.dg/pr78479.f90	(working copy)
@@ -0,0 +1,6 @@
+! { dg-do compile }
+program p
+   type t
+  character(3) :: c(1) = 'a' // ['b']
+   end type
+end


libgo patch committed: Don't check standard packages in go tool with gccgo

2016-11-22 Thread Ian Lance Taylor
When using the go tool with gccgo, we can't check whether the
standard packages are up to date, because we can't assume that the
source code is available.  And we can't read
runtime/internal/sys/zversion.go, because that too is not generally
available.  This was fixed in the gc repository with
https://golang.org/cl/33295.  This patch simply brings that change
over to gccgo.  This fixes GCC PR 77910.  Bootstrapped and ran Go
testsuite on x86_64-pc-linux-gnu.  Committed to mainline.

Ian
Index: gcc/go/gofrontend/MERGE
===
--- gcc/go/gofrontend/MERGE (revision 242715)
+++ gcc/go/gofrontend/MERGE (working copy)
@@ -1,4 +1,4 @@
-7593cc83a03999331c5e2dc65a9306c5fe57dfd0
+e66f30e862cb5d02b9d55bf44ac439bb8fc4ea19
 
 The first line of this file holds the git revision number of the last
 merge done from the gofrontend repository.
Index: libgo/go/cmd/go/pkg.go
===
--- libgo/go/cmd/go/pkg.go  (revision 242581)
+++ libgo/go/cmd/go/pkg.go  (working copy)
@@ -523,6 +523,11 @@ func disallowInternal(srcDir string, p *
return p
}
 
+   // We can't check standard packages with gccgo.
+   if buildContext.Compiler == "gccgo" && p.Standard {
+   return p
+   }
+
// The stack includes p.ImportPath.
// If that's the only thing on the stack, we started
// with a name given on the command line, not an
@@ -1588,7 +1593,7 @@ func computeBuildID(p *Package) {
// Include the content of runtime/internal/sys/zversion.go in the hash
// for package runtime. This will give package runtime a
// different build ID in each Go release.
-   if p.Standard && p.ImportPath == "runtime/internal/sys" {
+   if p.Standard && p.ImportPath == "runtime/internal/sys" && buildContext.Compiler != "gccgo" {
data, err := ioutil.ReadFile(filepath.Join(p.Dir, "zversion.go"))
if err != nil {
fatalf("go: %s", err)


Re: [PATCH] PR fortran/78479 -- allocate a charlen

2016-11-22 Thread Janus Weil
Hi Steve,

> The patch and ChangeLog should be sufficient to explain the change.
> Regression tested on x86_64-*-freebsd.  OK to commit?

the patch itself looks good.

For the test case, I'd prefer a somewhat more meaningful name (e.g.
char_component_initializer_3.f90 or similar) and a mention of the PR
number in a comment inside the test case.

Thanks,
Janus



> 2016-11-22  Steven G. Kargl  
>
> PR fortran/78479
> * expr.c (gfc_apply_init):  Allocate a charlen if needed.
>
> 2016-11-22  Steven G. Kargl  
>
> PR fortran/78479
> * gfortran.dg/pr78479.f90: New test.
>
> --
> Steve


Re: [PATCH 6/9] Split class rtx_reader into md_reader vs rtx_reader

2016-11-22 Thread Richard Sandiford
Sorry, only just realised that this one hadn't been approved as
part of the earlier series.

David Malcolm  writes:
> gcc/ChangeLog:
>   * genpreds.c (write_tm_constrs_h): Update for renaming of
>   rtx_reader_ptr to md_reader_ptr.
>   (write_tm_preds_h): Likewise.
>   (write_insn_preds_c): Likewise.
>   * read-md.c (rtx_reader_ptr): Rename to...
>   (md_reader_ptr): ...this, and convert from an
>   rtx_reader * to a md_reader *.
>   (rtx_reader::set_md_ptr_loc): Rename to...
>   (md_reader::set_md_ptr_loc): ...this.
>   (rtx_reader::get_md_ptr_loc): Rename to...
>   (md_reader::get_md_ptr_loc): ...this.
>   (rtx_reader::copy_md_ptr_loc): Rename to...
>   (md_reader::copy_md_ptr_loc): ...this.
>   (rtx_reader::fprint_md_ptr_loc): Rename to...
>   (md_reader::fprint_md_ptr_loc): ...this.
>   (rtx_reader::print_md_ptr_loc): Rename to...
>   (md_reader::print_md_ptr_loc): ...this.
>   (rtx_reader::join_c_conditions): Rename to...
>   (md_reader::join_c_conditions): ...this.
>   (rtx_reader::fprint_c_condition): Rename to...
>   (md_reader::fprint_c_condition): ...this.
>   (rtx_reader::print_c_condition): Rename to...
>   (md_reader::print_c_condition): ...this.
>   (fatal_with_file_and_line):  Update for renaming of
>   rtx_reader_ptr to md_reader_ptr.
>   (rtx_reader::require_char): Rename to...
>   (md_reader::require_char): ...this.
>   (rtx_reader::require_char_ws): Rename to...
>   (md_reader::require_char_ws): ...this.
>   (rtx_reader::require_word_ws): Rename to...
>   (md_reader::require_word_ws): ...this.
>   (rtx_reader::read_char): Rename to...
>   (md_reader::read_char): ...this.
>   (rtx_reader::unread_char): Rename to...
>   (md_reader::unread_char): ...this.
>   (rtx_reader::peek_char): Rename to...
>   (md_reader::peek_char): ...this.
>   (rtx_reader::read_name): Rename to...
>   (md_reader::read_name): ...this.
>   (rtx_reader::read_escape): Rename to...
>   (md_reader::read_escape): ...this.
>   (rtx_reader::read_quoted_string): Rename to...
>   (md_reader::read_quoted_string): ...this.
>   (rtx_reader::read_braced_string): Rename to...
>   (md_reader::read_braced_string): ...this.
>   (rtx_reader::read_string): Rename to...
>   (md_reader::read_string): ...this.
>   (rtx_reader::read_skip_construct): Rename to...
>   (md_reader::read_skip_construct): ...this.
>   (rtx_reader::handle_constants): Rename to...
>   (md_reader::handle_constants): ...this.
>   (rtx_reader::traverse_md_constants): Rename to...
>   (md_reader::traverse_md_constants): ...this.
>   (rtx_reader::handle_enum): Rename to...
>   (md_reader::handle_enum): ...this.
>   (rtx_reader::lookup_enum_type): Rename to...
>   (md_reader::lookup_enum_type): ...this.
>   (rtx_reader::traverse_enum_types): Rename to...
>   (md_reader::traverse_enum_types): ...this.
>   (rtx_reader::rtx_reader): Rename to...
>   (md_reader::md_reader): ...this, and update for renaming of
>   rtx_reader_ptr to md_reader_ptr.
>   (rtx_reader::~rtx_reader): Rename to...
>   (md_reader::~md_reader): ...this, and update for renaming of
>   rtx_reader_ptr to md_reader_ptr.
>   (rtx_reader::handle_include): Rename to...
>   (md_reader::handle_include): ...this.
>   (rtx_reader::handle_file): Rename to...
>   (md_reader::handle_file): ...this.
>   (rtx_reader::handle_toplevel_file): Rename to...
>   (md_reader::handle_toplevel_file): ...this.
>   (rtx_reader::get_current_location): Rename to...
>   (md_reader::get_current_location): ...this.
>   (rtx_reader::add_include_path): Rename to...
>   (md_reader::add_include_path): ...this.
>   (rtx_reader::read_md_files): Rename to...
>   (md_reader::read_md_files): ...this.
>   * read-md.h (class rtx_reader): Split into...
>   (class md_reader): ...new class.
>   (rtx_reader_ptr): Rename to...
>   (md_reader_ptr): ...this, and convert to a md_reader *.
>   (class noop_reader): Update base class to be md_reader.
>   (class rtx_reader): Reintroduce as a subclass of md_reader.
>   (rtx_reader_ptr): Reintroduce as a rtx_reader *.
>   (read_char): Update for renaming of rtx_reader_ptr to
>   md_reader_ptr.
>   (unread_char): Likewise.
>   * read-rtl.c (rtx_reader_ptr): New global.
>   (rtx_reader::apply_iterator_to_string): Rename to...
>   (md_reader::apply_iterator_to_string): ...this.
>   (rtx_reader::copy_rtx_for_iterators): Rename to...
>   (md_reader::copy_rtx_for_iterators): ...this.
>   (rtx_reader::read_conditions): Rename to...
>   (md_reader::read_conditions): ...this.
>   (rtx_reader::record_potential_iterator_use): Rename to...
>   (md_reader::record_potential_iterator_use): ...this.
>   (rtx_reader::read_mapping): Rename to...
>   (md_reader::read_mapping): ...this.
>   
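
Going by the ChangeLog above, the class layout after the split looks roughly
like this. It is a minimal sketch only: the member bodies and the
handles_rtx hook are invented for illustration, and having the constructors
and destructors maintain md_reader_ptr and rtx_reader_ptr this way is an
assumption, not something the quoted text spells out.

```cpp
#include <cassert>

class md_reader;
class rtx_reader;

// Globals named in the ChangeLog, one per level of the hierarchy.
md_reader *md_reader_ptr = nullptr;
rtx_reader *rtx_reader_ptr = nullptr;

// Base class: generic .md file handling.
class md_reader
{
public:
  md_reader ()
  {
    assert (md_reader_ptr == nullptr);  // sketch: one reader at a time
    md_reader_ptr = this;
  }
  virtual ~md_reader () { md_reader_ptr = nullptr; }

  // Invented hook so that the sketch has something to override.
  virtual bool handles_rtx () const { return false; }
};

// A reader that needs no RTL support derives straight from md_reader.
class noop_reader : public md_reader
{
};

// RTL-aware reader, reintroduced as a subclass of md_reader.
class rtx_reader : public md_reader
{
public:
  rtx_reader () { rtx_reader_ptr = this; }
  ~rtx_reader () override { rtx_reader_ptr = nullptr; }

  bool handles_rtx () const override { return true; }
};
```

With this layout, code that only needs .md parsing can go through
md_reader_ptr regardless of which subclass is live, while rtx_reader_ptr is
non-null only when an rtx_reader has been constructed.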
