[PATCH] Fix PR81505

2017-07-25 Thread Richard Biener

The following makes fold_neagte_const properly handle TREE_OVERFLOW as
sticky (even if present on POINTER_TYPE constants).

Bootstrapped and tested on x86_64-unknown-linux-gnu, applied.

Richard.

2017-06-25  Richard Biener  

PR middle-end/81505
* fold-const.c (fold_negate_const): TREE_OVERFLOW should be
sticky.

* gcc.dg/ubsan/pr81505.c: New testcase.

Index: gcc/fold-const.c
===
--- gcc/fold-const.c(revision 250386)
+++ gcc/fold-const.c(working copy)
@@ -13693,8 +13693,8 @@ fold_negate_const (tree arg0, tree type)
bool overflow;
wide_int val = wi::neg (arg0, &overflow);
t = force_fit_type (type, val, 1,
-   (overflow | TREE_OVERFLOW (arg0))
-   && !TYPE_UNSIGNED (type));
+   (overflow && ! TYPE_UNSIGNED (type))
+   || TREE_OVERFLOW (arg0));
break;
   }
 
Index: gcc/testsuite/gcc.dg/ubsan/pr81505.c
===
--- gcc/testsuite/gcc.dg/ubsan/pr81505.c(nonexistent)
+++ gcc/testsuite/gcc.dg/ubsan/pr81505.c(working copy)
@@ -0,0 +1,17 @@
+/* { dg-do compile } */
+/* { dg-options "-O3 -fsanitize=signed-integer-overflow" } */
+
+int a, b, c, h;
+
+int i[5][5];
+
+void
+fn1 ()
+{
+  int l = 0;
+
+  for (a = 0; a <= 3; a++)
+for (b = 1; b >= 0; b -= 1)
+  l |= i[0][b];
+  c = l;
+}


Re: [PATCH, committed] Fix PR81162

2017-07-25 Thread Richard Biener
On Fri, Jul 21, 2017 at 7:40 PM, Bill Schmidt
 wrote:
> Hi Richard,
>
> I would like to backport this fix to GCC 5, 6, and 7.  All have passed 
> regstrap on
> powerpc64le-linux-gnu (with the test case moved to gcc.dg/ubsan, of course).
> Is this ok?

Yes.

Richard.

> Thanks!
>
> -- Bill
>
>
>> On Jul 14, 2017, at 1:05 PM, Bill Schmidt  
>> wrote:
>>
>> Hi,
>>
>> PR81162 identifies a bug in SLSR involving overflow that occurs when
>> replacing a NEGATE_EXPR with a PLUS_EXPR.  This is another example
>> of an unprofitable transformation that should be skipped anyway,
>> hence this simple patch.  Bootstrapped and tested on
>> powerpc64le-unknown-linux-gnu, committed.  Test case provided from
>> the bug report.
>>
>> Thanks,
>> Bill
>>
>>
>> [gcc]
>>
>> 2016-07-14  Bill Schmidt  
>>
>>   PR tree-optimization/81162
>>   * gimple-ssa-strength-reduction.c (replace_mult_candidate): Don't
>>   replace a negate with an add.
>>
>> [gcc/testsuite]
>>
>> 2016-07-14  Bill Schmidt  
>>
>>   PR tree-optimization/81162
>>   * gcc.dg/pr81162.c: New file.
>>
>>
>> Index: gcc/gimple-ssa-strength-reduction.c
>> ===
>> --- gcc/gimple-ssa-strength-reduction.c   (revision 250189)
>> +++ gcc/gimple-ssa-strength-reduction.c   (working copy)
>> @@ -2082,13 +2082,14 @@ replace_mult_candidate (slsr_cand_t c, tree basis_
>>  types but allows for safe negation without twisted logic.  */
>>   if (wi::fits_shwi_p (bump)
>>   && bump.to_shwi () != HOST_WIDE_INT_MIN
>> -  /* It is not useful to replace casts, copies, or adds of
>> +  /* It is not useful to replace casts, copies, negates, or adds of
>>an SSA name and a constant.  */
>>   && cand_code != SSA_NAME
>>   && !CONVERT_EXPR_CODE_P (cand_code)
>>   && cand_code != PLUS_EXPR
>>   && cand_code != POINTER_PLUS_EXPR
>> -  && cand_code != MINUS_EXPR)
>> +  && cand_code != MINUS_EXPR
>> +  && cand_code != NEGATE_EXPR)
>> {
>>   enum tree_code code = PLUS_EXPR;
>>   tree bump_tree;
>> Index: gcc/testsuite/gcc.dg/pr81162.c
>> ===
>> --- gcc/testsuite/gcc.dg/pr81162.c(nonexistent)
>> +++ gcc/testsuite/gcc.dg/pr81162.c(working copy)
>> @@ -0,0 +1,17 @@
>> +/* PR tree-optimization/81162 */
>> +/* { dg-do run } */
>> +/* { dg-options "-fsanitize=undefined -O2" } */
>> +
>> +short s;
>> +int i1 = 1;
>> +int i2 = 1;
>> +unsigned char uc = 147;
>> +
>> +int main() {
>> +  s = (-uc + 2147483647) << 0;
>> +  if (9031239389974324562ULL >= (-((i1 && i2) + uc) ^ -21096) ) {
>> +return 0;
>> +  } else {
>> +return -1;
>> +  }
>> +}
>>
>


Re: PING^2: Fwd: SSA range class and removal of VR_ANTI_RANGEs

2017-07-25 Thread Richard Biener
On Fri, Jul 21, 2017 at 9:30 PM, Aldy Hernandez  wrote:
> On Mon, Jul 17, 2017 at 6:23 AM, Richard Biener
>  wrote:
>> On Mon, Jul 17, 2017 at 8:51 AM, Aldy Hernandez  wrote:
>
>>> How does this look?
>>
>> It's a change that on its own doesn't look worthwhile to me.
>>
>> So please post the changes that will build ontop of this.  Like removing
>> anti-ranges from VRP or your on-demand value-range stuff.
>>
>> Thanks,
>> Richard.
>
> From the looks of it, we can have a variety of VRP ranges that are not
> representable at all with the an integer range class.  For instance, I
> see the following ranges built in set_value_range():
>
> [INT, (nop_expr SSA)]
>
> [INT, (plus_expr SSA INT)]
>
> [(negate_expr SSA), (negate_expr SSA)]
>
> [(plus_expr (negate_expr SSA INT)),
>  (plus_expr (negate_expr SSA) INT)]
>
> [SSA, SSA]
>
> So...I guess the first suggestion is out of the question ;-).

Well.  We do not want range operations implemented twice (really!)
so range operations need to work on both representations.  So we
should think of a way to get rid of anti-ranges in VRP which, frankly,
means that for the sake symbolic ranges we have to trade them
with handling open/closed ranges which I'm not sure will be less of
a hassle to handle?

Richard.

> Aldy


Re: [PATCH] unitialized memory access vs BIT_INSERT_EXPR

2017-07-25 Thread Richard Biener
On Fri, Jul 21, 2017 at 8:24 PM, Andrew Pinski  wrote:
> Hi,
>   Due to the way bit-field access lower is done, we can get an
> unitialized memory load and this causes the unitialized warning to
> kick in.
> The case we have is:
> temp_1 = BIT_FIELD_REF
> temp_2 = BIT_INSERT
> BIT_FIELD_REF = temp_2
>
> What this patch does is similar to what was done for the case of
> BIT_INSERT for unitialized ssa names (default) but for memory
> accesses.
> Note also tested with the bit-field lower pass added; this and the
> fold-const.c/tree-ssa-sccvn.c patch (which was just committed) are
> requirements to the lower pass.
>
> OK?  Bootstrapped and tested on aarch64-linux-gnu with no regressions.

Ok.

Thanks,
Richard.

> Thanks,
> Andrew Pinski
> * tree-ssa-uninit.c (warn_uninitialized_vars): Don't warn about memory
> accesses where the use is for the first operand of a BIT_INSERT.


Re: [PATCH v12] add -fpatchable-function-entry=N,M option

2017-07-25 Thread Maxim Kuvyrkov
> On Jul 20, 2017, at 4:12 PM, Maxim Kuvyrkov  wrote:
> 
>> On Jul 20, 2017, at 3:06 PM, Torsten Duwe  wrote:
>> 
>> On Thu, Jul 20, 2017 at 01:58:06PM +0300, Maxim Kuvyrkov wrote:
 On Jul 17, 2017, at 3:10 PM, Torsten Duwe  wrote:
 
 What is the next step now? Is anybody going to commit that patch?
>>> 
>>> Torsten, if you prefer I can commit your patch (after bootstrap and a 
>>> regtest on aarch64-linux-gnu and x86_64-linux-gnu).
>> 
>> Ok, who is going to do that?
> 
> I'll do that, np.

I confirmed that bootstrap and reg test on x86_64-linux-gnu and 
aarch64-linux-gnu are OK.

> 
>> 
>>> There is one last hurdle to sort out with the FSF copyright assignment.  
>>> Looking at FSF's list I don't see SuSe's copyright assignment for your 
>>> contributions to FSF projects, of which GCC is one.  You can either request 
>>> SuSe to file a copyright assignment for you, or, probably simpler, ask one 
>>> of your SuSe colleagues (Richard B., Jan H.) to commit under their 
>>> assignment.
>> 
>> From my understanding, it should not make a difference, as the copyright
>> owner is identical (SUSE). But I'm not going to argue on legal issues here.
>> Let's first see what the test outcome is. Thanks so far!
> 
> SuSe/Novell choose to do its assignments per engineer, not for the whole 
> company, so I don't think you are covered as is.

Torsten, any update on which way you are going to handle copyright assignment 
-- apply for a new one or have one of your SuSe colleagues commit under theirs?

--
Maxim Kuvyrkov
www.linaro.org




Re: Deprecate DBX/stabs?

2017-07-25 Thread Richard Biener
On Fri, Jul 21, 2017 at 4:15 PM, David Edelsohn  wrote:
>> Nathan Sidwell writes:
>
>> Let's at least deprecate it.  I attach a patch to do so.  With the
>> patch, you'll get a note about dbx being deprecated whenever you use
>> stabs debugging on a system that prefers stabs (thus both -g and -gstabs
>> will warn).  On systems where stabs is not preferred, -gstabs will not
>> give you a warning.  The patch survices an x86_64-linux bootstrap.
>
> Absolutely not.
>
> AIX still uses DBX as the primary debugging format.  AIX supports
> DWARF but the AIX toolchain does not fully interoperate with DWARF
> generated by GCC.
>
> With the extensive use of DBX by AIX and regular patches from me to
> fix xcoff stabs debugging, omitting me from the cc list implies that
> you really haven't done your homework.

The proposal was to define DBX_DEBUG_OK on such targets so their
users do not get the deprecation warning.

The idea is to remove support for STABS from targets that have well
established DWARF support (like xyz-linux).

Of course I would like complete STABS deprecation more but I knew
that at least AIX stands in the way here ;)

Richard.


> Thanks, David


Re: [PATCH] dwarf2asm.c for AIX

2017-07-25 Thread Richard Biener
On Sun, Jul 23, 2017 at 5:10 AM, David Edelsohn  wrote:
> dw2_asm_output_nstring() essentially inlines the version of
> OUTPUT_ASCII for the GNU Assembler ELF file format for the debug_asm
> case to append the comment and calls OUTPUT_ASCII when not debugging.
> The DWARF portion of the GCC testsuite relies on the text that appears
> in the comments, so it needs to work somewhat on AIX.
>
> The function previously had been adjusted to emit the AIX assembler
> .byte directive.  The AIX assembler escapes double quotes with another
> double quote, not a backslash.  The AIX toolchain added support for
> the debug_macinfo section and macros with quotes generate assembler
> errors when debug_asm is enabled.
>
> This patch adjusts the function to emit a double quote instead of a
> backslash when XCOFF_DEBUGGING_INFO is defined for AIX.
>
> Bootstrapped on powerpc-ibm-aix7.2.0.0
>
> Okay?

Ok.

Richard.

>
> Thanks, David
>
> Index: dwarf2asm.c
> ===
> --- dwarf2asm.c (revision 250453)
> +++ dwarf2asm.c (working copy)
> @@ -345,7 +345,9 @@ dw2_asm_output_nstring (const char *str, size_t or
>for (i = 0; i < len; i++)
> {
>   int c = str[i];
> - if (c == '\"' || c == '\\')
> + if (c == '\"')
> +   fputc (XCOFF_DEBUGGING_INFO ? '\"' : '\\', asm_out_file);
> + else if (c == '\\')
> fputc ('\\', asm_out_file);
>   if (ISPRINT (c))
> fputc (c, asm_out_file);


Re: [patch] Ad PR81487: More asprintf -> xasprintf replacements

2017-07-25 Thread Richard Biener
On Mon, Jul 24, 2017 at 10:19 AM, Georg-Johann Lay  wrote:
> Hi, as proposed in
>
> https://gcc.gnu.org/ml/gcc-patches/2017-07/msg01294.html
>
> this patch does more asprintf -> xasprintf replacements.
>
> Bootstrapped + reg-tested on x86_64-linux.
>
> Ok for trunk?

Ok.

Thanks,
Richard.

> Johann
>
> gcc/
> PR 81487
> * hsa-brig.c (brig_init): Use xasprintf instead of asprintf.
> * gimple-pretty-print.c (dump_profile, dump_probability): Same.
> * tree-ssa-structalias.c (alias_get_name): Same.
>


Re: [PATCH GCC]Make pointer overflow always undefined and remove the macro

2017-07-25 Thread Richard Biener
On Mon, Jul 24, 2017 at 10:43 AM, Bin Cheng  wrote:
> Hi,
> This is a followup patch to PR81388's fix.  According to Richi,
> POINTER_TYPE_OVERFLOW_UNDEFINED was added in -fstrict-overflow
> warning work.  Given:
>   A) strict-overflow was removed;
>   B) memory object can not wrap in address space;
>   C) existing code doesn't take it in consideration, as in nowrap_type_p.
> This patch makes it always true thus removes definition/usage of the macro.
> Bootstrap and test on x86_64 and AArch64.  Is it OK?

Ok.

Please give others 24h to comment.

Thanks,
Richard.

> Thanks,
> bin
> 2017-07-20  Bin Cheng  
>
> * tree.h (POINTER_TYPE_OVERFLOW_UNDEFINED): Delete.
> * fold-const.c (fold_comparison, fold_binary_loc): Delete use of
> above macro.
> * match.pd: Ditto in address comparison pattern.
>
> gcc/testsuite/ChangeLog
> 2017-07-20  Bin Cheng  
>
> * gcc.dg/no-strict-overflow-7.c: Revise comment and test string.
> * gcc.dg/tree-ssa/pr81388-1.c: Ditto.


Re: [PATCH TEST]Require vect_perm in gcc.dg/vect/pr80815-3.c

2017-07-25 Thread Richard Biener
On Mon, Jul 24, 2017 at 10:44 AM, Bin Cheng  wrote:
> Hi,
> The test has negative step in memory access, thus can't be vectorized on
> target like sparc-sun-solaris2.12.  This patch adds vect_perm requirement
> for it.  Test result checked.  Is it OK?


Ok.

> Thanks,
> bin
> gcc/testsuite/ChangeLog
> 2017-07-20  Bin Cheng  
>
> * gcc.dg/vect/pr80815-3.c: Require vect_perm.


Re: [PATCH][AArch64] vec_pack_trunc_ should split after register allocator

2017-07-25 Thread Hurugalawadi, Naveen
Hi,

>> I think we can split this whenever we like, and
>> that there isn't any benefit in keeping the pair together?

Thanks for the review and your views.

The patch is modified as per your suggestion.

Please review the patch and let me know if its okay?

Bootstrapped and Regression done on AArch64-Thunder-Linux.

Thanks,
Naveen   diff --git a/gcc/config/aarch64/aarch64-simd.md b/gcc/config/aarch64/aarch64-simd.md
index 1cb6eeb..a41edad 100644
--- a/gcc/config/aarch64/aarch64-simd.md
+++ b/gcc/config/aarch64/aarch64-simd.md
@@ -1291,6 +1291,18 @@
   [(set_attr "type" "neon_shift_imm_narrow_q")]
 )
 
+(define_insn "aarch64_simd_vec_pack_trunc_hi_"
+ [(set (match_operand: 0 "register_operand" "=w")
+   (vec_concat:
+	 (truncate: (match_operand:VQN 1 "register_operand" "w"))
+	 (vec_select:
+	   (match_operand: 3 "register_operand" "0")
+	   (match_operand: 2 "vect_par_cnst_hi_half" ""]
+ "TARGET_SIMD"
+ "xtn2\\t%0., %1."
+  [(set_attr "type" "neon_shift_imm_narrow_q")]
+)
+
 (define_expand "vec_pack_trunc_"
  [(match_operand: 0 "register_operand" "")
   (match_operand:VDN 1 "register_operand" "")
@@ -1309,20 +1321,39 @@
 
 ;; For quads.
 
-(define_insn "vec_pack_trunc_"
- [(set (match_operand: 0 "register_operand" "=&w")
+(define_expand "vec_pack_trunc_"
+ [(set (match_operand: 0 "register_operand" "")
(vec_concat:
-	 (truncate: (match_operand:VQN 1 "register_operand" "w"))
-	 (truncate: (match_operand:VQN 2 "register_operand" "w"]
+	 (truncate: (match_operand:VQN 1 "register_operand" ""))
+	 (truncate: (match_operand:VQN 2 "register_operand" ""]
  "TARGET_SIMD"
  {
if (BYTES_BIG_ENDIAN)
- return "xtn\\t%0., %2.\;xtn2\\t%0., %1.";
+ {
+   rtx low_part = gen_lowpart (mode, operands[0]);
+   emit_insn (gen_aarch64_simd_vec_pack_trunc_ (low_part,
+			  operands[2]));
+   rtx high_part = aarch64_simd_vect_par_cnst_half (mode,
+			true);
+   emit_insn (gen_aarch64_simd_vec_pack_trunc_hi_ (operands[0],
+			 operands[1],
+			 high_part,
+			 operands[0]));
+ }
else
- return "xtn\\t%0., %1.\;xtn2\\t%0., %2.";
+ {
+   rtx low_part = gen_lowpart (mode, operands[0]);
+   emit_insn (gen_aarch64_simd_vec_pack_trunc_ (low_part,
+			  operands[1]));
+   rtx high_part = aarch64_simd_vect_par_cnst_half (mode,
+			true);
+   emit_insn (gen_aarch64_simd_vec_pack_trunc_hi_ (operands[0],
+			 operands[2],
+			 high_part,
+			 operands[0]));
+ }
+   DONE;
  }
-  [(set_attr "type" "multiple")
-   (set_attr "length" "8")]
 )
 
 ;; Widening operations.


Re: [PATCH v12] add -fpatchable-function-entry=N,M option

2017-07-25 Thread Torsten Duwe
On Tue, Jul 25, 2017 at 10:12:00AM +0300, Maxim Kuvyrkov wrote:
> 
> I confirmed that bootstrap and reg test on x86_64-linux-gnu and 
> aarch64-linux-gnu are OK.

Thanks a lot!

> Torsten, any update on which way you are going to handle copyright assignment 
> -- apply for a new one or have one of your SuSe colleagues commit under 
> theirs?

AFAIU all this code is "Copyright (C) 2016,2017 SUSE Linux GmbH", just
exactly like e.g. Richard B.'s code, but IANAL. I'll find out the details.
Thanks so far!

Torsten



Re: [PATCH] Optimize BB sorting in domwalk

2017-07-25 Thread Alexander Monakov
On Mon, 24 Jul 2017, Jeff Law wrote:
> As Uli noted, we should be using std::swap.
> 
> Can you please repost ?

* domwalk.c (cmp_bb_postorder): Simplify.
(sort_bbs_postorder): New function.  Use it...
(dom_walker::walk): ...here to optimize common cases.

---
 gcc/domwalk.c | 53 -
 1 file changed, 36 insertions(+), 17 deletions(-)

diff --git a/gcc/domwalk.c b/gcc/domwalk.c
index a0daae6b2d8..df51b8c4451 100644
--- a/gcc/domwalk.c
+++ b/gcc/domwalk.c
@@ -128,19 +128,46 @@ along with GCC; see the file COPYING3.  If not see
 which is currently an abstraction over walking tree statements.  Thus
 the dominator walker is currently only useful for trees.  */
 
+/* Reverse postorder index of each basic block.  */
 static int *bb_postorder;
 
 static int
 cmp_bb_postorder (const void *a, const void *b)
 {
-  basic_block bb1 = *(basic_block *)const_cast(a);
-  basic_block bb2 = *(basic_block *)const_cast(b);
-  if (bb1->index == bb2->index)
-return 0;
+  basic_block bb1 = *(const basic_block *)(a);
+  basic_block bb2 = *(const basic_block *)(b);
+  int n1 = bb_postorder[bb1->index], n2 = bb_postorder[bb2->index];
   /* Place higher completion number first (pop off lower number first).  */
-  if (bb_postorder[bb1->index] > bb_postorder[bb2->index])
-return -1;
-  return 1;
+  return (n1 < n2) - (n1 > n2);
+}
+
+/* Permute array BBS of N basic blocks in postorder,
+   i.e. by descending number in BB_POSTORDER array.  */
+
+static void
+sort_bbs_postorder (basic_block *bbs, int n)
+{
+  if (__builtin_expect (n == 2, true))
+{
+  basic_block bb0 = bbs[0], bb1 = bbs[1];
+  if (bb_postorder[bb0->index] < bb_postorder[bb1->index])
+   bbs[0] = bb1, bbs[1] = bb0;
+}
+  else if (__builtin_expect (n == 3, true))
+{
+  basic_block bb0 = bbs[0], bb1 = bbs[1], bb2 = bbs[2];
+  if (bb_postorder[bb0->index] < bb_postorder[bb1->index])
+   std::swap (bb0, bb1);
+  if (bb_postorder[bb1->index] < bb_postorder[bb2->index])
+   {
+ std::swap (bb1, bb2);
+ if (bb_postorder[bb0->index] < bb_postorder[bb1->index])
+   std::swap (bb0, bb1);
+   }
+  bbs[0] = bb0, bbs[1] = bb1, bbs[2] = bb2;
+}
+  else
+qsort (bbs, n, sizeof *bbs, cmp_bb_postorder);
 }
 
 /* Constructor for a dom walker.
@@ -284,16 +311,8 @@ dom_walker::walk (basic_block bb)
  for (dest = first_dom_son (m_dom_direction, bb);
   dest; dest = next_dom_son (m_dom_direction, dest))
worklist[sp++] = dest;
- if (m_dom_direction == CDI_DOMINATORS)
-   switch (sp - saved_sp)
- {
- case 0:
- case 1:
-   break;
- default:
-   qsort (&worklist[saved_sp], sp - saved_sp,
-  sizeof (basic_block), cmp_bb_postorder);
- }
+ if (sp - saved_sp > 1 && m_dom_direction == CDI_DOMINATORS)
+   sort_bbs_postorder (&worklist[saved_sp], sp - saved_sp);
}
   /* NULL is used to mark pop operations in the recursion stack.  */
   while (sp > 0 && !worklist[sp - 1])
-- 
2.13.3


Re: [PATCH 2/6] gimple-ssa-store-merging.c: fix sort_by_bitpos

2017-07-25 Thread Kyrill Tkachov

Hi,

On 24/07/17 21:48, Alexander Monakov wrote:

On Sat, 22 Jul 2017, Segher Boessenkool wrote:


On Sat, Jul 15, 2017 at 11:47:45PM +0300, Alexander Monakov wrote:

--- a/gcc/gimple-ssa-store-merging.c
+++ b/gcc/gimple-ssa-store-merging.c
@@ -516,12 +516,12 @@ sort_by_bitpos (const void *x, const void *y)
store_immediate_info *const *tmp = (store_immediate_info * const *) x;
store_immediate_info *const *tmp2 = (store_immediate_info * const *) y;
  
-  if ((*tmp)->bitpos <= (*tmp2)->bitpos)

+  if ((*tmp)->bitpos < (*tmp2)->bitpos)
  return -1;
else if ((*tmp)->bitpos > (*tmp2)->bitpos)
  return 1;
-
-  gcc_unreachable ();
+  else
+return 0;
  }

Does any sort using this comparison function require the sort to be
stable?

It looks like the <= was simply a typo and should have been <, but
the gcc_unreachable was as intended?  (With <= it is trivially
unreachable there).

I think it's best if the original author can chime in - adding Kyrill.

(to be clear, equal bitpos is possible, this issue causes qsort checker errors)


For the uses of this function the order when the bitpos is the same
does not matter, I just wanted to avoid returning zero to avoid perturbations
due to qsort.
IMO the right thing to do here to avoid the qsort checker errors is to break 
the tie between
store_immediate_info objects with equal bitpos by using the sort_by_order 
heuristic
i.e. something like "return (*tmp)->order - (*tmp2)->order;".
That should give well-behaved results as the order of two store_immediate_info 
objects
is guaranteed not to be the same (otherwise something has gone wrong elsewhere).

Thanks,
Kyrill


Alexander




Re: [PATCH] Optimize BB sorting in domwalk

2017-07-25 Thread Richard Biener
On Tue, Jul 25, 2017 at 10:22 AM, Alexander Monakov  wrote:
> On Mon, 24 Jul 2017, Jeff Law wrote:
>> As Uli noted, we should be using std::swap.
>>
>> Can you please repost ?
>
> * domwalk.c (cmp_bb_postorder): Simplify.
> (sort_bbs_postorder): New function.  Use it...
> (dom_walker::walk): ...here to optimize common cases.

Ok.  Note that I think vec::qsort might benefit from handling length
() == 2 and domwalk
might benefit from using vec::qsort if it would work on a sub-vector
as we are asking for
here...

Richard.

> ---
>  gcc/domwalk.c | 53 -
>  1 file changed, 36 insertions(+), 17 deletions(-)
>
> diff --git a/gcc/domwalk.c b/gcc/domwalk.c
> index a0daae6b2d8..df51b8c4451 100644
> --- a/gcc/domwalk.c
> +++ b/gcc/domwalk.c
> @@ -128,19 +128,46 @@ along with GCC; see the file COPYING3.  If not see
>  which is currently an abstraction over walking tree statements.  Thus
>  the dominator walker is currently only useful for trees.  */
>
> +/* Reverse postorder index of each basic block.  */
>  static int *bb_postorder;
>
>  static int
>  cmp_bb_postorder (const void *a, const void *b)
>  {
> -  basic_block bb1 = *(basic_block *)const_cast(a);
> -  basic_block bb2 = *(basic_block *)const_cast(b);
> -  if (bb1->index == bb2->index)
> -return 0;
> +  basic_block bb1 = *(const basic_block *)(a);
> +  basic_block bb2 = *(const basic_block *)(b);
> +  int n1 = bb_postorder[bb1->index], n2 = bb_postorder[bb2->index];
>/* Place higher completion number first (pop off lower number first).  */
> -  if (bb_postorder[bb1->index] > bb_postorder[bb2->index])
> -return -1;
> -  return 1;
> +  return (n1 < n2) - (n1 > n2);
> +}
> +
> +/* Permute array BBS of N basic blocks in postorder,
> +   i.e. by descending number in BB_POSTORDER array.  */
> +
> +static void
> +sort_bbs_postorder (basic_block *bbs, int n)
> +{
> +  if (__builtin_expect (n == 2, true))
> +{
> +  basic_block bb0 = bbs[0], bb1 = bbs[1];
> +  if (bb_postorder[bb0->index] < bb_postorder[bb1->index])
> +   bbs[0] = bb1, bbs[1] = bb0;
> +}
> +  else if (__builtin_expect (n == 3, true))
> +{
> +  basic_block bb0 = bbs[0], bb1 = bbs[1], bb2 = bbs[2];
> +  if (bb_postorder[bb0->index] < bb_postorder[bb1->index])
> +   std::swap (bb0, bb1);
> +  if (bb_postorder[bb1->index] < bb_postorder[bb2->index])
> +   {
> + std::swap (bb1, bb2);
> + if (bb_postorder[bb0->index] < bb_postorder[bb1->index])
> +   std::swap (bb0, bb1);
> +   }
> +  bbs[0] = bb0, bbs[1] = bb1, bbs[2] = bb2;
> +}
> +  else
> +qsort (bbs, n, sizeof *bbs, cmp_bb_postorder);
>  }
>
>  /* Constructor for a dom walker.
> @@ -284,16 +311,8 @@ dom_walker::walk (basic_block bb)
>   for (dest = first_dom_son (m_dom_direction, bb);
>dest; dest = next_dom_son (m_dom_direction, dest))
> worklist[sp++] = dest;
> - if (m_dom_direction == CDI_DOMINATORS)
> -   switch (sp - saved_sp)
> - {
> - case 0:
> - case 1:
> -   break;
> - default:
> -   qsort (&worklist[saved_sp], sp - saved_sp,
> -  sizeof (basic_block), cmp_bb_postorder);
> - }
> + if (sp - saved_sp > 1 && m_dom_direction == CDI_DOMINATORS)
> +   sort_bbs_postorder (&worklist[saved_sp], sp - saved_sp);
> }
>/* NULL is used to mark pop operations in the recursion stack.  */
>while (sp > 0 && !worklist[sp - 1])
> --
> 2.13.3


Re: [PATCH 2/6] gimple-ssa-store-merging.c: fix sort_by_bitpos

2017-07-25 Thread Richard Biener
On Tue, Jul 25, 2017 at 10:34 AM, Kyrill Tkachov
 wrote:
> Hi,
>
> On 24/07/17 21:48, Alexander Monakov wrote:
>>
>> On Sat, 22 Jul 2017, Segher Boessenkool wrote:
>>
>>> On Sat, Jul 15, 2017 at 11:47:45PM +0300, Alexander Monakov wrote:

 --- a/gcc/gimple-ssa-store-merging.c
 +++ b/gcc/gimple-ssa-store-merging.c
 @@ -516,12 +516,12 @@ sort_by_bitpos (const void *x, const void *y)
 store_immediate_info *const *tmp = (store_immediate_info * const *)
 x;
 store_immediate_info *const *tmp2 = (store_immediate_info * const *)
 y;
   -  if ((*tmp)->bitpos <= (*tmp2)->bitpos)
 +  if ((*tmp)->bitpos < (*tmp2)->bitpos)
   return -1;
 else if ((*tmp)->bitpos > (*tmp2)->bitpos)
   return 1;
 -
 -  gcc_unreachable ();
 +  else
 +return 0;
   }
>>>
>>> Does any sort using this comparison function require the sort to be
>>> stable?
>>>
>>> It looks like the <= was simply a typo and should have been <, but
>>> the gcc_unreachable was as intended?  (With <= it is trivially
>>> unreachable there).
>>
>> I think it's best if the original author can chime in - adding Kyrill.
>>
>> (to be clear, equal bitpos is possible, this issue causes qsort checker
>> errors)
>
>
> For the uses of this function the order when the bitpos is the same
> does not matter, I just wanted to avoid returning zero to avoid
> perturbations
> due to qsort.
> IMO the right thing to do here to avoid the qsort checker errors is to break
> the tie between
> store_immediate_info objects with equal bitpos by using the sort_by_order
> heuristic
> i.e. something like "return (*tmp)->order - (*tmp2)->order;".
> That should give well-behaved results as the order of two
> store_immediate_info objects
> is guaranteed not to be the same (otherwise something has gone wrong
> elsewhere).

Agreed.

Richard.

> Thanks,
> Kyrill
>
>> Alexander
>
>


[PATCH] Fix movti_internal with -mavx512bw -mno-avx512dq (PR target/81532)

2017-07-25 Thread Jakub Jelinek
Hi!

As mentioned in the PR, the following testcases ICE if AVX512BW/VL is enabled,
but AVX512DQ is not.  The problem is that for movti_internal we need to use
vpextrq or vpinsrq instructions to extract or insert the 64-bit reg from
above the lowest 64-bit and if it is in %xmm16+ register, we need EVEX
encoded insn to do that.  Only vpextr[bw] and vpinsr[bw] are in AVX512BW,
the rest is AVX512DQ.

Yd/Ye constraints are only used in movdi_internal and movti_internal
patterns, in the former only for !TARGET_64BIT (then ALL_SSE_REGS vs.
SSE_REGS makes no difference), in the latter is this case I'm trying to fix.

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

2017-07-25  Jakub Jelinek  

PR target/81532
* config/i386/constraints.md (Yd, Ye): Use ALL_SSE_REGS for
TARGET_AVX512DQ rather than TARGET_AVX512BW.

* gcc.target/i386/pr80833-3.c: New test.
* gcc.target/i386/avx512dq-pr81532.c: New test.
* gcc.target/i386/avx512bw-pr81532.c: New test.

--- gcc/config/i386/constraints.md.jj   2017-06-02 09:01:21.0 +0200
+++ gcc/config/i386/constraints.md  2017-07-24 12:58:06.778105305 +0200
@@ -138,19 +138,19 @@ (define_register_constraint "Yj"
 
 (define_register_constraint "Yd"
  "TARGET_INTER_UNIT_MOVES_TO_VEC
-  ? (TARGET_AVX512BW
+  ? (TARGET_AVX512DQ
  ? ALL_SSE_REGS
  : (TARGET_SSE4_1 ? SSE_REGS : NO_REGS))
   : NO_REGS"
- "@internal Any EVEX encodable SSE register (@code{%xmm0-%xmm31}) for AVX512BW 
target or any SSE register for SSE4_1 target, when inter-unit moves to vector 
registers are enabled.")
+ "@internal Any EVEX encodable SSE register (@code{%xmm0-%xmm31}) for AVX512DQ 
target or any SSE register for SSE4_1 target, when inter-unit moves to vector 
registers are enabled.")
 
 (define_register_constraint "Ye"
  "TARGET_INTER_UNIT_MOVES_FROM_VEC
-  ? (TARGET_AVX512BW
+  ? (TARGET_AVX512DQ
  ? ALL_SSE_REGS
  : (TARGET_SSE4_1 ? SSE_REGS : NO_REGS))
   : NO_REGS"
- "@internal Any EVEX encodable SSE register (@code{%xmm0-%xmm31}) for AVX512BW 
target or any SSE register for SSE4_1 target, when inter-unit moves from vector 
registers are enabled.")
+ "@internal Any EVEX encodable SSE register (@code{%xmm0-%xmm31}) for AVX512DQ 
target or any SSE register for SSE4_1 target, when inter-unit moves from vector 
registers are enabled.")
 
 (define_register_constraint "Ym"
  "TARGET_MMX && TARGET_INTER_UNIT_MOVES_TO_VEC ? MMX_REGS : NO_REGS"
--- gcc/testsuite/gcc.target/i386/pr80833-3.c.jj2017-07-24 
13:20:00.061347826 +0200
+++ gcc/testsuite/gcc.target/i386/pr80833-3.c   2017-07-24 13:23:17.711976351 
+0200
@@ -0,0 +1,14 @@
+/* { dg-do compile { target int128 } } */
+/* { dg-options "-O2 -mavx512dq -mavx512vl -mno-avx512bw -mtune=intel" } */
+
+__int128 test (__int128 a)
+{
+  asm ("" : "+v" (a) : : "xmm0", "xmm1", "xmm2", "xmm3",
+"xmm4", "xmm5", "xmm6", "xmm7",
+"xmm8", "xmm9", "xmm10", "xmm11",
+"xmm12", "xmm13", "xmm14", "xmm15");
+  return a;
+}
+
+/* { dg-final { scan-assembler "pinsrq" } } */
+/* { dg-final { scan-assembler "pextrq" } } */
--- gcc/testsuite/gcc.target/i386/avx512dq-pr81532.c.jj 2017-07-24 
13:12:09.165997780 +0200
+++ gcc/testsuite/gcc.target/i386/avx512dq-pr81532.c2017-07-24 
13:13:59.950668550 +0200
@@ -0,0 +1,12 @@
+/* PR target/81532 */
+/* { dg-do compile { target int128 } } */
+/* { dg-options "-O2 -mavx512dq -mavx512vl -mno-avx512bw" } */
+
+typedef unsigned __int128 V __attribute__ ((vector_size (64)));
+
+V
+foo (V c)
+{
+  c >>= 0 != c;
+  return c;
+}
--- gcc/testsuite/gcc.target/i386/avx512bw-pr81532.c.jj 2017-07-24 
13:12:40.660619897 +0200
+++ gcc/testsuite/gcc.target/i386/avx512bw-pr81532.c2017-07-24 
13:14:20.984416180 +0200
@@ -0,0 +1,5 @@
+/* PR target/81532 */
+/* { dg-do compile { target int128 } } */
+/* { dg-options "-O2 -mavx512bw -mavx512vl -mno-avx512dq" } */
+
+#include "avx512dq-pr81532.c"

Jakub


Re: [PATCH] Optimize BB sorting in domwalk

2017-07-25 Thread Alexander Monakov
On Tue, 25 Jul 2017, Alexander Monakov wrote:
> --- a/gcc/domwalk.c
> +++ b/gcc/domwalk.c
> @@ -128,19 +128,46 @@ along with GCC; see the file COPYING3.  If not see
>  which is currently an abstraction over walking tree statements.  Thus
>  the dominator walker is currently only useful for trees.  */
>  
> +/* Reverse postorder index of each basic block.  */
>  static int *bb_postorder;
>  
>  static int
>  cmp_bb_postorder (const void *a, const void *b)
>  {
> -  basic_block bb1 = *(basic_block *)const_cast(a);
> -  basic_block bb2 = *(basic_block *)const_cast(b);
> -  if (bb1->index == bb2->index)
> -return 0;
> +  basic_block bb1 = *(const basic_block *)(a);
> +  basic_block bb2 = *(const basic_block *)(b);
> +  int n1 = bb_postorder[bb1->index], n2 = bb_postorder[bb2->index];
>/* Place higher completion number first (pop off lower number first).  */
> -  if (bb_postorder[bb1->index] > bb_postorder[bb2->index])
> -return -1;
> -  return 1;
> +  return (n1 < n2) - (n1 > n2);

Actually since the 'int *bb_postorder' array is not going to hold negative
entries, this can be simply

  return n2 - n1;

(the (A > B) - (A < B) formulation is useful in case signed A - B may overflow)

Alexander


[PATCh] Fix a -Wnarrowing warning in Fortran FE

2017-07-25 Thread Jakub Jelinek
Hi!

I've noticed yesterday a -Wnarrowing warning in the Fortran FE that broke
my build (old configure, these days we use -Wno-narrowing, so don't see
that).

We are storing the mask into unsigned int field, and 1 << 31 is negative.
Fixed thusly, bootstrapped/regtested on x86_64-linux and i686-linux, ok for
trunk?

2017-07-25  Jakub Jelinek  

* ioparm.def: Use 1U << 31 instead of 1 << 31 as flags2 mask.

--- gcc/fortran/ioparm.def.jj   2017-01-01 12:45:47.0 +0100
+++ gcc/fortran/ioparm.def  2017-07-24 18:52:43.268946713 +0200
@@ -82,7 +82,7 @@ IOPARM (inquire, read,1 << 27, char2)
 IOPARM (inquire, write,1 << 28, char1)
 IOPARM (inquire, readwrite,1 << 29, char2)
 IOPARM (inquire, convert,  1 << 30, char1)
-IOPARM (inquire, flags2,   1 << 31, int4)
+IOPARM (inquire, flags2,   1U << 31, int4)
 IOPARM (inquire, asynchronous, 1 << 0,  char1)
 IOPARM (inquire, decimal,  1 << 1,  char2)
 IOPARM (inquire, encoding, 1 << 2,  char1)

Jakub


Re: [PATCH][AArch64] vec_pack_trunc_ should split after register allocator

2017-07-25 Thread James Greenhalgh
On Tue, Jul 25, 2017 at 07:30:49AM +, Hurugalawadi, Naveen wrote:
> Hi,
> 
> >> I think we can split this whenever we like, and
> >> that there isn't any benefit in keeping the pair together?
> 
> Thanks for the review and your views.

Thanks for the updated patch, and sorry that I haven't been clear in what
I was asking for.

I was wondering why we could not use an insn_and_split without the
reload_completed guard - there is probably still value in allowing other
optimisation passes to see that we can support:

   [(set (match_operand: 0 "register_operand" "")
 (vec_concat:
   (truncate: (match_operand:VQN 1 "register_operand" "w"))
   (truncate: (match_operand:VQN 2 "register_operand" "w"]

but an expand pattern is not going to permit that.

Could you switch this back to an insn_and_split as it was in the previous
patch, and just drop the && reload_completed ?

Thanks,
James

> 
> The patch is modified as per your suggestion.
> 
> Please review the patch and let me know if its okay?
> 
> Bootstrapped and Regression done on AArch64-Thunder-Linux.
> 
> Thanks,
> Naveen   

> diff --git a/gcc/config/aarch64/aarch64-simd.md 
> b/gcc/config/aarch64/aarch64-simd.md
> index 1cb6eeb..a41edad 100644
> --- a/gcc/config/aarch64/aarch64-simd.md
> +++ b/gcc/config/aarch64/aarch64-simd.md
> @@ -1291,6 +1291,18 @@
>[(set_attr "type" "neon_shift_imm_narrow_q")]
>  )
>  
> +(define_insn "aarch64_simd_vec_pack_trunc_hi_"
> + [(set (match_operand: 0 "register_operand" "=w")
> +   (vec_concat:
> +  (truncate: (match_operand:VQN 1 "register_operand" "w"))
> +  (vec_select:
> +(match_operand: 3 "register_operand" "0")
> +(match_operand: 2 "vect_par_cnst_hi_half" ""]
> + "TARGET_SIMD"
> + "xtn2\\t%0., %1."
> +  [(set_attr "type" "neon_shift_imm_narrow_q")]
> +)
> +
>  (define_expand "vec_pack_trunc_"
>   [(match_operand: 0 "register_operand" "")
>(match_operand:VDN 1 "register_operand" "")
> @@ -1309,20 +1321,39 @@
>  
>  ;; For quads.
>  
> -(define_insn "vec_pack_trunc_"
> - [(set (match_operand: 0 "register_operand" "=&w")
> +(define_expand "vec_pack_trunc_"
> + [(set (match_operand: 0 "register_operand" "")
> (vec_concat:
> -  (truncate: (match_operand:VQN 1 "register_operand" "w"))
> -  (truncate: (match_operand:VQN 2 "register_operand" "w"]
> +  (truncate: (match_operand:VQN 1 "register_operand" ""))
> +  (truncate: (match_operand:VQN 2 "register_operand" ""]
>   "TARGET_SIMD"
>   {
> if (BYTES_BIG_ENDIAN)
> - return "xtn\\t%0., %2.\;xtn2\\t%0., %1.";
> + {
> +   rtx low_part = gen_lowpart (mode, operands[0]);
> +   emit_insn (gen_aarch64_simd_vec_pack_trunc_ (low_part,
> +   operands[2]));
> +   rtx high_part = aarch64_simd_vect_par_cnst_half (mode,
> + true);
> +   emit_insn (gen_aarch64_simd_vec_pack_trunc_hi_ (operands[0],
> +  operands[1],
> +  high_part,
> +  operands[0]));
> + }
> else
> - return "xtn\\t%0., %1.\;xtn2\\t%0., %2.";
> + {
> +   rtx low_part = gen_lowpart (mode, operands[0]);
> +   emit_insn (gen_aarch64_simd_vec_pack_trunc_ (low_part,
> +   operands[1]));
> +   rtx high_part = aarch64_simd_vect_par_cnst_half (mode,
> + true);
> +   emit_insn (gen_aarch64_simd_vec_pack_trunc_hi_ (operands[0],
> +  operands[2],
> +  high_part,
> +  operands[0]));
> + }
> +   DONE;
>   }
> -  [(set_attr "type" "multiple")
> -   (set_attr "length" "8")]
>  )
>  
>  ;; Widening operations.



[PATCH] Switch vec_init and vec_extract optabs to 2 mode optab to allow extraction of vector from vector or initialization of vector from smaller vectors (PR target/80846)

2017-07-25 Thread Jakub Jelinek
Hi!

The following patch adjusts the vec_init and vec_extract optabs, so that
they don't have in the expander names just the vector mode, but also another
mode, for vec_extract the mode of the result and for vec_init the mode of
the elts of the vector passed as second operand.

Without this patch, the second mode has been implicit, GET_MODE_INNER of
the vector mode, so one could just extract a single element from a vector
or construct vector from elements.  While that is most common, we allow
in GIMPLE e.g. construction of V8DImode from 4 V2DImode elements etc.
and the vectorizer uses them.  By having the second mode in the name
it allows the generic code (vectorizer, expansion) to query whether the
backend supports such vector from vector expansions or inits from vector
elts and use them if available.

For vec_extract, if we say want to extract high V2SImode from V4SImode
the fallback is try to expand it as DImode extraction from V2DImode.
This works well in many cases, but doesn't really work for very large
vectors, say if we want to extract high V8SImode from V16SImode on x86,
we'd need OImode extraction from V2OImode, which is something the backend
doesn't have any support for.
For vec_init, the fallback is usually to go through memory, which is slow in
many cases.

This patch only adds new vector from vector extract and init patterns to
the i386 backend, but I had to change many other targets too, because
it needs to have the element mode in the vec_extract/vec_init expander
names.  Seems most of the backends didn't really have a mode attribute
usable for this or had it only in uppercase, while for the names we need
lowercase.  Some backends had a convention on how to name lower case
vs. upper case modes, others didn't have any.  So I'm CCing maintainers
of affected backends to seek advice on what mode attributes they want to
use.

Bootstrapped/regtested on x86_64-linux and i686-linux, where it improves
e.g. the code generation for slp-43.c and slp-45.c testcases.
make cc1 tested in cross-compilers to the remaining targets.

Ok for trunk?

2017-07-25  Jakub Jelinek  

PR target/80846
* optabs.def (vec_extract_optab, vec_init_optab): Change from
a direct optab to conversion optab.
* optabs.c (expand_vector_broadcast): Use convert_optab_handler
with GET_MODE_INNER as last argument instead of optab_handler.
* expmed.c (extract_bit_field_1): Likewise.  Use vector from
vector extraction if possible and optab is available.
* expr.c (store_constructor): Use convert_optab_handler instead
of optab_handler.  Use vector initialization from smaller
vectors if possible and optab is available.
* tree-vect-stmts.c (vectorizable_load): Likewise.
* doc/md.texi (vec_extract, vec_init): Document that the optabs
now have two modes.
* config/i386/i386.c (ix86_expand_vector_init): Handle expansion
of vec_init from half-sized vectors with the same element mode.
* config/i386/sse.md (ssehalfvecmode): Add V4TI case.
(ssehalfvecmodelower, ssescalarmodelower): New mode attributes.
(reduc_plus_scal_v8df, reduc_plus_scal_v4df, reduc_plus_scal_v2df,
reduc_plus_scal_v16sf, reduc_plus_scal_v8sf, reduc_plus_scal_v4sf,
reduc__scal_, reduc_umin_scal_v8hi): Add element mode
after mode in gen_vec_extract* calls.
(vec_extract): Renamed to ...
(vec_extract): ... this.
(vec_extract): New expander.
(rotl3, rotr3, 3, ashrv2di3): Add
element mode after mode in gen_vec_init* calls.
(VEC_INIT_HALF_MODE): New mode iterator.
(vec_init): Renamed to ...
(vec_init): ... this.
(vec_init): New expander.
* config/i386/mmx.md (vec_extractv2sf): Renamed to ...
(vec_extractv2sfsf): ... this.
(vec_initv2sf): Renamed to ...
(vec_initv2sfsf): ... this.
(vec_extractv2si): Renamed to ...
(vec_extractv2sisi): ... this.
(vec_initv2si): Renamed to ...
(vec_initv2sisi): ... this.
(vec_extractv4hi): Renamed to ...
(vec_extractv4hihi): ... this.
(vec_initv4hi): Renamed to ...
(vec_initv4hihi): ... this.
(vec_extractv8qi): Renamed to ...
(vec_extractv8qiqi): ... this.
(vec_initv8qi): Renamed to ...
(vec_initv8qiqi): ... this.
* config/rs6000/vector.md (VEC_base_l): New mode attribute.
(vec_init): Renamed to ...
(vec_init): ... this.
(vec_extract): Renamed to ...
(vec_extract): ... this.
* config/rs6000/paired.md (vec_initv2sf): Renamed to ...
(vec_initv2sfsf): ... this.
* config/rs6000/altivec.md (splitter, altivec_copysign_v4sf3,
vec_unpacku_hi_v16qi, vec_unpacku_hi_v8hi, vec_unpacku_lo_v16qi,
vec_unpacku_lo_v8hi, mulv16qi3, altivec_vreve2): Add
element mode after mode in gen_vec_init* calls.
* config/aarch64/aa

[RFC] If conversion min/max search, costs and problems

2017-07-25 Thread Robin Dapp
Hi,

recently I wondered why a snippet like the following is not being
if-converted at all on s390:

int foo (int *a, unsigned int n)
{
  int min = 99;
  int bla = 0;
  for (int i = 0; i < n; i++)
{
  if (a[i] < min)
{
  min = a[i];
  bla = 1;
}
}

  if (bla)
min += 1;
  return min;
}

After skimming through ifcvt's code, it turned out that there are two
paths that could handle such a situation.  At first I looked at
cond_move_process_if_block ().  One condition to check if the block is
suitable for processing is

  if (reg_overlap_mentioned_p (dest, cond))
return FALSE;

i.e. abort if the destination of a conditional move appears in the
condition.  This means we do not handle minimum or maximum searches like
if (a < min) min = a; with this.  That might be intentional as
noce_try_minmax () can handle it but for me the reason was not entirely
obvious.

Disabling the condition above for the snippet

 if (a[i] < min)
 {
   min = a[i];
   bla = 1;
 }

yields assembly like the following on i386:

 cmpl%edx, %ecx
 cmovg   %ecx, %edx
 cmpl%edx, %ecx
 cmovg   %eax, %esi

meaning the compare is emitted multiple times and a write to a register
used by the compare is obviously wrong.

The compares are produced by noce_emit_cmove () calling
emit_conditional_move () which, in turn, seems to prepare and emit a
comparison every time it emits a conditional move.  Is this really
necessary or the correct thing to do?

--

The second ifcvt path uses noce_convert_multiple_sets () whose
preconditions don't seem as strict.  The conversion result is never
considered however, because costs are estimated as higher than the
non-converted version.  The cost estimation seems odd to me at best though:

Before noce_process_if_block () the original costs are set to
 if_info.original_cost = COSTS_N_INSNS (2);
but the adding of the then/else block costs is only performed after
noce_convert_multiple_sets () i.e. the costs after conversion will never
be lower than the original costs.  When artifically lowering the costs
we end up with the same assembly sequence as above including the
superfluous compares.

To address these issues I came up with a tentative patch that
 - adds cost for the original then basic block and adapts s390's
noce_conversion_profitable_p () hook to allow processing (magic value
for now).
 - extracts the compare from the first-emitted conditional move and uses
it for the subsequent conditional moves getting rid of all but the first
compare.

Test suite on s390 and i386 looks ok.

Some questions/remarks, comments welcome:
 - ifcvt performs creates things that it expects other passes to clean
up afterwards.  In the case of the superfluous compares this might be
possible but the code is actually wrong and we cannot rely on a pass
fixing wrong code.
 - The extraction of the compare from the conditional move is pretty
ad-hoc and pattern-dependent currently.  Is there a way to do it in a
more backend-independent fashion?
 - Branch mispredict costs don't seem to play a major role in ifcvt.
Shouldn't they be accounted for in all cases, maybe weighted by bb
probabilities?

Regards
 Robin
diff --git a/gcc/config/s390/s390.c b/gcc/config/s390/s390.c
index 0c72098..e9cdf5d 100644
--- a/gcc/config/s390/s390.c
+++ b/gcc/config/s390/s390.c
@@ -78,6 +78,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "rtl-iter.h"
 #include "intl.h"
 #include "tm-constrs.h"
+#include "ifcvt.h"
 
 /* This file should be included last.  */
 #include "target-def.h"
@@ -15671,6 +15672,22 @@ s390_asan_shadow_offset (void)
   return TARGET_64BIT ? HOST_WIDE_INT_1U << 52 : HOST_WIDE_INT_UC (0x2000);
 }
 
+static bool
+s390_noce_conversion_profitable_p (rtx_insn *seq, struct noce_if_info *if_info)
+{
+  bool speed_p = if_info->speed_p;
+
+  /* Cost up the new sequence.  */
+  unsigned int cost = seq_cost (seq, speed_p) - 5;
+
+  if (cost <= if_info->original_cost)
+return true;
+
+  /* When compiling for size, we can make a reasonably accurately guess
+ at the size growth.  When compiling for speed, use the maximum.  */
+  return speed_p && cost <= if_info->max_seq_cost;
+}
+
 /* Initialize GCC target structure.  */
 
 #undef  TARGET_ASM_ALIGNED_HI_OP
@@ -15902,6 +15919,9 @@ s390_asan_shadow_offset (void)
 #undef TARGET_VECTORIZE_SUPPORT_VECTOR_MISALIGNMENT
 #define TARGET_VECTORIZE_SUPPORT_VECTOR_MISALIGNMENT s390_support_vector_misalignment
 
+#undef TARGET_NOCE_CONVERSION_PROFITABLE_P
+#define TARGET_NOCE_CONVERSION_PROFITABLE_P s390_noce_conversion_profitable_p
+
 #undef TARGET_VECTOR_ALIGNMENT
 #define TARGET_VECTOR_ALIGNMENT s390_vector_alignment
 
diff --git a/gcc/ifcvt.c b/gcc/ifcvt.c
index fd682a4..e0b3f48 100644
--- a/gcc/ifcvt.c
+++ b/gcc/ifcvt.c
@@ -772,6 +772,8 @@ static int noce_try_store_flag_constants (struct noce_if_info *);
 static int noce_try_store_flag_mask (struct noce_if_info *);
 static rtx noce_emit_cmove (struct noce_if_info *, rtx, en

Patch ping

2017-07-25 Thread Jakub Jelinek
Hi!

I'd like to ping 2 patches:

- UBSAN -fsanitize=pointer-overflow support
  - http://gcc.gnu.org/ml/gcc-patches/2017-06/msg01365.html

- noipa attribute addition  
   
  http://gcc.gnu.org/ml/gcc-patches/2016-12/msg01501.html   
   

Jakub


Re: c-family PATCH to improve -Wsign-compare (PR c/81417)

2017-07-25 Thread Marek Polacek
Ping.

On Fri, Jul 14, 2017 at 03:27:21PM +0200, Marek Polacek wrote:
> On Thu, Jul 13, 2017 at 04:59:20PM -0400, David Malcolm wrote:
> > On Thu, 2017-07-13 at 16:39 -0400, Eric Gallager wrote:
> > > On 7/13/17, David Malcolm  wrote:
> > > > On Thu, 2017-07-13 at 18:33 +0200, Marek Polacek wrote:
> > > > > A tiny patch for -Wsign-compare so that is also prints the types
> > > > > when
> > > > > reporting a warning.
> > > > > 
> > > > > Bootstrapped/regtested on x86_64-linux and ppc64le-redhat-linux,
> > > > > ok
> > > > > for trunk?
> > > > 
> > > > Looks like it always display the types in the order signed then
> > > > unsigned, which matches the text of the diagnostic, but not
> > > > necessarily
> > > > the ordering within the expression, which might be confusing if
> > > > someone's comparing e.g.
> > > > 
> > > >   unsigned_a < signed_b
> > > > 
> > > 
> > > Good catch, I forgot about that case when opening the original bug
> > > that Marek posted this patch for...
> > > 
> > > > But we already hardcode the ordering within the text of the
> > > > diagnostic,
> > > > so that feels excessively nit-picky.
> > > 
> > > I don't think it's being excessively nit-picky; I think it'd make
> > > more
> > > sense to match the ordering of the expression. That's what clang
> > > does:
> > > 
> > > $ cat Wsign_compare.c
> > > /* { dg-do compile } */
> > > 
> > > int foo(signed int a, unsigned int b)
> > > {
> > >   return (a < b);
> > > }
> > > 
> > > int bar(unsigned int c, signed int d)
> > > {
> > >   return (c < d);
> > > }
> > > 
> > > $ /sw/opt/llvm-3.1/bin/clang -c -Wsign-compare Wsign_compare.c
> > > Wsign_compare.c:5:12: warning: comparison of integers of different
> > > signs: 'int' and 'unsigned int' [-Wsign-compare]
> > > return (a < b);
> > > ~ ^ ~
> > > Wsign_compare.c:10:12: warning: comparison of integers of different
> > > signs: 'unsigned int' and 'int' [-Wsign-compare]
> > > return (c < d);
> > > ~ ^ ~
> > > 2 warnings generated.
> > 
> > That's much nicer.
> > 
> > > 
> > > > 
> > > > OK for trunk (with my "diagnostic messages" maintainer hat on).
> > 
> > Marek: I take it back; can you update the patch accordingly, please?
> > 
> > (Note to self: always doublecheck the linked PR for context).
> 
> Sure, here goes:
> 
> Bootstrapped/regtested on x86_64-linux, ok for trunk?
> 
> 2017-07-14  Marek Polacek  
> 
>   PR c/81417
>   * c-warn.c (warn_for_sign_compare): Print the types.
> 
>   * c-c++-common/Wsign-compare-1.c: New test.
> 
> diff --git gcc/c-family/c-warn.c gcc/c-family/c-warn.c
> index b9378c2dbe2..7ff11821453 100644
> --- gcc/c-family/c-warn.c
> +++ gcc/c-family/c-warn.c
> @@ -1891,9 +1891,10 @@ warn_for_sign_compare (location_t location,
>  c_common_signed_type (base_type)))
>   /* OK */;
>else
> - warning_at (location,
> - OPT_Wsign_compare,
> - "comparison between signed and unsigned integer 
> expressions");
> + warning_at (location, OPT_Wsign_compare,
> + "comparison between signed and unsigned integer "
> + "expressions: %qT and %qT", TREE_TYPE (orig_op0),
> + TREE_TYPE (orig_op1));
>  }
>  
>/* Warn if two unsigned values are being compared in a size larger
> diff --git gcc/testsuite/c-c++-common/Wsign-compare-1.c 
> gcc/testsuite/c-c++-common/Wsign-compare-1.c
> index e69de29bb2d..0e01453e7d8 100644
> --- gcc/testsuite/c-c++-common/Wsign-compare-1.c
> +++ gcc/testsuite/c-c++-common/Wsign-compare-1.c
> @@ -0,0 +1,27 @@
> +/* PR c/81417 */
> +/* { dg-do compile } */
> +/* { dg-options "-Wsign-compare" } */
> +
> +int
> +fn1 (signed int a, unsigned int b)
> +{
> +  return a < b; /* { dg-warning "comparison between signed and unsigned 
> integer expressions: 'int' and 'unsigned int'" } */
> +}
> +
> +int
> +fn2 (signed int a, unsigned int b)
> +{
> +  return b < a; /* { dg-warning "comparison between signed and unsigned 
> integer expressions: 'unsigned int' and 'int'" } */
> +}
> +
> +int
> +fn3 (signed long int a, unsigned long int b)
> +{
> +  return b < a; /* { dg-warning "comparison between signed and unsigned 
> integer expressions: 'long unsigned int' and 'long int'" } */
> +}
> +
> +int
> +fn4 (signed short int a, unsigned int b)
> +{
> +  return b < a; /* { dg-warning "comparison between signed and unsigned 
> integer expressions: 'unsigned int' and 'short int'" } */
> +}

Marek


[patch,avr,committed] avr-log.c: Print NULL_TREE as ""

2017-07-25 Thread Georg-Johann Lay

https://gcc.gnu.org/r250495

Committed $subject as obvious.

Johann

gcc/
* config/avr/avr-log.c (avr_log_vadump) ['T']: Print NULL-TREE.

Index: config/avr/avr-log.c
===
--- config/avr/avr-log.c(revision 250494)
+++ config/avr/avr-log.c(revision 250495)
@@ -146,7 +146,13 @@ avr_log_vadump (FILE *file, const char *
   }

 case 'T':
-  print_node_brief (file, "", va_arg (ap, tree), 3);
+  {
+tree t = va_arg (ap, tree);
+if (NULL_TREE == t)
+  fprintf (file, "");
+else
+  print_node_brief (file, "", t, 3);
+  }
   break;

 case 'd':


Re: [patch, fortran] Fix PR 79312, find invalid typespec for empty array constructors

2017-07-25 Thread Thomas Koenig

I wrote:


Regression-testing as I write this.  OK for trunk if it passes?


Regression test passed.

Regards

Thomas


Re: [PATCH] Move static chain and non-local goto init after NOTE_INSN_FUNCTION_BEG (PR sanitize/81186).

2017-07-25 Thread Martin Liška
PING^1

Jakub can you please take a look? I would like to have it in 7.2 if possible.

Thanks,
Martin

On 07/18/2017 10:38 AM, Martin Liška wrote:
> On 07/17/2017 03:15 PM, Michael Matz wrote:
>> Hello,
>>
>> On Mon, 17 Jul 2017, Martin Liška wrote:
>>
>>> which does all the stack preparation (including the problematic call to 
>>> __asan_stack_malloc_N).
>>>
>>> Note that this code still should be placed before parm_birth_note as we 
>>> cant's say that params are ready before a fake stack is prepared.
>>
>> Yes, understood.
>>
>>> Then we generate code that loads the implicit chain argument:
>>>
>>> (gdb) p debug_rtx_list(get_insns(), 100)
>>> (note 1 0 37 NOTE_INSN_DELETED)
>>>
>>> (note 37 1 38 NOTE_INSN_FUNCTION_BEG)
>>>
>>> (insn 38 37 39 (set (reg/f:DI 94 [ CHAIN.1 ])
>>> (reg:DI 39 r10 [ CHAIN.1 ])) 
>>> "/home/marxin/Programming/gcc/gcc/testsuite/gcc.dg/asan/pr81186.c":9 -1
>>>  (nil))
>>>
>>> (insn 39 38 0 (set (mem/c:DI (plus:DI (reg/f:DI 82 virtual-stack-vars)
>>> (const_int -584 [0xfdb8])) [0  S8 A64])
>>> (reg:DI 39 r10 [ CHAIN.1 ])) 
>>> "/home/marxin/Programming/gcc/gcc/testsuite/gcc.dg/asan/pr81186.c":9 -1
>>>  (nil))
>>>
>>> Which is problematic as using virtual-stack-vars which should point to 
>>> fake stack done by AddressSanitizer in __asan_stack_malloc_N.
>>
>> If anything, then only the stack access is problematic, i.e. the last 
>> instruction.  I don't understand why that should be problematic, though.  
> 
> Hi.
> 
> Thanks one more time, it's really educative this PR and whole problematic of 
> function prologue.
> So short answer for your email: marking parm_birth_insn after static chain 
> init solves the problem :)
> It's because:
> 
> (insn 2 1 3 (set (reg/f:DI 100 [ CHAIN.2 ])
> (reg:DI 39 r10 [ CHAIN.2 ])) "/tmp/nested.c":6 -1
>  (nil))
> 
> (insn 3 2 4 (set (mem/c:DI (plus:DI (reg/f:DI 82 virtual-stack-vars)
> (const_int -8 [0xfff8])) [0  S8 A64])
> (reg:DI 39 r10 [ CHAIN.2 ])) "/tmp/nested.c":6 -1
>  (nil))
> 
> is just storage of &FRAME.0 from caller where content of the FRAME struct 
> lives on stack (and thus on
> shadow stack). That said it's perfectly fine to store &CHAIN to real stack of 
> callee.
> 
> Thus I'm going to test attached patch.
> 
> P.S. One interesting side effect of how static chain is implemented:
> 
> Consider:
> 
> int
> main ()
> {
>   __label__ l;
>   int buffer[100];
>   void f ()
>   {
> int a[123];
> *(&buffer[0] - 4) = 123;
> 
> goto l;
>   }
> 
>   f ();
> l:
>   return 0;
> }
> 
> It's funny that *(&buffer[0] - 4) actually corrupts __nl_goto_buf and we end 
> up with a
> dead signal:
> 
> ASAN:DEADLYSIGNAL
> =
> ==30888==ERROR: AddressSanitizer: SEGV on unknown address 0x (pc 
> 0x bp 0x sp 0x7ffe049b T0)
> 
> Thanks,
> Martin
> 
>> Probably because I don't know much about the ASAN implementation.  But why 
>> should there be something magic about using the non-asan stack?  Most 
>> local variable accesses are rewritten to be in terms of the fake stack, 
>> but those that aren't could use the normal stack just fine, can't they?
>>
>> If that really is a problem then that could also be rectified by splitting 
>> the static_chain_decl in expand_function_start a bit, ala this:
>>
>>   if (cfun->static_chain_decl) {
>> all code except the last "if (!optimize) store-into-stack"
>>   }
>>   emit_note; parm_birth_insn = ...
>>   if (cfun->static_chain_decl && !optimize) {
>> store into assign_stack_local
>>   }
>>
>> (requires moving some local variable to an outer scope, but hey).
>>
>> But what you say above mystifies me.  You claim that access via 
>> virtual-stack-vars is problematic before the shadow stack is created by 
>> ASAN.  But the whole parameter setup always uses such local stack storage 
>> whenever it needs.  And those definitely happen before the ASAN setup.  
>> See the subroutines of assign_parms, (e.g. assign_parm_setup_block and 
>> assign_parm_setup_stack).  You might need to use special function argument 
>> types or special ABIs to trigger this, though you should be able to find 
>> some cases to trigger also on i386 or x86_64.
>>
>> So, if the stack access for the static chain is problematic I don't see 
>> why the stack accesses for the parameters are not.  And if they indeed are 
>> problematic, then something is confused within ASAN, and the fix for that 
>> confusion is not to move parm_birth_insn, but something else (I can't say 
>> what, as I don't know much about how ASAN is supposed to work in such 
>> situations).
>>
>>
>> Ciao,
>> Michael.
>>
> 



Re: [PATCH] Fix ifunc and resolver (PR ipa/81213).

2017-07-25 Thread Martin Liška
PING^2

On 07/11/2017 03:51 PM, Martin Liška wrote:
> PING^1
> 
> Martin
> 
> On 06/30/2017 10:47 AM, Martin Liška wrote:
>> Hello.
>>
>> Following patch does refactoring of make_resolver_func where ifunc
>> alias and resolver were probably confused.
>>
>> Patch can bootstrap on ppc64le-redhat-linux and survives regression tests.
>> i386.exp tests work on x86_64-linux-gnu.
>>
>> Ready to be installed?
>> Martin
>>
>> gcc/ChangeLog:
>>
>> 2017-06-29  Martin Liska  
>>
>> PR ipa/81213
>> * config/i386/i386.c (make_resolver_func): Do complete
>> refactoring of the function.
>>
>> gcc/testsuite/ChangeLog:
>>
>> 2017-06-29  Martin Liska  
>>
>> PR ipa/81213
>> * gcc.target/i386/pr81213.c: New test.
>> ---
>>   gcc/config/i386/i386.c  | 37 
>> -
>>   gcc/testsuite/gcc.target/i386/pr81213.c | 19 +
>>   2 files changed, 37 insertions(+), 19 deletions(-)
>>   create mode 100644 gcc/testsuite/gcc.target/i386/pr81213.c
>>
>>
> 



Re: [PATCh] Fix a -Wnarrowing warning in Fortran FE

2017-07-25 Thread Thomas Koenig

Hi Jakub,


I've noticed yesterday a -Wnarrowing warning in the Fortran FE that broke
my build (old configure, these days we use -Wno-narrowing, so don't see
that).

We are storing the mask into unsigned int field, and 1 << 31 is negative.
Fixed thusly, bootstrapped/regtested on x86_64-linux and i686-linux, ok for
trunk?

2017-07-25  Jakub Jelinek  

* ioparm.def: Use 1U << 31 instead of 1 << 31 as flags2 mask.

--- gcc/fortran/ioparm.def.jj   2017-01-01 12:45:47.0 +0100
+++ gcc/fortran/ioparm.def  2017-07-24 18:52:43.268946713 +0200
@@ -82,7 +82,7 @@ IOPARM (inquire, read,1 << 27, char2)
  IOPARM (inquire, write,   1 << 28, char1)
  IOPARM (inquire, readwrite,   1 << 29, char2)
  IOPARM (inquire, convert, 1 << 30, char1)
-IOPARM (inquire, flags2,   1 << 31, int4)
+IOPARM (inquire, flags2,   1U << 31, int4)
  IOPARM (inquire, asynchronous,1 << 0,  char1)
  IOPARM (inquire, decimal, 1 << 1,  char2)
  IOPARM (inquire, encoding,1 << 2,  char1)


The patch is OK.

Thanks!

Thomas


Re: [ARM, VXworks] Fix build

2017-07-25 Thread Olivier Hainque
Hi Richard,

(back from a few days away)

> On Jul 17, 2017, at 12:01 , Eric Botcazou  wrote:
> 
>> That's good news.  Does that mean we'll be able to drop the old stuff
>> though?  I'd really like to make progress towards removing the old ABI
>> support from GCC.
> 
> Yes, I'd think so, but Olivier knows better.
> 
>> Note that considering a port for deprecation is only the first step.  It
>> does not mean that it has been deprecated.  Sometimes the only way to
>> find out if a port really is wanted is to make such a threat...
> 
> No disagreement. ;-)

In this instance, no doubt we want to keep the port in. As I mentioned
off-list, the port is still very active from our (AdaCore) perspective,
I have just recently been enlisted as maintainer and plan to contribute
significant updates.

Part of that is support for VxWorks 7, which has switched to the new
ABI.

Regarding removal of old ABI support, which release were you
targeting ?

On the VxWorks front, where we adapt to what the system toolchains
do, it will mean dropping support for VxWorks versions prior to 7,
which is not so old - couple of years I think. Not the end of the
world, but an extra release cycle can make a difference.

Is VxWorks alone in this category ?

Olivier







[PATCH] Fix PR81510

2017-07-25 Thread Richard Biener

The following fixes the issue that if the reduction stmt (the stmt
defining the backedge value of a loop PHI) is not in the loop
then this isn't a reduction thus we shouldn't continue analysis.

Bootstrapped and tested on x86_64-unknown-linux-gnu, applied to trunk.

Richard.

2017-07-25  Richard Biener  

PR tree-optimization/81510
* tree-vect-loop.c (vect_is_simple_reduction): When the
reduction stmt is not inside the loop bail out.

* gcc.dg/torture/pr81510.c: New testcase.
* gcc.dg/torture/pr81510-2.c: Likewise.

Index: gcc/tree-vect-loop.c
===
--- gcc/tree-vect-loop.c(revision 250493)
+++ gcc/tree-vect-loop.c(working copy)
@@ -2813,27 +2813,29 @@ vect_is_simple_reduction (loop_vec_info
   return NULL;
 }
 
+  if (! flow_bb_inside_loop_p (loop, gimple_bb (def_stmt)))
+return NULL;
+
   nloop_uses = 0;
   auto_vec lcphis;
-  if (flow_bb_inside_loop_p (loop, gimple_bb (def_stmt)))
-FOR_EACH_IMM_USE_FAST (use_p, imm_iter, name)
-  {
-   gimple *use_stmt = USE_STMT (use_p);
-   if (is_gimple_debug (use_stmt))
- continue;
-   if (flow_bb_inside_loop_p (loop, gimple_bb (use_stmt)))
- nloop_uses++;
-   else
- /* We can have more than one loop-closed PHI.  */
- lcphis.safe_push (as_a  (use_stmt));
-   if (nloop_uses > 1)
- {
-   if (dump_enabled_p ())
- dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
-  "reduction used in loop.\n");
-   return NULL;
- }
-  }
+  FOR_EACH_IMM_USE_FAST (use_p, imm_iter, name)
+{
+  gimple *use_stmt = USE_STMT (use_p);
+  if (is_gimple_debug (use_stmt))
+   continue;
+  if (flow_bb_inside_loop_p (loop, gimple_bb (use_stmt)))
+   nloop_uses++;
+  else
+   /* We can have more than one loop-closed PHI.  */
+   lcphis.safe_push (as_a  (use_stmt));
+  if (nloop_uses > 1)
+   {
+ if (dump_enabled_p ())
+   dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
+"reduction used in loop.\n");
+ return NULL;
+   }
+}
 
   /* If DEF_STMT is a phi node itself, we expect it to have a single argument
  defined in the inner loop.  */
Index: gcc/testsuite/gcc.dg/torture/pr81510.c
===
--- gcc/testsuite/gcc.dg/torture/pr81510.c  (revision 0)
+++ gcc/testsuite/gcc.dg/torture/pr81510.c  (working copy)
@@ -0,0 +1,29 @@
+/* { dg-do compile } */
+/* { dg-additional-options "-w" } */
+
+typedef int d;
+typedef int f;
+typedef long h;
+int a;
+int b;
+int c;
+int e()
+{
+  f *g;
+  h i;
+  for (;;)
+if (g)
+  for (; b; b++) {
+ g = c;
+ if (a &= c) {
+ d *j = &b;
+ h k;
+ for (; i; i++) {
+ *g ?: (*j = k);
+ g = &a;
+ }
+ for (; i <= 3; i++)
+   ;
+ }
+  }
+}
Index: gcc/testsuite/gcc.dg/torture/pr81510-2.c
===
--- gcc/testsuite/gcc.dg/torture/pr81510-2.c(nonexistent)
+++ gcc/testsuite/gcc.dg/torture/pr81510-2.c(working copy)
@@ -0,0 +1,19 @@
+/* { dg-do compile } */
+
+typedef int h;
+typedef int k;
+int a;
+int b;
+int c;
+int d;
+int e;
+int f(int g)
+{
+  h *i = &e;
+  k *j;
+  if (d -= b)
+for (; *j; *j += 1) {
+   g = g || (a = e ? c = (__UINTPTR_TYPE__)j : 0) + *i;
+   i = &d;
+}
+}


[PATCH] Fix ICE part of PR81546

2017-07-25 Thread Richard Biener

The following patch fixes the immediate use verifier error by
removing an artificial cap on the number of immediate uses (to
detect cycles).  We'd run into problems sooner or later in immediate
uses iteration anyways.

This allows the testcase to compile.  SLSR still goes crazy on this
testcase.

Bootstrapped and tested on x86_64-unknown-linux-gnu, applied to trunk.

Richard.

2017-07-25  Richard Biener  

PR middle-end/81546
* tree-ssa-operands.c (verify_imm_links): Remove cap on number
of immediate uses, be more verbose on errors.

Index: gcc/tree-ssa-operands.c
===
--- gcc/tree-ssa-operands.c (revision 250494)
+++ gcc/tree-ssa-operands.c (working copy)
@@ -1139,7 +1139,7 @@ DEBUG_FUNCTION bool
 verify_imm_links (FILE *f, tree var)
 {
   use_operand_p ptr, prev, list;
-  int count;
+  unsigned int count;
 
   gcc_assert (TREE_CODE (var) == SSA_NAME);
 
@@ -1157,20 +1157,31 @@ verify_imm_links (FILE *f, tree var)
   for (ptr = list->next; ptr != list; )
 {
   if (prev != ptr->prev)
-   goto error;
+   {
+ fprintf (f, "prev != ptr->prev\n");
+ goto error;
+   }
 
   if (ptr->use == NULL)
-   goto error; /* 2 roots, or SAFE guard node.  */
+   {
+ fprintf (f, "ptr->use == NULL\n");
+ goto error; /* 2 roots, or SAFE guard node.  */
+   }
   else if (*(ptr->use) != var)
-   goto error;
+   {
+ fprintf (f, "*(ptr->use) != var\n");
+ goto error;
+   }
 
   prev = ptr;
   ptr = ptr->next;
 
-  /* Avoid infinite loops.  50,000,000 uses probably indicates a
-problem.  */
-  if (count++ > 5000)
-   goto error;
+  count++;
+  if (count == 0)
+   {
+ fprintf (f, "number of immediate uses doesn't fit unsigned int\n");
+ goto error;
+   }
 }
 
   /* Verify list in the other direction.  */
@@ -1178,15 +1189,25 @@ verify_imm_links (FILE *f, tree var)
   for (ptr = list->prev; ptr != list; )
 {
   if (prev != ptr->next)
-   goto error;
+   {
+ fprintf (f, "prev != ptr->next\n");
+ goto error;
+   }
   prev = ptr;
   ptr = ptr->prev;
-  if (count-- < 0)
-   goto error;
+  if (count == 0)
+   {
+ fprintf (f, "count-- < 0\n");
+ goto error;
+   }
+  count--;
 }
 
   if (count != 0)
-goto error;
+{
+  fprintf (f, "count != 0\n");
+  goto error;
+}
 
   return false;
 


Re: [PATCH, rs6000] Add testcases for vec_cnttz

2017-07-25 Thread Segher Boessenkool
Hi!

On Mon, Jul 24, 2017 at 01:51:20PM -0700, Carl Love wrote:
> On Fri, 2017-07-14 at 15:58 -0500, Segher Boessenkool wrote:
> > On Fri, Jul 14, 2017 at 01:20:32PM -0700, Carl Love wrote:
> > > --- /dev/null
> > > +++ b/gcc/testsuite/gcc.target/powerpc/builtins-4-p9-runnable.c
> > > @@ -0,0 +1,95 @@
> > > +/* { dg-do run { target { powerpc64*-*-* && { lp64 && p9vector_hw } } } 
> > > } */
> > 
> > powerpc*-*-* instead.
> > 
> > > +/* { dg-skip-if "do not override -mcpu" { powerpc*-*-* } { "-mcpu=*" } { 
> > > "-mcpu=power9" } } */
> > > +/* { dg-options "-mcpu=power9 -O2 -mupper-regs-di" } */
> > 
> > Does this need to be restricted to 64-bit?  Please test.

I mean: do you need the "lp64".  So, does

/* { dg-do run { target { powerpc*-*-* && { p9vector_hw } } } } */

work?

> The test passed for -m64 and failed on -m32

>   === gcc Summary for unix/-m32 ===
> 
> # of unsupported tests1

That's not a fail.  But why is it unsupported?  Did you leave in
powerpc64*-*-*?  Or maybe it is the -mupper-regs-di?  That option is
has been deleted now btw.


Segher


Re: [PATCH, PR81430] Use finalize_options in lto1

2017-07-25 Thread Richard Biener
On Fri, 21 Jul 2017, Tom de Vries wrote:

> On 07/21/2017 11:41 AM, Richard Biener wrote:
> > On Thu, 20 Jul 2017, Tom de Vries wrote:
> > 
> > > On 07/20/2017 12:10 PM, Richard Biener wrote:
> > > > On Thu, 20 Jul 2017, Tom de Vries wrote:
> > > > 
> > > > > Hi,
> > > > > 
> > > > > this patch fixes PR81430, an ICE in the libgomp testsuite for both
> > > > > openmp
> > > > > and
> > > > > openacc test-cases for x86_64 with nvptx accelerator.
> > > > > 
> > > > > The scenario how we hit the ICE is as follows:
> > > > > - a testcase is compiled with -O2
> > > > > - ix86_option_optimization_table enables
> > > > > OPT_freorder_blocks_and_partition at -O2
> > > > > - cc1 writes out the flag as part of
> > > > > DECL_FUNCTION_SPECIFIC_OPTIMIZATION
> > > > > - lto1 reads in the flag as part of
> > > > > DECL_FUNCTION_SPECIFIC_OPTIMIZATION
> > > > > - lto1 uses the flag, and runs pass_partition_blocks
> > > > > - pass_partition_blocks ICEs, because it generates code that is not
> > > > > supported by the nvptx target.
> > > > > 
> > > > > Note that for standalone compilation for single-thread ptx execution,
> > > > > we
> > > > > don't
> > > > > attempt to run pass_partition_blocks. This is because for nvptx,
> > > > > TARGET_HAVE_NAMED_SECTIONS is set to false, and this bit in
> > > > > finish_options
> > > > > switches off pass_partition_blocks:
> > > > > ...
> > > > >  /* If the target requested unwind info, then turn off the
> > > > > partitioning optimization with a different message.  Likewise,
> > > > > if
> > > > > the target does not support named sections.  */
> > > > > 
> > > > > if (opts->x_flag_reorder_blocks_and_partition
> > > > > && (!targetm_common.have_named_sections
> > > > > || (opts->x_flag_unwind_tables
> > > > > && targetm_common.unwind_tables_default
> > > > > && (ui_except == UI_SJLJ || ui_except >= UI_TARGET
> > > > >   {
> > > > > if (opts_set->x_flag_reorder_blocks_and_partition)
> > > > >   inform (loc,
> > > > >   "-freorder-blocks-and-partition does not work "
> > > > >   "on this architecture");
> > > > > opts->x_flag_reorder_blocks_and_partition = 0;
> > > > > opts->x_flag_reorder_blocks = 1;
> > > > >   }
> > > > > ...
> > > > > 
> > > > > The patch fixes this by calling finish_options in lto1 after
> > > > > cl_optimization_restore.
> > > > > 
> > > > > Points for review:
> > > > > 1. I'm uncertain though about the placement of the call. Perhaps it
> > > > > should
> > > > > be
> > > > > in cl_optimization_restore, before
> > > > > targetm.override_options_after_change?
> > > > > 
> > > > > 2. I think that this is offloading specific, so perhaps this should be
> > > > > guarded
> > > > > with lto_stream_offload_p or #ifdef ACCEL_COMPILER or some such.
> > > > 
> > > > Hmm, I agree with #2.  I think it conceptually is a LTO stream
> > > > adjustment
> > > > and thus we should do this at the time we stream in the
> > > > optimization/target nodes (like we remap modes for example).  Not
> > > > sure if it's possible to do this at that point, but it looks like
> > > > finish_options takes two option structs and thus we should be able to
> > > > call it.
> > > > 
> > > 
> > > With what parameters? Patch below tries with same option struct, but ...
> > > 
> > > > Do you get the inform note?  I suppose we don't really want that, no?
> > > > 
> > > 
> > > ... I think that way we'll get the inform note (while the previous
> > > solution
> > > did not).
> > > 
> > > I've also tried with a tmp2 memset to 0, but that ran into problems when
> > > doing
> > > a maybe_set_param_value.
> > 
> > Use global_options_set?  That should do what the other patch did.
> > 
> 
> I managed to get it working now.  The variable tmp was only partly
> initialized, which caused the problems when calling maybe_set_param_value. I'm
> now using init_options_struct.
> 
> There's no note when using -O2 or "-O2 -freorder-blocks-and-partition".
> 
> But when I do "-O2 -foffload=-freorder-blocks-and-partition" I get:
> ...
> lto1: note: '-freorder-blocks-and-partition' does not work on this
> architecture
> lto1: note: '-freorder-blocks-and-partition' does not support unwind info on
> this architecture
> ...
> 
> And for "-O0 -foffload=-freorder-blocks-and-partition" I just get:
> ...
> lto1: note: '-freorder-blocks-and-partition' does not work on this
> architecture
> ...

Ok.

Thanks,
Richard.

> Thanks,
> - Tom
> 

-- 
Richard Biener 
SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Graham Norton, HRB 
21284 (AG Nuernberg)


Re: [PATCH 1/2] simplify-rtx: The truncation of an IOR can have all bits set (PR81423)

2017-07-25 Thread Segher Boessenkool
On Mon, Jul 24, 2017 at 04:06:39PM -0600, Jeff Law wrote:
> > 2017-07-24  Segher Boessenkool  
> > 
> > gcc/testsuite/
> > PR rtl-optimization/81423
> > * gcc.c-torture/execute/pr81423.c: New testcase.
> I think int32plus just indicates ints are at least 32 bits. But a long
> or long long could still be just 32 bits.  so int32plus && long_neq_int,
> to ensure that long/long long are 64 bits?

Well, long long is required to be 64 bits or more by the C standard.
But some GCC targets do not follow that, with certain options at least.

It looks like that test actually requires long long to be *exactly*
64 bits.  I'll modify the test to test for that.


Segher


[PATCH] Fix PR81410 testcase

2017-07-25 Thread Richard Biener

Rainer reports uint64_t clashes, so avoid it.

Tested on x86_64-unknown-linux-gnu, applied.

Richard.

2017-07-25  Richard Biener  

PR tree-optimization/81410
* gcc.dg/vect/pr81410.c: Do not typedef uint64_t.

Index: gcc/testsuite/gcc.dg/vect/pr81410.c
===
--- gcc/testsuite/gcc.dg/vect/pr81410.c (revision 250505)
+++ gcc/testsuite/gcc.dg/vect/pr81410.c (working copy)
@@ -3,10 +3,9 @@
 
 #include "tree-vect.h"
 
-typedef long long uint64_t;
-uint64_t x[24];
-uint64_t y[16];
-uint64_t z[8];
+long long x[24];
+long long y[16];
+long long z[8];
 
 void __attribute__((noinline)) foo()
 {


Re: [PATCH 1/2] simplify-rtx: The truncation of an IOR can have all bits set (PR81423)

2017-07-25 Thread Kyrill Tkachov


On 24/07/17 09:50, Segher Boessenkool wrote:

On Wed, Jul 19, 2017 at 12:19:32AM -0600, Jeff Law wrote:

On 07/18/2017 01:36 PM, Segher Boessenkool wrote:

* simplify-rtx.c (simplify_truncation): Handle truncating an IOR
with a constant that is -1 in the truncated to mode.

OK.  A testcase would be advisable :-)

jeff

Like this.  Is this okay for trunk?  (Is int32plus the correct test
to use here?)


We sometimes use the __mode__ attribute to force certain sizes in C types.
For example: typedef int ditype __attribute__ ((mode (DI)));
Maybe you can do this to force the right sizes. I don't know if there are any 
targets
that don't support DImode ops though :)

Kyrill



Segher


2017-07-24  Segher Boessenkool  

gcc/testsuite/
PR rtl-optimization/81423
* gcc.c-torture/execute/pr81423.c: New testcase.

---
  gcc/testsuite/gcc.c-torture/execute/pr81423.c | 30 +++
  1 file changed, 30 insertions(+)
  create mode 100644 gcc/testsuite/gcc.c-torture/execute/pr81423.c

diff --git a/gcc/testsuite/gcc.c-torture/execute/pr81423.c 
b/gcc/testsuite/gcc.c-torture/execute/pr81423.c
new file mode 100644
index 000..75c3518
--- /dev/null
+++ b/gcc/testsuite/gcc.c-torture/execute/pr81423.c
@@ -0,0 +1,30 @@
+/* { dg-require-effective-target int32plus } */
+
+extern void abort (void);
+
+unsigned long long int ll = 0;
+unsigned long long int ull1 = 1ULL;
+unsigned long long int ull2 = 12008284144813806346ULL;
+unsigned long long int ull3;
+
+void
+foo (void)
+{
+  ll = -5597998501375493990LL;
+
+  ll = (5677365550390624949L - ll) - (ull1 > 0);
+  ull3 = (unsigned int)
+(2067854353L <<
+ (((ll + -2129105131L) ^ 10280750144413668236ULL) -
+  10280750143997242009ULL)) >> ((2873442921854271231ULL | ull2)
+   - 12098357307243495419ULL);
+}
+
+int
+main (void)
+{
+  foo ();
+  if (ull3 != 3998784)
+abort ();
+  return 0;
+}




GCC 7.2 Status report (2017-07-25)

2017-07-25 Thread Richard Biener

Status
==

It's time to do a GCC 7.2 release and thus please check if you have
backports for regression or wrong-code bugs pending.  The plan is to
do GCC 7.2 RC1 mid next week and a release roughly a week after that.


Quality Data


Priority  #   Change from last report
---   ---
P10
P2  141   +  45
P33
P4  156   +   9
P5   28   -   1
---   ---
Total P1-P3 144   +  45
Total   328   +  53


Previous Report
===

https://gcc.gnu.org/ml/gcc/2017-04/msg00080.html


Re: [PATCH][AArch64] vec_pack_trunc_ should split after register allocator

2017-07-25 Thread Hurugalawadi, Naveen
Hi,

>> I haven't been clear in what I was asking for

Sorry. We understood right with the first comment but the second
part confused us a bit :).

>> Could you switch this back to an insn_and_split as it was in the previous
>> patch, and just drop the && reload_completed ?

Done.

Bootstrapped and Regression done on AArch64-Thunder-Linux.
Please review the patch and let me know if its okay?

Thanks,
Naveen   

diff --git a/gcc/config/aarch64/aarch64-simd.md b/gcc/config/aarch64/aarch64-simd.md
index 1cb6eeb..0011040 100644
--- a/gcc/config/aarch64/aarch64-simd.md
+++ b/gcc/config/aarch64/aarch64-simd.md
@@ -1291,6 +1291,18 @@
   [(set_attr "type" "neon_shift_imm_narrow_q")]
 )
 
+(define_insn "aarch64_simd_vec_pack_trunc_hi_"
+ [(set (match_operand: 0 "register_operand" "=w")
+   (vec_concat:
+	 (truncate: (match_operand:VQN 1 "register_operand" "w"))
+	 (vec_select:
+	   (match_operand: 3 "register_operand" "0")
+	   (match_operand: 2 "vect_par_cnst_hi_half" ""]
+ "TARGET_SIMD"
+ "xtn2\\t%0., %1."
+  [(set_attr "type" "neon_shift_imm_narrow_q")]
+)
+
 (define_expand "vec_pack_trunc_"
  [(match_operand: 0 "register_operand" "")
   (match_operand:VDN 1 "register_operand" "")
@@ -1309,17 +1321,41 @@
 
 ;; For quads.
 
-(define_insn "vec_pack_trunc_"
+(define_insn_and_split "vec_pack_trunc_"
  [(set (match_operand: 0 "register_operand" "=&w")
(vec_concat:
 	 (truncate: (match_operand:VQN 1 "register_operand" "w"))
 	 (truncate: (match_operand:VQN 2 "register_operand" "w"]
  "TARGET_SIMD"
+ "#"
+ ""
+ [(const_int 0)]
  {
if (BYTES_BIG_ENDIAN)
- return "xtn\\t%0., %2.\;xtn2\\t%0., %1.";
+ {
+   rtx low_part = gen_lowpart (mode, operands[0]);
+   emit_insn (gen_aarch64_simd_vec_pack_trunc_ (low_part,
+			  operands[2]));
+   rtx high_part = aarch64_simd_vect_par_cnst_half (mode,
+			true);
+   emit_insn (gen_aarch64_simd_vec_pack_trunc_hi_ (operands[0],
+			 operands[1],
+			 high_part,
+			 operands[0]));
+ }
else
- return "xtn\\t%0., %1.\;xtn2\\t%0., %2.";
+ {
+   rtx low_part = gen_lowpart (mode, operands[0]);
+   emit_insn (gen_aarch64_simd_vec_pack_trunc_ (low_part,
+			  operands[1]));
+   rtx high_part = aarch64_simd_vect_par_cnst_half (mode,
+			true);
+   emit_insn (gen_aarch64_simd_vec_pack_trunc_hi_ (operands[0],
+			 operands[2],
+			 high_part,
+			 operands[0]));
+ }
+   DONE;
  }
   [(set_attr "type" "multiple")
(set_attr "length" "8")]


[patch,committed,v7] Backport PR75964

2017-07-25 Thread Georg-Johann Lay

Backported to v7:

https://gcc.gnu.org/r250509

Johann

gcc/
Backport from 2017-05-06 trunk r247719.
PR rtl-optimization/75964
* simplify-rtx.c (simplify_const_relational_operation): Remove
invalid handling of comparisons of integer ABS.

gcc/testsuite/
Backport from 2017-05-06 trunk r247719.
PR rtl-optimization/75964
* gcc.dg/torture/pr75964.c: New test.


Index: testsuite/gcc.dg/torture/pr75964.c
===
--- testsuite/gcc.dg/torture/pr75964.c	(nonexistent)
+++ testsuite/gcc.dg/torture/pr75964.c	(revision 250509)
@@ -0,0 +1,28 @@
+/* { dg-do run } */
+
+typedef __UINT8_TYPE__ uint8_t;
+
+uint8_t __attribute__ ((noinline, noclone))
+abs8 (uint8_t x)
+{
+  if (x & 0x80)
+x = -x;
+
+  if (x & 0x80)
+x = 0x7f;
+
+  return x;
+}
+
+int
+main (void)
+{
+  if (abs8 (0) != 0
+  || abs8 (1) != 1
+  || abs8 (127) != 127
+  || abs8 (128) != 127
+  || abs8 (129) != 127
+  || abs8 (255) != 1)
+__builtin_abort ();
+  return 0;
+}
Index: simplify-rtx.c
===
--- simplify-rtx.c	(revision 250508)
+++ simplify-rtx.c	(revision 250509)
@@ -5314,34 +5314,14 @@ simplify_const_relational_operation (enu
 	{
 	case LT:
 	  /* Optimize abs(x) < 0.0.  */
-	  if (!HONOR_SNANS (mode)
-	  && (!INTEGRAL_MODE_P (mode)
-		  || (!flag_wrapv && !flag_trapv && flag_strict_overflow)))
-	{
-	  if (INTEGRAL_MODE_P (mode)
-		  && (issue_strict_overflow_warning
-		  (WARN_STRICT_OVERFLOW_CONDITIONAL)))
-		warning (OPT_Wstrict_overflow,
-			 ("assuming signed overflow does not occur when "
-			  "assuming abs (x) < 0 is false"));
-	   return const0_rtx;
-	}
+	  if (!INTEGRAL_MODE_P (mode) && !HONOR_SNANS (mode))
+	return const0_rtx;
 	  break;
 
 	case GE:
 	  /* Optimize abs(x) >= 0.0.  */
-	  if (!HONOR_NANS (mode)
-	  && (!INTEGRAL_MODE_P (mode)
-		  || (!flag_wrapv && !flag_trapv && flag_strict_overflow)))
-	{
-	  if (INTEGRAL_MODE_P (mode)
-	  && (issue_strict_overflow_warning
-		  (WARN_STRICT_OVERFLOW_CONDITIONAL)))
-	warning (OPT_Wstrict_overflow,
-			 ("assuming signed overflow does not occur when "
-			  "assuming abs (x) >= 0 is true"));
-	  return const_true_rtx;
-	}
+	  if (!INTEGRAL_MODE_P (mode) && !HONOR_NANS (mode))
+	return const_true_rtx;
 	  break;
 
 	case UNGE:


Re: [PATCH GCC][3/4]Generalize dead store elimination (or store motion) across loop iterations in predcom

2017-07-25 Thread Richard Biener
On Mon, Jul 24, 2017 at 4:27 PM, Bin.Cheng  wrote:
> Ping^1.

Ok.

Thanks and sorry for the delay.
Richard.

> Thanks,
> bin
>
> On Mon, Jul 10, 2017 at 9:23 AM, Bin.Cheng  wrote:
>> On Tue, Jul 4, 2017 at 1:29 PM, Richard Biener
>>  wrote:
>>> On Tue, Jul 4, 2017 at 2:06 PM, Bin.Cheng  wrote:
 On Tue, Jul 4, 2017 at 12:19 PM, Richard Biener
  wrote:
> On Mon, Jul 3, 2017 at 4:17 PM, Bin.Cheng  wrote:
>> On Mon, Jul 3, 2017 at 10:38 AM, Richard Biener
>>  wrote:
>>> On Tue, Jun 27, 2017 at 12:49 PM, Bin Cheng  wrote:
 Hi,
 For the moment, tree-predcom.c only supports 
 invariant/load-loads/store-loads chains.
 This patch generalizes dead store elimination (or store motion) across 
 loop iterations in
 predictive commoning pass by supporting store-store chain.  As comment 
 in the patch:

Apart from predictive commoning on Load-Load and Store-Load chains, 
 we
also support Store-Store chains -- stores killed by other store can 
 be
eliminated.  Given below example:

  for (i = 0; i < n; i++)
{
  a[i] = 1;
  a[i+2] = 2;
}

It can be replaced with:

  t0 = a[0];
  t1 = a[1];
  for (i = 0; i < n; i++)
{
  a[i] = 1;
  t2 = 2;
  t0 = t1;
  t1 = t2;
}
  a[n] = t0;
  a[n+1] = t1;

If the loop runs more than 1 iterations, it can be further 
 simplified into:

  for (i = 0; i < n; i++)
{
  a[i] = 1;
}
  a[n] = 2;
  a[n+1] = 2;

The interesting part is this can be viewed either as general store 
 motion
or general dead store elimination in either intra/inter-iterations 
 way.

 There are number of interesting facts about this enhancement:
 a) This patch supports dead store elimination for both 
 across-iteration case and single-iteration
  case.  For the latter, it is dead store elimination.
 b) There are advantages supporting dead store elimination in predcom, 
 for example, it has
  complete information about memory address.  On the contrary, DSE 
 pass can only handle
  memory references with exact the same memory address expression.
 c) It's cheap to support store-stores chain in predcom based on 
 existing code.
 d) As commented, the enhancement can be viewed as either generalized 
 dead store elimination
  or generalized store motion.  I prefer DSE here.

 Bootstrap(O2/O3) in patch series on x86_64 and AArch64.  Is it OK?
>>>
>>> Looks mostly ok.  I have a few questions though.
>>>
>>> +  /* Don't do store elimination if loop has multiple exit edges.  */
>>> +  bool eliminate_store_p = single_exit (loop) != NULL;
>>>
>>> handling this would be an enhancement?  IIRC LIM store-motion handles 
>>> this
>>> just fine by emitting code on all exits.
>> It is an enhancement with a little bit more complication.  We would
>> need to setup/record finalizer memory references for different exit
>> edges.  I added TODO description for this (and following one).  Is it
>> okay to pick up this in the future?
>
> Yes.
>
>>>
>>> @@ -1773,6 +2003,9 @@ determine_unroll_factor (vec chains)
>>>  {
>>>if (chain->type == CT_INVARIANT)
>>> continue;
>>> +  /* Don't unroll when eliminating stores.  */
>>> +  else if (chain->type == CT_STORE_STORE)
>>> +   return 1;
>>>
>>> this is a hard exit value so we do not handle the case where another 
>>> chain
>>> in the loop would want to unroll? (enhancement?)  I'd have expected to 
>>> do
>>> the same as for CT_INVARIANT here.
>> I didn't check what change is needed in case of unrolling.  I am not
>> very sure if we should prefer unroll for *load chains or prefer not
>> unroll for store-store chains, because unroll in general increases
>> loop-carried register pressure for store-store chains rather than
>> decreases register pressure for *load chains.
>> I was also thinking if it's possible to restrict unrolling somehow in
>> order to enable predcom at O2.  BTW, this is not common, it only
>> happens once in spec2k6 with factor forced to 1.  So okay if as it is
>> now?
>
> I think it is ok for now with a TODO added.  Please change the comment
> to "we can't handle unrolling when eliminating stores" (it's not clear if 
> we
> can -- did you try?  maybe add a testcase that wou

Re: [PATCH] Emit DWARF5 DW_AT_export_symbols for namespaces

2017-07-25 Thread Marek Polacek
On Fri, Jul 21, 2017 at 10:33:12PM +0200, Jakub Jelinek wrote:
> Hi!
> 
> DWARF5 introduced DW_AT_export_symbols that may be preset on
> DW_TAG_namespace or DW_TAG_{structure,union,class}_type to signalize
> inline or anonymous namespaces or anonymous structures/unions/classes.
> 
> What we were emitting instead is an implicit DW_TAG_imported_module
> in the outer namespace.
> 
> The following patch changes nothing for -gdwarf-4 and below with
> -gstrict-dwarf, for -gdwarf-4 and below -gno-strict-dwarf it
> just adds DW_AT_export_symbols to inline namespaces (so that interested
> consumers can find out it is inline namespace, but consumers not knowing
> about DW_AT_export_symbols still have the implicit DW_TAG_imported_module).
> In that mode, no changes for anonymous namespaces, because those
> are already easily detectable by the consumers (missing DW_AT_name).
> For -gdwarf-5 it emits DW_AT_export_symbols on both inline namespaces
> and anonymous namespaces and doesn't emit the implicit
> DW_TAG_imported_module, which is shorter.
> 
> Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?
> 
> This patch doesn't do anything about anon struct/union/class, I've tried
> to handle it, the problem is that ANON_AGGR_TYPE_P flag is set too late,
> after the debug hook adds the type DIE.  Any thoughts on how to handle that?
> And I wonder what is the counterpart of ANON_AGGR_TYPE_P in the C FE, CCing
> Marek on that.

I think you need to check RECORD_OR_UNION_TYPE_P && DECL_NAME == NULL_TREE.
Or do it in grokfield -- would that work?

Marek


[PATCH] [SPARC] Disable muldf3_extend for LEON/LEON3

2017-07-25 Thread Sebastian Huber
Prominent LEON2 chips like the AT697E and AT697F do not properly support
the FSMULD instruction.  LEON3 targets using the GRFPU-lite floating
point unit do not support the FSMULD in hardware.  Disable this
instruction for for all LEON/LEON3 targets for simplicity.

This leads to a code size increase and performance degradation on
GR712RC and GR740 targets.  The code generation for the fmaf() function
of Newlib changes for example from

 :
   0:   9c 03 bf b0 add  %sp, -80, %sp
   4:   d2 23 a0 4c st  %o1, [ %sp + 0x4c ]
   8:   d1 03 a0 4c ld  [ %sp + 0x4c ], %f8
   c:   d0 23 a0 4c st  %o0, [ %sp + 0x4c ]
  10:   d3 03 a0 4c ld  [ %sp + 0x4c ], %f9
  14:   d4 23 a0 4c st  %o2, [ %sp + 0x4c ]
  18:   81 a2 0d 29 fsmuld  %f8, %f9, %f0
  1c:   d1 03 a0 4c ld  [ %sp + 0x4c ], %f8
  20:   91 a0 19 28 fstod  %f8, %f8
  24:   9c 03 a0 50 add  %sp, 0x50, %sp
  28:   81 a0 08 48 faddd  %f0, %f8, %f0
  2c:   81 c3 e0 08 retl
  30:   81 a0 18 c0 fdtos  %f0, %f0

to

 :
   0:   9c 03 bf b0 add  %sp, -80, %sp
   4:   d0 23 a0 4c st  %o0, [ %sp + 0x4c ]
   8:   d1 03 a0 4c ld  [ %sp + 0x4c ], %f8
   c:   d2 23 a0 4c st  %o1, [ %sp + 0x4c ]
  10:   81 a0 19 28 fstod  %f8, %f0
  14:   d1 03 a0 4c ld  [ %sp + 0x4c ], %f8
  18:   95 a0 19 28 fstod  %f8, %f10
  1c:   d4 23 a0 4c st  %o2, [ %sp + 0x4c ]
  20:   d1 03 a0 4c ld  [ %sp + 0x4c ], %f8
  24:   81 a0 09 4a fmuld  %f0, %f10, %f0
  28:   91 a0 19 28 fstod  %f8, %f8
  2c:   9c 03 a0 50 add  %sp, 0x50, %sp
  30:   81 a0 08 48 faddd  %f0, %f8, %f0
  34:   81 c3 e0 08 retl
  38:   81 a0 18 c0 fdtos  %f0, %f0

gcc/
* config/sparc/sparc.md (muldf3_extend): Disable for LEON/LEON3
targets.
---
 gcc/config/sparc/sparc.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/config/sparc/sparc.md b/gcc/config/sparc/sparc.md
index b154003c54a..37b4b6e6884 100644
--- a/gcc/config/sparc/sparc.md
+++ b/gcc/config/sparc/sparc.md
@@ -6121,7 +6121,7 @@ visl")
   [(set (match_operand:DF 0 "register_operand" "=e")
(mult:DF (float_extend:DF (match_operand:SF 1 "register_operand" "f"))
 (float_extend:DF (match_operand:SF 2 "register_operand" 
"f"]
-  "(TARGET_V8 || TARGET_V9) && TARGET_FPU && !sparc_fix_ut699"
+  "(TARGET_V8 || TARGET_V9) && TARGET_FPU && !sparc_fix_ut699 && !TARGET_LEON 
&& !TARGET_LEON3"
   "fsmuld\t%1, %2, %0"
   [(set_attr "type" "fpmul")
(set_attr "fptype" "double")])
-- 
2.12.3



Re: [PATCH GCC][4/4]Better handle store-stores chain if eliminated stores only store loop invariant

2017-07-25 Thread Richard Biener
On Mon, Jul 10, 2017 at 10:24 AM, Bin.Cheng  wrote:
> On Tue, Jun 27, 2017 at 11:49 AM, Bin Cheng  wrote:
>> Hi,
>> This is a followup patch better handling below case:
>>  for (i = 0; i < n; i++)
>>{
>>  a[i] = 1;
>>  a[i+2] = 2;
>>}
>> Instead of generating root variables by loading from memory and propagating 
>> with PHI
>> nodes, like:
>>  t0 = a[0];
>>  t1 = a[1];
>>  for (i = 0; i < n; i++)
>>{
>>  a[i] = 1;
>>  t2 = 2;
>>  t0 = t1;
>>  t1 = t2;
>>}
>>  a[n] = t0;
>>  a[n+1] = t1;
>> We can simply store loop invariant values after loop body if we know loop 
>> iterates more
>> than chain->length times, like:
>>  for (i = 0; i < n; i++)
>>{
>>  a[i] = 1;
>>}
>>  a[n] = 2;
>>  a[n+1] = 2;
>>
>> Bootstrap(O2/O3) in patch series on x86_64 and AArch64.  Is it OK?
> Update patch wrto changes in previous patch.
> Bootstrap and test on x86_64 and AArch64.  Is it OK?

+  if (TREE_CODE (val) == INTEGER_CST || TREE_CODE (val) == REAL_CST)
+   continue;

Please use CONSTANT_CLASS_P (val) instead.  I suppose VECTOR_CST or
FIXED_CST would be ok as well for example.

Ok with that change.  Did we eventually optimize this in followup
passes previously?

Richard.

> Thanks,
> bin
>>
>> Thanks,
>> bin
>> 2017-06-21  Bin Cheng  
>>
>> * tree-predcom.c: (struct chain): Handle store-store chain in which
>> stores for elimination only store loop invariant values.
>> (execute_pred_commoning_chain): Ditto.
>> (prepare_initializers_chain_store_elim): Ditto.
>> (prepare_finalizers): Ditto.
>> (is_inv_store_elimination_chain): New function.
>> (initialize_root_vars_store_elim_1): New function.


[C, committed] Minor cleanup in grokfield

2017-07-25 Thread Marek Polacek
No need to use a dedicated variable if we can use RECORD_OR_UNION_TYPE_P
directly and only once.

Bootstrapped/regtested on x86_64-linux, applying to trunk.

2017-07-25  Marek Polacek  

* c-decl.c (grokfield): Remove local variable.

diff --git gcc/c/c-decl.c gcc/c/c-decl.c
index 50da185e363..12fbc18bb94 100644
--- gcc/c/c-decl.c
+++ gcc/c/c-decl.c
@@ -7559,10 +7559,9 @@ grokfield (location_t loc,
 that took root before someone noticed the bug...  */
 
   tree type = declspecs->type;
-  bool type_ok = RECORD_OR_UNION_TYPE_P (type);
   bool ok = false;
 
-  if (type_ok
+  if (RECORD_OR_UNION_TYPE_P (type)
  && (flag_ms_extensions
  || flag_plan9_extensions
  || !declspecs->typedef_p))

Marek


Re: [RFC] Remaining references of Java

2017-07-25 Thread Richard Biener
On Tue, Jul 11, 2017 at 4:35 PM, Martin Liška  wrote:
> Hi.
>
> There's a small follow-up with remaining occurrences:
>
> 1) dwarf2out.c:
>
>  20213  origin_die = lookup_type_die (origin);
>  20214else if (TREE_CODE (origin) == BLOCK)
>  20215  origin_die = BLOCK_DIE (origin);
>  20216
>  20217/* XXX: Functions that are never lowered don't always have correct
> block
>  20218   trees (in the case of java, they simply have no block tree, in
> some other
>  20219   languages).  For these functions, there is nothing we can
> really do to
>  20220   output correct debug info for inlined functions in all cases.
> Rather
>  20221   than die, we'll just produce deficient debug info now, in that
> we will
>  20222   have variables without a proper abstract origin.  In the
> future, when all
>  20223   functions are lowered, we should re-add a gcc_assert
> (origin_die)
>
> Probably Jakub can help with that?

Can you check whether all-language bootstrap passes with the suggested
gcc_assert?

> 2) fold-const.c:
>
>   1882/* The following code implements the floating point to integer
>   1883   conversion rules required by the Java Language Specification,
>   1884   that IEEE NaNs are mapped to zero and values that overflow
>   1885   the target precision saturate, i.e. values greater than
>   1886   INT_MAX are mapped to INT_MAX, and values less than INT_MIN
>   1887   are mapped to INT_MIN.  These semantics are allowed by the
>   1888   C and C++ standards that simply state that the behavior of
>   1889   FP-to-integer conversion is unspecified upon overflow.  */
>   1890
>   1891wide_int val;
>   1892REAL_VALUE_TYPE r;
>   1893REAL_VALUE_TYPE x = TREE_REAL_CST (arg1);
>
> Can we somehow remove that Richi?

Remove what?  The comment?  It still matches the code and we have to
do sth.  So I don't see why we should change this.

Maybe Joseph knows whether future standards / TRs recommend sth different
or exactly this.

>
> 3) gimplify.c:
>
>   2771   Java requires that we elaborated nodes in source order.  That
>   2772   means we must gimplify the inner expression followed by each of
>   2773   the indices, in order.  But we can't gimplify the inner
>   2774   expression until we deal with any variable bounds, sizes, or
>   2775   positions in order to deal with PLACEHOLDER_EXPRs.
>   2776
>   2777   So we do this in three steps.  First we deal with the
> annotations
>   2778   for any variables in the components, then we gimplify the base,
>   2779   then we gimplify any indices, from left to right.  */
>   2780for (i = expr_stack.length () - 1; i >= 0; i--)
>
> Richi?

Just change the comment to "We elaborate nodes in source order. [...]"

> 4) tree.c:
>
>  13535if (RECORD_OR_UNION_TYPE_P (t) && TYPE_BINFO (t) && TYPE_BINFO
> (tv)
>  13536&& TYPE_BINFO (t) != TYPE_BINFO (tv)
>  13537/* FIXME: Java sometimes keep dump TYPE_BINFOs on variant
> types.
>  13538   Since there is no cheap way to tell C++/Java type w/o LTO,
> do checking
>  13539   at LTO time only.  */
>  13540&& (in_lto_p && odr_type_p (t)))
>  13541  {
>  13542error ("type variant has different TYPE_BINFO");
>  13543debug_tree (tv);
>  13544error ("type variant's TYPE_BINFO");
>  13545debug_tree (TYPE_BINFO (tv));
>  13546error ("type's TYPE_BINFO");
>  13547debug_tree (TYPE_BINFO (t));
>  13548return false;
>
> Can we Honza remove that?

Try it? (remove in_lto_p &&)

> Thanks,
> Martin


[patch,committed,v6] Backport PR75964

2017-07-25 Thread Georg-Johann Lay

...also backported to gcc-6-branch:

https://gcc.gnu.org/r250511

Johann

gcc/
 Backport from 2017-05-06 trunk r247719.
 PR rtl-optimization/75964
 * simplify-rtx.c (simplify_const_relational_operation): Remove
 invalid handling of comparisons of integer ABS.

gcc/testsuite/
 Backport from 2017-05-06 trunk r247719.
 PR rtl-optimization/75964
 * gcc.dg/torture/pr75964.c: New test.
Index: simplify-rtx.c
===
--- simplify-rtx.c	(revision 250510)
+++ simplify-rtx.c	(working copy)
@@ -5235,34 +5235,14 @@ simplify_const_relational_operation (enu
 	{
 	case LT:
 	  /* Optimize abs(x) < 0.0.  */
-	  if (!HONOR_SNANS (mode)
-	  && (!INTEGRAL_MODE_P (mode)
-		  || (!flag_wrapv && !flag_trapv && flag_strict_overflow)))
-	{
-	  if (INTEGRAL_MODE_P (mode)
-		  && (issue_strict_overflow_warning
-		  (WARN_STRICT_OVERFLOW_CONDITIONAL)))
-		warning (OPT_Wstrict_overflow,
-			 ("assuming signed overflow does not occur when "
-			  "assuming abs (x) < 0 is false"));
-	   return const0_rtx;
-	}
+	  if (!INTEGRAL_MODE_P (mode) && !HONOR_SNANS (mode))
+	return const0_rtx;
 	  break;
 
 	case GE:
 	  /* Optimize abs(x) >= 0.0.  */
-	  if (!HONOR_NANS (mode)
-	  && (!INTEGRAL_MODE_P (mode)
-		  || (!flag_wrapv && !flag_trapv && flag_strict_overflow)))
-	{
-	  if (INTEGRAL_MODE_P (mode)
-	  && (issue_strict_overflow_warning
-		  (WARN_STRICT_OVERFLOW_CONDITIONAL)))
-	warning (OPT_Wstrict_overflow,
-			 ("assuming signed overflow does not occur when "
-			  "assuming abs (x) >= 0 is true"));
-	  return const_true_rtx;
-	}
+	  if (!INTEGRAL_MODE_P (mode) && !HONOR_NANS (mode))
+	return const_true_rtx;
 	  break;
 
 	case UNGE:
Index: testsuite/gcc.dg/torture/pr75964.c
===
--- testsuite/gcc.dg/torture/pr75964.c	(nonexistent)
+++ testsuite/gcc.dg/torture/pr75964.c	(working copy)
@@ -0,0 +1,28 @@
+/* { dg-do run } */
+
+typedef __UINT8_TYPE__ uint8_t;
+
+uint8_t __attribute__ ((noinline, noclone))
+abs8 (uint8_t x)
+{
+  if (x & 0x80)
+x = -x;
+
+  if (x & 0x80)
+x = 0x7f;
+
+  return x;
+}
+
+int
+main (void)
+{
+  if (abs8 (0) != 0
+  || abs8 (1) != 1
+  || abs8 (127) != 127
+  || abs8 (128) != 127
+  || abs8 (129) != 127
+  || abs8 (255) != 1)
+__builtin_abort ();
+  return 0;
+}


Re: [PATCH v2] New C++ warning option '-Waccess-specifiers'

2017-07-25 Thread Marek Polacek
On Sun, Jul 23, 2017 at 11:22:14PM +0200, Volker Reichelt wrote:
[...]

Not sure if the warning is too useful, but in any case...

> +  /* Emit warning.  */
> +  gcc_rich_location richloc (token->location);
> +  richloc.add_fixit_remove ();
> +  if (colon_token->type == CPP_COLON)
> +richloc.add_fixit_remove (colon_token->location);
> +
> +  switch (message_id)
> +{
> +case 1:
> +  warning_at_rich_loc (&richloc, OPT_Waccess_specifiers_,
> +"redundant %qE access-specifier",
> +token->u.value);
> +  inform (next_token->location, "directly followed by another one here");
> +  break;
> +
> +case 2:
> +  warning_at_rich_loc (&richloc, OPT_Waccess_specifiers_,
> +"duplicate %qE access-specifier",
> +token->u.value);
> +  inform (current_access_specifier_loc,
> +   "same access-specifier was previously given here");
> +  break;

...you should only call inform if warning_at_rich_loc returned true.

Marek


Re: [PATCH] Emit DWARF5 DW_AT_export_symbols for namespaces

2017-07-25 Thread Jakub Jelinek
On Tue, Jul 25, 2017 at 01:45:31PM +0200, Marek Polacek wrote:
> > This patch doesn't do anything about anon struct/union/class, I've tried
> > to handle it, the problem is that ANON_AGGR_TYPE_P flag is set too late,
> > after the debug hook adds the type DIE.  Any thoughts on how to handle that?
> > And I wonder what is the counterpart of ANON_AGGR_TYPE_P in the C FE, CCing
> > Marek on that.
> 
> I think you need to check RECORD_OR_UNION_TYPE_P && DECL_NAME == NULL_TREE.
> Or do it in grokfield -- would that work?

One thing is that the information needs to be propagated from the FE to
dwarf2out somehow.

And the second thing is that it isn't really clear to me when exactly do we
want to use DW_AT_export_symbols on types with no name.

Consider:

struct { int a, b; } c;
typedef struct { int d, e; } f;
struct S {
  f h;
#ifdef MS_EXTENSIONS
  f;
#endif
  struct { int i, j; };
  struct { int k, l; } m;
} g;

with -g -dA {,-DMS_EXTENSIONS -fms-extensions}
(or -fplan9-extensions).
For the struct with i/j fields we clearly want the DW_AT_export_symbols
on it, for the struct with k/l fields I think we don't, the fields of that
structure aren't exported into the outer namespace.
And then there is the case of the misdesigned Plan9 and Microsoft
extensions, the same type then can be used once as not really exporting
stuff out of it and another time as doing that.

Guess my preference would be to only mark i+j struct here, i.e. struct/union
with DECL_NAME NULL with DECL_CONTEXT of some aggregate where there are no
named fields with that type in the aggregate.
It might be a good idea to change DWARF6 so that DW_AT_export_symbols is
also allowed on DW_TAG_member and emit it that way for the f; case above.

Another issue is that DWARF5 D.2.5 lists the anonymous structure nested
as child of the outer structure, so just for:
struct T { struct { int n; }; };
DW_TAG_structure_type
  DW_AT_name T
  DW_TAG_structure_type
DW_AT_export_symbols
DW_TAG_member
  DW_AT_name n
but what GCC emits is:
1: DW_TAG_structure_type
  DW_TAG_member
DW_AT_name n
DW_TAG_structure_type
  DW_AT_name T
DW_TAG_member
  DW_AT_type 1b

Maybe at least for -gdwarf-5 we should start moving stuff into the right
context.

Jakub


Re: [COMMITED][AArch64] Fix PR79041

2017-07-25 Thread Wilco Dijkstra
James Greenhalgh wrote:
>
> OK. Either like this, or with the conditions swapped around as Yvan
> suggested to make backporting easier.

I swapped the conditions around, not sure whether it helps...
Also I needed an additional scan-assembler, this was committed to
trunk and GCC7:

diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index 
d7fab3775c06477365217e357b5754026d09752b..7713d543af788d47dcdf8ea89482a2d4f66a1344
 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -10181,7 +10181,7 @@ aarch64_classify_symbol (rtx x, rtx offset)
  /* This is alright even in PIC code as the constant
 pool reference is always PC relative and within
 the same translation unit.  */
- if (CONSTANT_POOL_ADDRESS_P (x))
+ if (!aarch64_pcrelative_literal_loads && CONSTANT_POOL_ADDRESS_P (x))
return SYMBOL_SMALL_ABSOLUTE;
  else
return SYMBOL_FORCE_TO_MEM;
diff --git a/gcc/testsuite/gcc.target/aarch64/pr79041-2.c 
b/gcc/testsuite/gcc.target/aarch64/pr79041-2.c
new file mode 100644
index 
..cd34fbab85a92d00cba7091d4146deaaf3a862a9
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/pr79041-2.c
@@ -0,0 +1,11 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -mcmodel=large -mpc-relative-literal-loads" } */
+
+__int128
+t (void)
+{
+  return (__int128)1 << 80;
+}
+
+/* { dg-final { scan-assembler "adr" } } */
+/* { dg-final { scan-assembler-not "adrp" } } */

Re: [PATCH] Fix infinite recursion with div-by-zero (PR middle-end/70992)

2017-07-25 Thread Marek Polacek
On Thu, Jul 20, 2017 at 08:22:59PM +0200, Richard Biener wrote:
> No, we shouldn't.  Maybe trapping ops shouldn't be marked TREE_CONSTANT?
> (Make sure to test with Ada...)

That works for both testcases, but I can't say I really like the idea of
modifying build2... but it's where the TREE_CONSTANT comes from.

The tree_could_trap_p bit is because the C++ FE can pass "< x" around, e.g.
a LT_EXPR with the first operand null.

Bootstrapped/regtested on x86_64-linux (Ada included) and 
powerpc64le-unknown-linux-gnu,
ok for trunk?

2017-07-25  Marek Polacek  

PR c/70992
* tree-eh.c (tree_could_trap_p): Check if the first operand is null.
* tree.c: Include "tree-eh.h".
(build2_stat): Only set TREE_CONSTANT if the tree couldn't trap.

* gcc.dg/overflow-warn-1.c: Adjust dg-error.
* gcc.dg/overflow-warn-2.c: Likewise.
* gcc.dg/overflow-warn-3.c: Likewise.
* gcc.dg/overflow-warn-4.c: Likewise.
* gcc.dg/torture/pr70992-2.c: New test.
* gcc.dg/torture/pr70992.c: New test.

diff --git gcc/testsuite/gcc.dg/overflow-warn-1.c 
gcc/testsuite/gcc.dg/overflow-warn-1.c
index 8eb322579cf..a5cd5738636 100644
--- gcc/testsuite/gcc.dg/overflow-warn-1.c
+++ gcc/testsuite/gcc.dg/overflow-warn-1.c
@@ -49,7 +49,7 @@ static int sc = INT_MAX + 1; /* { dg-warning "25:integer 
overflow in expression"
 void *p = 0 * (INT_MAX + 1); /* { dg-warning "integer overflow in expression" 
} */
 /* { dg-warning "initialization makes pointer from integer without a cast" 
"null" { target *-*-* } .-1 } */
 void *q = 0 * (1 / 0); /* { dg-warning "division by zero" } */
-/* { dg-error "initializer element is not computable at load time" "constant" 
{ target *-*-* } .-1 } */
+/* { dg-error "initializer element is not constant" "constant" { target *-*-* 
} .-1 } */
 /* { dg-warning "initialization makes pointer from integer without a cast" 
"null" { target *-*-* } .-2 } */
 void *r = (1 ? 0 : INT_MAX+1);
 
diff --git gcc/testsuite/gcc.dg/overflow-warn-2.c 
gcc/testsuite/gcc.dg/overflow-warn-2.c
index f048d6dae2a..05ab104fa4a 100644
--- gcc/testsuite/gcc.dg/overflow-warn-2.c
+++ gcc/testsuite/gcc.dg/overflow-warn-2.c
@@ -49,7 +49,7 @@ static int sc = INT_MAX + 1; /* { dg-warning "integer 
overflow in expression" }
 void *p = 0 * (INT_MAX + 1); /* { dg-warning "integer overflow in expression" 
} */
 /* { dg-warning "initialization makes pointer from integer without a cast" 
"null" { target *-*-* } .-1 } */
 void *q = 0 * (1 / 0); /* { dg-warning "division by zero" } */
-/* { dg-error "initializer element is not computable at load time" "constant" 
{ target *-*-* } .-1 } */
+/* { dg-error "initializer element is not constant" "constant" { target *-*-* 
} .-1 } */
 /* { dg-warning "initialization makes pointer from integer without a cast" 
"null" { target *-*-* } .-2 } */
 void *r = (1 ? 0 : INT_MAX+1);
 
diff --git gcc/testsuite/gcc.dg/overflow-warn-3.c 
gcc/testsuite/gcc.dg/overflow-warn-3.c
index 664011e401d..fd4a34f67e2 100644
--- gcc/testsuite/gcc.dg/overflow-warn-3.c
+++ gcc/testsuite/gcc.dg/overflow-warn-3.c
@@ -55,7 +55,7 @@ void *p = 0 * (INT_MAX + 1); /* { dg-warning "integer 
overflow in expression" }
 /* { dg-warning "overflow in constant expression" "constant" { target *-*-* } 
.-1 } */
 /* { dg-warning "initialization makes pointer from integer without a cast" 
"null" { target *-*-* } .-2 } */
 void *q = 0 * (1 / 0); /* { dg-warning "division by zero" } */
-/* { dg-error "initializer element is not computable at load time" "constant" 
{ target *-*-* } .-1 } */
+/* { dg-error "initializer element is not constant" "constant" { target *-*-* 
} .-1 } */
 /* { dg-warning "initialization makes pointer from integer without a cast" 
"null" { target *-*-* } .-2 } */
 void *r = (1 ? 0 : INT_MAX+1);
 
diff --git gcc/testsuite/gcc.dg/overflow-warn-4.c 
gcc/testsuite/gcc.dg/overflow-warn-4.c
index 52677ce897a..018e3e1e4cd 100644
--- gcc/testsuite/gcc.dg/overflow-warn-4.c
+++ gcc/testsuite/gcc.dg/overflow-warn-4.c
@@ -55,7 +55,7 @@ void *p = 0 * (INT_MAX + 1); /* { dg-warning "integer 
overflow in expression" }
 /* { dg-error "overflow in constant expression" "constant" { target *-*-* } 
.-1 } */
 /* { dg-error "initialization makes pointer from integer without a cast" 
"null" { target *-*-* } .-2 } */
 void *q = 0 * (1 / 0); /* { dg-warning "division by zero" } */
-/* { dg-error "initializer element is not computable at load time" "constant" 
{ target *-*-* } .-1 } */
+/* { dg-error "initializer element is not constant" "constant" { target *-*-* 
} .-1 } */
 /* { dg-error "initialization makes pointer from integer without a cast" 
"null" { target *-*-* } .-2 } */
 void *r = (1 ? 0 : INT_MAX+1);
 
diff --git gcc/testsuite/gcc.dg/torture/pr70992-2.c 
gcc/testsuite/gcc.dg/torture/pr70992-2.c
index e69de29bb2d..c5d2c5f2683 100644
--- gcc/testsuite/gcc.dg/torture/pr70992-2.c
+++ gcc/testsuite/gcc.dg/torture/pr70992-2.c
@@ -0,0 +1,9 @@
+/* PR middle-end/70992 */
+/* { dg-do compile }

[PATCH] [PowerPC/RTEMS] Add 64-bit support using ELFv2 ABI

2017-07-25 Thread Sebastian Huber
Add 64-bit support for RTEMS using the ELFv2 ABI with 64-bit long
double.

gcc/
* config.gcc (powerpc-*-rtems*): Remove rs6000/eabi.h.  Add
rs6000/biarch64.h.
* config/rs6000/rtems.h (ASM_DECLARE_FUNCTION_SIZE): New macro.
(ASM_OUTPUT_SPECIAL_POOL_ENTRY_P): Likewise.
(CRT_CALL_STATIC_FUNCTION): Likewise.
(ASM_DEFAULT_SPEC): New define.
(ASM_SPEC32): Likewise.
(ASM_SPEC64): Likewise.
(ASM_SPEC_COMMON): Likewise.
(ASM_SPEC): Likewise.
(INVALID_64BIT): Likewise.
(LINK_OS_DEFAULT_SPEC): Likewise.
(LINK_OS_SPEC32): Likewise.
(LINK_OS_SPEC64): Likewise.
(POWERPC_LINUX): Likewise.
(PTRDIFF_TYPE): Likewise.
(RESTORE_FP_PREFIX): Likewise.
(RESTORE_FP_SUFFIX): Likewise.
(SAVE_FP_PREFIX): Likewise.
(SAVE_FP_SUFFIX): Likewise.
(SIZE_TYPE): Likewise.
(SUBSUBTARGET_OVERRIDE_OPTIONS): Likewise.
(TARGET_64BIT): Likewise.
(TARGET_64BIT): Likewise.
(TARGET_AIX): Likewise.
(WCHAR_TYPE_SIZE): Likewise.
(WCHAR_TYPE): Undefine.
(TARGET_OS_CPP_BUILTINS): Add 64-bit PowerPC defines.
(CPP_OS_DEFAULT_SPEC): Use previous CPP_OS_RTEMS_SPEC.
(CPP_OS_RTEMS_SPEC): Delete.
(SUBSUBTARGET_EXTRA_SPECS): Remove cpp_os_rtems.  Add
asm_spec_common, asm_spec32, asm_spec64, link_os_spec32, and
link_os_spec64.
* config/rs6000/t-rtems: Add mcpu=e6500/m64 multilib.

libgcc/
* config/rs6000/ibm-ldouble.c: Disable if defined __rtems__.
---
 gcc/config.gcc |   2 +-
 gcc/config/rs6000/rtems.h  | 197 ++---
 gcc/config/rs6000/t-rtems  |   5 +
 libgcc/config/rs6000/ibm-ldouble.c |   3 +-
 4 files changed, 189 insertions(+), 18 deletions(-)

diff --git a/gcc/config.gcc b/gcc/config.gcc
index 2ae0218b5fc..aab7f65c1df 100644
--- a/gcc/config.gcc
+++ b/gcc/config.gcc
@@ -2424,7 +2424,7 @@ powerpc-*-rtems*spe*)
tmake_file="${tmake_file} powerpcspe/t-fprules powerpcspe/t-rtems 
powerpcspe/t-ppccomm"
;;
 powerpc-*-rtems*)
-   tm_file="${tm_file} dbxelf.h elfos.h freebsd-spec.h newlib-stdint.h 
rs6000/sysv4.h rs6000/eabi.h rs6000/rtems.h rtems.h"
+   tm_file="rs6000/biarch64.h ${tm_file} dbxelf.h elfos.h freebsd-spec.h 
newlib-stdint.h rs6000/sysv4.h rs6000/rtems.h rtems.h"
extra_options="${extra_options} rs6000/sysv4.opt"
tmake_file="${tmake_file} rs6000/t-fprules rs6000/t-rtems 
rs6000/t-ppccomm"
;;
diff --git a/gcc/config/rs6000/rtems.h b/gcc/config/rs6000/rtems.h
index 1ab4e313fdb..8a62fdcbaf3 100644
--- a/gcc/config/rs6000/rtems.h
+++ b/gcc/config/rs6000/rtems.h
@@ -23,29 +23,163 @@
see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
.  */
 
-/* Specify predefined symbols in preprocessor.  */
+/* Copy and paste from linux64.h and freebsd64.h */
+#ifdef IN_LIBGCC2
+#undef TARGET_64BIT
+#ifdef __powerpc64__
+#define TARGET_64BIT 1
+#else
+#define TARGET_64BIT 0
+#endif
+#endif
+
+/* Copy and paste from linux64.h and freebsd64.h */
+#undef TARGET_AIX
+#defineTARGET_AIX TARGET_64BIT
 
 #undef TARGET_OS_CPP_BUILTINS
-#define TARGET_OS_CPP_BUILTINS()  \
-  do  \
-{ \
-  builtin_define_std ("PPC"); \
-  builtin_define ("__rtems__");   \
-  builtin_define ("__USE_INIT_FINI__"); \
-  builtin_assert ("system=rtems");\
-  builtin_assert ("cpu=powerpc"); \
-  builtin_assert ("machine=powerpc"); \
-  TARGET_OS_SYSV_CPP_BUILTINS (); \
-} \
+#define TARGET_OS_CPP_BUILTINS()   \
+  do   \
+{  \
+  builtin_define ("__rtems__");\
+  builtin_define ("__USE_INIT_FINI__");\
+  builtin_assert ("system=rtems"); \
+  if (TARGET_64BIT)\
+   {   \
+ builtin_define ("__PPC__");   \
+ builtin_define ("__PPC64__"); \
+ builtin_define ("__powerpc64__"); \
+ builtin_assert ("cpu=powerpc64"); \
+ builtin_assert ("machine=powerpc64"); \
+   }   \
+  else \
+   {   \
+ builtin_define_std ("PPC");   \
+ builtin_define_std ("powerpc");   \
+ builtin_assert ("cpu=powerpc");   \
+ builtin_assert ("machine=powerpc");   \
+ TARGET_OS_SYSV_CPP_BUILTINS ();   \
+ 

Re: [PATCH #2, cleanup] Remove PowerPC TARGET_UPPER_REGS_{DF,SF} macros

2017-07-25 Thread Segher Boessenkool
Hi Mike,

On Mon, Jul 24, 2017 at 07:40:26PM -0400, Michael Meissner wrote:
> This patch eliminates the TARGET_UPPER_REGS_{DF,SF} macros.  The next patch
> will eliminate TARGET_UPPER_REGS_DI.
> 
> I had to tune the optimization that turned load into FPR register and then 
> move
> to Altivec register (and the store equivalent) because it used
> TARGET_UPPER_REGS_ to protect SFmode on power7.

> --- gcc/config/rs6000/rs6000.md   (revision 250485)
> +++ gcc/config/rs6000/rs6000.md   (working copy)
> @@ -691,7 +691,7 @@ (define_code_attr SMINMAX [(smin "SM
>  ;;   D-form load to FPR register & move to Altivec register
>  ;;   Move Altivec register to FPR register and store
>  (define_mode_iterator ALTIVEC_DFORM [DF
> -  SF
> +  (SF "TARGET_P8_VECTOR")
>(DI "TARGET_POWERPC64")])

Is that this part?

> --- gcc/config/rs6000/rs6000.c(revision 250485)
> +++ gcc/config/rs6000/rs6000.c(working copy)
> @@ -3216,22 +3216,12 @@ rs6000_init_hard_regno_mode_ok (bool glo
>rs6000_constraints[RS6000_CONSTRAINT_wa] = VSX_REGS;
>rs6000_constraints[RS6000_CONSTRAINT_wd] = VSX_REGS;   /* V2DFmode  */
>rs6000_constraints[RS6000_CONSTRAINT_wf] = VSX_REGS;   /* V4SFmode  */
> +  rs6000_constraints[RS6000_CONSTRAINT_ws] = VSX_REGS;   /* DFmode  */
> +  rs6000_constraints[RS6000_CONSTRAINT_wv] = ALTIVEC_REGS;   /* 
> DFmode  */
> +  rs6000_constraints[RS6000_CONSTRAINT_wi] = VSX_REGS;   /* DImode  */

After this all is done you can probably simplify the constraints a bit.
Looking forward to it :-)

> --- gcc/config/rs6000/rs6000.h(revision 250485)
> +++ gcc/config/rs6000/rs6000.h(working copy)
> @@ -575,9 +575,11 @@ extern int rs6000_vector_align[];
> and/or SFmode could go in the traditional Altivec registers.  GCC 8.x 
> deleted
> these options.  In order to simplify the code, define the options in terms
> of the base option (vsx, power8-vector).  */
> -#define TARGET_UPPER_REGS_DF TARGET_VSX
> +#if (GCC_VERSION >= 3000)
> +#pragma GCC poison TARGET_UPPER_REGS_DF TARGET_UPPER_REGS_SF
> +#endif

Why poison it?  If someone accidentally slips in a new use it won't
compile anyway.

Okay for trunk with the poison removed.  Thanks!


Segher


Re: [PATCH GCC][4/4]Better handle store-stores chain if eliminated stores only store loop invariant

2017-07-25 Thread Bin.Cheng
On Tue, Jul 25, 2017 at 12:48 PM, Richard Biener
 wrote:
> On Mon, Jul 10, 2017 at 10:24 AM, Bin.Cheng  wrote:
>> On Tue, Jun 27, 2017 at 11:49 AM, Bin Cheng  wrote:
>>> Hi,
>>> This is a followup patch better handling below case:
>>>  for (i = 0; i < n; i++)
>>>{
>>>  a[i] = 1;
>>>  a[i+2] = 2;
>>>}
>>> Instead of generating root variables by loading from memory and propagating 
>>> with PHI
>>> nodes, like:
>>>  t0 = a[0];
>>>  t1 = a[1];
>>>  for (i = 0; i < n; i++)
>>>{
>>>  a[i] = 1;
>>>  t2 = 2;
>>>  t0 = t1;
>>>  t1 = t2;
>>>}
>>>  a[n] = t0;
>>>  a[n+1] = t1;
>>> We can simply store loop invariant values after loop body if we know loop 
>>> iterates more
>>> than chain->length times, like:
>>>  for (i = 0; i < n; i++)
>>>{
>>>  a[i] = 1;
>>>}
>>>  a[n] = 2;
>>>  a[n+1] = 2;
>>>
>>> Bootstrap(O2/O3) in patch series on x86_64 and AArch64.  Is it OK?
>> Update patch wrto changes in previous patch.
>> Bootstrap and test on x86_64 and AArch64.  Is it OK?
>
> +  if (TREE_CODE (val) == INTEGER_CST || TREE_CODE (val) == REAL_CST)
> +   continue;
>
> Please use CONSTANT_CLASS_P (val) instead.  I suppose VECTOR_CST or
> FIXED_CST would be ok as well for example.
>
> Ok with that change.  Did we eventually optimize this in followup
> passes previously?
Probably not?  Given below test:

int a[1], b[1], c[1];
int f(void)
{
  int i, n = 100;
  int t0 = a[0];
  int t1 = a[1];
 for (i = 0; i < n; i++)
   {
 a[i] = 1;
 int t2 = 2;
 t0 = t1;
 t1 = t2;
   }
 a[n] = t0;
 a[n+1] = t1;
  return 0;
}
The optimized dump is as:

   [1.00%] [count: INV]:
  t1_8 = a[1];
  ivtmp.9_17 = (unsigned long) &a;
  _16 = ivtmp.9_17 + 400;

   [99.00%] [count: INV]:
  # t1_20 = PHI <2(3), t1_8(2)>
  # ivtmp.9_2 = PHI 
  _15 = (void *) ivtmp.9_2;
  MEM[base: _15, offset: 0B] = 1;
  ivtmp.9_1 = ivtmp.9_2 + 4;
  if (ivtmp.9_1 != _16)
goto ; [98.99%] [count: INV]
  else
goto ; [1.01%] [count: INV]

   [1.00%] [count: INV]:
  a[100] = t1_20;
  a[101] = 2;
  return 0;

We now eliminate one phi and leave another behind.  It is vrp1/dce2
when the phi is eliminated.

Thanks,
bin


Re: [PATCH] Move static chain and non-local goto init after NOTE_INSN_FUNCTION_BEG (PR sanitize/81186).

2017-07-25 Thread Jakub Jelinek
On Tue, Jul 18, 2017 at 10:38:50AM +0200, Martin Liška wrote:
> 2017-06-27  Martin Liska  
> 
> PR sanitize/81186

8 spaces instead of tab?

>   * function.c (expand_function_start): Set parm_birth_insn after
>   static chain is initialized.

I don't like this description, after all, parm_birth_insn was set
after static chain initialization before too (just not right after it
in some cases).  The important change is that you've moved parm_birth_insn
before the nonlocal_goto_save_area setup code, so IMHO the ChangeLog entry
should say that.

As for the patch itself, there are many spots which insert some code
before or after parm_birth_insn or spots tied to the NOTE_INSN_FUNCTION_BEG
note, but I'd hope nothing inserted there can actually call functions that
perform non-local gotos, so I think the patch is fine.  And for debug info
experience which is also related to NOTE_INSN_FUNCTION_BEG, I think the nl
goto save area is nothing that can be seen in the debugger unless you know
where it is, so the only change might be if you put a breakpoint on the end
of prologue (i.e. NOTE_INSN_FUNCTION_BEG) and call from inferios some
function that performs a non-local goto.  I think there are no barriers
on that initialization anyway, so scheduler can move it around.

Thus, ok for trunk/7.2 with the above suggested ChangeLog change.

Jakub


Re: [gomp4] OpenACC async re-work

2017-07-25 Thread Chung-Lin Tang
On 2017/6/29 6:31 AM, Cesar Philippidis wrote:
> On 06/27/2017 03:56 AM, Chung-Lin Tang wrote:
>> On 2017/6/27 6:45 AM, Cesar Philippidis wrote:
 (1) Instead of essentially implementing the entire OpenACC async support
 inside the plugin, we now use an opaque 'goacc_asyncqueue' implemented
 by the plugin, along with core 'test', 'synchronize', 'serialize', etc.
 plugin functions. Most of the OpenACC specific logic is pulled into
 libgomp/oacc-async.c
>>> I'm not sure if plugins need to maintain backwards compatibility.
>>> However, I don't see any changes inside libgomp.map, so maybe it's not
>>> required.
>>
>> This patch is pretty large, but only inner workings (including libgomp vs. 
>> plugin interface) were modified.
>> No user API compatibility was altered.
>>
 (3) For 'wait + async', we now add a local thread synchronize, instead
 of just ordering the streams.

 (4) To work with the (3) change, some front end changes were added to
 propagate argument-less wait clauses as 'wait(GOACC_ASYNC_NOVAL)' to
 represent a 'wait all'.
>>> What's the significance of GOMP_ASYNC_NOVAL? Wouldn't it have been
>>> easier to make that change in the gimplifier?
>>
>> Actually, we were basically throwing away argument-less wait clauses in 
>> front-ends
>> before this patch; i.e. '#pragma acc parallel async' and '#pragma acc 
>> parallel wait async'
>> were internally the same.
>>
>> The use of GOMP_ASYNC_NOVAL (-1) was just following the current 'async' 
>> clause representation
>> convention.
> 
> So then then wait was implied before? Or maybe that's why 'wait async'
> didn't work.

Not really, 'wait async' didn't work because we were not synchronizing
the local thread.

>>> Overall, I like how you were able eliminate the externally managed map_*
>>> data structure which was used to pass in arguments to nvptx_exec.
>>> Although I wonder if we should just pass in those individual arguments
>>> directly to cuLaunchKernel. But that's a big change in itself.
>>
>> I didn't think of that when working on the current patch, maybe later.
> 
> Here's some more comments regarding the code below. One high-level
> comment regarding the usage of async-specific locks. Can't you get by
> with using the global device lock, instead of a special async queue or
> would that cause a deadlock?

I did not try to re-use the device lock here. I think using that might
cause more ordering/nesting problems (gomp mutexes are not pthread mutexes,
no 'try' mechanism here).

The intention of the async queue locks are just very short temporal
protection when looking up a struct goacc_asyncqueue. Also see below comments.

>> +attribute_hidden struct goacc_asyncqueue *
>> +lookup_goacc_asyncqueue (struct goacc_thread *thr, bool create, int
> async)
>> +{
>> +  /* The special value acc_async_noval (-1) maps to the thread-specific
>> + default async stream.  */
>> +  if (async == acc_async_noval)
>> +async = thr->default_async;
> 
> Is the default async queue device independent? I thought the default
> async queue is defined in the acc_async_t enum. Maybe set
> 
>  async = acc_async_default

It's now thread dependent and user settable in OpenACC 2.5
(i.e. the acc_set/get_default_async user API routines)


>> +  if (async == acc_async_sync)
>> +return NULL;
>> +
>> +  if (async < 0)
>> +gomp_fatal ("bad async %d", async);
>> +
>> +  struct gomp_device_descr *dev = thr->dev;
>> +
>> +  if (!create
>> +  && (async >= dev->openacc.async.nasyncqueue
>> +  || !dev->openacc.async.asyncqueue[async]))
>> +return NULL;
>> +
>> +  gomp_mutex_lock (&dev->openacc.async.lock);
> Is this lock sufficient? What happens if the device is released?
> 
>> +  if (async >= dev->openacc.async.nasyncqueue)
>> +{
> Not your fault, but I wonder if we would be better off just hard-capping
> the number of async queues. What happens if the user does something like
> wait (1<<30)? That can be addressed later.

Yeah, looks like we should use some sort of map or hash (above some threshold)
instead.

>>  int
>>  acc_async_test (int async)
>>  {
>> @@ -54,15 +129,14 @@ acc_async_test (int async)
>>if (!thr || !thr->dev)
>>  gomp_fatal ("no device active");
>>
>> -  int res = thr->dev->openacc.async_test_func (async);
>> -
>>if (profiling_setup_p)
>>  {
>>thr->prof_info = NULL;
>>thr->api_info = NULL;
>>  }
>> -
>> -  return res;
>> +
>> +  goacc_aq aq = lookup_goacc_asyncqueue (thr, true, async);
>> +  return thr->dev->openacc.async.test_func (aq);
> I'm not sure how the profling stuff works. Should the profling state be
> state be set after calling thr->dev->openacc.async.test_func?

I've updated the patch to place those profiling code (which seems to be
end of routine cleanups) to the very end.

>> -  void *ret = NULL;
>>if (thr && thr->dev && thr->dev->openacc.cuda.get_stream_func)
>> -ret = thr->dev->openacc.cuda.get_stream_func (async);
>> -
>> -  if (profiling_se

Re: [PATCH] Fix movti_internal with -mavx512bw -mno-avx512dq (PR target/81532)

2017-07-25 Thread Uros Bizjak
On Tue, Jul 25, 2017 at 10:50 AM, Jakub Jelinek  wrote:
> Hi!
>
> As mentioned in the PR, the following testcases ICE if AVX512BW/VL is enabled,
> but AVX512DQ is not.  The problem is that for movti_internal we need to use
> vpextrq or vpinsrq instructions to extract or insert the 64-bit reg from
> above the lowest 64-bit and if it is in %xmm16+ register, we need EVEX
> encoded insn to do that.  Only vpextr[bw] and vpinsr[bw] are in AVX512BW,
> the rest is AVX512DQ.
>
> Yd/Ye constraints are only used in movdi_internal and movti_internal
> patterns, in the former only for !TARGET_64BIT (then ALL_SSE_REGS vs.
> SSE_REGS makes no difference), in the latter is this case I'm trying to fix.
>
> Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?
>
> 2017-07-25  Jakub Jelinek  
>
> PR target/81532
> * config/i386/constraints.md (Yd, Ye): Use ALL_SSE_REGS for
> TARGET_AVX512DQ rather than TARGET_AVX512BW.
>
> * gcc.target/i386/pr80833-3.c: New test.
> * gcc.target/i386/avx512dq-pr81532.c: New test.
> * gcc.target/i386/avx512bw-pr81532.c: New test.

OK.

Thanks,
Uros.

> --- gcc/config/i386/constraints.md.jj   2017-06-02 09:01:21.0 +0200
> +++ gcc/config/i386/constraints.md  2017-07-24 12:58:06.778105305 +0200
> @@ -138,19 +138,19 @@ (define_register_constraint "Yj"
>
>  (define_register_constraint "Yd"
>   "TARGET_INTER_UNIT_MOVES_TO_VEC
> -  ? (TARGET_AVX512BW
> +  ? (TARGET_AVX512DQ
>   ? ALL_SSE_REGS
>   : (TARGET_SSE4_1 ? SSE_REGS : NO_REGS))
>: NO_REGS"
> - "@internal Any EVEX encodable SSE register (@code{%xmm0-%xmm31}) for 
> AVX512BW target or any SSE register for SSE4_1 target, when inter-unit moves 
> to vector registers are enabled.")
> + "@internal Any EVEX encodable SSE register (@code{%xmm0-%xmm31}) for 
> AVX512DQ target or any SSE register for SSE4_1 target, when inter-unit moves 
> to vector registers are enabled.")
>
>  (define_register_constraint "Ye"
>   "TARGET_INTER_UNIT_MOVES_FROM_VEC
> -  ? (TARGET_AVX512BW
> +  ? (TARGET_AVX512DQ
>   ? ALL_SSE_REGS
>   : (TARGET_SSE4_1 ? SSE_REGS : NO_REGS))
>: NO_REGS"
> - "@internal Any EVEX encodable SSE register (@code{%xmm0-%xmm31}) for 
> AVX512BW target or any SSE register for SSE4_1 target, when inter-unit moves 
> from vector registers are enabled.")
> + "@internal Any EVEX encodable SSE register (@code{%xmm0-%xmm31}) for 
> AVX512DQ target or any SSE register for SSE4_1 target, when inter-unit moves 
> from vector registers are enabled.")
>
>  (define_register_constraint "Ym"
>   "TARGET_MMX && TARGET_INTER_UNIT_MOVES_TO_VEC ? MMX_REGS : NO_REGS"
> --- gcc/testsuite/gcc.target/i386/pr80833-3.c.jj2017-07-24 
> 13:20:00.061347826 +0200
> +++ gcc/testsuite/gcc.target/i386/pr80833-3.c   2017-07-24 13:23:17.711976351 
> +0200
> @@ -0,0 +1,14 @@
> +/* { dg-do compile { target int128 } } */
> +/* { dg-options "-O2 -mavx512dq -mavx512vl -mno-avx512bw -mtune=intel" } */
> +
> +__int128 test (__int128 a)
> +{
> +  asm ("" : "+v" (a) : : "xmm0", "xmm1", "xmm2", "xmm3",
> +"xmm4", "xmm5", "xmm6", "xmm7",
> +"xmm8", "xmm9", "xmm10", "xmm11",
> +"xmm12", "xmm13", "xmm14", "xmm15");
> +  return a;
> +}
> +
> +/* { dg-final { scan-assembler "pinsrq" } } */
> +/* { dg-final { scan-assembler "pextrq" } } */
> --- gcc/testsuite/gcc.target/i386/avx512dq-pr81532.c.jj 2017-07-24 
> 13:12:09.165997780 +0200
> +++ gcc/testsuite/gcc.target/i386/avx512dq-pr81532.c2017-07-24 
> 13:13:59.950668550 +0200
> @@ -0,0 +1,12 @@
> +/* PR target/81532 */
> +/* { dg-do compile { target int128 } } */
> +/* { dg-options "-O2 -mavx512dq -mavx512vl -mno-avx512bw" } */
> +
> +typedef unsigned __int128 V __attribute__ ((vector_size (64)));
> +
> +V
> +foo (V c)
> +{
> +  c >>= 0 != c;
> +  return c;
> +}
> --- gcc/testsuite/gcc.target/i386/avx512bw-pr81532.c.jj 2017-07-24 
> 13:12:40.660619897 +0200
> +++ gcc/testsuite/gcc.target/i386/avx512bw-pr81532.c2017-07-24 
> 13:14:20.984416180 +0200
> @@ -0,0 +1,5 @@
> +/* PR target/81532 */
> +/* { dg-do compile { target int128 } } */
> +/* { dg-options "-O2 -mavx512bw -mavx512vl -mno-avx512dq" } */
> +
> +#include "avx512dq-pr81532.c"
>
> Jakub


Re: [COMMITED][AArch64] Fix PR79041

2017-07-25 Thread Ramana Radhakrishnan
On Tue, Jul 25, 2017 at 1:13 PM, Wilco Dijkstra  wrote:
> James Greenhalgh wrote:
>>
>> OK. Either like this, or with the conditions swapped around as Yvan
>> suggested to make backporting easier.
>
> I swapped the conditions around, not sure whether it helps...
> Also I needed an additional scan-assembler, this was committed to
> trunk and GCC7:


BZ suggests that this affects GCC 6 but GCC 7 is fixed ? Should there
be a backport to GCC 6 as well ?

Can you please keep BZ up to date please ?


Ramana
>
> diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
> index 
> d7fab3775c06477365217e357b5754026d09752b..7713d543af788d47dcdf8ea89482a2d4f66a1344
>  100644
> --- a/gcc/config/aarch64/aarch64.c
> +++ b/gcc/config/aarch64/aarch64.c
> @@ -10181,7 +10181,7 @@ aarch64_classify_symbol (rtx x, rtx offset)
>   /* This is alright even in PIC code as the constant
>  pool reference is always PC relative and within
>  the same translation unit.  */
> - if (CONSTANT_POOL_ADDRESS_P (x))
> + if (!aarch64_pcrelative_literal_loads && CONSTANT_POOL_ADDRESS_P 
> (x))
> return SYMBOL_SMALL_ABSOLUTE;
>   else
> return SYMBOL_FORCE_TO_MEM;
> diff --git a/gcc/testsuite/gcc.target/aarch64/pr79041-2.c 
> b/gcc/testsuite/gcc.target/aarch64/pr79041-2.c
> new file mode 100644
> index 
> ..cd34fbab85a92d00cba7091d4146deaaf3a862a9
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/aarch64/pr79041-2.c
> @@ -0,0 +1,11 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O2 -mcmodel=large -mpc-relative-literal-loads" } */
> +
> +__int128
> +t (void)
> +{
> +  return (__int128)1 << 80;
> +}
> +
> +/* { dg-final { scan-assembler "adr" } } */
> +/* { dg-final { scan-assembler-not "adrp" } } */


Re: [PATCH] [SPARC] Disable muldf3_extend for LEON/LEON3

2017-07-25 Thread Eric Botcazou
> Prominent LEON2 chips like the AT697E and AT697F do not properly support
> the FSMULD instruction.  LEON3 targets using the GRFPU-lite floating
> point unit do not support the FSMULD in hardware.  Disable this
> instruction for for all LEON/LEON3 targets for simplicity.

And penalize everyone else?

-- 
Eric Botcazou


Re: [PATCH GCC][4/4]Better handle store-stores chain if eliminated stores only store loop invariant

2017-07-25 Thread Richard Biener
On Tue, Jul 25, 2017 at 2:38 PM, Bin.Cheng  wrote:
> On Tue, Jul 25, 2017 at 12:48 PM, Richard Biener
>  wrote:
>> On Mon, Jul 10, 2017 at 10:24 AM, Bin.Cheng  wrote:
>>> On Tue, Jun 27, 2017 at 11:49 AM, Bin Cheng  wrote:
 Hi,
 This is a followup patch better handling below case:
  for (i = 0; i < n; i++)
{
  a[i] = 1;
  a[i+2] = 2;
}
 Instead of generating root variables by loading from memory and 
 propagating with PHI
 nodes, like:
  t0 = a[0];
  t1 = a[1];
  for (i = 0; i < n; i++)
{
  a[i] = 1;
  t2 = 2;
  t0 = t1;
  t1 = t2;
}
  a[n] = t0;
  a[n+1] = t1;
 We can simply store loop invariant values after loop body if we know loop 
 iterates more
 than chain->length times, like:
  for (i = 0; i < n; i++)
{
  a[i] = 1;
}
  a[n] = 2;
  a[n+1] = 2;

 Bootstrap(O2/O3) in patch series on x86_64 and AArch64.  Is it OK?
>>> Update patch wrto changes in previous patch.
>>> Bootstrap and test on x86_64 and AArch64.  Is it OK?
>>
>> +  if (TREE_CODE (val) == INTEGER_CST || TREE_CODE (val) == REAL_CST)
>> +   continue;
>>
>> Please use CONSTANT_CLASS_P (val) instead.  I suppose VECTOR_CST or
>> FIXED_CST would be ok as well for example.
>>
>> Ok with that change.  Did we eventually optimize this in followup
>> passes previously?
> Probably not?  Given below test:
>
> int a[1], b[1], c[1];
> int f(void)
> {
>   int i, n = 100;
>   int t0 = a[0];
>   int t1 = a[1];
>  for (i = 0; i < n; i++)
>{
>  a[i] = 1;
>  int t2 = 2;
>  t0 = t1;
>  t1 = t2;
>}
>  a[n] = t0;
>  a[n+1] = t1;
>   return 0;
> }
> The optimized dump is as:
>
>[1.00%] [count: INV]:
>   t1_8 = a[1];
>   ivtmp.9_17 = (unsigned long) &a;
>   _16 = ivtmp.9_17 + 400;
>
>[99.00%] [count: INV]:
>   # t1_20 = PHI <2(3), t1_8(2)>
>   # ivtmp.9_2 = PHI 
>   _15 = (void *) ivtmp.9_2;
>   MEM[base: _15, offset: 0B] = 1;
>   ivtmp.9_1 = ivtmp.9_2 + 4;
>   if (ivtmp.9_1 != _16)
> goto ; [98.99%] [count: INV]
>   else
> goto ; [1.01%] [count: INV]
>
>[1.00%] [count: INV]:
>   a[100] = t1_20;
>   a[101] = 2;
>   return 0;
>
> We now eliminate one phi and leave another behind.  It is vrp1/dce2
> when the phi is eliminated.

Ok, I see.  Maybe worth filing a missed optimization PR.

Richard.

> Thanks,
> bin


Re: [PATCH] [SPARC] Disable muldf3_extend for LEON/LEON3

2017-07-25 Thread Sebastian Huber

On 25/07/17 14:59, Eric Botcazou wrote:


Prominent LEON2 chips like the AT697E and AT697F do not properly support
the FSMULD instruction.  LEON3 targets using the GRFPU-lite floating
point unit do not support the FSMULD in hardware.  Disable this
instruction for for all LEON/LEON3 targets for simplicity.

And penalize everyone else?


Yes, everyone using LEON/LEON3 will no longer benefit from the FSMULD 
after this patch. This was discussed with Gaisler. We have to make a 
trade-off between correctness, performance and the number of GCC options.


--
Sebastian Huber, embedded brains GmbH

Address : Dornierstr. 4, D-82178 Puchheim, Germany
Phone   : +49 89 189 47 41-16
Fax : +49 89 189 47 41-09
E-Mail  : sebastian.hu...@embedded-brains.de
PGP : Public key available on request.

Diese Nachricht ist keine geschäftliche Mitteilung im Sinne des EHUG.



[PATCH][GCC][ARM] Fix code generation from parsecpu.awk to remove duplicates.

2017-07-25 Thread Tamar Christina
Hi All,

The new ARM options parser generates from the following entry

begin cpu cortex-a55
 ...
 architecture armv8.2-a+fp16+foo
 option crypto add FP_ARMv8 CRYPTO
 option dotprod add FP_ARMv8 FOO
 option nofp remove ALL_FP
 ...
end cpu cortex-a55

A list of all feature bits that using -mcpu=cortex-a55
enabled and stores these in an array. The size of this array
is statically set to be isa_num_bits which is the last entry
in the isa_feature_bits enum.

What this means is that there is an assumption that you can
never have more feature bits in this array than there are
features.  Which is a reasonable assumption only if the entries
in the array are unique.

The above example currently would add FP_ARMv8 three times.
FP_ARMv8 maps to the macro ISA_FP_ARMv8.  Which when expanded
trivially exceeds the size of the array.

This patch makes sure that when constructing the array we only
emit an a value once. The array is constructed between three
different sources which is where the duplications come in.

Because we can't however look into the macros. When expended
they may still introduce duplicate bits. So this doesn't
completely fix the problem. Just makes it a bit less likely.

Bootstrapped on arm-none-eabi and no issues.

Ok for trunk?

gcc/
2017-07-25  Tamar Christina  

* config/arm/parsecpu.awk (all_cores): Remove duplicates.diff --git a/gcc/config/arm/parsecpu.awk b/gcc/config/arm/parsecpu.awk
index 9d01e2cf992c9643534a7621e59e78041f8a1f9f..070d193b338ace0deb9aeb53a51796bfd633ade8 100644
--- a/gcc/config/arm/parsecpu.awk
+++ b/gcc/config/arm/parsecpu.awk
@@ -223,10 +223,39 @@ function gen_comm_data () {
 	if (arch_opt_remove[feats[1],feats[m]] == "true") {
 		fatal("cannot remove features from architecture specs")
 	}
-	print "" arch_opt_isa[feats[1],feats[m]] ","
+	# The isa_features array that is being initialized here has a length
+	# of max isa_bit_num, which is the last entry in the enum.
+	# Logically this means that the number of features is implicitly
+	# never more than the number of feature bits we have.  This is only
+	# true if we don't emit duplicates here.  So keep track of which
+	# options we have already emitted so we don't emit them twice.
+	nopts = split (arch_opt_isa[feats[1],feats[m]], opts, ",")
+	for (i = 1; i <= nopts; i++) {
+		if (! (opts[i] in seen)) {
+		  print "" opts[i] ","
+		  seen[opts[i]]
+		}
+	}
+	}
+	if (cpus[n] in cpu_fpu) {
+	nopts = split (fpu_isa[cpu_fpu[cpus[n]]], opts, ",")
+	for (i = 1; i <= nopts; i++) {
+		if (! (opts[i] in seen)) {
+		  print "" opts[i] ","
+		  seen[opts[i]]
+		}
+	}
+	}
+	if (cpus[n] in cpu_isa) {
+	nopts = split (cpu_isa[cpus[n]], opts, ",")
+	for (i = 1; i <= nopts; i++) {
+		if (! (opts[i] in seen)) {
+		  print "" opts[i] ","
+		  seen[opts[i]]
+		}
+	}
 	}
-	if (cpus[n] in cpu_fpu) print "" fpu_isa[cpu_fpu[cpus[n]]] ","
-	if (cpus[n] in cpu_isa) print "" cpu_isa[cpus[n]] ","
+	delete seen
 	print "isa_nobit"
 	print "  }"
 	print "},"


Re: [libgomp, OpenACC] Add more map handling for enter/exit data directives

2017-07-25 Thread Chung-Lin Tang
Ping.

On 2017/6/22 3:18 PM, Chung-Lin Tang wrote:
> On 2017/6/14 12:00 AM, Jakub Jelinek wrote:
>> I don't see sufficient information on what you want to change and why
>> and whether the changes are backwards compatible (say will a valid
>> OpenACC 2.0 program compiled by GCC 7 work against both libgomp from GCC 7
>> as well as one with this patch)?
>> Can you write a few paragraphs on it (doesn't have to be comments in the
>> source, mailing list is fine)?
> 
> The current code doesn't handle GOMP_MAP_TO (present_or_copyin) and also the
> GOMP_MAP_PSET/MAP_POINTER handling wasn't entirely correct. This patch fixes
> them.
> 
> In the new attached patch, I added a fix a memory management fix that was
> forgotten earlier. The collective patch was originally by Cesar, from here:
> https://gcc.gnu.org/ml/gcc-patches/2015-05/msg01367.html
> (in that post, his main emphasis was the memory management fix)
> 
>>> +   case GOMP_MAP_ALLOC:
>>> + acc_present_or_create (hostaddrs[i], sizes[i]);
>>>   break;
>>> case GOMP_MAP_FORCE_ALLOC:
>>>   acc_create (hostaddrs[i], sizes[i]);
>>>   break;
>>> -   case GOMP_MAP_FORCE_PRESENT:
>>> +   case GOMP_MAP_TO:
>>>   acc_present_or_copyin (hostaddrs[i], sizes[i]);
>>>   break;
>>> case GOMP_MAP_FORCE_TO:
>>> - acc_present_or_copyin (hostaddrs[i], sizes[i]);
>>> + acc_copyin (hostaddrs[i], sizes[i]);
>>>   break;
>>
>> E.g. in this hunk you remove GOMP_MAP_POINTER and GOMP_MAP_FORCE_PRESENT
>> handling and significantly change GOMP_MAP_FORCE_TO.  The first two will
>> now gomp_fatal, right?  Can it ever appear in GOACC_enter_exit_data
>> calls?
> 
> GOMP_MAP_FORCE_PRESENT does not appear in enter/exit data directives, while
> GOMP_MAP_POINTER is handled in find_pointer().
> 
> Thanks,
> Chung-Lin
> 
> 2017-06-22  Cesar Philippidis  
> Thomas Schwinge  
> Chung-Lin Tang  
> 
> libgomp/
>   * oacc-mem.c (gomp_acc_remove_pointer): Fix a memory leak preventing
>   target_mem_desc.to_free from being deallocated with acc exit data.
>   * oacc-parallel.c (find_pset): Adjust and rename from...
>   (find_pointer): ...this function.
>   (GOACC_enter_exit_data): Handle GOMP_MAP_TO and GOMP_MAP_ALLOC,
>   adjust find_pointer calls into find_pset, adjust pointer map handling,
>   add acc_is_present guards to calls to gomp_acc_insert_pointer and
>   gomp_acc_remove_pointer.
> 
>   * testsuite/libgomp.oacc-c-c++-common/data-2.c: Update test.
>   * testsuite/libgomp.oacc-c-c++-common/enter-data.c: New test.
>   * testsuite/libgomp.oacc-fortran/data-2.f90: Update test.
> 
> 



Re: [PATCH v12] add -fpatchable-function-entry=N,M option

2017-07-25 Thread Torsten Duwe
On Tue, Jul 25, 2017 at 10:12:00AM +0300, Maxim Kuvyrkov wrote:
> > 
> > SuSe/Novell choose to do its assignments per engineer, not for the whole 
> > company, so I don't think you are covered as is.
> 
> Torsten, any update on which way you are going to handle copyright assignment 
> -- apply for a new one or have one of your SuSe colleagues commit under 
> theirs?

I have checked with our GCC guys and the code is covered by our
company assignment. So it's safe to commit.

Torsten



Re: [PATCH] Fix infinite recursion with div-by-zero (PR middle-end/70992)

2017-07-25 Thread Richard Biener
On Tue, Jul 25, 2017 at 2:19 PM, Marek Polacek  wrote:
> On Thu, Jul 20, 2017 at 08:22:59PM +0200, Richard Biener wrote:
>> No, we shouldn't.  Maybe trapping ops shouldn't be marked TREE_CONSTANT?
>> (Make sure to test with Ada...)
>
> That works for both testcases, but I can't say I really like the idea of
> modifying build2... but it's where the TREE_CONSTANT comes from.

Yes, I know.

> The tree_could_trap_p bit is because the C++ FE can pass "< x" around, e.g.
> a LT_EXPR with the first operand null.

Eh... :/

> Bootstrapped/regtested on x86_64-linux (Ada included) and 
> powerpc64le-unknown-linux-gnu,
> ok for trunk?

Eric, any comments?

We could also avoid tree_could_trap_p by special-casing only
*_DIV_EXPR and *_MOD_EXPR
with integer zero 2nd operand explicitely in build2 given there's no
"constant" value for this.  That is,
for FP 1./0. is NaN (a "constant" value) even if the operation might trap.

Thanks,
Richard.

> 2017-07-25  Marek Polacek  
>
> PR c/70992
> * tree-eh.c (tree_could_trap_p): Check if the first operand is null.
> * tree.c: Include "tree-eh.h".
> (build2_stat): Only set TREE_CONSTANT if the tree couldn't trap.
>
> * gcc.dg/overflow-warn-1.c: Adjust dg-error.
> * gcc.dg/overflow-warn-2.c: Likewise.
> * gcc.dg/overflow-warn-3.c: Likewise.
> * gcc.dg/overflow-warn-4.c: Likewise.
> * gcc.dg/torture/pr70992-2.c: New test.
> * gcc.dg/torture/pr70992.c: New test.
>
> diff --git gcc/testsuite/gcc.dg/overflow-warn-1.c 
> gcc/testsuite/gcc.dg/overflow-warn-1.c
> index 8eb322579cf..a5cd5738636 100644
> --- gcc/testsuite/gcc.dg/overflow-warn-1.c
> +++ gcc/testsuite/gcc.dg/overflow-warn-1.c
> @@ -49,7 +49,7 @@ static int sc = INT_MAX + 1; /* { dg-warning "25:integer 
> overflow in expression"
>  void *p = 0 * (INT_MAX + 1); /* { dg-warning "integer overflow in 
> expression" } */
>  /* { dg-warning "initialization makes pointer from integer without a cast" 
> "null" { target *-*-* } .-1 } */
>  void *q = 0 * (1 / 0); /* { dg-warning "division by zero" } */
> -/* { dg-error "initializer element is not computable at load time" 
> "constant" { target *-*-* } .-1 } */
> +/* { dg-error "initializer element is not constant" "constant" { target 
> *-*-* } .-1 } */
>  /* { dg-warning "initialization makes pointer from integer without a cast" 
> "null" { target *-*-* } .-2 } */
>  void *r = (1 ? 0 : INT_MAX+1);
>
> diff --git gcc/testsuite/gcc.dg/overflow-warn-2.c 
> gcc/testsuite/gcc.dg/overflow-warn-2.c
> index f048d6dae2a..05ab104fa4a 100644
> --- gcc/testsuite/gcc.dg/overflow-warn-2.c
> +++ gcc/testsuite/gcc.dg/overflow-warn-2.c
> @@ -49,7 +49,7 @@ static int sc = INT_MAX + 1; /* { dg-warning "integer 
> overflow in expression" }
>  void *p = 0 * (INT_MAX + 1); /* { dg-warning "integer overflow in 
> expression" } */
>  /* { dg-warning "initialization makes pointer from integer without a cast" 
> "null" { target *-*-* } .-1 } */
>  void *q = 0 * (1 / 0); /* { dg-warning "division by zero" } */
> -/* { dg-error "initializer element is not computable at load time" 
> "constant" { target *-*-* } .-1 } */
> +/* { dg-error "initializer element is not constant" "constant" { target 
> *-*-* } .-1 } */
>  /* { dg-warning "initialization makes pointer from integer without a cast" 
> "null" { target *-*-* } .-2 } */
>  void *r = (1 ? 0 : INT_MAX+1);
>
> diff --git gcc/testsuite/gcc.dg/overflow-warn-3.c 
> gcc/testsuite/gcc.dg/overflow-warn-3.c
> index 664011e401d..fd4a34f67e2 100644
> --- gcc/testsuite/gcc.dg/overflow-warn-3.c
> +++ gcc/testsuite/gcc.dg/overflow-warn-3.c
> @@ -55,7 +55,7 @@ void *p = 0 * (INT_MAX + 1); /* { dg-warning "integer 
> overflow in expression" }
>  /* { dg-warning "overflow in constant expression" "constant" { target *-*-* 
> } .-1 } */
>  /* { dg-warning "initialization makes pointer from integer without a cast" 
> "null" { target *-*-* } .-2 } */
>  void *q = 0 * (1 / 0); /* { dg-warning "division by zero" } */
> -/* { dg-error "initializer element is not computable at load time" 
> "constant" { target *-*-* } .-1 } */
> +/* { dg-error "initializer element is not constant" "constant" { target 
> *-*-* } .-1 } */
>  /* { dg-warning "initialization makes pointer from integer without a cast" 
> "null" { target *-*-* } .-2 } */
>  void *r = (1 ? 0 : INT_MAX+1);
>
> diff --git gcc/testsuite/gcc.dg/overflow-warn-4.c 
> gcc/testsuite/gcc.dg/overflow-warn-4.c
> index 52677ce897a..018e3e1e4cd 100644
> --- gcc/testsuite/gcc.dg/overflow-warn-4.c
> +++ gcc/testsuite/gcc.dg/overflow-warn-4.c
> @@ -55,7 +55,7 @@ void *p = 0 * (INT_MAX + 1); /* { dg-warning "integer 
> overflow in expression" }
>  /* { dg-error "overflow in constant expression" "constant" { target *-*-* } 
> .-1 } */
>  /* { dg-error "initialization makes pointer from integer without a cast" 
> "null" { target *-*-* } .-2 } */
>  void *q = 0 * (1 / 0); /* { dg-warning "division by zero" } */
> -/* { dg-error "initializer element is not computable 

Re: [PATCH #2, cleanup] Remove PowerPC TARGET_UPPER_REGS_{DF,SF} macros

2017-07-25 Thread Michael Meissner
On Tue, Jul 25, 2017 at 07:34:46AM -0500, Segher Boessenkool wrote:
> Hi Mike,
> 
> On Mon, Jul 24, 2017 at 07:40:26PM -0400, Michael Meissner wrote:
> > This patch eliminates the TARGET_UPPER_REGS_{DF,SF} macros.  The next patch
> > will eliminate TARGET_UPPER_REGS_DI.
> > 
> > I had to tune the optimization that turned load into FPR register and then 
> > move
> > to Altivec register (and the store equivalent) because it used
> > TARGET_UPPER_REGS_ to protect SFmode on power7.
> 
> > --- gcc/config/rs6000/rs6000.md (revision 250485)
> > +++ gcc/config/rs6000/rs6000.md (working copy)
> > @@ -691,7 +691,7 @@ (define_code_attr SMINMAX   [(smin "SM
> >  ;; D-form load to FPR register & move to Altivec register
> >  ;; Move Altivec register to FPR register and store
> >  (define_mode_iterator ALTIVEC_DFORM [DF
> > -SF
> > +(SF "TARGET_P8_VECTOR")
> >  (DI "TARGET_POWERPC64")])
> 
> Is that this part?

Yes.

> > --- gcc/config/rs6000/rs6000.c  (revision 250485)
> > +++ gcc/config/rs6000/rs6000.c  (working copy)
> > @@ -3216,22 +3216,12 @@ rs6000_init_hard_regno_mode_ok (bool glo
> >rs6000_constraints[RS6000_CONSTRAINT_wa] = VSX_REGS;
> >rs6000_constraints[RS6000_CONSTRAINT_wd] = VSX_REGS; /* V2DFmode  */
> >rs6000_constraints[RS6000_CONSTRAINT_wf] = VSX_REGS; /* V4SFmode  */
> > +  rs6000_constraints[RS6000_CONSTRAINT_ws] = VSX_REGS; /* DFmode  */
> > +  rs6000_constraints[RS6000_CONSTRAINT_wv] = ALTIVEC_REGS; /* 
> > DFmode  */
> > +  rs6000_constraints[RS6000_CONSTRAINT_wi] = VSX_REGS; /* DImode  */
> 
> After this all is done you can probably simplify the constraints a bit.
> Looking forward to it :-)

No, we can never remove constraints, since otherwise it would break user
written asm statements.

> > --- gcc/config/rs6000/rs6000.h  (revision 250485)
> > +++ gcc/config/rs6000/rs6000.h  (working copy)
> > @@ -575,9 +575,11 @@ extern int rs6000_vector_align[];
> > and/or SFmode could go in the traditional Altivec registers.  GCC 8.x 
> > deleted
> > these options.  In order to simplify the code, define the options in 
> > terms
> > of the base option (vsx, power8-vector).  */
> > -#define TARGET_UPPER_REGS_DF   TARGET_VSX
> > +#if (GCC_VERSION >= 3000)
> > +#pragma GCC poison TARGET_UPPER_REGS_DF TARGET_UPPER_REGS_SF
> > +#endif
> 
> Why poison it?  If someone accidentally slips in a new use it won't
> compile anyway.
> 
> Okay for trunk with the poison removed.  Thanks!

Ok.  The second patch is in make check.

-- 
Michael Meissner, IBM
IBM, M/S 2506R, 550 King Street, Littleton, MA 01460-6245, USA
email: meiss...@linux.vnet.ibm.com, phone: +1 (978) 899-4797



Re: [PATCH GCC][4/4]Better handle store-stores chain if eliminated stores only store loop invariant

2017-07-25 Thread Bin.Cheng
On Tue, Jul 25, 2017 at 1:57 PM, Richard Biener
 wrote:
> On Tue, Jul 25, 2017 at 2:38 PM, Bin.Cheng  wrote:
>> On Tue, Jul 25, 2017 at 12:48 PM, Richard Biener
>>  wrote:
>>> On Mon, Jul 10, 2017 at 10:24 AM, Bin.Cheng  wrote:
 On Tue, Jun 27, 2017 at 11:49 AM, Bin Cheng  wrote:
> Hi,
> This is a followup patch better handling below case:
>  for (i = 0; i < n; i++)
>{
>  a[i] = 1;
>  a[i+2] = 2;
>}
> Instead of generating root variables by loading from memory and 
> propagating with PHI
> nodes, like:
>  t0 = a[0];
>  t1 = a[1];
>  for (i = 0; i < n; i++)
>{
>  a[i] = 1;
>  t2 = 2;
>  t0 = t1;
>  t1 = t2;
>}
>  a[n] = t0;
>  a[n+1] = t1;
> We can simply store loop invariant values after loop body if we know loop 
> iterates more
> than chain->length times, like:
>  for (i = 0; i < n; i++)
>{
>  a[i] = 1;
>}
>  a[n] = 2;
>  a[n+1] = 2;
>
> Bootstrap(O2/O3) in patch series on x86_64 and AArch64.  Is it OK?
 Update patch wrto changes in previous patch.
 Bootstrap and test on x86_64 and AArch64.  Is it OK?
>>>
>>> +  if (TREE_CODE (val) == INTEGER_CST || TREE_CODE (val) == REAL_CST)
>>> +   continue;
>>>
>>> Please use CONSTANT_CLASS_P (val) instead.  I suppose VECTOR_CST or
>>> FIXED_CST would be ok as well for example.
>>>
>>> Ok with that change.  Did we eventually optimize this in followup
>>> passes previously?
>> Probably not?  Given below test:
>>
>> int a[1], b[1], c[1];
>> int f(void)
>> {
>>   int i, n = 100;
>>   int t0 = a[0];
>>   int t1 = a[1];
>>  for (i = 0; i < n; i++)
>>{
>>  a[i] = 1;
>>  int t2 = 2;
>>  t0 = t1;
>>  t1 = t2;
>>}
>>  a[n] = t0;
>>  a[n+1] = t1;
>>   return 0;
>> }
>> The optimized dump is as:
>>
>>[1.00%] [count: INV]:
>>   t1_8 = a[1];
>>   ivtmp.9_17 = (unsigned long) &a;
>>   _16 = ivtmp.9_17 + 400;
>>
>>[99.00%] [count: INV]:
>>   # t1_20 = PHI <2(3), t1_8(2)>
>>   # ivtmp.9_2 = PHI 
>>   _15 = (void *) ivtmp.9_2;
>>   MEM[base: _15, offset: 0B] = 1;
>>   ivtmp.9_1 = ivtmp.9_2 + 4;
>>   if (ivtmp.9_1 != _16)
>> goto ; [98.99%] [count: INV]
>>   else
>> goto ; [1.01%] [count: INV]
>>
>>[1.00%] [count: INV]:
>>   a[100] = t1_20;
>>   a[101] = 2;
>>   return 0;
>>
>> We now eliminate one phi and leave another behind.  It is vrp1/dce2
>> when the phi is eliminated.
>
> Ok, I see.  Maybe worth filing a missed optimization PR.
Right, PR81549 filed.

Thanks,
bin
>
> Richard.
>
>> Thanks,
>> bin


Re: [PATCH][GCC][ARM] Fix code generation from parsecpu.awk to remove duplicates.

2017-07-25 Thread Ramana Radhakrishnan

On 7/25/17 1:59 PM, Tamar Christina wrote:

Hi All,

The new ARM options parser generates from the following entry

begin cpu cortex-a55
  ...
  architecture armv8.2-a+fp16+foo
  option crypto add FP_ARMv8 CRYPTO
  option dotprod add FP_ARMv8 FOO
  option nofp remove ALL_FP
  ...
end cpu cortex-a55

A list of all feature bits that using -mcpu=cortex-a55
enabled and stores these in an array. The size of this array
is statically set to be isa_num_bits which is the last entry
in the isa_feature_bits enum.

What this means is that there is an assumption that you can
never have more feature bits in this array than there are
features.  Which is a reasonable assumption only if the entries
in the array are unique.

The above example currently would add FP_ARMv8 three times.
FP_ARMv8 maps to the macro ISA_FP_ARMv8.  Which when expanded
trivially exceeds the size of the array.

This patch makes sure that when constructing the array we only
emit an a value once. The array is constructed between three
different sources which is where the duplications come in.

Because we can't however look into the macros. When expended
they may still introduce duplicate bits. So this doesn't
completely fix the problem. Just makes it a bit less likely.

Bootstrapped on arm-none-eabi and no issues.

Ok for trunk?

gcc/
2017-07-25  Tamar Christina  

* config/arm/parsecpu.awk (all_cores): Remove duplicates.




Ok.

Thanks for fixing this.

regards
Ramana


Re: [COMMITED][AArch64] Fix PR79041

2017-07-25 Thread Wilco Dijkstra
Ramana Radhakrishnan 
>
> BZ suggests that this affects GCC 6 but GCC 7 is fixed ? Should there
> be a backport to GCC 6 as well ?
>
> Can you please keep BZ up to date please ?

GCC7 was affected as well, the example in PR79041 didn't trigger in GCC7,
my patch has a better example that does show the issue.

GCC6 requires a backport of multiple patches. The patch is 
https://gcc.gnu.org/ml/gcc-patches/2017-06/msg01708.html
which is still waiting for review.

Wilco


[PATCH #3, cleanup] Remove PowerPC TARGET_UPPER_REGS_DI macro

2017-07-25 Thread Michael Meissner
This patch eliminates TARGET_UPPER_REGS_DI.  I deleted the poison attribute in
patch #1.  I will combine the ChangeLog and submit this patch and the previous
patch together if approved.

It bootstraps and has no regressions on big endian power7 and little endian
power8.  Can I install this patch on the trunk.

2017-07-25  Michael Meissner  

* config/rs6000/rs6000.c (rs6000_option_override_internal):
Eliminate TARGET_UPPER_REGS_DI.
(rs6000_expand_vector_set): Likewise.
* config/rs6000/vsx.md (vsx_set__p9): Likewise.
(vsx_set_v4sf_p9): Likewise.
(vsx_set_v4sf_p9_zero): Likewise.
(vsx_insert_extract_v4sf_p9): Likewise.
(vsx_insert_extract_v4sf_p9_2): Likewise.
* config/rs6000/rs6000.c (TARGET_UPPER_REGS_DI): Delete.
(TARGET_VEXTRACTUB): Eliminate TARGET_UPPER_REGS_DI.
(TARGET_DIRECT_MOVE_64BIT): Likewise.
* config/rs6000/rs6000.md
(float2_internal): Likewise.
(Splitters for DI constants in Altivec registers): Likewise.

-- 
Michael Meissner, IBM
IBM, M/S 2506R, 550 King Street, Littleton, MA 01460-6245, USA
email: meiss...@linux.vnet.ibm.com, phone: +1 (978) 899-4797
Index: gcc/config/rs6000/rs6000.c
===
--- gcc/config/rs6000/rs6000.c  (revision 250487)
+++ gcc/config/rs6000/rs6000.c  (working copy)
@@ -4637,11 +4637,11 @@ rs6000_option_override_internal (bool gl
  variables through memory to do moves.  SImode can be used on ISA 2.07,
  while HImode and QImode require ISA 3.0.  */
   if (TARGET_VSX_SMALL_INTEGER
-  && (!TARGET_DIRECT_MOVE || !TARGET_P8_VECTOR || !TARGET_UPPER_REGS_DI))
+  && (!TARGET_DIRECT_MOVE || !TARGET_P8_VECTOR))
 {
   if (rs6000_isa_flags_explicit & OPTION_MASK_VSX_SMALL_INTEGER)
error ("-mvsx-small-integer requires -mpower8-vector, "
-  "-mupper-regs-di, and -mdirect-move");
+  "and -mdirect-move");
 
   rs6000_isa_flags &= ~OPTION_MASK_VSX_SMALL_INTEGER;
 }
@@ -7338,8 +7338,7 @@ rs6000_expand_vector_set (rtx target, rt
   else if (mode == V2DImode)
insn = gen_vsx_set_v2di (target, target, val, elt_rtx);
 
-  else if (TARGET_P9_VECTOR && TARGET_VSX_SMALL_INTEGER
-  && TARGET_UPPER_REGS_DI && TARGET_POWERPC64)
+  else if (TARGET_P9_VECTOR && TARGET_VSX_SMALL_INTEGER && 
TARGET_POWERPC64)
{
  if (mode == V4SImode)
insn = gen_vsx_set_v4si_p9 (target, target, val, elt_rtx);
Index: gcc/config/rs6000/vsx.md
===
--- gcc/config/rs6000/vsx.md(revision 250485)
+++ gcc/config/rs6000/vsx.md(working copy)
@@ -3366,7 +3366,7 @@ (define_insn "vsx_set__p9"
  (match_operand:QI 3 "" "n")]
 UNSPEC_VSX_SET))]
   "VECTOR_MEM_VSX_P (mode) && TARGET_P9_VECTOR && 
TARGET_VSX_SMALL_INTEGER
-   && TARGET_UPPER_REGS_DI && TARGET_POWERPC64"
+   && TARGET_POWERPC64"
 {
   int ele = INTVAL (operands[3]);
   int nunits = GET_MODE_NUNITS (mode);
@@ -3391,7 +3391,7 @@ (define_insn_and_split "vsx_set_v4sf_p9"
 UNSPEC_VSX_SET))
(clobber (match_scratch:SI 4 "=&wJwK"))]
   "VECTOR_MEM_VSX_P (V4SFmode) && TARGET_P9_VECTOR && TARGET_VSX_SMALL_INTEGER
-   && TARGET_UPPER_REGS_DI && TARGET_POWERPC64"
+   && TARGET_POWERPC64"
   "#"
   "&& reload_completed"
   [(set (match_dup 5)
@@ -3427,7 +3427,7 @@ (define_insn_and_split "*vsx_set_v4sf_p9
 UNSPEC_VSX_SET))
(clobber (match_scratch:SI 4 "=&wJwK"))]
   "VECTOR_MEM_VSX_P (V4SFmode) && TARGET_P9_VECTOR && TARGET_VSX_SMALL_INTEGER
-   && TARGET_UPPER_REGS_DI && TARGET_POWERPC64"
+   && TARGET_POWERPC64"
   "#"
   "&& reload_completed"
   [(set (match_dup 4)
@@ -3458,7 +3458,7 @@ (define_insn "*vsx_insert_extract_v4sf_p
  (match_operand:QI 4 "const_0_to_3_operand" "n")]
 UNSPEC_VSX_SET))]
   "VECTOR_MEM_VSX_P (V4SFmode) && TARGET_P9_VECTOR && TARGET_VSX_SMALL_INTEGER
-   && TARGET_UPPER_REGS_DI && TARGET_POWERPC64
+   && TARGET_POWERPC64
&& (INTVAL (operands[3]) == (VECTOR_ELT_ORDER_BIG ? 1 : 2))"
 {
   int ele = INTVAL (operands[4]);
@@ -3486,8 +3486,7 @@ (define_insn_and_split "*vsx_insert_extr
 UNSPEC_VSX_SET))
(clobber (match_scratch:SI 5 "=&wJwK"))]
   "VECTOR_MEM_VSX_P (V4SFmode) && VECTOR_MEM_VSX_P (V4SImode)
-   && TARGET_P9_VECTOR && TARGET_VSX_SMALL_INTEGER
-   && TARGET_UPPER_REGS_DI && TARGET_POWERPC64
+   && TARGET_P9_VECTOR && TARGET_VSX_SMALL_INTEGER && TARGET_POWERPC64
&& (INTVAL (operands[3]) != (VECTOR_ELT_ORDER_BIG ? 1 : 2))"
   "#"
   "&& 1"
Index: gcc/config/rs6000/rs6000.h
===
--- gcc/config/rs6000/rs6000.h  (revision 250487)
+++ gcc/config/rs6000/rs6000.h  (working copy)
@@ -571,16 +571,6 @@ extern int rs6000_vector_align[];
 
 #define TARGET_ISEL64 (TARGET_ISEL && TARGET_POWERPC64)
 
-/* We previously had -mupper-regs-{df,di,sf} to control whether DFmode

[PATCH] Fix PR81529

2017-07-25 Thread Richard Biener

This fixes an oversight in an earlier patch of mine (SLP of inductions).

Bootstrapped and tested on x86_64-unknown-linux-gnu, applied to trunk.

Richard.

2017-07-25  Richard Biener  

PR tree-optimization/81529
* tree-vect-stmts.c (process_use): Disregard live induction PHIs
when optimizing backedge uses.

* gfortran.dg/pr81529.f90: New testcase.

Index: gcc/tree-vect-stmts.c
===
--- gcc/tree-vect-stmts.c   (revision 250494)
+++ gcc/tree-vect-stmts.c   (working copy)
@@ -581,9 +583,11 @@ process_use (gimple *stmt, tree use, loo
 }
   /* We are also not interested in uses on loop PHI backedges that are
  inductions.  Otherwise we'll needlessly vectorize the IV increment
- and cause hybrid SLP for SLP inductions.  */
+ and cause hybrid SLP for SLP inductions.  Unless the PHI is live
+ of course.  */
   else if (gimple_code (stmt) == GIMPLE_PHI
   && STMT_VINFO_DEF_TYPE (stmt_vinfo) == vect_induction_def
+  && ! STMT_VINFO_LIVE_P (stmt_vinfo)
   && (PHI_ARG_DEF_FROM_EDGE (stmt, loop_latch_edge (bb->loop_father))
   == use))
 {

Index: gcc/testsuite/gfortran.dg/pr81529.f90
===
--- gcc/testsuite/gfortran.dg/pr81529.f90   (nonexistent)
+++ gcc/testsuite/gfortran.dg/pr81529.f90   (working copy)
@@ -0,0 +1,32 @@
+! { dg-do compile }
+! { dg-options "-std=gnu -fno-tree-scev-cprop -Ofast" }
+
+subroutine CalcCgr(C,rmax,ordgr_max)
+  integer, intent(in) :: rmax,ordgr_max
+  double complex :: Zadj(2,2), Zadj2(2,2)
+  double complex, intent(out) :: C(0:rmax,0:rmax,0:rmax)
+  double complex, allocatable :: Cexpgr(:,:,:,:)
+  double complex :: Caux
+  integer :: rmaxB,rmaxExp,r,n0,n1,n2,k,l,i,j,m,n,nn
+
+  rmaxB = 2*rmax
+  rmaxExp = rmaxB
+  allocate(Cexpgr(0:rmaxExp/2,0:rmaxExp,0:rmaxExp,0:ordgr_max))
+   
+  rloop: do r=0,rmaxExp/2
+do n0=r,1,-1
+  do nn=r-n0,0,-1
+do i=1,2
+  Caux = Caux - Zadj(i,l)
+end do
+Cexpgr(n0,0,0,0) = Caux/(2*(nn+1))
+  end do
+end do
+do n1=0,r
+  n2 = r-n1
+  if (r.le.rmax) then
+C(0,n1,n2) = Cexpgr(0,n1,n2,0)
+  end if
+end do
+  end do rloop
+end subroutine CalcCgr


[PATCH] FIx PR81455

2017-07-25 Thread Richard Biener

Bootstrapped and tested on x86_64-unknown-linux-gnu, applied.

Richard.

2017-07-25  Richard Biener  

PR tree-optimization/81455
* tree-ssa-loop-unswitch.c (find_loop_guard): Make sure to
not walk in cycles when looking for guards.

* gcc.dg/pr81455.c: New testcase.

Index: gcc/tree-ssa-loop-unswitch.c
===
--- gcc/tree-ssa-loop-unswitch.c(revision 250494)
+++ gcc/tree-ssa-loop-unswitch.c(working copy)
@@ -582,8 +582,9 @@ find_loop_guard (struct loop *loop)
   gcond *cond;
   do
 {
+  basic_block next = NULL;
   if (single_succ_p (header))
-   header = single_succ (header);
+   next = single_succ (header);
   else
{
  cond = dyn_cast  (last_stmt (header));
@@ -593,12 +594,16 @@ find_loop_guard (struct loop *loop)
  /* Make sure to skip earlier hoisted guards that are left
 in place as if (true).  */
  if (gimple_cond_true_p (cond))
-   header = te->dest;
+   next = te->dest;
  else if (gimple_cond_false_p (cond))
-   header = fe->dest;
+   next = fe->dest;
  else
break;
}
+  /* Never traverse a backedge.  */
+  if (header->loop_father->header == next)
+   return NULL;
+  header = next;
 }
   while (1);
   if (!flow_bb_inside_loop_p (loop, te->dest)
Index: gcc/testsuite/gcc.dg/pr81455.c
===
--- gcc/testsuite/gcc.dg/pr81455.c  (nonexistent)
+++ gcc/testsuite/gcc.dg/pr81455.c  (working copy)
@@ -0,0 +1,15 @@
+/* { dg-do compile } */
+/* { dg-options "-O -funswitch-loops" } */
+
+void
+jh (unsigned int aw, int sn)
+{
+  int xs;
+
+  for (xs = 0; xs < 1; ++xs)
+aw &= 1;
+
+  while (aw < 1 || ++sn < 1)
+{
+}
+}


Re: [PATCH] Fix infinite recursion with div-by-zero (PR middle-end/70992)

2017-07-25 Thread Eric Botcazou
> Eric, any comments?

No objection for the build2_stat hunk, I think it's in keeping with the Ada 
semantics.  But the tree_could_trap_p hunk is certainly an abomination...

> We could also avoid tree_could_trap_p by special-casing only
> *_DIV_EXPR and *_MOD_EXPR
> with integer zero 2nd operand explicitely in build2 given there's no
> "constant" value for this.  That is,
> for FP 1./0. is NaN (a "constant" value) even if the operation might trap.

Yes, that would be faster & simpler and avoids the abomination.

-- 
Eric Botcazou


Re: [PATCH] [SPARC] Disable muldf3_extend for LEON/LEON3

2017-07-25 Thread Sebastian Huber

On 25/07/17 14:59, Sebastian Huber wrote:


On 25/07/17 14:59, Eric Botcazou wrote:

Prominent LEON2 chips like the AT697E and AT697F do not properly 
support

the FSMULD instruction.  LEON3 targets using the GRFPU-lite floating
point unit do not support the FSMULD in hardware.  Disable this
instruction for for all LEON/LEON3 targets for simplicity.

And penalize everyone else?


Yes, everyone using LEON/LEON3 will no longer benefit from the FSMULD 
after this patch. This was discussed with Gaisler. We have to make a 
trade-off between correctness, performance and the number of GCC options.




What is your opinion with respect to a -mno-fsmuld option or something 
similar?


--
Sebastian Huber, embedded brains GmbH

Address : Dornierstr. 4, D-82178 Puchheim, Germany
Phone   : +49 89 189 47 41-16
Fax : +49 89 189 47 41-09
E-Mail  : sebastian.hu...@embedded-brains.de
PGP : Public key available on request.

Diese Nachricht ist keine geschäftliche Mitteilung im Sinne des EHUG.



Re: [PATCH] [SPARC] Disable muldf3_extend for LEON/LEON3

2017-07-25 Thread Eric Botcazou
> Yes, everyone using LEON/LEON3 will no longer benefit from the FSMULD
> after this patch. This was discussed with Gaisler. We have to make a
> trade-off between correctness, performance and the number of GCC options.

IMO we shouldn't introduce regressions like this, i.e. people should never be 
gratuitously penalized when they upgrade the compiler, this sends a very bad 
message.  The !TARGET_LEON part is probably OK but not the !TARGET_LEON3 part.

-- 
Eric Botcazou


Re: [PATCH] [SPARC] Disable muldf3_extend for LEON/LEON3

2017-07-25 Thread Eric Botcazou
> What is your opinion with respect to a -mno-fsmuld option or something
> similar?

Far better in my opinion (at least for LEON3).

-- 
Eric Botcazou


Re: [PATCH] Fix infinite recursion with div-by-zero (PR middle-end/70992)

2017-07-25 Thread Richard Biener
On Tue, Jul 25, 2017 at 3:30 PM, Eric Botcazou  wrote:
>> Eric, any comments?
>
> No objection for the build2_stat hunk, I think it's in keeping with the Ada
> semantics.  But the tree_could_trap_p hunk is certainly an abomination...
>
>> We could also avoid tree_could_trap_p by special-casing only
>> *_DIV_EXPR and *_MOD_EXPR
>> with integer zero 2nd operand explicitely in build2 given there's no
>> "constant" value for this.  That is,
>> for FP 1./0. is NaN (a "constant" value) even if the operation might trap.
>
> Yes, that would be faster & simpler and avoids the abomination.

Ok, then let's go with that slightly uglier but less abominal variant ;)

Thanks,
Richard.

> --
> Eric Botcazou


PING: [PATCH] i386: Avoid stack realignment if possible

2017-07-25 Thread H.J. Lu
On Fri, Jul 14, 2017 at 4:46 AM, H.J. Lu  wrote:
> On Fri, Jul 7, 2017 at 5:56 PM, H.J. Lu  wrote:
>> On Fri, Jul 07, 2017 at 09:58:42AM -0700, H.J. Lu wrote:
>>> On Fri, Dec 20, 2013 at 8:06 AM, Jakub Jelinek  wrote:
>>> > Hi!
>>> >
>>> > Honza recently changed the i?86 backend, so that it often doesn't
>>> > do -maccumulate-outgoing-args by default on x86_64.
>>> > Unfortunately, on some of the here included testcases this regressed
>>> > quite a bit the generated code.  As AVX vectors are used, the dynamic
>>> > realignment code needs to assume e.g. that some of them will need to be
>>> > spilled, and for -mno-accumulate-outgoing-args the code needs to set
>>> > need_drap early as well.  But in when emitting the prologue/epilogue,
>>> > if need_drap is set, we don't perform the optimization for leaf functions
>>> > which have zero size stack frame, thus we end up with uselessly doing
>>> > dynamic stack realignment, setting up DRAP that nothing uses and later on
>>> > restore everything back.
>>> >
>>> > This patch improves it, if the DRAP register isn't live at the start of
>>> > entry bb successor and we aren't going to realign the stack, we don't
>>> > need DRAP at all, and even if we need DRAP register, that can't be the 
>>> > sole
>>> > reason for doing stack realignment, the prologue code is able to set up 
>>> > DRAP
>>> > even without dynamic stack realignment.
>>> >
>>> > Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?
>>> >
>>> > 2013-12-20  Jakub Jelinek  
>>> >
>>> > PR target/59501
>>> > * config/i386/i386.c (ix86_save_reg): Don't return true for 
>>> > drap_reg
>>> > if !crtl->stack_realign_needed.
>>> > (ix86_finalize_stack_realign_flags): If drap_reg isn't live on 
>>> > entry
>>> > and stack_realign_needed will be false, clear drap_reg and 
>>> > need_drap.
>>> > Optimize leaf functions that don't need stack frame even if
>>> > crtl->need_drap.
>>> >
>>> > * gcc.target/i386/pr59501-1.c: New test.
>>> > * gcc.target/i386/pr59501-1a.c: New test.
>>> > * gcc.target/i386/pr59501-2.c: New test.
>>> > * gcc.target/i386/pr59501-2a.c: New test.
>>> > * gcc.target/i386/pr59501-3.c: New test.
>>> > * gcc.target/i386/pr59501-3a.c: New test.
>>> > * gcc.target/i386/pr59501-4.c: New test.
>>> > * gcc.target/i386/pr59501-4a.c: New test.
>>> > * gcc.target/i386/pr59501-5.c: New test.
>>> > * gcc.target/i386/pr59501-6.c: New test.
>>> >
>>> >
>>> > --- gcc/testsuite/gcc.target/i386/pr59501-4a.c.jj   2013-12-20 
>>> > 12:19:20.603212859 +0100
>>> > +++ gcc/testsuite/gcc.target/i386/pr59501-4a.c  2013-12-20 
>>> > 12:23:33.647881672 +0100
>>> > @@ -0,0 +1,8 @@
>>> > +/* PR target/59501 */
>>> > +/* { dg-do compile { target { ! ia32 } } } */
>>> > +/* { dg-options "-O2 -mavx -maccumulate-outgoing-args" } */
>>> > +
>>> > +#include "pr59501-3a.c"
>>> > +
>>> > +/* Verify no dynamic realignment is performed.  */
>>> > +/* { dg-final { scan-assembler-not "and\[^\n\r]*sp" { xfail *-*-* } } } 
>>> > */
>>> >
>>>
>>> Since DRAP isn't used with -maccumulate-outgoing-args, pr59501-4a.c was
>>> xfailed due to stack frame access via frame pointer instead of DARP.
>>> This patch finds the maximum stack alignment from the stack frame access
>>> instructions and avoids stack realignment if stack alignment needed is
>>> less than incoming stack boundary.
>>>
>>> I am testing this patch.  OK for trunk if there is no regression?
>>>
>>>
>>
>> We need to keep the preferred stack alignment as the minimum stack
>> alignment. Here is the updated patch.  Tested on x86-64.  OK for
>> trunk?
>>
>> Thanks.
>
> Hi Jakub,
>
> This patch fixes the xfailed testcase in your patch:
>
> https://gcc.gnu.org/ml/gcc-patches/2013-12/msg01767.html
>
> Your original patch:
>
> https://gcc.gnu.org/ml/gcc-patches/2011-11/msg01058.html
>
> assumes that any instructions accessing stack require stack
> realignment:
>
> +  FOR_EACH_BB (bb)
> +{
> +  rtx insn;
> +  FOR_BB_INSNS (bb, insn)
> +if (NONDEBUG_INSN_P (insn)
> + && requires_stack_frame_p (insn, prologue_used,
> +   set_up_by_prologue))
> +  {
> + crtl->stack_realign_needed = stack_realign;
> + crtl->stack_realign_finalized = true;
> + return;
> +  }
> + }
>
> This patch checks the actual alignment needed for any instructions
> accessing stack.  It skips stack realignment if the actual stack alignment
> needed is less than or equal to incoming stack alignment.
>
> Can you take look at it?
>
> Thanks.
>

PING

https://gcc.gnu.org/ml/gcc-patches/2017-07/msg00400.html

-- 
H.J.


[PATCH][AArch64] Simplify frame layout for stack probing

2017-07-25 Thread Wilco Dijkstra
This patch makes some changes to the frame layout in order to simplify
stack probing.  We want to use the save of LR as a probe in any non-leaf
function.  With shrinkwrapping we may only save LR before a call, so it
is useful to define a fixed location in the callee-saves. So force LR at
the bottom of the callee-saves even with -fomit-frame-pointer.

Also remove a rarely used frame layout that saves the callee-saves first
with -fomit-frame-pointer.

OK for commit (and backport to GCC7)?

ChangeLog:
2017-07-25  Wilco Dijkstra  

* config/aarch64/aarch64.c (aarch64_layout_frame):
Ensure LR is always stored at the bottom of the callee-saves.
Remove frame option which saves callee-saves at top of frame.

--
diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index 
b8a4160d9de8e689ccd26cb9f0ce046ee65e0ef4..3fc36ae28d18b9635480fd99f1fa7719267e66e4
 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -2875,7 +2875,8 @@ aarch64_frame_pointer_required (void)
 
 /* Mark the registers that need to be saved by the callee and calculate
the size of the callee-saved registers area and frame record (both FP
-   and LR may be omitted).  */
+   and LR may be omitted).  If the function is not a leaf, ensure LR is
+   saved at the bottom of the callee-save area.  */
 static void
 aarch64_layout_frame (void)
 {
@@ -2926,7 +2927,14 @@ aarch64_layout_frame (void)
   cfun->machine->frame.wb_candidate1 = R29_REGNUM;
   cfun->machine->frame.reg_offset[R30_REGNUM] = UNITS_PER_WORD;
   cfun->machine->frame.wb_candidate2 = R30_REGNUM;
-  offset += 2 * UNITS_PER_WORD;
+  offset = 2 * UNITS_PER_WORD;
+}
+  else if (!crtl->is_leaf)
+{
+  /* Ensure LR is saved at the bottom of the callee-saves.  */
+  cfun->machine->frame.reg_offset[R30_REGNUM] = 0;
+  cfun->machine->frame.wb_candidate1 = R30_REGNUM;
+  offset = UNITS_PER_WORD;
 }
 
   /* Now assign stack slots for them.  */
@@ -3025,20 +3033,6 @@ aarch64_layout_frame (void)
   cfun->machine->frame.final_adjust
= cfun->machine->frame.frame_size - cfun->machine->frame.callee_adjust;
 }
-  else if (!frame_pointer_needed
-  && varargs_and_saved_regs_size < max_push_offset)
-{
-  /* Frame with large local area and outgoing arguments (this pushes the
-callee-saves first, followed by the locals and outgoing area):
-stp reg1, reg2, [sp, -varargs_and_saved_regs_size]!
-stp reg3, reg4, [sp, 16]
-sub sp, sp, frame_size - varargs_and_saved_regs_size  */
-  cfun->machine->frame.callee_adjust = varargs_and_saved_regs_size;
-  cfun->machine->frame.final_adjust
-   = cfun->machine->frame.frame_size - cfun->machine->frame.callee_adjust;
-  cfun->machine->frame.hard_fp_offset = cfun->machine->frame.callee_adjust;
-  cfun->machine->frame.locals_offset = cfun->machine->frame.hard_fp_offset;
-}
   else
 {
   /* Frame with large local area and outgoing arguments using frame 
pointer:


Re: [gomp4] OpenACC async re-work

2017-07-25 Thread Cesar Philippidis
On 07/25/2017 05:51 AM, Chung-Lin Tang wrote:
> On 2017/6/29 6:31 AM, Cesar Philippidis wrote:

> Attached is the updated version of the patch, re-tested.
> 
> Thomas, do you need some more time to look over it? Or should I commit it 
> first?

I'm not too concerned about the profiling stuff because that should be
considered beta quality (because it's still not fully implemented).

We'll probably need to revisit the locking stuff later, but this patch
is OK for og7. I suspect that we may have to make other locking tweaks
in the nvptx plugin anyway.

Cesar


Re: [PATCHv4][PING][PR 57371] Remove useless floating point casts in comparisons

2017-07-25 Thread Richard Biener
On Mon, Jul 17, 2017 at 9:29 AM, Yuri Gribov  wrote:
> Hi all,
>
> This is an updated version of patch in
> https://gcc.gnu.org/ml/gcc-patches/2017-07/msg00409.html . It prevents
> optimization in presense of sNaNs (and qNaNs when comparison operator
> is > >= < <=) to preserve FP exceptions.
>
> Note that I had to use -fsignaling-nans in pr57371-5.c test because by
> default this option is off and some existing patterns in match.pd
> happily optimize NaN comparisons, even with sNaNs (!).
>
> Bootstrapped and regtested on x64. Ok for trunk?

+ {
+   tree itype = TREE_TYPE (@0);
+   gcc_assert (INTEGRAL_TYPE_P (itype));

no need to spell out this assert.

+
+   format_helper fmt (REAL_MODE_FORMAT (TYPE_MODE (TREE_TYPE (@1;
+
+   const REAL_VALUE_TYPE *cst = TREE_REAL_CST_PTR (@1);
+
+   /* Be careful to preserve any potential exceptions due to
+ NaNs.  qNaNs are ok in == or != context.
+
+ TODO: relax under -fno-trapping-math or
+ -fno-signaling-nans.  */
+   bool exception_p
+ = real_isnan (cst) && (cst->signalling
+ || (cmp != EQ_EXPR || cmp != NE_EXPR));
+
+   /* INT?_MIN is power-of-two so it takes
+ only one mantissa bit.  */

please reduce vertical spacing

+   bool itype_fits_ftype_p
+   = TYPE_PRECISION (itype) - signed_p <= significand_size (fmt);

watch spacing -- indent by two.

+  (with
+   {
+REAL_VALUE_TYPE imin, imax;

likewise.

+if (!cst_int_p && cmp == GT_EXPR)
+  icmp = GE_EXPR;
+else if (!cst_int_p && cmp == LT_EXPR)
+  icmp = LE_EXPR;

ugh.  Please do not assign to match.pd iterators, I'm going to commit a patch
making them const.

+   }
+
+   (switch
+
+   /* Optimize cases when CST is outside of ITYPE's range.  */
+   (if (real_compare (LT_EXPR, cst, &imin))
+{ constant_boolean_node (cmp == GT_EXPR || cmp == GE_EXPR ||
cmp == NE_EXPR,
+ type); })
+   (if (real_compare (GT_EXPR, cst, &imax))
+{ constant_boolean_node (cmp == LT_EXPR || cmp == LE_EXPR ||
cmp == NE_EXPR,
+ type); })
+
+   /* Remove cast if CST is an integer representable by ITYPE.  */
+   (if (cst_int_p)
+(cmp @0 { gcc_assert (!overflow_p);
+  wide_int_to_tree (itype, icst_val); })
+   )
+

reduce vertical spacing, watch long lines.

+   /* Otherwise replace with sensible integer constant.  */
+   (with
+{
+  gcc_assert (!overflow_p);
+  gcc_assert (real_compare (GE_EXPR, &icst, &imin)
+  && real_compare (LE_EXPR, &icst, &imax));
+  gcc_assert (wi::ge_p (icst_val, wi::min_value (itype), isign)
+  && wi::le_p (icst_val, wi::max_value (itype), isign));

ugh. IFF then gcc_checking_assert but are those really necessary?

Thanks,
Richard.


> -Y


Re: [PATCH] [SPARC] Disable muldf3_extend for LEON/LEON3

2017-07-25 Thread Sebastian Huber

On 25/07/17 15:42, Eric Botcazou wrote:


What is your opinion with respect to a -mno-fsmuld option or something
similar?

Far better in my opinion (at least for LEON3).



How should I add this option? For example:

diff --git a/gcc/config/sparc/sparc.opt b/gcc/config/sparc/sparc.opt
index ae63d2018e3..7a582f3a870 100644
--- a/gcc/config/sparc/sparc.opt
+++ b/gcc/config/sparc/sparc.opt
@@ -93,6 +93,10 @@ mfmaf
 Target Report Mask(FMAF)
 Use UltraSPARC Fused Multiply-Add extensions.

+mfsmuld
+Target Report Mask(FSMULD)
+Use V8+ Floating-point Multiply Single to Double (FsMULd) instruction.
+
 mpopc
 Target Report Mask(POPC)
 Use UltraSPARC Population-Count instruction.

In this case I have to update the cpu_table. I guess all (MASK_V8 || 
MASK_V9) && MASK_FPU entries should then get a MASK_FSMULD in the enable 
field (except leon)?


--
Sebastian Huber, embedded brains GmbH

Address : Dornierstr. 4, D-82178 Puchheim, Germany
Phone   : +49 89 189 47 41-16
Fax : +49 89 189 47 41-09
E-Mail  : sebastian.hu...@embedded-brains.de
PGP : Public key available on request.

Diese Nachricht ist keine geschäftliche Mitteilung im Sinne des EHUG.



[PATCH] Make iterators and 'type' const in genmatch generated code

2017-07-25 Thread Richard Biener

So a new pattern came up assigning to the iterator var of (for var (...)).
Disallow that, likewise protect 'type' (another commonly refered 
identifier).

Bootstrap / regtest pending on x86_64-unknown-linux-gnu.

Richard.

2017-07-25  Richard Biener  

* genmatch.c (dt_simplify::gen): Make iterator vars const.
(decision_tree::gen): Make 'type' const.
(write_predicate): Likewise.

Index: gcc/genmatch.c
===
--- gcc/genmatch.c  (revision 250518)
+++ gcc/genmatch.c  (working copy)
@@ -3461,11 +3483,11 @@ dt_simplify::gen (FILE *f, int indent, b
  if (! s->for_subst_vec[i].first->used)
continue;
  if (is_a  (s->for_subst_vec[i].second))
-   fprintf_indent (f, indent, "enum tree_code %s = %s;\n",
+   fprintf_indent (f, indent, "const enum tree_code %s = %s;\n",
s->for_subst_vec[i].first->id,
s->for_subst_vec[i].second->id);
  else if (is_a  (s->for_subst_vec[i].second))
-   fprintf_indent (f, indent, "combined_fn %s = %s;\n",
+   fprintf_indent (f, indent, "const combined_fn %s = %s;\n",
s->for_subst_vec[i].first->id,
s->for_subst_vec[i].second->id);
  else
@@ -3601,13 +3623,13 @@ decision_tree::gen (FILE *f, bool gimple
 "%s (code_helper *res_code, tree *res_ops,\n"
 " gimple_seq *seq, tree (*valueize)(tree) "
 "ATTRIBUTE_UNUSED,\n"
-" tree ARG_UNUSED (type), tree *ARG_UNUSED "
+" const tree ARG_UNUSED (type), tree 
*ARG_UNUSED "
 "(captures)\n",
 s->fname);
   else
{
  fprintf (f, "\nstatic tree\n"
-  "%s (location_t ARG_UNUSED (loc), tree ARG_UNUSED (type),\n",
+  "%s (location_t ARG_UNUSED (loc), const tree ARG_UNUSED 
(type),\n",
   (*iter).second->fname);
  for (unsigned i = 0;
   i < as_a (s->s->s->match)->ops.length (); ++i)
@@ -3619,10 +3641,10 @@ decision_tree::gen (FILE *f, bool gimple
  if (! s->s->s->for_subst_vec[i].first->used)
continue;
  if (is_a  (s->s->s->for_subst_vec[i].second))
-   fprintf (f, ", enum tree_code ARG_UNUSED (%s)",
+   fprintf (f, ", const enum tree_code ARG_UNUSED (%s)",
 s->s->s->for_subst_vec[i].first->id);
  else if (is_a  (s->s->s->for_subst_vec[i].second))
-   fprintf (f, ", combined_fn ARG_UNUSED (%s)",
+   fprintf (f, ", const combined_fn ARG_UNUSED (%s)",
 s->s->s->for_subst_vec[i].first->id);
}
 
@@ -3663,7 +3685,7 @@ decision_tree::gen (FILE *f, bool gimple
  else
fprintf (f, "\nstatic tree\n"
 "generic_simplify_%s (location_t ARG_UNUSED (loc), enum "
-"tree_code ARG_UNUSED (code), tree ARG_UNUSED (type)",
+"tree_code ARG_UNUSED (code), const tree ARG_UNUSED 
(type)",
 e->operation->id);
  for (unsigned i = 0; i < n; ++i)
fprintf (f, ", tree op%d", i);
@@ -3683,11 +3705,11 @@ decision_tree::gen (FILE *f, bool gimple
fprintf (f, "\nstatic bool\n"
 "gimple_simplify (code_helper *res_code, tree *res_ops,\n"
 " gimple_seq *seq, tree (*valueize)(tree),\n"
-" code_helper code, tree type");
+" code_helper code, const tree type");
   else
fprintf (f, "\ntree\n"
 "generic_simplify (location_t loc, enum tree_code code, "
-"tree type ATTRIBUTE_UNUSED");
+"const tree type ATTRIBUTE_UNUSED");
   for (unsigned i = 0; i < n; ++i)
fprintf (f, ", tree op%d", i);
   fprintf (f, ")\n");
@@ -3751,7 +3773,7 @@ write_predicate (FILE *f, predicate_id *
   p->nargs > 0 ? ", tree *res_ops" : "",
   gimple ? ", tree (*valueize)(tree) ATTRIBUTE_UNUSED" : "");
   /* Conveniently make 'type' available.  */
-  fprintf_indent (f, 2, "tree type = TREE_TYPE (t);\n");
+  fprintf_indent (f, 2, "const tree type = TREE_TYPE (t);\n");
 
   if (!gimple)
 fprintf_indent (f, 2, "if (TREE_SIDE_EFFECTS (t)) return false;\n");



Re: [COMMITED][AArch64] Fix PR79041

2017-07-25 Thread Andreas Schwab
On Jul 25 2017, Wilco Dijkstra  wrote:

> diff --git a/gcc/testsuite/gcc.target/aarch64/pr79041-2.c 
> b/gcc/testsuite/gcc.target/aarch64/pr79041-2.c
> new file mode 100644
> index 
> ..cd34fbab85a92d00cba7091d4146deaaf3a862a9
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/aarch64/pr79041-2.c
> @@ -0,0 +1,11 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O2 -mcmodel=large -mpc-relative-literal-loads" } */
> +
> +__int128
> +t (void)
> +{
> +  return (__int128)1 << 80;
> +}
> +
> +/* { dg-final { scan-assembler "adr" } } */
> +/* { dg-final { scan-assembler-not "adrp" } } */

That fails in ILP32 mode.

Andreas.

-- 
Andreas Schwab, SUSE Labs, sch...@suse.de
GPG Key fingerprint = 0196 BAD8 1CE9 1970 F4BE  1748 E4D4 88E3 0EEA B9D7
"And now for something completely different."


Re: [PATCH v2 1/2] match.pd: reassociate multiplications

2017-07-25 Thread Richard Biener
On Fri, Jul 21, 2017 at 5:55 PM, Alexander Monakov  wrote:
> Previous revision here: 
> https://gcc.gnu.org/ml/gcc-patches/2017-07/msg00889.html
>
> Reassociate (X * CST) * Y to (X * Y) * CST, this pushes constants in
> multiplication chains to outermost factors, where they can be combined.
>
> Changed in this revision:
> - remove !TYPE_OVERFLOW_SANITIZED and !TYPE_SATURATING checks;
>  (in previous discussion Richard indicated that introducing false negatives
>   in UBSAN by concealing signed overflow is not a concern, and saturating
>   types shouldn't appear here because the constant operand should be 
> FIXED_CST)
>
> The checks for @1 being 0 or -1 remain as they are required for correctness,
> but since this rule is ordered after the simpler rules that fold X * {0, -1},
> those checks are always false at runtime.

Ok.

Thanks,
Richard.

>
> * match.pd ((X * CST) * Y): Reassociate to (X * Y) * CST.
> testsuite/
> * gcc.dg/tree-ssa/assoc-2.c: New testcase.
>
> ---
>  gcc/match.pd| 8 
>  gcc/testsuite/gcc.dg/tree-ssa/assoc-2.c | 8 
>  2 files changed, 16 insertions(+)
>  create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/assoc-2.c
>
> diff --git a/gcc/match.pd b/gcc/match.pd
> index 7f5807c..39e1e5c 100644
> --- a/gcc/match.pd
> +++ b/gcc/match.pd
> @@ -2213,6 +2213,14 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
>   (mult @0 integer_minus_onep)
>   (negate @0))
>
> +/* Reassociate (X * CST) * Y to (X * Y) * CST.  This does not introduce
> +   signed overflow for CST != 0 && CST != -1.  */
> +(simplify
> + (mult:c (mult:s @0 INTEGER_CST@1) @2)
> + (if (TREE_CODE (@2) != INTEGER_CST
> +  && !integer_zerop (@1) && !integer_minus_onep (@1))
> +  (mult (mult @0 @2) @1)))
> +
>  /* True if we can easily extract the real and imaginary parts of a complex
> number.  */
>  (match compositional_complex
> diff --git a/gcc/testsuite/gcc.dg/tree-ssa/assoc-2.c 
> b/gcc/testsuite/gcc.dg/tree-ssa/assoc-2.c
> new file mode 100644
> index 000..a92c882
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/tree-ssa/assoc-2.c
> @@ -0,0 +1,8 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O -fdump-tree-gimple-raw -fdump-tree-optimized-raw" } */
> +
> +int f0(int a, int b){
> +  return a * 33 * b * 55;
> +}
> +
> +/* { dg-final { scan-tree-dump-times "mult_expr" 2 "gimple" } } */
> --
> 1.8.3.1
>


Re: [PATCH v2 2/2] combine successive multiplications by constants

2017-07-25 Thread Richard Biener
On Fri, Jul 21, 2017 at 5:55 PM, Alexander Monakov  wrote:
> Previous revision here: 
> https://gcc.gnu.org/ml/gcc-patches/2017-07/msg01090.html
>
> Reassociate X * CST1 * CST2 to X * (CST1 * CST2).
>
> Changed in this revision:
> - remove the check for @2 being 0 or -1

Ok.

Thanks,
Richard.

> * match.pd ((X * CST1) * CST2): Simplify to X * (CST1 * CST2).
> testsuite:
> * gcc.dg/tree-ssa/assoc-2.c: Enhance.
> * gcc.dg/tree-ssa/slsr-4.c: Adjust.
>
> ---
>  gcc/match.pd| 13 +
>  gcc/testsuite/gcc.dg/tree-ssa/assoc-2.c | 13 -
>  gcc/testsuite/gcc.dg/tree-ssa/slsr-4.c  |  8 ++--
>  3 files changed, 27 insertions(+), 7 deletions(-)
>
> diff --git a/gcc/match.pd b/gcc/match.pd
> index 39e1e5c..732b80c 100644
> --- a/gcc/match.pd
> +++ b/gcc/match.pd
> @@ -284,6 +284,19 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
>  || mul != wi::min_value (TYPE_PRECISION (type), SIGNED))
>   { build_zero_cst (type); })
>
> +/* Combine successive multiplications.  Similar to above, but handling
> +   overflow is different.  */
> +(simplify
> + (mult (mult @0 INTEGER_CST@1) INTEGER_CST@2)
> + (with {
> +   bool overflow_p;
> +   wide_int mul = wi::mul (@1, @2, TYPE_SIGN (type), &overflow_p);
> +  }
> +  /* Skip folding on overflow: the only special case is @1 * @2 == -INT_MIN,
> + otherwise undefined overflow implies that @0 must be zero.  */
> +  (if (!overflow_p || TYPE_OVERFLOW_WRAPS (type))
> +   (mult @0 { wide_int_to_tree (type, mul); }
> +
>  /* Optimize A / A to 1.0 if we don't care about
> NaNs or Infinities.  */
>  (simplify
> diff --git a/gcc/testsuite/gcc.dg/tree-ssa/assoc-2.c 
> b/gcc/testsuite/gcc.dg/tree-ssa/assoc-2.c
> index a92c882..cc0e9d4 100644
> --- a/gcc/testsuite/gcc.dg/tree-ssa/assoc-2.c
> +++ b/gcc/testsuite/gcc.dg/tree-ssa/assoc-2.c
> @@ -5,4 +5,15 @@ int f0(int a, int b){
>return a * 33 * b * 55;
>  }
>
> -/* { dg-final { scan-tree-dump-times "mult_expr" 2 "gimple" } } */
> +int f1(int a){
> +  a *= 33;
> +  return a * 55;
> +}
> +
> +int f2(int a, int b){
> +  a *= 33;
> +  return a * b * 55;
> +}
> +
> +/* { dg-final { scan-tree-dump-times "mult_expr" 7 "gimple" } } */
> +/* { dg-final { scan-tree-dump-times "mult_expr" 5 "optimized" } } */
> diff --git a/gcc/testsuite/gcc.dg/tree-ssa/slsr-4.c 
> b/gcc/testsuite/gcc.dg/tree-ssa/slsr-4.c
> index 17d7b4c..1e943b7 100644
> --- a/gcc/testsuite/gcc.dg/tree-ssa/slsr-4.c
> +++ b/gcc/testsuite/gcc.dg/tree-ssa/slsr-4.c
> @@ -23,13 +23,9 @@ f (int i)
>foo (y);
>  }
>
> -/* { dg-final { scan-tree-dump-times "\\* 4" 1 "slsr" } } */
> -/* { dg-final { scan-tree-dump-times "\\* 10" 1 "slsr" } } */
> -/* { dg-final { scan-tree-dump-times "\\+ 20;" 1 "slsr" } } */
> +/* { dg-final { scan-tree-dump-times "\\* 40" 1 "slsr" } } */
>  /* { dg-final { scan-tree-dump-times "\\+ 200" 1 "slsr" } } */
> -/* { dg-final { scan-tree-dump-times "\\- 16;" 1 "slsr" } } */
>  /* { dg-final { scan-tree-dump-times "\\- 160" 1 "slsr" } } */
> -/* { dg-final { scan-tree-dump-times "\\* 4" 1 "optimized" } } */
> -/* { dg-final { scan-tree-dump-times "\\* 10" 1 "optimized" } } */
> +/* { dg-final { scan-tree-dump-times "\\* 40" 1 "optimized" } } */
>  /* { dg-final { scan-tree-dump-times "\\+ 200" 1 "optimized" } } */
>  /* { dg-final { scan-tree-dump-times "\\+ 40" 1 "optimized" } } */
> --
> 1.8.3.1
>


Re: [PATCH] [SPARC] Disable muldf3_extend for LEON/LEON3

2017-07-25 Thread Sebastian Huber

On 25/07/17 16:12, Sebastian Huber wrote:


On 25/07/17 15:42, Eric Botcazou wrote:


What is your opinion with respect to a -mno-fsmuld option or something
similar?

Far better in my opinion (at least for LEON3).



How should I add this option? For example:

diff --git a/gcc/config/sparc/sparc.opt b/gcc/config/sparc/sparc.opt
index ae63d2018e3..7a582f3a870 100644
--- a/gcc/config/sparc/sparc.opt
+++ b/gcc/config/sparc/sparc.opt
@@ -93,6 +93,10 @@ mfmaf
 Target Report Mask(FMAF)
 Use UltraSPARC Fused Multiply-Add extensions.

+mfsmuld
+Target Report Mask(FSMULD)
+Use V8+ Floating-point Multiply Single to Double (FsMULd) instruction.
+
 mpopc
 Target Report Mask(POPC)
 Use UltraSPARC Population-Count instruction.

In this case I have to update the cpu_table. I guess all (MASK_V8 || 
MASK_V9) && MASK_FPU entries should then get a MASK_FSMULD in the 
enable field (except leon)? 


The only CPUs in this category are hypersparc and leon3. What about all 
the V9 processors? They don't have MASK_FPU set?


--
Sebastian Huber, embedded brains GmbH

Address : Dornierstr. 4, D-82178 Puchheim, Germany
Phone   : +49 89 189 47 41-16
Fax : +49 89 189 47 41-09
E-Mail  : sebastian.hu...@embedded-brains.de
PGP : Public key available on request.

Diese Nachricht ist keine geschäftliche Mitteilung im Sinne des EHUG.



Re: [PATCH GCC][1/2]Feed bound computation to folder in loop split

2017-07-25 Thread Richard Biener
On Mon, Jul 24, 2017 at 4:31 PM, Marc Glisse  wrote:
> On Mon, 24 Jul 2017, Bin.Cheng wrote:
>
>> On Mon, Jul 24, 2017 at 2:59 PM, Marc Glisse  wrote:
>>>
>>> On Mon, 24 Jul 2017, Bin.Cheng wrote:
>>>
 But since definition of _197 isn't in current stmt sequence, call "o31
 = do_valueize (valueize, o31)" will return NULL.  As a result, it's
 not matched.
>>>
>>>
>>>
>>> Wait, actually, how was your fold_build* version working? Why was the
>>> first
>>> addition "in the current generic tree" and why isn't it "in current stmt
>>> sequence"?
>>
>> Maybe I didn't express it clearly.  In compute_new_first_bound, we
>> have stmt sequence "_124 = _197 + 1", and we try to simplify "_124 -
>> 1" by calling gimple_build.  The definition of _197 is a PHI and isn't
>> in current stmt sequence.  For fold_build* version, it builds
>> expression "_197 + 1 - 1" and simplifies it.
>
>
> It seems like it shouldn't be relevant whether the definition of _197 is in
> the stmt sequence or not, but indeed we seem to generate a lot of calls to
> do_valueize... I had misunderstood the issue, sorry.
>
> Strangely, for a pattern like
> (simplify (plus @0 @1) @0)
> we generate no call to valueize,

Expected.  We expect the "toplevel" trees to be already valueized.

> while for the following
> (simplify (plus @0 (minus @1 @2)) @0)
> we generate 3 calls to do_valueize.
>
> I think we need Richard to say what the intent is for the valueization
> function. It is used both to stop looking at defining stmt if the return is
> NULL, and to replace/optimize one SSA_NAME with another, but currently it
> seems hard to prevent looking at the defining statement without preventing
> from looking at the SSA_NAME at all.

Yeah, this semantic overloading is an issue.  For gimple_build we have nothing
to "valueize" but we only use it to tell genmatch that it may not look at the
SSA_NAME_DEF_STMT.

> I guess we'll need a fix in genmatch...

I'll have a look tomorrow.

RIchard.

> --
> Marc Glisse


[patch,avr,v7,committed] Backport PR81407

2017-07-25 Thread Georg-Johann Lay

Backported from trunk as:

https://gcc.gnu.org/r250522

Johann

gcc/
Backport from 2017-07-12 trunk r250151.
PR target/81407
* config/avr/avr.c (avr_encode_section_info)
[progmem && !TREE_READONLY]: Error if progmem object needs
constructing.

Index: config/avr/avr.c
===
--- config/avr/avr.c(revision 250505)
+++ config/avr/avr.c(working copy)
@@ -10140,18 +10140,26 @@ avr_encode_section_info (tree decl, rtx

   if (new_decl_p
   && decl && DECL_P (decl)
-  && NULL_TREE == DECL_INITIAL (decl)
   && !DECL_EXTERNAL (decl)
   && avr_progmem_p (decl, DECL_ATTRIBUTES (decl)))
 {
-  // Don't warn for (implicit) aliases like in PR80462.
-  tree asmname = DECL_ASSEMBLER_NAME (decl);
-  varpool_node *node = varpool_node::get_for_asmname (asmname);
-  bool alias_p = node && node->alias;
+  if (!TREE_READONLY (decl))
+{
+  // This might happen with C++ if stuff needs constructing.
+  error ("variable %q+D with dynamic initialization put "
+ "into program memory area", decl);
+}
+  else if (NULL_TREE == DECL_INITIAL (decl))
+{
+  // Don't warn for (implicit) aliases like in PR80462.
+  tree asmname = DECL_ASSEMBLER_NAME (decl);
+  varpool_node *node = varpool_node::get_for_asmname (asmname);
+  bool alias_p = node && node->alias;

-  if (!alias_p)
-warning (OPT_Wuninitialized, "uninitialized variable %q+D put 
into "

- "program memory area", decl);
+  if (!alias_p)
+warning (OPT_Wuninitialized, "uninitialized variable %q+D put "
+ "into program memory area", decl);
+}
 }

   default_encode_section_info (decl, rtl, new_decl_p);


Re: [PATCH] [SPARC] Disable muldf3_extend for LEON/LEON3

2017-07-25 Thread Sebastian Huber

On 25/07/17 15:42, Eric Botcazou wrote:


What is your opinion with respect to a -mno-fsmuld option or something
similar?

Far better in my opinion (at least for LEON3).



Attached is a variant with this option. It touches a lot more files.

--
Sebastian Huber, embedded brains GmbH

Address : Dornierstr. 4, D-82178 Puchheim, Germany
Phone   : +49 89 189 47 41-16
Fax : +49 89 189 47 41-09
E-Mail  : sebastian.hu...@embedded-brains.de
PGP : Public key available on request.

Diese Nachricht ist keine geschäftliche Mitteilung im Sinne des EHUG.

>From 132e68c33c4bb48b6b9b99c4cada58e020ce67bd Mon Sep 17 00:00:00 2001
From: Sebastian Huber 
Date: Thu, 20 Jul 2017 10:38:58 +0200
Subject: [PATCH] [SPARC] Add -mfslumd option

---
 gcc/config/sparc/freebsd.h|  2 +-
 gcc/config/sparc/linux64.h|  2 +-
 gcc/config/sparc/netbsd-elf.h |  3 ++-
 gcc/config/sparc/openbsd64.h  |  3 ++-
 gcc/config/sparc/sol2.h   |  4 ++--
 gcc/config/sparc/sp64-elf.h   |  3 ++-
 gcc/config/sparc/sparc.c  | 13 +
 gcc/config/sparc/sparc.h  |  3 ++-
 gcc/config/sparc/sparc.md |  2 +-
 gcc/config/sparc/sparc.opt|  4 
 gcc/doc/invoke.texi   |  9 -
 11 files changed, 34 insertions(+), 14 deletions(-)

diff --git a/gcc/config/sparc/freebsd.h b/gcc/config/sparc/freebsd.h
index 3e7ea6c9d17..c718e9e5cd8 100644
--- a/gcc/config/sparc/freebsd.h
+++ b/gcc/config/sparc/freebsd.h
@@ -82,7 +82,7 @@ along with GCC; see the file COPYING3.  If not see
 #undef  TARGET_DEFAULT
 #define TARGET_DEFAULT \
   (MASK_V9 + MASK_64BIT + MASK_PTR64 /* + MASK_FASTER_STRUCTS */ \
-   + MASK_STACK_BIAS + MASK_APP_REGS + MASK_FPU \
+   + MASK_STACK_BIAS + MASK_APP_REGS + MASK_FPU + MASK_FSMULD \
+ MASK_LONG_DOUBLE_128 /* + MASK_HARD_QUAD */)
 
 /* The default code model.  */
diff --git a/gcc/config/sparc/linux64.h b/gcc/config/sparc/linux64.h
index 573ce8a9a4c..4f9d4e835b8 100644
--- a/gcc/config/sparc/linux64.h
+++ b/gcc/config/sparc/linux64.h
@@ -36,7 +36,7 @@ along with GCC; see the file COPYING3.  If not see
 #undef TARGET_DEFAULT
 #define TARGET_DEFAULT \
   (MASK_V9 + MASK_PTR64 + MASK_64BIT + MASK_STACK_BIAS + \
-   MASK_APP_REGS + MASK_FPU + MASK_LONG_DOUBLE_128)
+   MASK_APP_REGS + MASK_FPU + MASK_FSMULD + MASK_LONG_DOUBLE_128)
 #endif
 
 /* This must be v9a not just v9 because by default we enable
diff --git a/gcc/config/sparc/netbsd-elf.h b/gcc/config/sparc/netbsd-elf.h
index 9d670cff8ac..c25998434db 100644
--- a/gcc/config/sparc/netbsd-elf.h
+++ b/gcc/config/sparc/netbsd-elf.h
@@ -96,7 +96,8 @@ along with GCC; see the file COPYING3.  If not see
 #undef TARGET_DEFAULT
 #define TARGET_DEFAULT \
   (MASK_V9 + MASK_PTR64 + MASK_64BIT /* + MASK_HARD_QUAD */ \
-   + MASK_STACK_BIAS + MASK_APP_REGS + MASK_FPU + MASK_LONG_DOUBLE_128)
+   + MASK_STACK_BIAS + MASK_APP_REGS + MASK_FPU + MASK_FSMULD \
+   + MASK_LONG_DOUBLE_128)
 
 #undef SPARC_DEFAULT_CMODEL
 #define SPARC_DEFAULT_CMODEL CM_MEDANY
diff --git a/gcc/config/sparc/openbsd64.h b/gcc/config/sparc/openbsd64.h
index e8a8abccde6..ec5d782c834 100644
--- a/gcc/config/sparc/openbsd64.h
+++ b/gcc/config/sparc/openbsd64.h
@@ -21,7 +21,8 @@ along with GCC; see the file COPYING3.  If not see
 #undef TARGET_DEFAULT
 #define TARGET_DEFAULT \
 (MASK_V9 + MASK_PTR64 + MASK_64BIT + MASK_HARD_QUAD \
- + MASK_APP_REGS + MASK_FPU + MASK_STACK_BIAS + MASK_LONG_DOUBLE_128)
+ + MASK_APP_REGS + MASK_FPU + MASK_FSMULD + MASK_STACK_BIAS \
+ + MASK_LONG_DOUBLE_128)
 
 #undef SPARC_DEFAULT_CMODEL
 #define SPARC_DEFAULT_CMODEL CM_MEDMID
diff --git a/gcc/config/sparc/sol2.h b/gcc/config/sparc/sol2.h
index b8177c0b692..567466be4c9 100644
--- a/gcc/config/sparc/sol2.h
+++ b/gcc/config/sparc/sol2.h
@@ -26,10 +26,10 @@ along with GCC; see the file COPYING3.  If not see
 #ifdef TARGET_64BIT_DEFAULT
 #define TARGET_DEFAULT \
   (MASK_V9 + MASK_64BIT + MASK_PTR64 + MASK_STACK_BIAS + \
-   MASK_V8PLUS + MASK_APP_REGS + MASK_FPU + MASK_LONG_DOUBLE_128)
+   MASK_V8PLUS + MASK_APP_REGS + MASK_FPU + MASK_FSMULD + MASK_LONG_DOUBLE_128)
 #else
 #define TARGET_DEFAULT \
-  (MASK_V8PLUS + MASK_APP_REGS + MASK_FPU + MASK_LONG_DOUBLE_128)
+  (MASK_V8PLUS + MASK_APP_REGS + MASK_FPU + MASK_FSMULD + MASK_LONG_DOUBLE_128)
 #endif
 
 /* The default code model used to be CM_MEDANY on Solaris
diff --git a/gcc/config/sparc/sp64-elf.h b/gcc/config/sparc/sp64-elf.h
index 8b0b0788185..0e98af60c46 100644
--- a/gcc/config/sparc/sp64-elf.h
+++ b/gcc/config/sparc/sp64-elf.h
@@ -22,7 +22,8 @@ along with GCC; see the file COPYING3.  If not see
 #undef TARGET_DEFAULT
 #define TARGET_DEFAULT \
 (MASK_V9 + MASK_PTR64 + MASK_64BIT + MASK_HARD_QUAD \
- + MASK_APP_REGS + MASK_FPU + MASK_STACK_BIAS + MASK_LONG_DOUBLE_128)
+ + MASK_APP_REGS + MASK_FPU + MASK_FSMULD + MASK_STACK_BIAS \
+ + MASK_LONG_DOUBLE_128)
 
 #undef SPARC_DEFAULT_CMODEL
 #define SPARC_DEFAULT_CMODEL CM_EMBMEDANY
diff --git a/gcc/config/sparc/sparc.c b/gcc/config/sparc/sparc.c
index f4f318a81a1..55c42c417e8 100644
--- a/gcc/config/sparc/spa

Fix thinko in gimple_assign_set_rhs_with_ops

2017-07-25 Thread Eric Botcazou
The attached fix to the Ada front-end introduces a regression in the ACATS 
testsuite for cb4009a.  The backtrace is:

#0  operation_could_trap_helper_p(tree_code, bool, bool, bool, bool, 
tree_node*, bool*) () at /home/eric/gnat/gnat-head/src/gcc/tree-eh.c:2439
#1  0x012946d7 in stmt_could_throw_1_p (stmt=)
at /home/eric/gnat/gnat-head/src/gcc/tree-eh.c:2759
#2  stmt_could_throw_p(gimple*) [clone .part.186] ()
at /home/eric/gnat/gnat-head/src/gcc/tree-eh.c:2809
#3  0x01295f28 in stmt_could_throw_p (stmt=0x76c5d420)
at /home/eric/gnat/gnat-head/src/gcc/tree-eh.c:2924
#4  maybe_clean_or_replace_eh_stmt (old_stmt=old_stmt@entry=0x76c565d8, 
new_stmt=new_stmt@entry=0x76c5d420)
at /home/eric/gnat/gnat-head/src/gcc/tree-eh.c:2908
#5  0x00fd8a0b in gsi_replace(gimple_stmt_iterator*, gimple*, bool) ()
at /home/eric/gnat/gnat-head/src/gcc/gimple-iterator.c:447
#6  0x00fd14af in 
gimple_assign_set_rhs_with_ops(gimple_stmt_iterator*, tree_code, tree_node*, 
tree_node*, tree_node*) ()
at /home/eric/gnat/gnat-head/src/gcc/gimple.c:1616
#7  0x00fe7af7 in replace_stmt_with_simplification (inplace=false, 
seq=0x7fffd818, ops=0x7fffd820, rcode=..., gsi=0x7fffd8e0)
at /home/eric/gnat/gnat-head/src/gcc/gimple-fold.c:4151
#8  fold_stmt_1(gimple_stmt_iterator*, bool, tree_node* (*)(tree_node*)) ()
at /home/eric/gnat/gnat-head/src/gcc/gimple-fold.c:4462
#9  0x00fe961a in fold_stmt (gsi=gsi@entry=0x7fffd8e0, 
valueize=valueize@entry=0x1331170 )
at /home/eric/gnat/gnat-head/src/gcc/gimple-fold.c:4689

The folding is turning a TRUNC_DIV_EXPR into a COND_EXPR and the code does:

  /* If the new CODE needs more operands, allocate a new statement.  */
  if (gimple_num_ops (stmt) < new_rhs_ops + 1)
{
  tree lhs = gimple_assign_lhs (stmt);
  gimple *new_stmt = gimple_alloc (gimple_code (stmt), new_rhs_ops + 1);
  memcpy (new_stmt, stmt, gimple_size (gimple_code (stmt)));
  gimple_init_singleton (new_stmt);
  gsi_replace (gsi, new_stmt, true);
  stmt = new_stmt;

  /* The LHS needs to be reset as this also changes the SSA name
 on the LHS.  */
  gimple_assign_set_lhs (stmt, lhs);
}

i.e it asks gsi_replace to update EH info, which doesn't work since the new 
statement is dummy at this point.  Fixed by passing false instead of true.

Bootstrapped/regtested on x86_64-suse-linux, applied on mainline as obvious.


2017-07-25  Eric Botcazou  

* gimple.c (gimple_assign_set_rhs_with_ops): Do not ask gsi_replace
to update EH info here.


2017-07-25  Javier Miranda  

ada/
* checks.adb (Apply_Divide_Checks): Ensure that operands are not
evaluated twice. 


2017-07-25  Eric Botcazou  

testsuite/
* gnat.dg/opt66.adb: New test.


-- 
Eric BotcazouIndex: gimple.c
===
--- gimple.c	(revision 250379)
+++ gimple.c	(working copy)
@@ -1613,7 +1613,7 @@ gimple_assign_set_rhs_with_ops (gimple_stmt_iterat
   gimple *new_stmt = gimple_alloc (gimple_code (stmt), new_rhs_ops + 1);
   memcpy (new_stmt, stmt, gimple_size (gimple_code (stmt)));
   gimple_init_singleton (new_stmt);
-  gsi_replace (gsi, new_stmt, true);
+  gsi_replace (gsi, new_stmt, false);
   stmt = new_stmt;
 
   /* The LHS needs to be reset as this also changes the SSA name
Index: ada/checks.adb
===
--- ada/checks.adb	(revision 250379)
+++ ada/checks.adb	(working copy)
@@ -1818,6 +1818,13 @@ package body Checks is
  and then
   ((not LOK) or else (Llo = LLB))
then
+  --  Ensure that expressions are not evaluated twice (once
+  --  for their runtime checks and once for their regular
+  --  computation).
+
+  Force_Evaluation (Left, Mode => Strict);
+  Force_Evaluation (Right, Mode => Strict);
+
   Insert_Action (N,
 Make_Raise_Constraint_Error (Loc,
   Condition =>
-- { dg-do compile }
-- { dg-options "-O" }

procedure Opt66 (I : Integer) is
  E : exception;
begin
  if I = 0 then
raise E;
  end if;
  Opt66 (I - I / abs (I));
exception
  when others => null;
end;


[PATCH, i386]: Two cleanups

2017-07-25 Thread Uros Bizjak
2017-07-25  Uros Bizjak  

* config/i386/i386.c (ix86_decompose_address): Do not check for
register RTX when looking at index_reg or base_reg.
* config/i386/i386.h (INCOMING_RETURN_ADDR_RTX): Use stack_pointer_rtx.

Bootstrapped and regression tested on x86_64-linux-gnu {,-m32}.

Committed to mainline.

Uros.
Index: config/i386/i386.c
===
--- config/i386/i386.c  (revision 250499)
+++ config/i386/i386.c  (working copy)
@@ -16187,9 +16187,9 @@ ix86_decompose_address (rtx addr, struct ix86_addr
 
   /* Allow arg pointer and stack pointer as index if there is not scaling.  */
   if (base_reg && index_reg && scale == 1
-  && (index_reg == arg_pointer_rtx
- || index_reg == frame_pointer_rtx
- || (REG_P (index_reg) && REGNO (index_reg) == STACK_POINTER_REGNUM)))
+  && (REGNO (index_reg) == ARG_POINTER_REGNUM
+ || REGNO (index_reg) == FRAME_POINTER_REGNUM
+ || REGNO (index_reg) == SP_REG))
 {
   std::swap (base, index);
   std::swap (base_reg, index_reg);
@@ -16197,14 +16197,11 @@ ix86_decompose_address (rtx addr, struct ix86_addr
 
   /* Special case: %ebp cannot be encoded as a base without a displacement.
  Similarly %r13.  */
-  if (!disp
-  && base_reg
-  && (base_reg == hard_frame_pointer_rtx
- || base_reg == frame_pointer_rtx
- || base_reg == arg_pointer_rtx
- || (REG_P (base_reg)
- && (REGNO (base_reg) == HARD_FRAME_POINTER_REGNUM
- || REGNO (base_reg) == R13_REG
+  if (!disp && base_reg
+  && (REGNO (base_reg) == ARG_POINTER_REGNUM
+ || REGNO (base_reg) == FRAME_POINTER_REGNUM
+ || REGNO (base_reg) == BP_REG
+ || REGNO (base_reg) == R13_REG))
 disp = const0_rtx;
 
   /* Special case: on K6, [%esi] makes the instruction vector decoded.
@@ -16213,7 +16210,7 @@ ix86_decompose_address (rtx addr, struct ix86_addr
  to test cfun for being non-NULL. */
   if (TARGET_K6 && cfun && optimize_function_for_speed_p (cfun)
   && base_reg && !index_reg && !disp
-  && REG_P (base_reg) && REGNO (base_reg) == SI_REG)
+  && REGNO (base_reg) == SI_REG)
 disp = const0_rtx;
 
   /* Special case: encode reg+reg instead of reg*2.  */
Index: config/i386/i386.h
===
--- config/i386/i386.h  (revision 250499)
+++ config/i386/i386.h  (working copy)
@@ -2165,7 +2165,7 @@ extern int const svr4_dbx_register_map[FIRST_PSEUD
 
 /* Before the prologue, RA is at 0(%esp).  */
 #define INCOMING_RETURN_ADDR_RTX \
-  gen_rtx_MEM (Pmode, gen_rtx_REG (Pmode, STACK_POINTER_REGNUM))
+  gen_rtx_MEM (Pmode, stack_pointer_rtx)
 
 /* After the prologue, RA is at -4(AP) in the current frame.  */
 #define RETURN_ADDR_RTX(COUNT, FRAME)  \


Re: c-family PATCH to improve -Wsign-compare (PR c/81417)

2017-07-25 Thread Martin Sebor

How hard would it be to also suppress the warning for benign
comparisons like C++ manages to do?  E.g., C warns on this
code even though there's no problem with it:

  int foo (unsigned int b)
  {
const int a = 1;
return (a < b);
  }

Martin

On 07/25/2017 03:45 AM, Marek Polacek wrote:

Ping.

On Fri, Jul 14, 2017 at 03:27:21PM +0200, Marek Polacek wrote:

On Thu, Jul 13, 2017 at 04:59:20PM -0400, David Malcolm wrote:

On Thu, 2017-07-13 at 16:39 -0400, Eric Gallager wrote:

On 7/13/17, David Malcolm  wrote:

On Thu, 2017-07-13 at 18:33 +0200, Marek Polacek wrote:

A tiny patch for -Wsign-compare so that is also prints the types
when
reporting a warning.

Bootstrapped/regtested on x86_64-linux and ppc64le-redhat-linux,
ok
for trunk?


Looks like it always display the types in the order signed then
unsigned, which matches the text of the diagnostic, but not
necessarily
the ordering within the expression, which might be confusing if
someone's comparing e.g.

  unsigned_a < signed_b



Good catch, I forgot about that case when opening the original bug
that Marek posted this patch for...


But we already hardcode the ordering within the text of the
diagnostic,
so that feels excessively nit-picky.


I don't think it's being excessively nit-picky; I think it'd make
more
sense to match the ordering of the expression. That's what clang
does:

$ cat Wsign_compare.c
/* { dg-do compile } */

int foo(signed int a, unsigned int b)
{
return (a < b);
}

int bar(unsigned int c, signed int d)
{
return (c < d);
}

$ /sw/opt/llvm-3.1/bin/clang -c -Wsign-compare Wsign_compare.c
Wsign_compare.c:5:12: warning: comparison of integers of different
signs: 'int' and 'unsigned int' [-Wsign-compare]
return (a < b);
~ ^ ~
Wsign_compare.c:10:12: warning: comparison of integers of different
signs: 'unsigned int' and 'int' [-Wsign-compare]
return (c < d);
~ ^ ~
2 warnings generated.


That's much nicer.





OK for trunk (with my "diagnostic messages" maintainer hat on).


Marek: I take it back; can you update the patch accordingly, please?

(Note to self: always doublecheck the linked PR for context).


Sure, here goes:

Bootstrapped/regtested on x86_64-linux, ok for trunk?

2017-07-14  Marek Polacek  

PR c/81417
* c-warn.c (warn_for_sign_compare): Print the types.

* c-c++-common/Wsign-compare-1.c: New test.

diff --git gcc/c-family/c-warn.c gcc/c-family/c-warn.c
index b9378c2dbe2..7ff11821453 100644
--- gcc/c-family/c-warn.c
+++ gcc/c-family/c-warn.c
@@ -1891,9 +1891,10 @@ warn_for_sign_compare (location_t location,
   c_common_signed_type (base_type)))
/* OK */;
   else
-   warning_at (location,
-   OPT_Wsign_compare,
-   "comparison between signed and unsigned integer 
expressions");
+   warning_at (location, OPT_Wsign_compare,
+   "comparison between signed and unsigned integer "
+   "expressions: %qT and %qT", TREE_TYPE (orig_op0),
+   TREE_TYPE (orig_op1));
 }

   /* Warn if two unsigned values are being compared in a size larger
diff --git gcc/testsuite/c-c++-common/Wsign-compare-1.c 
gcc/testsuite/c-c++-common/Wsign-compare-1.c
index e69de29bb2d..0e01453e7d8 100644
--- gcc/testsuite/c-c++-common/Wsign-compare-1.c
+++ gcc/testsuite/c-c++-common/Wsign-compare-1.c
@@ -0,0 +1,27 @@
+/* PR c/81417 */
+/* { dg-do compile } */
+/* { dg-options "-Wsign-compare" } */
+
+int
+fn1 (signed int a, unsigned int b)
+{
+  return a < b; /* { dg-warning "comparison between signed and unsigned integer 
expressions: 'int' and 'unsigned int'" } */
+}
+
+int
+fn2 (signed int a, unsigned int b)
+{
+  return b < a; /* { dg-warning "comparison between signed and unsigned integer 
expressions: 'unsigned int' and 'int'" } */
+}
+
+int
+fn3 (signed long int a, unsigned long int b)
+{
+  return b < a; /* { dg-warning "comparison between signed and unsigned integer 
expressions: 'long unsigned int' and 'long int'" } */
+}
+
+int
+fn4 (signed short int a, unsigned int b)
+{
+  return b < a; /* { dg-warning "comparison between signed and unsigned integer 
expressions: 'unsigned int' and 'short int'" } */
+}


Marek





Re: PING^2: Fwd: SSA range class and removal of VR_ANTI_RANGEs

2017-07-25 Thread Andrew MacLeod

On 07/25/2017 03:12 AM, Richard Biener wrote:

On Fri, Jul 21, 2017 at 9:30 PM, Aldy Hernandez  wrote:

On Mon, Jul 17, 2017 at 6:23 AM, Richard Biener
 wrote:

On Mon, Jul 17, 2017 at 8:51 AM, Aldy Hernandez  wrote:

How does this look?

It's a change that on its own doesn't look worthwhile to me.

So please post the changes that will build ontop of this.  Like removing
anti-ranges from VRP or your on-demand value-range stuff.

Thanks,
Richard.

 From the looks of it, we can have a variety of VRP ranges that are not
representable at all with the an integer range class.  For instance, I
see the following ranges built in set_value_range():

[INT, (nop_expr SSA)]

[INT, (plus_expr SSA INT)]

[(negate_expr SSA), (negate_expr SSA)]

[(plus_expr (negate_expr SSA INT)),
  (plus_expr (negate_expr SSA) INT)]

[SSA, SSA]

So...I guess the first suggestion is out of the question ;-).

Well.  We do not want range operations implemented twice (really!)
so range operations need to work on both representations.  So we
should think of a way to get rid of anti-ranges in VRP which, frankly,
means that for the sake symbolic ranges we have to trade them
with handling open/closed ranges which I'm not sure will be less of
a hassle to handle?


I originally looked at ranges with symbolic expressions, but as soon as 
there are ranges comprised of more than 1 range, symbolics became a 
nightmare.  At best you can do endpoints, and even then you have to 
start adding complexity..  [0, 100] Union  [5, x_4]  now has to become  
[0, max(100, x_4)] , and chained operations were making the expressions 
more and more complicated.  Trying to maintain these expression across 
optimizations was also very problematic as the IL changes and these 
expressions are not in the IL and don't suffer updates easily.


At which point one asks themselves whether it actually makes sense to 
integrate that into the range representation, or try a different tactic.


Seems to me its cleaner to have an integral range, and when appropriate 
track symbols separately to determine if their values can be refined.
If at some point you can resolve those symbols to an integral value,  
then you simply update the integral range with the new range you've 
determined.


VRP chooses to do this by creating a completely different structure for 
ranges, and representing endpoints as expression trees. It then updates 
the integral ranges at the end of the pass. It uses anti-ranges as a 
shorthand to handle a special case of a range comprised of 2 ranges.  As 
it stands right now, VRP's version of union and intersection can never 
have much in common with a general integral range implementation. They 
are 2 very different beasts.


So we cant do much about VRP internally yet without some significant 
surgery.


This issue seems orthogonal to me though.  irange is not intended to 
ever do anything with symbols, thus can never replace VRPs internal 
value_range class.We're simply replacing "struct range_info_def" 
with a new range structure and API which can support more precise ranges 
than a pair of integral endpoints.  We'll then build on that by 
providing routines which can generate more precise range information as 
well.


For the moment we provide an implementation with the same precision to
  a) ensure code generation remains the same
  b) allow the compiler to use it for a while to make sure no bugs have 
been introduced
  c) and there is nothing that would utilize the additional precision 
yet anyway.


I just think its important to get it in the code base so its being used 
and we can be sure its correct.


On its own it is an improvement to "struct range_info_def" as it 
encapsulates the activity in one place and provides a standard API any 
optimization can utilize.  It allows us to improve the precision of 
ranges in the future without changing any client code., and it does not 
appear to have any measurable compile time storage or speed issues.  It 
is simply an infrastructure improvement.


Andrew



Re: [PATCH, rs6000, 5/6] Backport patch to align .toc section

2017-07-25 Thread Segher Boessenkool
Hi Bill,

On Sun, Jul 23, 2017 at 12:07:37PM -0500, Bill Schmidt wrote:
> Alan added code to have GCC align the .toc section in GCC 7, aligning to
> four bytes for 32-bit mode, and 8 bytes for 64-bit mode.  This is normally
> unnecessary since alignment of the .toc has historically been handled by
> the standard linker script.  However, people using non-standard toolchains
> can still run into problems, so as a courtesy this is now done by the
> compiler as well.
> 
> We've recently seen this encountered using an out-of-service compiler, and
> would like to prevent any more occurrences by people finding interesting
> ways to avoid the standard toolchain.  Thus I'd like to backport Alan's
> patch to GCC 5 and 6.  Alan, please review as well to be sure I didn't
> miss any release-specific differences.
> 
> Bootstrapped and tested on powerpc64le-linux-gnu with no regressions.
> Are these backports ok?

Sure, if Alan is happy with it, please commit.


Segher


> 2017-07-23  Bill Schmidt  
> 
>   Backport from mainline r235874
>   2016-05-04  Alan Modra  
> 
>   * config/rs6000/rs6000.c (rs6000_elf_output_toc_section_asm_op):
>   Align .toc.


Re: PING: [PATCH] i386: Avoid stack realignment if possible

2017-07-25 Thread Uros Bizjak
On Tue, Jul 25, 2017 at 3:52 PM, H.J. Lu  wrote:
> On Fri, Jul 14, 2017 at 4:46 AM, H.J. Lu  wrote:
>> On Fri, Jul 7, 2017 at 5:56 PM, H.J. Lu  wrote:
>>> On Fri, Jul 07, 2017 at 09:58:42AM -0700, H.J. Lu wrote:
 On Fri, Dec 20, 2013 at 8:06 AM, Jakub Jelinek  wrote:
 > Hi!
 >
 > Honza recently changed the i?86 backend, so that it often doesn't
 > do -maccumulate-outgoing-args by default on x86_64.
 > Unfortunately, on some of the here included testcases this regressed
 > quite a bit the generated code.  As AVX vectors are used, the dynamic
 > realignment code needs to assume e.g. that some of them will need to be
 > spilled, and for -mno-accumulate-outgoing-args the code needs to set
 > need_drap early as well.  But in when emitting the prologue/epilogue,
 > if need_drap is set, we don't perform the optimization for leaf functions
 > which have zero size stack frame, thus we end up with uselessly doing
 > dynamic stack realignment, setting up DRAP that nothing uses and later on
 > restore everything back.
 >
 > This patch improves it, if the DRAP register isn't live at the start of
 > entry bb successor and we aren't going to realign the stack, we don't
 > need DRAP at all, and even if we need DRAP register, that can't be the 
 > sole
 > reason for doing stack realignment, the prologue code is able to set up 
 > DRAP
 > even without dynamic stack realignment.
 >
 > Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?
 >
 > 2013-12-20  Jakub Jelinek  
 >
 > PR target/59501
 > * config/i386/i386.c (ix86_save_reg): Don't return true for 
 > drap_reg
 > if !crtl->stack_realign_needed.
 > (ix86_finalize_stack_realign_flags): If drap_reg isn't live on 
 > entry
 > and stack_realign_needed will be false, clear drap_reg and 
 > need_drap.
 > Optimize leaf functions that don't need stack frame even if
 > crtl->need_drap.
 >
 > * gcc.target/i386/pr59501-1.c: New test.
 > * gcc.target/i386/pr59501-1a.c: New test.
 > * gcc.target/i386/pr59501-2.c: New test.
 > * gcc.target/i386/pr59501-2a.c: New test.
 > * gcc.target/i386/pr59501-3.c: New test.
 > * gcc.target/i386/pr59501-3a.c: New test.
 > * gcc.target/i386/pr59501-4.c: New test.
 > * gcc.target/i386/pr59501-4a.c: New test.
 > * gcc.target/i386/pr59501-5.c: New test.
 > * gcc.target/i386/pr59501-6.c: New test.

LGTM, assuming Jakub is OK with the patch.

Thanks,
Uros.


 >
 > --- gcc/testsuite/gcc.target/i386/pr59501-4a.c.jj   2013-12-20 
 > 12:19:20.603212859 +0100
 > +++ gcc/testsuite/gcc.target/i386/pr59501-4a.c  2013-12-20 
 > 12:23:33.647881672 +0100
 > @@ -0,0 +1,8 @@
 > +/* PR target/59501 */
 > +/* { dg-do compile { target { ! ia32 } } } */
 > +/* { dg-options "-O2 -mavx -maccumulate-outgoing-args" } */
 > +
 > +#include "pr59501-3a.c"
 > +
 > +/* Verify no dynamic realignment is performed.  */
 > +/* { dg-final { scan-assembler-not "and\[^\n\r]*sp" { xfail *-*-* } } } 
 > */
 >

 Since DRAP isn't used with -maccumulate-outgoing-args, pr59501-4a.c was
 xfailed due to stack frame access via frame pointer instead of DARP.
 This patch finds the maximum stack alignment from the stack frame access
 instructions and avoids stack realignment if stack alignment needed is
 less than incoming stack boundary.

 I am testing this patch.  OK for trunk if there is no regression?


>>>
>>> We need to keep the preferred stack alignment as the minimum stack
>>> alignment. Here is the updated patch.  Tested on x86-64.  OK for
>>> trunk?
>>>
>>> Thanks.
>>
>> Hi Jakub,
>>
>> This patch fixes the xfailed testcase in your patch:
>>
>> https://gcc.gnu.org/ml/gcc-patches/2013-12/msg01767.html
>>
>> Your original patch:
>>
>> https://gcc.gnu.org/ml/gcc-patches/2011-11/msg01058.html
>>
>> assumes that any instructions accessing stack require stack
>> realignment:
>>
>> +  FOR_EACH_BB (bb)
>> +{
>> +  rtx insn;
>> +  FOR_BB_INSNS (bb, insn)
>> +if (NONDEBUG_INSN_P (insn)
>> + && requires_stack_frame_p (insn, prologue_used,
>> +   set_up_by_prologue))
>> +  {
>> + crtl->stack_realign_needed = stack_realign;
>> + crtl->stack_realign_finalized = true;
>> + return;
>> +  }
>> + }
>>
>> This patch checks the actual alignment needed for any instructions
>> accessing stack.  It skips stack realignment if the actual stack alignment
>> needed is less than or equal to incoming stack alignment.
>>
>> Can you take look at it?
>>
>> Thanks.
>>
>
> PING
>
> https://gcc.gnu.org/ml/gcc-patches/2017-07/msg00400.html
>
> --
> H.J.


Re: [gomp4] OpenACC async re-work

2017-07-25 Thread Chung-Lin Tang
On 2017/7/25 10:09 PM, Cesar Philippidis wrote:
> On 07/25/2017 05:51 AM, Chung-Lin Tang wrote:
>> On 2017/6/29 6:31 AM, Cesar Philippidis wrote:
> 
>> Attached is the updated version of the patch, re-tested.
>>
>> Thomas, do you need some more time to look over it? Or should I commit it 
>> first?
> 
> I'm not too concerned about the profiling stuff because that should be
> considered beta quality (because it's still not fully implemented).
> 
> We'll probably need to revisit the locking stuff later, but this patch
> is OK for og7. I suspect that we may have to make other locking tweaks
> in the nvptx plugin anyway.
> 
> Cesar
> 

Patch has been committed to gomp-4_0-branch.

Chung-Lin



Re: c-family PATCH to improve -Wsign-compare (PR c/81417)

2017-07-25 Thread David Malcolm
On Fri, 2017-07-14 at 15:27 +0200, Marek Polacek wrote:
> On Thu, Jul 13, 2017 at 04:59:20PM -0400, David Malcolm wrote:
> > On Thu, 2017-07-13 at 16:39 -0400, Eric Gallager wrote:
> > > On 7/13/17, David Malcolm  wrote:
> > > > On Thu, 2017-07-13 at 18:33 +0200, Marek Polacek wrote:
> > > > > A tiny patch for -Wsign-compare so that is also prints the
> > > > > types
> > > > > when
> > > > > reporting a warning.
> > > > > 
> > > > > Bootstrapped/regtested on x86_64-linux and ppc64le-redhat
> > > > > -linux,
> > > > > ok
> > > > > for trunk?
> > > > 
> > > > Looks like it always display the types in the order signed then
> > > > unsigned, which matches the text of the diagnostic, but not
> > > > necessarily
> > > > the ordering within the expression, which might be confusing if
> > > > someone's comparing e.g.
> > > > 
> > > >   unsigned_a < signed_b
> > > > 
> > > 
> > > Good catch, I forgot about that case when opening the original
> > > bug
> > > that Marek posted this patch for...
> > > 
> > > > But we already hardcode the ordering within the text of the
> > > > diagnostic,
> > > > so that feels excessively nit-picky.
> > > 
> > > I don't think it's being excessively nit-picky; I think it'd make
> > > more
> > > sense to match the ordering of the expression. That's what clang
> > > does:
> > > 
> > > $ cat Wsign_compare.c
> > > /* { dg-do compile } */
> > > 
> > > int foo(signed int a, unsigned int b)
> > > {
> > >   return (a < b);
> > > }
> > > 
> > > int bar(unsigned int c, signed int d)
> > > {
> > >   return (c < d);
> > > }
> > > 
> > > $ /sw/opt/llvm-3.1/bin/clang -c -Wsign-compare Wsign_compare.c
> > > Wsign_compare.c:5:12: warning: comparison of integers of
> > > different
> > > signs: 'int' and 'unsigned int' [-Wsign-compare]
> > > return (a < b);
> > > ~ ^ ~
> > > Wsign_compare.c:10:12: warning: comparison of integers of
> > > different
> > > signs: 'unsigned int' and 'int' [-Wsign-compare]
> > > return (c < d);
> > > ~ ^ ~
> > > 2 warnings generated.
> > 
> > That's much nicer.
> > 
> > > 
> > > > 
> > > > OK for trunk (with my "diagnostic messages" maintainer hat on).
> > 
> > Marek: I take it back; can you update the patch accordingly,
> > please?
> > 
> > (Note to self: always doublecheck the linked PR for context).
> 
> Sure, here goes:

Thanks for updating the patch.

There's still an issue with ordering in the updated patch.

There are three orderings:

  (a) the order of the expressions in the source code (LHS CMP RHS)

  (b) the order of kinds of signedness in the messages (currently
hardcoded as "signed and unsigned", which doesn't respect (a))

  (c) the order of the the types that are reported (currently done as
orig_op0 vs orig_op1, which if I'm reading the code is LHS vs RHS).

So, as written (a) and (c) have the same order, but (b)'s order is
hardcoded, and so there could be a mismatch.

All of the examples in the testcase are of the form
  signed LHS with unsigned RHS.

What happens if the LHS is unsigned, and the RHS is signed?  e.g.

  int
  fn5 (unsigned int a, signed int b)
  {
return a < b;
  }

Presumably this case still merits a warning, but as written, the
warning would read:

  warning "comparison between signed and unsigned integer expressions:   
'unsigned int' and 'signed int'

This seems rather awkward to me; in a less trivial example, I can imagine the 
user having to take a moment to figure out which side of the expression has 
which signedness.

I think that any time we're reporting on the types of two sides of an 
expression like this, we should follow the ordering in the user's code, i.e. 
(a) above.   The patch has (c) doing this, but the text (b) is problematic.

I can see two ways of fixing this:

(i) rework the text of the message so that it changes based on which side has 
which signedness, e.g.:

  "comparison between signed and unsigned integer expressions"
vs
  "comparison between unsigned and signed integer expressions"

or,

(ii) change the text of the message to not have an ordering.  Clang has 
"comparison of integers of different signs" - though I think this should say 
"signedness", not "signs"; surely an instance of an int has a sign (e.g. "-3" 
is negative), but a integer *type* has a signedness (e.g. "unsigned short").  
So I'd change it to say:
"comparison of integer expressions of different signedness"

Please also add a testcase that covers this case (e.g. fn5 above).

Dave


> Bootstrapped/regtested on x86_64-linux, ok for trunk?
> 
> 2017-07-14  Marek Polacek  
> 
>   PR c/81417
>   * c-warn.c (warn_for_sign_compare): Print the types.
> 
>   * c-c++-common/Wsign-compare-1.c: New test.
> 
> diff --git gcc/c-family/c-warn.c gcc/c-family/c-warn.c
> index b9378c2dbe2..7ff11821453 100644
> --- gcc/c-family/c-warn.c
> +++ gcc/c-family/c-warn.c
> @@ -1891,9 +1891,10 @@ warn_for_sign_compare (location_t location,
>  c_common_signed_type
> (base_ty

Re: [rs6000] Avoid rotates of floating-point modes

2017-07-25 Thread Richard Sandiford
Segher Boessenkool  writes:
> Hi Richard,
>
> On Wed, Jul 12, 2017 at 05:33:42PM +0100, Richard Sandiford wrote:
>> The little-endian VSX code uses rotates to swap the two 64-bit halves of
>> 128-bit scalar modes.  This is fine for TImode and V1TImode, but it
>> isn't really valid to use RTL rotates on floating-point modes like
>> KFmode and TFmode, and doing that triggered an assert added by the
>> SVE series.  This patch uses bit-casts to V1TImode instead.
>> 
>> Tested on powerpc64le-linux-gnu.  OK to install?
>
>
>> +void
>> +rs6000_emit_le_vsx_permute (rtx dest, rtx source, machine_mode mode)
>>  {
>>/* Use ROTATE instead of VEC_SELECT on IEEE 128-bit floating point, and
>>   128-bit integers if they are allowed in VSX registers.  */
>> -  if (FLOAT128_VECTOR_P (mode) || mode == TImode || mode == V1TImode)
>> -return gen_rtx_ROTATE (mode, source, GEN_INT (64));
>> +  if (FLOAT128_VECTOR_P (mode))
>> +{
>> +  dest = gen_lowpart (V1TImode, dest);
>> +  source = gen_lowpart (V1TImode, source);
>> +  mode = V1TImode;
>> +}
>
> Add an empty line here?  And maybe a comment.
>
>> +  if (mode == TImode || mode == V1TImode)
>> +emit_insn (gen_rtx_SET (dest, gen_rtx_ROTATE (mode, source,
>> +  GEN_INT (64;
>>else
>>  {
>>rtx par = gen_rtx_PARALLEL (VOIDmode, rs6000_const_vec (mode));
>> -  return gen_rtx_VEC_SELECT (mode, source, par);
>> +  emit_insn (gen_rtx_SET (dest, gen_rtx_VEC_SELECT (mode, source, 
>> par)));
>>  }
>>  }
>
>> --- gcc/config/rs6000/vsx.md 2017-06-30 12:50:38.889632907 +0100
>> +++ gcc/config/rs6000/vsx.md 2017-07-12 16:30:38.734631598 +0100
>> @@ -37,6 +37,10 @@ (define_mode_iterator VSX_LE_128 [(KF
>>(TI   "TARGET_VSX_TIMODE")
>>V1TI])
>>  
>> +;; Same, but with just the integer modes.
>> +(define_mode_iterator VSX_LE_128I [(TI  "TARGET_VSX_TIMODE")
>> +   V1TI])
>
> I don't like that name much.  The difference between VSX_LE_128 and
> VSX_LE_128I is easy to overlook (and what _is_ the difference?  "I"
> means "integer" I guess?).  The "LE" in the name has no real meaning
> (it is used for LE, sure, but that doesn't matter for the iterator).
> Maybe just VSX_TI?  Or is that too short.
>
> Other than that, looks fine.  Thank you for the patch!

OK, how does this look?  Tested in the same way as before.

> Does this need backports?

Not sure, but probably not.  I don't know of any specific code that
would complain at the moment (but there again I haven't looked that hard).

Thanks,
Richard


2017-07-25  Richard Sandiford  

gcc/
* config/rs6000/rs6000-protos.h (rs6000_emit_le_vsx_permute): Declare.
* config/rs6000/rs6000.c (rs6000_gen_le_vsx_permute): Replace with...
(rs6000_emit_le_vsx_permute): ...this.  Take the destination as input.
Emit instructions rather than returning an expression.  Handle TFmode
and KFmode by casting to TImode.
(rs6000_emit_le_vsx_load): Update to use rs6000_emit_le_vsx_permute.
(rs6000_emit_le_vsx_store): Likewise.
* config/rs6000/vsx.md (VSX_TI): New iterator.
(*vsx_le_permute_): Use it instead of VSX_LE_128.
(*vsx_le_undo_permute_): Likewise.
(*vsx_le_perm_load_): Use rs6000_emit_le_vsx_permute to
emit the split sequence.
(*vsx_le_perm_store_): Likewise.

Index: gcc/config/rs6000/rs6000-protos.h
===
--- gcc/config/rs6000/rs6000-protos.h   2017-07-13 09:25:13.909213921 +0100
+++ gcc/config/rs6000/rs6000-protos.h   2017-07-25 11:04:20.314991769 +0100
@@ -151,6 +151,7 @@ extern rtx rs6000_longcall_ref (rtx);
 extern void rs6000_fatal_bad_address (rtx);
 extern rtx create_TOC_reference (rtx, rtx);
 extern void rs6000_split_multireg_move (rtx, rtx);
+extern void rs6000_emit_le_vsx_permute (rtx, rtx, machine_mode);
 extern void rs6000_emit_le_vsx_move (rtx, rtx, machine_mode);
 extern bool valid_sf_si_move (rtx, rtx, machine_mode);
 extern void rs6000_emit_move (rtx, rtx, machine_mode);
Index: gcc/config/rs6000/rs6000.c
===
--- gcc/config/rs6000/rs6000.c  2017-07-13 09:25:13.909213921 +0100
+++ gcc/config/rs6000/rs6000.c  2017-07-25 11:14:27.692739547 +0100
@@ -10503,17 +10503,28 @@ rs6000_const_vec (machine_mode mode)
 
 /* Generate a permute rtx that represents an lxvd2x, stxvd2x, or xxpermdi
for a VSX load or store operation.  */
-rtx
-rs6000_gen_le_vsx_permute (rtx source, machine_mode mode)
+void
+rs6000_emit_le_vsx_permute (rtx dest, rtx source, machine_mode mode)
 {
-  /* Use ROTATE instead of VEC_SELECT on IEEE 128-bit floating point, and
- 128-bit integers if they are allowed in VSX registers.  */
-  if (FLOAT128_VECTOR_P (mode) || mode == TImode || mode == V1TImode)
-return gen_rtx_ROTATE (mode, source, GEN_INT (

C PATCH to detect clashing attributes (PR c/81544)

2017-07-25 Thread Marek Polacek
PR c/81544 complaints that we aren't detecting clashing noreturn /
warn_unused_result attributes so this patch adds that checking.  Martin
plans to do more systematic checking in this area but meanwhile we
might want to go with this.

Bootstrapped/regtested on x86_64-linux, ok for trunk?

2017-07-25  Marek Polacek  

PR c/81544
* c-common.h (diagnose_mismatched_attributes): Update decl.
* c-warn.c (diagnose_mismatched_attributes): Add OLDTYPE and
NEWTYPE parameters.  Diagnose clashing noreturn / warn_unused_result
attributes.

* c-decl.c (diagnose_mismatched_decls): Pass OLDTYPE and NEWTYPE down
to diagnose_mismatched_attributes.

* decl.c (duplicate_decls): Pass the types of OLDDECL and NEWDECL down
to diagnose_mismatched_attributes.

* c-c++-common/attributes-4.c: New test.

diff --git gcc/c-family/c-common.h gcc/c-family/c-common.h
index a29f1ade25d..7c4b2a0e108 100644
--- gcc/c-family/c-common.h
+++ gcc/c-family/c-common.h
@@ -1537,7 +1537,7 @@ extern void maybe_warn_unused_local_typedefs (void);
 extern void maybe_warn_bool_compare (location_t, enum tree_code, tree, tree);
 extern bool maybe_warn_shift_overflow (location_t, tree, tree);
 extern void warn_duplicated_cond_add_or_warn (location_t, tree, vec **);
-extern bool diagnose_mismatched_attributes (tree, tree);
+extern bool diagnose_mismatched_attributes (tree, tree, tree, tree);
 extern tree do_warn_duplicated_branches_r (tree *, int *, void *);
 extern void warn_for_multistatement_macros (location_t, location_t,
location_t, enum rid);
diff --git gcc/c-family/c-warn.c gcc/c-family/c-warn.c
index a8b38c1b98d..d3c62a5ad9e 100644
--- gcc/c-family/c-warn.c
+++ gcc/c-family/c-warn.c
@@ -2125,7 +2125,8 @@ warn_duplicated_cond_add_or_warn (location_t loc, tree 
cond, vec **chain)
attributes, such as always_inline vs. noinline.  */
 
 bool
-diagnose_mismatched_attributes (tree olddecl, tree newdecl)
+diagnose_mismatched_attributes (tree olddecl, tree newdecl, tree oldtype,
+   tree newtype)
 {
   bool warned = false;
 
@@ -2172,6 +2173,17 @@ diagnose_mismatched_attributes (tree olddecl, tree 
newdecl)
 warned |= warning (OPT_Wattributes, "declaration of %q+D with attribute "
   "%qs follows declaration with attribute %qs",
   newdecl, "hot", "cold");
+  else if (lookup_attribute ("noreturn", DECL_ATTRIBUTES (newdecl))
+  && lookup_attribute ("warn_unused_result",
+   TYPE_ATTRIBUTES (oldtype)))
+warned |= warning (OPT_Wattributes, "declaration of %q+D with attribute "
+  "%qs follows declaration with attribute %qs",
+  newdecl, "noreturn", "warn_unused_result");
+  else if (lookup_attribute ("warn_unused_result", TYPE_ATTRIBUTES (newtype))
+  && lookup_attribute ("noreturn", DECL_ATTRIBUTES (olddecl)))
+warned |= warning (OPT_Wattributes, "declaration of %q+D with attribute "
+  "%qs follows declaration with attribute %qs",
+  newdecl, "warn_unused_result", "noreturn");
   return warned;
 }
 
diff --git gcc/c/c-decl.c gcc/c/c-decl.c
index 12fbc18bb94..e995606499f 100644
--- gcc/c/c-decl.c
+++ gcc/c/c-decl.c
@@ -2234,7 +2234,8 @@ diagnose_mismatched_decls (tree newdecl, tree olddecl,
 }
 
   if (TREE_CODE (newdecl) == FUNCTION_DECL)
-warned |= diagnose_mismatched_attributes (olddecl, newdecl);
+warned |= diagnose_mismatched_attributes (olddecl, newdecl, oldtype,
+ newtype);
   else /* PARM_DECL, VAR_DECL */
 {
   /* Redeclaration of a parameter is a constraint violation (this is
diff --git gcc/cp/decl.c gcc/cp/decl.c
index d98fab370d7..4a67e57c675 100644
--- gcc/cp/decl.c
+++ gcc/cp/decl.c
@@ -1397,7 +1397,9 @@ duplicate_decls (tree newdecl, tree olddecl, bool 
newdecl_is_friend)
   if (DECL_P (olddecl)
   && TREE_CODE (newdecl) == FUNCTION_DECL
   && TREE_CODE (olddecl) == FUNCTION_DECL
-  && diagnose_mismatched_attributes (olddecl, newdecl))
+  && diagnose_mismatched_attributes (olddecl, newdecl,
+TREE_TYPE (olddecl),
+TREE_TYPE (newdecl)))
 {
   if (DECL_INITIAL (olddecl))
inform (DECL_SOURCE_LOCATION (olddecl),
diff --git gcc/testsuite/c-c++-common/attributes-4.c 
gcc/testsuite/c-c++-common/attributes-4.c
index e69de29bb2d..2f8b9676ecd 100644
--- gcc/testsuite/c-c++-common/attributes-4.c
+++ gcc/testsuite/c-c++-common/attributes-4.c
@@ -0,0 +1,8 @@
+/* PR c/81544 */
+/* { dg-do compile } */
+
+int __attribute__ ((noreturn)) foo (void); /* { dg-message "previous 
declaration" } */
+int __attribute__ ((warn_unused_result)) foo (void); /* { dg-warning 
"attribute .warn_unused_result. follows declaration with attribute .noreturn." 
} */
+
+int __attribute_

  1   2   >