[PATCH][PR bootstrap/80565] Fixed bootstrap errors due to uninitialized memory

2017-06-29 Thread Yuri Gribov
Hi,

This patch fixes uninitialized memory errors which caused bootstrap to
fail on some distros (e.g. gcc12 on compile farm). This was reported
in https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80565 (and maybe also
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=79224#c15).

x86_64 bootstrap completed fine. I was unable to run and compare tests
(as I was unable to bootstrap compiler before the change) but
hopefully the fix is obvious.

-Yury


pr80565-1.patch
Description: Binary data


Re: [PATCH][PR bootstrap/80565] Fixed bootstrap errors due to uninitialized memory

2017-06-29 Thread Jan Hubicka
> Hi,
> 
> This patch fixes uninitialized memory errors which caused bootstrap to
> fail on some distros (e.g. gcc12 on compile farm). This was reported
> in https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80565 (and maybe also
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=79224#c15).
> 
> x86_64 bootstrap completed fine. I was unable to run and compare tests
> (as I was unable to bootstrap compiler before the change) but
> hopefully the fix is obvious.

OK, thanks!
Honza
> 
> -Yury




Re: [PATCH] Fix PR middle-end/81194, ICE during RTL pass: expand

2017-06-29 Thread Richard Biener
On Thu, Jun 29, 2017 at 5:01 AM, Peter Bergner wrote:
> With the fix to PR51513 and follow on fixes for PR80707, PR80775 and PR80823,
> we can now end up with switch statements that contain nothing but a default
> case statement.  The expand_case() function contains code that assumes we
> have at least one non-default case statement, leading to the ICE reported
> in the PR81194.
>
> This patch fixes the bug by expanding switch statements that contain only
> a default case statement, as a GOTO to the default case's label.
>
> This passed bootstrap and regtesting on x86_64-linux with no regressions.
> Ok for trunk?

Ok.

Btw, I'm curious -- which pass ends up with just the default case?  IMHO
it would be a cfgcleanup task but eventually we fail to run it?  That is
cleanup_control_expr_graph calls find_taken_edge () and maybe that
doesn't handle the case where there's just the default label and thus
the value of the switch var doesn't matter ...

Thanks,
Richard.

> Peter
>
> gcc/
> PR middle-end/81194
> * cfgexpand.c (expand_gimple_stmt_1): Handle switch statements
> with only one label.
> * stmt.c (expand_case): Assert NCASES is greater than one.
>
> gcc/testsuite/
> PR middle-end/81194
> * g++.dg/pr81194.C: New test.
>
> Index: gcc/cfgexpand.c
> ===
> --- gcc/cfgexpand.c (revision 249747)
> +++ gcc/cfgexpand.c (working copy)
> @@ -3566,7 +3566,13 @@
>  case GIMPLE_PREDICT:
>break;
>  case GIMPLE_SWITCH:
> -  expand_case (as_a <gswitch *> (stmt));
> +  {
> +   gswitch *swtch = as_a <gswitch *> (stmt);
> +   if (gimple_switch_num_labels (swtch) == 1)
> + expand_goto (CASE_LABEL (gimple_switch_default_label (swtch)));
> +   else
> + expand_case (swtch);
> +  }
>break;
>  case GIMPLE_ASM:
>expand_asm_stmt (as_a <gasm *> (stmt));
> Index: gcc/stmt.c
> ===
> --- gcc/stmt.c  (revision 249747)
> +++ gcc/stmt.c  (working copy)
> @@ -1142,8 +1142,11 @@
>/* cleanup_tree_cfg removes all SWITCH_EXPR with their index
>   expressions being INTEGER_CST.  */
>gcc_assert (TREE_CODE (index_expr) != INTEGER_CST);
> -
>
> +  /* Optimization of switch statements with only one label has already
> + occurred, so we should never see them at this point.  */
> +  gcc_assert (ncases > 1);
> +
>do_pending_stack_adjust ();
>
>/* Find the default case target label.  */
> Index: gcc/testsuite/g++.dg/pr81194.C
> ===
> --- gcc/testsuite/g++.dg/pr81194.C  (nonexistent)
> +++ gcc/testsuite/g++.dg/pr81194.C  (working copy)
> @@ -0,0 +1,60 @@
> +// { dg-do compile }
> +// { dg-options "-O2 -std=c++17 -fno-exceptions" }
> +
> +template  struct b { typedef a *c; };
> +class e {};
> +template  class d {
> +public:
> +  typedef typename b::c c;
> +  c begin();
> +  c end();
> +};
> +struct f {
> +  enum { g } h;
> +};
> +struct i {
> +  d j();
> +};
> +struct l {
> +  d k();
> +};
> +class ac;
> +class o {
> +public:
> +  o(int *, int *, int *, ac *);
> +};
> +class ac {
> +public:
> +  ac(e);
> +  virtual o *ae(int *, int *, int *, int *);
> +};
> +class p {
> +  void af(f *m) {
> +switch (m->h)
> +case f::g:
> +  ag();
> +  }
> +
> +public:
> +  void n() {
> +l ah;
> +for (i *ai : ah.k())
> +  for (f *m : ai->j())
> +af(m);
> +  }
> +  virtual void ag() { __builtin_unreachable(); }
> +};
> +template  class an : o {
> +public:
> +  an(int *, int *, int *, int *, ac *);
> +};
> +class q : ac {
> +public:
> +  q() : ac([]() -> e {}()) {}
> +  o *ae(int *ap, int *aq, int *ar, int *as) { an(ap, aq, ar, as, this); }
> +};
> +template 
> +an::an(int *, int *aq, int *ar, int *as, ac *au) : o(aq, ar, as, au) {
> +  p().n();
> +}
> +void av() { new q; }
>
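
To illustrate the equivalence the patch relies on: a switch with nothing but a default label never looks at its index, so it behaves like an unconditional jump to that label. The sketch below is plain C with hypothetical function names (only_default, as_goto); it models the idea at source level and is not GCC's expander code.

```c
/* A switch whose only label is `default` never inspects its index.  */
int only_default(int x)
{
    int r = 0;
    switch (x) {
    default:
        r = 42;
        break;
    }
    return r;
}

/* The same control flow written as the unconditional jump that the
   patch chooses when gimple_switch_num_labels () == 1.  */
int as_goto(int x)
{
    int r = 0;
    (void) x;           /* the index is irrelevant */
    goto dflt;
dflt:
    r = 42;
    return r;
}
```

Both functions return the same value for every argument, which is why expanding the degenerate switch as a GOTO loses nothing.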


Re: [PATCH, GCC/ARM] Remove ARMv8-M code for D17-D31

2017-06-29 Thread Thomas Preudhomme

Hi Richard,

On 28/06/17 16:56, Richard Earnshaw (lists) wrote:
> On 20/06/17 16:01, Thomas Preudhomme wrote:
>> Hi,
>>
>> Function cmse_nonsecure_entry_clear_before_return has code to deal with
>> high VFP registers (D16-D31) while ARMv8-M Baseline and Mainline both do
>> not support more than 16 double VFP registers (D0-D15). This makes this
>> security-sensitive code harder to read for not much benefit since
>> libcalls for cmse_nonsecure_call functions do not deal with those high
>> VFP registers anyway.
>>
>> This commit gets rid of this code for simplicity and fixes 2 issues in
>> the same function:
>>
>> - stop the first loop when reaching maxregno to avoid dealing with VFP
>>   registers if targeting Thumb-1 or using -mfloat-abi=soft
>> - include maxregno in that loop
>
> This is silently baking in dangerous assumptions about GCC's internal
> numbering of the registers.  That's not a good idea from a long-term
> portability perspective.
>
> At the very least you need to assert that all the interesting registers
> are numbered in the range 0..63; but ideally the code should just handle
> pretty much any assignment of internal register numbers.

Well there is already this:

gcc_assert ((unsigned) maxregno <= sizeof (to_clear_mask) * __CHAR_BIT__);

> Did you consider using sbitmaps rather than doing all the multi-word
> stuff by steam?

No but am happy to. I'll respin the patch.

Best regards,

Thomas


[gomp5] Support OpenMP loops with != condition

2017-06-29 Thread Jakub Jelinek
Hi!

OpenMP 5.0 is going to support loops where the condition is not just a
</<=/>/>= comparison, but also !=, with the requirement that the
increment has to be a constant expression of 1 or -1 in that case
(and no overflow even for unsigned iterators).

The following patch implements it, tested on x86_64-linux, committed to
gomp-5_0-branch.
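
A minimal sketch of the normalization described above, assuming a toy condition-code enum rather than GCC trees: a != loop is rewritten as < when the step is +1 and as > when it is -1. For pointer iterators the step is measured in bytes, so it is compared against the element size instead. The names cond_code and normalize_ne are hypothetical.

```c
#include <stdbool.h>

enum cond_code { COND_LT, COND_GT, COND_NE };

/* Rewrite a != condition as < or > based on the sign of the step.
   UNIT is 1 for integer iterators and the pointed-to element size in
   bytes for pointer iterators.  Returns false if the step is not
   exactly +UNIT or -UNIT, which a != loop does not allow.  */
bool normalize_ne(enum cond_code *code, long step, long unit)
{
    if (*code != COND_NE)
        return true;            /* nothing to do */
    if (step == unit)
        *code = COND_LT;        /* counting up: i != n becomes i < n */
    else if (step == -unit)
        *code = COND_GT;        /* counting down: i != n becomes i > n */
    else
        return false;           /* left untouched; caller must diagnose */
    return true;
}
```

For example, an int iterator stepping by 1 ends up as COND_LT, and an int * iterator stepping by -4 bytes with a 4-byte element ends up as COND_GT.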

2017-06-29  Jakub Jelinek  

gcc/
* omp-general.c (omp_extract_for_data): Allow NE_EXPR
even in OpenMP loops, transform them into LT_EXPR or
GT_EXPR loops depending on incr sign.  Formatting fixes.
gcc/c-family/
* c-common.h (c_finish_omp_for): Add FINAL_P argument.
* c-omp.c (check_omp_for_incr_expr): Formatting fixes.
(c_finish_omp_for): Add FINAL_P argument.  Allow NE_EXPR
even in OpenMP loops, diagnose if NE_EXPR and incr expression
is not constant expression 1 or -1.  Transform NE_EXPR loops
with iterators pointers to VLA into LT_EXPR or GT_EXPR loops.
gcc/c/
* c-parser.c (c_parser_omp_for_loop): Allow NE_EXPR even in
OpenMP loops, adjust c_finish_omp_for caller.
gcc/cp/
* parser.c (cp_parser_omp_for_cond): Allow NE_EXPR even in OpenMP
loops.
* pt.c (dependent_omp_for_p): Return true if class type iterator
does not have INTEGER_CST increment.
* semantics.c (handle_omp_for_class_iterator): Call cp_fully_fold
on incr.
(finish_omp_for): Adjust c_finish_omp_for caller.
gcc/testsuite/
* c-c++-common/gomp/for-1.c: New test.
* c-c++-common/gomp/for-2.c: New test.
* c-c++-common/gomp/for-3.c: New test.
* c-c++-common/gomp/for-4.c: New test.
* c-c++-common/gomp/for-5.c: New test.
* gcc.dg/gomp/pr39495-2.c (foo): Don't expect errors on !=.
* g++.dg/gomp/pr39495-2.C (foo): Likewise.
* g++.dg/gomp/loop-4.C: New test.
libgomp/
* testsuite/libgomp.c/for-2.h: If CONDNE macro is defined, define
a different N(test), don't define N(f0) to N(f14), but instead define
N(f20) to N(f34) using != comparisons.
* testsuite/libgomp.c/for-4.c: Use dg-additional-options.
* testsuite/libgomp.c/for-7.c: New test.
* testsuite/libgomp.c/for-8.c: New test.
* testsuite/libgomp.c/for-9.c: New test.
* testsuite/libgomp.c/for-10.c: New test.
* testsuite/libgomp.c/for-11.c: New test.
* testsuite/libgomp.c/for-12.c: New test.
* testsuite/libgomp.c/for-13.c: New test.
* testsuite/libgomp.c++/for-12.C: Remove dg-options.
* testsuite/libgomp.c++/for-15.C: New test.
* testsuite/libgomp.c++/for-16.C: New test.
* testsuite/libgomp.c++/for-17.C: New test.
* testsuite/libgomp.c++/for-18.C: New test.
* testsuite/libgomp.c++/for-19.C: New test.
* testsuite/libgomp.c++/for-20.C: New test.
* testsuite/libgomp.c++/for-21.C: New test.
* testsuite/libgomp.c++/for-22.C: New test.
* testsuite/libgomp.c++/for-23.C: New test.

--- gcc/omp-general.c.jj   2017-05-24 11:48:18.881013651 +0200
+++ gcc/omp-general.c   2017-06-23 15:05:12.655397129 +0200
@@ -252,14 +252,45 @@ omp_extract_for_data (gomp_for *for_stmt
   loop->cond_code = gimple_omp_for_cond (for_stmt, i);
   loop->n2 = gimple_omp_for_final (for_stmt, i);
   gcc_assert (loop->cond_code != NE_EXPR
- || gimple_omp_for_kind (for_stmt) == GF_OMP_FOR_KIND_CILKSIMD
- || gimple_omp_for_kind (for_stmt) == GF_OMP_FOR_KIND_CILKFOR);
-  omp_adjust_for_condition (loc, &loop->cond_code, &loop->n2);
-
+ || (gimple_omp_for_kind (for_stmt)
+ != GF_OMP_FOR_KIND_OACC_LOOP));
   t = gimple_omp_for_incr (for_stmt, i);
   gcc_assert (TREE_OPERAND (t, 0) == var);
   loop->step = omp_get_for_step_from_incr (loc, t);
 
+  if (loop->cond_code == NE_EXPR
+  && fd->sched_kind != OMP_CLAUSE_SCHEDULE_CILKFOR
+  && (!simd || (gimple_omp_for_kind (for_stmt)
+   != GF_OMP_FOR_KIND_CILKSIMD)))
+   {
+ gcc_assert (TREE_CODE (loop->step) == INTEGER_CST);
+ if (TREE_CODE (TREE_TYPE (loop->v)) == INTEGER_TYPE)
+   {
+ if (integer_onep (loop->step))
+   loop->cond_code = LT_EXPR;
+ else
+   {
+ gcc_assert (integer_minus_onep (loop->step));
+ loop->cond_code = GT_EXPR;
+   }
+   }
+ else
+   {
+ tree unit = TYPE_SIZE_UNIT (TREE_TYPE (TREE_TYPE (loop->v)));
+ gcc_assert (TREE_CODE (unit) == INTEGER_CST);
+ if (tree_int_cst_equal (unit, loop->step))
+   loop->cond_code = LT_EXPR;
+ else
+   {
+ gcc_assert (wi::neg (wi::to_widest (unit))
+ == wi::to_widest (loop->step));
+ loop->cond_code = GT_EXPR;
+   }
+   }
+   }

Re: [patch][Ping #3] PR80929: Realistic PARALLEL cost in seq_cost.

2017-06-29 Thread Georg-Johann Lay

On 28.06.2017 22:18, Wilco Dijkstra wrote:
> Georg-Johann Lay wrote:
>
>> @@ -5300,6 +5300,9 @@ seq_cost (const rtx_insn *seq, bool spee
>> set = single_set (seq);
>> if (set)
>>   cost += set_rtx_cost (set, speed);
>> +  else if (INSN_P (seq)
>> +  && PARALLEL == GET_CODE (PATTERN (seq)))
>> +   cost += insn_rtx_cost (PATTERN (seq), speed);
>> else
>>   cost++;
>
> insn_rtx_cost may return zero if it can't find something useful in the parallel,
> which means it may return a lower cost and even zero. Not sure whether this
> is important, but in eg. combine a cost of zero means infinite and so could have
> unintended consequences. So incrementing cost with a non-zero value
> if insn_rtx_cost == 0 would seem safer.

Updated patch below, it just adds 1 (which is 1/4 of COSTS_N_INSNS (1)) to
avoid zero.

> Also why does the else do cost++ and not cost += COSTS_N_INSNS (1)?
>
> Wilco

Dunno, I didn't change this.  Maybe it's also just to escape 0.

Johann
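
The accumulation discussed above can be sketched with a toy instruction record in place of real RTL. Everything here (toy_insn, insn_kind, toy_seq_cost) is hypothetical; the only detail taken from the thread is that 1 is a quarter of COSTS_N_INSNS (1), i.e. the macro scales by 4.

```c
#define COSTS_N_INSNS(n) ((n) * 4)   /* scaling implied by the thread */

enum insn_kind { KIND_SINGLE_SET, KIND_PARALLEL, KIND_OTHER };

struct toy_insn {
    enum insn_kind kind;
    int cost;   /* stand-in for set_rtx_cost / insn_rtx_cost results */
};

/* Sum the cost of a sequence the way the updated patch does.  */
int toy_seq_cost(const struct toy_insn *seq, int n)
{
    int cost = 0;
    for (int i = 0; i < n; i++) {
        switch (seq[i].kind) {
        case KIND_SINGLE_SET:
            cost += seq[i].cost;
            break;
        case KIND_PARALLEL:
            cost += 1 + seq[i].cost;   /* +1 keeps the term non-zero */
            break;
        default:
            cost += 1;                 /* unchanged fallback */
            break;
        }
    }
    return cost;
}
```

With this shape a PARALLEL whose cost query returns 0 still contributes 1, the same as the fallback branch, so the sum never treats it as free.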

gcc/
PR middle-end/80929
* rtlanal.c (seq_cost) [PARALLEL]: Get cost from insn_rtx_cost
instead of assuming cost of 1.


Index: rtlanal.c
===
--- rtlanal.c   (revision 248745)
+++ rtlanal.c   (working copy)
@@ -5300,6 +5300,9 @@ seq_cost (const rtx_insn *seq, bool spee
   set = single_set (seq);
   if (set)
 cost += set_rtx_cost (set, speed);
+  else if (INSN_P (seq)
+  && PARALLEL == GET_CODE (PATTERN (seq)))
+   cost += 1 + insn_rtx_cost (PATTERN (seq), speed);
   else
 cost++;
 }


Re: [PATCH] Fix PR middle-end/81194, ICE during RTL pass: expand

2017-06-29 Thread Richard Biener
On Thu, Jun 29, 2017 at 10:34 AM, Richard Biener wrote:
> On Thu, Jun 29, 2017 at 5:01 AM, Peter Bergner wrote:
>> With the fix to PR51513 and follow on fixes for PR80707, PR80775 and PR80823,
>> we can now end up with switch statements that contain nothing but a default
>> case statement.  The expand_case() function contains code that assumes we
>> have at least one non-default case statement, leading to the ICE reported
>> in the PR81194.
>>
>> This patch fixes the bug by expanding switch statements that contain only
>> a default case statement, as a GOTO to the default case's label.
>>
>> This passed bootstrap and regtesting on x86_64-linux with no regressions.
>> Ok for trunk?
>
> Ok.
>
> Btw, I'm curious -- which pass ends up with just the default case?  IMHO
> it would be a cfgcleanup task but eventually we fail to run it?  That is
> cleanup_control_expr_graph calls find_taken_edge () and maybe that
> doesn't handle the case where there's just the default label and thus
> the value of the switch var doesn't matter ...

To answer myself: the unreachable case vanishes at
execute_cleanup_cfg_post_optimizing via group_case_labels.
find_taken_edge wouldn't handle this case either.

I am testing a patch fixing both - your patch should still go in.

Thanks,
Richard.

> Thanks,
> Richard.
>
>> Peter


Re: [C++ Patch] Replace a few more error + error with error + inform

2017-06-29 Thread Paolo Carlini

Hi,

gently pinging this, still in my tree:

On 19/05/2017 18:13, Paolo Carlini wrote:

Hi,

while looking into some bugs (eg, c++/71464) I noticed a few more of 
those consecutive errors, which I propose to adjust per the below 
patchlet. The second case in add_method is a bit tricky because in 
principle we'd really like to be more specific (eg, clang talks about 
constructors which cannot be redeclared, member functions which cannot 
be redeclared and many more) and avoid verbose diagnostic, but in the 
below I only try to avoid emitting error + error... Tested x86_64-linux.


https://gcc.gnu.org/ml/gcc-patches/2017-05/msg01593.html

Thanks,
Paolo.


[PATCH][OBVIOUS] Fix -mbranch-cost range.

2017-06-29 Thread Martin Liška
On 06/28/2017 10:14 PM, Rainer Orth wrote:
> Hi Martin,
> 
>> On 06/28/2017 06:52 AM, Jeff Law wrote:
>>> On 03/15/2017 03:58 AM, Martin Liška wrote:
>>>> Huh, I forgot to attach the patch.
>>>>
>>>> Martin
>>>>
>>>> 0001-Introduce-IntegerRange-for-options-PR-driver-79659.patch
>>>>
>>>>
>>>> From bb89456e6cecfa9497cf8e265d2083e762d5bc3e Mon Sep 17 00:00:00 2001
>>>> From: marxin
>>>> Date: Mon, 27 Feb 2017 14:07:03 +0100
>>>> Subject: [PATCH] Introduce IntegerRange for options (PR driver/79659).
>>>>
>>>> gcc/ChangeLog:
>>>>
>>>> 2017-02-28  Martin Liska
>>>>
>>>>     PR driver/79659
>>>>     * common.opt: Add IntegerRange to various options.
>>>>     * opt-functions.awk (integer_range_info): New function.
>>>>     * optc-gen.awk: Add integer_range_info to cl_options struct.
>>>>     * opts-common.c (decode_cmdline_option): Handle
>>>>     CL_ERR_INT_RANGE_ARG.
>>>>     (cmdline_handle_error): Likewise.
>>>>     * opts.c (print_filtered_help): Show valid interval in
>>>>     when --help is provided.
>>>>     * opts.h (struct cl_option): Add range_min and range_max fields.
>>>>     * config/i386/i386.opt: Add IntegerRange for -mbranch-cost.
>>>>
>>>> gcc/c-family/ChangeLog:
>>>>
>>>> 2017-02-28  Martin Liska
>>>>
>>>>     PR driver/79659
>>>>     * c.opt: Add IntegerRange to various options.
>>>>
>>>> gcc/testsuite/ChangeLog:
>>>>
>>>> 2017-02-28  Martin Liska
>>>>
>>>>     PR driver/79659
>>>>     * g++.dg/opt/pr79659.C: New test.
>>> Presumably this never fully moved forward because it wasn't a regression?
>>>
>>> This looks quite reasonable to me.  I'm not sure of the state of the
>>> prereqs and you may want/need to add IntegerRange checks on newly added
>>> options since this was first submitted.
>>>
>>> If the prereqs are ack'd, then as far as I'm concerned this is good to
>>> go and you're free to add any new IntegerRange checks you deem
>>> necessary/desirable.
>>>
>>> jeff
>>>
>>
>> Thank you Jeff for looking at the patch. I've just re-tested the patch and
>> I'm going to install it.
> 
> seems you didn't test thoroughly enough: your patch introduced a couple
> of testsuite regressions on i386-pc-solaris2.12 and x86_64-pc-linux-gnu
> (any x86 target, in fact):
> 
> +FAIL: gcc.dg/uninit-pred-7_d.c (test for excess errors)
> +FAIL: gcc.dg/uninit-pred-7_d.c warning (test for warnings, line 48)
> +FAIL: gcc.dg/uninit-pred-8_d.c (test for excess errors)
> +FAIL: gcc.dg/uninit-pred-8_d.c warning (test for warnings, line 42)
> 
> +FAIL: gcc.target/i386/branch-cost1.c (test for excess errors)
> +UNRESOLVED: gcc.target/i386/branch-cost1.c scan-tree-dump-not gimple " & "
> +UNRESOLVED: gcc.target/i386/branch-cost1.c scan-tree-dump-times gimple "if " 2
> +FAIL: gcc.target/i386/branch-cost4.c (test for excess errors)
> +UNRESOLVED: gcc.target/i386/branch-cost4.c scan-tree-dump-not gimple " & "
> +UNRESOLVED: gcc.target/i386/branch-cost4.c scan-tree-dump-times gimple "if " 2
> 
> In all cases, you get
> 
> Excess errors:
> xgcc: error: argument to '-mbranch-cost=' is not between 1 and 5
> 
> since the tests are compiled with -mbranch-cost=0.
> 
> Please fix.
> 
>   Rainer
> 

Thanks for the heads-up, I didn't catch it because I did the testing on a
ppc64le machine.
Fixed as obvious.

Martin
From d0003f3602f099dac9be1266c974eb24de4265f9 Mon Sep 17 00:00:00 2001
From: marxin 
Date: Thu, 29 Jun 2017 10:42:04 +0200
Subject: [PATCH] Fix -mbranch-cost range.

gcc/ChangeLog:

2017-06-29  Martin Liska  

	* config/i386/i386.opt: Change range from [1,5] to [0,5].
---
 gcc/config/i386/i386.opt | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/config/i386/i386.opt b/gcc/config/i386/i386.opt
index 90eadbc4e18..adc75f36602 100644
--- a/gcc/config/i386/i386.opt
+++ b/gcc/config/i386/i386.opt
@@ -267,7 +267,7 @@ EnumValue
 Enum(asm_dialect) String(att) Value(ASM_ATT)
 
 mbranch-cost=
-Target RejectNegative Joined UInteger Var(ix86_branch_cost) IntegerRange(1, 5)
+Target RejectNegative Joined UInteger Var(ix86_branch_cost) IntegerRange(0, 5)
 Branches are this expensive (arbitrary units).
 
 mlarge-data-threshold=
-- 
2.13.1
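
Why the tests broke can be shown with a small range check. This is a sketch under assumptions: int_range and check_int_range are hypothetical stand-ins for the option machinery, and only the wording of the diagnostic is taken from the error quoted in the thread.

```c
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical stand-in for an option's IntegerRange(min, max).  */
struct int_range { int min, max; };

/* Validate VALUE against R.  On failure, format a diagnostic into MSG
   in the style of the error quoted above and return false.  */
bool check_int_range(const char *opt, int value, struct int_range r,
                     char *msg, size_t msglen)
{
    if (value >= r.min && value <= r.max)
        return true;
    snprintf(msg, msglen, "argument to '%s' is not between %d and %d",
             opt, r.min, r.max);
    return false;
}
```

With the range {1, 5}, a value of 0 is rejected, which is what the tests compiled with -mbranch-cost=0 ran into; with {0, 5} the same value is accepted.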



[PATCH] make find_taken_edge handle case with just default

2017-06-29 Thread Richard Biener

This refactors things a bit to make CFG cleanup handle switches with
just a default label.  If we make sure to clean up the CFG after
group_case_labels removes cases with just __builtin_unreachable ()
inside then this fixes the ICE seen in PR81194 as well.

I wonder if find_taken_edge should generally handle successors
with __builtin_unreachable () -- OTOH that would get rid of those
too early I guess.
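
A minimal sketch of the lookup order this patch introduces, assuming a toy switch representation rather than GIMPLE. The names toy_case, toy_switch and toy_find_taken_target are hypothetical. The key point is that the default-only check comes before the check for a constant index, because with a single label the index value does not matter.

```c
#include <stddef.h>

struct toy_case { long low, high; int target; };

struct toy_switch {
    int default_target;
    size_t num_cases;               /* non-default cases only */
    const struct toy_case *cases;
};

/* Return the block the switch jumps to, or -1 if that cannot be decided.
   VAL is NULL when the index is not a known constant.  */
int toy_find_taken_target(const struct toy_switch *sw, const long *val)
{
    if (sw->num_cases == 0)
        return sw->default_target;  /* only a default label: always taken */
    if (val == NULL)
        return -1;                  /* unknown index, several candidates */
    for (size_t i = 0; i < sw->num_cases; i++)
        if (*val >= sw->cases[i].low && *val <= sw->cases[i].high)
            return sw->cases[i].target;
    return sw->default_target;
}
```

A caller that gets a non-negative result can remove every other outgoing edge, which is the cleanup that lets the degenerate switch disappear before expansion.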

Bootstrap / regtest running on x86_64-unknown-linux-gnu.

Richard.

2017-06-29  Richard Biener  

* tree-cfg.c (group_case_labels_stmt): Return whether we changed
anything.
(group_case_labels): Likewise.
(find_taken_edge): Push sanity checking on val to workers...
(find_taken_edge_cond_expr): ... here
(find_taken_edge_switch_expr): ... and here, handle cases
with just a default label.
* tree-cfg.h (group_case_labels_stmt): Adjust prototype.
(group_case_labels): Likewise.
* tree-cfgcleanup.c (execute_cleanup_cfg_post_optimizing): When
group_case_labels does anything cleanup the CFG again.

Index: gcc/tree-cfg.c
===
--- gcc/tree-cfg.c  (revision 249769)
+++ gcc/tree-cfg.c  (working copy)
@@ -1675,7 +1675,7 @@ cleanup_dead_labels (void)
the ones jumping to the same label.
Eg. three separate entries 1: 2: 3: become one entry 1..3:  */
 
-void
+bool
 group_case_labels_stmt (gswitch *stmt)
 {
   int old_size = gimple_switch_num_labels (stmt);
@@ -1759,23 +1759,27 @@ group_case_labels_stmt (gswitch *stmt)
 
   gcc_assert (new_size <= old_size);
   gimple_switch_set_num_labels (stmt, new_size);
+  return new_size < old_size;
 }
 
 /* Look for blocks ending in a multiway branch (a GIMPLE_SWITCH),
and scan the sorted vector of cases.  Combine the ones jumping to the
same label.  */
 
-void
+bool
 group_case_labels (void)
 {
   basic_block bb;
+  bool changed = false;
 
   FOR_EACH_BB_FN (bb, cfun)
 {
   gimple *stmt = last_stmt (bb);
   if (stmt && gimple_code (stmt) == GIMPLE_SWITCH)
-   group_case_labels_stmt (as_a <gswitch *> (stmt));
+   changed |= group_case_labels_stmt (as_a <gswitch *> (stmt));
 }
+
+  return changed;
 }
 
 /* Checks whether we can merge block B into block A.  */
@@ -2243,15 +2247,8 @@ find_taken_edge (basic_block bb, tree va
 
   stmt = last_stmt (bb);
 
-  gcc_assert (stmt);
   gcc_assert (is_ctrl_stmt (stmt));
 
-  if (val == NULL)
-return NULL;
-
-  if (!is_gimple_min_invariant (val))
-return NULL;
-
   if (gimple_code (stmt) == GIMPLE_COND)
 return find_taken_edge_cond_expr (bb, val);
 
@@ -2266,7 +2263,8 @@ find_taken_edge (basic_block bb, tree va
  It may be the case that we only need to allow the LABEL_REF to
  appear inside an ADDR_EXPR, but we also allow the LABEL_REF to
  appear inside a LABEL_EXPR just to be safe.  */
-  if ((TREE_CODE (val) == ADDR_EXPR || TREE_CODE (val) == LABEL_EXPR)
+  if (val
+ && (TREE_CODE (val) == ADDR_EXPR || TREE_CODE (val) == LABEL_EXPR)
  && TREE_CODE (TREE_OPERAND (val, 0)) == LABEL_DECL)
return find_taken_edge_computed_goto (bb, TREE_OPERAND (val, 0));
   return NULL;
@@ -2304,9 +2302,12 @@ find_taken_edge_cond_expr (basic_block b
 {
   edge true_edge, false_edge;
 
+  if (val == NULL
+  || TREE_CODE (val) != INTEGER_CST)
+return NULL;
+
   extract_true_false_edges_from_block (bb, &true_edge, &false_edge);
 
-  gcc_assert (TREE_CODE (val) == INTEGER_CST);
   return (integer_zerop (val) ? false_edge : true_edge);
 }
 
@@ -2322,7 +2323,12 @@ find_taken_edge_switch_expr (gswitch *sw
   edge e;
   tree taken_case;
 
-  taken_case = find_case_label_for_value (switch_stmt, val);
+  if (gimple_switch_num_labels (switch_stmt) == 1)
+taken_case = gimple_switch_default_label (switch_stmt);
+  else if (! val || TREE_CODE (val) != INTEGER_CST)
+return NULL;
+  else
+taken_case = find_case_label_for_value (switch_stmt, val);
   dest_bb = label_to_block (CASE_LABEL (taken_case));
 
   e = find_edge (bb, dest_bb);
Index: gcc/tree-cfg.h
===
--- gcc/tree-cfg.h  (revision 249769)
+++ gcc/tree-cfg.h  (working copy)
@@ -36,8 +36,8 @@ extern void end_recording_case_labels (v
 extern basic_block label_to_block_fn (struct function *, tree);
 #define label_to_block(t) (label_to_block_fn (cfun, t))
 extern void cleanup_dead_labels (void);
-extern void group_case_labels_stmt (gswitch *);
-extern void group_case_labels (void);
+extern bool group_case_labels_stmt (gswitch *);
+extern bool group_case_labels (void);
 extern void replace_uses_by (tree, tree);
 extern basic_block single_noncomplex_succ (basic_block bb);
 extern void notice_special_calls (gcall *);
Index: gcc/tree-cfgcleanup.c
===
--- gcc/tree-cfgcleanup.c   (revision 249769)
+++ gcc/tree-cfgcleanup.c   (worki

Re: [PATCH] Transform (m1 > m2) * d into m1> m2 ? d : 0

2017-06-29 Thread Richard Biener
On Thu, Jun 29, 2017 at 7:06 AM, Hurugalawadi, Naveen wrote:
> Hi,
>
> The code (m1 > m2) * d should be optimized as m1 > m2 ? d : 0.
>
> The patch optimizes it inside tree-vrp.c when simplifying with the range
> inside simplify_stmt_using_ranges. If a multiply is found and either side
> has a range [0,1], then transform it.
>
> Ex:- d * c where d has a range of [0,1] transform it to be
> COND_EXPR(d != 0, c, 0).
> The other optimization passes should prop m1 > m2.
>
> Bootstrapped and Regression tested on AArch64 and X86_64.
> Please review the patch and let us know if it's okay?

What's the reason of this transform?  I expect that the HW multiplier
is quite fast given one operand is either zero or one and a multiplication
is a gimple operation that's better handled in optimizations than
COND_EXPRs which eventually expand to conditional code which
would be much slower.
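
For reference, the two forms under discussion written as plain C. The function names are hypothetical. The sketch only shows that both compute the same value; which one is faster depends on the target, which is exactly the question raised above.

```c
/* Multiplication by a comparison result, which is always 0 or 1.  */
int mul_form(int m1, int m2, int d)
{
    return (m1 > m2) * d;
}

/* The proposed replacement: select d or 0 based on the comparison.  */
int select_form(int m1, int m2, int d)
{
    return m1 > m2 ? d : 0;
}
```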

Thanks,
Richard.

> Thanks,
> Naveen
>
> 2017-06-28  Naveen H.S  
>
> gcc
> * tree-vrp.c (simplify_stmt_using_ranges): Add case for
> optimizing a case of multiplication.
> (simplify_mult_ops_using_ranges): New.
>
> gcc/testsuite
> * gcc.dg/tree-ssa/vrp116.c: New Test.
>
>


[PING^2][PATCH][Aarch64] Relational compare zero not merged into subtract

2017-06-29 Thread Michael Collison
Ping^2. Original patch posted here:

https://gcc.gnu.org/ml/gcc-patches/2017-06/msg00091.html




Re: Tweak BB analysis for dr_analyze_innermost

2017-06-29 Thread Richard Biener
On Wed, Jun 28, 2017 at 3:36 PM, Richard Sandiford wrote:
> dr_analyze_innermost had a "struct loop *nest" parameter that acted
> like a boolean.  This was added in r179161, with the idea that a
> null nest selected BB-level analysis rather than loop analysis.
>
> The handling seemed strange though.  If the DR was part of a loop,
> we still tried to express the base and offset values as IVs, potentially
> giving a nonzero step.  If that failed for any reason, we'd revert to
> using the original base and offset, just as we would if we hadn't asked
> for an IV in the first place.
>
> It seems more natural to use the !in_loop handling whenever nest is null
> and always set the step to zero.  This actually enables one more SLP
> opportunity in bb-slp-pr65935.c.
>
> I checked out r179161 and tried the patch there.  The test case added
> in that revision still passes, so I don't think there was any particular
> need to check simple_iv.
>
> Tested on aarch64-linux-gnu and x86_64-linux-gnu.  OK to install?

I have a few additional comments for consideration.  I remembered code
in vect_compute_data_ref_alignment explicitly looking at DR_STEP in
BB mode:

  /* Similarly we can only use base and misalignment information relative to
 an innermost loop if the misalignment stays the same throughout the
 execution of the loop.  As above, this is the case if the stride of
 the dataref evenly divides by the vector size.  */
  else
{
  tree step = DR_STEP (dr);
  unsigned vf = loop ? LOOP_VINFO_VECT_FACTOR (loop_vinfo) : 1;

  if (tree_fits_shwi_p (step)
  && ((tree_to_shwi (step) * vf)
  % GET_MODE_SIZE (TYPE_MODE (vectype)) != 0))
{
  if (dump_enabled_p ())
dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
 "step doesn't divide the vector-size.\n");
  misalign = NULL_TREE;
}
}

so I guess with the change we may end up with worse (misalign is
always NULL?) or bogus (with DR_STEP == 0 the test is always false)
alignment analysis results for BB vectorization?  I guess worse
or we had the issue before if DR_STEP was just not analyzable.
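
The "always false" observation can be checked with a reduced form of the quoted condition. This sketch assumes plain integers instead of trees and drops the tree_fits_shwi_p guard; step_breaks_alignment is a hypothetical name.

```c
#include <stdbool.h>

/* Return true when the step (scaled by the vectorization factor) is not
   a multiple of the vector size, i.e. when the quoted code would reset
   misalign to NULL_TREE.  VECTOR_SIZE must be non-zero.  */
bool step_breaks_alignment(long step, unsigned vf, unsigned vector_size)
{
    return (step * (long) vf) % (long) vector_size != 0;
}
```

With step == 0 the product is 0 and the remainder is 0 for any vector size, so the function never reports a problem; a step of 4 bytes with vf 1 and 16-byte vectors does.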

The testcase that now gets vectorized tried to use the asm to prevent
vectorization but that of course doesn't work for BB vectorization.
Using volatile stores might.  I guess it is unlikely that we miscompile
both loops in a way that still passes the testcase (even the compare
loop could be BB vectorized, I guess).

The patch is ok but I guess we need to keep an eye on BB vectorization
results for targets w/o unaligned vector loads/stores for the above
alignment issue?

Thanks,
Richard.

> Richard
>
>
> 2017-06-28  Richard Sandiford  
>
> gcc/
> * tree-data-ref.c (dr_analyze_innermost): Replace the "nest"
> parameter with a "loop" parameter and use it instead of the
> loop containing DR_STMT.  Don't check simple_iv when doing
> BB analysis.  Describe the two analysis modes in the comment.
>
> gcc/testsuite/
> * gcc.dg/vect/bb-slp-pr65935.c: Expect SLP to be used in main
> as well.
>
> Index: gcc/tree-data-ref.c
> ===
> --- gcc/tree-data-ref.c 2017-06-28 14:33:41.294720044 +0100
> +++ gcc/tree-data-ref.c 2017-06-28 14:35:30.475155670 +0100
> @@ -749,15 +749,29 @@ canonicalize_base_object_address (tree a
>return build_fold_addr_expr (TREE_OPERAND (addr, 0));
>  }
>
> -/* Analyzes the behavior of the memory reference DR in the innermost loop or
> -   basic block that contains it.  Returns true if analysis succeed or false
> -   otherwise.  */
> +/* Analyze the behavior of memory reference DR.  There are two modes:
> +
> +   - BB analysis.  In this case we simply split the address into base,
> + init and offset components, without reference to any containing loop.
> + The resulting base and offset are general expressions and they can
> + vary arbitrarily from one iteration of the containing loop to the next.
> + The step is always zero.
> +
> +   - loop analysis.  In this case we analyze the reference both wrt LOOP
> + and on the basis that the reference occurs (is "used") in LOOP;
> + see the comment above analyze_scalar_evolution_in_loop for more
> + information about this distinction.  The base, init, offset and
> + step fields are all invariant in LOOP.
> +
> +   Perform BB analysis if LOOP is null, or if LOOP is the function's
> +   dummy outermost loop.  In other cases perform loop analysis.
> +
> +   Return true if the analysis succeeded and store the results in DR if so.
> +   BB analysis can only fail for bitfield or reversed-storage accesses.  */
>
>  bool
> -dr_analyze_innermost (struct data_reference *dr, struct loop *nest)
> +dr_analyze_innermost (struct data_reference *dr, struct loop *loop)
>  {
> -  gimple *stmt = DR_STMT (dr);
> -  struct loop *loop = loop_containing_stmt (stmt);
>tree ref =

[gomp5] Fix taskgroup genericization in Fortran FE

2017-06-29 Thread Jakub Jelinek
Hi!

Apparently I forgot to test fortran last time during the
http://gcc.gnu.org/ml/gcc-patches/2017-06/msg00839.html
changes.  Fixed thusly, committed to gomp-5_0-branch.

2017-06-29  Jakub Jelinek  

* trans-openmp.c (gfc_trans_omp_taskgroup): Build OMP_TASKGROUP using
make_node instead of build1_loc.

--- gcc/fortran/trans-openmp.c.jj   2017-05-24 11:47:41.0 +0200
+++ gcc/fortran/trans-openmp.c  2017-06-29 11:43:00.620323982 +0200
@@ -4546,8 +4546,12 @@ gfc_trans_omp_task (gfc_code *code)
 static tree
 gfc_trans_omp_taskgroup (gfc_code *code)
 {
-  tree stmt = gfc_trans_code (code->block->next);
-  return build1_loc (input_location, OMP_TASKGROUP, void_type_node, stmt);
+  tree body = gfc_trans_code (code->block->next);
+  tree stmt = make_node (OMP_TASKGROUP);
+  TREE_TYPE (stmt) = void_type_node;
+  OMP_TASKGROUP_BODY (stmt) = body;
+  OMP_TASKGROUP_CLAUSES (stmt) = NULL_TREE;
+  return stmt;
 }
 
 static tree

Jakub


Re: [PATCH PR81196]Analyze ntiers for loop with exit condition comparing induction variables

2017-06-29 Thread Richard Biener
On Wed, Jun 28, 2017 at 11:32 AM, Bin Cheng  wrote:
> Hi,
> This patch picks up a missed-optimization case in loop niter analysis.  With this
> patch, niters information for loop as in added test can be analyzed.  Bootstrap
> and test on x86_64 and AArch64.  Is it OK?


+ provided that either below condition is satisfied:
+
+   a) the test is NE_EXP;

NE_EXPR

+   b) iv0.step - iv1.step is positive integer.

Ok with that change.

Thanks,
Richard.
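
For readers following condition (b) quoted above, here is a minimal standalone sketch (not GCC code) of why an exit test comparing two induction variables can be reduced to a single-IV problem when iv0.step - iv1.step is a positive integer.  The helper names, the restriction to a `<` test and the absence of overflow are all assumptions made for illustration; the NE_EXPR case (a) is not modelled, and number_of_iterations_cond handles far more than this.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper, not part of GCC: iteration count of
//   for (i = i0, j = j0; i < j; i += step0, j += step1)
// assuming step0 - step1 > 0 and no overflow.
// Rewriting the test as (i - j) < 0 gives one IV with step
// step0 - step1, so the count is ceil ((j0 - i0) / (step0 - step1)).
int64_t niter_closed_form (int64_t i0, int64_t step0,
                           int64_t j0, int64_t step1)
{
  int64_t delta = step0 - step1;
  assert (delta > 0);
  if (i0 >= j0)
    return 0;
  return (j0 - i0 + delta - 1) / delta;
}

// Reference implementation: run the loop and count the iterations.
int64_t niter_by_simulation (int64_t i0, int64_t step0,
                             int64_t j0, int64_t step1)
{
  int64_t n = 0;
  for (int64_t i = i0, j = j0; i < j; i += step0, j += step1)
    ++n;
  return n;
}
```

Under those assumptions the closed form and the simulated loop should agree, for example for i stepping by 2 from 0 against j stepping by 1 from 10.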

> Thanks,
> bin
> 2017-06-27  Bin Cheng  
>
> PR tree-optimization/81196
> * tree-ssa-loop-niter.c (number_of_iterations_cond): Handle loop
> exit condition comparing two IVs.
>
> gcc/testsuite/ChangeLog
> 2017-06-27  Bin Cheng  
>
> PR tree-optimization/81196
> * gcc.dg/vect/pr81196.c: New.


Re: Avoid generating useless range info

2017-06-29 Thread Richard Biener
On Wed, Jun 28, 2017 at 9:56 AM, Aldy Hernandez  wrote:
>
>
> On 06/27/2017 06:38 AM, Jakub Jelinek wrote:
>>
>> On Tue, Jun 27, 2017 at 06:26:46AM -0400, Aldy Hernandez wrote:
>>>
>>> How about this?
>>
>>
>> @@ -360,6 +363,22 @@ set_range_info (tree name, enum value_range_type
>> range_type,
>>   }
>>   }
>>   +/* Store range information RANGE_TYPE, MIN, and MAX to tree ssa_name
>> +   NAME while making sure we don't store useless range info.  */
>> +
>> +void
>> +set_range_info (tree name, enum value_range_type range_type,
>> +   const wide_int_ref &min, const wide_int_ref &max)
>> +{
>> +  /* A range of the entire domain is really no range at all.  */
>> +  tree type = TREE_TYPE (name);
>> +  if (min == wi::min_value (TYPE_PRECISION (type), TYPE_SIGN (type))
>> +  && max == wi::max_value (TYPE_PRECISION (type), TYPE_SIGN (type)))
>> +return;
>> +
>> +  set_range_info_raw (name, range_type, min, max);
>> +}
>> +
>>
>> Won't this misbehave if we have a narrower range on some SSA_NAME and
>> call set_range_info to make it VARYING?
>> In that case (i.e. SSA_NAME_RANGE_INFO (name) != NULL), we should either
>> set_range_info_raw too (if nonzero_bits is not all ones) or clear
>> SSA_NAME_RANGE_INFO (otherwise).
>
>
> Good point.  Fixed.
>
>> /* Gets range information MIN, MAX and returns enum value_range_type
>>  corresponding to tree ssa_name NAME.  enum value_range_type returned
>> @@ -419,9 +438,13 @@ set_nonzero_bits (tree name, const wide_int_ref
>> &mask)
>>   {
>> gcc_assert (!POINTER_TYPE_P (TREE_TYPE (name)));
>> if (SSA_NAME_RANGE_INFO (name) == NULL)
>> -set_range_info (name, VR_RANGE,
>> -   TYPE_MIN_VALUE (TREE_TYPE (name)),
>> -   TYPE_MAX_VALUE (TREE_TYPE (name)));
>> +{
>> +  if (mask == -1)
>> +   return;
>> +  set_range_info_raw (name, VR_RANGE,
>> + TYPE_MIN_VALUE (TREE_TYPE (name)),
>> + TYPE_MAX_VALUE (TREE_TYPE (name)));
>> +}
>> range_info_def *ri = SSA_NAME_RANGE_INFO (name);
>> ri->set_nonzero_bits (mask);
>>
>> Similarly, if SSA_NAME_RANGE_INFO is previously non-NULL, but min/max
>> are VARYING and the new mask is -1, shouldn't we free it rather than
>> set it to the default?
>
>
> Here, if SSA_NAME_RANGE_INFO is previously non-NULL then we proceed as
> always-- just set the nonzero bits to whatever was specified (without
> clearing SSA_NAME_RANGE_INFO).  A mask of -1 and an SSA_NAME_RANGE_INFO of
> non-NULL can coexist just fine.
>
> How about this?

Ok.

Thanks,
Richard.
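
A toy model of the point Jakub raised in the quoted review, for anyone skimming the thread: if a full-domain range is treated as "no range", a name that already carries a narrower range must have it dropped rather than left in place by an early return.  Everything here (toy_ssa_name, toy_set_range, the fixed 8-bit signed type) is hypothetical and ignores nonzero_bits entirely; it is not the revised patch, which is not shown in this message.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>

// Toy stand-in for an SSA name of an 8-bit signed type.  Not a GCC type.
struct toy_ssa_name
{
  bool has_range = false;
  int8_t min = 0;
  int8_t max = 0;
};

// Record [min, max] on NAME, treating a range that covers the whole
// domain as "no range at all".  Any previously stored narrower range
// is cleared in that case instead of being left stale.
void toy_set_range (toy_ssa_name &name, int8_t min, int8_t max)
{
  assert (min <= max);
  const bool full_domain
    = (min == std::numeric_limits<int8_t>::min ()
       && max == std::numeric_limits<int8_t>::max ());
  if (full_domain)
    {
      name.has_range = false;
      return;
    }
  name.has_range = true;
  name.min = min;
  name.max = max;
}
```

A plain early return without the reset would leave [0, 10] attached to a name that was later declared VARYING, which is the misbehaviour the review comment asks about.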

> Aldy


Re: [arm] Fix incorrect __ARM_ARCH_PROFILE for -march=armv7

2017-06-29 Thread Richard Earnshaw (lists)
On 28/06/17 16:05, Richard Earnshaw (lists) wrote:
> ACLE explicitly states that when targeting the common subset of
> ARMv7-A, ARMv7-R and ARMv7-M, the __ARM_ARCH_PROFILE macro should not be
> set.  We currently set it to 'M' which is clearly erroneous.
> 
> The logic for creating this is very convoluted and also somewhat
> fragile, so I've taken the opportunity to use the new CPU and
> architecture definition infrastructure to record the profile for each
> architecture explicitly rather than try to reconstruct it from other
> data.  I think this results in a much more robust solution.
> 
> 
> 2017-06-28  Richard Earnshaw  
> 
>   * config/arm/parsecpu.awk (profile): Parse new keyword in an arch
>   context.
>   (gen_comm_data): Emit architectural setting of arch_prof.
>   * config/arm/arm-cpus.in (armv6-m, armv6s-m, armv7-a, armv7ve): Set the
>   profile.
>   (armv7-r, armv7-m, armv7e-m, armv8-a, armv8.1-a, armv8.2-a): Likewise.
>   (armv8-m.base, armv8-m.main): Likewise.
>   * arm-protos.h (arm_build_target): Add profile field.
>   (arch_option): Likewise.
>   * config/arm/arm.c (arm_configure_build_target): Copy the profile to
>   the active target.
>   * config/arm/arm.h (TARGET_ARM_ARCH_PROFILE): Use
>   arm_active_target.profile.
> 
> 
> Committed.


My patch yesterday accidentally missed a hunk that added the
update to the tail entry of the autogenerated data structure
produced by parsecpu.awk.  This causes native bootstraps to
fail.

This patch adds back the missing hunk.

2017-06-29  Richard Earnshaw  

* config/arm/parsecpu.awk (gen_comm_data): Add initializer for
profile to the dummy entry at the end of the list of architectures.
* config/arm/arm-cpu-cdata.h: Regenerated.
diff --git a/gcc/config/arm/arm-cpu-cdata.h b/gcc/config/arm/arm-cpu-cdata.h
index 4528d07..1cf1149 100644
--- a/gcc/config/arm/arm-cpu-cdata.h
+++ b/gcc/config/arm/arm-cpu-cdata.h
@@ -2462,7 +2462,7 @@ const arch_option all_architectures[] =
 TARGET_CPU_iwmmxt2,
   },
   {{NULL, NULL, {isa_nobit}},
-   NULL, BASE_ARCH_0, TARGET_CPU_arm_none}
+   NULL, BASE_ARCH_0, 0, TARGET_CPU_arm_none}
 };
 
 const arm_fpu_desc all_fpus[] =
diff --git a/gcc/config/arm/parsecpu.awk b/gcc/config/arm/parsecpu.awk
index d38d664..d096bca 100644
--- a/gcc/config/arm/parsecpu.awk
+++ b/gcc/config/arm/parsecpu.awk
@@ -311,7 +311,7 @@ function gen_comm_data () {
 }
 
 print "  {{NULL, NULL, {isa_nobit}},"
-print "   NULL, BASE_ARCH_0, TARGET_CPU_arm_none}"
+print "   NULL, BASE_ARCH_0, 0, TARGET_CPU_arm_none}"
 print "};\n"
 
 print "const arm_fpu_desc all_fpus[] ="


Re: Tweak BB analysis for dr_analyze_innermost

2017-06-29 Thread Richard Sandiford
Richard Biener  writes:
> On Wed, Jun 28, 2017 at 3:36 PM, Richard Sandiford
>  wrote:
>> dr_analyze_innermost had a "struct loop *nest" parameter that acted
>> like a boolean.  This was added in r179161, with the idea that a
>> null nest selected BB-level analysis rather than loop analysis.
>>
>> The handling seemed strange though.  If the DR was part of a loop,
>> we still tried to express the base and offset values as IVs, potentially
>> giving a nonzero step.  If that failed for any reason, we'd revert to
>> using the original base and offset, just as we would if we hadn't asked
>> for an IV in the first place.
>>
>> It seems more natural to use the !in_loop handling whenever nest is null
>> and always set the step to zero.  This actually enables one more SLP
>> opportunity in bb-slp-pr65935.c.
>>
>> I checked out r179161 and tried the patch there.  The test case added
>> in that revision still passes, so I don't think there was any particular
>> need to check simple_iv.
>>
>> Tested on aarch64-linux-gnu and x86_64-linux-gnu.  OK to install?
>
> I have a few additional comments for consideration.  I remembered code
> in vect_compute_data_ref_alignment explicitly looking at DR_STEP in
> BB mode:
>
>   /* Similarly we can only use base and misalignment information relative to
>  an innermost loop if the misalignment stays the same throughout the
>  execution of the loop.  As above, this is the case if the stride of
>  the dataref evenly divides by the vector size.  */
>   else
> {
>   tree step = DR_STEP (dr);
>   unsigned vf = loop ? LOOP_VINFO_VECT_FACTOR (loop_vinfo) : 1;
>
>   if (tree_fits_shwi_p (step)
>   && ((tree_to_shwi (step) * vf)
>   % GET_MODE_SIZE (TYPE_MODE (vectype)) != 0))
> {
>   if (dump_enabled_p ())
> dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
>  "step doesn't divide the vector-size.\n");
>   misalign = NULL_TREE;
> }
> }
>
> so I guess with the change we may end up with worse (misalign is
> always NULL?) or bogus (with DR_STEP == 0 the test is always false)
> alignment analysis results for BB vectorization?  I guess worse
> or we had the issue before if DR_STEP was just not analyzable.

DR_STEP will always be zero for bb vectorisation, so the results
shouldn't get worse.  But the value that was previously a nonzero
step is now part of the base or offset instead (the choice between
the two being the same as it would be for get_inner_reference).
We still take the alignments of those into account, so I think
we should be safe (at least after DR_BASE_ALIGNMENT).
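
To make the "always false with DR_STEP == 0" observation concrete, here is a small standalone model of the quoted divisibility test.  The function name and the plain integer parameters are assumptions for illustration only; the real code works on trees and machine modes.

```cpp
#include <cassert>

// Hypothetical model of the quoted check in
// vect_compute_data_ref_alignment: the misalignment is kept only when
// step * vf is a multiple of the vector size in bytes.
bool misalignment_preserved (long step, unsigned vf, unsigned vector_bytes)
{
  assert (vector_bytes != 0);
  return (step * (long) vf) % (long) vector_bytes == 0;
}
```

With a step of zero the product is zero for any vf, so the "step doesn't divide the vector-size" path is never taken in this model, whereas a step of 4 with vf 1 and 16-byte vectors would take it.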

Like you say, we previously had the same situation for bases
that weren't simple IVs, or for bases that were simple IVs but
had an invariant rather than constant step.

> The testcase that now gets vectorized tried to use the asm to prevent
> vectorization but that of course doesn't work for BB vectorization.
> Using volatile stores might.  I guess the chance that we miscompile
> both loops in a way that we still pass the testcase (even the compare
> loop could be BB vectorized I guess) is unlikely.

Putting an extra volatile asm between the real and imag stores stops
the vectorisation, if you'd prefer that.

> The patch is ok but I guess we need to keep an eye on BB vectorization
> results for targets w/o unaligned vector loads/stores for the above
> alignment issue?

Thanks,
Richard


Re: Tweak BB analysis for dr_analyze_innermost

2017-06-29 Thread Richard Biener
On Thu, Jun 29, 2017 at 12:32 PM, Richard Sandiford
 wrote:
> Richard Biener  writes:
>> On Wed, Jun 28, 2017 at 3:36 PM, Richard Sandiford
>>  wrote:
>>> dr_analyze_innermost had a "struct loop *nest" parameter that acted
>>> like a boolean.  This was added in r179161, with the idea that a
>>> null nest selected BB-level analysis rather than loop analysis.
>>>
>>> The handling seemed strange though.  If the DR was part of a loop,
>>> we still tried to express the base and offset values as IVs, potentially
>>> giving a nonzero step.  If that failed for any reason, we'd revert to
>>> using the original base and offset, just as we would if we hadn't asked
>>> for an IV in the first place.
>>>
>>> It seems more natural to use the !in_loop handling whenever nest is null
>>> and always set the step to zero.  This actually enables one more SLP
>>> opportunity in bb-slp-pr65935.c.
>>>
>>> I checked out r179161 and tried the patch there.  The test case added
>>> in that revision still passes, so I don't think there was any particular
>>> need to check simple_iv.
>>>
>>> Tested on aarch64-linux-gnu and x86_64-linux-gnu.  OK to install?
>>
>> I have a few additional comments for consideration.  I remembered code
>> in vect_compute_data_ref_alignment explicitly looking at DR_STEP in
>> BB mode:
>>
>>   /* Similarly we can only use base and misalignment information relative to
>>  an innermost loop if the misalignment stays the same throughout the
>>  execution of the loop.  As above, this is the case if the stride of
>>  the dataref evenly divides by the vector size.  */
>>   else
>> {
>>   tree step = DR_STEP (dr);
>>   unsigned vf = loop ? LOOP_VINFO_VECT_FACTOR (loop_vinfo) : 1;
>>
>>   if (tree_fits_shwi_p (step)
>>   && ((tree_to_shwi (step) * vf)
>>   % GET_MODE_SIZE (TYPE_MODE (vectype)) != 0))
>> {
>>   if (dump_enabled_p ())
>> dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
>>  "step doesn't divide the vector-size.\n");
>>   misalign = NULL_TREE;
>> }
>> }
>>
>> so I guess with the change we may end up with worse (misalign is
>> always NULL?) or bogus (with DR_STEP == 0 the test is always false)
>> alignment analysis results for BB vectorization?  I guess worse
>> or we had the issue before if DR_STEP was just not analyzable.
>
> DR_STEP will always be zero for bb vectorisation, so the results
> shouldn't get worse.  But the value that was previously a nonzero
> step is now part of the base or offset instead (the choice between
> the two being the same as it would be for get_inner_reference).
> We still take the alignments of those into account, so I think
> we should be safe (at least after DR_BASE_ALIGNMENT).

I think for

  DR_ALIGNED_TO (dr) = size_int (highest_pow2_factor (offset_iv.base));

we have a harder time extracting alignment from sth like (sizetype)i * 4 + 8
than if the base is zero.  But yes, in principle we can do that and hopefully
SCEV analysis is not much more powerful than highest_pow2_factor.
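
As a rough illustration of what one would hope to recover from an offset such as (sizetype)i * 4 + 8, the sketch below computes the largest power of two guaranteed to divide scale * i + offset for every i.  It is a standalone model with made-up names; it does not claim to reproduce what highest_pow2_factor actually does on trees.

```cpp
#include <cassert>
#include <cstdint>

// Largest power of two dividing X (X must be nonzero): isolate the
// lowest set bit.
uint64_t lowest_set_bit (uint64_t x)
{
  assert (x != 0);
  return x & (~x + 1);
}

// Largest power of two that divides SCALE * i + OFFSET for every
// integer i.  This is the power-of-two part of gcd (SCALE, OFFSET),
// i.e. the smaller of the two lowest set bits; a zero OFFSET imposes
// no constraint.  Hypothetical helper, SCALE must be nonzero.
uint64_t pow2_factor_of_affine (uint64_t scale, uint64_t offset)
{
  assert (scale != 0);
  uint64_t from_scale = lowest_set_bit (scale);
  if (offset == 0)
    return from_scale;
  uint64_t from_offset = lowest_set_bit (offset);
  return from_scale < from_offset ? from_scale : from_offset;
}
```

For i * 4 + 8 this gives 4: the constant term alone would allow 8, but i = 1 yields 12, so only 4-byte alignment of the offset can be assumed.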

> Like you say, we previously had the same situation for bases
> that weren't simple IVs, or for bases that were simple IVs but
> had an invariant rather than constant step.

Yes.

>> The testcase that now gets vectorized tried to use the asm to prevent
>> vectorization but that of course doesn't work for BB vectorization.
>> Using volatile stores might.  I guess the chance that we miscompile
>> both loops in a way that we still pass the testcase (even the compare
>> loop could be BB vectorized I guess) is unlikely.
>
> Putting an extra volatile asm between the real and imag stores stops
> the vectorisation, if you'd prefer that.

No, your patch is fine.

>> The patch is ok but I guess we need to keep an eye on BB vectorization
>> results for targets w/o unaligned vector loads/stores for the above
>> alignment issue?

Still keep an eye on SCEV vs. highest_pow2_factor from testsuite fallout.

Patch still ok,
Richard.

> Thanks,
> Richard


[PATCH] Support reduction chain and SLP reduction at the same time

2017-06-29 Thread Richard Biener

I noticed vect_analyze_slp didn't try SLP reduction when it detected
any reduction chain.  That's because the LOOP_VINFO_REDUCTIONS array
contains also the detected chains -- but a reduction chain can only
be vectorized as reduction chain (well, I'm going to fix that!  I
just ran into this code in this process).

The following rectifies this by properly not putting reduction chains
onto LOOP_VINFO_REDUCTIONS, simplifying vect_analyze_slp thereby.

The testcase is now vectorized with full SLP, for v4si that's an
unroll factor of only 2 compared to previously where we used
interleaving with unroll factor 4 and two reductions to process
the remaining SLP reduction (and used SLP for the reduction chain).

Bootstrap and regtest running on x86_64-unknown-linux-gnu.

Richard.

2017-06-29  Richard Biener  

* tree-vect-loop.c (vect_analyze_scalar_cycles_1): Do not add
reduction chains to LOOP_VINFO_REDUCTIONS.
* tree-vect-slp.c (vect_analyze_slp): Continue looking for
SLP reductions after processing reduction chains.

* gcc.dg/vect/slp-reduc-8.c: New testcase.

Index: gcc/tree-vect-loop.c
===
--- gcc/tree-vect-loop.c(revision 249729)
+++ gcc/tree-vect-loop.c(working copy)
@@ -890,8 +895,10 @@ vect_analyze_scalar_cycles_1 (loop_vec_i
   STMT_VINFO_DEF_TYPE (vinfo_for_stmt (reduc_stmt)) =
vect_reduction_def;
   /* Store the reduction cycles for possible vectorization in
- loop-aware SLP.  */
-  LOOP_VINFO_REDUCTIONS (loop_vinfo).safe_push (reduc_stmt);
+ loop-aware SLP if it was not detected as reduction
+chain.  */
+ if (! GROUP_FIRST_ELEMENT (vinfo_for_stmt (reduc_stmt)))
+   LOOP_VINFO_REDUCTIONS (loop_vinfo).safe_push (reduc_stmt);
 }
 }
 }
Index: gcc/tree-vect-slp.c
===
--- gcc/tree-vect-slp.c (revision 249729)
+++ gcc/tree-vect-slp.c (working copy)
@@ -2102,15 +2103,13 @@ vect_analyze_slp (vec_info *vinfo, unsig
 {
   unsigned int i;
   gimple *first_element;
-  bool ok = false;
 
   if (dump_enabled_p ())
 dump_printf_loc (MSG_NOTE, vect_location, "=== vect_analyze_slp ===\n");
 
   /* Find SLP sequences starting from groups of grouped stores.  */
   FOR_EACH_VEC_ELT (vinfo->grouped_stores, i, first_element)
-if (vect_analyze_slp_instance (vinfo, first_element, max_tree_size))
-  ok = true;
+vect_analyze_slp_instance (vinfo, first_element, max_tree_size);
 
   if (loop_vec_info loop_vinfo = dyn_cast <loop_vec_info> (vinfo))
 {
@@ -2118,22 +2117,15 @@ vect_analyze_slp (vec_info *vinfo, unsig
{
  /* Find SLP sequences starting from reduction chains.  */
  FOR_EACH_VEC_ELT (loop_vinfo->reduction_chains, i, first_element)
- if (vect_analyze_slp_instance (vinfo, first_element,
+   if (! vect_analyze_slp_instance (vinfo, first_element,
 max_tree_size))
-   ok = true;
- else
-   return false;
-
- /* Don't try to vectorize SLP reductions if reduction chain was
-detected.  */
- return ok;
+ return false;
}
 
   /* Find SLP sequences starting from groups of reductions.  */
-  if (loop_vinfo->reductions.length () > 1
- && vect_analyze_slp_instance (vinfo, loop_vinfo->reductions[0],
-   max_tree_size))
-   ok = true;
+  if (loop_vinfo->reductions.length () > 1)
+   vect_analyze_slp_instance (vinfo, loop_vinfo->reductions[0],
+  max_tree_size);
 }
 
   return true;
Index: gcc/testsuite/gcc.dg/vect/slp-reduc-8.c
===
--- gcc/testsuite/gcc.dg/vect/slp-reduc-8.c (nonexistent)
+++ gcc/testsuite/gcc.dg/vect/slp-reduc-8.c (working copy)
@@ -0,0 +1,48 @@
+/* { dg-require-effective-target vect_int } */
+
+#include "tree-vect.h"
+
+static int a[512], b[512];
+
+void __attribute__((noinline,noclone))
+foo (int *sum1p, int *sum2p, int *sum3p)
+{
+  int sum1 = 0;
+  int sum2 = 0;
+  int sum3 = 0;
+  /* Check that we vectorize a reduction chain and a SLP reduction
+ at the same time.  */
+  for (int i = 0; i < 256; ++i)
+{
+  sum1 += a[2*i];
+  sum1 += a[2*i + 1];
+  sum2 += b[2*i];
+  sum3 += b[2*i + 1];
+}
+  *sum1p = sum1;
+  *sum2p = sum2;
+  *sum3p = sum3;
+}
+
+int main()
+{
+  check_vect ();
+
+  for (int i = 0; i < 256; ++i)
+{
+  a[2*i] = i;
+  a[2*i + 1] = i/2;
+  b[2*i] = i + 1;
+  b[2*i + 1] = i/2 + 1;
+  __asm__ volatile ("" : : : "memory");
+}
+  int sum1, sum2, sum3;
+  foo (&sum1, &sum2, &sum3);
+  if (sum1 !=

Re: [v2] PR81136: ICE from inconsistent DR_MISALIGNMENTs

2017-06-29 Thread Richard Biener
On Wed, Jun 28, 2017 at 3:29 PM, Richard Sandiford
 wrote:
> Richard Biener  writes:
>> On Mon, Jun 26, 2017 at 1:50 PM, Richard Sandiford
>>  wrote:
>>> Richard Biener  writes:
>>>> On Mon, Jun 26, 2017 at 1:14 PM, Richard Sandiford
>>>>  wrote:
>>>>> I don't think the problem is the lack of a cap.  In the test case we
>>>>> see that:
>>>>>
>>>>> 1. B is known at compile time to be X * vecsize + Y when considered in
>>>>>    isolation, because the base alignment derived from its DR_REF >= vecsize.
>>>>>    So DR_MISALIGNMENT (B) == Y.
>>>>>
>>>>> 2. A's misalignment wrt vecsize is not known at compile time when
>>>>>    considered in isolation, because no useful base alignment can be
>>>>>    derived from its DR_REF.  (The DR_REF is to a plain int rather than
>>>>>    to a structure with a high alignment.)  So DR_MISALIGNMENT (A) == -1.
>>>>>
>>>>> 3. A and B when considered as a pair trivially have the same misalignment
>>>>>    wrt vecsize, for the reasons above.
>>>>>
>>>>> Each of these results is individually correct.  The problem is that the
>>>>> assert is conflating two things: it's saying that if we know two datarefs
>>>>> have the same misalignment, we must either be able to calculate a
>>>>> compile-time misalignment for both datarefs in isolation, or we must
>>>>> fail to calculate a compile-time misalignment for both datarefs in
>>>>> isolation.  That isn't true: it's valid to have situations in which the
>>>>> compile-time misalignment is known for one dataref in isolation but not
>>>>> for the other.
>>>>
>>>> True.  So the assert should then become
>>>>
>>>>   gcc_assert (! known_alignment_for_access_p (dr)
>>>>               || DR_MISALIGNMENT (dr) / dr_size ==
>>>>                  DR_MISALIGNMENT (dr_peel) / dr_peel_size);
>>>>
>>>> ?
>>>
>>> I think it would need to be:
>>>
>>>   gcc_assert (!known_alignment_for_access_p (dr)
>>>   || !known_alignment_for_access_p (dr_peel)
>>>   || (DR_MISALIGNMENT (dr) / dr_size
>>>   == DR_MISALIGNMENT (dr_peel) / dr_peel_size));
>>
>> I think for !known_alignment_for_access_p (dr_peel) the assert doesn't make
>> any sense (DR_MISALIGNMENT is -1), so yes, you are right.
>>
>>> But yeah, that would work too.  The idea with the assert in the patch was
>>> that for unconditional references we probably *do* want to try to compute
>>> the same compile-time misalignment, but for optimisation reasons rather
>>> than correctness.  Maybe that's more properly a gcc_checking_assert
>>> though, since nothing goes wrong if it fails.  So perhaps:
>>
>> We shouldn't have asserts for optimization reasons, even with checking
>> IMHO.
>
> OK.
>
>>>  gcc_checking_assert (DR_IS_CONDITIONAL_IN_STMT (dr)
>>>   || DR_IS_CONDITIONAL_IN_STMT (dr_peel)
>>>   || (known_alignment_for_access_p (dr)
>>>   == known_alignment_for_access_p (dr_peel)));
>>>
>>> as a follow-on assert.
>>>
>>> Should I split it into two patches, one to change the gcc_assert and
>>> another to add the optimisation?
>>
>> Yes please.
>
> Here's the patch to relax the assert.  I'll post the rest in a new thread.
>
> Tested as before.  OK to install?

Ok.

Richard.
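
For reference, a standalone model of the old and relaxed assertions discussed in this thread.  The names old_assert_holds and relaxed_assert_holds are invented for the sketch, and misalignments are plain ints with -1 meaning "unknown", which mirrors the convention described in the quoted text but is otherwise an assumption.

```cpp
#include <cassert>

// Convention assumed from the thread: a misalignment of -1 means
// "not known at compile time".
bool known_misalignment_p (int misalign)
{
  return misalign != -1;
}

// The original condition: compare the scaled misalignments directly,
// even if one of them is the "unknown" marker.
bool old_assert_holds (int mis_dr, int dr_size,
                       int mis_peel, int dr_peel_size)
{
  assert (dr_size > 0 && dr_peel_size > 0);
  return mis_dr / dr_size == mis_peel / dr_peel_size;
}

// The relaxed condition: only compare when both values are known.
bool relaxed_assert_holds (int mis_dr, int dr_size,
                           int mis_peel, int dr_peel_size)
{
  assert (dr_size > 0 && dr_peel_size > 0);
  return (!known_misalignment_p (mis_dr)
          || !known_misalignment_p (mis_peel)
          || mis_dr / dr_size == mis_peel / dr_peel_size);
}
```

With one reference unknown (-1) and the other known to be misaligned by 4 bytes, both with 4-byte elements, the old condition compares 0 against 1 and fails, while the relaxed one accepts the pair; two known but differing values are still rejected.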

> Thanks,
> Richard
>
>
> 2017-06-28  Richard Sandiford  
>
> gcc/
> PR tree-optimization/81136
> * tree-vect-data-refs.c (vect_update_misalignment_for_peel): Only
> assert that two references with the same misalignment have the same
> compile-time misalignment if those compile-time misalignments
> are known.
>
> gcc/testsuite/
> PR tree-optimization/81136
> * gcc.dg/vect/pr81136.c: New test.
>
> Index: gcc/tree-vect-data-refs.c
> ===
> --- gcc/tree-vect-data-refs.c   2017-06-26 19:41:19.549571836 +0100
> +++ gcc/tree-vect-data-refs.c   2017-06-28 14:25:58.811888377 +0100
> @@ -906,8 +906,10 @@ vect_update_misalignment_for_peel (struc
>  {
>if (current_dr != dr)
>  continue;
> -  gcc_assert (DR_MISALIGNMENT (dr) / dr_size ==
> -  DR_MISALIGNMENT (dr_peel) / dr_peel_size);
> +  gcc_assert (!known_alignment_for_access_p (dr)
> + || !known_alignment_for_access_p (dr_peel)
> + || (DR_MISALIGNMENT (dr) / dr_size
> + == DR_MISALIGNMENT (dr_peel) / dr_peel_size));
>SET_DR_MISALIGNMENT (dr, 0);
>return;
>  }
> Index: gcc/testsuite/gcc.dg/vect/pr81136.c
> ===
> --- /dev/null   2017-06-28 07:28:02.991792729 +0100
> +++ gcc/testsuite/gcc.dg/vect/pr81136.c 2017-06-28 14:25:58.810888422 +0100
> @@ -0,0 +1,16 @@
> +/* { dg-do compile } */
> +
> +struct __attribute__((aligned (32)))
> +{
> +  char misaligner;
> +  int foo[100];
> +  int bar[100];
> +} *a;
> +
> +void
> +fn1 (int n)
> +{
> +  int *b = a->foo;
> +  

Re: [Ping ^3][PATCH v2] Generate reproducible output independently of the build-path

2017-06-29 Thread Ximin Luo
Dear GCC Global Reviewers,

Could any of you please review my patch series? It's about being able to 
reproducibly build things, even when the build machines are executing the build 
under different paths.

Overview:
https://gcc.gnu.org/ml/gcc-patches/2017-04/msg00513.html

Full thread, including individual patches:
https://gcc.gnu.org/ml/gcc-patches/2017-04/threads.html#00513

Follow-up report:
https://gcc.gnu.org/ml/gcc-patches/2017-04/msg00781.html

In summary, this patch helps ~1800/26000 packages in Debian to become 
reproducible even when the build-path is varied across builds.

I've signed a copyright disclaimer and the FSF has this on record.

X

Ximin Luo:
> Ximin Luo:
>> Joseph Myers:
>>> On Tue, 11 Apr 2017, Ximin Luo wrote:
>>>
>>>> Copyright disclaimer
>>>>
>>>>
>>>> I dedicate these patches to the public domain by waiving all of my rights to
>>>> the work worldwide under copyright law, including all related and neighboring
>>>> rights, to the extent allowed by law.
>>>>
>>>> See https://creativecommons.org/publicdomain/zero/1.0/legalcode for full text.
>>>>
>>>> Please let me know if the above is insufficient and I will be happy to sign any
>>>> relevant forms.
>>>
>>> I believe the FSF wants its own disclaimer forms signed as evidence code 
>>> is in the public domain.  The process for getting disclaimer forms is to 
>>> complete 
>>> https://git.savannah.gnu.org/cgit/gnulib.git/plain/doc/Copyright/request-disclaim.changes
>>>  
>>> and then you should be sent a disclaimer form for disclaiming the 
>>> particular set of changes you have completed (if you then make further 
>>> significant changes afterwards, the disclaimer form would then need 
>>> completing for them as well).
>>>
>>
>> I've now done this, and the copyright clerk at the FSF has told me that this 
>> is complete on their side as well.
>>
>> Did any of you get a chance to look at the patch yet?
>>
> 
> Hi GCC patches list,
> 
> Any progress or feedback on this patch series?
> 
> Ximin
> 

-- 
GPG: ed25519/56034877E1F87C35
GPG: rsa4096/1318EFAC5FBBDBCE
https://github.com/infinity0/pubkeys.git


Re: [PATCH] ASAN: handle addressable params (PR sanitize/81040).

2017-06-29 Thread Jakub Jelinek
On Tue, Jun 20, 2017 at 03:06:56PM +0200, Martin Liška wrote:
> +/* Rewrite all usages of tree OP which is a PARM_DECL with a VAR_DECL
> +   that is it's DECL_VALUE_EXPR.  */
> +
> +static tree
> +rewrite_usage_of_param (tree *op, int *walk_subtrees, void *)
> +{
> +  if (TREE_CODE (*op) == PARM_DECL && DECL_VALUE_EXPR (*op) != NULL_TREE)

DECL_VALUE_EXPR testing is costly (it is a hash table lookup).
Therefore you should test DECL_HAS_VALUE_EXPR_P (*op) after checking
== PARM_DECL.  And DECL_HAS_VALUE_EXPR_P should imply a non-NULL
DECL_VALUE_EXPR.
That said, I wonder if we don't create DECL_VALUE_EXPR for PARM_DECLs in
other parts of the compiler, whether it wouldn't be safer to also test here
after == PARM_DECL and DECL_HAS_VALUE_EXPR_P check whether *op is in
addressable_params hash table.
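
The ordering suggested here, cheap flag tests first and the table lookup last, can be sketched outside GCC as follows.  toy_decl, value_expr_table and lookup_count are all invented for the illustration; the real DECL_VALUE_EXPR machinery is only being imitated.

```cpp
#include <cassert>
#include <unordered_map>

// Toy declaration: two cheap flag bits, loosely standing in for
// "is a PARM_DECL" and DECL_HAS_VALUE_EXPR_P.  Not a GCC type.
struct toy_decl
{
  bool is_parm;
  bool has_value_expr;
};

// Side table imitating the hash table behind DECL_VALUE_EXPR.
static std::unordered_map<const toy_decl *, const toy_decl *> value_expr_table;
static int lookup_count = 0;

// The comparatively expensive operation: a hash table lookup.
const toy_decl *lookup_value_expr (const toy_decl *decl)
{
  ++lookup_count;
  auto it = value_expr_table.find (decl);
  return it == value_expr_table.end () ? nullptr : it->second;
}

// Test the flags first so that declarations without a value
// expression never reach the hash table.
const toy_decl *replacement_for (const toy_decl *decl)
{
  if (!decl->is_parm || !decl->has_value_expr)
    return nullptr;
  return lookup_value_expr (decl);
}
```

In this model a parameter without the flag set costs no lookup at all, which is the saving the review comment is after when the walker visits every operand in the function.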

> +{
> +  *op = DECL_VALUE_EXPR (*op);
> +  *walk_subtrees = 0;
> +}
> +
> +  return NULL;
> +}
> +
> +/* For a given function FUN, rewrite all addressable parameters so that
> +   a new automatic variable is introduced.  Right after function entry
> +   a parameter is assigned to the variable.  */
> +
> +static void
> +sanitize_rewrite_addressable_params (function *fun)
> +{
> +  gimple *g;
> +  gimple_seq stmts = NULL;
> +  auto_vec<tree> addressable_params;

You don't really use the addressable_params vector anywhere, right?
Except for:

> +
> +  for (tree arg = DECL_ARGUMENTS (current_function_decl);
> +   arg; arg = DECL_CHAIN (arg))
> +{
> +  if (TREE_ADDRESSABLE (arg) && !TREE_ADDRESSABLE (TREE_TYPE (arg)))
> + {
> +   TREE_ADDRESSABLE (arg) = 0;
> +   /* The parameter is no longer addressable.  */
> +   tree type = TREE_TYPE (arg);
> +   addressable_params.safe_push (arg);

pushing stuff into it and later

> +  if (addressable_params.is_empty ())
> +return;

If you only need that, a bool flag if any params have been changed is
enough.  But see above whether it wouldn't be safer to use a hash table
to verify it.  Plus, I think it would be desirable to clear
DECL_HAS_VALUE_EXPR_P and SET_DECL_VALUE_EXPR to NULL afterwards
if (target_for_debug_bind (arg)) - which can be done either with the vec
or with a hash table traversal, for that we don't care about the ordering.

> +
> +   /* Create a new automatic variable.  */
> +   tree var = build_decl (DECL_SOURCE_LOCATION (arg),
> +  VAR_DECL, DECL_NAME (arg), type);
> +   TREE_ADDRESSABLE (var) = 1;
> +   DECL_ARTIFICIAL (var) = 1;
> +   DECL_SEEN_IN_BIND_EXPR_P (var) = 0;

This is 0 already from build_decl, IMHO no need to set it.

> +   gimple_add_tmp_var (var);
> +
> +   if (dump_file)
> + fprintf (dump_file,
> +  "Rewriting parameter whose address is taken: %s\n",
> +  IDENTIFIER_POINTER (DECL_NAME (arg)));
> +
> +   SET_DECL_VALUE_EXPR (arg, var);

But obviously you miss setting DECL_HAS_VALUE_EXPR_P here.

> +   /* Assign value of parameter to newly created variable.  */
> +   if ((TREE_CODE (type) == COMPLEX_TYPE
> +|| TREE_CODE (type) == VECTOR_TYPE))
> + {
> +   /* We need to create a SSA name that will be used for the
> +  assignment.  */

Why don't you just set DECL_GIMPLE_REG_P (arg) = 1; for
COMPLEX_TYPE/VECTOR_TYPE?  The arg is going to be only used to copy it into
the new var.  And then just use get_or_create_ssa_default_def,
regardless of whether it is complex/vector or other.

> +  /* Replace all usages of PARM_DECLs with the newly
> + created variable VAR.  */
> +  basic_block bb;
> +  FOR_EACH_BB_FN (bb, fun)
> +{
> +  gimple_stmt_iterator gsi;
> +  for (gsi = gsi_start_bb (bb); !gsi_end_p (gsi); gsi_next (&gsi))
> + {
> +   gimple *stmt = gsi_stmt (gsi);
> +   gimple_stmt_iterator it = gsi_for_stmt (stmt);
> +   walk_gimple_stmt (&it, NULL, rewrite_usage_of_param, NULL);
> + }
> +  for (gsi = gsi_start_phis (bb); !gsi_end_p (gsi); gsi_next (&gsi))
> + {
> +   gphi *phi = dyn_cast <gphi *> (gsi_stmt (gsi));
> +   for (unsigned i = 0; i < gimple_phi_num_args (phi); ++i)
> + {
> +   hash_set<tree> visited_nodes;
> +   walk_tree (gimple_phi_arg_def_ptr (phi, i),
> +  rewrite_usage_of_param, NULL, &visited_nodes);
> + }

Doesn't walk_gimple_stmt on the PHI handle this?

Jakub


Re: Add DR_BASE_ALIGNMENT

2017-06-29 Thread Richard Biener
On Wed, Jun 28, 2017 at 3:40 PM, Richard Sandiford
 wrote:
> This patch records the base alignment in data_reference, to avoid the
> second-guessing that was previously done in vect_compute_data_ref_alignment.
> It also makes vect_analyze_data_refs use dr_analyze_innermost, instead
> of having an almost-copy of the same code.
>
> I'd originally tried to do the second part as a standalone patch,
> but on its own it caused us to miscompute the base alignment (due to
> the second-guessing not quite working).  This was previously latent
> because the old code set STMT_VINFO_DR_ALIGNED_TO to a byte value,
> whereas it should have been bits.
>
> After the previous patch, the only thing that dr_analyze_innermost
> read from the dr was the DR_REF.  I thought it would be better to pass
> that in and make data_reference write-only.  This means that callers
> like vect_analyze_data_refs don't have to set any fields first
> (and thus not worry about *which* fields they should set).
>
> Tested on aarch64-linux-gnu and x86_64-linux-gnu.  OK to install?
>
> Richard
>
>
> 2017-06-28  Richard Sandiford  
>
> gcc/
> * tree-data-ref.h (data_reference): Add a base_alignment field.
> (DR_BASE_ALIGNMENT): New macro.
> (dr_analyze_innermost): Add a tree argument.
> * tree-data-ref.c: Include builtins.h.
> (dr_analyze_innermost): Take the tree reference as argument.
> Set up DR_BASE_ALIGNMENT.
> (create_data_ref): Update call accordingly.
> * tree-predcom.c (find_looparound_phi): Likewise.
> * tree-vectorizer.h (_stmt_vec_info): Add dr_base_alignment.
> (STMT_VINFO_DR_BASE_ALIGNMENT): New macro.
> * tree-vect-data-refs.c: Include tree-cfg.h.
> (vect_compute_data_ref_alignment): Use DR_BASE_ALIGNMENT instead
> of calculating an alignment here.
> (vect_analyze_data_refs): Use dr_analyze_innermost.  Record the
> base alignment in STMT_VINFO_DR_BASE_ALIGNMENT.
>
> Index: gcc/tree-data-ref.h
> ===
> --- gcc/tree-data-ref.h 2017-06-26 19:41:19.549571836 +0100
> +++ gcc/tree-data-ref.h 2017-06-28 14:26:19.651051322 +0100
> @@ -119,6 +119,10 @@ struct data_reference
>/* True when the data reference is in RHS of a stmt.  */
>bool is_read;
>
> +  /* The alignment of INNERMOST.base_address, in bits.  This is logically
> + part of INNERMOST, but is kept here to avoid unnecessary padding.  */
> +  unsigned int base_alignment;
> +

But then it would be nice to have dr_analyze_innermost take a struct
innermost_loop_behavior * only.  That way the vectorizer copy for the outer
loop behavior can just be an innermost_loop_behavior sub-struct as well and
predcom wouldn't need to invent an interesting DR_STMT either.  The DR_
accessors wouldn't work on that but I guess they could be made to work with

inline tree dr_base_address (struct data_reference *dr) { return
dr->innermost.base_address; }
inline tree dr_base_address (struct innermost_loop_behavior *p) {
return p->base_address; }
#define DR_BASE_ADDRESS(DR) (dr_base_address (dr))

anyway, that's implementation detail ;)

>/* Behavior of the memory reference in the innermost loop.  */
>struct innermost_loop_behavior innermost;
>
> @@ -139,6 +143,7 @@ #define DR_NUM_DIMENSIONS(DR)  DR_AC
>  #define DR_IS_READ(DR) (DR)->is_read
>  #define DR_IS_WRITE(DR)(!DR_IS_READ (DR))
>  #define DR_BASE_ADDRESS(DR)(DR)->innermost.base_address
> +#define DR_BASE_ALIGNMENT(DR)  (DR)->base_alignment
>  #define DR_OFFSET(DR)  (DR)->innermost.offset
>  #define DR_INIT(DR)(DR)->innermost.init
>  #define DR_STEP(DR)(DR)->innermost.step
> @@ -322,7 +327,7 @@ #define DDR_DIST_VECT(DDR, I) \
>  #define DDR_REVERSED_P(DDR) (DDR)->reversed_p
>
>
> -bool dr_analyze_innermost (struct data_reference *, struct loop *);
> +bool dr_analyze_innermost (struct data_reference *, tree, struct loop *);
>  extern bool compute_data_dependences_for_loop (struct loop *, bool,
>vec *,
>vec *,
> Index: gcc/tree-data-ref.c
> ===
> --- gcc/tree-data-ref.c 2017-06-28 14:26:12.946306736 +0100
> +++ gcc/tree-data-ref.c 2017-06-28 14:26:19.651051322 +0100
> @@ -94,6 +94,7 @@ Software Foundation; either version 3, o
>  #include "dumpfile.h"
>  #include "tree-affine.h"
>  #include "params.h"
> +#include "builtins.h"
>
>  static struct datadep_stats
>  {
> @@ -749,7 +750,7 @@ canonicalize_base_object_address (tree a
>return build_fold_addr_expr (TREE_OPERAND (addr, 0));
>  }
>
> -/* Analyze the behavior of memory reference DR.  There are two modes:
> +/* Analyze the behavior of memory reference REF.  There are two modes:
>
> - BB analysis.  In this case we simply split the address into base,
>   init a

Re: [PATCH] Transform (m1 > m2) * d into m1> m2 ? d : 0

2017-06-29 Thread Wilco Dijkstra
Richard Biener wrote:
> Hurugalawadi, Naveen wrote:
> > The code (m1 > m2) * d code should be optimized as m1> m2 ? d : 0.

> What's the reason of this transform?  I expect that the HW multiplier
> is quite fast given one operand is either zero or one and a multiplication
> is a gimple operation that's better handled in optimizations than
> COND_EXPRs which eventually expand to conditional code which
> would be much slower.

Even really fast multipliers have several cycles latency, and this is generally
fixed irrespectively of the inputs. Maybe you were thinking about division?

Additionally integer multiply typically has much lower throughput than other 
ALU operations like conditional move - a modern CPU may have 4 ALUs
but only 1 multiplier, so removing redundant integer multiplies is always good.

Note (m1 > m2) is also a conditional expression which will result in branches
for floating point expressions and on some targets even for integers. Moving
the multiply into the conditional expression generates the best code:

Integer version:
f1:
        cmp     w0, 100
        csel    w0, w1, wzr, gt
        ret
f2:
        cmp     w0, 100
        cset    w0, gt
        mul     w0, w0, w1
        ret

Float version:
f3:
        movi    v1.2s, #0
        cmp     w0, 100
        fcsel   s0, s0, s1, gt
        ret
f4:
        cmp     w0, 100
        bgt     .L8
        movi    v1.2s, #0
        fmul    s0, s0, s1  // eh???
.L8:
        ret

Wilco

[PATCH 1/7] sparc: put bmask* instructions in it's own insn type and adjust DFAs

2017-06-29 Thread Jose E. Marchesi
This patch introduces a new value, `bmask', for the insn `type'
attribute.  The bmask instructions, which were previously typed as
`array', are adapted to use it, and the DFA schedulers are updated
accordingly.

gcc/ChangeLog:

* config/sparc/sparc.md: New instruction type `bmask'.
(bmaskdi_vis): Use the `bmask' type.
(bmasksi_vis): Likewise.
* config/sparc/ultra3.md (us3_array): Likewise.
* config/sparc/niagara7.md (n7_array): Likewise.
* config/sparc/niagara4.md (n4_array): Likewise.
* config/sparc/niagara2.md (niag2_vis): Likewise.
(niag3_vis): Likewise.
* config/sparc/niagara.md (niag_vis): Likewise.
---
 gcc/ChangeLog| 12 
 gcc/config/sparc/niagara.md  |  2 +-
 gcc/config/sparc/niagara2.md |  4 ++--
 gcc/config/sparc/niagara4.md |  2 +-
 gcc/config/sparc/niagara7.md |  4 ++--
 gcc/config/sparc/sparc.md|  6 +++---
 gcc/config/sparc/ultra3.md   |  2 +-
 7 files changed, 22 insertions(+), 10 deletions(-)

diff --git a/gcc/config/sparc/niagara.md b/gcc/config/sparc/niagara.md
index f79771f..f9a1f6d 100644
--- a/gcc/config/sparc/niagara.md
+++ b/gcc/config/sparc/niagara.md
@@ -114,5 +114,5 @@
  */
 (define_insn_reservation "niag_vis" 8
   (and (eq_attr "cpu" "niagara")
-(eq_attr "type" 
"fga,visl,vismv,fgm_pack,fgm_mul,pdist,edge,edgen,gsr,array"))
+(eq_attr "type" 
"fga,visl,vismv,fgm_pack,fgm_mul,pdist,edge,edgen,gsr,array,bmask"))
   "niag_pipe*8")
diff --git a/gcc/config/sparc/niagara2.md b/gcc/config/sparc/niagara2.md
index 9bcdd06..34ee630 100644
--- a/gcc/config/sparc/niagara2.md
+++ b/gcc/config/sparc/niagara2.md
@@ -111,10 +111,10 @@
 
 (define_insn_reservation "niag2_vis" 6
   (and (eq_attr "cpu" "niagara2")
-(eq_attr "type" 
"fga,vismv,visl,fgm_pack,fgm_mul,pdist,edge,edgen,array,gsr"))
+(eq_attr "type" 
"fga,vismv,visl,fgm_pack,fgm_mul,pdist,edge,edgen,array,bmask,gsr"))
   "niag2_pipe*6")
 
 (define_insn_reservation "niag3_vis" 9
   (and (eq_attr "cpu" "niagara3")
-(eq_attr "type" 
"fga,vismv,visl,fgm_pack,fgm_mul,pdist,pdistn,edge,edgen,array,gsr"))
+(eq_attr "type" 
"fga,vismv,visl,fgm_pack,fgm_mul,pdist,pdistn,edge,edgen,array,bmask,gsr"))
   "niag2_pipe*9")
diff --git a/gcc/config/sparc/niagara4.md b/gcc/config/sparc/niagara4.md
index ad0a04b..cc1bb75 100644
--- a/gcc/config/sparc/niagara4.md
+++ b/gcc/config/sparc/niagara4.md
@@ -66,7 +66,7 @@
 
 (define_insn_reservation "n4_array" 12
   (and (eq_attr "cpu" "niagara4")
-(eq_attr "type" "array,edge,edgen"))
+(eq_attr "type" "array,bmask,edge,edgen"))
   "n4_slot1, nothing*11")
 
 (define_insn_reservation "n4_vis_move_1cycle" 1
diff --git a/gcc/config/sparc/niagara7.md b/gcc/config/sparc/niagara7.md
index 12d6ab0..3dc8f9e 100644
--- a/gcc/config/sparc/niagara7.md
+++ b/gcc/config/sparc/niagara7.md
@@ -71,7 +71,7 @@
 
 (define_insn_reservation "n7_array" 12
   (and (eq_attr "cpu" "niagara7")
-(eq_attr "type" "array,edge,edgen"))
+(eq_attr "type" "array,bmask,edge,edgen"))
   "n7_slot1, nothing*11")
 
 (define_insn_reservation "n7_fpdivs" 24
@@ -133,4 +133,4 @@
   (eq_attr "v3pipe" "false")))
   "n7_slot1, nothing*10")
 
-(define_bypass 3 "*_v3pipe" "*_v3pipe")
+(define_bypass 3 "n7*_v3pipe" "n7_*_v3pipe")
diff --git a/gcc/config/sparc/sparc.md b/gcc/config/sparc/sparc.md
index 5c5096b..da23060 100644
--- a/gcc/config/sparc/sparc.md
+++ b/gcc/config/sparc/sparc.md
@@ -281,7 +281,7 @@
fpcmp,
fpmul,fpdivs,fpdivd,
fpsqrts,fpsqrtd,
-   fga,visl,vismv,fgm_pack,fgm_mul,pdist,pdistn,edge,edgen,gsr,array,
+   fga,visl,vismv,fgm_pack,fgm_mul,pdist,pdistn,edge,edgen,gsr,array,bmask,
cmove,
ialuX,
multi,savew,flushw,iflush,trap,lzd"
@@ -9134,7 +9134,7 @@
 (plus:DI (match_dup 1) (match_dup 2)))]
   "TARGET_VIS2 && TARGET_ARCH64"
   "bmask\t%r1, %r2, %0"
-  [(set_attr "type" "array")
+  [(set_attr "type" "bmask")
(set_attr "v3pipe" "true")])
 
 (define_insn "bmasksi_vis"
@@ -9145,7 +9145,7 @@
 (zero_extend:DI (plus:SI (match_dup 1) (match_dup 2))))]
   "TARGET_VIS2"
   "bmask\t%r1, %r2, %0"
-  [(set_attr "type" "array")
+  [(set_attr "type" "bmask")
(set_attr "v3pipe" "true")])
 
 (define_insn "bshuffle_vis"
diff --git a/gcc/config/sparc/ultra3.md b/gcc/config/sparc/ultra3.md
index 6296b38..f5b81d6 100644
--- a/gcc/config/sparc/ultra3.md
+++ b/gcc/config/sparc/ultra3.md
@@ -56,7 +56,7 @@
 
 (define_insn_reservation "us3_array" 2
   (and (eq_attr "cpu" "ultrasparc3")
-(eq_attr "type" "array,edgen"))
+(eq_attr "type" "array,edgen,bmask"))
   "us3_ms + us3_slotany, nothing")
 
 ;; ??? Not entirely accurate.
-- 
2.3.4



[PATCH 3/7] sparc: introduce insn subtypes

2017-06-29 Thread Jose E. Marchesi
This patch introduces a new insn attribute `subtype', and marks
existing insns appropriately.  The resulting instruction hierarchy is
documented in a comment.

gcc/ChangeLog:

* config/sparc/sparc.md ("subtype"): New insn attribute.
("*wrgsr_sp64"): Set insn subtype.
("*rdgsr_sp64"): Likewise.
("alignaddrsi_vis"): Likewise.
("alignaddrdi_vis"): Likewise.
("alignaddrlsi_vis"): Likewise.
("alignaddrldi_vis"): Likewise.
("3"): Likewise.
("fexpand_vis"): Likewise.
("fpmerge_vis"): Likewise.
("faligndata_vis"): Likewise.
("bshuffle_vis"): Likewise.
("cmask8_vis"): Likewise.
("cmask16_vis"): Likewise.
("cmask32_vis"): Likewise.
("fchksm16_vis"): Likewise.
("v3"): Likewise.
("fmean16_vis"): Likewise.
("fp64_vis"): Likewise.
("v8qi3"): Likewise.
("3"): Likewise.
("3"): Likewise.
("3"): Likewise.
("v8qi3"): Likewise.
("3"): Likewise.
("*movqi_insn"): Likewise.
("*movhi_insn"): Likewise.
("*movsi_insn"): Likewise.
("movsi_pic_gotdata_op"): Likewise.
("*movdi_insn_sp32"): Likewise.
("*movdi_insn_sp64"): Likewise.
("movdi_pic_gotdata_op"): Likewise.
("*movsf_insn"): Likewise.
("*movdf_insn_sp32"): Likewise.
("*movdf_insn_sp64"): Likewise.
("*zero_extendhisi2_insn"): Likewise.
("*zero_extendqihi2_insn"): Likewise.
("*zero_extendqisi2_insn"): Likewise.
("*zero_extendqidi2_insn"): Likewise.
("*zero_extendhidi2_insn"): Likewise.
("*zero_extendsidi2_insn_sp64"): Likewise.
("ldfsr"): Likewise.
("prefetch_64"): Likewise.
("prefetch_32"): Likewise.
("tie_ld32"): Likewise.
("tie_ld64"): Likewise.
("*tldo_ldub_sp32"): Likewise.
("*tldo_ldub1_sp32"): Likewise.
("*tldo_ldub2_sp32"): Likewise.
("*tldo_ldub_sp64"): Likewise.
("*tldo_ldub1_sp64"): Likewise.
("*tldo_ldub2_sp64"): Likewise.
("*tldo_ldub3_sp64"): Likewise.
("*tldo_lduh_sp32"): Likewise.
("*tldo_lduh1_sp32"): Likewise.
("*tldo_lduh_sp64"): Likewise.
("*tldo_lduh1_sp64"): Likewise.
("*tldo_lduh2_sp64"): Likewise.
("*tldo_lduw_sp32"): Likewise.
("*tldo_lduw_sp64"): Likewise.
("*tldo_lduw1_sp64"): Likewise.
("*tldo_ldx_sp64"): Likewise.
("*mov_insn"): Likewise.
("*mov_insn_sp64"): Likewise.
("*mov_insn_sp32"): Likewise.
---
 gcc/ChangeLog |  68 
 gcc/config/sparc/sparc.md | 199 --
 2 files changed, 243 insertions(+), 24 deletions(-)

diff --git a/gcc/config/sparc/sparc.md b/gcc/config/sparc/sparc.md
index 04da8ae..d1bf6a7 100644
--- a/gcc/config/sparc/sparc.md
+++ b/gcc/config/sparc/sparc.md
@@ -268,7 +268,86 @@
  (eq_attr "cpu_feature" "vis4") (symbol_ref "TARGET_VIS4")]
 (const_int 0)))
 
-;; Insn type.
+;; The SPARC instructions used by the backend are organized into a
+;; hierarchy using the insn attributes "type" and "subtype".
+;;
+;; The mnemonics used in the list below are the architectural names
+;; used in the Oracle SPARC Architecture specs.  A / character
+;; separates the type from the subtype where appropriate.  For
+;; brevity, text enclosed in {} denotes alternatives, while text
+;; enclosed in [] is optional.
+;;
+;; Please keep this list updated.  It is of great help for keeping the
+;; correctness and coherence of the DFA schedulers.
+;;
+;; ialu:  
+;; ialuX: ADD[X]C SUB[X]C
+;; shift: SLL[X] SRL[X] SRA[X]
+;; cmove: MOV{A,N,NE,E,G,LE,GE,L,GU,LEU,CC,CS,POS,NEG,VC,VS}
+;;MOVF{A,N,U,G,UG,L,UL,LG,NE,E,UE,GE,UGE,LE,ULE,O}
+;;MOVR{Z,LEZ,LZ,NZ,GZ,GEZ}
+;; compare: ADDcc ADDCcc ANDcc ORcc SUBcc SUBCcc XORcc XNORcc
+;; imul: MULX SMUL[cc] UMUL UMULXHI XMULX XMULXHI
+;; idiv: UDIVX SDIVX
+;; flush: FLUSH
+;; load/regular: LD{UB,UH,UW} LDFSR
+;; load/prefetch: PREFETCH
+;; fpload: LDF LDDF LDQF
+;; sload: LD{SB,SH,SW}
+;; store: ST{B,H,W,X} STFSR
+;; fpstore: STF STDF STQF
+;; cbcond: CWB{NE,E,G,LE,GE,L,GU,LEU,CC,CS,POS,NEG,VC,VS}
+;; CXB{NE,E,G,LE,GE,L,GU,LEU,CC,CS,POS,NEG,VC,VS}
+;; uncond_branch: BA BPA JMPL
+;; branch: B{NE,E,G,LE,GE,L,GU,LEU,CC,CS,POS,NEG,VC,VS}
+;; BP{NE,E,G,LE,GE,L,GU,LEU,CC,CS,POS,NEG,VC,VS}
+;; FB{U,G,UG,L,UL,LG,NE,BE,UE,GE,UGE,LE,ULE,O}
+;; call: CALL
+;; return: RESTORE RETURN
+;; fpmove: FABS{s,d,q} FMOV{s,d,q} FNEG{s,d,q}
+;; fpcmove: FMOV{S,D,Q}{icc,xcc,fcc}
+;; fpcrmove: FMOVR{s,d,q}{Z,LEZ,LZ,NZ,GZ,GEZ}
+;; fp: FADD{s,d,q} FSUB{s,d,q} FHSUB{s,d} FNHADD{s,d} FNADD{s,d}
+;; FiTO{s,d,q} FsTO{i,x,d,q} FdTO{i,x,s,q} FxTO{d,s,q} FqTO{i,x,s,d}
+;; fpcmp: FCMP{s,d,q} FCMPE{s,d,q}
+;; fpmul: FMADD{s,d}  FMSUB{s,d} FMUL{s,d,q} FNMADD{s,d}
+;;FNMSUB{s,d} FNMUL{s,d} FNsMULd FsMULd
+;;FdMU

[PATCH 0/7] Support for the SPARC M8 cpu

2017-06-29 Thread Jose E. Marchesi
This patch series adds support for the SPARC M8 processor to GCC.
The SPARC M8 processor implements the Oracle SPARC Architecture 2017.

The first four patches are preparatory work:

- bmask* instructions are put in their own instruction type.  It makes
  little sense to have them in the same category as array
  instructions.

- Similarly, VIS compare instructions are put in their own instruction
  type.  This is to better accommodate subtypes, which are not quite
  the same as the subtypes of `visl' instructions.

- The introduction of a new `subtype' insn attribute in sparc.md
  avoids the need for adjusting the instruction scheduler DFAs for
  previous cpu models every time a new cpu is introduced.

- The full set of SPARC instructions used in sparc.md, and their
  position in the type/subtype hierarchy, is documented in a comment.
  This eases the modification of the DFA schedulers, and the addition
  of new cpus.

- The M7 DFA scheduler is reworked:

  + To use the new type/subtype hierarchy.
  + The v3pipe insn attribute is no longer needed.
  + More accurate latencies for instructions.
  + The C4 core pipeline is documented in a comment in niagara7.md.

The next three patches introduce M8 support proper:

- Support for -mcpu=m8 (we are thus suggesting to abandon the niagaraN
  denomination for M8 and later processors.)

- Support for a new VIS level, VIS4B, covering the new VIS
  instructions introduced in OSA2017 and implemented in the M8.  Also
  built-ins.

  Note that no new VIS level was formally introduced in OSA2017, even
  if many new VIS instructions were added to the spec.  We introduced
  VIS4B for coherence (like availability of builtins and visintrin.h
  depending on the value of __VIS__) and avoided using VIS5 in case it
  is introduced in future versions of the Oracle SPARC Architecture.

- An M8 DFA scheduler:

  + Also based on the new type/subtype hierarchy.
  + The functional units in the C5 core are explicitly documented in a
comment in m8.md.

See the individual patch descriptions for more information and
associated ChangeLog entries.

After this series gets integrated upstream we will be contributing more
support for M8 capabilities, such as support for using the new
misaligned load/store instructions for memory accesses known to be
misaligned at compile-time.

Note that full binutils support for M8 was upstreamed on May 19.
Bootstrapped and tested on sparc64-linux-gnu.  No regressions.


Jose E. Marchesi (7):
  sparc: put bmask* instructions in it's own insn type and adjust DFAs
  sparc: put VIS compare instructions in it's own insn type and adjust
DFAs
  sparc: introduce insn subtypes
  sparc: reworked M7 DFA based on instruction subtypes
  sparc: basic support for the SPARC M8 cpu
  sparc: support for VIS4B instructions
  sparc: M8 DFA scheduler

 gcc/ChangeLog   | 226 +
 gcc/config.gcc  |   2 +-
 gcc/config.in   |   4 +
 gcc/config/sparc/constraints.md |  12 +-
 gcc/config/sparc/driver-sparc.c |   1 +
 gcc/config/sparc/m8.md  | 242 ++
 gcc/config/sparc/niagara.md |   2 +-
 gcc/config/sparc/niagara2.md|   4 +-
 gcc/config/sparc/niagara4.md|   7 +-
 gcc/config/sparc/niagara7.md| 181 +-
 gcc/config/sparc/predicates.md  |  27 +++
 gcc/config/sparc/sol2.h |  14 +-
 gcc/config/sparc/sparc-c.c  |   7 +-
 gcc/config/sparc/sparc-opts.h   |   1 +
 gcc/config/sparc/sparc.c| 312 ++--
 gcc/config/sparc/sparc.h|  20 +-
 gcc/config/sparc/sparc.md   | 364 +---
 gcc/config/sparc/sparc.opt  |   7 +
 gcc/config/sparc/ultra1_2.md|   8 +-
 gcc/config/sparc/ultra3.md  |   4 +-
 gcc/configure   |  35 +++
 gcc/configure.ac|  12 +
 gcc/doc/extend.texi |  39 +++
 gcc/doc/invoke.texi |  25 +-
 gcc/testsuite/ChangeLog |   8 +
 gcc/testsuite/gcc.target/sparc/dictunpack.c |  25 ++
 gcc/testsuite/gcc.target/sparc/fpcmpdeshl.c |  25 ++
 gcc/testsuite/gcc.target/sparc/fpcmpshl.c   |  81 +++
 gcc/testsuite/gcc.target/sparc/fpcmpurshl.c |  25 ++
 gcc/testsuite/gcc.target/sparc/fpcmpushl.c  |  43 
 30 files changed, 1579 insertions(+), 184 deletions(-)
 create mode 100644 gcc/config/sparc/m8.md
 create mode 100644 gcc/testsuite/gcc.target/sparc/dictunpack.c
 create mode 100644 gcc/testsuite/gcc.target/sparc/fpcmpdeshl.c
 create mode 100644 gcc/testsuite/gcc.target/sparc/fpcmpshl.c
 create mode 100644 gcc/testsuite/gcc.target/sparc/fpcmpurshl.c
 create mode 100644 gcc/testsuite/gcc.target/sparc/fpcmpushl.c

-- 
2.3.4



Re: [PATCH] Transform (m1 > m2) * d into m1> m2 ? d : 0

2017-06-29 Thread Richard Biener
On Thu, Jun 29, 2017 at 1:20 PM, Wilco Dijkstra  wrote:
> Richard Biener wrote:
>> Hurugalawadi, Naveen wrote:
>> > The code (m1 > m2) * d code should be optimized as m1> m2 ? d : 0.
>
>> What's the reason of this transform?  I expect that the HW multiplier
>> is quite fast given one operand is either zero or one and a multiplication
>> is a gimple operation that's better handled in optimizations than
>> COND_EXPRs which eventually expand to conditional code which
>> would be much slower.
>
> Even really fast multipliers have several cycles latency, and this is
> generally fixed irrespectively of the inputs. Maybe you were thinking
> about division?
>
> Additionally integer multiply typically has much lower throughput than other
> ALU operations like conditional move - a modern CPU may have 4 ALUs
> but only 1 multiplier, so removing redundant integer multiplies is always
> good.
>
> Note (m1 > m2) is also a conditional expression which will result in branches
> for floating point expressions and on some targets even for integers. Moving
> the multiply into the conditional expression generates the best code:
>
> Integer version:
> f1:
>         cmp     w0, 100
>         csel    w0, w1, wzr, gt
>         ret
> f2:
>         cmp     w0, 100
>         cset    w0, gt
>         mul     w0, w0, w1
>         ret
>
> Float version:
> f3:
>         movi    v1.2s, #0
>         cmp     w0, 100
>         fcsel   s0, s0, s1, gt
>         ret
> f4:
>         cmp     w0, 100
>         bgt     .L8
>         movi    v1.2s, #0
>         fmul    s0, s0, s1  // eh???
> .L8:
>         ret

But then

int f (int m, int c)
{
  return (m & 1) * c;
}
int g (int m, int c)
{
  if ((m & 1) != 0)
    return c;
  return 0;
}

f:
.LFB0:
        .cfi_startproc
        andl    $1, %edi
        movl    %edi, %eax
        imull   %esi, %eax
        ret
g:
.LFB1:
        .cfi_startproc
        movl    %edi, %eax
        andl    $1, %eax
        cmovne  %esi, %eax
        ret

anyway.  As a general comment to the patch please do it as
a pattern in match.pd

(match boolean_value_range_p
 @0
 (if (INTEGRAL_TYPE_P (type)
  && TYPE_PRECISION (type) == 1)))
(match boolean_value_range_p
 INTEGER_CST
 (if (integer_zerop (t) || integer_onep (t))))
(match boolean_value_range_p
 SSA_NAME
 (if (INTEGRAL_TYPE_P (type)
  && ~get_nonzero_bits (t) == 1)))

(simplify
 (mult:c boolean_value_range_p@0 @1)
 (cond @0 @1 @0))

or something like that.
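
The idea behind the pattern can be sketched in plain C.  This is only
an illustration under stated assumptions: known_zero_or_one, mul_form
and sel_form are hypothetical names, and the mask argument merely stands
in for the "which bits may be nonzero" information the compiler tracks;
none of this is GCC's actual API.

```c
#include <stdint.h>

/* Hypothetical helper, not GCC's API: NONZERO_MASK has a 1 in every
   bit position that might be set in the value being described.  */
int known_zero_or_one (uint32_t nonzero_mask)
{
  /* Only bit 0 may be set, so the value is either 0 or 1.  */
  return (nonzero_mask & ~UINT32_C (1)) == 0;
}

/* Multiply form, as in f above.  */
int mul_form (int m, int c) { return (m & 1) * c; }

/* Select form, as in g above.  */
int sel_form (int m, int c) { return (m & 1) != 0 ? c : 0; }
```

When the first operand is known to be 0 or 1, both forms return the
same value, which is why the simplify can reuse @0 as the "false" arm
of the cond: in that arm @0 is itself zero.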

Richard.

> Wilco


[PATCH 2/7] sparc: put VIS compare instructions in it's own insn type and adjust DFAs

2017-06-29 Thread Jose E. Marchesi
This patch introduces a new value, `viscmp', for the insn `type'
attribute.  VIS comparison insns are adapted to use it, and the DFA
schedulers are updated accordingly.

gcc/ChangeLog:

* config/sparc/sparc.md ("type"): New insn type viscmp.
("fcmp_vis"): Set insn type to
viscmp.
("fpcmp8_vis"): Likewise.
("fucmp8_vis"): Likewise.
("fpcmpu_vis"): Likewise.
* config/sparc/niagara7.md ("n7_vis_logical_v3pipe"): Handle
viscmp.
("n7_vis_logical_11cycle"): Likewise.
* config/sparc/niagara4.md ("n4_vis_logical"): Likewise.
* config/sparc/niagara2.md ("niag3_vis"): Likewise.
* config/sparc/niagara.md ("niag_vis"): Likewise.
* config/sparc/ultra3.md ("us3_fga"): Likewise.
* config/sparc/ultra1_2.md ("us1_fga_double"): Likewise.
---
 gcc/ChangeLog| 17 +
 gcc/config/sparc/niagara.md  |  2 +-
 gcc/config/sparc/niagara2.md |  4 ++--
 gcc/config/sparc/niagara4.md |  5 +++--
 gcc/config/sparc/niagara7.md |  4 ++--
 gcc/config/sparc/sparc.md| 15 ++-
 gcc/config/sparc/ultra1_2.md |  8 
 gcc/config/sparc/ultra3.md   |  2 +-
 8 files changed, 36 insertions(+), 21 deletions(-)

diff --git a/gcc/config/sparc/niagara.md b/gcc/config/sparc/niagara.md
index f9a1f6d..a8e23b8 100644
--- a/gcc/config/sparc/niagara.md
+++ b/gcc/config/sparc/niagara.md
@@ -114,5 +114,5 @@
  */
 (define_insn_reservation "niag_vis" 8
   (and (eq_attr "cpu" "niagara")
-(eq_attr "type" 
"fga,visl,vismv,fgm_pack,fgm_mul,pdist,edge,edgen,gsr,array,bmask"))
+(eq_attr "type" 
"fga,visl,viscmp,vismv,fgm_pack,fgm_mul,pdist,edge,edgen,gsr,array,bmask"))
   "niag_pipe*8")
diff --git a/gcc/config/sparc/niagara2.md b/gcc/config/sparc/niagara2.md
index 34ee630..3190d55 100644
--- a/gcc/config/sparc/niagara2.md
+++ b/gcc/config/sparc/niagara2.md
@@ -111,10 +111,10 @@
 
 (define_insn_reservation "niag2_vis" 6
   (and (eq_attr "cpu" "niagara2")
-(eq_attr "type" 
"fga,vismv,visl,fgm_pack,fgm_mul,pdist,edge,edgen,array,bmask,gsr"))
+(eq_attr "type" 
"fga,vismv,visl,viscmp,fgm_pack,fgm_mul,pdist,edge,edgen,array,bmask,gsr"))
   "niag2_pipe*6")
 
 (define_insn_reservation "niag3_vis" 9
   (and (eq_attr "cpu" "niagara3")
-(eq_attr "type" 
"fga,vismv,visl,fgm_pack,fgm_mul,pdist,pdistn,edge,edgen,array,bmask,gsr"))
+(eq_attr "type" 
"fga,vismv,visl,viscmp,fgm_pack,fgm_mul,pdist,pdistn,edge,edgen,array,bmask,gsr"))
   "niag2_pipe*9")
diff --git a/gcc/config/sparc/niagara4.md b/gcc/config/sparc/niagara4.md
index cc1bb75..a3417d2 100644
--- a/gcc/config/sparc/niagara4.md
+++ b/gcc/config/sparc/niagara4.md
@@ -90,8 +90,9 @@
 
 (define_insn_reservation "n4_vis_logical" 3
   (and (eq_attr "cpu" "niagara4")
-(and (eq_attr "type" "visl,pdistn")
-  (eq_attr "fptype" "double")))
+   (ior (and (eq_attr "type" "visl,pdistn")
+ (eq_attr "fptype" "double"))
+(eq_attr "type" "viscmp")))
   "n4_slot1, nothing*2")
 
 (define_insn_reservation "n4_vis_logical_11cycle" 11
diff --git a/gcc/config/sparc/niagara7.md b/gcc/config/sparc/niagara7.md
index 3dc8f9e..3f46198 100644
--- a/gcc/config/sparc/niagara7.md
+++ b/gcc/config/sparc/niagara7.md
@@ -123,13 +123,13 @@
 
 (define_insn_reservation "n7_vis_logical_v3pipe" 11
   (and (eq_attr "cpu" "niagara7")
-(and (eq_attr "type" "visl,pdistn")
+(and (eq_attr "type" "visl,viscmp,pdistn")
  (eq_attr "v3pipe" "true")))
   "n7_slot1, nothing*2")
 
 (define_insn_reservation "n7_vis_logical_11cycle" 11
   (and (eq_attr "cpu" "niagara7")
-(and (eq_attr "type" "visl")
+(and (eq_attr "type" "visl,viscmp")
   (eq_attr "v3pipe" "false")))
   "n7_slot1, nothing*10")
 
diff --git a/gcc/config/sparc/sparc.md b/gcc/config/sparc/sparc.md
index da23060..04da8ae 100644
--- a/gcc/config/sparc/sparc.md
+++ b/gcc/config/sparc/sparc.md
@@ -281,7 +281,8 @@
fpcmp,
fpmul,fpdivs,fpdivd,
fpsqrts,fpsqrtd,
-   fga,visl,vismv,fgm_pack,fgm_mul,pdist,pdistn,edge,edgen,gsr,array,bmask,
+   fga,visl,vismv,viscmp,
+   fgm_pack,fgm_mul,pdist,pdistn,edge,edgen,gsr,array,bmask,
cmove,
ialuX,
multi,savew,flushw,iflush,trap,lzd"
@@ -9059,8 +9060,7 @@
 UNSPEC_FCMP))]
   "TARGET_VIS"
   "fcmp\t%1, %2, %0"
-  [(set_attr "type" "visl")
-   (set_attr "fptype" "double")
+  [(set_attr "type" "viscmp")
(set_attr "v3pipe" "true")])
 
 (define_insn "fpcmp8_vis"
@@ -9070,8 +9070,7 @@
 UNSPEC_FCMP))]
   "TARGET_VIS4"
   "fpcmp8\t%1, %2, %0"
-  [(set_attr "type" "visl")
-   (set_attr "fptype" "double")])
+  [(set_attr "type" "viscmp")])
 
 (define_expand "vcond"
   [(match_operand:GCM 0 "register_operand" "")
@@ -9427,8 +9426,7 @@
 UNSPEC_FUCMP))]
   "TARGET_VIS3"
   "fucmp8\t%1, %2, %0"
-  [(set_attr "type" "visl")
-   (set_attr "v3pipe" "true")])
+  [(set_attr "type" "viscmp")])
 
 (define_insn "fpcmpu_vis"
   [(set (match_operand:P 0 "register_operand" "=r")
@@ -9437,8 +9435,7 @@
 UNSPE

[PATCH 5/7] sparc: basic support for the SPARC M8 cpu

2017-06-29 Thread Jose E. Marchesi
This patch adds the following support for the SPARC M8 cpu, which
implements the Oracle SPARC Architecture 2017:

- Support for -mcpu=m8 and -mtune=m8.
- Definition of cpu target macros and specs in the backend.
- Tuning of backend parameters for the M8.
- Addition of a new cpu type m8 in the machine description.

gcc/ChangeLog:

* config.gcc: Handle m8 in --with-{cpu,tune} options.
* config.in: Add HAVE_AS_SPARC6 define.
* config/sparc/driver-sparc.c (cpu_names): Add entry for the SPARC
M8.
* config/sparc/sol2.h (CPP_CPU64_DEFAULT_SPEC): Define for
TARGET_CPU_m8.
(ASM_CPU32_DEFAULT_SPEC): Likewise.
(CPP_CPU_SPEC): Handle m8.
(ASM_CPU_SPEC): Likewise.
* config/sparc/sparc-opts.h (enum processor_type): Add
PROCESSOR_M8.
* config/sparc/sparc.c (m8_costs): New struct.
(sparc_option_override): Handle TARGET_CPU_m8.
(sparc32_initialize_trampoline): Likewise.
(sparc64_initialize_trampoline): Likewise.
(sparc_issue_rate): Likewise.
(sparc_register_move_cost): Likewise.
* config/sparc/sparc.h (TARGET_CPU_m8): Define.
(CPP_CPU64_DEFAULT_SPEC): Define for M8.
(ASM_CPU64_DEFAULT_SPEC): Likewise.
(CPP_CPU_SPEC): Handle M8.
(ASM_CPU_SPEC): Likewise.
(AS_M8_FLAG): Define.
* config/sparc/sparc.md: Add m8 to the cpu attribute.
* config/sparc/sparc.opt: New option -mcpu=m8 for sparc targets.
* configure.ac (HAVE_AS_SPARC6): Check for assembler support for
M8 instructions.
* configure: Regenerate.
* doc/invoke.texi (SPARC Options): Document -mcpu=m8 and
-mtune=m8.
---
 gcc/ChangeLog   | 33 +
 gcc/config.gcc  |  2 +-
 gcc/config.in   |  4 +++
 gcc/config/sparc/driver-sparc.c |  1 +
 gcc/config/sparc/sol2.h | 14 +++--
 gcc/config/sparc/sparc-opts.h   |  1 +
 gcc/config/sparc/sparc.c| 65 ++---
 gcc/config/sparc/sparc.h| 16 +-
 gcc/config/sparc/sparc.md   |  3 +-
 gcc/config/sparc/sparc.opt  |  3 ++
 gcc/configure   | 35 ++
 gcc/configure.ac| 12 
 gcc/doc/invoke.texi | 12 
 13 files changed, 181 insertions(+), 20 deletions(-)

diff --git a/gcc/config.gcc b/gcc/config.gcc
index a97bbc8..090b81f 100644
--- a/gcc/config.gcc
+++ b/gcc/config.gcc
@@ -4425,7 +4425,7 @@ case "${target}" in
| sparclite | f930 | f934 | sparclite86x \
| sparclet | tsc701 \
| v9 | ultrasparc | ultrasparc3 | niagara | niagara2 \
-   | niagara3 | niagara4 | niagara7)
+   | niagara3 | niagara4 | niagara7 | m8)
# OK
;;
*)
diff --git a/gcc/config.in b/gcc/config.in
index bf2aa7b..bff886a 100644
--- a/gcc/config.in
+++ b/gcc/config.in
@@ -660,6 +660,10 @@
 #undef HAVE_AS_SPARC5_VIS4
 #endif
 
+/* Define if your assembler supports SPARC6 instructions. */
+#ifndef USED_FOR_TARGET
+#undef HAVE_AS_SPARC6
+#endif
 
 /* Define if your assembler and linker support GOTDATA_OP relocs. */
 #ifndef USED_FOR_TARGET
diff --git a/gcc/config/sparc/driver-sparc.c b/gcc/config/sparc/driver-sparc.c
index b96ef47..0c25d6c 100644
--- a/gcc/config/sparc/driver-sparc.c
+++ b/gcc/config/sparc/driver-sparc.c
@@ -79,6 +79,7 @@ static const struct cpu_names {
 #endif
   { "SPARC-M7","niagara7" },
   { "SPARC-S7","niagara7" },
+  { "SPARC-M8","m8" },
   { NULL,  NULL }
   };
 
diff --git a/gcc/config/sparc/sol2.h b/gcc/config/sparc/sol2.h
index 8a50bfe..b8177c0 100644
--- a/gcc/config/sparc/sol2.h
+++ b/gcc/config/sparc/sol2.h
@@ -174,13 +174,22 @@ along with GCC; see the file COPYING3.  If not see
 #define ASM_CPU64_DEFAULT_SPEC AS_SPARC64_FLAG AS_NIAGARA7_FLAG
 #endif
 
+#if TARGET_CPU_DEFAULT == TARGET_CPU_m8
+#undef CPP_CPU64_DEFAULT_SPEC
+#define CPP_CPU64_DEFAULT_SPEC ""
+#undef ASM_CPU32_DEFAULT_SPEC
+#define ASM_CPU32_DEFAULT_SPEC AS_SPARC32_FLAG AS_M8_FLAG
+#undef ASM_CPU64_DEFAULT_SPEC
+#define ASM_CPU64_DEFAULT_SPEC AS_SPARC64_FLAG AS_M8_FLAG
+#endif
+
 #undef CPP_CPU_SPEC
 #define CPP_CPU_SPEC "\
 %{mcpu=sparclet|mcpu=tsc701:-D__sparclet__} \
 %{mcpu=sparclite|mcpu-f930|mcpu=f934:-D__sparclite__} \
 %{mcpu=v8:" DEF_ARCH32_SPEC("-D__sparcv8") "} \
 %{mcpu=supersparc:-D__supersparc__ " DEF_ARCH32_SPEC("-D__sparcv8") "} \
-%{mcpu=v9|mcpu=ultrasparc|mcpu=ultrasparc3|mcpu=niagara|mcpu=niagara2|mcpu=niagara3|mcpu=niagara4|mcpu=niagara7:"
 DEF_ARCH32_SPEC("-D__sparcv8") "} \
+%{mcpu=v9|mcpu=ultrasparc|mcpu=ultrasparc3|mcpu=niagara|mcpu=niagara2|mcpu=niagara3|mcpu=niagara4|mcpu=niagara7|mcpu=m8:"
 DEF_ARCH32_SPEC("-D__sparcv8") "} \
 %{!mcpu*:%(cpp_cpu_default)} \
 "
 
@@ -290

[PATCH 4/7] sparc: reworked M7 DFA based on instruction subtypes

2017-06-29 Thread Jose E. Marchesi
This patch reworks the M7 DFA scheduler to use instruction subtypes.  It
also removes the v3pipe insn attribute from sparc.md, as it is no longer
needed.

gcc/ChangeLog:

* config/sparc/niagara7.md: Rework the DFA scheduler to use insn
subtypes.
* config/sparc/sparc.md: Remove the `v3pipe' insn attribute.
("*movdi_insn_sp32"): Likewise.
("*movsi_insn"): Likewise.
("*movdi_insn_sp64"): Likewise.
("*movsf_insn"): Likewise.
("*movdf_insn_sp32"): Likewise.
("*movdf_insn_sp64"): Likewise.
("*zero_extendsidi2_insn_sp64"): Likewise.
("*sign_extendsidi2_insn"): Likewise.
("*mov_insn"): Likewise.
("*mov_insn_sp64"): Likewise.
("*mov_insn_sp32"): Likewise.
("3"): Likewise.
("3"): Likewise.
("*not_3"): Likewise.
("*nand_vis"): Likewise.
("*_not1_vis"): Likewise.
("*_not2_vis"): Likewise.
("one_cmpl2"): Likewise.
("faligndata_vis"): Likewise.
("alignaddrsi_vis"): Likewise.
("alignaddrdi_vis"): Likewise.
("alignaddrlsi_vis"): Likewise.
("alignaddrldi_vis"): Likewise.
("fcmp_vis"): Likewise.
("bmaskdi_vis"): Likewise.
("bmasksi_vis"): Likewise.
("bshuffle_vis"): Likewise.
("cmask8_vis"): Likewise.
("cmask16_vis"): Likewise.
("cmask32_vis"): Likewise.
("pdistn_vis"): Likewise.
("3"): Likewise.
---
 gcc/ChangeLog|  38 +
 gcc/config/sparc/niagara7.md | 181 ++-
 gcc/config/sparc/sparc.md|  93 +++---
 3 files changed, 192 insertions(+), 120 deletions(-)

diff --git a/gcc/config/sparc/niagara7.md b/gcc/config/sparc/niagara7.md
index 3f46198..23b6707 100644
--- a/gcc/config/sparc/niagara7.md
+++ b/gcc/config/sparc/niagara7.md
@@ -19,64 +19,120 @@
 
 (define_automaton "niagara7_0")
 
-(define_cpu_unit "n7_slot0,n7_slot1,n7_slot2" "niagara7_0")
-(define_reservation "n7_single_issue" "n7_slot0 + n7_slot1 + n7_slot2")
+;; The S4 core has a dual-issue queue.  This queue is divided into two
+;; slots.  One instruction can be issued each cycle to each slot, and
+;; up to 2 instructions are committed each cycle.  Each slot serves
+;; several execution units, as depicted below:
+;;
+;;
+;; m7_slot0 - Integer unit.
+;;  - Load/Store unit.
+;; === QUEUE ==>
+;;
+;; m7_slot1 - Integer unit.
+;;  - Branch unit.
+;;  - Floating-point and graphics unit.
+;;  - 3-cycles crypto unit.
 
-(define_cpu_unit "n7_load_store" "niagara7_0")
+(define_cpu_unit "n7_slot0,n7_slot1" "niagara7_0")
+
+;; Some instructions stall the pipeline and avoid any other
+;; instruction to be issued in the same cycle.  We assume the same for
+;; multi-instruction insns.
+
+(define_reservation "n7_single_issue" "n7_slot0 + n7_slot1")
 
 (define_insn_reservation "n7_single" 1
   (and (eq_attr "cpu" "niagara7")
 (eq_attr "type" "multi,savew,flushw,trap"))
   "n7_single_issue")
 
-(define_insn_reservation "n7_iflush" 27
-  (and (eq_attr "cpu" "niagara7")
-   (eq_attr "type" "iflush"))
-  "(n7_slot0 | n7_slot1), nothing*26")
+;; Most of the instructions executing in the integer unit have a
+;; latency of 1.
 
 (define_insn_reservation "n7_integer" 1
   (and (eq_attr "cpu" "niagara7")
 (eq_attr "type" "ialu,ialuX,shift,cmove,compare"))
   "(n7_slot0 | n7_slot1)")
 
+;; Flushing the instruction memory takes 27 cycles.
+
+(define_insn_reservation "n7_iflush" 27
+  (and (eq_attr "cpu" "niagara7")
+   (eq_attr "type" "iflush"))
+  "(n7_slot0 | n7_slot1), nothing*26")
+
+;; The integer multiplication instructions have a latency of 12 cycles
+;; and execute in the integer unit.
+;;
+;; Likewise for array*, edge* and pdistn instructions.
+
 (define_insn_reservation "n7_imul" 12
   (and (eq_attr "cpu" "niagara7")
-(eq_attr "type" "imul"))
-  "n7_slot1, nothing*11")
+(eq_attr "type" "imul,array,edge,edgen,pdistn"))
+  "(n7_slot0 | n7_slot1), nothing*11")
+
+;; The integer division instructions have a latency of 35 cycles and
+;; execute in the integer unit.
 
 (define_insn_reservation "n7_idiv" 35
   (and (eq_attr "cpu" "niagara7")
 (eq_attr "type" "idiv"))
-  "n7_slot1, nothing*34")
+  "(n7_slot0 | n7_slot1), nothing*34")
+
+;; Both integer and floating-point load instructions have a latency of
+;; 5 cycles, and execute in the slot0.
+;;
+;; The prefetch instruction also executes in the load/store unit, but
+;; its latency is only 1 cycle.
 
 (define_insn_reservation "n7_load" 5
   (and (eq_attr "cpu" "niagara7")
-(eq_attr "type" "load,fpload,sload"))
-  "(n7_slot0 + n7_load_store), nothing*4")
+   (ior (eq_attr "type" "fpload,sload")
+(and (eq_attr "type" "load")
+ (eq_attr "subtype" "regular"
+  "n7_slot0, nothing*4")
+
+(define_in

[PATCH 6/7] sparc: support for VIS4B instructions

2017-06-29 Thread Jose E. Marchesi
This patch adds support for the following VIS instructions, which are
introduced in the Oracle SPARC Architecture 2017 and implemented by the
SPARC M8 cpu:

- Dictionary unpack.
- Partitioned compare with shifted result.
- Unsigned partitioned compare with shifted result.
- Partitioned dual-equal compare with shifted result.
- Partitioned unsigned range compare with shifted result.

The facilities introduced are:

- A new option -mvis4b.
- Compiler built-ins for the above mentioned instructions.

Tests and documentation are also provided.

gcc/ChangeLog:

* config/sparc/sparc.opt: New option -mvis4b.
* config/sparc/sparc.c (dump_target_flag_bits): Handle MASK_VIS4B.
(sparc_option_override): Handle VIS4B.
(enum sparc_builtins): Define
SPARC_BUILTIN_DICTUNPACK{8,16,32},
SPARC_BUILTIN_FPCMP{LE,GT,EQ,NE}{8,16,32}SHL,
SPARC_BUILTIN_FPCMPU{LE,GT}{8,16,32}SHL,
SPARC_BUILTIN_FPCMPDE{8,16,32}SHL and
SPARC_BUILTIN_FPCMPUR{8,16,32}SHL.
(check_constant_argument): New function.
(sparc_vis_init_builtins): Define builtins
__builtin_vis_dictunpack{8,16,32},
__builtin_vis_fpcmp{le,gt,eq,ne}{8,16,32}shl,
__builtin_vis_fpcmpu{le,gt}{8,16,32}shl,
__builtin_vis_fpcmpde{8,16,32}shl and
__builtin_vis_fpcmpur{8,16,32}shl.
(sparc_expand_builtin): Check that the constant operands to
__builtin_vis_fpcmp*shl and __builtin_vis_dictunpack* are indeed
constant and in range.
* config/sparc/sparc-c.c (sparc_target_macros): Handle
TARGET_VIS4B.
* config/sparc/sparc.h (SPARC_IMM2_P): Define.
(SPARC_IMM5_P): Likewise.
* config/sparc/sparc.md (cpu_feature): Add new feature "vis4b".
(enabled): Handle vis4b.
(UNSPEC_DICTUNPACK): New unspec.
(UNSPEC_FPCMPSHL): Likewise.
(UNSPEC_FPUCMPSHL): Likewise.
(UNSPEC_FPCMPDESHL): Likewise.
(UNSPEC_FPCMPURSHL): Likewise.
(cpu_feature): New CPU feature `vis4b'.
(dictunpack{8,16,32}): New insns.
(FPCSMODE): New mode iterator.
(fpcscond): New code iterator.
(fpcsucond): Likewise.
(fpcmp{le,gt,eq,ne}{8,16,32}{si,di}shl): New insns.
(fpcmpu{le,gt}{8,16,32}{si,di}shl): Likewise.
(fpcmpde{8,16,32}{si,di}shl): Likewise.
(fpcmpur{8,16,32}{si,di}shl): Likewise.
* config/sparc/constraints.md: Define constraints `q' for unsigned
2-bit integer constants and `t' for unsigned 5-bit integer
constants.
* config/sparc/predicates.md (imm5_operand_dictunpack8): New
predicate.
(imm5_operand_dictunpack16): Likewise.
(imm5_operand_dictunpack32): Likewise.
(imm2_operand): Likewise.
* doc/invoke.texi (SPARC Options): Document -mvis4b.
* doc/extend.texi (SPARC VIS Built-in Functions): Document the
dictunpack* and fpcmp*shl builtins.

gcc/testsuite/ChangeLog:

* gcc.target/sparc/dictunpack.c: New file.
* gcc.target/sparc/fpcmpdeshl.c: Likewise.
* gcc.target/sparc/fpcmpshl.c: Likewise.
* gcc.target/sparc/fpcmpurshl.c: Likewise.
* gcc.target/sparc/fpcmpushl.c: Likewise.
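
The `q' and `t' constraints and the SPARC_IMM2_P / SPARC_IMM5_P macros
listed above all check that a built-in operand fits an unsigned 2-bit or
5-bit field.  A minimal sketch of that kind of check in plain C follows.
The helper names are hypothetical and are not the definitions from
sparc.h; the dictunpack predicates may well accept narrower ranges.

```c
#include <stdbool.h>

/* Hypothetical helper: does VALUE fit in an unsigned field of BITS bits?
   BITS is assumed to be between 1 and 62.  */
static bool
fits_unsigned_field (long long value, unsigned bits)
{
  return value >= 0 && value < (1LL << bits);
}

/* Illustrative stand-ins for 2-bit and 5-bit immediate checks.  */
static bool
uimm2_p (long long value)
{
  return fits_unsigned_field (value, 2);
}

static bool
uimm5_p (long long value)
{
  return fits_unsigned_field (value, 5);
}
```

With these helpers, 3 passes the 2-bit check and 4 does not; 31 passes the
5-bit check and 32 does not.
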
---
 gcc/ChangeLog   |  53 ++
 gcc/config/sparc/constraints.md |  12 +-
 gcc/config/sparc/predicates.md  |  27 +++
 gcc/config/sparc/sparc-c.c  |   7 +-
 gcc/config/sparc/sparc.c| 247 +++-
 gcc/config/sparc/sparc.h|   4 +
 gcc/config/sparc/sparc.md   |  73 +++-
 gcc/config/sparc/sparc.opt  |   4 +
 gcc/doc/extend.texi |  39 +
 gcc/doc/invoke.texi |  13 ++
 gcc/testsuite/ChangeLog |   8 +
 gcc/testsuite/gcc.target/sparc/dictunpack.c |  25 +++
 gcc/testsuite/gcc.target/sparc/fpcmpdeshl.c |  25 +++
 gcc/testsuite/gcc.target/sparc/fpcmpshl.c   |  81 +
 gcc/testsuite/gcc.target/sparc/fpcmpurshl.c |  25 +++
 gcc/testsuite/gcc.target/sparc/fpcmpushl.c  |  43 +
 16 files changed, 677 insertions(+), 9 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/sparc/dictunpack.c
 create mode 100644 gcc/testsuite/gcc.target/sparc/fpcmpdeshl.c
 create mode 100644 gcc/testsuite/gcc.target/sparc/fpcmpshl.c
 create mode 100644 gcc/testsuite/gcc.target/sparc/fpcmpurshl.c
 create mode 100644 gcc/testsuite/gcc.target/sparc/fpcmpushl.c

diff --git a/gcc/config/sparc/constraints.md b/gcc/config/sparc/constraints.md
index 7c9ef74..cff5a61 100644
--- a/gcc/config/sparc/constraints.md
+++ b/gcc/config/sparc/constraints.md
@@ -19,7 +19,7 @@
 
 ;;; Unused letters:
 ;;; B
-;;;ajklq  tuv xyz
+;;;ajkluv xyz
 
 
 ;; Register constraints
@@ -58,6 +58,16 @@
 
 ;; Integer constant constraints
 
+(define_constraint "q"
+ "Unsigned 2-bit integer constant"
+  (and (match_code "const_

[PATCH 7/7] sparc: M8 DFA scheduler

2017-06-29 Thread Jose E. Marchesi
This patch adds a DFA scheduler modelling the core S5 in the SPARC M8
processors.

gcc/ChangeLog:

* config/sparc/m8.md: New file.
* config/sparc/sparc.md: Include m8.md.
---
 gcc/ChangeLog |   5 +
 gcc/config/sparc/m8.md| 242 ++
 gcc/config/sparc/sparc.md |   1 +
 3 files changed, 248 insertions(+)
 create mode 100644 gcc/config/sparc/m8.md

diff --git a/gcc/config/sparc/m8.md b/gcc/config/sparc/m8.md
new file mode 100644
index 000..f0fe1b2
--- /dev/null
+++ b/gcc/config/sparc/m8.md
@@ -0,0 +1,242 @@
+;; Scheduling description for the SPARC M8.
+;;   Copyright (C) 2017 Free Software Foundation, Inc.
+;;
+;; This file is part of GCC.
+;;
+;; GCC is free software; you can redistribute it and/or modify
+;; it under the terms of the GNU General Public License as published by
+;; the Free Software Foundation; either version 3, or (at your option)
+;; any later version.
+;;
+;; GCC is distributed in the hope that it will be useful,
+;; but WITHOUT ANY WARRANTY; without even the implied warranty of
+;; MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+;; GNU General Public License for more details.
+;;
+;; You should have received a copy of the GNU General Public License
+;; along with GCC; see the file COPYING3.  If not see
+;; .
+
+;; Things to improve:
+;;
+;; - Store instructions are implemented by micro-ops, one of which
+;;   generates the store address and is executed in the store address
+;;   generation unit in the slot0.  We need to model that.
+;;
+;; - There are two V3 pipes connected to different slots.  The current
+;;   implementation assumes that all the instructions executing in a
+;;   V3 pipe are issued to the unit in slot3.
+;;
+;; - Single-issue ALU operations incur an additional cycle of latency to
+;;   slot 0 and slot 1 instructions.  This is not currently reflected
+;;   in the DFA.
+
+(define_automaton "m8_0")
+
+;; The S5 core has two dual-issue queues, PQLS and PQEX.  Each queue
+;; is divided into two slots: PQLS corresponds to slots 0 and 1, and
+;; PQEX corresponds to slots 2 and 3.  The core can issue 4
+;; instructions per-cycle, and up to 4 instructions are committed each
+;; cycle.
+;;
+;;
+;;   m8_slot0  - Load Unit.
+;; - Store address gen. Unit.
+;;   
+;;
+;;   === PQLS ==>m8_slot1  - Store data unit.
+;; - Branch unit.
+;;
+;; 
+;;   === PQEX ==>m8_slot2  - Integer Unit (EXU2). 
+;; - 3-cycles Crypto Unit (SPU2).
+;; 
+;;   m8_slot3  - Integer Unit (EXU3).
+;; - 3-cycles Crypto Unit (SPU3).
+;; - Floating-point and graphics unit (FPG).
+;; - Long-latency Crypto Unit.
+;; - Oracle Numbers Unit (ONU).
+
+(define_cpu_unit "m8_slot0,m8_slot1,m8_slot2,m8_slot3" "m8_0")
+
+;; Some instructions stall the pipeline and avoid any other
+;; instruction to be issued in the same cycle.  We assume the same for
+;; multi-instruction insns.
+
+(define_reservation "m8_single_issue" "m8_slot0 + m8_slot1 + m8_slot2 + m8_slot3")
+
+(define_insn_reservation "m8_single" 1
+  (and (eq_attr "cpu" "m8")
+   (eq_attr "type" "multi,savew,flushw,trap,bmask"))
+  "m8_single_issue")
+
+;; Most of the instructions executing in the integer units have a
+;; latency of 1.
+
+(define_insn_reservation "m8_integer" 1
+  (and (eq_attr "cpu" "m8")
+   (eq_attr "type" "ialu,ialuX,shift,cmove,compare,bmask"))
+  "(m8_slot2 | m8_slot3)")
+
+;; Flushing the instruction memory takes 27 cycles.
+
+
+(define_insn_reservation "m8_iflush" 27
+  (and (eq_attr "cpu" "m8")
+   (eq_attr "type" "iflush"))
+  "(m8_slot2 | m8_slot3), nothing*26")
+
+;; The integer multiplication instructions have a latency of 10 cycles
+;; and execute in integer units.
+;;
+;; Likewise for array*, edge* and pdistn instructions.
+;;
+;; However, the latency is only 9 cycles if the consumer of the
+;; operation is also capable of 9 cycles latency.  We model this with
+;; a bypass.
+
+(define_insn_reservation "m8_imul" 10
+  (and (eq_attr "cpu" "m8")
+   (eq_attr "type" "imul,array,edge,edgen,pdistn"))
+  "(m8_slot2 | m8_slot3), nothing*12")
+
+(define_bypass 9 "m8_imul" "m8_imul")
+
+;; The integer division instructions `sdiv' and `udivx' have a latency
+;; of 30 cycles and execute in integer units.
+
+(define_insn_reservation "m8_idiv" 30
+  (and (eq_attr "cpu" "m8")
+   (eq_attr "type" "idiv"))
+  "(m8_slot2 | m8_slot3), nothing*29")
+
+;; Both integer and floating-point load instructions have a latency of
+;; only 3 cycles,

Re: [PATCH v2][ASAN] Implement dynamic allocas/VLAs sanitization.

2017-06-29 Thread Jakub Jelinek
Hi!

Sorry for the review delay.

On Mon, Jun 26, 2017 at 03:49:23PM +0300, Maxim Ostapenko wrote:
>   (handle_builtin_stackrestore): Likewise.

The function is called with _ between stack and restore.

>   * match.pd: Add new pattern.

Unless the patch relies on this, I think it should be posted separately
and reviewed by Richard.

> @@ -245,6 +246,7 @@ along with GCC; see the file COPYING3.  If not see
>  static unsigned HOST_WIDE_INT asan_shadow_offset_value;
>  static bool asan_shadow_offset_computed;
>  static vec sanitized_sections;
> +static tree last_alloca_addr = NULL_TREE;

You are shadowing this variable in multiple places.  Either rename it to
something different, or rename the results of get_last_alloca_addr.
And the " = NULL_TREE" part is not needed.

>  
>  /* Set of variable declarations that are going to be guarded by
> use-after-scope sanitizer.  */
> @@ -529,11 +531,183 @@ get_mem_ref_of_assignment (const gassign *assignment,
>return true;
>  }
>  
> +/* Return address of last allocated dynamic alloca.  */
> +
> +static tree
> +get_last_alloca_addr ()
> +{
> +  if (last_alloca_addr)
> +return last_alloca_addr;
> +
> +  gimple_seq seq = NULL;
> +  gassign *g;
> +
> +  last_alloca_addr = create_tmp_reg (ptr_type_node, "last_alloca_addr");
> +  g = gimple_build_assign (last_alloca_addr, NOP_EXPR,
> +build_int_cst (ptr_type_node, 0));

Instead of build_int_cst (ptr_type_node, 0) you should use
null_pointer_node.  And the NOP_EXPR there is just wrong, either it
should be gimple_build_assign (last_alloca_addr, null_pointer_node);
or gimple_build_assign (last_alloca_addr, INTEGER_CST, null_pointer_node);

> +  gimple_seq_add_stmt_without_update (&seq, g);

Why the seq stuff at all?  You have a single stmt you want to insert on
edge.

> +
> +  edge e = single_succ_edge (ENTRY_BLOCK_PTR_FOR_FN (cfun));
> +  gsi_insert_seq_on_edge_immediate (e, seq);

So just use here
  gsi_insert_on_edge_immediate (e, g);
instead.

> +  return last_alloca_addr;
> +}
> +
> +/* Insert __asan_allocas_unpoison (top, bottom) call after
> +   __builtin_stack_restore (new_sp) call.
> +   The pseudocode of this routine should look like this:
> + __builtin_stack_restore (new_sp);
> + top = last_alloca_addr;
> + bot = virtual_dynamic_stack_rtx;
> + __asan_allocas_unpoison (top, bottom);
> + last_alloca_addr = new_sp;

The comment doesn't seem to agree with what you actually implement.
There is no virtual_dynamic_stack_rtx during the asan pass, it is there
only during expansion until the virtual regs are instantiated in the next
pass.  Furthermore, you have bot variable, but then use bottom.

> +  tree last_alloca_addr = get_last_alloca_addr ();

Here is the shadowing I talked about.

> +  tree restored_stack = gimple_call_arg (call, 0);
> +  tree fn = builtin_decl_implicit (BUILT_IN_ASAN_ALLOCAS_UNPOISON);
> +  gimple *g = gimple_build_call (fn, 2, last_alloca_addr, restored_stack);

Here you clearly use the first argument of __builtin_stack_restore, which
is that new_sp.

> +  gimple_seq_add_stmt_without_update (&seq, g);

Why the messing up with sequences?  Just insert the stmt immediately in,
and the others as well.

> +  g = gimple_build_assign (last_alloca_addr, NOP_EXPR, restored_stack);

This is again wrong, here you really don't know what restored_stack is,
it could be SSA_NAME, but also something different, so you should use
gimple_build_assign (last_alloca_addr, restored_stack);
and let it figure out the rhs code.

> +  /* Extract lower bits from old_size.  */
> +  wide_int size_nonzero_bits = get_nonzero_bits (old_size);
> +  wide_int rz_mask
> += wi::uhwi (redzone_mask, wi::get_precision (size_nonzero_bits));
> +  wide_int old_size_lower_bits = wi::bit_and (size_nonzero_bits, rz_mask);
> +
> +  /* If alloca size is aligned to ASAN_RED_ZONE_SIZE, we don't need partial
> + redzone.  Otherwise, compute its size here.  */
> +  if (wi::ne_p (old_size_lower_bits, 0))
> +{
> +  /* misalign = size & (ASAN_RED_ZONE_SIZE - 1)
> + partial_size = ASAN_RED_ZONE_SIZE - misalign.  */
> +  g = gimple_build_assign (make_ssa_name (size_type_node, NULL),
> +BIT_AND_EXPR, old_size, alloca_rz_mask);
> +  gimple_seq_add_stmt_without_update (&seq, g);
> +  tree misalign = gimple_assign_lhs (g);
> +  g = gimple_build_assign (make_ssa_name (size_type_node, NULL), MINUS_EXPR,
> +redzone_size, misalign);
> +  gimple_seq_add_stmt_without_update (&seq, g);

Again, why add the stmts into a seq first instead of just adding it
immediately into the IL?
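
For reference, the arithmetic described in the quoted comment
(misalign = size & (ASAN_RED_ZONE_SIZE - 1), partial_size =
ASAN_RED_ZONE_SIZE - misalign) can be sketched in plain C as below.  The
red zone size of 32 and the function name are assumptions made for
illustration only; this is not the GIMPLE the patch builds.

```c
#include <stddef.h>

/* Assumed red zone size; it must be a power of two for the mask to work.  */
#define RED_ZONE_SIZE 32u

/* Number of padding bytes needed to round SIZE up to a multiple of
   RED_ZONE_SIZE, or 0 if SIZE is already aligned.  */
static size_t
partial_redzone_size (size_t size)
{
  size_t misalign = size & (RED_ZONE_SIZE - 1);
  return misalign ? RED_ZONE_SIZE - misalign : 0;
}
```

An alloca of 40 bytes has misalign 8 and so needs a 24-byte partial
redzone, while a 64-byte alloca needs none.
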
> @@ -4962,6 +4962,20 @@ expand_builtin_alloca (tree exp)
>return result;
>  }
>  

Missing function comment here.

> +static rtx
> +expand_asan_emit_allocas_unpoison (tree exp)
> +{
> +  tree arg0 = CALL_EXPR_ARG (exp, 0);
> +  rtx top = expand_expr (arg0, NULL_RTX, GET_MODE (virtual_stack_dynamic_rtx),
> + 

[PATCH] Fix PR81249

2017-06-29 Thread Richard Biener

Bootstrap and regtest running on x86_64-unknown-linux-gnu.

Richard.

2017-06-29  Richard Biener  

PR tree-optimization/81249
* tree-vect-loop.c (vect_create_epilog_for_reduction): Convert
condition reduction result to original scalar type.

* g++.dg/torture/pr81249.C: New testcase.

Index: gcc/tree-vect-loop.c
===
--- gcc/tree-vect-loop.c(revision 249780)
+++ gcc/tree-vect-loop.c(working copy)
@@ -4833,12 +4858,9 @@ vect_create_epilog_for_reduction (vecu.fld[1].rt_rtx;
+   if (e)
+ c = 0;
+}
+}


[committed, PATCH] Add a testcase for PR ipa/80565

2017-06-29 Thread H.J. Lu
The testcase in

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80565

is passing now.  Check in the testcase and resolve it as fixed.

PR ipa/80565
* gcc.dg/torture/pr80565.c: New file.
---
 gcc/testsuite/ChangeLog|  5 +
 gcc/testsuite/gcc.dg/torture/pr80565.c | 41 ++
 2 files changed, 46 insertions(+)
 create mode 100644 gcc/testsuite/gcc.dg/torture/pr80565.c

diff --git a/gcc/testsuite/ChangeLog b/gcc/testsuite/ChangeLog
index c38422a..ad77af2 100644
--- a/gcc/testsuite/ChangeLog
+++ b/gcc/testsuite/ChangeLog
@@ -1,3 +1,8 @@
+2017-06-29  H.J. Lu  
+
+   PR ipa/80565
+   * gcc.dg/torture/pr80565.c: New file.
+
 2017-06-29  Bin Cheng  
 
PR tree-optimization/81196
diff --git a/gcc/testsuite/gcc.dg/torture/pr80565.c b/gcc/testsuite/gcc.dg/torture/pr80565.c
new file mode 100644
index 000..6662065
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/torture/pr80565.c
@@ -0,0 +1,41 @@
+/* { dg-do compile } */
+
+int a, b, c, e, h, j;
+char d;
+short f, g;
+static short fn2(int p1) {
+  for (;;)
+for (; g; g++)
+  if (p1)
+break;
+}
+
+static short fn3(void);
+static char fn4(char p1) {
+  int i;
+  for (; d;)
+f = 8;
+  for (; f; f = 0)
+for (; i; i++) {
+  j = 0;
+  for (; j; j++)
+;
+}
+}
+
+static short fn1(short p1) { fn2(b || fn3()); }
+
+short fn3(void) {
+  if (c) {
+fn4(e);
+h = 0;
+for (;; h++)
+  ;
+  }
+}
+
+int main() {
+  for (; a;)
+fn1(c);
+  return 0;
+}
-- 
2.9.4



Re: [PATCH, rs6000] Signed builtin support

2017-06-29 Thread Bill Schmidt
On Jun 28, 2017, at 4:48 PM, Segher Boessenkool wrote:
> 
> Hi Carl,
> 
> On Wed, Jun 28, 2017 at 08:37:35AM -0700, Carl Love wrote:
>> The following patch adds support for the vec_signed, vec_signede,
>> vec_signedo and vec_signed2 builtins.
> 
> Are those names ABI-dictated?

Unfortunately, yes.  I didn't like those names much, either, but it's where we 
ended up.

Bill



Re: [PATCH] Fix PR middle-end/81194, ICE during RTL pass: expand

2017-06-29 Thread Peter Bergner
On 6/29/17 3:51 AM, Richard Biener wrote:
> On Thu, Jun 29, 2017 at 10:34 AM, Richard Biener
> To answer myself the unreachable case vanishes at
> execute_cleanup_cfg_post_optimizing
> via group_case_labels.  find_taken_edge wouldn't handle this case either.
> 
> I am testing a patch fixing both - your patch should still go in.

Ok, committed as revision 249783.  Thanks.

Peter



Re: [PATCH-v3] [SPARC] Add a workaround for the LEON3FT store-store errata

2017-06-29 Thread Eric Botcazou
> This patch adds a workaround to the Sparc backend for the LEON3FT
> store-store errata. It is enabled when using the -mfix-ut699,
> -mfix-ut700, or -mfix-gr712rc flag.

Let's forget -mfix-gr712rc for now, -mfix-ut700 is enough I think.

> The workaround inserts NOP instructions to prevent the following two
> instruction sequences from being generated:
> 
> std -> stb/sth/st/std
> stb/sth/st -> any single non-store/load instruction -> stb/sth/st/std
> 
> The __FIX_B2BST define can be used to only enable workarounds in assembly
> code when the flag is used.

I'm not thrilled with this: it's undocumented, the other workarounds don't
have it, and I don't think that we really need it.
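
As an aside, the two store sequences quoted further up can be written as
a small checker over a simplified instruction stream.  This is only a
literal reading of the quoted description, with hypothetical type and
function names; it is not the logic of sparc_do_work_around_errata.

```c
#include <stdbool.h>
#include <stddef.h>

/* Simplified instruction classes: INSN_STORE stands for stb/sth/st,
   INSN_STORE_DOUBLE for std.  */
enum insn_kind { INSN_OTHER, INSN_LOAD, INSN_STORE, INSN_STORE_DOUBLE };

static bool
is_store (enum insn_kind k)
{
  return k == INSN_STORE || k == INSN_STORE_DOUBLE;
}

/* Return true if INSNS[0..N-1] contains either quoted sequence.  */
static bool
has_store_store_sequence (const enum insn_kind *insns, size_t n)
{
  for (size_t i = 0; i + 1 < n; i++)
    {
      /* std -> stb/sth/st/std  */
      if (insns[i] == INSN_STORE_DOUBLE && is_store (insns[i + 1]))
        return true;

      /* stb/sth/st -> one non-store/load insn -> stb/sth/st/std  */
      if (insns[i] == INSN_STORE
          && i + 2 < n
          && insns[i + 1] == INSN_OTHER
          && is_store (insns[i + 2]))
        return true;
    }
  return false;
}
```

A workaround pass would insert a NOP wherever such a checker fires, so
that neither pattern survives in the final instruction stream.
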

> See GRLIB-TN-0009, "LEON3FT Stale Cache Entry After Store with Data Tag
> Parity Error", for more information.
> 
> gcc/ChangeLog:
> 
> 2017-06-21  Daniel Cederman  
> 
>   * config/sparc/sparc.c (sparc_do_work_around_errata): Insert NOP
>   instructions to prevent sequences that can trigger the store-store
>   errata for certain LEON3FT processors.
>   (sparc_option_override): -mfix-ut699, -mfix-ut700, and
>   -mfix-gr712rc enables the errata workaround.
>   * config/sparc/sparc-c.c (sparc_target_macros): Define __FIX_B2BST
>   when errata workaround is enabled.
>   * config/sparc/sparc.md: Prevent stores in delay slot.
>   * config/sparc/sparc.opt: Add -mfix-ut700 and -mfix-gr712rc flag.
>   * doc/invoke.texi: Document -mfix-ut700 and -mfix-gr712rc flag.

OK for mainline and 7 branch modulo the above two remarks.

-- 
Eric Botcazou


Re: [PATCH] Fix pr80044, -static and -pie insanity, and pr81170

2017-06-29 Thread Alan Modra
Ping?  Linux startfile and endfile specs.
https://gcc.gnu.org/ml/gcc-patches/2017-06/msg01678.html

-- 
Alan Modra
Australia Development Lab, IBM


Re: [PATCH] Transform (m1 > m2) * d into m1> m2 ? d : 0

2017-06-29 Thread Jeff Law
On 06/29/2017 05:20 AM, Wilco Dijkstra wrote:
> Richard Biener wrote:
>> Hurugalawadi, Naveen wrote:
>>> The code (m1 > m2) * d code should be optimized as m1> m2 ? d : 0.
> 
>> What's the reason of this transform?  I expect that the HW multiplier
>> is quite fast given one operand is either zero or one and a multiplication
>> is a gimple operation that's better handled in optimizations than
>> COND_EXPRs which eventually expand to conditional code which
>> would be much slower.
> 
> Even really fast multipliers have several cycles latency, and this is 
> generally
> fixed irrespectively of the inputs. Maybe you were thinking about division?
And on some targets, just getting the arguments into the right register
bank is many cycles.  Think HPPA where integer multiply occurs in the
floating point unit.  Though I don't think that oddity should drive this
discussion.


> 
> Additionally integer multiply typically has much lower throughput than other 
> ALU operations like conditional move - a modern CPU may have 4 ALUs
> but only 1 multiplier, so removing redundant integer multiplies is always 
> good.
I'd tend to agree in general.

jeff
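
For readers following the thread, the two forms under discussion are
equivalent in C because a comparison yields exactly 0 or 1.  A minimal
sketch, with hypothetical function names:

```c
/* Multiply form: (m1 > m2) is 0 or 1, so the product is either 0 or d.  */
static int
mul_form (int m1, int m2, int d)
{
  return (m1 > m2) * d;
}

/* Select form: the same value without an integer multiply.  */
static int
select_form (int m1, int m2, int d)
{
  return m1 > m2 ? d : 0;
}
```

Whether the second form is faster depends on the target, which is the
point being debated above.
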


[PATCH, GCC/ARM, 0/3] Add support for ARMv8-R

2017-06-29 Thread Thomas Preudhomme

Hi,

This patch series adds support for the ARMv8-R architecture[1] and ARM
Cortex-R52[2] to GCC. The patch series consists of the following patches:


[ 1/3] Add missing MIDR information for ARM Cortex-R7 and Cortex-R8 processor
[ 2/3] Add support for ARMv8-R architecture
[ 3/3] Add support for ARM Cortex-R52

[1] 
https://developer.arm.com/products/architecture/r-profile/docs/ddi0568/latest/arm-architecture-reference-manual-supplement-armv8-for-the-armv8-r-aarch32-architecture-profile

[2] https://developer.arm.com/products/processors/cortex-r/cortex-r52


[PATCH 1/3, GCC/ARM] Add MIDR info for ARM Cortex-R7 and Cortex-R8

2017-06-29 Thread Thomas Preudhomme

Hi,

The driver is missing MIDR information for processors ARM Cortex-R7 and
Cortex-R8 to support -march/-mcpu/-mtune=native on the command line.
This patch adds the missing information.
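
As a rough illustration of what the table is for: the driver maps a MIDR
part number to an architecture and CPU name.  The sketch below uses the
Cortex-R entries from this patch's table, but the struct, field and
function names are hypothetical and the real lookup in driver-arm.c may
differ.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical table entry: MIDR part number, -march name, -mcpu name.  */
struct cpu_entry
{
  const char *part_no;
  const char *arch;
  const char *cpu;
};

static const struct cpu_entry cpu_table[] = {
  {"0xc14", "armv7-r", "cortex-r4"},
  {"0xc15", "armv7-r", "cortex-r5"},
  {"0xc17", "armv7-r", "cortex-r7"},
  {"0xc18", "armv7-r", "cortex-r8"},
};

/* Return the entry whose part number matches PART_NO, or NULL.  */
static const struct cpu_entry *
lookup_part (const char *part_no)
{
  for (size_t i = 0; i < sizeof cpu_table / sizeof cpu_table[0]; i++)
    if (strcmp (cpu_table[i].part_no, part_no) == 0)
      return &cpu_table[i];
  return NULL;
}
```

Without the two new rows, a part number of 0xc17 or 0xc18 finds no match,
so -mcpu=native cannot resolve to Cortex-R7 or Cortex-R8.
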

ChangeLog entry is as follows:

*** gcc/ChangeLog ***

2017-01-31  Thomas Preud'homme  

* config/arm/driver-arm.c (arm_cpu_table): Add entry for ARM
Cortex-R7 and Cortex-R8 processors.

Is this ok for master?

Best regards,

Thomas
diff --git a/gcc/config/arm/driver-arm.c b/gcc/config/arm/driver-arm.c
index b034f13fda63f5892bbd9879d72f4b02e2632d69..29873d57a1e45fd989f6ff01dd4a2ae7320d93bb 100644
--- a/gcc/config/arm/driver-arm.c
+++ b/gcc/config/arm/driver-arm.c
@@ -54,6 +54,8 @@ static struct vendor_cpu arm_cpu_table[] = {
 {"0xd09", "armv8-a+crc", "cortex-a73"},
 {"0xc14", "armv7-r", "cortex-r4"},
 {"0xc15", "armv7-r", "cortex-r5"},
+{"0xc17", "armv7-r", "cortex-r7"},
+{"0xc18", "armv7-r", "cortex-r8"},
 {"0xc20", "armv6-m", "cortex-m0"},
 {"0xc21", "armv6-m", "cortex-m1"},
 {"0xc23", "armv7-m", "cortex-m3"},


Re: [PATCH] make find_taken_edge handle case with just default

2017-06-29 Thread Peter Bergner
On 6/29/17 4:03 AM, Richard Biener wrote:
> 
> This refactors things a bit to make CFG cleanup handle switches with
> just a default label.  If we make sure to cleanup the CFG after
> group_case_labels removes cases with just __builtin_unreachable ()
> inside then this fixes the ICE seen in PR81994 as well.
> 
> I wonder if find_taken_edge should generally handle successors
> with __builtin_unreachable () -- OTOH that would get rid of those
> too early I guess.

Should we offer an early out of group_case_labels_stmt() for the
fairly common case of new_size == old_size?  There's no reason to
execute the compress labels loop if we didn't combine any of the
labels.
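
The idea, reduced to a plain array so it can be read on its own, looks
roughly like this.  The function and its arguments are hypothetical
stand-ins for the gswitch label vector; removed labels are modelled as
null pointers.

```c
#include <stdbool.h>
#include <stddef.h>

/* Compact LABELS[0..OLD_SIZE-1] in place by dropping null entries.
   NEW_SIZE must be the number of non-null entries.  Returns true if
   anything was removed.  */
static bool
compress_labels (const char **labels, size_t old_size, size_t new_size)
{
  /* Early out: nothing was merged, so there is nothing to compress.  */
  if (new_size == old_size)
    return false;

  size_t j = 0;
  for (size_t i = 0; i < new_size; i++)
    {
      /* Skip entries that were cleared when labels were combined.  */
      while (labels[j] == NULL)
        j++;
      labels[i] = labels[j++];
    }
  return true;
}
```
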

Peter


Index: gcc/tree-cfg.c
===
--- gcc/tree-cfg.c  (revision 249783)
+++ gcc/tree-cfg.c  (working copy)
@@ -1747,6 +1747,11 @@ group_case_labels_stmt (gswitch *stmt)
}
 }

+  gcc_assert (new_size <= old_size);
+
+  if (new_size == old_size)
+return false;
+
   /* Compress the case labels in the label vector, and adjust the
  length of the vector.  */
   for (i = 0, j = 0; i < new_size; i++)
@@ -1757,9 +1762,8 @@ group_case_labels_stmt (gswitch *stmt)
   gimple_switch_label (stmt, j++));
 }

-  gcc_assert (new_size <= old_size);
   gimple_switch_set_num_labels (stmt, new_size);
-  return new_size < old_size;
+  return true;
 }

 /* Look for blocks ending in a multiway branch (a GIMPLE_SWITCH),



[PATCH 2/3, GCC/ARM] Add support for ARMv8-R architecture

2017-06-29 Thread Thomas Preudhomme

Hi,

This patch adds support for the ARMv8-R architecture [1], which was recently
announced. User level instructions for ARMv8-R are the same as those in
ARMv8-A AArch32 mode, so this patch defines ARMv8-R to have the same
features as ARMv8-A in the ARM backend.

[1] 
https://developer.arm.com/products/architecture/r-profile/docs/ddi0568/latest/arm-architecture-reference-manual-supplement-armv8-for-the-armv8-r-aarch32-architecture-profile


ChangeLog entries are as follows:

*** gcc/ChangeLog ***

2017-01-31  Thomas Preud'homme  

* config/arm/arm-cpus.in (armv8-r, armv8-r+crc): Add new entry.
* config/arm/arm-cpu-cdata.h: Regenerate.
* config/arm/arm-cpu-data.h: Regenerate.
* config/arm/arm-isa.h (ISA_ARMv8r): Define macro.
* config/arm/arm-tables.opt: Regenerate.
* config/arm/arm.h (enum base_architecture): Add BASE_ARCH_8R
enumerator.
* config/arm/bpabi.h (BE8_LINK_SPEC): Add entry for ARMv8-R and
ARMv8-R with CRC extensions.
* doc/invoke.texi: Mention -march=armv8-r and -march=armv8-r+crc
options.  Document meaning of -march=armv8-r+crc.

*** gcc/testsuite/ChangeLog ***

2017-01-31  Thomas Preud'homme  

* lib/target-supports.exp: Generate
check_effective_target_arm_arch_v8r_ok, add_options_for_arm_arch_v8r
and check_effective_target_arm_arch_v8r_multilib.

*** libgcc/ChangeLog ***

2017-01-31  Thomas Preud'homme  

* config/arm/lib1funcs.S: Define __ARM_ARCH__ to 8 for ARMv8-R.

Tested by building an arm-none-eabi GCC cross-compiler targeting
ARMv8-R.

Is this ok for stage1?

Best regards,

Thomas
diff --git a/gcc/config/arm/arm-cpu-cdata.h b/gcc/config/arm/arm-cpu-cdata.h
index b3888120daa8494eb41bde0368122ad2f06d81af..0a122f5febaaceeeb5a405cb5a64e1edd9b044f3 100644
--- a/gcc/config/arm/arm-cpu-cdata.h
+++ b/gcc/config/arm/arm-cpu-cdata.h
@@ -1041,6 +1041,20 @@ static const struct arm_arch_core_flag arm_arch_core_flags[] =
 },
   },
   {
+"armv8-r",
+{
+  ISA_ARMv8r,
+  isa_nobit
+},
+  },
+  {
+"armv8-r+crc",
+{
+  ISA_ARMv8r,isa_bit_crc32,
+  isa_nobit
+},
+  },
+  {
 "iwmmxt",
 {
   ISA_ARMv5te,isa_bit_xscale,isa_bit_iwmmxt,
diff --git a/gcc/config/arm/arm-cpu-data.h b/gcc/config/arm/arm-cpu-data.h
index d6200f9bdc09a9d0c973853b0152a2800eaf2fe5..48c1d88032c1c5dc7c6cba71511f79fe9f2533ea 100644
--- a/gcc/config/arm/arm-cpu-data.h
+++ b/gcc/config/arm/arm-cpu-data.h
@@ -1478,6 +1478,26 @@ static const struct processors all_architectures[] =
 NULL
   },
   {
+"armv8-r", TARGET_CPU_cortexr4,
+(TF_CO_PROC),
+"8R", BASE_ARCH_8R,
+{
+  ISA_ARMv8r,
+  isa_nobit
+},
+NULL
+  },
+  {
+"armv8-r+crc", TARGET_CPU_cortexr4,
+(TF_CO_PROC),
+"8R", BASE_ARCH_8R,
+{
+  ISA_ARMv8r,isa_bit_crc32,
+  isa_nobit
+},
+NULL
+  },
+  {
 "iwmmxt", TARGET_CPU_iwmmxt,
 (TF_LDSCHED | TF_STRONG | TF_XSCALE),
 "5TE", BASE_ARCH_5TE,
diff --git a/gcc/config/arm/arm-cpus.in b/gcc/config/arm/arm-cpus.in
index fc5d935182ba70de5ab2aefeec492318f42e95c5..be1f0ca4e38ae76683b77d8c3b79a066e62325d7 100644
--- a/gcc/config/arm/arm-cpus.in
+++ b/gcc/config/arm/arm-cpus.in
@@ -287,6 +287,20 @@ begin arch armv8-m.main+dsp
  isa ARMv8m_main bit_ARMv7em
 end arch armv8-m.main+dsp
 
+begin arch armv8-r
+ tune for cortex-r4
+ tune flags CO_PROC
+ base 8R
+ isa ARMv8r
+end arch armv8-r
+
+begin arch armv8-r+crc
+ tune for cortex-r4
+ tune flags CO_PROC
+ base 8R
+ isa ARMv8r bit_crc32
+end arch armv8-r+crc
+
 begin arch iwmmxt
  tune for iwmmxt
  tune flags LDSCHED STRONG XSCALE
diff --git a/gcc/config/arm/arm-isa.h b/gcc/config/arm/arm-isa.h
index 6050bca95587f68a3671dd2144cf845b83da3692..24ec398b346f8effb346235d6f3ab20eb6f70e0f 100644
--- a/gcc/config/arm/arm-isa.h
+++ b/gcc/config/arm/arm-isa.h
@@ -125,6 +125,7 @@ enum isa_feature
 #define ISA_ARMv8_2a	ISA_ARMv8_1a, isa_bit_ARMv8_2
 #define ISA_ARMv8m_base ISA_ARMv6m, isa_bit_ARMv8, isa_bit_cmse, isa_bit_tdiv
 #define ISA_ARMv8m_main ISA_ARMv7m, isa_bit_ARMv8, isa_bit_cmse
+#define ISA_ARMv8r	ISA_ARMv8a
 
 /* List of all FPU bits to strip out if -mfpu is used to override the
default.  isa_bit_fp16 is deliberately missing from this list.  */
diff --git a/gcc/config/arm/arm-tables.opt b/gcc/config/arm/arm-tables.opt
index cbcd85d9906d1fc797ab33b3d61969f32b9cc566..7bab5de5a39e9192c97851929b83175648158cdf 100644
--- a/gcc/config/arm/arm-tables.opt
+++ b/gcc/config/arm/arm-tables.opt
@@ -461,10 +461,16 @@ EnumValue
 Enum(arm_arch) String(armv8-m.main+dsp) Value(33)
 
 EnumValue
-Enum(arm_arch) String(iwmmxt) Value(34)
+Enum(arm_arch) String(armv8-r) Value(34)
 
 EnumValue
-Enum(arm_arch) String(iwmmxt2) Value(35)
+Enum(arm_arch) String(armv8-r+crc) Value(35)
+
+EnumValue
+Enum(arm_arch) String(iwmmxt) Value(36)
+
+EnumValue
+Enum(arm_arch) String(iwmmxt2) Value(37)
 
 Enum
 Name(arm_fpu) Type(enum fpu_type)
diff --git a/gcc/config/arm/arm.h b/gcc/config/ar

[PATCH 3/3, GCC/ARM] Add support for ARM Cortex-R52 processor

2017-06-29 Thread Thomas Preudhomme

Hi,

This patch adds support for the recently announced ARM Cortex-R52
processor [1].

[1] https://developer.arm.com/products/processors/cortex-r/cortex-r52

ChangeLog entry is as follows:

*** gcc/ChangeLog ***

2017-01-31  Thomas Preud'homme  

* config/arm/arm-cpus.in (cortex-r52): Add new entry.
* config/arm/arm-cpu.h: Regenerate.
* config/arm/arm-cpu-cdata.h: Regenerate.
* config/arm/arm-cpu-data.h: Regenerate.
* config/arm/arm-tables.opt: Regenerate.
* config/arm/bpabi.h (BE8_LINK_SPEC): Add entry for ARM Cortex-R52.
* config/arm/driver-arm.c (arm_cpu_table): Add entry for ARM
Cortex-R52.
* doc/invoke.texi: Mention -mtune=cortex-r52.

Tested by building an arm-none-eabi GCC cross-compiler targeting Cortex-R52.

Is this ok for stage1?

Best regards,

Thomas
diff --git a/gcc/config/arm/arm-cpu-cdata.h b/gcc/config/arm/arm-cpu-cdata.h
index 0a122f5febaaceeeb5a405cb5a64e1edd9b044f3..043b5b2db09146b5686a5fe602f907164f9d84c5 100644
--- a/gcc/config/arm/arm-cpu-cdata.h
+++ b/gcc/config/arm/arm-cpu-cdata.h
@@ -803,6 +803,13 @@ static const struct arm_arch_core_flag arm_arch_core_flags[] =
 },
   },
   {
+"cortex-r52",
+{
+  ISA_ARMv8r,isa_bit_crc32,
+  isa_nobit
+},
+  },
+  {
 "armv2",
 {
   ISA_ARMv2,isa_bit_mode26,
diff --git a/gcc/config/arm/arm-cpu-data.h b/gcc/config/arm/arm-cpu-data.h
index 48c1d88032c1c5dc7c6cba71511f79fe9f2533ea..0677132382fad2f1baf1fbdf5c0b03fe32f752e2 100644
--- a/gcc/config/arm/arm-cpu-data.h
+++ b/gcc/config/arm/arm-cpu-data.h
@@ -1132,6 +1132,16 @@ static const struct processors all_cores[] =
 },
 &arm_v7m_tune
   },
+  {
+"cortex-r52", TARGET_CPU_cortexr52,
+(TF_LDSCHED),
+"8R", BASE_ARCH_8R,
+{
+  ISA_ARMv8r,isa_bit_crc32,
+  isa_nobit
+},
+&arm_cortex_tune
+  },
   {NULL, TARGET_CPU_arm_none, 0, NULL, BASE_ARCH_0, {isa_nobit}, NULL}
 };
 
diff --git a/gcc/config/arm/arm-cpu.h b/gcc/config/arm/arm-cpu.h
index cd282db02f56f4416ff82eb3d8d569cd99fb0d41..4d6ea61d07dc98540f0f75679d8ef6f7eafc10bb 100644
--- a/gcc/config/arm/arm-cpu.h
+++ b/gcc/config/arm/arm-cpu.h
@@ -132,6 +132,7 @@ enum processor_type
   TARGET_CPU_cortexa73cortexa53,
   TARGET_CPU_cortexm23,
   TARGET_CPU_cortexm33,
+  TARGET_CPU_cortexr52,
   TARGET_CPU_arm_none
 };
 
diff --git a/gcc/config/arm/arm-cpus.in b/gcc/config/arm/arm-cpus.in
index be1f0ca4e38ae76683b77d8c3b79a066e62325d7..139aa561d3f918655978e44b5bcb6c0b50747a08 100644
--- a/gcc/config/arm/arm-cpus.in
+++ b/gcc/config/arm/arm-cpus.in
@@ -1104,6 +1104,16 @@ begin cpu cortex-m33
  costs v7m
 end cpu cortex-m33
 
+
+# V8 R-profile implementations.
+begin cpu cortex-r52
+ cname cortexr52
+ tune flags LDSCHED
+ architecture armv8-r+crc
+ costs cortex
+end cpu cortex-r52
+
+
 # FPU entries
 # format:
 # begin fpu 
diff --git a/gcc/config/arm/arm-tables.opt b/gcc/config/arm/arm-tables.opt
index 7bab5de5a39e9192c97851929b83175648158cdf..ccd1a7661fb97938ddea7670eebe1a0f48efb929 100644
--- a/gcc/config/arm/arm-tables.opt
+++ b/gcc/config/arm/arm-tables.opt
@@ -354,6 +354,9 @@ Enum(processor_type) String(cortex-m23) Value( TARGET_CPU_cortexm23)
 EnumValue
 Enum(processor_type) String(cortex-m33) Value( TARGET_CPU_cortexm33)
 
+EnumValue
+Enum(processor_type) String(cortex-r52) Value( TARGET_CPU_cortexr52)
+
 Enum
 Name(arm_arch) Type(int)
 Known ARM architectures (for use with the -march= option):
diff --git a/gcc/config/arm/bpabi.h b/gcc/config/arm/bpabi.h
index c394ac805c7577113ed72b31a06ff93dc7f5f490..c3dca1cd4833afd67e56a276ef0e9c1e17f4fae4 100644
--- a/gcc/config/arm/bpabi.h
+++ b/gcc/config/arm/bpabi.h
@@ -100,7 +100,7 @@
|march=armv8-m.main	\
|march=armv8-m.main+dsp|mcpu=cortex-m33		\
|march-armv8-r	\
-   |march-armv8-r+crc	\
+   |march-armv8-r+crc|mcpu=cortex-r52			\
:%{!r:--be8}}}"
 #else
 #define BE8_LINK_SPEC \
@@ -142,7 +142,7 @@
|march=armv8-m.main	\
|march=armv8-m.main+dsp|mcpu=cortex-m33		\
|march=armv8-r	\
-   |march=armv8-r+crc	\
+   |march=armv8-r+crc|mcpu=cortex-r52			\
:%{!r:--be8}}}"
 #endif
 
diff --git a/gcc/config/arm/driver-arm.c b/gcc/config/arm/driver-arm.c
index 29873d57a1e45fd989f6ff01dd4a2ae7320d93bb..00f8128e6911a79f83da03bf731c1cc9127c7285 100644
--- a/gcc/config/arm/driver-arm.c
+++ b/gcc/config/arm/driver-arm.c
@@ -56,6 +56,7 @@ static struct vendor_cpu arm_cpu_table[] = {
 {"0xc15", "armv7-r", "cortex-r5"},
 {"0xc17", "armv7-r", "cortex-r7"},
 {"0xc18", "armv7-r", "cortex-r8"},
+{"0xd13", "armv8-r+crc", "cortex-r52"},
 {"0xc20", "armv6-m", "cortex-m0"},
 {"0xc21", "armv6-m", "cortex-m1"},
 {"0xc23", "armv7-m", "cortex-m3"},
diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index 9ea580626749dc9d27bb72d56bbbef6a474a5055..a871837426485dd6a87c541386964bf85dfafde7 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -15212,6 +15212,7 @@ Permissible names are: @samp{arm2}, @samp{arm250},
 @sa

Re: [PATCH] make find_taken_edge handle case with just default

2017-06-29 Thread Richard Biener
On Thu, 29 Jun 2017, Peter Bergner wrote:

> On 6/29/17 4:03 AM, Richard Biener wrote:
> > 
> > This refactors things a bit to make CFG cleanup handle switches with
> > just a default label.  If we make sure to cleanup the CFG after
> > group_case_labels removes cases with just __builtin_unreachable ()
> > inside then this fixes the ICE seen in PR81194 as well.
> > 
> > I wonder if find_taken_edge should generally handle successors
> > with __builtin_unreachable () -- OTOH that would get rid of those
> > too early I guess.
> 
> Should we offer an early out of group_case_labels_stmt() for the
> fairly common case of new_size == old_size?  There's no reason to
> execute the compress labels loop if we didn't combine any of the
> labels.

We can also merge both loops, counting new_size upwards, storing
to label new_size if new_size != i ...
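
A minimal sketch of the single-pass idea, assuming a much simplified model of the label vector. The `Label` struct, its `removed` flag and `compress_labels` are hypothetical stand-ins for illustration only; they are not the gswitch API used in tree-cfg.c.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a case label; `removed` models a label that
// grouping has already merged into a neighbour.
struct Label {
  int low;
  bool removed;
};

// Compact the surviving labels to the front in one pass.  new_size counts
// upwards and doubles as the write index; an element is only stored when
// it actually has to move (new_size != i).  Returns true if anything was
// dropped, mirroring the "did the switch change" result in the patch.
bool compress_labels(std::vector<Label> &labels) {
  const std::size_t old_size = labels.size();
  std::size_t new_size = 0;
  for (std::size_t i = 0; i < old_size; ++i) {
    if (labels[i].removed)
      continue;
    if (new_size != i)
      labels[new_size] = labels[i];
    ++new_size;
  }
  assert(new_size <= old_size);
  labels.resize(new_size);
  return new_size < old_size;
}
```

When nothing was removed, new_size tracks i throughout, so no element is ever stored and the function reports no change without a separate early-out.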

> Peter
> 
> 
> Index: gcc/tree-cfg.c
> ===
> --- gcc/tree-cfg.c(revision 249783)
> +++ gcc/tree-cfg.c(working copy)
> @@ -1747,6 +1747,11 @@ group_case_labels_stmt (gswitch *stmt)
>   }
>  }
> 
> +  gcc_assert (new_size <= old_size);
> +
> +  if (new_size == old_size)
> +return false;
> +
>/* Compress the case labels in the label vector, and adjust the
>   length of the vector.  */
>for (i = 0, j = 0; i < new_size; i++)
> @@ -1757,9 +1762,8 @@ group_case_labels_stmt (gswitch *stmt)
>  gimple_switch_label (stmt, j++));
>  }
> 
> -  gcc_assert (new_size <= old_size);
>gimple_switch_set_num_labels (stmt, new_size);
> -  return new_size < old_size;
> +  return true;
>  }
> 
>  /* Look for blocks ending in a multiway branch (a GIMPLE_SWITCH),
> 
> 

-- 
Richard Biener 
SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Graham Norton,
HRB 21284 (AG Nuernberg)


[PATCH] fold_builtin_FUNCTION

2017-06-29 Thread Nathan Sidwell

I noticed the __builtin_FUNCTION () builtin was using raw DECL_NAME,
which for C++ dtors and conversion operators gives non-useful names.
(no ~ and 'operator N' for N= some int).

This patch fixes its folder to use the lang hook that provides a 
printable name.  I did contemplate passing 1 (add scope) to the hook, 
but decided against changing that.
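
A small sketch of how one might observe the effect from user code, assuming a compiler that provides `__builtin_FUNCTION`. `plain_function`, `Widget` and `destructor_name` are hypothetical names for illustration. The exact spelling reported inside the destructor depends on the compiler version, so the sketch only captures it and asserts nothing about it.

```cpp
#include <cassert>
#include <string>

// In an ordinary function the builtin yields the function's own name.
const char *plain_function() { return __builtin_FUNCTION(); }

// Hypothetical class: its destructor records whatever name the compiler
// reports, so the spelling can be inspected after the object is gone.
struct Widget {
  std::string *sink;
  explicit Widget(std::string *s) : sink(s) { assert(s != nullptr); }
  ~Widget() { *sink = __builtin_FUNCTION(); }
};

// Returns the name the compiler reported inside ~Widget.
std::string destructor_name() {
  std::string name;
  { Widget w(&name); }  // the destructor runs at the closing brace
  return name;
}
```

Printing the result of `destructor_name ()` before and after the change would show whether the folder now produces a user-readable name.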


Applied as obvious.

nathan

--
Nathan Sidwell
diff -p -U2 -r ./builtins.c /data/users/nathans/modules/src/gcc/builtins.c
--- ./builtins.c	2017-06-26 07:48:42.206366313 -0700
+++ /data/users/nathans/modules/src/gcc/builtins.c	2017-06-28 13:52:53.026518086 -0700
@@ -8740,11 +8740,10 @@ static inline tree
 fold_builtin_FUNCTION ()
 {
+  const char *name = "";
+
   if (current_function_decl)
-{
-  const char *name = IDENTIFIER_POINTER (DECL_NAME (current_function_decl));
-  return build_string_literal (strlen (name) + 1, name);
-}
+name= lang_hooks.decl_printable_name (current_function_decl, 0);
 
-  return build_string_literal (1, "");
+  return build_string_literal (strlen (name) + 1, name);
 }
 
diff -p -U2 -r ./cp/call.c /data/users/nathans/modules/src/gcc/cp/call.c
--- ./cp/call.c	2017-06-27 08:34:25.902966687 -0700
+++ /data/users/nathans/modules/src/gcc/cp/call.c	2017-06-28 12:05:23.188713566 -0700
@@ -232,7 +232,6 @@ check_dtor_name (tree basetype, tree nam
 {
   if ((MAYBE_CLASS_TYPE_P (basetype)
-	   && name == constructor_name (basetype))
-	  || (TREE_CODE (basetype) == ENUMERAL_TYPE
-	  && name == TYPE_IDENTIFIER (basetype)))
+	   || TREE_CODE (basetype) == ENUMERAL_TYPE)
+	  && name == constructor_name (basetype))
 	return true;
   else
@@ -8879,5 +8878,5 @@ static char *
 name_as_c_string (tree name, tree type, bool *free_p)
 {
-  char *pretty_name;
+  const char *pretty_name;
 
   /* Assume that we will not allocate memory.  */
@@ -8887,5 +8886,5 @@ name_as_c_string (tree name, tree type,
 {
   pretty_name
-	= CONST_CAST (char *, identifier_to_locale (IDENTIFIER_POINTER (constructor_name (type))));
+	= identifier_to_locale (IDENTIFIER_POINTER (constructor_name (type)));
   /* For a destructor, add the '~'.  */
   if (IDENTIFIER_DTOR_P (name))
@@ -8906,7 +8905,7 @@ name_as_c_string (tree name, tree type,
 }
   else
-pretty_name = CONST_CAST (char *, identifier_to_locale (IDENTIFIER_POINTER (name)));
+pretty_name = identifier_to_locale (IDENTIFIER_POINTER (name));
 
-  return pretty_name;
+  return CONST_CAST (char *, pretty_name);
 }
 
@@ -9119,9 +9115,8 @@ build_new_method_call_1 (tree instance,
 }
   else
-{
-  add_candidates (fns, first_mem_arg, user_args, optype,
-		  explicit_targs, template_only, conversion_path,
-		  access_binfo, flags, &candidates, complain);
-}
+add_candidates (fns, first_mem_arg, user_args, optype,
+		explicit_targs, template_only, conversion_path,
+		access_binfo, flags, &candidates, complain);
+
   any_viable_p = false;
   candidates = splice_viable (candidates, false, &any_viable_p);
@@ -9141,8 +9136,9 @@ build_new_method_call_1 (tree instance,
 	  tree arglist = build_tree_list_vec (user_args);
 	  tree errname = name;
+	  bool twiddle = false;
 	  if (IDENTIFIER_CDTOR_P (errname))
 		{
-		  tree fn = DECL_ORIGIN (OVL_FIRST (fns));
-		  errname = DECL_NAME (fn);
+		  twiddle = IDENTIFIER_DTOR_P (errname);
+		  errname = constructor_name (basetype);
 		}
 	  if (explicit_targs)
@@ -9150,6 +9146,6 @@ build_new_method_call_1 (tree instance,
 	  if (skip_first_for_error)
 		arglist = TREE_CHAIN (arglist);
-	  error ("no matching function for call to %<%T::%E(%A)%#V%>",
-		 basetype, errname, arglist,
+	  error ("no matching function for call to %<%T::%s%E(%A)%#V%>",
+		 basetype, &"~"[!twiddle], errname, arglist,
 		 TREE_TYPE (instance));
 	}
diff -p -U2 -r ./cp/class.c /data/users/nathans/modules/src/gcc/cp/class.c
--- ./cp/class.c	2017-06-27 08:33:21.756909868 -0700
+++ /data/users/nathans/modules/src/gcc/cp/class.c	2017-06-28 15:50:01.612979147 -0700
@@ -6296,6 +6240,5 @@ include_empty_classes (record_layout_inf
  offset.  However, now we need to make sure that RLI is big enough
  to reflect the entire class.  */
-  eoc = end_of_class (rli->t,
-		  CLASSTYPE_AS_BASE (rli->t) != NULL_TREE);
+  eoc = end_of_class (rli->t, CLASSTYPE_AS_BASE (rli->t) != NULL_TREE);
   rli_size = rli_size_unit_so_far (rli);
   if (TREE_CODE (rli_size) == INTEGER_CST
@@ -7445,5 +7342,5 @@ finish_struct (tree t, tree attributes)
 	  {
 	tree fn = strip_using_decl (x);
-	if (is_overloaded_fn (fn))
+  	if (OVL_P (fn))
 	  for (lkp_iterator iter (fn); iter; ++iter)
 		add_method (t, *iter, true);
@@ -8506,5 +8403,4 @@ get_vfield_name (tree type)
 {
   tree binfo, base_binfo;
-  char *buf;
 
   for (binfo = TYPE_BINFO (type);
@@ -8520,8 +8416,8 @@ get_vfield_name (tree type)
 
   type = BINFO_TYPE (binfo);
-  buf = (char *) alloca (sizeof (VFIELD_NAME_FORMAT)
-			 + TYPE_

Re: [PATCH] make find_taken_edge handle case with just default

2017-06-29 Thread Peter Bergner
On 6/29/17 8:58 AM, Richard Biener wrote:
> On Thu, 29 Jun 2017, Peter Bergner wrote:
>> Should we offer an early out of group_case_labels_stmt() for the
>> fairly common case of new_size == old_size?  There's no reason to
>> execute the compress labels loop if we didn't combine any of the
>> labels.
> 
> We can also merge both loops, counting new_size upwards, storing
> to label new_size if new_size != i ...

Yeah.  I can implement that if you like...unless you want to...
or already have. :-)

Peter



[C++ PATCH] whitespace cleanups

2017-06-29 Thread Nathan Sidwell

A bunch of minor reformatting and cleanups I'd collected.

Applied to trunk.

nathan
--
Nathan Sidwell
2017-06-29  Nathan Sidwell  

	Whitespace cleanups.
	* call.c (name_as_c_string): Move CONST_CAST to return.
	(build_new_method_call_1): Remove unneeded bracing.
	* class.c (include_empty_classes): Unbreak line.
	* constraint.cc (tsubst_check_constraint): Add space.
	* cp-tree.h (lang_decl_ns): Add comment.
	(PTRMEM_CST_MEMBER): Break line.
	* decl.c (grokfndecl): Add blank lines. Unbreak some others.
	(grokdeclarator): Remove lines, move declaration to first use.
	* decl2.c (decl_needed_p): Fix indentation.
	(c_parse_final_cleanups): Remove blank line.
	* method.c (implicitly_declare_fn): Move declaration to first use.
	* search.c (current_scope): Add blank lines.

Index: call.c
===
--- call.c	(revision 249779)
+++ call.c	(working copy)
@@ -8878,7 +8878,7 @@ build_special_member_call (tree instance
 static char *
 name_as_c_string (tree name, tree type, bool *free_p)
 {
-  char *pretty_name;
+  const char *pretty_name;
 
   /* Assume that we will not allocate memory.  */
   *free_p = false;
@@ -8886,7 +8886,7 @@ name_as_c_string (tree name, tree type,
   if (IDENTIFIER_CDTOR_P (name))
 {
   pretty_name
-	= CONST_CAST (char *, identifier_to_locale (IDENTIFIER_POINTER (constructor_name (type))));
+	= identifier_to_locale (IDENTIFIER_POINTER (constructor_name (type)));
   /* For a destructor, add the '~'.  */
   if (IDENTIFIER_DTOR_P (name))
 	{
@@ -8905,9 +8905,9 @@ name_as_c_string (tree name, tree type,
   *free_p = true;
 }
   else
-pretty_name = CONST_CAST (char *, identifier_to_locale (IDENTIFIER_POINTER (name)));
+pretty_name = identifier_to_locale (IDENTIFIER_POINTER (name));
 
-  return pretty_name;
+  return CONST_CAST (char *, pretty_name);
 }
 
 /* Build a call to "INSTANCE.FN (ARGS)".  If FN_P is non-NULL, it will
@@ -9118,11 +9118,10 @@ build_new_method_call_1 (tree instance,
 			   &candidates, complain);
 }
   else
-{
-  add_candidates (fns, first_mem_arg, user_args, optype,
-		  explicit_targs, template_only, conversion_path,
-		  access_binfo, flags, &candidates, complain);
-}
+add_candidates (fns, first_mem_arg, user_args, optype,
+		explicit_targs, template_only, conversion_path,
+		access_binfo, flags, &candidates, complain);
+
   any_viable_p = false;
   candidates = splice_viable (candidates, false, &any_viable_p);
 
Index: class.c
===
--- class.c	(revision 249779)
+++ class.c	(working copy)
@@ -6295,8 +6295,7 @@ include_empty_classes (record_layout_inf
  because we are willing to overlay multiple bases at the same
  offset.  However, now we need to make sure that RLI is big enough
  to reflect the entire class.  */
-  eoc = end_of_class (rli->t,
-		  CLASSTYPE_AS_BASE (rli->t) != NULL_TREE);
+  eoc = end_of_class (rli->t, CLASSTYPE_AS_BASE (rli->t) != NULL_TREE);
   rli_size = rli_size_unit_so_far (rli);
   if (TREE_CODE (rli_size) == INTEGER_CST
   && tree_int_cst_lt (rli_size, eoc))
Index: constraint.cc
===
--- constraint.cc	(revision 249779)
+++ constraint.cc	(working copy)
@@ -1580,7 +1580,7 @@ tsubst_check_constraint (tree t, tree ar
 
   /* Substitute through by building an template-id expression
  and then substituting into that. */
-  tree expr = build_nt(TEMPLATE_ID_EXPR, tmpl, targs);
+  tree expr = build_nt (TEMPLATE_ID_EXPR, tmpl, targs);
   ++processing_template_decl;
   tree result = tsubst_expr (expr, args, complain, in_decl, false);
   --processing_template_decl;
Index: cp-tree.h
===
--- cp-tree.h	(revision 249779)
+++ cp-tree.h	(working copy)
@@ -2556,7 +2556,9 @@ struct GTY(()) lang_decl_ns {
   vec *usings;
   vec *inlinees;
 
-  /* Map from IDENTIFIER nodes to DECLS.  */
+  /* Map from IDENTIFIER nodes to DECLS.  It'd be nice to have this
+ inline, but as the hash_map has a dtor, we can't then put this
+ struct into a union (until moving to c++11).  */
   hash_map *bindings;
 };
 
@@ -4312,7 +4314,8 @@ more_aggr_init_expr_args_p (const aggr_i
 
 /* For a pointer-to-member constant `X::Y' this is the _DECL for
`Y'.  */
-#define PTRMEM_CST_MEMBER(NODE) (((ptrmem_cst_t)PTRMEM_CST_CHECK (NODE))->member)
+#define PTRMEM_CST_MEMBER(NODE) \
+  (((ptrmem_cst_t)PTRMEM_CST_CHECK (NODE))->member)
 
 /* The expression in question for a TYPEOF_TYPE.  */
 #define TYPEOF_TYPE_EXPR(NODE) (TYPE_VALUES_RAW (TYPEOF_TYPE_CHECK (NODE)))
Index: decl.c
===
--- decl.c	(revision 249779)
+++ decl.c	(working copy)
@@ -8502,9 +8502,11 @@ grokfndecl (tree ctype,
   /* Allocate space to hold the vptr bit if needed.  */
   SET_DECL_ALIGN (decl, MINIMUM_METHOD_BOUNDARY);

[C++ PATCH] parser indentation cleanup

2017-06-29 Thread Nathan Sidwell

This is almost a reformatting diff, converting

if (cond) {
  if (a) A
  else B
} else C

into

if (!cond) C;
else if (a) A
else B

but it also removes an unnecessary constructor_name assignment:

  if ( ... constructor_name_p (unqualified_name, class_type))
  {
unqualified_name = constructor_name (class_type);
...

We've just checked it already is the constructor name.
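
The reshaping described above can be checked on a toy model. This is only a sketch: `nested_form`, `flattened_form` and the 'A'/'B'/'C' results are hypothetical placeholders for the three branches, not code from parser.c.

```cpp
#include <cassert>

// Nested shape before the cleanup: the C branch trails at the end.
char nested_form(bool cond, bool a) {
  if (cond) {
    if (a)
      return 'A';
    else
      return 'B';
  } else
    return 'C';
}

// Flattened shape after the cleanup: the !cond case is handled first,
// which removes one level of indentation from A and B.
char flattened_form(bool cond, bool a) {
  if (!cond)
    return 'C';
  else if (a)
    return 'A';
  return 'B';
}

// Both shapes agree on every combination of inputs.
void check_all_inputs() {
  for (int cond = 0; cond < 2; ++cond)
    for (int a = 0; a < 2; ++a)
      assert(nested_form(cond != 0, a != 0)
             == flattened_form(cond != 0, a != 0));
}
```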

applied to trunk.

nathan
--
Nathan Sidwell
2017-06-29  Nathan Sidwell  

	* parser.c (cp_parser_direct_declarator): Reorder if to avoid
	indentation. Remove unnecessary assignment of constructor name.

Index: parser.c
===
--- parser.c	(revision 249779)
+++ parser.c	(working copy)
@@ -20106,26 +20106,8 @@ cp_parser_direct_declarator (cp_parser*
 		if (TREE_CODE (unqualified_name) == TYPE_DECL)
 		  {
 		tree name_type = TREE_TYPE (unqualified_name);
-		if (class_type && same_type_p (name_type, class_type))
-		  {
-			if (qualifying_scope
-			&& CLASSTYPE_USE_TEMPLATE (name_type))
-			  {
-			error_at (declarator_id_start_token->location,
-  "invalid use of constructor as a template");
-			inform (declarator_id_start_token->location,
-"use %<%T::%D%> instead of %<%T::%D%> to "
-"name the constructor in a qualified name",
-class_type,
-DECL_NAME (TYPE_TI_TEMPLATE (class_type)),
-class_type, name_type);
-			declarator = cp_error_declarator;
-			break;
-			  }
-			else
-			  unqualified_name = constructor_name (class_type);
-		  }
-		else
+
+		if (!class_type || !same_type_p (name_type, class_type))
 		  {
 			/* We do not attempt to print the declarator
 			   here because we do not have enough
@@ -20135,6 +20117,21 @@ cp_parser_direct_declarator (cp_parser*
 			declarator = cp_error_declarator;
 			break;
 		  }
+		else if (qualifying_scope
+			 && CLASSTYPE_USE_TEMPLATE (name_type))
+		  {
+			error_at (declarator_id_start_token->location,
+  "invalid use of constructor as a template");
+			inform (declarator_id_start_token->location,
+"use %<%T::%D%> instead of %<%T::%D%> to "
+"name the constructor in a qualified name",
+class_type,
+DECL_NAME (TYPE_TI_TEMPLATE (class_type)),
+class_type, name_type);
+			declarator = cp_error_declarator;
+			break;
+		  }
+		unqualified_name = constructor_name (class_type);
 		  }
 
 		if (class_type)
@@ -20164,14 +20161,10 @@ cp_parser_direct_declarator (cp_parser*
 struct S {
   friend void N::S();
 };  */
-			 && !(friend_p
-  && class_type != qualifying_scope)
+			 && (!friend_p || class_type == qualifying_scope)
 			 && constructor_name_p (unqualified_name,
 		class_type))
-		  {
-			unqualified_name = constructor_name (class_type);
-			sfk = sfk_constructor;
-		  }
+		  sfk = sfk_constructor;
 		else if (is_overloaded_fn (unqualified_name)
 			 && DECL_CONSTRUCTOR_P (get_first_fn
 		(unqualified_name)))


Re: [PATCH] make find_taken_edge handle case with just default

2017-06-29 Thread Richard Biener
On Thu, 29 Jun 2017, Peter Bergner wrote:

> On 6/29/17 8:58 AM, Richard Biener wrote:
> > On Thu, 29 Jun 2017, Peter Bergner wrote:
> >> Should we offer an early out of group_case_labels_stmt() for the
> >> fairly common case of new_size == old_size?  There's no reason to
> >> execute the compress labels loop if we didn't combine any of the
> >> labels.
> > 
> > We can also merge both loops, counting new_size upwards, storing
> > to label new_size if new_size != i ...
> 
> Yeah.  I can implement that if you like...unless you want to...
> or already have. :-)

Go ahead!

Richard.


Re: [PATCH, GCC/ARM, 0/3] Add support for ARMv8-R

2017-06-29 Thread Christophe Lyon
On 29 June 2017 at 15:52, Thomas Preudhomme
 wrote:
> Hi,
>
> This patch series adds support for the ARMv8-R architecture[1] and ARM
> Cortex-R52[2] to GCC. The patch series consist of the following patches:

Hi Thomas,

I think you need to rebase your patch because Richard's recent series
changed the contents
of arm-cpu-data.h and arm-cpu-cdata.h.

Why do you link armv8-r architecture definition to cortex-r4?

>
> [ 1/3] Add missing MIDR information for ARM Cortex-R7 and Cortex-R8
> processor
> [ 2/3] Add support for ARMv8-R architecture
> [ 3/3] Add support for ARM Cortex-R52
>
> [1]
> https://developer.arm.com/products/architecture/r-profile/docs/ddi0568/latest/arm-architecture-reference-manual-supplement-armv8-for-the-armv8-r-aarch32-architecture-profile
> [2] https://developer.arm.com/products/processors/cortex-r/cortex-r52


Re: [PATCH, GCC/ARM, 0/3] Add support for ARMv8-R

2017-06-29 Thread Thomas Preudhomme

On 29/06/17 15:34, Christophe Lyon wrote:
> On 29 June 2017 at 15:52, Thomas Preudhomme
>  wrote:
>> Hi,
>>
>> This patch series adds support for the ARMv8-R architecture[1] and ARM
>> Cortex-R52[2] to GCC. The patch series consist of the following patches:
>
> Hi Thomas,
>
> I think you need to rebase your patch because Richard's recent series
> changed the contents
> of arm-cpu-data.h and arm-cpu-cdata.h.

Err yes indeed. Thanks!

> Why do you link armv8-r architecture definition to cortex-r4?

I understand, where did I do such a thing?

Best regards,

Thomas


[C++ PATCH] constructor_name

2017-06-29 Thread Nathan Sidwell
This cleans up constructor_name use.  I use constructor_name when 
checking for ctor/dtor type things and DECL_NAME (TYPE_NAME (X)) when 
dealing with a class's self reference.  IMHO these are conceptually 
different, even if we might end up with the same identifier in the end.


1) check_dtor_name can use constructor_name for enums too.

2) build_new_method_call_1 should do so for cdtors, and also print a ~ 
for the destructor use[*]


3) build_self_reference and push_class_level_binding_1 should use 
DECL_NAME ...


4) constructor_name can grab DECL_NAME (TYPE_NAME (X)) to get the name. 
And constructor_name_p doesn't need to be so complex


applied to trunk.

[*] perhaps `&"~"[!twiddle]' is too cute a trick?
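
For readers unfamiliar with the idiom, here is a standalone sketch of what the expression evaluates to. `dtor_prefix` and `qualified_cdtor` are hypothetical helpers written for this illustration; only the `&"~"[!twiddle]` expression itself comes from the patch.

```cpp
#include <cassert>
#include <string>

// "~" is a two-character array {'~', '\0'}.  Index 0 yields the string
// "~"; index 1 points at the terminator, i.e. the empty string.
const char *dtor_prefix(bool twiddle) { return &"~"[!twiddle]; }

// Hypothetical helper that lays a name out the way the diagnostic does:
// class, "::", optional '~', then the constructor name.
std::string qualified_cdtor(const std::string &class_name, bool is_dtor) {
  assert(!class_name.empty());
  return class_name + "::" + dtor_prefix(is_dtor) + class_name;
}
```

A more conventional spelling would be `twiddle ? "~" : ""`, which selects the same two strings.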

nathan
--
Nathan Sidwell
2017-06-29  Nathan Sidwell  

	* call.c (check_dtor_name): Use constructor_name for enums too.
	(build_new_method_call_1): Use constructor_name for cdtors and
	show ~ for dtor.
	* class.c (build_self_reference): Use TYPE_NAME to get name of
	self reference.
	* name-lookup.c (constructor_name): Use DECL_NAME directly.
	(constructor_name_p): Reimplement.
	(push_class_level_binding_1): Use TYPE_NAME directly.

Index: call.c
===
--- call.c	(revision 249786)
+++ call.c	(working copy)
@@ -231,9 +231,8 @@ check_dtor_name (tree basetype, tree nam
   else if (identifier_p (name))
 {
   if ((MAYBE_CLASS_TYPE_P (basetype)
-	   && name == constructor_name (basetype))
-	  || (TREE_CODE (basetype) == ENUMERAL_TYPE
-	  && name == TYPE_IDENTIFIER (basetype)))
+	   || TREE_CODE (basetype) == ENUMERAL_TYPE)
+	  && name == constructor_name (basetype))
 	return true;
   else
 	name = get_type_value (name);
@@ -9139,17 +9138,18 @@ build_new_method_call_1 (tree instance,
 	{
 	  tree arglist = build_tree_list_vec (user_args);
 	  tree errname = name;
+	  bool twiddle = false;
 	  if (IDENTIFIER_CDTOR_P (errname))
 		{
-		  tree fn = DECL_ORIGIN (OVL_FIRST (fns));
-		  errname = DECL_NAME (fn);
+		  twiddle = IDENTIFIER_DTOR_P (errname);
+		  errname = constructor_name (basetype);
 		}
 	  if (explicit_targs)
 		errname = lookup_template_function (errname, explicit_targs);
 	  if (skip_first_for_error)
 		arglist = TREE_CHAIN (arglist);
-	  error ("no matching function for call to %<%T::%E(%A)%#V%>",
-		 basetype, errname, arglist,
+	  error ("no matching function for call to %<%T::%s%E(%A)%#V%>",
+		 basetype, &"~"[!twiddle], errname, arglist,
 		 TREE_TYPE (instance));
 	}
 	  print_z_candidates (location_of (name), candidates);
Index: class.c
===
--- class.c	(revision 249788)
+++ class.c	(working copy)
@@ -8550,9 +8550,8 @@ print_class_statistics (void)
 void
 build_self_reference (void)
 {
-  tree name = constructor_name (current_class_type);
+  tree name = DECL_NAME (TYPE_NAME (current_class_type));
   tree value = build_lang_decl (TYPE_DECL, name, current_class_type);
-  tree saved_cas;
 
   DECL_NONLOCAL (value) = 1;
   DECL_CONTEXT (value) = current_class_type;
@@ -8563,7 +8562,7 @@ build_self_reference (void)
   if (processing_template_decl)
 value = push_template_decl (value);
 
-  saved_cas = current_access_specifier;
+  tree saved_cas = current_access_specifier;
   current_access_specifier = access_public_node;
   finish_member_declaration (value);
   current_access_specifier = saved_cas;
Index: name-lookup.c
===
--- name-lookup.c	(revision 249779)
+++ name-lookup.c	(working copy)
@@ -3188,7 +3188,9 @@ set_identifier_type_value (tree id, tree
 tree
 constructor_name (tree type)
 {
-  return TYPE_IDENTIFIER (TYPE_MAIN_VARIANT (type));
+  tree decl = TYPE_NAME (TYPE_MAIN_VARIANT (type));
+
+  return decl ? DECL_NAME (decl) : NULL_TREE;
 }
 
 /* Returns TRUE if NAME is the name for the constructor for TYPE,
@@ -3199,19 +3201,12 @@ constructor_name_p (tree name, tree type
 {
   gcc_assert (MAYBE_CLASS_TYPE_P (type));
 
-  if (!name)
-return false;
-
-  if (!identifier_p (name))
-return false;
-
   /* These don't have names.  */
   if (TREE_CODE (type) == DECLTYPE_TYPE
   || TREE_CODE (type) == TYPEOF_TYPE)
 return false;
 
-  tree ctor_name = constructor_name (type);
-  if (name == ctor_name)
+  if (name && name == constructor_name (type))
 return true;
 
   return false;
@@ -3962,7 +3957,7 @@ push_class_level_binding_1 (tree name, t
/* A data member of an anonymous union.  */
|| (TREE_CODE (x) == FIELD_DECL
 	   && DECL_CONTEXT (x) != current_class_type))
-  && DECL_NAME (x) == constructor_name (current_class_type))
+  && DECL_NAME (x) == DECL_NAME (TYPE_NAME (current_class_type)))
 {
   tree scope = context_for_name_lookup (x);
   if (TYPE_P (scope) && same_type_p (scope, current_class_type))


[C++ PATCH] maybe_add_lang_type_raw simplification

2017-06-29 Thread Nathan Sidwell
I found maybe_add_lang_type_raw benefitted from simply returning false 
early, rather than use a bool flag.


applied to trunk.

nathan
--
Nathan Sidwell
2017-06-29  Nathan Sidwell  

	* lex.c (maybe_add_lang_type_raw): Exit early, rather than use a
	flag.

Index: lex.c
===
--- lex.c	(revision 249779)
+++ lex.c	(working copy)
@@ -741,21 +741,21 @@ copy_type (tree type MEM_STAT_DECL)
 static bool
 maybe_add_lang_type_raw (tree t)
 {
-  bool add = (RECORD_OR_UNION_CODE_P (TREE_CODE (t))
-	  || TREE_CODE (t) == BOUND_TEMPLATE_TEMPLATE_PARM);
-  if (add)
-{
-  TYPE_LANG_SPECIFIC (t)
-	= (struct lang_type *) (ggc_internal_cleared_alloc
-(sizeof (struct lang_type)));
+  if (!(RECORD_OR_UNION_CODE_P (TREE_CODE (t))
+	|| TREE_CODE (t) == BOUND_TEMPLATE_TEMPLATE_PARM))
+return false;
+  
+  TYPE_LANG_SPECIFIC (t)
+= (struct lang_type *) (ggc_internal_cleared_alloc
+			(sizeof (struct lang_type)));
 
-  if (GATHER_STATISTICS)
-	{
-	  tree_node_counts[(int)lang_type] += 1;
-	  tree_node_sizes[(int)lang_type] += sizeof (struct lang_type);
-	}
+  if (GATHER_STATISTICS)
+{
+  tree_node_counts[(int)lang_type] += 1;
+  tree_node_sizes[(int)lang_type] += sizeof (struct lang_type);
 }
-  return add;
+
+  return true;
 }
 
 tree


[C++ PATCH] special identifiers

2017-06-29 Thread Nathan Sidwell
We currently #define a bunch of names in cp-tree.h, never override them, 
and then use them exactly once during initialization.  I think these 
were hold-overs from when changing from the old to new ABI.  Anyway, I 
see no reason for the indirection.


I also renamed the cdtor variant names to be consistent with the generic 
name.  IIRC '__ct' and '__dt' were from the old-abi cdtor names.  These 
never escape the compiler.


I think that's it for today's patching run.

nathan
--
Nathan Sidwell
2017-06-29  Nathan Sidwell  

	* cp-tree.h (THIS_NAME, IN_CHARGE_NAME, VTBL_PTR_TYPE,
	VTABLE_DELTA_NAME, VTABLE_PFN_NAME): Delete.
	* decl.c (initialize_predefined_identifiers): Name cdtor special
	names consistently.  Use literals for above deleted defines.
	(cxx_init_decl_processing): Use literal for vtbl_ptr_type name.

Index: cp-tree.h
===
--- cp-tree.h	(revision 249788)
+++ cp-tree.h	(working copy)
@@ -5195,14 +5195,6 @@ extern GTY(()) vec *keyed_c
 #endif	/* NO_DOLLAR_IN_LABEL */
 #endif	/* NO_DOT_IN_LABEL */
 
-#define THIS_NAME "this"
-
-#define IN_CHARGE_NAME "__in_chrg"
-
-#define VTBL_PTR_TYPE		"__vtbl_ptr_type"
-#define VTABLE_DELTA_NAME	"__delta"
-#define VTABLE_PFN_NAME		"__pfn"
-
 #define LAMBDANAME_PREFIX "__lambda"
 #define LAMBDANAME_FORMAT LAMBDANAME_PREFIX "%d"
 
Index: decl.c
===
--- decl.c	(revision 249788)
+++ decl.c	(working copy)
@@ -3955,16 +3955,16 @@ initialize_predefined_identifiers (void)
 /* Some of these names have a trailing space so that it is
impossible for them to conflict with names written by users.  */
 {"__ct ", &ctor_identifier, cik_ctor},
-{"__base_ctor ", &base_ctor_identifier, cik_ctor},
-{"__comp_ctor ", &complete_ctor_identifier, cik_ctor},
+{"__ct_base ", &base_ctor_identifier, cik_ctor},
+{"__ct_comp ", &complete_ctor_identifier, cik_ctor},
 {"__dt ", &dtor_identifier, cik_dtor},
-{"__comp_dtor ", &complete_dtor_identifier, cik_dtor},
-{"__base_dtor ", &base_dtor_identifier, cik_dtor},
-{"__deleting_dtor ", &deleting_dtor_identifier, cik_dtor},
-{IN_CHARGE_NAME, &in_charge_identifier, cik_normal},
-{THIS_NAME, &this_identifier, cik_normal},
-{VTABLE_DELTA_NAME, &delta_identifier, cik_normal},
-{VTABLE_PFN_NAME, &pfn_identifier, cik_normal},
+{"__dt_base ", &base_dtor_identifier, cik_dtor},
+{"__dt_comp ", &complete_dtor_identifier, cik_dtor},
+{"__dt_del ", &deleting_dtor_identifier, cik_dtor},
+{"__in_chrg", &in_charge_identifier, cik_normal},
+{"this", &this_identifier, cik_normal},
+{"__delta", &delta_identifier, cik_normal},
+{"__pfn", &pfn_identifier, cik_normal},
 {"_vptr", &vptr_identifier, cik_normal},
 {"__vtt_parm", &vtt_parm_identifier, cik_normal},
 {"::", &global_identifier, cik_normal},
@@ -4094,7 +4094,7 @@ cxx_init_decl_processing (void)
 
 vtable_entry_type = build_pointer_type (vfunc_type);
   }
-  record_builtin_type (RID_MAX, VTBL_PTR_TYPE, vtable_entry_type);
+  record_builtin_type (RID_MAX, "__vtbl_ptr_type", vtable_entry_type);
 
   vtbl_type_node
 = build_cplus_array_type (vtable_entry_type, NULL_TREE);


Re: [PING][PATCH] Move the check for any_condjump_p from sched-deps to target macros

2017-06-29 Thread Jeff Law
On 06/26/2017 10:19 PM, Hurugalawadi, Naveen wrote:
> Hi Jeff,
> 
> Thanks for the review and your approval for final patch.
> Sorry, It was a long weekend and hence could not revert to your
> comments earlier.
> 
>>> You need a ChangeLog entry, but I think that's it.  Can you
>>> please repost with a ChangeLog entry for final approval?
> 
> Please find the final patch and ChangeLog entry updated as required.
> Please review the same and let me know if it's okay to commit?
> 
> Thanks,
> Naveen
> 
> 2017-06-27  Naveen H.S  
> 
>   * config/aarch64/aarch64.c (aarch_macro_fusion_pair_p): Push the
>   check for CC usage into AARCH64_FUSE_CMP_BRANCH.
>   * config/i386/i386.c (ix86_macro_fusion_pair_p): Push the check for
>   CC usage from generic code to here.
>   * sched-deps.c (sched_macro_fuse_insns): Move the condition for
>   CC usage into the target macros.
> 
OK for the trunk.  Thanks for your patience.

Jeff


Re: [PATCH] Transform (m1 > m2) * d into m1> m2 ? d : 0

2017-06-29 Thread Wilco Dijkstra
Richard Biener wrote:

> int f (int m, int c)
> {
>  return (m & 1) * c;
> }

This case (integer[0,1] rather than boolean input) should be transformed into
c & -(m & 1).
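
A quick sketch of why the two forms agree, written on unsigned operands so that the negation is well defined for every input (the quoted example uses int). `mul_form`, `mask_form` and `check_small_inputs` are hypothetical names for this illustration.

```cpp
#include <cassert>

// Multiplication form from the quoted example.
constexpr unsigned mul_form(unsigned m, unsigned c) { return (m & 1u) * c; }

// Mask form: -(m & 1) is all-zero bits when the low bit is clear and
// all-one bits when it is set, so the AND selects 0 or c.
constexpr unsigned mask_form(unsigned m, unsigned c) {
  return c & (0u - (m & 1u));
}

static_assert(mul_form(3u, 42u) == mask_form(3u, 42u), "odd m keeps c");
static_assert(mul_form(2u, 42u) == mask_form(2u, 42u), "even m gives 0");

// Compare both forms over a handful of boundary values.
void check_small_inputs() {
  const unsigned values[] = {0u, 1u, 2u, 7u, 0x80000000u, 0xffffffffu};
  for (unsigned m : values)
    for (unsigned c : values)
      assert(mul_form(m, c) == mask_form(m, c));
}
```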

Wilco


Re: [PATCH, GCC/ARM, 0/3] Add support for ARMv8-R

2017-06-29 Thread Christophe Lyon
On 29 June 2017 at 16:37, Thomas Preudhomme
 wrote:
> On 29/06/17 15:34, Christophe Lyon wrote:
>>
>> On 29 June 2017 at 15:52, Thomas Preudhomme
>>  wrote:
>>>
>>> Hi,
>>>
>>> This patch series adds support for the ARMv8-R architecture[1] and ARM
>>> Cortex-R52[2] to GCC. The patch series consist of the following patches:
>>
>>
>> Hi Thomas,
>>
>> I think you need to rebase your patch because Richard's recent series
>> changed the contents
>> of arm-cpu-data.h and arm-cpu-cdata.h.
>
>
> Err yes indeed. Thanks!
>
>>
>> Why do you link armv8-r architecture definition to cortex-r4?
>
>
> I understand, where did I do such a thing?
>

In patch #2 you have:
diff --git a/gcc/config/arm/arm-cpu-data.h b/gcc/config/arm/arm-cpu-data.h
index 
d6200f9bdc09a9d0c973853b0152a2800eaf2fe5..48c1d88032c1c5dc7c6cba71511f79fe9f2533ea
100644
--- a/gcc/config/arm/arm-cpu-data.h
+++ b/gcc/config/arm/arm-cpu-data.h
@@ -1478,6 +1478,26 @@ static const struct processors all_architectures[] =
 NULL
   },
   {
+"armv8-r", TARGET_CPU_cortexr4,
+(TF_CO_PROC),
+"8R", BASE_ARCH_8R,
+{
+  ISA_ARMv8r,
+  isa_nobit
+},
+NULL
+  },
+  {
+"armv8-r+crc", TARGET_CPU_cortexr4,
+(TF_CO_PROC),
+"8R", BASE_ARCH_8R,
+{
+  ISA_ARMv8r,isa_bit_crc32,
+  isa_nobit
+},
+NULL
+  },
+  {
 "iwmmxt", TARGET_CPU_iwmmxt,
 (TF_LDSCHED | TF_STRONG | TF_XSCALE),
 "5TE", BASE_ARCH_5TE,

Both entries point to TARGET_CPU_cortexr4. I guess that's because r52
is only defined in patch #3, but then why not update this in patch #3
and replace r4 with r52?

Not sure I'm very clear :-)

Thanks,

Christophe

> Best regards,
>
> Thomas


Re: [PATCH 2/3, GCC/ARM] Add support for ARMv8-R architecture

2017-06-29 Thread Thomas Preudhomme

Please ignore this patch. I'll respin the patch on a more recent GCC.

Best regards,

Thomas

On 29/06/17 14:55, Thomas Preudhomme wrote:

Hi,

This patch adds support for ARMv8-R architecture [1] which was recently
announced. User level instructions for ARMv8-R are the same as those in
ARMv8-A AArch32 mode so this patch defines ARMv8-R to have the same
features as ARMv8-A in ARM backend.

[1] 
https://developer.arm.com/products/architecture/r-profile/docs/ddi0568/latest/arm-architecture-reference-manual-supplement-armv8-for-the-armv8-r-aarch32-architecture-profile 



ChangeLog entries are as follows:

*** gcc/ChangeLog ***

2017-01-31  Thomas Preud'homme  

 * config/arm/arm-cpus.in (armv8-r, armv8-r+crc): Add new entry.
 * config/arm/arm-cpu-cdata.h: Regenerate.
 * config/arm/arm-cpu-data.h: Regenerate.
 * config/arm/arm-isa.h (ISA_ARMv8r): Define macro.
 * config/arm/arm-tables.opt: Regenerate.
 * config/arm/arm.h (enum base_architecture): Add BASE_ARCH_8R
 enumerator.
 * config/arm/bpabi.h (BE8_LINK_SPEC): Add entry for ARMv8-R and
 ARMv8-R with CRC extensions.
 * doc/invoke.texi: Mention -march=armv8-r and -march=armv8-r+crc
 options.  Document meaning of -march=armv8-r+crc.

*** gcc/testsuite/ChangeLog ***

2017-01-31  Thomas Preud'homme  

 * lib/target-supports.exp: Generate
 check_effective_target_arm_arch_v8r_ok, add_options_for_arm_arch_v8r
 and check_effective_target_arm_arch_v8r_multilib.

*** libgcc/ChangeLog ***

2017-01-31  Thomas Preud'homme  

 * config/arm/lib1funcs.S: Define __ARM_ARCH__ to 8 for ARMv8-R.

Tested by building an arm-none-eabi GCC cross-compiler targeting
ARMv8-R.

Is this ok for stage1?

Best regards,

Thomas


Re: [PATCH 3/3, GCC/ARM] Add support for ARM Cortex-R52 processor

2017-06-29 Thread Thomas Preudhomme

Please ignore this patch. I'll respin the patch on a more recent GCC.

Best regards,

Thomas

On 29/06/17 14:56, Thomas Preudhomme wrote:

Hi,

This patch adds support for the recently announced ARM Cortex-R52
processor [1].

[1] https://developer.arm.com/products/processors/cortex-r/cortex-r52

ChangeLog entry is as follows:

*** gcc/ChangeLog ***

2017-01-31  Thomas Preud'homme  

 * config/arm/arm-cpus.in (cortex-r52): Add new entry.
 * config/arm/arm-cpu.h: Regenerate.
 * config/arm/arm-cpu-cdata.h: Regenerate.
 * config/arm/arm-cpu-data.h: Regenerate.
 * config/arm/arm-tables.opt: Regenerate.
 * config/arm/bpabi.h (BE8_LINK_SPEC): Add entry for ARM Cortex-R52.
 * config/arm/driver-arm.c (arm_cpu_table): Add entry for ARM
 Cortex-R52.
 * doc/invoke.texi: Mention -mtune=cortex-r52.

Tested by building an arm-none-eabi GCC cross-compiler targeting Cortex-R52.

Is this ok for stage1?

Best regards,

Thomas


libgo patch committed: Fixes for go tool for other build modes

2017-06-29 Thread Ian Lance Taylor
This libgo patch fixes the go tool when using
-buildmode={c-archive,c-shared,pie} with gccgo.  The tests are
misc/cgo tests that are not currently run but will be
run soon.  Bootstrapped and ran Go testsuite on x86_64-pc-linux-gnu.
Committed to mainline.

Ian
Index: gcc/go/gofrontend/MERGE
===
--- gcc/go/gofrontend/MERGE (revision 249713)
+++ gcc/go/gofrontend/MERGE (working copy)
@@ -1,4 +1,4 @@
-66d14d95a5a453682fe387319c80bc4fc40d96ad
+8c4d6fd19f6d5dc9b41be384c60507f2236f05ec
 
 The first line of this file holds the git revision number of the last
 merge done from the gofrontend repository.
Index: libgo/go/cmd/go/build.go
===
--- libgo/go/cmd/go/build.go(revision 249205)
+++ libgo/go/cmd/go/build.go(working copy)
@@ -342,16 +342,20 @@ func buildModeInit() {
}
return p
}
-   switch platform {
-   case "darwin/arm", "darwin/arm64":
-   codegenArg = "-shared"
-   default:
-   switch goos {
-   case "dragonfly", "freebsd", "linux", "netbsd", 
"openbsd", "solaris":
-   // Use -shared so that the result is
-   // suitable for inclusion in a PIE or
-   // shared library.
+   if gccgo {
+   codegenArg = "-fPIC"
+   } else {
+   switch platform {
+   case "darwin/arm", "darwin/arm64":
codegenArg = "-shared"
+   default:
+   switch goos {
+   case "dragonfly", "freebsd", "linux", "netbsd", 
"openbsd", "solaris":
+   // Use -shared so that the result is
+   // suitable for inclusion in a PIE or
+   // shared library.
+   codegenArg = "-shared"
+   }
}
}
exeSuffix = ".a"
@@ -374,10 +378,14 @@ func buildModeInit() {
case "default":
switch platform {
case "android/arm", "android/arm64", "android/amd64", 
"android/386":
-   codegenArg = "-shared"
+   if !gccgo {
+   codegenArg = "-shared"
+   }
ldBuildmode = "pie"
case "darwin/arm", "darwin/arm64":
-   codegenArg = "-shared"
+   if !gccgo {
+   codegenArg = "-shared"
+   }
fallthrough
default:
ldBuildmode = "exe"
@@ -387,7 +395,7 @@ func buildModeInit() {
ldBuildmode = "exe"
case "pie":
if gccgo {
-   fatalf("-buildmode=pie not supported by gccgo")
+   codegenArg = "-fPIE"
} else {
switch platform {
case "linux/386", "linux/amd64", "linux/arm", 
"linux/arm64", "linux/ppc64le", "linux/s390x",
@@ -1053,7 +1061,7 @@ func (b *builder) action1(mode buildMode
// Install header for cgo in c-archive and c-shared modes.
if p.usesCgo() && (buildBuildmode == "c-archive" || 
buildBuildmode == "c-shared") {
hdrTarget := 
a.target[:len(a.target)-len(filepath.Ext(a.target))] + ".h"
-   if buildContext.Compiler == "gccgo" {
+   if buildContext.Compiler == "gccgo" && *buildO == "" {
// For the header file, remove the "lib"
// added by go/build, so we generate pkg.h
// rather than libpkg.h.
@@ -3025,6 +3033,8 @@ func (tools gccgoToolchain) link(b *buil
ldflags = append(ldflags, "-shared", "-nostdlib", 
"-Wl,--whole-archive", "-lgolibbegin", "-Wl,--no-whole-archive", "-lgo", 
"-lgcc_s", "-lgcc", "-lc", "-lgcc")
case "shared":
ldflags = append(ldflags, "-zdefs", "-shared", "-nostdlib", 
"-lgo", "-lgcc_s", "-lgcc", "-lc")
+   case "pie":
+   ldflags = append(ldflags, "-pie")
 
default:
fatalf("-buildmode=%s not supported for gccgo", buildmode)
@@ -3100,7 +3110,7 @@ func (tools gccgoToolchain) cc(b *builde
 // maybePIC adds -fPIC to the list of arguments if needed.
 func (tools gccgoToolchain) maybePIC(args []string) []string {
switch buildBuildmode {
-   case "c-shared", "shared", "plugin":
+   case "c-archive", "c-shared", "shared", "plugin":
   

Re: [PATCH-v3] [SPARC] Add a workaround for the LEON3FT store-store errata

2017-06-29 Thread Daniel Cederman

Hello Eric,

Thank you for reviewing the patch.


Let's forget -mfix-gr712rc for now, -mfix-ut700 is enough I think.


I think it would be confusing to use the -mfix-ut700 flag when compiling
for the GR712RC. Now that we are not using a generic name for the errata
workaround, we should at least have unique flags for the two major CPUs
that are affected by this errata.



I'm not thrilled with this, it's undocumented, the other workaround don't have
it and I don't think that we really need it.


The B2BST errata workaround requires more changes to assembler routines
commonly used by operating systems, such as register window handling,
than the UT699 workaround needed. It would be nice to have a way to
enable these modifications only when the -mfix- flag is used. The
alternative would be to provide a define directly on the compiler
command line in conjunction with the -mfix- flag. But if more changes
are required later on, it would be good to have the define more closely
tied to the flag, to minimize the number of changes to Makefiles etc.


Would it be OK to add if we document it properly?

--
Daniel Cederman
Cobham Gaisler


libgo patch committed: Fix testcarchive to work with gccgo

2017-06-29 Thread Ian Lance Taylor
This patch fixes the misc/cgo/testcarchive test to work with gccgo.
This test is not currently run, but it will be soon.  Bootstrapped on
x86_64-pc-linux-gnu.  Committed to mainline.

Ian
Index: gcc/go/gofrontend/MERGE
===
--- gcc/go/gofrontend/MERGE (revision 249794)
+++ gcc/go/gofrontend/MERGE (working copy)
@@ -1,4 +1,4 @@
-8c4d6fd19f6d5dc9b41be384c60507f2236f05ec
+12c65e8310956eb3cc412d9dc9f9e88cbd928c8e
 
 The first line of this file holds the git revision number of the last
 merge done from the gofrontend repository.
Index: libgo/misc/cgo/testcarchive/carchive_test.go
===
--- libgo/misc/cgo/testcarchive/carchive_test.go(revision 249674)
+++ libgo/misc/cgo/testcarchive/carchive_test.go(working copy)
@@ -12,6 +12,7 @@ import (
"os"
"os/exec"
"path/filepath"
+   "runtime"
"strings"
"syscall"
"testing"
@@ -81,13 +82,17 @@ func init() {
cc = append(cc, []string{"-framework", "CoreFoundation", 
"-framework", "Foundation"}...)
}
libgodir = GOOS + "_" + GOARCH
-   switch GOOS {
-   case "darwin":
-   if GOARCH == "arm" || GOARCH == "arm64" {
+   if runtime.Compiler == "gccgo" {
+   libgodir = "gccgo_" + libgodir + "_fPIC"
+   } else {
+   switch GOOS {
+   case "darwin":
+   if GOARCH == "arm" || GOARCH == "arm64" {
+   libgodir += "_shared"
+   }
+   case "dragonfly", "freebsd", "linux", "netbsd", "openbsd", 
"solaris":
libgodir += "_shared"
}
-   case "dragonfly", "freebsd", "linux", "netbsd", "openbsd", "solaris":
-   libgodir += "_shared"
}
cc = append(cc, "-I", filepath.Join("pkg", libgodir))
 
@@ -149,6 +154,9 @@ func testInstall(t *testing.T, exe, libg
} else {
ccArgs = append(ccArgs, "main_unix.c", libgoa)
}
+   if runtime.Compiler == "gccgo" {
+   ccArgs = append(ccArgs, "-lgo")
+   }
t.Log(ccArgs)
if out, err := exec.Command(ccArgs[0], ccArgs[1:]...).CombinedOutput(); 
err != nil {
t.Logf("%s", out)
@@ -157,7 +165,11 @@ func testInstall(t *testing.T, exe, libg
defer os.Remove(exe)
 
binArgs := append(cmdToRun(exe), "arg1", "arg2")
-   if out, err := exec.Command(binArgs[0], 
binArgs[1:]...).CombinedOutput(); err != nil {
+   cmd = exec.Command(binArgs[0], binArgs[1:]...)
+   if runtime.Compiler == "gccgo" {
+   cmd.Env = append(os.Environ(), "GCCGO=1")
+   }
+   if out, err := cmd.CombinedOutput(); err != nil {
t.Logf("%s", out)
t.Fatal(err)
}
@@ -166,8 +178,13 @@ func testInstall(t *testing.T, exe, libg
 func TestInstall(t *testing.T) {
defer os.RemoveAll("pkg")
 
+   libgoa := "libgo.a"
+   if runtime.Compiler == "gccgo" {
+   libgoa = "liblibgo.a"
+   }
+
testInstall(t, "./testp1"+exeSuffix,
-   filepath.Join("pkg", libgodir, "libgo.a"),
+   filepath.Join("pkg", libgodir, libgoa),
filepath.Join("pkg", libgodir, "libgo.h"),
"go", "install", "-buildmode=c-archive", "libgo")
 
@@ -206,6 +223,9 @@ func TestEarlySignalHandler(t *testing.T
}
 
ccArgs := append(cc, "-o", "testp"+exeSuffix, "main2.c", "libgo2.a")
+   if runtime.Compiler == "gccgo" {
+   ccArgs = append(ccArgs, "-lgo")
+   }
if out, err := exec.Command(ccArgs[0], ccArgs[1:]...).CombinedOutput(); 
err != nil {
t.Logf("%s", out)
t.Fatal(err)
@@ -243,6 +263,9 @@ func TestSignalForwarding(t *testing.T)
}
 
ccArgs := append(cc, "-o", "testp"+exeSuffix, "main5.c", "libgo2.a")
+   if runtime.Compiler == "gccgo" {
+   ccArgs = append(ccArgs, "-lgo")
+   }
if out, err := exec.Command(ccArgs[0], ccArgs[1:]...).CombinedOutput(); 
err != nil {
t.Logf("%s", out)
t.Fatal(err)
@@ -293,6 +316,9 @@ func TestSignalForwardingExternal(t *tes
}
 
ccArgs := append(cc, "-o", "testp"+exeSuffix, "main5.c", "libgo2.a")
+   if runtime.Compiler == "gccgo" {
+   ccArgs = append(ccArgs, "-lgo")
+   }
if out, err := exec.Command(ccArgs[0], ccArgs[1:]...).CombinedOutput(); 
err != nil {
t.Logf("%s", out)
t.Fatal(err)
@@ -380,6 +406,9 @@ func TestOsSignal(t *testing.T) {
}
 
ccArgs := append(cc, "-o", "testp"+exeSuffix, "main3.c", "libgo3.a")
+   if runtime.Compiler == "gccgo" {
+   ccArgs = append(ccArgs, "-lgo")
+   }
if out, err := exec.Command(ccArgs[0], ccArgs[1:]...).CombinedOutput(); 
err !=

C++ PATCH for c++/81164, ICE with invalid inherited constructor

2017-06-29 Thread Jason Merrill
It's ill-formed to inherit a constructor from an indirect base, but we
were failing to catch that for virtual bases, whose binfos look like
direct bases.
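
For illustration only (this is not part of the patch): a minimal C++11
sketch of the rule being enforced. The class and function names are made
up for the example, and it uses non-virtual bases for simplicity; the PR
concerns the same mistake made through a virtual base.

```cpp
#include <cassert>

// Hypothetical classes, named only for this sketch.
struct A {
  int value;
  explicit A(int v) : value(v) {}
};

// A is a direct base of B, so B may inherit A's constructors.
struct B : A {
  using A::A;
};

// B is a direct base of D, so D may inherit B's constructors
// (including the one B inherited from A).
struct D : B {
  using B::B;
  // using A::A;  // ill-formed: A is only an indirect base of D
};

int value_via_b(int v) { return B(v).value; }
int value_via_d(int v) { return D(v).value; }

// Small self-check for the sketch.
void check_inherited_ctors() {
  assert(value_via_b(3) == 3);
  assert(value_via_d(7) == 7);
}
```

Uncommenting the using A::A line in D should produce the "cannot inherit
constructors from indirect base" diagnostic quoted in the patch.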

Tested x86_64-pc-linux-gnu, applying to trunk.
commit b8d3d6f1932db56d61c522bcdcff5142b307e23c
Author: Jason Merrill 
Date:   Thu Jun 29 11:06:06 2017 -0400

PR c++/81164 - ICE with invalid inherited constructor.

* search.c (binfo_direct_p): New.
* name-lookup.c (do_class_using_decl): Use it.

diff --git a/gcc/cp/cp-tree.h b/gcc/cp/cp-tree.h
index 946a916..a70b909 100644
--- a/gcc/cp/cp-tree.h
+++ b/gcc/cp/cp-tree.h
@@ -6603,6 +6603,7 @@ extern tree dfs_walk_all (tree, tree (*) (tree, void *),
 extern tree dfs_walk_once (tree, tree (*) (tree, void *),
   tree (*) (tree, void *), void *);
 extern tree binfo_via_virtual  (tree, tree);
+extern bool binfo_direct_p (tree);
 extern tree build_baselink (tree, tree, tree, tree);
 extern tree adjust_result_of_qualified_name_lookup
(tree, tree, tree);
diff --git a/gcc/cp/name-lookup.c b/gcc/cp/name-lookup.c
index 2ca71b6..3eb2c0d 100644
--- a/gcc/cp/name-lookup.c
+++ b/gcc/cp/name-lookup.c
@@ -4172,8 +4172,7 @@ do_class_using_decl (tree scope, tree name)
  return NULL_TREE;
}
}
-  else if (name == ctor_identifier
-  && BINFO_INHERITANCE_CHAIN (BINFO_INHERITANCE_CHAIN (binfo)))
+  else if (name == ctor_identifier && !binfo_direct_p (binfo))
{
  error ("cannot inherit constructors from indirect base %qT", scope);
  return NULL_TREE;
diff --git a/gcc/cp/search.c b/gcc/cp/search.c
index d7895a0..c37488d 100644
--- a/gcc/cp/search.c
+++ b/gcc/cp/search.c
@@ -2973,6 +2973,28 @@ binfo_via_virtual (tree binfo, tree limit)
   return NULL_TREE;
 }
 
+/* BINFO is for a base class in some hierarchy.  Return true iff it is a
+   direct base.  */
+
+bool
+binfo_direct_p (tree binfo)
+{
+  tree d_binfo = BINFO_INHERITANCE_CHAIN (binfo);
+  if (BINFO_INHERITANCE_CHAIN (d_binfo))
+/* A second inheritance chain means indirect.  */
+return false;
+  if (!BINFO_VIRTUAL_P (binfo))
+/* Non-virtual, so only one inheritance chain means direct.  */
+return true;
+  /* A virtual base looks like a direct base, so we need to look through the
+ direct bases to see if it's there.  */
+  tree b_binfo;
+  for (int i = 0; BINFO_BASE_ITERATE (d_binfo, i, b_binfo); ++i)
+if (b_binfo == binfo)
+  return true;
+  return false;
+}
+
 /* BINFO is a base binfo in the complete type BINFO_TYPE (HERE).
Find the equivalent binfo within whatever graph HERE is located.
This is the inverse of original_binfo.  */
diff --git a/gcc/testsuite/g++.dg/cpp0x/inh-ctor28.C 
b/gcc/testsuite/g++.dg/cpp0x/inh-ctor28.C
new file mode 100644
index 000..90a06c6
--- /dev/null
+++ b/gcc/testsuite/g++.dg/cpp0x/inh-ctor28.C
@@ -0,0 +1,7 @@
+// PR c++/81164
+// { dg-do compile { target c++11 } }
+
+struct A {};
+struct B : virtual A {};
+struct C : virtual A {};
+struct D : B,C { using A::A; };// { dg-error "indirect" }


Re: [Doc, AArch64] Fix/Update AArch64 options.

2017-06-29 Thread Sandra Loosemore

On 06/28/2017 01:28 AM, Yvan Roux wrote:

Hi Sandra,

[snip]

OK, here is the new patch with the comments addressed.  I've spotted
that there is also some m / -mno  options at least in the ARM section,
I'll make another patch to fix that.


This version looks fine.  Thanks for taking care of this!

-Sandra




Re: [Doc, AArch64] Fix/Update AArch64 options.

2017-06-29 Thread Richard Earnshaw (lists)
On 28/06/17 08:28, Yvan Roux wrote:
> Hi Sandra,
> 
> On 27 June 2017 at 18:05, Sandra Loosemore  wrote:
>> On 06/27/2017 06:19 AM, Yvan Roux wrote:
>>
>>> diff --git a/gcc/config/aarch64/aarch64.opt
>>> b/gcc/config/aarch64/aarch64.opt
>>> index 942a7d5..0fd1bfa 100644
>>> --- a/gcc/config/aarch64/aarch64.opt
>>> +++ b/gcc/config/aarch64/aarch64.opt
>>> @@ -146,7 +146,7 @@ EnumValue
>>>  Enum(aarch64_abi) String(lp64) Value(AARCH64_ABI_LP64)
>>>
>>>  mpc-relative-literal-loads
>>> -Target Report Save Var(pcrelative_literal_loads) Init(2) Save
>>> +Target Report Var(pcrelative_literal_loads) Init(2) Save
>>>  PC relative literal loads.
>>>
>>>  msign-return-address=
>>
>>
>> I think this qualifies as an obvious fix.  I can't approve it if it isn't,
>> anyway  ;-)
> 
> Ok, I'll commit it separately unless there is an objection to its obviousness.
> 
>>> diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
>>> index d1e097b..6e0e776 100644
>>> --- a/gcc/doc/invoke.texi
>>> +++ b/gcc/doc/invoke.texi
>>> @@ -595,7 +595,9 @@ Objective-C and Objective-C++ Dialects}.
>>>  -mlow-precision-recip-sqrt  -mno-low-precision-recip-sqrt@gol
>>>  -mlow-precision-sqrt  -mno-low-precision-sqrt@gol
>>>  -mlow-precision-div  -mno-low-precision-div @gol
>>> --march=@var{name}  -mcpu=@var{name}  -mtune=@var{name}}
>>> +-mpc-relative-literal-loads -mno-pc-relative-literal-loads @gol
>>
>>
>> For options that have both positive and negative variants, we should only be
>> listing the one that is not the default in the Option Summary table.  Can
>> you please remove the existing redundant options listed for AArch64, instead
>> of adding a new one?
>>
>>> +-msign-return-address=@var{scope} @gol
>>> +-march=@var{name}  -mcpu=@var{name}  -mtune=@var{name}
>>> -moverride=@var{string}}
>>>
>>>  @emph{Adapteva Epiphany Options}
>>>  @gccoptlist{-mhalf-reg-file  -mprefer-short-insn-regs @gol
>>> @@ -14158,8 +14160,10 @@ across releases.
>>>  This option is only intended to be useful when developing GCC.
>>>
>>>  @item -mpc-relative-literal-loads
>>> +@item -mno-pc-relative-literal-loads
>>
>>
>> It is OK to list both the positive and negative forms in the full
>> description, but in a table with multiple items in the same entry, the
>> second and subsequent ones should use @itemx markup instead of @item.
>>
>>>  @opindex mpc-relative-literal-loads
>>> -Enable PC-relative literal loads.  With this option literal pools are
>>> +@opindex mno-pc-relative-literal-loads
>>> +Enable or disable PC-relative literal loads.  With this option literal
>>> pools are
>>>  accessed using a single instruction and emitted after each function.
>>> This
>>>  limits the maximum size of functions to 1MB.  This is enabled by default
>>> for
>>>  @option{-mcmodel=tiny}.
> 
> OK, here is the new patch with the comments addressed.  I've spotted
> that there is also some m / -mno  options at least in the ARM section,
> I'll make another patch to fix that.
> 
> Thanks
> Yvan
> 

OK.

R.

> 
>>
>> -Sandra
>>
>>
>> fix-aarch64-opt.patch
>>
>>
>> diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
>> index d1e097b..e1bb8a8 100644
>> --- a/gcc/doc/invoke.texi
>> +++ b/gcc/doc/invoke.texi
>> @@ -587,15 +587,14 @@ Objective-C and Objective-C++ Dialects}.
>>  -mgeneral-regs-only @gol
>>  -mcmodel=tiny  -mcmodel=small  -mcmodel=large @gol
>>  -mstrict-align @gol
>> --momit-leaf-frame-pointer  -mno-omit-leaf-frame-pointer @gol
>> +-momit-leaf-frame-pointer @gol
>>  -mtls-dialect=desc  -mtls-dialect=traditional @gol
>>  -mtls-size=@var{size} @gol
>> --mfix-cortex-a53-835769  -mno-fix-cortex-a53-835769 @gol
>> --mfix-cortex-a53-843419  -mno-fix-cortex-a53-843419 @gol
>> --mlow-precision-recip-sqrt  -mno-low-precision-recip-sqrt@gol
>> --mlow-precision-sqrt  -mno-low-precision-sqrt@gol
>> --mlow-precision-div  -mno-low-precision-div @gol
>> --march=@var{name}  -mcpu=@var{name}  -mtune=@var{name}}
>> +-mfix-cortex-a53-835769  -mfix-cortex-a53-843419 @gol
>> +-mlow-precision-recip-sqrt  -mlow-precision-sqrt  -mlow-precision-div @gol
>> +-mpc-relative-literal-loads @gol
>> +-msign-return-address=@var{scope} @gol
>> +-march=@var{name}  -mcpu=@var{name}  -mtune=@var{name}  
>> -moverride=@var{string}}
>>  
>>  @emph{Adapteva Epiphany Options}
>>  @gccoptlist{-mhalf-reg-file  -mprefer-short-insn-regs @gol
>> @@ -14158,8 +14157,10 @@ across releases.
>>  This option is only intended to be useful when developing GCC.
>>  
>>  @item -mpc-relative-literal-loads
>> +@itemx -mno-pc-relative-literal-loads
>>  @opindex mpc-relative-literal-loads
>> -Enable PC-relative literal loads.  With this option literal pools are
>> +@opindex mno-pc-relative-literal-loads
>> +Enable or disable PC-relative literal loads.  With this option literal 
>> pools are
>>  accessed using a single instruction and emitted after each function.  This
>>  limits the maximum size of functions to 1MB.  This is enabled by default for
>>  @option{-mcmodel=tiny}.



Re: [PATCH, VAX] Correct ffs instruction constraint

2017-06-29 Thread coypu
Ping.

On Tue, Jun 20, 2017 at 08:05:42PM +, co...@sdf.org wrote:
> VAX' FFS as variable-length bit field instruction uses a "base"
> operand of type "vb" meaning "byte address".
> "base" can be 32 bits (SI) and due to the definition of
> ffssi2/__builtin_ffs() with the operand constraint "m", code can be
> emitted which incorrectly implies a mode-dependent (= longword, for
> the 32-bit operand) address.
> File scsipi_base.c compiled with -Os for our VAX install kernel shows:
> 
> ffs $0x0,$0x20,0x50(r11)[r0],r9
> 
> Apparently, 0x50(r11)[r0] as a longword address is assumed to be
> evaluated in longword context by FFS, but the instruction expects a
> byte address.
> 
> Our fix is to change the operand constraint from "m" to "Q", i. e.
> "operand is a MEM that does not have a mode-dependent address", which
> results in:
> 
> moval 0x50(r11)[r0],r1
> ffs $0x0,$0x20,(r1),r9
> 
> MOVAL evaluates the source operand/address in longword context, so
> effectively converts the word address to a byte address for FFS.
> 
> See NetBSD PR port-vax/51761 (http://gnats.netbsd.org/51761) and
> discussion on port-vax mailing list
> (http://mail-index.netbsd.org/port-vax/2017/01/06/msg002954.html).
> 
> ChangeLog:
> 
> 2017-06-20  Maya Rashish  
> 
>   * gcc/config/vax/builtins.md: Correct ffssi2_internal
>   instruction constraint.
> 
> 
> ---
>  gcc/config/vax/builtins.md | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/gcc/config/vax/builtins.md b/gcc/config/vax/builtins.md
> index fb0f69acb..b78fb5616 100644
> --- a/gcc/config/vax/builtins.md
> +++ b/gcc/config/vax/builtins.md
> @@ -41,7 +41,7 @@
>  
>  (define_insn "ffssi2_internal"
>[(set (match_operand:SI 0 "nonimmediate_operand" "=rQ")
> - (ffs:SI (match_operand:SI 1 "general_operand" "nrmT")))
> + (ffs:SI (match_operand:SI 1 "general_operand" "nrQT")))
> (set (cc0) (match_dup 0))]
>""
>"ffs $0,$32,%1,%0")
> -- 
> 2.13.1


Re: [PATCH rs6000] remove implicit static var outputs of toc_relative_expr_p

2017-06-29 Thread Aaron Sawdey
On Wed, 2017-06-28 at 18:19 -0500, Segher Boessenkool wrote:
> On Wed, Jun 28, 2017 at 03:21:49PM -0500, Aaron Sawdey wrote:
> > -toc_relative_expr_p (const_rtx op, bool strict)
> > +toc_relative_expr_p (const_rtx op, bool strict, const_rtx
> > *tocrel_base_ret,
> > +    const_rtx *tocrel_offset_ret)
> >  {
> >    if (!TARGET_TOC)
> >  return false;
> > -  tocrel_base = op;
> > -  tocrel_offset = const0_rtx;
> > +  const_rtx tocrel_base = op;
> > +  const_rtx tocrel_offset = const0_rtx;
> > +
> >    if (GET_CODE (op) == PLUS && add_cint_operand (XEXP (op, 1),
> > GET_MODE (op)))
> >  {
> >    tocrel_base = XEXP (op, 0);
> > -  tocrel_offset = XEXP (op, 1);
> > +  if (tocrel_offset_ret)
> > +   tocrel_offset = XEXP (op, 1);
> 
> Lose the "if"?  Or do you get a compiler warning then?

I was just trying to avoid unnecessary work in the case where the
pointer is NULL. In that case tocrel_offset isn't actually used for
anything. I should probably just let the compiler figure that one out,
so I will delete the if for clarity.

> > @@ -8674,7 +8686,8 @@
> >  legitimate_constant_pool_address_p (const_rtx x, machine_mode
> > mode,
> >     bool strict)
> >  {
> > -  return (toc_relative_expr_p (x, strict)
> > +  const_rtx tocrel_base, tocrel_offset;
> > +  return (toc_relative_expr_p (x, strict, &tocrel_base,
> > &tocrel_offset)
> >       && (TARGET_CMODEL != CMODEL_MEDIUM
> >       || constant_pool_expr_p (XVECEXP (tocrel_base, 0,
> > 0))
> >       || mode == QImode
> 
> Use NULL for the args here, instead?

The diff didn't include all the context. Both tocrel_base and
tocrel_offset are used in the function:

bool
legitimate_constant_pool_address_p (const_rtx x, machine_mode mode,
bool strict)
{
  const_rtx tocrel_base, tocrel_offset;
  return (toc_relative_expr_p (x, strict, &tocrel_base, &tocrel_offset)
  && (TARGET_CMODEL != CMODEL_MEDIUM
  || constant_pool_expr_p (XVECEXP (tocrel_base, 0, 0))
  || mode == QImode
  || offsettable_ok_by_alignment (XVECEXP (tocrel_base, 0, 0),
  INTVAL (tocrel_offset), mode)));
}


> 
> The patch is okay for trunk with those things taken care of.  Thanks,
> 
> 
> Segher
> 
-- 
Aaron Sawdey, Ph.D.  acsaw...@linux.vnet.ibm.com
050-2/C113  (507) 253-7520 home: 507/263-0782
IBM Linux Technology Center - PPC Toolchain



[PATCH] Use a specfile that actually allows building programs on NetBSD

2017-06-29 Thread coypu
I was thinking of holding a party for the upcoming one-year anniversary
of pinging this patch, which was committed to NetBSD's copy of GCC about
a decade ago. Without it, I can't compile simple programs.
---
 gcc/config/netbsd.h | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/gcc/config/netbsd.h b/gcc/config/netbsd.h
index 4001f240d..f4ac23a73 100644
--- a/gcc/config/netbsd.h
+++ b/gcc/config/netbsd.h
@@ -96,6 +96,7 @@ along with GCC; see the file COPYING3.  If not see
%{!pg:-lposix}} \
  %{p:-lposix_p}\
  %{pg:-lposix_p}}  \
+   %{shared:-lc}   \
%{!shared:  \
  %{!symbolic:  \
%{!p:   \
@@ -109,6 +110,7 @@ along with GCC; see the file COPYING3.  If not see
%{!pg:-lposix}} \
  %{p:-lposix_p}\
  %{pg:-lposix_p}}  \
+   %{shared:-lc}   \
%{!shared:  \
  %{!symbolic:  \
%{!p:   \
-- 
2.13.1



[PATCH] Fix PR77765

2017-06-29 Thread Cesar Philippidis
PR77765 exposed an ICE triggered in gfortran's acc routine parser by an
uninitialized proc_name. That situation occurred because the function
containing the acc routine directive has an error, so
gfc_current_ns->proc_name was never set.

Although it could be argued that the acc routine parser should not run
if any errors have been detected inside the routine containing such a
directive, this patch just teaches gfc_match_oacc_routine to check for
the existence of gfc_current_ns->proc_name before comparing the
procedure's name with the routine name specified by the user.
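
As a rough illustration of the guard (the types below are simplified,
hypothetical stand-ins, not the gfortran structures):

```cpp
#include <cassert>
#include <cstring>

// Hypothetical, simplified stand-ins for the front-end data structures.
struct Symbol {
  const char *name;
};
struct Namespace {
  Symbol *proc_name;  // may be null if the enclosing procedure had an error
};

// True only if the namespace has a procedure symbol and its name matches.
bool names_own_procedure(const Namespace &ns, const char *routine) {
  return ns.proc_name != nullptr
         && std::strcmp(routine, ns.proc_name->name) == 0;
}

// Small self-check for the sketch.
void check_names_own_procedure() {
  Symbol f{"f"};
  Namespace ok{&f};
  Namespace broken{nullptr};
  assert(names_own_procedure(ok, "f"));
  assert(!names_own_procedure(ok, "g"));
  assert(!names_own_procedure(broken, "f"));  // no crash on a null proc_name
}
```

The null test comes first, so the short-circuit && keeps strcmp from ever
seeing a null proc_name.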

Is this patch OK for trunk and gcc7?

Thanks,
Cesar
2017-06-29  Cesar Philippidis  

	PR fortran/77765
	gcc/fortran/
	* openmp.c (gfc_match_oacc_routine): Check if proc_name exists before
	comparing the routine name against it.

	gcc/testsuite/
	* gfortran.dg/goacc/pr77765.f90: New test.


diff --git a/gcc/fortran/openmp.c b/gcc/fortran/openmp.c
index 1d191d2..236ecb2 100644
--- a/gcc/fortran/openmp.c
+++ b/gcc/fortran/openmp.c
@@ -2474,7 +2474,8 @@ gfc_match_oacc_routine (void)
 	  if (st)
 	{
 	  sym = st->n.sym;
-	  if (strcmp (sym->name, gfc_current_ns->proc_name->name) == 0)
+	  if (gfc_current_ns->proc_name != NULL
+		  && strcmp (sym->name, gfc_current_ns->proc_name->name) == 0)
 	sym = NULL;
 	}
 	  else if (isym == NULL)
diff --git a/gcc/testsuite/gfortran.dg/goacc/pr77765.f90 b/gcc/testsuite/gfortran.dg/goacc/pr77765.f90
new file mode 100644
index 000..3819cf7
--- /dev/null
+++ b/gcc/testsuite/gfortran.dg/goacc/pr77765.f90
@@ -0,0 +1,19 @@
+! Test the presence of an ACC ROUTINE directive inside a function
+! containg an error.
+
+! { dg-do compile }
+
+module m
+contains
+  recursive function f(x)
+  end function f
+  recursive function f(x)
+!$acc routine (f)
+  end function f
+end module m
+
+! { dg-error "Procedure 'f' at .1. is already defined" "" { target *-*-* } 8 }
+! { dg-error "Duplicate RECURSIVE attribute specified" "" { target *-*-* } 8 }
+! { dg-error ".1." "" { target *-*-* } 10 }
+! { dg-error "Unexpected ..ACC ROUTINE" "" { target *-*-* } 11 }
+! { dg-error "Expecting END MODULE statement" "" { target *-*-* } 12 }


Re: C/C++ PATCH to add __typeof_noqual (PR c/65455, c/39985)

2017-06-29 Thread Joseph Myers
On Wed, 28 Jun 2017, Martin Sebor wrote:

> > The more limited interfaces could, of course, be __typeof_noqual in some
> > form.
> 
> Actually, despite what I've been arguing, I agree.  I've come
> to realize that what makes me uneasy about it is its name: it
> makes it sound like a special purpose flavor of __typeof__,
> when it really is a general purpose __remove_qualifiers trait.
> How does renaming it to something like that sound?

__typeof__ makes clear that it returns a type, whether given a type or an 
expression.  Can __remove_qualifiers be applied to an expression, and, if 
so, what does it do - return a type, or return the result of converting 
the expression to the corresponding type with whatever qualifiers removed?
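
To spell out the two readings, here is a sketch in standard C++11 using
std::remove_cv as an analogy. It does not use the proposed extension, and
unqualified_t and unqualified_value are names made up for this example.

```cpp
#include <cassert>
#include <type_traits>

// Reading 1: a type-to-type mapping, like a trait.
template <typename T>
using unqualified_t = typename std::remove_cv<T>::type;

// Reading 2: an expression-to-value mapping that yields a value of the
// unqualified type.
template <typename T>
unqualified_t<T> unqualified_value(const T &x) {
  return x;
}

bool type_reading_ok() {
  return std::is_same<unqualified_t<const volatile int>, int>::value;
}

bool value_reading_ok() {
  const int c = 42;
  return std::is_same<decltype(unqualified_value(c)), int>::value
         && unqualified_value(c) == 42;
}

// Small self-check for the sketch.
void check_readings() {
  assert(type_reading_ok());
  assert(value_reading_ok());
}
```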

-- 
Joseph S. Myers
jos...@codesourcery.com


gotools patch committed: Test runtime, misc/cgo/{test,testcarchive}

2017-06-29 Thread Ian Lance Taylor
This patch to the gotools Makefile adds tests to `make check`.  We now
test the runtime package using the newly built go tool, and test that
cgo works by running the misc/cgo/test and misc/cgo/testcarchive
tests.  Bootstrapped and ran Go testsuite on x86_64-pc-linux-gnu.
Committed to mainline.

Ian

2017-06-29  Ian Lance Taylor  

* Makefile.am (MOSTLYCLEANFILES): Remove testing files and logs.
(mostlyclean-local): Remove check-runtime-dir, cgo-test-dir,
carchive-test-dir.
(ECHO_ENV): Define.
(check-go-tool): Depend on cgo.  Write command to testlog.
(check-runtime): New target.
(check-cgo-test): New target.
(check-carchive-test): New target.
(check): Depend on check-runtime, check-cgo-test,
check-carchive-test.  Add @ to prettify output.
(.PHONY): Add check-runtime, check-cgo-test, check-carchive-test.
* Makefile.in: Rebuild.
Index: Makefile.am
===
--- Makefile.am (revision 249758)
+++ Makefile.am (working copy)
@@ -44,6 +44,7 @@ GOLINK = $(GOCOMPILER) $(GOCFLAGS) $(AM_
 
 libgosrcdir = $(srcdir)/../libgo/go
 cmdsrcdir = $(libgosrcdir)/cmd
+libgomiscdir = $(srcdir)/../libgo/misc
 
 go_cmd_go_files = \
$(cmdsrcdir)/go/alldocs.go \
@@ -106,7 +107,12 @@ s-zdefaultcc: Makefile
$(SHELL) $(srcdir)/../move-if-change zdefaultcc.go.tmp zdefaultcc.go
$(STAMP) $@ 
 
-MOSTLYCLEANFILES = zdefaultcc.go s-zdefaultcc
+MOSTLYCLEANFILES = \
+   zdefaultcc.go s-zdefaultcc \
+   check-gccgo gotools.head *-testlog gotools.sum gotools.log
+
+mostlyclean-local:
+   rm -rf check-go-dir check-runtime-dir cgo-test-dir carchive-test-dir
 
 if NATIVE
 
@@ -156,6 +162,7 @@ check-gccgo: Makefile
chmod +x $@
 
 # CHECK_ENV sets up the environment to run the newly built go tool.
+# If you change this, change ECHO_ENV, below.
 CHECK_ENV = \
PATH=`echo $(abs_builddir):$${PATH} | sed 
's,::*,:,g;s,^:*,,;s,:*$$,,'`; \
export PATH; \
@@ -169,25 +176,81 @@ CHECK_ENV = \
LD_LIBRARY_PATH=`echo $${abs_libgodir}/.libs:$${LD_LIBRARY_PATH} | sed 
's,::*,:,g;s,^:*,,;s,:*$$,,'`; \
export LD_LIBRARY_PATH;
 
+# ECHO_ENV is a variant of CHECK_ENV to put into a testlog file.
+# It assumes that abs_libgodir is set.
+ECHO_ENV = PATH=`echo $(abs_builddir):$${PATH} | sed 
's,::*,:,g;s,^:*,,;s,:*$$,,'` GCCGO='$(abs_builddir)/check-gccgo' 
GCCGOTOOLDIR='$(abs_builddir)' GO_TESTING_GOTOOLS=yes LD_LIBRARY_PATH=`echo 
$${abs_libgodir}/.libs:$${LD_LIBRARY_PATH} | sed 's,::*,:,g;s,^:*,,;s,:*$$,,'`
+
 # check-go-tools runs `go test cmd/go` in our environment.
-check-go-tool: go$(EXEEXT) check-head check-gccgo
-   rm -rf check-go-dir
+check-go-tool: go$(EXEEXT) cgo$(EXEEXT) check-head check-gccgo
+   rm -rf check-go-dir cmd_go-testlog
$(MKDIR_P) check-go-dir/src/cmd/go
cp $(cmdsrcdir)/go/*.go check-go-dir/src/cmd/go/
cp $(libgodir)/zstdpkglist.go check-go-dir/src/cmd/go/
cp zdefaultcc.go check-go-dir/src/cmd/go/
cp -r $(cmdsrcdir)/go/testdata check-go-dir/src/cmd/go/
+   @abs_libgodir=`cd $(libgodir) && $(PWD_COMMAND)`; \
+   abs_checkdir=`cd check-go-dir && $(PWD_COMMAND)`; \
+   echo "cd check-go-dir/src/cmd/go && $(ECHO_ENV) GOPATH=$${abs_checkdir} 
$(abs_builddir)/go$(EXEEXT) test -test.short -test.v" > cmd_go-testlog
$(CHECK_ENV) \
GOPATH=`cd check-go-dir && $(PWD_COMMAND)`; \
export GOPATH; \
(cd check-go-dir/src/cmd/go && $(abs_builddir)/go$(EXEEXT) test 
-test.short -test.v) > cmd_go-testlog 2>&1 || true
grep '^--- ' cmd_go-testlog | sed -e 's/^--- \(.*\) ([^)]*)$$/\1/'
 
+# check-runtime runs `go test runtime` in our environment.
+# The runtime package is also tested as part of libgo,
+# but the runtime tests use the go tool heavily, so testing
+# here too will catch more problems.
+check-runtime: go$(EXEEXT) cgo$(EXEEXT) check-head check-gccgo
+   rm -rf check-runtime-dir runtime-testlog
+   $(MKDIR_P) check-runtime-dir
+   @abs_libgodir=`cd $(libgodir) && $(PWD_COMMAND)`; \
+   LD_LIBRARY_PATH=`echo $${abs_libgodir}/.libs:$${LD_LIBRARY_PATH} | sed 
's,::*,:,g;s,^:*,,;s,:*$$,,'`; \
+   GOARCH=`$(abs_builddir)/go$(EXEEXT) env GOARCH`; \
+   GOOS=`$(abs_builddir)/go$(EXEEXT) env GOOS`; \
+   files=`$(SHELL) $(libgosrcdir)/../match.sh --goarch=$${GOARCH} 
--goos=$${GOOS} --srcdir=$(libgosrcdir)/runtime 
--extrafiles="$(libgodir)/runtime_sysinfo.go $(libgodir)/sigtab.go" 
--tag=libffi`; \
+   echo "$(ECHO_ENV) GC='$(abs_builddir)/check-gccgo 
-fgo-compiling-runtime' GOARCH=$${GOARCH} GOOS=$${GOOS} $(SHELL) 
$(libgosrcdir)/../testsuite/gotest --goarch=$${GOARCH} --goos=$${GOOS} 
--basedir=$(libgosrcdir)/.. --srcdir=$(libgosrcdir)/runtime --pkgpath=runtime 
--pkgfiles='$${files}' -test.v" > runtime-testlog
+   $(CHECK_ENV) \
+   GC="$${GCCGO} -fgo-compiling-runtime"; \
+   export GC; \
+   GOARCH=`$(abs_builddir)/go$(EXEEXT) env GOARCH`; \
+   GOOS=`$(abs_builddir)/go$(EXEEXT)

Re: [PATCH] Use a specfile that actually allows building programs on NetBSD

2017-06-29 Thread Joseph Myers
On Thu, 29 Jun 2017, coypu wrote:

> I was thinking of holding a party for the upcoming one year anniversary
> of pinging this patch, that was committed to NetBSD's copy of GCC about
> a decade ago. without it, I can't compile simple programs.

I advise CC:ing the listed NetBSD maintainers (Jason Thorpe and Krister 
Walfridsson) on any ping of a NetBSD GCC patch.

-- 
Joseph S. Myers
jos...@codesourcery.com


Re: [PATCH-v3] [SPARC] Add a workaround for the LEON3FT store-store errata

2017-06-29 Thread David Miller
From: Daniel Cederman 
Date: Thu, 29 Jun 2017 17:15:43 +0200

>> I'm not thrilled with this, it's undocumented, the other workarounds
>> don't have it and I don't think that we really need it.
> 
> The B2BST errata workaround requires more changes to assembler
> routines commonly used by operating systems, such as for example
> register window handling, than what the UT699 workaround needed. It
> would be nice to have a way to only enable these modifications when the
> -mfix- flag is used. The alternative would be to provide a define
> directly on the compiler command line in conjunction with -mfix
> flag. But if more changes are required later on it would be good to
> have the define more closely tied to the flag to minimize the number
> of changes to Makefiles etc.

Personally, I have never seen compiler based CPP defines as ever being
useful for tailoring OS assembler code.  Ever.

In most cases you will want to support several families of CPUs and
therefore sort out the individual cpu support assembler routines
internally in the kernel sources.


Re: [PATCH] warn on mem calls modifying objects of non-trivial types (PR 80560)

2017-06-29 Thread Jan Hubicka
Hello,
> diff --git a/gcc/hash-table.h b/gcc/hash-table.h
> index 0f7e21a..443d16c 100644
> --- a/gcc/hash-table.h
> +++ b/gcc/hash-table.h
> @@ -803,7 +803,10 @@ hash_table::empty_slow ()
>m_size_prime_index = nindex;
>  }
>else
> -memset (entries, 0, size * sizeof (value_type));
> +{
> +  for ( ; size; ++entries, --size)
> + *entries = value_type ();
> +}
>m_n_deleted = 0;
>m_n_elements = 0;
>  }

This change sends our periodic testers into an infinite loop.  It is the
fault of GCC 4.2 being used as the bootstrap compiler, but perhaps that
can be worked around?

Honza


Re: C/C++ PATCH to add __typeof_noqual (PR c/65455, c/39985)

2017-06-29 Thread Martin Sebor

On 06/29/2017 09:56 AM, Joseph Myers wrote:

On Wed, 28 Jun 2017, Martin Sebor wrote:


The more limited interfaces could, of course, be __typeof_noqual in some
form.


Actually, despite what I've been arguing, I agree.  I've come
to realize that what makes me uneasy about it is its name: it
makes it sound like a special purpose flavor of __typeof__,
when it really is a general purpose __remove_qualifiers trait.
How does renaming it to something like that sound?


__typeof__ makes clear that it returns a type, whether given a type or an
expression.  Can __remove_qualifiers be applied to an expression, and, if
so, what does it do - return a type, or return the result of converting
the expression to the corresponding type with whatever qualifiers removed?


The C++ traits primitives
(https://gcc.gnu.org/onlinedocs/gcc/Type-Traits.html) work on types,
not expressions, so I would suggest to have __remove_qualifiers (and
the related __remove_const et al., if they should be added) follow
the same approach.

Martin

PS There are at least a couple of traits on the list above that
would be useful in C as well (__is_enum and __is_union, and maybe
also __underlying_type).
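
For readers following the qualifier discussion, a minimal sketch in GNU C
(assuming GCC or Clang with C11 _Generic support) of the behaviour that
motivates the request: __typeof__ applied to a const-qualified object
yields a const-qualified type, so the unqualified type currently has to be
written out by hand.  The function names are illustrative only and do not
come from the patch.

```c
#include <assert.h>

/* Returns 1 when __typeof__ keeps the const qualifier of its operand.  */
int
typeof_keeps_const (void)
{
  const int x = 0;
  __typeof__ (x) y = x;		/* y is declared as 'const int'.  */

  /* &y has type 'const int *' only if y itself is const-qualified.  */
  return _Generic (&y, const int *: 1, default: 0);
}

/* Without a qualifier-stripping operator, the unqualified type has to be
   spelled out explicitly to get a writable copy.  */
int
writable_copy (void)
{
  const int x = 41;
  int y = x;			/* Type written by hand, not derived from x.  */
  y++;
  return y;
}

/* Small self-check of the two helpers above.  */
void
check_typeof (void)
{
  assert (typeof_keeps_const () == 1);
  assert (writable_copy () == 42);
}
```

A type-only trait in the style suggested above would let the second
function derive its type from x instead of repeating it.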


Re: [PATCH, GCC/ARM, 0/3] Add support for ARMv8-R

2017-06-29 Thread Thomas Preudhomme

On 29/06/17 16:12, Christophe Lyon wrote:

On 29 June 2017 at 16:37, Thomas Preudhomme




Why do you link armv8-r architecture definition to cortex-r4?



I understand, where did I do such a thing?



In patch #2 you have:
diff --git a/gcc/config/arm/arm-cpu-data.h b/gcc/config/arm/arm-cpu-data.h
index d6200f9bdc09a9d0c973853b0152a2800eaf2fe5..48c1d88032c1c5dc7c6cba71511f79fe9f2533ea 100644
--- a/gcc/config/arm/arm-cpu-data.h
+++ b/gcc/config/arm/arm-cpu-data.h
@@ -1478,6 +1478,26 @@ static const struct processors all_architectures[] =
  NULL
},
{
+"armv8-r", TARGET_CPU_cortexr4,
+(TF_CO_PROC),
+"8R", BASE_ARCH_8R,
+{
+  ISA_ARMv8r,
+  isa_nobit
+},
+NULL
+  },
+  {
+"armv8-r+crc", TARGET_CPU_cortexr4,
+(TF_CO_PROC),
+"8R", BASE_ARCH_8R,
+{
+  ISA_ARMv8r,isa_bit_crc32,
+  isa_nobit
+},
+NULL
+  },
+  {
  "iwmmxt", TARGET_CPU_iwmmxt,
  (TF_LDSCHED | TF_STRONG | TF_XSCALE),
  "5TE", BASE_ARCH_5TE,

Both entries point to TARGET_CPU_cortexr4. I guess that's because r52
is only defined in patch #3, but then why not update this in patch #3
and replace r4 with r52?

Not sure I'm very clear :-)


You are. I must have forgotten about that setting when working on patch #3. I'll 
update this. Thanks for your vigilance :-)


Best regards,

Thomas


[PATCH] Fix vec_extract_lo_* patterns (PR target/81225)

2017-06-29 Thread Jakub Jelinek
Hi!

This patch fixes various issues with the vec_extract_lo_* patterns.
There are splitters for these, but only for some cases (no mask, and
in one case also not xmm32+ reg) that change those into just a copy or load
of the low part subreg, but if those can't be used, the vextract* insns
don't accept memory input operand, but 3 of the 4 patterns have
nonimmediate_operand input, which is wrong for the masked case, and the
other one uses register_operand, even when the splitter can handle
nonimmediate_operand when not masked.

Thus this patch makes sure that the input is nonimmediate_operand and v,vm
if not masked and register_operand and v,v if masked, returns "#" to ensure
splitting in cases the input is a memory, simplifies the conditions (for
masked we don't need to test at runtime if both arguments aren't MEMs,
because the predicate is now register_operand with v constraint), and
changes the single case that used register_operand to follow the rest.

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

2017-06-29  Jakub Jelinek  

PR target/81225
* config/i386/sse.md (vec_extract_lo_): For
V8FI, V16FI and VI8F_256 iterators, use  instead
of nonimmediate_operand and  instead of m for
the input operand.  For V8FI iterator, always split if input is a MEM.
For V16FI and V8SF_256 iterators, don't test if both operands are MEM
if .  For VI4F_256 iterator, use 
instead of register_operand and  instead of v for
the input operand.  Make sure both operands aren't MEMs for if not
.

* gcc.target/i386/pr81225.c: New test.

--- gcc/config/i386/sse.md.jj   2017-06-21 22:01:41.0 +0200
+++ gcc/config/i386/sse.md  2017-06-28 12:30:49.304820307 +0200
@@ -7359,13 +7359,13 @@ (define_insn "vec_extract_lo__mask
 (define_insn "vec_extract_lo_"
   [(set (match_operand: 0 "" 
"=,v")
(vec_select:
- (match_operand:V8FI 1 "nonimmediate_operand" "v,m")
+ (match_operand:V8FI 1 "" 
"v,")
  (parallel [(const_int 0) (const_int 1)
 (const_int 2) (const_int 3)])))]
   "TARGET_AVX512F
&& ( || !(MEM_P (operands[0]) && MEM_P (operands[1])))"
 {
-  if ( || !TARGET_AVX512VL)
+  if ( || (!TARGET_AVX512VL && !MEM_P (operands[1])))
 return "vextract64x4\t{$0x0, %1, 
%0|%0, %1, 0x0}";
   else
 return "#";
@@ -7515,14 +7515,15 @@ (define_expand "avx_vextractf128"
 (define_insn "vec_extract_lo_"
   [(set (match_operand: 0 "nonimmediate_operand" "=v,m")
(vec_select:
- (match_operand:V16FI 1 "nonimmediate_operand" "vm,v")
+ (match_operand:V16FI 1 ""
+",v")
  (parallel [(const_int 0) (const_int 1)
  (const_int 2) (const_int 3)
  (const_int 4) (const_int 5)
  (const_int 6) (const_int 7)])))]
   "TARGET_AVX512F
&& 
-   && !(MEM_P (operands[0]) && MEM_P (operands[1]))"
+   && ( || !(MEM_P (operands[0]) && MEM_P (operands[1])))"
 {
   if ()
 return "vextract32x8\t{$0x0, %1, 
%0|%0, %1, 0x0}";
@@ -7546,11 +7547,12 @@ (define_split
 (define_insn "vec_extract_lo_"
   [(set (match_operand: 0 "" "=v,m")
(vec_select:
- (match_operand:VI8F_256 1 "nonimmediate_operand" "vm,v")
+ (match_operand:VI8F_256 1 ""
+   ",v")
  (parallel [(const_int 0) (const_int 1)])))]
   "TARGET_AVX
&&  && 
-   && !(MEM_P (operands[0]) && MEM_P (operands[1]))"
+   && ( || !(MEM_P (operands[0]) && MEM_P (operands[1])))"
 {
   if ()
 return "vextract64x2\t{$0x0, %1, %0%{%3%}|%0%{%3%}, %1, 0x0}";
@@ -7610,12 +7612,16 @@ (define_split
   "operands[1] = gen_lowpart (mode, operands[1]);")
 
 (define_insn "vec_extract_lo_"
-  [(set (match_operand: 0 "" 
"=")
+  [(set (match_operand: 0 ""
+ "=,v")
(vec_select:
- (match_operand:VI4F_256 1 "register_operand" "v")
+ (match_operand:VI4F_256 1 ""
+   "v,")
  (parallel [(const_int 0) (const_int 1)
 (const_int 2) (const_int 3)])))]
-  "TARGET_AVX &&  && "
+  "TARGET_AVX
+   &&  && 
+   && ( || !(MEM_P (operands[0]) && MEM_P (operands[1])))"
 {
   if ()
 return "vextract32x4\t{$0x0, %1, 
%0|%0, %1, 0x0}";
--- gcc/testsuite/gcc.target/i386/pr81225.c.jj  2017-06-28 12:51:10.606338225 
+0200
+++ gcc/testsuite/gcc.target/i386/pr81225.c 2017-06-28 12:50:52.0 
+0200
@@ -0,0 +1,14 @@
+/* PR target/81225 */
+/* { dg-do compile } */
+/* { dg-options "-mavx512ifma -O3 -ffloat-store" } */
+
+long a[24];
+float b[4], c[24];
+int d;
+
+void
+foo ()
+{
+  for (d = 0; d < 24; d++)
+c[d] = (float) d ? : b[a[d]];
+}

Jakub


Re: [PATCH] Call BUILT_IN_ASAN_HANDLE_NO_RETURN before BUILT_IN_UNWIND_RESUME (PR sanitizer/81021).

2017-06-29 Thread Jeff Law
On 06/13/2017 02:09 AM, Martin Liška wrote:
> Hi.
> 
> For a function that does not handle an exception (and calls 
> BUILT_IN_UNWIND_RESUME),
> we need to emit call to BUILT_IN_ASAN_HANDLE_NO_RETURN. That will clean up 
> stack
> which can possibly contain poisoned shadow memory that will not be cleaned-up
> in function prologue.
> 
> Patch can bootstrap on ppc64le-redhat-linux and survives regression tests.
> 
> Ready to be installed?
> Martin
> 
> gcc/testsuite/ChangeLog:
> 
> 2017-06-12  Martin Liska  
> 
>   PR sanitizer/81021
>   * g++.dg/asan/pr81021.C: New test.
> 
> gcc/ChangeLog:
> 
> 2017-06-12  Martin Liska  
> 
>   PR sanitizer/81021
>   * tree-eh.c (lower_resx): Call BUILT_IN_ASAN_HANDLE_NO_RETURN
>   before BUILT_IN_UNWIND_RESUME when ASAN is used.
OK.
Jeff


[PATCH 1/2] combine: Print insns with the cost dump

2017-06-29 Thread Segher Boessenkool
In the combine dump file, at the start there is a list of the RTL cost
of every insn.  The only thing listed about the insns is the UID though.
To make it more useful, this patch prints the insn itself as well (in
slim format).

Tested on powerpc64-linux {-m32,-m64}, committing to trunk


Segher


2017-06-29  Segher Boessenkool  

* combine.c (combine_instructions): Print insns to dump_file, together
with their costs.

---
 gcc/combine.c | 6 --
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/gcc/combine.c b/gcc/combine.c
index 73895b6..c49b2b2 100644
--- a/gcc/combine.c
+++ b/gcc/combine.c
@@ -1213,8 +1213,10 @@ combine_instructions (rtx_insn *f, unsigned int nregs)
  INSN_COST (insn) = insn_rtx_cost (PATTERN (insn),
optimize_this_for_speed_p);
if (dump_file)
- fprintf (dump_file, "insn_cost %d: %d\n",
-  INSN_UID (insn), INSN_COST (insn));
+ {
+   fprintf (dump_file, "insn_cost %d for ", INSN_COST (insn));
+   dump_insn_slim (dump_file, insn);
+ }
  }
 }
 
-- 
1.9.3



[PING*3, Ada] Re: Handle data dependence relations with different bases

2017-06-29 Thread Richard Sandiford
Ping*3

Richard Sandiford  writes:
> Ping*2
>
> Richard Sandiford  writes:
>> Ping for this Ada patch/question.
>>
>> Richard Sandiford  writes:
>>> Richard Biener  writes:
>> How does this look?  Changes since v1:
>>
>> - Added access_fn_component_p to check for valid access function
>> components.
>>
>> - Added access_fn_components_comparable_p instead of using
>>   types_compatible_p directly.
>>
>> - Added more commentary.
>>
>> - Added local structures to represent the sequence, so that it's
>>   more obvious which variables are temporaries and which aren't.
>>
>> - Added the test above to vect-alias-check-3.c.
>>
>> Tested on aarch64-linux-gnu and x86_64-linux-gnu.

 This is ok.
>>>
>>> Thanks.  Just been retesting, and I think I must have forgotten
>>> to include Ada last time.  It turns out that the patch causes a dg-scan
>>> regression in gnat.dg/vect17.adb, because we now think that if the
>>> array RECORD_TYPEs *do* alias in:
>>>
>>>procedure Add (X, Y : aliased Sarray; R : aliased out Sarray) is
>>>begin
>>>   for I in Sarray'Range loop
>>>  R(I) := X(I) + Y(I);
>>>   end loop;
>>>end;
>>>
>>> then the dependence distance must be zero.  Eric, does that hold true
>>> for Ada?  I.e. if X and R (or Y and R) alias, must it be the case that
>>> X(I) can only alias R(I) and not for example R(I-1) or R(I+1)?  Or are
>>> the arrays allowed to overlap by an arbitrary number of indices?
>>>
>>> If the assumption is correct, is the patch below OK?
>>>
>>> Thanks,
>>> Richard
>>>
>>>
>>> 2017-06-07  Richard Sandiford  
>>>
>>> gcc/testsuite/
>>> * gnat.dg/vect17.ads (Sarray): Increase range to 1 .. 5.
>>> * gnat.dg/vect17.adb (Add): Create a dependence distance of 1
>>> when X = R or Y = R.
>>>
>>> Index: gcc/testsuite/gnat.dg/vect17.ads
>>> ===
>>> --- gcc/testsuite/gnat.dg/vect17.ads2015-10-14 14:58:56.0 
>>> +0100
>>> +++ gcc/testsuite/gnat.dg/vect17.ads2017-06-07 22:10:24.796368118 
>>> +0100
>>> @@ -1,6 +1,6 @@
>>>  package Vect17 is
>>>  
>>> -   type Sarray is array (1 .. 4) of Long_Float;
>>> +   type Sarray is array (1 .. 5) of Long_Float;
>>> for Sarray'Alignment use 16;
>>>  
>>> procedure Add (X, Y : aliased Sarray; R : aliased out Sarray);
>>> Index: gcc/testsuite/gnat.dg/vect17.adb
>>> ===
>>> --- gcc/testsuite/gnat.dg/vect17.adb2015-10-14 14:58:56.0 
>>> +0100
>>> +++ gcc/testsuite/gnat.dg/vect17.adb2017-06-07 22:10:24.796368118 
>>> +0100
>>> @@ -5,8 +5,9 @@ package body Vect17 is
>>>  
>>> procedure Add (X, Y : aliased Sarray; R : aliased out Sarray) is
>>> begin
>>> -  for I in Sarray'Range loop
>>> - R(I) := X(I) + Y(I);
>>> +  R(1) := X(5) + Y(5);
>>> +  for I in 1 .. 4 loop
>>> + R(I + 1) := X(I) + Y(I);
>>>end loop;
>>> end;
>>>  
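
As a rough illustration of what the modified test exercises, here is a C
analogue of the new Add body (a sketch only; the function name and the use
of plain pointers are assumptions, and Ada's aliasing rules are exactly
what the question above is about).  Element I + 1 of R is computed from
element I of X and Y, so if X or Y is the same array as R, each iteration
reads the value stored by the previous one, giving a dependence distance
of 1 rather than 0.

```c
#include <assert.h>

#define N 5

/* R[0] is seeded from the last elements; R[I + 1] depends on X[I] and
   Y[I].  If X (or Y) is the same array as R, iteration I reads what
   iteration I - 1 wrote.  */
void
add_shifted (const double *x, const double *y, double *r)
{
  assert (x != 0 && y != 0 && r != 0);
  r[0] = x[N - 1] + y[N - 1];
  for (int i = 0; i < N - 1; i++)
    r[i + 1] = x[i] + y[i];
}
```

With three distinct arrays of ones, every element of R ends up as 2.0.
Passing the same array as both X and R makes each iteration consume the
previous result, so the last element becomes 6.0.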


Re: libdecnumber/bid/bid2dpd_dpd2bid.c: Simplify code

2017-06-29 Thread Jeff Law
On 05/26/2017 07:34 AM, Sylvestre Ledru wrote:
> Hello,
> 
> The attached patch (dup.diff) performs the following changes:
> 
> * bid/bid2dpd_dpd2bid.c: Remove identical code for different
> branches (CID 1286836, 1286837, 1286838)
>   Remove some useless } else { declaration as we are returning
>   Remove some whitespace changes & tab
> 
> I attached the word diff to highlight the change.
> 
> No functional changes! The identical code has been found by coverity.
> 
> Thanks!
> 
> S
> 
> 
> 
> dup.diff
> 
> 
> libdecnumber/ChangeLog:
> 
> 2017-05-26  Sylvestre Ledru  
> 
>   * bid/bid2dpd_dpd2bid.c: Remove identical code for different branches 
> (CID 1286836, 1286837, 1286838)
>   Remove some useless } else { declaration as we are returning
>   Remove some whitespace changes & tab
I fixed up the ChangeLog and committed your change.  Thanks,

jeff


Re: [PATCH] Fix PR77765

2017-06-29 Thread Jakub Jelinek
On Thu, Jun 29, 2017 at 08:54:54AM -0700, Cesar Philippidis wrote:
> PR77765 exposed an ICE triggered in gfortran's acc routine parser by an
> uninitialized proc_name. That situation occurred because the function
> containing the acc routine directive has an error, so
> gfc_current_ns->proc_name was never set.
> 
> Although it could be argued that the acc routine parser should not run
> if any errors have been detected inside the routine containing such a
> directive, this patch just teaches gfc_match_oacc_routine to check for
> the existence of gfc_current_ns->proc_name before comparing the
> procedure's name with the routine name specified by the user.
> 
> Is this patch OK for trunk and gcc7?
> 
> Thanks,
> Cesar

> 2017-06-29  Cesar Philippidis  
> 
>   PR fortran/77765
>   gcc/fortran/
>   * openmp.c (gfc_match_oacc_routine): Check if proc_name exists before
>   comparing the routine name against it.
> 
>   gcc/testsuite/
>   * gfortran.dg/goacc/pr77765.f90: New test.

Ok, thanks.

Jakub
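
The guard described in the quoted patch can be sketched in plain C as
follows.  This is not the gfortran code: the struct and function names are
hypothetical stand-ins, and only the idea (test proc_name for NULL before
comparing names) is taken from the description above.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-ins for the front end's symbol and namespace
   types.  */
struct symbol
{
  const char *name;
};

struct name_space
{
  struct symbol *proc_name;	/* NULL if the enclosing procedure had an error.  */
};

/* Return 1 if NAME names the procedure that owns NS, 0 otherwise.
   A missing proc_name is treated as "no match" instead of being
   dereferenced.  */
int
routine_names_current_proc (const struct name_space *ns, const char *name)
{
  assert (name != NULL);
  if (ns == NULL || ns->proc_name == NULL)
    return 0;
  return strcmp (ns->proc_name->name, name) == 0;
}
```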


Re: PR80806

2017-06-29 Thread Jeff Law
On 05/23/2017 09:58 AM, Martin Sebor wrote:
> On 05/18/2017 12:55 PM, Prathamesh Kulkarni wrote:
>> Hi,
>> The attached patch tries to fix PR80806 by warning when a variable is
>> set using memset (and friends) but not used. I chose to warn in dse
>> pass since dse would detect if the variable passed as 1st argument is
>> a dead store. Does this approach look OK ?
> 
> Detecting -Wunused-but-set-variable in the optimizer means that
> the warning will not be issued without optimization.  It also
> means that the warning will trigger in cases where the variable
> is used conditionally and the condition is subject to constant
> propagation.  For instance:
Yea.  There's definitely tradeoffs for implementing warnings early vs
late.  There's little doubt we could construct testcases where an early
warning would miss cases that could be caught by a late warning.


> 
>   void sink (void*);
> 
>   void test (int i)
>   {
>   char buf[10];   // -Wunused-but-set-variable
>   memset (buf, 0, sizeof(buf));
> 
>   if (i)
> sink (buf);
>   }
> 
>   void f (void)
>   {
>   test (0);
>   }
> 
> I suspect this would be considered a false positive by most users.
> In my view, it would be more in line with the design of the warning
> to enhance the front end to detect this case, and it would avoid
> these issues.
Given no knowledge of sink() here, don't we have to assume that buf is
used?  So, yea, I'd probably consider that a false positive.


> 
> I have a patch that does that.  Rather than checking the finite
> set of known built-in functions like memset that are known not
> to read the referenced object, I took the approach of adding
> a new  function attribute (I call it write-only) and avoiding
> setting the DECL_READ_P flag for DECLs that are passed to
> function arguments decorated with the attribute.  That makes
> it possible to issue the warning even if the variable is passed
> to ordinary (non-built-in) functions like getline(), and should
> open up optimization opportunities beyond built-ins. 
ISTM this would be generally useful.


> The only
> wrinkle is that the front end sets DECL_READ_P even for uses that
> aren't reads such as a sizeof expression, so while an otherwise
> unused buf is diagnosed given a call to memset(buf, 0, 10), it
> isn't diagnosed if a call is made to memset(buf, 0, sizeof buf).
> I am yet to see what impact not setting DECL_READ_P would have
> when the decl is used without being evaluated.  (In any event,
> setting DECL_READ_P on a use that doesn't involve reading the
> DECL doesn't seem right.)
Agreed.

> 
> I attach what I have so far in case you would like to check it
> out.  I think you have more experience with DSE than me so I'd
> be interested in your thoughts on making use of the attribute
> for optimization.  (Another couple attributes I'm considering
> to complement write-only is read-only and read-write, also
> with the hope of improving both warnings and code generation.
> Ideas on those would be welcome as well.)
Ideally we'd integrate this into the memory web, but I don't think we
have that capability these days with the sparser representation.
Essentially a definition is always considered a read and a write of the
underlying memory object.

This (write-only) likely wouldn't be used directly in DSE, but instead
would live in the alias oracle support routines.  In particular
ref_maybe_used_by_call seems natural as that code already knows about
similar situations with various builtins.  DSE would use it implicitly
as would other optimizers.

Jeff


Re: PR80806

2017-06-29 Thread Jeff Law
On 06/29/2017 11:57 AM, Jeff Law wrote:
> On 05/23/2017 09:58 AM, Martin Sebor wrote:
>> On 05/18/2017 12:55 PM, Prathamesh Kulkarni wrote:
>>> Hi,
>>> The attached patch tries to fix PR80806 by warning when a variable is
>>> set using memset (and friends) but not used. I chose to warn in dse
>>> pass since dse would detect if the variable passed as 1st argument is
>>> a dead store. Does this approach look OK ?
>>
>> Detecting -Wunused-but-set-variable in the optimizer means that
>> the warning will not be issued without optimization.  It also
>> means that the warning will trigger in cases where the variable
>> is used conditionally and the condition is subject to constant
>> propagation.  For instance:
> Yea.  There's definitely tradeoffs for implementing warnings early vs
> late.  There's little doubt we could construct testcases where an early
> warning would miss cases that could be caught by a late warning.
> 
> 
>>
>>   void sink (void*);
>>
>>   void test (int i)
>>   {
>>   char buf[10];   // -Wunused-but-set-variable
>>   memset (buf, 0, sizeof(buf));
>>
>>   if (i)
>> sink (buf);
>>   }
>>
>>   void f (void)
>>   {
>>   test (0);
>>   }
>>
>> I suspect this would be considered a false positive by most users.
>> In my view, it would be more in line with the design of the warning
>> to enhance the front end to detect this case, and it would avoid
>> these issues.
> Given no knowledge of sink() here, don't we have to assume that buf is
> used?  So, yea, I'd probably consider that a false positive.
Oh, wait, I missed the constant propagation.  That makes this one less
clear cut in my mind -- it means it's context sensitive.  I could easily
argue either way on this one.

Jeff


Re: PR80806

2017-06-29 Thread Jeff Law
On 05/18/2017 12:55 PM, Prathamesh Kulkarni wrote:
> Hi,
> The attached patch tries to fix PR80806 by warning when a variable is
> set using memset (and friends) but not used. I chose to warn in dse
> pass since dse would detect if the variable passed as 1st argument is
> a dead store. Does this approach look OK ?
[ ... ]
So I think the biggest question is whether or not the case like Martin's
deserves a warning.

What we have is an object that is conditionally set but not used
depending on inlining context.  We've generally "allowed" inlining to
expose new warnings in the sense that inlining may (for example) allow
us to remove the addressability property on an object -- which makes the
object subject to the usual -Wuninitialized analysis.  In fact, I think
we've generally considered that a positive outcome because it's exposing
bugs in subtle paths.

I'm less sure that this case falls into that same category.  What we're
really talking about is warning for a partially dead store.   Would we
want a warning if rather than a memset this was a simple store?   Is
that the right guiding principle here?

I hate to say it, but I wonder if this is another case where there
likely won't be a clear consensus and we're going to end up with a two
level warning system?

For something like Martin's case what I really think we should do is
sink the memset call into the conditional.  In cases where "i" is not a
constant, but actually has the value zero at runtime we win.
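
To make the suggestion concrete, here is a hand-written before/after
sketch of sinking the store into the conditional, based on the test case
quoted earlier in the thread.  The sink body, its call counter and the
name test_sunk are assumptions added so the example is self-contained;
this is what the transformation would produce, not output from any
existing pass.

```c
#include <assert.h>
#include <string.h>

static int sink_calls;		/* Counts calls, for demonstration only.  */

void
sink (void *p)
{
  assert (p != 0);
  sink_calls++;
}

/* As in the quoted test case: the store is dead whenever I is zero.  */
void
test_original (int i)
{
  char buf[10];
  memset (buf, 0, sizeof (buf));
  if (i)
    sink (buf);
}

/* After sinking: the store only executes on the path that uses BUF.  */
void
test_sunk (int i)
{
  char buf[10];
  if (i)
    {
      memset (buf, 0, sizeof (buf));
      sink (buf);
    }
}

int
get_sink_calls (void)
{
  return sink_calls;
}
```

Both versions behave identically as far as sink can observe, but the
second never pays for the memset when i is zero at run time.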

--

So I've got no objections to the idea of using DSE to detect the dead
store and potentially warn.  My concern is whether we're in a case where
that warning is going to annoy users and we end up needing a level of
-Wunused-but-set.

Jeff


[PR c++/81247] ICE with bogus input

2017-06-29 Thread Nathan Sidwell

This fixes 81247.  There are two parts.

Firstly, it seems pointless to try and parse a namespace body when 
there's a missing '{'.  I changed the parser to immediately close the 
new namespace and return in that case.


However, the bug is in my updated do_pushdecl handling.  When the 
incoming decl's CONTEXT is not CURRENT_NAMESPACE, we search the context 
for a matching decl, but push into CURRENT_NAMESPACE if there's no 
match.  But I'd not updated OLD, which holds the current binding for 
UPDATE_BINDING's use.  In this case that meant we silently smashed the 
existing binding to a namespace.


This behaviour of pushing into current namespace does seem odd, but it's 
what the previous code did.  I don't fully understand why.


nathan
--
Nathan Sidwell
2017-06-29  Nathan Sidwell  

	PR c++/81247
	* parser.c (cp_parser_namespace_definition): Immediately close the
	namespace if there's no open-brace.
	* name-lookup.c (do_pushdecl): Reset OLD when pushing into new
	namespace.

	PR c++/81247
	* g++.dg/parse/pr81247-[abc].C: New.

Index: cp/name-lookup.c
===
--- cp/name-lookup.c	(revision 249796)
+++ cp/name-lookup.c	(working copy)
@@ -2422,6 +2422,9 @@ do_pushdecl (tree decl, bool is_friend)
 	{
 	  ns = current_namespace;
 	  slot = find_namespace_slot (ns, name, true);
+	  /* Update OLD to reflect the namespace we're going to be
+	 pushing into.  */
+	  old = MAYBE_STAT_DECL (*slot);
 	}
 
   old = update_binding (level, binding, slot, old, decl, is_friend);
Index: cp/parser.c
===
--- cp/parser.c	(revision 249796)
+++ cp/parser.c	(working copy)
@@ -18397,13 +18397,14 @@ cp_parser_namespace_definition (cp_parse
   warning  (OPT_Wnamespaces, "namespace %qD entered", current_namespace);
 
   /* Look for the `{' to validate starting the namespace.  */
-  cp_parser_require (parser, CPP_OPEN_BRACE, RT_OPEN_BRACE);
+  if (cp_parser_require (parser, CPP_OPEN_BRACE, RT_OPEN_BRACE))
+{
+  /* Parse the body of the namespace.  */
+  cp_parser_namespace_body (parser);
 
-  /* Parse the body of the namespace.  */
-  cp_parser_namespace_body (parser);
-
-  /* Look for the final `}'.  */
-  cp_parser_require (parser, CPP_CLOSE_BRACE, RT_CLOSE_BRACE);
+  /* Look for the final `}'.  */
+  cp_parser_require (parser, CPP_CLOSE_BRACE, RT_CLOSE_BRACE);
+}
 
   if (has_visibility)
 pop_visibility (1);
Index: testsuite/g++.dg/parse/pr81247-a.C
===
--- testsuite/g++.dg/parse/pr81247-a.C	(revision 0)
+++ testsuite/g++.dg/parse/pr81247-a.C	(working copy)
@@ -0,0 +1,13 @@
+// PR c++/81247 ICE
+
+namespace N  // { dg-message "previous declaration" }
+// { dg-error "expected" "" { target *-*-* } .+1 }
+template < typename T > class A
+{ // { dg-error "redeclared as different" }
+  template < T > friend class N;
+};
+
+void f ()
+{
+  A < int > a1; //  { dg-message "required from here" }
+}
Index: testsuite/g++.dg/parse/pr81247-b.C
===
--- testsuite/g++.dg/parse/pr81247-b.C	(revision 0)
+++ testsuite/g++.dg/parse/pr81247-b.C	(working copy)
@@ -0,0 +1,14 @@
+// PR c++/81247 confused error
+
+namespace N { // { dg-message "previous declaration" }
+}
+
+template < typename T > class A
+{ // { dg-error "redeclared as different" }
+  template < T > friend class N;
+};
+
+void f ()
+{
+  A < int > a1;
+}
Index: testsuite/g++.dg/parse/pr81247-c.C
===
--- testsuite/g++.dg/parse/pr81247-c.C	(revision 0)
+++ testsuite/g++.dg/parse/pr81247-c.C	(working copy)
@@ -0,0 +1,13 @@
+// PR c++/81247 confused error
+
+namespace N { // { dg-message "previous declaration" }
+  template < typename T > class A
+  { // { dg-error "conflicts with a previous" }
+template < T > friend class N;
+  };
+}
+
+void f ()
+{
+  N::A < int > a1;
+}


Re: [PATCH/AARCH64] Improve aarch64 conditional compare usage

2017-06-29 Thread Steve Ellcey
On Tue, 2017-06-27 at 16:45 -0600, Jeff Law wrote:

> > +  /* If we have a boolean variable allow it and generate a compare
> > + to zero reg when expanding.  */
> > +  if (!g)
> > +return (TREE_CODE (TREE_TYPE (t)) == BOOLEAN_TYPE);
> Depending on how you use T, you might be better off checking T's range
> and considering anything with the [0,1] range as a boolean.  That would
> also pick up the case where T was set via a comparison, or the output of
> a PHI with arguments that are all [0,1], etc.  I've found that to be a
> useful improvement in a couple places.
> 
> See ssa_name_has_boolean_range.  I don't consider it a requirement for
> this patch to go forward, but more something you might want to
> investigate as a future improvement.
> 
> OK for the trunk.  Sorry about the delay.
> 
> jeff

Thanks Jeff, I checked this in.  I hadn't considered integers with a
restricted range but it might be worth adding.  I will look into that.

Steve Ellcey
sell...@cavium.com


Re: [PATCH] Fold (a > 0 ? 1.0 : -1.0) into copysign (1.0, a) and a * copysign (1.0, a) into abs(a)

2017-06-29 Thread H.J. Lu
On Sun, Jun 25, 2017 at 2:28 PM, Andrew Pinski  wrote:
> On Sun, Jun 25, 2017 at 11:18 AM, Andrew Pinski  wrote:
>> On Sun, Jun 25, 2017 at 1:28 AM, Marc Glisse  wrote:
>>> +(for cmp (gt ge lt le)
>>> + outp (convert convert negate negate)
>>> + outn (negate negate convert convert)
>>> + /* Transform (X > 0.0 ? 1.0 : -1.0) into copysign(1, X). */
>>> + /* Transform (X >= 0.0 ? 1.0 : -1.0) into copysign(1, X). */
>>> + /* Transform (X < 0.0 ? 1.0 : -1.0) into copysign(1,-X). */
>>> + /* Transform (X <= 0.0 ? 1.0 : -1.0) into copysign(1,-X). */
>>> + (simplify
>>> +  (cond (cmp @0 real_zerop) real_onep real_minus_onep)
>>> +  (if (!HONOR_NANS (type) && !HONOR_SIGNED_ZEROS (type)
>>> +   && types_match (type, TREE_TYPE (@0)))
>>> +   (switch
>>> +(if (types_match (type, float_type_node))
>>> + (BUILT_IN_COPYSIGNF { build_one_cst (type); } (outp @0)))
>>> +(if (types_match (type, double_type_node))
>>> + (BUILT_IN_COPYSIGN { build_one_cst (type); } (outp @0)))
>>> +(if (types_match (type, long_double_type_node))
>>> + (BUILT_IN_COPYSIGNL { build_one_cst (type); } (outp @0))
>>>
>>> There is already a 1.0 of the right type in the input, it would be easier to
>>> reuse it in the output than build a new one.
>>
>> Right.  Fixed.
>>
>>>
>>> Non-generic builtins like copysign are such a pain... We also end up missing
>>> the 128-bit case that way (pre-existing problem, not your patch). We seem to
>>> have a corresponding internal function, but apparently it is not used until
>>> expansion (well, maybe during vectorization).
>>
>> Yes I noticed that while working on a different patch related to
>> copysign; The generic version of a*copysign(1.0, b) [see the other
>> thread where the ARM folks started a patch for it; yes it was by pure
>> accident that I was working on this and really did not notice that
>> thread until yesterday].
>> I was looking into a nice way of creating copysign without having to
>> do the switch but I could not find one.  In the end I copied what was done
>> already in a different location in match.pd; this is also the reason
>> why I had the build_one_cst there.
>>
>>>
>>> + /* Transform (X > 0.0 ? -1.0 : 1.0) into copysign(1,-X). */
>>> + /* Transform (X >= 0.0 ? -1.0 : 1.0) into copysign(1,-X). */
>>> + /* Transform (X < 0.0 ? -1.0 : 1.0) into copysign(1,X). */
>>> + /* Transform (X <= 0.0 ? -1.0 : 1.0) into copysign(1,X). */
>>> + (simplify
>>> +  (cond (cmp @0 real_zerop) real_minus_onep real_onep)
>>> +  (if (!HONOR_NANS (type) && !HONOR_SIGNED_ZEROS (type)
>>> +   && types_match (type, TREE_TYPE (@0)))
>>> +   (switch
>>> +(if (types_match (type, float_type_node))
>>> + (BUILT_IN_COPYSIGNF { build_one_cst (type); } (outn @0)))
>>> +(if (types_match (type, double_type_node))
>>> + (BUILT_IN_COPYSIGN { build_one_cst (type); } (outn @0)))
>>> +(if (types_match (type, long_double_type_node))
>>> + (BUILT_IN_COPYSIGNL { build_one_cst (type); } (outn @0)))
>>> +
>>> +/* Transform X * copysign (1.0, X) into abs(X). */
>>> +(simplify
>>> + (mult:c @0 (COPYSIGN real_onep @0))
>>> + (if (!HONOR_NANS (type) && !HONOR_SIGNED_ZEROS (type))
>>> +  (abs @0)))
>>>
>>> I would have expected it to do the right thing for signed zero and qNaN. Can
>>> you describe a case where it would give the wrong result, or are the
>>> conditions just conservative?
>>
>> I was just being conservative; maybe too conservative but I was a bit
>> worried I could get it incorrect.
>>
>>>
>>> +/* Transform X * copysign (1.0, -X) into -abs(X). */
>>> +(simplify
>>> + (mult:c @0 (COPYSIGN real_onep (negate @0)))
>>> + (if (!HONOR_NANS (type) && !HONOR_SIGNED_ZEROS (type))
>>> +  (negate (abs @0
>>> +
>>> +/* Transform copysign (-1.0, X) into copysign (1.0, X). */
>>> +(simplify
>>> + (COPYSIGN real_minus_onep @0)
>>> + (COPYSIGN { build_one_cst (type); } @0))
>>>
>>> (simplify
>>>  (COPYSIGN REAL_CST@0 @1)
>>>  (if (REAL_VALUE_NEGATIVE (TREE_REAL_CST (@0)))
>>>   (COPYSIGN (negate @0) @1)))
>>> ? Or does that create trouble with sNaN and only the 1.0 case is worth
>>> the trouble?
>>
>> No, that is the correct way; I noticed the other thread about copysign
>> had something similar to what should be done here too.
>>
>> I will send out a new patch after testing soon.
>
> New patch.
> OK? Bootstrapped and tested on aarch64-linux-gnu with no regressions.
>
> Thanks,
> Andrew Pinski
>
> ChangeLog:
> * match.pd (X >/>=/ (X * copysign (1.0, X)): New pattern.
> (X * copysign (1.0, -X)): New pattern.
> (copysign (-1.0, CST)): New pattern.
>
> testsuite/ChangeLog:
> * gcc.dg/tree-ssa/copy-sign-1.c: New testcase.
> * gcc.dg/tree-ssa/copy-sign-2.c: New testcase.
> * gcc.dg/tree-ssa/mult-abs-2.c: New testcase.
>

This caused:

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81255

-- 
H.J.


Small tweak to RTL expansion of some array accesses on RISC targets

2017-06-29 Thread Eric Botcazou
I noticed that, when a variable-sized object declared on the stack turns out 
to be of fixed size, the optimizer can replace the call to __builtin_alloca by 
the declaration of fixed-size local array.  Now, even if the alignment of the 
object is explicitly preserved, the alignment of its type is not since the 
type of the local array is always unsigned char.

On RISC targets (STRICT_ALIGNMENT / SLOW_UNALIGNED_ACCESS to be precise), this 
causes any read larger than unsigned char to go through the bitfield expansion 
circuitry, because expand_expr_real_1 has:

/* If the field isn't aligned enough to fetch as a memref,
   fetch it as a bit field.  */
|| (mode1 != BLKmode
&& (((TYPE_ALIGN (TREE_TYPE (tem)) < GET_MODE_ALIGNMENT (mode)
  || (bitpos % GET_MODE_ALIGNMENT (mode) != 0)
  || (MEM_P (op0)
  && (MEM_ALIGN (op0) < GET_MODE_ALIGNMENT (mode1)
  || (bitpos % GET_MODE_ALIGNMENT (mode1) != 0
 && modifier != EXPAND_MEMORY
 && ((modifier == EXPAND_CONST_ADDRESS
  || modifier == EXPAND_INITIALIZER)
 ? STRICT_ALIGNMENT
 : SLOW_UNALIGNED_ACCESS (mode1, MEM_ALIGN (op0

In other words, even if the alignment of the MEM is sufficient (and it is), 
the test on TYPE_ALIGN (TREE_TYPE (tem)) is true since TREE_TYPE (tem) is 
unsigned char.

I think that the test on TYPE_ALIGN (TREE_TYPE (tem)) is superfluous when op0 
is a MEM because the second part of the test is more precise and sufficient, 
so the attached patchlet uses a conditional expression to implement that.

Bootstrapped/regtested on SPARC/Solaris, SPARC64/Linux, PowerPC64/Linux and 
Aarch64/Linux, applied on the mainline as obvious.


2017-06-29  Eric Botcazou  

* expr.c (expand_expr) : When testing for unaligned
objects, take into account only the alignment of 'op0' and 'mode1' if
'op0' is a MEM.

-- 
Eric Botcazou
Index: expr.c
===
--- expr.c	(revision 249619)
+++ expr.c	(working copy)
@@ -10631,11 +10631,11 @@ expand_expr_real_1 (tree exp, rtx target
 	/* If the field isn't aligned enough to fetch as a memref,
 	   fetch it as a bit field.  */
 	|| (mode1 != BLKmode
-		&& (((TYPE_ALIGN (TREE_TYPE (tem)) < GET_MODE_ALIGNMENT (mode)
-		  || (bitpos % GET_MODE_ALIGNMENT (mode) != 0)
-		  || (MEM_P (op0)
-			  && (MEM_ALIGN (op0) < GET_MODE_ALIGNMENT (mode1)
-			  || (bitpos % GET_MODE_ALIGNMENT (mode1) != 0
+		&& (((MEM_P (op0)
+		  ? MEM_ALIGN (op0) < GET_MODE_ALIGNMENT (mode1)
+		|| (bitpos % GET_MODE_ALIGNMENT (mode1) != 0)
+		  : TYPE_ALIGN (TREE_TYPE (tem)) < GET_MODE_ALIGNMENT (mode)
+		|| (bitpos % GET_MODE_ALIGNMENT (mode) != 0))
 		 && modifier != EXPAND_MEMORY
 		 && ((modifier == EXPAND_CONST_ADDRESS
 			  || modifier == EXPAND_INITIALIZER)


C++ PATCH for c++/81188, error matching decltype of member function call

2017-06-29 Thread Jason Merrill
Here, cp_tree_equal was assuming that the member operand of
COMPONENT_REF will be == if it is equivalent; that isn't accurate for
a reference to a member function.  So let's remove the special case
and fall back on normal expression handling.

Tested x86_64-pc-linux-gnu, applying to trunk and 7.
commit a4a899baf962a73d81ae776ab3a7d80ea0a5bf8b
Author: Jason Merrill 
Date:   Thu Jun 29 15:15:48 2017 -0400

PR c++/81188 - matching decltype of member function call.

* tree.c (cp_tree_equal): Remove COMPONENT_REF special case.

diff --git a/gcc/cp/tree.c b/gcc/cp/tree.c
index 4535af6..a52a9e8 100644
--- a/gcc/cp/tree.c
+++ b/gcc/cp/tree.c
@@ -3589,11 +3589,6 @@ cp_tree_equal (tree t1, tree t2)
return false;
   return cp_tree_equal (TREE_OPERAND (t1, 1), TREE_OPERAND (t1, 1));
 
-case COMPONENT_REF:
-  if (TREE_OPERAND (t1, 1) != TREE_OPERAND (t2, 1))
-   return false;
-  return cp_tree_equal (TREE_OPERAND (t1, 0), TREE_OPERAND (t2, 0));
-
 case PARM_DECL:
   /* For comparing uses of parameters in late-specified return types
 with an out-of-class definition of the function, but can also come
diff --git a/gcc/testsuite/g++.dg/cpp0x/decltype-call4.C 
b/gcc/testsuite/g++.dg/cpp0x/decltype-call4.C
new file mode 100644
index 000..d504954
--- /dev/null
+++ b/gcc/testsuite/g++.dg/cpp0x/decltype-call4.C
@@ -0,0 +1,13 @@
+// PR c++/81188
+// { dg-do compile { target c++11 } }
+
+template 
+struct C {
+  F fast(long i) const;
+  auto operator[](long i) const -> decltype(this->fast(i));
+};
+
+template 
+auto C::operator[](long i) const -> decltype(this->fast(i)) {
+  return fast(i);
+}


Re: [PATCH, VAX] Correct ffs instruction constraint

2017-06-29 Thread Jeff Law
On 06/29/2017 09:47 AM, co...@sdf.org wrote:
> Ping.
> 
> On Tue, Jun 20, 2017 at 08:05:42PM +, co...@sdf.org wrote:
>> VAX' FFS as variable-length bit field instruction uses a "base"
>> operand of type "vb" meaning "byte address".
>> "base" can be 32 bits (SI) and due to the definition of
>> ffssi2/__builtin_ffs() with the operand constraint "m", code can be
>> emitted which incorrectly implies a mode-dependent (= longword, for
>> the 32-bit operand) address.
>> File scsipi_base.c compiled with -Os for our VAX install kernel shows:
>>
>> ffs $0x0,$0x20,0x50(r11)[r0],r9
>>
>> Apparently, 0x50(r11)[r0] as a longword address is assumed to be
>> evaluated in longword context by FFS, but the instruction expects a
>> byte address.
>>
>> Our fix is to change the operand constraint from "m" to "Q", i. e.
>> "operand is a MEM that does not have a mode-dependent address", which
>> results in:
>>
>> moval 0x50(r11)[r0],r1
>> ffs $0x0,$0x20,(r1),r9
>>
>> MOVAL evaluates the source operand/address in longword context, so
>> effectively converts the word address to a byte address for FFS.
>>
>> See NetBSD PR port-vax/51761 (http://gnats.netbsd.org/51761) and
>> discussion on port-vax mailing list
>> (http://mail-index.netbsd.org/port-vax/2017/01/06/msg002954.html).
>>
>> ChangeLog:
>>
>> 2017-06-20  Maya Rashish  
>>
>>  * gcc/config/vax/builtins.md: Correct ffssi2_internal
>>  instruction constraint.
Thanks.  Installed.

Ideally we'd like to have a testcase for this in the regression suite.

If you could provide the .i file and options used which generated the
incorrect ffs instruction I can use the reduction tools with a cross
compiler to produce a nice simple test for the testsuite.

jeff


[PATCH] [PR 81245] Fix tree-if-conv calling of update_stmt after fold_stmt

2017-06-29 Thread Andrew Pinski
Hi,
  As described in the bug, tree-if-conv is calling update_stmt on an
old stmt which might have been removed from the IR already
(transforming of an assignment to a call in this case).  This fixes
the problem by calling update_stmt on the new statement that fold_stmt
might have created.

OK?  Bootstrapped and tested on aarch64-linux-gnu with no regressions.

Thanks,
Andrew Pinski
ChangeLog:
* tree-if-conv.c (predicate_scalar_phi): Update new_stmt if fold_stmt
returned true.

testsuite/ChangeLog:
* gcc.dg/torture/pr81245.c: New testcase.
Index: testsuite/gcc.dg/torture/pr81245.c
===
--- testsuite/gcc.dg/torture/pr81245.c  (nonexistent)
+++ testsuite/gcc.dg/torture/pr81245.c  (working copy)
@@ -0,0 +1,16 @@
+/* { dg-options "-ffast-math" } */
+/* { dg-do compile } */
+/* This test used to crash the vectorizer as the ifconvert pass
+   used to convert the if to copysign but called update_stmt on
+   the old statement after calling fold_stmt. */
+double sg[18];
+void f(void)
+{
+  for (int i = 0 ;i < 18;i++)
+  {
+if (sg[i] < 0.0)
+  sg[i] = -1.0;
+else
+  sg[i] = 1.0;
+  }
+}
Index: tree-if-conv.c
===
--- tree-if-conv.c  (revision 249769)
+++ tree-if-conv.c  (working copy)
@@ -1853,7 +1853,8 @@
   new_stmt = gimple_build_assign (res, rhs);
   gsi_insert_before (gsi, new_stmt, GSI_SAME_STMT);
   gimple_stmt_iterator new_gsi = gsi_for_stmt (new_stmt);
-  fold_stmt (&new_gsi, ifcvt_follow_ssa_use_edges);
+  if (fold_stmt (&new_gsi, ifcvt_follow_ssa_use_edges))
+   new_stmt = gsi_stmt (new_gsi);
   update_stmt (new_stmt);
 
   if (dump_file && (dump_flags & TDF_DETAILS))


Re: [PATCH] warn on mem calls modifying objects of non-trivial types (PR 80560)

2017-06-29 Thread Martin Sebor

On 06/29/2017 10:15 AM, Jan Hubicka wrote:

Hello,

diff --git a/gcc/hash-table.h b/gcc/hash-table.h
index 0f7e21a..443d16c 100644
--- a/gcc/hash-table.h
+++ b/gcc/hash-table.h
@@ -803,7 +803,10 @@ hash_table::empty_slow ()
   m_size_prime_index = nindex;
 }
   else
-memset (entries, 0, size * sizeof (value_type));
+{
+  for ( ; size; ++entries, --size)
+   *entries = value_type ();
+}
   m_n_deleted = 0;
   m_n_elements = 0;
 }


This change sends our periodic testers into an infinite loop.  It is the fault
of GCC 4.2 being used as the bootstrap compiler, but perhaps that can be
worked around?


The warning in the original code could have been suppressed (by
casting the pointer to char*), but it was valid so I opted not
to.  I'd expect it to be possible to work around the bug but
I don't have easy access to GCC 4.2 to reproduce it or verify
the fix.

FWIW, after looking at the function again, I wondered if zeroing
out the elements (either way) was the right thing to do and if
they shouldn't be cleared by calling Descriptor::mark_empty()
instead, like in alloc_entries(), but making that change broke
a bunch of ipa/ipa-pta-*.c tests.  It's not really clear to me
what this code is supposed to do.

Martin

PS Does this help at all?

@@ -804,8 +804,8 @@ hash_table::empty_slow ()
 }
   else
 {
-  for ( ; size; ++entries, --size)
-   *entries = value_type ();
+  for (size_t i = 0; i != size; ++i)
+   entries[i] = value_type ();
 }
   m_n_deleted = 0;
   m_n_elements = 0;



Re: [PATCH] PR target/80556

2017-06-29 Thread Simon Wright
On 28 Jun 2017, at 18:40, Jeff Law  wrote:
> 
> On 06/09/2017 07:57 AM, Simon Wright wrote:
>>2017-06-09 Simon Wright 
>> 
>>PR target/80556
>>* configure.ac (stage1_ldflags): For Darwin, include -lSystem.
>>  (poststage1_ldflags): likewise.
>>* configure: regenerated.
> I'm a bit confused here.  Isn't -lSystem included in darwin's LIB_SPEC
> in which case the right things ought to already be happening, shouldn't it?

The specs that involve -lSystem are

*link_gcc_c_sequence:
%:version-compare(>= 10.6 mmacosx-version-min= -no_compact_unwind)
%{!static:%{!static-libgcc:   %:version-compare(>= 10.6 
mmacosx-version-min= -lSystem) } }
%{fno-pic|fno-PIC|fno-pie|fno-PIE|fapple-kext|mkernel|static|mdynamic-no-pic:   
%:version-compare(>= 10.7 mmacosx-version-min= -no_pie) } %G %L

*lib:
%{!static:-lSystem}

but I also see

*libgcc:
%{static-libgcc|static: -lgcc_eh -lgcc; 

which might be the root of the problem?

Looking at https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80556#c39, I report that

   $ gnatmake raiser -largs -static-libgcc -static-libstdc++

resulted in the link command

   /usr/bin/ld -dynamic -arch x86_64 -macosx_version_min 10.12.0
   -weak_reference_mismatches non-weak -o raiser -L./
   -L/opt/gcc-7.1.0/lib/gcc/x86_64-apple-darwin15/7.1.0/adalib/
   -L/opt/gcc-7.1.0/lib/gcc/x86_64-apple-darwin15/7.1.0
   -L/opt/gcc-7.1.0/lib/gcc/x86_64-apple-darwin15/7.1.0/../../.. b~raiser.o
   ./raiser.o -v
   /opt/gcc-7.1.0/lib/gcc/x86_64-apple-darwin15/7.1.0/adalib/libgnat.a
   -no_compact_unwind -lgcc_eh -lgcc -lSystem

i.e. -lSystem is *after* -lgcc, so that its exception handling won't be invoked.

I don't know what -lgcc_eh does, but my patch would be pretty much equivalent 
to changing the libgcc spec above to

*libgcc:
%{static-libgcc|static: -lSystem -lgcc_eh -lgcc; 

and if that would be OK it would obviously be much better.

I've rebuilt gcc-8-20170528 with this change alone (i.e. not the patch 
currently posted here), successfully.

If I propose this alternative patch, should it be a new post, or should I 
continue this thread?

C++ PATCH for c++/81180, ICE with C++17 member class template deduction

2017-06-29 Thread Jason Merrill
The substitution we do in build_deduction_guide needs
processing_template_decl set, since it's only partial instantiation,
and also needs to be done on the template; substituting the
FUNCTION_DECL pattern doesn't properly adjust the template parameters
on the template, which we will use in building the deduction guide.

Tested x86_64-pc-linux-gnu, applying to trunk and 7.
commit a1ecbf88833602a7d37fd44f831d4c1efd6ffaf8
Author: Jason Merrill 
Date:   Thu Jun 29 14:12:09 2017 -0400

PR c++/81180 - ICE with C++17 deduction of member class template.

* pt.c (build_deduction_guide): Correct member template handling.

diff --git a/gcc/cp/pt.c b/gcc/cp/pt.c
index 047d3ba..3ecacbd 100644
--- a/gcc/cp/pt.c
+++ b/gcc/cp/pt.c
@@ -25216,17 +25216,16 @@ build_deduction_guide (tree ctor, tree outer_args, 
tsubst_flags_t complain)
 }
   else
 {
+  ++processing_template_decl;
+
+  tree fn_tmpl
+   = (TREE_CODE (ctor) == TEMPLATE_DECL ? ctor
+  : DECL_TI_TEMPLATE (ctor));
   if (outer_args)
-   ctor = tsubst (ctor, outer_args, complain, ctor);
+   fn_tmpl = tsubst (fn_tmpl, outer_args, complain, ctor);
+  ctor = DECL_TEMPLATE_RESULT (fn_tmpl);
+
   type = DECL_CONTEXT (ctor);
-  tree fn_tmpl;
-  if (TREE_CODE (ctor) == TEMPLATE_DECL)
-   {
- fn_tmpl = ctor;
- ctor = DECL_TEMPLATE_RESULT (fn_tmpl);
-   }
-  else
-   fn_tmpl = DECL_TI_TEMPLATE (ctor);
 
   tparms = DECL_TEMPLATE_PARMS (fn_tmpl);
   /* If type is a member class template, DECL_TI_ARGS (ctor) will have
@@ -25248,7 +25247,6 @@ build_deduction_guide (tree ctor, tree outer_args, 
tsubst_flags_t complain)
  /* For a member template constructor, we need to flatten the two
 template parameter lists into one, and then adjust the function
 signature accordingly.  This gets...complicated.  */
- ++processing_template_decl;
  tree save_parms = current_template_parms;
 
  /* For a member template we should have two levels of parms/args, one
@@ -25309,8 +25307,8 @@ build_deduction_guide (tree ctor, tree outer_args, 
tsubst_flags_t complain)
ci = tsubst_constraint_info (ci, tsubst_args, complain, ctor);
 
  current_template_parms = save_parms;
- --processing_template_decl;
}
+  --processing_template_decl;
 }
 
   if (!memtmpl)
diff --git a/gcc/testsuite/g++.dg/cpp1z/class-deduction40.C 
b/gcc/testsuite/g++.dg/cpp1z/class-deduction40.C
new file mode 100644
index 000..eeffa69
--- /dev/null
+++ b/gcc/testsuite/g++.dg/cpp1z/class-deduction40.C
@@ -0,0 +1,19 @@
+// PR c++/81180
+// { dg-options -std=c++1z }
+
+template < int I > struct int_{};
+
+template < typename T >
+struct A{
+template < typename U, int I >
+struct B{
+B(U u, int_< I >){}
+};
+};
+
+
+int main(){
+A< int >::B v(0, int_< 0 >());
+(void)v;
+}
+


[PATCH] Add RDMA support to Falkor.

2017-06-29 Thread Jim Wilson
Falkor is an ARMv8-A part, but also includes the RDMA extension from ARMv8.1-A.
I'd like to enable support for the RDMA instructions when -mcpu=falkor is used,
and also make the RDMA intrinsics available.  To do that, I need to add rdma
as an architecture extension, and modify a few things to use it.  Binutils
already supports rdma as an architecture extension.

I only did the aarch64 port, and not the arm port.  There are no supported
targets that have the RDMA instructions and also aarch32 support.  There are
also no aarch32 RDMA testcases.  So there is no way to test it.  It wasn't
clear whether it was better to add something untested or leave it out.  I chose
to leave it out for now.

I also needed a few testcase changes.  There were redundant options being
added for the RDMA tests that I had to remove as they are now wrong.  Also
the fact that I only did aarch64 means we need to check both armv8-a+rdma and
armv8.1-a for the rdma support.

This was tested with an aarch64 bootstrap and make check.  There were no
regressions.

OK?

Jim

gcc/
* config/aarch64/aarch64-cores.def (falkor): Add AARCH64_FL_RDMA.
(qdf24xx): Likewise.
* config/aarch64/aarch64-option-extensions.def (rdma): New.
* config/aarch64/aarch64.h (AARCH64_FL_RDMA): New.
(AARCH64_FL_V8_1): Renumber.
(AARCH64_FL_FOR_ARCH8_1): Add AARCH64_FL_RDMA.
(AARCH64_ISA_RDMA): Use AARCH64_FL_RDMA.
* config/aarch64/arm_neon.h: Use +rdma instead of arch=armv8.1-a.
* doc/invoke.texi (AArch64 Options): Mention +rdma in -march docs.  Add
rdma to feature modifiers list.

gcc/testsuite/
* lib/target-supports.exp (add_options_for_arm_v8_1a_neon): Delete
redundant -march option.
(check_effective_target_arm_v8_1a_neon_ok_nocache): Try armv8-a+rdma
in addition to armv8.1-a.
---
 gcc/config/aarch64/aarch64-cores.def |  4 ++--
 gcc/config/aarch64/aarch64-option-extensions.def |  4 
 gcc/config/aarch64/aarch64.h |  8 +---
 gcc/config/aarch64/arm_neon.h|  2 +-
 gcc/doc/invoke.texi  |  5 -
 gcc/testsuite/lib/target-supports.exp| 18 ++
 6 files changed, 26 insertions(+), 15 deletions(-)

diff --git a/gcc/config/aarch64/aarch64-cores.def 
b/gcc/config/aarch64/aarch64-cores.def
index f8342ca..b8d0ba6 100644
--- a/gcc/config/aarch64/aarch64-cores.def
+++ b/gcc/config/aarch64/aarch64-cores.def
@@ -65,8 +65,8 @@ AARCH64_CORE("thunderxt83",   thunderxt83,   thunderx,  8A,  
AARCH64_FL_FOR_ARCH
 AARCH64_CORE("xgene1",  xgene1,xgene1,8A,  AARCH64_FL_FOR_ARCH8, 
xgene1, 0x50, 0x000, -1)
 
 /* Qualcomm ('Q') cores. */
-AARCH64_CORE("falkor",  falkor,cortexa57, 8A,  AARCH64_FL_FOR_ARCH8 | 
AARCH64_FL_CRC | AARCH64_FL_CRYPTO, qdf24xx,   0x51, 0xC00, -1)
-AARCH64_CORE("qdf24xx", qdf24xx,   cortexa57, 8A,  AARCH64_FL_FOR_ARCH8 | 
AARCH64_FL_CRC | AARCH64_FL_CRYPTO, qdf24xx,   0x51, 0xC00, -1)
+AARCH64_CORE("falkor",  falkor,cortexa57, 8A,  AARCH64_FL_FOR_ARCH8 | 
AARCH64_FL_CRC | AARCH64_FL_CRYPTO | AARCH64_FL_RDMA, qdf24xx,   0x51, 0xC00, 
-1)
+AARCH64_CORE("qdf24xx", qdf24xx,   cortexa57, 8A,  AARCH64_FL_FOR_ARCH8 | 
AARCH64_FL_CRC | AARCH64_FL_CRYPTO | AARCH64_FL_RDMA, qdf24xx,   0x51, 0xC00, 
-1)
 
 /* Samsung ('S') cores. */
 AARCH64_CORE("exynos-m1",   exynosm1,  exynosm1,  8A,  AARCH64_FL_FOR_ARCH8 | 
AARCH64_FL_CRC | AARCH64_FL_CRYPTO, exynosm1,  0x53, 0x001, -1)
diff --git a/gcc/config/aarch64/aarch64-option-extensions.def 
b/gcc/config/aarch64/aarch64-option-extensions.def
index c0752ce..c4f059a 100644
--- a/gcc/config/aarch64/aarch64-option-extensions.def
+++ b/gcc/config/aarch64/aarch64-option-extensions.def
@@ -63,4 +63,8 @@ AARCH64_OPT_EXTENSION("fp16", AARCH64_FL_F16, AARCH64_FL_FP, 
0, "fphp asimdhp")
 /* Enabling or disabling "rcpc" only changes "rcpc".  */
 AARCH64_OPT_EXTENSION("rcpc", AARCH64_FL_RCPC, 0, 0, "lrcpc")
 
+/* Enabling "rdma" also enables "fp", "simd".
+   Disabling "rdma" just disables "rdma".  */
+AARCH64_OPT_EXTENSION("rdma", AARCH64_FL_RDMA, AARCH64_FL_FP | 
AARCH64_FL_SIMD, 0, "rdma")
+
 #undef AARCH64_OPT_EXTENSION
diff --git a/gcc/config/aarch64/aarch64.h b/gcc/config/aarch64/aarch64.h
index 106cf3a..7f91edb 100644
--- a/gcc/config/aarch64/aarch64.h
+++ b/gcc/config/aarch64/aarch64.h
@@ -144,7 +144,8 @@ extern unsigned aarch64_architecture_version;
 #define AARCH64_FL_CRC(1 << 3) /* Has CRC.  */
 /* ARMv8.1-A architecture extensions.  */
 #define AARCH64_FL_LSE   (1 << 4)  /* Has Large System Extensions.  */
-#define AARCH64_FL_V8_1  (1 << 5)  /* Has ARMv8.1-A extensions.  */
+#define AARCH64_FL_RDMA  (1 << 5)  /* Has Round Double Multiply 
Add.  */
+#define AARCH64_FL_V8_1  (1 << 6)  /* Has ARMv8.1-A extensions.  */
 /* ARMv8.2-A architecture extensions.  */
 #define AARCH64_FL_V8_2  (1 << 8)  /* Has ARMv
