Re: [PATCH, rs6000] Fix PR83332 (missing vcond patterns)

2017-12-12 Thread Richard Biener
On Mon, Dec 11, 2017 at 10:55 PM, Bill Schmidt
 wrote:
> Hi,
>
> A new test case introduced for PR81303 failed on powerpc64 (BE, LE).  This
> turns out to be due to a missing standard pattern (vcondv2div2df).  This
> and a couple of other patterns are easy to support with existing logic
> by just adding new patterns with appropriate modes.  That's all this patch
> does.  That's sufficient to cause the failing test to pass.
>
> Bootstrapped and tested on powerpc64le-linux-gnu with no regressions.  Is
> this okay for trunk?
>
> Thanks,
> Bill
>
>
> 2017-12-11  Bill Schmidt  
>
> PR target/83332
> * config/rs6000/vector.md (vcondv2dfv2di): New define_expand.
> (vcondv2div2df): Likewise.
> (vconduv2dfv2di): Likewise.
>
> Index: gcc/config/rs6000/vector.md
> ===
> --- gcc/config/rs6000/vector.md (revision 255539)
> +++ gcc/config/rs6000/vector.md (working copy)
> @@ -455,6 +455,44 @@
>  FAIL;
>  }")
>
> +(define_expand "vcondv2dfv2di"

given you already have vcondv4siv4sf this asks for macroization, no?  I see you
have vcond already, you can use

vcond

and guard with GET_MODE_NUNITS () == GET_MODE_NUNINTS ()?

> +  [(set (match_operand:V2DF 0 "vfloat_operand" "")
> +   (if_then_else:V2DF
> +(match_operator 3 "comparison_operator"
> +[(match_operand:V2DI 4 "vint_operand" "")
> + (match_operand:V2DI 5 "vint_operand" "")])
> +(match_operand:V2DF 1 "vfloat_operand" "")
> +(match_operand:V2DF 2 "vfloat_operand" "")))]
> +  "VECTOR_UNIT_ALTIVEC_OR_VSX_P (V2DFmode)
> +   && VECTOR_UNIT_ALTIVEC_OR_VSX_P (V2DImode)"
> +  "
> +{
> +  if (rs6000_emit_vector_cond_expr (operands[0], operands[1], operands[2],
> +   operands[3], operands[4], operands[5]))
> +DONE;
> +  else
> +FAIL;
> +}")
> +
> +(define_expand "vcondv2div2df"
> +  [(set (match_operand:V2DI 0 "vint_operand" "")
> +   (if_then_else:V2DI
> +(match_operator 3 "comparison_operator"
> +[(match_operand:V2DF 4 "vfloat_operand" "")
> + (match_operand:V2DF 5 "vfloat_operand" "")])
> +(match_operand:V2DI 1 "vint_operand" "")
> +(match_operand:V2DI 2 "vint_operand" "")))]
> +  "VECTOR_UNIT_ALTIVEC_OR_VSX_P (V2DFmode)
> +   && VECTOR_UNIT_ALTIVEC_OR_VSX_P (V2DImode)"
> +  "
> +{
> +  if (rs6000_emit_vector_cond_expr (operands[0], operands[1], operands[2],
> +   operands[3], operands[4], operands[5]))
> +DONE;
> +  else
> +FAIL;
> +}")
> +
>  (define_expand "vcondu"
>[(set (match_operand:VEC_I 0 "vint_operand")
> (if_then_else:VEC_I
> @@ -492,6 +530,25 @@
>  FAIL;
>  }")
>
> +(define_expand "vconduv2dfv2di"
> +  [(set (match_operand:V2DF 0 "vfloat_operand" "")
> +   (if_then_else:V2DF
> +(match_operator 3 "comparison_operator"
> +[(match_operand:V2DI 4 "vint_operand" "")
> + (match_operand:V2DI 5 "vint_operand" "")])
> +(match_operand:V2DF 1 "vfloat_operand" "")
> +(match_operand:V2DF 2 "vfloat_operand" "")))]
> +  "VECTOR_UNIT_ALTIVEC_OR_VSX_P (V2DFmode)
> +   && VECTOR_UNIT_ALTIVEC_OR_VSX_P (V2DImode)"
> +  "
> +{
> +  if (rs6000_emit_vector_cond_expr (operands[0], operands[1], operands[2],
> +   operands[3], operands[4], operands[5]))
> +DONE;
> +  else
> +FAIL;
> +}")
> +
>  (define_expand "vector_eq"
>[(set (match_operand:VEC_C 0 "vlogical_operand" "")
> (eq:VEC_C (match_operand:VEC_C 1 "vlogical_operand" "")
>


Re: [PATCH][Middle-end]79538 missing -Wformat-overflow with %s and non-member array arguments

2017-12-12 Thread Richard Biener
On Mon, Dec 4, 2017 at 4:34 PM, Qing Zhao  wrote:
> Hi,
>
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=79538 
> 
> missing -Wformat-overflow with %s and non-member array arguments
>
> -Wformat-overflow uses the routine "get_range_strlen" to decide the
> maximum string length, however, currently "get_range_strlen" misses
> the handling of non-member arrays.
>
> Adding the handling of non-member array resolves the issue.
> Adding test case pr79538.c into gcc.dg.
>
> During gcc bootstrap, 2 source files (c-family/c-cppbuiltin.c,
> fortran/class.c) were detected new warnings by -Wformat-overflow
> due to the new handling of non-member array in "get_range_strlen".
> in order to avoid these new warnings and continue with bootstrap,
> updating these 2 files to avoid the warnings.
>
> in c-family/c-cppbuiltin.c, the warning is following:
>
> ../../latest_gcc_2/gcc/c-family/c-cppbuiltin.c:1627:15: note:
> ‘sprintf’ output 2 or more bytes (assuming 257) into a destination
> of size 256
>   sprintf (buf1, "%s=%s", macro, buf2);
>   ^~~~
> in the above, buf1 and buf2 are declared as:
> char buf1[256], buf2[256];
>
> i.e, buf1 and buf2 have same size. adjusting the size of buf1 and
> buf2 resolves the warning.
>
> fortran/class.c has the similar issue as above. Instead of adjusting
> size of the buffers, replacing sprintf with xasprintf.
>
> bootstraped and tested on both X86 and aarch64. no regression.
>
> Okay for trunk?
>
> thanks.
>
> Qing
>
> ===
>
> gcc/ChangeLog
>
> 2017-11-30  Qing Zhao  mailto:qing.z...@oracle.com>>
>
>   PR middle_end/79538
>   * gimple-fold.c (get_range_strlen): Add the handling of non-member 
> array.
>
> gcc/fortran/ChangeLog
>
> 2017-11-30  Qing Zhao  mailto:qing.z...@oracle.com>>
>
>PR middle_end/79538
>* class.c (gfc_build_class_symbol): Replace call to
>sprintf with xasprintf to avoid format-overflow warning.
>(generate_finalization_wrapper): Likewise.
>(gfc_find_derived_vtab): Likewise.
>(find_intrinsic_vtab): Likewise.
>
>
> gcc/c-family/ChangeLog
>
> 2017-11-30  Qing Zhao  mailto:qing.z...@oracle.com>>
>
>   PR middle_end/79538
> * c-cppbuiltin.c (builtin_define_with_hex_fp_value):
>   Adjust the size of buf1 and buf2, add a new buf to avoid
>   format-overflow warning.
>
> gcc/testsuite/ChangeLog
>
> 2017-11-30  Qing Zhao  mailto:qing.z...@oracle.com>>
>
>   PR middle_end/79538
>   * gcc.dg/pr79538.c: New test.
>
> ---
> gcc/c-family/c-cppbuiltin.c| 10 -
> gcc/fortran/class.c| 49 --
> gcc/gimple-fold.c  | 13 +++
> gcc/testsuite/gcc.dg/pr79538.c | 23 
> 4 files changed, 69 insertions(+), 26 deletions(-)
> create mode 100644 gcc/testsuite/gcc.dg/pr79538.c
>
> diff --git a/gcc/c-family/c-cppbuiltin.c b/gcc/c-family/c-cppbuiltin.c
> index 2ac9616..9e33aed 100644
> --- a/gcc/c-family/c-cppbuiltin.c
> +++ b/gcc/c-family/c-cppbuiltin.c
> @@ -1613,7 +1613,7 @@ builtin_define_with_hex_fp_value (const char *macro,
>   const char *fp_cast)
> {
>   REAL_VALUE_TYPE real;
> -  char dec_str[64], buf1[256], buf2[256];
> +  char dec_str[64], buf[256], buf1[128], buf2[64];
>
>   /* This is very expensive, so if possible expand them lazily.  */
>   if (lazy_hex_fp_value_count < 12
> @@ -1656,11 +1656,11 @@ builtin_define_with_hex_fp_value (const char *macro,
>
>   /* Assemble the macro in the following fashion
>  macro = fp_cast [dec_str fp_suffix] */
> -  sprintf (buf1, "%s%s", dec_str, fp_suffix);
> -  sprintf (buf2, fp_cast, buf1);
> -  sprintf (buf1, "%s=%s", macro, buf2);
> +  sprintf (buf2, "%s%s", dec_str, fp_suffix);
> +  sprintf (buf1, fp_cast, buf2);
> +  sprintf (buf, "%s=%s", macro, buf1);
>
> -  cpp_define (parse_in, buf1);
> +  cpp_define (parse_in, buf);
> }

This change looks ok.

> /* Return a string constant for the suffix for a value of type TYPE
> diff --git a/gcc/fortran/class.c b/gcc/fortran/class.c
> index ebbd41b..a08fb8d 100644
> --- a/gcc/fortran/class.c
> +++ b/gcc/fortran/class.c
> @@ -602,7 +602,8 @@ bool
> gfc_build_class_symbol (gfc_typespec *ts, symbol_attribute *attr,
> gfc_array_spec **as)
> {
> -  char name[GFC_MAX_SYMBOL_LEN+1], tname[GFC_MAX_SYMBOL_LEN+1];
> +  char tname[GFC_MAX_SYMBOL_LEN+1];
> +  char *name;
>   gfc_symbol *fclass;
>   gfc_symbol *vtab;
>   gfc_component *c;
> @@ -633,17 +634,17 @@ gfc_build_class_symbol (gfc_typespec *ts, 
> symbol_attribute *attr,
>   rank = !(*as) || (*as)->rank == -1 ? GFC_MAX_DIMENSIONS : (*as)->rank;
>   get_unique_hashed_string (tname, ts->u.derived);
>   if ((*as) && attr->allocatable)
> -sprintf (name, "__class_%s_%d_%da", tname, rank, (*as)->corank);
> +name = xasprintf ("__class_%s_%d_%da", tname, rank, (*as)->corank);

Re: [PATCH] avoid alignment error for attribute warn_if_not_aligned

2017-12-12 Thread Richard Biener
On Tue, Dec 12, 2017 at 1:57 AM, Martin Sebor  wrote:
> My enhancement to improve the detection of attribute collisions
> introduced a regression of sorts in the g++.dg/pr53037-4.C test
> on ia64.
>
> https://gcc.gnu.org/ml/gcc-testresults/2017-12/msg00672.html
>
> The regression hasn't been noticed anywhere else only because
> all the other targets apparently have no low limit on function
> alignment, whereas ia64 has a limit of 32.  As a result of
> the limit the i64 compiler would issue not one but two errors
> for a function declaration with attribute warn_if_not_aligned
> that specifies a value less than 32:
> one for the attribute itself (because it's not allowed on
> functions),
> and another for the lower value.
>
> The attached patch fixes this by issuing the error only for
> attribute aligned with a lower requirement but not attribute
> warn_if_not_aligned.
>
> I tried to come up with a test for this that would fail on
> other targets besides ia64 (those with no low bound on
> function alignment) but couldn't find a way to do it.  If
> someone knows of a way I'd be glad to add a better test
> than pr53037-4.C.

Ok.

Richard.

> Martin


Re: [PR81165] discount killed stmts when sizing blocks for threading

2017-12-12 Thread Richard Biener
On Tue, Dec 12, 2017 at 6:17 AM, Alexandre Oliva  wrote:
> On Dec 11, 2017, Jeff Law  wrote:
>
>>> +  gcc_assert (path->length () == 0 || path->last ()->e == e);
>>> +  if (path->length () == 0)
>>> +return estimate_threading_killed_stmts (e->dest);
>>> +
>>> +  int total = 0;
>>> +  int i = 0;
>>> +  jump_thread_edge *je;
>>> +  FOR_EACH_VEC_ELT (*path, i, je)
>>> +switch (je->type)
>>> +  {
>>> +  case EDGE_START_JUMP_THREAD:
>>> +  case EDGE_COPY_SRC_BLOCK:
>>> +  case EDGE_COPY_SRC_JOINER_BLOCK:
>>> +total += estimate_threading_killed_stmts (je->e->dest);
>>> +break;
>>> +
>>> +  default:
>>> +gcc_unreachable ();
>>> +
>>> +  case EDGE_FSM_THREAD:
>>> +  case EDGE_NO_COPY_SRC_BLOCK:
>>> +break;
>>> +  }
>>> +
>>> +  return total;
>>> +}
>
>
>> So I'd mark je->e->src rather than je->e->dest.
>
> Ah, but then we have to mark e->dest regardless of path.  Which does
> indeed make it all look nicer.
>
>> And you only need this
>> handling for EDGE_COPY_SRC_BLOCK.  So I don't think you need a switch,
>> just a simple je->type == EDGE_COPY_SRC_BLOCK.
>
>> While we do make a copy for EDGE_COPY_SRC_JOINER_BLOCK, the conditional
>> at the end of that block is not dead, so we can't really do anything
>> special with it.
>
> I see, thanks.
>
> I've updated it according to richi's and your feedbacks.  Regstrapped on
> {x86_64,i686}-linux-gnu.  Ok to install?

Thanks for the numbers on size, OK for the parts I reviewed, leaving the
final ack to Jeff.

Richard.

>
> We limit the amount of copying for jump threading based on counting
> stmts.  This counting is overly pessimistic, because we will very
> often delete stmts as a consequence of jump threading: when the final
> conditional jump of a block is removed, earlier SSA names computed
> exclusively for use in that conditional are killed.  Furthermore, PHI
> nodes in blocks with only two predecessors are trivially replaced with
> their now-single values after threading.
>
> This patch scans blocks to be copied in the path constructed so far
> and estimates the number of stmts that will be removed in the copies,
> bumping up the stmt count limit.
>
> for  gcc/ChangeLog
>
> PR tree-optimization/81165
> * tree-ssa-threadedge.c (uses_in_bb): New.
> (estimate_threading_killed_stmts): New.
> (estimate_threading_killed_stmts): New overload.
> (record_temporary_equivalences_from_stmts_at_dest): Add path
> parameter; adjust caller.  Expand limit when it's hit.
>
> for  gcc/testsuite/ChangeLog
>
> PR tree-optimization/81165
> * gcc.dg/pr81165.c: New.
> ---
>  gcc/testsuite/gcc.dg/pr81165.c |   59 +
>  gcc/tree-ssa-threadedge.c  |  176 
> +++-
>  2 files changed, 232 insertions(+), 3 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.dg/pr81165.c
>
> diff --git a/gcc/testsuite/gcc.dg/pr81165.c b/gcc/testsuite/gcc.dg/pr81165.c
> new file mode 100644
> index ..8508d893bed6
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/pr81165.c
> @@ -0,0 +1,59 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O3 -fdump-tree-optimized" } */
> +/* { dg-final { scan-tree-dump-not " \[/%\] " "optimized" } } */
> +
> +/* Testcase submitted for PR81165, with its main function removed as
> +   it's turned into a compile test.  We want to make sure that all of
> +   the divide/remainder computations are removed by tree optimizers.
> +
> +   We can figure out that we don't need to compute at runtime even the
> +   condition to enter the loop: the initial i==0 would have to be
> +   greater than the sum of two small unsigned values: 1U>>t1 is in the
> +   range 0..1, whereas the char value is bounded by the range 0..127,
> +   being 128 % a positive number (zero would invoke undefined
> +   behavior, so we can assume it doesn't happen).  (We know it's
> +   nonnegative because it's 10 times a number that has no more than
> +   the bits for 16, 8 and 1 set.)
> +
> +   We don't realize that the loop is useless right away: jump
> +   threading helps remove some of the complexity, particularly of the
> +   computation within the loop: t1 is compared with 1, but it can
> +   never be 1.  (We could assume as much, since its being 1 would
> +   divide by zero, but we don't.)
> +
> +   If we don't enter the conditional block, t1 remains at 2; if we do,
> +   it's set to either -1.  If we jump thread at the end of the
> +   conditional block, we can figure out the ranges exclude 1 and the
> +   jump body is completely optimized out.  However, we used to fail to
> +   consider the block for jump threading due to the amount of
> +   computation in it, without realizing most of it would die in
> +   consequence of the threading.
> +
> +   We now take the dying code into account when deciding whether or
> +   not to try jump threading.  That might enable us to optimize the
> +   function into { if (x2 != 0 || (x1 & 1) == 0) abort (); 

[PATCH][i386] Fix PR83358 - increase divide/mod latencies a bit

2017-12-12 Thread Markus Trippelsdorf
As the testcase shows, trunk currently generates horrible code for
divisions used in tight loops. This happens because the algorithm
expanding div/mod doesn't take parallelism into account and this makes
the cost model unrealistic.
Fix the issue by increasing the estimated latencies a bit.

Tested on X86_64.
OK for trunk?
(I will keep an eye on the periodic SPEC testers.)

PR target/83358
* x86-tune-costs.h (skylake_cost, core_cost): Increase div/mod
latencies a bit.

diff --git a/gcc/config/i386/x86-tune-costs.h b/gcc/config/i386/x86-tune-costs.h
index 312467d9788f..648219338308 100644
--- a/gcc/config/i386/x86-tune-costs.h
+++ b/gcc/config/i386/x86-tune-costs.h
@@ -1541,9 +1541,11 @@ struct processor_costs skylake_cost = {
COSTS_N_INSNS (4),  /*   DI */
COSTS_N_INSNS (4)}, /*other */
   0,   /* cost of multiply per each bit set */
-  {COSTS_N_INSNS (8),  /* cost of a divide/mod for QI */
-   COSTS_N_INSNS (8),  /*  HI */
-   COSTS_N_INSNS (11), /*  SI */
+  /* Expanding div/mod currently doesn't consider parallelism. So the cost
+ model is not realistic. We compensate by increasing the latencies a bit.  
*/
+  {COSTS_N_INSNS (11), /* cost of a divide/mod for QI */
+   COSTS_N_INSNS (11), /*  HI */
+   COSTS_N_INSNS (14), /*  SI */
COSTS_N_INSNS (76), /*  DI */
COSTS_N_INSNS (76)},/*  
other */
   COSTS_N_INSNS (1),   /* cost of movsx */
@@ -2342,11 +2344,11 @@ struct processor_costs core_cost = {
COSTS_N_INSNS (4),  /*   DI */
COSTS_N_INSNS (4)}, /*other */
   0,   /* cost of multiply per each bit set */
-  {COSTS_N_INSNS (8),  /* cost of a divide/mod for QI */
-   COSTS_N_INSNS (8),  /*  HI */
-   /* 8-11 */
-   COSTS_N_INSNS (11), /*  SI */
-   /* 24-81 */
+  /* Expanding div/mod currently doesn't consider parallelism. So the cost
+ model is not realistic. We compensate by increasing the latencies a bit.  
*/
+  {COSTS_N_INSNS (11), /* cost of a divide/mod for QI */
+   COSTS_N_INSNS (11), /*  HI */
+   COSTS_N_INSNS (14), /*  SI */
COSTS_N_INSNS (81), /*  DI */
COSTS_N_INSNS (81)},/*  
other */
   COSTS_N_INSNS (1),   /* cost of movsx */
diff --git a/gcc/testsuite/gcc.target/i386/pr83358-1.c 
b/gcc/testsuite/gcc.target/i386/pr83358-1.c
new file mode 100644
index ..96427b2f56dd
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr83358-1.c
@@ -0,0 +1,31 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -mtune=core2" } */
+
+#include 
+
+void bin2ascii(uint64_t val, char *dst) {
+  const int64_t POW10_10 = ((int64_t)10) * 1000 * 1000 * 1000;
+  int64_t hix = val / POW10_10;
+  int64_t lox = val % POW10_10;
+  int32_t v0 = hix / 10;
+  int32_t v1 = hix % 10;
+  int32_t v2 = lox / 10;
+  int32_t v3 = lox % 10;
+  for (int i = 4; i != 0; --i) {
+dst[i + 0 * 5] = v0 % 10 + '0';
+v0 /= 10;
+dst[i + 1 * 5] = v1 % 10 + '0';
+v1 /= 10;
+dst[i + 2 * 5] = v2 % 10 + '0';
+v2 /= 10;
+dst[i + 3 * 5] = v3 % 10 + '0';
+v3 /= 10;
+  }
+  dst[0 * 5] = v0 + '0';
+  dst[1 * 5] = v1 + '0';
+  dst[2 * 5] = v2 + '0';
+  dst[3 * 5] = v3 + '0';
+  dst[4 * 5] = 0;
+}
+
+/* { dg-final { scan-assembler-not "idiv" } } */
diff --git a/gcc/testsuite/gcc.target/i386/pr83358-2.c 
b/gcc/testsuite/gcc.target/i386/pr83358-2.c
new file mode 100644
index ..f6039bf72feb
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr83358-2.c
@@ -0,0 +1,31 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -mtune=skylake-avx512" } */
+
+#include 
+
+void bin2ascii(uint64_t val, char *dst) {
+  const int64_t POW10_10 = ((int64_t)10) * 1000 * 1000 * 1000;
+  int64_t hix = val / POW10_10;
+  int64_t lox = val % POW10_10;
+  int32_t v0 = hix / 10;
+  int32_t v1 = hix % 10;
+  int32_t v2 = lox / 10;
+  int32_t v3 = lox % 10;
+  for (int i = 4; i != 0; --i) {
+dst[i + 0 * 5] = v0 % 10 + '0';
+v0 /= 10;
+dst[i + 1 * 5] = v1 % 10 + '0';
+v1 /= 10;
+dst[i + 2 * 5] = v2 % 10 + '0';
+v2 /= 10;
+dst[i + 3 * 5] = v3 % 10 + '0';
+v3 /= 10;
+  }
+  dst[0 * 5] = v0 + '0';
+  dst[1 * 5] = v1 + '0';
+  dst[2 * 5] = v2 + '0';
+  dst[3 * 5] = v3 + '0';
+  dst[4 * 5] = 0;
+}
+
+

Re: [PATCH][RFA][P1 PR tree-optimization/83298] Avoid over-optimistic result range for COND_EXPR

2017-12-12 Thread Richard Biener
On Mon, Dec 11, 2017 at 8:34 PM, Jeff Law  wrote:
> On 12/08/2017 04:17 AM, Richard Biener wrote:
>>
>> I'm not convinced that when you look forward past the dominance frontier
>> and do VRP analysis on stmts without analyzing all intermediate
>> stmts on the path (or at least push all defs on that path temporarily
>> to VR_VARYING) is fixed by this patch.  It merely looks like a wrong
>> workaround for a fundamental issue in how DOM now uses the
>> interface?
> So here's the bits to record ranges (with unwind entries so we can roll
> them back) as we process beyond the dominance frontier.  This was always
> in the plan, but I wanted to do some cleanups prior to adding this
> capability.
>
> I've stayed away from doing any of the cleanups at this time.
>
> At a high level we break out routines to push a marker and pop to a
> marker on the unwinding stack.  We then define the enter/leave in terms
> of those new routines.  This allows us to push/pop scopes as we process
> thread paths which don't want the same processing we see in the enter
> method.
>
> We add a boolean to record_ranges_from_stmt to indicate if any generated
> range is of a temporary nature   The pre-existing calls all pass false
> here to indicate the range is global.  WHen true (from threading) we
> generate the necessary unwind entries and avoid changing any global state.
>
> push_value_range becomes a public member function so it can be used when
> threading to record a temporary range created by a PHI.  It's not
> usually necessary, but could be for cases where we're unable to
> propagate the PHI equivalences.
>
> --
>
>
> We pass in the evrp_range_analyzer instance from DOM into the threader.
> From VRP we just pass in a NULL pointer as VRP doesn't use the EVRP
> analyzer and we have to check it in various places.  This is one of the
> many cleanups that will occur as we drop threading from tree-vrp.c.
>
> We pass that down to a couple functions.  Nothing significant.
>
> Then it's just a matter of recording something for PHIs and statements
> we encounter -- ensuring that in both cases we create unwind entries.
> That obviously fixes 83298 and it's duplicate (testcases for both are
> included).  It probably enables more jump threading as well, but I
> didn't specifically look for cases where that happened.
>
> Bootstrapped and regression tested on x86.
>
> OK for the trunk?

LGTM.  I notice (existing code) that we use set_vr_value from push_value_range
vs. update_value_range from the regular code.  Also for some unrelated missed
optimization I'd like to be able to set and unwind SSA range info via the stack
as well.  Both future things we might want to cleanup.  I guess once the
SSA propagator VRP can go there's an opportunity to cleanup even more of the
code.

Ok.

Thanks,
Richard.

>
> Jeff
>
>
>
> * gimple-ssa-evrp-analyze.h (class evrp_range_analyzer): Make
> push_value_range a public interface.  Add new argument to
> record_ranges_from_stmt.
> * gimple-ssa-evrp-analyze.c
> (evrp_range_analyzer::record_ranges_from_stmt): Add new argument.
> Update comments.  Handle recording temporary equivalences.
> * tree-ssa-dom.c (dom_opt_opt_walker::before_dom_children): Add
> new argument to call to evrp_range_analyzer::record_ranges_from_stmt.
> * gimple-ssa-evrp.c (evrp_dom_walker::before_dom_children): Likewise.
> * tree-ssa-threadedge.c: Include alloc-pool.h, vr-values.h and
> gimple-ssa-evrp-analyze.h.
> (record_temporary_equivalences_from_phis): Add new argument.  When
> the PHI arg is an SSA_NAME, set the result's range to the range
> of the PHI arg.
> (record_temporary_equivalences_from_stmts_at_dest): Record ranges
> from statements too.
> (thread_through_normal_block): Accept new argument, 
> evrp_range_analyzer.
> Pass it down to children as needed.
> (thread_outgoing_edges): Likewise.
> (thread_across_edge): Likewise.   Push/pop range state as needed.
> * tree-ssa-threadedge.h (thread_outgoing_edges): Update prototype.
>
> * gcc.c-torture/execute/pr83298.c: New test.
> * gcc.c-torture/execute/pr83362.c New test.
>
> diff --git a/gcc/gimple-ssa-evrp-analyze.h b/gcc/gimple-ssa-evrp-analyze.h
> index 4783e6f772e..3968cfd805a 100644
> --- a/gcc/gimple-ssa-evrp-analyze.h
> +++ b/gcc/gimple-ssa-evrp-analyze.h
> @@ -31,13 +31,18 @@ class evrp_range_analyzer
>}
>
>void enter (basic_block);
> +  void push_marker (void);
> +  void pop_to_marker (void);
>void leave (basic_block);
> -  void record_ranges_from_stmt (gimple *);
> +  void record_ranges_from_stmt (gimple *, bool);
>
>/* Main interface to retrieve range information.  */
>value_range *get_value_range (const_tree op)
>  { return vr_values->get_value_range (op); }
>
> +  /* Record a new unwindable range.  */
> +  void push_value_range (tree var, value_range *vr);
> +
>/* Du

[PATCH,AIX] Nil check size threshold control for AIX

2017-12-12 Thread REIX, Tony
Description:
 * This patch tells the Go frontend to insert explicit guards (check for nil -> 
panic) for AIX since relying on a fault does not work on AIX for page 0.

Tests:
 * AIX: Build: SUCCESS
   - build made by means of gmake.

ChangeLog:
  * go-lang.c (go_langhook_init): Handle AIX case for nil_check_size_threshold.


Cordialement,

Tony Reix

ATOS / Bull SAS
ATOS Expert
IBM Coop Architect & Technical Leader

Office : +33 (0) 4 76 29 72 67
1 rue de Provence - 38432 Échirolles - France
www.atos.netIndex: gcc/go/go-lang.c
===
--- ./gcc/go/go-lang.c	(revision 255466)
+++ ./gcc/go/go-lang.c	(working copy)
@@ -112,7 +112,11 @@ go_langhook_init (void)
   args.check_divide_overflow = go_check_divide_overflow;
   args.compiling_runtime = go_compiling_runtime;
   args.debug_escape_level = go_debug_escape_level;
+#ifdef TARGET_AIX
+  args.nil_check_size_threshold = -1;
+#else
   args.nil_check_size_threshold = 4096;
+#endif
   args.linemap = go_get_linemap();
   args.backend = go_get_backend();
   go_create_gogo (&args);


RE: [PATCH][i386,AVX] Enable VBMI2 support [5/7]

2017-12-12 Thread Koval, Julia
Looks good. How to put it there(sorry, noob question)?

Thanks,
Julia

> -Original Message-
> From: Gerald Pfeifer [mailto:ger...@pfeifer.com]
> Sent: Saturday, December 09, 2017 2:49 PM
> To: Koval, Julia 
> Cc: Kirill Yukhin ; GCC Patches  patc...@gcc.gnu.org>
> Subject: RE: [PATCH][i386,AVX] Enable VBMI2 support [5/7]
> 
> Hi Julia,
> 
> On Mon, 4 Dec 2017, Koval, Julia wrote:
> > Do you think it is ok to copypaste it from GCC-6?
> 
> you mean copy, past, and adjust?  Yes, that should work.
> 
> > GCC now supports the Intel CPU, named Cannonlake through
> > -march=cannonlake. The switch enables the following ISA extensions:
> > AVX512VBMI, AVX512IFMA, SHA.
> > GCC now supports the Intel CPU, named and Icelake through
> > -march=icelake. The switch enables the following ISA extensions:
> > AVX512VNNI, GFNI, VAES, AVX512VBMI2, VPCLMULQDQ, AVX512BITALG,
> RDPID,
> > AVX512VPOPCNTDQ.
> 
> No comma before "named".
> 
> -march=
> 
> And perhaps "enables the AVX..., AVX...,and... ISA extensions"?
> 
> Gerald


RE: [x86][patch] Fix clwb for skylake

2017-12-12 Thread Koval, Julia
Sorry,

gcc/
* config/i386/i386.c (PTA_SKYLAKE_AVX512): Add PTA_CLWB.
(PTA_CANNONLAKE): Remove PTA_CLWB.

Thanks,
Julia

> -Original Message-
> From: Uros Bizjak [mailto:ubiz...@gmail.com]
> Sent: Monday, December 11, 2017 9:47 AM
> To: Koval, Julia 
> Cc: GCC Patches ; Kirill Yukhin
> 
> Subject: Re: [x86][patch] Fix clwb for skylake
> 
> On Mon, Dec 11, 2017 at 9:34 AM, Koval, Julia  wrote:
> > Hi Uros, Kirill,
> > According to isa-extensions doc CLWB appeared first in Skylake-avx512, but 
> > it
> isn't in the PTA. This patch fixes it. Ok for trunk?
> 
> Please also include ChangeLog entry in your patch submission.
> 
> Uros.


[PATCH][PR ada/65696] ASAN reports global-buffer-overrun for local tagged types

2017-12-12 Thread Yuta Tomino
Hello.

This PR is fix https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65696 .
The cause is that the implicit generated code for making non-static DT is
copying the number Max_Predef_Prims of the predefined primitives from the
parent type, but the real number is smaller than.
AddressSanitizer can find it as an error. That hasn’t changed since gcc 4.8.

I have tried this patch for a long while with from gcc 4.8 to 7, and
succeeded to compile the trunk.

This attached patch is same as “Comment 2” except adjusting the indent.

Is it OK?
Thank you.

gcc/ada/Changelog:

PR ada/65696
* exp_atag.ads, exp_atag.adb (Build_Inherit_Predefined_Prims):
  Add new formal parameter Num_Predef_Prims for copying predefined 
primitives.
* exp_disp.adb (Make_DT):
  Use the counted number of predefined primitives as Cp_Predef_Prims
  instead of Max_Predef_Prims for making non-static inherited DT.



patch-asan-1.diff
Description: Binary data


Re: [SFN+LVU+IEPM v4 6/9] [SFN] Introduce -gstatement-frontiers option, enable debug markers

2017-12-12 Thread Christophe Lyon
Hi,


On 12 December 2017 at 03:42, Alexandre Oliva  wrote:
> On Dec  7, 2017, Jeff Law  wrote:
>
>> On 11/09/2017 07:34 PM, Alexandre Oliva wrote:
>>> Introduce a command line option to enable statement frontiers, enabled
>>> by default in optimized builds with DWARF2+ debug information.
>> OK once all prereqs are ack'd.
>
> Thanks, here's what's installed, FTR:
>
> From aa2fd8850c18861c8e3811b9174b0b1667111783 Mon Sep 17 00:00:00 2001
> From: aoliva 
> Date: Tue, 12 Dec 2017 02:16:31 +
> Subject: [PATCH 6/7] [SFN] Introduce -gstatement-frontiers option, enable
>  debug markers
>
> Introduce a command line option to enable statement frontiers, enabled
> by default in optimized builds with DWARF2+ debug information.
>
> This patch depends on an earlier patch that completed the
> infrastructure for debug markers, and on another patch that turns -g
> into a negatable option prefix.
>
> for  gcc/ChangeLog
>
> * common.opt (gstatement-frontiers): New, setting
> debug_nonbind_markers_p.
> * rtl.h (MAY_HAVE_DEBUG_MARKER_INSNS): Activate.
> * toplev.c (process_options): Autodetect value for debug statement
> frontiers option.
> * tree.h (MAY_HAVE_DEBUG_MARKER_STMTS): Activate.
> * doc/invoke.texi (gstatement-frontiers, gno-statement-frontiers): 
> New.
>
> git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/trunk@255569 
> 138bc75d-0d04-0410-961f-82ee72b054a4


Since this was checked in, I've noticed GCC/libatomic build failures
on ARM targets when configuring
--with-mode thumb (--with-mode arm is OK).

I'm seeing this:
/tmp/923972_2.tmpdir/aci-gcc-fsf/builds/gcc-fsf-gccsrc-thumb/obj-arm-none-linux-gnueabi/gcc3/./gcc/xgcc
-B/tmp/923972_2.tmpdir/aci-gcc-fsf/builds/gcc-fsf-gccsrc-thumb/obj-arm-none-
linux-gnueabi/gcc3/./gcc/
-B/tmp/923972_2.tmpdir/aci-gcc-fsf/builds/gcc-fsf-gccsrc-thumb/tools/arm-none-linux-gnueabi/bin/
-B/tmp/923972_2.tmpdir/aci-gcc-fsf/builds/gcc-fsf-gccsrc-thumb/tools/arm-non
e-linux-gnueabi/lib/ -isystem
/tmp/923972_2.tmpdir/aci-gcc-fsf/builds/gcc-fsf-gccsrc-thumb/tools/arm-none-linux-gnueabi/include
-isystem /tmp/923972_2.tmpdir/aci-gcc-fsf/builds/gcc-fsf-gccsrc-thumb/t
ools/arm-none-linux-gnueabi/sys-include -DHAVE_CONFIG_H
-I/tmp/923972_2.tmpdir/aci-gcc-fsf/sources/gcc-fsf/gccsrc/libatomic/config/arm
-I/tmp/923972_2.tmpdir/aci-gcc-fsf/sources/gcc-fsf/gccsrc/libato
mic/config/linux/arm
-I/tmp/923972_2.tmpdir/aci-gcc-fsf/sources/gcc-fsf/gccsrc/libatomic/config/posix
-I/tmp/923972_2.tmpdir/aci-gcc-fsf/sources/gcc-fsf/gccsrc/libatomic
-I. -Wall -Werror -pthread -g
 -O2 -MT gload.lo -MD -MP -MF .deps/gload.Tpo -c
/tmp/923972_2.tmpdir/aci-gcc-fsf/sources/gcc-fsf/gccsrc/libatomic/gload.c
 -fPIC -DPIC -o .libs/gload.o
/tmp/923972_2.tmpdir/ccxBYdCX.s: Assembler messages:
/tmp/923972_2.tmpdir/ccxBYdCX.s: Error: unaligned opcodes detected in
executable segment
make[4]: *** [gload.lo] Error 1
make[4]: Leaving directory
`/tmp/923972_2.tmpdir/aci-gcc-fsf/builds/gcc-fsf-gccsrc-thumb/obj-arm-none-linux-gnueabi/gcc3/arm-none-linux-gnueabi/libatomic'

Christophe

> ---
>  gcc/ChangeLog   |  8 
>  gcc/common.opt  |  4 
>  gcc/doc/invoke.texi | 12 
>  gcc/rtl.h   |  2 +-
>  gcc/toplev.c|  4 
>  gcc/tree.h  |  2 +-
>  6 files changed, 30 insertions(+), 2 deletions(-)
>
> diff --git a/gcc/ChangeLog b/gcc/ChangeLog
> index 01bd2b9f49ad..a53e7b8173f5 100644
> --- a/gcc/ChangeLog
> +++ b/gcc/ChangeLog
> @@ -1,5 +1,13 @@
>  2017-12-12  Alexandre Oliva 
>
> +   * common.opt (gstatement-frontiers): New, setting
> +   debug_nonbind_markers_p.
> +   * rtl.h (MAY_HAVE_DEBUG_MARKER_INSNS): Activate.
> +   * toplev.c (process_options): Autodetect value for debug statement
> +   frontiers option.
> +   * tree.h (MAY_HAVE_DEBUG_MARKER_STMTS): Activate.
> +   * doc/invoke.texi (gstatement-frontiers, gno-statement-frontiers): 
> New.
> +
> * cfgexpand.c (expand_gimple_basic_block): Handle begin stmt
> markers.  Integrate source bind into debug stmt expand loop.
> (pass_expand::execute): Check debug marker limit.  Avoid deep
> diff --git a/gcc/common.opt b/gcc/common.opt
> index 57b3cd7304ab..d80ae5b7f39b 100644
> --- a/gcc/common.opt
> +++ b/gcc/common.opt
> @@ -2936,6 +2936,10 @@ gstabs+
>  Common Driver JoinedOrMissing Negative(gvms)
>  Generate debug information in extended STABS format.
>
> +gstatement-frontiers
> +Common Driver Var(debug_nonbind_markers_p) Init(2)
> +Emit progressive recommended breakpoint locations.
> +
>  gstrict-dwarf
>  Common Driver Report Var(dwarf_strict) Init(0)
>  Don't emit DWARF additions beyond selected version.
> diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
> index 189b3e438fff..6402a5ae1734 100644
> --- a/gcc/doc/invoke.texi
> +++ b/gcc/doc/invoke.texi
> @@ -346,6 +346,7 @@ Objective-C and Objective-C++ Dialects}.
>  -ggdb  -grecord-gcc-switches  -gno-record-gcc-switches @gol
>  -gstabs  -gstabs+  -gstrict-dwarf  -gno-stric

Re: [PATCH] Handle LOOP_DIST_ALIAS ifns in move_sese_region_to_fn (PR tree-optimization/83359)

2017-12-12 Thread Richard Biener
On Mon, 11 Dec 2017, Jakub Jelinek wrote:

> Hi!
> 
> Unlike LOOP_VECTORIZED ifns, LOOP_DIST_ALIAS is added by the ldist pass
> and needs to be maintained until the vectorizer, and parloops in between
> that.  Earlier I've added code to update or drop orig_loop_num during
> move_sese_region_to_fn, but that is not sufficient.  If we move
> the whole pair of loops with the associated LOOP_DIST_ALIAS call into
> the outlined loopfn, we need to update the first argument, as orig_loop_num
> is likely changing.  If the whole triplet (two loops with orig_loop_num
> and LOOP_DIST_ALIAS with the same first argument) stays in parent function,
> we don't need to adjust it.  In all other cases, this patch folds the
> LOOP_DIST_ALIAS ifn to the second argument, like the vectorizer does if
> it fails to vectorize it.
> 
> Bootstrapped/regtested on x86_64-linux, i686-linux, powerpc64le-linux,
> bootstrapped on powerpc64-linux, regtest there pending.  Ok for trunk?

Ok.

Thanks,
Richard.

> 2017-12-11  Jakub Jelinek  
> 
>   PR tree-optimization/83359
>   * tree-cfg.h (fold_loop_internal_call): Declare.
>   * tree-vectorizer.c (fold_loop_internal_call): Moved to ...
>   * tree-cfg.c (fold_loop_internal_call): ... here.  No longer static.
>   (find_loop_dist_alias): New function.
>   (move_sese_region_to_fn): If any dloop->orig_loop_num value is
>   updated, also adjust any corresponding LOOP_DIST_ALIAS internal
>   calls.
> 
>   * gcc.dg/graphite/pr83359.c: New test.
> 
> --- gcc/tree-cfg.h.jj 2017-09-05 23:28:14.0 +0200
> +++ gcc/tree-cfg.h2017-12-11 12:35:24.284777550 +0100
> @@ -77,6 +77,7 @@ extern void gather_blocks_in_sese_region
> vec *bbs_p);
>  extern void verify_sese (basic_block, basic_block, vec *);
>  extern bool gather_ssa_name_hash_map_from (tree const &, tree const &, void 
> *);
> +extern void fold_loop_internal_call (gimple *, tree);
>  extern basic_block move_sese_region_to_fn (struct function *, basic_block,
>  basic_block, tree);
>  extern void dump_function_to_file (tree, FILE *, dump_flags_t);
> --- gcc/tree-vectorizer.c.jj  2017-09-01 09:26:37.0 +0200
> +++ gcc/tree-vectorizer.c 2017-12-11 12:33:41.436055580 +0100
> @@ -464,27 +464,6 @@ vect_loop_vectorized_call (struct loop *
>return NULL;
>  }
>  
> -/* Fold loop internal call G like IFN_LOOP_VECTORIZED/IFN_LOOP_DIST_ALIAS
> -   to VALUE and update any immediate uses of it's LHS.  */
> -
> -static void
> -fold_loop_internal_call (gimple *g, tree value)
> -{
> -  tree lhs = gimple_call_lhs (g);
> -  use_operand_p use_p;
> -  imm_use_iterator iter;
> -  gimple *use_stmt;
> -  gimple_stmt_iterator gsi = gsi_for_stmt (g);
> -
> -  update_call_from_tree (&gsi, value);
> -  FOR_EACH_IMM_USE_STMT (use_stmt, iter, lhs)
> -{
> -  FOR_EACH_IMM_USE_ON_STMT (use_p, iter)
> - SET_USE (use_p, value);
> -  update_stmt (use_stmt);
> -}
> -}
> -
>  /* If LOOP has been versioned during loop distribution, return the gurading
> internal call.  */
>  
> --- gcc/tree-cfg.c.jj 2017-12-07 18:05:30.0 +0100
> +++ gcc/tree-cfg.c2017-12-11 12:34:55.054140750 +0100
> @@ -7337,6 +7337,47 @@ gather_ssa_name_hash_map_from (tree cons
>return true;
>  }
>  
> +/* Return LOOP_DIST_ALIAS call if present in BB.  */
> +
> +static gimple *
> +find_loop_dist_alias (basic_block bb)
> +{
> +  gimple *g = last_stmt (bb);
> +  if (g == NULL || gimple_code (g) != GIMPLE_COND)
> +return NULL;
> +
> +  gimple_stmt_iterator gsi = gsi_for_stmt (g);
> +  gsi_prev (&gsi);
> +  if (gsi_end_p (gsi))
> +return NULL;
> +
> +  g = gsi_stmt (gsi);
> +  if (gimple_call_internal_p (g, IFN_LOOP_DIST_ALIAS))
> +return g;
> +  return NULL;
> +}
> +
> +/* Fold loop internal call G like IFN_LOOP_VECTORIZED/IFN_LOOP_DIST_ALIAS
> +   to VALUE and update any immediate uses of it's LHS.  */
> +
> +void
> +fold_loop_internal_call (gimple *g, tree value)
> +{
> +  tree lhs = gimple_call_lhs (g);
> +  use_operand_p use_p;
> +  imm_use_iterator iter;
> +  gimple *use_stmt;
> +  gimple_stmt_iterator gsi = gsi_for_stmt (g);
> +
> +  update_call_from_tree (&gsi, value);
> +  FOR_EACH_IMM_USE_STMT (use_stmt, iter, lhs)
> +{
> +  FOR_EACH_IMM_USE_ON_STMT (use_p, iter)
> + SET_USE (use_p, value);
> +  update_stmt (use_stmt);
> +}
> +}
> +
>  /* Move a single-entry, single-exit region delimited by ENTRY_BB and
> EXIT_BB to function DEST_CFUN.  The whole region is replaced by a
> single basic block in the original CFG and the new basic block is
> @@ -7510,7 +7551,6 @@ move_sese_region_to_fn (struct function
> }
>  }
>  
> -
>/* Adjust the number of blocks in the tree root of the outlined part.  */
>get_loop (dest_cfun, 0)->num_nodes = bbs.length () + 2;
>  
> @@ -7521,19 +7561,77 @@ move_sese_region_to_fn (struct function
>/* Fix up orig_loop_num.  If the block referenced in it has

[committed] Add testcases for 2 fixed PRs (PR rtl-optimization/8336[34])

2017-12-12 Thread Jakub Jelinek
Hi!

These 2 PRs were fixed with r255506, I've tested these testcases
on x86_64-linux {-m32,-m64} with current trunk as well as trunk before that
change (where it FAILed) and committed to trunk as obvious.

2017-12-12  Jakub Jelinek  

PR rtl-optimization/83363
PR rtl-optimization/83364
* gcc.dg/pr83363.c: New test.
* gcc.dg/pr83364.c: New test.

--- gcc/testsuite/gcc.dg/pr83363.c.jj   2017-12-12 10:46:08.639485075 +0100
+++ gcc/testsuite/gcc.dg/pr83363.c  2017-12-12 10:43:03.253793460 +0100
@@ -0,0 +1,26 @@
+/* PR rtl-optimization/83363 */
+/* { dg-do run } */
+/* { dg-options "-O2 -fno-forward-propagate" } */
+
+unsigned char a;
+unsigned int b;
+
+static unsigned short __attribute__ ((noinline, noclone))
+foo (unsigned short x)
+{
+  x -= b;
+  x <<= x == 0;
+  a = ~0;
+  a >>= (unsigned char) x == 0;
+  x *= a -= ~a;
+  return x;
+}
+
+int
+main ()
+{
+  asm volatile ("" : : : "memory");
+  if (foo (3) != (unsigned short) (3 * (unsigned char) ~0))
+__builtin_abort ();
+  return 0;
+}
--- gcc/testsuite/gcc.dg/pr83364.c.jj   2017-12-12 10:46:11.422450423 +0100
+++ gcc/testsuite/gcc.dg/pr83364.c  2017-12-12 10:46:02.486561690 +0100
@@ -0,0 +1,22 @@
+/* PR rtl-optimization/83364 */
+/* { dg-do run } */
+/* { dg-options "-O -fno-forward-propagate -fno-tree-coalesce-vars 
-fno-tree-ter" } */
+
+int a;
+
+static int __attribute__ ((noinline, noclone))
+foo (unsigned char c)
+{
+  c <<= (long long) c != a;
+  c = c << 7 | c >> 1;
+  return c;
+}
+
+int
+main ()
+{
+  asm volatile ("" : : : "memory");
+  if (foo (0) != 0)
+__builtin_abort ();
+  return 0;
+}

Jakub


Re: [PATCH][ARM][gcc-7] Fix wrong code by arm_final_prescan with fp16 move instructions

2017-12-12 Thread Christophe Lyon
Hi,



On 11 December 2017 at 18:12, Sudakshina Das  wrote:
> On 30/11/17 16:01, Sudakshina Das wrote:
>>
>> Hi
>>
>> This patch is the fix for gcc-7 for the same issue as mentioned in:
>> https://gcc.gnu.org/ml/gcc-patches/2017-11/msg02209.html
>>
>>
>> For the following test case:
>> __fp16
>> test_select (__fp16 a, __fp16 b, __fp16 c)
>> {
>>return (a < b) ? b : c;
>> }
>>
>> when compiled with -O2 -mfpu=fp-armv8 -march=armv8.2-a+fp16 -marm
>> -mfloat-abi=hard generates wrong code:
>>
>> test_select:
>>  @ args = 0, pretend = 0, frame = 0
>>  @ frame_needed = 0, uses_anonymous_args = 0
>>  @ link register save eliminated.
>>  vcvtb.f32.f16s0, s0
>>  vcvtb.f32.f16s15, s1
>>  vmov.f16r3, s2@ __fp16
>>  vcmpe.f32s0, s15
>>  vmrsAPSR_nzcv, FPSCR
>>  // <-- No conditional branch
>>  vmov.f16r3, s1@ __fp16
>> .L1:
>>  vmov.f16s0, r3@ __fp16
>>  bxlr
>>
>> There should have been a conditional branch there to skip one of the
>> VMOVs.
>> This patch fixes this problem by making *movhf_vfp_fp16 unconditional.
>>
>> Testing done: Add a new test case and checked for regressions on
>> bootstrapped arm-none-linux-gnueabihf.
>>
>> Is this ok for gcc-7?
>>
>> Sudi
>>
>> ChangeLog entry are as follow:
>>
>> *** gcc/ChangeLog ***
>>
>> 2017-11-30  Sudakshina Das  
>>
>>  * config/arm/vfp.md (*movhf_vfp_fp16): Add conds attribute.
>>
>> *** gcc/testsuite/ChangeLog ***
>>
>> 2017-11-30  Sudakshina Das  
>>
>>  * gcc.target/arm/armv8_2-fp16-move-2.c: New test.
>
>
> As per the trunk thread for this
> (https://gcc.gnu.org/ml/gcc-patches/2017-11/msg02209.html) committed as
> r255536 on gcc-7-branch for the backport.
>

I've noticed that this backport fails on arm-none-linux-gnueabi and
arm-none-eabi.
I suspect this is partly due to the fact that I use a "recent"
dejagnu, and has to do with whether
dg-add-options are appended or pre-pended.
I'm seeing a compilation line with:

-mfpu=fp-armv8 -march=armv8.2-a+fp16 -marm -mfloat-abi=hard
-mfpu=fp-armv8 -mfloat-abi=softfp -march=armv8.2-a+fp16

leading to:
FAIL: gcc.target/arm/armv8_2-fp16-move-2.c scan-assembler bpl

I'm not sure why this works on trunk, but there I have only:
-marm -mfloat-abi=softfp -march=armv8.2-a+fp16

Maybe this has to do with the new way cpu/fpu options are parsed on trunk.

Christophe


> Thanks
> Sudi


Re: [PATCH][AArch64] Specify fp16 support for Cortex-A55 and Cortex-A75

2017-12-12 Thread James Greenhalgh
On Mon, Dec 11, 2017 at 01:44:23PM +, Kyrill Tkachov wrote:
> Hi all,
> 
> The Cortex-A55 and Cortex-A75 processors support the fp16 extension.
> We already specify them as such in the arm port.
> This patch makes aarch64 consistent on this front.
> 
> Bootstrapped and tested on aarch64-none-linux-gnu.
> Manually checked that compiling with aarch64-none-linux-gnu-gcc 
> -mcpu=cortex-a55 -dM -E - < /dev/null
> shows __ARM_FEATURE_FP16_VECTOR_ARITHMETIC and 
> __ARM_FEATURE_FP16_SCALAR_ARITHMETIC being specified
> as expected whereas they were not before this patch.
> 
> Ok for trunk?

OK.

Sorry for the oversight when I added this support earlier in the year.

James

> 2017-12-11  Kyrylo Tkachov  
> 
>  * config/aarch64/aarch64-cores.def (cortex-a55, cortex-a75,
>  cortex-a75.cortex-a55): Specify AARCH64_FL_F16 in the arch features.



Re: [PATCH][ARM][gcc-7] Fix wrong code by arm_final_prescan with fp16 move instructions

2017-12-12 Thread Sudakshina Das

Hi Christophe

On 12/12/17 09:59, Christophe Lyon wrote:

Hi,



On 11 December 2017 at 18:12, Sudakshina Das  wrote:

On 30/11/17 16:01, Sudakshina Das wrote:


Hi

This patch is the fix for gcc-7 for the same issue as mentioned in:
https://gcc.gnu.org/ml/gcc-patches/2017-11/msg02209.html


For the following test case:
__fp16
test_select (__fp16 a, __fp16 b, __fp16 c)
{
return (a < b) ? b : c;
}

when compiled with -O2 -mfpu=fp-armv8 -march=armv8.2-a+fp16 -marm
-mfloat-abi=hard generates wrong code:

test_select:
  @ args = 0, pretend = 0, frame = 0
  @ frame_needed = 0, uses_anonymous_args = 0
  @ link register save eliminated.
  vcvtb.f32.f16s0, s0
  vcvtb.f32.f16s15, s1
  vmov.f16r3, s2@ __fp16
  vcmpe.f32s0, s15
  vmrsAPSR_nzcv, FPSCR
  // <-- No conditional branch
  vmov.f16r3, s1@ __fp16
.L1:
  vmov.f16s0, r3@ __fp16
  bxlr

There should have been a conditional branch there to skip one of the
VMOVs.
This patch fixes this problem by making *movhf_vfp_fp16 unconditional.

Testing done: Add a new test case and checked for regressions on
bootstrapped arm-none-linux-gnueabihf.

Is this ok for gcc-7?

Sudi

ChangeLog entry are as follow:

*** gcc/ChangeLog ***

2017-11-30  Sudakshina Das  

  * config/arm/vfp.md (*movhf_vfp_fp16): Add conds attribute.

*** gcc/testsuite/ChangeLog ***

2017-11-30  Sudakshina Das  

  * gcc.target/arm/armv8_2-fp16-move-2.c: New test.



As per the trunk thread for this
(https://gcc.gnu.org/ml/gcc-patches/2017-11/msg02209.html) committed as
r255536 on gcc-7-branch for the backport.



I've noticed that this backport fails on arm-none-linux-gnueabi and
arm-none-eabi.
I suspect this is partly due to the fact that I use a "recent"
dejagnu, and has to do with whether
dg-add-options are appended or pre-pended.
I'm seeing a compilation line with:

-mfpu=fp-armv8 -march=armv8.2-a+fp16 -marm -mfloat-abi=hard
-mfpu=fp-armv8 -mfloat-abi=softfp -march=armv8.2-a+fp16

leading to:
FAIL: gcc.target/arm/armv8_2-fp16-move-2.c scan-assembler bpl

I'm not sure why this works on trunk, but there I have only:
-marm -mfloat-abi=softfp -march=armv8.2-a+fp16

Maybe this has to do with the new way cpu/fpu options are parsed on trunk.


Sorry for this. I will try to investigate.

Thanks
Sudi



Christophe



Thanks
Sudi




RE: [PATCH][i386,AVX] Enable VBMI2 support [5/7]

2017-12-12 Thread Gerald Pfeifer
On Tue, 12 Dec 2017, Koval, Julia wrote:
> Looks good. How to put it there(sorry, noob question)?

Does https://gcc.gnu.org/about.html help?  If not, let me know 
and I'll work with you (and update those docs on the way).

Of course, even if things work for you, any suggestions on how
to improve this little page are very welcome. :)

Gerald


Re: [PATCH 1/7]: SVE: Add CLOBBER_HIGH expression

2017-12-12 Thread Alan Hayward
Ping.

> On 30 Nov 2017, at 11:03, Alan Hayward  wrote:
> 
> 
>> On 27 Nov 2017, at 17:29, Jeff Law  wrote:
>> 
>> On 11/23/2017 04:11 AM, Alan Hayward wrote:
>>> 
 On 22 Nov 2017, at 17:33, Jeff Law  wrote:
 
 On 11/22/2017 04:31 AM, Alan Hayward wrote:
> 
>> On 21 Nov 2017, at 03:13, Jeff Law  wrote:
>>> 
 
 You might also look at TARGET_HARD_REGNO_CALL_PART_CLOBBERED.  I'd
 totally forgotten about it.  And in fact it seems to come pretty close
 to what you need…
>>> 
>>> Yes, some of the code is similar to the way
>>> TARGET_HARD_REGNO_CALL_PART_CLOBBERED works. Both that code and the
>>> CLOBBER expr code served as a starting point for writing the patch. The 
>>> main difference
>>> here, is that _PART_CLOBBERED is around all calls and is not tied to a 
>>> specific Instruction,
>>> it’s part of the calling abi. Whereas clobber_high is explicitly tied 
>>> to an expression (tls_desc).
>>> It meant there wasn’t really any opportunity to resume any existing 
>>> code.
>> Understood.  Though your first patch mentions that you're trying to
>> describe partial preservation "around TLS calls". Presumably those are
>> represented as normal insns, not call_insn.
>> 
>> That brings me back to Richi's idea of exposing a set of the low subreg
>> to itself using whatever mode is wide enough to cover the neon part of
>> the register.
>> 
>> That should tell the generic parts of the compiler that you're just
>> clobbering the upper part and at least in theory you can implement in
>> the aarch64 backend and the rest of the compiler should "just work"
>> because that's the existing semantics of a subreg store.
>> 
>> The only worry would be if a pass tried to get overly smart and
>> considered that kind of set a nop -- but I think I'd argue that's simply
>> wrong given the semantics of a partial store.
>> 
> 
> So, the instead of using clobber_high(reg X), to use set(reg X, reg X).
> It’s something we considered, and then dismissed.
> 
> The problem then is you are now using SET semantics on those registers, 
> and it
> would make the register live around the function, which might not be the 
> case.
> Whereas clobber semantics will just make the register dead - which is 
> exactly
> what we want (but only conditionally).
 ?!?  A set of the subreg is the *exact* semantics you want.  It says the
 low part is preserved while the upper part is clobbered across the TLS
 insns.
 
 jeff
>>> 
>>> Consider where the TLS call is inside a loop. The compiler would normally 
>>> want
>>> to hoist that out of the loop. By adding a set(x,x) into the parallel of 
>>> the tls_desc we
>>> are now making x live across the loop, x is dependant on the value from the 
>>> previous
>>> iteration, and the tls_desc can no longer be hoisted.
>> Hmm.  I think I see the problem you're trying to point out.  Let me
>> restate it and see if you agree.
>> 
>> The low subreg set does clearly indicate the upper part of the SVE
>> register is clobbered.  The problem is from a liveness standpoint the
>> compiler is considering the low part live, even though it's a self-set.
>> 
> 
> Yes.
> 
>> In fact, if that is the case, then a single TLS call (independent of a
>> loop) would make the low part of the register globally live.  This
>> should be testable.  Include one of these low part self sets on the
>> existing TLS calls and compile a little test function and let's look at
>> the liveness data.
>> 
> 
> 
> Ok, so I’ve put 
> (set (reg:TI V0_REGNUM) (reg:TI V0_REGNUM))
> (set (reg:TI V1_REGNUM) (reg:TI V1_REGNUM))
> etc up to V31 in the UNSPEC_TLSDESC pattern in aarch64.md, using up all the 
> neon registers.
> 
> In an ideal world, this change would do nothing as neon reg size is TI.
> 
> First I compiled up  sve_tls_preserve_1.c (Test1 below)
> 
> Without the SET changes, gcc does not backup any neon registers before the 
> tlsdesccall (good).
> With the SET changes, gcc backs up all the neon registers (not ideal).
> 
> Without the SET changes, dump logs give:
> 
> cse1:
> ;;  regs ever live 0 [x0] 30 [x30] 32 [v0] 33 [v1] 34 [v2] 66 [cc]
> 
> cprop1:
> ;; lr  in  29 [x29] 31 [sp] 32 [v0] 33 [v1] 34 [v2] 64 [sfp] 65 [ap]
> ;; lr  use 29 [x29] 31 [sp] 32 [v0] 33 [v1] 34 [v2] 64 [sfp] 65 [ap]
> ;; lr  def 0 [x0] 30 [x30] 32 [v0] 66 [cc] 79 80 81 82 84 85 86 87 88 89
> ;; live  in29 [x29] 31 [sp] 32 [v0] 33 [v1] 34 [v2] 64 [sfp] 65 [ap]
> ;; live  gen   0 [x0] 32 [v0] 78 79 80 81 82 83 84 85 86 87 88 89
> ;; live  kill  30 [x30] 66 [cc]
> ;; lr  out 29 [x29] 31 [sp] 32 [v0] 64 [sfp] 65 [ap]
> ;; live  out   29 [x29] 31 [sp] 32 [v0] 64 [sfp] 65 [ap]
> 
> reload:
> ;;  regs ever live 0 [x0] 1 [x1] 2 [x2] 29 [x29] 30 [x30] 32 [v0] 33 [v1] 
> 34 [v2] 35 [v3] 36 [v4] 66 [cc]
> 
> With the set changes:

[PATCH] Fix PR83385

2017-12-12 Thread Richard Biener

Boostrapped and tested on x86_64-unknown-linux-gnu, applied.

Richard.

2017-12-12  Richard Biener  

PR tree-optimization/83385
* graphite-scop-detection.c (get_order, order): Remove.
(bb_to_rpo): New global.
(cmp_pbbs): Adjust.
(build_scops): Sort pbbs in RPO order.

* gcc.dg/graphite/pr83385.c: New testcase.

Index: gcc/graphite-scop-detection.c
===
--- gcc/graphite-scop-detection.c   (revision 255574)
+++ gcc/graphite-scop-detection.c   (working copy)
@@ -1608,26 +1608,7 @@ gather_bbs::after_dom_children (basic_bl
 /* Compute sth like an execution order, dominator order with first executing
edges that stay inside the current loop, delaying processing exit edges.  */
 
-static vec order;
-
-static void
-get_order (scop_p scop, basic_block bb, vec *order, unsigned 
*dfs_num)
-{
-  if (! bb_in_sese_p (bb, scop->scop_info->region))
-return;
-
-  (*order)[bb->index] = (*dfs_num)++;
-  for (basic_block son = first_dom_son (CDI_DOMINATORS, bb);
-   son;
-   son = next_dom_son (CDI_DOMINATORS, son))
-if (flow_bb_inside_loop_p (bb->loop_father, son))
-  get_order (scop, son, order, dfs_num);
-  for (basic_block son = first_dom_son (CDI_DOMINATORS, bb);
-   son;
-   son = next_dom_son (CDI_DOMINATORS, son))
-if (! flow_bb_inside_loop_p (bb->loop_father, son))
-  get_order (scop, son, order, dfs_num);
-}
+static int *bb_to_rpo;
 
 /* Helper for qsort, sorting after order above.  */
 
@@ -1636,9 +1617,11 @@ cmp_pbbs (const void *pa, const void *pb
 {
   poly_bb_p bb1 = *((const poly_bb_p *)pa);
   poly_bb_p bb2 = *((const poly_bb_p *)pb);
-  if (order[bb1->black_box->bb->index] < order[bb2->black_box->bb->index])
+  if (bb_to_rpo[bb1->black_box->bb->index]
+  < bb_to_rpo[bb2->black_box->bb->index])
 return -1;
-  else if (order[bb1->black_box->bb->index] > order[bb2->black_box->bb->index])
+  else if (bb_to_rpo[bb1->black_box->bb->index]
+  > bb_to_rpo[bb2->black_box->bb->index])
 return 1;
   else
 return 0;
@@ -1662,7 +1645,7 @@ build_scops (vec *scops)
   /* Domwalk needs a bb to RPO mapping.  Compute it once here.  */
   int *postorder = XNEWVEC (int, n_basic_blocks_for_fn (cfun));
   int postorder_num = pre_and_rev_post_order_compute (NULL, postorder, true);
-  int *bb_to_rpo = XNEWVEC (int, last_basic_block_for_fn (cfun));
+  bb_to_rpo = XNEWVEC (int, last_basic_block_for_fn (cfun));
   for (int i = 0; i < postorder_num; ++i)
 bb_to_rpo[postorder[i]] = i;
   free (postorder);
@@ -1676,16 +1659,8 @@ build_scops (vec *scops)
   /* Record all basic blocks and their conditions in REGION.  */
   gather_bbs (CDI_DOMINATORS, scop, bb_to_rpo).walk (s->entry->dest);
 
-  /* domwalk does not fulfil our code-generations constraints on the
- order of pbb which is to produce sth like execution order, delaying
-exection of loop exit edges.  So compute such order and sort after
-that.  */
-  order.create (last_basic_block_for_fn (cfun));
-  order.quick_grow (last_basic_block_for_fn (cfun));
-  unsigned dfs_num = 0;
-  get_order (scop, s->entry->dest, &order, &dfs_num);
+  /* Sort pbbs after execution order for initial schedule generation.  */
   scop->pbbs.qsort (cmp_pbbs);
-  order.release ();
 
   if (! build_alias_set (scop))
{
@@ -1732,6 +1707,7 @@ build_scops (vec *scops)
 }
 
   free (bb_to_rpo);
+  bb_to_rpo = NULL;
   DEBUG_PRINT (dp << "number of SCoPs: " << (scops ? scops->length () : 0););
 }
 
Index: gcc/testsuite/gcc.dg/graphite/pr83385.c
===
--- gcc/testsuite/gcc.dg/graphite/pr83385.c (nonexistent)
+++ gcc/testsuite/gcc.dg/graphite/pr83385.c (working copy)
@@ -0,0 +1,32 @@
+/* { dg-do run } */
+/* { dg-options "-O1 -floop-nest-optimize" } */
+
+int xc, n1 = 0;
+int bx[2];
+
+int
+main (void)
+{
+  int aj = 1;
+  int cs;
+
+  for (cs = aj; cs >= 0; --cs)
+{
+  int sq;
+
+  for (sq = 0; sq < 2; ++sq)
+   {
+ if (aj != 0)
+   --n1;
+
+ for (xc = 0; xc < 2; ++xc)
+   bx[xc] = 0;
+   }
+
+  --aj;
+}
+
+  if (n1 != -2)
+__builtin_abort ();
+  return 0;
+}


Re: [AArch64] Fix ICEs in aarch64_print_operand_internal (PR target/83335)

2017-12-12 Thread Richard Sandiford
Jakub Jelinek  writes:
> On Tue, Dec 12, 2017 at 06:21:37AM +, Richard Sandiford wrote:
>> Jakub Jelinek  writes:
>> > Hi!
>> >
>> > On Fri, Dec 08, 2017 at 08:10:08PM +0100, Christophe Lyon wrote:
>> >> >> Can you check?
>> >> >
>> >> > I think that's a separate preexisting problem.  Could you file a PR?
>> >> >
>> >> 
>> >> Sure, I filed:
>> >> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=83335
>> >> 
>> >> > Personally I'd just remove the assert, but I'm guessing that wouldn't
>> >> > be acceptable...
>> >
>> > So, I think either we can return false instead of dying on the assertion,
>> > but then it will emit output_addr_const and often just silently emit it
>> > without diagnosing it (first patch), or just call output_operand_lossage
>> > there, which will ICE except for inline asm, where it will error.
>> > It is true there is no code provided, but output_addr_const wouldn't
>> > provide that either:
>> > default:
>> >   if (targetm.asm_out.output_addr_const_extra (file, x))
>> > break;
>> >
>> >   output_operand_lossage ("invalid expression as operand");
>> > in final.c.
>> 
>> I think the testcase is valid even for ILP32, so the first sounds better
>> to me.
>
> I think it is not valid to print "X" constraint operand in any way, because
> "X" operand can be anything.  All we need to make sure is that we don't ICE
> on it if in inline asm.  The purpose of "X" is for operands that aren't
> needed at all, or are needed just to denote they are read or written through
> other means.  If you look at the testsuite, no other test but
> aarch64/asm-{2,3}.c attempts to print "X" operads.

The ICE triggers for "i" as well as "X" though:

  asm volatile ("%a0" :: "i" (&x));

Thanks,
Richard


Re: [PATCH GCC]More conservative interchanging small loops with const initialized simple reduction

2017-12-12 Thread Bin.Cheng
On Fri, Dec 8, 2017 at 2:40 PM, Richard Biener
 wrote:
> On Fri, Dec 8, 2017 at 1:43 PM, Bin.Cheng  wrote:
>> On Fri, Dec 8, 2017 at 12:17 PM, Richard Biener
>>  wrote:
>>> On Fri, Dec 8, 2017 at 12:46 PM, Bin Cheng  wrote:
 Hi,
 This simple patch makes interchange even more conservative for small loops 
 with constant initialized simple reduction.
 The reason is undoing such reduction introduces new data reference and 
 cond_expr, which could cost too much in a small
 loop.
 Test gcc.target/aarch64/pr62178.c is fixed with this patch.  Is it OK if 
 test passes?
>>>
>>> Shouldn't we do this even for non-constant initialzied simple
>>> reduction?  Because for any simple
>>> reduction we add two DRs that are not innermost, for constant
>>> initialized we add an additional
>>> cond-expr.  So ...
>>>
>>> +  /* Conservatively skip interchange in cases only have few data references
>>> + and constant initialized simple reduction since it introduces new data
>>> + reference as well as ?: operation.  */
>>> +  if (num_old_inv_drs + num_const_init_simple_reduc * 2 >= datarefs.length 
>>> ())
>>> +return false;
>>> +
>>>
>>> can you, instead of carrying num_const_init_simple_reduc simply loop
>>> over m_reductions
>>> and classify them in this function accordingly?  I think we want to
>>> cost non-constant-init
>>> reductions as well.  The :? can eventually count for another DR for
>>> cost purposes.
>> Number of non-constant-init reductions can still be carried in struct
>> loop_cand?  I am not very sure what's the advantage of an additional
>> loop over m_reductions getting the same information.
>> Perhaps the increase of stmts should be counted like:
>>   num_old_inv_drs + num_const_init_simple_reduc * 2 - num_new_inv_drs
>> Question is which number should this be compared against.  (we may
>> need to shift num_new_inv_drs to the other side for wrapping issue).
>>
>>>
>>> It looks like we do count the existing DRs for the reduction?  Is that
>>> why you arrive
>>> at the num_const_init_simple_reduc * 2 figure? (one extra load plus one ?:)
>> Yes.
>>> But we don't really know whether the DR was invariant in the outer
>>> loop (well, I suppose
>> Hmm, I might misunderstand here.  num_old_inv_drs tracks the number of
>> invariant reference with regarding to inner loop, rather than the
>> outer loop.  The same to num_new_inv_drs,
>> which means a reference become invariant after loop interchange with
>> regarding to (the new) inner loop.  This invariant information is
>> always known from data reference, right?
>> As for DRs for reduction, we know it's invariant because we set its
>> inner loop stride to zero.
>>
>>> we could remember the DR in m_reductions).
>>>
>>> Note that the good thing is that the ?: has an invariant condition and
>>> thus vectorization
>>> can hoist the mask generation out of the vectorized loop which means
>>> it boils down to
>>> cheap operations.  My gut feeling is that just looking at the number
>>> of memory references
>>> isn't a good indicator of profitability as the regular stmt workload
>>> has a big impact on
>>> profitability of vectorization.
>> It's not specific to vectorization.  The generated new code also costs
>> too much in small loops without vectorization.  But yes, # of mem_refs
>> may be too inaccurate, maybe we should check against num_stmts.
>
> Not specific to vectorization but the interchange may pay off only when
> vectorizing a loop.  Would the loop in loop-interchange-5.c be still
> interchanged?  If we remove the multiplication and just keep
> c[i][j] = c[i][j] + b[k][j];
> ?  That is, why is the constant init so special?  Even for non-constant init
> we're changing two outer loop DRs to two non-consecutive inner loop DRs.
Hi Richard,
This is updated patch taking stmt cost into consideration.

Firstly stmt cost (from # of stmt)
of loops are recorded.  Then stmt cost of outer loop is adjusted by decreasing
number of IVs, increasing by the number of constant initialized simple
reductions.
Lastly we check stmt cost between inner/outer loops and give up on interchange
if outer loop has too many stmts.

Test gcc.target/aarch64/pr62178.c is fixed with this patch.  Bootstrap and test
on x86_64 andAArch64.  Any comment?

Thanks,
bin
2017-12-12  Bin Cheng  

* gimple-loop-interchange.cc (STMT_COST_RATIO): New macro.
(loop_cand::m_num_stmts, loop_cand::m_const_init_reduc): New members.
(loop_cand::loop_cand): Initialize above members.
(loop_cand::supported_operations): Delete.
(loop_cand::can_interchange_p): Inline above function.
(loop_cand::classify_simple_reduction): Record number of constant
initialized simple reductions.
(should_interchange_loops): New parameters.  Check stmt cost of loops
to be interchange.
(tree_loop_interchange::interchange): Prepare stmt cost of outer loop.
Update call to should_interchange_loops.
(should_interchange_loop_nest): Update call to
should_int

[testsuite] Fix g++.old-deja/g++.pt/const2.C on Solaris

2017-12-12 Thread Rainer Orth
Between 20171112 (r254663) and 20171113 (r254700) a testcase regressed
on Solaris (32 and 64-bit, sparc and x86):

+FAIL: g++.old-deja/g++.pt/const2.C  -std=c++11 (test for excess errors)
+FAIL: g++.old-deja/g++.pt/const2.C  -std=c++14 (test for excess errors)
+FAIL: g++.old-deja/g++.pt/const2.C  -std=c++98 (test for excess errors)

We now have

Excess errors:
Undefined   first referenced
A::i   /var/tmp//ccgwU71c.o

compared to

output is:
Undefined   first referenced
 symbol in file
A::i   /var/tmp//ccgwU71c.o
ld: fatal: symbol referencing errors
collect2: error: ld returned 1 exit status

before.  The failure was caused by

[Diagnostic Patch] don't print column zero
https://gcc.gnu.org/ml/gcc-patches/2017-10/msg01911.html
[...]
* lib/gcc-dg.exp (process-message): Use -: for no column.

With runtest -v -v -v, I now find

process-message:
{{} WARNING { [^
]*i} {}}

while before it had been

process-message:
{{} WARNING {[^
]*i} {}}

Note the additional space after "WARNING {" now.

This space is printed unconditionally here

set expmsg "$column $msgprefix\[^\n\]*$expmsg"

which inserts space after $column even if $column is empty

The following patch fixes it by emitting the space only as needed.

Bootstrapped without regressions on i386-pc-solaris2.11,
sparc-sun-solaris2.11, and x86_64-pc-linux-gnu, installed on mainline.

Rainer

-- 
-
Rainer Orth, Center for Biotechnology, Bielefeld University


2017-11-14  Rainer Orth  

* lib/gcc-dg.exp (process-message): Avoid additional whitespace in
$expmsg.

# HG changeset patch
# Parent  6257289977f79d04fd93d77940bc71433e2aa7aa
Fix g++.old-deja/g++.pt/const2.C on Solaris

diff --git a/gcc/testsuite/lib/gcc-dg.exp b/gcc/testsuite/lib/gcc-dg.exp
--- a/gcc/testsuite/lib/gcc-dg.exp
+++ b/gcc/testsuite/lib/gcc-dg.exp
@@ -1103,14 +1103,15 @@ proc process-message { msgproc msgprefix
 	# Remove it from the original expression and move it
 	# to the proper place in the search expression.
 	set expmsg [string range $expmsg [string length $column] end]
+	set column "$column "
 } elseif [string match "" [lindex $newentry 0]] {
 	# The specified line number is 0; don't expect a column number.
 } else {
 	# There is no column number in the search expression, but we
 	# should expect one in the message itself.
-	set column {[0-9]+:}
+	set column {[0-9]+: }
 }
-set expmsg "$column $msgprefix\[^\n\]*$expmsg"
+set expmsg "$column$msgprefix\[^\n\]*$expmsg"
 set newentry [lreplace $newentry 2 2 $expmsg]
 
 set dg-messages [lreplace ${dg-messages} end end $newentry]


[SFN] Bootstrap broken

2017-12-12 Thread David Edelsohn
Something in this series broke bootstrap on AIX, probably Power in general.

- David

/nasfarm/edelsohn/src/src/gcc/gimple-pretty-print.c: In function 'void
pp_gimple_stmt_1(pretty_printer*, gimple*, int, dump_flags_t)':
/nasfarm/edelsohn/src/src/gcc/gimple-pretty-print.c:2645:1: error:
invalid rtl sharing found in the insn
 }
 ^
(debug_insn 5082 5081 5083 (var_location:DI flags (concatn/v:DI [
(reg:SI 1564 [ flags ])
(reg:SI 1565 [ flags+4 ])
])) "/nasfarm/edelsohn/src/src/gcc/gimple-pretty-print.c":2583 -1
 (nil))
/nasfarm/edelsohn/src/src/gcc/gimple-pretty-print.c:2645:1: error: shared rtx
(concatn/v:DI [
(reg:SI 1564 [ flags ])
(reg:SI 1565 [ flags+4 ])
])
during RTL pass: outof_cfglayout
/nasfarm/edelsohn/src/src/gcc/gimple-pretty-print.c:2645:1: internal
compiler error: internal consistency failure


Re: [PATCH, rs6000] Fix PR83332 (missing vcond patterns)

2017-12-12 Thread Bill Schmidt

> On Dec 12, 2017, at 2:02 AM, Richard Biener  
> wrote:
> 
> On Mon, Dec 11, 2017 at 10:55 PM, Bill Schmidt
>  wrote:
>> Hi,
>> 
>> A new test case introduced for PR81303 failed on powerpc64 (BE, LE).  This
>> turns out to be due to a missing standard pattern (vcondv2div2df).  This
>> and a couple of other patterns are easy to support with existing logic
>> by just adding new patterns with appropriate modes.  That's all this patch
>> does.  That's sufficient to cause the failing test to pass.
>> 
>> Bootstrapped and tested on powerpc64le-linux-gnu with no regressions.  Is
>> this okay for trunk?
>> 
>> Thanks,
>> Bill
>> 
>> 
>> 2017-12-11  Bill Schmidt  
>> 
>>PR target/83332
>>* config/rs6000/vector.md (vcondv2dfv2di): New define_expand.
>>(vcondv2div2df): Likewise.
>>(vconduv2dfv2di): Likewise.
>> 
>> Index: gcc/config/rs6000/vector.md
>> ===
>> --- gcc/config/rs6000/vector.md (revision 255539)
>> +++ gcc/config/rs6000/vector.md (working copy)
>> @@ -455,6 +455,44 @@
>> FAIL;
>> }")
>> 
>> +(define_expand "vcondv2dfv2di"
> 
> given you already have vcondv4siv4sf this asks for macroization, no?  I see 
> you
> have vcond already, you can use
> 
> vcond
> 
> and guard with GET_MODE_NUNITS () == GET_MODE_NUNINTS ()?

Yes, that can be done -- although the guard conditions start to get kind of ugly
now, since the v4siv4sf patterns require VECTOR_UNIT_ALTIVEC_P (V4SImode)
while the v2div2df ones require VECTOR_UNIT_ALTIVEC_OR_VSX_P (V2DImode).
What I have may be more straightforward to understand.  I can implement
whichever is considered preferable.

Thanks,
Bill

> 
>> +  [(set (match_operand:V2DF 0 "vfloat_operand" "")
>> +   (if_then_else:V2DF
>> +(match_operator 3 "comparison_operator"
>> +[(match_operand:V2DI 4 "vint_operand" "")
>> + (match_operand:V2DI 5 "vint_operand" "")])
>> +(match_operand:V2DF 1 "vfloat_operand" "")
>> +(match_operand:V2DF 2 "vfloat_operand" "")))]
>> +  "VECTOR_UNIT_ALTIVEC_OR_VSX_P (V2DFmode)
>> +   && VECTOR_UNIT_ALTIVEC_OR_VSX_P (V2DImode)"
>> +  "
>> +{
>> +  if (rs6000_emit_vector_cond_expr (operands[0], operands[1], operands[2],
>> +   operands[3], operands[4], operands[5]))
>> +DONE;
>> +  else
>> +FAIL;
>> +}")
>> +
>> +(define_expand "vcondv2div2df"
>> +  [(set (match_operand:V2DI 0 "vint_operand" "")
>> +   (if_then_else:V2DI
>> +(match_operator 3 "comparison_operator"
>> +[(match_operand:V2DF 4 "vfloat_operand" "")
>> + (match_operand:V2DF 5 "vfloat_operand" "")])
>> +(match_operand:V2DI 1 "vint_operand" "")
>> +(match_operand:V2DI 2 "vint_operand" "")))]
>> +  "VECTOR_UNIT_ALTIVEC_OR_VSX_P (V2DFmode)
>> +   && VECTOR_UNIT_ALTIVEC_OR_VSX_P (V2DImode)"
>> +  "
>> +{
>> +  if (rs6000_emit_vector_cond_expr (operands[0], operands[1], operands[2],
>> +   operands[3], operands[4], operands[5]))
>> +DONE;
>> +  else
>> +FAIL;
>> +}")
>> +
>> (define_expand "vcondu"
>>   [(set (match_operand:VEC_I 0 "vint_operand")
>>(if_then_else:VEC_I
>> @@ -492,6 +530,25 @@
>> FAIL;
>> }")
>> 
>> +(define_expand "vconduv2dfv2di"
>> +  [(set (match_operand:V2DF 0 "vfloat_operand" "")
>> +   (if_then_else:V2DF
>> +(match_operator 3 "comparison_operator"
>> +[(match_operand:V2DI 4 "vint_operand" "")
>> + (match_operand:V2DI 5 "vint_operand" "")])
>> +(match_operand:V2DF 1 "vfloat_operand" "")
>> +(match_operand:V2DF 2 "vfloat_operand" "")))]
>> +  "VECTOR_UNIT_ALTIVEC_OR_VSX_P (V2DFmode)
>> +   && VECTOR_UNIT_ALTIVEC_OR_VSX_P (V2DImode)"
>> +  "
>> +{
>> +  if (rs6000_emit_vector_cond_expr (operands[0], operands[1], operands[2],
>> +   operands[3], operands[4], operands[5]))
>> +DONE;
>> +  else
>> +FAIL;
>> +}")
>> +
>> (define_expand "vector_eq"
>>   [(set (match_operand:VEC_C 0 "vlogical_operand" "")
>>(eq:VEC_C (match_operand:VEC_C 1 "vlogical_operand" "")



Re: [PATCH, rs6000] Fix PR83332 (missing vcond patterns)

2017-12-12 Thread Richard Biener
On Tue, Dec 12, 2017 at 2:46 PM, Bill Schmidt
 wrote:
>
>> On Dec 12, 2017, at 2:02 AM, Richard Biener  
>> wrote:
>>
>> On Mon, Dec 11, 2017 at 10:55 PM, Bill Schmidt
>>  wrote:
>>> Hi,
>>>
>>> A new test case introduced for PR81303 failed on powerpc64 (BE, LE).  This
>>> turns out to be due to a missing standard pattern (vcondv2div2df).  This
>>> and a couple of other patterns are easy to support with existing logic
>>> by just adding new patterns with appropriate modes.  That's all this patch
>>> does.  That's sufficient to cause the failing test to pass.
>>>
>>> Bootstrapped and tested on powerpc64le-linux-gnu with no regressions.  Is
>>> this okay for trunk?
>>>
>>> Thanks,
>>> Bill
>>>
>>>
>>> 2017-12-11  Bill Schmidt  
>>>
>>>PR target/83332
>>>* config/rs6000/vector.md (vcondv2dfv2di): New define_expand.
>>>(vcondv2div2df): Likewise.
>>>(vconduv2dfv2di): Likewise.
>>>
>>> Index: gcc/config/rs6000/vector.md
>>> ===
>>> --- gcc/config/rs6000/vector.md (revision 255539)
>>> +++ gcc/config/rs6000/vector.md (working copy)
>>> @@ -455,6 +455,44 @@
>>> FAIL;
>>> }")
>>>
>>> +(define_expand "vcondv2dfv2di"
>>
>> given you already have vcondv4siv4sf this asks for macroization, no?  I see 
>> you
>> have vcond already, you can use
>>
>> vcond
>>
>> and guard with GET_MODE_NUNITS () == GET_MODE_NUNINTS ()?
>
> Yes, that can be done -- although the guard conditions start to get kind of 
> ugly
> now, since the v4siv4sf patterns require VECTOR_UNIT_ALTIVEC_P (V4SImode)
> while the v2div2df ones require VECTOR_UNIT_ALTIVEC_OR_VSX_P (V2DImode).
> What I have may be more straightforward to understand.  I can implement
> whichever is considered preferable.

Up to the target maintainers of course.

Richard.

> Thanks,
> Bill
>
>>
>>> +  [(set (match_operand:V2DF 0 "vfloat_operand" "")
>>> +   (if_then_else:V2DF
>>> +(match_operator 3 "comparison_operator"
>>> +[(match_operand:V2DI 4 "vint_operand" "")
>>> + (match_operand:V2DI 5 "vint_operand" "")])
>>> +(match_operand:V2DF 1 "vfloat_operand" "")
>>> +(match_operand:V2DF 2 "vfloat_operand" "")))]
>>> +  "VECTOR_UNIT_ALTIVEC_OR_VSX_P (V2DFmode)
>>> +   && VECTOR_UNIT_ALTIVEC_OR_VSX_P (V2DImode)"
>>> +  "
>>> +{
>>> +  if (rs6000_emit_vector_cond_expr (operands[0], operands[1], operands[2],
>>> +   operands[3], operands[4], operands[5]))
>>> +DONE;
>>> +  else
>>> +FAIL;
>>> +}")
>>> +
>>> +(define_expand "vcondv2div2df"
>>> +  [(set (match_operand:V2DI 0 "vint_operand" "")
>>> +   (if_then_else:V2DI
>>> +(match_operator 3 "comparison_operator"
>>> +[(match_operand:V2DF 4 "vfloat_operand" "")
>>> + (match_operand:V2DF 5 "vfloat_operand" "")])
>>> +(match_operand:V2DI 1 "vint_operand" "")
>>> +(match_operand:V2DI 2 "vint_operand" "")))]
>>> +  "VECTOR_UNIT_ALTIVEC_OR_VSX_P (V2DFmode)
>>> +   && VECTOR_UNIT_ALTIVEC_OR_VSX_P (V2DImode)"
>>> +  "
>>> +{
>>> +  if (rs6000_emit_vector_cond_expr (operands[0], operands[1], operands[2],
>>> +   operands[3], operands[4], operands[5]))
>>> +DONE;
>>> +  else
>>> +FAIL;
>>> +}")
>>> +
>>> (define_expand "vcondu"
>>>   [(set (match_operand:VEC_I 0 "vint_operand")
>>>(if_then_else:VEC_I
>>> @@ -492,6 +530,25 @@
>>> FAIL;
>>> }")
>>>
>>> +(define_expand "vconduv2dfv2di"
>>> +  [(set (match_operand:V2DF 0 "vfloat_operand" "")
>>> +   (if_then_else:V2DF
>>> +(match_operator 3 "comparison_operator"
>>> +[(match_operand:V2DI 4 "vint_operand" "")
>>> + (match_operand:V2DI 5 "vint_operand" "")])
>>> +(match_operand:V2DF 1 "vfloat_operand" "")
>>> +(match_operand:V2DF 2 "vfloat_operand" "")))]
>>> +  "VECTOR_UNIT_ALTIVEC_OR_VSX_P (V2DFmode)
>>> +   && VECTOR_UNIT_ALTIVEC_OR_VSX_P (V2DImode)"
>>> +  "
>>> +{
>>> +  if (rs6000_emit_vector_cond_expr (operands[0], operands[1], operands[2],
>>> +   operands[3], operands[4], operands[5]))
>>> +DONE;
>>> +  else
>>> +FAIL;
>>> +}")
>>> +
>>> (define_expand "vector_eq"
>>>   [(set (match_operand:VEC_C 0 "vlogical_operand" "")
>>>(eq:VEC_C (match_operand:VEC_C 1 "vlogical_operand" "")
>


Re: [PATCH v7 4/4] Add gdb for or1k build

2017-12-12 Thread Stafford Horne
Hello,

I brought this up on the #gcc chat room.  I was asked to check with Jakub and
Richi.

This is a small patch to configure/configure.ac to enable gdb to build for the
or1k target.  I am working on upstreaming the gdb port and have everything OKed
and all copyrights in place for the binutils/gdb/sim projects.

Since binutils copies the configure/configure.ac from the gcc project I would
like to get this patch incorporated into gcc before committing the
binutils/gdb/sim patches.

Please let me know if there are any questions/concerns.

-Stafford

On Sat, Dec 09, 2017 at 06:28:28AM +0900, Stafford Horne wrote:
> Hello,
> 
> This patch was sent in May, now everything is in place to commit the
> OpenRISC gdb port upstream except for the OK from GCC on this patch.
> 
> Are there any concerns?  Who will commit this to GCC?
> 
> On Mon, May 29, 2017 at 11:48 PM, Stafford Horne  wrote:
> > * ChangeLog:
> >
> > 2017-02-12  Stafford Horne  
> >
> > * configure.ac: Remove logic adding gdb to noconfigsdirs for or1k.
> > * configure: Regenerate.
> >
> > Cc: gcc-patches@gcc.gnu.org
> > ---
> >  configure| 7 ---
> >  configure.ac | 7 ---
> >  2 files changed, 14 deletions(-)
> >
> > diff --git a/configure b/configure
> > index be9dd89..0bf47fa 100755
> > --- a/configure
> > +++ b/configure
> > @@ -3632,10 +3632,6 @@ case "${target}" in
> >  ;;
> >*-*-rtems*)
> >  noconfigdirs="$noconfigdirs target-libgloss"
> > -# this is not caught below because this stanza matches earlier
> > -case $target in
> > -  or1k*-*-*) noconfigdirs="$noconfigdirs gdb" ;;
> > -esac
> >  ;;
> >  # The tpf target doesn't support gdb yet.
> >*-*-tpf*)
> > @@ -3841,9 +3837,6 @@ case "${target}" in
> >nvptx*-*-*)
> >  noconfigdirs="$noconfigdirs target-libssp target-libstdc++-v3 
> > target-libobjc"
> >  ;;
> > -  or1k*-*-*)
> > -noconfigdirs="$noconfigdirs gdb"
> > -;;
> >sh-*-*)
> >  case "${target}" in
> >sh*-*-elf)
> > diff --git a/configure.ac b/configure.ac
> > index 532c5c2..9d16792 100644
> > --- a/configure.ac
> > +++ b/configure.ac
> > @@ -966,10 +966,6 @@ case "${target}" in
> >  ;;
> >*-*-rtems*)
> >  noconfigdirs="$noconfigdirs target-libgloss"
> > -# this is not caught below because this stanza matches earlier
> > -case $target in
> > -  or1k*-*-*) noconfigdirs="$noconfigdirs gdb" ;;
> > -esac
> >  ;;
> >  # The tpf target doesn't support gdb yet.
> >*-*-tpf*)
> > @@ -1175,9 +1171,6 @@ case "${target}" in
> >nvptx*-*-*)
> >  noconfigdirs="$noconfigdirs target-libssp target-libstdc++-v3 
> > target-libobjc"
> >  ;;
> > -  or1k*-*-*)
> > -noconfigdirs="$noconfigdirs gdb"
> > -;;
> >sh-*-*)
> >  case "${target}" in
> >sh*-*-elf)
> > --
> > 2.9.4
> >


Re: [PATCH v7 4/4] Add gdb for or1k build

2017-12-12 Thread Richard Biener
On Tue, 12 Dec 2017, Stafford Horne wrote:

> Hello,
> 
> I brought this up on the #gcc chat room.  I was asked to check with Jakub and
> Richi.
> 
> This is a small patch to configure/configure.ac to enable gdb to build for the
> or1k target.  I am working on upstreaming the gdb port and have everything 
> OKed
> and all copyrights in place for the binutils/gdb/sim projects.
> 
> Since binutils copies the configure/configure.ac from the gcc project I would
> like to get this patch incorporated into gcc before committing the
> binutils/gdb/sim patches.
> 
> Please let me know if there are any questions/concerns.

Ok.

Thanks,
Richard.

> -Stafford
> 
> On Sat, Dec 09, 2017 at 06:28:28AM +0900, Stafford Horne wrote:
> > Hello,
> > 
> > This patch was sent in May, now everything is in place to commit the
> > OpenRISC gdb port upstream except for the OK from GCC on this patch.
> > 
> > Are there any concerns?  Who will commit this to GCC?
> > 
> > On Mon, May 29, 2017 at 11:48 PM, Stafford Horne  wrote:
> > > * ChangeLog:
> > >
> > > 2017-02-12  Stafford Horne  
> > >
> > > * configure.ac: Remove logic adding gdb to noconfigsdirs for or1k.
> > > * configure: Regenerate.
> > >
> > > Cc: gcc-patches@gcc.gnu.org
> > > ---
> > >  configure| 7 ---
> > >  configure.ac | 7 ---
> > >  2 files changed, 14 deletions(-)
> > >
> > > diff --git a/configure b/configure
> > > index be9dd89..0bf47fa 100755
> > > --- a/configure
> > > +++ b/configure
> > > @@ -3632,10 +3632,6 @@ case "${target}" in
> > >  ;;
> > >*-*-rtems*)
> > >  noconfigdirs="$noconfigdirs target-libgloss"
> > > -# this is not caught below because this stanza matches earlier
> > > -case $target in
> > > -  or1k*-*-*) noconfigdirs="$noconfigdirs gdb" ;;
> > > -esac
> > >  ;;
> > >  # The tpf target doesn't support gdb yet.
> > >*-*-tpf*)
> > > @@ -3841,9 +3837,6 @@ case "${target}" in
> > >nvptx*-*-*)
> > >  noconfigdirs="$noconfigdirs target-libssp target-libstdc++-v3 
> > > target-libobjc"
> > >  ;;
> > > -  or1k*-*-*)
> > > -noconfigdirs="$noconfigdirs gdb"
> > > -;;
> > >sh-*-*)
> > >  case "${target}" in
> > >sh*-*-elf)
> > > diff --git a/configure.ac b/configure.ac
> > > index 532c5c2..9d16792 100644
> > > --- a/configure.ac
> > > +++ b/configure.ac
> > > @@ -966,10 +966,6 @@ case "${target}" in
> > >  ;;
> > >*-*-rtems*)
> > >  noconfigdirs="$noconfigdirs target-libgloss"
> > > -# this is not caught below because this stanza matches earlier
> > > -case $target in
> > > -  or1k*-*-*) noconfigdirs="$noconfigdirs gdb" ;;
> > > -esac
> > >  ;;
> > >  # The tpf target doesn't support gdb yet.
> > >*-*-tpf*)
> > > @@ -1175,9 +1171,6 @@ case "${target}" in
> > >nvptx*-*-*)
> > >  noconfigdirs="$noconfigdirs target-libssp target-libstdc++-v3 
> > > target-libobjc"
> > >  ;;
> > > -  or1k*-*-*)
> > > -noconfigdirs="$noconfigdirs gdb"
> > > -;;
> > >sh-*-*)
> > >  case "${target}" in
> > >sh*-*-elf)
> > > --
> > > 2.9.4
> > >
> 
> 

-- 
Richard Biener 
SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Graham Norton, HRB 
21284 (AG Nuernberg)


Re: [00/13] Make VEC_PERM_EXPR work for variable-length vectors

2017-12-12 Thread Richard Biener
On Sun, Dec 10, 2017 at 12:06 AM, Richard Sandiford
 wrote:
> This series is a replacement for:
> https://gcc.gnu.org/ml/gcc-patches/2017-11/msg00747.html
> based on the feedback that using VEC_PERM_EXPR would be better.
>
> The changes are:
>
> (1) Remove the restriction that the selector elements have to have the
> same width as the data elements, but only for constant selectors.
> This lets through the cases we need without also allowing
> potentially-expensive ops.  Adding support for the variable
> case can be done later if it seems useful, but it's not trivial.
>
> (2) Encode the integer form of constant selectors (vec_perm_indices)
> in the same way as the new VECTOR_CST encoding, so that it can
> cope with variable-length vectors.
>
> (3) Remove the vec_perm_const optab and reuse the target hook to emit
> code.  This avoids the need to create a CONST_VECTOR for the wide
> selectors, and hence the need to have a corresponding wide vector
> mode (which the target wouldn't otherwise need or support).

Hmm.  Makes sense I suppose.

> (4) When handling the variable vec_perm optab, check that modes can store
> all element indices before using them.
>
> (5) Unconditionally use ssizetype selector elements in permutes created
> by the vectoriser.

Why specifically _signed_ sizetype?  That sounds like an odd choice.  But I'll
eventually see when looking at the patch.  Does that mean we have a
VNDImode vector unconditionally for the permute even though a vector
matching the width of the data members would work?  What happens if the
target doesn't have vec_perm_const but vec_perm to handle all constant
permutes?

Going to look over the patches now.

Thanks for (re-)doing the work.

Richard.

> (6) Make the AArch64 vec_perm_const handling handle variable-length vectors.
>
> Tested directly on trunk on aarch64-linux-gnu, x86_64-linux-gnu and
> powerpc64le-linux-gnu.  Also tested by comparing the before and after
> assembly output for:
>
>arm-linux-gnueabi arm-linux-gnueabihf aarch64-linux-gnu
>aarch64_be-linux-gnu ia64-linux-gnu i686-pc-linux-gnu
>mipsisa64-linux-gnu mipsel-linux-gnu powerpc64-linux-gnu
>powerpc64le-linux-gnu powerpc-eabispe x86_64-linux-gnu
>sparc64-linux-gnu
>
> at -O3, which should cover all the ports that defined vec_perm_const.
> The only difference was one instance of different RA for ia64-linux-gnu,
> caused by using force_reg on a SUBREG that was previously used directly.
>
> OK to install?
>
> Thanks,
> Richard


Re: [02/13] Pass vec_perm_indices by reference

2017-12-12 Thread Richard Biener
On Sun, Dec 10, 2017 at 12:09 AM, Richard Sandiford
 wrote:
> This patch makes functions take vec_perm_indices by reference rather
> than value, since a later patch will turn vec_perm_indices into a class
> that would be more expensive to copy.

Ok.

>
> 2017-12-06  Richard Sandiford  
>
> gcc/
> * fold-const.c (fold_vec_perm): Take a const vec_perm_indices &
> instead of vec_perm_indices.
> * tree-vectorizer.h (vect_gen_perm_mask_any): Likewise,
> (vect_gen_perm_mask_checked): Likewise,
> * tree-vect-stmts.c (vect_gen_perm_mask_any): Likewise,
> (vect_gen_perm_mask_checked): Likewise,
>
> Index: gcc/fold-const.c
> ===
> --- gcc/fold-const.c2017-12-09 22:47:11.840391388 +
> +++ gcc/fold-const.c2017-12-09 22:47:19.119312754 +
> @@ -8801,7 +8801,7 @@ vec_cst_ctor_to_array (tree arg, unsigne
> NULL_TREE otherwise.  */
>
>  static tree
> -fold_vec_perm (tree type, tree arg0, tree arg1, vec_perm_indices sel)
> +fold_vec_perm (tree type, tree arg0, tree arg1, const vec_perm_indices &sel)
>  {
>unsigned int i;
>bool need_ctor = false;
> Index: gcc/tree-vectorizer.h
> ===
> --- gcc/tree-vectorizer.h   2017-12-09 22:47:11.840391388 +
> +++ gcc/tree-vectorizer.h   2017-12-09 22:47:19.120312754 +
> @@ -1204,8 +1204,8 @@ extern void vect_get_load_cost (struct d
>  extern void vect_get_store_cost (struct data_reference *, int,
>  unsigned int *, stmt_vector_for_cost *);
>  extern bool vect_supportable_shift (enum tree_code, tree);
> -extern tree vect_gen_perm_mask_any (tree, vec_perm_indices);
> -extern tree vect_gen_perm_mask_checked (tree, vec_perm_indices);
> +extern tree vect_gen_perm_mask_any (tree, const vec_perm_indices &);
> +extern tree vect_gen_perm_mask_checked (tree, const vec_perm_indices &);
>  extern void optimize_mask_stores (struct loop*);
>
>  /* In tree-vect-data-refs.c.  */
> Index: gcc/tree-vect-stmts.c
> ===
> --- gcc/tree-vect-stmts.c   2017-12-09 22:47:11.840391388 +
> +++ gcc/tree-vect-stmts.c   2017-12-09 22:47:19.119312754 +
> @@ -6506,7 +6506,7 @@ vectorizable_store (gimple *stmt, gimple
> vect_gen_perm_mask_checked.  */
>
>  tree
> -vect_gen_perm_mask_any (tree vectype, vec_perm_indices sel)
> +vect_gen_perm_mask_any (tree vectype, const vec_perm_indices &sel)
>  {
>tree mask_elt_type, mask_type;
>
> @@ -6527,7 +6527,7 @@ vect_gen_perm_mask_any (tree vectype, ve
> i.e. that the target supports the pattern _for arbitrary input vectors_.  
> */
>
>  tree
> -vect_gen_perm_mask_checked (tree vectype, vec_perm_indices sel)
> +vect_gen_perm_mask_checked (tree vectype, const vec_perm_indices &sel)
>  {
>gcc_assert (can_vec_perm_p (TYPE_MODE (vectype), false, &sel));
>return vect_gen_perm_mask_any (vectype, sel);


Re: [03/13] Split can_vec_perm_p into can_vec_perm_{var,const}_p

2017-12-12 Thread Richard Biener
On Sun, Dec 10, 2017 at 12:10 AM, Richard Sandiford
 wrote:
> This patch splits can_vec_perm_p into two functions: can_vec_perm_var_p
> for testing permute operations with variable selection vectors, and
> can_vec_perm_const_p for testing permute operations with specific
> constant selection vectors.  This means that we can pass the constant
> selection vector by reference.
>
> Constant permutes can still use a variable permute as a fallback.
> A later patch adds a check to make sure that we don't truncate the
> vector indices when doing this.
>
> However, have_whole_vector_shift checked:
>
>   if (direct_optab_handler (vec_perm_const_optab, mode) == CODE_FOR_nothing)
> return false;
>
> which had the effect of disallowing the fallback to variable permutes.
> I'm not sure whether that was the intention or whether it was just
> supposed to short-cut the loop on targets that don't support permutes.
> (But then why bother?  The first check in the loop would fail and
> we'd bail out straightaway.)
>
> The patch adds a parameter for disallowing the fallback.  I think it
> makes sense to do this for the following code in the VEC_PERM_EXPR
> folder:
>
>   /* Some targets are deficient and fail to expand a single
>  argument permutation while still allowing an equivalent
>  2-argument version.  */
>   if (need_mask_canon && arg2 == op2
>   && !can_vec_perm_p (TYPE_MODE (type), false, &sel)
>   && can_vec_perm_p (TYPE_MODE (type), false, &sel2))
>
> since it's really testing whether the expand_vec_perm_const code expects
> a particular form.

Ok.

>
> 2017-12-09  Richard Sandiford  
>
> gcc/
> * optabs-query.h (can_vec_perm_p): Delete.
> (can_vec_perm_var_p, can_vec_perm_const_p): Declare.
> * optabs-query.c (can_vec_perm_p): Split into...
> (can_vec_perm_var_p, can_vec_perm_const_p): ...these two functions.
> (can_mult_highpart_p): Use can_vec_perm_const_p to test whether a
> particular selector is valid.
> * tree-ssa-forwprop.c (simplify_vector_constructor): Likewise.
> * tree-vect-data-refs.c (vect_grouped_store_supported): Likewise.
> (vect_grouped_load_supported): Likewise.
> (vect_shift_permute_load_chain): Likewise.
> * tree-vect-slp.c (vect_build_slp_tree_1): Likewise.
> (vect_transform_slp_perm_load): Likewise.
> * tree-vect-stmts.c (perm_mask_for_reverse): Likewise.
> (vectorizable_bswap): Likewise.
> (vect_gen_perm_mask_checked): Likewise.
> * fold-const.c (fold_ternary_loc): Likewise.  Don't take
> implementations of variable permutation vectors into account
> when deciding which selector to use.
> * tree-vect-loop.c (have_whole_vector_shift): Don't check whether
> vec_perm_const_optab is supported; instead use can_vec_perm_const_p
> with a false third argument.
> * tree-vect-generic.c (lower_vec_perm): Use can_vec_perm_const_p
> to test whether the constant selector is valid and can_vec_perm_var_p
> to test whether a variable selector is valid.
>
> Index: gcc/optabs-query.h
> ===
> --- gcc/optabs-query.h  2017-12-09 22:47:14.730310076 +
> +++ gcc/optabs-query.h  2017-12-09 22:47:21.534314227 +
> @@ -175,7 +175,9 @@ enum insn_code can_float_p (machine_mode
>  enum insn_code can_fix_p (machine_mode, machine_mode, int, bool *);
>  bool can_conditionally_move_p (machine_mode mode);
>  opt_machine_mode qimode_for_vec_perm (machine_mode);
> -bool can_vec_perm_p (machine_mode, bool, vec_perm_indices *);
> +bool can_vec_perm_var_p (machine_mode);
> +bool can_vec_perm_const_p (machine_mode, const vec_perm_indices &,
> +  bool = true);
>  /* Find a widening optab even if it doesn't widen as much as we want.  */
>  #define find_widening_optab_handler(A, B, C) \
>find_widening_optab_handler_and_mode (A, B, C, NULL)
> Index: gcc/optabs-query.c
> ===
> --- gcc/optabs-query.c  2017-12-09 22:47:14.729310075 +
> +++ gcc/optabs-query.c  2017-12-09 22:47:21.534314227 +
> @@ -361,58 +361,75 @@ qimode_for_vec_perm (machine_mode mode)
>return opt_machine_mode ();
>  }
>
> -/* Return true if VEC_PERM_EXPR of arbitrary input vectors can be
> -   expanded using SIMD extensions of the CPU.  SEL may be NULL, which
> -   stands for an unknown constant.  Note that additional permutations
> -   representing whole-vector shifts may also be handled via the vec_shr
> -   optab, but only where the second input vector is entirely constant
> -   zeroes; this case is not dealt with here.  */
> +/* Return true if VEC_PERM_EXPRs with variable selector operands can be
> +   expanded using SIMD extensions of the CPU.  MODE is the mode of the
> +   vectors being permuted.  */
>
>  bool
> -can_vec_perm_p (machine_mod

Re: [PATCH v7 4/4] Add gdb for or1k build

2017-12-12 Thread Stafford Horne
On Tue, Dec 12, 2017 at 03:05:41PM +0100, Richard Biener wrote:
> On Tue, 12 Dec 2017, Stafford Horne wrote:
> 
> > Hello,
> > 
> > I brought this up on the #gcc chat room.  I was asked to check with Jakub 
> > and
> > Richi.
> > 
> > This is a small patch to configure/configure.ac to enable gdb to build for 
> > the
> > or1k target.  I am working on upstreaming the gdb port and have everything 
> > OKed
> > and all copyrights in place for the binutils/gdb/sim projects.
> > 
> > Since binutils copies the configure/configure.ac from the gcc project I 
> > would
> > like to get this patch incorporated into gcc before committing the
> > binutils/gdb/sim patches.
> > 
> > Please let me know if there are any questions/concerns.
> 
> Ok.

Thank you,

As discussed you have committed this to the gcc svn.

-Stafford

> Thanks,
> Richard.
> 
> > -Stafford
> > 
> > On Sat, Dec 09, 2017 at 06:28:28AM +0900, Stafford Horne wrote:
> > > Hello,
> > > 
> > > This patch was sent in May, now everything is in place to commit the
> > > OpenRISC gdb port upstream except for the OK from GCC on this patch.
> > > 
> > > Are there any concerns?  Who will commit this to GCC?
> > > 
> > > On Mon, May 29, 2017 at 11:48 PM, Stafford Horne  wrote:
> > > > * ChangeLog:
> > > >
> > > > 2017-02-12  Stafford Horne  
> > > >
> > > > * configure.ac: Remove logic adding gdb to noconfigsdirs for 
> > > > or1k.
> > > > * configure: Regenerate.
> > > >
> > > > Cc: gcc-patches@gcc.gnu.org
> > > > ---
> > > >  configure| 7 ---
> > > >  configure.ac | 7 ---
> > > >  2 files changed, 14 deletions(-)
> > > >
> > > > diff --git a/configure b/configure
> > > > index be9dd89..0bf47fa 100755
> > > > --- a/configure
> > > > +++ b/configure
> > > > @@ -3632,10 +3632,6 @@ case "${target}" in
> > > >  ;;
> > > >*-*-rtems*)
> > > >  noconfigdirs="$noconfigdirs target-libgloss"
> > > > -# this is not caught below because this stanza matches earlier
> > > > -case $target in
> > > > -  or1k*-*-*) noconfigdirs="$noconfigdirs gdb" ;;
> > > > -esac
> > > >  ;;
> > > >  # The tpf target doesn't support gdb yet.
> > > >*-*-tpf*)
> > > > @@ -3841,9 +3837,6 @@ case "${target}" in
> > > >nvptx*-*-*)
> > > >  noconfigdirs="$noconfigdirs target-libssp target-libstdc++-v3 
> > > > target-libobjc"
> > > >  ;;
> > > > -  or1k*-*-*)
> > > > -noconfigdirs="$noconfigdirs gdb"
> > > > -;;
> > > >sh-*-*)
> > > >  case "${target}" in
> > > >sh*-*-elf)
> > > > diff --git a/configure.ac b/configure.ac
> > > > index 532c5c2..9d16792 100644
> > > > --- a/configure.ac
> > > > +++ b/configure.ac
> > > > @@ -966,10 +966,6 @@ case "${target}" in
> > > >  ;;
> > > >*-*-rtems*)
> > > >  noconfigdirs="$noconfigdirs target-libgloss"
> > > > -# this is not caught below because this stanza matches earlier
> > > > -case $target in
> > > > -  or1k*-*-*) noconfigdirs="$noconfigdirs gdb" ;;
> > > > -esac
> > > >  ;;
> > > >  # The tpf target doesn't support gdb yet.
> > > >*-*-tpf*)
> > > > @@ -1175,9 +1171,6 @@ case "${target}" in
> > > >nvptx*-*-*)
> > > >  noconfigdirs="$noconfigdirs target-libssp target-libstdc++-v3 
> > > > target-libobjc"
> > > >  ;;
> > > > -  or1k*-*-*)
> > > > -noconfigdirs="$noconfigdirs gdb"
> > > > -;;
> > > >sh-*-*)
> > > >  case "${target}" in
> > > >sh*-*-elf)
> > > > --
> > > > 2.9.4
> > > >
> > 
> > 
> 
> -- 
> Richard Biener 
> SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Graham Norton, HRB 
> 21284 (AG Nuernberg)


Re: [04/13] Refactor expand_vec_perm

2017-12-12 Thread Richard Biener
On Sun, Dec 10, 2017 at 12:13 AM, Richard Sandiford
 wrote:
> This patch splits the variable handling out of expand_vec_perm into
> a subroutine, so that the next patch can use a different interface
> for expanding constant permutes.  expand_vec_perm now does all the
> CONST_VECTOR handling directly and defers to expand_vec_perm_var
> for other rtx codes.  Handling CONST_VECTORs includes handling the
> fallback to variable permutes.
>
> The patch also adds an assert for valid optab modes to expand_vec_perm_1,
> so that we get it when using optabs for CONST_VECTORs.  The MODE_VECTOR_INT
> part was previously in expand_vec_perm and the mode_for_int_vector part
> is new.
>
> Most of the patch is just reindentation, so I've attached a -b version.

Ok.

>
> 2017-12-06  Richard Sandiford  
>
> gcc/
> * optabs.c (expand_vec_perm_1): Assert that SEL has an integer
> vector mode and that that mode matches the mode of the data
> being permuted.
> (expand_vec_perm): Split handling of non-CONST_VECTOR selectors
> out into expand_vec_perm_var.  Do all CONST_VECTOR handling here,
> directly using expand_vec_perm_1 when forcing selectors into
> registers.
> (expand_vec_perm_var): New function, split out from expand_vec_perm.
>
> Index: gcc/optabs.c
> ===
> --- gcc/optabs.c2017-12-09 22:47:14.731310077 +
> +++ gcc/optabs.c2017-12-09 22:47:23.878315657 +
> @@ -5405,6 +5405,8 @@ expand_vec_perm_1 (enum insn_code icode,
>machine_mode smode = GET_MODE (sel);
>struct expand_operand ops[4];
>
> +  gcc_assert (GET_MODE_CLASS (smode) == MODE_VECTOR_INT
> + || mode_for_int_vector (tmode).require () == smode);
>create_output_operand (&ops[0], target, tmode);
>create_input_operand (&ops[3], sel, smode);
>
> @@ -5431,8 +5433,13 @@ expand_vec_perm_1 (enum insn_code icode,
>return NULL_RTX;
>  }
>
> -/* Generate instructions for vec_perm optab given its mode
> -   and three operands.  */
> +static rtx expand_vec_perm_var (machine_mode, rtx, rtx, rtx, rtx);
> +
> +/* Implement a permutation of vectors v0 and v1 using the permutation
> +   vector in SEL and return the result.  Use TARGET to hold the result
> +   if nonnull and convenient.
> +
> +   MODE is the mode of the vectors being permuted (V0 and V1).  */
>
>  rtx
>  expand_vec_perm (machine_mode mode, rtx v0, rtx v1, rtx sel, rtx target)
> @@ -5443,6 +5450,9 @@ expand_vec_perm (machine_mode mode, rtx
>rtx tmp, sel_qi = NULL;
>rtvec vec;
>
> +  if (GET_CODE (sel) != CONST_VECTOR)
> +return expand_vec_perm_var (mode, v0, v1, sel, target);
> +
>if (!target || GET_MODE (target) != mode)
>  target = gen_reg_rtx (mode);
>
> @@ -5455,86 +5465,125 @@ expand_vec_perm (machine_mode mode, rtx
>if (!qimode_for_vec_perm (mode).exists (&qimode))
>  qimode = VOIDmode;
>
> -  /* If the input is a constant, expand it specially.  */
> -  gcc_assert (GET_MODE_CLASS (GET_MODE (sel)) == MODE_VECTOR_INT);
> -  if (GET_CODE (sel) == CONST_VECTOR)
> -{
> -  /* See if this can be handled with a vec_shr.  We only do this if the
> -second vector is all zeroes.  */
> -  enum insn_code shift_code = optab_handler (vec_shr_optab, mode);
> -  enum insn_code shift_code_qi = ((qimode != VOIDmode && qimode != mode)
> - ? optab_handler (vec_shr_optab, qimode)
> - : CODE_FOR_nothing);
> -  rtx shift_amt = NULL_RTX;
> -  if (v1 == CONST0_RTX (GET_MODE (v1))
> - && (shift_code != CODE_FOR_nothing
> - || shift_code_qi != CODE_FOR_nothing))
> +  /* See if this can be handled with a vec_shr.  We only do this if the
> + second vector is all zeroes.  */
> +  insn_code shift_code = optab_handler (vec_shr_optab, mode);
> +  insn_code shift_code_qi = ((qimode != VOIDmode && qimode != mode)
> +? optab_handler (vec_shr_optab, qimode)
> +: CODE_FOR_nothing);
> +
> +  if (v1 == CONST0_RTX (GET_MODE (v1))
> +  && (shift_code != CODE_FOR_nothing
> + || shift_code_qi != CODE_FOR_nothing))
> +{
> +  rtx shift_amt = shift_amt_for_vec_perm_mask (sel);
> +  if (shift_amt)
> {
> - shift_amt = shift_amt_for_vec_perm_mask (sel);
> - if (shift_amt)
> + struct expand_operand ops[3];
> + if (shift_code != CODE_FOR_nothing)
> {
> - struct expand_operand ops[3];
> - if (shift_code != CODE_FOR_nothing)
> -   {
> - create_output_operand (&ops[0], target, mode);
> - create_input_operand (&ops[1], v0, mode);
> - create_convert_operand_from_type (&ops[2], shift_amt,
> -   sizetype);
> - if (maybe_expand_insn (shift_code, 3, ops))
> - 

Re: [05/13] Remove vec_perm_const optab

2017-12-12 Thread Richard Biener
On Sun, Dec 10, 2017 at 12:16 AM, Richard Sandiford
 wrote:
> One of the changes needed for variable-length VEC_PERM_EXPRs -- and for
> long fixed-length VEC_PERM_EXPRs -- is the ability to use constant
> selectors that wouldn't fit in the vectors being permuted.  E.g. a
> permute on two V256QIs can't be done using a V256QI selector.
>
> At the moment constant permutes use two interfaces:
> targetm.vectorizer.vec_perm_const_ok for testing whether a permute is
> valid and the vec_perm_const optab for actually emitting the permute.
> The former gets passed a vec<> selector and the latter an rtx selector.
> Most ports share a lot of code between the hook and the optab, with a
> wrapper function for each interface.
>
> We could try to keep that interface and require ports to define wider
> vector modes that could be attached to the CONST_VECTOR (e.g. V256HI or
> V256SI in the example above).  But building a CONST_VECTOR rtx seems a bit
> pointless here, since the expand code only creates the CONST_VECTOR in
> order to call the optab, and the first thing the target does is take
> the CONST_VECTOR apart again.
>
> The easiest approach therefore seemed to be to remove the optab and
> reuse the target hook to emit the code.  One potential drawback is that
> it's no longer possible to use match_operand predicates to force
> operands into the required form, but in practice all targets want
> register operands anyway.
>
> The patch also changes vec_perm_indices into a class that provides
> some simple routines for handling permutations.  A later patch will
> flesh this out and get rid of auto_vec_perm_indices, but I didn't
> want to do all that in this patch and make it more complicated than
> it already is.
>
>
> 2017-12-09  Richard Sandiford  
>
> gcc/
> * Makefile.in (OBJS): Add vec-perm-indices.o.
> * vec-perm-indices.h: New file.
> * vec-perm-indices.c: Likewise.
> * target.h (vec_perm_indices): Replace with a forward class
> declaration.
> (auto_vec_perm_indices): Move to vec-perm-indices.h.
> * optabs.h: Include vec-perm-indices.h.
> (expand_vec_perm): Delete.
> (selector_fits_mode_p, expand_vec_perm_var): Declare.
> (expand_vec_perm_const): Declare.
> * target.def (vec_perm_const_ok): Replace with...
> (vec_perm_const): ...this new hook.
> * doc/tm.texi.in (TARGET_VECTORIZE_VEC_PERM_CONST_OK): Replace with...
> (TARGET_VECTORIZE_VEC_PERM_CONST): ...this new hook.
> * doc/tm.texi: Regenerate.
> * optabs.def (vec_perm_const): Delete.
> * doc/md.texi (vec_perm_const): Likewise.
> (vec_perm): Refer to TARGET_VECTORIZE_VEC_PERM_CONST.
> * expr.c (expand_expr_real_2): Use expand_vec_perm_const rather than
> expand_vec_perm for constant permutation vectors.  Assert that
> the mode of variable permutation vectors is the integer equivalent
> of the mode that is being permuted.
> * optabs-query.h (selector_fits_mode_p): Declare.
> * optabs-query.c: Include vec-perm-indices.h.
> (can_vec_perm_const_p): Check whether targetm.vectorize.vec_perm_const
> is defined, instead of checking whether the vec_perm_const_optab
> exists.  Use targetm.vectorize.vec_perm_const instead of
> targetm.vectorize.vec_perm_const_ok.  Check whether the indices
> fit in the vector mode before using a variable permute.
> * optabs.c (shift_amt_for_vec_perm_mask): Take a mode and a
> vec_perm_indices instead of an rtx.
> (expand_vec_perm): Replace with...
> (expand_vec_perm_const): ...this new function.  Take the selector
> as a vec_perm_indices rather than an rtx.  Also take the mode of
> the selector.  Update call to shift_amt_for_vec_perm_mask.
> Use targetm.vectorize.vec_perm_const instead of vec_perm_const_optab.
> Use vec_perm_indices::new_expanded_vector to expand the original
> selector into bytes.  Check whether the indices fit in the vector
> mode before using a variable permute.
> (expand_vec_perm_var): Make global.
> (expand_mult_highpart): Use expand_vec_perm_const.
> * fold-const.c: Includes vec-perm-indices.h.
> * tree-ssa-forwprop.c: Likewise.
> * tree-vect-data-refs.c: Likewise.
> * tree-vect-generic.c: Likewise.
> * tree-vect-loop.c: Likewise.
> * tree-vect-slp.c: Likewise.
> * tree-vect-stmts.c: Likewise.
> * config/aarch64/aarch64-protos.h (aarch64_expand_vec_perm_const):
> Delete.
> * config/aarch64/aarch64-simd.md (vec_perm_const): Delete.
> * config/aarch64/aarch64.c (aarch64_expand_vec_perm_const)
> (aarch64_vectorize_vec_perm_const_ok): Fuse into...
> (aarch64_vectorize_vec_perm_const): ...this new function.
> (TARGET_VECTORIZE_VEC_PERM_CONST_OK): Delete.
> (TARGET_VECTORIZE_VEC_PERM

Re: [06/13] Check whether a vector of QIs can store all indices

2017-12-12 Thread Richard Biener
On Sun, Dec 10, 2017 at 12:18 AM, Richard Sandiford
 wrote:
> The patch to remove the vec_perm_const optab checked whether replacing
> a constant permute with a variable permute is safe, or whether it might
> truncate the indices.  This patch adds a corresponding check for whether
> variable permutes can be lowered to QImode-based permutes.

Ok.

>
> 2017-12-09  Richard Sandiford  
>
> gcc/
> * optabs-query.c (can_vec_perm_var_p): Check whether lowering
> to qimode could truncate the indices.
> * optabs.c (expand_vec_perm_var): Likewise.
>
> Index: gcc/optabs-query.c
> ===
> --- gcc/optabs-query.c  2017-12-09 22:47:21.534314227 +
> +++ gcc/optabs-query.c  2017-12-09 22:47:25.861316866 +
> @@ -378,7 +378,8 @@ can_vec_perm_var_p (machine_mode mode)
>
>/* We allow fallback to a QI vector mode, and adjust the mask.  */
>machine_mode qimode;
> -  if (!qimode_for_vec_perm (mode).exists (&qimode))
> +  if (!qimode_for_vec_perm (mode).exists (&qimode)
> +  || GET_MODE_NUNITS (qimode) > GET_MODE_MASK (QImode) + 1)
>  return false;
>
>if (direct_optab_handler (vec_perm_optab, qimode) == CODE_FOR_nothing)
> Index: gcc/optabs.c
> ===
> --- gcc/optabs.c2017-12-09 22:47:23.878315657 +
> +++ gcc/optabs.c2017-12-09 22:47:25.861316866 +
> @@ -5595,7 +5595,8 @@ expand_vec_perm_var (machine_mode mode,
>/* As a special case to aid several targets, lower the element-based
>   permutation to a byte-based permutation and try again.  */
>machine_mode qimode;
> -  if (!qimode_for_vec_perm (mode).exists (&qimode))
> +  if (!qimode_for_vec_perm (mode).exists (&qimode)
> +  || GET_MODE_NUNITS (qimode) > GET_MODE_MASK (QImode) + 1)
>  return NULL_RTX;
>icode = direct_optab_handler (vec_perm_optab, qimode);
>if (icode == CODE_FOR_nothing)


Re: [00/13] Make VEC_PERM_EXPR work for variable-length vectors

2017-12-12 Thread Richard Sandiford
Richard Biener  writes:
> On Sun, Dec 10, 2017 at 12:06 AM, Richard Sandiford
>  wrote:
>> This series is a replacement for:
>> https://gcc.gnu.org/ml/gcc-patches/2017-11/msg00747.html
>> based on the feedback that using VEC_PERM_EXPR would be better.
>>
>> The changes are:
>>
>> (1) Remove the restriction that the selector elements have to have the
>> same width as the data elements, but only for constant selectors.
>> This lets through the cases we need without also allowing
>> potentially-expensive ops.  Adding support for the variable
>> case can be done later if it seems useful, but it's not trivial.
>>
>> (2) Encode the integer form of constant selectors (vec_perm_indices)
>> in the same way as the new VECTOR_CST encoding, so that it can
>> cope with variable-length vectors.
>>
>> (3) Remove the vec_perm_const optab and reuse the target hook to emit
>> code.  This avoids the need to create a CONST_VECTOR for the wide
>> selectors, and hence the need to have a corresponding wide vector
>> mode (which the target wouldn't otherwise need or support).
>
> Hmm.  Makes sense I suppose.
>
>> (4) When handling the variable vec_perm optab, check that modes can store
>> all element indices before using them.
>>
>> (5) Unconditionally use ssizetype selector elements in permutes created
>> by the vectoriser.
>
> Why specifically _signed_ sizetype?  That sounds like an odd choice.  But I'll
> eventually see when looking at the patch.

Sorry, should have said.  The choice doesn't matter for vector lengths
that are a power of 2, but for others, using a signed selector means that
-1 always selects the last input element, whereas for unsigned selectors,
the element selected by -1 would depend on the selector precision.  (And the
use of sizetype precision is pretty arbitrary.)

> Does that mean we have a VNDImode vector unconditionally for the
> permute even though a vector matching the width of the data members
> would work?

A VECTOR_CST of N DIs, yeah.  It only becomes a VNDI at the rtl level
if we're selecting 64-bit data elements.

> What happens if the target doesn't have vec_perm_const but vec_perm to
> handle all constant permutes?

We'll try to represent the selector as VN?I, where ? matches the width
of the data, but only after checking that that doesn't change the
selector values.  So for current targets we'll use vec_perm for constant
permutes as before, but we wouldn't for 2-input V256QI interleaves.

Thanks,
Richard


Re: [07/13] Make vec_perm_indices use new vector encoding

2017-12-12 Thread Richard Biener
On Sun, Dec 10, 2017 at 12:20 AM, Richard Sandiford
 wrote:
> This patch changes vec_perm_indices from a plain vec<> to a class
> that stores a canonicalised permutation, using the same encoding
> as for VECTOR_CSTs.  This means that vec_perm_indices now carries
> information about the number of vectors being permuted (currently
> always 1 or 2) and the number of elements in each input vector.

Before I dive into  the C++ details can you explain why it needs this
info and how it encodes it for variable-length vectors?  To interleave
two vectors you need sth like { 0, N, 1, N+1, ... }, I'm not sure we
can directly encode N here, can we?  extract even/odd should just
work as { 0, 2, 4, 6, ...} without knowledge of whether we permute
one or two vectors (the one vector case just has two times the same
vector) or how many elements each of the vectors (or the result) has.

Richard.

> A new vec_perm_builder class is used to actually build up the vector,
> like tree_vector_builder does for trees.  vec_perm_indices is the
> completed representation, a bit like VECTOR_CST is for trees.
>
> The patch just does a mechanical conversion of the code to
> vec_perm_builder: a later patch uses explicit encodings where possible.
>
> The point of all this is that it makes the representation suitable
> for variable-length vectors.  It's no longer necessary for the
> underlying vec<>s to store every element explicitly.
>
> In int-vector-builder.h, "using the same encoding as tree and rtx constants"
> describes the endpoint -- adding the rtx encoding comes later.
>
>
> 2017-12-09  Richard Sandiford  
>
> gcc/
> * int-vector-builder.h: New file.
> * vec-perm-indices.h: Include int-vector-builder.h.
> (vec_perm_indices): Redefine as an int_vector_builder.
> (auto_vec_perm_indices): Delete.
> (vec_perm_builder): Redefine as a stand-alone class.
> (vec_perm_indices::vec_perm_indices): New function.
> (vec_perm_indices::clamp): Likewise.
> * vec-perm-indices.c: Include fold-const.h and tree-vector-builder.h.
> (vec_perm_indices::new_vector): New function.
> (vec_perm_indices::new_expanded_vector): Update for new
> vec_perm_indices class.
> (vec_perm_indices::rotate_inputs): New function.
> (vec_perm_indices::all_in_range_p): Operate directly on the
> encoded form, without computing elided elements.
> (tree_to_vec_perm_builder): Operate directly on the VECTOR_CST
> encoding.  Update for new vec_perm_indices class.
> * optabs.c (expand_vec_perm_const): Create a vec_perm_indices for
> the given vec_perm_builder.
> (expand_vec_perm_var): Update vec_perm_builder constructor.
> (expand_mult_highpart): Use vec_perm_builder instead of
> auto_vec_perm_indices.
> * optabs-query.c (can_mult_highpart_p): Use vec_perm_builder and
> vec_perm_indices instead of auto_vec_perm_indices.  Use a single
> or double series encoding as appropriate.
> * fold-const.c (fold_ternary_loc): Use vec_perm_builder and
> vec_perm_indices instead of auto_vec_perm_indices.
> * tree-ssa-forwprop.c (simplify_vector_constructor): Likewise.
> * tree-vect-data-refs.c (vect_grouped_store_supported): Likewise.
> (vect_permute_store_chain): Likewise.
> (vect_grouped_load_supported): Likewise.
> (vect_permute_load_chain): Likewise.
> (vect_shift_permute_load_chain): Likewise.
> * tree-vect-slp.c (vect_build_slp_tree_1): Likewise.
> (vect_transform_slp_perm_load): Likewise.
> (vect_schedule_slp_instance): Likewise.
> * tree-vect-stmts.c (perm_mask_for_reverse): Likewise.
> (vectorizable_mask_load_store): Likewise.
> (vectorizable_bswap): Likewise.
> (vectorizable_store): Likewise.
> (vectorizable_load): Likewise.
> * tree-vect-generic.c (lower_vec_perm): Use vec_perm_builder and
> vec_perm_indices instead of auto_vec_perm_indices.  Use
> tree_to_vec_perm_builder to read the vector from a tree.
> * tree-vect-loop.c (calc_vec_perm_mask_for_shift): Take a
> vec_perm_builder instead of a vec_perm_indices.
> (have_whole_vector_shift): Use vec_perm_builder and
> vec_perm_indices instead of auto_vec_perm_indices.  Leave the
> truncation to calc_vec_perm_mask_for_shift.
> (vect_create_epilog_for_reduction): Likewise.
> * config/aarch64/aarch64.c (expand_vec_perm_d::perm): Change
> from auto_vec_perm_indices to vec_perm_indices.
> (aarch64_expand_vec_perm_const_1): Use rotate_inputs on d.perm
> instead of changing individual elements.
> (aarch64_vectorize_vec_perm_const): Use new_vector to install
> the vector in d.perm.
> * config/arm/arm.c (expand_vec_perm_d::perm): Change
> from auto_vec_perm_indices to vec_perm_indices.
> (arm

Re: [00/13] Make VEC_PERM_EXPR work for variable-length vectors

2017-12-12 Thread Richard Biener
On Tue, Dec 12, 2017 at 4:32 PM, Richard Sandiford
 wrote:
> Richard Biener  writes:
>> On Sun, Dec 10, 2017 at 12:06 AM, Richard Sandiford
>>  wrote:
>>> This series is a replacement for:
>>> https://gcc.gnu.org/ml/gcc-patches/2017-11/msg00747.html
>>> based on the feedback that using VEC_PERM_EXPR would be better.
>>>
>>> The changes are:
>>>
>>> (1) Remove the restriction that the selector elements have to have the
>>> same width as the data elements, but only for constant selectors.
>>> This lets through the cases we need without also allowing
>>> potentially-expensive ops.  Adding support for the variable
>>> case can be done later if it seems useful, but it's not trivial.
>>>
>>> (2) Encode the integer form of constant selectors (vec_perm_indices)
>>> in the same way as the new VECTOR_CST encoding, so that it can
>>> cope with variable-length vectors.
>>>
>>> (3) Remove the vec_perm_const optab and reuse the target hook to emit
>>> code.  This avoids the need to create a CONST_VECTOR for the wide
>>> selectors, and hence the need to have a corresponding wide vector
>>> mode (which the target wouldn't otherwise need or support).
>>
>> Hmm.  Makes sense I suppose.
>>
>>> (4) When handling the variable vec_perm optab, check that modes can store
>>> all element indices before using them.
>>>
>>> (5) Unconditionally use ssizetype selector elements in permutes created
>>> by the vectoriser.
>>
>> Why specifically _signed_ sizetype?  That sounds like an odd choice.  But 
>> I'll
>> eventually see when looking at the patch.
>
> Sorry, should have said.  The choice doesn't matter for vector lengths
> that are a power of 2,

which are the only ones we support anyway?

> but for others, using a signed selector means that
> -1 always selects the last input element, whereas for unsigned selectors,
> the element selected by -1 would depend on the selector precision.  (And the
> use of sizetype precision is pretty arbitrary.)

hmm, so you are saying that vec_perm  is equal
to vec_perm 
>> Does that mean we have a VNDImode vector unconditionally for the
>> permute even though a vector matching the width of the data members
>> would work?
>
> A VECTOR_CST of N DIs, yeah.  It only becomes a VNDI at the rtl level
> if we're selecting 64-bit data elements.

And on GIMPLE?  Do we have a vector type with ssizetype elements
unconditionally?

>> What happens if the target doesn't have vec_perm_const but vec_perm to
>> handle all constant permutes?
>
> We'll try to represent the selector as VN?I, where ? matches the width
> of the data, but only after checking that that doesn't change the
> selector values.  So for current targets we'll use vec_perm for constant
> permutes as before, but we wouldn't for 2-input V256QI interleaves.

I see.

Richard.

> Thanks,
> Richard


Re: [07/13] Make vec_perm_indices use new vector encoding

2017-12-12 Thread Richard Sandiford
Richard Biener  writes:
> On Sun, Dec 10, 2017 at 12:20 AM, Richard Sandiford
>  wrote:
>> This patch changes vec_perm_indices from a plain vec<> to a class
>> that stores a canonicalised permutation, using the same encoding
>> as for VECTOR_CSTs.  This means that vec_perm_indices now carries
>> information about the number of vectors being permuted (currently
>> always 1 or 2) and the number of elements in each input vector.
>
> Before I dive into  the C++ details can you explain why it needs this
> info and how it encodes it for variable-length vectors?  To interleave
> two vectors you need sth like { 0, N, 1, N+1, ... }, I'm not sure we
> can directly encode N here, can we?  extract even/odd should just
> work as { 0, 2, 4, 6, ...} without knowledge of whether we permute
> one or two vectors (the one vector case just has two times the same
> vector) or how many elements each of the vectors (or the result) has.

One of the later patches switches the element types to HOST_WIDE_INT,
so that we can represent all ssizetypes.  Then there's a poly_int
patch (not yet posted) to make that poly_int64, so that we can
represent the N even for variable-length vectors.

The class needs to know the number of elements because that affects
the canonical representation.  E.g. extract even on fixed-length
vectors with both inputs the same should be { 0, 2, 4, ..., 0, 2, 4 ... },
which we can't encode as a simple series.  Interleave low with both
inputs the same should be { 0, 0, 1, 1, ... } for both fixed-length and
variable-length vectors.

Also, operator[] is supposed to return an in-range selector even if
the selector element is only implicitly encoded.  So we need to know
the number of input elements there.

Separating the number of input elements into the number of inputs
and the number of elements per input isn't really necessary, but made
it easier to provide routines for testing whether all selected
elements come from a particular input, and for rotating the selector
by a whole number of inputs.

Thanks,
Richard


Re: [00/13] Make VEC_PERM_EXPR work for variable-length vectors

2017-12-12 Thread Richard Sandiford
Richard Biener  writes:
> On Tue, Dec 12, 2017 at 4:32 PM, Richard Sandiford
>  wrote:
>> Richard Biener  writes:
>>> On Sun, Dec 10, 2017 at 12:06 AM, Richard Sandiford
>>>  wrote:
 This series is a replacement for:
 https://gcc.gnu.org/ml/gcc-patches/2017-11/msg00747.html
 based on the feedback that using VEC_PERM_EXPR would be better.

 The changes are:

 (1) Remove the restriction that the selector elements have to have the
 same width as the data elements, but only for constant selectors.
 This lets through the cases we need without also allowing
 potentially-expensive ops.  Adding support for the variable
 case can be done later if it seems useful, but it's not trivial.

 (2) Encode the integer form of constant selectors (vec_perm_indices)
 in the same way as the new VECTOR_CST encoding, so that it can
 cope with variable-length vectors.

 (3) Remove the vec_perm_const optab and reuse the target hook to emit
 code.  This avoids the need to create a CONST_VECTOR for the wide
 selectors, and hence the need to have a corresponding wide vector
 mode (which the target wouldn't otherwise need or support).
>>>
>>> Hmm.  Makes sense I suppose.
>>>
 (4) When handling the variable vec_perm optab, check that modes can store
 all element indices before using them.

 (5) Unconditionally use ssizetype selector elements in permutes created
 by the vectoriser.
>>>
>>> Why specifically _signed_ sizetype?  That sounds like an odd choice.
>>> But I'll
>>> eventually see when looking at the patch.
>>
>> Sorry, should have said.  The choice doesn't matter for vector lengths
>> that are a power of 2,
>
> which are the only ones we support anyway?

Yeah, for fixed-length at the tree level.  The variable-length support
allows (2^N)*X vectors for non-power-of-2 X though, and we support
non-power-of-2 fixed-length vectors in RTL (e.g. V12QI).

>> but for others, using a signed selector means that
>> -1 always selects the last input element, whereas for unsigned selectors,
>> the element selected by -1 would depend on the selector precision.  (And the
>> use of sizetype precision is pretty arbitrary.)
>
> hmm, so you are saying that vec_perm  is equal
> to vec_perm  tree.def defines VEC_PERM_EXPR via
>
>N = length(mask)
>foreach i in N:
>  M = mask[i] % (2*N)
>  A = M < N ? v0[M] : v1[M-N]
>
> which doesn't reflect this behavior.  Does this behavior persist for variable
> vector permutations?

__builtin_shuffle is defined to wrap though:

  The elements of the input vectors are numbered in memory ordering of
  @var{vec0} beginning at 0 and @var{vec1} beginning at @var{N}.  The
  elements of @var{mask} are considered modulo @var{N} in the single-operand
  case and modulo @math{2*@var{N}} in the two-operand case.

I think we need to preserve that for VEC_PERM_EXPR, otherwise we'd need
to introduce the masking operation when lowering __builtin_shuffle to
VEC_PERM_EXPR.

>>> Does that mean we have a VNDImode vector unconditionally for the
>>> permute even though a vector matching the width of the data members
>>> would work?
>>
>> A VECTOR_CST of N DIs, yeah.  It only becomes a VNDI at the rtl level
>> if we're selecting 64-bit data elements.
>
> And on GIMPLE?  Do we have a vector type with ssizetype elements
> unconditionally?

For autovectorised permutes, yes.  Other permutes we keep the existing types.

Thanks,
Richard


Re: [PATCH, rs6000] generate loop code for memcmp inline expansion

2017-12-12 Thread Segher Boessenkool
Hi!

On Mon, Dec 11, 2017 at 09:49:45AM -0600, Aaron Sawdey wrote:
> This patch builds on two previously posted patches:
> 
> https://gcc.gnu.org/ml/gcc-patches/2017-11/msg01216.html
> https://gcc.gnu.org/ml/gcc-patches/2017-11/msg02599.html
> 
> The previous patches allow the use of bdnzt.
> 
> This patch allows the cmpstrsi pattern expansion to handle longer blocks 
> without
> increasing code size by using a loop:
> 
> .L13:
> ldbrx 7,3,6
> ldbrx 9,10,6
> ldbrx 0,3,5
> ldbrx 4,10,5
> addi 6,6,16
> addi 5,5,16
> subfc. 9,9,7
> bne 0,.L10
> subfc. 9,4,0
> bdnzt 2,.L13
> 
> Performance testing on P8 showed that unroll x2 with 2 IVs was the way to go,
> and it is the same performance as the previous non-loop inlne code for a
> 32 byte compare. Also on P8 the performance crossover with calling glibc 
> memcmp
> happens around 192 bytes length. The loop expansion code can also be generated
> when the length is not known at compile time.
> 
> The gcc.dg/memcmp-1.c and gcc.dg/strncmp-2.c test cases are updated by the
> patch to check the additional memcmp expansion for known and unknown lengths.
> 
> The undocumented option -mblock-compare-inline-limit was changed slightly to
> mean how many bytes will be compared by the non-loop inline code, the default
> is 31 bytes. As previously, -mblock-compare-inlne-limit=0 will disable all
> inline expansion of memcmp(). The new -mblock-compare-inline-loop-limit option
> sets the upper limit, for lengths above that no expansion is done if the 
> length
> is known at compile time. If the length is unknown, the generated code calls
> glibc memcmp() after comparing that many bytes.
> 
> Bootstrap and regtest pass on P8 LE and P9 LE, testing in progress for
> BE (default, P7, and P8). If tests pass, OK for trunk when the first of the
> previously posted patches is approved?

It looks very good.  Just some whitespace etc. issues:

> +/* Generate rtl for a load, shift, and compare of less than a full word.
> +
> +   LOAD_MODE is the machine mode for the loads.
> +   DIFF is the reg for the difference.
> +   CMP_REM is the reg containing the remaining bytes to compare.
> +   DCOND is the CCUNS reg for the compare if we are doing P9 code with setb.
> +   SRC1_ADDR is the first source address.
> +   SRC2_ADDR is the second source address.
> +   ORIG_SRC1 is the original first source block's address rtx.
> +   ORIG_SRC2 is the original second source block's address rtx.   */

You have a tab instead of a space at the end there.

> +/* Generate rtl for an overlapping load and compare of less than a
> +   full load_mode.  This assumes that we have already compared the
> + emit_move_insn (adj_reg, GEN_INT (-addr_adj));

That last line doesn't really make sense; pasto?

> +   ORIG_SRC2 is the original second source block's address rtx. */

Dot-space-space-end-of-comment.

> +static void
> +do_overlap_load_compare (machine_mode load_mode, bool isConst,
> +HOST_WIDE_INT bytes_rem, rtx diff,
> +rtx cmp_rem, rtx dcond, rtx src1_addr, rtx src2_addr,
> +rtx orig_src1, rtx orig_src2)

Should use tabs.

> +  bool isP7 = (rs6000_cpu == PROCESSOR_POWER7);
> +  //bool isP8 = (rs6000_cpu == PROCESSOR_POWER8);
> +  bool isP9 = (rs6000_cpu == PROCESSOR_POWER9);

Just remove the unused code please (certainly when it's trivial to recreate
it, like here).

> +  /* Anything to move?   */
> +  HOST_WIDE_INT bytes = 0;
> +  if (bytes_is_const)
> +bytes = INTVAL (bytes_rtx);
> +
> +  if (bytes_is_const && bytes == 0)
> +return true;
> +
> +  /* Limit the amount we compare, if known statically.   */

These last two comments end in tabs as well; they shouldn't.

> +  /* Allow the option to override the default.   */

And more; check the whole file please.

> +  if (rs6000_block_compare_inline_loop_limit >= 0)
> +max_bytes = (unsigned HOST_WIDE_INT) 
> rs6000_block_compare_inline_loop_limit;

This doesn't need a cast I think?

> +  /* Extend bytes_rtx to word_mode if needed.  But, we expect only to
> +   maybe have to deal with the case were bytes_rtx is SImode and
> +   word_mode is DImode.   */
> +  if (!bytes_is_const)
> +{
> +  if (GET_MODE_SIZE (GET_MODE (bytes_rtx)) > GET_MODE_SIZE (word_mode))
> + return false; /* This is unexpected.  */

Comments at the end of line are unexpected, too ;-)  Put it a line up?
Or even before the "if".

> +  /* Number of bytes per iteration of the unrolled loop.  */
> +  HOST_WIDE_INT loop_bytes = 2*load_mode_size;
> +  /* max iters and bytes compared in the loop.   */
> +  HOST_WIDE_INT max_loop_iter = max_bytes/loop_bytes;
> +  HOST_WIDE_INT max_loop_bytes = max_loop_iter*loop_bytes;
> +  int l2lb = floor_log2 (loop_bytes);

Spaces around binary operators.

> +  rtx final_label = gen_label_rtx();

Space before () (more of these).

> +  /* Difference found is stored here

Re: [PATCH][Middle-end]79538 missing -Wformat-overflow with %s and non-member array arguments

2017-12-12 Thread Qing Zhao
Hi, Richard,

thanks a lot for your review.

> 
>>{
>>  /* __copy is always the same for characters.
>> Check to see if copy function already exists.  */
>> - sprintf (name, "__copy_character_%d", ts->kind);
>> + name = xasprintf ("__copy_character_%d", ts->kind);
>>  contained = ns->contained;
>>  for (; contained; contained = contained->sibling)
>>if (contained->proc_name
>> @@ -2796,6 +2802,7 @@ find_intrinsic_vtab (gfc_typespec *ts)
>>  vtab->ts.u.derived = vtype;
>>  vtab->value = gfc_default_initializer (&vtab->ts);
>>}
>> +  free (name);
> 
> It looks like this might be in a performance critical lookup path
> where we'd really
> like to avoid allocating/freeing memory.  Why's GFC_MAX_SYMBOL_LEN+1 not
> enough?  Leaving for fortran folks to comment - note you should CC
> fort...@gcc.gnu.org 
> for fortran patches (done).

I have sent this patch to fort...@gcc.gnu.org  per 
Thomas’s suggestion last week, and got approval by fortran team on last Friday:

https://gcc.gnu.org/ml/fortran/2017-12/msg00027.html

> 
>>}
>> 
>>  found_sym = vtab;
>> diff --git a/gcc/gimple-fold.c b/gcc/gimple-fold.c
>> index 353a46e..fef1969 100644
>> --- a/gcc/gimple-fold.c
>> +++ b/gcc/gimple-fold.c
>> @@ -1323,6 +1323,19 @@ get_range_strlen (tree arg, tree length[2], bitmap 
>> *visited, int type,
>> the array could have zero length.  */
>>  *minlen = ssize_int (0);
>>}
>> +
>> +  if (VAR_P (arg)
>> +  && TREE_CODE (TREE_TYPE (arg)) == ARRAY_TYPE)
>> +{
>> +  val = TYPE_SIZE_UNIT (TREE_TYPE (arg));
>> +  if (!val || integer_zerop (val))
> 
> val might be non-constant so you also need to check TREE_CODE (val) !=
> INTEGER_CST here.

Okay.
> 
>> +return false;
>> +  val = fold_build2 (MINUS_EXPR, TREE_TYPE (val), val,
>> + integer_one_node);
> 
>   val = wide_int_to_tree (TREE_TYPE (val), wi::minus (wi::to_wide (val), 1));
> 
> you pass a possibly bogus type of 1 to fold_build2 above so the wide-int 
> variant
> is prefered.
> 
> The gimple-fold.c changes are ok with that change.

Per your comments, the updated gimple-fold.c is like the following:

diff --git a/gcc/gimple-fold.c b/gcc/gimple-fold.c
index 353a46e..0500fba 100644
--- a/gcc/gimple-fold.c
+++ b/gcc/gimple-fold.c
@@ -1323,6 +1323,19 @@ get_range_strlen (tree arg, tree length[2], bitmap 
*visited, int type,
 the array could have zero length.  */
  *minlen = ssize_int (0);
}
+
+  if (VAR_P (arg) 
+  && TREE_CODE (TREE_TYPE (arg)) == ARRAY_TYPE)
+{
+  val = TYPE_SIZE_UNIT (TREE_TYPE (arg));
+  if (!val || TREE_CODE (val) != INTEGER_CST || integer_zerop 
(val))
+return false;
+  val = fold_build2 (MINUS_EXPR, TREE_TYPE (val), val,
+build_int_cst (TREE_TYPE (val), 1));
+  /* Set the minimum size to zero since the string in
+ the array could have zero length.  */
+  *minlen = ssize_int (0);
+}
}
 
   if (!val)

let me know any further issue with the above.

thanks a lot.

Qing





Re: [PR81165] discount killed stmts when sizing blocks for threading

2017-12-12 Thread Jeff Law
On 12/11/2017 10:08 PM, Alexandre Oliva wrote:
> 
>> I wonder if for the handling of a larger threading path it wouldn't be 
>> useful to
>> keep ssa_remaining_uses live and populated across processing of the threaded
>> conditionals?  OTOH it would probably require to track uses on the path vs.
>> not on the path.
> 
> Possibly.  Not sure.  I wondered about that myself.
So I finally found the other BZ in this space that I was looking for.
36550.

In 36550 we're getting a false positive warning at -Os because we
restrict jump threading to cases where the copied block has no real
statements other than the control statement at the end (which we're
going to eliminate).

As it turns out in the 36550 case the key block has real statements, but
they're all going to be dead after threading.  So naturally Alex's code
is of interest here.

I wonder if we could attach the result of Alex's analysis bits to the
thread path, then query it later.  That would allow us to solve 36550
without re-analyzing the path.  See class tree-ssa-threadupdate.h's
definition of jump_thread_edge -- that's where I want to hang the result
of the analysis.

Alex want to take a looksie and see if you can attach the result to the
path, then query it in tree-ssa-threadupdate.c at this point:

 /* If optimizing for size, only thread through block if we don't have
 to duplicate it or it's an otherwise empty redirection block.  */
  if (optimize_function_for_size_p (cfun))
{
  EXECUTE_IF_SET_IN_BITMAP (tmp, 0, i, bi)
{
  bb = BASIC_BLOCK_FOR_FN (cfun, i);
  if (EDGE_COUNT (bb->preds) > 1
  && !redirection_block_p (bb))
{
  FOR_EACH_EDGE (e, ei, bb->preds)
{
  if (e->aux)
{
  vec *path = THREAD_PATH (e);
  delete_jump_thread_path (path);
  e->aux = NULL;
}
}
}
  else
bitmap_set_bit (threaded_blocks, i);


There may be other BZs of a similar nature -- there's certainly others
were we get spurious -Wuninitialized warnings with -Os where threading
got throttled.  But I haven't verified if all the statements in the
copied block were dead on those other BZs.




> 
> 
>>PARAM_MAX_JUMP_THREAD_DUPLICATION_STMTS specifies the maximum number
>>of statements and PHI nodes allowed in a block which is going to
>>be duplicated for thread jumping purposes.
> 
>> should be adjusted though?
> 
> I guess not, we still abide by that, roughly, we just (as-if) remove
> dead stmts and phi nodes from the block before counting them.
Well, consider a block with 16 statements, 15 of which were dead.
Previously we wouldn't thread through that block.  Now we would thread
resulting in a tiny amount of codesize increase as there's a statement
to copy that would survive DCE.

But the effect should be very small and I think your testing results so
that to be true.

Jeff


Re: [PATCH][RFA][P1 PR tree-optimization/83298] Avoid over-optimistic result range for COND_EXPR

2017-12-12 Thread Jeff Law
On 12/12/2017 01:39 AM, Richard Biener wrote:
> 
> LGTM.  I notice (existing code) that we use set_vr_value from push_value_range
> vs. update_value_range from the regular code.  Also for some unrelated missed
> optimization I'd like to be able to set and unwind SSA range info via the 
> stack
> as well.  Both future things we might want to cleanup.  I guess once the
> SSA propagator VRP can go there's an opportunity to cleanup even more of the
> code.
Right.  It's clearly an area ripe for a bit of cleanup.  The avail_exprs
and const/copies tables take the approach of recording everything in the
unwind tables.  The VRP bits try to be smarter and only use the
unwinding stack when it's absolutely necessary.  I'm torn as I can see
pros and cons of both approaches.  I'd certainly like to settle on just
one for consistency's sake.

The API for update_value_range is painful -- it assumes the new range is
a temporary range and that the equivs hung off it can be free'd.  I'm
not sure why it's done that way, but it introduces a certain amount of
pain.  So we have to be careful in how we use it.

A part of me wants to classify the lowest level range structure
(value_range) and introduce some invariants (ie, can they be shared or
are they copied).  But it feels like not enough gain right now.

I absolutely agree that we'll want the push/pop scope primitives to be
exposed.  I've consistently seen value in that with the other tables,
thus exposing those in the VRP bits seemed advisable.

Jeff

ps.  Another report came in today that is almost certainly a DUP of
83298. I'll include its testcase in the commit.


Re: [PATCH][Middle-end]79538 missing -Wformat-overflow with %s and non-member array arguments

2017-12-12 Thread Richard Biener
On December 12, 2017 5:13:19 PM GMT+01:00, Qing Zhao  
wrote:
>Hi, Richard,
>
>thanks a lot for your review.
>
>> 
>>>{
>>>  /* __copy is always the same for characters.
>>> Check to see if copy function already exists. 
>*/
>>> - sprintf (name, "__copy_character_%d", ts->kind);
>>> + name = xasprintf ("__copy_character_%d",
>ts->kind);
>>>  contained = ns->contained;
>>>  for (; contained; contained = contained->sibling)
>>>if (contained->proc_name
>>> @@ -2796,6 +2802,7 @@ find_intrinsic_vtab (gfc_typespec *ts)
>>>  vtab->ts.u.derived = vtype;
>>>  vtab->value = gfc_default_initializer (&vtab->ts);
>>>}
>>> +  free (name);
>> 
>> It looks like this might be in a performance critical lookup path
>> where we'd really
>> like to avoid allocating/freeing memory.  Why's GFC_MAX_SYMBOL_LEN+1
>not
>> enough?  Leaving for fortran folks to comment - note you should CC
>> fort...@gcc.gnu.org 
>> for fortran patches (done).
>
>I have sent this patch to fort...@gcc.gnu.org
> per Thomas’s suggestion last week, and got
>approval by fortran team on last Friday:
>
>https://gcc.gnu.org/ml/fortran/2017-12/msg00027.html
>
>> 
>>>}
>>> 
>>>  found_sym = vtab;
>>> diff --git a/gcc/gimple-fold.c b/gcc/gimple-fold.c
>>> index 353a46e..fef1969 100644
>>> --- a/gcc/gimple-fold.c
>>> +++ b/gcc/gimple-fold.c
>>> @@ -1323,6 +1323,19 @@ get_range_strlen (tree arg, tree length[2],
>bitmap *visited, int type,
>>> the array could have zero length.  */
>>>  *minlen = ssize_int (0);
>>>}
>>> +
>>> +  if (VAR_P (arg)
>>> +  && TREE_CODE (TREE_TYPE (arg)) == ARRAY_TYPE)
>>> +{
>>> +  val = TYPE_SIZE_UNIT (TREE_TYPE (arg));
>>> +  if (!val || integer_zerop (val))
>> 
>> val might be non-constant so you also need to check TREE_CODE (val)
>!=
>> INTEGER_CST here.
>
>Okay.
>> 
>>> +return false;
>>> +  val = fold_build2 (MINUS_EXPR, TREE_TYPE (val), val,
>>> + integer_one_node);
>> 
>>   val = wide_int_to_tree (TREE_TYPE (val), wi::minus (wi::to_wide
>(val), 1));

I don't see this requested change. 

>> you pass a possibly bogus type of 1 to fold_build2 above so the
>wide-int variant
>> is prefered.
>> 
>> The gimple-fold.c changes are ok with that change.
>
>Per your comments, the updated gimple-fold.c is like the following:
>
>diff --git a/gcc/gimple-fold.c b/gcc/gimple-fold.c
>index 353a46e..0500fba 100644
>--- a/gcc/gimple-fold.c
>+++ b/gcc/gimple-fold.c
>@@ -1323,6 +1323,19 @@ get_range_strlen (tree arg, tree length[2],
>bitmap *visited, int type,
>the array could have zero length.  */
> *minlen = ssize_int (0);
>   }
>+
>+  if (VAR_P (arg) 
>+  && TREE_CODE (TREE_TYPE (arg)) == ARRAY_TYPE)
>+{
>+  val = TYPE_SIZE_UNIT (TREE_TYPE (arg));
>+  if (!val || TREE_CODE (val) != INTEGER_CST ||
>integer_zerop (val))
>+return false;
>+  val = fold_build2 (MINUS_EXPR, TREE_TYPE (val), val,
>+   build_int_cst (TREE_TYPE (val), 1));
>+  /* Set the minimum size to zero since the string in
>+ the array could have zero length.  */
>+  *minlen = ssize_int (0);
>+}
>   }
> 
>   if (!val)
>
>let me know any further issue with the above.
>
>thanks a lot.
>
>Qing



Re: [PATCH] PR libgcc/83112, Fix warnings on libgcc float128-ifunc.c

2017-12-12 Thread Segher Boessenkool
On Mon, Dec 11, 2017 at 03:57:51PM -0500, Michael Meissner wrote:
> > > +extern KCtype __divkc3 (KFtype, KFtype, KFtype, KFtype);
> > > +
> > >  KCtype
> > >  __divkc3 (KFtype a, KFtype b, KFtype c, KFtype d)
> > >  {
> > 
> > How does this warn?  -Wmissing-declarations?  Should this declaration be
> > in a header then?
> 
> The compiler creates the call to __mulkc3 and __divkc3, and internally it has
> the appropriate prototype like it does for all built-in functions (in this
> case, returning an _Float128 _Complex type, and taking 4 _Float128 arguments).
> 
> So before adding ifunc support, we never noticed it didn't have a prototype,
> because the compiler already has a prototype.

I still don't get it.  A function definition is also a declaration.

Something very non-intuitive is happening?

What does the patch change here?

> With ifunc support, we now need to create two separate functions, __mulkc3_sw
> and __mulkc3_hw, and make __multkc3 the ifunc resolver.
> 
> So there really isn't an include file that is appropriate to put the
> definitions in.  I could change it to use the soft-fp includes (including
> quadmath-float128.h) if desired.
> 
> Did you want me to do that?

I don't see the point in adding a second declaration right before the
existing declaration (the function definition).  I'm fine with what file
it is in.

> > A code comment explaining why you do a declaration for exactly the same
> > thing as there is two lines later would help; otherwise people will try
> > to delete it again :-)


Segher


RE: [PATCH][i386,AVX] Enable VBMI2 support [5/7]

2017-12-12 Thread Koval, Julia
Here is the patch to update these files with my contributions. Ok for trunk?

Thanks,
Julia

> -Original Message-
> From: gcc-patches-ow...@gcc.gnu.org [mailto:gcc-patches-
> ow...@gcc.gnu.org] On Behalf Of Gerald Pfeifer
> Sent: Tuesday, December 12, 2017 11:34 AM
> To: Koval, Julia 
> Cc: Kirill Yukhin ; gcc-patches@gcc.gnu.org
> Subject: RE: [PATCH][i386,AVX] Enable VBMI2 support [5/7]
> 
> On Tue, 12 Dec 2017, Koval, Julia wrote:
> > Looks good. How to put it there(sorry, noob question)?
> 
> Does https://gcc.gnu.org/about.html help?  If not, let me know
> and I'll work with you (and update those docs on the way).
> 
> Of course, even if things work for you, any suggestions on how
> to improve this little page are very welcome. :)
> 
> Gerald


patch
Description: patch


Re: [PATCH, rs6000] Fix PR83332 (missing vcond patterns)

2017-12-12 Thread Segher Boessenkool
Hi!

On Mon, Dec 11, 2017 at 03:55:20PM -0600, Bill Schmidt wrote:
> A new test case introduced for PR81303 failed on powerpc64 (BE, LE).  This
> turns out to be due to a missing standard pattern (vcondv2div2df).  This
> and a couple of other patterns are easy to support with existing logic
> by just adding new patterns with appropriate modes.  That's all this patch
> does.  That's sufficient to cause the failing test to pass.

Nitpicking:

> +(define_expand "vcondv2dfv2di"
> +  [(set (match_operand:V2DF 0 "vfloat_operand" "")

No empty default "" arguments please.

> + (if_then_else:V2DF
> +  (match_operator 3 "comparison_operator"
> +  [(match_operand:V2DI 4 "vint_operand" "")
> +   (match_operand:V2DI 5 "vint_operand" "")])
> +  (match_operand:V2DF 1 "vfloat_operand" "")
> +  (match_operand:V2DF 2 "vfloat_operand" "")))]
> +  "VECTOR_UNIT_ALTIVEC_OR_VSX_P (V2DFmode)
> +   && VECTOR_UNIT_ALTIVEC_OR_VSX_P (V2DImode)"
> +  "
> +{
> +  if (rs6000_emit_vector_cond_expr (operands[0], operands[1], operands[2],
> + operands[3], operands[4], operands[5]))
> +DONE;
> +  else
> +FAIL;
> +}")

And no "" around the block.

Okay for trunk with that fixed up.  Thanks!


Segher


Re: [PATCH][Middle-end]79538 missing -Wformat-overflow with %s and non-member array arguments

2017-12-12 Thread Martin Sebor

On 12/12/2017 09:13 AM, Qing Zhao wrote:

Hi, Richard,

thanks a lot for your review.




   {
 /* __copy is always the same for characters.
Check to see if copy function already exists.  */
- sprintf (name, "__copy_character_%d", ts->kind);
+ name = xasprintf ("__copy_character_%d", ts->kind);
 contained = ns->contained;
 for (; contained; contained = contained->sibling)
   if (contained->proc_name
@@ -2796,6 +2802,7 @@ find_intrinsic_vtab (gfc_typespec *ts)
 vtab->ts.u.derived = vtype;
 vtab->value = gfc_default_initializer (&vtab->ts);
   }
+  free (name);


It looks like this might be in a performance critical lookup path
where we'd really
like to avoid allocating/freeing memory.  Why's GFC_MAX_SYMBOL_LEN+1 not
enough?  Leaving for fortran folks to comment - note you should CC
fort...@gcc.gnu.org 
for fortran patches (done).


I have sent this patch to fort...@gcc.gnu.org  per 
Thomas’s suggestion last week, and got approval by fortran team on last Friday:

https://gcc.gnu.org/ml/fortran/2017-12/msg00027.html




   }

 found_sym = vtab;
diff --git a/gcc/gimple-fold.c b/gcc/gimple-fold.c
index 353a46e..fef1969 100644
--- a/gcc/gimple-fold.c
+++ b/gcc/gimple-fold.c
@@ -1323,6 +1323,19 @@ get_range_strlen (tree arg, tree length[2], bitmap 
*visited, int type,
the array could have zero length.  */
 *minlen = ssize_int (0);
   }
+
+  if (VAR_P (arg)
+  && TREE_CODE (TREE_TYPE (arg)) == ARRAY_TYPE)
+{
+  val = TYPE_SIZE_UNIT (TREE_TYPE (arg));
+  if (!val || integer_zerop (val))


val might be non-constant so you also need to check TREE_CODE (val) !=
INTEGER_CST here.


Okay.



+return false;
+  val = fold_build2 (MINUS_EXPR, TREE_TYPE (val), val,
+ integer_one_node);


  val = wide_int_to_tree (TREE_TYPE (val), wi::minus (wi::to_wide (val), 1));

you pass a possibly bogus type of 1 to fold_build2 above so the wide-int variant
is prefered.

The gimple-fold.c changes are ok with that change.


Per your comments, the updated gimple-fold.c is like the following:


FWIW, I suspect Richard is thinking of VLAs with the INTEGER_CST
comment above.  This is the case we discussed in private.  It is
handled either way but the handling could be improved by determining
the size of the VLA from the first __builtin_alloca_with_align
argument and using it or its range as the minimum and maximum.
With that, the range of string lengths that fit in vla in
the following could be determined and the sprintf call could
be diagnosed:

  char d[30];

  void f (unsigned n)
  {
if (n < 32)
  {
char vla[n];
__builtin_sprintf (d, "%s", vla);
  }
  }

I think this would be a nice enhancement to add on top of yours,
not just for VLAs but for dynamically allocated arrays as well,
and not just for the sprintf pass but also for the strlen pass
to optimize cases like:

  void f (unsigned n)
  {
if (n < 32)
  {
char vla[n];

fgets (vla, n, stdin);

unsigned len = strlen (vla);
if (len >= n)   // cannot hold
  abort (); // can be eliminated
   }
}

That said, at some point, it might make more sense to change
those passes to start tracking these things as they traverse
the CFG rather than having get_range_strlen() do the work.

Martin



diff --git a/gcc/gimple-fold.c b/gcc/gimple-fold.c
index 353a46e..0500fba 100644
--- a/gcc/gimple-fold.c
+++ b/gcc/gimple-fold.c
@@ -1323,6 +1323,19 @@ get_range_strlen (tree arg, tree length[2], bitmap 
*visited, int type,
 the array could have zero length.  */
  *minlen = ssize_int (0);
}
+
+  if (VAR_P (arg)
+  && TREE_CODE (TREE_TYPE (arg)) == ARRAY_TYPE)
+{
+  val = TYPE_SIZE_UNIT (TREE_TYPE (arg));
+  if (!val || TREE_CODE (val) != INTEGER_CST || integer_zerop 
(val))
+return false;
+  val = fold_build2 (MINUS_EXPR, TREE_TYPE (val), val,
+build_int_cst (TREE_TYPE (val), 1));
+  /* Set the minimum size to zero since the string in
+ the array could have zero length.  */
+  *minlen = ssize_int (0);
+}
}

   if (!val)

let me know any further issue with the above.

thanks a lot.

Qing







Re: [PATCH][Middle-end]79538 missing -Wformat-overflow with %s and non-member array arguments

2017-12-12 Thread Qing Zhao

> On Dec 12, 2017, at 10:50 AM, Richard Biener  wrote:
>>> 
 +return false;
 +  val = fold_build2 (MINUS_EXPR, TREE_TYPE (val), val,
 + integer_one_node);
>>> 
>>>  val = wide_int_to_tree (TREE_TYPE (val), wi::minus (wi::to_wide
>> (val), 1));
> 
> I don't see this requested change. 

>> +  val = fold_build2 (MINUS_EXPR, TREE_TYPE (val), val,
>> + build_int_cst (TREE_TYPE (val), 1));

for my original code:

 +  val = fold_build2 (MINUS_EXPR, TREE_TYPE (val), val,
 + integer_one_node);

I thought that your original major concern was:

“ you pass a possibly bogus type of 1 to fold_build2 above” 

so I use “build_int_cst (TREE_TYPE (val), 1)" to replace the original 
“integer_one_node”
to make the type of the 1 exactly the same as “val”. 

do I miss anything here?

I don’t quite understand why I have to use:

 val = wide_int_to_tree (TREE_TYPE (val), wi::minus (wi::to_wide(val), 1));

to replace the original fold_build2 (MINUS_EXPR, TREE_TYPE (val), val, 
integer_one_node)? 

thanks.

Qing

>> wide-int variant
>>> is prefered.
>>> 
>>> The gimple-fold.c changes are ok with that change.
>> 
>> Per your comments, the updated gimple-fold.c is like the following:
>> 
>> diff --git a/gcc/gimple-fold.c b/gcc/gimple-fold.c
>> index 353a46e..0500fba 100644
>> --- a/gcc/gimple-fold.c
>> +++ b/gcc/gimple-fold.c
>> @@ -1323,6 +1323,19 @@ get_range_strlen (tree arg, tree length[2],
>> bitmap *visited, int type,
>>   the array could have zero length.  */
>>*minlen = ssize_int (0);
>>  }
>> +
>> +  if (VAR_P (arg) 
>> +  && TREE_CODE (TREE_TYPE (arg)) == ARRAY_TYPE)
>> +{
>> +  val = TYPE_SIZE_UNIT (TREE_TYPE (arg));
>> +  if (!val || TREE_CODE (val) != INTEGER_CST ||
>> integer_zerop (val))
>> +return false;
>> +  val = fold_build2 (MINUS_EXPR, TREE_TYPE (val), val,
>> + build_int_cst (TREE_TYPE (val), 1));
>> +  /* Set the minimum size to zero since the string in
>> + the array could have zero length.  */
>> +  *minlen = ssize_int (0);
>> +}
>>  }
>> 
>>  if (!val)
>> 
>> let me know any further issue with the above.
>> 
>> thanks a lot.
>> 
>> Qing



Re: [PATCH] PR libgcc/83112, Fix warnings on libgcc float128-ifunc.c

2017-12-12 Thread Andreas Schwab
On Dez 12 2017, Segher Boessenkool  wrote:

> On Mon, Dec 11, 2017 at 03:57:51PM -0500, Michael Meissner wrote:
>> > > +extern KCtype __divkc3 (KFtype, KFtype, KFtype, KFtype);
>> > > +
>> > >  KCtype
>> > >  __divkc3 (KFtype a, KFtype b, KFtype c, KFtype d)
>> > >  {
>> > 
>> > How does this warn?  -Wmissing-declarations?  Should this declaration be
>> > in a header then?
>> 
>> The compiler creates the call to __mulkc3 and __divkc3, and internally it has
>> the appropriate prototype like it does for all built-in functions (in this
>> case, returning an _Float128 _Complex type, and taking 4 _Float128 
>> arguments).
>> 
>> So before adding ifunc support, we never noticed it didn't have a prototype,
>> because the compiler already has a prototype.
>
> I still don't get it.  A function definition is also a declaration.
>
> Something very non-intuitive is happening?

`-Wmissing-prototypes (C and Objective-C only)'
 Warn if a global function is defined without a previous prototype
 declaration.  This warning is issued even if the definition itself
 provides a prototype.

Andreas.

-- 
Andreas Schwab, SUSE Labs, sch...@suse.de
GPG Key fingerprint = 0196 BAD8 1CE9 1970 F4BE  1748 E4D4 88E3 0EEA B9D7
"And now for something completely different."


[PATCH][GCC][ARM] Fix fragile arm fpu attribute tests.

2017-12-12 Thread Tamar Christina
Hi All,

The previous test made use of arm_neon.h which made the whole test
rather fragile and only applicable to some of the arm targets.

So instead I make use of different fpus now to test the generation of
fmla instructions. The actual instruction itself is not tested as all
we care about if that the proper .fpu directives are generated.

Regtested on arm-none-eabi and arm-none-linux-gnueabihf
with no regressions.

Ok for trunk?


gcc/testsuite/
2017-12-12  Tamar Christina  

PR target/82641
* gcc.target/arm/pragma_fpu_attribute.c: New.
* gcc.target/arm/pragma_fpu_attribute_2.c: New.

-- 
diff --git a/gcc/testsuite/gcc.target/arm/pragma_fpu_attribute.c b/gcc/testsuite/gcc.target/arm/pragma_fpu_attribute.c
index 5f039d9bfb2b14f9134f138527fc395b8e273bbb..8191deac25965564a3c78dc08959a5e5637a0066 100644
--- a/gcc/testsuite/gcc.target/arm/pragma_fpu_attribute.c
+++ b/gcc/testsuite/gcc.target/arm/pragma_fpu_attribute.c
@@ -1,21 +1,19 @@
 /* Test for target attribute assembly extension generations.  */
 /* { dg-do compile } */
-/* { dg-require-effective-target arm_arch_v8a_ok } */
-/* { dg-require-effective-target arm_neon_ok } */
 /* { dg-additional-options "-std=gnu99" } */
 
 #include 
-#include 
+
+extern uint32_t bar ();
 
 #pragma GCC target("fpu=vfpv3-d16")
 
-extern uint32_t bar();
+extern float fmaf (float, float, float);
 
-__attribute__((target("fpu=crypto-neon-fp-armv8"))) poly64x1_t vsricw(poly64x1_t crc, uint32_t val)
+float
+__attribute__((target("fpu=vfpv4"))) vfma32 (float x, float y, float z)
 {
-poly64x1_t res;
-asm("vsri %0, %1, %2" : "=r"(res) : "r"(crc), "r"(val));
-return res;
+  return fmaf (x, y, z);
 }
 
 uint32_t restored ()
@@ -23,5 +21,5 @@ uint32_t restored ()
   return bar();
 }
 
-/* { dg-final { scan-assembler-times {\.fpu\s+crypto-neon-fp-armv8} 1 } } */
+/* { dg-final { scan-assembler-times {\.fpu\s+vfpv4} 1 } } */
 /* { dg-final { scan-assembler-times {\.fpu\s+vfpv3-d16} 1 } } */
diff --git a/gcc/testsuite/gcc.target/arm/pragma_fpu_attribute_2.c b/gcc/testsuite/gcc.target/arm/pragma_fpu_attribute_2.c
index b710de38612707b9109966f7bbc694a913121cb6..b50d4107b56ed7abd8b95fd2a3a08df8ab54410a 100644
--- a/gcc/testsuite/gcc.target/arm/pragma_fpu_attribute_2.c
+++ b/gcc/testsuite/gcc.target/arm/pragma_fpu_attribute_2.c
@@ -1,7 +1,5 @@
 /* Test for #pragma assembly extension generations.  */
 /* { dg-do compile } */
-/* { dg-require-effective-target arm_arch_v8a_ok } */
-/* { dg-require-effective-target arm_neon_ok } */
 /* { dg-additional-options "-std=gnu99" } */
 
 #include 
@@ -12,12 +10,13 @@
 extern uint32_t bar();
 
 #pragma GCC push_options
-#pragma GCC target("fpu=crypto-neon-fp-armv8")
-poly64x1_t vsricw(poly64x1_t crc, uint32_t val)
+#pragma GCC target("fpu=vfpv4")
+extern float fmaf (float, float, float);
+
+float
+vfma32 (float x, float y, float z)
 {
-poly64x1_t res;
-asm("vsri %0, %1, %2" : "=r"(res) : "r"(crc), "r"(val));
-return res;
+  return fmaf (x, y, z);
 }
 #pragma GCC pop_options
 
@@ -26,5 +25,5 @@ uint32_t restored ()
   return bar();
 }
 
-/* { dg-final { scan-assembler-times {\.fpu\s+crypto-neon-fp-armv8} 1 } } */
+/* { dg-final { scan-assembler-times {\.fpu\s+vfpv4} 1 } } */
 /* { dg-final { scan-assembler-times {\.fpu\s+vfpv3-d16} 1 } } */



Re: [PATCH,AIX] Nil check size threshold control for AIX

2017-12-12 Thread Ian Lance Taylor
On Tue, Dec 12, 2017 at 12:59 AM, REIX, Tony  wrote:
> Description:
>  * This patch tells the Go frontend to insert explicit guards (check for nil 
> -> panic) for AIX since relying on a fault does not work on AIX for page 0.
>
> Tests:
>  * AIX: Build: SUCCESS
>- build made by means of gmake.
>
> ChangeLog:
>   * go-lang.c (go_langhook_init): Handle AIX case for 
> nil_check_size_threshold.

Thanks.  I changed it to avoid #ifdef in the middle of the function,
and committed as follows.

Ian

2017-12-12  Tony Reix  
Ian Lance Taylor  

* go-lang.c (TARGET_AIX): Define if not defined.
(go_langhook_init): Set nil_check_size_threshold to -1 on AIX.
Index: gcc/go/go-lang.c
===
--- gcc/go/go-lang.c(revision 255581)
+++ gcc/go/go-lang.c(working copy)
@@ -39,6 +39,10 @@ along with GCC; see the file COPYING3.
 #include "go-c.h"
 #include "go-gcc.h"
 
+#ifndef TARGET_AIX
+#define TARGET_AIX 0
+#endif
+
 /* Language-dependent contents of a type.  */
 
 struct GTY(()) lang_type
@@ -112,7 +116,7 @@ go_langhook_init (void)
   args.check_divide_overflow = go_check_divide_overflow;
   args.compiling_runtime = go_compiling_runtime;
   args.debug_escape_level = go_debug_escape_level;
-  args.nil_check_size_threshold = 4096;
+  args.nil_check_size_threshold = TARGET_AIX ? -1 : 4096;
   args.linemap = go_get_linemap();
   args.backend = go_get_backend();
   go_create_gogo (&args);


RE: [PATCH][GCC][ARM] Fix failing testcase pragma_fpu_attribute.c

2017-12-12 Thread Tamar Christina
Hi All,

Please consider this patch abandoned. I have submitted a new version.

Thanks,
Tamar

> -Original Message-
> From: Christophe Lyon [mailto:christophe.l...@linaro.org]
> Sent: Monday, December 11, 2017 12:11
> To: Tamar Christina 
> Cc: gcc-patches@gcc.gnu.org; nd ; Ramana Radhakrishnan
> ; Richard Earnshaw
> ; ni...@redhat.com; Kyrylo Tkachov
> 
> Subject: Re: [PATCH][GCC][ARM] Fix failing testcase pragma_fpu_attribute.c
> 
> On 11 December 2017 at 12:56, Tamar Christina 
> wrote:
> >> > >
> >> > > It also works when I build natively using just configure && make.
> >> > > Could be
> >> > something in the configure flags.
> >> > > Looking back at it, if the vanilla compiler doesn't support neon
> >> > > I can see the test failing. But fixing it means Turning on neon
> >> > > and then turning it
> >> > off after the include. Which makes the test do too many things.
> >> >
> >> > What are your configure flags?
> >> > Can you can&paste the command line used to compile the testcase
> >> > (from
> >> > gcc.log) ?
> >>
> >
> > Ah, Richard pointed out to me that the difference is in "soft" abi, I
> > was only testing Softfp and hard. I'll write a new testcase that should work
> for all.
> >
> 
> Indeed, you override the float-abi flags in your RUNTESTFLAGS, which I'm not
> doing.
> 
> I think your arm-none-eabi builds have soft, softfp and hard multilibs?
> 
> With arm-none-linux-gnueabi[hf], you cannot override float-abi as easily, see
> for instance:
> https://gcc.gnu.org/ml/gcc-patches/2017-11/msg02323.html
> 
> Thanks,
> 
> Christophe
> 
> > Thanks
> >
> >> They are:
> >>
> >> Schedule of variations:
> >> arm-eabi-aem/-marm/-march=armv7-a/-mfpu=vfpv3-d16/-mfloat-
> >> abi=softfp
> >> arm-eabi-aem/-mthumb/-march=armv8-a/-mfpu=crypto-neon-fp-
> armv8/-
> >> mfloat-abi=hard
> >>
> >>
> >> /build-arm-none-eabi/obj/gcc2/gcc/xgcc -B/build-arm-none-
> >> eabi/obj/gcc2/gcc/
> >> /src/gcc/gcc/testsuite/gcc.target/arm/pragma_fpu_attribute_2.c -marm
> >> - march=armv7-a -mfpu=vfpv3-d16 -mfloat-abi=softfp
> >> -fno-diagnostics-show- caret -fdiagnostics-color=never -ansi
> >> -pedantic-errors -std=gnu99 -ffat-lto- objects -S
> >> -specs=aprofile-validation.specs -Wa,-mno-warn-deprecated -o
> >> pragma_fpu_attribute_2.s
> >>
> >> /build-arm-none-eabi/obj/gcc2/gcc/xgcc -B/build-arm-none-
> >> eabi/obj/gcc2/gcc/
> >> /src/gcc/gcc/testsuite/gcc.target/arm/pragma_fpu_attribute.c -marm -
> >> march=armv7-a -mfpu=vfpv3-d16 -mfloat-abi=softfp
> >> -fno-diagnostics-show- caret -fdiagnostics-color=never -ansi
> >> -pedantic-errors -std=gnu99 -ffat-lto- objects -S
> >> -specs=aprofile-validation.specs -Wa,-mno-warn-deprecated -o
> >> pragma_fpu_attribute.s
> >>
> >> /build-arm-none-eabi/obj/gcc2/gcc/xgcc -B/build-arm-none-
> >> eabi/obj/gcc2/gcc/
> >> /src/gcc/gcc/testsuite/gcc.target/arm/pragma_fpu_attribute_2.c  -
> mthumb -
> >> march=armv8-a -mfpu=crypto-neon-fp-armv8 -mfloat-abi=hard   -fno-
> >> diagnostics-show-caret -fdiagnostics-color=never  -ansi
> >> -pedantic-errors -
> >> std=gnu99 -ffat-lto-objects -S -specs=aprofile-validation.specs -Wa,-mno-
> >> warn-deprecated   -o pragma_fpu_attribute_2.s
> >>
> >> /build-arm-none-eabi/obj/gcc2/gcc/xgcc -B/build-arm-none-
> >> eabi/obj/gcc2/gcc/
> >> /src/gcc/gcc/testsuite/gcc.target/arm/pragma_fpu_attribute.c -mthumb
> >> - march=armv8-a -mfpu=crypto-neon-fp-armv8 -mfloat-abi=hard -fno-
> >> diagnostics-show-caret -fdiagnostics-color=never -ansi
> >> -pedantic-errors -
> >> std=gnu99 -ffat-lto-objects -S -specs=aprofile-validation.specs
> >> -Wa,-mno- warn-deprecated -o pragma_fpu_attribute.s
> >>
> >> It's also weird that you only see one of the testcases failing.
> >> The pragma_fpu_attribute.c and pragma_fpu_attribute_2.c should have
> >> the exact same issues.
> >>
> >> >
> >> > Thanks
> >> >
> >> > >
> >> > > I will try to think of  a testcase that doesn't require neon, if
> >> > > I can't I'll just
> >> > remove the tests.
> >> > > They weren't being tested before and if there's no way to
> >> > > reliably test changing fpu options on ARM Then there's no point
> having them.
> >> > >
> >> >
> >> > Yes, that's becoming way too complex for the purpose :(
> >>
> >> I think I can do one using the fmla instructions. So will try that next.
> >>
> >> >
> >> > > Thanks,
> >> > > Tamar
> >> > >
> >> > >>
> >> > >> Christophe


[PATCH] PR libstdc++/83395 fix invocable traits for INVOKE

2017-12-12 Thread Jonathan Wakely

This restores the is_void check to __is_invocable_impl that I
removed recently, because otherwise we don't handle cv void correctly
(and restoring the is_void check seemed simpler than adding partial
specializations for cv void).

PR libstdc++/83395
* include/std/type_traits (__is_invocable_impl): Remove partial
specialization for INVOKE and restore is_void check in
primary template.
(__is_nt_invocable_impl): Likewise.
* testsuite/20_util/is_invocable/83395.cc: New test.
* testsuite/20_util/is_nothrow_invocable/83395.cc: New test.

Tested powerpc64le-linux, committed to trunk. Backport to follow for
gcc-7-branch.


commit 0f36841953ac86d12a0dbbf5548da7799b7ac7a6
Author: Jonathan Wakely 
Date:   Tue Dec 12 17:03:31 2017 +

PR libstdc++/83395 fix invocable traits for INVOKE

PR libstdc++/83395
* include/std/type_traits (__is_invocable_impl): Remove partial
specialization for INVOKE and restore is_void check in
primary template.
(__is_nt_invocable_impl): Likewise.
* testsuite/20_util/is_invocable/83395.cc: New test.
* testsuite/20_util/is_nothrow_invocable/83395.cc: New test.

diff --git a/libstdc++-v3/include/std/type_traits 
b/libstdc++-v3/include/std/type_traits
index 1d639e452f3..3d94566fa63 100644
--- a/libstdc++-v3/include/std/type_traits
+++ b/libstdc++-v3/include/std/type_traits
@@ -2589,12 +2589,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 
   template
 struct __is_invocable_impl<_Result, _Ret, __void_t>
-: is_convertible::type
-{ };
-
-  template
-struct __is_invocable_impl<_Result, void, __void_t>
-: true_type
+: __or_, is_convertible>::type
 { };
 
   template
@@ -2699,14 +2694,9 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
   template
 struct __is_nt_invocable_impl<_Result, _Ret,
  __void_t>
-: __and_,
-is_nothrow_constructible<_Ret, typename _Result::type>>
-{ };
-
-  template
-struct __is_nt_invocable_impl<_Result, void,
- __void_t>
-: true_type
+: __or_,
+   __and_,
+  is_nothrow_constructible<_Ret, typename _Result::type>>>
 { };
 
   /// std::is_nothrow_invocable_r
diff --git a/libstdc++-v3/testsuite/20_util/is_invocable/83395.cc 
b/libstdc++-v3/testsuite/20_util/is_invocable/83395.cc
new file mode 100644
index 000..7fb1368cd09
--- /dev/null
+++ b/libstdc++-v3/testsuite/20_util/is_invocable/83395.cc
@@ -0,0 +1,28 @@
+// Copyright (C) 2017 Free Software Foundation, Inc.
+//
+// This file is part of the GNU ISO C++ Library.  This library is free
+// software; you can redistribute it and/or modify it under the
+// terms of the GNU General Public License as published by the
+// Free Software Foundation; either version 3, or (at your option)
+// any later version.
+
+// This library is distributed in the hope that it will be useful,
+// but WITHOUT ANY WARRANTY; without even the implied warranty of
+// MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+// GNU General Public License for more details.
+
+// You should have received a copy of the GNU General Public License along
+// with this library; see the file COPYING3.  If not see
+// .
+
+// { dg-options "-std=gnu++17" }
+// { dg-do compile { target c++17 } }
+
+#include 
+
+// PR libstdc++/83395
+using F = int(*)();
+static_assert(std::is_invocable_r::value);
+static_assert(std::is_invocable_r::value);
+static_assert(std::is_invocable_r::value);
+static_assert(std::is_invocable_r::value);
diff --git a/libstdc++-v3/testsuite/20_util/is_nothrow_invocable/83395.cc 
b/libstdc++-v3/testsuite/20_util/is_nothrow_invocable/83395.cc
new file mode 100644
index 000..2a253df8d1b
--- /dev/null
+++ b/libstdc++-v3/testsuite/20_util/is_nothrow_invocable/83395.cc
@@ -0,0 +1,28 @@
+// Copyright (C) 2017 Free Software Foundation, Inc.
+//
+// This file is part of the GNU ISO C++ Library.  This library is free
+// software; you can redistribute it and/or modify it under the
+// terms of the GNU General Public License as published by the
+// Free Software Foundation; either version 3, or (at your option)
+// any later version.
+
+// This library is distributed in the hope that it will be useful,
+// but WITHOUT ANY WARRANTY; without even the implied warranty of
+// MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+// GNU General Public License for more details.
+
+// You should have received a copy of the GNU General Public License along
+// with this library; see the file COPYING3.  If not see
+// .
+
+// { dg-options "-std=gnu++17" }
+// { dg-do compile { target c++17 } }
+
+#include 
+
+// PR libstdc++/83395
+using F = int(*)() noexcept;
+static_assert(std::is_nothrow_invocable_r::value);
+static_assert(std::is_nothrow_invocable_r::value);
+static_assert(std::is_nothrow_in

Re: [PATCH][Middle-end]79538 missing -Wformat-overflow with %s and non-member array arguments

2017-12-12 Thread Qing Zhao
Hi, Martin,

thanks for the suggestion, this might be a good enhancement for 
get_range_strlen for a future work. 

my understanding is,  the current get_range_strlen does not use value range 
info yet, and also does not handle VLA. 
we can improve it from both aspects in a later work.

Qing

>> 
>> Per your comments, the updated gimple-fold.c is like the following:
> 
> FWIW, I suspect Richard is thinking of VLAs with the INTEGER_CST
> comment above.  This is the case we discussed in private.  It is
> handled either way but the handling could be improved by determining
> the size of the VLA from the first __builtin_alloca_with_align
> argument and using it or its range as the minimum and maximum.
> With that, the range of string lengths that fit in vla in
> the following could be determined and the sprintf call could
> be diagnosed:
> 
>  char d[30];
> 
>  void f (unsigned n)
>  {
>if (n < 32)
>  {
>char vla[n];
>__builtin_sprintf (d, "%s", vla);
>  }
>  }
> 
> I think this would be a nice enhancement to add on top of yours,
> not just for VLAs but for dynamically allocated arrays as well,
> and not just for the sprintf pass but also for the strlen pass
> to optimize cases like:
> 
>  void f (unsigned n)
>  {
>if (n < 32)
>  {
>char vla[n];
> 
>fgets (vla, n, stdin);
> 
>unsigned len = strlen (vla);
>if (len >= n)   // cannot hold
>  abort (); // can be eliminated
>   }
>}
> 
> That said, at some point, it might make more sense to change
> those passes to start tracking these things as they traverse
> the CFG rather than having get_range_strlen() do the work.
> 
> Martin
> 
>> 
>> diff --git a/gcc/gimple-fold.c b/gcc/gimple-fold.c
>> index 353a46e..0500fba 100644
>> --- a/gcc/gimple-fold.c
>> +++ b/gcc/gimple-fold.c
>> @@ -1323,6 +1323,19 @@ get_range_strlen (tree arg, tree length[2], bitmap 
>> *visited, int type,
>>   the array could have zero length.  */
>>*minlen = ssize_int (0);
>>  }
>> +
>> +  if (VAR_P (arg)
>> +  && TREE_CODE (TREE_TYPE (arg)) == ARRAY_TYPE)
>> +{
>> +  val = TYPE_SIZE_UNIT (TREE_TYPE (arg));
>> +  if (!val || TREE_CODE (val) != INTEGER_CST || integer_zerop 
>> (val))
>> +return false;
>> +  val = fold_build2 (MINUS_EXPR, TREE_TYPE (val), val,
>> + build_int_cst (TREE_TYPE (val), 1));
>> +  /* Set the minimum size to zero since the string in
>> + the array could have zero length.  */
>> +  *minlen = ssize_int (0);
>> +}
>>  }
>> 
>>   if (!val)
>> 
>> let me know any further issue with the above.
>> 
>> thanks a lot.
>> 
>> Qing



Re: [PATCH] Fix Bug 83237 - Values returned by std::poisson_distribution are not distributed correctly

2017-12-12 Thread Michele Pezzutti

Hi.

Yes, I looked at the text before submitting the patch.

I contacted Devroye and he confirmed that another reader had also 
pointed out this bug but not the solution. I sent him my proposed patch, 
he will look into it (no idea when though).


I would state that "comparison function for x = 1 is e^(1/78)" (which 
becomes 1/78 as the algorithm uses log-probabilities).


I think the change is needed because otherwise, for that particular bin, 
the rejection probability is lower than it should be, resulting in a 
higher number of samples.


Regarding your second point, I understand what you mean and I would be 
glad to help.



On 12/12/2017 01:51 AM, Paolo Carlini wrote:

Hi,

On 11/12/2017 23:16, Michele Pezzutti wrote:

I lowered to N = 250 and still fails with a good margin.
Good. At the moment however, I think we need a bit of rationale for 
the change that you are proposing, what would you put in a comment in 
the code? It's been a while since the last time I looked into these 
algorithms, is there a simple way to explain why the change is needed 
within the basic rejection method proposed by Devroye? Devroye's book 
is freely available, have you been able to study the relevant bits 
already? (http://www.nrbook.com/devroye/). He is also very 
approachable in private email, if I remember correctly.


Eventually, we could also agree on a good way to extend the coverage 
of the testing, maybe for gcc8 simply add the testcase, but then, for 
gcc9 I think we could extend it quite a bit in a consistent way, 
something like a grid from 1.0 to 50 step 1.0 with an increased N. 
Better if we figure out something that looks generic but would also 
have caught anyway 83237, if you see what I mean. I can take care of 
that. For the other discrete distributions too of course.


Thanks a lot for your help!

Paolo.





Re: [SFN] Bootstrap broken

2017-12-12 Thread Rainer Orth
Hi David,

> Something in this series broke bootstrap on AIX, probably Power in general.

I'm seeing the same in a sparc-sun-solaris2.11 bootstrap.

Rainer


> /nasfarm/edelsohn/src/src/gcc/gimple-pretty-print.c: In function 'void
> pp_gimple_stmt_1(pretty_printer*, gimple*, int, dump_flags_t)':
> /nasfarm/edelsohn/src/src/gcc/gimple-pretty-print.c:2645:1: error:
> invalid rtl sharing found in the insn
>  }
>  ^
> (debug_insn 5082 5081 5083 (var_location:DI flags (concatn/v:DI [
> (reg:SI 1564 [ flags ])
> (reg:SI 1565 [ flags+4 ])
> ])) "/nasfarm/edelsohn/src/src/gcc/gimple-pretty-print.c":2583 -1
>  (nil))
> /nasfarm/edelsohn/src/src/gcc/gimple-pretty-print.c:2645:1: error: shared rtx
> (concatn/v:DI [
> (reg:SI 1564 [ flags ])
> (reg:SI 1565 [ flags+4 ])
> ])
> during RTL pass: outof_cfglayout
> /nasfarm/edelsohn/src/src/gcc/gimple-pretty-print.c:2645:1: internal
> compiler error: internal consistency failure

-- 
-
Rainer Orth, Center for Biotechnology, Bielefeld University


Re: [PATCH] [GOLD] Add plugin API for processing plugin-added input files

2017-12-12 Thread Stephen Crane
Thank you so much. I just added documentation for the new hook to
https://gcc.gnu.org/wiki/whopr/driver so that should finish off this
feature.

I'm happy to fix any bugs that might crop up related to this hook, of
course. I don't follow the binutils list closely, so I apologize in
advance if I miss any issues related to this feature. Just CC me
explicitly and I'll jump on it.

Thanks,
- stephen

On Mon, Dec 11, 2017 at 3:03 PM, Sriraman Tallam  wrote:
> On Mon, Dec 11, 2017 at 2:16 PM, Stephen Crane  wrote:
>> Thanks for committing the GCC portion and following up on this. I had
>> been meaning to write and ask. I don't have commit privs for binutils,
>> so either you or Cary will have to commit the binutils patch as well,
>> if it's not too much trouble. I think much has changed to need a
>> rebase?
>
> I just committed your patch.  I had to make one very minor change to
> plugin_new_section_layout.c to compile, move the loop initialization
> declaration outside as that is not allowed on C.  I tested the patch.
>
> Thanks
> Sri
>
>>
>> Thanks,
>> Stephen
>>
>> On Mon, Dec 11, 2017 at 2:10 PM, Sriraman Tallam  wrote:
>>> On Thu, Nov 9, 2017 at 9:04 PM, Cary Coutant  wrote:
> include/ChangeLog:
> 2017-11-09  Stephen Crane  
>
> * plugin-api.h: Add new plugin hook to allow processing of input
> files added by a plugin.
> (ld_plugin_new_input_handler): New funcion hook type.
> (ld_plugin_register_new_input): New interface.
> (LDPT_REGISTER_NEW_INPUT_HOOK): New enum val.
> (tv_register_new_input): New member.
>
>
> gold/ChangeLog:
> 2017-11-09  Stephen Crane  
>
> * plugin.cc (Plugin::load): Include hooks for register_new_input
> in transfer vector.
> (Plugin::new_input): New function.
> (register_new_input): New function.
> (Plugin_manager::claim_file): Call Plugin::new_input if in
> replacement phase.
> * plugin.h (Plugin::set_new_input_handler): New function.
> * testsuite/plugin_new_section_layout.c: New plugin to test
> new_input plugin API.
> * testsuite/plugin_final_layout.sh: Add new input test.
> * testsuite/Makefile.am (plugin_layout_new_file): New test case.
> * testsuite/Makefile.in: Regenerate.

 These are OK. Thanks!

 Sri, I'm out of town through 11/18, and won't be able to commit the
 include/ patch to GCC before Stage 1 ends. Can you take care of it?
 (If not, I'll take care of it when I get back -- it was approved
 during Stage 1, so I think it's OK to commit early in Stage 3,
 especially since it's nothing but new declarations.)
>>>
>>> Stephen, I was looking at binutils and realized this patch has not
>>> been committed yet.  I only committed the GCC portion, plugin-api.h.
>>>
>>> Thanks
>>> Sri
>>>

 -cary


Re: [SFN] Bootstrap broken

2017-12-12 Thread David Edelsohn
Rainer,

PR83396 opened.  you can add Solaris to the list of targets.

- David

On Tue, Dec 12, 2017 at 1:49 PM, Rainer Orth
 wrote:
> Hi David,
>
>> Something in this series broke bootstrap on AIX, probably Power in general.
>
> I'm seeing the same in a sparc-sun-solaris2.11 bootstrap.
>
> Rainer
>
>
>> /nasfarm/edelsohn/src/src/gcc/gimple-pretty-print.c: In function 'void
>> pp_gimple_stmt_1(pretty_printer*, gimple*, int, dump_flags_t)':
>> /nasfarm/edelsohn/src/src/gcc/gimple-pretty-print.c:2645:1: error:
>> invalid rtl sharing found in the insn
>>  }
>>  ^
>> (debug_insn 5082 5081 5083 (var_location:DI flags (concatn/v:DI [
>> (reg:SI 1564 [ flags ])
>> (reg:SI 1565 [ flags+4 ])
>> ])) "/nasfarm/edelsohn/src/src/gcc/gimple-pretty-print.c":2583 -1
>>  (nil))
>> /nasfarm/edelsohn/src/src/gcc/gimple-pretty-print.c:2645:1: error: shared rtx
>> (concatn/v:DI [
>> (reg:SI 1564 [ flags ])
>> (reg:SI 1565 [ flags+4 ])
>> ])
>> during RTL pass: outof_cfglayout
>> /nasfarm/edelsohn/src/src/gcc/gimple-pretty-print.c:2645:1: internal
>> compiler error: internal consistency failure
>
> --
> -
> Rainer Orth, Center for Biotechnology, Bielefeld University


[PATCH] Fix IPA flattening ICE (PR ipa/82801)

2017-12-12 Thread Jakub Jelinek
Hi!

We ICE on the following testcase, because we first gather all cgraph nodes
in an array and then we walk it backwards and flatten_function anything
that has "flatten" attribute.  But flatten_function can result in cgraph
node removal and so we then try to dereference removed cgraph nodes.

>From the array we are only interested in nodes with flatten attribute, so
the patch ignores all other nodes (thus the common case of no flatten
attributes in the TU is fast) by keeping them at the end of the order array
only, and if we have at least 2 nodes to flatten, we additionally register
a removal hook just in case a flatten node would be removed (the testcase
has a non-flatten node removed only).  

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

2017-12-12  Jakub Jelinek  

PR ipa/82801
* ipa-inline.c (flatten_remove_node_hook): New function.
(ipa_inline): Keep only nodes with flatten attribute at the end of
the array in the order from ipa_reverse_postorder, only walk that
portion of array for flattening, if there is more than one such
node, temporarily register a removal hook and ignore removed nodes.

* g++.dg/ipa/pr82801.C: New test.

--- gcc/ipa-inline.c.jj 2017-12-05 10:21:18.0 +0100
+++ gcc/ipa-inline.c2017-12-12 11:40:37.396036390 +0100
@@ -2338,6 +2338,19 @@ dump_inline_stats (void)
   (int) reason[i][1], reason_freq[i].to_double (), reason[i][0]);
 }
 
+/* Called when node is removed.  */
+
+static void
+flatten_remove_node_hook (struct cgraph_node *node, void *data)
+{
+  if (lookup_attribute ("flatten", DECL_ATTRIBUTES (node->decl)) == NULL)
+return;
+
+  hash_set *removed
+= (hash_set *) data;
+  removed->add (node);
+}
+
 /* Decide on the inlining.  We do so in the topological order to avoid
expenses on updating data structures.  */
 
@@ -2347,7 +2360,7 @@ ipa_inline (void)
   struct cgraph_node *node;
   int nnodes;
   struct cgraph_node **order;
-  int i;
+  int i, j;
   int cold;
   bool remove_functions = false;
 
@@ -2380,26 +2393,56 @@ ipa_inline (void)
   if (dump_file)
 fprintf (dump_file, "\nFlattening functions:\n");
 
+  /* First shrink order array, so that it only contains nodes with
+ flatten attribute.  */
+  for (i = nnodes - 1, j = i; i >= 0; i--)
+{
+  node = order[i];
+  if (lookup_attribute ("flatten",
+   DECL_ATTRIBUTES (node->decl)) != NULL)
+   order[j--] = order[i];
+}
+
+  /* After this loop, order[j + 1] ... order[nnodes - 1] contain
+ nodes with flatten attribute.  If there is more than one such
+ node, we need to register a node removal hook, as flatten_function
+ could remove other nodes with flatten attribute.  See PR82801.  */
+  struct cgraph_node_hook_list *node_removal_hook_holder = NULL;
+  hash_set *flatten_removed_nodes = NULL;
+  if (j < nnodes - 2)
+{
+  flatten_removed_nodes = new hash_set;
+  node_removal_hook_holder
+   = symtab->add_cgraph_removal_hook (&flatten_remove_node_hook,
+  flatten_removed_nodes);
+}
+
   /* In the first pass handle functions to be flattened.  Do this with
  a priority so none of our later choices will make this impossible.  */
-  for (i = nnodes - 1; i >= 0; i--)
+  for (i = nnodes - 1; i > j; i--)
 {
   node = order[i];
+  if (flatten_removed_nodes
+ && flatten_removed_nodes->contains (node))
+   continue;
 
   /* Handle nodes to be flattened.
 Ideally when processing callees we stop inlining at the
 entry of cycles, possibly cloning that entry point and
 try to flatten itself turning it into a self-recursive
 function.  */
-  if (lookup_attribute ("flatten",
-   DECL_ATTRIBUTES (node->decl)) != NULL)
-   {
- if (dump_file)
-   fprintf (dump_file,
-"Flattening %s\n", node->name ());
- flatten_function (node, false);
-   }
+  if (dump_file)
+   fprintf (dump_file, "Flattening %s\n", node->name ());
+  flatten_function (node, false);
 }
+
+  if (j < nnodes - 2)
+{
+  symtab->remove_cgraph_removal_hook (node_removal_hook_holder);
+  delete flatten_removed_nodes;
+}
+  free (order);
+
   if (dump_file)
 dump_overall_stats ();
 
@@ -2411,7 +2454,6 @@ ipa_inline (void)
  inline functions and virtual functions so we really know what is called
  once.  */
   symtab->remove_unreachable_nodes (dump_file);
-  free (order);
 
   /* Inline functions with a property that after inlining into all callers the
  code size will shrink because the out-of-line copy is eliminated. 
--- gcc/testsuite/g++.dg/ipa/pr82801.C.jj   2017-12-12 11:55:56.879092669 
+0100
+++ gcc/testsuite/g++.dg/ipa/pr82801.C  2017-12-12 11:52:36.0 +0100
@@ -0,0 +1,20 @@
+// PR ipa/82801
+// { dg-do compile }
+// { dg-options "-O2 -Wno-attr

[C++ PATCH] Avoid ICE due to the attribute exclusion additions (PR c++/83322)

2017-12-12 Thread Jakub Jelinek
Hi!

This patch avoids ICEs when last_decl isn't a decl.

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

Though, I believe it would be better to do the attribute exclusions check
in duplicate_decls instead if the attributes don't appear together already
on a single decl, instead of trying to do another lookup.  I've filed a PR
where it makes a difference.

2017-12-12  Jakub Jelinek  

PR c++/83322
* decl2.c (cplus_decl_attributes): If last_decl is BASELINK, use
BASELINK_FUNCTIONS.  If after OVERLOAD checking last_decl isn't
a decl, clear it.

* g++.dg/warn/pr83322.C: New test.

--- gcc/cp/decl2.c.jj   2017-12-07 18:04:58.0 +0100
+++ gcc/cp/decl2.c  2017-12-12 15:40:33.634654947 +0100
@@ -1486,6 +1486,9 @@ cplus_decl_attributes (tree *decl, tree
   tree last_decl = (DECL_P (*decl) && DECL_NAME (*decl)
? lookup_name (DECL_NAME (*decl)) : NULL_TREE);
 
+  if (last_decl && BASELINK_P (last_decl))
+   last_decl = BASELINK_FUNCTIONS (last_decl);
+
   if (last_decl && TREE_CODE (last_decl) == OVERLOAD)
for (ovl_iterator iter (last_decl, true); ; ++iter)
  {
@@ -1504,6 +1507,8 @@ cplus_decl_attributes (tree *decl, tree
break;
  }
  }
+  if (last_decl && !DECL_P (last_decl))
+   last_decl = NULL_TREE;
 
   decl_attributes (decl, attributes, flags, last_decl);
 }
--- gcc/testsuite/g++.dg/warn/pr83322.C.jj  2017-12-12 15:40:01.054047913 
+0100
+++ gcc/testsuite/g++.dg/warn/pr83322.C 2017-12-12 15:39:13.0 +0100
@@ -0,0 +1,8 @@
+// PR c++/83322
+// { dg-do compile }
+// { dg-options "" }
+
+struct A {
+  template  operator T ();
+  __attribute__((__always_inline__)) operator int ();
+};

Jakub


[PR C++/15272] lookups with ambiguating dependent base

2017-12-12 Thread Nathan Sidwell
When we find a member fn in a non-dependent base during parsing a 
template, we shouldn't repeat the lookup during instantiation.  This 
means we won't find an ambiguating other member fn in a now-instantiated 
base.  Unfortunately we were doing that and rejecting valid programs.


This patch fixes that, by only doing the field lookup when the baselink 
is sufficiently dependent -- tsubsting the containing binfo is not a 
NOP.  Othewise we simply map the parsed baselink into the instantiated 
hierarchy via the existing adjust_result_of_qualified_name_lookup.


We already had a testcase, but it erroneously expected the ambiguous 
message.


I punted on adding a warning if the instantiation lookup would find 
something different.


nathan

--
Nathan Sidwell
2017-12-12  Nathan Sidwell  

	PR c++/15272
	* pt.c (tsubst_baselink): Don't repeat the lookup for
	non-dependent baselinks.

	PR c++/15272
	* g++.dg/template/pr71826.C: Adjust for 15272 fix.

Index: cp/pt.c
===
--- cp/pt.c	(revision 255581)
+++ cp/pt.c	(working copy)
@@ -14408,19 +14408,15 @@ tsubst (tree t, tree args, tsubst_flags_
 }
 
 /* tsubst a BASELINK.  OBJECT_TYPE, if non-NULL, is the type of the
-   expression on the left-hand side of the "." or "->" operator.  A
-   baselink indicates a function from a base class.  Both the
-   BASELINK_ACCESS_BINFO and the base class referenced may indicate
-   bases of the template class, rather than the instantiated class.
-   In addition, lookups that were not ambiguous before may be
-   ambiguous now.  Therefore, we perform the lookup again.  */
+   expression on the left-hand side of the "." or "->" operator.  We
+   only do the lookup if we had a dependent BASELINK.  Otherwise we
+   adjust it onto the instantiated heirarchy.  */
 
 static tree
 tsubst_baselink (tree baselink, tree object_type,
 		 tree args, tsubst_flags_t complain, tree in_decl)
 {
-  bool qualified = BASELINK_QUALIFIED_P (baselink);
-
+  bool qualified_p = BASELINK_QUALIFIED_P (baselink);
   tree qualifying_scope = BINFO_TYPE (BASELINK_ACCESS_BINFO (baselink));
   qualifying_scope = tsubst (qualifying_scope, args, complain, in_decl);
 
@@ -14440,24 +14436,43 @@ tsubst_baselink (tree baselink, tree obj
 	  complain, in_decl);
 }
 
-  tree name = OVL_NAME (fns);
-  if (IDENTIFIER_CONV_OP_P (name))
-name = make_conv_op_name (optype);
+  tree binfo_type = BINFO_TYPE (BASELINK_BINFO (baselink));
+  binfo_type = tsubst (binfo_type, args, complain, in_decl);
+  bool dependent_p = binfo_type != BINFO_TYPE (BASELINK_BINFO (baselink));
 
-  baselink = lookup_fnfields (qualifying_scope, name, /*protect=*/1);
-  if (!baselink)
+  if (dependent_p)
 {
-  if ((complain & tf_error) && constructor_name_p (name, qualifying_scope))
-	error ("cannot call constructor %<%T::%D%> directly",
-	   qualifying_scope, name);
-  return error_mark_node;
+  tree name = OVL_NAME (fns);
+  if (IDENTIFIER_CONV_OP_P (name))
+	name = make_conv_op_name (optype);
+
+  if (name == complete_dtor_identifier)
+	/* Treat as-if non-dependent below.  */
+	dependent_p = false;
+
+  baselink = lookup_fnfields (qualifying_scope, name, /*protect=*/1);
+  if (!baselink)
+	{
+	  if ((complain & tf_error)
+	  && constructor_name_p (name, qualifying_scope))
+	error ("cannot call constructor %<%T::%D%> directly",
+		   qualifying_scope, name);
+	  return error_mark_node;
+	}
+
+  if (BASELINK_P (baselink))
+	fns = BASELINK_FUNCTIONS (baselink);
+}
+  else
+{
+  gcc_assert (optype == BASELINK_OPTYPE (baselink));
+  /* We're going to overwrite pieces below, make a duplicate.  */
+  baselink = copy_node (baselink);
 }
 
   /* If lookup found a single function, mark it as used at this point.
- (If it lookup found multiple functions the one selected later by
+ (If lookup found multiple functions the one selected later by
  overload resolution will be marked as used at that point.)  */
-  if (BASELINK_P (baselink))
-fns = BASELINK_FUNCTIONS (baselink);
   if (!template_id_p && !really_overloaded_fn (fns)
   && !mark_used (OVL_FIRST (fns), complain) && !(complain & tf_error))
 return error_mark_node;
@@ -14467,8 +14482,7 @@ tsubst_baselink (tree baselink, tree obj
   /* Add back the template arguments, if present.  */
   if (template_id_p)
 	BASELINK_FUNCTIONS (baselink)
-	  = build2 (TEMPLATE_ID_EXPR, unknown_type_node,
-		BASELINK_FUNCTIONS (baselink), template_args);
+	  = build2 (TEMPLATE_ID_EXPR, unknown_type_node, fns, template_args);
 
   /* Update the conversion operator type.  */
   BASELINK_OPTYPE (baselink) = optype;
@@ -14477,12 +14491,12 @@ tsubst_baselink (tree baselink, tree obj
   if (!object_type)
 object_type = current_class_type;
 
-  if (qualified || name == complete_dtor_identifier)
+  if (qualified_p || !dependent_p)
 {
   baselink = adjust_result_of_qualified_name_lo

[PATCH][rs6000][PR target/82190] fix mem size info in rtl generated by memcmp and strncmp/strcmp builtin expansion

2017-12-12 Thread Aaron Sawdey
In the code I put in gcc7 for expanding memcmp and strcmp/strncmp as
builtins, I used set_mem_size to set the size of loads to only the
bytes I was actually going to compare. However this is really incorrect
and the test case for 82190 was failing because alias analysis believed
the bogus size and though there was no conflict between an 8byte load
used for comparing 6 bytes and a later store to the 7th byte. As a
result it eliminated that load from the 7 byte compare that followed
later.

This patch changes the set_mem_size calls in expand_block_move and
expand_strn_compare to set the size to the size of the load being done
regardless of how many bytes are being used.

OK for trunk if bootstrap/regtest passes on ppc64le?

2017-12-12  Aaron Sawdey  

PR target/82190
* config/rs6000/rs6000-string.c (expand_block_move,
expand_strn_compare): fix set_mem_size() calls.

-- 
Aaron Sawdey, Ph.D.  acsaw...@linux.vnet.ibm.com
050-2/C113  (507) 253-7520 home: 507/263-0782
IBM Linux Technology Center - PPC ToolchainIndex: gcc/config/rs6000/rs6000-string.c
===
--- gcc/config/rs6000/rs6000-string.c	(revision 255585)
+++ gcc/config/rs6000/rs6000-string.c	(working copy)
@@ -1247,6 +1247,9 @@
   if (bytes > rs6000_block_move_inline_limit)
 return 0;
 
+  bool isP8 = (rs6000_cpu == PROCESSOR_POWER8);
+  bool isP9 = (rs6000_cpu == PROCESSOR_POWER9);
+
   for (offset = 0; bytes > 0; offset += move_bytes, bytes -= move_bytes)
 {
   union {
@@ -1258,7 +1261,7 @@
 
   /* Altivec first, since it will be faster than a string move
 	 when it applies, and usually not significantly larger.  */
-  if (TARGET_ALTIVEC && bytes >= 16 && align >= 128)
+  if (TARGET_ALTIVEC && bytes >= 16 && (isP8 || isP9 || align >= 128))
 	{
 	  move_bytes = 16;
 	  mode = V4SImode;
Index: gcc/testsuite/gcc.dg/pr82190.c
===
--- gcc/testsuite/gcc.dg/pr82190.c	(nonexistent)
+++ gcc/testsuite/gcc.dg/pr82190.c	(working copy)
@@ -0,0 +1,22 @@
+/* PR target/82190 */
+/* { dg-do run } */
+/* { dg-options "-O2 -fno-optimize-strlen -fweb" } */
+
+char src[64] __attribute__ ((aligned)) = "aaa";
+char dst[64] __attribute__ ((aligned));
+
+int
+main ()
+{
+  __builtin_memcpy (dst, src, 6);
+  if (__builtin_memcmp (dst, src, 6))
+__builtin_abort ();
+
+  __builtin_memcpy (dst, src, 7);
+  if (__builtin_memcmp (dst, src, 7))
+__builtin_abort ();
+
+  return 0;
+}
+
+


Re: [PATCH][rs6000][PR target/82190] fix mem size info in rtl generated by memcmp and strncmp/strcmp builtin expansion

2017-12-12 Thread Jakub Jelinek
On Tue, Dec 12, 2017 at 01:40:41PM -0600, Aaron Sawdey wrote:
> 2017-12-12  Aaron Sawdey  
> 
>   PR target/82190
>   * config/rs6000/rs6000-string.c (expand_block_move,
>   expand_strn_compare): fix set_mem_size() calls.

That should be capitalized: Fix instead of fix

> --- gcc/config/rs6000/rs6000-string.c (revision 255585)
> +++ gcc/config/rs6000/rs6000-string.c (working copy)
> @@ -1247,6 +1247,9 @@
>if (bytes > rs6000_block_move_inline_limit)
>  return 0;
>  
> +  bool isP8 = (rs6000_cpu == PROCESSOR_POWER8);
> +  bool isP9 = (rs6000_cpu == PROCESSOR_POWER9);
> +
>for (offset = 0; bytes > 0; offset += move_bytes, bytes -= move_bytes)
>  {
>union {
> @@ -1258,7 +1261,7 @@
>  
>/* Altivec first, since it will be faster than a string move
>when it applies, and usually not significantly larger.  */
> -  if (TARGET_ALTIVEC && bytes >= 16 && align >= 128)
> +  if (TARGET_ALTIVEC && bytes >= 16 && (isP8 || isP9 || align >= 128))
>   {
> move_bytes = 16;
> mode = V4SImode;

Is this the patch you meant to attach?  First of all, it only changes
expand_block_move, not expand_strn_compare, and the change seems more
like an optimization of P8/P9 rather than actual fix (otherwise, wouldn't
it fail on say P7?).

> +  return 0;
> +}
> +
> +

Please avoid unnecessary trailing whitespace.

Jakub


Re: [C++ PATCH] Avoid ICE due to the attribute exclusion additions (PR c++/83322)

2017-12-12 Thread Martin Sebor

On 12/12/2017 12:01 PM, Jakub Jelinek wrote:

Hi!

This patch avoids ICEs when last_decl isn't a decl.

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

Though, I believe it would be better to do the attribute exclusions check
in duplicate_decls instead if the attributes don't appear together already
on a single decl, instead of trying to do another lookup.  I've filed a PR
where it makes a difference.


The reason for implementing attribute checking in the middle-end
is so that it can be done consistently across all front-ends and
back-ends, without back-end maintainers having to change the front
end code, and so that conflicting attributes can be dropped before
they are applied.  I went into more detail in my reply to Jason
here:

  https://gcc.gnu.org/ml/gcc-patches/2017-08/msg01371.html

I'm just starting to look into the bug you filed (83394) but from
what I've seen so far it seems to be a problem with calling
lookup_name() to find the already declared class member conversion
operator.  I'm sure there's a way to look it up and fix the bug
without changing the fundamental design of the improvement.

Martin



Re: [PATCH][rs6000][PR target/82190] fix mem size info in rtl generated by memcmp and strncmp/strcmp builtin expansion

2017-12-12 Thread Aaron Sawdey
On Tue, 2017-12-12 at 20:50 +0100, Jakub Jelinek wrote:
> On Tue, Dec 12, 2017 at 01:40:41PM -0600, Aaron Sawdey wrote:
> > 2017-12-12  Aaron Sawdey  
> > 
> > PR target/82190
> > * config/rs6000/rs6000-string.c (expand_block_move,
> > expand_strn_compare): fix set_mem_size() calls.
> 
> That should be capitalized: Fix instead of fix

[wrong patch deleted]

> Is this the patch you meant to attach?  First of all, it only changes
> expand_block_move, not expand_strn_compare, and the change seems more
> like an optimization of P8/P9 rather than actual fix (otherwise,
> wouldn't
> it fail on say P7?).
> 
> > +  return 0;
> > +}
> > +
> > +
> 
> Please avoid unnecessary trailing whitespace.
> 

Jakub,
  Yes that is a different patch unrelated to the 82190 fix. I've
attached the correct patch.

Thanks!
   Aaron

-- 
Aaron Sawdey, Ph.D.  acsaw...@linux.vnet.ibm.com
050-2/C113  (507) 253-7520 home: 507/263-0782
IBM Linux Technology Center - PPC ToolchainIndex: gcc/config/rs6000/rs6000-string.c
===
--- gcc/config/rs6000/rs6000-string.c   (revision 255585)
+++ gcc/config/rs6000/rs6000-string.c   (working copy)
@@ -459,7 +459,7 @@
  rtx src1_reg = copy_addr_to_reg (XEXP (src1, 0));
  src1 = replace_equiv_address (src1, src1_reg);
}
-  set_mem_size (src1, cmp_bytes);
+  set_mem_size (src1, load_mode_size);
 
   if (!REG_P (XEXP (src2, 0)))
{
@@ -466,7 +466,7 @@
  rtx src2_reg = copy_addr_to_reg (XEXP (src2, 0));
  src2 = replace_equiv_address (src2, src2_reg);
}
-  set_mem_size (src2, cmp_bytes);
+  set_mem_size (src2, load_mode_size);
 
   do_load_for_compare (tmp_reg_src1, src1, load_mode);
   do_load_for_compare (tmp_reg_src2, src2, load_mode);
@@ -937,7 +937,7 @@
  rtx src1_reg = copy_addr_to_reg (XEXP (src1, 0));
  src1 = replace_equiv_address (src1, src1_reg);
}
-  set_mem_size (src1, cmp_bytes);
+  set_mem_size (src1, load_mode_size);
 
   if (!REG_P (XEXP (src2, 0)))
{
@@ -944,7 +944,7 @@
  rtx src2_reg = copy_addr_to_reg (XEXP (src2, 0));
  src2 = replace_equiv_address (src2, src2_reg);
}
-  set_mem_size (src2, cmp_bytes);
+  set_mem_size (src2, load_mode_size);
 
   do_load_for_compare (tmp_reg_src1, src1, load_mode);
   do_load_for_compare (tmp_reg_src2, src2, load_mode);
@@ -1096,7 +1096,7 @@
  rtx src1_reg = copy_addr_to_reg (XEXP (src1, 0)); 
  src1 = replace_equiv_address (src1, src1_reg);
}
-  set_mem_size (src1, cmp_bytes);
+  set_mem_size (src1, load_mode_size); 
 
   if (!REG_P (XEXP (src2, 0)))
{
@@ -1103,7 +1103,7 @@
  rtx src2_reg = copy_addr_to_reg (XEXP (src2, 0));
  src2 = replace_equiv_address (src2, src2_reg);
}
-  set_mem_size (src2, cmp_bytes);
+  set_mem_size (src2, load_mode_size);
 
   /* Construct call to strcmp/strncmp to compare the rest of the string.  */
   if (no_length)
Index: gcc/testsuite/gcc.dg/pr82190.c
===
--- gcc/testsuite/gcc.dg/pr82190.c	(nonexistent)
+++ gcc/testsuite/gcc.dg/pr82190.c	(working copy)
@@ -0,0 +1,20 @@
+/* PR target/82190 */
+/* { dg-do run } */
+/* { dg-options "-O2 -fno-optimize-strlen -fweb" } */
+
+char src[64] __attribute__ ((aligned)) = "aaa";
+char dst[64] __attribute__ ((aligned));
+
+int
+main ()
+{
+  __builtin_memcpy (dst, src, 6);
+  if (__builtin_memcmp (dst, src, 6))
+__builtin_abort ();
+
+  __builtin_memcpy (dst, src, 7);
+  if (__builtin_memcmp (dst, src, 7))
+__builtin_abort ();
+
+  return 0;
+}


Re: [C++ PATCH] Avoid ICE due to the attribute exclusion additions (PR c++/83322)

2017-12-12 Thread Jakub Jelinek
On Tue, Dec 12, 2017 at 12:55:29PM -0700, Martin Sebor wrote:
> On 12/12/2017 12:01 PM, Jakub Jelinek wrote:
> > Hi!
> > 
> > This patch avoids ICEs when last_decl isn't a decl.
> > 
> > Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?
> > 
> > Though, I believe it would be better to do the attribute exclusions check
> > in duplicate_decls instead if the attributes don't appear together already
> > on a single decl, instead of trying to do another lookup.  I've filed a PR
> > where it makes a difference.
> 
> The reason for implementing attribute checking in the middle-end
> is so that it can be done consistently across all front-ends and
> back-ends, without back-end maintainers having to change the front
> end code, and so that conflicting attributes can be dropped before
> they are applied.  I went into more detail in my reply to Jason
> here:
> 
>   https://gcc.gnu.org/ml/gcc-patches/2017-08/msg01371.html
> 
> I'm just starting to look into the bug you filed (83394) but from
> what I've seen so far it seems to be a problem with calling
> lookup_name() to find the already declared class member conversion
> operator.  I'm sure there's a way to look it up and fix the bug
> without changing the fundamental design of the improvement.

Then you need to duplicate the name lookup, which is quite complicated
and compile time expensive.  Especially in C++, there are too many ways
how attributes can be introduced, e.g. on templates (including late vs.
early attributes), their specializations etc., so I doubt a simple
lookup_name can do what the callers of duplicate_decls do.

Jakub


Re: [PATCH] Fix IPA flattening ICE (PR ipa/82801)

2017-12-12 Thread Andi Kleen
Jakub Jelinek  writes:

> Hi!
>
> We ICE on the following testcase, because we first gather all cgraph nodes
> in an array and then we walk it backwards and flatten_function anything
> that has "flatten" attribute.  But flatten_function can result in cgraph
> node removal and so we then try to dereference removed cgraph nodes.

I wonder if this fixes PR83346 too?

This is also about a flatten crash.

-Andi



Re: [PATCH][rs6000][PR target/82190] fix mem size info in rtl generated by memcmp and strncmp/strcmp builtin expansion

2017-12-12 Thread Segher Boessenkool
On Tue, Dec 12, 2017 at 02:01:16PM -0600, Aaron Sawdey wrote:
> > > 2017-12-12  Aaron Sawdey  
> > > 
> > >   PR target/82190
> > >   * config/rs6000/rs6000-string.c (expand_block_move,
> > >   expand_strn_compare): fix set_mem_size() calls.
> > 
> > That should be capitalized: Fix instead of fix
> 
> [wrong patch deleted]

>   Yes that is a different patch unrelated to the 82190 fix. I've
> attached the correct patch.

That looks better :-)  Okay for trunk, thanks!


Segher


[PATCH] set range for strlen(array) to avoid spurious -Wstringop-overflow (PR 83373 , PR 78450)

2017-12-12 Thread Martin Sebor

Bug 83373 - False positive reported by -Wstringop-overflow, is
another example of warning triggered by a missed optimization
opportunity, this time in the strlen pass.  The optimization
is discussed in pr78450 - strlen(s) return value can be assumed
to be less than the size of s.  The gist of it is that the result
of strlen(array) can be assumed to be less than the size of
the array (except in the corner case of last struct members).

To avoid the false positive the attached patch adds this
optimization to the strlen pass.  Although the patch passes
bootstrap and regression tests for all front-ends I'm not sure
the way it determines the upper bound of the range is 100%
correct for languages with arrays with a non-zero lower bound.
Maybe it's just not as tight as it could be.

Tested on x86_64-linux.

Martin
PR middle-end/83373 - False positive reported by -Wstringop-overflow
PR tree-optimization/78450 - strlen(s) return value can be assumed to be less than the size of s

gcc/ChangeLog:

	PR middle-end/83373
	PR tree-optimization/78450
	* tree-ssa-strlen.c (maybe_set_strlen_range): New function.
	(handle_builtin_strlen): Call it.


gcc/testsuite/ChangeLog:

	PR middle-end/83373
	PR tree-optimization/78450
	* gcc.dg/pr83373.c: New test.
	* gcc.dg/strlenopt-36.c: New test.
	* gcc.dg/strlenopt-37.c: New test.

diff --git a/gcc/testsuite/gcc.dg/pr83373.c b/gcc/testsuite/gcc.dg/pr83373.c
new file mode 100644
index 000..6b0de09
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/pr83373.c
@@ -0,0 +1,33 @@
+/* PR middle-end/83373 - False positive reported by -Wstringop-overflow
+   { dg-do compile }
+   { dg-options "-O2 -Wstringop-overflow" }  */
+
+typedef __SIZE_TYPE__ size_t;
+
+char buf[100];
+
+void get_data (char*);
+
+__attribute__ ((nonnull(1, 2)))
+inline char* my_strcpy (char* dst, const char* src, size_t size)
+{
+  size_t len = __builtin_strlen (src);
+  if (len < size)
+__builtin_memcpy (dst, src, len + 1);
+  else
+{
+  __builtin_memcpy (dst, src, size - 1); /* { dg-bogus "\\\[-Wstringop-oveflow]" } */
+  dst[size - 1] = '\0';
+}
+
+  return dst;
+}
+
+void test(void)
+{
+  char data[20] = "12345";
+
+  get_data (data);
+
+  my_strcpy (buf, data, sizeof buf);
+}
diff --git a/gcc/testsuite/gcc.dg/strlenopt-36.c b/gcc/testsuite/gcc.dg/strlenopt-36.c
new file mode 100644
index 000..d6fcca2
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/strlenopt-36.c
@@ -0,0 +1,86 @@
+/* PR tree-optimization/78450 - strlen(s) return value can be assumed
+   to be less than the size of s
+   { dg-do compile }
+   { dg-options "-O2 -fdump-tree-optimized" } */
+
+#include "strlenopt.h"
+
+extern char a7[7], a6[6], a5[5], a4[4], a3[3], a2[2], a1[1];
+extern char a0[0];   /* Intentionally not tested here.  */
+extern char ax[];/* Same.  */
+
+struct MemArrays {
+  char a7[7], a6[6], a5[5], a4[4], a3[3], a2[2], a1[1];
+  char a0[0];   /* Not tested here.  */
+};
+
+struct NestedMemArrays {
+  struct { char a7[7]; } ma7;
+  struct { char a6[6]; } ma6;
+  struct { char a5[5]; } ma5;
+  struct { char a4[4]; } ma4;
+  struct { char a3[3]; } ma3;
+  struct { char a2[2]; } ma2;
+  struct { char a1[1]; } ma1;
+  struct { char a0[0]; } ma0;
+  char last;
+};
+
+extern void failure_on_line (int);
+
+#define TEST_FAIL(line)	\
+  do {			\
+failure_on_line (line);\
+  } while (0)
+
+#define T(expr)		\
+  if (!(expr)) TEST_FAIL (__LINE__); else (void)0
+
+
+void test_array (void)
+{
+  T (strlen (a7) < sizeof a7);
+  T (strlen (a6) < sizeof a6);
+  T (strlen (a5) < sizeof a5);
+  T (strlen (a4) < sizeof a4);
+  T (strlen (a3) < sizeof a3);
+
+  /* The following two calls are folded too early which defeats
+ the strlen() optimization.
+T (strlen (a2) == 1);
+T (strlen (a1) == 0);  */
+}
+
+void test_memarray (struct MemArrays *ma)
+{
+  T (strlen (ma->a7) < sizeof ma->a7);
+  T (strlen (ma->a6) < sizeof ma->a6);
+  T (strlen (ma->a5) < sizeof ma->a5);
+  T (strlen (ma->a4) < sizeof ma->a4);
+  T (strlen (ma->a3) < sizeof ma->a3);
+
+  /* The following two calls are folded too early which defeats
+ the strlen() optimization.
+  T (strlen (ma->a2) == 1);
+  T (strlen (ma->a1) == 0);  */
+}
+
+/* Verify that the range of strlen(A) of a last struct member is
+   set even when the array is the sole member of a struct as long
+   as the struct itself is a member of another struct.  The converse
+   is tested in stlenopt-37.c.  */
+void test_nested_memarray (struct NestedMemArrays *ma)
+{
+  T (strlen (ma->ma7.a7) < sizeof ma->ma7.a7);
+  T (strlen (ma->ma6.a6) < sizeof ma->ma6.a6);
+  T (strlen (ma->ma5.a5) < sizeof ma->ma5.a5);
+  T (strlen (ma->ma4.a4) < sizeof ma->ma4.a4);
+  T (strlen (ma->ma3.a3) < sizeof ma->ma3.a3);
+
+  /* The following two calls are folded too early which defeats
+ the strlen() optimization.
+  T (strlen (ma->ma2.a2) == 1);
+  T (strlen (ma->ma1.a1) == 0);  */
+}
+
+/* { dg-final { scan-tree-dump-not "failure_on_line" "optimized" } } */
diff --git a/gcc/testsuite/

[C++ Patch PING] [C++ Patch] PR 82235 (Copy ctor is not found for copying array of an object when it's marked explicit)

2017-12-12 Thread Paolo Carlini

Hi,

On 15/11/2017 00:54, Mukesh Kapoor wrote:

Hi,

This patch fixes https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82235
For the following test case

struct Foo {
    Foo() {}
    explicit Foo(const Foo& aOther) {}
};
struct Bar {
    Foo m[1];
};
void test() {
    Bar a;
    Bar b = a;
}

the compiler issues an error when the compiler generated copy 
constructor of class Bar calls the explicit copy constructor of class 
Foo. The fix is to implement ISO C++/17 16.3.1.4 (over.match.copy) 
correctly.
I'm pinging this patch sent a while by Mukesh (I'm taking over from him 
about it). Any comments?


    https://gcc.gnu.org/ml/gcc-patches/2017-11/msg01133.html

Thanks!
Paolo.


Re: [PATCH] Fix IPA flattening ICE (PR ipa/82801)

2017-12-12 Thread Jakub Jelinek
On Tue, Dec 12, 2017 at 12:11:05PM -0800, Andi Kleen wrote:
> Jakub Jelinek  writes:
> 
> > Hi!
> >
> > We ICE on the following testcase, because we first gather all cgraph nodes
> > in an array and then we walk it backwards and flatten_function anything
> > that has "flatten" attribute.  But flatten_function can result in cgraph
> > node removal and so we then try to dereference removed cgraph nodes.
> 
> I wonder if this fixes PR83346 too?

Yes, it does.

Jakub


Re: [PATCH] Fix Bug 83237 - Values returned by std::poisson_distribution are not distributed correctly

2017-12-12 Thread Paolo Carlini

Hi,

On 12/12/2017 19:42, Michele Pezzutti wrote:

Hi.

Yes, I looked at the text before submitting the patch.

I contacted Devroye and he confirmed that another reader had also 
pointed out this bug but not the solution. I sent him my proposed 
patch, he will look into it (no idea when though).

Nice.
I would state that "comparison function for x = 1 is e^(1/78)" (which 
becomes 1/78 as the algorithm uses log-probabilities).


I think the change is needed because otherwise, for that particular 
bin, the rejection probability is lower than it should be, resulting 
in a higher number of samples.
Ok. Ideally I would be much less nervous about committing the patch if 
we either 1- Had Luc's explicit green light; 2- Were able to *rigorously 
deduce* within the framework of the book why the change is needed. That 
said, the patch makes sense to me and so far holds up well in all my 
tests (I'm currently running a full make check). I would say, let's wait 
a week or so and then make the final decision. Jon, do you agree? Ideas 
about further testing? (eg, some code you are aware of stressing Poisson?)


Thanks again,
Paolo.


[PATCH] vrp_prop: Use dom_walker for -Warray-bounds (PR tree-optimization/83312)

2017-12-12 Thread David Malcolm
PR tree-optimization/83312 reports a false positive from -Warray-bounds.
The root cause is that VRP both:

(a) updates a GIMPLE_COND to be always false, and

(b) updates an ARRAY_REF in the now-unreachable other path to use an
ASSERT_EXPR with a negative index:
  def_stmt j_6 = ASSERT_EXPR ;

When vrp_prop::check_all_array_refs () runs, the CFG hasn't yet been
updated to take account of (a), and so a false positive is emitted
when (b) is encountered.

This patch fixes the false warning by converting
  vrp_prop::check_all_array_refs
from a simple iteration over all BBs to use a new dom_walker subclass,
using the "skip_unreachable_blocks = true" mechanism to avoid analyzing
(b).

There didn't seem to be a pre-existing way to determine the unique
out-edge after a GIMPLE_COND (if it has a constant cond), so I added
a new gimple_cond_get_unique_successor_edge function.  Similarly,
something similar may apply for switches, so I put in a
gimple_get_unique_successor_edge (though I wasn't able to create a
reproducer that used a switch).

Successfully bootstrapped®rtested on x86_64-pc-linux-gnu.

OK for trunk?

gcc/ChangeLog:
PR tree-optimization/83312
* domwalk.h (dom_walker::dom_walker): Fix typo in comment.
* gimple.c: Include "tree-cfg.h".
(gimple_get_unique_successor_edge): New function.
(gimple_cond_get_unique_successor_edge): New function.
* gimple.h (gimple_get_unique_successor_edge): New decl.
(gimple_cond_get_unique_successor_edge): New decl.
* tree-vrp.c (class check_array_bounds_dom_walker): New subclass
of dom_walker.
(vrp_prop::check_all_array_refs): Reimplement as...
(check_array_bounds_dom_walker::before_dom_children): ...this new
vfunc.  Replace linear search through BB block list, excluding
those with non-executable in-edges, with dominator walk.

gcc/testsuite/ChangeLog:
PR tree-optimization/83312
* gcc.dg/pr83312.c: New test case.
---
 gcc/domwalk.h  |  2 +-
 gcc/gimple.c   | 36 +++
 gcc/gimple.h   |  2 ++
 gcc/testsuite/gcc.dg/pr83312.c | 30 
 gcc/tree-vrp.c | 80 +++---
 5 files changed, 120 insertions(+), 30 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/pr83312.c

diff --git a/gcc/domwalk.h b/gcc/domwalk.h
index 6ac93eb..c7e3450 100644
--- a/gcc/domwalk.h
+++ b/gcc/domwalk.h
@@ -32,7 +32,7 @@ class dom_walker
 public:
   static const edge STOP;
 
-  /* Use SKIP_UNREACHBLE_BLOCKS = true when your client can discover
+  /* Use SKIP_UNREACHABLE_BLOCKS = true when your client can discover
  that some edges are not executable.
 
  If a client can discover that a COND, SWITCH or GOTO has a static
diff --git a/gcc/gimple.c b/gcc/gimple.c
index c986a73..e22fcda 100644
--- a/gcc/gimple.c
+++ b/gcc/gimple.c
@@ -44,6 +44,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "stringpool.h"
 #include "attribs.h"
 #include "asan.h"
+#include "tree-cfg.h"
 
 
 /* All the tuples have their operand vector (if present) at the very bottom
@@ -3087,6 +3088,41 @@ gimple_inexpensive_call_p (gcall *stmt)
   return false;
 }
 
+/* If STMT terminates its basic block, determine if it has a uniquely
+   valid successor edge and if so return it.
+
+   Otherwise, return NULL.  */
+
+edge
+gimple_get_unique_successor_edge (const gimple *stmt)
+{
+  switch (gimple_code (stmt))
+{
+case GIMPLE_COND:
+  return gimple_cond_get_unique_successor_edge
+   (as_a  (stmt));
+default:
+  return NULL;
+}
+}
+
+/* Determine if COND has a uniquely valid successor edge and if so return it.
+
+   Otherwise, return NULL.  */
+
+edge
+gimple_cond_get_unique_successor_edge (const gcond *cond)
+{
+  edge te, fe;
+  extract_true_false_edges_from_block (gimple_bb (cond), &te, &fe);
+  if (gimple_cond_true_p (cond))
+return te;
+  else if (gimple_cond_false_p (cond))
+return fe;
+  else
+return NULL;
+}
+
 #if CHECKING_P
 
 namespace selftest {
diff --git a/gcc/gimple.h b/gcc/gimple.h
index 0fcdd05..ab4cb8b 100644
--- a/gcc/gimple.h
+++ b/gcc/gimple.h
@@ -1529,6 +1529,8 @@ extern void gimple_seq_discard (gimple_seq);
 extern void maybe_remove_unused_call_args (struct function *, gimple *);
 extern bool gimple_inexpensive_call_p (gcall *);
 extern bool stmt_can_terminate_bb_p (gimple *);
+extern edge gimple_get_unique_successor_edge (const gimple *stmt);
+extern edge gimple_cond_get_unique_successor_edge (const gcond *cond);
 
 /* Formal (expression) temporary table handling: multiple occurrences of
the same scalar expression are evaluated into the same temporary.  */
diff --git a/gcc/testsuite/gcc.dg/pr83312.c b/gcc/testsuite/gcc.dg/pr83312.c
new file mode 100644
index 000..2eb241d
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/pr83312.c
@@ -0,0 +1,30 @@
+/* { dg-options "-O2 -Warray-bounds" } */
+
+struct ptlrpcd_ctl {
+ 

Re: [PATCH] PR libgcc/83112, Fix warnings on libgcc float128-ifunc.c

2017-12-12 Thread Michael Meissner
On Tue, Dec 12, 2017 at 11:04:55AM -0600, Segher Boessenkool wrote:
> On Mon, Dec 11, 2017 at 03:57:51PM -0500, Michael Meissner wrote:
> > > > +extern KCtype __divkc3 (KFtype, KFtype, KFtype, KFtype);
> > > > +
> > > >  KCtype
> > > >  __divkc3 (KFtype a, KFtype b, KFtype c, KFtype d)
> > > >  {
> > > 
> > > How does this warn?  -Wmissing-declarations?  Should this declaration be
> > > in a header then?
> > 
> > The compiler creates the call to __mulkc3 and __divkc3, and internally it 
> > has
> > the appropriate prototype like it does for all built-in functions (in this
> > case, returning an _Float128 _Complex type, and taking 4 _Float128 
> > arguments).
> > 
> > So before adding ifunc support, we never noticed it didn't have a prototype,
> > because the compiler already has a prototype.
> 
> I still don't get it.  A function definition is also a declaration.
> 
> Something very non-intuitive is happening?

GCC has the following function declarations built-in:

extern _Float128 _Complex __mulkc3 (_Float128, _Float128, _Float128, 
_Float128);
extern _Float128 _Complex __divkc3 (_Float128, _Float128, _Float128, 
_Float128);

Before the patch, _mulkc3.c looked like:

_Float128 _Complex
__mulkc3 (_Float128 a, _Float128 b, _Float128 c, _Float128 d)
{
// ...
}

Now, with ifunc handling it gets compiled in three separate files:

First in a file compiled with -mno-float128-hardware:

_Float128 _Complex
__mulkc3_sw (_Float128 a, _Float128 b, _Float128 c, _Float128 d)
{
// ...
}

Second in a file compiled with -mfloat128-hardware:

_Float128 _Complex
__mulkc3_hw (_Float128 a, _Float128 b, _Float128 c, _Float128 d)
{
// ...
}

And third as the ifunc handler:

#define SW_OR_HW(SW, HW) (__builtin_cpu_supports ("ieee128") ? HW : SW)

static __typeof__ (__mulkc3_sw) *
__mulkc3_resolve (void)
{
  return SW_OR_HW (__mulkc3_sw, __mulkc3_hw);
}

_Float128 _Complex __mulkc3 (Float128, _Float128, _Float128, _Float128)
__attribute__ ((__ifunc__ ("__mulkc3_resolve")));

As Andreas points out, the option -Wmissing-prototypes complains if a global
function is compliled without prototypes for C/Objective C.

Before the patch, the internal definition within the compiler meant that that
__mulkc3 would not get the warning.  Now with separate ifunc handlers, both
__mulkc3_sw and __mulkc3_hw got warnings.

-- 
Michael Meissner, IBM
IBM, M/S 2506R, 550 King Street, Littleton, MA 01460-6245, USA
email: meiss...@linux.vnet.ibm.com, phone: +1 (978) 899-4797



Re: [x86][patch] Fix clwb for skylake

2017-12-12 Thread Uros Bizjak
On Tue, Dec 12, 2017 at 10:08 AM, Koval, Julia  wrote:
> Sorry,
>
> gcc/
> * config/i386/i386.c (PTA_SKYLAKE_AVX512): Add PTA_CLWB.
> (PTA_CANNONLAKE): Remove PTA_CLWB.

Approved and committed to mainline SVN.

Thanks,
Uros.


Re: C++ PATCH for c++/82115, ICE with variable initialized to own address

2017-12-12 Thread Jason Merrill
On Wed, Dec 6, 2017 at 4:40 PM, Jason Merrill  wrote:
> In this testcase, we consider the initializer of b to decide if it's
> value-dependent, but the initializer mentions b, so we were recursing
> infinitely.  But if we're interested in the address, we don't care
> about the value; we already handle that appropriately in the constexpr
> code, this patch updates value_dependent_expression_p accordingly.

I've been uneasy about this approach, since it isn't necessarily true
that we don't care about the value; we might indirect later to get
back to the value.  So instead, this patch sets a flag to indicate a
constant variable with a dependent initializer.

The first patch is after reverting the other patch; the second is the
combination.

I also noticed that type_dependent_init_p is unnecessary since
type-dependence implies value-dependence.

Tested x86_64-pc-linux-gnu, applying to trunk.
commit 18e711b75363ec2f304ee8fc162efea968f8f8af
Author: Jason Merrill 
Date:   Fri Dec 8 07:45:02 2017 -0500

PR c++/82115 - ICE with variable initialized with its own address.

* cp-tree.h (struct lang_decl_base): Add dependent_init_p.
(DECL_DEPENDENT_INIT_P, SET_DECL_DEPENDENT_INIT_P): New.
* decl.c (cp_finish_decl): Set it.
(duplicate_decls): Copy it.
* pt.c (tsubst_decl): Clear it.
(value_dependent_expression_p): Check it.

diff --git a/gcc/cp/cp-tree.h b/gcc/cp/cp-tree.h
index 736c4844bcf..a7acf49986f 100644
--- a/gcc/cp/cp-tree.h
+++ b/gcc/cp/cp-tree.h
@@ -2449,7 +2449,8 @@ struct GTY(()) lang_decl_base {
   unsigned u2sel : 1;
   unsigned concept_p : 1;  /* applies to vars and functions */
   unsigned var_declared_inline_p : 1; /* var */
-  /* 2 spare bits */
+  unsigned dependent_init_p : 1;  /* var */
+  /* 1 spare bit */
 };
 
 /* True for DECL codes which have template info and access.  */
@@ -3879,6 +3880,13 @@ more_aggr_init_expr_args_p (const 
aggr_init_expr_arg_iterator *iter)
   (DECL_LANG_SPECIFIC (VAR_DECL_CHECK (NODE))->u.base.var_declared_inline_p \
= true)
 
+/* True if NODE is a variable with a value-dependent initializer.  */
+#define DECL_DEPENDENT_INIT_P(NODE)\
+  (DECL_LANG_SPECIFIC (VAR_DECL_CHECK (NODE))  \
+   && DECL_LANG_SPECIFIC (NODE)->u.base.dependent_init_p)
+#define SET_DECL_DEPENDENT_INIT_P(NODE, X) \
+  (DECL_LANG_SPECIFIC (VAR_DECL_CHECK (NODE))->u.base.dependent_init_p = (X))
+
 /* Nonzero if NODE is an artificial VAR_DECL for a C++17 structured binding
declaration or one of VAR_DECLs for the user identifiers in it.  */
 #define DECL_DECOMPOSITION_P(NODE) \
diff --git a/gcc/cp/decl.c b/gcc/cp/decl.c
index 99b22dc878c..3601fa117c7 100644
--- a/gcc/cp/decl.c
+++ b/gcc/cp/decl.c
@@ -2120,6 +2120,8 @@ next_arg:;
  DECL_INITIALIZED_P (newdecl) |= DECL_INITIALIZED_P (olddecl);
  DECL_NONTRIVIALLY_INITIALIZED_P (newdecl)
|= DECL_NONTRIVIALLY_INITIALIZED_P (olddecl);
+ if (DECL_DEPENDENT_INIT_P (olddecl))
+   SET_DECL_DEPENDENT_INIT_P (newdecl, true);
  DECL_INITIALIZED_BY_CONSTANT_EXPRESSION_P (newdecl)
|= DECL_INITIALIZED_BY_CONSTANT_EXPRESSION_P (olddecl);
   if (DECL_CLASS_SCOPE_P (olddecl))
@@ -6910,14 +6912,18 @@ cp_finish_decl (tree decl, tree init, bool 
init_const_expr_p,
 then it can be used in future constant expressions, so its value
 must be available. */
 
+  bool dep_init = false;
+
   if (!VAR_P (decl) || type_dependent_p)
/* We can't do anything if the decl has dependent type.  */;
+  else if (!init && is_concept_var (decl))
+error ("variable concept has no initializer");
   else if (init
   && init_const_expr_p
   && TREE_CODE (type) != REFERENCE_TYPE
   && decl_maybe_constant_var_p (decl)
-  && !type_dependent_init_p (init)
-  && !value_dependent_init_p (init))
+  && !(dep_init = (type_dependent_init_p (init)
+   || value_dependent_init_p (init
{
  /* This variable seems to be a non-dependent constant, so process
 its initializer.  If check_initializer returns non-null the
@@ -6929,8 +6935,6 @@ cp_finish_decl (tree decl, tree init, bool 
init_const_expr_p,
init = NULL_TREE;
  release_tree_vector (cleanups);
}
-  else if (!init && is_concept_var (decl))
-error ("variable concept has no initializer");
   else if (!DECL_PRETTY_FUNCTION_P (decl))
{
  /* Deduce array size even if the initializer is dependent.  */
@@ -6944,6 +6948,11 @@ cp_finish_decl (tree decl, tree init, bool 
init_const_expr_p,
 
   if (init)
DECL_INITIAL (decl) = init;
+  if (dep_init)
+   {
+ retrofit_lang_decl (decl);
+ SET_DECL_DEPENDENT_INIT_P (decl, true);
+   }
   return;
 }
 
diff -

RE: [PATCH][i386,AVX] Enable VBMI2 support [5/7]

2017-12-12 Thread Gerald Pfeifer
On Tue, 12 Dec 2017, Koval, Julia wrote:
> Here is the patch to update these files with my contributions. Ok for 
> trunk?

Nice, thank you.

The only change I'd suggest is -march=icelake to 
mark up the two options.

(No need to ask for any further approvals.)

Gerald


Re: [PATCH][Middle-end]79538 missing -Wformat-overflow with %s and non-member array arguments

2017-12-12 Thread Martin Sebor

On 12/12/2017 10:56 AM, Qing Zhao wrote:

Hi, Martin,

thanks for the suggestion, this might be a good enhancement for
get_range_strlen for a future work.

my understanding is,  the current get_range_strlen does not use value
range info yet, and also does not handle VLA.
we can improve it from both aspects in a later work.


I agree (that's also what I meant).  Your patch is a valuable
improvement in and of itself.

Martin



Qing



Per your comments, the updated gimple-fold.c is like the following:


FWIW, I suspect Richard is thinking of VLAs with the INTEGER_CST
comment above.  This is the case we discussed in private.  It is
handled either way but the handling could be improved by determining
the size of the VLA from the first __builtin_alloca_with_align
argument and using it or its range as the minimum and maximum.
With that, the range of string lengths that fit in vla in
the following could be determined and the sprintf call could
be diagnosed:

 char d[30];

 void f (unsigned n)
 {
   if (n < 32)
 {
   char vla[n];
   __builtin_sprintf (d, "%s", vla);
 }
 }

I think this would be a nice enhancement to add on top of yours,
not just for VLAs but for dynamically allocated arrays as well,
and not just for the sprintf pass but also for the strlen pass
to optimize cases like:

 void f (unsigned n)
 {
   if (n < 32)
 {
   char vla[n];

   fgets (vla, n, stdin);

   unsigned len = strlen (vla);
   if (len >= n)   // cannot hold
 abort (); // can be eliminated
  }
   }

That said, at some point, it might make more sense to change
those passes to start tracking these things as they traverse
the CFG rather than having get_range_strlen() do the work.

Martin



diff --git a/gcc/gimple-fold.c b/gcc/gimple-fold.c
index 353a46e..0500fba 100644
--- a/gcc/gimple-fold.c
+++ b/gcc/gimple-fold.c
@@ -1323,6 +1323,19 @@ get_range_strlen (tree arg, tree length[2],
bitmap *visited, int type,
 the array could have zero length.  */
  *minlen = ssize_int (0);
}
+
+  if (VAR_P (arg)
+  && TREE_CODE (TREE_TYPE (arg)) == ARRAY_TYPE)
+{
+  val = TYPE_SIZE_UNIT (TREE_TYPE (arg));
+  if (!val || TREE_CODE (val) != INTEGER_CST ||
integer_zerop (val))
+return false;
+  val = fold_build2 (MINUS_EXPR, TREE_TYPE (val), val,
+ build_int_cst (TREE_TYPE (val), 1));
+  /* Set the minimum size to zero since the string in
+ the array could have zero length.  */
+  *minlen = ssize_int (0);
+}
}

  if (!val)

let me know any further issue with the above.

thanks a lot.

Qing






Re: [PATCH] set range for strlen(array) to avoid spurious -Wstringop-overflow (PR 83373 , PR 78450)

2017-12-12 Thread Jeff Law
On 12/12/2017 01:15 PM, Martin Sebor wrote:
> Bug 83373 - False positive reported by -Wstringop-overflow, is
> another example of warning triggered by a missed optimization
> opportunity, this time in the strlen pass.  The optimization
> is discussed in pr78450 - strlen(s) return value can be assumed
> to be less than the size of s.  The gist of it is that the result
> of strlen(array) can be assumed to be less than the size of
> the array (except in the corner case of last struct members).
> 
> To avoid the false positive the attached patch adds this
> optimization to the strlen pass.  Although the patch passes
> bootstrap and regression tests for all front-ends I'm not sure
> the way it determines the upper bound of the range is 100%
> correct for languages with arrays with a non-zero lower bound.
> Maybe it's just not as tight as it could be.
What about something hideous like

struct fu {
  char x1[10];
  char x2[10];
  int avoid_trailing_array;
}

Where objects stored in x1 are not null terminated.  Are we in the realm
of undefined behavior at that point (I hope so)?

jeff


[PATCH] diagnose attribute conflicts on conversion operators (PR 83394)

2017-12-12 Thread Martin Sebor

In bug 83394 - always_inline vs. noinline no longer diagnosed,
Jakub provided a test case where the recent enhancement to detect
nonsensical attribute combinations fails to detect a pair of
mutually exclusive attributes on separate declarations of
a conversion member operator (see bug 83322 for the origin of
the test case).  This case was previously diagnosed so this is
a regression introduced by the enhancement.

The attached patch restores this diagnostic.  I have very little
experience with lookup and scoping in the C++ front end so if
this isn't the right approach I'd be grateful for suggestions
for what API to use.

In a private conversation Jason mentioned there may be cases
involving templates where the current approach won't have access
to the "last declaration" and so won't be able to detect a mismatch.
I am yet to come up with an example where this happens.  If/when
I do I'll look into enhancing or modifying the current solution
to detect those as well.  But until then I'd like to submit this
as an incremental step in that direction.

The attached patch passes regression testing on x86_64-linux.

Thanks
Martin
PR c++/83394 - [8 Regression] always_inline vs. noinline no longer diagnosed

gcc/cp/ChangeLog:

	PR c++/83394
	* decl2.c (cplus_decl_attributes): Look up member functions
	in the scope of their class.

gcc/testsuite/ChangeLog:

	PR c++/83394
	* g++.dg/Wattributes-3.C: New test.

diff --git a/gcc/cp/decl2.c b/gcc/cp/decl2.c
index 5d30369..09bfec0 100644
--- a/gcc/cp/decl2.c
+++ b/gcc/cp/decl2.c
@@ -1483,8 +1483,24 @@ cplus_decl_attributes (tree *decl, tree attributes, int flags)
 }
   else
 {
-  tree last_decl = (DECL_P (*decl) && DECL_NAME (*decl)
-			? lookup_name (DECL_NAME (*decl)) : NULL_TREE);
+  tree last_decl = NULL_TREE;
+  if (tree name = DECL_P (*decl) ? DECL_NAME (*decl) : NULL_TREE)
+	{
+	  /* Look up the declaration in its scope.  */
+	  tree pushed_scope = NULL_TREE;
+	  if (tree ctype = DECL_CONTEXT (*decl))
+	pushed_scope = push_scope (ctype);
+
+	  last_decl = lookup_name (name);
+
+	  if (pushed_scope)
+	pop_scope (pushed_scope);
+
+	  /* The declaration may be a member conversion operator
+	 or a bunch of overfloads (handle the latter below).  */
+	  if (last_decl && BASELINK_P (last_decl))
+	last_decl = BASELINK_FUNCTIONS (last_decl);
+	}
 
   if (last_decl && TREE_CODE (last_decl) == OVERLOAD)
 	for (ovl_iterator iter (last_decl, true); ; ++iter)
diff --git a/gcc/testsuite/g++.dg/Wattributes-3.C b/gcc/testsuite/g++.dg/Wattributes-3.C
new file mode 100644
index 000..2479998
--- /dev/null
+++ b/gcc/testsuite/g++.dg/Wattributes-3.C
@@ -0,0 +1,89 @@
+// PR c++/83394 - always_inline vs. noinline no longer diagnosed
+// { dg-do compile }
+// { dg-options "-Wattributes" }
+
+#define ATTR(list) __attribute__ (list)
+
+struct A
+{
+  ATTR ((__noinline__)) operator int ();
+};
+
+ATTR ((__always_inline__))
+A::operator int ()// { dg-warning "ignoring attribute .always_inline. because it conflicts with attribute .noinline." }
+{
+  return 0;
+}
+
+
+struct B
+{
+  operator char () const;
+  ATTR ((__always_inline__)) operator int () const;
+};
+
+B::operator char () const { return 0; }
+
+ATTR ((__noinline__))
+B::operator int () const  // { dg-warning "ignoring attribute .noinline. because it conflicts with attribute .always_inline." }
+{
+  return 0;
+}
+
+
+struct C
+{
+  operator char ();
+  ATTR ((__always_inline__)) operator short ();
+  operator int ();
+  ATTR ((__noinline__)) operator long ();
+};
+
+C::operator char () { return 0; }
+
+ATTR ((__noinline__))
+C::operator short ()   // { dg-warning "ignoring attribute .noinline. because it conflicts with attribute .always_inline." }
+{ return 0; }
+
+inline ATTR ((__noinline__))
+C::operator int () { return 0; }
+
+
+ATTR ((__always_inline__))
+C::operator long ()   // { dg-warning "ignoring attribute .always_inline. because it conflicts with attribute .noinline." }
+{ return 0; }
+
+
+struct D
+{
+  int foo ();
+  int foo (int);
+  int ATTR ((const)) foo (int, int);
+  int ATTR ((pure)) foo (int, int, int);
+
+  int ATTR ((const)) foo (int, int, int, int);
+
+  int foo (int, int, int, int, int);
+};
+
+int ATTR ((const))
+D::foo ()
+{ return 0; }
+
+int ATTR ((pure))
+D::foo (int)
+{ return 0; }
+
+int ATTR ((pure))
+D::foo (int, int) // { dg-warning "ignoring attribute .pure. because it conflicts with attribute .const." }
+{ return 0; }
+
+int ATTR ((const))
+D::foo (int, int, int)// { dg-warning "ignoring attribute .const. because it conflicts with attribute .pure." }
+{ return 0; }
+
+int
+D::foo (int, int, int, int) { return 0; }
+
+int ATTR ((const))
+D::foo (int, int, int, int, int) { return 0; }


Re: [patch, libgfortran] Memory issues related to PR78549

2017-12-12 Thread Jerry DeLisle
Ping: If no objection, I will commit to trunk on Wednesday evening.

On 12/10/2017 07:55 PM, Jerry DeLisle wrote:
> Hi all,
> 
> While doing addition testing for the subject mentioned PR I discovered 
> numerous
> un-freed memory allocations. I reported the problem in comment 30 of the PR.
> 
> The attached patch cleans this up by opening the internal unit structures 
> during
> program initialization and allowing the automatic closure similar to
> pre-connected units. The internal stream structures are created and freed at 
> the
> beginning and end of each I/O operation.  I fix a few loose ends.
> 
> Regression tested on x86_64.
> 
> OK for trunk?  Also needed for same issue in 7.
> 
> Regards,
> 
> Jerry
> 
> 2017-12-11  Jerry DeLisle  
> 
>   PR libgfortran/78549
>   * io/inquire.c (inquire_via_unit): Adjust test for existence for
>   pre-connected internal units.
>   * io/transfer.c (finalize_transfer): When done with a transfer
>   to internal units, free the format buffer and close the stream.
>   (st_read_done): Delete freeing the stream, now handled using
>   sclose in finalize_transfer. (st_write_done): Likewise.
>   * io/unit.c (get_unit): Return NULL for special reserved unit
>   numbers, signifying not accessible to the user.
>   (init_units): Insert the two special internal units into the
>   unit treap. This makes these unit structures available without
>   further allocations for later use by internal unit I/O. These
>   units are automatically deleted by normal program termination.
>   * unix.c (mem_close): Add a guard check to protect from double free.
> 



Re: [PATCH] set range for strlen(array) to avoid spurious -Wstringop-overflow (PR 83373 , PR 78450)

2017-12-12 Thread Martin Sebor

On 12/12/2017 05:35 PM, Jeff Law wrote:

On 12/12/2017 01:15 PM, Martin Sebor wrote:

Bug 83373 - False positive reported by -Wstringop-overflow, is
another example of warning triggered by a missed optimization
opportunity, this time in the strlen pass.  The optimization
is discussed in pr78450 - strlen(s) return value can be assumed
to be less than the size of s.  The gist of it is that the result
of strlen(array) can be assumed to be less than the size of
the array (except in the corner case of last struct members).

To avoid the false positive the attached patch adds this
optimization to the strlen pass.  Although the patch passes
bootstrap and regression tests for all front-ends I'm not sure
the way it determines the upper bound of the range is 100%
correct for languages with arrays with a non-zero lower bound.
Maybe it's just not as tight as it could be.

What about something hideous like

struct fu {
  char x1[10];
  char x2[10];
  int avoid_trailing_array;
}

Where objects stored in x1 are not null terminated.  Are we in the realm
of undefined behavior at that point (I hope so)?


Yes, this is undefined.  Pointer arithmetic (either direct or
via standard library functions) is only defined for pointers
to the same object or subobject.  So even something like

 memcpy (pfu->x1, pfu->x1 + 10, 10);

is undefined.  _FORTIFY_SOURCE allows it for raw memory functions
because some low level code copies regions of the same object that
span two or more members (I've seen an example or two in the Linux
kernel but, IIRC, nowhere else).  With the patch, using memchr
would be the only way to get away with it.

Other than that, the new -Wstringop-truncation warning is designed
to prevent creating unterminated character arrays and the manual
suggests using attribute nonstring when it's necessary.  The
-Wrestrict warning I'm about to check in also complains about
forming invalid pointers by built-ins with restrict-qualified
arguments (although it won't diagnose the above).

Although I would prefer not to, I suppose if letting strlen cross
the boundaries of subobjects was considered an important use to
accommodate in limited cases the optimization could be disabled
for member arrays declared with the new nonstring attribute (while
still issuing a warning for it as GCC does today).

Another alternative (if the above use case is considered important
enough) might be to suppress the optimization for member character
arrays that are immediately followed by another such array.

Martin


Re: [SFN+LVU+IEPM v4 6/9] [SFN] Introduce -gstatement-frontiers option, enable debug markers

2017-12-12 Thread Alexandre Oliva
On Dec 12, 2017, Christophe Lyon  wrote:

> Since this was checked in, I've noticed GCC/libatomic build failures
> on ARM targets when configuring
> --with-mode thumb (--with-mode arm is OK).

Could you possibly provide me with preprocessed source that exercises
the error, so that I could duplicate and debug it with a cross
compiler+assembler, without libc?  Thanks in advance,

(just in case, you might want to keep an eye on the patches in pr83396;
 patches there might already take care of this issue)

-- 
Alexandre Oliva, freedom fighterhttp://FSFLA.org/~lxoliva/
You must be the change you wish to see in the world. -- Gandhi
Be Free! -- http://FSFLA.org/   FSF Latin America board member
Free Software Evangelist|Red Hat Brasil GNU Toolchain Engineer


[PATCH RISC-V] Use C version of multi3 for RVE support.

2017-12-12 Thread Jim Wilson
This replaces the assembly language version of multi3/muldi3 with a C version,
as the assembly version does not work on RVE targets.  The C code is using the
same basic algorithm as the assembly code.  The RVE spec is not finalized yet,
so we haven't added the RVE option support yet, but this is one of the patches
needed to prepare for the eventual RVE support.

This was tested with a gcc testsuite run, there were no regressions.

It was also hand tested with some small testcases doing long long (32-bit
target) and int128_t (64-bit target) multiplies running on the gdb simulator
with --trace-insn to get instruction counts.  The new code is little faster
for most inputs, partly because of compiler optimization, but a little slower
when one of the inputs has a lot of leading ones.  We will worry about that
later if it turns out to be a problem.  Meanwhile, this helps with the RVE
support and gets another patch merged into FSF gcc from the github riscv
repository.

Committed.

libgcc/
* config/riscv/t-elf: Use multi3.c instead of multi3.S.
* config/riscv/multi3.c: New file.
* config/riscv/multi3.S: Remove.
---
 libgcc/ChangeLog |  6 
 libgcc/config/riscv/multi3.S | 81 -
 libgcc/config/riscv/multi3.c | 86 
 libgcc/config/riscv/t-elf|  2 +-
 4 files changed, 93 insertions(+), 82 deletions(-)
 delete mode 100644 libgcc/config/riscv/multi3.S
 create mode 100644 libgcc/config/riscv/multi3.c

diff --git a/libgcc/ChangeLog b/libgcc/ChangeLog
index 3d8b73e1c4a..6726a79d5b2 100644
--- a/libgcc/ChangeLog
+++ b/libgcc/ChangeLog
@@ -1,3 +1,9 @@
+2017-12-06  Kito Cheng 
+
+   * config/riscv/t-elf: Use multi3.c instead of multi3.S.
+   * config/riscv/multi3.c: New file.
+   * config/riscv/multi3.S: Remove.
+
 2017-11-30  Michael Meissner  
 
* config/rs6000/_mulkc3.c (__mulkc3): Add forward declaration.
diff --git a/libgcc/config/riscv/multi3.S b/libgcc/config/riscv/multi3.S
deleted file mode 100644
index 4d454e65013..000
--- a/libgcc/config/riscv/multi3.S
+++ /dev/null
@@ -1,81 +0,0 @@
-/* Integer multiplication routines for RISC-V.
-
-   Copyright (C) 2016-2017 Free Software Foundation, Inc.
-
-This file is part of GCC.
-
-GCC is free software; you can redistribute it and/or modify it under
-the terms of the GNU General Public License as published by the Free
-Software Foundation; either version 3, or (at your option) any later
-version.
-
-GCC is distributed in the hope that it will be useful, but WITHOUT ANY
-WARRANTY; without even the implied warranty of MERCHANTABILITY or
-FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License
-for more details.
-
-Under Section 7 of GPL version 3, you are granted additional
-permissions described in the GCC Runtime Library Exception, version
-3.1, as published by the Free Software Foundation.
-
-You should have received a copy of the GNU General Public License and
-a copy of the GCC Runtime Library Exception along with this program;
-see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
-.  */
-
-  .text
-  .align 2
-
-#if __riscv_xlen == 32
-/* Our RV64 64-bit routines are equivalent to our RV32 32-bit routines.  */
-# define __multi3 __muldi3
-#endif
-
-  .globl __multi3
-__multi3:
-
-#if __riscv_xlen == 32
-/* Our RV64 64-bit routines are equivalent to our RV32 32-bit routines.  */
-# define __muldi3 __mulsi3
-#endif
-
-/* We rely on the fact that __muldi3 doesn't clobber the t-registers.  */
-
-  mv  t0, ra
-  mv  t5, a0
-  mv  a0, a1
-  mv  t6, a3
-  mv  a1, t5
-  mv  a4, a2
-  li  a5, 0
-  li  t2, 0
-  li  t4, 0
-.L1:
-  add  a6, t2, a1
-  andi t3, a4, 1
-  slli a7, a5, 1
-  slti t1, a1, 0
-  srli a4, a4, 1
-  add  a5, t4, a5
-  beqz t3, .L2
-  sltu t3, a6, t2
-  mv   t2, a6
-  add  t4, t3, a5
-.L2:
-  slli a1, a1, 1
-  or   a5, t1, a7
-  bnez a4, .L1
-  beqz a0, .L3
-  mv   a1, a2
-  call __muldi3
-  add  t4, t4, a0
-.L3:
-  beqz t6, .L4
-  mv   a1, t6
-  mv   a0, t5
-  call  __muldi3
-  add  t4, t4, a0
-.L4:
-  mv  a0, t2
-  mv  a1, t4
-  jr  t0
diff --git a/libgcc/config/riscv/multi3.c b/libgcc/config/riscv/multi3.c
new file mode 100644
index 000..3606a5f220b
--- /dev/null
+++ b/libgcc/config/riscv/multi3.c
@@ -0,0 +1,86 @@
+/* Multiplication two double word integers for RISC-V.
+
+   Copyright (C) 2016-2017 Free Software Foundation, Inc.
+
+This file is part of GCC.
+
+GCC is free software; you can redistribute it and/or modify it under
+the terms of the GNU General Public License as published by the Free
+Software Foundation; either version 3, or (at your option) any later
+version.
+
+GCC is distributed in the hope that it will be useful, but WITHOUT ANY
+WARRANTY; without even the implied warranty of MERCHANTABILITY or
+FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License
+for more details.
+
+Under Section 7 of GPL version 

Re: [patch, libgfortran] Memory issues related to PR78549

2017-12-12 Thread Thomas Koenig

Hi Jerry,


Ping: If no objection, I will commit to trunk on Wednesday evening.


OK for trunk (even before that :-)

Thanks for the patch!

Regards

Thomas


Re: [SFN] Bootstrap broken

2017-12-12 Thread Alexandre Oliva
On Dec 12, 2017, David Edelsohn  wrote:

> Something in this series broke bootstrap on AIX, probably Power in general.

And probably a number of other platforms as well.

Here's a patch that seems to have fixed lots of problems.  Without it,
we might move debug (bind) stmts from a forwarder block about to be
removed, inserting them before the label of a successor block.  debug
bind stmts before labels are not welcome (they might become insns
outside any blocks, and not be properly adjusted; in this specific
testcase, a reg lowered to a concatn remained shared because the
out-of-block debug insn was not adjusted), and I'm reconsidering even
debug markers, but for now, I'm leaving markers before and among labels.

To fix the problem, I've rearranged the forwarder block remover to
insert stmts that were before or among labels before labels, but those
that are after labels (or in a block without labels) are inserted after
any labels in the destination block.  This should keep debug insns in
order, except when the forwarder block has no labels, whereas the
successor block had markers before the labels.  I'm undecided between
moving all markers in the successor after any labels in it, before
making other changes, or to rule out markers before or among labels
altogether.

For now, to restore bootstrap and builds on most platforms, this will
do.  There are still unrelated -fcompare-debug issues in the supplied
AIX testcase, but Jakub's reduced testcase passes even -fcompare-debug.

Regstrapping on x86_64-linux-gnu and i686-linux-gnu.  Hopefully I'll
have feedback about its bootstrap on powerpc-aix and sparc-solaris as
well before it goes in.  Ok to install?


[SFN] don't move after-label debug stmts before labels

When removing a forwarder block (that contains debug stmts), be careful
to not insert before labels debug stmts that appear after labels.

for  gcc/ChangeLog

PR bootstrap/83396
PR debug/83391
* tree-cfgcleanup.c (remove_forwarder_block): Keep after
labels debug stmts appearing after labels.

for  gcc/testsuite/ChangeLog

PR bootstrap/83396
PR debug/83391
* gcc.dg/torture/pr83396.c: New.
* g++.dg/torture/pr83391.C: New.
---
 gcc/testsuite/g++.dg/torture/pr83391.C |   34 +
 gcc/testsuite/gcc.dg/torture/pr83396.c |   37 
 gcc/tree-cfgcleanup.c  |   29 ++---
 3 files changed, 97 insertions(+), 3 deletions(-)
 create mode 100644 gcc/testsuite/g++.dg/torture/pr83391.C
 create mode 100644 gcc/testsuite/gcc.dg/torture/pr83396.c

diff --git a/gcc/testsuite/g++.dg/torture/pr83391.C 
b/gcc/testsuite/g++.dg/torture/pr83391.C
new file mode 100644
index ..098a1101a3f4
--- /dev/null
+++ b/gcc/testsuite/g++.dg/torture/pr83391.C
@@ -0,0 +1,34 @@
+/* PR debug/83391 */
+/* { dg-do compile } */
+
+unsigned char a;
+enum E { F, G, H } b;
+int c, d;
+
+void
+foo ()
+{
+  int e;
+  bool f;
+  E g = b;
+  while (1)
+{
+  unsigned char h = a ? d : 0;
+  switch (g)
+   {
+   case 0:
+ f = h <= 'Z' || h >= 'a' && h <= 'z';
+ break;
+   case 1:
+ {
+   unsigned char i = h;
+   e = 0;
+ }
+ if (e || h)
+   g = H;
+ /* FALLTHRU */
+   default:
+ c = 0;
+   }
+}
+}
diff --git a/gcc/testsuite/gcc.dg/torture/pr83396.c 
b/gcc/testsuite/gcc.dg/torture/pr83396.c
new file mode 100644
index ..4c0c30ceaab3
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/torture/pr83396.c
@@ -0,0 +1,37 @@
+/* PR bootstrap/83396 */
+/* { dg-do compile } */
+
+int fn1 (void);
+void fn2 (void *, const char *);
+void fn3 (void);
+
+void
+fn4 (long long x)
+{
+  fn3 ();
+}
+
+void
+fn5 (long long x)
+{
+  if (x)
+fn3();
+}
+
+void
+fn6 (long long x)
+{
+  switch (fn1 ())
+{
+case 0:
+  fn5 (x);
+case 2:
+  fn2 (0, "");
+  break;
+case 1:
+case 3:
+  fn4(x);
+case 5:
+  fn2 (0, "");
+}
+}
diff --git a/gcc/tree-cfgcleanup.c b/gcc/tree-cfgcleanup.c
index 0bee21756f2b..e30a93d504b6 100644
--- a/gcc/tree-cfgcleanup.c
+++ b/gcc/tree-cfgcleanup.c
@@ -536,14 +536,23 @@ remove_forwarder_block (basic_block bb)
  defined labels and labels with an EH landing pad number to the
  new block, so that the redirection of the abnormal edges works,
  jump targets end up in a sane place and debug information for
- labels is retained.  */
+ labels is retained.
+
+ While at that, move any debug stmts that appear before or among
+ labels, but not those that can only appear after labels, unless
+ the destination block didn't have labels of its own.  */
   gsi_to = gsi_start_bb (dest);
-  for (gsi = gsi_start_bb (bb); !gsi_end_p (gsi); )
+  gsi = gsi_start_bb (bb);
+  gimple_stmt_iterator gsie = gsi_after_labels (bb);
+  gimple_stmt_iterator gsie_to = gsi_after_labels (dest);
+  bool can_move_early_debug_stmts = can_mov

Re: [SFN] Bootstrap broken

2017-12-12 Thread Alexandre Oliva
On Dec 12, 2017, Rainer Orth  wrote:

> Hi David,
>> Something in this series broke bootstrap on AIX, probably Power in general.

> I'm seeing the same in a sparc-sun-solaris2.11 bootstrap.

The AIX patch, that I've just emailed out in this thread, should fix
that as well.  As for the regression you reported, here's a fix.
Regstrapping; ok to install?


[SFN] don't assume BLOCK_FOR_INSN is set in var-tracking

There's no guarantee that BLOCK_FOR_INSN will be set before var-tracking.
So, keep track of whether we're in the first block header or inside a BB
explicitly, and apply the logic we meant to apply outside BBs only when
we are indeed outside a BB.

for  gcc/ChangeLog

PR bootstrap/83396
* var-tracking.c (vt_initialize): Keep track of BB boundaries.
---
 gcc/var-tracking.c |   12 
 1 file changed, 8 insertions(+), 4 deletions(-)

diff --git a/gcc/var-tracking.c b/gcc/var-tracking.c
index 8e500b144712..12158dbb1e0d 100644
--- a/gcc/var-tracking.c
+++ b/gcc/var-tracking.c
@@ -10156,12 +10156,16 @@ vt_initialize (void)
  /* If we are walking the first basic block, walk any HEADER
 insns that might be before it too.  Unfortunately,
 BB_HEADER and BB_FOOTER are not set while we run this
-pass.  */
+pass.  Unfortunately, BLOCK_FOR_INSN may not be set, so
+we can't assume that its being NULL implies we're outside
+the basic block.  */
  insn = get_first_insn (bb);
+ bool outside_bb = insn != BB_HEAD (bb);
  for (rtx_insn *next;
   insn != BB_HEAD (bb->next_bb)
 ? next = NEXT_INSN (insn), true : false;
-  insn = next)
+  insn = next,
+outside_bb = outside_bb && next != BB_HEAD (bb))
{
  if (INSN_P (insn))
{
@@ -10169,11 +10173,11 @@ vt_initialize (void)
  if (!BLOCK_FOR_INSN (insn))
{
  BLOCK_FOR_INSN (insn) = bb;
- gcc_assert (DEBUG_INSN_P (insn));
+ gcc_assert (!outside_bb || DEBUG_INSN_P (insn));
  /* Reset debug insns between basic blocks.
 Their location is not reliable, because they
 were probably not maintained up to date.  */
- if (DEBUG_BIND_INSN_P (insn))
+ if (outside_bb && DEBUG_BIND_INSN_P (insn))
INSN_VAR_LOCATION_LOC (insn)
  = gen_rtx_UNKNOWN_VAR_LOC ();
}


-- 
Alexandre Oliva, freedom fighterhttp://FSFLA.org/~lxoliva/
You must be the change you wish to see in the world. -- Gandhi
Be Free! -- http://FSFLA.org/   FSF Latin America board member
Free Software Evangelist|Red Hat Brasil GNU Toolchain Engineer


Re: [PATCH] set range for strlen(array) to avoid spurious -Wstringop-overflow (PR 83373 , PR 78450)

2017-12-12 Thread Bernhard Reutner-Fischer
On 12 December 2017 21:15:25 CET, Martin Sebor  wrote:

>
>Tested on x86_64-linux.

I assume this test worked even before this patch. Thus: 

s/oveflow/overflow/

thanks, 


Re: [SFN] Bootstrap broken

2017-12-12 Thread Alexandre Oliva
On Dec 12, 2017, David Edelsohn  wrote:

> Rainer,
> PR83396 opened.  you can add Solaris to the list of targets.

Andreas,

Here's a fix for the ia64 regression you mentioned in that PR.

I don't include the 'int main(){return 0;}' testcase, because it would
hardly be useful; any test would trigger this error on the right
platform.

Regstrapping; I suppose I could install it as obvious, but...  Ok to install?


[SFN] don't eliminate regs in markers

Eliminate regs in debug bind insns, but not in markers.

for  gcc/ChangeLog

PR bootstrap/83396
* reload1.c (eliminate_regs_in_insn): Skip debug markers.
---
 gcc/reload1.c |2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/reload1.c b/gcc/reload1.c
index 322696a25f3e..fe1ec0d011fb 100644
--- a/gcc/reload1.c
+++ b/gcc/reload1.c
@@ -3202,7 +3202,7 @@ eliminate_regs_in_insn (rtx_insn *insn, int replace)
  || GET_CODE (PATTERN (insn)) == USE
  || GET_CODE (PATTERN (insn)) == CLOBBER
  || GET_CODE (PATTERN (insn)) == ASM_INPUT);
-  if (DEBUG_INSN_P (insn))
+  if (DEBUG_BIND_INSN_P (insn))
INSN_VAR_LOCATION_LOC (insn)
  = eliminate_regs (INSN_VAR_LOCATION_LOC (insn), VOIDmode, insn);
   return 0;


-- 
Alexandre Oliva, freedom fighterhttp://FSFLA.org/~lxoliva/
You must be the change you wish to see in the world. -- Gandhi
Be Free! -- http://FSFLA.org/   FSF Latin America board member
Free Software Evangelist|Red Hat Brasil GNU Toolchain Engineer


[compare-debug] use call loc for nop_endbr

2017-12-12 Thread Alexandre Oliva
We skip debug insns and notes after a call that needs a nop_endbr, but
since a debug insn could be the last in a block, it may affect the loc
in the emitted nop_endbr insn.  Although this has no effect on
codegen, it does mess with debug info a bit, and it causes
-fcompare-debug to fail for e.g. libsanitizer's
tsan/tsan_platform_linux.cc on x86_64.

So, pick the location of the call insn for the nop_endbr insn, to
avoid the line number differences in dumps, including -fcompare-debug
ones.

Also, we don't need to determine what the insert point would be unless
we're actually emitting the nop_endbr insn after the call, so
rearrange the code to avoid wasting cycles.

Finally, it seems like testing for barriers is a mistake.  We probably
never actually pass that test, for the barriers would hit BB_END
first.  If we did, we'd end up emitting the nop_endbr outside any BB,
even after the end of the function!  That would be Very Bad (TM).
Now, since the test as it is can't hurt, I figured I wouldn't change
the logic right now, just add a comment so that someone involved in
endbr stuff can have a second look and hopefully fix it.

I'd appreciate if you'd try to drop the BARRIER_P from the loop test,
Igor, so as to address the final ??? in the comment I add.  Narrowing
the skipped notes to only the relevant post-call ones might make sense
as well, but it's not quite as important IMHO.

Regstrapping with -fcompare-debug on stage3 host and target builds on
x86_64- and i686-linux-gnu; ok to install?

for  gcc/ChangeLog

* config/i386/i386.c (rest_of_insert_endbranch): Use call loc
for its nop_endbr.
---
 gcc/config/i386/i386.c |   20 
 1 file changed, 12 insertions(+), 8 deletions(-)

diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
index e323102cef59..8960b966b7fc 100644
--- a/gcc/config/i386/i386.c
+++ b/gcc/config/i386/i386.c
@@ -2609,21 +2609,25 @@ rest_of_insert_endbranch (void)
{
  if (INSN_P (insn) && GET_CODE (insn) == CALL_INSN)
{
- rtx_insn *next_insn = insn;
+ if (find_reg_note (insn, REG_SETJMP, NULL) == NULL)
+   continue;
+ /* Generate ENDBRANCH after CALL, which can return more than
+twice, setjmp-like functions.  */
 
+ /* Skip notes and debug insns that must be next to the
+call insn.  ??? This might skip a lot more than
+that...  ??? Skipping barriers and emitting code
+after them surely looks like a mistake; we probably
+won't ever hit it, for we'll hit BB_END first.  */
+ rtx_insn *next_insn = insn;
  while ((next_insn != BB_END (bb))
  && (DEBUG_INSN_P (NEXT_INSN (next_insn))
  || NOTE_P (NEXT_INSN (next_insn))
  || BARRIER_P (NEXT_INSN (next_insn
next_insn = NEXT_INSN (next_insn);
 
- /* Generate ENDBRANCH after CALL, which can return more than
-twice, setjmp-like functions.  */
- if (find_reg_note (insn, REG_SETJMP, NULL) != NULL)
-   {
- cet_eb = gen_nop_endbr ();
- emit_insn_after (cet_eb, next_insn);
-   }
+ cet_eb = gen_nop_endbr ();
+ emit_insn_after_setloc (cet_eb, next_insn, INSN_LOCATION (insn));
  continue;
}
 

-- 
Alexandre Oliva, freedom fighterhttp://FSFLA.org/~lxoliva/
You must be the change you wish to see in the world. -- Gandhi
Be Free! -- http://FSFLA.org/   FSF Latin America board member
Free Software Evangelist|Red Hat Brasil GNU Toolchain Engineer