*OpenMP Patch Ping*

2019-10-28 Thread Tobias Burnus

* [Patch][OpenMP] use_device_addr/use_device_ptr with Fortran 
allocatable/pointer arrays (= array descriptor)
https://gcc.gnu.org/ml/gcc-patches/2019-10/msg00985.html

* [Patch][Fortran/OpenMP] Don't create "alloc:" for 'target exit data'
https://gcc.gnu.org/ml/gcc-patches/2019-10/msg01344.html

The following might fall under 'obvious'; hence, I intend to commit it in the
next few days.
Still, comments are welcome:

* [Patch][Fortran] OpenMP – libgomp/testsuite – use 'stop' and 'dg-do run'
https://gcc.gnu.org/ml/gcc-patches/2019-10/msg01850.html

Thanks,

Tobias



Re: [PATCH] rs6000: Enable limited unrolling at -O2

2019-10-28 Thread Richard Biener
On Fri, 25 Oct 2019, Jiufu Guo wrote:

> Hi,
> 
> In PR88760, there is some discussion about improving or tuning the unroller
> for targets. We agreed to enable the unroller for small loops at -O2 first.
> We see a performance improvement (~10%) for the code below:
> ```
>   subroutine foo (i, i1, block)
> integer :: i, i1
> integer :: block(9, 9, 9)
> block(i:9,1,i1) = block(i:9,1,i1) - 10
>   end subroutine foo
> 
> ```
> This kind of code occurs a few times in the exchange2 benchmark.
> 
> Similar C code:
> ```
>   for (i = 0; i < n; i++)
> arr[i] = arr[i] - 10;
> ```
> 
> On powerpc64le, for -O2, enabling -funroll-loops and limiting
> PARAM_MAX_UNROLL_TIMES=2 and PARAM_MAX_UNROLLED_INSNS=20, we see a >2%
> overall improvement for SPEC2017.

Note the behavior with LTO will be surprising (and generally when
functions are compiled with different -O level via the optimize
attribute).  It's a bad idea to base --param values on the value
of a global optimization option that can be set per function
(see PR92046).

A better patch would have been to adjust this via the target hook for
unroll adjustment.
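
For illustration, a minimal sketch of adjusting this through the
TARGET_LOOP_UNROLL_ADJUST hook; the body, the option check and the 10-insn
threshold are assumptions of this sketch, not the posted patch:

```c
/* Sketch only: cap unrolling for small loops in the target hook instead
   of overriding the global --param values.  */
static unsigned
rs6000_loop_unroll_adjust (unsigned nunroll, struct loop *loop)
{
  /* Leave explicit user requests alone.  */
  if (global_options_set.x_flag_unroll_loops
      || global_options_set.x_flag_unroll_all_loops)
    return nunroll;

  if (loop->ninsns <= 10)      /* very small loop body */
    return MIN (nunroll, 2);   /* unroll at most 2 times */
  return 0;                    /* otherwise do not unroll implicitly */
}

#undef TARGET_LOOP_UNROLL_ADJUST
#define TARGET_LOOP_UNROLL_ADJUST rs6000_loop_unroll_adjust
```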

Thanks,
Richard.

> This patch is only for rs6000 in which we see visible performance improvement.
> 
> Bootstrapped on powerpc64le, and cases are updated. Is this ok for trunk?
>
> Jiufu Guo
> BR
> 
> 
> gcc/
> 2019-10-25  Jiufu Guo 
> 
>   PR tree-optimization/88760
>   * config/rs6000/rs6000-common.c (rs6000_option_optimization_table): For
>   -O2, enable -funroll-loops.
>   * config/rs6000/rs6000.c (rs6000_option_override_internal): If the
>   unroller is enabled through -O2, set constraints to
>   PARAM_MAX_UNROLL_TIMES=2 and PARAM_MAX_UNROLLED_INSNS=20 for small loops.
>   
> gcc.testsuite/
> 2019-10-25  Jiufu Guo  
> 
>   PR tree-optimization/88760
>   * gcc.target/powerpc/small-loop-unroll.c: New test.
>   * c-c++-common/tsan/thread_leak2.c: Update test.
>   * gcc.dg/pr59643.c: Update test.
>   * gcc.target/powerpc/loop_align.c: Update test.
>   * gcc.target/powerpc/ppc-fma-1.c: Update test.
>   * gcc.target/powerpc/ppc-fma-2.c: Update test.
>   * gcc.target/powerpc/ppc-fma-3.c: Update test.
>   * gcc.target/powerpc/ppc-fma-4.c: Update test.
>   * gcc.target/powerpc/pr78604.c: Update test.
> 
> ---
>  gcc/common/config/rs6000/rs6000-common.c |  1 +
>  gcc/config/rs6000/rs6000.c   | 20 
> 
>  gcc/testsuite/c-c++-common/tsan/thread_leak2.c   |  1 +
>  gcc/testsuite/gcc.dg/pr59643.c   |  1 +
>  gcc/testsuite/gcc.target/powerpc/loop_align.c|  2 +-
>  gcc/testsuite/gcc.target/powerpc/ppc-fma-1.c |  2 +-
>  gcc/testsuite/gcc.target/powerpc/ppc-fma-2.c |  2 +-
>  gcc/testsuite/gcc.target/powerpc/ppc-fma-3.c |  2 +-
>  gcc/testsuite/gcc.target/powerpc/ppc-fma-4.c |  2 +-
>  gcc/testsuite/gcc.target/powerpc/pr78604.c   |  2 +-
>  gcc/testsuite/gcc.target/powerpc/small-loop-unroll.c | 13 +
>  11 files changed, 42 insertions(+), 6 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.target/powerpc/small-loop-unroll.c
> 
> diff --git a/gcc/common/config/rs6000/rs6000-common.c 
> b/gcc/common/config/rs6000/rs6000-common.c
> index 4b0c205..b947196 100644
> --- a/gcc/common/config/rs6000/rs6000-common.c
> +++ b/gcc/common/config/rs6000/rs6000-common.c
> @@ -35,6 +35,7 @@ static const struct default_options 
> rs6000_option_optimization_table[] =
>  { OPT_LEVELS_ALL, OPT_fsplit_wide_types_early, NULL, 1 },
>  /* Enable -fsched-pressure for first pass instruction scheduling.  */
>  { OPT_LEVELS_1_PLUS, OPT_fsched_pressure, NULL, 1 },
> +{ OPT_LEVELS_2_PLUS, OPT_funroll_loops, NULL, 1 },
>  { OPT_LEVELS_NONE, 0, NULL, 0 }
>};
>  
> diff --git a/gcc/config/rs6000/rs6000.c b/gcc/config/rs6000/rs6000.c
> index a129137d..9a8ff9a 100644
> --- a/gcc/config/rs6000/rs6000.c
> +++ b/gcc/config/rs6000/rs6000.c
> @@ -4540,6 +4540,26 @@ rs6000_option_override_internal (bool global_init_p)
>global_options.x_param_values,
>global_options_set.x_param_values);
>  
> +  /* Unroll very small loops 2 times if -funroll-loops was not given.  */
> +  if (!global_options_set.x_flag_unroll_loops
> +   && !global_options_set.x_flag_unroll_all_loops)
> + {
> +   maybe_set_param_value (PARAM_MAX_UNROLL_TIMES, 2,
> +  global_options.x_param_values,
> +  global_options_set.x_param_values);
> +
> +   maybe_set_param_value (PARAM_MAX_UNROLLED_INSNS, 20,
> +  global_options.x_param_values,
> +  global_options_set.x_param_values);
> +
> +   /* If -fweb or -frename-registers are not specified on the command line,
> +  do not turn them on implicitly.  */
> +   if (!global_options_set.x_flag_web)
> + global_options.x_flag_web

Fix ICE in updating inline summaries

2019-10-28 Thread Jan Hubicka
Hi,
my patch from yesterday introduced an ICE when combining -O2 and -O0
translation units (since the second has a missing summary).

Bootstrapped/regtested x86_64-linux, committed.

Honza

* ipa-fnsummary.c (ipa_merge_fn_summary_after_inlining): Check
for missing EDGE_REF.
* ipa-prop.c (update_jump_functions_after_inlining): Likewise.
Index: ipa-fnsummary.c
===
--- ipa-fnsummary.c (revision 277503)
+++ ipa-fnsummary.c (working copy)
@@ -3315,7 +3315,7 @@ ipa_merge_fn_summary_after_inlining (str
   if (ipa_node_params_sum && callee_info->conds)
 {
   class ipa_edge_args *args = IPA_EDGE_REF (edge);
-  int count = ipa_get_cs_argument_count (args);
+  int count = args ? ipa_get_cs_argument_count (args) : 0;
   int i;
 
   if (count)
Index: ipa-prop.c
===
--- ipa-prop.c  (revision 277503)
+++ ipa-prop.c  (working copy)
@@ -2741,7 +2741,7 @@ update_jump_functions_after_inlining (st
  /* We must check range due to calls with variable number of arguments
 and we cannot combine jump functions with operations.  */
  if (dst->value.pass_through.operation == NOP_EXPR
- && (dst->value.pass_through.formal_id
+ && (top && dst->value.pass_through.formal_id
  < ipa_get_cs_argument_count (top)))
{
  int dst_fid = dst->value.pass_through.formal_id;
@@ -3430,7 +3430,7 @@ update_indirect_edges_after_inlining (st
continue;
 
   /* We must check range due to calls with variable number of arguments:  
*/
-  if (ici->param_index >= ipa_get_cs_argument_count (top))
+  if (!top || ici->param_index >= ipa_get_cs_argument_count (top))
{
  ici->param_index = -1;
  continue;


Re: introduce -fcallgraph-info option

2019-10-28 Thread Richard Biener
On Sun, 27 Oct 2019, Alexandre Oliva wrote:

> On Oct 26, 2019, Alexandre Oliva  wrote:
> 
> > E.g., the reason we gather expanded calls rather than just use
> > cgraph_edges is that the latter would dump several "calls" that are
> > builtins expanded internally by the compiler, and would NOT dump other
> > ops that are expanded as (lib)calls.
> 
> It occurred to me that we could reuse most cgraph edges and avoid
> duplicating them in the callees vec, namely those that aren't builtins
> or libcalls.  Those that are builtins might or might not become actual
> calls, so we disregard the cgraph_edges and record them in the vec
> instead.  Those that are libcalls (builtins in a different sense)
> introduced during expand are not even present in the cgraph edges, so we
> record them in the vec as well.  Here's the patch that implements this.
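
A rough sketch of the recording rule described above; the helper name and
its parameters are invented for illustration and are not part of the posted
patch:

```c
/* Illustration only: builtins (which may or may not expand to real calls)
   and libcalls emitted during expand (which have no cgraph edge) go into
   the per-function callees vec; everything else reuses the cgraph edges.  */
static bool
record_in_callees_vec_p (tree callee_decl, bool emitted_as_libcall)
{
  return (emitted_as_libcall
	  || (callee_decl && fndecl_built_in_p (callee_decl)));
}
```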
> 
> Regstrapped on x86_64-linux-gnu.  Ok to install?

<...>

> --- a/gcc/common.opt
> +++ b/gcc/common.opt
> @@ -1091,6 +1091,14 @@ fbtr-bb-exclusive
>  Common Ignore
>  Does nothing.  Preserved for backward compatibility.
>  
> +fcallgraph-info
> +Common Report RejectNegative Var(flag_callgraph_info) 
> Init(NO_CALLGRAPH_INFO);
> +Output callgraph information on a per-file basis
> +
> +fcallgraph-info=
> +Common Report RejectNegative Joined
> +Output callgraph information on a per-file basis with decorations
> +
>  fcall-saved-
>  Common Joined RejectNegative Var(common_deferred_options) Defer
>  -fcall-saved-  Mark  as being preserved across 
> functions.
> diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
> index 1407d019d1404..545b842eade71 100644
> --- a/gcc/doc/invoke.texi
> +++ b/gcc/doc/invoke.texi
> @@ -583,8 +583,9 @@ Objective-C and Objective-C++ Dialects}.
>  @item Developer Options
>  @xref{Developer Options,,GCC Developer Options}.
>  @gccoptlist{-d@var{letters}  -dumpspecs  -dumpmachine  -dumpversion @gol
> --dumpfullversion  -fchecking  -fchecking=@var{n}  -fdbg-cnt-list @gol
> --fdbg-cnt=@var{counter-value-list} @gol
> +-dumpfullversion  -fcallgraph-info@r{[}=su,da@r{]}
> +-fchecking  -fchecking=@var{n}
> +-fdbg-cnt-list @gol  -fdbg-cnt=@var{counter-value-list} @gol
>  -fdisable-ipa-@var{pass_name} @gol
>  -fdisable-rtl-@var{pass_name} @gol
>  -fdisable-rtl-@var{pass-name}=@var{range-list} @gol
> @@ -14533,6 +14534,18 @@ The files are created in the directory of the output 
> file.
>  
>  @table @gcctabopt
>  
> +@item -fcallgraph-info
> +@itemx -fcallgraph-info=@var{MARKERS}
> +@opindex fcallgraph-info
> +Makes the compiler output callgraph information for the program, on a
> +per-file basis.  The information is generated in the common VCG format.

I guess you need to elaborate on 'per-file'.  With LTO, as far as I
understand, you'll get the graph per LTRANS unit (did you check
where the output is generated?).

Is this mainly a debugging tool or does it serve a different purpose?

Otherwise OK.

Thanks,
Richard.

> +It can be decorated with additional, per-node and/or per-edge information,
> +if a list of comma-separated markers is additionally specified.  When the
> +@code{su} marker is specified, the callgraph is decorated with stack usage
> +information; it is equivalent to @option{-fstack-usage}.  When the @code{da}
> +marker is specified, the callgraph is decorated with information about
> +dynamically allocated objects.


Re: [PATCH] Fix PR92222

2019-10-28 Thread Richard Biener
On Sat, 26 Oct 2019, Richard Sandiford wrote:

> Richard Biener  writes:
> > We have to check each operand for being in a pattern, not just the
> > first when avoiding build from scalars (we could possibly handle
> > the special case of some of them being the pattern stmt root, but
> > that would be a followup improvement).
> >
> > Bootstrap & regtest running on x86-64-unknown-linux-gnu.
> >
> > Richard.
> >
> > 2019-10-25  Richard Biener  
> >
> > PR tree-optimization/92222
> > * tree-vect-slp.c (_slp_oprnd_info::first_pattern): Remove.
> > (_slp_oprnd_info::second_pattern): Likewise.
> > (_slp_oprnd_info::any_pattern): New.
> > (vect_create_oprnd_info): Adjust.
> > (vect_get_and_check_slp_defs): Compute whether any stmt is
> > in a pattern.
> > (vect_build_slp_tree_2): Avoid building up a node from scalars
> > if any of the operand defs, not just the first, is in a pattern.
> 
> Is recording this in vect_get_and_check_slp_defs enough?  AIUI we don't
> get that far if vect_build_slp_tree_1 fails, but that doesn't prevent us
> from building the vector from scalars on return from vect_build_slp_tree.

But there we've got oprnd_info for the operands of that build.  So yes,
I think so (for the correctness part).  What I'm not sure about is whether
we can do the scalar build if the stmts are at most the pattern root
(so we can use the vect_original_stmt def as scalar operand).

> Was wondering whether we should use a helper function that explicitly
> checks whether any stmt_info in the vec is a pattern statement, rather
> than computing it on the fly.

I don't think that's necessary here.

Richard.

> Thanks,
> Richard
> 
> >
> > * gcc.dg/torture/pr92222.c: New testcase.
> >
> > Index: gcc/tree-vect-slp.c
> > ===
> > --- gcc/tree-vect-slp.c (revision 277441)
> > +++ gcc/tree-vect-slp.c (working copy)
> > @@ -177,8 +177,7 @@ typedef struct _slp_oprnd_info
> >   stmt.  */
> >tree first_op_type;
> >enum vect_def_type first_dt;
> > -  bool first_pattern;
> > -  bool second_pattern;
> > +  bool any_pattern;
> >  } *slp_oprnd_info;
> >  
> >  
> > @@ -199,8 +198,7 @@ vect_create_oprnd_info (int nops, int gr
> >oprnd_info->ops.create (group_size);
> >oprnd_info->first_dt = vect_uninitialized_def;
> >oprnd_info->first_op_type = NULL_TREE;
> > -  oprnd_info->first_pattern = false;
> > -  oprnd_info->second_pattern = false;
> > +  oprnd_info->any_pattern = false;
> >oprnds_info.quick_push (oprnd_info);
> >  }
> >  
> > @@ -339,13 +337,11 @@ vect_get_and_check_slp_defs (vec_info *v
> >tree oprnd;
> >unsigned int i, number_of_oprnds;
> >enum vect_def_type dt = vect_uninitialized_def;
> > -  bool pattern = false;
> >slp_oprnd_info oprnd_info;
> >int first_op_idx = 1;
> >unsigned int commutative_op = -1U;
> >bool first_op_cond = false;
> >bool first = stmt_num == 0;
> > -  bool second = stmt_num == 1;
> >  
> >if (gcall *stmt = dyn_cast <gcall *> (stmt_info->stmt))
> >  {
> > @@ -418,13 +414,12 @@ again:
> >   return -1;
> > }
> >  
> > -  if (second)
> > -   oprnd_info->second_pattern = pattern;
> > +  if (def_stmt_info && is_pattern_stmt_p (def_stmt_info))
> > +   oprnd_info->any_pattern = true;
> >  
> >if (first)
> > {
> >   oprnd_info->first_dt = dt;
> > - oprnd_info->first_pattern = pattern;
> >   oprnd_info->first_op_type = TREE_TYPE (oprnd);
> > }
> >else
> > @@ -1311,7 +1306,7 @@ vect_build_slp_tree_2 (vec_info *vinfo,
> >   /* ???  Rejecting patterns this way doesn't work.  We'd have to
> >  do extra work to cancel the pattern so the uses see the
> >  scalar version.  */
> > - && !is_pattern_stmt_p (SLP_TREE_SCALAR_STMTS (child)[0]))
> > + && !oprnd_info->any_pattern)
> > {
> >   slp_tree grandchild;
> >  
> > @@ -1358,18 +1353,16 @@ vect_build_slp_tree_2 (vec_info *vinfo,
> >   /* ???  Rejecting patterns this way doesn't work.  We'd have to
> >  do extra work to cancel the pattern so the uses see the
> >  scalar version.  */
> > - && !is_pattern_stmt_p (stmt_info))
> > + && !is_pattern_stmt_p (stmt_info)
> > + && !oprnd_info->any_pattern)
> > {
> >   if (dump_enabled_p ())
> > dump_printf_loc (MSG_NOTE, vect_location,
> >  "Building vector operands from scalars\n");
> >   this_tree_size++;
> > - child = vect_create_new_slp_node (oprnd_info->def_stmts);
> > - SLP_TREE_DEF_TYPE (child) = vect_external_def;
> > - SLP_TREE_SCALAR_OPS (child) = oprnd_info->ops;
> > + child = vect_create_new_slp_node (oprnd_info->ops);
> >   children.safe_push (child);
> >   oprnd_info->ops = vNULL;
> > - oprnd_info->def_stmts = vNULL;
> >   continue;
> > }
> >  
> > @@ -1469,7 +1440,7 @@ vect_build_slp_tree_2 (vec_inf

Re: [RFA][1/3] Remove Cell Broadband Engine SPU targets

2019-10-28 Thread Eric Botcazou
> OK for all 3 patches.  Similarly for removing any other remnants you
> might find later.

I have removed left-overs in htdocs/backends.html

-- 
Eric Botcazou

diff --git a/htdocs/backends.html b/htdocs/backends.html
index 6e212b24..c9449065 100644
--- a/htdocs/backends.html
+++ b/htdocs/backends.html
@@ -112,7 +112,6 @@ rx |  s
 s390   |   ? Qqr gia e
 sh | Q   CB   qr pi
 sparc  | Q   CB   qr  b   ia
-spu|   ? Q  *C  mgi
 stormy16   | ???L  FIC D lb   i
 tilegx |   S Q   Cq  gi  e
 tilepro|   S   F C   gi  e
@@ -130,10 +129,5 @@ https://gcc.gnu.org/ml/gcc/2003-10/msg00027.html.
 href="https://github.com/foss-for-synopsys-dwc-arc-processors/binutils-gdb";>
 https://github.com/foss-for-synopsys-dwc-arc-processors/binutils-gdb.
 
-* SPU's float is special; it has the same format as IEEE float but has
-an extended range and no infinity or NaNs. SPU's float will not produce
-denormal or negative zeros; both are treated as positive zero.  On the
-other hand, SPU's double is IEEE floating point.
-
 
 


Re: [PATCH] rs6000: Enable limited unrolling at -O2

2019-10-28 Thread Jiufu Guo
Richard Biener  writes:

> On Fri, 25 Oct 2019, Jiufu Guo wrote:
>
>> Hi,
>> 
>> In PR88760, there is some discussion about improving or tuning the unroller
>> for targets. We agreed to enable the unroller for small loops at -O2 first.
>> We see a performance improvement (~10%) for the code below:
>> ```
>>   subroutine foo (i, i1, block)
>> integer :: i, i1
>> integer :: block(9, 9, 9)
>> block(i:9,1,i1) = block(i:9,1,i1) - 10
>>   end subroutine foo
>> 
>> ```
>> This kind of code occurs a few times in the exchange2 benchmark.
>> 
>> Similar C code:
>> ```
>>   for (i = 0; i < n; i++)
>> arr[i] = arr[i] - 10;
>> ```
>> 
>> On powerpc64le, for -O2, enabling -funroll-loops and limiting
>> PARAM_MAX_UNROLL_TIMES=2 and PARAM_MAX_UNROLLED_INSNS=20, we see a >2%
>> overall improvement for SPEC2017.
>
> Note the behavior with LTO will be surprising (and generally when
> functions are compiled with different -O level via the optimize
> attribute).  It's a bad idea to base --param values on the value
> of a global optimization option that can be set per function
> (see PR92046).

Thanks Richard,
--param values are not saved per function. If different options are used to
compile and link (e.g. compiling with -O2 -flto, linking with -flto -O2
-funroll-loops --param xxx), compiling and linking would use different
--param values.

>
> A better patch would have been to adjust via the target hooks for
> unroll adjust.
Yes. And in the unroll-adjust target hook, we can do fine tuning with more
heuristics. I'm going to rework the patch this way.

Thanks again.

Jiufu Guo
BR.

>
> Thanks,
> Richard.
>
>> This patch is only for rs6000 in which we see visible performance 
>> improvement.
>> 
>> Bootstrapped on powerpc64le, and cases are updated. Is this ok for trunk?
>>
>> Jiufu Guo
>> BR
>> 
>> 
>> gcc/
>> 2019-10-25  Jiufu Guo
>> 
>>  PR tree-optimization/88760
>>  * config/rs6000/rs6000-common.c (rs6000_option_optimization_table): For
>>  -O2, enable -funroll-loops.
>>  * config/rs6000/rs6000.c (rs6000_option_override_internal): If the
>>  unroller is enabled through -O2, set constraints to
>>  PARAM_MAX_UNROLL_TIMES=2 and PARAM_MAX_UNROLLED_INSNS=20 for small loops.
>>  
>> gcc.testsuite/
>> 2019-10-25  Jiufu Guo  
>> 
>>  PR tree-optimization/88760
>>  * gcc.target/powerpc/small-loop-unroll.c: New test.
>>  * c-c++-common/tsan/thread_leak2.c: Update test.
>>  * gcc.dg/pr59643.c: Update test.
>>  * gcc.target/powerpc/loop_align.c: Update test.
>>  * gcc.target/powerpc/ppc-fma-1.c: Update test.
>>  * gcc.target/powerpc/ppc-fma-2.c: Update test.
>>  * gcc.target/powerpc/ppc-fma-3.c: Update test.
>>  * gcc.target/powerpc/ppc-fma-4.c: Update test.
>>  * gcc.target/powerpc/pr78604.c: Update test.
>> 
>> ---
>>  gcc/common/config/rs6000/rs6000-common.c |  1 +
>>  gcc/config/rs6000/rs6000.c   | 20 
>> 
>>  gcc/testsuite/c-c++-common/tsan/thread_leak2.c   |  1 +
>>  gcc/testsuite/gcc.dg/pr59643.c   |  1 +
>>  gcc/testsuite/gcc.target/powerpc/loop_align.c|  2 +-
>>  gcc/testsuite/gcc.target/powerpc/ppc-fma-1.c |  2 +-
>>  gcc/testsuite/gcc.target/powerpc/ppc-fma-2.c |  2 +-
>>  gcc/testsuite/gcc.target/powerpc/ppc-fma-3.c |  2 +-
>>  gcc/testsuite/gcc.target/powerpc/ppc-fma-4.c |  2 +-
>>  gcc/testsuite/gcc.target/powerpc/pr78604.c   |  2 +-
>>  gcc/testsuite/gcc.target/powerpc/small-loop-unroll.c | 13 +
>>  11 files changed, 42 insertions(+), 6 deletions(-)
>>  create mode 100644 gcc/testsuite/gcc.target/powerpc/small-loop-unroll.c
>> 
>> diff --git a/gcc/common/config/rs6000/rs6000-common.c 
>> b/gcc/common/config/rs6000/rs6000-common.c
>> index 4b0c205..b947196 100644
>> --- a/gcc/common/config/rs6000/rs6000-common.c
>> +++ b/gcc/common/config/rs6000/rs6000-common.c
>> @@ -35,6 +35,7 @@ static const struct default_options 
>> rs6000_option_optimization_table[] =
>>  { OPT_LEVELS_ALL, OPT_fsplit_wide_types_early, NULL, 1 },
>>  /* Enable -fsched-pressure for first pass instruction scheduling.  */
>>  { OPT_LEVELS_1_PLUS, OPT_fsched_pressure, NULL, 1 },
>> +{ OPT_LEVELS_2_PLUS, OPT_funroll_loops, NULL, 1 },
>>  { OPT_LEVELS_NONE, 0, NULL, 0 }
>>};
>>  
>> diff --git a/gcc/config/rs6000/rs6000.c b/gcc/config/rs6000/rs6000.c
>> index a129137d..9a8ff9a 100644
>> --- a/gcc/config/rs6000/rs6000.c
>> +++ b/gcc/config/rs6000/rs6000.c
>> @@ -4540,6 +4540,26 @@ rs6000_option_override_internal (bool global_init_p)
>>   global_options.x_param_values,
>>   global_options_set.x_param_values);
>>  
>> +  /* Unroll very small loops 2 times if -funroll-loops was not given.  */
>> +  if (!global_options_set.x_flag_unroll_loops
>> +  && !global_options_set.x_flag_unroll_all_loops)
>> +{
>> +  maybe_set_param_value (PARAM_MAX_UNROLL_TIMES, 2,
>> +   

Re: r272976 - in /trunk/gcc/ada: ChangeLog ali.adb ...

2019-10-28 Thread Arnaud Charlet
> > 2019-09-10  Arnaud Charlet  
> > 
> > * doc/install.texi: Fix syntax for html generation.
> > 
> > Index: doc/install.texi
> > ===
> > --- doc/install.texi(revision 275400)
> > +++ doc/install.texi(working copy)
> > @@ -2727,7 +2727,12 @@
> > 
> >  @section Building the Ada compiler
> > 
> > -See @ref{GNAT-prerequisite}.
> > +@ifnothtml
> > +@ref{GNAT-prerequisite}.
> > +@end ifnothtml
> > +@ifhtml
> > +@uref{GNAT-prerequisite}.
> > +@end ifhtml
> 
> Hmm, I'm afraid this does not work as intended.
> 
> https://gcc.gnu.org/install/build.html now links to
> https://gcc.gnu.org/install/GNAT-prerequisite which simply does not
> exit.
> 
> Mind looking into this?

OK. It would be good to switch to another doc generation tool one of these
days; these things are not fun to deal with.

(I'm on vacation this week, so will take a look next week).

Arno


Re: [WIP PATCH] add object access attributes (PR 83859)

2019-10-28 Thread Richard Biener
On Sun, Oct 27, 2019 at 6:32 PM Jeff Law  wrote:
>
> On 9/29/19 1:51 PM, Martin Sebor wrote:
> > -Wstringop-overflow detects a subset of past-the-end read and write
> > accesses by built-in functions such as memcpy and strcpy.  It relies
> > on the functions' effects the knowledge of which is hardwired into
> > GCC.  Although it's possible for users to create wrappers for their
> > own functions to detect similar problems, it's quite cumbersome and
> > so only lightly used outside system libraries like Glibc.  Even Glibc
> > only checks for buffer overflow and not for reading past the end.
> >
> > PR 83859 asks to expose the same checking that GCC does natively for
> > built-in calls via a function attribute that associates a pointer
> > argument with the size argument, such as:
> >
> >   __attribute__((buffer_size (1, 2))) void
> >   f (char* dst, size_t dstsize);
> >
> > The attached patch is my initial stab at providing this feature by
> > introducing three new attributes:
> >
> >   * read_only (ptr-argno, size-argno)
> >   * write_only (ptr-argno, size-argno)
> >   * read_write (ptr-argno, size-argno)
> >
> > As requested, the attributes associate a pointer parameter to
> > a function with a size parameter.  In addition, they also specify
> > how the function accesses the object the pointer points to: either
> > it only reads from it, or it only writes to it, or it does both.
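
A hedged usage sketch of the proposed spellings; the attribute names and the
(pointer-argno, size-argno) argument order follow the description above and
may differ from what is finally committed:

```c
#include <stddef.h>

/* dst (argument 1) is only written, src (argument 2) is only read;
   both are bounded by n (argument 3).  */
__attribute__ ((write_only (1, 3), read_only (2, 3)))
void copy_buf (char *dst, const char *src, size_t n);
```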
> >
> > Besides enabling the same buffer overflow detection as for built-in
> > string functions they also let GCC issue -Wuninitialized warnings
> > for uninitialized objects passed to read-only functions by reference,
> > and -Wunused-but-set warnings for objects passed to write-only
> > functions that are otherwise unused (PR 80806).  The -Wuninitialized
> > part is done. The -Wunused-but-set detection is implemented only in
> > the C FE and not yet in C++.
> >
> > Besides the diagnostic improvements above the attributes also open
> > up optimization opportunities such as DCE.  I'm still working on this
> > and so it's not yet part of the initial patch.
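
For example (a sketch under the proposed syntax, not taken from the patch),
a write_only callee would let stores that only feed the call be eliminated:

```c
#include <stddef.h>

__attribute__ ((write_only (1, 2))) void fill (char *dst, size_t n);

void g (void)
{
  char buf[16];
  buf[0] = 1;              /* dead: fill never reads *dst and buf is not
                              read afterwards, so DCE could drop this.  */
  fill (buf, sizeof buf);
}
```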
> >
> > I plan to finish the patch for GCC 10 but I don't expect to have
> > the time to start taking advantage of the attributes for optimization
> > until GCC 11.
> >
> > Besides regression testing on x86_64-linux, I also tested the patch
> > by compiling Binutils/GDB, Glibc, and the Linux kernel with it.  It
> > found no new problems but caused a handful of -Wunused-but-set-variable
> > false positives due to an outstanding bug in the C front-end introduced
> > by the patch that I still need to fix.
> >
> > Martin
> >
> > gcc-80806.diff
> >
> > PR c/80806 - gcc does not warn if local array is memset only
> > PR middle-end/83859 - attribute to associate buffer and its size
> >
> > gcc/ChangeLog:
> >
> >   PR c/80806
> >   PR middle-end/83859
> >   * builtin-attrs.def (ATTR_NO_SIDE_EFFECT): New.
> >   (ATTR_READ_ONLY, ATTR_READ_WRITE, ATTR_WRITE_ONLY): New.
> >   (ATTR_NOTHROW_WRONLY1_LEAF, ATTR_NOTHROW_WRONLY1_2_LEAF): New.
> >   (ATTR_NOTHROW_WRONLY1_3_LEAF, ATTR_NOTHROW_WRONLY2_3_LEAF): New.
> >   (ATTR_RET1_NOTHROW_WRONLY1_LEAF, ATTR_RET1_NOTHROW_WRONLY1_3_LEAF): 
> > New.
> >   (ATTR_RET1_NOTHROW_NONNULL_RDONLY2_LEAF): New.
> >   (ATTR_RET1_NOTHROW_NONNULL_RDWR1_RDONLY2_LEAF): New.
> >   (ATTR_RET1_NOTHROW_WRONLY1_3_RDONLY2_3_LEAF): New.
> >   (ATTR_RET1_NOTHROW_WRONLY1_RDONLY2_LEAF): New.
> >   (ATTR_RET1_NOTHROW_WRONLY1_3_RDONLY2_LEAF): New.
> >   (ATTR_MALLOC_NOTHROW_NONNULL_RDONLY1_LEAF): New.
> >   (ATTR_PURE_NOTHROW_NONNULL_RDONLY1_LEAF): New.
> >   (ATTR_PURE_NOTHROW_NONNULL_RDONLY1_RDONLY2_LEAF): New.
> >   (ATTR_RETNONNULL_RDONLY2_3_NOTHROW_LEAF): New.
> >   (ATTR_RETNONNULL_WRONLY1_3_RDONLY2_3_NOTHROW_LEAF): New.
> >   (ATTR_PURE_NOTHROW_NONNULL_RDONLY1_3_LEAF): New.
> >   (ATTR_PURE_NOTHROW_NONNULL_RDONLY1_2_3_LEAF): New.
> >   (ATTR_RETNONNULL_RDONLY2_NOTHROW_LEAF): New.
> >   (ATTR_RETNONNULL_WRONLY1_RDONLY2_NOTHROW_LEAF): New.
> >   (ATTR_RETNONNULL_WRONLY1_3_RDONLY2_NOTHROW_LEAF): New.
> >   * builtins.c (check_access): Make extern.  Consistently set
> >   the no-warning bit after issuing a warning.
> >   * builtins.h (check_access): Declare.
> >   * builtins.def (bcopy, bzero, index, memchr, memcmp, memcpy): Add
> >   read_only and write_only attributes.
> >   (memset, rindex, stpcpy, stpncpy, strcasecmp, strcat): Same.
> >   (strchr, strcmp, strcpy, strcspn, strdup, strndup, strlen): Same.
> >   (strncasecmp, strncat, strncmp, strncpy, strrchr, strspn, strstr): 
> > Same.
> >   (free, __memcpy_chk, __memmove_chk, __memset_chk): Same.
> >   (__strcpy_chk, __strncpy_chk): Same.
> >   * calls.c (rdwr_access_hash): New type.
> >   (rdwr_map): Same.
> >   (init_attr_rdwr_indices): New function.
> >   (maybe_warn_rdwr_sizes): Same.
> >   (initialize_argument_information): Call init_attr_rdwr_indices.
> >   Call ma

Re: [PATCH] rs6000: Enable limited unrolling at -O2

2019-10-28 Thread Richard Biener
On Mon, 28 Oct 2019, Jiufu Guo wrote:

> Richard Biener  writes:
> 
> > On Fri, 25 Oct 2019, Jiufu Guo wrote:
> >
> >> Hi,
> >> 
> >> In PR88760, there is some discussion about improving or tuning the unroller
> >> for targets. We agreed to enable the unroller for small loops at -O2 first.
> >> We see a performance improvement (~10%) for the code below:
> >> ```
> >>   subroutine foo (i, i1, block)
> >> integer :: i, i1
> >> integer :: block(9, 9, 9)
> >> block(i:9,1,i1) = block(i:9,1,i1) - 10
> >>   end subroutine foo
> >> 
> >> ```
> >> This kind of code occurs a few times in the exchange2 benchmark.
> >> 
> >> Similar C code:
> >> ```
> >>   for (i = 0; i < n; i++)
> >> arr[i] = arr[i] - 10;
> >> ```
> >> 
> >> On powerpc64le, for -O2, enabling -funroll-loops and limiting
> >> PARAM_MAX_UNROLL_TIMES=2 and PARAM_MAX_UNROLLED_INSNS=20, we see a >2%
> >> overall improvement for SPEC2017.
> >
> > Note the behavior with LTO will be surprising (and generally when
> > functions are compiled with different -O level via the optimize
> > attribute).  It's a bad idea to base --param values on the value
> > of a global optimization option that can be set per function
> > (see PR92046).
> 
> Thanks Richard,
> --param values are not saved per function. If different options are used to
> compile and link (e.g. compiling with -O2 -flto, linking with -flto -O2
> -funroll-loops --param xxx), compiling and linking would use different
> --param values.

Yes, and more so if you have most units compiled with -O2 but one
with -O3: then you get an implicit "global" -O3 at link time and thus
-O3 --param settings for all -O2-compiled functions.

> >
> > A better patch would have been to adjust via the target hooks for
> > unroll adjust.
> Yes. And in the unroll-adjust target hook, we can do fine tuning with more
> heuristics. I'm going to rework the patch this way.

Thanks.  Note that enabling unrolling at -O2 for the target like you did
should work OK for LTO (even if some units explicitly disable it
via -fno-unroll-loops).

Richard.

> Thanks again.
> 
> Jiufu Guo
> BR.
> 
> >
> > Thanks,
> > Richard.
> >
> >> This patch is only for rs6000 in which we see visible performance 
> >> improvement.
> >> 
> >> Bootstrapped on powerpc64le, and cases are updated. Is this ok for trunk?
> >>
> >> Jiufu Guo
> >> BR
> >> 
> >> 
> >> gcc/
> >> 2019-10-25  Jiufu Guo  
> >> 
> >>PR tree-optimization/88760
> >>* config/rs6000/rs6000-common.c (rs6000_option_optimization_table): For
> >>-O2, enable -funroll-loops.
> >>* config/rs6000/rs6000.c (rs6000_option_override_internal): If the
> >>unroller is enabled through -O2, set constraints to
> >>PARAM_MAX_UNROLL_TIMES=2 and PARAM_MAX_UNROLLED_INSNS=20 for small loops.
> >>
> >> gcc.testsuite/
> >> 2019-10-25  Jiufu Guo  
> >> 
> >>PR tree-optimization/88760
> >>* gcc.target/powerpc/small-loop-unroll.c: New test.
> >>* c-c++-common/tsan/thread_leak2.c: Update test.
> >>* gcc.dg/pr59643.c: Update test.
> >>* gcc.target/powerpc/loop_align.c: Update test.
> >>* gcc.target/powerpc/ppc-fma-1.c: Update test.
> >>* gcc.target/powerpc/ppc-fma-2.c: Update test.
> >>* gcc.target/powerpc/ppc-fma-3.c: Update test.
> >>* gcc.target/powerpc/ppc-fma-4.c: Update test.
> >>* gcc.target/powerpc/pr78604.c: Update test.
> >> 
> >> ---
> >>  gcc/common/config/rs6000/rs6000-common.c |  1 +
> >>  gcc/config/rs6000/rs6000.c   | 20 
> >> 
> >>  gcc/testsuite/c-c++-common/tsan/thread_leak2.c   |  1 +
> >>  gcc/testsuite/gcc.dg/pr59643.c   |  1 +
> >>  gcc/testsuite/gcc.target/powerpc/loop_align.c|  2 +-
> >>  gcc/testsuite/gcc.target/powerpc/ppc-fma-1.c |  2 +-
> >>  gcc/testsuite/gcc.target/powerpc/ppc-fma-2.c |  2 +-
> >>  gcc/testsuite/gcc.target/powerpc/ppc-fma-3.c |  2 +-
> >>  gcc/testsuite/gcc.target/powerpc/ppc-fma-4.c |  2 +-
> >>  gcc/testsuite/gcc.target/powerpc/pr78604.c   |  2 +-
> >>  gcc/testsuite/gcc.target/powerpc/small-loop-unroll.c | 13 +
> >>  11 files changed, 42 insertions(+), 6 deletions(-)
> >>  create mode 100644 gcc/testsuite/gcc.target/powerpc/small-loop-unroll.c
> >> 
> >> diff --git a/gcc/common/config/rs6000/rs6000-common.c 
> >> b/gcc/common/config/rs6000/rs6000-common.c
> >> index 4b0c205..b947196 100644
> >> --- a/gcc/common/config/rs6000/rs6000-common.c
> >> +++ b/gcc/common/config/rs6000/rs6000-common.c
> >> @@ -35,6 +35,7 @@ static const struct default_options 
> >> rs6000_option_optimization_table[] =
> >>  { OPT_LEVELS_ALL, OPT_fsplit_wide_types_early, NULL, 1 },
> >>  /* Enable -fsched-pressure for first pass instruction scheduling.  */
> >>  { OPT_LEVELS_1_PLUS, OPT_fsched_pressure, NULL, 1 },
> >> +{ OPT_LEVELS_2_PLUS, OPT_funroll_loops, NULL, 1 },
> >>  { OPT_LEVELS_NONE, 0, NULL, 0 }
> >>};
> >>  
> >> diff --git a/gcc/config/rs6000/rs6000.c b/gcc/config/rs60

Re: r272976 - in /trunk/gcc/ada: ChangeLog ali.adb ...

2019-10-28 Thread Andreas Schwab
On Sep 10 2019, Arnaud Charlet wrote:

> All right, there are already similar kludges elsewhere, so I've applied the
> following patch which fixes it:
>
> 2019-09-10  Arnaud Charlet  
>
>   * doc/install.texi: Fix syntax for html generation.
>
> Index: doc/install.texi
> ===
> --- doc/install.texi(revision 275400)
> +++ doc/install.texi(working copy)
> @@ -2727,7 +2727,12 @@
>
>  @section Building the Ada compiler
>
> -See @ref{GNAT-prerequisite}.
> +@ifnothtml
> +@ref{GNAT-prerequisite}.
> +@end ifnothtml
> +@ifhtml
> +@uref{GNAT-prerequisite}.

That should be @uref{prerequisites.html#GNAT-prerequisite,,GNAT prerequisites}.

Andreas.

-- 
Andreas Schwab, SUSE Labs, sch...@suse.de
GPG Key fingerprint = 0196 BAD8 1CE9 1970 F4BE  1748 E4D4 88E3 0EEA B9D7
"And now for something completely different."


Re: Fix PR90796

2019-10-28 Thread Michael Matz
Hi,

On Wed, 23 Oct 2019, Michael Matz wrote:

> On Tue, 22 Oct 2019, Rainer Orth wrote:
> 
> > > testsuite/
> > >   * gcc.dg/unroll-and-jam.c: Add three invalid and one valid case.
> > 
> > this testcase now FAILs on 32-bit targets (seen on i386-pc-solaris2.11
> > and sparc-sun-solaris2.11, also reports for i686-pc-linux-gnu and
> > i586-unknown-freebsd11.2):
> > 
> > +FAIL: gcc.dg/unroll-and-jam.c scan-tree-dump-times unrolljam "applying 
> > unroll and jam" 6
> 
> Hrmpf, I'll have a look :-/  Thanks for noticing.

A strange interaction with LIM, which only does something on 32bit, but 
not on 64bit (which is a problem to investigate on its own, but outside of 
scope).  Disabling it requires changing the testcase so that LIM isn't 
necessary for other reasons, as below.  Can you check if that fixes the 
problem on Solaris as well?  (It does for linux 32bit).


Ciao,
Michael.

testsuite/
PR middle-end/90796
* gcc.dg/unroll-and-jam.c: Disable loop-invariant motion.

--- a/gcc/testsuite/gcc.dg/unroll-and-jam.c
+++ b/gcc/testsuite/gcc.dg/unroll-and-jam.c
@@ -1,5 +1,5 @@
 /* { dg-do run } */
-/* { dg-options "-O3 -floop-unroll-and-jam --param unroll-jam-min-percent=0 
-fdump-tree-unrolljam-details" } */
+/* { dg-options "-O3 -floop-unroll-and-jam -fno-tree-loop-im --param 
unroll-jam-min-percent=0 -fdump-tree-unrolljam-details" } */
 /* { dg-require-effective-target int32plus } */
 
 #include <stdio.h>
@@ -31,10 +31,10 @@ void checkb(void)
   //printf("  %d\n", sum);
 }
 
-unsigned i, j;
 #define TEST(name, body, test) \
 static void __attribute__((noinline,noclone)) name (unsigned long n, unsigned 
long m) \
 { \
+  unsigned i, j; \
   for (i = 1; i < m; i++) { \
   for (j = 1; j < n; j++) { \
  body; \


Re: Fix PR90796

2019-10-28 Thread Rainer Orth
Hi Michael,

>> On Tue, 22 Oct 2019, Rainer Orth wrote:
>> 
>> > > testsuite/
>> > >  * gcc.dg/unroll-and-jam.c: Add three invalid and one valid case.
>> > 
>> > this testcase now FAILs on 32-bit targets (seen on i386-pc-solaris2.11
>> > and sparc-sun-solaris2.11, also reports for i686-pc-linux-gnu and
>> > i586-unknown-freebsd11.2):
>> > 
>> > +FAIL: gcc.dg/unroll-and-jam.c scan-tree-dump-times unrolljam "applying
>> > unroll and jam" 6
>> 
>> Hrmpf, I'll have a look :-/  Thanks for noticing.
>
> A strange interaction with LIM, which only does something on 32bit, but 
> not on 64bit (which is a problem to investigate on its own, but outside of 
> scope).  Disabling it requires changing the testcase so that LIM isn't 
> necessary for other reasons, as below.  Can you check if that fixes the 
> problem on Solaris as well?  (It does for linux 32bit).

I just checked: the test now PASSes on both sparc-sun-solaris2.11 and
i386-pc-solaris2.11 (32 and 64-bit each).

Thanks.
Rainer

-- 
-
Rainer Orth, Center for Biotechnology, Bielefeld University


[PATCH] Fix PR92252

2019-10-28 Thread Richard Biener


Bootstrap and regtest running on x86_64-unknown-linux-gnu.

Richard.

2019-10-28  Richard Biener  

PR tree-optimization/92252
* tree-vect-slp.c (vect_get_and_check_slp_defs): Adjust
STMT_VINFO_REDUC_IDX when swapping operands.

* gcc.dg/torture/pr92252.c: New testcase.

Index: gcc/tree-vect-slp.c
===
--- gcc/tree-vect-slp.c (revision 277504)
+++ gcc/tree-vect-slp.c (working copy)
@@ -563,6 +563,10 @@ again:
{
  swap_ssa_operands (stmt, gimple_assign_rhs2_ptr (stmt),
 gimple_assign_rhs3_ptr (stmt));
+ if (STMT_VINFO_REDUC_IDX (stmt_info) == 1)
+   STMT_VINFO_REDUC_IDX (stmt_info) = 2;
+ else if (STMT_VINFO_REDUC_IDX (stmt_info) == 2)
+   STMT_VINFO_REDUC_IDX (stmt_info) = 1;
  bool honor_nans = HONOR_NANS (TREE_OPERAND (cond, 0));
  code = invert_tree_comparison (TREE_CODE (cond), honor_nans);
  gcc_assert (code != ERROR_MARK);
Index: gcc/testsuite/gcc.dg/torture/pr92252.c
===
--- gcc/testsuite/gcc.dg/torture/pr92252.c  (nonexistent)
+++ gcc/testsuite/gcc.dg/torture/pr92252.c  (working copy)
@@ -0,0 +1,23 @@
+/* { dg-do compile } */
+/* { dg-additional-options "-ftree-vectorize" } */
+
+long int ar;
+int dt;
+
+long int
+pc (unsigned long int q3, int zw)
+{
+  long int em = 0;
+
+  while (zw < 1)
+{
+  q3 = zw * 2ul;
+  if (q3 != 0)
+for (ar = 0; ar < 2; ++ar)
+  em = dt;
+
+  ++zw;
+}
+
+  return em;
+}


[PATCH] Fix PR92249

2019-10-28 Thread Richard Biener


This avoids segfaulting in the GIMPLE parser on invalid input.

Bootstrapped on x86_64-unknown-linux-gnu, testing in progress.

Richard.

2019-10-28  Richard Biener  

PR c/92249
* gimple-parser.c (c_parser_parse_gimple_body): Make
current_bb the entry block initially to easier recover
from errors.
(c_parser_gimple_compound_statement): Adjust.

Index: gcc/c/gimple-parser.c
===
--- gcc/c/gimple-parser.c   (revision 277504)
+++ gcc/c/gimple-parser.c   (working copy)
@@ -235,6 +235,7 @@ c_parser_parse_gimple_body (c_parser *cp
   /* We have at least cdil_gimple_cfg.  */
   gimple_register_cfg_hooks ();
   init_empty_tree_cfg ();
+  parser.current_bb = ENTRY_BLOCK_PTR_FOR_FN (cfun);
   /* Initialize the bare loop structure - we are going to only
  mark headers and leave the rest to fixup.  */
   set_loops_for_fn (cfun, ggc_cleared_alloc ());
@@ -594,7 +595,7 @@ c_parser_gimple_compound_statement (gimp
  if (last_basic_block_for_fn (cfun) <= index)
last_basic_block_for_fn (cfun) = index + 1;
  n_basic_blocks_for_fn (cfun)++;
- if (!parser.current_bb)
+ if (parser.current_bb->index == ENTRY_BLOCK)
parser.push_edge (ENTRY_BLOCK, bb->index, EDGE_FALLTHRU,
  profile_probability::always ());
 


Re: Fix PR90796

2019-10-28 Thread Michael Matz
Hello,

On Mon, 28 Oct 2019, Rainer Orth wrote:

> >> > +FAIL: gcc.dg/unroll-and-jam.c scan-tree-dump-times unrolljam "applying
> >> > unroll and jam" 6
> >> 
> >> Hrmpf, I'll have a look :-/  Thanks for noticing.
> >
> > A strange interaction with LIM, which only does something on 32bit, but 
> > not on 64bit (which is a problem to investigate on its own, but outside of 
> > scope).  Disabling it requires changing the testcase so that LIM isn't 
> > necessary for other reasons, as below.  Can you check if that fixes the 
> > problem on Solaris as well?  (It does for linux 32bit).
> 
> I just checked: the test now PASSes on both sparc-sun-solaris2.11 and
> i386-pc-solaris2.11 (32 and 64-bit each).

Thank you.  Committed as r277508 now.


Ciao,
Michael.


[PATCH, i386]: Remove a couple of operand modifiers

2019-10-28 Thread Uros Bizjak
These are not needed for scalar operands.

2019-10-28  Uroš Bizjak  

* config/i386/sse.md (sse_cvtss2si_2):
Remove %k operand modifier.
(*vec_extractv2df_1_sse): Remove %q operand modifier.

Bootstrapped and regression tested on x86_64-linux-gnu {,-m32}.

Committed to mainline.

Uros.
Index: sse.md
===
--- sse.md  (revision 277503)
+++ sse.md  (working copy)
@@ -5324,7 +5324,7 @@
(unspec:SWI48 [(match_operand:SF 1 "nonimmediate_operand" "v,m")]
  UNSPEC_FIX_NOTRUNC))]
   "TARGET_SSE"
-  "%vcvtss2si\t{%1, %0|%0, %k1}"
+  "%vcvtss2si\t{%1, %0|%0, %1}"
   [(set_attr "type" "sseicvt")
(set_attr "athlon_decode" "double,vector")
(set_attr "amdfam10_decode" "double,double")
@@ -10147,7 +10147,7 @@
   "!TARGET_SSE2 && TARGET_SSE
&& !(MEM_P (operands[0]) && MEM_P (operands[1]))"
   "@
-   movhps\t{%1, %0|%q0, %1}
+   movhps\t{%1, %0|%0, %1}
movhlps\t{%1, %0|%0, %1}
movlps\t{%H1, %0|%0, %H1}"
   [(set_attr "type" "ssemov")


[PATCH, i386]: Fix REDUC_SSE_SMINMAX_MODE mode iterator

2019-10-28 Thread Uros Bizjak
2019-10-28  Uroš Bizjak  

PR target/92225
* config/i386/sse.md (REDUC_SSE_SMINMAX_MODE): Use TARGET_SSE4_2
condition for V2DImode.

testsuite/ChangeLog:

2019-10-28  Uroš Bizjak  

PR target/92225
* gcc.target/i386/pr92225.c: New test.

Bootstrapped and regression tested on x86_64-linux-gnu {,-m32}.

Committed to mainline, will be backported to other release branches.

Uros.
Index: config/i386/sse.md
===
--- config/i386/sse.md  (revision 277509)
+++ config/i386/sse.md  (working copy)
@@ -2825,7 +2825,7 @@
 ;; Modes handled by reduc_sm{in,ax}* patterns.
 (define_mode_iterator REDUC_SSE_SMINMAX_MODE
   [(V4SF "TARGET_SSE") (V2DF "TARGET_SSE")
-   (V2DI "TARGET_SSE") (V4SI "TARGET_SSE") (V8HI "TARGET_SSE")
+   (V2DI "TARGET_SSE4_2") (V4SI "TARGET_SSE") (V8HI "TARGET_SSE")
(V16QI "TARGET_SSE")])
 
 (define_expand "reduc__scal_"
Index: testsuite/gcc.target/i386/pr92225.c
===
--- testsuite/gcc.target/i386/pr92225.c (nonexistent)
+++ testsuite/gcc.target/i386/pr92225.c (working copy)
@@ -0,0 +1,19 @@
+/* PR target/92225 */
+/* { dg-do compile } */
+/* { dg-options "-O2 -ftree-vectorize -msse2 -mno-sse4" } */
+
+void a (long);
+
+unsigned *b;
+
+void
+c ()
+{
+  long d = 2;
+  int e = 0;
+  
+  for (; e < 1024; e++)
+if (b[e] > d)
+  d = b[e];
+  a (d);
+}


Re: [C++ PATCH] PR c++/90875 - added -Wswitch-outside-range option.

2019-10-28 Thread Marek Polacek
On Sun, Oct 27, 2019 at 08:13:52AM +0100, Gerald Pfeifer wrote:
> On Fri, 21 Jun 2019, Marek Polacek wrote:
> >> 2019-06-20  Matthew Beliveau  
> >> 
> >>PR c++/90875 - added -Wswitch-outside-range option
> >>* doc/invoke.texi (Wswitch-outside-range): Document.
> 
> I noticed this is not yet covered in the GCC 10 release notes at
> https://gcc.gnu.org/gcc-10/changes.html .
> 
> https://gcc.gnu.org/about.html has some information on how to go about
> our web pages (in GIT now, the CVS era is over ;-), and I'm happy to
> help if you have any questions.

Note the warning isn't new, just the option.

Marek



Re: [PATCH] Use STMT_VINFO_REDUC_IDX instead of recomputing it

2019-10-28 Thread Richard Biener
On Fri, 25 Oct 2019, Richard Biener wrote:

> 
> This is a cleanup.  The cond-reduction restriction can go,
> the fold-left one stays (it cannot handle more than one stmt in
> the cycle - in the future when we get partial loop vectorization
> generic code would handle duplicating of scalar code parts, they'd
> simply stay single-lane SLP graph parts).
> 
> Bootstrap and regtest running on x86_64-unknown-linux-gnu.

The following variant is what I have applied.

Bootstrapped and tested on x86_64-unknown-linux-gnu.

Richard.

2019-10-28  Richard Biener  

* tree-vect-loop.c (vect_create_epilog_for_reduction): Use
STMT_VINFO_REDUC_IDX from the actual stmt.
(vect_transform_reduction): Likewise.
(vectorizable_reduction): Compute the reduction chain length,
do not recompute the reduction operand index.  Remove no longer
necessary restriction for condition reduction chains.

Index: gcc/tree-vect-loop.c
===
--- gcc/tree-vect-loop.c(revision 277504)
+++ gcc/tree-vect-loop.c(working copy)
@@ -4263,9 +4263,9 @@ vect_create_epilog_for_reduction (stmt_v
 (CCOMPARE).  The then and else values mirror the main VEC_COND_EXPR:
 the reduction phi corresponds to NEW_PHI_TREE and the new values
 correspond to INDEX_BEFORE_INCR.  */
-  gcc_assert (STMT_VINFO_REDUC_IDX (reduc_info) >= 1);
+  gcc_assert (STMT_VINFO_REDUC_IDX (stmt_info) >= 1);
   tree index_cond_expr;
-  if (STMT_VINFO_REDUC_IDX (reduc_info) == 2)
+  if (STMT_VINFO_REDUC_IDX (stmt_info) == 2)
index_cond_expr = build3 (VEC_COND_EXPR, cr_index_vector_type,
  ccompare, indx_before_incr, new_phi_tree);
   else
@@ -5720,19 +5720,24 @@ vectorizable_reduction (stmt_vec_info st
   gcc_assert (!STMT_VINFO_RELATED_STMT (phi_info));
   gphi *reduc_def_phi = as_a <gphi *> (phi_info->stmt);
 
-  /* Verify following REDUC_IDX from the latch def leads us back to the PHI.  
*/
+  /* Verify following REDUC_IDX from the latch def leads us back to the PHI
+ and compute the reduction chain length.  */
   tree reduc_def = PHI_ARG_DEF_FROM_EDGE (reduc_def_phi,
  loop_latch_edge (loop));
+  unsigned reduc_chain_length = 0;
+  bool only_slp_reduc_chain = true;
   while (reduc_def != PHI_RESULT (reduc_def_phi))
 {
   stmt_vec_info def = loop_vinfo->lookup_def (reduc_def);
   def = vect_stmt_to_vectorize (def);
   gcc_assert (STMT_VINFO_REDUC_IDX (def) != -1);
+  if (!REDUC_GROUP_FIRST_ELEMENT (def))
+   only_slp_reduc_chain = false;
   reduc_def = gimple_op (def->stmt, 1 + STMT_VINFO_REDUC_IDX (def));
+  reduc_chain_length++;
 }
 
   reduc_def = PHI_RESULT (reduc_def_phi);
-  int reduc_index = -1;
   for (i = 0; i < op_type; i++)
 {
   tree op = gimple_op (stmt, i + 1);
@@ -5753,7 +5758,6 @@ vectorizable_reduction (stmt_vec_info st
   if ((dt == vect_reduction_def || dt == vect_nested_cycle)
  && op == reduc_def)
{
- reduc_index = i;
  continue;
}
 
@@ -5792,10 +5796,6 @@ vectorizable_reduction (stmt_vec_info st
   if (!vectype_in)
 vectype_in = vectype_out;
   STMT_VINFO_REDUC_VECTYPE_IN (reduc_info) = vectype_in;
-  /* For the SSA cycle we store on each participating stmt the operand index
- where the cycle continues.  Store the one relevant for the actual
- operation in the reduction meta.  */
-  STMT_VINFO_REDUC_IDX (reduc_info) = reduc_index;
 
   enum vect_reduction_type v_reduc_type = STMT_VINFO_REDUC_TYPE (phi_info);
   STMT_VINFO_REDUC_TYPE (reduc_info) = v_reduc_type;
@@ -5805,28 +5805,8 @@ vectorizable_reduction (stmt_vec_info st
   if (slp_node)
return false;
 
-  /* TODO: We can't yet handle reduction chains, since we need to treat
-each COND_EXPR in the chain specially, not just the last one.
-E.g. for:
-
-   x_1 = PHI 
-   x_2 = a_2 ? ... : x_1;
-   x_3 = a_3 ? ... : x_2;
-
-we're interested in the last element in x_3 for which a_2 || a_3
-is true, whereas the current reduction chain handling would
-vectorize x_2 as a normal VEC_COND_EXPR and only treat x_3
-as a reduction operation.  */
-  if (reduc_index == -1)
-   {
- if (dump_enabled_p ())
-   dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
-"conditional reduction chains not supported\n");
- return false;
-   }
-
   /* When the condition uses the reduction value in the condition, fail.  
*/
-  if (reduc_index == 0)
+  if (STMT_VINFO_REDUC_IDX (stmt_info) == 0)
{
  if (dump_enabled_p ())
dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
@@ -5995,17 +5975,18 @@ vectorizable_reduction (stmt_vec_info st
 outer-loop vectorization is safe.  */
   if (needs_fold_le

[C++ PATCH] simplify deferred parsing lexer

2019-10-28 Thread Nathan Sidwell
We use an eof_token global variable as a sentinel on a deferred parse 
(such as in-class function definitions, or default args).  This 
complicates retrieving the next token in certain places.


Because such deferred parses always nest properly and complete before
resuming the outer lexer, we can simply morph the token after the
deferred buffer into a CPP_EOF token and restore it afterwards.  I
finally got around to implementing that with this patch.


One complication is that we have to change the discriminator for when 
the token's value is a tree.  We can't look at the token's type because 
it might have been overwritten.  I add a bool flag to the token (there are 
several spare bits), and use that.  This does simplify the discriminator 
because we just check a single bit, rather than a set of token types.
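
A hedged sketch of what the discriminator change amounts to; the parser.h
hunk is not quoted below, the helper name here is invented, and the cp_token
fields follow the ChangeLog and the existing parser.h layout:

```c
/* Illustration only: the live union member is now chosen by the
   tree_check_p bit instead of by a set of token types, which the
   CPP_EOF overlay may have overwritten.  */
static tree
token_tree_value (const cp_token *token)
{
  return token->tree_check_p
	 ? token->u.tree_check_value->value
	 : token->u.value;
}
```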


I think this will allow us to report the location of EOF tokens -- 
currently we think they have no location, because they're (nearly) 
always the shared eof_token.  That's a later change, as it could change 
some diagnostic locations in the testsuite [to be better].


applying to trunk.

nathan
--
Nathan Sidwell
2019-10-28  Nathan Sidwell  

	* parser.h (struct cp_token): Drop {ENUM,BOOL}_BITFIELD C-ism.
	Add tree_check_p flag, use as nested union discriminator.
	(struct cp_lexer): Add saved_type & saved_keyword fields.
	* parser.c (eof_token): Delete.
	(cp_lexer_new_main): Always init last_token to last token of
	buffer.
	(cp_lexer_new_from_tokens): Overlay EOF token at end of range.
	(cp_lexer_destroy): Restore token under the EOF.
	(cp_lexer_previous_token_position): No check for eof_token here.
	(cp_lexer_get_preprocessor_token): Clear tree_check_p.
	(cp_lexer_peek_nth_token): Check CPP_EOF not eof_token.
	(cp_lexer_consume_token): Assert not CPP_EOF, no check for
	eof_token.
	(cp_lexer_purge_token): Likewise.
	(cp_lexer_purge_tokens_after): No check for EOF token.
	(cp_parser_nested_name_specifier, cp_parser_decltype)
	(cp_parser_template_id): Set tree_check_p.

Index: gcc/cp/parser.c
===
--- gcc/cp/parser.c	(revision 277460)
+++ gcc/cp/parser.c	(working copy)
@@ -53,9 +53,4 @@ along with GCC; see the file COPYING3.
and c-lex.c) and the C++ parser.  */
 
-static cp_token eof_token =
-{
-  CPP_EOF, RID_MAX, 0, false, false, false, 0, { NULL }
-};
-
 /* The various kinds of non integral constant we encounter. */
 enum non_integral_constant {
@@ -661,10 +656,8 @@ cp_lexer_new_main (void)
 }
 
-  lexer->last_token = lexer->buffer->address ()
+  lexer->next_token = lexer->buffer->address ();
+  lexer->last_token = lexer->next_token
   + lexer->buffer->length ()
 		  - 1;
-  lexer->next_token = lexer->buffer->length ()
-		  ? lexer->buffer->address ()
-		  : &eof_token;
 
   /* Subsequent preprocessor diagnostics should use compiler
@@ -688,5 +681,12 @@ cp_lexer_new_from_tokens (cp_token_cache
   /* We do not own the buffer.  */
   lexer->buffer = NULL;
-  lexer->next_token = first == last ? &eof_token : first;
+
+  /* Insert an EOF token.  */
+  lexer->saved_type = last->type;
+  lexer->saved_keyword = last->keyword;
+  last->type = CPP_EOF;
+  last->keyword = RID_MAX;
+
+  lexer->next_token = first;
   lexer->last_token = last;
 
@@ -705,5 +705,12 @@ static void
 cp_lexer_destroy (cp_lexer *lexer)
 {
-  vec_free (lexer->buffer);
+  if (lexer->buffer)
+vec_free (lexer->buffer);
+  else
+{
+  /* Restore the token we overwrite with EOF.  */
+  lexer->last_token->type = lexer->saved_type;
+  lexer->last_token->keyword = lexer->saved_keyword;
+}
   lexer->saved_tokens.release ();
   ggc_free (lexer);
@@ -732,6 +739,4 @@ static inline cp_token_position
 cp_lexer_token_position (cp_lexer *lexer, bool previous_p)
 {
-  gcc_assert (!previous_p || lexer->next_token != &eof_token);
-
   return lexer->next_token - previous_p;
 }
@@ -752,8 +757,5 @@ static inline cp_token_position
 cp_lexer_previous_token_position (cp_lexer *lexer)
 {
-  if (lexer->next_token == &eof_token)
-return lexer->last_token - 1;
-  else
-return cp_lexer_token_position (lexer, true);
+  return cp_lexer_token_position (lexer, true);
 }
 
@@ -808,4 +810,5 @@ cp_lexer_get_preprocessor_token (cp_lexe
   token->purged_p = false;
   token->error_reported = false;
+  token->tree_check_p = false;
 
   /* On some systems, some header files are surrounded by an
@@ -1083,14 +1086,7 @@ cp_lexer_peek_nth_token (cp_lexer* lexer
   --n;
   token = lexer->next_token;
-  gcc_assert (!n || token != &eof_token);
-  while (n != 0)
+  while (n && token->type != CPP_EOF)
 {
   ++token;
-  if (token == lexer->last_token)
-	{
-	  token = &eof_token;
-	  break;
-	}
-
   if (!token->purged_p)
 	--n;
@@ -1114,16 +1110,10 @@ cp_lexer_consume_token (cp_lexer* lexer)
   cp_token *token = lexer->next_token;
 
-  gcc_assert (token != &eof_token);
   gcc_assert (!lexer->in_pragma || token->type != CPP_PRAGMA_EOL);
 
   do

[PATCH] Fix PR92241, turn asserts into vectorization fails

2019-10-28 Thread Richard Biener


It turns out (well, I anticipated that...) that we cannot easily
update the reduction chain during pattern detection in some of
the more contrived cases.  Instead of ICEing in this case, the
following makes us give up.

Bootstrap & regtest running on x86_64-unknown-linux-gnu.

Richard.

2019-10-28  Richard Biener  

PR tree-optimization/92241
* tree-vect-loop.c (vect_fixup_scalar_cycles_with_patterns): When
we failed to update the reduction index do not use the pattern
stmts for the reduction chain.
(vectorizable_reduction): When the reduction chain is corrupt,
fail.
* tree-vect-patterns.c (vect_mark_pattern_stmts): Stop when we
fail to update the reduction chain.

* gcc.dg/torture/pr92241.c: New testcase.

Index: gcc/tree-vect-loop.c
===
--- gcc/tree-vect-loop.c(revision 277513)
+++ gcc/tree-vect-loop.c(working copy)
@@ -689,13 +689,16 @@ vect_fixup_scalar_cycles_with_patterns (
stmt_vec_info next = REDUC_GROUP_NEXT_ELEMENT (first);
while (next)
  {
-   if (! STMT_VINFO_IN_PATTERN_P (next))
+   if (! STMT_VINFO_IN_PATTERN_P (next)
+   || STMT_VINFO_REDUC_IDX (STMT_VINFO_RELATED_STMT (next)) == -1)
  break;
next = REDUC_GROUP_NEXT_ELEMENT (next);
  }
-   /* If not all stmt in the chain are patterns try to handle
-  the chain without patterns.  */
-   if (! next)
+   /* If not all stmt in the chain are patterns or if we failed
+  to update STMT_VINFO_REDUC_IDX try to handle the chain
+  without patterns.  */
+   if (! next
+   && STMT_VINFO_REDUC_IDX (STMT_VINFO_RELATED_STMT (first)) != -1)
  {
vect_fixup_reduc_chain (first);
LOOP_VINFO_REDUCTION_CHAINS (loop_vinfo)[i]
@@ -5730,7 +5745,15 @@ vectorizable_reduction (stmt_vec_info st
 {
   stmt_vec_info def = loop_vinfo->lookup_def (reduc_def);
   def = vect_stmt_to_vectorize (def);
-  gcc_assert (STMT_VINFO_REDUC_IDX (def) != -1);
+  if (STMT_VINFO_REDUC_IDX (def) == -1)
+   {
+ if (dump_enabled_p ())
+   dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
+"reduction chain broken by patterns.\n");
+ return false;
+   }
+  if (!REDUC_GROUP_FIRST_ELEMENT (def))
+   only_slp_reduc_chain = false;
   if (!REDUC_GROUP_FIRST_ELEMENT (def))
only_slp_reduc_chain = false;
   reduc_def = gimple_op (def->stmt, 1 + STMT_VINFO_REDUC_IDX (def));
Index: gcc/tree-vect-patterns.c
===
--- gcc/tree-vect-patterns.c(revision 277513)
+++ gcc/tree-vect-patterns.c(working copy)
@@ -5110,6 +5110,9 @@ vect_mark_pattern_stmts (stmt_vec_info o
 for (gimple_stmt_iterator si = gsi_start (def_seq);
 !gsi_end_p (si); gsi_next (&si))
   {
+   if (dump_enabled_p ())
+ dump_printf_loc (MSG_NOTE, vect_location,
+  "extra pattern stmt: %G", gsi_stmt (si));
stmt_vec_info pattern_stmt_info
  = vect_init_pattern_stmt (gsi_stmt (si),
orig_stmt_info, pattern_vectype);
@@ -5169,10 +5172,13 @@ vect_mark_pattern_stmts (stmt_vec_info o
found = true;
break;
  }
- if (found && s == pattern_stmt)
-   break;
  if (s == pattern_stmt)
-   gcc_unreachable ();
+   {
+ if (!found && dump_enabled_p ())
+   dump_printf_loc (MSG_NOTE, vect_location,
+"failed to update reduction index");
+ break;
+   }
  if (gsi_end_p (si))
s = pattern_stmt;
  else
Index: gcc/testsuite/gcc.dg/torture/pr92241.c
===
--- gcc/testsuite/gcc.dg/torture/pr92241.c  (nonexistent)
+++ gcc/testsuite/gcc.dg/torture/pr92241.c  (working copy)
@@ -0,0 +1,13 @@
+/* { dg-do compile } */
+/* { dg-additional-options "-ftree-vectorize" } */
+
+int a, b;
+char c[2];
+void d() {
+  char e;
+  for (; b; b--) {
+e = 0;
+for (; e <= 1; e++)
+  a &= c[b + e] && 1;
+  }
+}


Re: [PATCH 1/2][vect]PR 88915: Vectorize epilogues when versioning loops

2019-10-28 Thread Richard Biener
On Fri, 25 Oct 2019, Andre Vieira (lists) wrote:

> 
> 
> On 22/10/2019 14:56, Richard Biener wrote:
> > On Tue, 22 Oct 2019, Andre Vieira (lists) wrote:
> > 
> >> Hi Richi,
> >>
> >> See inline responses to your comments.
> >>
> >> On 11/10/2019 13:57, Richard Biener wrote:
> >>> On Thu, 10 Oct 2019, Andre Vieira (lists) wrote:
> >>>
>  Hi,
> 
> >>
> >>>
> >>> +
> >>> +  /* Keep track of vector sizes we know we can vectorize the epilogue
> >>> with.  */
> >>> +  vector_sizes epilogue_vsizes;
> >>>};
> >>>
> >>> please don't enlarge struct loop, instead track this somewhere
> >>> in the vectorizer (in loop_vinfo?  I see you already have
> >>> epilogue_vinfos there - so the loop_vinfo simply lacks
> >>> convenient access to the vector_size?)  I don't see any
> >>> use that could be trivially adjusted to look at a loop_vinfo
> >>> member instead.
> >>
> >> Done.
> >>>
> >>> For the vect_update_inits_of_drs this means that we'd possibly
> >>> do less CSE.  Not sure if really an issue.
> >>
> >> CSE of what exactly? Are you afraid we are repeating a calculation here that
> >> we have done elsewhere before?
> > 
> > All uses of those inits now possibly get the expression instead of
> > just the SSA name we inserted code for once.  But as said, we'll see.
> > 
> 
> This code changed after some comments from Richard Sandiford.
> 
> > +  /* We are done vectorizing the main loop, so now we update the
> > epilogues
> > + stmt_vec_info's.  At the same time we set the gimple UID of each
> > + statement in the epilogue, as these are used to look them up in the
> > + epilogues loop_vec_info later.  We also keep track of what
> > + stmt_vec_info's have PATTERN_DEF_SEQ's and RELATED_STMT's that might
> > + need updating and we construct a mapping between variables defined
> > in
> > + the main loop and their corresponding names in epilogue.  */
> > +  for (unsigned i = 0; i < epilogue->num_nodes; ++i)
> > 
> > so for the following code I wonder if you can make use of the
> > fact that loop copying also copies UIDs, so you should be able
> > to match stmts via their UIDs and get at the other loop infos
> > stmt_info by the copy loop stmt UID.
> > 
> > I wonder why you need no modification for the SLP tree?
> > 
> I checked with Tamar and the SLP tree works with the position of operands and
> not SSA_NAMES.  So we should be fine.

There's now SLP_TREE_SCALAR_OPS but only for invariants so I guess
we should indeed be fine here.  Everything else is already
stmt_infos which you patch with the new underlying stmts.

Richard.


Re: [Patch][Fortran] OpenMP – libgomp/testsuite – use 'stop' and 'dg-do run'

2019-10-28 Thread Thomas Schwinge
Hi Tobias!

On 2019-10-25T18:17:26+0200, Tobias Burnus  wrote:
> This patch is about: libgomp/testsuite/libgomp.fortran/, only
>
> The two test cases I added recently use 'call abort()', which is 
> nowadays frowned on as that's a vendor extension. Hence, I changed it to 
> 'stop'.
>
> Additionally, the 'fortran.exp' in the directory states: "For Fortran 
> we're doing torture testing, as Fortran has far more tests with arrays 
> etc. that testing just -O0 or -O2 is insufficient, that is typically not 
> the case for C/C++."
>
> The torture testing is only done if there is a "dg-do run"; without 
> dg-do, the test still runs, but only with a single compile-flag setting.

(It was me who suggested that task.)


> I have only selectively added it; I think 'dg-do run' does not make 
> sense for:
> * condinc*.f – only uses '!$ include'
> * omp_cond*.f* – only tests '!$' code, including comments.
> Hence, I excluded those and only changed the others. (However, one can 
> still argue about the remaining ones – such as 'omp_hello.f' or tabs*.f*.)

But why shouldn't these still be torture tested?

Anyway -- as usual ;-) -- I'd prefer if any such rationale ("not doing
torture testing because [...]") was put into the respective testsuite
files, so that it's obvious to the next person reading that file.


> --- a/libgomp/testsuite/libgomp.fortran/strassen.f90
> +++ b/libgomp/testsuite/libgomp.fortran/strassen.f90
> @@ -0,0 +1 @@
> +! { dg-do run }
|  ! { dg-options "-O2" }

That's not a useful combination, is it?  (Just noticed this one here;
didn't verify all files suggested to be changed.)

Not sure into which direction this should be fixed: continue to just test
'-O2', or remove the '-O2' override.


> --- a/libgomp/testsuite/libgomp.fortran/use_device_addr-1.f90
> +++ b/libgomp/testsuite/libgomp.fortran/use_device_addr-1.f90
> @@ -0,0 +1 @@
> +! { dg-do run }

> --- a/libgomp/testsuite/libgomp.fortran/use_device_addr-2.f90
> +++ b/libgomp/testsuite/libgomp.fortran/use_device_addr-2.f90
> @@ -0,0 +1 @@
> +! { dg-do run }

On powerpc64le-unknown-linux-gnu without offloading, I'm seeing (only)
the '-O0' execution tests FAIL for both these, with 'STOP 1' message.


Grüße
 Thomas




[PATCH] Fix another of the PR65930 reduction cases

2019-10-28 Thread Richard Biener


For the new testcase we end up using an intermediate value of the
reduction chain as reduction result.  This can be easily supported
by generating epilogues for (possibly multiple) intermediate values.

For this to work the following relaxes cycle detection to allow
out-of-loop uses, plus it makes sure vectorizable_live_operation
picks it up as a reduction.  Finally, some invariants that process_use
checked are no longer true.

Nicely simple ;)
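
(For illustration, a minimal C sketch of the shape being handled -- this is
not the committed pr65930-1.c, just an assumed example of a reduction chain
whose intermediate value is the one used after the loop:)

int
foo (int *a, int n)
{
  int sum = 0, mid = 0;
  for (int i = 0; i < n; ++i)
    {
      mid = sum + a[2 * i];      /* intermediate value of the chain */
      sum = mid + a[2 * i + 1];  /* chain continues */
    }
  return mid;                    /* out-of-loop use of the intermediate */
}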

Bootstrap & regtest running on x86_64-unknown-linux-gnu.

Richard.

2019-10-28  Richard Biener  

PR tree-optimization/65930
* tree-vect-loop.c (check_reduction_path): Relax single-use
check allowing out-of-loop uses.
(vect_is_simple_reduction): SLP reduction chains cannot have
intermediate stmts used outside of the loop.
(vect_create_epilog_for_reduction): The adjustment might need
to be converted.
(vectorizable_reduction): Annotate live stmts of the reduction
chain with STMT_VINFO_REDUC_DEF.
* tree-vect-stmts.c (process_use): Remove no longer true asserts.

* gcc.dg/vect/pr65930-1.c: New testcase.

Index: gcc/tree-vect-loop.c
===
--- gcc/tree-vect-loop.c(revision 277517)
+++ gcc/tree-vect-loop.c(working copy)
@@ -2678,8 +2678,7 @@ pop:
 {
   gimple *use_stmt = USE_STMT (path[i].second);
   tree op = USE_FROM_PTR (path[i].second);
-  if (! has_single_use (op)
- || ! is_gimple_assign (use_stmt)
+  if (! is_gimple_assign (use_stmt)
  /* The following make sure we can compute the operand index
 easily plus it mostly disallows chaining via COND_EXPR condition
 operands.  */
@@ -2690,6 +2689,20 @@ pop:
  fail = true;
  break;
}
+  /* Check there's only a single stmt the op is used on inside
+ of the loop.  */
+  imm_use_iterator imm_iter;
+  gimple *op_use_stmt;
+  unsigned cnt = 0;
+  FOR_EACH_IMM_USE_STMT (op_use_stmt, imm_iter, op)
+   if (!is_gimple_debug (op_use_stmt)
+   && flow_bb_inside_loop_p (loop, gimple_bb (op_use_stmt)))
+ cnt++;
+  if (cnt != 1)
+   {
+ fail = true;
+ break;
+   }
   enum tree_code use_code = gimple_assign_rhs_code (use_stmt);
   if (use_code == MINUS_EXPR)
{
@@ -2922,7 +2935,10 @@ vect_is_simple_reduction (loop_vec_info
   for (i = path.length () - 1; i >= 1; --i)
{
  gimple *stmt = USE_STMT (path[i].second);
- if (gimple_assign_rhs_code (stmt) != code)
+ if (gimple_assign_rhs_code (stmt) != code
+ /* We can only handle the final value in epilogue
+generation for reduction chains.  */
+ || (i != 1 && !has_single_use (gimple_assign_lhs (stmt
is_slp_reduc = false;
  stmt_vec_info stmt_info = loop_info->lookup_stmt (stmt);
  STMT_VINFO_REDUC_IDX (stmt_info)
@@ -4119,11 +4135,11 @@ vect_create_epilog_for_reduction (stmt_v
   stmt_vec_info phi_info;
   gimple_stmt_iterator exit_gsi;
   tree vec_dest;
-  tree new_temp = NULL_TREE, new_dest, new_name, new_scalar_dest;
+  tree new_temp = NULL_TREE, new_name, new_scalar_dest;
   gimple *epilog_stmt = NULL;
   gimple *exit_phi;
   tree bitsize;
-  tree expr, def;
+  tree def;
   tree orig_name, scalar_result;
   imm_use_iterator imm_iter, phi_imm_iter;
   use_operand_p use_p, phi_use_p;
@@ -5048,25 +5064,26 @@ vect_create_epilog_for_reduction (stmt_v
   if (adjustment_def)
 {
   gcc_assert (!slp_reduc);
+  gimple_seq stmts = NULL;
   if (nested_in_vect_loop)
{
   new_phi = new_phis[0];
- gcc_assert (TREE_CODE (TREE_TYPE (adjustment_def)) == VECTOR_TYPE);
- expr = build2 (code, vectype, PHI_RESULT (new_phi), adjustment_def);
- new_dest = vect_create_destination_var (scalar_dest, vectype);
+ gcc_assert (VECTOR_TYPE_P (TREE_TYPE (adjustment_def)));
+ adjustment_def = gimple_convert (&stmts, vectype, adjustment_def);
+ new_temp = gimple_build (&stmts, code, vectype,
+  PHI_RESULT (new_phi), adjustment_def);
}
   else
{
   new_temp = scalar_results[0];
  gcc_assert (TREE_CODE (TREE_TYPE (adjustment_def)) != VECTOR_TYPE);
- expr = build2 (code, scalar_type, new_temp, adjustment_def);
- new_dest = vect_create_destination_var (scalar_dest, scalar_type);
+ adjustment_def = gimple_convert (&stmts, scalar_type, adjustment_def);
+ new_temp = gimple_build (&stmts, code, scalar_type,
+  new_temp, adjustment_def);
}
 
-  epilog_stmt = gimple_build_assign (new_dest, expr);
-  new_temp = make_ssa_name (new_dest, epilog_stmt);
-  gimple_assign_set_lhs (epilog_stmt, new_temp);
-  gsi_insert_before (&exit_gsi, epilog_stmt, GSI_SAME_STMT);
+  epil

Re: [PATCH 1/2][vect]PR 88915: Vectorize epilogues when versioning loops

2019-10-28 Thread Richard Biener
On Fri, 25 Oct 2019, Andre Vieira (lists) wrote:

> Hi,
> 
> This is the reworked patch after your comments.
> 
> I have moved the epilogue check into the analysis, disguised under
> '!epilogue_vinfos.is_empty ()'.  This is because I realized that I am doing the
> "lowest threshold" check there.
> 
> The only place where we may reject an epilogue_vinfo is when we know the
> number of scalar iterations and we realize the number of iterations left after
> the main loop are not enough to enter the vectorized epilogue so we optimize
> away that code-gen.  The only way we know this to be true is if the number of
> scalar iterations are known and the peeling for alignment is known. So we know
> we will enter the main loop regardless, so whether the threshold we use is for
> a lower VF or not it shouldn't matter as much, I would even like to think that
> check isn't done, but I am not sure... Might be worth checking as an
> optimization.
> 
> 
> Is this OK for trunk?

+  for (epilogue_phi_gsi = gsi_start_phis (epilogue_bbs[i]);
+  !gsi_end_p (epilogue_phi_gsi); gsi_next (&epilogue_phi_gsi))
+   {
..
+ if (STMT_VINFO_PATTERN_DEF_SEQ (stmt_vinfo))
+   pattern_worklist.safe_push (stmt_vinfo);
+
+ related_vinfo = STMT_VINFO_RELATED_STMT (stmt_vinfo);
+ while (related_vinfo && related_vinfo != stmt_vinfo)
+   {

I think PHIs cannot have patterns.  You can assert
that STMT_VINFO_RELATED_STMT is NULL I think.
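
(A minimal sketch of what that could look like in the PHI walk above;
stmt_vinfo here is the vinfo obtained from the PHI:)

  /* PHIs are never pattern statements, so neither of these should be set.  */
  gcc_assert (!STMT_VINFO_PATTERN_DEF_SEQ (stmt_vinfo)
              && !STMT_VINFO_RELATED_STMT (stmt_vinfo));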

+ related_vinfo = STMT_VINFO_RELATED_STMT (stmt_vinfo);
+ while (related_vinfo && related_vinfo != stmt_vinfo)
+   {
+ related_worklist.safe_push (related_vinfo);
+ /* Set BB such that the assert in
+   'get_initial_def_for_reduction' is able to determine that
+   the BB of the related stmt is inside this loop.  */
+ gimple_set_bb (STMT_VINFO_STMT (related_vinfo),
+gimple_bb (new_stmt));
+ related_vinfo = STMT_VINFO_RELATED_STMT (related_vinfo);
+   }

do we really keep references to "nested" patterns?  Thus, do you
need this loop?

+  /* The PATTERN_DEF_SEQs in the epilogue were constructed using the
+ original main loop and thus need to be updated to refer to the 
cloned
+ variables used in the epilogue.  */
+  for (unsigned i = 0; i < pattern_worklist.length (); ++i)
+{
...
+ op = simplify_replace_tree (op, NULL_TREE, NULL_TREE,
+&find_in_mapping, &mapping);
+ gimple_set_op (seq, j, op);

you do this for the pattern-def seq but not for the related one.
I guess you ran into this for COND_EXPR conditions.  I wondered
to use a shared worklist for both the def-seq and the main pattern
stmt or at least to split out the replacement so you can share it.

+  /* Data references for gather loads and scatter stores do not use 
the
+updated offset we set using ADVANCE.  Instead we have to make 
sure the
+reference in the data references point to the corresponding copy 
of
+the original in the epilogue.  */
+  if (STMT_VINFO_GATHER_SCATTER_P (stmt_vinfo))
+   {
+ int j;
+ if (TREE_CODE (DR_REF (dr)) == MEM_REF)
+   j = 0;
+ else if (TREE_CODE (DR_REF (dr)) == ARRAY_REF)
+   j = 1;
+ else
+   gcc_unreachable ();
+
+ if (tree *new_op = mapping.get (TREE_OPERAND (DR_REF (dr), j)))
+   {
+ DR_REF (dr) = unshare_expr (DR_REF (dr));
+ TREE_OPERAND (DR_REF (dr), j) = *new_op;
+   }

huh, do you really only ever see MEM_REF or ARRAY_REF here?
I would guess using simplify_replace_tree is safer.
There's also DR_BASE_ADDRESS - we seem to leave the DRs partially
updated, is that correct?
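
(For reference, a sketch of that alternative, assuming the five-argument
simplify_replace_tree from this series and the existing
find_in_mapping/mapping pair:)

  /* Sketch only: run the whole DR tree through the name mapping instead
     of patching a single MEM_REF/ARRAY_REF operand.  */
  DR_REF (dr) = simplify_replace_tree (unshare_expr (DR_REF (dr)),
                                       NULL_TREE, NULL_TREE,
                                       &find_in_mapping, &mapping);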

Otherwise looks OK to me.

Thanks,
Richard.


> gcc/ChangeLog:
> 2019-10-25  Andre Vieira  
> 
> PR 88915
> * tree-ssa-loop-niter.h (simplify_replace_tree): Change declaration.
> * tree-ssa-loop-niter.c (simplify_replace_tree): Add context parameter
> and make the valueize function pointer also take a void pointer.
> * gcc/tree-ssa-sccvn.c (vn_valueize_wrapper): New function to wrap
> around vn_valueize, to call it without a context.
> (process_bb): Use vn_valueize_wrapper instead of vn_valueize.
> * tree-vect-loop.c (_loop_vec_info): Initialize epilogue_vinfos.
> (~_loop_vec_info): Release epilogue_vinfos.
> (vect_analyze_loop_costing): Use knowledge of main VF to estimate
> number of iterations of epilogue.
> (vect_analyze_loop_2): Adapt to analyse main loop for all supported
> vector sizes when vect-epilogues-nomask=1.  Also keep track of lowest
> versioning threshold needed for main loop.
> (vect_analyze_loop): Likewise.
> (find_in_mapping): New helper function.
> (update_epilogue_loop_vinfo): New function.
> (vect_transform_loop): When vectorizing epilogues re-use analys

Re: PR92163

2019-10-28 Thread Richard Biener
On Fri, Oct 25, 2019 at 9:58 PM Prathamesh Kulkarni
 wrote:
>
> On Fri, 25 Oct 2019 at 13:19, Richard Biener  
> wrote:
> >
> > On Wed, Oct 23, 2019 at 11:45 PM Prathamesh Kulkarni
> >  wrote:
> > >
> > > Hi,
> > > The attached patch tries to fix PR92163 by calling
> > > gimple_purge_dead_eh_edges from ifcvt_local_dce if we need eh cleanup.
> > > Does it look OK ?
> >
> > Hmm.  I think it shows an issue with the return value of 
> > remove_stmt_from_eh_lp
> > which is true if the LP index is -1 (externally throwing).  We don't
> > need to purge
> > any edges in that case.  That is, if-conversion should never need to
> > do EH purging
> > since that would be wrong-code.
> >
> > As of the segfault can you please instead either pass down need_eh_cleanup
> > as function parameter (and NULL from ifcvt) or use the return value in DSE
> > to set the bit in the caller.
> Hi Richard,
> Thanks for the suggestions, does the attached patch look OK ?
> Bootstrap+test in progress on x86_64-unknown-linux-gnu.

OK.

Richard.

> Thanks,
> Prathamesh
> >
> > Thanks,
> > Richard.
> >
> > > Thanks,
> > > Prathamesh


Re: [SVE] PR91272

2019-10-28 Thread Prathamesh Kulkarni
On Sun, 27 Oct 2019 at 06:08, Richard Sandiford
 wrote:
>
> Prathamesh Kulkarni  writes:
> > @@ -10288,6 +10261,23 @@ vectorizable_condition (stmt_vec_info stmt_info, 
> > gimple_stmt_iterator *gsi,
> > vect_finish_stmt_generation (stmt_info, new_stmt, gsi);
> > vec_compare = vec_compare_name;
> >   }
> > +
> > +   if (masks)
> > + {
> > +   unsigned vec_num = vec_oprnds0.length ();
> > +   tree loop_mask
> > + = vect_get_loop_mask (gsi, masks, vec_num * ncopies,
> > +   vectype, vec_num * j + i);
> > +   tree tmp2 = make_ssa_name (vec_cmp_type);
> > +   gassign *g = gimple_build_assign (tmp2, BIT_AND_EXPR,
> > + vec_compare, loop_mask);
>
> Nit: misindented line.
>
> OK with that change, thanks.
Thanks, committed in r277524 after bootstrap+test on
x86_64-unknown-linux-gnu and aarch64-linux-gnu.

Thanks,
Prathamesh
>
> Richard


Re: [PATCH] PR fortran/92178 -- Re-order argument deallocation

2019-10-28 Thread Tobias Burnus

On 10/23/19 8:12 PM, Steve Kargl wrote:

* trans-expr.c (gfc_conv_procedure_call): Evaluate args and then
deallocate actual args associated with intent(out) dummies.


I think the patch by itself looks fine to me – except that the 
saw_dealloc is not needed. You can either check "if (dealloc_blk->head)" 
or you can use gfc_add_block_to_block unconditionally as it handles 
NULL_TREE.


However, the following test case shows that expressions which can be 
transferred into a tree (se->expr) without needing more evaluations and 
a temporary (i.e. evaluating things in se->pre) do not work. – The 
allocated(a) check is really artificial, however, the test() call looks 
as if it might appear in the real world. First the dump:


    foo ((integer(kind=4)[0:] * restrict) a.data != 0B, 
(integer(kind=4)) MAX_EXPR <(D.3958->dim[0].ubound - 
D.3958->dim[0].lbound) + 1, 0>, test ((integer(kind=4)[0:] * restrict) 
a.data), &a);


And then the test case:

implicit none (type, external)
integer, allocatable :: a(:)
a = [1, 2]
call foo(allocated(a), size(a), test(a), a)
contains
subroutine foo(alloc, sz, tst, x)
  logical, value :: alloc, tst
  integer, value :: sz
  integer, allocatable, intent(out) :: x(:)
  if (allocated(x)) stop 1
  if (.not.alloc) stop 2
  if (sz /= 2) stop 3
  if (.not. tst) stop 4
end subroutine foo
logical function test(zz)
  integer :: zz(2)
  test = zz(2) == 2
end function test
end

Hence, I wonder whether one needs to do (pseudo code):

if (any dummy argument is allocatable + intent-out)
  force_func_eval = true
if (actual is an expression + force_func_eval)
  parmse->expr =  gfc_evaluate_now (parmse->expr, &parmse)

Such that one uses a temporary variable for those, but keeps the status quo for 
the rest.
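
A minimal sketch of that idea inside gfc_conv_procedure_call (the force_eval
flag and its placement are assumptions; gfc_evaluate_now,
gfc_is_constant_expr and parmse.pre are the existing interfaces):

  /* Hypothetical: force the converted actual argument into a temporary
     when some dummy argument is allocatable + intent(out).  */
  if (force_eval && parmse.expr != NULL_TREE && !gfc_is_constant_expr (e))
    parmse.expr = gfc_evaluate_now (parmse.expr, &parmse.pre);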


Note, in gfc_conv_procedure_call() there are 3 blocks of
code that deal with the deallocation of actual arguments
assocated with intent(out) dummy arguments.  The patch
affects the first and third blocks.  The 2nd block, lines
6071-6111, concerns CLASS and finalization.  I use neither,
so have no idea what Fortran requires.  More importantly,
I have very little understanding of gfortran's internal
implementation for CLASS and finalization.  Someone who
cares about CLASS and finalization will need to consider
how to possibly fix a possible issue.


I wonder how to test for it. I tried to create a test case 
(pr92178-3.f90) but as it turns out, the deallocation happens (via 
zz->_vptr->_final) directly in the called function and not in the caller.


For this one, I was playing with the attached patch – but if one cannot 
trigger it, it might not be needed.


I have also created another test case pr92178-2.f90 which essentially 
does what pr92178.f90 already did (nearly same code path, covered by 
your patch).



The question is how to proceed from here.

Tobias

! { dg-do run }
!
! PR fortran/92178
program foo
   implicit none (type, external)

   type t0
 integer, allocatable :: X0
   end type t0

   type, extends(t0) :: t
   end type t

   type, extends(t) :: t2
   end type t2

   type(t2) :: x2
   class(t), allocatable :: aa(:)

   allocate(t2 :: aa(1))
   allocate(aa(1)%x0)
   contains
  subroutine caller(xx)
 class(t) :: xx(:)
 if (.not. allocated(xx(1)%x0)) stop 10
 if (.not. same_type_as(xx, x2)) stop 11
 call check_intentout(allocated(xx(1)%x0), same_type_as(xx, x2), xx, &
  allocated(xx(1)%x0), same_type_as(xx, x2))
  end subroutine caller
  subroutine check_intentout(alloc1, same1, zz, alloc2, same2)
 logical, value :: alloc1, alloc2, same1, same2
 class(t0), intent(out) :: zz(:)
 if (allocated(zz(1)%x0)) stop 1
 if (.not.alloc1) stop 2
 if (.not.alloc2) stop 3
 if (.not.same1) stop 4
 if (.not.same2) stop 5
  end subroutine
end program
! { dg-do run }
!
! PR fortran/92178
program foo
   implicit none (type, external)

   type t
   end type t

   type, extends(t) :: t2
   end type t2

   type(t2) :: x2
   class(t), allocatable :: aa

   call check_intentout_false(allocated(aa), aa, &
  allocated(aa))
   if (allocated(aa)) stop 1

   allocate(t2 :: aa)
   if (.not.allocated(aa)) stop 2
   if (.not.same_type_as(aa, x2)) stop 3
   call check_intentout_true(allocated(aa), (same_type_as(aa, x2)), aa, &
  allocated(aa), (same_type_as(aa, x2)))
   if (allocated(aa)) stop 4

   contains
  subroutine check_intentout_false(alloc1, yy, alloc2)
 logical, value :: alloc1, alloc2
 class(t), allocatable, intent(out) :: yy
 if (allocated(yy)) stop 11
 if (alloc1) stop 12
 if (alloc2) stop 13
  end subroutine
  subroutine check_intentout_true(alloc1, same1, zz, alloc2, same2)
 logical, value :: alloc1, alloc2, same1, same2
 class(t), allocatable, intent(out) :: zz
 if (allocated(zz)) stop 21
 if (.not.alloc1) stop

Re: PR92163

2019-10-28 Thread Prathamesh Kulkarni
On Mon, 28 Oct 2019 at 07:18, Richard Biener  wrote:
>
> On Fri, Oct 25, 2019 at 9:58 PM Prathamesh Kulkarni
>  wrote:
> >
> > On Fri, 25 Oct 2019 at 13:19, Richard Biener  
> > wrote:
> > >
> > > On Wed, Oct 23, 2019 at 11:45 PM Prathamesh Kulkarni
> > >  wrote:
> > > >
> > > > Hi,
> > > > The attached patch tries to fix PR92163 by calling
> > > > gimple_purge_dead_eh_edges from ifcvt_local_dce if we need eh cleanup.
> > > > Does it look OK ?
> > >
> > > Hmm.  I think it shows an issue with the return value of 
> > > remove_stmt_from_eh_lp
> > > which is true if the LP index is -1 (externally throwing).  We don't
> > > need to purge
> > > any edges in that case.  That is, if-conversion should never need to
> > > do EH purging
> > > since that would be wrong-code.
> > >
> > > As of the segfault can you please instead either pass down need_eh_cleanup
> > > as function parameter (and NULL from ifcvt) or use the return value in DSE
> > > to set the bit in the caller.
> > Hi Richard,
> > Thanks for the suggestions, does the attached patch look OK ?
> > Bootstrap+test in progress on x86_64-unknown-linux-gnu.
>
> OK.
Thanks, committed to trunk in r277525 after bootstrap+test on
x86_64-unknown-linux-gnu.

Thanks,
Prathamesh
>
> Richard.
>
> > Thanks,
> > Prathamesh
> > >
> > > Thanks,
> > > Richard.
> > >
> > > > Thanks,
> > > > Prathamesh


Re: [PATCH] PR85678: Change default to -fno-common

2019-10-28 Thread Wilco Dijkstra
Hi Jeff,

> Has this been bootstrapped and regression tested?

Yes, it bootstraps OK of course. I ran regression over the weekend, there
are a few minor regressions in lto due to relying on tentative definitions
and a few latent bugs. I'd expect there will be a few similar failures on
other targets but nothing major since few testcases rely on -fcommon.
The big question is how it affects the distros.

Wilco





Re: Add a simulate_builin_function_decl langhook

2019-10-28 Thread Richard Sandiford
Jeff Law  writes:
> On 10/5/19 5:29 AM, Richard Sandiford wrote:
>> 
>> Sure.  This message is going to go to the other extreme, sorry, but I'm
>> not sure which part will be the most convincing (if any).
> No worries.  Worst case going to the other extreme is I have to read it
> more than once after nodding off halfway through :-)

:-)

> Seriously though.  Thanks for the more expansive write up.
>
>> I guess I should start by saying that SVE intrinsics have three types of
>> predication (zeroing, merging, and "don't care") that combine with the usual
>> type suffixes seen in arm_neon.h.  This gives 3,736 functions for baseline
>> SVE (SVE2 adds more).
> Ugh.  That'd be a lot of stuff to add into a header.  As you noted
> later, there's significant costs to doing so.
>
>> The vast majority of those functions can't be open-coded using the
>> core parts of C and C++, even with GNU extensions.  Some could perhaps
>> be coded using new library extensions, but that just shifts the question
>> from "how do we implement this feature in arm_sve.h?" to "how we implement
>> this feature in the library extension?".
> True.  But one could ask if those extensions are something that we're
> likely to need beyond what you're doing now.  But I don't necessarily
> think that would override the desire not to have so much stuff in the
> header.
>
> I'm guessing that even though you can't describe what you need at the
> C/C++ level, but you can describe at least some of what you want in
> gimple/rtl?  Otherwise I'm not sure what you get above and beyond asms.

Yeah, some of the simpler stuff, like building length-agnostic constants,
can be represented directly in gimple.  Some other things map directly
to internal functions like IFN_MASK_LOAD and IFN_MASK_STORE.  The vast
majority of the functions still needs to be a built-in function during
gimple though, e.g. so that we don't "lose" a necessary or useful
predicate argument.

The situation's similar in rtl, with a lot of stuff mapping to unspecs.

>> So that leaves us using built-in functions for almost all of those 3,736
>> functions.  With the traditional approach, the target would need to
>> register the functions at start-up and then define the header file in
>> terms of them.
> Yup.  And there's a cost you missed -- those things tend to end up in
> the debugging output as well.  That's caused problems with system tools
> in the past.

Ah, yeah, that's true.  And for the C overloading I mentioned, the
traditional approach would still need to use _Generic-based macros for
some cases, which would also show up if macro debug output is enabled.
And the macros wouldn't be useful for debugging, since what they expand
to wouldn't be callable interactively.
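
(For illustration, the kind of macro that implies; the names below are
placeholders, not the real arm_sve.h built-ins:)

float  __addf (float, float);
double __addd (double, double);

#define add(a, b)                       \
  _Generic ((a),                        \
            float:  __addf,             \
            double: __addd) ((a), (b))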

>> There are two approaches to doing that.  One is to define the built-in
>> functions under their header file name so that they become active once
>> a prototype is seen.  But that's only appropriate for functions like
>> printf that have linkage.  The arm_sve.h functions don't have linkage
>> and there's a chance that non-SVE code could be using the same names for
>> something else (perhaps even with the same prototype, in the case of
>> things like
>> 
>>   uint64_t svcntb (void);
>> 
>> that don't mention SVE types).
>> 
>> The other alternative is to define builtins in the "__builtin_"
>> namespace and wrap them in inline wrappers, which I think is what
>> most intrinsics header files do.  E.g., from arm_neon.h:
>> 
>> __extension__ extern __inline float64x2_t
>> __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
>> vabsq_f64 (float64x2_t __a)
>> {
>>   return __builtin_aarch64_absv2df (__a);
>> }
>> 
>> But that's 6 lines per function.  Counting a blank line between
>> each one, we'd end up with a header file of at least 26,151 lines.
>> (OK, so arm_neon.h is already longer than that.  But with SVE2 and with
>> other considerations mentioned below, arm_sve.h could easily end up into
>> 6 figures this way.)
> Yea, that's kind of what I would expect for exposing this stuff.  But I
> didn't realize how many of these you were going to need.
>
> [ ... ]
>
>
>> 
>> It's very hard to maintain header files like that by hand without
>> introducing errors, such as forgetting to put the arguments safely
>> in the implementation namespace ("__a" rather than "a" etc.).  Kyrill
>> fixed some arm_neon.h instances of this the other week.  And using macros
>> to handle common patterns just makes the error messages worse, since the
>> macros then show up in error backtraces.
> Yup.  We've seen this across multiple targets.  Y'all aren't alone here.
>
>> 
>> An alternative to maintaining the header file by hand is to generate
>> it via a script.  Ideally the script would use the same metadata as
>> the compiler itself uses when registering the built-in functions.
>> But this means writing two pieces of code to process the metadata,
>> one to generate text for the inline wrappers and one to register the
>> built-ins.  An

Re: [PATCH 2/2] gcc/riscv: Add a mechanism to remove some calls to _riscv_save_0

2019-10-28 Thread Andrew Burgess
* Jim Wilson  [2019-10-22 16:34:53 -0700]:

> On Mon, Oct 21, 2019 at 5:26 AM Andrew Burgess
>  wrote:
> > Below is a new versions of this patch, I believe that this addresses
> > the review comments from the earlier version.  In addition this has
> > been tested using Jim's idea of forcing -msave-restore (if the flag is
> > not otherwise given) and I now see no test failures for newlib and
> > linux toolchains.
> >
> > One thing that has been mentioned briefly was adding a flag to control
> > just this optimisation, I haven't done that yet, but can if that is
> > required - obviously this optimisation doesn't do anything if
> > -msave-restore is turned off, so that is always an option.  With no
> > test failures I don't know if a separate flag is required.
> 
> I can live without the flag to control it, since as you mention
> -msave-restore will turn it off.
> 
> I'm seeing failures with the new save-restore-4.c testcase for 32-bit
> targets.  It works for 64-bit but not for 32-bit, because the
> sign-extension it is checking for only happens for 64-bit targets.
> The testcase should check for a target of riscv64*-*-* or else hard
> code -march=rv64gc etc options.  Either fix seems reasonable.
> 
> You fixed some British spellings of optimise to optimize, but there
> are two new comments that add two more uses of optimise which is
> inconsistent.
> 
> I also noticed a "useage" that should be "usage".
> 
> Otherwise this looks good to me.  It is OK to check in with the above
> minor issues fixed.

Jim,

Thanks for all your help reviewing this patch.

I've now merged this with fixes for the issues you identified above.
Let me know if you run into any problems.

Thanks,
Andrew


Re: [PATCH] rs6000: Enable limited unrolling at -O2

2019-10-28 Thread Segher Boessenkool
On Mon, Oct 28, 2019 at 09:08:04AM +0100, Richard Biener wrote:
> On Fri, 25 Oct 2019, Jiufu Guo wrote:
> > On powerpc64le, for O2 , enable -funroll-loops and limit
> > PARAM_MAX_UNROLL_TIMES=2 and PARAM_MAX_UNROLLED_INSNS=20, we can see >2%
> > overall improvement for SPEC2017.
> 
> Note the behavior with LTO will be surprising (and generally when
> functions are compiled with different -O level via the optimize
> attribute).  It's a bad idea to base --param values on the value
> of a global optimization option that can be set per function
> (see PR92046).
> 
> A better patch would have been to adjust via the target hooks for
> unroll adjust.

So we should add target hooks for all params, to have them work sanely
with LTO?

What makes params special, different from normal command line options?


Segher


Re: [PATCH 1/2][vect]PR 88915: Vectorize epilogues when versioning loops

2019-10-28 Thread Andre Vieira (lists)

Hi,

Reworked according to your comments, see inline for clarification.

Is this OK for trunk?

gcc/ChangeLog:
2019-10-28  Andre Vieira  

PR 88915
* tree-ssa-loop-niter.h (simplify_replace_tree): Change declaration.
* tree-ssa-loop-niter.c (simplify_replace_tree): Add context parameter
and make the valueize function pointer also take a void pointer.
* gcc/tree-ssa-sccvn.c (vn_valueize_wrapper): New function to wrap
around vn_valueize, to call it without a context.
(process_bb): Use vn_valueize_wrapper instead of vn_valueize.
* tree-vect-loop.c (_loop_vec_info): Initialize epilogue_vinfos.
(~_loop_vec_info): Release epilogue_vinfos.
(vect_analyze_loop_costing): Use knowledge of main VF to estimate
number of iterations of epilogue.
(vect_analyze_loop_2): Adapt to analyse main loop for all supported
vector sizes when vect-epilogues-nomask=1.  Also keep track of lowest
versioning threshold needed for main loop.
(vect_analyze_loop): Likewise.
(find_in_mapping): New helper function.
(update_epilogue_loop_vinfo): New function.
(vect_transform_loop): When vectorizing epilogues re-use analysis done
on main loop and call update_epilogue_loop_vinfo to update it.
* tree-vect-loop-manip.c (vect_update_inits_of_drs): No longer insert
stmts on loop preheader edge.
(vect_do_peeling): Enable skip-vectors when doing loop versioning if
we decided to vectorize epilogues.  Update epilogues NITERS and
construct ADVANCE to update epilogues data references where needed.
* tree-vectorizer.h (_loop_vec_info): Add epilogue_vinfos.
(vect_do_peeling, vect_update_inits_of_drs,
 determine_peel_for_niter, vect_analyze_loop): Add or update 
declarations.

* tree-vectorizer.c (try_vectorize_loop_1): Make sure to use already
created loop_vec_info's for epilogues when available.  Otherwise analyse
epilogue separately.



Cheers,
Andre

On 28/10/2019 14:16, Richard Biener wrote:

On Fri, 25 Oct 2019, Andre Vieira (lists) wrote:


Hi,

This is the reworked patch after your comments.

I have moved the epilogue check into the analysis, disguised under
'!epilogue_vinfos.is_empty ()'.  This is because I realized that I am doing the
"lowest threshold" check there.

The only place where we may reject an epilogue_vinfo is when we know the
number of scalar iterations and we realize the number of iterations left after
the main loop are not enough to enter the vectorized epilogue so we optimize
away that code-gen.  The only way we know this to be true is if the number of
scalar iterations are known and the peeling for alignment is known. So we know
we will enter the main loop regardless, so whether the threshold we use is for
a lower VF or not it shouldn't matter as much, I would even like to think that
check isn't done, but I am not sure... Might be worth checking as an
optimization.


Is this OK for trunk?


+  for (epilogue_phi_gsi = gsi_start_phis (epilogue_bbs[i]);
+  !gsi_end_p (epilogue_phi_gsi); gsi_next (&epilogue_phi_gsi))
+   {
..
+ if (STMT_VINFO_PATTERN_DEF_SEQ (stmt_vinfo))
+   pattern_worklist.safe_push (stmt_vinfo);
+
+ related_vinfo = STMT_VINFO_RELATED_STMT (stmt_vinfo);
+ while (related_vinfo && related_vinfo != stmt_vinfo)
+   {

I think PHIs cannot have patterns.  You can assert
that STMT_VINFO_RELATED_STMT is NULL I think.


Done.


+ related_vinfo = STMT_VINFO_RELATED_STMT (stmt_vinfo);
+ while (related_vinfo && related_vinfo != stmt_vinfo)
+   {
+ related_worklist.safe_push (related_vinfo);
+ /* Set BB such that the assert in
+   'get_initial_def_for_reduction' is able to determine that
+   the BB of the related stmt is inside this loop.  */
+ gimple_set_bb (STMT_VINFO_STMT (related_vinfo),
+gimple_bb (new_stmt));
+ related_vinfo = STMT_VINFO_RELATED_STMT (related_vinfo);
+   }

do we really keep references to "nested" patterns?  Thus, do you
need this loop?


Changed and added asserts.  They didn't trigger, so I suppose you are 
right; I didn't know at the time whether it was possible, so I just 
erred on the side of caution.  Can remove the asserts and so on if 
you want.


+  /* The PATTERN_DEF_SEQs in the epilogue were constructed using the
+ original main loop and thus need to be updated to refer to the
cloned
+ variables used in the epilogue.  */
+  for (unsigned i = 0; i < pattern_worklist.length (); ++i)
+{
...
+ op = simplify_replace_tree (op, NULL_TREE, NULL_TREE,
+&find_in_mapping, &mapping);
+ gimple_set_op (seq, j, op);

you do this for the pattern-def seq but not for the related one.
I guess you ran into this for COND_EXPR conditions.  I wondered
to use a shared worklist for both the def-seq and the main pattern
stmt or at l

Re: [PATCH] PR85678: Change default to -fno-common

2019-10-28 Thread Segher Boessenkool
On Mon, Oct 28, 2019 at 03:05:58PM +, Wilco Dijkstra wrote:
> > Has this been bootstrapped and regression tested?
> 
> Yes, it bootstraps OK of course. I ran regression over the weekend, there
> are a few minor regressions in lto due to relying on tentative definitions
> and a few latent bugs. I'd expect there will be a few similar failures on
> other targets but nothing major since few testcases rely on -fcommon.
> The big question is how it affects the distros.

On what targets has it been bootstrapped (and tested)?


Segher


Re: [PATCH] PR85678: Change default to -fno-common

2019-10-28 Thread David Edelsohn
>> Has this been bootstrapped and regression tested?
>
> Yes, it bootstraps OK of course. I ran regression over the weekend, there
> are a few minor regressions in lto due to relying on tentative definitions
> and a few latent bugs. I'd expect there will be a few similar failures on
> other targets but nothing major since few testcases rely on -fcommon.

This almost certainly will break AIX.

- David


Re: [PATCH] [MIPS] Replace insert with insve for floating-point values

2019-10-28 Thread Jeff Law
On 10/11/19 6:01 AM, Mihailo Stojanovic wrote:
> Currently, when a function argument of type double gets loaded into a
> vector register on a 32-bit target, it is firstly reloaded into two
> general purpose registers, and then loaded into a vector register using
> two insert.w instructions.
> 
> This patch swaps the two insert.w instructions with one insve.d
> instruction, which operates on 64-bit floating point registers, so the
> value can be reloaded into a FPR. This is done by adding another
> alternative of constraints for msa_insert_ pattern, which
> covers the case of a floating-point input value.
> 
> gcc/ChangeLog:
> 
> * config/mips/mips-msa.md (msa_insert_): Add an
> alternative which covers the floating-point input value. Also
> forbid the split of insert.d pattern for floating-point values.
> 
> gcc/testsuite/ChangeLog:
> 
> * gcc.target/mips/msa-insert-split.c: New test.
Thanks.  Installed on the trunk.
jeff



Re: [PATCH] rs6000: Enable limited unrolling at -O2

2019-10-28 Thread Richard Biener
On October 28, 2019 6:18:26 PM GMT+01:00, Segher Boessenkool 
 wrote:
>On Mon, Oct 28, 2019 at 09:08:04AM +0100, Richard Biener wrote:
>> On Fri, 25 Oct 2019, Jiufu Guo wrote:
>> > On powerpc64le, for O2 , enable -funroll-loops and limit
>> > PARAM_MAX_UNROLL_TIMES=2 and PARAM_MAX_UNROLLED_INSNS=20, we can
>see >2%
>> > overall improvement for SPEC2017.
>> 
>> Note the behavior with LTO will be surprising (and generally when
>> functions are compiled with different -O level via the optimize
>> attribute).  It's a bad idea to base --param values on the value
>> of a global optimization option that can be set per function
>> (see PR92046).
>> 
>> A better patch would have been to adjust via the target hooks for
>> unroll adjust.
>
>So we should add target hooks for all params, to have them work sanely
>with LTO?

No. There are existing target hooks for the unroller though. 

>What makes params special, different from normal command line options?

They use different machinery. They should be moved over to the general 
options handling so that per-function settings work. 
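
A minimal sketch of the hook-based route, in case it helps;
rs6000_loop_unroll_adjust and the <= 10 threshold are assumptions, while
TARGET_LOOP_UNROLL_ADJUST and loop->ninsns are the existing interfaces:

/* Illustrative only: cap unrolling of small loops at a factor of 2,
   independent of how the global --params were set.  */
static unsigned
rs6000_loop_unroll_adjust (unsigned nunroll, struct loop *loop)
{
  if (loop->ninsns <= 10)
    return MIN (nunroll, 2);
  return nunroll;
}

#undef TARGET_LOOP_UNROLL_ADJUST
#define TARGET_LOOP_UNROLL_ADJUST rs6000_loop_unroll_adjust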

Richard. 

>
>Segher



Re: [PATCH] [MIPS] Mark built-in functions as pure

2019-10-28 Thread Jeff Law
On 10/18/19 12:10 AM, Mihailo Stojanovic wrote:
> Mips built-in functions are currently not marked as pure, which
> invalidates pointers across built-in function calls. If a pointer is
> alive across built-in call, dereferencing it before and after the call
> will generate two load instructions instead of one.
> 
> This marks the built-ins as pure, which removes the unnecessary load.
> 
> Tested on mips-mti-linux-gnu.
> 
> gcc/ChangeLog:
> 
> * config/mips/mips.c (DIRECT_BUILTIN_PURE): New macro. Add a
> pure qualifier to the built-in.
> (MSA_BUILTIN_PURE): New macro. Add a pure qualifier to the MSA
> built-ins.
> (struct mips_builtin_description): Add is_pure flag.
> (mips_init_builtins): Mark built-in as pure if the flag in the
> corresponding mips_builtin_description struct is set.
> 
> gcc/testsuite/ChangeLog:
> 
> * gcc.target/mips/mips-builtins-pure.c: New test.

Thanks.  Installed on the trunk.

ISTM we ought to get you set up with write access.  Interested?

jeff



Re: [Patch][Demangler] Fix for complex values

2019-10-28 Thread Jeff Law
On 10/19/19 10:35 PM, Ian Lance Taylor wrote:
> On Sat, Oct 19, 2019 at 9:11 PM Miguel Saldivar  
> wrote:
>>
>> Updated patch that uses `_Complex` and `_Imaginary`
>>
>> Thanks,
>> Miguel Saldivar
>>
>> From 742b37c88bea0118046ac359cabe5f250d69ee30 Mon Sep 17 00:00:00 2001
>> From: Miguel Saldivar 
>> Date: Sat, 19 Oct 2019 21:06:07 -0700
>> Subject: [PATCH] Fix for complex and imaginary values
>>
>> gcc/libiberty/
>> * cp-demangle.c (d_print_mod): Print " _Complex" and " _Imaginary",
>> as opposed to "complex " and "imaginary "
>>
>> gcc/libiberty/
>> * testsuite/demangle-expected: Adjust test.
> 
> This is OK.
> 
> Thanks.
I've installed this on the trunk.
jeff



Re: [PATCH] [MIPS] PR82981 - mulditi3 pattern for MIPS64R6

2019-10-28 Thread Jeff Law
On 10/21/19 4:57 AM, Mihailo Stojanovic wrote:
> This expands the existing MIPS mulditi3 pattern by adding support for
> MIPS64R6 multiplication instructions.
> 
> gcc/ChangeLog:
> 
> * config/mips/mips.md (mulditi3): Generate patterns for high
> doubleword and low doubleword result of multiplication on
> MIPS64R6.
> 
> gcc/testsuite/ChangeLog:
> 
> * gcc.target/mips/mips64r6-ti-mult.c: New test.
> 
> ---
> 
> Not sure if I should add "PR target/82981" above the ChangeLog entries,
> as there was already one patch which addressed the issue, but didn't
> resolve it completely.
I went ahead and added the marker.  It seems like this is a follow-up to
address more issues related to that issue.

Installed on the trunk.

Jeff



Re: [PATCH] [ARC] Fix legitimize pic address.

2019-10-28 Thread Jeff Law
On 10/22/19 2:10 AM, Claudiu Zissulescu wrote:
> Hi Andrew,
> 
> There are cases when an pic address gets complicated, and it needs to
> be resolved via force_reg function found in
> prepare_move_operands. When this happens, we need to disambiguate the
> pic address and re-legitimize it. Testcase added as well.
> 
> The patch needs to be applied to trunk and gcc9 branch as well.
> 
> OK to apply?
> Claudiu
> 
> gcc/
> -xx-xx  Claudiu Zissulescu  
> 
>   * config/arc/arc.c (arc_legitimize_pic_address): Consider UNSPECs
>   as well, if interesting recover the symbol and re-legitimize the
>   pic address.
> 
> gcc/testsuite/
> -xx-xx  Claudiu Zissulescu  
> 
>   * gcc.target/arc/pic-2.c: New file.
OK
jeff



Re: [PATCH] [ARC] Fix movsi_ne pattern.

2019-10-28 Thread Jeff Law
On 10/22/19 2:13 AM, Claudiu Zissulescu wrote:
> From: Shahab Vahedi 
> 
> Hi Andrew,
> 
> The movsi_ne variants are in a wrong order, leading to wrong
> computation of the internal attribute "cond". Hence, to errors when
> outputting annul-true or annul-false instructions. Testcase added.
> 
> The patch needs to go for trunk and gcc9 branch.
> 
> OK to apply?
> Claudiu
> 
> gcc/
> -xx-xx  Claudiu Zissulescu  
>   Shahab Vahedi  
> 
>   * config/arc/arc.md (movsi_ne): Reorder instruction variants.
> 
> testsuite/
> -xx-xx  Shahab Vahedi  
> 
>   * gcc.target/arc/delay-slot-limm.c: New test.
So I'm going to have to trust you on this one.  It looks like you did
more than just reorder the alternatives.  For example, the constraints
for operand0 look significantly different to me.  They're slightly
different for operand1 as well (LR rather than Lc).

Anyway, OK for the trunk.

jeff



Re: [PATCH] PR85678: Change default to -fno-common

2019-10-28 Thread Richard Biener
On October 28, 2019 7:43:33 PM GMT+01:00, David Edelsohn  
wrote:
>>> Has this been bootstrapped and regression tested?
>>
>> Yes, it bootstraps OK of course. I ran regression over the weekend,
>there
>> are a few minor regressions in lto due to relying on tentative
>definitions
>> and a few latent bugs. I'd expect there will be a few similar
>failures on
>> other targets but nothing major since few testcases rely on -fcommon.
>
>This almost certainly will break AIX.

I suppose targets can override this decision. 

Richard. 

>- David



Re: [PATCH] PR85678: Change default to -fno-common

2019-10-28 Thread Jeff Law
On 10/28/19 1:43 PM, Richard Biener wrote:
> On October 28, 2019 7:43:33 PM GMT+01:00, David Edelsohn  
> wrote:
 Has this been bootstrapped and regression tested?
>>>
>>> Yes, it bootstraps OK of course. I ran regression over the weekend,
>> there
>>> are a few minor regressions in lto due to relying on tentative
>> definitions
>>> and a few latent bugs. I'd expect there will be a few similar
>> failures on
>>> other targets but nothing major since few testcases rely on -fcommon.
>>
>> This almost certainly will break AIX.
> 
> I suppose targets can override this decision. 
I think they probably could via the override_options mechanism.

Jeff



Re: [PATCH] PR85678: Change default to -fno-common

2019-10-28 Thread Wilco Dijkstra
Hi,

>> I suppose targets can override this decision. 
> I think they probably could via the override_options mechanism.

Yes, it's trivial to add this to target_option_override():

  if (!global_options_set.x_flag_no_common)
flag_no_common = 0;

Cheers,
Wilco








[PATCH] avoid eliminating live nul stores into strings of bounded length (PR 92226)

2019-10-28 Thread Martin Sebor

A recent enhancement to take advantage of non-constant strlen
results constrained to a known range interacts badly with
the nul-over-nul optimization.  The optimization eliminates
nul stores that overwrite the existing terminating nul of
the destination string.  This interaction causes the nul
store to be eliminated in a subset of cases when it shouldn't
be.

The attached patch fixes the bug by avoiding the optimization
for such destinations.  It also adjusts the comment that
describes the function with the bug to make its return value
clearer.
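
A reduced illustration of the pattern (not the new strlenopt-88.c itself,
just an assumed example): the strlen result is only known to lie in a range,
and the later nul store may shorten the copied string, so it must not be
removed.

#include <string.h>

char a[8];

void
f (const char *s)
{
  if (strlen (s) > 1)   /* strlen (s) only known to be in a range here.  */
    {
      strcpy (a, s);
      a[2] = 0;         /* May truncate the copy; must stay.  */
    }
}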

Tested on x86_64-linux.

Martin
PR tree-optimization/92226 - live nul char store to array eliminated

gcc/testsuite/ChangeLog:

	PR tree-optimization/92226
	* gcc.dg/strlenopt-88.c: New test.

gcc/ChangeLog:

	PR tree-optimization/92226
	* tree-ssa-strlen.c (compare_nonzero_chars): Return -1 also when
	the offset is in the open range outlined by SI's length.

Index: gcc/tree-ssa-strlen.c
===
--- gcc/tree-ssa-strlen.c	(revision 277521)
+++ gcc/tree-ssa-strlen.c	(working copy)
@@ -193,10 +193,11 @@ static void handle_builtin_stxncpy (built_in_funct
 
*  +1  if SI is known to start with more than OFF nonzero characters.
 
-   *   0  if SI is known to start with OFF nonzero characters,
-	  but is not known to start with more.
+   *   0  if SI is known to start with exactly OFF nonzero characters.
 
-   *  -1  if SI might not start with OFF nonzero characters.  */
+   *  -1  if SI either does not start with OFF nonzero characters
+  or the relationship between the number of leading nonzero
+  characters in SI and OFF is unknown.  */
 
 static inline int
 compare_nonzero_chars (strinfo *si, unsigned HOST_WIDE_INT off)
@@ -221,7 +221,7 @@ compare_nonzero_chars (strinfo *si, unsigned HOST_
   if (TREE_CODE (si->nonzero_chars) == INTEGER_CST)
 return compare_tree_int (si->nonzero_chars, off);
 
-  if (TREE_CODE (si->nonzero_chars) != SSA_NAME)
+  if (!rvals || TREE_CODE (si->nonzero_chars) != SSA_NAME)
 return -1;
 
   const value_range *vr
@@ -232,7 +232,15 @@ compare_nonzero_chars (strinfo *si, unsigned HOST_
   if (rng != VR_RANGE || !range_int_cst_p (vr))
 return -1;
 
-  return compare_tree_int (vr->min (), off);
+  /* If the offset is less than the minimum length or if the bounds
+ of the length range are equal return the result of the comparison
+ same as in the constant case.  Otherwise return a conservative
+ result.  */
+  int cmpmin = compare_tree_int (vr->min (), off);
+  if (cmpmin > 0 || tree_int_cst_equal (vr->min (), vr->max ()))
+return cmpmin;
+
+  return -1;
 }
 
 /* Return true if SI is known to be a zero-length string.  */
Index: gcc/testsuite/gcc.dg/strlenopt-88.c
===
--- gcc/testsuite/gcc.dg/strlenopt-88.c	(nonexistent)
+++ gcc/testsuite/gcc.dg/strlenopt-88.c	(working copy)
@@ -0,0 +1,196 @@
+/* PR tree-optimization/92226 - live nul char store to array eliminated
+   { dg-do run }
+   { dg-options "-O2 -Wall" } */
+
+#include "strlenopt.h"
+
+#define NOIPA __attribute__ ((noipa))
+
+unsigned nfails;
+
+char a[8];
+
+void test (int line, const char *func, size_t expect)
+{
+  size_t len = strlen (a);
+  if (len == expect)
+return;
+
+  ++nfails;
+
+  __builtin_printf ("assertion failed in %s on line %i: "
+		"strlen (\"%s\") == %zu, got %zu\n",
+		func, line, a, expect, len);
+}
+
+NOIPA const char* str (size_t n)
+{
+  return "9876543210" + 10 - n;
+}
+
+#define T(name, CMPEXP, LEN, IDX, EXPECT)	\
+  NOIPA static void name (void)			\
+  {		\
+const char *s = str (LEN);			\
+if (strlen (s) CMPEXP)			\
+  {		\
+	strcpy (a, s);\
+	a[IDX] = 0;\
+	test (__LINE__, #name, EXPECT);		\
+  }		\
+  } typedef void dummy_type
+
+
+T (len_eq_1_store_nul_0, == 1, 1, 0, 0);
+T (len_eq_1_store_nul_1, == 1, 1, 1, 1);
+T (len_eq_1_store_nul_2, == 1, 1, 2, 1);
+T (len_eq_1_store_nul_3, == 1, 1, 3, 1);
+T (len_eq_1_store_nul_4, == 1, 1, 4, 1);
+
+T (len_eq_2_store_nul_0, == 2, 2, 0, 0);
+T (len_eq_2_store_nul_1, == 2, 2, 1, 1);
+T (len_eq_2_store_nul_2, == 2, 2, 2, 2);
+T (len_eq_2_store_nul_3, == 2, 2, 3, 2);
+T (len_eq_2_store_nul_4, == 2, 2, 4, 2);
+
+T (len_eq_3_store_nul_0, == 3, 3, 0, 0);
+T (len_eq_3_store_nul_1, == 3, 3, 1, 1);
+T (len_eq_3_store_nul_2, == 3, 3, 2, 2);
+T (len_eq_3_store_nul_3, == 3, 3, 3, 3);
+T (len_eq_3_store_nul_4, == 3, 3, 4, 3);
+
+
+T (len_gt_1_store_nul_0, > 2, 2, 0, 0);
+T (len_gt_1_store_nul_1, > 2, 2, 1, 1);
+T (len_gt_1_store_nul_2, > 2, 2, 2, 2);
+T (len_gt_1_store_nul_3, > 2, 2, 3, 2);
+T (len_gt_1_store_nul_4, > 2, 2, 4, 2);
+
+T (len_gt_2_store_nul_0, > 2, 3, 0, 0);
+T (len_gt_2_store_nul_1, > 2, 3, 1, 1);
+T (len_gt_2_store_nul_2, > 2, 3, 2, 2);
+T (len_gt_2_store_nul_3, > 2, 3, 3, 3);
+T (len_gt_2_store_nul_4, > 2, 3, 4, 3);
+
+T (len_gt_3_store_nul_0, > 2, 4, 0, 0);
+T (len_gt_3_store_nul_1, > 2, 4, 1, 1);
+T (

Re: [PATCH] PR85678: Change default to -fno-common

2019-10-28 Thread Iain Sandoe
Wilco Dijkstra  wrote:
>>> 
>>> I suppose targets can override this decision. 
>> I think they probably could via the override_options mechanism.
> 
> Yes, it's trivial to add this to target_option_override():
> 
>  if (!global_options_set.x_flag_no_common)
>flag_no_common = 0;

for the record,  Darwin bootstraps OK with the change (which is to be expected,
since the preferred setting for it is -fno-common).

Testsuite fails are of the order of “a few hundred” and mostly seem to be related to tree-prof
and vector tests (plus the anticipated scan-asm stuff, where code-gen will have
changed).  I don’t have cycles to analyse the causes right now - but that gives
an idea.

cheers
Iain





Re: [PR47785] COLLECT_AS_OPTIONS

2019-10-28 Thread Bernhard Reutner-Fischer
On Mon, 28 Oct 2019 11:53:06 +1100
Kugan Vivekanandarajah  wrote:

> On Wed, 23 Oct 2019 at 23:07, Richard Biener  
> wrote:

> > Did you try this with multiple assembler options?  I see you stream
> > them as -Wa,-mfpu=xyz,-mthumb but then compare the whole
> > option strings so a mismatch with -Wa,-mthumb,-mfpu=xyz would be

indeed, i'd have expected some kind of sorting, but i don't see it in
the proposed patch?

> > diagnosed.  If there's a spec induced -Wa option do we get to see
> > that as well?  I can imagine -march=xyz enabling a -Wa option
> > for example.
> >
> > + *collect_as = XNEWVEC (char, strlen (args_text) + 1);
> > + strcpy (*collect_as, args_text);
> >
> > there's strdup.  Btw, I'm not sure why you don't simply leave
> > the -Wa option in the merged options [individually] and match
> > them up but go the route of comparing strings and carrying that
> > along separately.  I think that would be much better.
> 
> Is attached patch which does this is OK?

> +  obstack_init (&collect_obstack);
> +  obstack_grow (&collect_obstack, "COLLECT_AS_OPTIONS=",
> + sizeof ("COLLECT_AS_OPTIONS=") - 1);
> +  obstack_grow (&collect_obstack, "-Wa,", strlen ("-Wa,"));

Why don't you grow once, including the "-Wa," ?

> +/* Append options OPTS from -Wa, options to ARGV_OBSTACK.  */
> +
> +static void
> +append_compiler_wa_options (obstack *argv_obstack,
> + struct cl_decoded_option *opts,
> + unsigned int count)
> +{
> +  static const char *collect_as;
> +  for (unsigned int j = 1; j < count; ++j)
> +{
> +  struct cl_decoded_option *option = &opts[j];

Instead of the switch below, why not just

if (option->opt_index != OPT_Wa_)
  continue;

here?

> +  if (j == 1)
> + collect_as = NULL;

or at least here?

(why's collect_as static in the first place? wouldn't that live in the parent 
function?)

> +  const char *args_text = option->orig_option_with_args_text;
> +  switch (option->opt_index)
> + {
> + case OPT_Wa_:
> +   break;
> + default:
> +   continue;
> + }


[PATCH] Unbreak -masm=intel (PR target/92258)

2019-10-28 Thread Jakub Jelinek
Hi!

On Sat, Oct 26, 2019 at 09:27:12PM +0800, Hongtao Liu wrote:
> > BTW: Please also note that there is no need to use  or operand
> > mode override in scalar insn templates for intel asm dialect when
> > operand already has a scalar mode.
> https://gcc.gnu.org/ml/gcc-patches/2019-10/msg01868.html
> 
> This patch is to remove redundant  when operand already has a scalar 
> mode.
> 
> bootstrap and regression test for i386/x86-64 is ok.
> 
> Changelog
> gcc/
> * config/i386/sse.md (*_vm3,
> _vm3): Remove  since
> operand is already scalar mode.
> (iptr): Remove SF/DF.

SF/DFmode in iptr certainly is not redundant, if you look at tmp-mddump.md
after this patch, there is  kept in all the sse.md:3140 patterns:
(define_insn "_comi"
  [(set (reg:CCFP FLAGS_REG)
(compare:CCFP
  (vec_select:MODEF
(match_operand: 0 "register_operand" "v")
(parallel [(const_int 0)]))
  (vec_select:MODEF
(match_operand: 1 
"" "")
(parallel [(const_int 0)]]
  "SSE_FLOAT_MODE_P (mode)"
  "%vcomi\t{%1, %0|%0, 
%1}"
  [(set_attr "type" "ssecomi")
   (set_attr "prefix" "maybe_vex")
   (set_attr "prefix_rep" "0")
   (set (attr "prefix_data16")
(if_then_else (eq_attr "mode" "DF")
  (const_string "1")
  (const_string "0")))
   (set_attr "mode" "")])
While operands[1] has V2DFmode or V4SFmode and thus not a scalar mode,
we still want a q or k modifier on it, because that is what the instruction
actually reads.

The following patch reverts that part, ok for trunk if it passes
bootstrap/regtest?

2019-10-28  Jakub Jelinek  

PR target/92258
* config/i386/sse.md (iptr): Revert 2019-10-27 change.

* gcc.target/i386/pr92258.c: New test.

--- gcc/config/i386/sse.md.jj   2019-10-28 22:16:14.619007560 +0100
+++ gcc/config/i386/sse.md  2019-10-28 22:51:48.594746180 +0100
@@ -850,7 +850,8 @@ (define_mode_attr iptr
(V16QI "b") (V8HI "w") (V4SI "k") (V2DI "q")
(V16SF "k") (V8DF "q")
(V8SF "k") (V4DF "q")
-   (V4SF "k") (V2DF "q")])
+   (V4SF "k") (V2DF "q")
+   (SF "k") (DF "q")])
 
 ;; Mapping of vector modes to VPTERNLOG suffix
 (define_mode_attr ternlogsuffix
--- gcc/testsuite/gcc.target/i386/pr92258.c.jj  2019-10-28 22:52:44.093881178 
+0100
+++ gcc/testsuite/gcc.target/i386/pr92258.c 2019-10-28 22:52:40.150942632 
+0100
@@ -0,0 +1,11 @@
+/* PR target/92258 */
+/* { dg-do compile } */
+/* { dg-options "-masm=intel" } */
+
+typedef double V __attribute__ ((__vector_size__ (16)));
+
+int
+foo (V x, V y)
+{
+  return __builtin_ia32_ucomisdeq (x, y);
+}


Jakub



Re: [PATCH] Unbreak -masm=intel (PR target/92258)

2019-10-28 Thread Uros Bizjak
On Mon, Oct 28, 2019 at 11:02 PM Jakub Jelinek  wrote:
>
> Hi!
>
> On Sat, Oct 26, 2019 at 09:27:12PM +0800, Hongtao Liu wrote:
> > > BTW: Please also note that there is no need to use  or operand
> > > mode override in scalar insn templates for intel asm dialect when
> > > operand already has a scalar mode.
> > https://gcc.gnu.org/ml/gcc-patches/2019-10/msg01868.html
> >
> > This patch is to remove redundant  when operand already has a scalar 
> > mode.
> >
> > bootstrap and regression test for i386/x86-64 is ok.
> >
> > Changelog
> > gcc/
> > * config/i386/sse.md (*_vm3,
> > _vm3): Remove  since
> > operand is already scalar mode.
> > (iptr): Remove SF/DF.
>
> SF/DFmode in iptr certainly is not redundant, if you look at tmp-mddump.md
> after this patch, there is  kept in all the sse.md:3140 patterns:
> (define_insn "_comi"
>   [(set (reg:CCFP FLAGS_REG)
> (compare:CCFP
>   (vec_select:MODEF
> (match_operand: 0 "register_operand" "v")
> (parallel [(const_int 0)]))
>   (vec_select:MODEF
> (match_operand: 1 
> "" "")
> (parallel [(const_int 0)]]
>   "SSE_FLOAT_MODE_P (mode)"
>   "%vcomi\t{%1, %0|%0, 
> %1}"
>   [(set_attr "type" "ssecomi")
>(set_attr "prefix" "maybe_vex")
>(set_attr "prefix_rep" "0")
>(set (attr "prefix_data16")
> (if_then_else (eq_attr "mode" "DF")
>   (const_string "1")
>   (const_string "0")))
>(set_attr "mode" "")])
> While operands[1] has V2DFmode or V4SFmode and thus not a scalar mode,
> we still want a q or k modifier on it, because that is what the instruction
> actually reads.
>
> The following patch reverts that part, ok for trunk if it passes
> bootstrap/regtest?
>
> 2019-10-28  Jakub Jelinek  
>
> PR target/92258
> * config/i386/sse.md (iptr): Revert 2019-10-27 change.
>
> * gcc.target/i386/pr92258.c: New test.

OK with a dg-options adjustment.

Thanks,
Uros.

>
> --- gcc/config/i386/sse.md.jj   2019-10-28 22:16:14.619007560 +0100
> +++ gcc/config/i386/sse.md  2019-10-28 22:51:48.594746180 +0100
> @@ -850,7 +850,8 @@ (define_mode_attr iptr
> (V16QI "b") (V8HI "w") (V4SI "k") (V2DI "q")
> (V16SF "k") (V8DF "q")
> (V8SF "k") (V4DF "q")
> -   (V4SF "k") (V2DF "q")])
> +   (V4SF "k") (V2DF "q")
> +   (SF "k") (DF "q")])
>
>  ;; Mapping of vector modes to VPTERNLOG suffix
>  (define_mode_attr ternlogsuffix
> --- gcc/testsuite/gcc.target/i386/pr92258.c.jj  2019-10-28 22:52:44.093881178 
> +0100
> +++ gcc/testsuite/gcc.target/i386/pr92258.c 2019-10-28 22:52:40.150942632 
> +0100
> @@ -0,0 +1,11 @@
> +/* PR target/92258 */
> +/* { dg-do compile } */
> +/* { dg-options "-masm=intel" } */

-msse2 is also needed here for 32bit targets.

> +
> +typedef double V __attribute__ ((__vector_size__ (16)));
> +
> +int
> +foo (V x, V y)
> +{
> +  return __builtin_ia32_ucomisdeq (x, y);
> +}
>
>
> Jakub
>


Re: [PR47785] COLLECT_AS_OPTIONS

2019-10-28 Thread Richard Earnshaw
On 28/10/2019 21:52, Bernhard Reutner-Fischer wrote:
> On Mon, 28 Oct 2019 11:53:06 +1100
> Kugan Vivekanandarajah  wrote:
> 
>> On Wed, 23 Oct 2019 at 23:07, Richard Biener  
>> wrote:
> 
>>> Did you try this with multiple assembler options?  I see you stream
>>> them as -Wa,-mfpu=xyz,-mthumb but then compare the whole
>>> option strings so a mismatch with -Wa,-mthumb,-mfpu=xyz would be
> 
> indeed, i'd have expected some kind of sorting, but i don't see it in
> the proposed patch?

Why?  If the options interact with each other, then sorting could change
the meaning.  We could only sort the options if we knew that couldn't
happen.  For a trivial example,
-mcpu=zzz -mcpu=xxx

would override the zzz with xxx, but sorting would switch them around.

And this is just a trivial case, if the options interact but have
different names then you've no idea what must happen unless you are GAS;
and we don't want to build such knowledge into GCC.

So preserve the options, in the order they were given, based on the
standard expectations: namely that options on the command line will
override anything built in to the compiler itself.
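To make the ordering point concrete, here is a contrived standalone sketch
(not GCC code; the option name and the "last one wins" rule are just the
example above) of why sorting the strings before comparing them would be
wrong:

```
#include <stdio.h>
#include <string.h>

/* Later options override earlier ones, so order carries meaning.  */
static const char *
effective_cpu (const char *const *opts, int n)
{
  const char *cpu = "default";
  for (int i = 0; i < n; ++i)
    if (strncmp (opts[i], "-mcpu=", 6) == 0)
      cpu = opts[i] + 6;
  return cpu;
}

int
main (void)
{
  const char *a[] = { "-mcpu=zzz", "-mcpu=xxx" };
  const char *b[] = { "-mcpu=xxx", "-mcpu=zzz" };
  /* Prints "xxx vs zzz": sorting both lists would make them compare
     equal even though they select different CPUs.  */
  printf ("%s vs %s\n", effective_cpu (a, 2), effective_cpu (b, 2));
  return 0;
}
```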

R.

> 
>>> diagnosed.  If there's a spec induced -Wa option do we get to see
>>> that as well?  I can imagine -march=xyz enabling a -Wa option
>>> for example.
>>>
>>> + *collect_as = XNEWVEC (char, strlen (args_text) + 1);
>>> + strcpy (*collect_as, args_text);
>>>
>>> there's strdup.  Btw, I'm not sure why you don't simply leave
>>> the -Wa option in the merged options [individually] and match
>>> them up but go the route of comparing strings and carrying that
>>> along separately.  I think that would be much better.
>>
>> Is attached patch which does this is OK?
> 
>> +  obstack_init (&collect_obstack);
>> +  obstack_grow (&collect_obstack, "COLLECT_AS_OPTIONS=",
>> +sizeof ("COLLECT_AS_OPTIONS=") - 1);
>> +  obstack_grow (&collect_obstack, "-Wa,", strlen ("-Wa,"));
> 
> Why don't you grow once, including the "-Wa," ?
> 
>> +/* Append options OPTS from -Wa, options to ARGV_OBSTACK.  */
>> +
>> +static void
>> +append_compiler_wa_options (obstack *argv_obstack,
>> +struct cl_decoded_option *opts,
>> +unsigned int count)
>> +{
>> +  static const char *collect_as;
>> +  for (unsigned int j = 1; j < count; ++j)
>> +{
>> +  struct cl_decoded_option *option = &opts[j];
> 
> Instead of the switch below, why not just
> 
> if (option->opt_index != OPT_Wa_)
>   continue;
> 
> here?
> 
>> +  if (j == 1)
>> +collect_as = NULL;
> 
> or at least here?
> 
> (why's collect_as static in the first place? wouldn't that live in the parent 
> function?)
> 
>> +  const char *args_text = option->orig_option_with_args_text;
>> +  switch (option->opt_index)
>> +{
>> +case OPT_Wa_:
>> +  break;
>> +default:
>> +  continue;
>> +}



Re: [PATCH] avoid eliminating live nul stores into strings of bounded length (PR 92226)

2019-10-28 Thread Jeff Law
On 10/28/19 2:29 PM, Martin Sebor wrote:
> A recent enhancement to take advantage of non-constant strlen
> results constrained to a known range interacts badly with
> the nul-over-nul optimization.  The optimization eliminates
> nul stores that overwrite the existing terminating nul of
> the destination string.  This interaction causes the nul
> store to be eliminated in a subset of cases when it shouldn't
> be.
> 
> The attached patch fixes the bug by avoiding the optimization
> for such destinations.  It also adjusts the comment that
> describes the function with the bug to make its return value
> clearer.
> 
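A minimal unquoted illustration of the scenario, modeled on the T(2,0) case
from the gcc.dg/strlenopt-89.c test shown further down in this digest (the
array and function names here are only for exposition):

```
/* With a 2-byte array, the nul store at index 0 shortens the string and
   must be kept; only a store at index 1 merely rewrites the terminator
   strcpy already wrote and may be removed.  */
extern char a2[2];

void
test_2_store_nul_0 (const char *s)
{
  if (__builtin_strlen (s) == 1)
    {
      __builtin_strcpy (a2, s);
      a2[0] = 0;                        /* live store: must not be eliminated */
      if (__builtin_strlen (a2) != 0)   /* second strlen still folds away */
        __builtin_abort ();
    }
}
```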
> Tested on x86_64-linux.
> 
> Martin
> 
> gcc-92226.diff
> 
> PR tree-optimization/92226 - live nul char store to array eliminated
> 
> gcc/testsuite/ChangeLog:
> 
>   PR tree-optimization/92226
>   * gcc.dg/strlenopt-88.c: New test.
> 
> gcc/ChangeLog:
> 
>   PR tree-optimization/92226
>   * tree-ssa-strlen.c (compare_nonzero_chars): Return -1 also when
>   the offset is in the open range outlined by SI's length.
> 

OK
jeff



Re: [PR47785] COLLECT_AS_OPTIONS

2019-10-28 Thread Kugan Vivekanandarajah
Hi Bernhard,

Thanks for the review.

On Tue, 29 Oct 2019 at 08:52, Bernhard Reutner-Fischer
 wrote:
>
> On Mon, 28 Oct 2019 11:53:06 +1100
> Kugan Vivekanandarajah  wrote:
>
> > On Wed, 23 Oct 2019 at 23:07, Richard Biener  
> > wrote:
>
> > > Did you try this with multiple assembler options?  I see you stream
> > > them as -Wa,-mfpu=xyz,-mthumb but then compare the whole
> > > option strings so a mismatch with -Wa,-mthumb,-mfpu=xyz would be
>
> indeed, i'd have expected some kind of sorting, but i don't see it in
> the proposed patch?
Let me try to see what the best way to handle this is. If we look at
Richard Earnshaw's comment in the next email, there are cases where
handling this would not be straightforward. I am happy to do whatever is
acceptable here.


>
> > > diagnosed.  If there's a spec induced -Wa option do we get to see
> > > that as well?  I can imagine -march=xyz enabling a -Wa option
> > > for example.
> > >
> > > + *collect_as = XNEWVEC (char, strlen (args_text) + 1);
> > > + strcpy (*collect_as, args_text);
> > >
> > > there's strdup.  Btw, I'm not sure why you don't simply leave
> > > the -Wa option in the merged options [individually] and match
> > > them up but go the route of comparing strings and carrying that
> > > along separately.  I think that would be much better.
> >
> > Is attached patch which does this is OK?
>
> > +  obstack_init (&collect_obstack);
> > +  obstack_grow (&collect_obstack, "COLLECT_AS_OPTIONS=",
> > + sizeof ("COLLECT_AS_OPTIONS=") - 1);
> > +  obstack_grow (&collect_obstack, "-Wa,", strlen ("-Wa,"));
>
> Why don't you grow once, including the "-Wa," ?

I will change this.
>
> > +/* Append options OPTS from -Wa, options to ARGV_OBSTACK.  */
> > +
> > +static void
> > +append_compiler_wa_options (obstack *argv_obstack,
> > + struct cl_decoded_option *opts,
> > + unsigned int count)
> > +{
> > +  static const char *collect_as;
> > +  for (unsigned int j = 1; j < count; ++j)
> > +{
> > +  struct cl_decoded_option *option = &opts[j];
>
> Instead of the switch below, why not just
>
> if (option->opt_index != OPT_Wa_)
>   continue;

I will change this.

>
> here?
>
> > +  if (j == 1)
> > + collect_as = NULL;
>
> or at least here?
>
> (why's collect_as static in the first place? wouldn't that live in the parent 
> function?)
I am keeping the -Wa options which come from the last TU here and making
sure they are the same. If we get -Wa options that are different and
incompatible, handling them is tricky. So in this patch I want to handle
only the case where they are the same and flag an error otherwise. It
again goes back to your first comment. I am happy to work out an
acceptable solution here.
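Roughly, the policy described above amounts to something like this
standalone sketch (the function and variable names are illustrative only,
not the actual patch):

```
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* The -Wa, option string recorded from the TUs seen so far.  */
static char *collect_as;

/* Record WA_STRING for one TU; diagnose a mismatch with what another TU
   used, since merging incompatible assembler options is not well
   defined.  */
void
record_tu_as_options (const char *wa_string)
{
  if (collect_as == NULL)
    collect_as = strdup (wa_string);
  else if (strcmp (collect_as, wa_string) != 0)
    {
      fprintf (stderr, "error: conflicting assembler options: '%s' vs '%s'\n",
               collect_as, wa_string);
      exit (1);
    }
}
```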

Thanks,
Kugan

>
> > +  const char *args_text = option->orig_option_with_args_text;
> > +  switch (option->opt_index)
> > + {
> > + case OPT_Wa_:
> > +   break;
> > + default:
> > +   continue;
> > + }


[PATCH] use EVRP in more strlen functions

2019-10-28 Thread Martin Sebor

While testing the patch for PR 92226 I posted earlier today
I ran into a few cases where I expected the strlen range
optimization to take place but it didn't.

In other instances this wouldn't be surprising because
the optimization was only introduced for multi-character stores
and with the expectation that it would be slowly extended to
other functions over time.  But these cases were among those
the optimization was meant to be in place for, so its absence
is an oversight.  The attached near-trivial patch fills this
gap.

As with all these changes, enabling the optimization also makes
it possible to detect more instances of buffer overflow.

Tested on x86_64-linux.

Martin

PS There are quite a few remaining opportunities to make use of
the strlen ranges in the pass.  Rather than enhancing the whole
pass in one go I think it will be safer to do it one small step
at a time, using little patchlets like in the attachment.
gcc/ChangeLog:

	* tree-ssa-strlen.c (get_addr_stridx): Add argument and use it.
	(handle_store): Pass argument to get_addr_stridx.

gcc/testsuite/ChangeLog:

	* gcc.dg/strlenopt-89.c: New test.
	* gcc.dg/strlenopt-90.c: New test.
	* gcc.dg/Wstringop-overflow-20.c: New test.

Index: gcc/tree-ssa-strlen.c
===
--- gcc/tree-ssa-strlen.c	(revision 277521)
+++ gcc/tree-ssa-strlen.c	(working copy)
@@ -272,7 +280,8 @@ get_next_strinfo (strinfo *si)
*OFFSET_OUT.  */
 
 static int
-get_addr_stridx (tree exp, tree ptr, unsigned HOST_WIDE_INT *offset_out)
+get_addr_stridx (tree exp, tree ptr, unsigned HOST_WIDE_INT *offset_out,
+		 const vr_values *rvals = NULL)
 {
   HOST_WIDE_INT off;
   struct stridxlist *list, *last = NULL;
@@ -310,7 +319,7 @@ static int
   unsigned HOST_WIDE_INT rel_off
 	= (unsigned HOST_WIDE_INT) off - last->offset;
   strinfo *si = get_strinfo (last->idx);
-  if (si && compare_nonzero_chars (si, rel_off) >= 0)
+  if (si && compare_nonzero_chars (si, rel_off, rvals) >= 0)
 	{
 	  if (offset_out)
 	{
@@ -4319,7 +4328,7 @@ handle_store (gimple_stmt_iterator *gsi, bool *zer
 }
   else
 {
-  idx = get_addr_stridx (lhs, NULL_TREE, &offset);
+  idx = get_addr_stridx (lhs, NULL_TREE, &offset, rvals);
   if (idx > 0)
 	si = get_strinfo (idx);
 }
Index: gcc/testsuite/gcc.dg/strlenopt-89.c
===
--- gcc/testsuite/gcc.dg/strlenopt-89.c	(nonexistent)
+++ gcc/testsuite/gcc.dg/strlenopt-89.c	(working copy)
@@ -0,0 +1,89 @@
+/* PR tree-optimization/92226 - live nul char store to array eliminated
+   { dg-do compile }
+   { dg-options "-O2 -fdump-tree-strlen" } */
+
+#include "strlenopt.h"
+
+#define NOIPA __attribute__ ((noipa))
+
+/* Verify that the nul store into the destination is only eliminated
+   when it overwrites the existing terminating nul added by the strcpy call.
+   Also verify that the second strlen call is eliminated in all cases.  */
+#define T(SIZE, IDX)			\
+  NOIPA void test_ ## SIZE ## _store_nul_ ## IDX (const char *s)	\
+  {	\
+extern char a ## SIZE[SIZE];	\
+char *d = a ## SIZE;		\
+size_t len = SIZE - 1;		\
+size_t idx = IDX;			\
+if (strlen (s) == len)		\
+  {	\
+	strcpy (d, s);			\
+	d[idx] = 0;			\
+	if (strlen (d) != idx)		\
+	  abort ();			\
+  }	\
+  } typedef void dummy_type
+
+
+T (1, 0);   // expect nul store to be eliminated
+
+T (2, 0);   // nul store must be retained
+T (2, 1);   // expect nul store to be eliminated
+
+// Same as above but for larger arrays.
+T (3, 0);
+T (3, 1);
+T (3, 2);
+
+T (4, 0);
+T (4, 1);
+T (4, 2);
+T (4, 3);
+
+T (5, 0);
+T (5, 1);
+T (5, 2);
+T (5, 3);
+T (5, 4);
+
+T (6, 0);
+T (6, 1);
+T (6, 2);
+T (6, 3);
+T (6, 4);
+T (6, 5);
+
+T (7, 0);
+T (7, 1);
+T (7, 2);
+T (7, 3);
+T (7, 4);
+T (7, 5);
+T (7, 6);
+
+T (8, 0);
+T (8, 1);
+T (8, 2);
+T (8, 3);
+T (8, 4);
+T (8, 5);
+T (8, 6);
+T (8, 7);
+
+/* Verify that each function makes just one call to strlen to compute
+   the length of its argument (and not also to compute the length of
+   the copy):
+  { dg-final { scan-tree-dump-times "strlen \\(s_" 36 "strlen1" } }
+  { dg-final { scan-tree-dump-not "strlen \\(\\&a" "strlen1" } }
+
+  Verify that nul stores into the last array element have been eliminated
+  (they are preceded by a strcpy storing into all the elements of the array):
+  { dg-final { scan-tree-dump-not "a1\\\] = 0;" "strlen1" } }
+  { dg-final { scan-tree-dump-not "a2 \\\+ 1B\\\] = 0;" "strlen1" } }
+  { dg-final { scan-tree-dump-not "a3 \\\+ 2B\\\] = 0;" "strlen1" } }
+  { dg-final { scan-tree-dump-not "a4 \\\+ 3B\\\] = 0;" "strlen1" } }
+  { dg-final { scan-tree-dump-not "a5 \\\+ 4B\\\] = 0;" "strlen1" } }
+  { dg-final { scan-tree-dump-not "a6 \\\+ 5B\\\] = 0;" "strlen1" } }
+  { dg-final { scan-tree-dump-not "a7 \\\+ 6B\\\] = 0;" "strlen1" } }
+  { dg-final { scan-tree-dump-not "a8 \\\+ 7B\\\] = 

Re: [PATCH] use EVRP in more strlen functions

2019-10-28 Thread Jeff Law
On 10/28/19 4:36 PM, Martin Sebor wrote:
> While testing the patch for PR 92226 I posted earlier today
> I ran into a few cases where I expected the strlen range
> optimization to take place but it didn't.
> 
> In other instances this wouldn't be surprising because
> the optimization was only introduced for multi-character stores
> and with the expectation that it would be slowly extended to
> other functions over time.  But these cases were among those
> the optimization was meant to be in place for, so its absence
> is an oversight.  The attached near-trivial patch fills this
> gap.
> 
> As with all these changes, enabling the optimization also makes
> it possible to detect more instances of buffer overflow.
> 
> Tested on x86_64-linux.
> 
> Martin
> 
> PS There are quite a few remaining opportunities to make use of
> the strlen ranges in the pass.  Rather than enhancing the whole
> pass in one go I think it will be safer to do it one small step
> at a time, using little patchlets like in the attachment.
> 
> gcc-92226-2.diff
> 
> gcc/ChangeLog:
> 
>   * tree-ssa-strlen.c (get_addr_stridx): Add argument and use it.
>   (handle_store): Pass argument to get_addr_stridx.
> 
> gcc/testsuite/ChangeLog:
> 
>   * gcc.dg/strlenopt-89.c: New test.
>   * gcc.dg/strlenopt-90.c: New test.
>   * gcc.dg/Wstringop-overflow-20.c: New test.
OK.  And patches which do similar stuff (arranging to pass down the
range info) are pre-approved.

jeff



Re: Add a simulate_enum_decl langhook

2019-10-28 Thread Jeff Law
On 9/26/19 6:05 AM, Richard Sandiford wrote:
> Similarly to the simulate_builtin_function_decl patch, this one
> adds a hook for simulating an enum declaration in the source
> language.  Again, the main SVE ACLE patch has tests for various
> error conditions.
> 
> Tested on aarch64-linux-gnu and x86_64-linux-gnu.  OK to install?
> 
> Richard
> 
> 
> 2019-09-26  Richard Sandiford  
> 
> gcc/
>   * coretypes.h (string_int_pair): New typedef.
>   * langhooks-def.h (LANG_HOOKS_SIMULATE_ENUM_DECL): Define.
>   (LANG_HOOKS_FOR_TYPES_INITIALIZER): Include it.
>   * langhooks.h (lang_hooks_for_types::simulate_enum_decl): New hook.
> 
> gcc/c/
>   * c-tree.h (c_simulate_enum_decl): Declare.
>   * c-decl.c (c_simulate_enum_decl): New function.
>   * c-objc-common.h (LANG_HOOKS_SIMULATE_ENUM_DECL): Define to the above.
> 
> gcc/cp/
>   * cp-objcp-common.h (cxx_simulate_enum_decl): Declare.
>   (LANG_HOOKS_SIMULATE_ENUM_DECL): Define to the above.
>   * decl.c (cxx_simulate_enum_decl): New function.
OK.
jeff



Re: introduce -fcallgraph-info option

2019-10-28 Thread Joseph Myers
On Sat, 26 Oct 2019, Alexandre Oliva wrote:

> Regstrapped on x86_64-linux-gnu.  Ok to install?

I have only a peripheral comment:

> diff --git a/gcc/common.opt b/gcc/common.opt
> index cc279f411d796..63d646fba2b42 100644
> --- a/gcc/common.opt
> +++ b/gcc/common.opt
> @@ -1091,6 +1091,14 @@ fbtr-bb-exclusive
>  Common Ignore
>  Does nothing.  Preserved for backward compatibility.
>  
> +fcallgraph-info
> +Common Report RejectNegative Var(flag_callgraph_info) Init(NO_CALLGRAPH_INFO);
> +Output callgraph information on a per-file basis
> +
> +fcallgraph-info=
> +Common Report RejectNegative Joined
> +Output callgraph information on a per-file basis with decorations

We have a test in the testsuite that all option help text consistently 
ends with '.' (see gcc.misc-tests/help.exp).  I'd have expected these 
options, lacking that '.', to cause that test to fail.

-- 
Joseph S. Myers
jos...@codesourcery.com


[patch][trivial] Fix signed integer overflow in cp-demangle.c (d_number)

2019-10-28 Thread Paul Pluzhnikov via gcc-patches
Greetings,

This is rather on the trivial side. Google fuzzer found signed integer
overflow in d_number, given this input: _ZZccDF2147483647
Google ref: b141647507.

Ok for trunk?

Thanks,

libiberty/ChangeLog

2019-10-28 Paul Pluzhnikov  

* cp-demangle (d_number): Avoid signed int overflow.


--
Paul Pluzhnikov
Index: libiberty/cp-demangle.c
===
--- libiberty/cp-demangle.c (revision 277545)
+++ libiberty/cp-demangle.c (working copy)
@@ -1717,7 +1717,7 @@
}
   if (ret > ((INT_MAX - (peek - '0')) / 10))
 return -1;
-  ret = ret * 10 + peek - '0';
+  ret = ret * 10 + (peek - '0');
   d_advance (di, 1);
   peek = d_peek_char (di);
 }
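The guard in the hunk above generalizes to any decimal accumulator; a small
self-contained version for reference (not part of the patch):

```
#include <limits.h>

/* Accumulate ASCII decimal digits, refusing to go past INT_MAX instead
   of overflowing; mirrors the check quoted above.  */
long
parse_decimal (const char *s)
{
  long ret = 0;
  while (*s >= '0' && *s <= '9')
    {
      int digit = *s - '0';
      if (ret > (INT_MAX - digit) / 10)   /* ret * 10 + digit would overflow */
        return -1;
      ret = ret * 10 + digit;
      ++s;
    }
  return ret;
}
```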


Re: [PATCH] use EVRP in more strlen functions

2019-10-28 Thread Martin Sebor

On 10/28/19 4:44 PM, Jeff Law wrote:

On 10/28/19 4:36 PM, Martin Sebor wrote:

While testing the patch for PR 92226 I posted earlier today
I ran into a few cases where I expected the strlen range
optimization to take place but it didn't.

In other instances this wouldn't be surprising because
the optimization was only introduced for multi-character stores
and with the expectation that it would be slowly extended to
other functions over time.  But these cases were among those
the optimization was meant to be in place for, so its absence
is an oversight.  The attached near-trivial patch fills this
gap.

As with all these changes, enabling the optimization also makes
it possible to detect more instances of buffer overflow.

Tested on x86_64-linux.

Martin

PS There are quite a few remaining opportunities to make use of
the strlen ranges in the pass.  Rather than enhancing the whole
pass in one go I think it will be safer to do it one small step
at a time, using little patchlets like in the attachment.

gcc-92226-2.diff

gcc/ChangeLog:

* tree-ssa-strlen.c (get_addr_stridx): Add argument and use it.
(handle_store): Pass argument to get_addr_stridx.

gcc/testsuite/ChangeLog:

* gcc.dg/strlenopt-89.c: New test.
* gcc.dg/strlenopt-90.c: New test.
* gcc.dg/Wstringop-overflow-20.c: New test.

OK.  And patches which do similar stuff (arranging to pass down the
range info) are pre-approved.


Sounds good, thanks.

Martin

PS I have a few other things in flight so don't expect to do much
more work on this before stage 1 closes, but I should be able to
get it all done for GCC 11.


Re: [patch][trivial] Fix signed integer overflow in cp-demangle.c (d_number)

2019-10-28 Thread Jason Merrill
OK.

On Mon, Oct 28, 2019 at 7:56 PM Paul Pluzhnikov via gcc-patches
 wrote:
>
> Greetings,
>
> This is rather on the trivial side. Google fuzzer found signed integer
> overflow in d_number, given this input: _ZZccDF2147483647
> Google ref: b141647507.
>
> Ok for trunk?
>
> Thanks,
>
> libiberty/ChangeLog
>
> 2019-10-28 Paul Pluzhnikov  
>
> * cp-demangle (d_number): Avoid signed int overflow.
>
>
> --
> Paul Pluzhnikov



Re: [PATCH rs6000]Fix PR92132

2019-10-28 Thread Kewen.Lin
Fixed one place that did not use a consistent mode.

Bootstrapped and regression tested on powerpc64le-linux.


Thanks!
Kewen

---

gcc/ChangeLog

2019-10-25  Kewen Lin  

PR target/92132
* config/rs6000/rs6000.md (one_cmpl3_internal): Expose name.
* config/rs6000/vector.md (fpcmpun): New code_iterator.
(vcond_mask_): New expand.
(vcond_mask_): Likewise.
(vec_cmp): Likewise.
(vec_cmpu): Likewise.
(vec_cmp): Likewise.
(vector_{ungt,unge,unlt,unle}): Likewise.
(vector_uneq): Expose name.
(vector_ltgt): Likewise.
(vector_unordered): Likewise.
(vector_ordered): Likewise.

gcc/testsuite/ChangeLog

2019-10-25  Kewen Lin  

PR target/92132
* gcc.target/powerpc/pr92132-fp-1.c: New test.
* gcc.target/powerpc/pr92132-fp-2.c: New test.
* gcc.target/powerpc/pr92132-int-1.c: New test.
* gcc.target/powerpc/pr92132-int-2.c: New test.
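As a reminder of what these two optabs mean (a generic sketch, not taken
from the pr92132 tests): vec_cmp produces a vector mask from an elementwise
comparison, and vcond_mask selects elementwise between two value vectors
under such a mask.  In scalar C terms, one lane behaves roughly like:

```
/* Scalar model of a single vector lane: vec_cmp computes a 0 / all-ones
   mask element, vcond_mask uses it to pick between the two values.  */
static inline long
lane_vec_cmp_gt (double x, double y)
{
  return x > y ? -1L : 0L;
}

static inline double
lane_vcond_mask (long mask, double if_true, double if_false)
{
  return mask ? if_true : if_false;
}
```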



diff --git a/gcc/config/rs6000/rs6000.md b/gcc/config/rs6000/rs6000.md
index d0cca1e..2a68548 100644
--- a/gcc/config/rs6000/rs6000.md
+++ b/gcc/config/rs6000/rs6000.md
@@ -6800,7 +6800,7 @@
 (const_string "16")))])
 
 ;; 128-bit one's complement
-(define_insn_and_split "*one_cmpl3_internal"
+(define_insn_and_split "one_cmpl3_internal"
   [(set (match_operand:BOOL_128 0 "vlogical_operand" "=")
(not:BOOL_128
  (match_operand:BOOL_128 1 "vlogical_operand" "")))]
diff --git a/gcc/config/rs6000/vector.md b/gcc/config/rs6000/vector.md
index 886cbad..0ef64eb 100644
--- a/gcc/config/rs6000/vector.md
+++ b/gcc/config/rs6000/vector.md
@@ -107,6 +107,8 @@
 (smin "smin")
 (smax "smax")])
 
+(define_code_iterator fpcmpun [ungt unge unlt unle])
+
 
 ;; Vector move instructions.  Little-endian VSX loads and stores require
 ;; special handling to circumvent "element endianness."
@@ -493,6 +495,241 @@
 FAIL;
 })
 
+;; To support vector condition vectorization, define vcond_mask and vec_cmp.
+
+;; Same mode for condition true/false values and predicate operand.
+(define_expand "vcond_mask_"
+  [(match_operand:VEC_I 0 "vint_operand")
+   (match_operand:VEC_I 1 "vint_operand")
+   (match_operand:VEC_I 2 "vint_operand")
+   (match_operand:VEC_I 3 "vint_operand")]
+  "VECTOR_UNIT_ALTIVEC_OR_VSX_P (mode)"
+{
+  emit_insn (gen_vector_select_ (operands[0], operands[2], operands[1],
+ operands[3]));
+  DONE;
+})
+
+;; Condition true/false values are float but predicate operand is of
+;; type integer vector with same element size.
+(define_expand "vcond_mask_"
+  [(match_operand:VEC_F 0 "vfloat_operand")
+   (match_operand:VEC_F 1 "vfloat_operand")
+   (match_operand:VEC_F 2 "vfloat_operand")
+   (match_operand: 3 "vint_operand")]
+  "VECTOR_UNIT_ALTIVEC_OR_VSX_P (mode)"
+{
+  emit_insn (gen_vector_select_ (operands[0], operands[2], operands[1],
+ gen_lowpart (mode, operands[3])));
+  DONE;
+})
+
+;; For signed integer vectors comparison.
+(define_expand "vec_cmp"
+  [(set (match_operand:VEC_I 0 "vint_operand")
+   (match_operator 1 "comparison_operator"
+ [(match_operand:VEC_I 2 "vint_operand")
+  (match_operand:VEC_I 3 "vint_operand")]))]
+  "VECTOR_UNIT_ALTIVEC_OR_VSX_P (mode)"
+{
+  enum rtx_code code = GET_CODE (operands[1]);
+  rtx tmp = gen_reg_rtx (mode);
+  switch (code)
+{
+case NE:
+  emit_insn (gen_vector_eq (operands[0], operands[2], operands[3]));
+  emit_insn (gen_one_cmpl2 (operands[0], operands[0]));
+  break;
+case EQ:
+  emit_insn (gen_vector_eq (operands[0], operands[2], operands[3]));
+  break;
+case GE:
+  emit_insn (
+   gen_vector_nlt (operands[0], operands[2], operands[3], tmp));
+  break;
+case GT:
+  emit_insn (gen_vector_gt (operands[0], operands[2], operands[3]));
+  break;
+case LE:
+  emit_insn (
+   gen_vector_ngt (operands[0], operands[2], operands[3], tmp));
+  break;
+case LT:
+  emit_insn (gen_vector_gt (operands[0], operands[3], operands[2]));
+  break;
+case GEU:
+  emit_insn (
+   gen_vector_nltu (operands[0], operands[2], operands[3], tmp));
+  break;
+case GTU:
+  emit_insn (gen_vector_gtu (operands[0], operands[2], operands[3]));
+  break;
+case LEU:
+  emit_insn (
+   gen_vector_ngtu (operands[0], operands[2], operands[3], tmp));
+  break;
+case LTU:
+  emit_insn (gen_vector_gtu (operands[0], operands[3], operands[2]));
+  break;
+default:
+  gcc_unreachable ();
+  break;
+}
+  DONE;
+})
+
+;; For unsigned integer vectors comparison.
+(define_expand "vec_cmpu"
+  [(set (match_operand:VEC_I 0 "vint_operand")
+   (match_operator 1 "comparison_operator"
+ [(match_operand:VEC_I 2 "vint_operand")
+  (match_operand:VEC_I 3 "vint_operand")]))]
+  "VECTOR_UNIT_ALTIVEC_OR_VSX_P (mode)"
+{
+  em